Continuing with our review of basic math skills, let’s take a little look-see at statistics. This field is not only vast (and confusing for many folks) but also hugely important in our daily lives. Just about every single thing we do has some sort of relationship to statistics — from watching television to buying a car to supporting a political candidate to making medical decisions. Like it or not, stats rule our world. Unfortunately, trusting bad data can lead to big problems.
First, some definitions. A population is the entire group that the researchers are interested in. So, if a school system wants to know parents’ attitudes about school starting times, the population would be all parents and caregivers with children who attend school in that district.
A sample is a subset of the population. It would be nice to track the viewing habits of every single television viewer, but that’s just not a realistic endeavor. So A.C. Nielsen Co. puts its set-top boxes in a sample of homes. The trick is to be sure that this sample is big enough (more on that Friday) and that it’s representative. When samples don’t represent the larger population, the results aren’t worth a darn. Here’s an example:
Ever hear of President Landon? There’s a good reason for that. But on Halloween 1936, a Literary Digest poll predicted that Gov. Alfred Landon of Kansas would defeat President Franklin Delano Roosevelt come November.
And why not? The organization had come to this conclusion based on an enormous sample, mailing out 10 million sample ballots and asking recipients how they planned to vote. In fact, about 1 in 4 American voters had been asked to participate, with stunning results: the magazine predicted that Landon would win 57.1% of the popular vote and an electoral college margin of 370 to 161. The problem? The mailing list was compiled from telephone directories, club membership rosters and magazine subscription lists.
Remember, this was 1936, the height of the Great Depression and long before telephones and magazine subscriptions were common fixtures in most households. Literary Digest had sampled largely middle- and upper-class voters, a group that was not at all representative of the larger population. On top of that, only 2.4 million people actually responded to the survey, just under 25 percent of the original sample.
On Election Day, the American public delivered a scorching defeat to Gov. Landon, who won electoral college votes only in Vermont and Maine. It was also the death knell for Literary Digest, which folded a few years later.
This example neatly illustrates two forms of sample bias: selection bias and nonresponse bias. Selection bias occurs when there is a flaw in the sample selection process. For a statistic to be trustworthy, the sample must be representative of the entire population; a survey of homeowners in a single neighborhood, for example, cannot represent all the homeowners in a city.
Self-selection can also play a role in selection bias. If a poll, survey or study depends solely on participants volunteering on their own, the sample will not necessarily be representative of the entire population. There’s a certain amount of self-selection in any survey, poll or study. But there are ways to minimize the effects of this problem.
Nonresponse bias is related to self-selection. It occurs when people choose not to respond, often because doing so is too much trouble. For this reason, mailed surveys are not the best option. In-person polling carries the least risk of nonresponse bias, while telephone polling carries a slightly higher one.
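To see just how badly a skewed sample can miss, here is a minimal sketch in Python. Everything in it is invented for illustration (the size of the population, the share of households with telephones and each group’s candidate preferences), but it mimics the Digest’s mistake: polling only the people who are easiest to reach.

```python
import random

random.seed(1936)

# Hypothetical population of 100,000 voters; all numbers are made up for illustration.
# 30% of voters own a telephone; phone owners lean toward Candidate B,
# everyone else leans toward Candidate A.
population = []
for _ in range(100_000):
    has_phone = random.random() < 0.30
    favors_a = random.random() < (0.40 if has_phone else 0.70)
    population.append((has_phone, favors_a))

def support_for_a(voters):
    """Return the share of a group that favors Candidate A."""
    return sum(favors_a for _, favors_a in voters) / len(voters)

# Representative sample: every voter has an equal chance of being chosen.
random_sample = random.sample(population, 2_000)

# Biased sample: drawn only from phone owners, roughly the Digest's mistake.
phone_owners = [voter for voter in population if voter[0]]
biased_sample = random.sample(phone_owners, 2_000)

print(f"True support for A:           {support_for_a(population):.1%}")
print(f"Random-sample estimate:       {support_for_a(random_sample):.1%}")
print(f"Phone-only (biased) estimate: {support_for_a(biased_sample):.1%}")
```

On a typical run, true support for Candidate A comes out near 61 percent, the random sample lands within a point or so of that, and the phone-only sample sits near 40 percent. Making the biased sample bigger doesn’t fix it, which is exactly why 10 million ballots couldn’t save the Digest.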
If you’re familiar with information technology, you know the old adage: Garbage in, garbage out. That definitely holds true for statistics, and it’s precisely why the quip Mark Twain popularized, “Lies, damned lies and statistics,” is so apropos. When the sample is bad, the results will be too, but that doesn’t stop some people from misleading the public with bad stats, whether intentionally or not. If you plan to make good decisions at any point in your everyday life, you’d better be able to separate the lies from the good samples.
If you have questions about sample bias, please ask in the comments section. Meantime, here are the answers to last Wednesday’s percentage change practice problems: –2%, 7%, –6%, –35%. (A quick refresher on the formula follows below, if you want to check your own work.) Friday, we’ll talk about sample size, which (to me) is a magical idea. Really!
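P.S. In case the refresher helps: percentage change is simply the new value minus the old value, divided by the old value, times 100. Here’s a tiny Python sketch you can plug your own numbers into; the figures below are made up for illustration, not Wednesday’s actual problems.

```python
def percentage_change(old, new):
    """Percentage change from old to new; a negative result means a decrease."""
    return (new - old) / old * 100

# Made-up figures, purely for illustration:
print(f"{percentage_change(80, 60):.0f}%")  # prints -25%  (a drop from 80 to 60)
print(f"{percentage_change(40, 50):.0f}%")  # prints 25%   (a rise from 40 to 50)
```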