Exit Polling: A statistics refresher

Most of you are probably sick to death of political campaign polls. But these numbers have become a mainstay of the American political process. In other words, we’re stuck with them, so you might as well get used to them, or at least understand the process as well as you can.

Last Friday, I wrote about how the national polls really don’t matter. That’s because our presidential elections depend on the Electoral College. We certainly don’t want to see one candidate win the popular vote, while the other wins the Electoral College, but it’s those electoral votes that really matter.

Still, polls matter too. I know, I know. Statistics can be created to support *any* cause or person. And that’s true. (Mark Twain popularized the saying, “There are lies, damned lies, and statistics.”) But good statistics are good statistics. These results are only as reliable as the process that created them.

But what is that process? If it’s been a while since you took a stats course, here’s a quick refresher. You can put it to use tomorrow when the media uses exit polls to predict election and referendum results before the polls close.

Random Sampling

If I wanted to know how my neighbors were voting in this year’s election, I could simply ask each of them. But surveying the population of an entire state — or all of the more than 200 million eligible voters in the U.S. — is downright impossible. So political pollsters depend on a tried-and-true method of gathering reliable information: random sampling.

A random sample does give a good snapshot of a population — but it may seem a bit mysterious. There are two obvious parts: random and sample.

The amazing thing about a sample is this: when it’s done properly (and I’ll get to that in a minute) the sample does accurately represent the entire population. The most common analogy is the basic blood draw. I’ve got a wonky thyroid, so several times a year, I need to check to see that my medication is keeping me healthy, which is determined by a quick look at my blood. Does the phlebotomist take all of my blood? Nope. Just a sample is enough to make the diagnosis.

The same thing is true with population samples. And in fact, there’s a magic number that works well enough for most situations: 1,000. (This is probably the hardest thing to believe, but it’s true!) For the most part, researchers are happy with a 95% confidence level and a ±3% margin of error. This means that if the same poll were repeated many times, about 95% of the results would land within 3 percentage points of the true value. (More on that later.) According to the math, reaching that level of precision takes only about 1,000 respondents, even for a very large population.
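If you want to see where a number like 1,000 comes from, here is a minimal sketch of the usual textbook sample-size formula for a simple random sample. The function name `required_sample_size` and the worst-case assumption of a 50/50 split (`p = 0.5`) are mine, not something a particular polling firm has published.

```python
import math
from statistics import NormalDist


def required_sample_size(margin, confidence=0.95, p=0.5):
    """Respondents needed for a simple random sample.

    margin     -- desired margin of error as a fraction (0.03 means ±3 points)
    confidence -- confidence level as a fraction (0.95 means 95%)
    p          -- assumed share giving a particular answer; 0.5 is the worst case
    """
    # z is the number of standard deviations that covers the chosen
    # confidence level on a normal curve (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(required_sample_size(0.03))  # 1068
```

Under these assumptions, a ±3% margin calls for 1,068 respondents, which is why pollsters round to "about 1,000."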

So we’re looking at surveying at least 1,000 people, right? But it’s not good enough to go door-to-door in one neighborhood to find these people. The next important feature is randomness.

If you put your hand in a jar full of marbles and pull one marble out, you’ve randomly selected that marble. That’s the task that pollsters have when choosing people to respond to their questions. And it’s not as hard as you might think.

Let’s take exit polls on Election Day. These are short surveys conducted at the voting polls themselves. As people exit the polling place, pollsters stop certain voters to ask a series of questions. The answers to these questions can predict how the election will end up and what influenced voters to vote a certain way.

The enemy of good polling is homogeneity. If only senior citizens who live in wealthy areas of a state are polled, well, the results will not be reliable. But randomness irons all of this out.

First, the polling place must be random. Imagine writing down the locations of all of the polling places in your state on little strips of paper. Then put all of these papers into a bowl, reach in and choose one. That’s the basic process, though this is done with computer programs now.

Then the polling times must be well represented. If a pollster only surveys people who voted in the morning, the results could be skewed toward people who vote on their way home from a night shift, who don’t work at all, or who are early risers, right? So care is taken to survey people at all times of the day.

And finally, it’s important to randomly select people to interview. Most often, this can be done by simply approaching every third voter who exits the polling place (or every other voter or every fifth voter; you get my drift).
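The two random steps above (drawing polling places from a bowl and stopping every third voter) can be sketched in a few lines of Python. This is only an illustration: the precinct names, the function names, and the random starting point for the every-third-voter rule are my own assumptions, not a description of how any real exit poll is run.

```python
import random


def choose_polling_places(places, how_many, seed=None):
    """Draw polling places at random, like pulling strips of paper from a bowl."""
    rng = random.Random(seed)
    return rng.sample(places, how_many)


def every_kth_voter(voters, k, seed=None):
    """Pick every k-th voter leaving the polling place.

    The random starting point is an assumption added here so that the
    first voter out the door is not always (or never) chosen.
    """
    rng = random.Random(seed)
    start = rng.randrange(k)
    return voters[start::k]


# Hypothetical data: 200 precincts and 30 voters leaving one of them.
places = [f"Precinct {i}" for i in range(1, 201)]
chosen = choose_polling_places(places, 5, seed=1)

voters = list(range(1, 31))
interviewed = every_kth_voter(voters, 3, seed=1)

print(chosen)
print(interviewed)
```

Passing a `seed` just makes the example repeatable. A real poll would also spread interviews across the whole day, as described above.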

Questions

But the questions being asked — or I should say the ways in which the questions are asked — are at least as important. These should not be “leading questions,” or queries that might prompt a particular response. Here’s an example:

Same-sex marriage is threatening to undermine religious liberty in our country. How do you plan to vote on Question 6, which legalizes same-sex marriage in the state?

(It’s easier to write a leading question about how someone intends to vote than a leading exit-poll question, which asks how someone has already voted.)

Questions must be worded so that they elicit the most reliable responses. When they are confusing or leading, the results cannot be trusted. Simplicity is almost always the best policy here.

Interpreting the Data

It’s not enough to just collect information. No survey results are 100 percent reliable 100 percent of the time. In fact, there are “disclaimers” for every single survey result. First of all, there’s a confidence level, which is generally 95%. This means that, based on the sample size, we can be 95 percent confident that the true result falls within the margin of error of the poll’s result. Specifically, a 95% confidence interval covers the middle 95 percent of the normal (or bell-shaped) curve.

For a given confidence level, the larger the random sample, the smaller the margin of error. The smaller the sample, the larger the margin of error. And for a given sample size, asking for more confidence means accepting a wider margin of error.
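To get a feel for how sample size and margin of error move together at a fixed 95% confidence level, here is a rough sketch using the standard formula for a simple random sample. The sample sizes in the loop and the worst-case 50/50 split are my assumptions, chosen only for illustration.

```python
import math

Z_95 = 1.96  # standard deviations that cover about 95% of a normal curve


def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error (as a fraction) for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


for n in (100, 500, 1000, 2000):
    print(f"n = {n:>5}: ±{margin_of_error(n) * 100:.1f} points")
```

Notice the diminishing returns: going from 100 to 1,000 respondents cuts the margin from about ±9.8 points to about ±3.1, but doubling again to 2,000 only gets you to about ±2.2.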

But why 95%? The answer has to do with standard deviation, or how much variation (deviation) there is from the mean, or average, of the data. When the data is normally distributed (that is, it follows the normal or bell curve), about 95% of it falls within plus or minus two standard deviations of the mean.
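You can check the two-standard-deviations rule yourself with Python's built-in `statistics.NormalDist`. This is a small sketch; the helper name `coverage` is mine.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def coverage(k):
    """Share of a normal curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


print(f"within 1 sd: {coverage(1):.4f}")  # about 0.68
print(f"within 2 sd: {coverage(2):.4f}")  # about 0.95

# The exact cutoff for 95% is a little under 2 standard deviations.
z = std_normal.inv_cdf(0.975)
print(f"95% cutoff: {z:.2f} sd")  # about 1.96
```

So "two standard deviations" is a convenient rounding of 1.96, which is the figure that shows up in margin-of-error formulas.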

This isn’t the same thing as the margin of error, which is the range around the reported result in which the true value is likely to fall.

Let’s say exit polls show that Governor Romney is leading President Obama in Ohio by 2.5 percentage points. If the margin of error is ±3 percentage points, Romney’s lead is within the margin of error. And therefore, the results are really a statistical tie. However, if he’s leading by 8 percentage points, it’s more likely the results are showing a true lead.
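The rule of thumb in this example is simple enough to write as a one-line check. This sketch follows the simplified rule used here (compare the lead directly with the reported margin of error); the function name is my own. Statisticians point out that the margin on the gap between two candidates is actually wider than the margin on either candidate's share, so treat this as a quick screen rather than a formal test.

```python
def is_statistical_tie(lead_points, margin_points):
    """True when a lead is no bigger than the poll's reported margin of error."""
    return abs(lead_points) <= margin_points


print(is_statistical_tie(2.5, 3))  # True: a 2.5-point lead is inside ±3
print(is_statistical_tie(8, 3))    # False: an 8-point lead is well outside ±3
```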

Of course, all of that depends — heavily — on the sampling and questions. If either or both of those are suspect, it doesn’t matter what the polling shows. We cannot trust the numbers. Unfortunately, we often don’t know how the samples were created or the questions were asked. Reliable statistics will include that information somewhere. And of course, you should only trust stats from sources that you can trust.

Summary

In short, there are three critical numbers in the most reliable survey results:

1,000 (sample size)
95% (confidence level)
±3% (margin of error)

Look for these in the exit polling you hear about tomorrow. Compare the exit polls with the actual election results. Which polls turned out to be most reliable?

I’m not a statistician, but my math books teach math that you can apply to your everyday life, including making sense of polls and other such things.

P.S. I hope every single one of my U.S. readers (who are registered voters) will participate in our democratic process. Please don’t throw away your right to elect the people who make decisions on your behalf. VOTE!
