## Occasional reflections on Life, the World, and Mathematics

### Screens or Weights?

I probably shouldn’t be spending so much of my time thinking about U.S. election polls: I have no special expertise, and everyone else in the country has lost interest by now. But I’ve just gotten some new information about a question that was puzzling me throughout the recent election campaign: What do pollsters mean when they refer to a likely voter screen?

At first I thought “likely voter screen” sounded as though the pollsters ask a bunch of questions designed to ferret out a respondent’s likelihood of voting (like, are you likely to vote?), and then “screen out” (that is, dismiss from the sample) respondents who are deemed unlikely to vote. But then I realised that this wouldn’t make any sense. A “screen” makes sense when talking about a property that is easy to evaluate and determinative for the goals of your survey. For instance, if you’re trying to find out what Hispanic residents of the US think about current immigration policy, you might as well just ask them up front (Are you Hispanic? Or, if you want to make sure they’re following the same definition that you are, perhaps some more specific questions, like Are you an immigrant from a majority Spanish-speaking country or a child of same? Is Spanish your primary language? Did you grow up speaking Spanish at home?), and if they don’t answer the right way you might as well hang up at that point. From then on, you’ll call your sample “Hispanic” rather than “likely Hispanic”, though of course you could be wrong.

When it’s a question of probabilities and proportions, what you want to do is reweighting. This is what the pollsters do with demographic variables. Suppose you’re doing a survey about dietary habits, and your sample includes 5% African Americans, 30% Hispanics, and 65% white; but you know the target population includes 10% African Americans and 20% Hispanics (and hence 70% white). So in making up an average for the population you count each African American respondent from your sample double, each Hispanic respondent as 2/3, and the white respondents slightly upweighted (70/65) as well. (Alternatively, you stratify, meaning that you report the three groups separately.)
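To make the reweighting arithmetic concrete, here is a minimal Python sketch. The group shares are the ones from the paragraph above; the function names and the per-group "servings" figures are hypothetical, invented purely for illustration, and this is not a description of any particular pollster's procedure.

```python
def poststratification_weights(sample_share, population_share):
    """Weight for each group = population share / sample share."""
    return {g: population_share[g] / sample_share[g] for g in sample_share}


def weighted_mean(group_means, sample_share, weights):
    """Population estimate: average of group means, weighted by reweighted shares."""
    total = sum(sample_share[g] * weights[g] for g in group_means)
    return sum(sample_share[g] * weights[g] * group_means[g] for g in group_means) / total


# Shares taken from the text.
sample = {"African American": 0.05, "Hispanic": 0.30, "white": 0.65}
population = {"African American": 0.10, "Hispanic": 0.20, "white": 0.70}

weights = poststratification_weights(sample, population)
# African American respondents count double, Hispanic respondents 2/3,
# white respondents 70/65.

# Hypothetical group averages (say, daily vegetable servings); made-up numbers.
servings = {"African American": 2.0, "Hispanic": 3.0, "white": 2.5}
estimate = weighted_mean(servings, sample, weights)
print(weights)
print(estimate)
```

After reweighting, each group contributes to the average in proportion to its share of the target population rather than its share of the sample.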

So, let’s say you do your calling, and find that 60% of the respondents are “likely voters” according to your screen, and 45% of them support Obama, with 50% supporting Romney. Of the remainder, the “unlikely voters”, 55% support Obama and 40% support Romney. What number should you report? It’s not as though you have reason to think that the likely voters are going to come out and vote, and the others are going to stay home. Even if your questions are well chosen, the best you might be able to say would be something like “80% of the likely voters will ultimately vote, and 30% of the rest.” Once they show up at the polls (or don’t) it doesn’t matter which group they belonged to, so the only thing to do is to mix the populations to obtain an estimate for the true electorate. That yields
(0.8 × 0.6 × 45% + 0.3 × 0.4 × 55%)/(0.8 × 0.6 + 0.3 × 0.4) = 47% support for Obama, and 48% support for Romney.
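The mixing calculation can be checked with a few lines of Python. This is only a sketch of the arithmetic in the example above; the function name `blended_support` is my own, and the turnout probabilities (80% and 30%) are the illustrative guesses from the text, not real estimates.

```python
def blended_support(groups):
    """Support among expected voters.

    groups: list of (share_of_sample, turnout_probability, support) tuples.
    Each group contributes in proportion to share * turnout probability.
    """
    electorate = sum(share * turnout for share, turnout, _ in groups)
    votes = sum(share * turnout * support for share, turnout, support in groups)
    return votes / electorate


# (share of sample, probability of voting, support for the candidate)
obama = blended_support([(0.6, 0.8, 0.45), (0.4, 0.3, 0.55)])
romney = blended_support([(0.6, 0.8, 0.50), (0.4, 0.3, 0.40)])
print(round(obama, 2), round(romney, 2))  # prints 0.47 0.48
```

A hard screen would instead report the likely-voter numbers alone, 45% and 50%, so in this example the screen turns a one-point gap into a five-point gap.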

On further reflection I decided that “likely voter screen” was really just imprecise journalese for reweighting. After all, professional pollsters certainly know how to do reweighting.

First thought, best thought, as they say. The fact that it’s nonsense doesn’t mean that that’s not what they do. Maybe I’m missing something? Here is an article about the Gallup likely voter screen.

Gallup uses a series of seven questions, including if they voted in previous elections, if they plan to vote this year, and if they know where their polling place is. Those who score highest on these measures are classified as likely voters.

On a somewhat related point, I posted some comments about Nate Silver’s situation as an attractor for Republican ire during the election campaign, a situation that pulled in powerful currents of right-wing anti-intellectualism and anti-science sentiment. Robert Waldmann has posted a test on the blog Angry Bear, and the result is extremely impressive. He tested Silver’s state-by-state predictions, each of which came with a central estimate and a standard error. Assuming a normal model, each of these can easily be turned into a p-value for the true result. If his predictions were not just on the right side of zero, but genuinely were unbiased and even had the right-sized confidence intervals, then these p-values should be like 50 samples from a uniform distribution on (0,1) (not independent though). So if you put the 50 p-values in order and plot them, they should line up approximately on a straight diagonal on the unit square. And that’s pretty much what you see.
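For anyone who wants to try this kind of check, here is a rough Python sketch of the idea as I understand it, not Waldmann's actual code. I am assuming the "p-value" is the normal CDF of the actual result under each forecast; the function names and the three forecast triples are hypothetical placeholders, not Silver's numbers.

```python
from statistics import NormalDist


def p_values(forecasts):
    """Sorted p-values, one per forecast.

    forecasts: iterable of (central_estimate, standard_error, actual_result).
    Each p-value is the forecast's normal CDF evaluated at the actual result.
    """
    return sorted(
        NormalDist(mu=est, sigma=se).cdf(actual) for est, se, actual in forecasts
    )


def max_gap_from_diagonal(sorted_p):
    """Largest vertical distance between the ordered p-values and the diagonal."""
    n = len(sorted_p)
    return max(abs(p - (i + 0.5) / n) for i, p in enumerate(sorted_p))


# Hypothetical (estimate, standard error, actual margin) triples, for illustration.
example = [(2.0, 3.0, 1.0), (-5.0, 4.0, -3.5), (10.0, 3.5, 12.0)]
ps = p_values(example)
print(ps)
print(max_gap_from_diagonal(ps))
```

With well-calibrated forecasts the gap should be small; plotting the sorted p-values against (i + 0.5)/n gives the diagonal picture described above.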

This is a fairly sensitive test, and shows pretty conclusively that the predictions were genuine and carefully done. The most obvious temptation for someone who was being aggressively accused of exaggerating his certainty, or even biasing the predictions, would have been to find new sources of uncertainty (it’s easy to come up with a convincing story) and use them to inflate the standard errors. This would have given him a convenient argument after the fact (“Sure, Obama lost state X when I said he’d probably win, but I only put a 65% probability on it”, or whatever), which is exactly the argument he was accused of making. (This is bullshit! He’s only predicting probabilities! He can always say he wasn’t actually “wrong”, no matter which way it turns out.) Fifty samples are enough for the law of large numbers to show its effect, so if he’d done that, it would have shown up conspicuously, with p-values bunched near the middle and an S-shaped (sigmoid) plot.
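The bunching effect is easy to see in a small simulation. This is a sketch under assumptions of my own: forecast errors are drawn from a standard normal, and the "padded" forecaster reports standard errors three times too large (the factor 3 is arbitrary, chosen only for illustration).

```python
import random
from statistics import NormalDist


def simulated_p_values(n, inflation, seed=0):
    """Sorted p-values when reported standard errors are `inflation` times the true ones."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    ps = []
    for _ in range(n):
        error = rng.gauss(0.0, 1.0)  # true forecast error, in units of the true SE
        ps.append(std_normal.cdf(error / inflation))  # judged against the reported SE
    return sorted(ps)


def share_in_middle(ps, lo=0.25, hi=0.75):
    """Fraction of p-values falling between lo and hi."""
    return sum(lo <= p <= hi for p in ps) / len(ps)


honest = simulated_p_values(50, inflation=1.0)
padded = simulated_p_values(50, inflation=3.0)
print(share_in_middle(honest), share_in_middle(padded))
```

With honest standard errors about half of the p-values land in the middle half of the unit interval; with padded ones nearly all of them do, which is the flattening in the middle of the ordered plot.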

In other words, not only did Silver get most of his predictions “right”, he also got the predicted number of predictions “wrong”, to essentially the right degree.