polling – Common Infirmities

The return of quota sampling

Everyone knows about the famous Dewey Defeats Truman headline fiasco, and that the Chicago Daily Tribune was inspired to its premature announcement by erroneous pre-election polls. But why were the polls so wrong?

The Social Science Research Council set up a committee to investigate the polling failure. Their report, published in 1949, listed a number of faults, and went so far as to disparage the very notion of trying to predict the outcome of a close election. But one important methodological criticism (the one that significantly influenced the later development of political polling, and became the primary lesson in statistics textbooks) was the critique of quota sampling. (An accessible summary of lessons from the 1948 polling fiasco by the renowned psychologist Rensis Likert was published just a month after the election in Scientific American.)

Serious polling at the time was divided between two general methodologies: random sampling and quota sampling. Random sampling, as the name implies, works by attempting to select from the population of potential voters entirely at random, with each voter equally likely to be selected. This was still considered too theoretically novel to be widely used, whereas quota sampling had been established by Gallup since the mid-1930s. In quota sampling the voting population is modelled by demographic characteristics, based on census data, and each interviewer is assigned a quota to fill of respondents in each category: 51 women and 49 men, say, a certain number in the age range 21-34, or specific numbers in each “economic class” — of which Roper, for example, had five, one of which in the 1940s was “Negro”. The interviewers were allowed great latitude in filling their quotas, finding people at home or on the street.

In a sense, we have returned to quota sampling, in the more sophisticated version of “weighted probability sampling”. Since hardly anyone responds to a survey (response rates are typically no more than about 5%) there’s no way the people who do respond can be representative of the whole population. So pollsters model the population, or the supposed voting population, and reweight the responses they do get proportionately, according to demographic characteristics. If Black women over age 50 are thought to be as common in the voting population as white men under age 30, but we have twice as many of the former as the latter in the sample, we count the responses of the latter twice as much as those of the former in the final estimates. It’s just a way of making a quota sample after the fact, without the stress of specifically looking for representatives of particular demographic groups.
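To make the reweighting concrete, here is a minimal sketch of the idea in Python. The group labels, sample sizes and answers are invented for illustration, and real pollsters weight on many variables at once; this only shows the basic arithmetic of weighting each group by its assumed population share divided by its share of the sample.

```python
from collections import Counter

def poststratification_weights(sample_groups, population_shares):
    """Weight per group = assumed population share / observed sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}

def weighted_mean(groups, answers, weights):
    """Average of the answers, each counted according to its group's weight."""
    total_weight = sum(weights[g] for g in groups)
    return sum(weights[g] * a for g, a in zip(groups, answers)) / total_weight

# Hypothetical sample: group A is over-represented two to one,
# although both groups are assumed equally common among voters.
groups = ["A"] * 200 + ["B"] * 100
answers = [1] * 120 + [0] * 80 + [1] * 30 + [0] * 70  # 1 = supports the candidate
population_shares = {"A": 0.5, "B": 0.5}              # the pollster's model (an assumption)

weights = poststratification_weights(groups, population_shares)
raw_estimate = sum(answers) / len(answers)
weighted_estimate = weighted_mean(groups, answers, weights)

print(weights)            # group B counts twice as much as group A
print(raw_estimate)       # unweighted share of supporters
print(weighted_estimate)  # share after reweighting
```

Note that the weights repair the balance between groups, but do nothing about who was reached within each group.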

Consequently, it has most of the deficiencies of a quota sample. The difficulty of modelling the electorate is one that has gotten quite a bit of attention in the modern context: We know fairly precisely how demographic groups are distributed in the population, but we can only theorise about how they will be distributed among voters at the next election. At the same time, it is straightforward to construct these theories, to describe them, and to test them after the fact. The more serious problem (and the one that was emphasised in the committee’s 1949 report, but has been less emphasised recently) is in the nature of how the quotas are filled. The reason for probability sampling is that taking whichever respondents are easiest to get, a “sample of convenience”, is sure to give you a biased sample. If you sample people from telephone directories in 1936 then it’s easy to see how they end up biased against the favoured candidate of the poor. If you take a sample of convenience within a small demographic group, such as middle-income people, then it won’t be easy to recognise how the sample is biased, but it may still be biased.

For whatever reason, in the 1930s and 1940s, within each demographic group the Republicans were easier for the interviewers to contact than the Democrats. Maybe they were just culturally more like the interviewers, so easier for them to walk up to on the street. And it may very well be that within each demographic group today Democrats are more likely to respond to a poll than Republicans. And if there is such an effect, it’s hard to correct for it, except by simply discounting Democrats by a certain factor based on past experience. (In fact, these effects can be measured in polling fluctuations, where events in the news lead one side or the other to feel discouraged, and to be less likely to respond to the polls. Studies have suggested that this effect explains much of the short-term fluctuation in election polls during a campaign.)

Interestingly, one of the problems that the committee found with the 1948 polling, and one with relevance for the Trump era, was the failure to consider education as a significant demographic variable:

All of the major polling organizations interviewed more people with college education than the actual proportion in the adult population over 21 and too few people with grade school education only.

The Silver Standard 4: Reconsideration

After writing in praise of the honesty and accuracy of fivethirtyeight’s results, I felt uncomfortable about the asymmetry in the way I’d treated Democrats and Republicans in the evaluation. In the plots I made, low-probability Democratic predictions that went wrong pop out on the left-hand side, whereas low-probability Republican predictions that went wrong would get buried in the smooth glide down to zero on the right-hand side. So I decided, what I’m really interested in are all low-probability predictions, and I should treat them symmetrically.

For each district there is a predicted loser (PL), with probability smaller than 1/2. In about one third of the districts the PL was assigned a probability of 0. The expected number of PLs (EPL) who would win is simply the sum of all the predicted win probabilities that are smaller than 1/2. (Where multiple candidates from the same party are in the race, I’ve combined them.) The 538 EPL was 21.85. The actual number of winning PLs was 13.
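As a rough sketch of how the EPL is computed: the probabilities below are invented, not the 538 numbers, and the code assumes two-party races in which the Democratic and Republican win probabilities sum to 1 (same-party candidates already combined).

```python
def predicted_loser_probability(p_dem):
    """Win probability of whichever party is predicted to lose the district."""
    # A district at exactly 0.5 has no predicted loser; none appear in this example.
    return min(p_dem, 1.0 - p_dem)

def expected_pl_wins(dem_win_probs):
    """Expected number of predicted losers who win: the sum of their probabilities."""
    return sum(predicted_loser_probability(p) for p in dem_win_probs)

# Hypothetical Democratic win probabilities for six districts
example_probs = [0.99, 0.80, 0.55, 0.30, 0.02, 0.0]
example_epl = expected_pl_wins(example_probs)
print(example_epl)
```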

What I am testing is whether 538 made enough wrong predictions. This is the opposite of the usual evaluation, which gives points for getting predictions right. But when measured by their own predictions, the number of districts that went the opposite of the way they said was a lot lower than they said it would be. That is prima facie evidence that the PL win probabilities were being padded somewhat. To be more precise, under the 538 model the number of winning PLs should be approximately Poisson distributed with parameter 21.85, meaning that the probability of 13 or fewer PLs winning is about 0.030. Which is kind of low, but still pretty impressive, given all the complications of the prediction game.
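The tail probability can be checked with a few lines of Python. This is only a sketch of the Poisson approximation described above: it treats the 0.030 as the chance of 13 or fewer winning PLs when the mean is 21.85, and uses nothing beyond the standard library.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))

epl = 21.85       # expected number of winning predicted losers (from the text)
winning_pls = 13  # observed number of winning predicted losers (from the text)

p_tail = poisson_cdf(winning_pls, epl)
print(round(p_tail, 3))
```

The result should agree with the quoted figure to within rounding.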

Below I show plots of the errors for various scenarios, measuring the cumulative error for these symmetric low predictions. (I’ve added an “Extra Tarnished” scenario, with the transformation based on the even more extreme beta(.25,.25).) I show it first without adjusting for the total number of predicted winning PLs:

We see that tarnished predictions predict a lot more PL victories than we actually see. The actual predictions are just slightly more than you should expect, but suspiciously one-sided: that is, all in the direction of overpredicting PL victories, consistent with padding the margins slightly, erring in the direction of claiming uncertainty.

And here is an image more like the ones I had before, where all the predictions are normalised to correspond to the same number of predicted wins:

[Figure: TarnishedSymmetric]

Small samples

New York Republican Representative Lee Zeldin was asked by reporter Tara Golshan how he felt about the fact that polls seem to show that a large majority of Americans — and even of Republican voters — oppose the Republican plan to reduce corporate tax rates. His response:

What I have come in contact with would reflect different numbers. So it would be interesting to see an accurate poll of 100 million Americans. But sometimes the polls get done of 1,000 [people].

Yes, that does seem suspicious, only asking 1,000 people… The 100 million people he has come in contact with are probably more typical.
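For what it is worth, the textbook margin of error shows why 1,000 respondents is not the problem. The sketch below assumes a simple random sample and the usual 95% normal approximation, which real polls with low response rates and weighting only approximate.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

moe_1000 = margin_of_error(1_000)
moe_100m = margin_of_error(100_000_000)

print(f"n = 1,000:       +/- {100 * moe_1000:.1f} points")
print(f"n = 100,000,000: +/- {100 * moe_100m:.3f} points")
```

A sample of 1,000 already pins a proportion down to about three points either way, so a large majority does not turn into a minority through sampling error alone.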

Opinion polling can’t stabilise democracy

Something I’ve been thinking about since the Brexit vote: There was a prevailing sentiment at the time that the British people are inherently conservative, and so would never vote to upend the international order. In fact, they did, by a small but decisive margin. But how was this “conservatism” imagined to act? The difference between 52-48 for Leave and 48-52 is happening in the minds of 4% of the population who might have decided the other way. Except that there’s nothing to tell them that they are on the margin. If you are negotiating over a policy, even if you start with some strategically maximum demand, you can look at where you are and step back if it appears you’ve crossed a dangerous line.

A referendum offers two alternatives, and one of them has to win. (Of course, a weird thing about the Brexit vote is that only one side, Remain, had a clear proposal. Every Leave voter was voting for the Leave in his mind. In retrospect, the Leave campaign is trying to stretch the mantle of democratic legitimation over their maximal demands.) There is no feedback mechanism that tells an individual “conservative” voter that the line is being crossed.

Soft bones

From political journalist Simon Maloy in Salon:

People often note that public opinion of Hillary tends to be ossified after more than two decades spent continuously in the national political spotlight, but Trump’s unique and unrelenting awfulness as a candidate represents a timely opportunity to get voters to start thinking more positively about Hillary Clinton.

It amazes me that people can make claims like this, in light of the fact that her net favourability in Gallup polls has shifted by almost 50 points in three years. (She was at +33 in April 2013.)

Popularity contest

People talk about Hillary Clinton’s poll-reported unpopularity as though it represented some natural fact about her. A failure of character, or a judgement on her weakness as a politician or human being. But it hasn’t always been that way. Just to check my memory, I looked up Gallup’s record: In April 2013 64 percent of Americans surveyed had a favorable impression of her, as against 31 percent with an unfavorable impression. In May 2016 it was nearly reversed: 39 percent favorable, 54 percent unfavorable. Were there dastardly revelations about her character or public conduct in the interim? Or did she just happen to be the frontrunner in an ideologically heated Democratic primary? (By pure coincidence, the last time her relative favorability was negative was October 2000. I can’t remember what was going on then…)
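The size of the swing follows directly from the Gallup figures quoted above; a short sketch of the arithmetic, using only those four numbers:

```python
def net_favorability(favorable, unfavorable):
    """Net favorability in percentage points."""
    return favorable - unfavorable

net_april_2013 = net_favorability(64, 31)  # figures quoted in the text
net_may_2016 = net_favorability(39, 54)    # figures quoted in the text
shift = net_may_2016 - net_april_2013

print(net_april_2013, net_may_2016, shift)
```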

As for Donald Trump (“Businessman Donald Trump”, as Gallup terms him) there has been only one Gallup survey — in June 2005 — that gave him a positive margin (51 to 38, so it wasn’t even close). Otherwise, every Gallup survey since they first asked about him in 1999 has negative favorability, usually by a wide margin.

The force of “overwhelming”

The New Republic has published a film review by Yishai Schwartz under the portentous title “The Edward Snowden Documentary Accidentally Exposes His Lies”. While I generally support, and indeed am grateful for, what Snowden has done, I am also sensitive to the problems of democratic governance raised by depending on individuals to decide that conscience commands them to break the law. We are certainly treading on procedural thin ice, and our only recourse, despite the commendable wish of Snowden himself, as well as Greenwald, to push personalities into the background, is to think carefully about the motives, and the honesty, of the man who carried out the spying. So in principle I was very interested in what Schwartz has to say.

Right up front Schwartz states what he considers to be the central dishonesty of Snowden’s case:

Throughout this film, as he does elsewhere, Snowden couches his policy disagreements in grandiose terms of democratic theory. But Snowden clearly doesn’t actually give a damn for democratic norms. Transparency and the need for public debate are his battle-cry. But early in the film, he explains that his decision to begin leaking was motivated by his opposition to drone strikes. Snowden is welcome to his opinion on drone strikes, but the program has been the subject of extensive and fierce public debate. This is a debate that, thus far, Snowden’s and his allies have lost. The president’s current drone strikes enjoy overwhelming public support.

“Democratic theory” is a bit ambivalent about the rights of democratic majorities to annihilate the rights (and, indeed, the lives) of individuals, but the reference to “overwhelming” public support is supposed to bridge that gap. So how overwhelming is that support? Commendably, Schwartz includes a link to his source, a Gallup poll that finds 65% of Americans surveyed support “airstrikes in other countries against suspected terrorists”. Now, just stopping right there for a minute, in my home state of California, 65% support isn’t even enough to pass a local bond measure. So it’s not clear that it should be seen as enough to trump all other arguments about democratic legitimacy.

Furthermore, if you read down to the next line, you find that when the targets to be exterminated are referred to as “US citizens living abroad who are suspected terrorists” the support falls to 42%. Not so overwhelming. (Support falls even further when the airstrikes are to occur “in the US”, but since that hasn’t happened, and would conspicuously arouse public debate if it did, it’s probably not all that relevant.) Not to mention that Snowden almost surely did not mean that he was just striking out at random to undermine a government whose drone policies he disapproves of; but rather, that democratic support for policies of targeted killing might be different if the public were aware of the implications of ongoing practices of mass surveillance.

Framing the question on electronic surveillance

Quinnipiac has published a poll purporting to find the following facts:

55 percent of Americans say Edward Snowden is a “whistle-blower”, as opposed to 34 percent calling him a “traitor”;
voters say 45 – 40 percent the government’s anti-terrorism efforts go too far restricting civil liberties, a reversal from a January 10, 2010 survey … when voters said 63 – 25 percent that such activities didn’t go far enough to adequately protect the country.
While voters support the phone-scanning program 51 – 45 percent and say 54 – 40 percent that it “is necessary to keep Americans safe,” they also say 53 – 44 percent that the program “is too much intrusion into Americans’ personal privacy”.

Now, the most striking thing to me is that 88 percent of the people surveyed in January 2010 thought they knew enough about the government’s intrusion on personal privacy to even formulate an opinion — in particular, that 63 percent thought they knew enough about the scope to say that it didn’t go far enough.

But even more interesting is the formulation of the question that got 54% to agree that “the phone-scanning program” is “necessary”. (It is noteworthy that at least 4% of those surveyed both support the program and believe that it is “too much intrusion”, since the 51% who support it and the 53% who find it too intrusive add up to 104%. They must have a different concept than I have of either the word “support” or “too much”.) What they were asked was

Do you support or oppose the federal government program in which all phone calls are scanned to see if any calls are going to a phone number linked to terrorism?

Now, if you put it that way, I’d kind of support it myself. “Scanning” sounds pretty innocuous, and “phone numbers linked to terrorism” sounds pretty ominous. But that’s only a small part of what’s being done. They are receiving all metadata (that’s a lot more than just a phone number) and storing them, presumably, forever. They are data-mining to try to identify patterns. They are already storing, or are preparing to store, the content of all communications, so they may be examined in depth if there is sufficient reason in the future.

And how much of this is about terrorism? We don’t know. And even if it is about terrorism right now, it won’t take long before enthusiastic or corrupt government officials think of all kinds of other legitimate purposes of government that could be promoted by just breaking down some of the petty bureaucratic restrictions on use of the data.

To put it in the crassest terms: This sort of unfocused big-data espionage may be marginally useful for catching terrorists, but it seems certain to be far more useful for pressuring or destroying political opponents of the anti-terror policies.

Screens or Weights?

I probably shouldn’t be spending so much of my time thinking about U.S. election polls: I have no special expertise, and everyone else in the country has lost interest by now. But I’ve just gotten some new information about a question that was puzzling me throughout the recent election campaign: What do pollsters mean when they refer to a likely voter screen?

Are you demographic?

As a sometime demographer myself, I am fascinated by the prominence of “demographics” as an explanatory concept in the recent presidential election, now already slipping away into hazy memory. Recent political journalism would barely stand without this conceptual crutch, as here and here and here. A bit more nuance here. Some pushback from the NY Times here.

The crassest expression of this concept came in an article yesterday by (formerly?) respected conservative journalist Michael Barone, explaining why he was no longer confident that Mitt Romney would win the election by a large margin. Recall that several days before the election, despite the contrary evidence of what tens of thousands of voters were actually telling pollsters, he predicted 315 electoral votes for Romney, saying “Fundamentals usually prevail in American elections. That’s bad news for Barack Obama.” In retrospect, he says,

I was wrong because the outcome of the election was not determined, as I thought it would be, by fundamentals…. I think fundamentals were trumped by mechanics and, to a lesser extent, by demographics.
