Occasional reflections on Life, the World, and Mathematics

Archive for the ‘Politics’ Category

The Silver Standard 4: Reconsideration

After writing in praise of the honesty and accuracy of fivethirtyeight’s results, I felt uncomfortable about the asymmetry in the way I’d treated Democrats and Republicans in the evaluation. In the plots I made, low-probability Democratic predictions that went wrong pop out on the left-hand side, whereas low-probability Republican predictions that went wrong would get buried in the smooth glide down to zero on the right-hand side. So I decided that what I’m really interested in is all low-probability predictions, and that I should treat them symmetrically.

For each district there is a predicted loser (PL), the candidate assigned a win probability smaller than 1/2. In about one third of the districts the PL was assigned a probability of 0. The expected number of winning PLs (EPL) is simply the sum of all the predicted win probabilities that are smaller than 1/2. (Where multiple candidates from the same party are in the race, I’ve combined them.) The 538 EPL was 21.85. The actual number of winning PLs was 13.
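For concreteness, here is a minimal sketch of that calculation. The names dem_win_probs and dem_won are hypothetical stand-ins for the downloaded 538 probabilities and the observed results; this is my own illustration, not 538’s code.

```python
# Sketch: expected and actual number of winning predicted losers (PLs),
# treating the two parties symmetrically.
# Assumed (hypothetical) inputs: dem_win_probs[i] is the predicted probability
# that the Democrat wins district i; dem_won[i] is True if the Democrat won.

def pl_summary(dem_win_probs, dem_won):
    epl = 0.0      # expected number of PLs who win
    actual = 0     # actual number of PLs who won
    for p, won in zip(dem_win_probs, dem_won):
        epl += min(p, 1.0 - p)                    # win probability of the PL
        pl_won = won if p < 0.5 else not won      # did the PL actually win?
        actual += pl_won
    return epl, actual

# Example with made-up numbers:
# pl_summary([0.9, 0.4, 0.0], [True, False, False])  ->  (0.5, 0)
```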

What I am testing is whether 538 made enough wrong predictions. This is the opposite of the usual evaluation, which gives points for getting predictions right. But measured against their own probabilities, the number of districts that went the opposite of the way they called them was a good deal lower than their model said it should be. That is prima facie evidence that the PL win probabilities were being padded somewhat. To be more precise, under the 538 model the number of winning PLs should be approximately Poisson distributed with parameter 21.85, which makes the probability of seeing 13 or fewer PL wins about 0.030. Which is kind of low, but still pretty impressive, given all the complications of the prediction game.
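A quick check of that tail probability, using scipy’s standard Poisson CDF (the Poisson approximation is the one described above, nothing specific to 538’s internals):

```python
# Sketch: tail probability under the Poisson approximation described above.
from scipy.stats import poisson

epl = 21.85     # expected number of winning predicted losers, from 538's probabilities
observed = 13   # actual number of winning predicted losers

print(poisson.cdf(observed, epl))   # P(X <= 13) for X ~ Poisson(21.85), ~0.030
```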

Below I show plots of the errors for various scenarios, measuring the cumulative error for these symmetric low-probability predictions. (The “Tarnished” scenarios are explained in Part 3, below; here I’ve added an “Extra Tarnished” scenario, with the transformation based on the even more extreme beta(.25,.25).) I show it first without adjusting for the total number of predicted winning PLs:

[Figure: cumulative error for the symmetric low-probability predictions, not normalised]

We see that the tarnished predictions predict a lot more PL victories than we actually observe. The errors in the actual 538 predictions are only slightly larger than you should expect, but suspiciously one-sided — all in the direction of overpredicting PL victories, consistent with padding the margins slightly, erring in the direction of claiming extra uncertainty.

And here is an image more like the ones I had before, where all the predictions are normalised to correspond to the same number of predicted wins:

[Figure: TarnishedSymmetric]

 

The Silver Standard, Part 3: The Reckoning

One of the accusations most commonly levelled against Nate Silver and his enterprise is that probabilistic predictions are unfalsifiable. “He never said the Democrats would win the House. He only said there was an 85% chance. So if they don’t win, he has an out.” This is true only if we focus on the top-level prediction and ignore all the smaller predictions that went into it. (It remains true only in the trivial sense that you can’t say it’s impossible for a fair coin to come up heads 20 times in a row.)

So, since Silver can be tested, I thought I should see how 538’s predictions stood up in the 2018 US House election. I took their predictions of the probability of victory for a Democratic candidate in all 435 congressional districts (I used their “Deluxe” prediction) from the morning of 6 November. (I should perhaps note here that one third of the districts had estimates of 0 (31 districts) or 1 (113 districts), so a victory for the wrong candidate in any one of those districts would have been a black mark for the model.) I ordered the districts by the predicted probability and computed the cumulative predicted number of Democratic seats, starting from the smallest. I plotted this against the cumulative actual number of seats won, taking the current leader as the winner in the 11 districts where there is not yet a definite decision.

[Figure: Silver_PredictedvsActual]

The predicted number of seats won by Democrats was 231.4, impressively close to the actual 231 won. But that’s not the standard we are judging them by, and in this plot (and the ones to follow) I have normalised the predicted and observed totals to be the same. I’m looking at the cumulative fractions of a seat contributed by each district. If the predicted probabilities are accurate, we would expect the plot (in green) to lie very close to the line with slope 1 (dashed red). It certainly does look close, but the scale doesn’t make it easy to see the differences. So here is the plot of the prediction error, the difference between the red dashed line and the green curve, against the cumulative prediction:
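Here is a minimal sketch of how these cumulative curves and the error curve might be computed, again with the hypothetical dem_win_probs and dem_won arrays standing in for the downloaded 538 probabilities and the observed results:

```python
# Sketch: cumulative predicted vs. actual Democratic seats, districts sorted
# from most Republican (lowest Democratic probability) to most Democratic,
# with the predicted total normalised to match the actual total.
import numpy as np

def cumulative_error(dem_win_probs, dem_won):
    probs = np.asarray(dem_win_probs, dtype=float)
    wins = np.asarray(dem_won, dtype=float)

    order = np.argsort(probs)                     # most Republican districts first
    cum_pred = np.cumsum(probs[order])
    cum_actual = np.cumsum(wins[order])

    cum_pred *= cum_actual[-1] / cum_pred[-1]     # normalise the totals

    error = cum_pred - cum_actual                 # cumulative overprediction
    return cum_pred, cum_actual, error
```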

[Figure: Silver_PredictedvsError]

There certainly seems to have been some overestimation of Democratic chances at the low end, leading to a maximum cumulative overprediction of about 6 (which comes at district 155, that is, the 155th most Republican district). It’s not obvious whether these differences are worse than you would expect, so in the next plot we make two comparisons. The red curve replaces the true outcomes with simulated outcomes, where we assume the 538 probabilities are exactly right. This is the best-case scenario. (We only plot it out to 100 cumulative seats, because the action is all at the low end; the last 150 districts have essentially no randomness.) The red curve and the green curve look very similar, except in direction: the sign of the error is random. The most extreme error in the simulated election result is a bit more than 5.
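And a minimal sketch of that best-case simulation, reusing the hypothetical dem_win_probs array and the cumulative_error helper sketched above:

```python
# Sketch: one simulated election in which the 538 probabilities are exactly
# right, to see how large the cumulative error can get by chance alone.
import numpy as np

rng = np.random.default_rng(2018)

def simulated_error(dem_win_probs):
    probs = np.asarray(dem_win_probs, dtype=float)
    simulated_wins = rng.random(len(probs)) < probs    # one Bernoulli draw per district
    return cumulative_error(probs, simulated_wins)[2]

# np.max(np.abs(simulated_error(dem_win_probs)))  ->  a bit more than 5 in the run plotted above
```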

What would the curve look like if Silver had cheated, by trying to make his predictions all look less certain, to give himself an out when they go wrong? We imagine an alternative psephologist, call him Nate Tarnished, who has access to the exact true probabilities for Democrats to win each district, but who hedges his bets by reporting a probability closer to 1/2. (As an example, we transform each probability by the cumulative beta(1/2,1/2) distribution function. This leaves 0, 1/2, and 1 unchanged, but .001 gets pushed up to .02, .05 is pushed up to .14, and .2 becomes .3. Similarly, .999 becomes .98 and .8 drops to .7. Not huge changes, but enough to create more wiggle room after the fact.)
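The tarnishing transformation can be written in one line with scipy’s beta CDF; a = 1/2 gives Nate Tarnished, and a = 1/4 gives the “Extra Tarnished” scenario from Part 4 above. A minimal sketch:

```python
# Sketch: pull probabilities toward 1/2 by applying the symmetric beta(a, a) CDF.
from scipy.stats import beta

def tarnish(p, a=0.5):
    return beta.cdf(p, a, a)

# tarnish(0.001) ~ 0.02, tarnish(0.05) ~ 0.14, tarnish(0.2) ~ 0.30,
# tarnish(0.8)   ~ 0.70, tarnish(0.999) ~ 0.98
```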

In this case, we would expect to accumulate much more excess cumulative predicted probability on the left side. And this is exactly what we illustrate with the blue curve, where the error repeatedly rises nearly to 10, before slowly declining to 0.

[Figure: SilverTornished]

I’d say the performance of the 538 models in this election was impressive. A better test would be to look at the predicted vote shares in all 435 districts. This would require that I manually enter all of the results, since they don’t seem to be available to download. Perhaps I’ll do that some day.

The Silver Standard: Stochastics pedagogy

I have written a number of times in support of Nate Silver and his 538 project: Here in general, and here in advance of the 2016 presidential elections. Here I want to make a comment about his salutary contribution to the public understanding of probability.

His first important contribution was to force determinism-minded journalists (and, one hopes, some of their readers) to grapple with the very notion of what a probabilistic prediction means. In the vernacular, “random” seems to mean only a fair coin flip. His background in sports analysis was helpful in this, because a lot of people spend a lot of time thinking about sports, and they are comfortable thinking about the outcomes of sporting contests as random, where the race is not always to the swift nor the battle to the strong, but that’s the way to bet. People understand intuitively that the “best team” will not win every match, and winning 3/4 of a large number of contests is evidence of overwhelming superiority. Analogies from sports and gaming have helped to support intuition, and have definitely improved the quality of discussion over the past decade, at least in the corners of the internet where I hang out.*

Frequently Silver is cited directly for obvious insights, such as that an 85% chance of winning (like his website’s current predicted probability of the Democrats winning the House of Representatives) is about the chance of rolling 1 through 5 on a six-sided die, which is to say, not something you should take for granted. But he has also made a great effort to convey more subtle insights into the nature of probabilistic prediction. I particularly appreciated this article by Silver, from a few weeks ago:

As you see reports about Republicans or Democrats giving up on campaigning in certain races for the House, you should ask yourself whether they’re about to replicate Clinton’s mistake. The chance the decisive race in the House will come somewhere you’re not expecting is higher than you might think…

It greatly helps Democrats that they also have a long tail of 19 “lean R” seats and 48 “likely R” seats where they also have opportunities to make gains. (Conversely, there aren’t that many “lean D” or “likely D” seats that Democrats need to defend.) These races are long shots individually for Democrats — a “likely R” designation means that the Democratic candidate has only between a 5 percent and 25 percent chance of winning in that district, for instance. But they’re not so unlikely collectively: In fact, it’s all but inevitable that a few of those lottery tickets will come through. On an average election night, according to our simulations, Democrats will win about six of the 19 “lean R” seats, about seven of the 48 “likely R” seats — and, for good measure, about one of the 135 “solid R” seats. (That is, it’s likely that there will be at least one total and complete surprise on election night — a race that was on nobody’s radar, including ours.)

This is a more subtle version of the problem that all probabilities get rounded to 0, 1, or 1/2. Conventional political prognosticators evaluate districts as “safe” or “likely” or “toss-up”. The likely or safe districts get written off as certain — which is reasonable from the point of view of individual decision-making — but cumulatively, a large collection of districts that each give the Democrat a 10% chance behaves very differently from a collection with a 0% chance. It’s a good bet that the Republican will win each one, but if you have 50 of them it’s a near certainty that the Democrats will win at least one, and a strong likelihood that they will win three or more.
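A quick check of that arithmetic, with 50 hypothetical districts each giving the Democrat a 10% chance (a binomial illustration of my own, not 538’s model):

```python
# Sketch: 50 independent "safe Republican" districts, each with a 10% chance
# of a Democratic upset. How many upsets should we expect?
from scipy.stats import binom

n, p = 50, 0.10
print(n * p)                      # expected upsets: 5.0
print(1 - binom.cdf(0, n, p))     # P(at least one upset), ~0.995
print(1 - binom.cdf(2, n, p))     # P(three or more upsets), ~0.89
```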

The analogy to lottery tickets isn’t perfect, though. The probabilities here don’t represent randomness so much as uncertainty. After 5 of these “safe” districts go the wrong way, you’re almost certainly going to be able to go back and investigate, and discover that there was a reason why each one was misclassified. If you’d known the truth, you wouldn’t have called them safe at all. This enhances the illusion that no one loses a safe seat — only that toss-ups can be misidentified as safe.

* On the other hand, Dinesh D’Souza has proved himself the very model of a modern right-wing intellectual with this tweet:

[Embedded tweet]

How to do it Canada-style

A continuing series (previous entries here, here, and here) about the kind of table-thumping simple-minded blather that you sometimes hear about public policy. It depends on drawing out very superficial aspects of the problem, and waving away the core difficulties with some appeal to optimism or courage or something. With reference to a Monty Python sketch, I call this approach How to Do It (HTDI).

Chancellor of the Exchequer Philip Hammond has described Boris Johnson’s policy-analysis process, which is pure HTDI:

When the pair discussed a ‘Canada’ style trade deal, ‘Boris sits there and at the end of it he says ‘yeah but, er, there must be a way, I mean, if you just, if you, erm, come on, we can do it Phil, we can do it. I know we can get there.’ ‘And that’s it!’ exclaimed the Chancellor, mimicking the Old Etonian.

Senatorial finitude

A paradox. Senate majority leader Mitch McConnell has chided the Democrats for suggesting that basic honesty, along with not having tried to rape anyone, is essential in a Supreme Court justice.

“The time for endless delay and obstruction has come to a close,” he said.

This suggests that there was once a time for endless delay and obstruction, but it has now ended. The First Age has passed. The giants of legislative logic will fade into song and fable.

“When it becomes serious, you have to lie”

I wish I could think of some witty way to frame this, but some comments just have to speak for themselves. I’ve been reading the latest book by my favourite economic historian, Adam Tooze, who has moved on from the financial history of the Third Reich and the First World War to the financial crash of 2007-8 and its aftermath. I’ve never had much time for those who see the EU as being run by arrogant anti-democratic technocrats. But then we have this remark by Jean-Claude Juncker, then prime minister of Luxembourg and acting chair of the Eurogroup, now president of the European Commission:

Monetary policy is a serious issue. We should discuss this in secret, in the Eurogroup …. If we indicate possible decisions, we are fueling speculations on the financial markets and we are throwing in misery mainly the people we are trying to safeguard from this …. I am for secret, dark debates …. I’m ready to be insulted as being insufficiently democratic, but I want to be serious …. When it becomes serious, you have to lie.

I guess the best you can say is that this is the macho posturing of a tax-evaders’ shill trying to show he’s tough enough to sit at the top table of power politics.

In the long dark night of the European soul, even a Luxembourgish prime minister dreams of being Metternich.

Credibility gap

So, this is weird, on a purely linguistic level: Donald Trump, commenting on yesterday’s Senate testimony about the Brett Kavanaugh sexual assault allegations, allowed that Christine Blasey Ford, the accuser, was a “very credible witness”, and that Brett Kavanaugh was “incredible”. I know, words acquire nonliteral meanings. But still…

Truth and reconciliation

Senator Lindsey Graham has lamented the chaotic way that old accusations of sexual abuse are resurfacing to derail men’s careers.

“If this is enough – 35 years in the past, no specifics about location and time, no corroboration – God help the next batch of nominees that come forward,” he told reporters. “It’s going to be hard to recruit good people if you go down based on allegations that are old and unverified.”

I think we can all agree that the current haphazard approach to reporting, investigating, and punishing sexual violence from the distant past, with mores changing and memories fraying, is not ideal, not for the victims, not for justice.

Ultimately, I think what we need is a Sexual Truth and Reconciliation Commission (STaR Commission). As in post-Apartheid South Africa, the Commission would be empowered to offer amnesty to offenders in exchange for confession of all sexual offenses, and full and frank accounts of the facts from the period of the War on Women.

Of course, before we can have the Truth and Reconciliation, we need first to overthrow the old regime of gender-apartheid and hold free and fair gender-neutral elections. That will be some time yet. By that time, we can hope that computer technology will have progressed to the point that it will be possible to store and distribute the complete record of the crimes.

Kavanaugh’s evil twin and the Hitler diaries

I was in high school when the Hitler diaries flashed across the media firmament, and I was fascinated by the eagerness with which so many responsible people accepted as plausible what were quickly unmasked as transparent frauds. An important selling point was the observation that the diaries never mentioned the extermination of the Jews, and I remember very specifically an article in Time magazine that teased the possibility that Hitler himself may not have known of the extent of the Holocaust, with speculation by historians that underlings may have acted on their own. I had an insight then about what would motivate people to seek out evidence that someone they “know” — even if knowing them only by their reputation as a famous monster — was innocent of an important crime. Just by learning about a historical figure we inevitably develop some psychological identification with him; he becomes one of our acquaintances, and then, to mitigate the cognitive dissonance, we are attracted to exculpatory evidence, all the better if it tends to diffuse responsibility rather than create other specific monsters.

The writer Richard Marius once told me that after he had written his biography of Thomas More, where he had to come to some resolution on the purported crimes of Richard III, and decided that Richard was guilty of everything, he was harassed by people calling themselves Ricardians. They insisted that the criminals were Henry VII, or Edward Tyrell, or some anonymous unknowable others. Again, Richard III is a famous villain, but because he is famous, people identify with him, and want to believe he was not such a villain.

The French aphorism tout comprendre c’est tout pardonner (to understand all is to forgive all) goes deep. Bare familiarity is enough to create a motivation to pardon everything.

I see a connection to the way conservatives jumped at the theory that Christine Blasey Ford had indeed been sexually assaulted, but that she had mis-identified Brett Kavanaugh as the perpetrator. This doesn’t change anything about the number of evil people in the world, but it renders them anonymous. (Ed Whelan crossed a line when he went full Ricardian and accused a specific classmate of Kavanaugh’s. In principle, this serves all relevant purposes of the free-floating accusation, but by libelling a specific private citizen it created too many other complications and even, dare I suggest, moral qualms.)

The forerunner

Whatever one thinks of the burden of proof on sexual assault, the new reports on Brett Kavanaugh’s youthful leisure activities show that he anticipated the modern GOP in putting party ahead of principle.
