The Silver Standard 4: Reconsideration

After writing in praise of the honesty and accuracy of fivethirtyeight’s results, I felt uncomfortable about the asymmetry in the way I’d treated Democrats and Republicans in the evaluation. In the plots I made, low-probability Democratic predictions that went wrong pop out on the left-hand side, whereas low-probability Republican predictions that went wrong would get buried in the smooth glide down to zero on the right-hand side. So I decided that what I’m really interested in is all the low-probability predictions, and that I should treat them symmetrically.

For each district there is a predicted loser (PL), with probability smaller than 1/2. In about one third of the districts the PL was assigned a probability of 0. The expected number of PLs (EPL) who would win is simply the sum of all the predicted win probabilities that are smaller than 1/2. (Where multiple candidates from the same party are in the race, I’ve combined them.) The 538 EPL was 21.85. The actual number of winning PLs was 13.
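To make the definitions concrete, here is a minimal sketch of the EPL and the count of winning PLs. It assumes a hypothetical input format: one Democratic win probability per district (same-party candidates already combined) and a parallel list of outcomes. The function names are illustrative, and the toy numbers are made up, not 538's.

```python
def expected_pl_wins(dem_probs):
    """Expected number of predicted losers (PLs) who win.

    dem_probs holds one Democratic win probability per district, with
    same-party candidates already combined. A district's PL probability is
    p if p < 1/2, or 1 - p if p > 1/2; a district at exactly 1/2 has no PL.
    """
    total = 0.0
    for p in dem_probs:
        if p < 0.5:
            total += p
        elif p > 0.5:
            total += 1.0 - p
    return total


def actual_pl_wins(dem_probs, dem_won):
    """Count the districts where the predicted loser actually won."""
    return sum(
        1
        for p, won in zip(dem_probs, dem_won)
        if (p < 0.5 and won) or (p > 0.5 and not won)
    )


# Hypothetical toy data, not 538's numbers.
probs = [0.0, 0.02, 0.3, 0.85, 1.0]
won = [False, False, True, True, True]
print(round(expected_pl_wins(probs), 2), actual_pl_wins(probs, won))  # 0.47 1
```

Applied to the real 538 probabilities and results, these two quantities are the 21.85 and 13 quoted above.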

What I am testing is whether 538 made enough wrong predictions. This is the opposite of the usual evaluation, which gives points for getting predictions right. But when measured by their own predictions, the number of districts that went the opposite of the way they said was a lot lower than they said it would be. That is prima facie evidence that the PL win probabilities were being padded somewhat. To be more precise, under the 538 model the number of winning PLs should be approximately Poisson distributed with parameter 21.85, meaning that the probability of 13 or fewer PLs winning is 0.030. That is kind of low, but the result is still pretty impressive, given all the complications of the prediction game.
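As a check on the arithmetic, here is a short sketch of the Poisson tail calculation. The Poisson model is the approximation used above. If the individual PL probabilities were to hand, the exact distribution of the count could be computed instead, treating it as a sum of independent Bernoulli variables (independence between districts is my assumption here); the second function does that with a standard dynamic programme. Both function names are my own.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), built up term by term from P(X = 0)."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) = P(X = i - 1) * lam / i
        total += term
    return total


def poisson_binomial_cdf(k, probs):
    """Exact P(X <= k) for X a sum of independent Bernoulli(p) variables."""
    dist = [1.0]  # dist[j] = P(j successes among the districts seen so far)
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            new[j] += mass * (1.0 - p)  # this district's PL loses
            new[j + 1] += mass * p      # this district's PL wins
        dist = new
    return sum(dist[: k + 1])


print(f"{poisson_cdf(13, 21.85):.3f}")  # 0.030
```

The exact version has variance equal to the sum of `p * (1 - p)`, which is smaller than the Poisson variance, so the two tail probabilities need not agree exactly.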

Below I show plots of the errors for various scenarios, measuring the cumulative error for these symmetric low predictions. (I’ve added an “Extra Tarnished” scenario, with the transformation based on the even more extreme beta(.25,.25).) I show it first without adjusting for the total number of predicted winning PLs:


We see that the tarnished predictions predict a lot more PL victories than we actually see. The actual 538 predictions are only slightly higher than you should expect, but suspiciously one-sided: they err entirely in the direction of overpredicting PL victories, consistent with padding the margins slightly and erring in the direction of claiming uncertainty.
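For readers who want to reproduce curves like these, here is a minimal sketch of one way to compute the symmetric cumulative error: flip every district so that its PL is on the same side, sort by PL probability, and accumulate predicted minus actual PL wins. The inputs, the function name and the `normalise` flag (which rescales the predicted total to the actual total, as in the normalised plot) are my assumptions, reconstructed from the description rather than taken from the original analysis.

```python
from itertools import accumulate


def pl_cumulative_error(dem_probs, dem_won, normalise=False):
    """Cumulative predicted-minus-actual PL wins, in order of increasing PL
    probability. Returns (cumulative_prediction, error) for plotting."""
    pls = []  # (PL win probability, did the PL win?)
    for p, won in zip(dem_probs, dem_won):
        if p < 0.5:
            pls.append((p, won))  # the Democrat is the PL
        elif p > 0.5:
            pls.append((1.0 - p, not won))  # the Republican is the PL
    pls.sort(key=lambda pair: pair[0])

    predicted = list(accumulate(q for q, _ in pls))
    actual = list(accumulate(int(w) for _, w in pls))
    if normalise and predicted and predicted[-1] > 0:
        # Rescale so that the predicted and actual totals agree.
        scale = actual[-1] / predicted[-1]
        predicted = [x * scale for x in predicted]
    error = [pr - ac for pr, ac in zip(predicted, actual)]
    return predicted, error
```

A positive error means more PL victories were predicted than occurred up to that point, which is the direction the padding hypothesis would produce.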

And here is an image more like the ones I had before, where all the predictions are normalised to correspond to the same number of predicted wins:



The Silver Standard, Part 3: The Reckoning

One of the accusations most commonly levelled against Nate Silver and his enterprise is that probabilistic predictions are unfalsifiable. “He never said the Democrats would win the House. He only said there was an 85% chance. So if they don’t win, he has an out.” This is true only if we focus on the top-level prediction, and ignore all the smaller predictions that went into it. (Except in the trivial sense that you can’t say it’s impossible that a fair coin just happened to come up heads 20 times in a row.)

So, since Silver can be tested, I thought I should see how 538’s predictions stood up in the 2018 US House election. I took their predictions of the probability of victory for a Democratic candidate in all 435 congressional districts (I used their “Deluxe” prediction) from the morning of 6 November. (I should perhaps note here that one third of the districts had estimates of 0 (31 districts) or 1 (113 districts), so a victory for the wrong candidate in any one of these districts would have been a black mark for the model.) I ordered the districts by the predicted probability, to compute the cumulative predicted number of seats, starting from the smallest. I plot them against the cumulative actual number of seats won, taking the current leader for the winner in the 11 districts where there is no definite decision yet.


The predicted number of seats won by Democrats was 231.4, impressively close to the actual 231 won. But that’s not the standard we are judging them by, and in this plot (and the ones to follow) I have normalised the predicted and observed totals to be the same. I’m looking at the cumulative fractions of a seat contributed by each district. If the predicted probabilities are accurate, we would expect the plot (in green) to lie very close to the line with slope 1 (dashed red). It certainly does look close, but the scale doesn’t make it easy to see the differences. So here is the plot of the prediction error, the difference between the red dashed line and the green curve, against the cumulative prediction:
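The calculation behind the error plot can be sketched as follows, under the assumption that the inputs are a hypothetical list of Democratic win probabilities and a parallel list of outcomes (with the current leader counted as the winner where the race is undecided). The function names are illustrative, not from the original analysis. The second function locates the largest cumulative overprediction, the quantity quoted in the next paragraph.

```python
from itertools import accumulate


def cumulative_seat_error(dem_probs, dem_won):
    """Order districts from most Republican to most Democratic, accumulate
    predicted and actual Democratic seats, rescale the prediction so the two
    totals match, and return (cumulative_prediction, error)."""
    order = sorted(range(len(dem_probs)), key=lambda i: dem_probs[i])
    predicted = list(accumulate(dem_probs[i] for i in order))
    actual = list(accumulate(int(dem_won[i]) for i in order))
    scale = actual[-1] / predicted[-1]  # assumes some district has p > 0
    predicted = [x * scale for x in predicted]
    error = [pr - ac for pr, ac in zip(predicted, actual)]
    return predicted, error


def max_overprediction(error):
    """Largest cumulative overprediction and the 1-based rank of the district
    where it occurs (rank 1 = most Republican)."""
    i = max(range(len(error)), key=lambda j: error[j])
    return error[i], i + 1
```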


There certainly seems to have been some overestimation of Democratic chances at the low end, leading to a maximum cumulative overprediction of about 6 (which comes at district 155, that is, the 155th most Republican district). It’s not obvious whether these differences are worse than you would expect. So in the next plot we make two comparisons. The red curve replaces the true outcomes with simulated outcomes, where we assume the 538 probabilities are exactly right. This is the best-case scenario. (We only plot it out to 100 cumulative seats, because the action is all at the low end; the last 150 districts have essentially no randomness.) The red curve and the green curve look very similar, except for the direction, which is random. The most extreme error in the simulated election result is a bit more than 5.
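The best-case comparison can be sketched as a small simulation: draw each district's outcome from the model's own probability, apply the same normalisation as for the real results, and record the most extreme cumulative error. The function name, the number of simulations and the seed are my own choices; this is a sketch of the idea, not the code behind the plot.

```python
import random


def simulated_max_errors(dem_probs, n_sims=1000, seed=0):
    """Largest |cumulative error| in each of n_sims elections simulated from
    the model's own probabilities (the best-case scenario)."""
    rng = random.Random(seed)
    probs = sorted(dem_probs)  # most Republican district first
    total_p = sum(probs)
    results = []
    for _ in range(n_sims):
        # random() is in [0, 1), so p = 0 never wins and p = 1 always does.
        won = [rng.random() < p for p in probs]
        # Same normalisation as for the real results: match the totals.
        scale = sum(won) / total_p if total_p > 0 else 0.0
        cum_pred = 0.0
        cum_won = 0
        worst = 0.0
        for p, w in zip(probs, won):
            cum_pred += p * scale
            cum_won += w
            worst = max(worst, abs(cum_pred - cum_won))
        results.append(worst)
    return results
```

Given the real probabilities, the share of simulated maxima at or above the observed value would give a rough sense of how unusual the observed curve is; that comparison is not carried out here.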

What would the curve look like if Silver had cheated, by trying to make his predictions all look less certain, to give himself an out when they go wrong? We imagine an alternative psephologist, call him Nate Tarnished, who has access to the exact true probabilities for Democrats to win each district, but who hedges his bets by reporting a probability closer to 1/2. As an example, we take the cumulative distribution function of beta(1/2,1/2). This leaves 0, 1/2, and 1 unchanged, but .001 would get pushed up to .02, .05 is pushed up to .14, and .2 becomes .3. Similarly, .999 becomes .98 and .8 drops to .7. Not huge changes, but enough to create more wiggle room after the fact.

In this case, we would expect to accumulate much more excess cumulative predicted probability on the left side. And this is exactly what we illustrate with the blue curve, where the error repeatedly rises nearly to 10, before slowly declining to 0.


I’d say the performance of the 538 models in this election was impressive. A better test would be to look at the predicted vote shares in all 435 districts. This would require that I manually enter all of the results, since they don’t seem to be available to download. Perhaps I’ll do that some day.

Senatorial finitude

A paradox. Senate majority leader Mitch McConnell has chided the Democrats for suggesting that basic honesty is essential for a Supreme Court justice, as well as not trying to rape anyone.

“The time for endless delay and obstruction has come to a close,” he said.

This suggests that there was once a time for endless delay and obstruction, but it has now ended. The First Age has passed. The giants of legislative logic will fade into song and fable.

Credibility gap

So, this is weird, on a purely linguistic level: Donald Trump, commenting on yesterday’s Senate testimony about the Brett Kavanaugh sexual assault allegations, allowed that Christine Blasey Ford, the accuser, was a “very credible witness”, and that Brett Kavanaugh was “incredible”. I know, words acquire nonliteral meanings. But still…

Truth and reconciliation

Senator Lindsey Graham has lamented the chaotic way that old accusations of sexual abuse are resurfacing to derail men’s careers.

“If this is enough – 35 years in the past, no specifics about location and time, no corroboration – God help the next batch of nominees that come forward,” he told reporters. “It’s going to be hard to recruit good people if you go down based on allegations that are old and unverified.”

I think we can all agree that the current haphazard approach to reporting, investigating, and punishing sexual violence from the distant past, with mores changing and memories fraying, is not ideal, not for the victims, not for justice.

Ultimately, I think what we need is a Sexual Truth and Reconciliation Commission (STaR Commission). As in post-Apartheid South Africa, the Commission would be empowered to offer amnesty to offenders in exchange for confession of all sexual offenses, and full and frank accounts of the facts from the period of the War on Women.

Of course, before we can have the Truth and Reconciliation, we need first to overthrow the old regime of gender-apartheid and hold free and fair gender-neutral elections. That will be some time yet. By that time, we can hope that computer technology will have progressed to the point that it will be possible to store and distribute the complete record of the crimes.

The forerunner

Whatever one thinks of the burden of proof on sexual assault, the new reports on Brett Kavanaugh’s youthful leisure activities show that he anticipated the modern GOP in putting party ahead of principle.

Pathetic Republicans

The name of the Republican Party derives ultimately from the Latin res publica, meaning “public matters”. Which suggests a posture diametrically opposed to that which Senate Republicans have taken to the Kavanaugh sexual assault accusation. It seems obvious that, from a public policy perspective, there are broadly three possible stances you could have toward these allegations: 1) They are facially incredible; 2) They are credible but irrelevant to his fitness to serve on the Supreme Court; 3) If true they may (or certainly do) disqualify him from the Supreme Court, so it is essential to take pains to ascertain their truth or falsity. (I suppose there is a fourth as well, the mirror of (1): We believe the accuser, so the nomination simply needs to be withdrawn.)

Republicans seem to have settled bizarrely on the first part of (3), but then veered off into personal pathos. The accuser, Christine Blasey Ford, has requested that an independent investigation determine some essential facts before her testimony, and that other witnesses be called. Republicans have rejected this, and seem generally to represent her testimony as a personal favour to her, to assuage her suffering.

Senator John Cornyn:

We don’t know if she’s coming or not but this is her chance. This is her one chance. We hope she does.

Why is it “her chance”? Presumably it is the nation’s chance to avoid having an attempted rapist and perjurer on the Supreme Court. I understand that the disposition of a crucial witness is important, but surely that cannot affect the need to resolve the matter before an irrevocable decision is taken. Senator Bob Corker:

I just felt that it was important that if she had these types of serious allegations that she ought to have the opportunity to be heard. And I hope she is going to take advantage of that. If she doesn’t — that’s a whole other thing.

Majority Leader Mitch McConnell:

Dr. Ford has talked to the Washington Post, indicated she wants to talk to the committee, and we’re going to give her that opportunity on Monday.

Pseudo-centrist and sometime feminist Susan Collins is particularly concerned about Brett Kavanaugh’s feelings:

I think it’s not fair for Judge Kavanaugh for her not to come forward and testify.

The subtext is, if she doesn’t help us, we’ll just have to move ahead and confirm him. Which suggests that they really don’t think this is important, raising the question, why are they inviting her to testify at all? Surely the Senate Judiciary Committee is not the place for a public therapy session, particularly when the witness will be bringing great public opprobrium on herself, regardless of how the hearings turn out.

Had enough Kavanaugh?

To adopt for a moment the president’s rhetorical style:

The Federalist Society isn’t sending their best. They’re sending people that have lots of problems, and they’re bringing those problems with them. They’re bringing crime. They’re rapists. And some, I assume, are good people.
We need a total and complete shutdown of men being appointed to positions of power and influence until our country’s representatives can figure out what is going on. Until we are able to determine and understand this problem and the dangerous threat it poses, our country cannot be the victims of horrendous attacks by people that believe only in male supremacy, and have no sense of reason or respect for human life.

Trump’s branding

In reading Donald Trump’s rant on the anonymous freak who wrote in the NY Times that, yes, Donald Trump is a raving loon, but no need to take any extreme measures like electing Democrats, because the people supposedly working for him have everything under control, I was reminded of a weird tic that Trump has that I’ve never seen remarked upon. It’s in this line:

“We have somebody in what I call the failing New York Times talking about he’s part of the resistance within the Trump administration. This is what we have to deal with,” he told reporters in the East Room early Wednesday evening.

Now, if you’re trying to insult someone, you say, “He’s an idiot.” You don’t say, “He’s what I call an idiot.” Calling attention to the fact that this is merely your private designation saps the force of the insult.

Trump is enormously proud of his ability to brand people with epithets (even if no one else actually uses them). So proud, that he needs to call attention to his invention at every opportunity, even against the objective of the epithets. One of the many ways that he acts like a toddler (or a Hollywood producer). “Look Mama, I made it self!”

I imagine a version of the Odyssey featuring Homer’s trademarked characters “what I call grey-eyed Athena” and “Odysseus, or as I call him, ‘sacker of cities'”.

Fake secrets. A paradox of Trumpism

The story of Donald Trump’s effort to intimidate his critics by threatening to revoke security clearances has entered a new and paradoxical phase:

On Sunday, national security adviser John Bolton… told ABC’s This Week: “A number of people have commented that [Brennan] couldn’t be in the position he’s in of criticizing President Trump and his so-called collusion with Russia unless he did use classified information.”

I thought the story was that the “so-called collusion” was all a pack of lies. But can lies be classified? Is there fiction whose release threatens national security? Or are there lies which cannot be told without certain secret true information?

Truly, a paradox.

