Occasional reflections on Life, the World, and Mathematics

I cited this joke in the context of Brexit before:

A rabbi announces in synagogue, at the end of Yom Kippur, that he despairs at the burning need for wealth to be shared more equally. He will depart for the next year to travel through the world, speaking to all manner of people, ultimately to persuade the rich to share with the poor. On the following Yom Kippur he returns, takes his place at the head of the congregation without a word, and leads the service. At the end of the day congregants gather around him. “Rabbi, have you accomplished your goal? Will the rich now share with the poor?” And he says, “Halfway. The poor are willing to accept.”

And it’s returned, because we again have these headlines:

Except the backing of the “cabinet” means the backing of those who didn’t resign because they wouldn’t back the agreement. Including the Northern Ireland Secretary (!) and the Brexit Secretary (!!) who was tasked with negotiating the thing. So, except for all the members most relevant to the deal (oh, yes, the Foreign Secretary already resigned over Brexit a few months ago), Theresa May has managed to get her handpicked cabinet to accept the negotiated agreement. All that remains is the other half of the task, which is to get it through Parliament, which only requires that she corral votes from her coalition partner, who see the backstop on Northern Ireland as a fundamental betrayal, and sufficient members of the opposition parties, because why wouldn’t they sign up to take responsibility for an agreement that everyone will blame for everything that goes wrong, even if nothing goes wrong, rather than forcing new elections?

After writing in praise of the honesty and accuracy of fivethirtyeight’s results, I felt uncomfortable about the asymmetry in the way I’d treated Democrats and Republicans in the evaluation. In the plots I made, low-probability Democratic predictions that went wrong pop out on the left-hand side, whereas low-probability Republican predictions that went wrong would get buried in the smooth glide down to zero on the right-hand side. So I decided that what I’m really interested in are all the low-probability predictions, and that I should treat them symmetrically.

For each district there is a predicted loser (PL), with probability smaller than 1/2. In about one third of the districts the PL was assigned a probability of 0. The expected number of PLs (EPL) who would win is simply the sum of all the predicted win probabilities that are smaller than 1/2. (Where multiple candidates from the same party are in the race, I’ve combined them.) The 538 EPL was 21.85. The actual number of winning PLs was 13.
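The EPL arithmetic can be sketched in a few lines of Python. The probabilities below are hypothetical stand-ins, not 538’s actual district numbers:

```python
# Each entry is a district's predicted probability of a Democratic win
# (toy values for illustration only).
dem_win_probs = [0.0, 0.03, 0.48, 0.92, 1.0, 0.35]

# The predicted loser (PL) in each district has win probability min(p, 1 - p);
# the expected number of winning PLs is just the sum of those probabilities.
epl = sum(min(p, 1 - p) for p in dem_win_probs)
print(epl)  # 0.94 for this toy data
```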

What I am testing is whether 538 made enough wrong predictions. This is the opposite of the usual evaluation, which gives points for getting predictions right. But measured against their own stated probabilities, the number of districts that went the opposite of the way 538 said was a good deal lower than their model implied. That is prima facie evidence that the PL win probabilities were being padded somewhat. To be more precise, under the 538 model the number of winning PLs should be approximately Poisson distributed with parameter 21.85, meaning that the probability of 13 or fewer PLs winning is 0.030. Which is kind of low, but still pretty impressive, given all the complications of the prediction game.
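That tail probability is easy to check directly, summing the Poisson probabilities term by term rather than relying on any particular statistics library:

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), built up term by term."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        total += term
    return total

# Probability of 13 or fewer winning predicted losers when the
# expected number is 21.85, as in the text.
p = poisson_cdf(13, 21.85)
print(round(p, 3))  # prints 0.03
```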

Below I show plots of the errors for various scenarios, measuring the cumulative error for these symmetric low predictions. (I’ve added an “Extra Tarnished” scenario, with the transformation based on the even more extreme beta(.25,.25).) I show it first without adjusting for the total number of predicted winning PLs:


We see that tarnished predictions predict a lot more PL victories than we actually saw. The actual predictions yield just slightly more than you should expect, but the excess is suspiciously one-sided — that is, all in the direction of overpredicting PL victories, consistent with padding the margins slightly, erring in the direction of claiming uncertainty.

And here is an image more like the ones I had before, where all the predictions are normalised to correspond to the same number of predicted wins:



One of the accusations most commonly levelled against Nate Silver and his enterprise is that probabilistic predictions are unfalsifiable. “He never said the Democrats would win the House. He only said there was an 85% chance. So if they don’t win, he has an out.” This is true only if we focus on the top-level prediction, and ignore all the smaller predictions that went into it. (Except in the trivial sense that you can’t say it’s impossible that a fair coin just happened to come up heads 20 times in a row.)

So, since Silver can be tested, I thought I should see how 538’s predictions stood up in the 2018 US House election. I took their predictions of the probability of victory for a Democratic candidate in all 435 congressional districts (I used their “Deluxe” prediction) from the morning of 6 November. (I should perhaps note here that one third of the districts had estimates of 0 (31 districts) or 1 (113 districts), so a victory for the wrong candidate in any one of these districts would have been a black mark for the model.) I ordered the districts by the predicted probability, to compute the cumulative predicted number of seats, starting from the smallest. I plot them against the cumulative actual number of seats won, taking the current leader for the winner in the 11 districts where there is no definite decision yet.
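The cumulative comparison can be sketched like this, with made-up probabilities and outcomes standing in for the real 435-district data:

```python
# Toy per-district probabilities of a Democratic win, and what
# actually happened (1 = Democrat won). Hypothetical values only.
probs    = [0.02, 0.10, 0.35, 0.60, 0.90, 0.99]
outcomes = [0,    0,    1,    0,    1,    1]

# Order districts from most Republican-leaning to most Democratic-leaning,
# then accumulate predicted seats (fractional) against actual seats.
pairs = sorted(zip(probs, outcomes))

cum_predicted, cum_actual = [], []
p_total, w_total = 0.0, 0
for p, won in pairs:
    p_total += p
    w_total += won
    cum_predicted.append(p_total)
    cum_actual.append(w_total)
```

Plotting `cum_actual` against `cum_predicted` gives the kind of curve described in the text; well-calibrated probabilities should keep it close to the diagonal.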


The predicted number of seats won by Democrats was 231.4, impressively close to the actual 231 won. But that’s not the standard we are judging them by, and in this plot (and the ones to follow) I have normalised the predicted and observed totals to be the same. I’m looking at the cumulative fractions of a seat contributed by each district. If the predicted probabilities are accurate, we would expect the plot (in green) to lie very close to the line with slope 1 (dashed red). It certainly does look close, but the scale doesn’t make it easy to see the differences. So here is the plot of the prediction error, the difference between the red dashed line and the green curve, against the cumulative prediction:


There certainly seems to have been some overestimation of Democratic chances at the low end, leading to a maximum cumulative overprediction of about 6 (which comes at district 155, that is, the 155th most Republican district). It’s not obvious whether these differences are worse than you would expect. So in the next plot we make two comparisons. The red curve replaces the true outcomes with simulated outcomes, where we assume the 538 probabilities are exactly right. This is the best-case scenario. (We only plot it out to 100 cumulative seats, because the action is all at the low end; the last 150 districts have essentially no randomness.) The red curve and the green curve look very similar, though the direction of the simulated error is random. The most extreme error in the simulated election result is a bit more than 5.
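The simulation behind that best-case curve amounts to drawing each district as an independent coin flip with its published probability. A minimal sketch, with randomly generated probabilities standing in for the real 538 numbers:

```python
import random

random.seed(0)
# Hypothetical stand-ins for 435 district win probabilities.
probs = sorted(random.random() for _ in range(435))

# Simulate an election in which the published probabilities are exactly right.
sim_outcomes = [1 if random.random() < p else 0 for p in probs]

# Cumulative prediction error: predicted seats so far minus simulated seats so far.
errors, pred, actual = [], 0.0, 0
for p, won in zip(probs, sim_outcomes):
    pred += p
    actual += won
    errors.append(pred - actual)
```

Repeating this many times shows how large the cumulative error typically gets when the model is perfectly calibrated, which is the benchmark the observed green curve is being compared against.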

What would the curve look like if Silver had cheated, by trying to make his predictions all look less certain, to give himself an out when they go wrong? We imagine an alternative psephologist, call him Nate Tarnished, who has access to the exact true probabilities for Democrats to win each district, but who hedges his bets by reporting a probability closer to 1/2. (As an example, we take the cumulative beta(1/2,1/2) distribution function. This leaves 0, 1/2, and 1 unchanged, but .001 would get pushed up to .02, .05 is pushed up to .14, and .2 becomes .3. Similarly, .999 becomes .98 and .8 drops to .7. Not huge changes, but enough to create more wiggle room after the fact.)
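The beta(1/2,1/2) distribution function is the arcsine law, so the hedging transformation has a simple closed form; this reproduces the values quoted above:

```python
import math

def hedge(p):
    """Cumulative beta(1/2, 1/2) (arcsine) distribution function:
    pushes probabilities toward 1/2 while fixing 0, 1/2, and 1."""
    return (2 / math.pi) * math.asin(math.sqrt(p))

print(round(hedge(0.001), 2))  # 0.02
print(round(hedge(0.05), 2))   # 0.14
print(round(hedge(0.2), 2))    # 0.3
print(round(hedge(0.8), 2))    # 0.7
print(round(hedge(0.999), 2))  # 0.98
```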

In this case, we would expect to accumulate much more excess cumulative predicted probability on the left side. And this is exactly what we illustrate with the blue curve, where the error repeatedly rises nearly to 10, before slowly declining to 0.


I’d say the performance of the 538 models in this election was impressive. A better test would be to look at the predicted vote shares in all 435 districts. This would require that I manually enter all of the results, since they don’t seem to be available to download. Perhaps I’ll do that some day.

I have written a number of times in support of Nate Silver and his 538 project: Here in general, and here in advance of the 2016 presidential elections. Here I want to make a comment about his salutary contribution to the public understanding of probability.

His first important contribution was to force determinism-minded journalists (and, one hopes, some of their readers) to grapple with the very notion of what a probabilistic prediction means. In the vernacular, “random” seems to mean only a fair coin flip. His background in sports analysis was helpful in this, because a lot of people spend a lot of time thinking about sports, and they are comfortable thinking about the outcomes of sporting contests as random, where the race is not always to the swift nor the battle to the strong, but that’s the way to bet. People understand intuitively that the “best team” will not win every match, and winning 3/4 of a large number of contests is evidence of overwhelming superiority. Analogies from sports and gaming have helped to support intuition, and have definitely improved the quality of discussion over the past decade, at least in the corners of the internet where I hang out.*

Frequently Silver is cited directly for obvious insights, like the observation that an 85% chance of winning (his website’s current predicted probability of the Democrats’ winning the House of Representatives) is like the chance of rolling 1 through 5 on a six-sided die, which is to say, not something you should take for granted. But he has also made a great effort to convey more subtle insights into the nature of probabilistic prediction. I particularly appreciated this article by Silver, from a few weeks ago.

As you see reports about Republicans or Democrats giving up on campaigning in certain races for the House, you should ask yourself whether they’re about to replicate Clinton’s mistake. The chance the decisive race in the House will come somewhere you’re not expecting is higher than you might think…

It greatly helps Democrats that they also have a long tail of 19 “lean R” seats and 48 “likely R” seats where they also have opportunities to make gains. (Conversely, there aren’t that many “lean D” or “likely D” seats that Democrats need to defend.) These races are long shots individually for Democrats — a “likely R” designation means that the Democratic candidate has only between a 5 percent and 25 percent chance of winning in that district, for instance. But they’re not so unlikely collectively: In fact, it’s all but inevitable that a few of those lottery tickets will come through. On an average election night, according to our simulations, Democrats will win about six of the 19 “lean R” seats, about seven of the 48 “likely R” seats — and, for good measure, about one of the 135 “solid R” seats. (That is, it’s likely that there will be at least one total and complete surprise on election night — a race that was on nobody’s radar, including ours.)

This is a more subtle version of the problem that all probabilities get rounded to 0, 1, or 1/2. Conventional political prognosticators evaluate districts as “safe” or “likely” or “toss-up”. The likely or safe districts get written off as certain — which is reasonable from the point of view of individual decision-making — but cumulatively a large number of districts with a 10% chance of being won by the Democrat are simply different from districts with a 0% chance. It’s a good bet that the Republican will win each one, but if you have 50 of them it’s a near certainty that the Democrats will win at least 1, and a strong likelihood they will win 3 or more.
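The binomial arithmetic behind that claim is easy to check directly:

```python
import math

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Fifty districts, each with a 10% chance of a Democratic win:
print(round(binom_tail(50, 0.1, 1), 3))  # 0.995: at least one win is near-certain
print(round(binom_tail(50, 0.1, 3), 2))  # 0.89: three or more is very likely
```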

The analogy to lottery tickets isn’t perfect, though. The probabilities here don’t represent randomness so much as uncertainty. After 5 of these “safe” districts go the wrong way, you’re almost certainly going to be able to go back and investigate, and discover that there was a reason why each was misclassified. If you’d known the truth, you wouldn’t have called it safe at all. This enhances the illusion that no one loses a safe seat — only, toss-ups can be mis-identified as safe.

* On the other hand, Dinesh D’Souza has proved himself the very model of a modern right-wing intellectual with this tweet:


Tom Lehrer on academic spam

One of the weirdest phenomena engendered by the tawdry incentive structure of academic publishing is the academic spam that we all get, such as this invitation that I just found in my inbox:

Dear Prof./Dr. D. Steinsaltz:
As Co-Editor-in-Chief of the International Journal of Statistics in Medical Research. I am pleased to invite you to join our editorial board member team or reviewer board team.

Esteemed researchers are invited to join the editorial boards of excellent journals, and it is a mark of prestige that they put on their CVs, etc. This is vaguely formulated like such an invitation, but then goes on with

If you are interested to be the part of the journal as member in editorial or reviewer board then please send your CV along with photograph.

making it clear that they have no idea who I am, and they’re just spraying out these invitations to all and sundry. It’s part of the fungal growth of journals needed to meet the needs for publications to fill other people’s CVs for their academic hiring and promotion. The best part was that this flattering invitation was closed off with a dutiful unsubscribe notice “If you wish not tor receive any such communication in future.” (Which also gave a taste of the standard of writing and editing expected for this journal.)

It reminded me of this line from Tom Lehrer’s outro to the song “Alma” (on the album That Was the Year that Was):

Not long ago I received a letter which said: “Darling, I love you, and I cannot live without you. Marry me, or I will kill myself.” Well, I was a little disturbed at that until I took another look at the envelope, and saw that it was addressed to Occupant.

Prescience and the opposite

I’ve just been reading a book of collected essays by Tony Judt, the wonderful historian of the 20th century who died in 2010. The book was from 2006, and some of his observations seem remarkably prescient, while others… have not aged well.

On the plus side is this, from the introduction:

It was in large measure thanks to the precautionary services and safety nets incorporated into their postwar systems of governance that the citizens of the advanced countries lost the gnawing sentiment of insecurity and fear which had dominated political life between 1914 and 1945.

Until now. For there are reasons to believe that this may be about to change. Fear is reemerging as an active ingredient of political life in Western democracies. Fear of terrorism, of course; but also, and perhaps more insidiously, fear of the uncontrollable speed of change, fear of the loss of employment, fear of losing ground to others in an increasingly unequal distribution of resources, fear of losing control of the circumstances and routines of one’s daily life. And, perhaps above all, fear that it is not just we who can no longer shape our lives but that those in authority have lost control as well, to forces beyond their reach.

Few democratic governments can resist the temptation to turn this sentiment of fear to political advantage. Some have already done so. In which case we should not be surprised to see the revival of pressure groups, political parties, and political programs based upon fear: fear of foreigners; fear of change; fear of open frontiers and open communications; fear of the free exchange of unwelcome opinions.

Those inclined to see Donald Trump as a sad symptom of decline for what was once a party of Republican giants, would be disappointed (in the extremely unlikely event that they would read this book) by his portrayal of Nixon’s foreign policy — in the context of reviewing William Bundy’s book on the subject — as a first-time-tragedy adumbration of Trumpism:

His criticism concerns deception, and the peculiar combination of duplicity and vagueness that marked foreign policy in the Nixon era. “The essential to good diplomacy,” Harold Nicolson once suggested, “is precision. The main enemy of good diplomacy is imprecision.” And, paradoxical as it may seem, the main source of imprecision in this era was the obsession with personal diplomacy…

[Nixon] was so absorbed in the recollection and anticipation of slights and injustices, real and imagined, that much of his time as president was taken up with “screwing” his foes, domestic and foreign alike: Even when he had a defensible plan to implement, such as his “new economic policy” of 1971…, he just couldn’t help seeing in it the additional benefit of “sticking it to the Japanese”. He warned even his allies against offering unwanted (critical) counsel… He surrounded himself with yes-men and hardly ever exposed his person or his policies to open debate among experts or more than one adviser at a time.

Purely neutral in the prescience stakes, I was amused to be reminded that the phrase “Make America Great Again” appeared as the subtitle of Peter Beinart’s 2007 Bushian-psycho-militarism-but-from-the-left screed.

On the other side of the ledger,

Liberalism in the United States today is the politics that dare not speak its name… Today a spreading me-first consensus has replaced vigorous public debate… And like their political counterparts, the critical intelligentsia once so prominent in American cultural life has fallen silent.

This seems like an accurate portrayal of the universal rejection of “liberalism” in the US in the GW Bush years, and Judt can’t really be faulted for not having predicted that nearly a decade after his death out-and-proud liberals would be battling self-proclaimed socialists for control of the Democratic party, while free-market ideologues would be trying to rebrand themselves as “classical liberals”.

And then, on its own special plane of awful there is his defence of Arthur Koestler against the accusation of his biographer that he was “a serial rapist”:

If Koestler were alive, he would surely sue for libel, and he would surely win. Even on Cesarani’s own evidence there is only one unambiguously attested charge of rape.

I think I have a pretty good memory of cultural change over my lifetime, but still I was amazed to see a smart and humane person — someone who entirely identified with the Left even — suggesting that a man who had violently raped a woman (with other accusations unproven or more ambiguous, or at least nonviolent) had been unfairly maligned by calling him a “serial rapist”. His confidence that the man would have prevailed at an imaginary libel trial is just extraordinary, and even more extraordinary is to consider that under the conditions that prevailed at the time, so recently, he might have been right.

So long, Sokal

I wonder how Alan Sokal feels about becoming the new Piltdown, the metonym for a certain kind of hoax?

So now there’s another attack on trendy subfields of social science, being called “Sokal squared” for some reason. I guess it’s appropriate to the ambiguity of the situation: if you thought the Sokal Hoax was already big, squaring it would make it bigger; on the other hand, if you thought it was petty, this new version is just pettier. And if, like me, you thought it was just one of those things, the squared version is more or less the same.

The new version is unlike the original Sokal Hoax in one important respect: Sokal was mocking social scientists for their credulity about the stupid stuff physicists say. The reboot mocks social scientists for their credulity about the stupid stuff other social scientists say. A group of three scholars has produced a whole slew of intentionally absurd papers, in fields that they tendentiously call “grievance studies”, and managed to get them past peer review at some reputable journals. The hoaxers wink with facially ridiculous theses, like the account of canine rape culture in dog parks.

But if we’re not going to shut down bold thought, we have to allow for research whose aims and conclusions seem strange. (Whether paradoxical theses are unduly promoted for their ability to grab attention is a related but separate matter. For example, one of the few academic economics talks I ever attended was by a behavioural economist explaining the “marriage market” in terms of women’s trading off the steady income they receive from a husband against the potential income from prostitution that they would forego. And my first exposure to mathematical finance was a lecture on how a person with insider information could structure a series of profitable trades that would be undetectable by regulators.) If the surprising claim is being brought by a fellow scholar acting in good faith, trying to advance the discussion in the field, then you try to engage with the argument generously. You need to strike a balance, particularly when technical correctness isn’t a well-defined standard in your field. Trolling with fake papers poisons this cooperative process of evaluation.

Hannah Arendt on referenda

I decided it was about time to reread The Origins of Totalitarianism. I was pleased to come across her description of the role of referenda, which I have often thought of in the context of recent UK history, but whose origin I had forgotten:

The mob is primarily a group in which the residue of all classes are represented. This makes it so easy to mistake the mob for the people, which also comprises all strata of society… Plebiscites, therefore, with which modern mob leaders have obtained such excellent results, are an old concept of politicians who rely upon the mob.

I was also pleased to see this comment about Jules Guérin, the founder of the French Ligue Antisémite:

Ruined in business, he had begun his political career as a police stool pigeon, and acquired that flair for discipline and organization which invariably marks the underworld.

I think that is all the demonstration required for my honesty and good character.

How to do it Canada-style

A continuing series (previous entries here, here, and here) about the kind of table-thumping simple-minded blather that you sometimes hear about public policy. It depends on drawing out very superficial aspects of the problem, and waving away the core difficulties with some appeal to optimism or courage or something. With reference to a Monty Python sketch, I call this approach How to Do It (HTDI).

Chancellor of the Exchequer Philip Hammond has described Boris Johnson’s policy-analysis process, which is pure HTDI:

When the pair discussed a ‘Canada’ style trade deal, ‘Boris sits there and at the end of it he says ‘yeah but, er, there must be a way, I mean, if you just, if you, erm, come on, we can do it Phil, we can do it. I know we can get there.’ ‘And that’s it!’ exclaimed the Chancellor, mimicking the Old Etonian.

I just discovered that Donna Strickland, the woman who just won a Nobel Prize in physics, is an associate professor at the University of Waterloo.

There are 20 full professors in the department, and I bet their research is pretty fucking amazing.
