Occasional reflections on Life, the World, and Mathematics

Posts tagged ‘science and society’

Schrödinger’s menu

I was just rereading Erwin Schrödinger’s pathbreaking 1944 lectures What is Life?, which is often praised for its prescience about, and influence on, the foundational principles of genetics in the second half of the twentieth century. At one point, while developing the crucial importance of his concept of negative entropy as the driver of life, he remarks on the misunderstanding that “energy” is what organisms draw from their food. In an ironic aside he says:

In some very advanced country (I don’t remember whether it was Germany or the U.S.A. or both) you could find menu cards in restaurants indicating, in addition to the price, the energy content of every dish.

Also prescient!

How odd that the only biological organisms that Schrödinger is today commonly associated with are cats…

[Image: FDA sample menu with energy content]

The Silver Standard 4: Reconsideration

After writing in praise of the honesty and accuracy of fivethirtyeight’s results, I felt uncomfortable about the asymmetry in the way I’d treated Democrats and Republicans in the evaluation. In the plots I made, low-probability Democratic predictions that went wrong pop out on the left-hand side, whereas low-probability Republican predictions that went wrong would get buried in the smooth glide down to zero on the right-hand side. So I decided that what I’m really interested in is all low-probability predictions, and that I should treat them symmetrically.

For each district there is a predicted loser (PL), with probability smaller than 1/2. In about one third of the districts the PL was assigned a probability of 0. The expected number of PLs (EPL) who would win is simply the sum of all the predicted win probabilities that are smaller than 1/2. (Where multiple candidates from the same party are in the race, I’ve combined them.) The 538 EPL was 21.85. The actual number of winning PLs was 13.
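Here is a minimal sketch of that bookkeeping, with a random stand-in for the forecasts (the array dem_prob is hypothetical; the real input would be the 538 “Deluxe” probabilities, with same-party candidates already combined):

```python
import numpy as np

# Hypothetical stand-in: one predicted Democratic win probability per district.
rng = np.random.default_rng(0)
dem_prob = rng.uniform(0, 1, size=435)

# The predicted loser (PL) in each district is the side with probability
# below 1/2, so its win probability is min(p, 1 - p).
pl_prob = np.minimum(dem_prob, 1 - dem_prob)

# Expected number of PL wins (EPL): the sum of those small probabilities.
epl = pl_prob.sum()
print(f"Expected PL wins: {epl:.2f}")  # 21.85 with the actual 538 forecasts
```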

What I am testing is whether 538 made enough wrong predictions. This is the opposite of the usual evaluation, which gives points for getting predictions right. But measured against their own predictions, the number of districts that went the opposite of the way they said was quite a bit lower than their probabilities implied. That is prima facie evidence that the PL win probabilities were being padded somewhat. To be more precise, under the 538 model the number of winning PLs should be approximately Poisson distributed with parameter 21.85, meaning that the probability of 13 or fewer PLs winning is about 0.030. Which is kind of low, but still pretty impressive, given all the complications of the prediction game.
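That tail probability can be checked directly; a one-line sketch with scipy:

```python
from scipy.stats import poisson

# Under the 538 model the number of PL wins is roughly Poisson(21.85);
# the chance of 13 or fewer winning, as actually happened:
print(f"{poisson.cdf(13, 21.85):.3f}")  # about 0.030
```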

Below I show plots of the errors for various scenarios, measuring the cumulative error for these symmetric low-probability predictions. (I’ve added an “Extra Tarnished” scenario, with the transformation based on the even more extreme beta(.25,.25).) I show it first without adjusting for the total number of predicted winning PLs:

[Figure: cumulative error for the symmetric low-probability predictions, totals unadjusted]

We see that tarnished predictions predict a lot more PL victories than we actually see. The actual 538 predictions predict only slightly more PL victories than you would expect, but the errors are suspiciously one-sided: all in the direction of overpredicting PL victories, consistent with padding the margins slightly, erring in the direction of claiming uncertainty.

And here is an image more like the ones I had before, where all the predictions are normalised to correspond to the same number of predicted wins:

[Figure: cumulative error for the symmetric low-probability predictions, normalised to the same number of predicted PL wins]


The Silver Standard, Part 3: The Reckoning

One of the accusations most commonly levelled against Nate Silver and his enterprise is that probabilistic predictions are unfalsifiable. “He never said the Democrats would win the House. He only said there was an 85% chance. So if they don’t win, he has an out.” This is true only if we focus on the top-level prediction, and ignore all the smaller predictions that went into it. (Except in the trivial sense that you can’t say it’s impossible that a fair coin just happened to come up heads 20 times in a row.)

So, since Silver can be tested, I thought I should see how 538’s predictions stood up in the 2018 US House election. I took their predictions of the probability of victory for a Democratic candidate in all 435 congressional districts (I used their “Deluxe” prediction) from the morning of 6 November. (I should perhaps note here that one third of the districts had estimates of 0 (31 districts) or 1 (113 districts), so a victory for the wrong candidate in any one of these districts would have been a black mark for the model.) I ordered the districts by the predicted probability, starting from the smallest, and computed the cumulative predicted number of seats. I plot this against the cumulative actual number of seats won, taking the current leader as the winner in the 11 districts where there is no definite decision yet.
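Here is a minimal sketch of that construction, with random stand-ins for the forecasts and outcomes (dem_prob and dem_won are hypothetical; the sketch also includes the normalisation of totals and the error curve used in the plots below):

```python
import numpy as np

# Hypothetical stand-ins: dem_prob is the forecast probability of a Democratic
# win in each of the 435 districts; dem_won is the actual outcome (1 or 0).
rng = np.random.default_rng(538)
dem_prob = rng.uniform(0, 1, size=435)
dem_won = (rng.uniform(size=435) < dem_prob).astype(float)

# Order districts from most Republican to most Democratic and accumulate.
order = np.argsort(dem_prob)
cum_pred = np.cumsum(dem_prob[order])    # cumulative predicted seats
cum_actual = np.cumsum(dem_won[order])   # cumulative actual seats

# Normalise the predicted total to the observed total, then take the error
# (the gap between the dashed line and the green curve in the plots).
cum_pred *= cum_actual[-1] / cum_pred[-1]
error = cum_pred - cum_actual
print(f"Largest cumulative overprediction: {error.max():.1f}")
```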

[Figure: cumulative predicted vs. actual Democratic seats]

The predicted number of seats won by Democrats was 231.4, impressively close to the actual 231 won. But that’s not the standard we are judging them by, and in this plot (and the ones to follow) I have normalised the predicted and observed totals to be the same. I’m looking at the cumulative fractions of a seat contributed by each district. If the predicted probabilities are accurate, we would expect the plot (in green) to lie very close to the line with slope 1 (dashed red). It certainly does look close, but the scale doesn’t make it easy to see the differences. So here is the plot of the prediction error, the difference between the red dashed line and the green curve, against the cumulative prediction:

[Figure: prediction error vs. cumulative predicted seats]

There certainly seems to have been some overestimation of Democratic chances at the low end, leading to a maximum cumulative overprediction of about 6 (which comes at district 155, that is, the 155th most Republican district). It’s not obvious whether these differences are worse than you would expect. So in the next plot we make two comparisons. The red curve replaces the true outcomes with simulated outcomes, where we assume the 538 probabilities are exactly right. This is the best-case scenario. (We only plot it out to 100 cumulative seats, because the action is all at the low end; the last 150 districts have essentially no randomness.) The red curve and the green curve look very similar, except for the direction of the error, which is random. The most extreme error in the simulated election result is a bit more than 5.
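The red curve can be reproduced in the same hypothetical setup: draw each district’s outcome from its forecast probability and recompute the error curve. A sketch, not the actual code behind the plots:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-in for the forecast probabilities, sorted from the most
# Republican to the most Democratic district.
p = np.sort(rng.uniform(0, 1, size=435))

# Simulate one election in which the forecasts are exactly right ...
sim_won = (rng.uniform(size=435) < p).astype(float)

# ... and recompute the normalised cumulative error curve for it.
cum_pred = np.cumsum(p)
cum_pred *= sim_won.sum() / cum_pred[-1]
sim_error = cum_pred - np.cumsum(sim_won)
print(f"Largest |error| in this simulated election: {np.abs(sim_error).max():.1f}")
```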

What would the curve look like if Silver had cheated, by trying to make his predictions all look less certain, to give himself an out when they go wrong? We imagine an alternative psephologist, call him Nate Tarnished, who has access to the exact true probabilities for Democrats to win each district, but who hedges his bets by reporting a probability closer to 1/2. (As an example, we take the cumulative beta(1/2,1/2) distribution function. This leaves 0, 1/2, and 1 unchanged, but pushes .001 up to .02, .05 up to .14, and .2 up to .3. Similarly, .999 becomes .98 and .8 drops to .7. Not huge changes, but enough to create more wiggle room after the fact.)
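The hedging transform is just the beta(1/2,1/2) cumulative distribution function, and a quick check with scipy reproduces the numbers above (the function name tarnish is mine; a = 0.25 gives the even more extreme “Extra Tarnished” version used in the follow-up post):

```python
from scipy.stats import beta

def tarnish(p, a=0.5):
    """Hedge a probability toward 1/2 via the beta(a, a) CDF."""
    return beta.cdf(p, a, a)

for p in [0.001, 0.05, 0.2, 0.5, 0.8, 0.999]:
    print(f"{p:5.3f} -> {tarnish(p):.2f}")
# 0.001 -> 0.02, 0.050 -> 0.14, 0.200 -> 0.30,
# 0.500 -> 0.50, 0.800 -> 0.70, 0.999 -> 0.98
```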

In this case, we would expect to accumulate much more excess cumulative predicted probability on the left side. And this is exactly what we illustrate with the blue curve, where the error repeatedly rises nearly to 10, before slowly declining to 0.

[Figure: prediction error curves for the actual, simulated, and tarnished predictions]

I’d say the performance of the 538 models in this election was impressive. A better test would be to look at the predicted vote shares in all 435 districts. This would require that I manually enter all of the results, since they don’t seem to be available to download. Perhaps I’ll do that some day.

So long, Sokal

I wonder how Alan Sokal feels about becoming the new Piltdown, the metonym for a certain kind of hoax?

So now there’s another attack on trendy subfields of social science, being called “Sokal squared” for some reason. I guess it’s appropriate to the ambiguity of the situation: if you thought the Sokal Hoax was already big, squaring it would make it bigger; on the other hand, if you thought it was petty, this new version is just pettier. And if, like me, you thought it was just one of those things, the squared version is more or less the same.

The new version is unlike the original Sokal Hoax in one important respect: Sokal was mocking social scientists for their credulity about the stupid stuff physicists say. The reboot mocks social scientists for their credulity about the stupid stuff other social scientists say. A group of three scholars has produced a whole slew of intentionally absurd papers, in fields that they tendentiously call “grievance studies”, and managed to get them past peer review at some reputable journals. The hoaxers wink at the reader with facially ridiculous theses, like the account of canine rape culture in dog parks.

But if we’re not going to shut down bold thought, we have to allow for research whose aims and conclusions seem strange. (Whether paradoxical theses are unduly promoted for their ability to grab attention is a related but separate matter. For example, one of the few academic economics talks I ever attended was by a behavioural economist explaining the “marriage market” in terms of women’s trading off the steady income they receive from a husband against the potential income from prostitution that they would forego. And my first exposure to mathematical finance was a lecture on how a person with insider information could structure a series of profitable trades that would be undetectable by regulators.) If the surprising claim is being brought by a fellow scholar acting in good faith, trying to advance the discussion in the field, then you try to engage with the argument generously. You need to strike a balance, particularly when technical correctness isn’t a well-defined standard in your field. Trolling with fake papers poisons this cooperative process of evaluation.

An inspiration to associate professors everywhere

I just discovered that Donna Strickland, the woman who just won a Nobel Prize in physics, is an associate professor at the University of Waterloo.

There are 20 full professors in the department, and I bet their research is pretty fucking amazing.

Anti-publishing

George Monbiot has launched an exceptionally dyspeptic broadside in the Guardian against academic publishing, and in support of the heroic/misguided data scraper Alexandra Elbakyan, who downloaded millions of papers, and made them available on a pirate server.

I agree with the headline “Scientific publishing is a rip-off. We fund the research – it should be free”, but disagree with most of the reasoning. Or maybe it would be better to say that, from my perspective as an academic, his complaints seem to me not the most significant ones.

Monbiot’s perspective is that of a cancer patient who found himself blocked from reading the newest research on his condition. I think, though, that he has underestimated the extent to which funding bodies in the UK and US, and now in the EU as well, have been applying countervailing pressure for publicly funded research to be made available in various versions of “open access”, generally within six months of journal publication. In many fields (though not the biomedical research of most interest to Monbiot) it has long been the case that journal publication is an afterthought, with research papers published first as “preprints” on freely accessible archive sites.

Hysterical costs

There’s an interesting article in the NY Times about a young legal scholar, Lina Khan, who is gaining attention for a novel and detailed argument that antitrust enforcement in the US has come to be inappropriately fixated on price as the sole anticompetitive harm, and so has given a free pass to Amazon. I have no original thoughts about the argument, but I am intrigued by the dismissive language of the critics cited in the article. One (antitrust lawyer Konstantin Medvedovsky) called her approach “hipster antitrust”. And then there’s this:

Herbert Hovenkamp, an antitrust expert at the University of Pennsylvania Law School, wrote that if companies like Amazon are targeted simply because their low prices hurt competitors, we might “quickly drive the economy back into the Stone Age, imposing hysterical costs on everyone.”

Is “hysterical costs” a real thing? Or was he just reaching for a word that would impugn the rationality of a female opponent, and came up with the classic wandering womb?

Self-deconstructing clichés: Weight-loss edition

Continuing my series on figures of speech being modified to eliminate their actual meaning, we have this comment on the discovery of the “holy grail” of obesity research. The holy grail, as a reminder, was a unique item in Christian mythology, the dish that caught Jesus’ blood, the single holy focus of the quest of King Arthur’s knights. According to legend it had magical healing properties. As for this holy grail,

Tam Fry, of Britain’s National Obesity Forum, said the drug is potentially the “holy grail” of weight-loss medicine… “I think there will be several holy grails, but this is a holy grail and one which has been certainly at the back of the mind of a lot of specialists for a long time.”

As for the magical healing,

All of the other things apply – lifestyle change has got to be root and branch part of this.

And then we have to wonder — a self-deconstructing cliché twofer — what does he mean by “root and branch part”?

Fraud detection and statistics

Elizabeth Holmes, founder of Theranos, has now been formally indicted for criminal fraud. I’ve commented on the company before, and on the journalistic conventions around intellectuals that fostered her rise. But now that the Theranos story is coming to an end, I feel a need to comment on how utterly unnecessary this all was.

At its peak, Theranos was valued at $9 billion and employed 800 people. Yet according to John Carreyrou, the Wall Street Journal reporter whose investigations exposed Theranos’s fraud, the company is down to just 20 employees who are trying to close up shop.

All credit to Carreyrou, who by all accounts has done an excellent job investigating and reporting on this fiasco, but literally any statistician — anyone who has been through and understood a first-year statistics course — could have said from the start that this was sheer nonsense. That’s presumably why the board was made up mainly of politicians and generals.

The promise of Theranos was that they were going to revolutionise medicine by performing a hundred random medical tests on a drop of blood, and give patients a complete readout of their state of health, independent of medical recommendation of specific tests. But any statistician knows — and every medical practitioner should know — that the reason we don’t do lots of random tests without any specific indication isn’t that they’re too expensive — many aren’t — or that they require too much blood, but that the more tests you do, the more false positives you’re going to accumulate.

If you do a hundred tests on an average person, you’re going to find at least a few questionable results — either from measurement error, or because most tests aren’t all that specific — requiring followups and expensive investigations, and possibly unnecessary treatments.
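A back-of-the-envelope illustration (the 5% false-positive rate per test and the independence assumption are mine, purely to make the arithmetic concrete):

```python
# With 100 independent tests, each with a 5% false-positive rate, a perfectly
# healthy person will almost surely see at least one spurious "abnormal" result.
n_tests, fp_rate = 100, 0.05

p_at_least_one = 1 - (1 - fp_rate) ** n_tests
expected_false_positives = n_tests * fp_rate

print(f"P(at least one false positive) = {p_at_least_one:.3f}")                 # ~0.994
print(f"Expected false positives per person = {expected_false_positives:.0f}")  # 5
```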

Of course, if I had to evaluate the proposal for such a company I would keep an open mind about the possibility of a conceptual breakthrough that would allow them to control the false positives. But I would have demanded very clear evidence and explanations. The fact that the fawning news reports back in 2013-15 raved about the genius new biomedical technology, but never even claimed that the company had produced (or found) any innovative statistical methodology, made me pretty sure that they had no idea what they were doing. In the end, it turned out that the biomedical innovations were also fake, which I probably should have guessed. But if the greedhead generals (among them the current secretary of defense, who definitely should be questioned about this, and probably ought to resign) had asked a statistician, they could have saved a lot of people a lot of unpleasantness, and maybe helped save Elizabeth Holmes from herself.

Statistics, politics, and causal theories

A new headline from the Trump era:

Fewer Immigrants Are Reporting Domestic Abuse. Police Blame Fear of Deportation.

Compare it to this headline from a few months ago:

Arrests along Mexico border drop sharply under Trump, new statistics show

This latter article goes on to comment

The figures show a sharp drop in apprehensions immediately after President Trump’s election win, possibly reflecting the deterrent effect of his rhetoric on would-be border crossers.

It must be noted that these two interpretations of declining enforcement statistics are diametrically opposed: in the first case, declining reports to police are taken as evidence of nothing other than declining reports, whereas the latter analysis eschews such a naive interpretation, suggesting that the decline in apprehensions is actually evidence of a decline in the number of offenses (in this case, illegal border crossings).

I don’t mean to criticise the conventional wisdom, which seems to me eminently sensible. I just think it’s interesting how little the statistical “facts” are able to speak for themselves. The same facts could mean that the election of Trump was associated with a decline in domestic violence in immigrant communities, and also with a reduction in border patrol effectiveness. It’s hard to come up with a causal argument for either of these — Did immigrant men look at Trump with revulsion and decide, abusing women is for the gringos? Did ICE get so caught up with the fun of splitting up families in midwestern towns and harassing Spanish speakers in Montana, that they stopped paying attention to the southern border? — so we default to the opposite conclusion.
