The time lords

The European Parliament has voted to end, from 2021, the practice of switching the clocks forward and back every year. I’ve long thought this practice rather odd. Imagine that a government were to pass a law stating that from April 1 every person must wake up one hour earlier than they habitually do, and go to sleep one hour earlier. All shops and businesses would be required to open an hour earlier, and to close an hour earlier. The same would go for schools and universities, and the timing of private lessons and appointments would also have to shift. Obviously ridiculous, even tyrannical. The government has no say in when I go to bed or wake up, or when my business opens. But because the law is enforced by adjusting the clocks, which seem an appropriate subject of regulation and standardisation, it is almost universally accepted.

But instead of praising this blow struck for individual freedom and against statist overreach, we have Tories making comments like this:

John Flack, the Conservative MEP for the East of England, said: “We’ve long been aware the EU wants too much control over our lives – now they want to control time itself. You would think they had other things to worry about without wanting to become time lords,” he said, in an apparent reference to the BBC sci-fi drama Doctor Who.

“We agreed when they said the clocks should change across the whole EU on an agreed day. That made sense – but this is a step too far,” Flack added. “I know that farmers in particular, all across the east of England, value the flexibility that the clock changes bring to get the best from available daylight.”

So, the small-government Tory thinks it’s a perfectly legitimate exercise of centralised European power to compel shopkeepers in Sicily and schoolchildren in Madrid to adjust their body clocks* in order to spare English farmers the annoyance of having to consciously adjust the clock time when they get out of bed to tend to their harvest. But rescinding this compulsion? That is insufferably arrogant.

*Nor is this a harmless annoyance. Researchers have found a measurable increase in heart attacks — presumed attributable to reduced sleep — in the days following the spring clock shift. A much smaller decrease may accompany the autumn shift back.

The Brexit formula

A collaborative project with Dr Julia Brettschneider (University of Warwick) has yielded a mathematical formulation of the current range of Brexit proposals coming from the UK, which we hope will help to facilitate a solution:

Numeric calculations seem to confirm the conjecture that the value of the solution tends to zero as t→29/3.

Medical hype and under-hype

New heart treatment is biggest breakthrough since statins, scientists say

I just came across this breathless headline, published in the Guardian last year. On the one hand, this is just one study, the effect was barely statistically significant, and experience suggests a fairly high likelihood that it will ultimately have no effect on general medical practice or on human health and mortality rates. I understand the exigencies of the daily newspaper publishing model, but it’s problematic that the “new research study” has been defined as the event on which to hang a headline. The only people who need that level of up-to-the-minute detail are those professionally involved in winnowing out the new ideas and turning them into clinical practice. We would all be better served if newspapers instead reported on which new treatments have actually had an effect over the last five years. That would be just as novel to the general readership, and far less erratic.

On the other hand, I want to comment on one point where I see exaggerated skepticism. The paragraph that summarises the study results says:

For patients who received the canakinumab injections the team reported a 15% reduction in the risk of a cardiovascular event, including fatal and non-fatal heart attacks and strokes. Also, the need for expensive interventional procedures, such as bypass surgery and inserting stents, was cut by more than 30%. There was no overall difference in death rates between patients on canakinumab and those given placebo injections, and the drug did not change cholesterol levels.

There is then a quote:

Prof Martin Bennett, a cardiologist from Cambridge who was not involved in the study, said the trial results were an important advance in understanding why heart attacks happen. But, he said, he had concerns about the side effects, the high cost of the drug and the fact that death rates were not better in those given the drug.

In principle, I think this is a good thing. There are far too many studies that show a treatment scraping out a barely significant reduction in mortality due to one cause, which is highlighted, but a countervailing mortality increase due to other causes, netting out to essentially no improvement. Then you have to say, we really should be aiming to reduce mortality, not to reduce a cause of mortality. (I remember many years ago, a few years after the US started raising the age for purchasing alcohol to 21, reading of a study that was heralded as showing the success of this approach, having found that the number of traffic fatalities attributed to alcohol had decreased substantially. Unfortunately, the number of fatalities not attributed to alcohol had increased by a similar amount, suggesting that some amount of recategorisation was going on.) Sometimes researchers will try to distract attention from a null result for mortality by pointing to a secondary endpoint — improved results on a blood test linked to mortality, for instance — which needs to be viewed with some suspicion.

In this case, though, I think the skepticism is unwarranted. There is no doubt that before the study the researchers would have predicted a reduction in mortality from cardiovascular causes, no reduction from any other cause, and likely an increase due to infection. The worry would be that the increase due to infection, or to some unanticipated side effect, would outweigh the benefits.

The results confirmed the best-case predictions. Cardiovascular mortality was reduced, possibly by a lot, possibly only slightly. Deaths due to infections increased significantly in percentage terms, but the numbers were small relative to the cardiovascular improvements. The one big surprise was a very substantial reduction in cancer mortality. The researchers are open about not having predicted this, and about not having a clear explanation. In such a case it would be wrong to put much weight on the statistical “significance”, because it is impossible to quantify the class of hypotheses that are implicitly being ignored. The proper thing is to flag this observation for further research, as they have done.

When you deduct these three groups of causes (cardiovascular, infections, cancer), you are left with approximately equal mortality rates in the placebo and treatment groups, as expected. So there is no reason to be “concerned” that overall mortality was not improved in those receiving the drug. First of all, overall mortality was better in the treatment group. It’s just that the improvement in CV mortality, while (as predicted) large enough to be clearly non-random when compared with the number of CV deaths, was not large compared with the much larger total number of deaths. This is no more “concerning” than it would be, when reviewing a programme for improving airline safety, to discover that it did not appreciably change the total number of transportation-related fatalities.
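A small numerical sketch may make the point concrete. The counts below are invented purely for illustration and are not the trial’s figures: two equal-sized arms, the same absolute drop of 60 cardiovascular deaths, and no change in other causes. The comparison uses the standard normal approximation for two Poisson counts with equal exposure, z = (a - b)/sqrt(a + b); the function name is my own.

```python
import math

def poisson_diff_z(placebo_deaths, treated_deaths):
    """Approximate z-statistic for comparing two Poisson counts with equal exposure.

    Under the null hypothesis of equal rates, the difference of the counts has
    variance roughly equal to their sum.
    """
    return (placebo_deaths - treated_deaths) / math.sqrt(placebo_deaths + treated_deaths)

# Hypothetical counts, chosen only for illustration (not from the canakinumab trial).
cv_placebo, cv_treated = 400, 340        # cardiovascular deaths in each arm
other_placebo, other_treated = 600, 600  # all other deaths, unchanged by the drug

z_cv = poisson_diff_z(cv_placebo, cv_treated)
z_total = poisson_diff_z(cv_placebo + other_placebo, cv_treated + other_treated)

print(f"cardiovascular deaths: z = {z_cv:.2f}")
print(f"all deaths:            z = {z_total:.2f}")
```

The same 60 deaths give z of about 2.2 against the cardiovascular deaths alone, but only about 1.4 against all deaths, because the noise in the total grows with the number of deaths from every cause.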

Social choice Brexit

The discussion over a possible second Brexit referendum has foundered on the shoals of complexity. If the public were only offered May’s deal or no deal, that wouldn’t be any kind of meaningful choice (and it’s absurd to imagine that a Parliament that wouldn’t approve May’s deal on its own would be willing to let the fate of Brexit be decided by the public on those terms). So you’re left needing an unconventional referendum with at least three options: No deal, May’s deal, No Brexit (plus possible additional alternatives, such as requesting more time to negotiate the Better Deal™).

A three-choice (or more) referendum strikes many people as crazy, and there are reasonable concerns. Some members of the public will inevitably find it confusing, however it is formulated and adjudicated. And once there are three or more options, the impossibility of aggregating preferences in a way consistent with basic principles of fairness, let alone in a canonical way, is a foundational theorem of social-choice theory, due to Kenneth Arrow.

Suppose we followed the popular transferable vote procedure: People rank the options, and we look only at the first choices. Whichever option gets the smallest number of first-choice votes is dropped, and we proceed with the remaining options, until one option has a first-choice majority. The classic paradoxical situation is all too likely in this setting. Suppose the population consists of

  1. 25% hardened brexiteers. They prefer a no-deal Brexit, but the last thing they want is to be blamed for May’s deal, which leaves the UK taking orders from Brussels with no say in them. If they can’t have their clean break from Brussels, they’d rather go back to the status quo ante and moan about how their beautiful Brexit was betrayed.
  2. 35% principled democrats. They’re nervous about the consequences of Brexit, so they’d prefer May’s soft deal, whatever its problems. But if they can’t have that, they think the original referendum needs to be respected, so their second choice is no-deal Brexit.
  3. 40% squishy europhiles. They want no Brexit; failing that, they’d prefer May’s deal. No-deal Brexit for them is the worst.

The result will be that no deal drops out, the hardened brexiteers’ votes transfer to no Brexit, and we’re left with 65% favouring no Brexit. But if the principled democrats had anticipated this, they could have ranked no deal first, producing a result that they would have preferred.
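The elimination procedure described above (usually called instant-runoff voting; the single transferable vote is its multi-seat generalisation) is easy to simulate. This Python sketch encodes the three hypothetical blocs as weighted ballots; the function name and ballot format are my own choices for illustration, and it assumes every ballot ranks every option.

```python
from collections import Counter

def instant_runoff(ballots):
    """Run instant-runoff voting.

    ballots: list of (weight, ranking) pairs, where ranking lists every option,
    most preferred first. Returns the winner and the final round's tally.
    """
    remaining = {option for _, ranking in ballots for option in ranking}
    while True:
        # Count each ballot for its highest-ranked option still in the race.
        tally = Counter({option: 0 for option in remaining})
        for weight, ranking in ballots:
            top = next(option for option in ranking if option in remaining)
            tally[top] += weight
        leader, votes = tally.most_common(1)[0]
        if 2 * votes > sum(tally.values()):
            return leader, dict(tally)
        # No majority yet: drop the option with the fewest first-choice votes.
        remaining.discard(min(tally, key=tally.get))

NO_DEAL, DEAL, NO_BREXIT = "no deal", "May's deal", "no Brexit"

sincere = [
    (25, (NO_DEAL, NO_BREXIT, DEAL)),   # hardened brexiteers
    (35, (DEAL, NO_DEAL, NO_BREXIT)),   # principled democrats
    (40, (NO_BREXIT, DEAL, NO_DEAL)),   # squishy europhiles
]
# The principled democrats vote tactically, putting no deal first.
strategic = [sincere[0], (35, (NO_DEAL, DEAL, NO_BREXIT)), sincere[2]]

print(instant_runoff(sincere))
print(instant_runoff(strategic))
```

With sincere rankings, no deal is eliminated first and no Brexit wins 65 to 35. If the principled democrats put no deal first, it wins outright with 60%, an outcome they rank above no Brexit.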

So that seems like a problem with a three-choice referendum. But here’s a proposal that would be even worse: we combine the no-deal and May’s-deal options into a single choice, which we simply call “Leave”. Then those who want to abandon the European project entirely will be voting for the same option as those who are concerned about the EU being dominated by moneyed interests, and they’ll jointly win the referendum and then have to fight among themselves after the fact, leaving them with the outcome (no-deal Brexit) that the smallest minority preferred.
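Reusing the three hypothetical blocs from the example above, the merged two-option ballot works out as follows. This is a minimal Python sketch; it assumes each voter simply marks the box containing their first choice.

```python
# Same hypothetical blocs: (share of electorate, first choice).
blocs = [
    (25, "no deal"),     # hardened brexiteers
    (35, "May's deal"),  # principled democrats
    (40, "no Brexit"),   # squishy europhiles
]

# Merge the two Brexit options into one "Leave" box on the ballot.
merged = {"no deal": "Leave", "May's deal": "Leave", "no Brexit": "Remain"}

result = {"Leave": 0, "Remain": 0}
for share, first_choice in blocs:
    result[merged[first_choice]] += share

# First-choice support for the no-deal outcome that Leave may end up delivering.
no_deal_support = sum(share for share, choice in blocs if choice == "no deal")

print(result)           # {'Leave': 60, 'Remain': 40}
print(no_deal_support)  # 25
```

Leave wins 60 to 40, yet the version of Leave it can end up meaning was the first choice of only a quarter of the electorate.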

Unfortunately, that’s the referendum we actually had.

Schrödinger’s menu

I was just rereading Erwin Schrödinger’s pathbreaking 1944 lectures, What is Life?, which are often praised for their prescience about, and influence on, the foundational principles of genetics in the second half of the twentieth century. At one point, in developing the crucial importance of his concept of negative entropy as the driver of life, he remarked on the misunderstanding that “energy” is what organisms draw from their food. In an ironic aside he says:

In some very advanced country (I don’t remember whether it was Germany or the U.S.A. or both) you could find menu cards in restaurants indicating, in addition to the price, the energy content of every dish.

Also prescient!

How odd that the only biological organisms that Schrödinger is today commonly associated with are cats…

FDA sample menu with energy content

The Silver Standard 4: Reconsideration

After writing in praise of the honesty and accuracy of fivethirtyeight’s results, I felt uncomfortable about the asymmetry in the way I’d treated Democrats and Republicans in the evaluation. In the plots I made, low-probability Democratic predictions that went wrong pop out on the left-hand side, whereas low-probability Republican predictions that went wrong would get buried in the smooth glide down to zero on the right-hand side. So I decided that what I’m really interested in is all low-probability predictions, and that I should treat them symmetrically.

For each district there is a predicted loser (PL): the candidate given a win probability smaller than 1/2. In about one third of the districts the PL was assigned a probability of 0. The expected number of PLs who would win (EPL) is simply the sum of all the predicted win probabilities that are smaller than 1/2. (Where multiple candidates from the same party are in the race, I’ve combined them.) The 538 EPL was 21.85. The actual number of winning PLs was 13.

What I am testing is whether 538 made enough wrong predictions. This is the opposite of the usual evaluation, which gives points for getting predictions right. But when measured against their own predictions, the number of districts that went the opposite way from what they said was a lot lower than they said it would be. That is prima facie evidence that the PL win probabilities were being padded somewhat. To be more precise, under the 538 model the number of winning PLs should be approximately Poisson distributed with parameter 21.85, meaning that the probability of 13 or fewer PLs winning is 0.030. That is kind of low, but still pretty impressive, given all the complications of the prediction game.
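The computation can be sketched in a few lines of Python, assuming the inputs are a list of Democratic win probabilities and a list of outcomes (True if the Democrat won). The function names are hypothetical, and the toy district data are invented; only the final line uses the figures quoted above. The Poisson distribution is the approximation used in the text; the exact distribution of a sum of independent win/lose outcomes is Poisson-binomial, which has slightly smaller variance.

```python
import math

def expected_pl_wins(dem_probs):
    """Expected number of predicted losers (PLs) who win: in each district,
    the PL's win probability is whichever of p and 1 - p is smaller."""
    return sum(min(p, 1.0 - p) for p in dem_probs)

def actual_pl_wins(dem_probs, dem_won):
    """Number of districts won by the candidate given probability below 1/2
    (districts at exactly 1/2 have no PL and are not counted)."""
    return sum(1 for p, won in zip(dem_probs, dem_won)
               if (p < 0.5 and won) or (p > 0.5 and not won))

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), via the recursion P(j) = P(j - 1) * lam / j."""
    term = math.exp(-lam)
    total = term
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return total

# Toy data for five hypothetical districts (not 538's numbers).
probs = [0.0, 0.1, 0.3, 0.8, 1.0]
won = [False, True, False, True, True]
print(expected_pl_wins(probs), actual_pl_wins(probs, won))

# The figures quoted above: EPL = 21.85, observed 13 winning PLs.
print(round(poisson_cdf(13, 21.85), 3))
```

The last line gives about 0.030, the tail probability quoted in the text.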

Below I show plots of the errors for various scenarios, measuring the cumulative error for these symmetric low predictions. (I’ve added an “Extra Tarnished” scenario, with the transformation based on the even more extreme beta(.25,.25).) I show it first without adjusting for the total number of predicted winning PLs:


We see that the tarnished predictions predict a lot more PL victories than we actually see. The actual predictions come in just slightly above what you should expect, but suspiciously one-sided: the error is all in the direction of overpredicting PL victories, consistent with padding the margins slightly, erring in the direction of claiming uncertainty.

And here is an image more like the ones I had before, where all the predictions are normalised to correspond to the same number of predicted wins:

TarnishedSymmetric

The Silver Standard, Part 3: The Reckoning

One of the accusations most commonly levelled against Nate Silver and his enterprise is that probabilistic predictions are unfalsifiable. “He never said the Democrats would win the House. He only said there was an 85% chance. So if they don’t win, he has an out.” This is true only if we focus on the top-level prediction, and ignore all the smaller predictions that went into it. (Except in the trivial sense that you can’t say it’s impossible that a fair coin just happened to come up heads 20 times in a row.)

So, since Silver can be tested, I thought I should see how 538’s predictions stood up in the 2018 US House election. I took their predictions of the probability of victory for a Democratic candidate in all 435 congressional districts (I used their “Deluxe” prediction) from the morning of 6 November. (I should perhaps note here that one third of the districts had estimates of 0 (31 districts) or 1 (113 districts), so a victory for the wrong candidate in any one of these districts would have been a black mark for the model.) I ordered the districts by the predicted probability, to compute the cumulative predicted number of seats, starting from the smallest. I plot them against the cumulative actual number of seats won, taking the current leader for the winner in the 11 districts where there is no definite decision yet.

Silver_PredictedvsActual

The predicted number of seats won by Democrats was 231.4, impressively close to the actual 231 won. But that’s not the standard we are judging them by, and in this plot (and the ones to follow) I have normalised the predicted and observed totals to be the same. I’m looking at the cumulative fractions of a seat contributed by each district. If the predicted probabilities are accurate, we would expect the plot (in green) to lie very close to the line with slope 1 (dashed red). It certainly does look close, but the scale doesn’t make it easy to see the differences. So here is the plot of the prediction error, the difference between the red dashed line and the green curve, against the cumulative prediction:
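The error curve can be computed along these lines. This is a minimal Python sketch, assuming lists of Democratic win probabilities and outcomes; the function name is hypothetical, and rescaling the actual seat total to match the predicted total is my guess at how the normalisation was done.

```python
def cumulative_error(dem_probs, dem_won):
    """Prediction-error curve.

    Sort districts by predicted Democratic win probability (smallest first),
    accumulate predicted and actual seats, rescale the actual seats so both
    totals agree, and return (cumulative predicted, predicted - actual) pairs.
    Positive error means Democratic chances were overpredicted so far.
    """
    pairs = sorted(zip(dem_probs, dem_won))
    total_pred = sum(p for p, _ in pairs)
    total_won = sum(1 for _, won in pairs if won)
    if total_won == 0:
        raise ValueError("need at least one Democratic win to normalise")
    scale = total_pred / total_won
    cum_pred = cum_won = 0.0
    curve = []
    for p, won in pairs:
        cum_pred += p
        cum_won += scale if won else 0.0
        curve.append((cum_pred, cum_pred - cum_won))
    return curve

# Toy example with four hypothetical districts.
curve = cumulative_error([0.1, 0.2, 0.6, 0.9], [False, True, True, True])
for x, err in curve:
    print(f"{x:.2f} {err:+.2f}")
```

Plotting the second coordinate against the first gives a curve that starts and ends at zero; the question is how far it wanders in between.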

Silver_PredictedvsError

There certainly seems to have been some overestimation of Democratic chances at the low end, leading to a maximum cumulative overprediction of about 6 (which comes at district 155, that is, the 155th most Republican district). It’s not obvious whether these differences are worse than you would expect. So in the next plot we make two comparisons. The red curve replaces the true outcomes with simulated outcomes, where we assume the 538 probabilities are exactly right. This is the best-case scenario. (We only plot out to 100 cumulative seats, because the action is all at the low end; the last 150 districts have essentially no randomness.) The red curve and the green curve look very similar (except for the direction; the direction of the error is random). The most extreme error in the simulated election result is a bit more than 5.
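The best-case comparison can be sketched by drawing outcomes as if the predicted probabilities were exactly right. The plot shows a single simulated election; repeating the draw many times, as in this Python sketch, gives a distribution of the maximum error against which an observed value such as 6 could be compared. Function names, the seed, and the number of simulations are hypothetical choices.

```python
import random

def max_abs_error(dem_probs, dem_won):
    """Largest |predicted - actual| along the cumulative curve (districts
    sorted by predicted probability, smallest first), after rescaling the
    actual seat total to match the predicted total."""
    pairs = sorted(zip(dem_probs, dem_won))
    total_pred = sum(p for p, _ in pairs)
    scale = total_pred / max(1, sum(won for _, won in pairs))
    cum_pred = cum_won = worst = 0.0
    for p, won in pairs:
        cum_pred += p
        cum_won += scale * won
        worst = max(worst, abs(cum_pred - cum_won))
    return worst

def simulated_max_errors(dem_probs, n_sims=1000, seed=0):
    """Best case: each district goes Democratic with exactly its predicted probability."""
    rng = random.Random(seed)
    return [max_abs_error(dem_probs, [rng.random() < p for p in dem_probs])
            for _ in range(n_sims)]
```

Given the real probabilities and outcomes, the fraction of simulated maxima at least as large as the observed one would indicate whether the observed wiggles are unusual under the model.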

What would the curve look like if Silver had cheated, trying to make his predictions all look less certain, to give himself an out when they went wrong? We imagine an alternative psephologist, call him Nate Tarnished, who has access to the exact true probabilities for Democrats to win each district, but who hedges his bets by reporting a probability closer to 1/2. (As an example, we apply the cumulative distribution function of beta(1/2,1/2) to the true probabilities. This leaves 0, 1/2, and 1 unchanged, but .001 would get pushed up to .02, .05 is pushed up to .14, and .2 becomes .3. Similarly, .999 becomes .98 and .8 drops to .7. Not huge changes, but enough to create more wiggle room after the fact.)
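The beta(1/2,1/2) distribution function has the closed form (2/π) arcsin(√p), so the tarnishing transformation is a one-liner in Python. The function name is hypothetical. For the more extreme beta(.25,.25) variant there is no comparably simple formula; something like SciPy’s `scipy.stats.beta.cdf` would be needed.

```python
import math

def tarnish(p):
    """Push a probability towards 1/2 by applying the beta(1/2, 1/2)
    cumulative distribution function, which is (2/pi) * arcsin(sqrt(p))."""
    return 2.0 / math.pi * math.asin(math.sqrt(p))

for p in [0.0, 0.001, 0.05, 0.2, 0.5, 0.8, 0.999, 1.0]:
    print(f"{p:>6} -> {tarnish(p):.2f}")
```

Rounded to two places, this reproduces the values in the parenthesis: .001 becomes .02, .05 becomes .14, .2 becomes .30, .8 becomes .70, and .999 becomes .98, while 0, 1/2 and 1 are fixed.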

In this case, we would expect to accumulate much more excess cumulative predicted probability on the left side. And this is exactly what we illustrate with the blue curve, where the error repeatedly rises nearly to 10, before slowly declining to 0.

SilverTornished

I’d say the performance of the 538 models in this election was impressive. A better test would be to look at the predicted vote shares in all 435 districts. This would require that I manually enter all of the results, since they don’t seem to be available to download. Perhaps I’ll do that some day.

So long, Sokal

I wonder how Alan Sokal feels about becoming the new Piltdown, the metonym for a certain kind of hoax?

So now there’s another attack on trendy subfields of social science, being called “Sokal squared” for some reason. I guess it’s appropriate to the ambiguity of the situation. If you thought the Sokal Hoax was already big, squaring it would make it bigger; on the other hand, if you thought it was petty, this new version is just pettier. And if, like me, you thought it was just one of those things, the squared version is more or less the same.

The new version is unlike the original Sokal Hoax in one important respect: Sokal was mocking social scientists for their credulity about the stupid stuff physicists say. The reboot mocks social scientists for their credulity about the stupid stuff other social scientists say. A group of three scholars has produced a whole slew of intentionally absurd papers, in fields that they tendentiously call “grievance studies”, and managed to get them past peer review at some reputable journals. The hoaxers wink with facially ridiculous theses, like the account of canine rape culture in dog parks.

But if we’re not going to shut down bold thought, we have to allow for research whose aims and conclusions seem strange. (Whether paradoxical theses are unduly promoted for their ability to grab attention is a related but separate matter. For example, one of the few academic economics talks I ever attended was by a behavioural economist explaining the “marriage market” in terms of women’s trading off the steady income they receive from a husband against the potential income from prostitution that they would forgo. And my first exposure to mathematical finance was a lecture on how a person with insider information could structure a series of profitable trades that would be undetectable by regulators.) If the surprising claim is being brought by a fellow scholar acting in good faith, trying to advance the discussion in the field, then you try to engage with the argument generously. You need to strike a balance, particularly when technical correctness isn’t a well-defined standard in your field. Trolling with fake papers poisons this cooperative process of evaluation.

Anti-publishing

George Monbiot has launched an exceptionally dyspeptic broadside in the Guardian against academic publishing, and in support of the heroic/misguided data scraper Alexandra Elbakyan, who downloaded millions of papers, and made them available on a pirate server.

I agree with the headline “Scientific publishing is a rip-off. We fund the research – it should be free”, but disagree with most of the reasoning. Or, perhaps better put: from my perspective as an academic, his complaints are not the most significant ones.

Monbiot’s perspective is that of a cancer patient who found himself blocked from reading the newest research on his condition. I think, though, he has underestimated the extent to which funding bodies in the UK and US, and now in the EU as well, have placed countervailing pressure for publicly funded research to be made available in various versions of “open access”, generally within six months of journal publication. In many fields — though not the biomedical research of most interest to Monbiot — it has long been the case that journal publication is an afterthought, with research papers published first as “preprints” on freely accessible archive sites.