The first statistician

I like to share with beginning statistics students the aphorism of C. R. Rao

Uncertain knowledge + knowledge about the extent of uncertainty in it = Useable knowledge

It just occurred to me that the extreme limit of this dictum is that if you are infinitely uncertain, if nothing is knowable, then the knowledge of that fact is in itself highly useable. This realisation would seem to make Socrates the first statistician:

Well, although I do not suppose that either of us knows anything really beautiful and good, I am better off than he is – for he knows nothing, and thinks that he knows. I neither know nor think that I know. In this latter particular, then, I seem to have slightly the advantage of him. (Apology, translated Benjamin Jowett)

Clushtering

I was somewhat nonplussed by this article in Slate by journalist John Ore, who gives up drinking alcohol every January and had the dubious inventiveness to coin the name “Drynuary”, which, he says, has caught on in some circles. What I found odd was that he seems to be plagued by demands to explain or hide the fact that he’s not drinking alcohol.

Everyone who knows me well already understands that I do this Drynuary madness every year—I’m not shy about it, after all—so their immediate reaction is usually an eye-rolling “Again?!” as they pathetically try to peer-pressure me into doing a shot with them.[…]

My wife, and other pregnant friends, have used certain sleight-of-hand tricks early in a pregnancy before they were ready to reveal that they were expecting. She would order the same drink as I would—say, a glass of red wine with dinner—and wait until mine was almost drained. Subtly, we’d switch glasses when no one was looking, and voilà! It looked like she was pounding hers, and I was playing catch up.

It seemed odd to me personally because I rarely drink alcohol — and in Oxford that means frequently turning over my wine glass at dinners and drinking orange juice at social events with students — but I can’t recall that anyone has ever asked me why. Maybe it’s a difference between Britain and the US — more universal alcohol consumption here, but less eagerness to intrude on other people’s privacy — but I never had those questions when I lived in the US either. (Once I recall someone expressing surprise that I did drink something alcoholic, but without asking for an explanation. Perhaps I was just not sufficiently sensitive to the implications.)

I recently came upon this plot of alcohol consumption in the US. About 30% consume no alcohol, and the median is about one drink per week. So if Ore were hanging out with average Americans one would have to think that one in three of his companions would also not be drinking, and a second of three might very well pass on the opportunity as well. It wouldn’t seem worth commenting on. But obviously people don’t hang out with random samples of the population. And he specifically says that in his profession — presumably he means journalism — “business events and travel naturally involve expense accounts and the social lubricant of alcohol.” I’ll refrain from commenting on what this might explain about the state of journalism as a profession, but I’m pretty sure that in my profession alcohol definitely doesn’t get to be counted as a travel expense, and in some cases even the bottle of wine shared at a post-seminar dinner needs to be paid for separately because it’s specifically excluded.

We need better scientific fraud

A friend sent me this article about Dutch social psychologist Diederik Stapel, who “perpetrated an audacious academic fraud by making up studies that told the world what it wanted to hear about human nature.” What caught my attention was this comment about how the fraud was noticed:

He began writing the paper, but then he wondered if the data had shown any difference between girls and boys. “What about gender differences?” he asked Stapel, requesting to see the data. Stapel told him the data hadn’t been entered into a computer yet.

Vingerhoets was stumped. Stapel had shown him means and standard deviations and even a statistical index attesting to the reliability of the questionnaire, which would have seemed to require a computer to produce. Vingerhoets wondered if Stapel, as dean, was somehow testing him. Suspecting fraud, he consulted a retired professor to figure out what to do. “Do you really believe that someone with [Stapel’s] status faked data?” the professor asked him.

And later

When Zeelenberg challenged him with specifics — to explain why certain facts and figures he reported in different studies appeared to be identical — Stapel promised to be more careful in the future.

How hard is it to invent data? The same thing occurred to me with regard to Jan Hendrik Schön, a celebrated Dutch (not that I’m suggesting anything specific about the Dutch…) [update: German, as a commenter has pointed out. Sorry. Some of my best friends are Dutch.] materials scientist who was found in 2002 to have faked experimental results.

In April, outside researchers noticed that a figure in the Nature paper on the molecular-layer switch also appeared in a paper Science had just published on a different device. Schön promptly sent in a corrected figure for the Science paper. But the incident disturbed McEuen, who says he was already suspicious of results reported in the two papers. On 9 May, McEuen compared figures in some of Schön’s other papers and quickly found other apparent duplications.

I’m reminded of a classic article from the Journal of Irreproducible Results, “A Drastic Cost Saving Approach to Using Your Neighbor’s Electron Microscope”, advocating that researchers take advantage of the fact that all electron micrographs look the same. It printed four copies of exactly the same picture with four different captions: one described it as showing the fine structure of an axe handle, another said it showed macrophages devouring a bacterium. When it comes to plots of data (rather than photographs, which might be hard to generate de novo) I really can’t see why anyone would need to re-use a plot, or would be unable to supply made-up data for a made-up experiment. Perhaps there is a psychological block against careful thinking, or against willfully generating a dataset, some residual “I’m-not-really-doing-this-I’m-just-shifting-figures-around” resistance to acknowledging the depths to which one has sunk.

Certainly a statistician would know how to generate a perfect fake data set — which means a not-too-perfect fit to relevant statistical and scientific models. Maybe there’s an opportunity there for a new statistical consulting business model. Impact!
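To make the point concrete, here is a minimal sketch in Python of what I mean (a hypothetical illustration, not anyone’s actual method): simulate the data from the model you claim to have confirmed, with realistic noise, and every summary statistic will fluctuate from study to study the way real replications do.

```python
import numpy as np

rng = np.random.default_rng(2013)

# Hypothetical "finding": a treatment shifts questionnaire scores by 0.4 SD.
# Simulating three fake "studies" from this model gives means and SDs that
# vary plausibly; nothing needs to repeat exactly.
for study in range(3):
    control = rng.normal(0.0, 1.0, size=50)
    treated = rng.normal(0.4, 1.0, size=50)
    print(f"study {study + 1}: "
          f"control mean {control.mean():.2f} (sd {control.std(ddof=1):.2f}), "
          f"treated mean {treated.mean():.2f} (sd {treated.std(ddof=1):.2f})")
```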

Update: Of course, I should have said, there’s an obvious bias here: I only know about the frauds that have been detected. They were unbelievably amateurish — couldn’t even be bothered to invent data — and still took years to be detected. How many undetected frauds are out there? It’s frightening to think about it. Mendel’s wonky data weren’t discovered for half a century. Cyril Burt may have committed the biggest fraud of all time, or maybe he was just sloppy, and we may never know for sure.

I just looked at the Wikipedia article on Burt, and discovered a fascinating quote from one of his defenders, the psychologist Arthur Jensen, which makes an appropriate capstone for this post:

[n]o one with any statistical sophistication, and Burt had plenty, would report exactly the same correlation, 0.77, three times in succession if he were trying to fake the data.

In other words, his results were so obviously faked that they must be genuine. If he were trying to fake the data he would certainly have made them look more convincingly real.

Bayesian theology

I was reading (finally, after seven years in Oxford) Thomas Hardy’s Jude the Obscure, and found the following quote from John Henry Newman:

My argument was … that absolute certitude as to the truths of natural theology was the result of an assemblage of concurring and converging probabilities … that probabilities which did not reach to logical certainty might create a mental certitude.


The paradoxes of adultery, Renaissance edition

An example that is frequently cited in elementary statistics courses for the unreliability of survey data is that when people are surveyed about their sexual history, men report more lifetime female partners on average than women report male partners. (A high-quality example is this UK survey from 1992, where men reported 9.9 female partners on average, while women averaged 3.4 male partners. It’s possible to tinker around the edges with effects of changes over time, and age differences between men and women in sexual relationships, but the contradiction is really inescapable: every heterosexual partnership adds one partner to the men’s total and one to the women’s, so with roughly equal numbers of men and women the true averages must be nearly equal. One thing that is quite striking in this survey is the difference between the cross-sectional and longitudinal pictures, which I’ve discussed before. For example, men’s lifetime numbers of sexual partners increase with age — as they must, longitudinally — but among the women the smallest average number of lifetime sex partners is in the oldest group.)
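A minimal simulation in Python (with made-up numbers, purely for illustration) shows that bookkeeping identity in action:

```python
import numpy as np

rng = np.random.default_rng(1992)
n_men = n_women = 1000

# Random man-woman pairs, deduplicated so each row is one distinct
# lifetime partnership.
pairs = np.unique(rng.integers(0, [n_men, n_women], size=(5000, 2)), axis=0)

men_partners = np.bincount(pairs[:, 0], minlength=n_men)
women_partners = np.bincount(pairs[:, 1], minlength=n_women)

# Each partnership counts once on each side, so the means are identical;
# a 9.9 vs 3.4 gap cannot be true of the underlying population.
print(men_partners.mean(), women_partners.mean())
```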

In any case, I was reminded of this when reading Stephen Greenblatt’s popular book on the rediscovery of De rerum natura in the early 15th century by the apostolic secretary Poggio Bracciolini, and the return of Epicurean philosophy more generally into European thought. He cites a story from Poggio’s Liber Facetiarum, a sort of jokebook based on his experiences in the papal court, about

dumb priests, who, baffled by the fact that nearly all the women in confession say that they have been faithful in matrimony, and nearly all the men confess to extramarital affairs, cannot for the life of them figure out who the women are with whom the men have sinned.

The CDC misunderstand screening too

Last week I mocked the Spanish health authorities who refused to treat an Ebola-exposed nurse as a probable Ebola case until her fever had crossed the screening threshold of 38 degrees Celsius (or, in the absurdly precise American translation, 100.4 degrees Fahrenheit). Well, apparently the Centers for Disease Control in the US aren’t any better:

Before flying from Cleveland to Dallas on Monday, Vinson called the CDC to report an elevated temperature of 99.5 Fahrenheit. She informed the agency that she was getting on a plane, the official said, and she wasn’t told not to board the aircraft.

The CDC is now considering putting 76 health care workers at Texas Health Presbyterian Dallas hospital on the TSA’s no-fly list, an official familiar with the situation said.

The official also said the CDC is considering lowering the fever threshold that would be considered a possible sign of Ebola. The current threshold is 100.4 degrees Fahrenheit.

Most disturbing is the fact that they don’t seem capable of combining factors. Would it be so hard to have a rule like: “For most people, let’s hold off on the hazmat suits until your fever goes above 38. But if you’ve been cleaning up the vomit of an Ebola patient for the past week, and you have any elevated temperature at all — let’s say 37.2 — it would be a good idea to get you under observation.”
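In code, the rule I have in mind is nothing more complicated than this (a toy sketch: the thresholds 38 and 37.2 come from the paragraph above, and reducing exposure to a yes/no flag is my simplification):

```python
def fever_alert(temp_celsius: float, known_ebola_exposure: bool) -> bool:
    """Toy triage rule that combines fever with prior exposure.

    Illustrative thresholds only: 38.0 C for the general population,
    but any elevated temperature (37.2 C) for someone who has been
    caring for Ebola patients.
    """
    threshold = 37.2 if known_ebola_exposure else 38.0
    return temp_celsius >= threshold

# A 37.9 C fever: no alarm for a random traveller, but observation
# for an exposed health care worker.
print(fever_alert(37.9, known_ebola_exposure=False))  # False
print(fever_alert(37.9, known_ebola_exposure=True))   # True
```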

The tyranny of the 95%

The president of the National Academy of Sciences is being quoted spouting dangerous nonsense. Well, maybe not so dangerous, but really nonsense.

I found this by way of Jonathan Chait, a generally insightful and well-informed political journalist, who weighed in recently on the political response to the IPCC report on climate change. US Republican Party big shot Paul Ryan, asked whether he believes that human activity has contributed to global warming, replied recently “I don’t know the answer to that question. I don’t think science does, either.” Chait rightly takes him to task for this ridiculous dodge (though he ignores the fact that Ryan was asked about his beliefs, so that his skepticism may reflect a commendable awareness of the cognitive theories of Stephen Stich, and his need to reflect upon the impossibility of speaking scientifically, or introspecting coherently, about the contents of beliefs), but the form of his criticism left me troubled:

In fact, science does know the answer. Climate scientists believe with a 95 percent level of certainty (the same level of certainty as their belief in the dangers of cigarette smoking) that human activity is contributing to climate change.

Tracking through his links, I found that he’d copied this comparison between climate change and the hazards of smoking pretty much verbatim from another blog, and that it ultimately derived from this “explanation” from the AP:

Some climate-change deniers have looked at 95 percent and scoffed. After all, most people wouldn’t get on a plane that had only a 95 percent certainty of landing safely, risk experts say.

But in science, 95 percent certainty is often considered the gold standard for certainty.

[…]

The president of the prestigious National Academy of Sciences, Ralph Cicerone, and more than a dozen other scientists contacted by the AP said the 95 percent certainty regarding climate change is most similar to the confidence scientists have in the decades’ worth of evidence that cigarettes are deadly.

Far be it from me to challenge the president of the National Academy of Sciences, particularly if it’s the “prestigious” National Academy of Sciences, or more than a dozen other scientists, but the technical term for this is “bollocks”.
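One quick way to see why a 5% significance threshold is not the same thing as “95 percent certainty” is a simulation (my illustration, with made-up prevalence and power, not part of the original argument): if only a modest fraction of the hypotheses being tested are true, then far fewer than 95% of the “significant” findings are.

```python
import numpy as np

rng = np.random.default_rng(95)

# Made-up assumptions: 10% of tested hypotheses are true; tests have
# 80% power and the conventional 5% false-positive rate.
n, prevalence, power, alpha = 100_000, 0.10, 0.80, 0.05

is_true = rng.random(n) < prevalence
significant = np.where(is_true, rng.random(n) < power, rng.random(n) < alpha)

# The share of "95% certain" (significant) findings that are actually
# true: about 64% under these assumptions, nowhere near 95%.
print(is_true[significant].mean())
```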

False positives, false confidence, and Ebola

Designing a screening test is hard. You have a large population, almost all of whom do not have whichever condition you’re searching for. Thus, even with a tiny probability of error, most of the cases you pick up will be incorrect — false positives, in the jargon. So you try to set the bar reasonably high; but set it too high and you’ll miss most of the real cases — false negatives.
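To put illustrative numbers on it (mine, not from any real screening programme): with a condition affecting 1 person in 10,000, even a test that is 99% sensitive and 99% specific yields almost entirely false positives.

```python
# Positive predictive value of a screening test, by Bayes' rule.
# All numbers are illustrative.
prevalence = 1e-4    # 1 in 10,000 has the condition
sensitivity = 0.99   # P(test positive | condition)
specificity = 0.99   # P(test negative | no condition)

p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
ppv = sensitivity * prevalence / p_positive
print(f"P(condition | positive test) = {ppv:.3f}")  # about 0.010
```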

On the other hand, if you have a suspicion of the condition in a particular case, it’s much easier. You can set the threshold much lower without being swamped by false positives. What would be really dumb is to use the same threshold from the screening test to judge a case where there are individual grounds for suspicion. But that’s apparently what doctors in Spain did with the nurse who was infected with Ebola. From the Daily Beast:

When Teresa Romero Ramos, the Spanish nurse now afflicted with the deadly Ebola virus first felt feverish on September 30, she reportedly called her family doctor and told him she had been working with Ebola patients just like Thomas Eric Duncan who died today in Dallas. Her fever was low-grade, just 38 degrees Celsius (100 degrees Fahrenheit), far enough below the 38.6-degree Ebola red alert temperature to not cause alarm. Her doctor told her to take two aspirin, keep an eye on her fever and keep in touch.

She was caring for Ebola patients; she developed a fever; but they decided not to treat it as a possible case of Ebola because her fever was 0.6 degrees below the screening threshold for Ebola.

A failure of elementary statistical understanding, and who knows how many lives it will cost.

Absence of correlation does not imply absence of causation

By way of Andrew Sullivan we have this attempt by Philip N. Cohen to apply statistics to answer the question: does texting while driving cause accidents? Or rather, he marshals data to ridicule the new book by Matt Richtel on a supposed epidemic of traffic fatalities, particularly among teens, caused by texting while driving. He has some good points about the complexity of the evidence, and a good general point that people like to fixate on some supposed problem with current cars or driving practices, to distract their attention from the fact that automobiles are inherently dangerous, so that the main thing that causes more fatalities is more driving. But then he has this weird scatterplot that is supposed to be a visual knock-down argument:

[Scatterplot: mobile-phone subscriptions vs. traffic fatalities by state. Caption: “We need about two phones per person to eliminate traffic fatalities…”]

So, basically no correlation between the number of phone subscriptions in a state and the number of traffic fatalities. So, what does that prove? Pretty much nothing, I would say. It’s notable that there is really very little variation in the number of mobile phones among the states, and at the lowest level there’s still almost one per person. (Furthermore, I would guess that most of the adults with no mobile phone are poor, and likely don’t have an automobile either.) Once you have one mobile phone, there’s no reason to think that a second one will substantially increase the amount of texting you do while driving.

Whether X causes Y is a separate question from whether variation in X is linked to variation in Y. You’d like to think that a sociologist like Cohen would know this. A well-known example: No one would doubt that human intelligence is a product of the human brain (most directly). But variations in intelligence are uncorrelated with variations in brain size. (Which doesn’t rule out the possibility that more subtle measurements could find a physical correlate.) This is particularly true with causes that are saturated, as with the one phone per person level.

You might imagine a Cohen-like war-crimes investigator deciding that the victims were not killed by bullets, because we find no correlation between the number of bullets in a gun and the fate of the corresponding victim.

Just to be clear: I’m not claiming that evidence like this could never be relevant. But when you’re clearly in the saturation region, with a covariate that is only loosely connected to the factor in question, it’s obviously just misleading.
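A minimal simulation of a saturated cause (my illustration, with invented numbers, not Cohen’s data): Y is strongly caused by X, but because every observed X sits above the saturation point, the sample correlation is indistinguishable from zero.

```python
import numpy as np

rng = np.random.default_rng(50)
n = 50  # think of 50 states

# X causes Y through a mechanism that saturates at x = 1: beyond one
# phone per person, extra phones change nothing.
x = rng.uniform(1.0, 1.2, size=n)                       # all states saturated
y = 10 * np.minimum(x, 1.0) + rng.normal(0, 1, size=n)  # plus other factors

# The causal effect is large (compare x = 0 with x = 1), yet over the
# observed range the correlation is pure noise around zero.
print(np.corrcoef(x, y)[0, 1])
```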

Innumeracy: UK prison service edition

The BBC reports on a study by the Prisoners’ Education Trust of the impact of the recent decision of the prison service to limit prisoners’ access to books. The Ministry of Justice has dismissed the study, saying

the PET survey of 343 inmates represented just 0.01% of the total prison population in England and Wales.

This is a twofer, with a pair of errors packed into an impressively small space. Even a government minister should be able to calculate that if 343 inmates represent 0.01% of the prison population, then the prison population must be 3.43 million (more than 6% of the 53.5 million people of England and Wales), which I don’t need to check the figures to know must be wrong. But I did check it, and found that the Ministry of Justice made a wee error of not quite 2 orders of magnitude. According to this publication (coincidentally, also from the Ministry of Justice) there were about 84,000 prisoners in June 2013. Assuming there haven’t been any huge changes since then, those 343 inmates in fact represent 0.4% of the prison population. Where is Michael Gove when you need him?
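The arithmetic fits in a few lines:

```python
sample = 343
claimed_fraction = 0.0001      # the Ministry's "0.01%"
prison_population = 84_000     # Ministry of Justice figure, June 2013
england_wales = 53.5e6

print(sample / claimed_fraction)                  # implied prison population: 3,430,000
print(sample / claimed_fraction / england_wales)  # about 0.064, i.e. 6.4% imprisoned
print(sample / prison_population)                 # actual fraction: about 0.004, i.e. 0.4%
```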

More generally, the comment conveyed the impression that if the sample were a small fraction of the population then it couldn’t be statistically valid. Of course, that’s not true. If you were doing an election poll of the whole population of England, a random sample of 0.01% of the population would be about 5000 people, which is much larger than most surveys, and enough to get a result that’s accurate to within about ±1.5%. The real problem with this survey is that it’s not a random sample, and not representative, being self-selected from among the readers of a certain magazine; but there is no pretence about that, and if the Ministry of Justice were interested in addressing the issue rather than issuing talking points, they could address the question of whether the concerns raised by the most literate and most literacy-minded segment of the prisoner population are worth taking seriously.
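For the record, the margin-of-error calculation behind that ±1.5% (a standard worst-case sketch, assuming a simple random sample and p = 0.5):

```python
import math

population = 53.5e6
n = round(population * 0.0001)  # a 0.01% sample: about 5,350 people

# 95% margin of error for an estimated proportion, worst case p = 0.5.
moe = 1.96 * math.sqrt(0.25 / n)
print(n, f"{moe:.1%}")          # 5350, about 1.3%
```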