“Strong support for Johnson”

According to The Guardian, people want Boris Johnson to be the next leader of the Conservatives. They don’t say it explicitly, but they suggest that “next” means, like, tomorrow, and not after the next election. After citing a poll finding that 29% of voters want Johnson to be the next Tory leader (are those Conservative supporters? I might want Johnson to be the next Tory leader because I think he’ll lead his party to disaster…), they write

The strong support for Johnson feeds into the party standings. The poll finds that Labour’s seven-point lead would fall to three points if he led the Tories. The Tories would see their support increase by three points under a Johnson premiership to 34% while Labour would see its support fall by one point to 37%. Johnson would also hit support for Ukip, which would see its support fall by two points to 8%.

Before the Tories dump Cameron, they might want to check whether this 3% boost is statistically robust. This looks like an elementary statistics exercise, but it’s not quite so simple. If D is the Tory support under Cameron, and B the Tory support under Johnson, then B-D might be expected to be about 3%. But how confident should we be that Johnson is really better than Cameron? Unfortunately, we can’t know that without knowing the correlations: in this case, that means we need to know how many people supported the Tories only with Cameron, and how many supported them only with Johnson, and how many supported them with either leader.
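If the same respondents were asked about both leaders, only the switchers matter: everyone who backs the Tories under both leaders (or neither) cancels out of B-D. Here is a minimal sketch of the calculation, with entirely made-up cell counts, just to show how the answer hinges on the discordant cells:

```python
import math

# Hypothetical paired-poll counts; every number here is invented for
# illustration. Only the "switchers" contribute to B - D.
n = 1000
only_johnson = 80  # would vote Tory only under Johnson (hypothetical)
only_cameron = 50  # would vote Tory only under Cameron (hypothetical)

diff = (only_johnson - only_cameron) / n  # estimate of B - D: 3%
# The standard error of a difference of paired proportions depends only
# on the discordant cells (a McNemar-style calculation):
se = math.sqrt(((only_johnson + only_cameron) / n - diff**2) / n)
z = diff / se
print(f"B - D = {diff:.1%}, z = {z:.2f}")  # → B - D = 3.0%, z = 2.64
```

Shrink the discordant cells to 40 and 10 and the same 3% gap becomes overwhelmingly significant; grow them to 180 and 150 and it dissolves into noise. That is exactly the information the marginal percentages don’t give you.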

Bayesian Fables: The Trojan Horse

I was talking recently to a friend who said he saw the story of the Trojan horse as an object lesson in the failure of governance. “Wasn’t there anyone who could say, wait a minute, maybe it’s just not a good idea to bring that horse in here, even if the Greeks seem to have all left?”

I said it was a fable about the inaccessibility of Bayesian reasoning. Laocoön warned them that the prior probability for a net benefit from a Greek gift was low (timeō Danaōs et dōna ferentēs). But the Trojans placed more credence in new information, particularly private information that they hold exclusively, particularly when they seem to have won the information at great effort, by their own ingenuity, by torturing the captured Sinon. (This lesson was learned by the British spies in WWII who conceived Operation Mincemeat.) Laocoön was punished for insisting on his strong prior, being crushed to death by the clever serpents sent by the Goddess of Worldly Wisdom. And the Trojans celebrated their ingenious victory, until they were overrun by reality, in the form of well-muscled Achaean warriors who were not impressed by their highly significant rejection of the likelihood of a subterfuge.
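The Trojans’ error fits into one line of Bayes’ rule. The numbers below are pure illustration, chosen only to show how little a plausible-sounding story should move a strong prior:

```python
# Illustrative numbers only -- a minimal Bayes update for the horse.
prior_trap = 0.9           # Laocoon's prior: Greek gifts are traps
p_story_given_trap = 0.8   # a planted deserter would tell this story
p_story_given_safe = 0.9   # but so would a genuine one

posterior_trap = (p_story_given_trap * prior_trap) / (
    p_story_given_trap * prior_trap
    + p_story_given_safe * (1 - prior_trap))
print(round(posterior_trap, 2))  # → 0.89: the testimony barely moves the prior
```

Sinon’s story is nearly as likely from a genuine deserter as from a plant, so it carries almost no evidential weight, however hard it was to extract.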

Percents are hard

Some really bad science reporting from the BBC. They report on a new study finding the incidence of diagnosed coeliac disease increasing (and decreasing incidence of dermatitis herpetiformis, though this doesn’t rate a mention) in the UK. Diagnoses have gone up from 5.2 to 19.1 per 100,000 in about 20 years, which they attribute to increased awareness. Except, they don’t say what that is 100,000 of. You have to go back to the original article to see that it is person-years, and that they are talking about incidence, and not prevalence (in technical parlance); they use the word “rate”, which is pretty ambiguous, and commonly used — particularly in everyday speech — to refer to prevalence. If you read it casually — and, despite being a borderline expert in the subject, I misread it at first myself — you might think they mean that 19 in 100,000 of the population of Britain suffers from coeliac; that would be about 12,000 people, hardly enough to explain the condition’s cultural prominence (and prominent placement on the BBC website). In fact, they estimate that about 150,000 have diagnosed CD in the UK.

As if aiming maximally to compound the confusion, they quote one of the authors saying

“This [increase] is a diagnostic phenomenon, not an incidence phenomenon. It is exactly what we had anticipated.”

In the article they (appropriately) refer to the rate of diagnosis as incidence, but here they say it’s not about “incidence”.

To make matters worse, they continue with this comment:

Previous studies have suggested around 1% of the population would test positive for the condition – but the data from this study suggests only 0.25% are diagnosed.

Normally, I think, “only x% are diagnosed” is meant relative to the number of cases; here that would mean 0.25% of the 1%. But, in fact, they mean to compare the 0.25% of the population who are diagnosed with the 1% who actually suffer from the disease.
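The arithmetic behind the two readings, taking the UK population as a round 63 million (my assumption; the exact figure doesn’t matter):

```python
uk_pop = 63_000_000  # assumed round figure for illustration

# Reading 1 (the misreading): 19.1 per 100,000 taken as a prevalence,
# i.e. the number of people currently suffering from the disease
misread_sufferers = 19.1 / 100_000 * uk_pop   # ~12,000 people

# Reading 2 (correct): 19.1 new diagnoses per 100,000 person-years is
# an incidence; diagnoses accumulate into a prevalence of about 0.25%
diagnosed_total = 0.0025 * uk_pop             # ~157,000 people

print(round(misread_sufferers), round(diagnosed_total))
```

The two readings differ by an order of magnitude, which is why “per 100,000 of what?” is not a pedantic question.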

Identifiability

A hot topic in statistics is the problem of anonymisation of data. Medical records clearly contain highly sensitive, private information. But if I extract just the blood pressure measurements for purposes of studying variations in blood pressure over time, it’s hard to see any reason for keeping those data confidential.

But what happens when you want to link up the blood pressure with some sensitive data (current medications, say), and look at the impact of local pollution, so you need at least some sort of address information? You strip out the names, of course, but is that enough? There may be only one 68-year-old man living in a certain postcode. It could turn into one of those logic puzzles where you are told that Mary likes cantelope and has three tattoos, while John takes cold baths and dances samba, along with a bunch of other clues, and by putting it all together in an appropriate grid you can determine that Henry is adopted and it’s Sarah’s birthday. Some sophisticated statistical work, particularly in the peculiar field of algebraic statistics, has gone into defining the conditions under which there can be hidden relations among the data that would allow individuals to be identified with high probability.
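The “only one 68-year-old man in the postcode” problem is easy to demonstrate: count how many name-stripped records are unique on their quasi-identifiers. The toy records below are invented for illustration:

```python
from collections import Counter

# Toy records with names already stripped: (age, sex, postcode).
records = [
    (68, "M", "OX2 6GG"), (34, "F", "OX2 6GG"), (34, "F", "OX2 6GG"),
    (52, "F", "OX1 3LB"), (68, "M", "OX1 3LB"), (68, "M", "OX1 3LB"),
]
counts = Counter(records)
# Any combination appearing exactly once is re-identifiable by anyone
# who happens to know those facts about a neighbour.
unique = [r for r, c in counts.items() if c == 1]
print(unique)
```

Here two of the six records are unique on just three innocuous-looking attributes; the real statistical work lies in quantifying how often this happens in large datasets, and under what release conditions.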

I thought of this careful and subtle body of work when I read this article about private-sector mass surveillance of automobile license plates — another step in the Cthulhu-ised correlations of otherwise innocuous information that modern information technology is enabling. Two companies are suing the state of Utah to block a law that prevents them from using their own networks of cameras to record who is travelling where when, and use that information for blackmail market research.

The Wall Street Journal reports that DRN’s own website boasted to its corporate clients that it can “combine automotive data such as where millions of people drive their cars … with household income and other valuable information” so companies can “pinpoint consumers more effectively.” Yet, in announcing its lawsuit, DRN and Vigilant argue that their methods do not violate individual privacy because the “data collected, stored or provided to private companies (and) to law enforcement … is anonymous, in the sense that it does not contain personally identifiable information.”

So, in their representation, data are suitably anonymised if they don’t actually include the name and address. We’re just tracking vehicles. Could be anyone inside… We’re just linking it up with those vehicles’ household incomes. Presumably they’re going to target ads for high-grade oil and new tires at those cars, or something.


More Hockey Statisticks

I wrote last week about my surprising response to two books about the public conflicts over palaeoclimatology. Whereas I expected to find myself sympathising with the respected scientist Michael Mann, I found both authors equally repellant — both are smug and self-absorbed, both write crudely — and had most sympathy with Steven McIntyre, the former mining engineer who stars in Andrew Montford’s book. Fundamentally, I found that Mann’s own account made him seem like just the sort of arrogant senior scientist I have occasionally had to deal with as a statistician, one who is outraged that anyone outside his close circle would want to challenge his methodology.

A pair of long comments on the post underlined my impression of the cultish behaviour of people who have gotten enmeshed in the struggle over climate change, on both sides. The commenter writes:

I would suggest that McIntyre’s work went out of its way to try to cast doubt on Mann’s research, and in that process created as many errors of its own. Montford’s book takes that dubious effort and magnifies it for the purposes of attacking climate change science in general by vilifying a single piece of research by a single researcher.

I have to say, Montford’s effort has been highly effective. In one lecture I saw, given by Dr Richard Alley, he recounted being in Washington speaking to a science committee where one high level member stated, “Well, we know all this climate change stuff is based on a fraudulent Hockey Stick graph.”

I’m sure [Andrew] Montford appreciates your piece here perpetuating that position.

I don’t know exactly what Montford’s “effort” is. Certainly, in his book he has little to say about the rest of climate science, but what he does have to say can hardly give any impression other than that the “hockey stick” is a small part of palaeoclimatology, and that palaeoclimatology is a small part of climate research. He never accuses Mann or anyone else of fraud in his book, although he is unyielding and close to hysterical in imputing incompetence to Mann and some of his closest collaborators.

As for McIntyre’s work going “out of its way to try to cast doubt”, this hardly seems different to me than the usual way scientists are motivated. It’s no different than the comments about “getting rid of the Mediaeval Warm Period”, that Montford is obsessed with, as evidence of scientific corruption. I was never bothered by that comment, or any of the comments that came out of the disgraceful email hack of the Climatic Research Unit, because I understand that scientists rarely launch an investigation without any preconceptions. It’s perfectly plausible — even likely — that climate researchers would have had a strong gut feeling that this warm period was much less substantial than it had seemed, but were casting about for a way to prove the point. The trick here is to have a rigorous methodology that won’t bend to your preconceptions. The same way, McIntyre had a gut feeling that the climate was much more variable in the past than the mainstream researchers wanted to believe, and he set about proving his point by trying to find the flaws in their methodology.

The fact that later studies ended up confirming the broad outlines of Mann’s picture, and disproving McIntyre’s intuition does not make his critique any the less serious or important. And it doesn’t make Mann’s efforts to portray all of his opponents as villains any less unsavoury. And his efforts to present scientific defensiveness as high principle do a disservice to science in general, and to climate science more specifically.

The commenter describes Mann’s self-righteous refusal to provide essential materials for McIntyre’s attempts to re-evaluate his work as a natural response to “the levels to which ‘skeptics’ are willing to go. It may seem absurd, but I think that is only because the levels they go to are so outrageous.” Except that it looks to me as though Mann’s stonewalling came first. Maybe that’s wrong, but again, if so, he doesn’t seem to think anyone has a right to expect evidence of the fact.

Mann comes across in his own book as a manipulator who would like to tar all of his opponents with the outrageous actions that some have committed. He accuses McIntyre of “intimidation” without considering it necessary to provide any shred of evidence. The portion of their correspondence quoted by Montford shows nothing beyond occasional exasperation at Mann’s stonewalling. Obviously there could be more to it, but Mann seems so persuaded of his own saintliness that he expects his bare assertion of his own pure motives — and of the correctness of his methodology — to persuade every reader, and so convinced of the objectivity of his friends and colleagues that merely quoting their statements in his defence should suffice.

Science is science, but many climate scientists have (quite rightly) decided that the implications of what they have learned demand political action. They can’t then express horror when others blend their scientific inquiry with a political agenda.

Of hockey sticks and statistics

[Updated at bottom] I recently read two books on climate science — or rather, two books on the controversies around climate science. One was Michael Mann’s The Hockey Stick and the Climate Wars; the other, Andrew Montford’s The Hockey Stick Illusion.

Now, I am, by inclination and tribal allegiance, of the party of Michael Mann, one of the world’s leading climate scientists. He and his colleagues have been subject to beastly treatment by political opponents, some of which is detailed in his book. And I only picked up the Montford book out of a sense of obligation to see what the opposing side was saying. And yet…

Montford’s book makes a pretty persuasive case. Not that climate science is bunk, or a conspiracy, or that anthropogenic global warming is a fiction — there is far too much converging evidence from different fields to plausibly make that claim (and indeed, Montford never makes such a claim) — but that a combination of egotism and back-scratching has seriously slowed down the process of evaluation and correction of sometimes sloppy statistical procedures, and tarnished the reputation of the scientific community generally.

I admit to a certain bias here: The attacks on Mann’s work that Montford describes are statistical in nature, and Mann’s response reminds me of the tone that is all too common when statisticians raise questions about published scientific work. Montford has a remarkable amount of technical detail — so much that I found myself wondering who his intended audience is — and the critiques he describes (mainly due to the Canadian mining engineer Steve McIntyre) seem eminently sensible. In the end, I don’t think they panned out, but they were genuine shortcomings in the early work, and McIntyre seems to have done the right things in demonstrating the failure of a statistical method, at least in principle, and to have earned for his trouble only incomprehension and abuse from Mann and his colleagues.


Demographic fallacies and classical music

I was just reading an article in Slate with the title “Classical Music in America is Dead”. The argument boils down to two points:

  1. Classical music listeners are a small portion of the population.
  2. Relatively few young people are in the audience.

With regard to (1), I thought it interesting that he writes

Just 2.8 percent of albums sold in 2013 were categorized as classical. By comparison, rock took 35 percent; R&B 18 percent; soundtracks 4 percent. Only jazz, at 2.3 percent, is more incidental to the business of American music.

What’s interesting is that, while jazz is certainly a minority taste, and its trajectory in American culture has closely paralleled that of classical music in the 20th century, I don’t think anyone would claim that jazz is dead.

He quotes the critic Richard Sandow, who makes a demographic argument that

And the aging audience is also a shrinking one. The NEA, ever since 1982, has reported a smaller and smaller percentage of American adults going to classical music performances. And, as time goes on, those who do go are increasingly concentrated in the older age groups (so that by now, the only group going as often as it did in the past are those over 65).

Which means that the audience is most definitely shrinking. Younger people aren’t coming into it. In the 1980s, the NEA reported, the percentage of people under 30 in the classical music audience fell in half. And older people also aren’t coming into the classical audience. If they were, we’d see a steady percentage of people in their 40s and 50s going to classical events, but we don’t. That percentage is falling.

Of course, this is vastly overstated. “Younger people” are “coming into it”… in smaller numbers than before. It’s an absurd fallacy (not uncommon, and, to my knowledge, first addressed in theoretical ageing research by Yashin et al. in the 1980s) that you can determine the longitudinal dynamics for individuals by looking at the cross-sectional age distribution.

Consider a model where individuals are recruited into classical music at a constant rate over their lifetimes, ending with 10% of the 80-year-olds. (We’ll leave the older population out of it.) Then about 11% of the adult audience would be under 30. Suppose there were now a change, just so that children under 15 were no longer being recruited into classical music, but after that age they continued to enter at the same rate as before. Then the fraction of the adult audience under 30 would be halved, to about 5.5%. The number of people in their 40s and 50s going to concerts would decline by about 15%.
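The model above can be checked numerically. My assumptions, filling in what the text leaves implicit: adults are ages 15 to 80, uniformly distributed, and the constant recruitment rate is calibrated so that 10% of 80-year-olds attend:

```python
def share_under_30(recruit_start=0.0):
    """Share of the adult (15-80) audience aged under 30, when people
    are recruited at a constant rate from age `recruit_start` on."""
    r = 0.10 / 80  # rate so that 10% of 80-year-olds attend at baseline
    ages = [15 + 0.01 * k for k in range(6501)]  # fine grid over 15..80
    attend = [r * max(a - recruit_start, 0.0) for a in ages]
    under30 = sum(f for a, f in zip(ages, attend) if a < 30)
    return under30 / sum(attend)

print(f"{share_under_30(0):.1%}")   # → 10.9%: baseline
print(f"{share_under_30(15):.1%}")  # → 5.3%: no recruitment before 15
```

The under-30 share roughly halves, exactly as described, even though everyone over 15 is entering the audience at the same rate as before.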

I’m not arguing that this is what is going on. A lot of the story is probably the general splintering of the music audience, and the fact that people increasingly prefer to stay home for their entertainment. (This is one reason why I have argued that the classical music establishment’s reliance on enormously expensive orchestras and opera companies is a mistake.) Just that you can’t make inferences about individual trajectories over time without data about individual trajectories.

Health selection bias: A choose your own preposition contest

Back when I was in ninth grade, we were given a worksheet where we were supposed to fill in the appropriate conjunction in sentences where it had been left out. One sentence was “The baseball game was tied 0 to 0, ——– the game was exciting.” Not having any interest in spectator sports, I guessed “but”, assuming that no score probably meant that nothing interesting had happened. This was marked wrong, because those who know the game know that no score means that lots of exciting things needed to happen to prevent scoring. Or something.

With that in mind, fill in the appropriate preposition in this sentence:

Death rates in children’s intensive care units are at an all-time low ————— increasing admissions, a report has shown.

If you chose despite you would agree with the BBC. But a good argument could be made for because of, or following a period of. That is, if you think about it, it’s at least as plausible — I would say, more plausible — to expect increasing admissions to lead to lower death rates. The BBC is implicitly assuming that the ICU children are just as sick as ever, and that more of them are being pushed into an overburdened system, so that it seems like a miracle if the outcomes improve. Presumably someone has done something very right.

But in the absence of any reason to think that children are getting sicker, the change in numbers of admissions must mean a different selection criterion for admission to the ICU. The most likely change would be increasing willingness to admit less critically ill children to the ICU, which has the almost inevitable consequence of raising survival rates (even if the effect on the sickest children in the ICU is marginally negative).
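A quick simulation makes the point: keep the quality of care fixed, lower only the admission threshold, and the crude ICU death rate falls while admissions rise. All the numbers below are invented for illustration:

```python
import random

random.seed(0)

def icu_stats(threshold, n=200_000):
    """Admit patients whose severity (uniform on [0,1]) exceeds the
    threshold; P(death) = severity**2, identical under both policies."""
    severities = [random.random() for _ in range(n)]
    admitted = [s for s in severities if s > threshold]
    deaths = sum(1 for s in admitted if random.random() < s * s)
    return len(admitted), deaths / len(admitted)

strict_n, strict_rate = icu_stats(0.8)  # only the sickest admitted
loose_n, loose_rate = icu_stats(0.6)    # broader admission criteria
# More admissions, lower crude death rate -- with no change in care at all.
print(strict_n, round(strict_rate, 2), loose_n, round(loose_rate, 2))
```

Under these assumptions the death rate drops from about 81% to about 65% purely because less critically ill children dilute the denominator, while each individual patient’s prognosis is unchanged.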

When looking at anything other than whole-population death rates, you always have the problem of selection bias. This is a general complication that needs to be addressed when comparing medical statistics between different systems. For instance, an increase of end-of-life hospice care, it has been pointed out, has the effect of making hospital death rates look better. (Even for whole-population death rates you can have problems caused by migration, if people tend to move elsewhere when they are terminally ill. This has traditionally been a problem with Hispanic mortality rates in the US, for instance.)

What is a disease?

Gilbert’s Syndrome is a genetic condition, marked by raised blood levels of unconjugated bilirubin, caused by less active forms of the gene for conjugating bilirubin.

There are disagreements about whether this should be called a disease. Most experts say it is not a disease, because it has no significant adverse consequences. The elevated bilirubin can lead to mild jaundice, and some people with GS may have difficulty breaking down acetaminophen and some other drugs, and so be at greater risk of drug toxicity. They also have elevated risk for gallstones. GS may be linked to fatigue, difficulty in concentration, and abdominal pain. On the other hand, a large longitudinal study found that the 11% of the population possessing one of these Gilbert’s variants had its risk of cardiovascular disease reduced by 2/3.

WHAT? 2/3 lower risk of the greatest cause of mortality in western societies? That’s the “syndrome”?

Maybe we should rewrite that: anti-Gilbert Syndrome is a genetic ailment, marked by lowered blood levels of unconjugated bilirubin, caused by overly active forms of the gene for conjugating bilirubin. This leads to a tripled risk of cardiovascular disease. On the other hand, the 89% of the population suffering from AGS has lower risk of gallstones, and tends to have lowered risk of acetaminophen poisoning. They may have lowered incidence of fatigue and abdominal pain.

Avastin didn’t fail the clinical trial. The clinical trial failed Avastin.

Writing in the NY Times, management professor Clifton Leaf quotes (apparently with approval) comments that ought to win the GlaxoSmithKline Prize for Self-Serving Distortions by a Pharmaceutical Company. Referring to the prominent recent failure of Genentech’s cancer drug Avastin to prolong the lives of patients with glioblastoma multiforme, Leaf writes

Doctors had no more clarity after the trial about how to treat brain cancer patients than they had before. Some patients did do better on the drug, and indeed, doctors and patients insist that some who take Avastin significantly beat the average. But the trial was unable to discover these “responders” along the way, much less examine what might have accounted for the difference. (Dr. Gilbert is working to figure that out now.)

Indeed, even after some 400 completed clinical trials in various cancers, it’s not clear why Avastin works (or doesn’t work) in any single patient. “Despite looking at hundreds of potential predictive biomarkers, we do not currently have a way to predict who is most likely to respond to Avastin and who is not,” says a spokesperson for Genentech, a division of the Swiss pharmaceutical giant Roche, which makes the drug.

This is, in technical terms, a load of crap, and it’s exactly the sort of crap that double-blind randomised clinical trials are supposed to rescue us from. People are generally prone to see patterns in random outcomes; physicians are probably worse than the average person, because their training and their culture biases them toward action over inaction.

It’s bizarre, the breezy self-confidence with which Leaf (and the Genentech spokesman) can point to a trial where the treatment group did worse than the placebo group — median survival of 15.7 months vs. 16.1 months — and conclude that the drug is helping some people, we just can’t tell which they are. If there are “responders”, who do better with Avastin than they would have otherwise, then there must also be a subgroup of patients who were harmed by the treatment. (If the “responders” are a very small subset, or the benefits are very small, they could just be lost in the statistical noise, but of course that’s true for any test. You can only say the average effect is likely in a certain range, not that it is definitely zero.)
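It’s easy to simulate how “responders” appear out of pure noise. Below, every patient’s survival time is drawn from the same exponential distribution with a 16-month median — the drug does nothing by construction, and the numbers are illustrative, not trial data:

```python
import math
import random

random.seed(1)

median = 16.0                 # months, roughly the order in the trial
rate = math.log(2) / median   # exponential rate giving that median
n = 500

# "Treated" arm drawn from the SAME distribution as control: zero effect.
treated = [random.expovariate(rate) for _ in range(n)]
beat_double = sum(1 for t in treated if t > 2 * median) / n
print(f"{beat_double:.0%} of patients survive twice the median")
# ...and every one of them looks, to the bedside eye, like a "responder".
```

With an exponential distribution, a quarter of patients survive past double the median by chance alone; nothing about those patients needs explaining, and no biomarker search will find what isn’t there.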

It’s not impossible that there are some measurable criteria that would isolate a subgroup of patients who would benefit from Avastin, and separate them from another subgroup that would be harmed by it. But I don’t think there is anything but wishful thinking driving insistence that there must be something there, just because doctors have the impression that some patients are being helped. The history of medicine is littered with treatments that physicians were absolutely sure were effective, because they’d seen them work, but that were demonstrated to be useless (or worse) when tested with an appropriate study design. (See portacaval shunt.)

The system of clinical trials that we have is predicated on the presumption that most treatments we try just won’t work, so we want strong positive evidence that they do. This is all the more true when cognitive biases and financial self interest are pushing people to see benefits that are simply not there.