## Health selection bias: A choose your own preposition contest

Back when I was in ninth grade, we were given a worksheet where we were supposed to fill in the appropriate conjunction in sentences where it had been left out. One sentence was “The baseball game was tied 0 to 0, ——– the game was exciting.” Not having any interest in spectator sports, I guessed “but”, assuming that no score probably meant that nothing interesting had happened. This was marked wrong, because those who know the game know that no score means that lots of exciting things needed to happen to prevent scoring. Or something.

With that in mind, fill in the appropriate preposition in this sentence:

Death rates in children’s intensive care units are at an all-time low ————— increasing admissions, a report has shown.

If you chose “despite”, you would agree with the BBC. But a good argument could be made for “because of” or “following a period of”. That is, if you think about it, it’s at least as plausible (I would say, more plausible) to expect increasing admissions to lead to lower death rates. The BBC is implicitly assuming that the ICU children are just as sick as ever, and more of them are being pushed into an overburdened system, so it seems like a miracle if the outcomes improve. Presumably someone has done something very right.

But in the absence of any reason to think that children are getting sicker, the change in numbers of admissions must mean a different selection criterion for admission to the ICU. The most likely change would be increasing willingness to admit less critically ill children to the ICU, which has the almost inevitable consequence of raising survival rates (even if the effect on the sickest children in the ICU is marginally negative).
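A toy calculation makes the point concrete. Every number below is invented for illustration (none comes from the report): the sickest children do marginally worse in the busier unit, yet the overall death rate falls because the newly admitted children are less ill.

```python
# Illustrative only: all admission counts and death risks are made-up numbers.

def overall_death_rate(groups):
    """Overall death rate for a list of (number_admitted, death_risk) pairs."""
    admitted = sum(n for n, _ in groups)
    deaths = sum(n * risk for n, risk in groups)
    return deaths / admitted

# Before: only critically ill children are admitted.
before = [(1000, 0.060)]

# After: the same critically ill children do slightly worse (6.0% -> 6.2%),
# but 500 less critically ill children are now admitted as well.
after = [(1000, 0.062), (500, 0.010)]

rate_before = overall_death_rate(before)
rate_after = overall_death_rate(after)

print(f"before: {rate_before:.1%}, after: {rate_after:.1%}")
```

With these assumed figures the headline rate drops even though nobody in the unit is better off than before, which is exactly the “because of” reading.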

When looking at anything other than whole-population death rates, you always have the problem of selection bias. This is a general complication that needs to be addressed when comparing medical statistics between different systems. For instance, an increase in end-of-life hospice care, it has been pointed out, has the effect of making hospital death rates look better. (Even for whole-population death rates you can have problems caused by migration, if people tend to move elsewhere when they are terminally ill. This has traditionally been a problem with Hispanic mortality rates in the US, for instance.)

## What is a disease?

Gilbert’s Syndrome is a genetic condition, marked by raised blood levels of unconjugated bilirubin, caused by less active forms of the gene for conjugating bilirubin.

There are disagreements about whether this should be called a disease. Most experts say it is not a disease, because it has no significant adverse consequences. The elevated bilirubin can lead to mild jaundice, and some people with GS may have difficulty breaking down acetaminophen and some other drugs, and so be at greater risk of drug toxicity. They also have elevated risk for gallstones. GS may be linked to fatigue, difficulty in concentration, and abdominal pain. On the other hand, a large longitudinal study found that the 11% of the population possessing one of these Gilbert’s variants had its risk of cardiovascular disease reduced by 2/3.

WHAT? 2/3 lower risk of the greatest cause of mortality in western societies? That’s the “syndrome”?

Maybe we should rewrite that: anti-Gilbert Syndrome is a genetic ailment, marked by lowered blood levels of unconjugated bilirubin, caused by overly active forms of the gene for conjugating bilirubin. This leads to a tripled risk of cardiovascular disease. On the other hand, the 89% of the population suffering from AGS has lower risk of gallstones, and tends to have lowered risk of acetaminophen poisoning. They may have lowered incidence of fatigue and abdominal pain.

## Avastin didn’t fail the clinical trial. The clinical trial failed Avastin.

Writing in the NY Times, management professor Clifton Leaf quotes (apparently with approval) comments that ought to win the GlaxoSmithKline Prize for Self-Serving Distortions by a Pharmaceutical Company. Referring to the prominent recent failure of Genentech’s cancer drug Avastin to prolong the lives of patients with glioblastoma multiforme, Leaf writes

Doctors had no more clarity after the trial about how to treat brain cancer patients than they had before. Some patients did do better on the drug, and indeed, doctors and patients insist that some who take Avastin significantly beat the average. But the trial was unable to discover these “responders” along the way, much less examine what might have accounted for the difference. (Dr. Gilbert is working to figure that out now.)

Indeed, even after some 400 completed clinical trials in various cancers, it’s not clear why Avastin works (or doesn’t work) in any single patient. “Despite looking at hundreds of potential predictive biomarkers, we do not currently have a way to predict who is most likely to respond to Avastin and who is not,” says a spokesperson for Genentech, a division of the Swiss pharmaceutical giant Roche, which makes the drug.

This is, in technical terms, a load of crap, and it’s exactly the sort of crap that double-blind randomised clinical trials are supposed to rescue us from. People are generally prone to see patterns in random outcomes; physicians are probably worse than the average person, because their training and their culture bias them toward action over inaction.

It’s bizarre, the breezy self-confidence with which Leaf (and the Genentech spokesman) can point to a trial where the treatment group did worse than the placebo group — median survival of 15.7 months vs. 16.1 months — and conclude that the drug is helping some people, we just can’t tell which they are. If there are “responders”, who do better with Avastin than they would have otherwise, then there must also be a subgroup of patients who were harmed by the treatment. (If the “responders” are a very small subset, or the benefits are very small, they could just be lost in the statistical noise, but of course that’s true for any test. You can only say the average effect is likely in a certain range, not that it is definitely zero.)
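The bookkeeping behind that argument can be sketched in a few lines. This is a rough illustration, not an analysis of the trial: it treats the difference in median survival as a crude stand-in for the average effect, and the responder fraction and responder benefit are hypothetical values chosen only to show the arithmetic.

```python
# Sketch: if the overall average effect is fixed, any benefit to "responders"
# must be paid for by the remaining patients.
#   mean_effect = p * benefit + (1 - p) * effect_on_rest

def implied_effect_on_rest(mean_effect, responder_fraction, responder_benefit):
    """Average effect on non-responders implied by the overall mean effect."""
    if not 0 <= responder_fraction < 1:
        raise ValueError("responder_fraction must be in [0, 1)")
    return (mean_effect - responder_fraction * responder_benefit) / (
        1 - responder_fraction
    )

# Treatment minus placebo median survival, in months (figures quoted above),
# used here as a crude proxy for the mean effect.
observed_effect = 15.7 - 16.1

# Hypothetical: 20% of patients are responders who each gain 6 months.
effect_on_rest = implied_effect_on_rest(observed_effect, 0.20, 6.0)

print(f"implied average effect on everyone else: {effect_on_rest:.1f} months")
```

Under these assumed numbers, the other 80% of patients would have to lose about two months on average for the totals to balance.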

It’s not impossible that there are some measurable criteria that would isolate a subgroup of patients who would benefit from Avastin, and separate them from another subgroup that would be harmed by it. But I don’t think there is anything but wishful thinking driving insistence that there must be something there, just because doctors have the impression that some patients are being helped. The history of medicine is littered with treatments that physicians were absolutely sure were effective, because they’d seen them work, but that were demonstrated to be useless (or worse) when tested with an appropriate study design. (See portacaval shunt.)

The system of clinical trials that we have is predicated on the presumption that most treatments we try just won’t work, so we want strong positive evidence that they do. This is all the more true when cognitive biases and financial self-interest are pushing people to see benefits that are simply not there.

## Not the Lake Wobegon Hospital

From the front page of the West County Times:

## Death rates at Bay Area hospitals vary widely, new report reveals

While some hospitals excelled at keeping patients alive, more than half of institutions around the Bay Area had worse-than-average death rates for at least one medical procedure or patient condition in 2010 and 2011, a new state report reveals.
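For what it’s worth, here is a back-of-the-envelope sketch of why that finding is unremarkable. It rests on assumptions that are mine, not the report’s: each hospital has a 50% chance of landing below average on any one measure, the measures are independent, and the number of measures is a free parameter (the quoted passage does not give it).

```python
# Sketch under assumed independence: chance that a hospital is worse than
# average on at least one of several measures.

def prob_at_least_one_below_average(num_measures, p_below=0.5):
    """P(worse than average on at least one measure), assuming independence."""
    return 1 - (1 - p_below) ** num_measures

for k in (1, 2, 5, 10):
    p = prob_at_least_one_below_average(k)
    print(f"{k:2d} measures: {p:.1%} of hospitals expected to miss on at least one")
```

With even two measures, three quarters of hospitals would be expected to be worse than average on something, so “more than half” is roughly what you get when nothing is wrong.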

## Longitudinal fables, ctd.: Is Julia shrinking?

I was commenting on how people like to turn age-structured information into longitudinal stories: If 80-year-olds buy more big-band recordings, and 20-year-olds more rap, we describe how people’s tastes shift as they age, from the hard rhythms of rap to the gentle lilt of swing. And I noted that the Obama campaign got itself into trouble last year trying to turn its age-specific policies into a longitudinal fable, called “The Life of Julia”. Looking at the pictures of Julia at different ages, I had the impression that Julia is shrinking as she moves into her forties.

More careful inspection of the pictures revealed that she is not shrinking (or not much); the main height change came when she stopped wearing high heels at age 37. But that got me wondering: should she have been shrinking? Or would that again have been confusing the cross section with the individual life course (the period with the cohort effect, in demographer jargon)?

It’s certainly true that cohorts in America (and in many other prosperous countries) have been getting taller. US Civil War soldiers in 1863-4 averaged 5′ 7 1/2″. Fifty years later the average height of young men had not changed significantly, but by 1955 the average height of young men was up to 5′ 9 1/2″ (and they were attaining their maximum height several years earlier). It’s not clear to what extent the trend has continued in the US (according to recent data, the average height of young male adults in the US is still about 5′ 9 1/2″), though it clearly has in other countries that have seen a substantial improvement in children’s average nutritional welfare, such as Portugal, the Netherlands, Italy, and Japan.

There is also a tendency for individuals to shrink as they age, from compression of the spine, particularly pronounced after age 60, and more extreme in women than in men. A sketch from this paper is included below. So, in fact, the hypothetical Julia should probably have been drawn about 2 inches shorter at age 67 than when she was 20. That’s about 3% — hard to tell from the silhouettes, with the changing hairstyle and all…

It’s funny, because I have seen height used as a paradigm example of where cross-sectional measures are misleading if you interpret them as cohort effects (narrating the changes within individual lives), but at least for the latter half of the 20th century in the US, the cross-sectional data seem to give the right picture.

## The Life of Julia: Another longitudinal fable

Picking up from my earlier discussion of the way cross-sectional data get turned into (sometimes misleading) longitudinal stories, it’s been about a year since the Obama campaign unveiled The Life of Julia, a slide show that contrasted Obama’s and Romney’s policies with regard to their effects on women at different ages. Stated that way it would be pretty standard and uncontroversial, but in fact it turned into a flashpoint for the early part of the campaign. Why? Precisely because it was not a list of cross-sectional promises (What President Obama will do for children; What President Obama will do for seniors; etc. would be standard campaign web site headings) but was turned into a longitudinal story. These were not 12 different women, of different ages, who would putatively be helped by the president’s policies, but a single woman “Julia” who seems to be spending her whole life looking for government programs to scrounge from. Of course, it only seems this way because of the way this infographic interacts with our expectations of a biographical narrative, where we expect to be seeing the high points of her life, and every one of them involves government services. Creepy! It’s no wonder some critics were reminded of cradle-to-grave socialism.

Of course, the real story is cross-sectional. If Julia is 3 years old now, Obama is not really promising to provide a small business loan to her in the year 2040. And by the time she reaches retirement, she’ll probably be living on a Mars colony or hiding out from roving mutant bandits in subterranean bunkers after the nuclear climate catastrophe.

## Stephen Wolfram’s longitudinal fables

There are lots of interesting plots in Stephen Wolfram’s analysis of Facebook data, but what jumps out at me is the way he feels compelled to turn his cross-sectional data (information about people’s interests, structure of friendship networks, relationship status, etc. as a function of age) into a longitudinal story. For example, he describes this plot by saying “The rate of getting married starts going up in the early 20s[…] and decreases again in the late 30s, with about 70% of people by then being married.”

Now, this is more or less a true statement, but it’s not really what is being illustrated here. (And it’s not just the weird anomaly, which he comments on but doesn’t try to explain, of the 10% or so of Facebook 13 year olds who describe themselves as married.) What we see is a snapshot in time (a temporal cross section, in the jargon) rather than a description of how the same people (a cohort, as demographers would put it) move through life. To see how misleading this cross-sectional picture can be if you try to see it as a longitudinal story of individuals moving through life, think first about the right-hand side of the graph. It is broadly true, according to census data, that about 80% of this age group are married or widowed. But it is also true that 95% were once married. In fact, if they had had Facebook when they were 25 years old, their Stephen Wolfram would have found that most of them (about 75%) were already married by that age. (In fact, about 5% of the women and 3% of the men were already in a second marriage by age 25.)

So, the expansion of the “married” segment of the population as we go from left to right reflects in part the typical development of a human life, but it reflects as well the fact that we are moving back in time, to when people were simply more likely to marry. And the absence of a “divorced” category masks the fact that while the ranks of the married expand with age, individuals move in and out of that category as they progress through their lives.
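A small sketch shows how the two readings come apart. The table below is hypothetical: the oldest cohort’s figures loosely echo the 75% and 95% mentioned above, the rest are invented, and it tracks “ever married” rather than current status to keep things simple.

```python
# Hypothetical share ever married, by birth cohort and age (made-up numbers).
ever_married_by_age = {
    1940: {25: 0.75, 45: 0.93, 65: 0.95},
    1960: {25: 0.55, 45: 0.85, 65: 0.90},
    1980: {25: 0.30, 45: 0.70, 65: 0.80},
}
AGES = (25, 45, 65)

def cross_section(year):
    """What a snapshot taken in `year` shows: each age comes from a different cohort."""
    return {age: ever_married_by_age[year - age][age] for age in AGES}

def cohort_path(birth_year):
    """What one cohort actually experiences as it ages."""
    return {age: ever_married_by_age[birth_year][age] for age in AGES}

snapshot_2005 = cross_section(2005)
life_of_1980_cohort = cohort_path(1980)

print("2005 snapshot:   ", snapshot_2005)
print("1980 cohort path:", life_of_1980_cohort)
```

In this toy setup the snapshot climbs by 65 percentage points between ages 25 and 65, while the people who are 25 in the snapshot would climb by only 50 over their own lives; the difference is the move back in time to cohorts that were more likely to marry.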

Of course, the same caveat applies to the stories that Wolfram tells about his (quite fascinating) analyses of structure of friend networks by age, and of the topics that people of different ages refer to in Facebook posts. While it is surely true that the surge in discussion of school and university centred at age 18 reflects life-phase-determined variation in interests, the extreme drop in salience of social media as a topic is likely to reflect a generational difference, and the steep increase in prominence of politics with age may be generational as well. (I wonder, too, whether the remarkably unchanging salience of “books” might reflect a tendency to become less involved with books with age being cancelled out by a generational shift away from interest in books.)

## Without a net

One linguistic phenomenon that fascinates me more than it probably should is when a word or phrase can have opposite meanings in different contexts. Like the English word cleave (e.g. Genesis 2:24 “Therefore shall a man leave his father and his mother, and shall cleave unto his wife”, as opposed to a meat cleaver.)

I recently watched Ken Burns’s controversial 15-hour documentary Jazz. One segment, focusing on Miles Davis’s turn to fusion and electronica, was titled “Tennis Without a Net”. This quotes the critic Gerald Early, who appears in the segment, but of course refers back to Robert Frost’s bon mot about free verse (“tennis with the net down”). The implication is that music is a game, whose spectators are judging above all the players’ adroitness in accomplishing inherently simple things under complicated artificial constraints. Free jazz has also been tagged with this undead witticism.

So playing “without a net” means making things too easy, too safe, since no one can say if you’ve gotten it wrong. But “performing without a net” can also mean taking exceptional risks, as in the title of the Grateful Dead’s 1990 album “Without a Net”. There the metaphor is the circus acrobat’s net, and the implication is that the band’s free improvisation is particularly risky, since they are performing live without the support of a predetermined musical structure.

And this reminds me again of Natalia Cecire’s fascinating attack on statistics as an “inherently puerile discipline”, because its highest priority is “commitment to the rules of the game”. (I should make clear, as I argued before, I disagree with Cecire’s opinion of statistics, but I find the framework she lays on it both creative and useful.) Are statisticians making their research too safe by performing with the net of mathematical methodology, or are humanists like Cecire setting themselves a too-easy task by playing with the net of rigorous quantitative analysis down?

## What’s the Matter with Economics?

One of the most politically important economics results of recent years has been the paper by Reinhart and Rogoff on the link between high sovereign debt and low GDP growth. This work is something I’d been following for a while, as R&R’s book was one that I’d admired greatly. Their work claimed to show a strong negative correlation between sovereign debt/GDP ratio and ensuing GDP growth, and was reported as saying that 90% debt/GDP ratio marks a cliff that an economy falls off, killing future growth. This was seized upon by proponents of austerity as proof that budget cuts can’t wait.

As reported here and here by Paul Krugman, and here and here by Matt Yglesias, it now turns out that the result isn’t just theoretically misguided, it’s bogus. Economists who struggled to reproduce the results finally isolated a whole raft of errors and dubious hidden assumptions that completely undermine the conclusion. The most blatantly ridiculous of these faults, though only one among many, was an error in their Excel spreadsheet formula that caused them to exclude important sections of the data from their computation. You’d think that this couldn’t get any worse, but instead of apologising abjectly, R&R have tried to argue that none of this was really essential to their real point, whatever that was.

My main thoughts:

1. Do economists really do their analysis with Excel? I find this kind of shocking, like if I found out that some surgeons like to make their incisions with flint knives, or if airline pilots were calculating their flightpaths with slide rules. Once you accept that premise, it’s not surprising that they made a blunder like this. I’m not a snob about technology. Spreadsheets are great for doing payrolls, and for getting a look at tables of numbers, and doing some quick calculations. But they’re so opaque, they’re not appropriate to academic work, and they’re so inflexible that it’s inconceivable to me that someone who analyses data on a more or less regular basis would choose to use them.
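To illustrate the kind of slip at issue, here is a sketch of an average taken over a range that stops a few rows short. This is not a reconstruction of R&R’s spreadsheet: the growth figures, the helper `average_rows`, and the row numbers are all hypothetical.

```python
from statistics import mean

# Hypothetical growth figures (percent per year) for eight made-up countries.
growth = [3.1, 2.4, -0.3, 1.8, 2.9, 0.7, 2.2, 2.6]

def average_rows(values, first, last):
    """Average rows first..last (1-indexed, inclusive), like a spreadsheet range."""
    return mean(values[first - 1:last])

intended = average_rows(growth, 1, 8)  # the full column
mistyped = average_rows(growth, 1, 5)  # range silently stops three rows short

print(f"intended: {intended:.3f}, mistyped: {mistyped:.3f}")
```

Both results look perfectly plausible, so nothing in the output flags the mistake. In a script the range at least sits in plain view where a reader can check it, whereas in a spreadsheet it hides inside a cell formula.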

## Statisticians of the World, Unite!

#### You have nothing to lose but your Markov chains!

From “A Brief History of Britain 1066-1485”, by Nicholas Vincent:

Finally, as in all modern debates from which statistics and the spirit of Karl Marx are never far distant, it has been argued that, by 1200, Philip of France was far richer than the King of England and therefore ideally placed to seize the Plantagenet lands.

Statisticians or Stakhanovites? You decide…