Fraud detection and statistics

Elizabeth Holmes, founder of Theranos, has now been formally indicted for criminal fraud. I’ve commented on the company before, and on the journalistic conventions around intellectuals that fostered her rise. But now that the Theranos story is coming to an end, I feel a need to comment on how utterly unnecessary this all was.

At its peak, Theranos was valued at $9 billion and employed 800 people. Yet according to John Carreyrou, the Wall Street Journal reporter whose investigations exposed Theranos’s fraud, the company is down to just 20 employees who are trying to close up shop.

All credit to Carreyrou, who by all accounts has done an excellent job investigating and reporting on this fiasco, but literally any statistician — anyone who has been through and understood a first-year statistics course — could have said from the start that this was sheer nonsense. That’s presumably why the board was made up mainly of politicians and generals.

The promise of Theranos was that they were going to revolutionise medicine by performing a hundred random medical tests on a drop of blood, and give patients a complete readout of their state of health, independent of medical recommendation of specific tests. But any statistician knows — and every medical practitioner should know — that the reason we don’t do lots of random tests without any specific indication isn’t that they’re too expensive — many aren’t — or that they require too much blood, but that the more tests you do, the more false positives you’re going to accumulate.

If you do a hundred tests on an average person, you’re going to find at least a few questionable results — either from measurement error, or because most tests aren’t all that specific — requiring followups and expensive investigations, and possibly unnecessary treatments.
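The accumulation of false positives is easy to check with a few lines of arithmetic. This is only a sketch: the specificities below are hypothetical round numbers (not figures for any real test), the function names are made up for illustration, and the tests are assumed to err independently of one another.

```python
# Hypothetical illustration: the specificities are invented round numbers.

def prob_any_false_positive(n_tests: int, specificity: float) -> float:
    """Chance that a healthy person gets at least one false positive,
    assuming the n_tests tests err independently."""
    return 1.0 - specificity ** n_tests


def expected_false_positives(n_tests: int, specificity: float) -> float:
    """Average number of false positives for a healthy person."""
    return n_tests * (1.0 - specificity)


if __name__ == "__main__":
    for spec in (0.99, 0.95):
        p_any = prob_any_false_positive(100, spec)
        mean_fp = expected_false_positives(100, spec)
        print(f"specificity {spec}: P(at least one) = {p_any:.3f}, "
              f"expected count = {mean_fp:.1f}")
```

Under these assumptions, even tests that are each 99% specific leave a healthy person with roughly a 63% chance of at least one false alarm across a hundred tests, and at 95% specificity a false alarm is close to certain.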

Of course, if I had to evaluate the proposal for such a company I would keep an open mind about the possibility of a conceptual breakthrough that would allow them to control the false positives. But I would have demanded very clear evidence and explanations. The fact that the fawning news reports back in 2013-15 raved about the genius new biomedical technology, and failed to even claim to have produced (or found) any innovative statistical methodology, made me pretty sure that they had no idea what they were doing. In the end, it turned out that the biomedical innovations were also fake, which I probably should have guessed. But if the greedhead generals — among them the current secretary of defense, who definitely should be questioned about this, and probably ought to resign — had asked a statistician, they could have saved a lot of people a lot of unpleasantness, and maybe helped save Elizabeth Holmes from herself.

Statistics, politics, and causal theories

A new headline from the Trump era:

Fewer Immigrants Are Reporting Domestic Abuse. Police Blame Fear of Deportation.

Compare it to this headline from a few months ago:

Arrests along Mexico border drop sharply under Trump, new statistics show

This latter article goes on to comment

The figures show a sharp drop in apprehensions immediately after President Trump’s election win, possibly reflecting the deterrent effect of his rhetoric on would-be border crossers.

It must be noted that these two interpretations of declining enforcement are diametrically opposed: In the first case, declining reports to police are taken as evidence of nothing other than declining reports, whereas the latter analysis eschews such a naive interpretation, suggesting that the decline in apprehensions is actually evidence of a decline in the number of offenses (in this case, illegal border crossings).

I don’t mean to criticise the conventional wisdom, which seems to me eminently sensible. I just think it’s interesting how little the statistical “facts” are able to speak for themselves. The same facts could mean that the election of Trump was associated with a decline in domestic violence in immigrant communities, and also with a reduction in border patrol effectiveness. It’s hard to come up with a causal argument for either of these — Did immigrant men look at Trump with revulsion and decide, abusing women is for the gringos? Did ICE get so caught up with the fun of splitting up families in midwestern towns and harassing Spanish speakers in Montana, that they stopped paying attention to the southern border? — so we default to the opposite conclusion.

Why Israel?

The Guardian has published an “exclusive” on the future of European science funding after Brexit. The key point:

A draft copy of the so-called Horizon Europe document, seen by the Guardian, suggests that the UK is set to be offered less generous access than countries with associate status in the current programme, known as Horizon 2020, including Israel, Turkey, Albania and Ukraine.

So why does the headline say

Brexit: UK may get poorer access than Israel to EU science scheme

Why Israel? If I had to pick a country on the list whose prominence in scientific research makes it seem insulting that they would have a higher priority in research collaboration than the UK, it might be Albania. It definitely wouldn’t be Israel. So might there be some other reason why The Guardian wants to highlight for its readers the shame of being treated worse than Israel by the EU?

Do loony leftists use the right-hand rule?

So Leave.EU is still active, and apparently last year they were soliciting a graphic to ridicule journalist Carole Cadwalladr:

As a mathematical scientist it strikes me as significant that she is considered to be discredited by association with three images: Flat Earth, Illuminati (though it looks to me like the Masonic eye from the US dollar bill), and what looks like a cheat sheet for an introductory electromagnetism course. Down in the corner we see that she’s been learning the right-hand rule for multiplying vectors. Right above it she has the formula for calculating power, which seems problematic.

Neanderthal science

I just listened to all of a two-hour discussion between journalist Ezra Klein and professional atheist Sam Harris, about Harris’s defense of the right-wing policy entrepreneur (as Matthew Yglesias has described him) Charles Murray, famous for his racist application of intelligence research to public policy, most famously in a notorious chapter of his book The Bell Curve. Klein pushes back effectively against Harris’s self-serving martyrdom — Harris, not unreasonably, identifies with the suffering of a wealthy and famous purveyor of quack science whose livelihood is ever-so-slightly harmed by public criticism* — but he doesn’t sufficiently engage, I think, with Harris’s contention that he is promoting the values of real science. Unfortunately, the “mainstream social science” that Harris and Murray are promoting exists only in secret messages from “reputable scientists in my inbox, who have totally taken my side in this, but who are too afraid to say so publicly”. Harris doesn’t allow for a second that there is any good-faith argument on the other side. Anyone who disagrees is merely trying to shut down scientific progress, or simply confusing scientific truth with do-gooding wishful thinking.

The truth of the matter is, Murray and other brave seekers of truth are doing the opposite of helping to clarify reality. They are wading into a swamp of confusion, and pulling out some especially stinky slime that they can hurl at disfavoured groups.

However much Harris tries to promote Murray as a pure-hearted “content-of-our-character” anti-racist individualist, “race” remains a potent social force as long as it exists as a social factor affecting people’s self-image, the communities they belong to, and the way they are perceived by others. When demographers argue that “race” isn’t “real”, they are saying that racial categories don’t separate natural clusters by genetic or physical traits. When Murray says, let’s stop talking about race, let’s talk about individual genetic endowments, he is saying that racial groupings have no causal effect on their own, but only label clusters whose differences arise from deep physical causes: wrong on both sides.

Why people hate statisticians

Andrew Dilnot, former head of the UK Statistics Authority and current warden (no really!) of Nuffield College, gave a talk here last week, at our annual event honouring Florence Nightingale qua statistician. The ostensible title was “Numbers and Public policy: Why statistics really matter”, but the title should have been “Why people hate statisticians”. This was one of the most extreme versions I’ve ever seen of a speaker shopping trite (mostly right-wing) political talking points by dressing them up in statistics to make the dubious assertions seem irrefutable, and to make the trivially obvious look ingenious.

I don’t have the slides from the talk, but video of a similar talk is available here. He spent quite a bit of his talk trying to debunk the Occupy Movement’s slogan that inequality has been increasing. The 90:10 ratio bounced along near 3 for a while, then rose to 4 during the 1980s (the Thatcher years… who knew?!), and hasn’t moved much since. Case closed. Oh, but wait, what about other measures of inequality, you may ask. And since you might ask, he had to set up some straw men to knock down. He showed the same pattern for five other measures of inequality. Case really closed.

Except that these five were all measuring the same thing, more or less. The argument people like Piketty have been making is not that the 90th percentile has been doing so much better than the 10th percentile, but that increases in wealth have been concentrated in ever smaller fractions of the population. None of the measures he looked at was designed to capture that process. The Gini coefficient, which looks like it measures the whole distribution because it is a population average, is actually extremely insensitive to extreme concentration at the high end. Suppose the top 1% has 20% of the income. Changes of distribution within the top 1% cannot shift the Gini coefficient by more than about 3% of its current value. He also showed the 95:5 ratio, and lo and behold, that kept rising through the 90s, then stopped. All consistent with the main critique of rising income inequality.
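The insensitivity of these measures to concentration at the very top can be illustrated with a toy calculation. This is a sketch, not a reconstruction of anything shown in the talk: the population below is invented (the bottom 99% have incomes rising linearly with rank and share 80% of the total, the top 1% share 20%), and the helper names are hypothetical. It compares an evenly split top 1% with one in which a single person takes nearly all of that 20%.

```python
def gini(incomes):
    """Gini coefficient via the standard rank-weighted formula."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def ratio_90_10(incomes):
    """Income at the 90th percentile divided by income at the 10th."""
    xs = sorted(incomes)
    n = len(xs)
    return xs[int(0.9 * n)] / xs[int(0.1 * n)]


def toy_population(concentrated, n=10_000):
    """Invented population: bottom 99% share 80 units, top 1% share 20."""
    n_top = n // 100
    n_rest = n - n_top
    scale = 80.0 / sum(range(1, n_rest + 1))
    rest = [scale * i for i in range(1, n_rest + 1)]
    if not concentrated:
        top = [20.0 / n_top] * n_top          # top 1% split evenly
    else:
        floor = rest[-1]                      # keep the others just at the 99% line
        top = [floor] * (n_top - 1)
        top.append(20.0 - sum(top))           # one person takes the remainder
    return rest + top


equal_top = toy_population(concentrated=False)
one_winner = toy_population(concentrated=True)
g_equal, g_winner = gini(equal_top), gini(one_winner)
relative_shift = (g_winner - g_equal) / g_equal

if __name__ == "__main__":
    print(f"Gini, even top 1%:       {g_equal:.4f}")
    print(f"Gini, one big winner:    {g_winner:.4f}")
    print(f"relative shift in Gini:  {relative_shift:.2%}")
    print(f"90:10 ratio unchanged:   "
          f"{ratio_90_10(equal_top) == ratio_90_10(one_winner)}")
```

In this toy population the 90:10 ratio does not move at all, since nobody below the 99th percentile is affected, and the Gini coefficient moves by well under the 3% bound mentioned above, even though the top 1% has gone from perfectly equal to almost maximally concentrated.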

Since he’s obviously not stupid, and obviously understands economics much better than I do, it’s hard to avoid thinking that this was all smoke and mirrors, intended to lull people to sleep about rising inequality, under the cover of technocratic expertise. It’s a well-known trick: Ignore the strongest criticism of your point of view, and give lots of details about weak arguments. Mathematical details are best. “Just do the math” is a nice slogan. Sometimes simple (or complex) calculations can really shed light on a problem that looks to be inextricably bound up with political interests and ideologies. But sometimes not. And sometimes you just have to accept that a political economic argument needs to be melded with statistical reasoning, and you have to be open about the entirety of the argument.

Small samples

New York Republican Representative Lee Zeldin was asked by reporter Tara Golshan how he felt about the fact that polls seem to show that a large majority of Americans — and even of Republican voters — oppose the Republican plan to reduce corporate tax rates. His response:

What I have come in contact with would reflect different numbers. So it would be interesting to see an accurate poll of 100 million Americans. But sometimes the polls get done of 1,000 [people].

Yes, that does seem suspicious, only asking 1,000 people… The 100 million people he has come in contact with are probably more typical.
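For anyone who shares the congressman's doubts, the textbook margin of error is a quick calculation. This sketch assumes a simple random sample from a much larger population and uses the usual normal approximation at 95% confidence; real polls involve weighting and non-response, which it ignores. The function name is made up for illustration.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p estimated
    from a simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000_000):
        print(f"n = {n:>11,}: +/- {100 * margin_of_error(n):.2f} percentage points")
```

With 1,000 respondents the margin comes out at about 3 percentage points, which is plenty to distinguish a large majority from a minority. Polling 100 million people would shrink the sampling error to almost nothing, but it would do nothing about a sample that is unrepresentative in the first place.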

Learning to count

The US espionage services promised last year to reveal roughly how many Americans had been illegally spied upon through “accidents” under the warrantless surveillance law, which is supposed to be restricted to communications by foreigners overseas.

Last month the promise was retracted.

“The NSA has made Herculean, extensive efforts to devise a counting strategy that would be accurate,” Dan Coats, a career Republican politician appointed by Republican President Donald Trump as the top U.S. intelligence official, testified to a Senate panel on Wednesday.

Coats said “it remains infeasible to generate an exact, accurate, meaningful, and responsive methodology that can count how often a U.S. person’s communications may be collected” under the law known as Section 702 of the Foreign Intelligence Surveillance Act.

So we’re supposed to believe that the NSA is capable of making brilliant use of the full depth of private communications to map out threats to US national security… but isn’t capable of counting them. Presumably these “Herculean” efforts involve a strong helping of Cretan Bull—-.

I am reminded of this passage in Der Mann ohne Eigenschaften [The Man without Qualities]:

Es gibt also in Wirklichkeit zwei Geistesverfassungen, die einander nicht nur bekämpfen, sondern die gewöhnlich, was schlimmer ist, nebeneinander bestehen, ohne ein Wort zu wechseln, außer daß sie sich gegenseitig versichern, sie seien beide wünschenswert, jede auf ihrem Platz. Die eine begnügt sich damit, genau zu sein, und hält sich an die Tatsachen; die andere begnügt sich nicht damit, sondern schaut immer auf das Ganze und leitet ihre Erkenntnisse von sogenannten ewigen und großen Wahrheiten her. Die eine gewinnt dabei an Erfolg, und die andere an Umfang und Würde. Es ist klar, daß ein Pessimist auch sagen könnte, die Ergebnisse der einen seien nichts wert und die der anderen nicht wahr. Denn was fängt man am Jüngsten Tag, wenn die menschlichen Werke gewogen werden, mit drei Abhandlungen über die Ameisensäure an, und wenn es ihrer dreißig wären?! Andererseits, was weiß man vom Jüngsten Tag, wenn man nicht einmal weiß, was alles bis dahin aus der Ameisensäure werden kann?!

Thus there are in fact two distinct mental types, which not only battle each other, but which, even worse, generally coexist side by side without ever exchanging a word, other than that they assure each other that they are each desirable in their own place. The one contents itself with being exact and keeping to the facts; the other is not content with that, always looking for the big picture, and derives its knowledge from so-called great eternal truths. The one gains success thereby, the other gains scope and value. Of course, a pessimist could always say, the results of the one have no value, while those of the other have no truth. After all, when the Last Judgment comes, what are we going to do with three treatises on formic acid, and so what if we even had thirty of them?! On the other hand, what could we possibly know about the Last Judgment, if we don’t even know what could happen by then with formic acid?!

The NSA is focusing on the big picture…

Exponential vernacular

Like most mathematicians, I think, I’m irritated by the way “grows exponentially” has come into common parlance as a synonym for “grows rapidly”, whereas exponential growth in mathematics may be fast or slow, depending on the current level of the quantity. This has even crossed into technical discussions, as when I heard a talk by a cancer expert who objected to standard claims that cancer mortality increases exponentially through adulthood (which it does) because the levels actually stay low through the 50s, and so only “increase exponentially” after that point.

Anyway, I was under the impression that the vernacular application of this mathematical concept was fairly recent. So I was intrigued to find the cognate concept of “growing geometric” popping up in Evan Thomas’s Nixon biography, on the Watergate tapes. In the context of cancer. Used correctly! It’s quite a famous part of Watergate lore, where John Dean refers to Watergate as a “cancer… close to the presidency”.

We have a cancer — within — close to the presidency, that’s growing. It’s growing daily. It’s compounding, it grows geometrically now, because it’s compounding.
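Dean's “compounding” is the mathematician's meaning: the growth in each period is proportional to the current level, so the same exponential curve looks slow while the level is low and fast once it is high. A small sketch makes the point. The starting level and the 10% annual rate are hypothetical, chosen only for illustration; they are not actual cancer mortality figures.

```python
import math


def exponential_level(t: float, start: float = 1.0, annual_rate: float = 0.10) -> float:
    """Level at time t of a quantity growing exponentially (hypothetical rate)."""
    return start * math.exp(annual_rate * t)


if __name__ == "__main__":
    for age in (30, 50, 70):
        level = exponential_level(age)
        one_year_gain = exponential_level(age + 1) - level
        print(f"age {age}: level {level:8.1f}, "
              f"gain over next year {one_year_gain:7.2f}, "
              f"relative gain {one_year_gain / level:.4f}")
```

The relative gain is the same at every age, which is what makes the growth exponential throughout, while the absolute gain is small early on and large later. That is why a curve can “stay low through the 50s” and still have been increasing exponentially all along.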

“A triumph of personal bias over research”

The Guardian reports on a new research study that finds that the overstretching of the NHS, particularly in the winter, caused about 30,000 excess deaths in 2015. The government’s response is practically Trumpian:

A DH spokesman described the study as “a triumph of personal bias over research”. He added: “Every year there is significant variation in reported excess deaths, and in the year following this study they fell by nearly 20,000, undermining any link between pressure on the NHS and the number of deaths. Moreover, to blame an increase in a single year on ‘cuts’ to the NHS budget is arithmetically impossible given that budget rose by almost £15bn between 2009-10 and 2014-15.”

Demeaning experts who bring unpleasant news is the primary tactic.

