Gender and the Metropolis (algorithm)

I’ve always heard that the Metropolis algorithm was invented for H-bomb calculations by Nicholas Metropolis and Edward Teller. But I was just looking at the original paper, and discovered that there are five authors: Metropolis, Rosenbluth, Rosenbluth, Teller, and Teller. It is particularly striking to see two repeated surnames, and a bit of research uncovers that these were two married couples: Arianna Rosenbluth and Marshall Rosenbluth, and Augusta Teller and Edward Teller. In particular, Arianna Rosenbluth (née Wright) appears to have been a formidable character, according to her Wikipedia page: she completed her physics PhD at Harvard at the age of 22.

In keeping with the 1950s conception of computer programming as women’s work, the two women were responsible for all the programming (a heroic undertaking in those pre-programming-language days, on the MANIAC I), and Arianna Rosenbluth in particular did all the programming for the final paper.
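As an aside, for readers who have never seen it: the algorithm itself is simple enough to sketch in a few lines of modern Python, a luxury the Rosenbluths and Tellers did not have. The Gaussian target and proposal width below are arbitrary illustrative choices, not anything from the original paper.

import math, random

def metropolis(log_density, x0, n_steps, step=1.0):
    # Random-walk Metropolis: propose a symmetric Gaussian step, then
    # accept with probability min(1, density(proposal)/density(current)).
    x, samples = x0, []
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, step)
        log_ratio = log_density(proposal) - log_density(x)
        if random.random() < math.exp(min(0.0, log_ratio)):
            x = proposal  # accept; otherwise keep the current state
        samples.append(x)
    return samples

# Illustration: sample from a standard normal via its unnormalised
# log-density -x**2/2; the sample mean should come out near zero.
draws = metropolis(lambda x: -x * x / 2.0, x0=0.0, n_steps=10_000)
print(sum(draws) / len(draws))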

And also in keeping with the expectations of the time, and more depressingly, according to the Wikipedia article: “After the birth of her first child, Arianna left research to focus on raising her family.”

Can’t look away

Many years ago I read to my daughter a children’s book in which a little girl learning to ride a bicycle keeps running into objects like trees and lampposts. A bicycle instructor explains to her that when you become too fixated on an obstacle it exerts a strong psychological pull, so that the very exigency of evading it leads to a crash.

I used to wonder whether this was a real phenomenon. I don’t anymore…

Actually, I’ve long thought the second Iraq War was an example of the same phenomenon. There was no possibility that there wouldn’t be a war, because once they’d started to consider it Bush and Blair couldn’t bear not to see how it would turn out.

The first principle of statistical inference

When I first started teaching basic statistics, I thought about how to explain the importance of statistical hypothesis testing. I focused on a textbook example (specifically, Freedman, Pisani, Purves Statistics, 3rd ed., sec 28.2) of a data set that seems to show more women being right-handed than men. I pointed out that we could think of many possible explanations: Girls are pressured more to conform, women are more rational — hence left-brain-centred. But before we invest too much time and credibility in abstruse theories to explain the phenomenon, we should first make sure that the phenomenon is real, that it’s not just the kind of fluctuation that could happen by accident. (It turns out that the phenomenon is real. I don’t know if either of my explanations is valid, or if anyone has a more plausible theory.)
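To make the principle concrete, here is a minimal sketch in Python of the kind of check I mean, using Fisher’s exact test on a two-by-two table. The counts are hypothetical stand-ins, not the textbook’s data.

from scipy.stats import fisher_exact

# Illustrative counts only; these are not the numbers from Freedman,
# Pisani and Purves. Rows: women, men; columns: right-, left-handed.
table = [[934, 66],
         [882, 118]]
odds_ratio, p_value = fisher_exact(table)
print(p_value)  # a small p-value suggests the difference is not mere chance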

I thought of this when I heard about the strange Oxford-AstraZeneca vaccine serendipity that was announced this week. It was the third vaccine success announced in as many weeks: the researchers reported about 70% efficacy, which is good, but not nearly as impressive as the 95% efficacy of the mRNA vaccines announced earlier in the month. The strange thing was that a subset of the test subjects, who received only a half dose at the first injection and a full dose later, showed 90% efficacy. Experts have been all over the news media trying to explain how some weird idiosyncrasy of the human immune system and the chimpanzee adenovirus vector could make a smaller dose more effective. Here’s a summary from Science:

Researchers especially want to know why the half-dose prime would lead to a better outcome. The leading hypothesis is that people develop immune responses against adenoviruses, and the higher first dose could have spurred such a strong attack that it compromised the adenovirus’ ability to deliver the spike gene to the body with the booster shot. “I would bet on that being a contributor but not the whole story,” says Adrian Hill, director of Oxford’s Jenner Institute, which designed the vaccine…

Some evidence also suggests that slowly escalating the dose of a vaccine more closely mimics a natural viral infection, leading to a more robust immune response. “It’s not really mechanistically pinned down exactly how it works,” Hill says.

Because the different dosing schemes likely led to different immune responses, Hill says researchers have a chance to suss out the mechanism by comparing vaccinated participants’ antibody and T cell levels. The 62% efficacy, he says, “is a blessing in disguise.”

Others have pointed out that the populations receiving the full dose and the half dose were substantially different: The half dose was given by accident to a couple of thousand subjects at the start of the British arm of the study. These were exclusively younger, healthier individuals, something that could also explain the higher efficacy, in a less benedictory fashion.

But before we start arguing over these very interesting explanations, much less trying to use them to “suss out the mechanisms”, the question we should be asking first is: is the effect real? The Science article quotes immunologist John Moore asking “Was that a real, statistically robust 90%?” To ask that question is to answer it resoundingly: No.

They haven’t provided much data, but the AstraZeneca press release does give enough clues:

One dosing regimen (n=2,741) showed vaccine efficacy of 90% when AZD1222 was given as a half dose, followed by a full dose at least one month apart, and another dosing regimen (n=8,895) showed 62% efficacy when given as two full doses at least one month apart. The combined analysis from both dosing regimens (n=11,636) resulted in an average efficacy of 70%. All results were statistically significant (p<=0.0001)

Note two tricks they play here. First, they give those big (n=…) numbers, which make it seem reassuringly as though they have an impressively large study. But these are the numbers of people vaccinated, which is completely irrelevant for judging the uncertainty in the estimate of efficacy. The reason you need such huge numbers of subjects is so that you can get moderately large numbers where it counts: among the subjects who become infected. Second, while it is surely true that the “results” were highly statistically significant (that is, the efficacy in each individual group was not zero), this tells us nothing about whether we can be confident that the efficacy is actually higher than what has been considered the minimum acceptable level of 50%, or, crucially for the point at issue here, whether the two groups were different from each other.

They report a total of 131 cases. They don’t say how many cases were in each group, but if we assume that there were equal numbers of subjects receiving the vaccine and the control in all groups then we can back-calculate the rest. We end up with 98 cases in the full-dose group (of which 27 received the vaccine) and 33 cases in the half-dose group, of which 3 received the vaccine. Just 33! Using the Clopper-Pearson exact method, we obtain 90% confidence intervals of (.781, .975) for the efficacy of the half dose and (.641, .798) for the efficacy of the full dose. Clearly some overlap there, and not much to justify drawing substantive conclusions from the difference between the two groups, which may actually be zero, or close to zero.
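For anyone who wants to check the arithmetic, here is a minimal sketch in Python of one standard way to do it, assuming 1:1 randomisation between vaccine and control arms: conditional on the total number of cases, the split between arms is binomial, and inverting an exact interval for that split gives an interval for the efficacy. The exact endpoints depend on the construction, so they may not match the numbers above precisely.

from scipy.stats import beta

def clopper_pearson(x, n, conf=0.90):
    # Exact (Clopper-Pearson) two-sided confidence interval for a
    # binomial proportion, from beta-distribution quantiles.
    alpha = 1.0 - conf
    lo = beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
    return lo, hi

def efficacy_interval(vaccine_cases, total_cases, conf=0.90):
    # Conditional on the total case count, the number of cases in the
    # vaccine arm is Binomial(total, theta) with theta = (1-e)/(2-e)
    # under 1:1 randomisation; invert to get an interval for efficacy e.
    lo_theta, hi_theta = clopper_pearson(vaccine_cases, total_cases, conf)
    to_efficacy = lambda t: (1 - 2 * t) / (1 - t)
    return to_efficacy(hi_theta), to_efficacy(lo_theta)

print(efficacy_interval(3, 33))    # half-dose group, point estimate 90%
print(efficacy_interval(27, 98))   # full-dose group, point estimate 62%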

The return of quota sampling

Everyone knows about the famous Dewey Defeats Truman headline fiasco, and that the Chicago Daily Tribune was inspired to its premature announcement by erroneous pre-election polls. But why were the polls so wrong?

The Social Science Research Council set up a committee to investigate the polling failure. Their report, published in 1949, listed a number of faults, and disparaged the very notion of trying to predict the outcome of a close election. But one important methodological criticism, the one that significantly influenced the later development of political polling and became the primary lesson in statistics textbooks, was the critique of quota sampling. (An accessible summary of lessons from the 1948 polling fiasco, by the renowned psychologist Rensis Likert, was published just a month after the election in Scientific American.)

Serious polling at the time was divided between two general methodologies: random sampling and quota sampling. Random sampling, as the name implies, works by attempting to select from the population of potential voters entirely at random, with each voter equally likely to be selected. This was still considered too theoretically novel to be widely used, whereas quota sampling had been established by Gallup since the mid-1930s. In quota sampling the voting population is modelled by demographic characteristics, based on census data, and each interviewer is assigned a quota of respondents to fill in each category: 51 women and 49 men, say, a certain number in the age range 21-34, or specific numbers in each “economic class”, of which Roper, for example, had five, one of which in the 1940s was “Negro”. The interviewers were allowed great latitude in filling their quotas, finding people at home or on the street.

In a sense, we have returned to quota sampling, in the more sophisticated version of “weighted probability sampling”. Since hardly anyone responds to a survey — response rates are typically no more than about 5% — there’s no way the people who do respond can be representative of the whole population. So pollsters model the population — or the supposed voting population — and reweight the responses they do get proportionately, according to demographic characteristics. If Black women over age 50 are thought to be equally common in the voting population as white men under age 30, but we have twice as many of the former as the latter, we count the responses of the latter twice as much as the former in the final estimates. It’s just a way of making a quota sample after the fact, without the stress of specifically looking for representatives of particular demographic groups.
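Here is a minimal sketch of that after-the-fact reweighting, with hypothetical cell names and counts chosen to match the example just given: two demographic cells assumed equally common in the voting population, one twice as common as the other among respondents.

# Hypothetical post-stratification example: two demographic cells assumed
# equally common in the voting population, sampled unevenly.
population_share = {"black_women_over_50": 0.5, "white_men_under_30": 0.5}
sample_counts = {"black_women_over_50": 200, "white_men_under_30": 100}
support = {"black_women_over_50": 0.80, "white_men_under_30": 0.40}  # candidate A

n = sum(sample_counts.values())
# Each cell's weight is its population share divided by its sample share,
# so under-represented cells count for more.
weights = {c: population_share[c] / (sample_counts[c] / n) for c in sample_counts}

estimate = sum(weights[c] * sample_counts[c] * support[c] for c in sample_counts) \
         / sum(weights[c] * sample_counts[c] for c in sample_counts)
print(estimate)  # each under-30 man now counts twice as much as each older woman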

Consequently, it has most of the deficiencies of a quota sample. The difficulty of modelling the electorate is one that has gotten quite a bit of attention in the modern context: We know fairly precisely how demographic groups are distributed in the population, but we can only theorise about how they will be distributed among voters at the next election. At the same time, it is straightforward to construct these theories, to describe them, and to test them after the fact. The more serious problem — and the one that was emphasised in the commission report in 1948, but has been less emphasised recently — is in the nature of how the quotas are filled. The reason for probability sampling is that taking whichever respondents are easiest to get — a “sample of convenience” — is sure to give you a biased sample. If you sample people from telephone directories in 1936 then it’s easy to see how they end up biased against the favoured candidate of the poor. If you take a sample of convenience within a small demographic group, such as middle-income people, then it won’t be easy to recognise how the sample is biased, but it may still be biased.

For whatever reason, in the 1930s and 1940s, within each demographic group the Republicans were easier for the interviewers to contact than the Democrats. Maybe they were just culturally more like the interviewers, so easier for them to walk up to on the street. And it may very well be that within each demographic group today Democrats are more likely to respond to a poll than Republicans. And if there is such an effect, it’s hard to correct for it, except by simply discounting Democrats by a certain factor based on past experience. (In fact, these effects can be measured in polling fluctuations, where events in the news lead one side or the other to feel discouraged, and to be less likely to respond to the polls. Studies have suggested that this effect explains much of the short-term fluctuation in election polls during a campaign.)

Interestingly, one of the problems that the commission found with the 1948 polling, one with clear relevance for the Trump era, was the failure to consider education as a significant demographic variable.

All of the major polling organizations interviewed more people with college education than the actual proportion in the adult population over 21 and too few people with grade school education only.

Exotic animal farming

I remember when people were muttering about Covid-19 being all the fault of the weird Chinese and their weird obsession with eating weird animals like pangolins.

So now we have a second version of Covid, one that may start a completely novel pandemic, and it comes from the weird Europeans and their weird obsession with wearing the fur of weird animals like minks. Apparently, it was well known that Covid was spreading widely among the minks, but the animals were too valuable to give up on, so they tried to get away with just culling the obviously sick ones. And now we can just hope that they can get the new plague out of Denmark under control before it becomes a second pandemic.

But the people who advocate just giving up on eating and wearing animals are still treated as something between dreamy mystics and lunatics…

Less than zero, part 2

In a long-ago post I wrote about how huge debts don’t make you poor, and illustrated this with the story of real-estate mogul Donald Trump. Negative large fortunes are closer to positive large fortunes than either is to zero. (I later had to correct my interpretation, on discovering that the counterintuitive behaviour of Trump’s creditors was largely a reflection of their involvement in money laundering.)

Now we learn from the New York Times that Trump has been paying $750 in federal income tax each year as president. Presumably that’s just an arbitrary number that he made up so that he could say it wasn’t zero. (Apparently even Trump has some limits to his explicit lying.)

But here’s the thing: $750 is probably worse than $0. People have been assuming he wasn’t paying taxes. It sounds like a general insult. $750 is too specific (as well as being too small). The number becomes a shorthand for his tax-dodging, as well as inviting people to compare their own tax bills to Trump’s.

This demonstrates again how absurdly miserly Donald Trump is, above and beyond his criminality. He had to choose an amount to pay purely for its symbolism, knowing he might need to tell average Americans how much he had paid. He could certainly have afforded an amount large enough that even Americans of modest means would not find it risible. At least four figures…

The opposite of a superficial lie

“The opposite of a fact is a falsehood. But the opposite of a profound truth may very well be another profound truth.”

Niels Bohr

The news media have gotten themselves tangled up, from the beginning of the Trump era, in the epistemological question of whether any statement can objectively be called a lie. Yes, Trump says things that are untrue, that contradict objectively known facts, but are they lies? Does he have the appropriate mens rea to lie, the intention to deceive, or is that just a partisan insult?

The opposite problem has gotten too little attention. Just because Donald Trump says something that corresponds to objective facts, one cannot infer that he is speaking the truth. (We don’t really have a word in English to correspond to the opposite of lie, in this dichotomy.) A good example is the controversy over Trump’s private and public comments on the incipient Coronavirus pandemic in February and March of this year. On February 7, 2020, Trump told Woodward

You just breathe the air and that’s how it’s passed. And so that’s a very tricky one. That’s a very delicate one. It’s also more deadly than even your strenuous flus.

This is quite an accurate statement, and also very different from what he was saying publicly. On February 10 he said, in a campaign speech,

I think the virus is going to be — it’s going to be fine.

And on February 26, in an official White House pandemic task force briefing:

The 15 [case count in the U.S.] within a couple of days is going to be down to close to zero. … This is a flu. This is like a flu.

When you see that someone has been saying one thing in public and something completely different in private, it’s natural to interpret the former as lying and the latter as the secret truth — particularly when, as in this case, the private statement is known to be, in fact, true, and the public statement false. And particularly when the speaker later says

I wanted to always play it down. I still like playing it down, because I don’t want to create a panic.

With Trump, though, this interpretation is likely false.

The thing is, while his statement of February 7 was true, he could not have known it was true. No one knew it was true. Any number of responsible public-health officials were making similar statements at the time. For example, Anthony Fauci on February 19:

Fauci doesn’t want people to worry about coronavirus, the danger of which is “just minuscule.” But he does want them to take precautions against the “influenza outbreak, which is having its second wave.”

“We have more kids dying of flu this year at this time than in the last decade or more,” he said. “At the same time people are worrying about going to a Chinese restaurant. The threat is (we have) a pretty bad influenza season, particularly dangerous for our children.”

And it’s not just Americans under the thumb of Trump. February 6, the day before Trump’s remark to Woodward, the head of the infectious disease clinic at a major Munich hospital, where some of the first German Covid-19 patients were being treated, told the press that “Corona is definitely not more dangerous than influenza,” and criticised the panic that was coming from exaggerated estimates of mortality rates.

Researchers were posting their data and models in real time, but there just wasn’t enough understanding possible then. This is the kind of issue where the secret information that a government has access to is of particularly limited value.

So how are we to interpret Trump’s statements? I think the key is that Trump is not a liar per se; he is a conman and a bullshitter, someone to whom the truth of his statements is completely irrelevant.

In early February he probably did receive a briefing where the possibility that the novel coronavirus was highly lethal and airborne was raised, along with the possibility that it was mild and would disappear on its own. In talking to the elite journalist Bob Woodward he delivered up the most frightening version, not because he believed it was true, but because it seemed most impressive, making him seem like the mighty keeper of dangerous secrets. When talking to the public he said something different, because he had other motives. It’s purely coincidence that what he said in private turned out to be true.

It would be poetic justice if Trump were to be damaged by the bad luck of having, for once, accidentally told the truth.

Jack and the Beehive

It suddenly struck me that the English word beanstalk and the German word Bienenstock (beehive) sound powerfully like cognates, even though they are not. There are quite a lot of faux amis between English and German, and they are usually cognate, even when the meanings are radically different, as between the English fabric and the German Fabrik (factory), or the English stuff and the German Stoff (fabric). They have a common root, from which they have evolved differently. Even the bizarre Gift, meaning “poison”, started out as something given, a dose of medicine (dosis likewise from the Greek root for “given”).

But beanstalk and Bienenstock are compound words whose parts each seem like they could be cognates, but are actually unrelated. That beans and bees are unrelated is unsurprising. It took me a bit of work to convince myself that stalk is etymologically unrelated to Stock, which is indeed cognate with the English stick. The roots are quite different: stalk comes from Old English stale, meaning a handle or part of a ladder; Stock was originally a branch or a tree stump, presumably then a stump that houses bees, either naturally or agriculturally.