Security theatre, WWII and today

Computer security researcher Chris Roberts has been banned by United Airlines for the offence of pointing out that the lax security in their onboard wifi systems could endanger the safety of the aircraft. At the same time, the airline insisted that

We are confident our flight control systems could not be accessed through techniques [Mr Roberts] described.

The only danger to the flight control systems, it turns out, was the researcher who informed them (via Twitter) of the security flaws.

This reminded me of the story Richard Feynman told about cracking safes for a lark at Los Alamos. One time he decided to needle a colonel he was visiting at Oak Ridge, who had just deposited some highly secret documents in an extra-heavy-duty safe, but one with the same easy-to-crack lock on it. He’d figured out that when the safe was left open, it was easy to pick up two of the three numbers of the combination by feel.

“The only reason you think they’re safe in there is because civilians call it a ‘safe’.”

The colonel furiously challenged him to open it up. This Feynman accomplished, in two minutes, though he pretended to need much longer, to distract from what an easy trick it was. After allowing some moments of astonishment, he decided to be responsible:

“Colonel, let me tell you something about these locks: When the door to the safe or the top drawer of the filing cabinet is left open, it’s very easy for someone to get the combination. That’s what I did while you were reading my report, just to demonstrate the danger. You should insist that everybody keep their filing cabinet drawers locked while they’re working, because when they’re open, they’re very, very vulnerable.”

The next time Feynman visited Oak Ridge, everyone wanted to keep him out of their offices. It seems the colonel’s response to the danger had been to make everyone change their combinations whenever Feynman had been in, or passed through, their office, which was a significant nuisance.

That was his solution: I was the danger. […] Of course, their filing cabinets were still left open while they were working.

Prenatal sex ratio

A paper that I’ve been involved with for a dozen years has finally been published. We bring together multiple data sets to show that the primary sex ratio — the ratio of boys to girls conceived — is 1, or very close to 1. Consequently, the fact that more boys than girls are born — the ratio is about 1.06 pretty universally, except where selective abortion is involved — implies that there must be a period in the first trimester when female embryos are more likely to miscarry than male embryos.
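The inference is only arithmetic. A back-of-the-envelope sketch in Python (round numbers, not the paper’s actual estimates):

    conception_ratio = 1.00   # boys per girl conceived (our finding)
    birth_ratio = 1.06        # boys per girl born, nearly universal

    # birth_ratio = conception_ratio * (male survival / female survival), so:
    print(conception_ratio / birth_ratio)   # ~0.943: female survival to
                                            # birth is ~94% of male survival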

This is one of those things that is unsurprising if you’re not an expert. The experts had developed something close to a consensus, based on very little evidence, that the sex ratio at conception was much higher, some saying it was as high as 2 (so that 2/3 of conceptuses would be male), with excess female mortality throughout gestation. (We know that male mortality is higher in the second half of pregnancy, and after that… forever.)

The paper has its problems, but I think it’s a useful contribution. It’s also the first time I’ve been involved in research that is of any interest to the general public. Several publications have expressed interest, and articles have already appeared online in two German magazines, including the general news magazine Der Spiegel.

Update: The Guardian too. This makes it interesting, in retrospect, that we had such a hard time getting a journal even to be willing to review it. One said it was too specialised.

Correct me, Lord, but in moderation…

Jeremiah 10:24.

Accounts of error-correcting codes always start with the (3,1)-repetition code: transmit three copies of each bit, and let them vote, going with the two-out-of-three majority when there is disagreement. Apparently this code has been in use for longer than anyone had realised, to judge by this passage from the Jerusalem Talmud:

Three scrolls [of the Torah] did they find in the Temple courtyard. In one of these scrolls they found it written “The eternal God is your dwelling place (maon)”. And in two of the scrolls it was written, “The eternal God is your dwelling place (meonah)”. They confirmed the reading found in the two and abrogated the other.

In one of them they found written “They sent the little ones of the people of Israel”. And in the two it was written, “They sent young men…”. They confirmed the two and abrogated the other.

In one of them they found written “he” [written in the feminine spelling] nine times, and in two they found it written that way eleven times. They confirmed the reading found in the two and abrogated the other. (tractate Ta’anit 4:2, trans. Jacob Neusner)
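For the record, the rabbis’ procedure is precisely the decoder for the (3,1)-repetition code. A minimal sketch in Python:

    from collections import Counter

    def encode(symbol):
        """(3,1)-repetition code: transmit three copies."""
        return [symbol, symbol, symbol]

    def decode(copies):
        """Majority vote: confirm the reading found in two of the
        three copies, and abrogate the other."""
        return Counter(copies).most_common(1)[0][0]

    assert decode([1, 1, 0]) == 1   # one corrupted copy is corrected
    assert decode(["maon", "meonah", "meonah"]) == "meonah"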

(h/t Masorti Rabbi Jeremy Gordon, who alluded to this passage in an inter-denominational panel discussion yesterday at the OCHJS. He was making a different point, which for some reason had very little to do with information theory.)

In the still of the night

I just read a popular book on chemical elements, The Disappearing Spoon by Sam Kean. It was very entertaining, and seemed quite credible and clear, even if slightly fuzzy on the quantum mechanics. There was one claim that I took exception to, though:

The length of a day is slowly increasing because of the sloshing of ocean tides, which drag and slow earth’s rotation. To correct for this, metrologists slip in a “leap second” about every third year, usually when no one’s paying attention, at midnight on December 31.

Is there any time in the year when people are paying more attention to the time, exact to the second, than precisely at midnight on December 31? Does he think people would notice an extra second if it were interpolated at noon on July 7? I always assumed they did it at the one time of year when people would notice an extra second, precisely because they want it to be noticed. “Never fear, humble citizens. While you sleep, we are looking after your time.”

Up is down

I mentioned in an earlier post George Lakoff’s work on metaphorical language. One fascinating issue is the way the same metaphorical target can be mapped onto by multiple conceptual domains, and sometimes these can come into conflict — or a metaphor can come into conflict with the literal meaning of the target. When the figurative-literal conflict is particularly succinct, it tends to be called “oxymoron”. One of my favourites is the 1970s novel, and subsequent film, about a burning skyscraper, called The Towering Inferno.

This particular one depends on the conflict between the “UP is GOOD, DOWN is BAD” metaphor (an indirect form of it, since it goes by way of DOWN is BAD is HELL is BURNING) and the literally towering skyscraper. Anyway, the UP-DOWN dichotomy gets used a lot, creating lots of potential confusion. For example, UP is DIFFICULT and DOWN is EASY, inspiring the famous allegory of Hesiod that inspired so many devotional images:

Vice is easy to have; you can take it by handfuls without effort. The road that way is smooth and starts here beside you, but between us and virtue, the immortals have put what will make us sweat. The road to virtue is long and steep uphill, hard climbing at first.

Hence the uncertainty of the phrase “Everything’s going downhill.” Is it getting worse, or getting easier?

There is a triple ambiguity when numbers get involved. LARGE NUMBERS are UP (“higher numbers”, “lower numbers”) when we are counting the floors of a building, but SMALL NUMBERS are UP when ranking (#1 is the winner and comes at the top of the list).

This brings us to the example that inspired this post. The BBC news web site this morning told us that “A&E waiting times in England have fallen to their worst level for a decade.” It’s hard to feel much sense of urgency about the fact that waiting times have “fallen”.

[Screenshot: BBC News website headline, morning]


Presumably that’s why the text had changed in the afternoon:

[Screenshot: BBC News website headline, afternoon]

Innumeracy: UK prison service edition

The BBC reports on a study by the Prisoners Education Trust of the impact of the prison service’s recent decision to limit prisoners’ access to books. The Ministry of Justice has dismissed the study, saying

the PET survey of 343 inmates represented just 0.01% of the total prison population in England and Wales.

This is a twofer, a pair of errors packed into an impressively small space. Even a government minister should be able to calculate that if 343 inmates represent 0.01% of the prison population, then the prison population must be 3.43 million, more than 6% of the population of England and Wales (53.5 million), which I don’t need to check any figures to know must be wrong. But I did check, and found that the Ministry of Justice made a wee error of not quite two orders of magnitude. According to this publication (coincidentally, also from the Ministry of Justice) there were about 84,000 prisoners in June 2013. Assuming there haven’t been any huge changes since then, those 343 inmates in fact represent about 0.4% of the prison population. Where is Michael Gove when you need him?
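The two checks fit in a few lines (using the Ministry’s own figures):

    claimed_fraction = 0.0001          # the Ministry's "0.01%"
    sample = 343

    print(sample / claimed_fraction)   # 3,430,000 implied prisoners:
                                       # over 6% of England and Wales's
                                       # 53.5 million people

    prisoners = 84_000                 # the MoJ's own June 2013 figure
    print(sample / prisoners)          # ~0.004, i.e. 0.4%, not 0.01%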

More generally, the comment conveyed the impression that if the sample were a small fraction of the population then it couldn’t be statistically valid. Of course, that’s not true. If you were doing an election poll of the whole population of England, a random sample of 0.01% of the population would be about 5,000 people, which is much larger than most surveys, and enough to get a result accurate to within about ±1.5%. The real problem with this survey is that it’s not a random sample, and not representative, being self-selected among readers of a certain magazine; but there is no pretence otherwise, and if the Ministry of Justice were interested in addressing the issue rather than issuing talking points, they could ask whether the concerns raised by the segment of the prisoner population most engaged with reading are worth taking seriously.
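For anyone who wants to check the ±1.5%, here is the standard worst-case calculation (a sketch, treating the poll as a simple random sample):

    from math import sqrt

    n = 5_000    # roughly 0.01% of England's population
    p = 0.5      # worst case for the standard error of a proportion
    margin = 1.96 * sqrt(p * (1 - p) / n)   # half-width of a 95% interval
    print(f"±{margin:.1%}")                 # ±1.4%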


“Strong support for Johnson”

According to The Guardian, people want Boris Johnson to be the next leader of the Conservatives. They don’t say it explicitly, but they suggest that “next” means, like, tomorrow, and not after the next election. After citing a poll finding that 29% of voters want Johnson to be the next Tory leader (are those Conservative supporters? I might want Johnson to be the next Tory leader because I think he’ll lead his party to disaster…), they write

The strong support for Johnson feeds into the party standings. The poll finds that Labour’s seven-point lead would fall to three points if he led the Tories. The Tories would see their support increase by three points under a Johnson premiership to 34% while Labour would see its support fall by one point to 37%. Johnson would also hit support for Ukip, which would see its support fall by two points to 8%.

Before the Tories dump Cameron, they might want to check whether this 3% boost is statistically robust. This looks like an elementary statistics exercise, but it’s not quite so simple. If D is the Tory support under Cameron, and B the Tory support under Johnson, then B − D might be expected to be about 3%. But how confident should we be that Johnson is really better than Cameron? Unfortunately, we can’t know that without knowing the correlations: in this case, that means we need to know how many people supported the Tories only with Cameron, how many supported them only with Johnson, and how many supported them with either leader.
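To make this concrete, here is the paired calculation with made-up crosstabs (the poll’s actual breakdown wasn’t published, and the sample size of 1,000 is my assumption):

    from math import sqrt

    n = 1_000            # assumed poll size
    cameron_only = 20    # Tory under Cameron but not under Johnson
    johnson_only = 50    # Tory under Johnson but not under Cameron
    # (chosen so that, with 290 supporting the Tories under either leader,
    #  support goes 31% -> 34%, matching the published numbers)

    p10, p01 = cameron_only / n, johnson_only / n
    boost = p01 - p10                        # the 3-point Johnson boost
    se = sqrt((p01 + p10 - boost**2) / n)    # SE of a paired difference
    print(f"{boost:.1%} ± {1.96 * se:.1%}")  # 3.0% ± 1.6%

The same 3-point boost produced by more switching in both directions would come with a wider interval; that is exactly the information the published figures leave out.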

Percents are hard

Some really bad science reporting from the BBC. They report on a new study finding the incidence of diagnosed coeliac disease increasing in the UK (and the incidence of dermatitis herpetiformis decreasing, though this doesn’t rate a mention). Diagnoses have gone up from 5.2 to 19.1 per 100,000 in about 20 years, which they attribute to increased awareness. Except they don’t say what that is 100,000 of. You have to go back to the original article to see that it is person-years, and that they are talking about incidence, and not prevalence (in technical parlance); the BBC use the word “rate”, which is pretty ambiguous, and commonly used — particularly in everyday speech — to refer to prevalence.

If you read it casually — and, despite being a borderline expert in the subject, I misread it at first myself — you might think they mean that 19 in 100,000 of the population of Britain suffer from coeliac disease; that would be about 12,000 people, hardly enough to explain the condition’s cultural prominence (and prominent placement on the BBC website). In fact, the authors estimate that about 150,000 people have diagnosed CD in the UK.
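The two readings differ by more than an order of magnitude. In round figures (taking the UK population as 64 million):

    uk_population = 64_000_000
    rate = 19.1 / 100_000     # new diagnoses per person-YEAR: incidence

    # Misread as prevalence, it gives an absurdly small number of coeliacs:
    print(rate * uk_population)      # ~12,200 people

    # The study's actual estimate of diagnosed prevalence:
    print(150_000 / uk_population)   # ~0.0023: the "0.25%" quoted below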

As if aiming maximally to compound the confusion, they quote one of the authors saying

“This [increase] is a diagnostic phenomenon, not an incidence phenomenon. It is exactly what we had anticipated.”

In the article they (appropriately) refer to the rate of diagnosis as incidence, but here they say it’s not about “incidence”.

To make matters worse, they continue with this comment:

Previous studies have suggested around 1% of the population would test positive for the condition – but the data from this study suggests only 0.25% are diagnosed.

I think that, normally, “only x% are diagnosed” would be meant relative to the number of cases; here that would be 0.25% of the 1%. But, in fact, they mean to compare the 0.25% of the population who are diagnosed with the 1% who actually suffer from the disease.
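Spelled out, the intended comparison is the ratio of two population shares:

    share_positive = 0.01       # ~1% of the population would test positive
    share_diagnosed = 0.0025    # ~0.25% of the population are diagnosed

    print(share_diagnosed / share_positive)   # 0.25: only about a quarter
                                              # of coeliacs are diagnosed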

Killing the braces in order (technical)

I just noticed a funny tic that I have in programming (or writing LaTeX, which is sort of similar): when I remove brackets, I always look to remove a matched pair, even when the candidate brackets are functionally equivalent, and even when it costs extra effort. For example, I have a text in LaTeX that goes something like

$\frac{1}{e^{x}}$.

I decide to change it to $e^{-x}$. So I remove the \frac{1} and the open brace, type the minus, and am left with $e^{-x}}$, my cursor right at the x. The simplest thing (measured in keystrokes) would be to shift over and remove the brace directly following the x. But there is mental resistance to removing the “wrong” brace, which I resolve by making the extra keystrokes and removing the final brace, the one that actually matched the brace I deleted. Not a big deal, but it occurs to me that there’s probably interesting work to be done (or already being done) on the psychology of programming, and on the conflicts between the logical deep structure that programmers need to work with and the surface structure that is all the computer gets to work with.
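The “logical deep structure” is real enough: the braces are paired, even though the pairing is invisible in the linear character stream. A toy matcher in Python (for illustration only; this isn’t how any real editor does it) finds the partner my fingers insist on deleting:

    def matching_brace(text, open_index):
        """Return the index of the brace closing text[open_index]."""
        assert text[open_index] == "{"
        depth = 0
        for i in range(open_index, len(text)):
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
                if depth == 0:
                    return i
        raise ValueError("unbalanced braces")

    s = r"\frac{1}{e^{x}}"
    print(matching_brace(s, 8))   # 14: the partner of the second open brace
                                  # is the FINAL brace, not the one at 13
                                  # directly following the x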

Identifiability

A hot topic in statistics is the problem of anonymisation of data. Medical records clearly contain highly sensitive, private information. But if I extract just the blood pressure measurements for purposes of studying variations in blood pressure over time, it’s hard to see any reason for keeping those data confidential.

But what happens when you want to link up the blood pressure with some sensitive data (current medications, say), and look at the impact of local pollution, so you need at least some sort of address information? You strip out the names, of course, but is that enough? There may be only one 68-year-old man living in a certain postcode. It could turn into one of those logic puzzles where you are told that Mary likes cantaloupe and has three tattoos, while John takes cold baths and dances samba, along with a bunch of other clues, and by putting it all together in an appropriate grid you can determine that Henry is adopted and it’s Sarah’s birthday. Some sophisticated statistical work, particularly in the peculiar field of algebraic statistics, has gone into defining the conditions under which there can be hidden relations among the data that would allow individuals to be identified with high probability.
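The most basic version of the check is easy to sketch (made-up records; the algebraic-statistics work concerns far subtler hidden relations than this):

    from collections import Counter

    # "Anonymised" records: no names, just quasi-identifiers.
    records = [
        ("M", 68, "OX2 6GG"),
        ("F", 34, "OX2 6GG"),
        ("F", 34, "OX2 6GG"),
        ("M", 41, "OX1 3LB"),
    ]

    counts = Counter(records)
    print([r for r in records if counts[r] == 1])
    # -> the 68-year-old man and the 41-year-old are each the ONLY match
    #    for their (sex, age, postcode): identifiable despite the missing name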

I thought of this careful and subtle body of work when I read this article about private-sector mass surveillance of automobile license plates — another step in the Cthulhu-ised correlations of otherwise innocuous information that modern information technology is enabling. Two companies are suing the state of Utah to block a law that prevents them from using their own networks of cameras to record who is travelling where, and when, and to use that information for blackmail market research.

The Wall Street Journal reports that DRN’s own website boasted to its corporate clients that it can “combine automotive data such as where millions of people drive their cars … with household income and other valuable information” so companies can “pinpoint consumers more effectively.” Yet, in announcing its lawsuit, DRN and Vigilant argue that their methods do not violate individual privacy because the “data collected, stored or provided to private companies (and) to law enforcement … is anonymous, in the sense that it does not contain personally identifiable information.”

So, in their representation, data are suitably anonymised as long as they don’t actually include the name and address. We’re just tracking vehicles. Could be anyone inside… We’re just linking the vehicles up with their households’ incomes. Presumably they’re going to target ads for high-grade oil and new tyres at those cars, or something.