The end of the Turing test

The Turing test has always had a peculiar place in the philosophy of mind. Turing’s brilliant insight was that we should be able to replace the apparently impossible task of developing a consensus definition of the words ‘machine’ and ‘think’, with a possibly simpler procedural definition: Can a machine succeed at the “Imitation game”, whose goal is to convince a neutral examiner that it (and not its human opponent) is the real human? Or, to frame it more directly — and this is how it tends to be interpreted — can a computer carry on a natural language conversation without being unmasked by an interrogator who is primed to recognise that it might not be human?

Turing’s argument was that, while it certainly is possible to be intelligent without passing the test — even humans may be intelligent while being largely or entirely nonverbal — we should be able to agree on some observable activities, short of being literally human in all ways, that would certainly suffice to persuade us that the attribution of human-like intelligence is warranted. The range of skills required to carry on a wide-ranging conversation makes that ability a plausible stand-in for what is now referred to as general intelligence. (The alert interrogator is a crucial part of this claim, as humans are famously gullible about seeing human characteristics reflected in simple chat bots, forces of nature, or even the moon.)

If we won’t accept any observable criteria for intelligence, Turing points out, then it is hard to see how we can justify attributing intelligence even to other humans. He specifically takes on, in his essay, the argument (which he attributes to a Professor Jefferson) that a machine cannot be intelligent merely because it performs certain tasks. Machine intelligence, Jefferson argued, is impossible because

No mechanism could feel (and not merely artificially signal, an easy contrivance) pleasure at its successes, grief when its valves fuse, be warmed by flattery, be made miserable by its mistakes, be charmed by sex, be angry or depressed when it cannot get what it wants.

Turing retorts that this leads to the solipsistic view that

the only way by which one could be sure that a machine thinks is to be the machine and to feel oneself thinking. One could then describe these feelings to the world, but of course no one would be justified in taking any notice. Likewise according to this view the only way to know that a man thinks is to be that particular man.

In principle everyone could doubt the content of everyone else’s consciousness, but “instead of arguing continually over this point it is usual to have the polite convention that everyone thinks.” Turing then goes on to give a famous passage, in which the computer engages in a dialogue about Shakespeare sonnets, Dickens, the seasons, and Christmas, concluding that “I think that most of those who support the argument from consciousness could be persuaded to abandon it rather than be forced into the solipsist position.” He compares it, charmingly, to a university oral exam, by which it is established that a student has genuinely understood the material, rather than being able simply to reproduce rote phrases mechanically.

I used to accept this argument, but reflecting on Chat-GPT has forced me to reconsider. This is a predictive text generation tool recently made available that can produce competent texts based on arbitrary prompts. It’s not quite ready to pass the Turing test*, but it’s easy to see how a successor program — maybe GPT-4, the version that is expected to be made available to the public next year — might. And it’s also clear that nothing like this software could be considered intelligent.

Thinking about why not helps to reveal flaws in Turing’s reasoning that were covered by his clever rhetoric. Turing specifically argues against judging the machine by its “disabilities”, or its lack of limbs, or its electronic rather than biological nervous system. This sounds very open-minded, but the inclination to assign mental states to fellow humans rather than to computers is not irrational. We know that other humans have similar mental architecture to our own, and so are not likely to be solving problems of intellectual performance in fundamentally different ways. Modern psychology and neurobiology have, in fact, shown this intuition to be occasionally untrue: apparently intelligent behaviours can be purely mechanical, and this is particularly true of calculation and language.

In this respect, GPT-3 may be seen as performing a kind of high-level glossolalia, or like receptive aphasia, where someone produces long strings of grammatical words, but devoid of meaning. Human brain architecture links the production of grammatical speech to representations of meaning, but these are still surprisingly independent mechanisms. Simple word associations can produce long sentences with little or no content. GPT-3 has much more complex associational mechanisms, but only the meanings that are implicit in verbal correlations. It turns out to be true that you can get very far — probably all the way to a convincing intellectual conversation — without any representation of the truth or significance of the propositions being formed.

It’s a bit like the obvious cheat that Turing referred to, “the inclusion in the machine of a record of someone reading a sonnet, with appropriate switching to turn it on from time to time”, but on a level and complexity that he could not imagine.

Chat-GPT does pass one test of human-like behaviour, though. It’s been programmed to refuse to answer certain kinds of questions. I heard a discussion where it was mentioned that it refused to give specific advice about travel destinations, responding with something like “I’m not a search engine. Try Google.” But when the query was changed to “Write a script in which the two characters are a travel agent and a customer, who comes with the following query…” it returned exactly the response that was being sought, with very precise information.

It reminds me of the Kasparov vs Deep Blue match in 1997, when a computer first defeated a world chess champion. The headlines were full of “human intelligence dethroned”, and so on. I commented at the time that it just showed that human understanding of chess had advanced to a point that we could mechanise it, and that I would consider a computer intelligent only when we have a program that is supposed to be doing accounting spreadsheets but instead insists on playing chess.

Neanderthals and women

The article seems to have good intentions, but this headline in today’s Guardian is the most sexist I’ve seen in some time. It sounds like the men were hard at work “creating language”, and some women helped out with some testing, and maybe brought snacks. Also some Neanderthals came by and lent a hand. And apes.

Finding the mitochondrial Na’ama

I was having a conversation recently about Biblical ancestry and the antediluvian generations, and it got me to thinking about how scientists sometimes like to use biblical references as attention-grabbing devices, without actually bothering to understand what they’re referring to — in this case, the so-called “mitochondrial Eve”. The expression was not used in the 1987 Nature paper that first purported to calculate the genealogical time back to the most recent common ancestor (MRCA) of all present-day humans in the female line, but it was central to publicity around the paper at the time, including in academic journals such as Science.

The term has come to be fully adopted by the genetics community, even while they lament the misunderstandings that it engenders among laypeople — in particular, the assumption that “Eve” must in some sense have been the first woman, or must have been fundamentally different from all the other humans alive at the time. The implication is that the smart scientists were making a valiant effort to talk to simple people in terms they understand, taking the closest approximation (Eve) to the hard concept (MRCA), and the simple bible-y people need to make an effort on their part to understand what they’re really talking about.

In fact, calling this figure Eve is a blunder, and it reveals a fundamental misunderstanding of the biblical narrative. Eve is genuinely a common ancestor of all humans, according to Genesis, but she is not the most recent in any sense, and suggesting otherwise is just confusing. The MRCA in the Bible is someone else, namely the wife of Noah. Appropriately, she is not named, but if we want a name for her, the midrashic Genesis Rabbah calls her Na’ama. She has other appropriate characteristics as well, that would lead people toward a more correct understanding. To begin with, she lived many generations after the first humans. She lived amid a large human population, but a catastrophic event led to a genetic bottleneck that only she and her family survived. (That’s not quite the most likely scenario, but it points in the right direction.) And perhaps most important — though this reflects the core sexism of the biblical story — there was nothing special about her. She just happened to be in the right place at the right time, namely, partnered with the fanatic boat enthusiast when the great flood happened.

The poisoned roots of German anti-vax sentiment

I’ve long thought it odd that Germany, where the politics is generally fairly rational, and science education in particular is generally quite good, has such broad acceptance of homeopathy and a variety of other forms of quackery, and a special word — Schulmedizin — “academic medicine” — to express a dismissive attitude toward what elsewhere would be called just “medicine”, or perhaps “evidence-based medicine”. I was recently looking into the history of this, and found that attacks on Schulmedizin — or “verjudete Schulmedizin” (jewified academic medicine) — were as much a part of the Nazi state science policy as “German mathematics” and “Aryan physics”.

Medicine in the Third Reich remained a weird mixture of modern virology and pseudo-scientific “racial hygiene”. The celebrated physician Erwin Liek wrote

Es ist mein Glaube, dass das deutsche Volk berufen ist, nach und nach eine ganz neue, rein deutsche Heilkunst zu entwickeln.
(It is my belief that the German people has a calling gradually to develop an entirely new, purely German art of healing.)

Liek was appealing for a synthesis of Schulmedizin with traditional German treatment. As with Aryan physics*, the Nazi state was careful not to push the healthy German understanding so far as to undermine important technology and industry. But the appeal to average people’s intuitive discomfort with modern science was a powerful propaganda tool that they couldn’t resist using, as in this 1933 cartoon “The vaccination” from Der Stürmer that shows an innocent blond Aryan mother uncomfortably watching her baby being vaccinated by a fiendish Jewish doctor. The caption reads “This puts me in a strange mood/Poison and Jews seldom do good.”

1933 Cartoon from Der Stürmer: Blond German mother looking concerned as a beastly Jewish doctor vaccinates her baby. Caption: "This puts me in a strange mood/Poison and Jews are seldom good."
1933 Der Stürmer cartoon “The vaccination”.

Today’s anti-vaxers fulminating against Schulmedizin and the Giftspritze (poison shot) are not necessarily being consciously anti-Semitic, but the vocabulary and the paranoid conspiracy thinking are surely not unconnected.

* Heisenberg was famously proud of having protected “Jewish physics” from being banned at his university, considering himself a hero for continuing to teach relativity theory, even while not objecting to the expulsion of the Jewish physicists, and agreeing not to attach their names to their work. Once when I was browsing in the science section of a Berlin bookstore in the early 1990s a man started chatting with me, telling me that he had worked for decades as a radio engineer in the GDR, and then going on to a long monologue apropos of nothing about how wonderful Heisenberg was, and how he had courageously defended German science during the Third Reich.

The first principle of statistical inference

When I first started teaching basic statistics, I thought about how to explain the importance of statistical hypothesis testing. I focused on a textbook example (specifically, Freedman, Pisani, Purves Statistics, 3rd ed., sec 28.2) of a data set that seems to show more women being right-handed than men. I pointed out that we could think of many possible explanations: Girls are pressured more to conform, women are more rational — hence left-brain-centred. But before we invest too much time and credibility in abstruse theories to explain the phenomenon, we should first make sure that the phenomenon is real, that it’s not just the kind of fluctuation that could happen by accident. (It turns out that the phenomenon is real. I don’t know if either of my explanations is valid, or if anyone has a more plausible theory.)

I thought of this when I heard about the strange Oxford-AstraZeneca vaccine serendipity that was announced this week. In the third vaccine success announced in as many weeks, the researchers reported that they had found about a 70% efficacy, which is good, but not nearly as impressive as the 95% efficacy of the mRNA vaccines announced earlier in the month. But the strange thing was, they found that a subset of the test subjects who received only a half dose at the first injection, and a full dose later, showed a 90% efficacy. Experts have been all over the news media trying to explain how some weird idiosyncrasies of the human immune system and the chimpanzee adenovirus vector could make a smaller dose more effective. Here’s a summary from Science:

Researchers especially want to know why the half-dose prime would lead to a better outcome. The leading hypothesis is that people develop immune responses against adenoviruses, and the higher first dose could have spurred such a strong attack that it compromised the adenovirus’ ability to deliver the spike gene to the body with the booster shot. “I would bet on that being a contributor but not the whole story,” says Adrian Hill, director of Oxford’s Jenner Institute, which designed the vaccine…

Some evidence also suggests that slowly escalating the dose of a vaccine more closely mimics a natural viral infection, leading to a more robust immune response. “It’s not really mechanistically pinned down exactly how it works,” Hill says.

Because the different dosing schemes likely led to different immune responses, Hill says researchers have a chance to suss out the mechanism by comparing vaccinated participants’ antibody and T cell levels. The 62% efficacy, he says, “is a blessing in disguise.”

Others have pointed out that the populations receiving the full dose and the half dose were substantially different: The half dose was given by accident to a couple of thousand subjects at the start of the British arm of the study. These were exclusively younger, healthier individuals, something that could also explain the higher efficacy, in a less benedictory fashion.

But before we start arguing over these very interesting explanations, much less trying to use them to “suss out the mechanisms”, the question they should be asking is: is the effect real? The Science article quotes immunologist John Moore asking “Was that a real, statistically robust 90%?” To ask that question is to answer it resoundingly: No.

They haven’t provided much data, but the AstraZeneca press release does give enough clues:

One dosing regimen (n=2,741) showed vaccine efficacy of 90% when AZD1222 was given as a half dose, followed by a full dose at least one month apart, and another dosing regimen (n=8,895) showed 62% efficacy when given as two full doses at least one month apart. The combined analysis from both dosing regimens (n=11,636) resulted in an average efficacy of 70%. All results were statistically significant (p<=0.0001)

Note two tricks they play here. First of all, they give those (n=big number) which makes it seem reassuringly like they have an impressively big study. But these are the numbers of people vaccinated, which is completely irrelevant for judging the uncertainty in the estimate of efficacy. The reason you need such huge numbers of subjects is so that you can get moderately large numbers where it counts: the number of subjects who become infected. Further, while it is surely true that the “results” were highly statistically significant — that is, the efficacy in each individual group was not zero — this tells us nothing about whether we can be confident that the efficacy is actually higher than what has been considered the minimum acceptable level of 50%, or — and this is crucial for the point at issue here — whether the two groups were different from each other.

They report a total of 131 cases. They don’t say how many cases were in each group, but if we assume that there were equal numbers of subjects getting the vaccine and the placebo in all groups then we can back-calculate the rest. We end up with 98 cases in the full-dose group (of which 27 received the vaccine) and 33 cases in the half-dose group, of which 3 received the vaccine. Just 33! Using the Clopper-Pearson exact method, we obtain 90% confidence intervals of (.781,.975) for the efficacy of the half dose and (.641, .798) for the efficacy of the full dose. Clearly some overlap there, and not much to justify drawing substantive conclusions from the difference between the two groups — which may actually be zero, or close to 0.
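
The intervals can be checked with a short sketch. It assumes, as above, equal-sized vaccine and placebo arms, so that efficacy is 1 − v/p for v cases in the vaccine arm and p in the placebo arm, and it takes the case counts (3 of 33, 27 of 98) from the back-calculation. The helper names are illustrative, and the exact binomial limits are found by plain bisection rather than with any particular statistics library. Each interval is reported on two scales: the placebo share of cases, which is where the quoted endpoints appear to sit, and efficacy itself, where the intervals come out wider but still overlap.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iters=60):
    """Bisection on [0, 1] for a function f that decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, level=0.90):
    """Exact two-sided interval for a binomial proportion, x successes in n."""
    alpha = 1 - level
    # Lower limit: the p at which P(X >= x) equals alpha/2.
    lower = 0.0 if x == 0 else _solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit: the p at which P(X <= x) equals alpha/2.
    upper = 1.0 if x == n else _solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper


def placebo_share_interval(vaccine_cases, total_cases, level=0.90):
    """Interval for the fraction of cases that fell in the placebo arm."""
    q_lo, q_hi = clopper_pearson(vaccine_cases, total_cases, level)
    return 1 - q_hi, 1 - q_lo


def efficacy_interval(vaccine_cases, total_cases, level=0.90):
    """Interval for efficacy 1 - v/p, assuming equal-sized arms."""
    q_lo, q_hi = clopper_pearson(vaccine_cases, total_cases, level)
    # With vaccine share q of the cases, v/p = q / (1 - q); efficacy falls as q rises.
    return 1 - q_hi / (1 - q_hi), 1 - q_lo / (1 - q_lo)


if __name__ == "__main__":
    groups = {"half dose": (3, 33), "full dose": (27, 98)}  # counts from the text
    for name, (v, n) in groups.items():
        share = placebo_share_interval(v, n)
        eff = efficacy_interval(v, n)
        print(f"{name}: point efficacy {1 - v / (n - v):.2f}, "
              f"placebo share ({share[0]:.3f}, {share[1]:.3f}), "
              f"efficacy ({eff[0]:.3f}, {eff[1]:.3f})")
```

On either scale the lower end of the half-dose interval falls below the upper end of the full-dose interval, which is the overlap that matters for the argument.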

Vaccine probabilities

From an article on the vaccine being developed by Robin Shattock’s group at Imperial College:

The success rate of vaccines at this stage of development is 10%, Shattock says, and there are already probably 10 vaccines in clinical trials, “so that means we will definitely have one”

It could be an exercise for a probability course:

  1. Suppose there are exactly 10 vaccines in this stage of development. What is the probability that one will succeed?
  2. Interpret “probably 10 vaccines” to mean that the number of vaccines in clinical trials is Poisson distributed with parameter 10. What is the probability that one will succeed?
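
As a rough check on both parts, here is a minimal sketch. It assumes that each vaccine succeeds independently with probability 0.1, and it reads “one will succeed” as “at least one”, which is the sense that “definitely” requires; the function names are illustrative. For the Poisson case it relies on the thinning property: if the number of candidates is Poisson with mean λ and each succeeds independently with probability p, then the number of successes is Poisson with mean λp.

```python
from math import exp


def p_at_least_one_fixed(n, p):
    """P(at least one success) among exactly n independent trials."""
    return 1 - (1 - p) ** n


def p_at_least_one_poisson(lam, p):
    """P(at least one success) when the number of trials is Poisson(lam).

    By thinning, the number of successes is Poisson(lam * p),
    so the chance of none at all is exp(-lam * p).
    """
    return 1 - exp(-lam * p)


if __name__ == "__main__":
    print(round(p_at_least_one_fixed(10, 0.1), 3))    # 1 - 0.9**10, about 0.651
    print(round(p_at_least_one_poisson(10, 0.1), 3))  # 1 - exp(-1), about 0.632
```

Either way the answer is a bit under two in three, some way short of “definitely”.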

Extra precision: Currency edition

I have commented before on the phenomenon where changing units turns an obviously approximate number into a weirdly precise one. Here is a new example, from the Guardian’s disturbing report on the mass slaughter of donkeys for the use of their hides in traditional Chinese medicine:

Since the booming skin trade has driven up donkey prices, owners struggle to replace their animals when they are stolen. The cost of a donkey in Kenya increased from £78 to £156 between 2016-19.

£78 seems like an oddly precise figure for what is surely a very diverse market in animals of varying qualities. Even weirder is that that precise figure precisely doubled in the period under consideration. Then it occurred to me: at current exchange rates £78 is about what you get when you convert the round number of US$100. So I’m going to hazard a guess that the reporter was told that the price had risen from around $100 to around $200, and simply converted it to pounds for the UK market without further comment.
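
The arithmetic behind that guess is easy to sketch. The rate of 0.78 pounds per dollar is an assumption, chosen because it is the rate the two figures imply; it is not a quoted market rate, and the function name is illustrative.

```python
GBP_PER_USD = 0.78  # assumed rate, implied by the £78 and £156 figures


def usd_to_gbp(usd, rate=GBP_PER_USD):
    """Convert a dollar amount to pounds, rounded to the nearest pound."""
    return round(usd * rate)


if __name__ == "__main__":
    for usd in (100, 200):
        print(f"${usd} -> £{usd_to_gbp(usd)}")
```

Two round dollar figures, one of them double the other, turn into two oddly precise pound figures that still stand in an exact two-to-one ratio.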

Moon over Brussels

Brexit secretary David Davis, June 2017:

Half of my task is running a set of projects that make the NASA moon shot look quite simple.

And now, soon-to-be-prime-minister-select Boris Johnson:

Boris Johnson: ‘can-do spirit’ can solve problem of Irish border

Favourite to be PM compares Brexit to mission to put astronauts on moon in 1969

There is no task so simple that government cannot overcomplicate if it doesn’t want to do it.

Brexit has gone in two years from being as complicated as the first moon landing to being… as easy as the first moon landing.

The nature vs. nurture debate: High Christology edition

At least since the late nineteenth century the social interpretation of biology — and of genetics in particular — has devolved repeatedly upon the nature–nurture dispute: To what extent are a human’s individual characteristics determined by a predetermined essence or nature (qualities they are born with, commonly identified with inheritance), and to what extent by nurture, the particulars of the physical and social environment in which they develop after birth? From one of the most interesting books I’ve read recently, Bart Ehrman’s How Jesus Became God, I learned that an analogous debate roiled the early Christian Church.

One of the key disputes among early followers of Jesus concerned the nature and meaning of Jesus’s divinity. At the extremes you had the “low” christology belief that Jesus was a wise man and preacher, of the same nature as any other human; and the “high” christology claim that Jesus was identical with the creator God of the Hebrew Scriptures, and only appeared to be human. (Perhaps even more extreme were the gnostic claims that Jesus was an even higher being than that nasty Yahweh, who obviously fucked up his one major task*, for possibly nefarious purposes.) In between were a range of beliefs that Jesus was entirely divine and entirely human. Ehrman points out that in the ancient Mediterranean world there were two “major ways” that it was believed possible for a human to be divine:

  • By adoption or exaltation. A human being… could be made divine by an act of God or a god…
  • By nature or incarnation. A divine being… could become human, either permanently or, more commonly, temporarily.

In other words, God by nurture or God by nature. Nurture is particularly emphasised in the Gospel of Mark, Nature in the Gospel of John. Reflecting the common prejudice in favour of “nature” as the more powerful, one typically thinks of incarnation as representing a more exalted view of Jesus. A Jesus who grew up as a human, and only in adulthood was adopted by God seems less genuinely godlike than one who is, so to speak, fruit of God’s loins — hence the virgin-birth story of Matthew and Luke.

One of the more fascinating novelties of Ehrman’s account is his elucidation of adoption customs in the Roman world, particularly as regards nobles and rulers. Of course, we know that Roman emperors commonly adopted heirs — most famously, Julius Caesar’s adopted son Octavian — but Ehrman explains how prevalent views of adoption were that today would be called progressive: Adoptive families are families by choice, so could be considered superior to the accidental biological families. An heir chosen by a great leader for the qualities he has demonstrated better incorporates and perpetuates the leader’s essence than his biological descendant.

Thus, a Christ nurtured by and ultimately adopted into the divine family by God after he had proved himself worthy is a more genuinely divine being than any merely so-to-speak genetically divine progeny, who might ultimately turn out to be a disappointment to his father.

* A classic joke with a gnostic perspective: A man goes to the tailor to order a new coat. The tailor fusses around taking measurements, asking exacting questions about the fabric, the cut, and so on. Having finished he names a price and tells the customer the jacket will be finished in three weeks. “Three weeks! The Lord created the whole world in just one week!”

The tailor shakes his head, picks up another recently completed coat, and beckons the man to come to the window. “One week you want? Look at the work here. The precision cuts. The minute stitching. The harmonious interplay of the parts. And now” gesturing out the window, “look at this world…”

The time lords

The European parliament has voted to stop the practice of switching clocks forward and backward every year, from 2021. I’ve long thought this practice rather odd. Imagine that a government were to pass a law stating that from April 1 every person must wake up one hour earlier than they habitually do, and go to sleep one hour earlier. All shops and businesses are required to open an hour earlier, and to close an hour earlier. The same goes for schools and universities, and the timing of private lessons and appointments must also be shifted. Obviously ridiculous, even tyrannical. The government has nothing to say about when I go to bed or wake up, or when my business is open. But because they enforce it through adjusting the clocks, which seem like an appropriate subject of regulation and standardisation, it is almost universally accepted.

But instead of praising this blow struck for individual freedom and against statist overreach, we have Tories making comments like this:

John Flack, the Conservative MEP for the East of England, said: “We’ve long been aware the EU wants too much control over our lives – now they want to control time itself. You would think they had other things to worry about without wanting to become time lords,” he said, in an apparent reference to the BBC sci-fi drama Doctor Who.

“We agreed when they said the clocks should change across the whole EU on an agreed day. That made sense – but this is a step too far,” Flack added. “I know that farmers in particular, all across the east of England, value the flexibility that the clock changes bring to get the best from available daylight.”

So, the small-government Tory thinks it’s a perfectly legitimate exercise of European centralised power to compel shopkeepers in Sicily and schoolchildren in Madrid to adjust their body clocks* in order to spare English farmers the annoyance of having to consciously adjust the clocktime when they get out of bed to tend to their harvest. But to rescind this compulsion, that is insufferably arrogant.

*Nor is this a harmless annoyance. Researchers have found a measurable increase in heart attacks — presumed attributable to reduced sleep — in the days following the spring clock shift. A much smaller decrease may accompany the autumn shift back.