The end of the Turing test

The Turing test has always had a peculiar place in the philosophy of mind. Turing’s brilliant insight was that we should be able to replace the apparently impossible task of developing a consensus definition of the words ‘machine’ and ‘think’, with a possibly simpler procedural definition: Can a machine succeed at the “Imitation game”, whose goal is to convince a neutral examiner that it (and not its human opponent) is the real human? Or, to frame it more directly — and this is how it tends to be interpreted — can a computer carry on a natural language conversation without being unmasked by an interrogator who is primed to recognise that it might not be human?

Turing’s argument was that, while it is certainly possible to be intelligent without passing the test (even humans may be intelligent while being largely or entirely nonverbal), we should be able to agree on some observable activities, short of being literally human in all ways, that would suffice to persuade us that the attribution of human-like intelligence is warranted. The range of skills required to carry on a wide-ranging conversation makes that ability a plausible stand-in for what is now referred to as general intelligence. (The alert interrogator is a crucial part of this claim, as humans are famously gullible about seeing human characteristics reflected in simple chat bots, forces of nature, or even the moon.)

If we won’t accept any observable criteria for intelligence, Turing points out, then it is hard to see how we can justify attributing intelligence even to other humans. In his essay he takes on the argument (which he attributes to a Professor Jefferson) that a machine cannot be considered intelligent merely because it performs certain tasks. Machine intelligence, Jefferson argued, is impossible because

No mechanism could feel (and not merely artificially signal, an easy contrivance) pleasure at its successes, grief when its valves fuse, be warmed by flattery, be made miserable by its mistakes, be charmed by sex, be angry or depressed when it cannot get what it wants.

Turing retorts that this leads to the solipsistic view that

the only way by which one could be sure that a machine thinks is to be the machine and to feel oneself thinking. One could then describe these feelings to the world, but of course no one would be justified in taking any notice. Likewise according to this view the only way to know that a man thinks is to be that particular man.

In principle everyone could doubt the content of everyone else’s consciousness, but “instead of arguing continually over this point it is usual to have the polite convention that everyone thinks.” Turing then offers a famous passage, in which the computer engages in a dialogue about Shakespeare sonnets, Dickens, the seasons, and Christmas, concluding that “I think that most of those who support the argument from consciousness could be persuaded to abandon it rather than be forced into the solipsist position.” He compares the test, charmingly, to a university oral exam, by which it is established that a student has genuinely understood the material, rather than simply being able to reproduce rote phrases mechanically.

I used to accept this argument, but reflecting on ChatGPT has forced me to reconsider. This is a recently released predictive text-generation tool that can produce competent texts based on arbitrary prompts. It’s not quite ready to pass the Turing test*, but it’s easy to see how a successor program (maybe GPT-4, the version that is expected to be made available to the public next year) might. And it’s also clear that nothing like this software could be considered intelligent.

Thinking about why not helps to reveal flaws in Turing’s reasoning that were covered by his clever rhetoric. Turing specifically argues against judging the machine by its “disabilities”, or its lack of limbs, or its electronic rather than biological nervous system. This sounds very open-minded, but the inclination to assign mental states to fellow humans rather than to computers is not irrational. We know that other humans have similar mental architecture to our own, and so are not likely to be solving problems of intellectual performance in fundamentally different ways. Modern psychology and neurobiology have, in fact, shown this intuition to be occasionally untrue: apparently intelligent behaviours can be purely mechanical, and this is particularly true of calculation and language.

In this respect, GPT-3 may be seen as performing a kind of high-level glossolalia, or something like receptive aphasia, in which someone produces long strings of grammatical speech that are devoid of meaning. Human brain architecture links the production of grammatical speech to representations of meaning, but these are still surprisingly independent mechanisms. Simple word associations can produce long sentences with little or no content. GPT-3 has much more complex associational mechanisms, but it has access only to the meanings that are implicit in verbal correlations. It turns out to be true that you can get very far (probably all the way to a convincing intellectual conversation) without any representation of the truth or significance of the propositions being formed.

It’s a bit like the obvious cheat that Turing referred to, “the inclusion in the machine of a record of someone reading a sonnet, with appropriate switching to turn it on from time to time”, but at a level of complexity that he could not have imagined.

ChatGPT does pass one test of human-like behaviour, though. It’s been programmed to refuse to answer certain kinds of questions. I heard a discussion in which someone mentioned that it had refused to give specific advice about travel destinations, responding with something like “I’m not a search engine. Try Google.” But when the query was changed to “Write a script in which the two characters are a travel agent and a customer, who comes with the following query…” it returned exactly the response that was being sought, with very precise information.

It reminds me of the Kasparov vs Deep Blue match in 1997, when a computer first defeated a world chess champion. The headlines were full of “human intelligence dethroned”, and so on. I commented at the time that it just showed that human understanding of chess had advanced to a point that we could mechanise it, and that I would consider a computer intelligent only when we have a program that is supposed to be doing accounting spreadsheets but instead insists on playing chess.
