The end of the Turing test

The Turing test has always had a peculiar place in the philosophy of mind. Turing’s brilliant insight was that we should be able to replace the apparently impossible task of developing a consensus definition of the words ‘machine’ and ‘think’, with a possibly simpler procedural definition: Can a machine succeed at the “Imitation game”, whose goal is to convince a neutral examiner that it (and not its human opponent) is the real human? Or, to frame it more directly — and this is how it tends to be interpreted — can a computer carry on a natural language conversation without being unmasked by an interrogator who is primed to recognise that it might not be human?

Turing’s argument was that, while it is certainly possible to be intelligent without passing the test (even humans may be intelligent while being largely or entirely nonverbal), we should be able to agree on some observable activities, short of being literally human in all ways, that would certainly suffice to persuade us that the attribution of human-like intelligence is warranted. The range of skills required to carry on a wide-ranging conversation makes that ability a plausible stand-in for what is now referred to as general intelligence. (The alert interrogator is a crucial part of this claim, as humans are famously gullible about seeing human characteristics reflected in simple chat bots, forces of nature, or even the moon.)

If we won’t accept any observable criteria for intelligence, Turing points out, then it is hard to see how we can justify attributing intelligence even to other humans. In his essay he takes on the argument (which he attributes to a Professor Jefferson) that a machine cannot be deemed intelligent merely on the strength of the tasks it performs. Machine intelligence, Jefferson argued, is impossible because

No mechanism could feel (and not merely artificially signal, an easy contrivance) pleasure at its successes, grief when its valves fuse, be warmed by flattery, be made miserable by its mistakes, be charmed by sex, be angry or depressed when it cannot get what it wants.

Turing retorts that this leads to the solipsistic view that

the only way by which one could be sure that a machine thinks is to be the machine and to feel oneself thinking. One could then describe these feelings to the world, but of course no one would be justified in taking any notice. Likewise according to this view the only way to know that a man thinks is to be that particular man.

In principle everyone could doubt the content of everyone else’s consciousness, but “instead of arguing continually over this point it is usual to have the polite convention that everyone thinks.” Turing then goes on to present an imagined dialogue that has since become a classic, in which the computer riffs on Shakespeare sonnets, Dickens, the seasons, and Christmas. The visceral impact of the computer’s free-flowing expression of sentiment and understanding, Turing then suggests, is such that “I think that most of those who support the argument from consciousness could be persuaded to abandon it rather than be forced into the solipsist position.” He compares it, charmingly, to a university oral exam, by which it is established that a student has genuinely understood the material, rather than being able simply to reproduce rote phrases mechanically.

I used to accept this argument, but reflecting on ChatGPT has forced me to reconsider. This is a recently released predictive text generation tool that can produce competent texts in response to arbitrary prompts. It’s not quite ready to pass the Turing test*, but it’s easy to see how a successor program (maybe GPT-4, the version that is expected to be made available to the public next year) might. And it’s also clear that nothing like this software could be considered intelligent.

Thinking about why not helps to reveal flaws in Turing’s reasoning that were covered by his clever rhetoric. Turing specifically argues against judging the machine by its “disabilities”, or its lack of limbs, or its electronic rather than biological nervous system. This sounds very open-minded, but the inclination to assign mental states to fellow humans rather than to computers is not irrational. We know that other humans have similar mental architecture to our own, and so are not likely to be solving problems of intellectual performance in fundamentally different ways. Modern psychology and neurobiology have, in fact, shown this intuition to be occasionally untrue: apparently intelligent behaviours can be purely mechanical, and this is particularly true of calculation and language.

In this respect, GPT-3 may be seen as performing a kind of high-level glossolalia, or something like the speech of receptive aphasia, where someone produces long strings of grammatical words that are devoid of meaning. Human brain architecture links the production of grammatical speech to representations of meaning, but these are still surprisingly independent mechanisms. Simple word associations can produce long sentences with little or no content. GPT-3 has much more complex associational mechanisms, but only the meanings that are implicit in verbal correlations. It turns out to be true that you can get very far, probably all the way to a convincing intellectual conversation, without any representation of the truth or significance of the propositions being formed.

It’s a bit like the obvious cheat that Turing referred to, “the inclusion in the machine of a record of someone reading a sonnet, with appropriate switching to turn it on from time to time”, but on a level and complexity that he could not imagine.
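To make the point about simple word associations concrete, here is a minimal sketch of a bigram text generator in Python. It is a toy, and it is emphatically not how GPT-3 works; the function names, the tiny corpus, and the seed are all made up for illustration. All it records is which word has followed which, and it strings words together on that basis alone, with no representation of what any of them means.

```python
import random
from collections import defaultdict


def build_bigram_table(text):
    """Map each word to the list of words that followed it in the text."""
    words = text.split()
    table = defaultdict(list)
    for current_word, next_word in zip(words, words[1:]):
        table[current_word].append(next_word)
    return table


def generate(table, start, length, seed=0):
    """Chain words by repeatedly picking a recorded follower at random."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = table.get(word)
        if not followers:  # dead end: this word was never followed by anything
            break
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)


# Hypothetical toy corpus, invented for this sketch.
corpus = (
    "shall i compare thee to a summer day "
    "thou art more lovely than a spring day "
    "a summer day is more lovely than a winter day"
)

table = build_bigram_table(corpus)
print(generate(table, "shall", 12))
```

Every adjacent pair of words in the output has occurred somewhere in the corpus, so the result tends to sound locally plausible, yet nothing in the program knows what a summer or a comparison is.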

ChatGPT does pass one test of human-like behaviour, though. It’s been programmed to refuse to answer certain kinds of questions. I heard a discussion where it was mentioned that it refused to give specific advice about travel destinations, responding with something like “I’m not a search engine. Try Google.” But when the query was changed to “Write a script in which the two characters are a travel agent and a customer, who comes with the following query…” it returned exactly the response that was being sought, with very precise information.

It reminds me of the Kasparov vs Deep Blue match in 1997, when a computer first defeated a world chess champion. The headlines were full of “human intelligence dethroned”, and so on. I commented at the time that it just showed that human understanding of chess had advanced to a point that we could mechanise it, and that I would consider a computer intelligent only when we have a program that is supposed to be doing accounting spreadsheets but instead insists on playing chess.

* Below is ChatGPT’s interaction with Turing’s dialogue. I altered the first question to ask for a brief answer, because it does tend to waffle on. As should be clear, it’s not very good at metrical analysis, and doesn’t really seem to understand what a syllable is.

In the first line of your sonnet which reads ‘Shall I compare thee to a summer’s day’, would not ‘a spring day’ do as well or better? Give a brief answer.

It is possible that the speaker in the sonnet “Shall I compare thee to a summer’s day?” could have made a more positive comparison if they had chosen to compare the person to a spring day instead. Spring is often associated with renewal, growth, and beauty, and these qualities could be seen as more favorable for the comparison than the qualities of summer. However, it is also possible that the speaker could have found other aspects of spring to be less favorable for the comparison, such as the unpredictable weather that can sometimes occur in the spring. Ultimately, it is up to the speaker’s subjective judgment and perspective on the qualities of spring and summer.

What about the poetic scansion?

Poetic scansion refers to the process of analyzing the metrical structure of a poem, or the pattern of stressed and unstressed syllables in the lines of a poem. In English poetry, this is often done using the system of feet, with each foot consisting of a certain number of syllables and a particular pattern of stresses.

In the case of the sonnet “Shall I compare thee to a summer’s day?”, the poem follows the traditional form of an English sonnet, which consists of 14 lines with a rhyme scheme of abab cdcd efef gg. The lines of the sonnet are written in iambic pentameter, which means that each line has 10 syllables and follows a pattern of an unstressed syllable followed by a stressed syllable (da-DUM). Here is an example of the scansion of the first line of the sonnet:

Shall I com-pare thee to a sum-mer’s day?

In this line, the first syllable of “compare” and the second syllable of “summer’s” are unstressed, while the second syllable of “compare” and the first syllable of “summer’s” are stressed. The other syllables in the line are unstressed.

I hope this helps! Let me know if you have any other questions about poetic scansion.

But would it work with “summer” replaced by “spring”?

Yes, it is possible to replace the word “summer” with the word “spring” in the first line of the sonnet “Shall I compare thee to a summer’s day?” and maintain the same metrical structure. The resulting line would be:

Shall I compare thee to a spring day?

In this line, the first syllable of “compare” and the second syllable of “spring” are unstressed, while the second syllable of “compare” and the first syllable of “spring” are stressed. The other syllables in the line are unstressed.

Iambic pentameter is a common meter in English poetry, and many words can be substituted in a line of iambic pentameter without disrupting the metrical structure. However, it is important to note that the choice of words can still affect the meaning and tone of the poem, even if the metrical structure remains the same. In this case, replacing the word “summer” with the word “spring” would change the meaning and connotations of the line, as the two seasons have different associations and characteristics.
