The TL;DR Synopsis
The Turing Test was developed by computer pioneer Alan Turing, ostensibly to determine whether something artificial, like a computer, is actually engaged in thought. This post looks at an alleged flaw in the test to enquire into what the test really is and does.
Includes links to: (a) the Wired article alleging the flaw, (b) a PDF of Turing’s seminal paper “Computing Machinery and Intelligence,” (c) the site Homo Artificialis, (d) Wikipedia entries for “uncanny valley” and for Masahiro Mori, who coined the term, and (e) Mori’s paper in which he introduces the idea.
Embedded video: about the uncanny valley effect.
The Flaw in the Turing Test?
Late last year, as the International Alan Turing Year drew to a close, Terry Walby, the UK managing director at IPsoft, had a guest post on the Wired Science blog entitled “Why the Turing Test Is a Flawed Benchmark.”
The main thrust of Walby’s argument seems to be that Turing was misguided in recommending that we measure the ability of a machine to think by using human intelligence as a standard:
But Turing was wrong. A machine should not demonstrate intelligence by emulating a human. In fact, in some regards today’s expert systems are displaying intelligence far beyond the capability of a human. Should we mask such intellectual prowess in order for the machine to appear human, or allow it to run free to reach its full potential?
So is the Turing Test flawed and–as Walby later suggests–in need of replacement by a more satisfactory process?
How the Turing Test Works
First, for those who are new to it, a quick review of how the Turing Test works.
The test, or the Imitation Game as Turing himself called it in his seminal paper “Computing Machinery and Intelligence” [pdf], requires three participants:
- a human judge
- a hidden human who communicates with the judge only in writing (basically by text message)
- a hidden artificial intelligence that similarly communicates with the judge only in writing
The judge knows that one of participants 2 and 3 is a computer and the other is human, and both have to try to convince the judge that they’re the human being. If the computer succeeds–if it can act human enough to fool a human judge–it has passed the Turing Test and has earned the right to be treated as intelligent without any consideration of the means by which it managed that persuasion.
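The protocol above is simple enough to sketch as a toy simulation. This is a minimal illustration, not anything Turing specified: the participant and judge functions, the round structure, and the sample questions are all hypothetical stand-ins.

```python
import random

def imitation_game(judge, human, machine, rounds=3):
    """Run one round of a toy Imitation Game.

    `human` and `machine` are callables mapping a question to a text
    answer; `judge` maps the two transcripts to a guess ('A' or 'B')
    naming the participant it believes is human. All of these names
    are hypothetical, chosen for this sketch.
    """
    # Hide which participant is the machine behind random labels,
    # so the judge only ever sees 'A' and 'B'.
    labels = {'A': machine, 'B': human}
    if random.random() < 0.5:
        labels = {'A': human, 'B': machine}

    # The judge communicates with both participants only in writing.
    questions = ["What is 12345 times 678?",
                 "Describe a childhood memory.",
                 "Do you like poetry?"][:rounds]
    transcripts = {'A': [], 'B': []}
    for q in questions:
        for label, participant in labels.items():
            transcripts[label].append((q, participant(q)))

    guess = judge(transcripts['A'], transcripts['B'])
    machine_label = 'A' if labels['A'] is machine else 'B'
    # The machine "passes" this round if the judge picks it as the human.
    return guess == machine_label
```

One way to see Turing’s success criterion (“will the interrogator decide wrongly as often…”) is to give both participants indistinguishable behaviour and a judge who therefore guesses at random: over many rounds the machine then passes about half the time, which is exactly the baseline Turing describes.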
Turing introduces the idea of the Imitation Game to the reader gradually by first having the hidden participants be a man and a woman, with the judge having to figure out which is which. This is a parlor game version of the Imitation Game.
He then replaces the woman with a machine to turn the parlor game into a scientific enquiry and get at the question of machine intelligence. Remember that the paper was published in 1950 when Turing was in the process of inventing the discipline of artificial intelligence, so at the time this process would have eased readers into unfamiliar territory.
Is the Flaw in the Turing Test Real?
So is Walby right? This would be a boring post if I simply agreed with him, and overall I won’t (though his post is interesting and my critique is friendly and respectful).
But I want to start by agreeing on this point: machine intelligence should not be judged solely in comparison to human intelligence.
(One of my other blogs, Homo Artificialis, looks at disciplines that could eventually contribute to the creation of synthetic human bodies, artificial intelligence, or both–if you’ve seen it you know I’m at least notionally sympathetic to the idea of free range artificial intelligence developing on its own terms into its own most realized form.)
The trouble with Walby’s argument is that I don’t think Turing ever said that artificial intelligence should be judged by human standards–he simply never made the claim that Walby is disputing.
In his paper “Computing Machinery and Intelligence” [pdf], in which he codifies the famous test, Turing directly addressed the possibility that machines might ultimately be possessed of some form of intelligence unique to them and distinguishable from that of human beings:
May not machines carry out something which ought to be described as thinking but which is very different from what a man does?
He then simply puts this issue to one side, not because he’s dismissing it–he explicitly doesn’t dismiss it–but because it’s not the topic he’s addressing:
This objection is a very strong one, but at least we can say that if, nevertheless, a machine can be constructed to play the imitation game satisfactorily, we need not be troubled by this objection.
In other words, Turing agrees that machine intelligence may comprise different types, including some that do resemble human intelligence and some that don’t. The fact that there may be types that don’t simply doesn’t affect the subject of his enquiry: the types that do.
Indeed, while Turing famously starts the paper by asking “can machines think?”, later he is at pains to carefully circumscribe the question he’s addressing and to distinguish it from that larger, initial question:
We now ask the question, “What will happen when a machine takes the part of A [participant 3, above] in this game?”
Will the interrogator decide wrongly as often when the game is played like this as he does when the game is played between a man and a woman?
These questions replace our original, “Can machines think?” [emphasis added]
What the Turing Test Does
The Turing Test is not an exhaustive test for any and all kinds of artificial intelligence and I think it’s apparent that it wasn’t constructed to be.
What it is, is a test for a particular kind of evidence of artificial intelligence and it was carefully created to find the kind of evidence that is most persuasive to even the most skeptical of doubters.
We human beings ascribe intelligence to each other all the time even though we have no direct experience of another person’s intellect in action (a fact that Turing explicitly acknowledges in his discussion of the Argument from Consciousness).
We witness other people’s actions and hear or read their words, but that’s not conclusive of their engaging in thought. Maybe they’re actually hallucinations without intellects of their own, conjured up by our own minds. Or perhaps they’re illusions without substance projected by manipulative alien creatures in a Star Trek episode.
We have no direct evidence that other people think, but there is nonetheless a logic to our assumption that they do. If you compare the actions and words of other people with your own, and find a high degree of similarity, it’s logical to conclude that since you have intelligence and they behave like you do, then they must have intelligence as well.
(We don’t actually think this process through; it’s an assumption we make. But assuming that other things that behave like you are like you is useful from a survival point of view. Other animals do this as well: a cat treats a wiggling piece of string as though it were living prey, or hisses defensively at a self-propelled toy.)
This is a process in which we all engage and the strength of the Turing Test is that it takes this pre-existing reaction that we universally share and applies it to the question of machine intelligence. It says: if and when a machine can do the things that we ourselves do, then at that point we will make the same assumption about the machine that we do about other people, that is, that it is thinking.
The Turing Test Doesn’t Need Turing to Function
When the Turing Test is viewed in this light, it can be seen not as Turing’s invention, but as his recognition of a naturally occurring process that would eventually be applied to artificial constructs (once they were sophisticated enough to engage it) just as it’s always been applied to natural creatures.
Arguing with it makes little sense because it’s simply what we have always done and will continue to do: react to other things based upon their resemblance to us.
And by now our artificial constructs have finally become sophisticated enough to engage this instinct. When we recognize the spooky near-humanity of some piece of CGI that doesn’t quite fool us into thinking it’s a person, we’re giving it a failing grade in a kind of Turing Test that we automatically apply to everything around us.
The tension and unease that arise when something almost passes the test, but doesn’t quite, were described in 1970 by Japanese robotics professor Masahiro Mori as the “uncanny valley” [Wikipedia, Mori’s paper], and the effect is well illustrated by the video below.
Walby’s Argument for a New Turing Test
Terry Walby concludes that a new Turing Test is needed. Given the arguments above, should we reject this conclusion? I don’t think so.
If, as I’ve argued, Walby mistakes the Turing Test for something it isn’t, that doesn’t change the fact that the thing he’s calling for would be a damned useful thing to have.
Turing purposely sidestepped an exhaustive definition of “thinking” in order to get to a practical test for a particular kind of thinking–the kind that humans do.
But thinking is not a unitary thing. At a minimum, each of us experiences different kinds of thinking at different moments in our lives. “Thought” is not a point on a graph, it’s a blob that stretches along the X and Y axes (and possibly the Z as well), encompassing a variety of intellectual functions.
Any tool that helps us to explore, describe, and understand the territory that “thinking” maps on that graph is beneficial and worth working toward.