When bots can pass for human in conversation, it will be a milestone for AI – but not necessarily the momentous turning point that sci-fi would have us believe. Philip Ball explores the strengths and limitations of the Turing Test.


Alan Turing made many predictions about artificial intelligence, but one of his lesser-known predictions may sound familiar to those who heard Stephen Hawking or Elon Musk warn about AI’s threat in 2015. “At some stage… we should have to expect the machines to take control,” he wrote in 1951.

He not only seemed quite sanguine about the prospect but possibly relished it: his friend Robin Gandy recalled that when he read aloud some of the passages in his seminal ‘Turing Test’ paper, it was “always with a smile, sometimes with a giggle”. That, at least, gives us reason to doubt the humourless portrayal of Turing in the 2014 biopic The Imitation Game.


Turing has influenced how we view AI ever since – the Turing Test has often been held up as a vital threshold AI must pass en route to true intelligence. If an AI machine could fool people into believing it is human in conversation, he proposed, then it would have reached an important milestone.

What's more, the Turing Test has been referenced many times in popular-culture depictions of robots and artificial life – perhaps most notably inspiring the polygraph-like Voight-Kampff Test that opened the movie Blade Runner. It was also namechecked in Alex Garland’s Ex Machina.

But more often than not, these fictional representations misrepresent the Turing Test, turning it into a measure of whether a robot can pass for human. The original test wasn’t intended for that, but rather for deciding whether a machine can be considered to think in a manner indistinguishable from a human – and that, as Turing himself recognised, depends on which questions you ask.

What’s more, there are many other aspects of humanity that the test neglects – and that’s why several researchers have devised new variants of the Turing Test that aren’t about the capacity to hold a plausible conversation.


Take game-playing, for example. To rival or surpass human cognitive powers in something more sophisticated than mere number-crunching, Turing thought that chess might be a good place to start – a game that seems to be characterised by strategic thinking, perhaps even invention. Since Deep Blue’s victory over world chess champion Garry Kasparov in 1997, that particular threshold has clearly been crossed. And we now have algorithms that are all but invincible (in the long run) at bluffing games like poker – although this turns out to be less a matter of psychology than you might think, and more one of hard maths.

What about something more creative and ineffable, like music? Machines can fool us there too. There is now a music-composing computer called Iamus, which produces work sophisticated enough to be deemed worthy of attention by professional musicians. Iamus’s developer Francisco Vico of the University of Malaga and his colleagues carried out a kind of Turing Test by asking 250 subjects – half of them professional musicians – to listen to one of Iamus’s compositions alongside music in a comparable style by human composers, and to decide which was which. “The computer piece raises the same feelings and emotions as the human one, and participants can’t distinguish them”, says Vico. “We would have obtained similar results by flipping coins.”
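Vico’s coin-flipping remark is, at bottom, a statistical claim: if listeners pick out the computer piece no more reliably than chance, the observed split should be indistinguishable from 50/50. Below is a minimal sketch of that arithmetic in Python, using an exact binomial test; the counts are invented for illustration, since the exact split isn’t reported here.

```python
from math import comb

def two_sided_binomial_p(successes: int, trials: int, chance: float = 0.5) -> float:
    """Exact two-sided binomial test: the probability, under pure chance,
    of any outcome no more likely than the one observed."""
    pmf = [comb(trials, k) * chance**k * (1 - chance)**(trials - k)
           for k in range(trials + 1)]
    observed = pmf[successes]
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))

# Invented numbers for illustration: 250 listeners, 131 of whom correctly
# identify which piece was composed by Iamus. Chance performance would be 125.
print(two_sided_binomial_p(131, 250))  # well above 0.05: no better than a coin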
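```

With roughly half the listeners choosing correctly, the result is consistent with guessing – which is the sense in which the outcome resembles flipping coins.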

Some say that computer poetry has passed the test too, though one can’t help thinking that this says more about the discernment of the judges. Consider the line: “you are a sweet-smelling diamond architecture”.


Then there’s the “Turing touch test”. Turing himself claimed that even if a material were ever to be found that mimicked human skin perfectly, there was little reason to try to make a machine more human by giving it artificial flesh. Yet the robot Ava in Ex Machina clearly found it worthwhile, intent as she was on passing unnoticed within normal human society.

Our current motivation is a little different: we know that prosthetic limbs that can pass for the real thing may lessen the psychological and emotional impact that wearers report. To this end, mechanical engineer John-John Cabibihan at Qatar University and his colleagues are creating materials that look and feel indistinguishable from human skin. Earlier this year, he and his coworkers reported that they had created a soft silicone polymer that, when warmed close to body temperature by sub-surface electronic heaters, closely resembled real skin. The researchers created an artificial hand by coating a 3D-printed resin skeleton with the electrically warmed polymer and used it to touch the forearms of people while the hand itself was concealed. The participants proved unable to make any reliable distinction between the touch of the artificial hand and that of a real one.

Ava might be pleased, but some robotics researchers argue that there are ethical reasons for making sure that humans and robots can be told apart. It would certainly save us from one day having to deploy Blade Runners who hunt down their prey with a Turing Test.


A somewhat more prosaic reason to devise new varieties of Turing Tests is not to pass off a machine as human but simply to establish if an AI or robotic system is up to scratch. Computer scientist Stuart Geman of Brown University in Providence, Rhode Island, and collaborators at the Johns Hopkins University in Baltimore, recently described a “visual Turing Test” for a computer-vision system that attempted to see if the system could extract meaningful relationships and narratives from a scene in the way that we can, rather than simply identifying specific objects. Such a capability is becoming increasingly relevant for surveillance systems and biometric sensing.

For example, if looking at a street scene, could a computer answer the questions: “Is person one walking on a sidewalk?”, “Is person two interacting with any other object?”, “Are person two and person three talking?”
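To make that concrete, here is a minimal sketch – not Geman and colleagues’ actual system – of how such yes/no questions might be posed as structured queries over the annotations a vision system has produced. Every object name and relation below is invented for illustration.

```python
# A toy scene annotation of the kind a computer-vision system might output:
# a set of (subject, relation, object) triples. All names and relations
# here are invented for illustration.
relations = {
    ("person1", "walking_on", "sidewalk1"),
    ("person2", "talking_to", "person1"),
}

def holds(subj: str, rel: str, obj: str) -> bool:
    """Answer a yes/no query of the form (subject, relation, object),
    checking the relation in both directions for symmetric cases."""
    return (subj, rel, obj) in relations or (obj, rel, subj) in relations

def interacts_with_anything(ident: str) -> bool:
    """Is the named object involved in any relation at all?"""
    return any(ident in (a, b) for a, _, b in relations)

# The street-scene questions from the text, recast as structured queries.
print(holds("person1", "walking_on", "sidewalk1"))   # Is person one walking on a sidewalk? -> True
print(interacts_with_anything("person2"))            # Is person two interacting with any other object? -> True
print(holds("person2", "talking_to", "person3"))     # Are person two and person three talking? -> False
```

The hard part, of course, is producing such annotations from raw pixels in the first place; the point of the visual Turing Test is to probe whether a system captures relationships and narratives of this kind, rather than merely labelling objects.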


As for the original Turing Test, its future is likely to be online. Already, online gamers can find themselves unsure if they are competing against a human or a “gaming bot” – indeed, some players prefer to play against bots (which are assumed to be less likely to cheat). You can find yourself talking to bots in chatrooms and when making online queries: some are used as administrators to police the system, others are a cheap way of dealing with routine enquiries. Some might be there merely to keep us company – a function for which we can probably anticipate an expanding future market, as Spike Jonze’s 2013 movie Her cleverly explored.

Last year an algorithm devised by a team of Russian computer scientists persuaded one in three of a panel of judges, based on short online chats, that it was a real 13-year-old Ukrainian boy called Eugene Goostman. Some critics might suggest that, with no disrespect to 13-year-old Ukrainian boys, this is not a stunning feat of deception; certainly it doesn’t obviously justify claims that the Turing Test has been passed. As computer scientist Scott Aaronson of the Massachusetts Institute of Technology has said, “Turing’s famous example dialogue, involving Mr. Pickwick and Christmas, clearly shows that the kind of conversation Turing had in mind was at a vastly higher level than what any chatbot, including Goostman, has ever been able to achieve.”

More to the point, Aaronson’s splendid conversation with Eugene, after he decided to probe further into all the publicity surrounding “him”, demonstrates the limitations rather graphically:

Scott: … Do you understand why I’m asking such basic questions?  Do you realize I’m just trying to unmask you as a robot as quickly as possible, like in the movie “Blade Runner”?

Eugene: … wait

Scott: Do you think your ability to fool unsophisticated judges indicates a flaw with the Turing Test itself, or merely with the way people have interpreted the test?

Eugene: The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

I guess we won’t be needing Blade Runners just yet.
