Artificial intelligence: How to turn Siri into Samantha

Microsoft Research is testing its AI technology with robots and a virtual receptionist at its headquarters in Redmond

"Siri, why do you struggle with conversations?"

"I don't know what you mean - how about a web search for it?"

If you want the latest football scores, to add meetings to your calendar or launch an app, today's virtual assistants are relatively good at understanding your voice and doing what's asked.

But try to have the type of natural conversation seen in sci-fi movies featuring artificial intelligence systems - from HAL in 2001 to the sultry-voiced operating system Samantha in Spike Jonze's Her - and you'll find your device about as smart as a waterproof teabag.

"Google and Apple are painfully aware that their systems are not getting better fast enough because right now Siri and Google Now and the other personal assistant type applications are all programmed by hand," says Steve Young, professor of information engineering at the University of Cambridge.

Apple's Siri allows iPhone and iPad owners to control the devices with natural language commands

"If you speak to Siri about baseball it seems relatively intelligent, but if you ask it something much less common it doesn't really do anything except for a web search.

"That's an indication that the programmers have been busy trying to anticipate what people want to ask about baseball but haven't thought about people who ask about, for example, GPU chips because you don't get many queries about that."

Dancing a tango

So what's the alternative?

Microsoft doesn't yet have a virtual assistant on its Windows Phone platform, but the company is experimenting with AI in lifts and reception desks at its headquarters.

Eric Horvitz, managing director of Microsoft's research unit, believes part of the solution involves allowing computers to look beyond questions posed.

"The ability of a system to understand more broadly what the overall context of a communication is turns out to be very important," he told the BBC.

The movie Her is about a man who falls in love with his smart device's operating system

"There are some critical signals in context. These include location, time of day, day of week, user patterns of behaviour, current modality - are you driving, are you walking, are you sitting, are you in your office. Are you in a place you are familiar with versus one you are not?

"A person's calendar can be a very rich source of context, as is their email."

He adds that for a more natural interaction, software also needs to learn how to simulate the rhythm and beat of the way humans talk to each other.

To do this, he says, computers should be working out their response while the person is still speaking, rather than waiting for them to finish.

"It turns out that conversation is more or less like a very, very complex tango - a dance between two people," he explains.

"[It] involves not just a simple turn-taking, like you might see in today's assistants on cellphones, for example.

Microsoft's Eric Horvitz speaks to the BBC's Leo Kelion

"It's actually a very complicated, fluid operation where people are breaking in and starting over again and reflecting and listening, all at the same time sometimes."

Mr Horvitz wouldn't reveal when Microsoft might start offering such capabilities to the public.

But reports suggest that the company could unveil Cortana - an app named after the AI system in its Halo video game - in April.

Coughs and tuts

Apple is notoriously secretive. But research by a company whose tech helps power Siri provides clues about how the facility could be improved.

Voice-recognition specialist Nuance says its researchers are currently studying paralinguistics - how users speak rather than what they talk about.

"We're looking at the acoustic elements to be able to detect emotions in speech," reveals John West, a principal solutions architect at the firm.

"The intonation, what's termed the prosody - the tune you use to speak - if you are happy it rolls along quite nicely. If you are sad it's more abrupt - and the language used."

As well as helping clients' AIs work out the best response, he says this can also help them sound more natural.

The forthcoming Microsoft AI app is reported to be based on Cortana - a character in its video game series Halo

"Although I've yet to see it deployed, we do have the capability to put hesitations and other non-verbal audio into an output engine," he says.

"However, they need to be very carefully programmed because you need to understand where to put the pauses, tuts, breaths and possibly a cough."

DIY database

But Prof Young believes a more fundamental change is needed: rather than telling an AI how to respond we should make it learn through a process of trial and error.

This is the basis of a system he is developing called Parlance.

An example of a conversation it might have would be:

  • Human: I want to eat a pizza
  • AI: Sorry, I don't know what a pizza is
  • Human: OK, well do you know where there's a nice Italian restaurant?
  • AI: Yes, there's one 20m down the road to your right
  • Human: Thank you

If the user appears satisfied, Prof Young says, the computer adds an association to its knowledge database.
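As a rough illustration only, and not Parlance's actual design, the idea of learning an association from a satisfied user can be sketched as a table of weighted counts rather than fixed rules. The class name `AssociationMap`, its methods and the reward values below are all hypothetical, chosen for this example.

```python
from collections import defaultdict


class AssociationMap:
    """Toy term -> concept association table backed by weighted counts."""

    def __init__(self):
        # counts[term][concept] = accumulated reward from dialogues that went well
        self.counts = defaultdict(lambda: defaultdict(float))

    def reinforce(self, term, concept, reward=1.0):
        """Strengthen a link after a dialogue in which the user seemed satisfied."""
        self.counts[term][concept] += reward

    def probabilities(self, term):
        """Return the normalised association weights for a term."""
        seen = self.counts.get(term, {})
        total = sum(seen.values())
        if total == 0:
            return {}
        return {concept: weight / total for concept, weight in seen.items()}

    def best_guess(self, term):
        """Return the most probable concept, or None if the term is unknown."""
        probs = self.probabilities(term)
        return max(probs, key=probs.get) if probs else None


amap = AssociationMap()
print(amap.best_guess("pizza"))  # nothing learned yet

# The user rephrased "pizza" as "Italian restaurant" and then said thank you.
amap.reinforce("pizza", "italian restaurant")
# A second, invented dialogue ended well with a weaker signal.
amap.reinforce("pizza", "takeaway", reward=0.5)

print(amap.best_guess("pizza"))
print(amap.probabilities("pizza"))
```

Nothing here is told to the program in advance: the link between "pizza" and "italian restaurant" exists only because a dialogue ended well, and later dialogues can shift the weights in either direction.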

"It stores this away, not as a rule, but it changes the probabilities in its statistical maps," he explains.

"So, the next time someone asks for a pizza it knows that you get them from an Italian restaurant. And it's not been told that except through the users themselves."

Google Now accepts voice commands but tries to anticipate its users' needs

Google's £400m takeover of British AI developer DeepMind could hasten the rollout of such self-taught systems, improving the quality and breadth of knowledge offered, Prof Young believes.

But both he and Microsoft warn they still won't deliver the kind of sentient presence Hollywood loves to depict.

"When I come in the morning my [AI] assistant on my door recognises me and in a very nice British voice says: 'Good morning Eric' - and I enjoy it even though I know it's artificial," says Mr Horvitz.

"So, I do think that we will be able to come up with very compelling personalities.

"However, unlike the kinds of things we see in the movies, for many years to come there probably won't be anybody home in the way people would expect or desire."

BBC © 2014