Speak and spell: Can voice technology give your app the edge?

Image caption: You are connected: We take speech recognition for granted - but computers struggle to keep up

"What Henry Ford did for automobiles - they did for dictionaries".

Miles Kronby is chief digital product officer at Merriam-Webster, the dictionary people. He's talking about the Merriam brothers.

Seeing a gap in the market, they took Noah Webster's unwieldy two-volume dictionary of American English and brought it to the mass market - creating one of the best-selling books of all time.

Thinner paper and modern production techniques let them cut costs to a level that would allow almost every family in North America to have a copy on their shelves.

The year was 1847, pre-dating the Oxford English Dictionary by nearly 50 years.

Mr Kronby says the company is proud of its history of early adoption. It was among the first dictionary publishers to put its content online, for free, in the mid-90s.

"The history of Merriam Webster is probably the history of using technology to make meaning, in the sense of definitions, more accessible to people."

In line with this philosophy they recently launched a smartphone app that allows users to search for definitions using their keyboards - and also using voice commands.

Image caption: Using voice search lets users look up words they can't spell

"What we want is to get as many people as possible accessing our dictionary, wherever they might be. Voice search is one element of that," says Mr Kronby.

The app hit the million download mark in the first month. Mr Kronby says users have been enthusiastic about the voice search feature.

"It addresses a need that some people have, when they want to look up a word and they don't know how to spell it.

"If you don't know how to spell a word you can't look it up, but you're looking it up to find out how to spell it. It's Catch 22. Voice search addresses that need."

Speak up

Having voice functionality available in smartphone apps isn't new.

But cloud computing, more powerful devices, and improvements in the technology mean that more and more companies are using it to try and set themselves apart from the crowd.

Nuance is the US-based company behind the Merriam-Webster app. It is responsible for Dragon NaturallySpeaking, the market-leading voice transcription software.

Image caption: The Dragon Dictation app turns speech into a text, tweet, email or Facebook update

But it is also behind a range of other products - from medical and legal transcription services and call centre voice interfaces to the voice technology found in luxury cars, predictive texting and sat nav systems.

John West is a senior solutions architect in the mobile division.

"We're probably the largest IT company you've never heard of," he says.

He is convinced that voice is the future.

"Looking at the way that people input data into [smartphones], we've now got the bandwidth and devices that are capable of using voice.

"Keyboards can be difficult to use. If people have large fingers it can be tricky. Voice is a richer experience to be able to put those things in."

Nuance has developed its own smartphone apps - Dragon Search and Dragon Dictation - as well as having its technology embedded in others.

The US version of the Amazon app uses Nuance's voice search functionality, as does Ask.com, and virtual personal assistant app Siri, which will even find you a cab if you tell it you're drunk.

Image caption: Google Translate uses voice recognition

The company has launched a software kit that allows developers to create apps using its voice recognition technology.

It's free for 90 days, with a tiered pricing plan for apps using the technology that make it to market.

Mr West thinks using voice can give businesses the edge.

"It's a very crowded market out there now. There are a 100,000 Android apps now and far more iPhone apps. It's growing at a rapid rate of knots - how do you differentiate your app from everyone else's?"

In for the long haul

Voice recognition technology has been around for 50 years or so - but it's been a bumpy ride.

Anyone who tried using it on their PC when it first hit the IT mainstream back in the 90s could be forgiven for being sceptical.

It hasn't been without controversy, and the problems that people with thick accents face have spawned more than one comedy sketch.

Chewy Trewhella, new business development manager at Google, admits that users from North America may have a better experience than those with other accents. But he says this is constantly improving.

Image caption: Chewy Trewhella: "We're really interested in making it easier for people to interact with their devices."

"I think we've come on a long way from the old days of installing software on your computer and having it analyse your voice and try to understand phonetically what's being said.

"Now we can do things like matching voice patterns and matching words against existing files and recordings of words, which is a lot more successful."

Google has invested heavily in the space. Google Translate is available for voice as well as text.

Voice functionality is also a key part of the Android operating system: voice search is available as part of the search engine, and voice-to-text messaging is offered within Google Voice in the US.

Apps like Google Shopper have voice search as a key part of the experience.

"Higher speed internet and big data centres, with lots of cloud computing power being able to do some of this raw crunching, able to turn much more accurate matching results, meant we thought voice absolutely made sense," says Mr Trewhella.

"The timing was right."

Android users can 'personalise' the software, with it adapting to an individual's speech.

The application programming interface - or API - is free to developers.

One Italian start-up that has taken advantage of this is Eudata, which used Google's voice functionality in its Travel Cyborg app for Android phones.

Image caption: Travel Cyborg is an Italian language app for Android that lets users search for cheap flights

Users search for cheap flights and accommodation. Chief technical officer Mirko Puliafito says the app, which came out of development just two months ago, was originally an experiment.

"We wanted to better understand the market and business model behind this type of mobile application - what would be the user experience?"

The company is pleased with the feedback it has had. It plans to translate the app into English and German, and has been in talks with both Nokia and Apple about bringing it to their platforms.

"This is one of the new applications that could change the way the user interacts with technology," says Mr Puliafito.

"It's a field I want to investigate and invest money in because I think there could be a lot of business in the future for that."

Human v computer

So is 2011 the year of voice? Maybe, but problems still remain.

"Computers are very good at doing dumb tasks rapidly, but they have a very hard time doing a lot of stuff that we find quite easy," says Google's Chewy Trewhella.

"We speak as second nature, we can recognise images. Computers struggle with this.

"And translation - learning how language is structured - computers are not very good at that... It's that holy grail of artificial intelligence, actually teaching a computer to act like a human being."

With both Nuance and Google, voice technology on smartphones is currently possible only thanks to cloud computing, which means that when there's no internet connection, there's no functionality.

Steve Cramoysan, research director from technology analysts Gartner, says this is one of the big challenges facing the technology.

"A huge vocabulary means you can't physically store it - you have memory limitations on the smartphone, so the phone is dependent on the cloud [to store the data].

"The challenge for providers is to get the balance right with an app that sits on the smartphone and can do the basics - but then refers to the cloud to do the heavy lifting."

Looking to the future, he is more cautious about the speed of adoption of voice technology in general.

"Every year people say - voice is going to take off, it's going to be the year of voice recognition. I've been through this too many times to say this is the year.

"I think we're going to see an increasing visibility of it being deployed in all sorts of applications, but I see this as rising tide rather than a tsunami."

Merriam-Webster's Miles Kronby points out that as users come to expect certain functionality, those apps that don't incorporate it risk losing them.

"We're encouraged by the response, so we're going to continue using it.

"We get the sense now that we are probably really at the beginning of how this technology can be used in interesting ways."
