However, parallel corpora and inter-language replacement only get you so far. Language is so complex that models need to take other factors into account. In some languages, for example, words are gender specific – the word for a male cousin is different from the one for a female cousin – so a good system has to be able to look forward or back in an English text for a reference to "he" or "she" to determine the appropriate form of the word when it translates it, which is not a simple process.
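To make the "look forward or back" idea concrete, here is a deliberately naive sketch in Python. It is not how any real translation system works: the pronoun table, the French word forms and the function names are all assumptions chosen for illustration. It simply picks the gendered pronoun nearest to the word "cousin".

```python
import re

# Hypothetical lookup table: English pronoun -> grammatical gender.
PRONOUN_GENDER = {
    "he": "m", "him": "m", "his": "m",
    "she": "f", "her": "f", "hers": "f",
}

# Illustrative French forms of "cousin" (male and female).
FRENCH_COUSIN = {"m": "cousin", "f": "cousine"}


def guess_gender(text, target="cousin"):
    """Return 'm', 'f' or None, using the pronoun closest to the target word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if target not in tokens:
        return None
    position = tokens.index(target)
    best = None  # (distance, gender) of the closest pronoun seen so far
    for i, token in enumerate(tokens):
        if token in PRONOUN_GENDER:
            distance = abs(i - position)
            if best is None or distance < best[0]:
                best = (distance, PRONOUN_GENDER[token])
    return best[1] if best else None


def translate_cousin(text):
    """Pick a French form for 'cousin', or an ambiguous form if no clue is found."""
    return FRENCH_COUSIN.get(guess_gender(text), "cousin(e)")
```

Even this toy shows why the problem is hard: the nearest pronoun may refer to somebody else entirely, and a text with no pronoun at all leaves the system guessing.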
Things also tend to go wrong when you go beyond simple sentences to more complex ones. "If you use a metaphor or any poetic language, then things get much more difficult," says Dr Blunsom. "If you use a pun that the system has not seen before, then it will just translate it literally."
The vagaries and origins of different languages also mean that some things cannot be expressed – a concept known as untranslatability. Throw in some neologisms or a portmanteau, and subtle meanings are totally lost.
Google’s techniques also bring in another, more obscure problem – the "Google time loop". This happens when the search firm finds what appears to be a new parallel corpus, but is in fact a piece of text that has been translated using Google's own service. "We try to detect if a translation that we find is one of our own, because it can cause problems," Dr Och explains. "It means that we would get stuck in the past – it can reintroduce old mistakes, as finding them again appears to reaffirm that they are correct." As a result, the search firm has had to develop its own bespoke range of algorithms just to try to detect its own translations, Dr Och reveals.
But as researchers grapple with these problems, every advance takes us one step closer to the idea of a universal translator. These mobile devices, a common feature of science fiction, allow speech to be translated on the fly, with no need to be in front of a computer. Perhaps the most famous depiction is the Babel Fish, a surreal invention of the late novelist Douglas Adams, and the inspiration for the name of Yahoo!’s translation engine. In his book The Hitchhiker’s Guide to the Galaxy, Adams described a small, yellow creature that, when inserted into the ear, translated alien languages instantly.
However, the reality of creating such a device presents even bigger challenges than just translating text.
"Spoken language is actually quite different from the written word, because sentences contain 'ums' and 'errs'," says Bill Byrne, a reader in Information Engineering at the University of Cambridge. "They may also have false starts, and reference things that were said earlier with phrases such as 'like I said.' Then of course there are some spoken languages that don't have a written form at all," he says, adding that researchers are only just beginning to figure out how to grapple with problems like these.
Even so, several devices have been developed over the years, including one used by the US military in Iraq that stored around 2,500 unique phrases and allowed soldiers to communicate – albeit in a basic fashion – with the local population.
But perhaps the nearest thing to a Babel Fish that is currently available is Google's Translate smartphone app. This builds on its web tool and can currently recognise speech in 17 languages.
To carry out a translation, a person speaks into the phone with the app running. The speech is recorded, and the clip is sent over the internet to Google's speech recognition servers. These process the sound and transcribe it into text, which is then run through the company’s existing web tools to produce a translation. The translated text is passed to a text-to-speech system, which produces an audio file that is sent back over the internet to the phone. This all happens in a matter of seconds.
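The chain of steps described above (speech recognition, then text translation, then text-to-speech) can be sketched as a simple pipeline in Python. This is only an illustration of the flow of data, not Google's code: the class name, the stage functions and the tiny phrase table are all hypothetical stand-ins, and the "audio" here is just text encoded as bytes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    """Chains three stages; each stage is a plain function passed in by the caller."""
    recognise: Callable[[bytes], str]    # audio clip -> transcribed text
    translate: Callable[[str], str]      # source text -> translated text
    synthesise: Callable[[str], bytes]   # translated text -> audio clip

    def run(self, audio_clip: bytes) -> bytes:
        text = self.recognise(audio_clip)      # step 1: speech recognition
        translated = self.translate(text)      # step 2: text translation
        return self.synthesise(translated)     # step 3: text-to-speech


# Stand-in stages (assumptions for illustration, not real services).
def fake_recognise(audio: bytes) -> str:
    return audio.decode("utf-8")


PHRASES = {"hello": "bonjour"}  # toy English-to-French phrase table


def fake_translate(text: str) -> str:
    return PHRASES.get(text.lower(), text)


def fake_synthesise(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechTranslator(fake_recognise, fake_translate, fake_synthesise)
reply = pipeline.run(b"hello")
```

In a real app each stage would be a network call to a remote server, which is why the round trip takes seconds rather than happening instantly. Keeping the stages separate also means the translation step can reuse the same engine as the web tool.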