Word Games: Machine Translation and Speech Recognition Evolve

Carl Weinschenk

The explosion of mobility that has reshaped personal and business relationships over the past decade is having a secondary impact that tends to fly under the radar. It's clear that more people are communicating, both verbally and in writing. What is less obvious is that they increasingly are doing so in different languages.

Perhaps it is not often thought of because it is so self-evident. But the folks who are charged with making communications easier - and creating revenue streams for vendors, service providers and others in the food chain - have been paying attention and are on the case. The bottom line is that the interconnected worlds of speech recognition and machine translation technology are evolving quickly.

Certainly, processing exotic languages is nothing new: It was a big part of the Cold War and deeply rooted in the general consciousness long before Captain Kirk and Mr. Spock - who, strangely, spoke perfect English - began using their mobile devices to try to understand every alien who turned up.

Fast forward 50 years: The high technology that makes all of the instantaneous and untethered communications of the modern world possible also is making the words written and said in those other languages easier and quicker to decode. Though it is a world that is a bit different from the one usually dealt with by enterprises and service providers, there is a constant: Integration and miniaturization - doing the job in less space, with less hardware and software and with less of an energy footprint - are considered essential.

There is a lot of money on the table, and vendors know it. "There is no doubt that the technology of both speech recognition and translation has increased in terms of the pace of change because online access has exposed the mobile needs and language needs," said Salim Roukos, the senior manager for multilingual NLP technologies and the CTO of translation technologies for IBM. "Everything is at everyone's fingertips everywhere, but the information may be in another language."

What Is IBM's Watson?

How far the industry has evolved was on display on network television early last year, when IBM's Watson beat human competitors on the game show "Jeopardy!" The performance was a milestone in natural language processing, according to Don DePalma, the chief strategy officer and founder of Common Sense Advisory, a firm that follows automated translation and other industry sectors.

The translation element was not part of the "Jeopardy!" experience. But the speed with which Watson was able to keep Alex Trebek happy by presenting its responses in the form of a question is a big part of the overall task. "There was minimal latency," DePalma said. "In as much time as it took [for the other contestants] to hit the button, it was processing the question, parsing, sending queries to the back end and formulating an answer. That was quite impressive from a technical viewpoint. [Someday] the databases will be in other languages."

DePalma said that two related but formerly distinct disciplines - machine translation and automated speech recognition - are coalescing. The integration into the same software simplifies operations, reduces power consumption and shrinks the footprint and computation demands of the package. Indeed, the entire combined system will be small enough to be housed in a tablet or smartphone. A dazzling array of apps can be spun out of this basic functionality.

Integration Leads to Big Benefits

The massive decentralization of business puts a premium on flexibility and the ability to do anything anywhere, and that shines a light on speech in general, especially since unified communications platforms make it easy for employees in different countries - who likely speak different languages - to work together on exacting projects. Jonathan Litchman, a senior vice president at Science Applications International Corporation (SAIC), said that the company is working on a patent-pending approach that uses the same platform for speech and text.

Litchman's assessment of why SAIC is trying to integrate speech and text translation mirrors the belief held by both Roukos and DePalma. In the legacy world, Litchman said, machine translation and speech recognition were discrete systems and may even have come from different vendors. That no longer needs to be the case. Combining them both provides core IT-type benefits in terms of reduced power draw and footprint. It also enables a specialized advantage, which is the use of the same dictionary for both the speech recognition and text translation tasks. This further reduces resource drain and eliminates all sorts of potential problems in terms of the end users' experience.
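Litchman's shared-dictionary point can be sketched in a few lines. The toy code below is purely illustrative (the names, the word-list "recognizer" and the lookup "translator" are assumptions for this sketch, not SAIC's design): both stages consult a single lexicon, so any word the recognizer accepts is guaranteed to have an entry on the translation side, avoiding the vocabulary mismatches that can arise when ASR and MT ship with separate dictionaries.

```python
# Illustrative sketch: one shared lexicon serving both the speech-recognition
# stage and the text-translation stage, so the two never disagree on vocabulary.

SHARED_LEXICON = {"hello": "bonjour", "world": "monde", "doctor": "médecin"}

def recognize(audio_tokens):
    # Stand-in for an ASR engine: accept only words the shared lexicon knows,
    # discarding fillers and out-of-vocabulary noise.
    return [w for w in audio_tokens if w in SHARED_LEXICON]

def translate(words):
    # The text-translation stage consults the very same dictionary, so every
    # recognized word is guaranteed a translation.
    return [SHARED_LEXICON[w] for w in words]

if __name__ == "__main__":
    utterance = ["hello", "uh", "doctor"]
    print(translate(recognize(utterance)))  # ['bonjour', 'médecin']
```

Because the recognizer and translator share one resource, updating the lexicon in one place updates both paths at once, which is where the footprint and maintenance savings come from.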

Two examples of how such a system may be used, Litchman said, are automatic translation of documents, email and chat between locales of a multinational, and a mobile health clinic application that allows fluid patient intake and communications when the parties speak different languages. Indeed, Litchman said that the latter implementation currently is being used by the New Jersey Department of Health and Senior Services.

Litchman pointed out that the old way of doing things, in which an enterprise-wide application was connected to the company server and sold on a per-seat basis, no longer applies. The tasks that must be performed by modern corporations are far too varied, and the software must have a small enough footprint to be carried on mobile devices.

Roukos' view of the evolution of speech technology appeared to run parallel with Litchman's and DePalma's. He said that there are two basic paths: natural language understanding and machine translation. The former is evolving well, he said. Apple's Siri is the highest-profile project, but there are others in the field.

The promise of combining the two is expansive. Folks in different offices of a multinational corporation can exchange emails written in French and Spanish and have them translated on the fly. A smartphone with the proper app can be held up to a television program in English and have it translated to Norwegian.

The closely connected worlds of machine translation and automated speech recognition are moving ahead together. Indeed, companies big and small in this area are in a wonderful position because the explosion of mobility and the general decentralization of modern business is making such developments a necessary step rather than a nice but superfluous innovation.


May 3, 2012 11:14 AM Sander123 says:

I doubt that the performance of IBM's Watson was a milestone in automated speech recognition when it won on "Jeopardy!", since that version of Watson did not use speech recognition.

According to Jennifer Chu-Carroll from the Watson/Jeopardy team, Watson received the text of the clue electronically. See http://www.ibm.com/developerworks/podcast/dwi/feature022411-watson.html

May 3, 2012 11:25 AM Kevin Brown says:

Sadly Mr. Weinschenk either fell for Don Depalma's explanation or mistranslated his answer (and you wonder why it is so difficult for us to do this with computers?!?)

However, given the attributed quote to Mr. Depalma, it sounds like he needs to do a little homework, not the article's author.  "..it was listening to the question, parsing, sending queries to the back end and formulating an answer." 

"It" was a human who keyed in the questions ahead of time, entered the query and when Watson had a result, then the buzzer went off.

So this article loses most of its value by this completely incorrect perspective attributed to Mr. Depalma.

Natural Language Processing does NOT include automated speech recognition (ASR), nor does it include machine translation.    Machine translation can use keyed in information or ASR as the user interface.  Google Translate does a pretty darned good job with ASR on the front end as input, machine translation in the middle and text to speech on the output.

Siri uses a very constricted natural language recognition ASR as the UI and does NOT contain any artificial intelligence (counter to many media claims and thankfully not mixed into this article), which leads to only eleven functions (count them, there are only eleven things you can do with Siri).

Unfortunately due to the incorrect information attributed to Mr. Depalma, with the convoluted mixture of three different technologies, this article does more harm than understanding for the lay person.

Kevin C. Brown

Managing Director


May 4, 2012 10:23 AM Carl Weinschenk says:

The passage concerning Watson and Jeopardy! has been updated...

