The explosion of mobility that has characterized personal and business relationships during the past decade is having a secondary impact that tends to fly under the radar. It's clear that more people are communicating, both verbally and in writing. What is less obvious is that they increasingly are doing so in different languages.
Perhaps it is not often thought of because it is so self-evident. But the folks who are charged with making communications easier - and creating revenue streams for vendors, service providers and others in the food chain - have been paying attention and are on the case. The bottom line is that the interconnected worlds of speech recognition and machine translation technology are evolving quickly.
Certainly, foreign-language processing is nothing new: It was a big part of the Cold War and deeply rooted in the general consciousness long before Captain Kirk and Mr. Spock began using their mobile devices to try to understand every alien who turned up - aliens who, strangely, usually spoke perfect English anyway.
Fast forward 50 years: The high technology that is making all of the instantaneous and untethered communications of the modern world possible also is making the words written and said in those other languages easier and quicker to decode. Though it is a world a bit different from the one usually dealt with by enterprises and service providers, there is a constant: Integration and miniaturization - doing the job in less space, with less hardware and software and with a smaller energy footprint - are considered essential.
There is a lot of money on the table, and vendors know it. "There is no doubt that the technology of both speech recognition and translation has increased in terms of the pace of change because online access has exposed the mobile needs and language needs," said Salim Roukos, the senior manager for multilingual NLP technologies and the CTO of translation technologies for IBM. "Everything is at everyone's fingertips everywhere, but the information may be in another language."
What Is IBM's Watson?
How far the industry has evolved was on display on network television. The performance of IBM's Watson, which beat human competitors on the game show "Jeopardy!" early last year, was a milestone in natural language processing, according to Don DePalma, the chief strategy officer and founder of Common Sense Advisory, a firm that follows the automated-translation industry and related sectors.
The translation element was not part of the "Jeopardy!" experience. But the speed with which Watson was able to keep Alex Trebek happy by presenting its responses in the form of a question is a big part of the overall task. "There was minimal latency," DePalma said. "In as much time as it took [for the other contestants] to hit the button, it was processing the question, parsing, sending queries to the back end and formulating an answer. That was quite impressive from a technical viewpoint. [Someday] the databases will be in other languages."
DePalma said that two related but formerly distinct disciplines - machine translation and automated speech recognition - are coalescing. The integration into the same software simplifies operations, reduces power consumption and shrinks the footprint and computation demands of the package. Indeed, the entire combined system will be small enough to be housed in a tablet or smartphone. A dazzling array of apps can be spun out of this basic functionality.
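As a rough illustration of the coalescing DePalma describes, a combined system can feed the output of the speech recognition stage straight into the translation stage within a single process, rather than shuttling data between two separately installed packages. The sketch below is a toy model, not any vendor's actual architecture: lookup tables stand in for the statistical models real systems use, so only the data flow between the two stages is shown.

```python
# Hypothetical sketch of a combined ASR + MT pipeline in one process.
# Toy lookup tables stand in for real acoustic and translation models.

def recognize(audio_tokens, lexicon):
    """Map acoustic tokens to words (stand-in for speech recognition)."""
    return [lexicon.get(t, "<unk>") for t in audio_tokens]

def translate(words, dictionary):
    """Word-for-word substitution (stand-in for machine translation)."""
    return [dictionary.get(w, w) for w in words]

def speech_to_translation(audio_tokens, lexicon, dictionary):
    """Single pipeline: no intermediate files, no second process."""
    return " ".join(translate(recognize(audio_tokens, lexicon), dictionary))

# Toy English -> French vocabulary.
LEXICON = {"a1": "hello", "a2": "world"}
DICTIONARY = {"hello": "bonjour", "world": "monde"}

print(speech_to_translation(["a1", "a2"], LEXICON, DICTIONARY))
# -> bonjour monde
```

Because both stages live in one process and share data structures, the combined package avoids the duplicated memory and inter-process traffic of two standalone systems - the footprint and power benefits the article describes.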
Integration Leads to Big Benefits
The massive decentralization of business has put a spotlight on speech technology in general, especially since unified communications platforms make it easy for employees in different countries - who likely speak different languages - to work together on exacting projects. Jonathan Litchman, a senior vice president at Science Applications International Corporation (SAIC), said that the company is working on a patent-pending approach that uses the same platform for speech and text.
Litchman's assessment of why SAIC is trying to integrate speech and text translation mirrors the belief held by both Roukos and DePalma. In the legacy world, Litchman said, machine translation and speech recognition were discrete systems and may even have come from different vendors. That no longer needs to be the case. Combining them provides core IT-type benefits in terms of reduced power draw and footprint. It also enables a specialized advantage: the use of the same dictionary for both the speech recognition and text translation tasks. This further reduces resource drain and heads off all sorts of potential inconsistencies in the end user's experience.
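One way to picture the shared-dictionary advantage Litchman describes: a single vocabulary table carries both the pronunciation used by the recognizer and the translation used by the translator, so adding a term once updates both tasks consistently. The entry fields and function names below are illustrative placeholders, not SAIC's actual schema.

```python
# Sketch of one dictionary serving both recognition and translation.
# Field names ("pronunciation", "es") are hypothetical, for illustration.

shared_dict = {
    "clinic":  {"pronunciation": "K L IH N IH K",  "es": "clínica"},
    "patient": {"pronunciation": "P EY SH AH N T", "es": "paciente"},
}

def add_term(term, pronunciation, translation):
    """One update covers speech recognition and text translation alike."""
    shared_dict[term] = {"pronunciation": pronunciation, "es": translation}

def translate_word(term):
    """Look up the translation from the same table the recognizer uses."""
    return shared_dict[term]["es"]

add_term("intake", "IH N T EY K", "admisión")
print(translate_word("intake"))  # -> admisión
```

With two separate systems, the same term would have to be added to two vocabularies and could drift out of sync; here a single entry keeps what the user hears recognized and what the user reads translated in agreement.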
Two examples of how such a system may be used, Litchman said, are automatic translation of documents, email and chat between offices of a multinational, and a mobile health clinic application that allows fluid patient intake and communication when the parties speak different languages. Indeed, Litchman said that the latter implementation currently is being used by the New Jersey Department of Health and Senior Services.
Litchman pointed out that the old way of doing things, in which an enterprise-wide application was connected to the company server and sold on a per-seat basis, no longer is viable. The tasks that modern corporations must perform are far too varied, and the applications must have a small enough footprint to run on mobile devices.
Roukos' view of the evolution of speech technology appeared to run parallel with Litchman's and DePalma's. He said that there are two basic paths: natural language understanding and machine translation. The former is evolving well, he said. Apple's Siri is the highest-profile project, but there are others in the field.
The promise of combining the two is expansive. Folks in different offices of a multinational corporation can exchange emails written in French and Spanish and have them translated on the fly. A smartphone with the proper app can be held up to a television program in English and have it translated to Norwegian.
The closely connected worlds of machine translation and automated speech recognition are moving ahead together. Indeed, companies big and small in this area are in a wonderful position because the explosion of mobility and the general decentralization of modern business are making such developments a necessary step rather than a nice but superfluous innovation.