Weinschenk spoke to Lexifone CEO and Chief Scientist Ike Sagie. This week, Lexifone and international telephone number provider Voxbone announced a partnership that will utilize Lexifone's real-time translation capabilities.
Real-time translation over telephone connections — one person speaking in English and another in Spanish and both hearing immediate translations, for instance — clearly has great potential for businesses and consumers. Carriers also will benefit, since traffic would grow if people who speak different languages suddenly could communicate. Lexifone CEO and Chief Scientist Ike Sagie tells IT Business Edge blogger Carl Weinschenk that the secret is marrying speech recognition and telecommunications technologies.
Weinschenk: What is Lexifone?
Sagie: Lexifone, very simply put, is the first and probably only fully automatic phone interpreter. When people pick up a phone — any phone — and use any carrier they will have the option of having the call translated in real time.
The 50,000-foot view is that we are the first to combine two different technologies into one: telecommunications and speech recognition. The speech recognition technology is to some extent the same as on smartphones. We use the telecommunications technology of conference calls, VoIP and other technologies that usually do not include speech recognition. We combine very advanced telephony technology with voice recognition technology.
Weinschenk: Is the challenge in doing this squeezing all the functionality into the device?
Sagie: We do not package anything on any phone. We do this as a service. The way you access Lexifone is by dialing a telephone number as you would 411 or any other service. You can do it with a regular phone, a smartphone, Skype or any other type of phone. You access our service via the telephone number provided locally all over the world. We have a large cloud-based service that is very, very powerful. It does all the processing. It all is done on our very big servers. Nothing is being processed in your phone.
You do not need Internet access or Wi-Fi or any other Internet connection. This is extremely important if you are traveling abroad. People are very sensitive when they are traveling abroad because roaming is very expensive.
Weinschenk: So the amount of computing necessary to perform these tasks makes it impossible to do on a device.
Sagie: There is too much number crunching to give you any high level of accuracy … [Existing services are not useful in real-time scenarios] because the telecommunications functionality is detached from the application processing. This combination is very important.
Weinschenk: What do you mean by “combining” the elements?
Sagie: Combining means that they work in the same circuitry. It is the same loop for the entire process, from when the person dials a number. We act as the operator. We pick up the call, process it, do the speech recognition and complete the cycle by connecting to the other party and establishing kind of a three-way conference call with the caller and the person called, with Lexifone in the middle.
This combination is unique to us to the best of my knowledge. It has been in development for three years. This level of seamless integration between telephone and voice recognition is unique. The entire concept is unique.
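The loop Sagie describes, in which the service answers the call, recognizes the speech, translates it and relays it to the other party, can be sketched in a few lines. This is a toy illustration under stated assumptions, not Lexifone's implementation: the names `recognize`, `translate`, `Party` and `RelayedCall` are hypothetical, "audio" is represented as plain text, and a two-entry phrase table stands in for real recognition and translation engines.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical phrase table standing in for a real translation engine.
PHRASES: Dict[Tuple[str, str], Dict[str, str]] = {
    ("en", "es"): {"hello": "hola", "thank you": "gracias"},
    ("es", "en"): {"hola": "hello", "gracias": "thank you"},
}


def recognize(audio: str) -> str:
    """Stand-in for speech recognition: the 'audio' here is already text."""
    return audio.strip().lower()


def translate(text: str, src: str, dst: str) -> str:
    """Look the phrase up; mark it clearly when the toy table has no entry."""
    return PHRASES.get((src, dst), {}).get(text, f"[untranslated: {text}]")


@dataclass
class Party:
    name: str
    language: str


@dataclass
class RelayedCall:
    """The service sits in the middle of the call, between caller and callee."""
    caller: Party
    callee: Party
    transcript: List[Tuple[str, str, str]] = field(default_factory=list)

    def speak(self, speaker: Party, audio: str) -> str:
        # Whoever is not speaking is the listener.
        listener = self.callee if speaker is self.caller else self.caller
        text = recognize(audio)
        heard = translate(text, speaker.language, listener.language)
        self.transcript.append((speaker.name, text, heard))
        return heard


call = RelayedCall(Party("Ann", "en"), Party("Luis", "es"))
print(call.speak(call.caller, "Hello"))    # what Luis hears
print(call.speak(call.callee, "Gracias"))  # what Ann hears
```

The point of the sketch is that recognition, translation and call routing run in a single loop on the service side, so neither phone does any processing.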
Weinschenk: What is the overall state of speech recognition?
Sagie: It’s in constant progress. Some breakthroughs in speech recognition have been made in the last year or so. The state of the art is moving very nicely. Today recognition is at a very high level of accuracy. The state of the art is advancing and we will see, say five years from now, very high-recognition accuracy levels. We already are in the domain of over 85 percent. The advances can get us to the 95 or 98 percent point. We no longer are in the era of very low accuracy.
Weinschenk: What you are doing — and perhaps products from competitors and other types of speech recognition tools — seems to be a great opportunity for carriers.
Sagie: The bottom line is that we can get billions of minutes of air time per year for the product. This is a very tempting business opportunity for operators worldwide.
The reason is very simple. Operators today have more or less saturated the air time. People talk as much as they want. The challenge to increasing revenues is adding value and premium content. You don’t have new reasons to use the phone. Lexifone opens up a completely untapped reservoir, a repository of air time. You now will be able to talk to people who do not speak your language. Until now it did not occur to people to just pick up the phone and talk to a colleague who speaks Chinese and just talk to them. In the past, email and other tools were the means of communicating. Now they will start using air time. The potential is huge for operators.
Weinschenk: Lexifone appears aimed at both consumers and business users.
Sagie: We appeal to both with the same level of excitement. For businesses the advantages more or less go without saying. It will benefit large enterprises, governments themselves, hospitals and the entire travel industry: any organization that encounters language barriers. And it’s not just for expats. In the U.S., for instance, there is a need for understanding Spanish by people who don’t speak it.
For consumers, the way to use Lexifone is to register like you would for the Skype out service. … the way you register is to go onto the site. You prepay $10 or $50, or you can subscribe for a monthly charge. Once your phone is registered you can use the number wherever you are. You call a local Lexifone number, you are directed to dial the number you want to call, and then you are set. We are working on a version for smartphones, which features automatic dialing from the device’s phone book.
We have 15 languages and growing. We will add a new language on average every two months. The service also distinguishes, for instance, between American English, Australian English and English English. It recognizes Castilian Spanish and Mexican Spanish and U.S. Spanish and other language variants.
Weinschenk: Do you support translations in conference calls — when more than two languages are being spoken?
Sagie: We do now to some extent. We also want to support multilingual, multi-language conference calls. Today during a conference call it is very simple. You create the conference call yourself and then dial up Lexifone as another party to the call. As soon as it is on it listens to all participants. Participants simply announce today which language they are speaking.
We are in contact with a number of carriers. I can’t disclose what we are discussing or make an announcement. The service today requires people to dial in to Lexifone to make a call and then dial the call. If the service is offered by a carrier, say Verizon, customers would not need to dial any extra number. They would dial the destination directly; if they want a translation, the carrier gives them the means to do it.
Weinschenk: It sounds like the science and business of speech recognition is moving ahead quickly.
Sagie: The entire field is progressing, and the progress is even accelerating. I predict that in the next five to 10 years we will get to almost 100 percent accurate speech translation.