Voice recognition is the ultimate user interface, the point at which humans meet machines, and it’s moving front and center as technical choices proliferate. To most people, nothing is as simple as saying what they want. It’s up to voice recognition vendors to create technology that is capable of responding.
This week, Microsoft said that it has improved the accuracy and increased the speed of Bing’s voice recognition technology. A piece at CNET, which includes a Microsoft video, says the company claims that its deep neural network technology increases speed by about half and improves accuracy by 15 percent compared with its prior products.
Microsoft declined to compare its results against those of its competitors, citing differences in measurement. The idea behind deep neural networks is to mimic how the human brain understands speech. Much more information is available in the papers referenced in this cursory explanation:
To get those improvements, Microsoft replaced the acoustic model in its speech recognition technology. In a technical paper, five Microsoft researchers found that using deep neural networks for speech recognition helped minimize the variability in speech that often trips up the acoustic model that Microsoft’s previous technology used, known as the Gaussian mixture model.
Bing and Microsoft are not the only companies to push the verbal envelope. The Next Web reported this week that Nuance, the big banana in voice recognition, released the Dragon Mobile Assistant app for Android. The app offers “intelligent driver mode,” in which the system detects vehicle movement and switches to hands-free mode. The story asks a good question: How does the app know if the user is the driver or simply a passenger in the vehicle? A related question is whether the hands-free option automatically kicks in on a train.
Late last month, Nuance also agreed to acquire Tweddle Connect, a company that specializes in in-vehicle entertainment systems. These platforms include voice recognition systems. Nuance already is a player in that sector, so the closing of the deal, which is expected during the third quarter, will strengthen its position. The combined company will serve Toyota, Lexus, BMW, Chrysler and Ford.
The controversial element of the voice recognition arena is closely related to Nuance’s two announcements. It seems to be a no-brainer that hands-free driving using a voice recognition user interface would be far safer than other UIs. That may well be true. However, as IT Business Edge reported last week, driving safety is not quite that simple. A University of Utah study, conducted on behalf of the AAA Foundation for Traffic Safety, found that the level of driver distraction is tied to the complexity of the task being done, even if the driver is using a hands-free platform. Thus, hands-free can be seen as a useful tool, but not a panacea.
And finally, The Chronicle of Higher Education’s Lingua Franca offers a high-level, academic look at the difference between voice recognition and natural language processing, which, apparently, is when the machine actually understands what is being said.