Google Sees Speech APIs Transforming Range of Business Processes

    Chatbots and other forms of voice-enabled artificial intelligence (AI) are slowly but surely being incorporated into multiple business processes, especially when it comes to anything to do with customer service. The success of those efforts, however, has been mixed at best. Most end users can still identify the robotic sound of the voice when they are dealing with, for example, a chatbot. Google is out to transform that experience by improving both the text-to-speech and speech-to-text application programming interfaces (APIs) it exposes via Google Cloud.

    Dan Aharon, a product manager at Google, says these APIs will significantly advance adoption of chatbots and other forms of interactive voice response (IVR) systems. Aharon admits those capabilities may not be able to mimic all the emotional inflections a human voice is capable of emitting. But they will provide enough of a natural-sounding voice to extend the usage of these technologies across a broader range of processes, says Aharon.

    While the speech-to-text API is used mainly to facilitate voice-enabled queries, Aharon says the text-to-speech API now available in beta represents a critical advance in terms of making it possible for a machine to verbally share reports with an end user. Instead of the end user having to read a report, the machine will read its details aloud. That capability will enable machines to deal with routine customer support inquiries in a way that goes well beyond what IVR systems can do today, says Aharon.

    “The responses won’t have to be pre-recorded,” says Aharon.

    In fact, as those capabilities improve, many customers may prefer to deal with a machine for customer support rather than wait for a human to get around to their call.

    These capabilities are becoming more accessible because graphics processing units (GPUs) and tensor processing units (TPUs) optimized for machine and deep learning algorithms can be readily invoked as a cloud service via an API. That makes it possible for developers to more easily embed these capabilities into a broad range of applications.

    It may be a while before advanced speech technologies become pervasively embedded into every business process. But at this point, it’s only a matter of time before machines play a much bigger role in the overall business process conversation.

    Mike Vizard
    Michael Vizard is a seasoned IT journalist, with nearly 30 years of experience writing and editing about enterprise IT issues. He is a contributor to publications including Programmableweb, IT Business Edge, CIOinsight and UBM Tech. He formerly was editorial director for Ziff-Davis Enterprise, where he launched the company’s custom content division, and has also served as editor in chief for CRN and InfoWorld. He also has held editorial positions at PC Week, Computerworld and Digital Review.
