As video becomes a more mainstream element of everything from sales and marketing collateral to training provided by the human resources (HR) department, managing all the video being produced is becoming exponentially more challenging. To enable organizations to rise to that challenge, IBM today extended the artificial intelligence (AI) capabilities it has infused into the IBM Cloud Video service.
Hosted on the IBM Cloud platform, the IBM Cloud Video service makes use of the machine and deep learning algorithms IBM developed for the IBM Watson cognitive computing platform. New capabilities being added to IBM Cloud Video include the ability to recognize speech within videos and convert spoken words and phrases into text for video captions. IBM is also now making it possible for the captions generated by Watson to be reviewed by a human to increase accuracy. The Watson algorithms then incorporate those adjustments into any caption that gets generated for similar types of content.
Chris Zaloumis, senior director of enterprise video offerings for IBM Cloud Video, says that as video becomes a mainstream element of business processes, there’s a pressing need to automate the process through which captions and tags get generated. Tags are especially critical when it comes to exposing video content that would otherwise be invisible to search engines, adds Zaloumis.
One of the top use cases for video in the enterprise is sales. Rodan + Fields, a manufacturer of skincare products, makes extensive use of IBM Cloud Video to train over 60,000 consultants and to engage over 80 million people in 45 countries via social channels. The Watson capabilities embedded in IBM Cloud Video enable Rodan + Fields to automatically create captions in multiple languages, says Zaloumis. No matter the language spoken, Zaloumis says it’s already been shown that video accompanied by captions is retained better.
“You can use the captions to stress key messages,” says Zaloumis.
Zaloumis notes that the IBM approach to employing AI is much more accessible than that of rival cloud service providers, which simply expose application programming interfaces (APIs) to AI functions aimed primarily at application developers.
Most of the unstructured content being generated today comes in the form of video. By 2020, it is forecast that as much as 82 percent of the content on the internet will be in the form of video. Tags and captions make it simpler to include video files as part of an overarching analytics strategy. But none of that is going to be of much use if every video being created needs to be manually tagged and captioned. There simply are not enough hours in a day to watch it all, much less comprehend the key message the creator of a video may have been trying to deliver.



