Thanks mainly to the rise of advanced analytics, a new DataOps discipline is starting to take shape. Borrowing heavily from concepts first advanced under the banner of DevOps, organizations embracing DataOps aim to create a set of structured processes around the data pipelines that drive analytics applications, which increasingly incorporate machine learning algorithms.
To advance that goal, MapR Technologies has updated its platform for running Big Data applications based on Hadoop. Version 6.0 of the MapR Converged Data Platform now integrates tools for administering and monitoring volumes, tables and streams. The database that MapR developed for this platform also now supports event streaming.
Anoop Dawar, vice president of product management and marketing for MapR, says machine learning algorithms, along with other forms of artificial intelligence (AI) technologies involving deep learning algorithms, are forcing the DataOps issue. Many of the organizations embracing these technologies will be required to systematically show how various models built using these algorithms were employed at different times to achieve a specific outcome. In effect, compliance surrounding the use of advanced analytics to automate a specific process will soon become a significant concern, says Dawar.
“There will be a crisis of compliance,” says Dawar.
The MapR Converged Data Platform is designed to head off that crisis by providing a single pane of glass that makes it simpler to keep track of what types of data and models are being employed at any given time, says Dawar.
It’s still early days as far as DataOps is concerned. But if data is truly the new oil, then IT organizations need to start thinking of themselves as data refineries. After all, crude oil stored in a barrel isn’t worth nearly as much as it is once that oil has been refined. The real challenge is setting up all the pipelines that make that process possible.