Last year, IBM put a stake in the ground by naming the open source Apache Spark in-memory computing framework a strategic initiative around which it would build what amounts to a data analytics operating environment. This week, IBM put more meat on the bones of that initiative with Project DataWorks, a framework for data governance that IT organizations can employ on top of both Apache Spark and the IBM Watson cognitive computing platform running on the IBM Bluemix platform-as-a-service (PaaS) environment.
Ritika Gunnar, vice president of offering management for IBM Analytics, says IBM is putting in place a data management and governance framework that pulls together all the data, tools and applications that organizations need to create advanced analytics applications.
Project DataWorks, says Gunnar, takes the tools that IBM developed as part of a Data Science Experience initiative and makes them available on a cloud service that eliminates the need for internal IT organizations to first acquire and provision IT infrastructure and then craft a framework for managing all that data.
“We’ve created a data analytics pipeline,” says Gunnar.
By combining Apache Spark and Watson on the same cloud platform, IBM is acknowledging that Big Data analytics in the form of Spark is a crucial front end for cognitive computing applications running on Watson. Rather than trying to analyze massive amounts of data on the high-end Power Systems servers that run Watson, organizations can use Apache Spark to do that analysis on lower-cost x86 or Power Systems servers. That data can then be passed on to Watson in a way that makes it simpler for Watson to recommend a particular course of action based on all the data it has available.
IBM is essentially betting that, when it comes to advanced analytics and cognitive computing, most organizations would rather take advantage of prefabricated cloud services that can be customized than build everything themselves from the ground up. Given the costs involved in these types of projects, that’s a reasonable expectation. The question is what types of applications organizations will actually build now that the core platform for creating them is becoming more accessible.






