With interest in the open source Apache Spark framework for running Big Data applications in real time growing rapidly, IBM this week made it clear that Spark is now a foundational element of its overall strategy.
At the IBM Insight 2015 conference, IBM revealed that it has moved more than 15 analytics applications onto Apache Spark clusters and that its previously announced Apache Spark cloud service is now generally available.
Separately, IBM announced that, as part of existing alliances with Twitter and The Weather Company, it is releasing four new application programming interfaces (APIs) that developers can invoke via IBM Bluemix, its cloud-based platform-as-a-service (PaaS) environment. In addition, IBM is offering cognitive analytics techniques and IBM Insight Data Packages for Weather, a set of pre-packaged data sets tailored for specific industries.
Finally, IBM unveiled a revamped user interface for the IBM Cognos business intelligence application that brings it in line with the user experience provided by IBM Watson Analytics. It also launched a data capture application, IBM Datacap Insight Edition, which automatically classifies documents at the point when they are first digitized.
Last June, IBM made a strategic commitment to Apache Spark. Now Rob Thomas, vice president of product development for IBM Analytics, says IBM is starting to reap the dividends of that investment. Thomas says Apache Spark will ultimately be as strategic an investment for IBM as Linux. What makes Apache Spark so attractive, he says, is that it enables IT organizations to pull data from multiple sources and process all of it in memory. As such, Apache Spark is rapidly becoming a superset of Hadoop, in the sense that Hadoop is only one of many data sources.
At the same time, Thomas says Apache Spark enables IBM to substantially reduce the amount of code required to build analytics applications. For example, IBM says using Apache Spark allowed it to reduce the code base of DataWorks, a data preparation and data refinement service used by applications such as IBM Watson Analytics, from 40 million to 5 million lines of code. Thomas expects similar benefits from running other software, such as IBM SPSS analytics, on Apache Spark.
While Apache Spark and platforms such as Hadoop may not replace traditional data warehouses overnight, Thomas says it is apparent that data warehouse applications are rapidly being transformed. The challenge facing organizations now is figuring out how to make the best use of all the data that is available in a matter of seconds.