IBM Embraces Spark and Gives Open Source Access to Its Machine Learning Technology

Mike Vizard

Looking to capitalize on the momentum building behind the open source Apache Spark in-memory computing framework, IBM today announced a major commitment to Spark: it will donate its IBM SystemML machine learning software to the project, on which 3,500 IBM researchers located in a dozen labs are already working.

Joel Horwitz, director of portfolio marketing for the IBM Analytics Platform, says that IBM views the in-memory cluster computing framework as a foundational component of an emerging “insight economy” where analytics are processed in real time alongside transactions. As such, IBM will embed Apache Spark into all of its analytics and e-commerce software, says Horwitz.

In addition, Horwitz says that IBM will offer Spark on its SoftLayer cloud. It will also offer an instance of Spark that can be invoked as a service running on the IBM Bluemix platform-as-a-service (PaaS) environment, which can be provisioned in as little as 10 minutes. One thing that makes this possible, says Horwitz, is that the application programming interfaces (APIs) created for Apache Spark are already well defined.


Horwitz says IBM is committed to making additional contributions to the project as it continues to invest in machine learning applications designed to, for example, advance gene sequencing or optimize transportation routes using data collected from millions of Internet of Things (IoT) endpoints. Horwitz adds that IBM is committed to extending the number of programming languages that can be used to create Spark applications. Spark itself, notes Horwitz, is written in Scala, a language that runs on the Java Virtual Machine (JVM).

IBM will also open a Spark Technology Center in San Francisco. The company is pledging to educate at least 1 million data scientists and data engineers on Spark through extensive partnerships with AMPLab, DataCamp, MetiStream, Galvanize and Big Data University MOOC. 

Spark is an in-memory compute engine that is typically layered on top of Hadoop, so data itself is not actually stored in Spark. Even so, Horwitz says Spark is becoming part of the logical data warehouse that is starting to emerge in Big Data environments. Spark can be orders of magnitude faster than standard Hadoop on in-memory workloads, and it sharply reduces the number of machines needed in a cluster to process Big Data.

Spark is a top-level Apache open source project that was originally developed at UC Berkeley's AMPLab, and Horwitz says that IBM views it today as being as significant an open source project as Linux itself. The challenge, of course, is turning what is clearly still an emerging, immature technology into something that can be deployed in support of production applications across the enterprise.



