As Hadoop continues to evolve, the types of applications that can run on it are evolving quickly too, thanks to the emergence of the Apache Spark framework, which lets applications on Hadoop process data in memory.
Today, MapR Technologies became the latest Hadoop distributor to announce support for Spark, partnering with Databricks, the developer of the Spark framework. Jack Norris, chief marketing officer for MapR Technologies, says that by adding Spark support, MapR is making it possible to reuse code across batch, interactive and streaming applications.
As IT organizations gear up to start running Hadoop applications in production, more attention is being paid to how those applications will perform. Besides running in memory, Spark offers an alternative to the MapReduce interface for accessing data in Hadoop, and it is a lot simpler to use because it provides Java, Python and Scala application programming interfaces.
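To make the idea of reusing code across batch and streaming applications concrete, here is a minimal sketch in plain Python. It does not use Spark's actual API; the function names (count_words, run_batch, run_streaming) and the sample data are hypothetical and chosen only for illustration. The point is simply that one piece of processing logic can serve both a batch job and a stream handled as a series of small batches.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_words(lines: Iterable[str]) -> Counter:
    """Shared logic: split each line into words and count them."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_batch(lines: List[str]) -> Counter:
    """Batch mode: process the whole data set in one pass."""
    return count_words(lines)


def run_streaming(micro_batches: Iterable[List[str]]) -> Iterator[Counter]:
    """Streaming mode: apply the same function to each small batch as it
    arrives, yielding a snapshot of the running totals after every batch."""
    totals = Counter()
    for batch in micro_batches:
        totals.update(count_words(batch))
        yield Counter(totals)  # copy so each snapshot is independent


# Hypothetical sample data, fed once as a batch and once as two micro-batches.
data = ["Hadoop stores data", "Spark processes data in memory"]
batch_result = run_batch(data)
stream_result = list(run_streaming([[data[0]], [data[1]]]))[-1]
```

In this sketch both paths call the same count_words function, so the final streaming totals match the batch result. A framework such as Spark applies the same principle at cluster scale, though its real interfaces differ from the names used here.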
While there is a lot of debate about what role Hadoop will play in the enterprise, the general consensus appears to be that Hadoop is emerging as a central data hub. In some cases, that hub or “data lake” is going to be accessed by applications residing on other platforms. But at the same time, many applications will invoke any number of data processing engines directly on top of Hadoop.
The percentage of enterprise applications that will eventually wind up running directly on top of Hadoop is anybody’s guess. But as enterprise IT organizations get more comfortable with Big Data in general, the odds are good that Hadoop in some form is going to be a significant part of the enterprise data architecture.