The campaign to put Hadoop at the center of the enterprise got a shot in the arm this week in the form of $160 million in additional funding raised by Cloudera.
Cloudera CEO Tom Reilly says a significant amount of those funds came from Google and the Michael Dell family. A large portion, he says, will be applied to research that makes it simpler to integrate the Cloudera distribution of Hadoop with the rest of an enterprise IT environment, using ODBC and JDBC connectors alongside Hadoop application programming interfaces.
The goal, says Reilly, is to move Hadoop to the center of the enterprise to provide an “active archive” that feeds data to any number of enterprise applications. All told, Reilly says there are four primary use cases for Hadoop in the enterprise. The first is to provide a landing zone in which raw data can be cost-effectively stored in its native format. The second is to transform that data into a structured format that can be persistently stored. The third is to enable users to interrogate and explore that data directly with SQL and enterprise search tools. The fourth is to run converged analytics across both unstructured and structured data.
Reilly says that as Hadoop evolves into a data hub, both batch-oriented and real-time applications will be developed on top of it. For that reason, says Reilly, Cloudera has embraced Spark, an in-memory processing engine that lets Hadoop applications run in real time.
Cloudera, which has existing partnerships with Oracle and Dell, has now garnered a total of $300 million in funding. Reilly says this latest investment will make it possible for Cloudera to one day go public on its own terms. In the meantime, Reilly concedes that there is likely to be some consolidation across a Hadoop distribution landscape that includes rival platforms from IBM, Intel, Hewlett-Packard and Hortonworks. Ultimately, Reilly says, only a handful of Hadoop distributions will remain to serve as the foundation for a set of technologies that are starting to transform enterprise IT architecture from the data layer up.