To date, three companies offer Hadoop distributions aimed at enterprises: Cloudera, MapR and Hortonworks. To be honest, it’s been hard to differentiate between the distributions so far, since none of the companies is all that old.
But that may be changing. Forbes contributor John Webster says this is the year we’ll see these companies and their enterprise partners engage in a “land grab for the enterprise.” He also anticipates that this year will see the development of a Hadoop 2.0, with more enterprise-friendly features.
So far, there really hasn’t been much difference between the three Hadoop distributions. All offer some version of Apache’s Hadoop, professional services, education, and a distribution that can be purchased by subscription.
But the three Hadoop vendors definitely are headed in different directions, according to industry watcher Dan Woods.
“You have Cloudera, Hortonworks, and MapR Technologies all working on making Hadoop better, but all working on their own strategy for commercializing Hadoop,” he wrote in a Forbes article, “Evolving Hadoop: Can Hadoop Survive its Weird Beginnings.”
Cloudera, the veteran of the group, is focused on evolving a management suite for Hadoop, he said. While Doug Cutting, creator of Hadoop, works at Cloudera, Woods doesn’t see him playing an open-source advocacy role the way Linus Torvalds does for Linux.
MapR was formed by techies from a variety of companies, including CTO and Co-founder M.C. Srivas, who ran one of the major search infrastructure teams at Google that used MapReduce. Woods described MapR as more focused on fixing Hadoop’s core capabilities with proprietary replacements and standards-based extensions.
Hortonworks is a Yahoo spin-off company, which Woods describes as “the open source purist, which wants to be the Red Hat of Hadoop.”
It certainly seems like vendors will be focusing on ways to make Hadoop more enterprise-friendly.
This week, Hortonworks released its Hortonworks Data Platform (HDP) 1.2, stressing that it is “the industry’s only complete 100-percent open source platform powered by Apache Hadoop.”
There are a few additions designed to make this product more enterprise-friendly, including Apache Ambari, a Web-based tool for managing, monitoring and provisioning Hadoop clusters. Hortonworks has also added new capabilities to improve security and make Hadoop easier to use.
The release also includes a high-performance ODBC connector, thanks to a partnership with Simba. This makes it easier to integrate with current systems and move large datasets from them.
eWeek notes that the new platform also “improves scalability by supporting multiple concurrent query connections to Hive from business intelligence tools and Hive clients.”
Not long ago, I spoke with MapR’s CEO and Co-founder John Schroeder about what he thinks will happen with Hadoop this year. Among other things, Schroeder said he foresees an expansion of SQL-based tools for Hadoop, which should make it a bit more accessible for enterprise developers, as well as Hadoop’s use in real-time applications.
But to really deliver in the enterprise, Hadoop needs to evolve in substantial ways, according to Woods. He offered a list of changes Hadoop needs, including:
- A simplified MapReduce that can be used by more people, without specialized data science skills
- A faster file system
- Support for integration with the standards used by BI software
- Basic data protection features, such as snapshots and mirrors
Two of those seem significantly harder to me than the other two. Will all four happen? We’ll see, but so far, I’ve only heard about integration support for BI tools — and usually, that integration is coming from the BI vendors.