Hadoop distributor MapR declined an invitation to be part of a vendor consortium centered on the Hadoop stack. MapR CEO John Schroeder explained the company’s decision in a post last week, deriding the consortium’s definition of Hadoop’s core as “vendor-biased.”
The consortium calls itself the Open Data Platform, which is a bit of an odd name given how “platform” is generally defined in the industry. In February, the group revealed its founding Platinum members as GE, Hortonworks, IBM, Infosys, Pivotal, SAS and an undisclosed international telecom. Gold members include Altiscale, Capgemini, CenturyLink, EMC, Splunk, Verizon Enterprise Solutions, Teradata and VMware.
It’s not unusual for vendors to join forces to create interoperability standards, and that is the stated intent in the group’s press release:
“A key benefit of the ODP will be for members to collaborate across various Apache projects as well as other open source-licensed big data projects with a goal toward meeting enterprise class requirements. The ODP is expected to promote a set of standard open source technologies and versions that will increase compatibility among big data solutions and simplify the process for applications and tools to integrate with and run on any compliant system.”
You would think that wouldn’t be an issue, since all vendors draw on Apache’s open source version of Hadoop. Members of the consortium say otherwise. Sunny Madra, head of data products at Pivotal, pointed out that the distributions are “dissimilar by design.” In an InfoWorld interview, he compared the Hadoop situation to the early days of Unix distributions.
“The Unix ecosystem was quite fragmented; everyone had their own things going on, and you couldn’t be sure if something ran in one place or the other,” Madra told InfoWorld. “Then Linux comes around and standardizes that. So if you take a look at RHEL or CentOS or Oracle, you know that if you have something that runs on any one of those, it’ll run on all of them.”
Hadoop’s lack of standardization makes it hard to certify software that is developed for it, Madra added.
But there are a few things that make this consortium a bit … awkward. First, MapR isn’t the only major Hadoop distributor missing from that list. The top two commercial Hadoop providers, Amazon Web Services (with EMR) and Cloudera, are also absent. Cloudera declined to join early on, and Chief Strategy Officer Mike Olson didn’t mince words about why.
“I have an engineer’s disdain for industry consortia in general, and for vendor-driven consortia in particular. Far too often, these organizations aim not at promoting, but rather at slowing, innovation in the technology industry,” he wrote. “Pivotal and Hortonworks claim that the ODP is driven by an industry-wide longing for standardization in the Apache Hadoop ecosystem. I don’t believe them.”
That’s significant, since Cloudera leads in customer deployments, with more than 200 customers as of March 2014. Schroeder writes that, together, MapR and Cloudera run nearly 75 percent of Hadoop implementations.
Also missing are Intel and Microsoft. Microsoft offers the Windows Azure HDInsight Service which, according to CIO.com, is one of two Hadoop distributions that run on Windows. A version of Hortonworks Data Platform also runs on Windows.
So you have to wonder what good standardization does if the largest distributors aren’t involved. Schroeder raises that point succinctly in his post, referring specifically to the “platinum” and “gold” memberships, which bestow different rights.
“The Open Data Platform is not open unless equal voting rights are provided to the leading Hadoop distributions,” he writes. “The Open Data Platform has not disclosed how governance is done, but it is a different model than the preferred and fair meritocracy used by the Apache Software Foundation.”
This is one of three concerns Schroeder outlines in his post, the other two being:
The Open Data Platform is redundant with Apache Software Foundation Governance.
The Open Data Platform is ‘solving’ problems that don’t need solving. “Companies implementing Hadoop applications do not need to be concerned about vendor lock-in or interoperability issues,” he writes. “Applications built on one distribution can be migrated with virtually zero switching costs to the other distributions.”
Gartner’s informal findings from a webinar back that up. In a joint post, Gartner analysts Nick Heudecker and Merv Adrian report that fewer than 1 percent of attendees cited vendor lock-in or interoperability as a concern.
Where stands Gartner amid this vendor bickering? You’ll note that Heudecker and Adrian’s post is written as a dialogue between Muppet malcontents Statler and Waldorf, which I think softens their criticism of the ODP:
“This simply institutionalizes a dichotomy in favor of a few favored players. Who wants it? As Cloudera suggests, the paying members, and it’s not clear who else. It’s ironic that Hortonworks is one of the founders of an organization that wants to add an anchor slowing innovation in the open source free-for-all it has been the flag-bearer for.”
Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist.