Most vendors are choosing to stick with Apache's distribution of Hadoop rather than create their own. As far as I can tell, that may make Apache's Hadoop the single most successful open-source solution ever. I honestly can't think of another situation where open source was the first product and so readily stayed the only product in play.
When it comes to distributions for the enterprise, the startup Cloudera has been the go-to company, but as Bloor Research Director Philip Howard recently pointed out, it now has competition from MapR Technologies' Hadoop distribution, which EMC used in its Greenplum HD Enterprise Edition. Both, of course, are based on Apache's open-source distribution.
The wildcard here is the recently announced Hortonworks, which boasts an incredible pedigree: it spun off from Yahoo, taking a slew of engineers with it to focus on Hadoop. So far, the word is that Hortonworks will work to improve the Apache Hadoop distribution. The company has already been mentioned in connection with a recent release of Apache Hadoop that includes changes to make it more user-friendly and to support data management.
One of the biggest barriers to open-source adoption so far has been precisely that degree of required self-integration. Gartner's second-half 2010 open-source survey showed that more than half of the 547 surveyed organizations have adopted OSS solutions as part of their IT strategy. Data management and integration is the top initiative, named by 46 percent of surveyed companies. This is where the game is.
He's not the only one who thinks governance will be a big issue for Big Data moving forward. I recently discussed this topic with David Corrigan, director of strategy for IBM's InfoSphere portfolio. Corrigan believes that, with some questions and adjustments, many organizations will be able to use existing governance strategies with Big Data, but he cautioned that businesses should start the groundwork before the first Big Data projects are approved.
According to Corrigan:
A lot of businesses are at the point where research organizations are doing proofs of concept for extending master data management beyond the single view, and they talk about gathering customer information from the World Wide Web and various sources. The business is likely going to pick up one of those options and say, 'Go ahead and do that.' You don't want to lose the momentum by saying, 'Well, now I have to sit back and think about governance and the policies that I want to apply.' You want to engage with the governance team right at the same time.