Governance, Integration Two Concerns for Hadoop

Loraine Lawson

Most vendors are choosing to stick with Apache's distribution of Hadoop, rather than create their own. As far as I can tell, that may make Apache's Hadoop the single most successful open-source solution ever - I honestly can't think of another situation where open source was the first product and so readily stayed the only product in play.


When it comes to distributions for the enterprise, the startup Cloudera has been the go-to company, but as Bloor Research Director Philip Howard recently pointed out, it now has competition from MapR Technologies' Hadoop distribution, which EMC used in its Greenplum HD Enterprise Edition. Both, of course, are based on Apache's open-source distribution.


The wildcard in this is the recently announced Hortonworks, which boasts an incredible pedigree since it spun off from Yahoo, taking a slew of engineers with it to focus on Hadoop. So far, the word is that Hortonworks will work to improve the Apache Hadoop distribution. Already, the company has received mention in connection with a recent release of Apache Hadoop that includes changes to make it more user-friendly and to support data management.


As I wrote last Friday, the list of players in the Hadoop space is rapidly growing, thanks in large part to a slew of announcements around Hadoop and Big Data in general this summer. Apparently, a number of these releases were timed around the Hadoop Summit, which happened in June. Earlier this week, Gartner's Merv Adrian wrote his take on the event, with a focus on introducing the main players as well as the up-and-comers in this space. It's a great update on who to watch and why they may or may not matter as this market evolves. Beyond the Hadoop Who's Who, he also mentions two key issues emerging for those dealing with Hadoop:


  • Governance specifically, but data management in general
  • Integration among the related Big Data pieces (Hive, Zookeeper, Pig, etc.)


Adrian writes:

One of the biggest barriers to open source adoption so far has been precisely that degree of required self-integration. Gartner's second half 2010 open source survey showed that more than half of the 547 surveyed organizations have adopted OSS solutions as part of their IT strategy. Data management and integration is the top initiative they name; 46 percent of surveyed companies named it. This is where the game is.

He's not the only one who thinks governance will be a big issue for Big Data moving forward. I recently discussed this topic with David Corrigan, director of strategy for IBM's InfoSphere portfolio. Corrigan believes that, with some questions and adjustments, many organizations will be able to use existing governance strategies with Big Data, but he cautioned that businesses should start the groundwork before the first Big Data projects are approved.


According to Corrigan:

A lot of businesses are at the point where research organizations are doing proofs of concept for extending master data management beyond the single view, and they talk about gathering customer information from the World Wide Web and various sources. The business is likely going to pick up one of those options and say, 'Go ahead and do that.' You don't want to lose the momentum by saying, 'Well, now I have to sit back and think about governance and the policies that I want to apply.' You want to engage with the governance team right at the same time.
