Evolving Hadoop and Big Data for Enterprises

Loraine Lawson

Hadoop, and big data in general, is like some sort of tech primordial soup these days. No one's quite sure what's going to work, but every week or so, a new business or product emerges to give it a go.


Most recently, Yahoo spun off its Hadoop group as a new company called Hortonworks. Its business model is to offer training and support for Hadoop, and given the skill shortage and demand, that's certainly a smart way to start a new business.


In a recent Q&A with Information Management's Jim Ericson, Hortonworks CEO Eric Baldeschwieler said the company's leaders expect half the world's data will be in Hadoop within five years. Sit with that a minute: half the world's data, running on Hadoop. That's bound to keep Larry Ellison up at night.


Although the Yahoo spinoff made big headlines, a quieter recent announcement may be more significant, at least in terms of enterprise adoption of Hadoop. So far, vendors have stuck with supporting Apache's distribution of Hadoop, which has left enterprises defaulting to Cloudera's distribution, "the preferred distribution of Hadoop for enterprise-class environments," according to a recent post by Philip Howard, a research director in Data Management at Bloor Research.


But in May, EMC unveiled its Greenplum HD Enterprise Edition, and guess what? It's not based on the standard distribution. Instead, it's built on MapR Technologies' Hadoop distribution, according to Howard, who adds that the MapR distribution is now also available directly from MapR.


This isn't just a matter of market dominance. As Howard explains, the standard distribution of Hadoop has three major weaknesses: resiliency, compression and performance. He contends that the MapR distribution addresses these flaws. If you're considering Hadoop at all, you'll want to take Howard's critique into account.


Of course, my focus has been on the integration front, where a slew of companies are offering various connectors and ways of accessing and analyzing Hadoop-stored information. Here the question isn't about distribution, but rather about how you use the information stored within Hadoop. Informatica, Composite, Talend, Syncsort, Pentaho, SnapLogic, IBM: I'm still working my way through the product briefings as company after company unveils Hadoop-focused solutions.


One topic that has come up several times, particularly from Informatica and IBM, is the concept of a "big data platform." The word "platform" always raises more questions in my mind, so during a recent interview, I asked David Corrigan, the director of strategy for IBM's InfoSphere portfolio, what, exactly, constitutes a "big data platform."


Corrigan said that in IBM's view, a big data platform would incorporate five core capabilities:

  1. Volume, velocity and variety. Big data isn't just large amounts of information, he explained: It also comes in a variety of forms (structured and not), and it arrives at high speeds. A platform has to address all three components.
  2. Analytics. "I mean, the point of Big Data is to generate insight," Corrigan said. "This isn't an exercise in converting all of those different sources and variety of information into a structured relational format."
  3. Enterprise-class capabilities, such as governance, security and privacy.
  4. User-friendly environments, particularly for developers. There simply aren't enough Hadoop and MapReduce specialists, so the only way businesses will really be able to capitalize on Hadoop is if a platform makes it more accessible to the average developer. "You need to be able to democratize big data and bring it to the average user," Corrigan said.
  5. Integration (my personal favorite). Big data should not become a new silo, and to avoid that, a platform will need to support integrating the big data environment with relational data technologies, data warehouses, and other existing enterprise resources.
