A Reality Check on Hadoop and Big Data

Loraine Lawson

Steve Ballmer is taunting IBM and Oracle with challenges of "You don't know Big Data." Yahoo this week announced it's spinning off its own Hadoop-focused company, cleverly named Hortonworks after, I assume, another famous pachyderm. LexisNexis is touting its Hadoop alternative, HPCC Systems, which it spent 10 years developing and now plans to release as open source.


And at least once a week, there's yet another announcement about a vendor offering Hadoop plug-ins or Hadoop connectors or Hadoop "what'cha'ma'bob" for their business intelligence tool, data integration platform or what-have-you.


Meanwhile, IDC says that by the end of the year there will be roughly 1.8 trillion gigabytes of data pocketed in 500 quadrillion files, all of it waiting to be picked, processed and integrated for analysis.


There's no doubt Hadoop and all things Big Data are changing how massive amounts of data are handled and who has access to this capability. And there's no doubt that Big Data is capturing the attention of mainstream businesses. Forrester analyst James Kobielus says clients no longer ask "What's Hadoop?" but "Who offers the most robust Hadoop solution?"


Still, as promising as Hadoop is, Kobielus writes in a recent blog post, it's time to add some perspective before we O.D. on Big Data and Hadoop hype:

At times, it almost feels like people discuss Big Data with the assumption that bigger is necessarily better and that throwing more data at your problems will automatically produce insights. I hope business and IT professionals heed my advice about searching for those special problems, often of a scientific nature, that can be solved best through petabyte-scale analytics. You don't need a data center full of maxed-out storage arrays to derive powerful insights. Gut feel is free, and it often thrives on the scantiest information.

Throughout June, Kobielus wrote a series of blog posts asking such pause-worthy questions as "What Are These Big Bad Insights That Need All This Nouveau Stuff?" and "Hadoop: What Is It Good for?"


The result is a reality check about Hadoop and Big Data in general. What I like about Kobielus's posts is that he assesses Hadoop honestly without tearing it down or diminishing its contribution. He really is just putting it into perspective.


Here are some of the questions he asks:

  • Do you really need this level of analysis? "Many of the core applications of Hadoop are scientific problems in linguistics, medicine, astronomy, genetics, psychology, physics, chemistry, mathematics, and artificial intelligence," Kobielus writes. "'Scientific' doesn't always mean theoretical. Essentially, any complex research, engineering, supply chain, marketing, or other problem is suitable for Big Data."
  • Do you have something that can already handle that level of processing? Kobielus points out that high-end enterprise data warehouses can do most of what Hadoop can do. "Many IT practitioners will ask why they should pay good money for a new way of doing things, with all the concomitant disruptions and glitches, when they can simply repurpose their investments in platforms like Teradata, Oracle, IBM, and Microsoft," he adds.
  • Can your IT budget afford it? Sure, Hadoop somewhat lowers the cost of handling massive data sets, but it's still not what you would call cheap. "The immovable object that Big Data will need to overcome is the limited IT budget," he writes. "Until petabytes become dirt cheap, few companies can justify the hardware necessary for storing, processing, and managing all this data."


All good questions, to which I would add one more: Do you have a Hadoop expert hidden away somewhere? IT Business Edge's Susan Hall has written about the shortage of IT workers with Hadoop skills. While IBM, Informatica, SnapLogic and others offer tools to help you access and process Hadoop-stored data, staffing is still something you'll want to investigate.


This is not to say that Hadoop is overhyped and not worth your time. It's just that there are issues you need to consider. It's not quite enterprise-ready, or, as Kobielus puts it to his clients, "... yes, Hadoop is real, but ... it's still quite immature."
