For Clustered Environments, Is it Hadoop or Bust?

If you're a CIO considering replacing static infrastructure silos with integrated compute clusters (and who isn't, really?), you've no doubt come across Hadoop.

Originally designed to support distributed search, the framework has since been adapted for a wide range of database, warehousing and data management functions. Its ability to spread application workloads across multiple nodes makes it a prime candidate for all manner of clustered and distributed architectures, including the cloud.

Clustered platform developers are quickly adopting Hadoop as a means to balance high-performance analytics and low power consumption in the era of Big Data. SGI, for example, just launched its SGI Hadoop Cluster Reference Implementation designed to run its Management Center software on Intel Xeon clusters. The idea is to optimize power envelopes to accommodate fluctuating analytics workloads. SGI is also working with business intelligence firms like Kitanga and Datameer to incorporate Hadoop into leading analytic functions like information modeling and data visualization.

A word to the wise, however: Hadoop is not without concerns. As Bloor Research's Philip Howard points out, the Hadoop Distributed File System (HDFS) uses a single NameNode to store all cluster metadata, which creates a single point of failure that could bring the entire environment down. Hadoop also demands new programming skills, which at the moment are a rare and expensive commodity.
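To give a sense of the programming model behind that skills gap, here is a minimal sketch of the map, shuffle and reduce style that Hadoop jobs follow. It is plain Python running on one machine, not Hadoop's own API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group emitted values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence on a single machine (illustrative only)."""
    pairs = []
    for doc in documents:
        # On a real cluster, each node would map its own share of the data.
        pairs.extend(map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data everywhere"]))
```

The point of the sketch is the shape of the work: developers have to recast a problem as independent map steps and per-key reduce steps, which is the shift in thinking that makes experienced practitioners scarce.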

Hadoop will also require a fair amount of network optimization to ensure applications and data get where they need to be in a timely fashion. Solarflare and Fusion-io have taken the first crack at this with a collaboration that combines intelligent 10 GbE infrastructure with shared data decentralization techniques. The project is part of Fusion-io's Technology Alliance Program aimed at minimizing latency and increasing app performance across complex network infrastructures. Part of the effort involves accelerated workflow steering and intelligent network adapter technology to more closely match traffic patterns with available CPU cores.

As with any new technology, however, you need to walk before you can run. That's why Forrester analyst James Kobielus recommends a gradual deployment strategy for Hadoop, at least until some of the growing pains have been identified and corrected. A key mistake at this point, he says, is to deploy Hadoop without a clear understanding of which analytics tools can actually scale to massive data volumes. It would also be wise to deploy Hadoop as a staging layer behind existing data warehouse platforms.

As can be expected, Hadoop environments will grow more complex as the number and size of compute clusters grow. It's also true that Hadoop is not the only answer to distributed application management. But it does seem to be the flavor of the month, and on that basis it deserves a careful evaluation.

All too often, however, enterprises succumb to a herd mentality, driven by the desire to deploy the hottest developments even when their specific needs would be better served by more established technology. The rule of thumb with Hadoop, then, is to identify a need, and then research, research, research.