Interest Growing in Integrating Hadoop Data Stores


A recent survey of 102 Hadoop developers showed the No. 1 reason for using Hadoop is to mine large data sets for improved business intelligence. Running extract, transform and load (ETL) processes on that data also ranked among the top four reasons.


Clearly, companies are interested in looking beyond just storing and processing large sets of data with Hadoop toward integrating and using that data in other systems.


If you're not familiar with Hadoop (its co-creator, Doug Cutting, pronounces it HA-doop, with the "a" as in "adder"), it's an open source Apache framework for the distributed storage and processing of data, literally petabytes of it. If you'd like specifics, Cutting explained this in more detail during my recent interview with him.
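For readers who want a feel for the "processing" half of that description: Hadoop's classic processing model is MapReduce. A map step turns each input record into key/value pairs, the framework groups those pairs by key (the "shuffle"), and a reduce step combines each group into a result. Below is a minimal, single-machine sketch of that flow in plain Java, counting page hits in a few Web log lines. It is not Hadoop's actual API; the class name PageHitCount, the two-field "<ip> <page>" log format and the sample data are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-machine simulation of the MapReduce model. This is NOT Hadoop's API;
// the "<ip> <page>" log format is a hypothetical assumption for illustration.
public class PageHitCount {

    // Map step: turn one log line into a (page, 1) key/value pair.
    static Map.Entry<String, Integer> map(String logLine) {
        String[] fields = logLine.trim().split("\\s+");
        return Map.entry(fields[1], 1);
    }

    // Reduce step: combine all values emitted for one key into a total.
    static int reduce(String page, List<Integer> counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        return total;
    }

    static Map<String, Integer> run(List<String> logLines) {
        // Shuffle step: group mapped values by key (Hadoop does this between map and reduce).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : logLines) {
            Map.Entry<String, Integer> pair = map(line);
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        // Apply the reduce step to each key's group of values.
        Map<String, Integer> results = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            results.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "10.0.0.1 /home",
                "10.0.0.2 /products",
                "10.0.0.3 /home",
                "10.0.0.1 /checkout",
                "10.0.0.4 /home");
        // prints {/checkout=1, /home=3, /products=1}
        System.out.println(run(log));
    }
}
```

On a real cluster, Hadoop runs many map and reduce tasks in parallel on the machines that hold the data, which is what lets this pattern scale to very large data sets.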


Hadoop follows the same approach that Google, Twitter, Facebook and other high-volume sites use to analyze their Web log data. It opens up the overwhelming world of machine-generated data to new analysis, and organizations are putting it to use in intriguing ways. For instance, the Tennessee Valley Authority has sensors on power-transmission lines and facilities across the country that generate data roughly 50 to 100 times a second, says Cutting. Hadoop makes it affordable to store and analyze that data, which is helping the TVA identify issues, patterns and grid problems. That, in turn, is helping the agency predict, and avoid, power failures.


And that's just one of a number of business-use cases Cutting identified in the second part of our interview.


While Hadoop can handle the storage and processing, integrating that much data is still something of a challenge. Third-party companies are stepping in to help with that task, including Cloudera, where Cutting is now the chief architect. Cloudera mostly works through partners, which use its Sqoop tool to pull Hadoop data into relational databases. Pentaho and Talend offer solutions that simplify the integration process with more traditional data integration interfaces and tools, and IBM plans to do the same. Previously, you needed Java programmers to custom-write this sort of integration, according to Pentaho's founder and CEO Richard Daley.


More organizations will be looking for these types of solutions if Karmasphere's survey of Hadoop developers is correct. Karmasphere is another company working to make Hadoop more accessible, though its focus is on developers skilled in traditional programming languages. Karmasphere CEO Martin Hall says companies are uncomfortable making business decisions based on samples of data. Hadoop lets them analyze more data (way more data, actually), and that's driving its adoption.


You can learn more about Karmasphere's survey results by reading IT Business Edge's new slideshow, "Developers Move Quickly to Embrace Hadoop."