Eight Ways Any IT Division Can Use Hadoop, Part II


There are a lot of ultra-cutting-edge and cool use cases for Hadoop, from sensors to Foursquare. But what about Hadoop for the rest of us? What are the ways any IT division and CIO can put Hadoop to work?

Yesterday, I shared the first three ways you can do just that:

1. As a staging layer for analytics.

2. As supplemental storage for an enterprise data warehouse platform.

3. As an acquisition and staging layer for unstructured content.

Let’s look at the final five in my list of eight ways any IT division could use Hadoop.

4. As an ETL tool. Hadoop doesn’t just handle large amounts of data; it also processes that data in parallel across the cluster, which makes it fast. Already, some companies generate so much data in a day that their ETL solutions need more than 24 hours to process it. By using Hadoop and MapReduce to perform the ETL process, they’re able to significantly reduce that processing time.

5. As an exploration engine. I’ve been quoting this excellent post by Ravi Kalakota this week, and he names this as one of the three primary use cases for Hadoop. What’s cool about Hadoop in this situation is that you can add new data to existing data without having to reindex the entire cluster.

6. As an archive for historical data. Sometimes, you want to archive data, but you also want to be able to access it without the hassle of sending for and uploading the archives. Hadoop allows you to store large amounts of historical data without the tapes, giving you access to that data at any time, Kalakota points out.

7. As an enterprise search solution. If you really want to search all your enterprise data, build an indexing infrastructure on top of Hadoop. It scales easily, so it will grow as your data grows. Plus, thanks to the distributed parallel architecture, it’ll be fast, according to Cloudera.

8. As a data sandbox. Data warehouses are big but unwieldy, which means that if you want to put something in them, you need a plan. Hadoop is much more flexible, so some companies are using it to create a data sandbox where users can play with the data. If they find something worthwhile, they can then add that query to the data warehouse. This use case should appeal to any company striving to be more “data-driven.”
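
For readers who want to see what "using MapReduce for ETL" (item 4) looks like in practice, here is a minimal sketch in plain Python. It does not call Hadoop at all; it only imitates the map, shuffle and reduce phases on a handful of rows so the shape of the job is visible. The CSV layout (date, region, amount) and the function names are my own assumptions for illustration, not something taken from Kalakota or Cloudera.

```python
from collections import defaultdict


def map_record(line):
    """Extract and transform: turn one raw CSV line into (key, value) pairs.

    Assumed (hypothetical) layout: date,region,amount
    Malformed rows are dropped rather than failing the whole job.
    """
    parts = line.strip().split(",")
    if len(parts) != 3:
        return []
    date, region, amount = parts
    try:
        value = float(amount)
    except ValueError:
        return []
    # Normalize the region name so "east" and "East" land in the same group.
    return [((date, region.strip().upper()), value)]


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key, values):
    """Load-ready output: one total per (date, region)."""
    return key, sum(values)


def run_etl(lines):
    mapped = [pair for line in lines for pair in map_record(line)]
    return dict(reduce_group(k, v) for k, v in shuffle(mapped).items())


raw = [
    "2012-01-05,east,10.50",
    "2012-01-05,East,4.50",
    "2012-01-05,west,7.00",
    "bad row",
    "2012-01-06,west,not-a-number",
]
totals = run_etl(raw)
for key, total in sorted(totals.items()):
    print(key, total)
```

On a real cluster, the map step would run on many machines at once, each working on its own slice of the day's data, which is where the time savings would come from.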

Of course, there are other great use cases that will apply across many industries — including building a recommendation engine or using Hadoop to evaluate customer churn. For more on those use cases, I’ll point you to Kalakota’s post and Cloudera’s slideshow or whitepaper, “Ten Common Hadoopable Problems.”