Will Hadoop Steal Work from ETL Solutions?

Loraine Lawson

How “Big” does data have to get before you need something like Hadoop to tackle it?

Well, there are a couple of ways to decide that. One is just by bytes, of course. But another, arguably more relevant way to decide is to ask yourself: Can you process the amount of data you have by the time you’ll actually need it?

That’s becoming a problem for more organizations, Charles Zedlewski, the vice president of product at Cloudera, told Information Week.

He calls this “the ETL window,” and explains that some enterprises have so much data, it takes longer than 24 hours for an ETL-based data management tool to process it.

Or, as Information Week so nicely summarized it:

In other words, there aren’t enough hours in the day to process the volume of data received in a 24-hour period.

What do you do when you’re missing that "ETL window"? You switch to Hadoop, says Zedlewski, whose job it is to sell Cloudera’s Hadoop-based solutions. Essentially, you replace ETL with Hadoop, then move the processed data right along to your traditional database systems.

I know. It’s like some sort of technology riddle and Zedlewski is the Sphinx. Because here’s the thing: If you want to put data into Hadoop, how do you do it? You use an ETL solution. Granted, you may want to use a super-fast version of ETL, but you use ETL.

And yet, here we are, at a point where Hadoop is replacing ETL.


IT people will immediately get this, but it took me a while. What's happening is that Hadoop is replacing the "transform" part of ETL (extract, transform, load), leaving the "E" and the "L" to traditional data management tools. So this is clearly one way Hadoop is fitting in with existing IT systems and architecture.

“It's common that Hadoop is used in conjunction with databases. In the Hadoop world, databases don't go away,” Zedlewski told Information Week. “They just play a different role than Hadoop does.”
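That division of labor can be sketched in miniature. The toy Python below (my illustration, not anything from Cloudera) mimics a Hadoop MapReduce job doing the heavy "T" step — aggregating raw records by key — while the extract and load steps stay with conventional tooling on either side:

```python
# A minimal, single-machine sketch of the "T moves to Hadoop" pattern.
# The mapper/reducer pair stands in for a MapReduce job; the record
# shape ({"user": ..., "amount": ...}) is a made-up example.
from itertools import groupby
from operator import itemgetter

def mapper(record):
    """Emit (key, value) pairs from one raw record."""
    yield (record["user"], record["amount"])

def reducer(key, values):
    """Combine all values for one key -- the work Hadoop parallelizes."""
    return (key, sum(values))

def transform(raw_records):
    """Simulate the map, shuffle/sort, and reduce phases in sequence."""
    pairs = [kv for rec in raw_records for kv in mapper(rec)]
    pairs.sort(key=itemgetter(0))  # the "shuffle": group like keys together
    return [reducer(k, [v for _, v in group])
            for k, group in groupby(pairs, key=itemgetter(0))]

# Extract (from source systems) and Load (into the warehouse) would
# bracket this step in a real pipeline:
raw = [{"user": "a", "amount": 3}, {"user": "b", "amount": 5},
       {"user": "a", "amount": 2}]
print(transform(raw))  # [('a', 5), ('b', 5)]
```

The point of the pattern is that only this aggregation step needs Hadoop's horsepower; the database at the far end just receives the already-transformed result.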

After all, old technology seldom just goes away; it just gets repurposed or integrated into some new “layer of abstraction.”

But I do have to wonder how the vendors with ETL-based solutions will respond. Everybody’s been in such a rush to embrace Hadoop, but what happens when Hadoop starts to “bite” into their bread-and-butter processing work?



Comments
Aug 15, 2012 2:36 PM John Haddad says:
Hadoop is not a replacement for ETL. Hadoop is a high-performance, shared-nothing distributed computing platform. It complements ETL nicely for processing big data on the order of terabytes to petabytes. Saying that Hadoop replaces ETL is like saying Linux is a replacement for MS Office on Windows. In order to perform the "T" in ETL, you need to either hand-code the logic or use a tool that can execute transformations on Hadoop.
Oct 23, 2012 12:21 PM no one says:
Just take the data as-is without transformations and use a simple tool to access it; then you do not need the "T" and the "L" in your ETL process.
May 3, 2013 9:57 PM Icarus says:
Seems simple, but the problem is that you need to point your extraction tool at the piece of data you need, and that requires intelligence in the extraction. Traditionally that intelligence lives in the "Transformation" phase. You still need to "Load" the data into your application. Am I missing something?
