How “big” does data have to get before you need something like Hadoop to tackle it?
Well, there are a couple of ways to decide that. One is just by bytes, of course. But another, arguably more relevant way to decide is to ask yourself: Can you process the amount of data you have by the time you’ll actually need it?
That’s becoming a problem for more organizations, Charles Zedlewski, the vice president of product at Cloudera, told Information Week.
He calls this “the ETL window,” and explains that some enterprises have so much data that it takes an ETL-based data management tool longer than 24 hours to process a single day’s worth.
Or, as Information Week so nicely summarized it:
In other words, there aren’t enough hours in the day to process the volume of data received in a 24-hour period.
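To put rough numbers on that, here is a minimal sketch of the ETL window check. The function name, the volumes, and the throughput figures are all made up for illustration; none of them come from Cloudera or Information Week.

```python
def fits_etl_window(daily_volume_gb, throughput_gb_per_hour, window_hours=24.0):
    """Return (fits, hours_needed) for one day's worth of incoming data.

    All inputs are hypothetical: daily_volume_gb is how much data arrives
    per day, throughput_gb_per_hour is how fast the ETL tool processes it.
    """
    hours_needed = daily_volume_gb / throughput_gb_per_hour
    return hours_needed <= window_hours, hours_needed


# Example with invented numbers: 1,200 GB arrives per day, and the tool
# handles 40 GB per hour, so one day's data takes 30 hours to process.
fits, hours = fits_etl_window(1200, 40)
print(fits, hours)
```

If the answer is "no" day after day, the backlog only grows, which is the situation Zedlewski is describing.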
What do you do when you’re missing that “ETL window”? You switch to Hadoop, says Zedlewski, whose job it is to sell Cloudera’s Hadoop-based solutions. Essentially, you replace ETL with Hadoop, then move the processed data right along to your traditional database systems.
I know. It’s like some sort of technology riddle and Zedlewski is the Sphinx. Because here’s the thing: If you want to put data into Hadoop, how do you do it? You use an ETL solution. Granted, you may want to use a super-fast version of ETL, but you use ETL.
And yet, here we are, at a point where Hadoop is replacing ETL.
IT people will immediately get this, but it took me a while. What’s happening is that Hadoop is replacing the “transform” part of ETL (extract, transform, load), leaving the “E” and “L” to traditional data management tools. So this is clearly one way Hadoop is fitting in with existing IT systems and architecture.
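For the non-IT people like me, the division of labor can be sketched in a few lines of plain Python. This is not Hadoop’s API, just the map, shuffle, and reduce shape that a Hadoop job follows, and the record format ("user,bytes") and every function name here are invented for illustration.

```python
from collections import defaultdict


def extract(lines):
    """E: parse raw 'user,bytes' lines (stays with the traditional tool)."""
    for line in lines:
        user, size = line.strip().split(",")
        yield user, int(size)


def map_phase(records):
    """T, step 1: emit one (key, value) pair per record."""
    for user, size in records:
        yield user, size


def shuffle(pairs):
    """T, step 2: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """T, step 3: collapse each group to a single total."""
    return {key: sum(values) for key, values in groups.items()}


def load(totals, table):
    """L: write results into the target store (a dict stands in for a database)."""
    table.update(totals)
    return table


raw = ["alice,10", "bob,5", "alice,7"]
warehouse = load(reduce_phase(shuffle(map_phase(extract(raw)))), {})
print(warehouse)
```

In a real cluster the three middle steps are what gets spread across many machines; the extract and load ends are where the existing tools and databases stay in the picture.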
“It’s common that Hadoop is used in conjunction with databases. In the Hadoop world, databases don’t go away,” Zedlewski told Information Week. “They just play a different role than Hadoop does.”
After all, old technology seldom goes away; it just gets repurposed or integrated into some new “layer of abstraction.”
But I do have to wonder how the vendors with ETL-based solutions will respond. Everybody’s been in such a rush to embrace Hadoop, but what happens when Hadoop starts to “bite” into their bread-and-butter processing work?