There are a lot of mixed messages when it comes to moving Big Data with ETL processes. I’ve been told by many Big Data experts that ETL is perfectly capable of moving data into and out of Big Data solutions. But recently, I’ve run into several pieces that suggest ETL tools are slowing down the process.
For example, at least two recent pieces, including this DataMigration Pro article I shared last week, have noted that there’s now so much data that using ETL to migrate it takes much longer than the traditional “big bang data migration weekend” IT has relied upon in the past.
David Loshin (@davidloshin) joins those who say ETL and the whole batch-oriented approach may be outdated for some data needs. Loshin is the president of Knowledge Integrity, an information training, consulting and development firm.
“Now pervasive and right-time analytics seems to be within reach, but the batch-oriented approach is insufficient to meet today’s — let alone tomorrow’s — data integration and delivery needs,” he writes in the November issue of TechTarget’s BI Trends + Strategies. “Without addressing the challenge of data latency, data provisioning will continue to be the biggest bottleneck to increased productivity and accurate business decision making.”
He’s talking about data integration in two particular situations, mind you: business intelligence and analytics on the one hand, and Big Data on the other.
There are different ways to solve the problem, of course. He suggests high-speed data replication technologies (which I guess would include in-memory tools, another option I’ve seen discussed) and caching techniques like those used in data federation or data virtualization.
I should add that ETL vendors are not ignoring this problem, either. Informatica and Pervasive are both among the ETL vendors that now offer high-speed tools for Hadoop.
It’s a great piece that also looks at the ways data latency can cost businesses, including how it slows down development cycles for analytics applications.
The article is published as part of an e-zine (aka, a PDF) and starts on page 10.
You might also want to take some time to read or skim through the article right before it, “Data Stewardship Programs Need Solid Plan, Firm Focus.” It reviews five common mistakes companies make when starting data stewardship programs, including the challenge of finding the right people.