Does it really matter when you load and when you transform data during a data integration project? Yes, actually it does, argues a recent Intelligent Enterprise article, "Is it Time to Switch to ELT?"
The article, written by Sreedhar Kajeepeta, compares the traditional extract-transform-load approach to data integration with an extract-load-transform approach, along with a few hybrids that add extra "transforms" at various points in the process.
I know it sounds overly technical, perhaps even like nerdy nit-picking, but there's a real cost and efficiency issue here, particularly in this age of virtualization and cloud computing. Loading before you transform gives you cost-of-performance advantages, particularly in data warehouse and business intelligence initiatives, by reducing the costs of software licensing and development.
By contrast, transforming before you load saves on infrastructure costs, but that's becoming less of an issue, thanks to cheaper multicore database servers, appliances, virtualization and cloud computing. That's why you'll be hearing more about ELT, Kajeepeta says:
As an enabler of this bigger BI market, the field of data integration tools is expected to grow 17 percent annually to reach $3 billion by 2012, according to Gartner estimates. It is in this market that we are likely to see ETL and ELT battling it out for market dominance. With performance and low latency increasingly in demand, you can guess ELT will get a lot of attention.
Kajeepeta offers clear guidelines for when you should stick with the old ETL and when you should consider an ELT or hybrid option:
As a rule of thumb, it's generally accepted that you should stick with ETL for DW/BI projects when there are 10 or more source systems or when the source databases are a terabyte or larger in size. If your project doesn't fit this description, it's time to consider ELT and to match tools against your requirements.
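To make that rule of thumb concrete, here is a minimal Python sketch. The function name, its parameters and the choice of a decimal terabyte are my own assumptions for illustration; only the two thresholds (10 or more source systems, a terabyte or larger) come from the quoted passage, and the article does not say exactly how source size should be measured.

```python
# Hypothetical helper illustrating the quoted rule of thumb.
# Assumption: "a terabyte" is treated as 10**12 bytes, and
# source_size_bytes is whatever size figure you use for your sources.
TERABYTE = 10**12


def suggest_approach(source_system_count: int, source_size_bytes: int) -> str:
    """Return "ETL" when either threshold from the rule of thumb is met."""
    if source_system_count >= 10 or source_size_bytes >= TERABYTE:
        return "ETL"
    # Otherwise the rule says it is time to consider ELT
    # and match tools against your requirements.
    return "consider ELT"


if __name__ == "__main__":
    print(suggest_approach(source_system_count=4, source_size_bytes=200 * 10**9))
```

This only encodes the two thresholds; it says nothing about the other factors that would go into a real tool selection.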
He also offers a few other issues to consider:
I'll be honest. I'm not sure how much credence to give this issue. I noticed it surfaced as a discussion point in 2006 - Vincent McBurney of IT Toolbox had a good piece explaining the pros and cons - and then disappeared until 2008, when it made a brief reappearance, and now it's back. Usually, that's a sure sign of a vendor trying to push a sales angle as a bigger story.
But there are two things that make me think this is worth considering and not just a vendor pitch. First, Kajeepeta is the global vice president/CTO of technology consulting practices for the Global Business Solutions & Services division of CSC, an India-based services firm, not a data integration company. Second, Kajeepeta discusses how several vendors approach this issue, including Oracle (ELT), IBM and Informatica (both multiple options), and appliance vendors Netezza and Teradata (ELT).
So, I think it's worth considering. It's also surprisingly understandable, if you can keep the acronyms straight.