I often interview vendors, and one question I like to ask is who they consider their competition to be. PR people hate this question, because the last thing a company wants is to talk up the competition. I understand that, but I ask it anyway: it's a confusing field, and the answer is always interesting.
Whenever I ask a data-integration vendor this question, without fail, they'll tell me their main competition is hand coding.
Last year about this time, I asked Phillip Russom of The Data Warehousing Institute why companies were still hand coding when they could buy tools that would do that work for them. He had no idea, but the question vexed him, since, in his words, hand-coded data integration is a "fairly old and arcane practice." He estimated that roughly half of data-integration solutions are coded from scratch. A 2008 study by an IBM user group confirmed his estimate, finding that 50 percent of companies still use hand-coded scripts to move data.
But nobody had an explanation as to why companies do this.
This week, I finally found answers in this Information Management article by Rick Sherman, founder of the Boston-based data warehouse and business intelligence consultancy Athena IT Solutions.
Sherman begins by examining how the use of ETL and other integration tools has evolved. He notes that data-integration tools now offer technologies and processes that extend well beyond basic ETL tasks. These suites can help with projects including:
And yet ... even these more robust tools still aren't very pervasive, with both small and global Fortune 1000 companies still resorting to hand coding. And this is where he finally answers our "why" question.
It boils down to this: ignorance of the market, cost, a lack of resources, and a weird adherence to corporate standards, even when enterprise-wide standards don't apply to the situation.
Fortune 1000-size organizations are turned off by the expense of licensing these tools for wider use. They lack the resources (that is, data-integration developers) to deploy the tools more widely, and the tools don't always fit the predefined corporate standard required by individual groups. In other words, sometimes you need an enterprise-class data-integration tool, and sometimes you just need to move some data downstream. Companies could use plain ETL for those downstream projects, but for some reason people insist on the more expensive enterprise data-integration tool, which they can't justify for that particular project, so they opt to hand code instead. Crazy, right?
Likewise, the remaining "smaller" companies are turned off by the cost, but Sherman believes it's because they assume only Fortune 1000 companies can afford these tools. "From their perspective, you either have to pay for high-end tools or you hand code, and hand coding usually wins out," writes Sherman.
That leads us to another reason they don't use the tools: they're unaware of the market. To their detriment, they don't know about data-integration solutions priced within their budgets, because those are the tools that don't get mentioned by industry analysts or publications. (I hope we're the exception; I know I've talked to and about open source solutions on more than a few occasions.)
He wraps up the piece with an admonition to expand your use of data-profiling tools, which he describes as a more effective and efficient way to check your work after a data-integration project:
Data profiling should be established as a best practice for every data warehouse, BI, and data migration project. In addition to meeting project requirements, data profiling should be an ongoing activity to ensure that you maintain data quality levels.
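To make that concrete, a basic data-profiling pass boils down to per-column checks like null counts and distinct-value counts over the loaded data. This sketch is my own illustration, not Sherman's code, and the sample records are hypothetical:

```python
# A minimal data-profiling pass: for each column, count nulls and distinct
# values so you can spot quality problems after a data-integration run.
# The sample records are hypothetical, for illustration only.
from collections import defaultdict

def profile(records):
    """Return per-column stats: null count and distinct non-null values."""
    stats = defaultdict(lambda: {"nulls": 0, "distinct": set()})
    for row in records:
        for col, value in row.items():
            if value in (None, ""):
                stats[col]["nulls"] += 1
            else:
                stats[col]["distinct"].add(value)
    return {col: {"nulls": s["nulls"], "distinct": len(s["distinct"])}
            for col, s in stats.items()}

loaded = [
    {"id": "1", "region": "east"},
    {"id": "2", "region": ""},
    {"id": "3", "region": "east"},
]
report = profile(loaded)
# e.g. report["region"] == {"nulls": 1, "distinct": 1}
```

Running a report like this after every load, and comparing it against the previous run, is one cheap way to turn profiling into the ongoing activity Sherman recommends.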
Neither group really understands how far these tools have come or how versatile they can be, the article suggests. Fortunately, Sherman has suggestions on how you can fix this problem, no matter what size company you're in, so you can wean yourself off hand-coded data integration.