Everybody knows you can't fit a square peg in a round hole without damaging either the peg or the hole, or both, right?
Yet, that's often what's happening when you try to integrate third-party, external data with your enterprise data warehouse. And the consequence is you could damage the credibility of the data from your data warehouse, according to Babu Narayanan Ramakrishnan, a senior technical architect with the BI practice of HCL Technologies.
We tend to talk about integration as a black box. If you solve this thing called "integration," you're done and you can move on. But it's not that simple. Actually, integration raises a host of issues that IT needs to resolve beyond the technical issue of connecting the data. And these issues are the ones that can hurt the business if they're not dealt with correctly.
In a recent Information Management article, Ramakrishnan gives us a look inside the integration black box - specifically, he discusses the problems created by integrating data from an external source with your enterprise data warehouse (or EDW, as he calls it). To be honest, I think the piece is primarily written for the data warehousing staff, but it covers some essential business issues as well.
It turns out, enterprise data warehouses are a bit persnickety. You might say they're set in their ways, and they're not willing to change to accommodate any newfangled data from the outside.
So, for instance, if the enterprise data warehouse considers a customer a customer even if there have been no transactions with that customer in a year, then it doesn't matter what the external data says about it being "inactive." Gosh darn it: That's a customer! Or, as Ramakrishnan explained it, it's a matter of compliance between the two systems:
...the external source may call the account inactive if there is no transaction with that customer for more than 180 days. But the EDW may never call a customer inactive unless there is an explicit transaction that requests the business to close all the related accounts. Whatever the intelligent transformation rule is to handle such anomalies, the amount of effort and time required to make that transformation will far exceed the value it can bring.
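To make that mismatch concrete, here is a minimal Python sketch of the two rules side by side. The 180-day threshold and the explicit-close rule come from Ramakrishnan's example; everything else (the Account fields, the function names, the idea of listing conflicts for review) is my own illustration and not something from his article. It doesn't try to settle the disagreement. It only shows which accounts the two systems would label differently.

```python
from dataclasses import dataclass
from datetime import date

# From the quoted example: the external source flags an account as inactive
# after more than 180 days without a transaction.
EXTERNAL_INACTIVITY_DAYS = 180


@dataclass
class Account:
    # Hypothetical record layout, invented for this illustration.
    account_id: str
    last_transaction: date
    close_requested: bool  # True only if the customer explicitly asked to close


def external_status(account: Account, today: date) -> str:
    """Status as the external source would report it (time-based rule)."""
    days_idle = (today - account.last_transaction).days
    return "inactive" if days_idle > EXTERNAL_INACTIVITY_DAYS else "active"


def edw_status(account: Account) -> str:
    """Status as the EDW would report it (explicit-close rule)."""
    return "inactive" if account.close_requested else "active"


def find_status_conflicts(accounts, today):
    """Return (account_id, external, edw) for accounts the two rules disagree on."""
    conflicts = []
    for account in accounts:
        ext = external_status(account, today)
        edw = edw_status(account)
        if ext != edw:
            conflicts.append((account.account_id, ext, edw))
    return conflicts
```

Running find_status_conflicts over even a small sample of accounts before loading anything gives a rough idea of how many records a transformation rule would have to cover, which helps with weighing the effort against the value.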
And, of course, the third-party data is seldom documented well, which leaves IT staff with no option but to guess, as Baseline Consulting's Evan Levy pointed out during a recent interview:
Unfortunately, what a lot of people do is they actually eyeball two files. They have an idea of which columns are different because of tabs or whatever, and they guess, 'Oh, I think I can link this to this.' And they have to do a little bit of guesswork because there's a lot of data in Excel spreadsheets, there's a lot of data in Access databases, where no one goes to the effort of identifying the column names and inputting the metadata.
That's right - guess. As in, this may be the data that you definitely need to make that $1 million business decision - or it may not. There's no metadata to tell us for sure, so we're guessing.
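One modest alternative to eyeballing is to write the column mapping down and refuse to load anything it doesn't cover. The sketch below is only an illustration of that idea. The column names and the COLUMN_MAP dictionary are hypothetical, and neither Levy nor Ramakrishnan prescribes this approach.

```python
# Hypothetical mapping from external column names to warehouse column names.
# In practice this would be agreed on with whoever supplies the data.
COLUMN_MAP = {
    "cust_no": "customer_id",
    "acct_stat": "account_status",
}


def map_columns(row: dict) -> dict:
    """Rename a row's columns using COLUMN_MAP; fail loudly instead of guessing."""
    unmapped = sorted(set(row) - set(COLUMN_MAP))
    if unmapped:
        raise ValueError(f"No documented mapping for columns: {unmapped}")
    return {COLUMN_MAP[name]: value for name, value in row.items()}
```

A row with an undocumented column raises an error rather than slipping into the warehouse under somebody's best guess.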
Obviously, data warehouses don't like ambiguity, so sometimes they simply absorb the external data, and the incoherent data from multiple sources "are forced fit into the same table." And, of course, that undermines the credibility of the enterprise data warehouse's information.
That's just one example of the problems you'll encounter while trying to integrate external data with an enterprise data warehouse. Besides compliance between the data, other potential landmines include:
Ramakrishnan does offer possible options for dealing with each of these conundrums, although they're not as definitive as you might hope. It turns out several issues will need to be handled contractually (and here I mean an actual, written contract) before you've committed to sharing data with the third party.
You might also want to check out a list by Bob Sala, CEO of Distributed Market Advantage, of "Seven Guiding Principles for Selecting Integration Friendly Application Partners." Sala's company relies heavily on SaaS solutions, including PivotLink, the BI vendor that posted the list. Sala says there are more than 13 integrated SaaS applications in the company's ecosystem.
The company has learned a few things about integration with third parties along the way, hence Sala's seven guiding principles. Most of the principles address business issues (for instance, principle No. 4 calls for you to ensure the partner is financially viable and has experience hosting), but No. 7 is very specific and deals with security certification. It's a good list. Check it out.
Obviously, third-party data integration isn't going away. On the contrary, it's increasing. So while there are and will be problems, it's something we'll all have to figure out how to live with, even the curmudgeonly enterprise data warehouse.