Although you may not typically associate the two, data quality is connected to successful integration. Too often, though, integration focuses on the technical aspects of connecting two data sets and not on the more strategic value of ensuring you've also matched and integrated the business context of that data.
As Jim Harris, an independent consultant and data quality blogger, explained to me during a recent interview, data quality tends to fall between the cracks of the integration project, which IT oversees, and the actual use of the data by business users. Said Harris:
When you bring all of that data together, though, that's when you start to find that the same customer is actually living in multiple databases, possibly represented in different ways. Then when you start making business decisions based on the integrated data, that's when you'll potentially start sending me credit card offers for a credit card that I already have, or asking me to sign up for telephone service when I've been a customer of yours for three years.
... just because the data is housed in the database and IT runs the database, doesn't mean that they understand the business content of what goes into the database. And I think that's where the breakdown of, 'Oh, it's just IT's responsibility' just doesn't work. IT knows technology, but they don't necessarily know what the data means to the business.
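Harris's duplicate-customer scenario is easy to reproduce. The sketch below is a hypothetical illustration (the records, field names, and matching rule are all invented for this example): the same person stored differently in two systems looks like two customers until the fields are normalized and compared.

```python
# Hypothetical illustration: one customer, represented differently in two
# systems, surfaces as a duplicate only after integration and normalization.

def normalize(record):
    """Normalize fields so superficially different records can be compared."""
    return (
        record["name"].strip().lower().replace(".", "").replace(",", ""),
        record["email"].strip().lower(),
    )

# Two source systems, each with its own formatting conventions.
crm = [{"name": "James Harris", "email": "JHarris@example.com"}]
billing = [{"name": "Harris, James", "email": "jharris@example.com"}]

# A naive match on normalized email: the names differ in format,
# but the email reveals the "new prospect" is an existing customer.
crm_emails = {normalize(r)[1] for r in crm}
duplicates = [r for r in billing if normalize(r)[1] in crm_emails]
print(len(duplicates))  # 1
```

Real matching is far messier (nicknames, typos, stale emails), which is exactly why it requires business knowledge of what the data means, not just database access.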
Data quality is easy to ignore, and organizations commonly take a reactive approach to it. Said Harris:
The analogy I like to use is this: if your house is on fire, it's not very difficult to get people to say, 'Hey, maybe we should put the fire out.' But it's very difficult to get people to practice fire safety.
That can be a critical, even fatal, mistake to make. If that sounds overly dramatic, then consider this news report raising questions about whether bad data contributed to a fatal pipe explosion in San Bruno, Calif., on Sept. 9, 2010.
The San Francisco Chronicle reported that uncorrected omissions and data entry errors in Pacific Gas and Electric Company records "may explain why PG&E was unaware that the 1956-vintage pipeline had been built with a seam, according to records and interviews," adding that federal investigators found the explosion started at a poorly installed weld on the seam. The report suggests the data should have indicated a potential problem. The article also reported that the National Transportation Safety Board was evaluating PG&E's data system as part of its investigation of the explosion, which killed eight people.
PG&E has since said it wouldn't have changed how the utility kept track of the pipeline, even if the data had been correct and it had known about the seam.
But that doesn't change the point: because of its poor data quality, the company is nonetheless paying a heavy price just to address the questions raised by the press and federal investigators. So the lesson remains: data quality matters.
Of course, while data quality may be coupled with integration, a connection I also discussed with Harris, it has to be an enterprise-wide issue that involves business users. As Harris explains in the second part of our interview, there are often systemic issues organizations need to address, even down to the data entry level.
The other problem with data quality is that it can be very difficult to justify the investment. The key to building a business case for data quality is to focus on concrete business drivers, according to David Loshin, president of Knowledge Integrity, Inc., and author of "The Practitioner's Guide to Data Quality Improvement." Loshin shared his thoughts on how to build a business case for data quality with me during a recent e-mail interview:
We start with a hierarchy of business value driver areas, such as financial, risk and productivity; break that down into smaller, digestible chunks; look at the data sets that are relevant to success in each area; and talk to the business function leaders to understand how data issues impact that success. This gives us some hard measures that correlate data issues to measurable business impacts.
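Loshin's roll-up from data issues to driver-level impact can be sketched in a few lines. The sketch below is hypothetical (the issues, driver names, and dollar figures are invented for illustration): each data issue is tagged with a business value driver and an estimated annual cost, then the costs are summed per driver to prioritize where remediation pays off.

```python
# Hypothetical sketch of rolling data issues up to business value drivers.
# All issues and dollar figures below are invented for illustration.

issues = [
    {"issue": "duplicate customer records", "driver": "financial",    "annual_cost": 120_000},
    {"issue": "missing asset attributes",   "driver": "risk",         "annual_cost": 500_000},
    {"issue": "manual address correction",  "driver": "productivity", "annual_cost": 45_000},
]

# Aggregate estimated impact per driver area.
impact_by_driver = {}
for item in issues:
    impact_by_driver[item["driver"]] = (
        impact_by_driver.get(item["driver"], 0) + item["annual_cost"]
    )

# Rank driver areas by impact, highest first, to focus the business case.
for driver, cost in sorted(impact_by_driver.items(), key=lambda kv: -kv[1]):
    print(f"{driver}: ${cost:,}")
```

The numbers are the hard part in practice; the point of the structure is simply that each issue is tied to a measurable business impact rather than argued in the abstract.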
You might also check out Loshin's interview with TDWI's BI This Week, which goes into a bit more detail about some of the issues we discussed, including more about how master data management relates to data quality.