Fatal Accident Shows Why Data Quality Matters

Loraine Lawson

Although you may not typically associate the two, data quality is connected to successful integration. Too often, though, integration focuses on the technical aspects of connecting two data sets and not on the more strategic value of ensuring you've also matched and integrated the business context of that data.


As Jim Harris, an independent consultant and data quality blogger, explained to me during a recent interview, data quality tends to fall through the cracks between the integration project, which IT oversees, and the actual use of the data by business users. Said Harris:

When you bring all of that data together, though, that's when you start to find that the same customer is actually living in multiple databases, possibly represented in different ways. Then when you start making business decisions based on the integrated data, that's when you'll potentially start sending me credit card offers for a credit card that I already have or asking me to sign up for telephone service when I've been a customer of yours for three years.
... just because the data is housed in the database and IT runs the database, doesn't mean that they understand the business content of what goes into the database. And I think that's where the breakdown of, 'Oh, it's just IT's responsibility' just doesn't work. IT knows technology, but they don't necessarily know what the data means to the business.
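Harris's example of the same customer living in multiple databases under different representations is the classic record-matching problem. As a minimal sketch (my illustration, not anything from the interview), a first pass often looks something like the following, using Python's standard-library difflib; the normalization rules and the 0.85 threshold are assumptions a real project would tune:

```python
from difflib import SequenceMatcher

def normalize(record):
    """Lowercase and strip basic punctuation so trivial formatting
    differences don't hide a match."""
    return " ".join(record.lower().replace(".", "").replace(",", "").split())

def likely_same_customer(a, b, threshold=0.85):
    """Return True if two name/address strings probably refer to the
    same customer, based on a simple similarity ratio."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

# The "same" customer, as recorded in two different systems:
crm_record = "Jim Harris, 12 Oak Street"
billing_record = "Jim Harris, 12 Oak St."

print(likely_same_customer(crm_record, billing_record))  # → True
```

Naive string similarity like this misses plenty (nicknames, transposed fields), which is exactly why Harris argues the matching rules need business context, not just a technical join.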

Data quality is easy to ignore, and it's also common for organizations to take a reactive approach to it. Said Harris:

The analogy I like to use is this: if your house is on fire, it's not very difficult to get people to say, 'Hey, maybe we should put the fire out.' But it's very difficult to get people to practice fire safety.

That can be a critical, even fatal, mistake to make. If that sounds overly dramatic, then consider this news report raising questions about whether bad data contributed to a fatal pipe explosion in San Bruno, Calif., on Sept. 9, 2010.


The San Francisco Chronicle reported that uncorrected omissions and data entry errors in Pacific Gas and Electric Company records "may explain why PG&E was unaware that the 1956-vintage pipeline had been built with a seam, according to records and interviews," adding that federal investigators found the explosion started at a poorly installed weld on the seam. The report suggests the data should have indicated a potential problem. The article also reported that the National Transportation Safety Board was evaluating PG&E's data system as part of its investigation into the explosion, which killed eight people.


PG&E has since said it wouldn't have changed how the utility kept track of the pipeline, even if the data had been correct and it had known about the seam.


But that doesn't change the point: the company is nonetheless paying a heavy price just to address the questions that the press and federal investigators have raised about its poor data quality. So, the lesson remains: Data quality matters.


Of course, while data quality may be coupled with integration (which I also discussed with Harris), it has to be an enterprise-wide issue that involves business users. As Harris explains in the second part of our interview, there are often systemic issues organizations need to address, even down to the data entry level.


The other problem with data quality is that it can be very difficult to justify the investment. The key to building a business case is to focus on concrete business drivers, according to David Loshin, president of Knowledge Integrity, Inc., and author of "The Practitioner's Guide to Data Quality Improvement." Loshin shared his thoughts on making that case during a recent e-mail interview:

We start with a hierarchy of business value driver areas (such as financial, risk and productivity), break that down into smaller digestible chunks, look at the data sets that are relevant to success in each area, and talk to the business function leaders to understand how data issues impact that success. This gives us some hard measures that correlate data issues to measurable business impacts.
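Loshin's approach boils down to a back-of-the-envelope calculation. The sketch below is illustrative only: the three driver areas come from his quote, but the issue counts and per-issue cost figures are invented placeholders for the "hard measures" a real assessment would gather from business function leaders:

```python
# Hypothetical issue counts and per-incident cost estimates per driver
# area (all figures are invented for illustration):
impact_areas = {
    "financial":    {"issues_found": 120, "cost_per_issue": 45.0},   # e.g., misbilled invoices
    "risk":         {"issues_found": 8,   "cost_per_issue": 5000.0}, # e.g., compliance exceptions
    "productivity": {"issues_found": 300, "cost_per_issue": 12.5},   # e.g., manual rework
}

def annual_impact(areas):
    """Correlate data issues with a rough annual dollar impact per area."""
    return {name: a["issues_found"] * a["cost_per_issue"] for name, a in areas.items()}

# Rank the areas by estimated impact to prioritize the business case:
for area, cost in sorted(annual_impact(impact_areas).items(), key=lambda kv: -kv[1]):
    print(f"{area:>12}: ${cost:,.2f}")
```

Even a crude tally like this turns "we have data quality problems" into a ranked, dollar-denominated argument, which is the whole point of tying issues to business drivers.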

You might also check out Loshin's interview with TDWI's BI This Week, which goes into a bit more detail about some of the issues we discussed, including more about how master data management relates to data quality.

Mar 14, 2011 8:49 AM Jeff Kibler says:

Loraine -

You've hit on a very important subject, and in your analysis, you indicated two of the biggest yet neglected aspects of data quality: prevention and ownership.

In my previous company, where I served as a program manager for data quality, we continually stressed that data quality/info quality is the job of every individual in the company. Without quality data, you're truly guiding your ship without a compass on a cloudy day. What's difficult, though, is company culture. You need each individual (from senior exec to software engineer) to understand the value in good data.

At Infobright, we offer an open-source analytic database.  When community members use us, I also push the value of proper data quality verifications, checks, processes, and steps from end-to-end in their data pipeline.  DQ cannot be "mastered" at one point in the process.  Nor can it be mastered "once and for all".  It's a continual, evolving process that needs proper attention.

Thanks for the article; I appreciate the continued spotlight on this ever-important subject.

Jeff Kibler

Community Manager

Infobright, Inc.

Mar 18, 2011 7:49 PM Lindsey Niedzielski says:

Great post Loraine. This is probably one of the most relevant examples of the importance of data quality. We have posted this on our community for IM professionals (www.openmethodology.org). Great work as usual!

Apr 25, 2011 5:11 PM Loraine Lawson says:

There is an update on this story, with PG&E again reiterating and explaining to state regulators why better data wouldn't have mattered. Nonetheless, state regulators are looking at sanctioning the company for shoddy record keeping, according to SFGate.


Aug 4, 2014 3:49 PM Scott Johnson says:

Great post Loraine. Poor data quality can be extremely costly to an organization. This is a really good example. Fortunately, with cloud computing technologies that can be implemented quite easily enterprise wide, a lot of these data quality problems can be solved.
