When you dig into data quality—and more of you are—you’ll hear a lot about “good enough” data quality. But what the heck does that mean? And how do you know if you’ve achieved it?
Data folks have long understood that data quality is a continuum. Data quality comes with an associated cost and, at some point, that cost is not worth paying to further “perfect” the data; hence, the concept of “good enough” data quality.
That may have made sense in a relational database world, but now … it’s complicated. The data isn’t just being used for reporting, but is also being leveraged in BI and analytics systems. Data has left IT and is being used to drive decisions across the organization. What’s more, data looks different—it’s now social data, sensor data, external data, Big Data.
BI in particular raises new questions about data quality, as independent researcher Howard Dresner writes. BI is also bringing to light data quality problems that were previously invisible.
“Funny (and sadly) enough, many organizations only find out about their data quality issues once they actually start using their business intelligence solution,” writes Mario Barajas, a business intelligence consultant for the ZAP BI U.S. office, in a recent SmartData Collective column.
Barajas includes two striking BI graphs that show the problem much better than words can explain.
He also offers four tactics you can use to address data quality. All are worth reading, but his recommendation that organizations take a risk-based approach to data quality really speaks to this question of defining good enough data quality.
There are two key questions to consider in a risk-based approach:
- What are the most important data attributes to correct? Obviously, customer addresses matter more than some attribute that was once important, but no longer matters, he explains.
- Will fixing the problem cost more than you’ll earn or save by improving the data? Dresner points out a major consideration on cost, noting that correcting data can be up to 10 times as expensive downstream as at the point of entry.
Barajas explains why both matter:
“A better approach is to understand the impact of a 1 percent improvement on the data quality of either and focus any efforts on the one that delivers the most benefits. An even better approach would be to understand whether a 1 percent improvement delivers a comparable benefit to a 1 percent deterioration, or another way to put it: if the deterioration of 1 percent of my customer records will make me lose more money than I’ll make by the 1 percent improvement of this.”
It sounds complicated, but what he’s really going for here is a gut-level response from the data stakeholders about data quality’s value. Of course, if you can quantify that value, you should. That seems like the better idea, given that research shows that we’re not as intuitive as we think we are.
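If you do want to put numbers on it, the comparison Barajas describes can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the attribute names, the dollar figures and the `AttributeEstimate` and `prioritize` names are all hypothetical, invented for illustration rather than taken from Barajas or Dresner. It compares what a 1 percent improvement is estimated to earn against what it costs, and checks whether a 1 percent deterioration would hurt more than the improvement helps.

```python
from dataclasses import dataclass


@dataclass
class AttributeEstimate:
    """Stakeholder estimates for one data attribute (all figures hypothetical)."""
    name: str
    gain_per_point: float      # money earned or saved by a 1 percent improvement
    loss_per_point: float      # money lost through a 1 percent deterioration
    fix_cost_per_point: float  # cost of achieving that 1 percent improvement

    def net_benefit(self) -> float:
        # Second question: does the fix earn more than it costs?
        return self.gain_per_point - self.fix_cost_per_point

    def downside_ratio(self) -> float:
        # Above 1.0 means a deterioration hurts more than an improvement helps.
        if self.gain_per_point == 0:
            return float("inf")
        return self.loss_per_point / self.gain_per_point


def prioritize(estimates):
    """Keep attributes whose fix pays for itself, largest net benefit first."""
    worthwhile = [e for e in estimates if e.net_benefit() > 0]
    return sorted(worthwhile, key=lambda e: e.net_benefit(), reverse=True)


# Made-up numbers purely for illustration.
estimates = [
    AttributeEstimate("customer_address", 50_000, 120_000, 20_000),
    AttributeEstimate("legacy_fax_number", 500, 500, 5_000),
    AttributeEstimate("product_category", 15_000, 10_000, 8_000),
]

for e in prioritize(estimates):
    print(f"{e.name}: net benefit {e.net_benefit():,.0f}, "
          f"downside ratio {e.downside_ratio():.1f}")
```

With these invented figures, the once-important attribute drops out because fixing it costs more than it returns, and customer addresses come out on top. Real estimates would have to come from the data stakeholders themselves.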
Barajas isn’t the only one rethinking how to define “good enough” data quality. Diginomica reports that Informatica added a feature allowing users to rank data quality, in the same way Amazon customers can rank products.
Loraine Lawson is a veteran technology reporter and blogger. She currently writes the Integration blog for IT Business Edge, which covers all aspects of integration technology, including data governance and best practices. She has also covered IT/Business Alignment and IT Security for IT Business Edge. Before becoming a freelance writer, Lawson worked at TechRepublic as a site editor and writer, covering mobile, IT management, IT security and other technology trends. Previously, she was a webmaster at the Kentucky Transportation Cabinet and a newspaper journalist.