How to Measure the Cost of Data Quality Problems

Loraine Lawson

Ted Friedman, vice president and distinguished analyst with Gartner’s Information Management team, recently hosted a webinar on improving data quality to support analytics. Approximately 400 people joined the event. He shared on Twitter that the question participants asked most often was how to measure the cost of data quality issues. IT Business Edge’s Loraine Lawson followed up with an interview to find out the answer.

Lawson: I’ve read several surveys indicating CFOs and other executives are more aware of the need for data quality and the impact bad data has on the business. So it caught my eye when you shared that the most frequent question people were asking you was how to measure the cost of data quality issues. Why do you think that was the big question?

Friedman: It was interesting to me, too. I go through these cycles where I get pretty optimistic along the lines of what you just said: People really get this; they're very aware, they understand the impact. Then I have a spate of calls with Gartner clients or just chat with people out in the industry, and it becomes very apparent to me that the level of detail with which they are assessing the impact is very, very light.

It seems for many of them, it’s done largely through intuition. They can make a good logical argument to say, “Well, if the quality of our customer data is bad, then customer satisfaction is kind of degraded.” And that gets them so far in engaging the business stakeholders and such. I’m generally talking about IT people here. Then, when they try to go to the next step and secure resources, really try to make something happen, people come back and say, “Well, wait a minute. What exactly is the cost benefit there? Have we actually quantified how much we’re losing because of that?” They're getting pressed to go to the next level of detail, and that’s when they start asking, “How do I really, in a very solid, quantified way, assess the impact of this?”

I also think there are always a lot of beginners who are just now trying to formalize their programs. The discussions around metrics and how we measure and all that drive this question of, “We’ve measured stuff, but how do we translate that into some quantified, ideally financial, impact on the business?”

People have a sense that poor-quality data is problematic for them, but I still continue to feel — and that webinar and the questions are another point of evidence on it — that most organizations have not done the math in a very rigorous way.

Lawson: What do you tell them when they ask about how to measure the cost?

Friedman: We tell them that there are really several ways that poor-quality data creates cost for the enterprise. There’s a set of costs related to efficiency concerns. That is, poor-quality data degrades the efficiency of the enterprise. So we’re talking there about the cost of reduced productivity of people and we’re talking about the costs that come from less-than-optimal performance of business processes — things moving more slowly than they can and should move.

We’re talking about the costs that come from redundancies. I’m just thinking about all the shadow massaging of data that goes on because people don’t trust corporate systems and data warehouses. They don’t believe the quality of data is correct, and rightfully so in many cases.

You can actually observe business processes and observe people working and basically analyze the amount of time being spent working around and compensating for poor-quality data. I’m giving you the simplified view here, but multiply that by the fully loaded cost of your labor force and there you go: You have an estimate of cost of poor data quality from an efficiency point of view.
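As a rough illustration of the arithmetic described here, the efficiency estimate can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function name, the 48 working weeks per year, and every input figure are hypothetical and do not come from the interview or from Gartner.

```python
def efficiency_cost(hours_lost_per_person_per_week, headcount,
                    loaded_hourly_rate, working_weeks_per_year=48):
    """Annual cost of time spent working around poor-quality data.

    All parameters are illustrative assumptions; in practice the hours
    lost would come from observing people and business processes.
    """
    annual_hours = (hours_lost_per_person_per_week
                    * headcount
                    * working_weeks_per_year)
    return annual_hours * loaded_hourly_rate


# Hypothetical inputs: 200 staff, each losing 3 hours a week,
# at a fully loaded labor rate of $65 per hour.
annual_cost = efficiency_cost(3.0, 200, 65.0)
print(f"Estimated annual efficiency cost: ${annual_cost:,.0f}")
# prints "Estimated annual efficiency cost: $1,872,000"
```

The result is only as good as the observed hours, so it is best read as an order-of-magnitude estimate rather than a precise figure.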

Then there’s the obvious low-hanging fruit: When a business process breaks because of some data quality issue, what is lost as a result of that? Do we lose a customer? Is customer churn higher? What does it cost just to retain customers, or to attract new customers, and all the costs associated with that?

I’m thinking about supply chain examples, which I see a lot of these days: Do we have higher inventory carrying costs than we really need to have because the quality of our forecast data is poor? There are loads of different, very specific ways that I think organizations can measure the cost of poor-quality data like that. All these efficiency-related things would be one category.

Another one would be related to risk. We talk about poor-quality data as creating risk to the enterprise in various ways. It could be the risk of violating regulatory compliance mandates; think about Basel and Solvency over in the UK. If I get my numbers wrong, I violate those things and the result is fines and sanctions, and those have monetary impact. So I can basically quantify the risk, in fact calculate the cost of the risk of those things, and that adds to my business case in terms of what data quality costs me.

It can also be risk in a classic financial sense. If I don’t have good visibility into the actual performance of the enterprise, I could be getting ready to drive off a financial cliff, as it were. What’s the risk in terms of staying in business or the financials of the organization? It could be risk in the sense of legal and litigation too. If I’m not stewarding my data in the appropriate ways, could I be subject to some legal action on the part of my customers or my shareholders or other parties? What’s the potential cost associated with that? So there are risk-oriented elements that come into the business case that, in our view, can be quantified to some degree.
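One common way to put a number on risk of this kind is a probability-weighted (expected value) estimate. The interview does not say which method Gartner uses, so the sketch below is an assumption, and the scenario names, probabilities and impact amounts are entirely hypothetical.

```python
def expected_annual_risk_cost(scenarios):
    """Sum of probability-weighted impacts across risk scenarios.

    Each scenario is a dict with an assumed yearly probability of
    occurring and an assumed monetary impact if it does occur.
    """
    return sum(s["annual_probability"] * s["impact"] for s in scenarios)


# Hypothetical scenarios; none of these figures come from the interview.
scenarios = [
    {"name": "regulatory fine", "annual_probability": 0.05, "impact": 2_000_000},
    {"name": "customer litigation", "annual_probability": 0.01, "impact": 10_000_000},
]

risk_cost = expected_annual_risk_cost(scenarios)
print(f"Expected annual cost of data-related risk: ${risk_cost:,.0f}")
```

With these made-up inputs, each scenario contributes about $100,000, for roughly $200,000 a year in total. The hard part in practice is defending the probabilities, not the multiplication.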

Then there’s the third category of drivers that could be quantified, which we talk about in the form of value creation or, to put it from a cost perspective, opportunity costs: What am I losing? What am I missing in terms of top-line growth, in terms of being able to increase profitability with my customers, in terms of being able to enter new markets, perhaps, because I don’t have the necessary insights or operational capabilities or agility due to a lack of good-quality data? Whatever growth might mean to any particular organization, I can begin to quantify those types of things.

The one last thought I’d throw in around cost and it’s why I think we get so many questions: It is very personal, I think, to each organization. I’ve given you the generic categories of cost that could be relevant, but I think it’s down to each organization to personalize that. What are their specific current corporate goals and objectives and how does data quality degrade those? And for some, efficiency will be big. For others, risk will be big. Others are in a growth mode. Personalizing those things and putting them into a context that really makes sense and has solid math behind it, such that the stakeholders in the business can really get behind it, is really key. And that’s where I guess I see a lot of organizations struggle these days.

Lawson: Now, are they pretty happy with those answers or did they — have you ever had people come back and say that wasn’t enough or didn’t work?

Friedman: I don’t know that people are saying it didn’t work. Obviously they're always looking for more detail. They want a perfect cookbook recipe for how to plug in some numbers and crank out an estimate of cost of poor-quality data in their enterprise.

I just don’t think that’s reasonable given the personalization thing. So we try to give them some of the basic parameters to work with and then hopefully coach them in how to map that to their specific requirements and scenarios.

The other part of the dialogue is they need to do more to measure baseline levels of data quality. That means having well-formed data quality metrics, having the infrastructure to measure those in a fairly comprehensive and ongoing fashion, using that to set appropriate goals and targets, and then quantifying, in the ways I just said, the gap between where they are today and the targets they want to hit. As an industry, we collectively, including the organizations I’m talking to, need to do better at rigorously measuring and monitoring data quality as well.
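To make the baseline-and-gap idea concrete, here is a minimal sketch that measures one simple metric, completeness of a single field, and compares it with a target. Completeness is used only as an example dimension; the field name, sample records and the 95 percent target are hypothetical.

```python
def completeness(records, field):
    """Fraction of records in which the given field is populated."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def gap_to_target(baseline, target):
    """How far the measured baseline falls short of the target (never negative)."""
    return max(0.0, target - baseline)


# Hypothetical customer records and target level.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": None},
    {"id": 4, "email": "b@example.com"},
]

baseline = completeness(customers, "email")
gap = gap_to_target(baseline, target=0.95)
print(f"Baseline completeness: {baseline:.0%}, gap to target: {gap:.0%}")
```

Measured regularly, a gap like this gives the cost estimates something to attach to: the efficiency, risk and opportunity figures can be tied to closing it.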

Apr 17, 2013 6:39 AM Doug Laney says:
Hi Loraine, Great interview with one of the brilliant data quality minds out there. Since this piece, Ted and I have developed a toolkit available to Gartner clients for measuring and monitoring over a dozen different types of data quality dimensions: Cheers, Doug Laney, VP Research, Gartner, @doug_laney
Apr 17, 2013 8:09 AM John O'Gorman says:
Hi Loraine - I agree with Doug: One of your best interviews and very well laid out in terms of the progression of the challenge. Intuition and anecdote can only take you so far, just as traditional solutions can only solve part of the problem. The thing is, proving the ROI for a series of data quality initiatives relies on a set of assumptions that has no support on the implementation side. Reluctance at the enterprise level is due in large part to the fact that there are very few solutions out there that do not simply create yet another silo, albeit much larger and more complex than the ones in existence now. The answer is to change our perspective and 'see' information in the same way our brains 'see' input: it requires immediate translation from the granular and linear (via our senses) to the stereoscopic and dimensional. That may sound a bit weird but the answer to poor data quality is geometric, not technical.
Apr 24, 2013 8:23 PM Nadina Jose says:
I agree with John that a much-needed paradigm shift in how data is managed is crucial. I also believe that for data to be branded as "good quality," which in the pharmaceutical and biotech industry translates to "reliable" data, it has to be tracked, monitored and trended in as close to real time as possible. This goes along with what John labeled as "immediate translation from the granular and linear to the stereoscopic and dimensional." This likewise relates to increasing efficiency, which would contain cost, because issues identified as soon as data is gathered are mitigated and managed almost immediately. This of course is in sharp contrast to the norm, which is to react to a generated report as opposed to seeing what is going on with the data via a dashboard that can be accessed on an iPad.
