Ted Friedman, vice president and distinguished analyst with Gartner’s Information Management team, recently hosted a webinar on improving data quality to support analytics. Approximately 400 people joined the event. He shared on Twitter that the most common question asked by participants was how to measure the cost of data quality issues. IT Business Edge’s Loraine Lawson followed up with an interview to find out the answer.
Lawson: I’ve read several surveys indicating CFOs and other executives are more aware of the need for data quality and the impact bad data has on the business. So it caught my eye when you shared that the most frequent question people were asking you was how to measure the cost of data quality issues. Why do you think that was the big question?
Friedman: It was interesting to me, too. I go through these cycles where I get pretty optimistic along the lines of what you just said: People really get this; they’re very aware, they understand the impact. Then I get a spate of calls with Gartner clients, or just chat with people out in the industry, and it becomes very apparent to me that the level of detail with which they are assessing the impact is very, very light.
It seems for many of them, it’s done largely through intuition. They can make a good logical argument to say, “Well, if the quality of our customer data is bad, then customer satisfaction is kind of degraded.” And that gets them so far in engaging the business stakeholders and such. I’m generally talking about IT people here. And then when they try to go to the next step and secure resources, really try to make something happen, people come back and say, “Well, wait a minute. What exactly is the cost benefit there? Have we actually quantified how much we’re losing because of that?” They’re getting pressed to go to the next level of detail, and that’s when they start asking, “How do I really, in a very solid, quantified way, assess the impact of this?”
I also think there are always a lot of beginners who are just now trying to formalize their programs. The discussions around metrics and how to measure them drive this question: “We’ve measured stuff, but how do we translate that into a quantified, ideally financial, impact on the business?”
People have a sense that poor-quality data is problematic for them, but I still continue to feel — and that webinar and the questions are another point of evidence on it — that most organizations have not done the math in a very rigorous way.
Lawson: What do you tell them when they ask about how to measure the cost?
Friedman: We’re talking about the costs that come from redundancies. I’m just thinking about all the shadow massaging of data that goes on because people don’t trust corporate systems and data warehouses. They don’t believe the quality of the data is correct, and rightfully so in many cases.
You can actually observe business processes and observe people working and basically analyze the amount of time being spent working around and compensating for poor-quality data. I’m giving you the simplified view here, but multiply that by the fully loaded cost of your labor force and there you go: You have an estimate of cost of poor data quality from an efficiency point of view.
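The back-of-the-envelope arithmetic Friedman describes could be sketched as follows; every figure below is a hypothetical assumption for illustration, not a Gartner benchmark:

```python
# Rough estimate of the efficiency cost of poor-quality data:
# observed time spent working around bad data, multiplied by the
# fully loaded cost of the labor force. All inputs are illustrative.

HOURS_PER_WEEK_REWORK = 4        # observed hours per employee spent compensating for bad data
AFFECTED_EMPLOYEES = 250         # employees whose work touches the affected data
FULLY_LOADED_HOURLY_COST = 60    # salary + benefits + overhead, in dollars per hour
WEEKS_PER_YEAR = 48              # working weeks per year

annual_efficiency_cost = (HOURS_PER_WEEK_REWORK * AFFECTED_EMPLOYEES
                          * FULLY_LOADED_HOURLY_COST * WEEKS_PER_YEAR)
print(f"Estimated annual efficiency cost: ${annual_efficiency_cost:,}")
# → Estimated annual efficiency cost: $2,880,000
```

The point is not the specific numbers but the method: observation of real work yields the hours, and the fully loaded labor rate converts them to dollars.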
Then there’s the obvious low-hanging fruit: when a business process breaks because of some data quality issue, what is lost as a result? Do we lose a customer? Is customer churn higher? What does it cost to retain customers, or to attract new ones, and all the costs associated with that?
I’m thinking about supply chain examples, which I see a lot of these days: Do we have higher inventory carrying costs than we really need because the quality of our forecast data is poor? There are loads of different, very specific ways that organizations can measure the cost of poor-quality data like that. All these efficiency-related things would be one category.
Another one would be related to risk. We talk about poor-quality data as creating risk to the enterprise in various ways. It could be anything from the risk of violating regulatory compliance mandates; think about Basel and Solvency II in Europe. If I get my numbers wrong, I violate those mandates and the result is fines and sanctions, which have monetary impact. So I can basically quantify that risk, in fact calculate the cost of it, and that adds to my business case in terms of what poor data quality costs me.
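One common way to put a number on compliance risk is a simple expected-value calculation; the probability and penalty below are invented placeholders, not figures from any regulator:

```python
# Expected-value estimate of regulatory risk from data quality errors:
# estimated probability of a violation multiplied by the penalty if it
# occurs. Both inputs are illustrative assumptions.

violation_probability = 0.05   # estimated annual chance a data error triggers a sanction
fine_if_violated = 2_000_000   # assumed monetary penalty, in dollars

expected_annual_risk_cost = violation_probability * fine_if_violated
print(f"Expected annual risk cost: ${expected_annual_risk_cost:,.0f}")
# → Expected annual risk cost: $100,000
```

This expected cost can be added to the efficiency figures to build out the business case.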
It can also be risk in a classic financial sense. If I don’t have good visibility into the actual performance of the enterprise, I could be getting ready to drive off a financial cliff, as it were. What’s the risk in terms of staying in business or the financials of the organization? It could be risk in the legal and litigation sense, too. If I’m not stewarding my data in the appropriate ways, could I be subject to some legal action on the part of my customers or my shareholders or other parties? What’s the potential cost associated with that? So there are risk-oriented elements that come into the business case that, in our view, can be quantified to some degree.
Then the third category of drivers that can be quantified is what we talk about as value creation, or opportunity cost if you want to frame it from a cost perspective: What am I losing? What am I missing in terms of top-line growth, in terms of being able to increase profitability with my customers, in terms of being able to enter new markets, perhaps, because I don’t have the necessary insights or operational capabilities or agility due to a lack of good-quality data? Whatever growth might mean to any particular organization, I can begin to quantify those types of things.
One last thought I’d throw in around cost, and it’s why I think we get so many questions: It is very personal to each organization. I’ve given you the generic categories of cost that could be relevant, but it’s down to each organization to personalize them. What are their specific corporate goals and objectives, and how does poor data quality degrade them? For some, efficiency will be big. For others, risk will be big. Others are in a growth mode. Personalizing those things and putting them into a context that really makes sense, with solid math behind it, such that the stakeholders in the business can get behind it, is really key. And that’s where I see a lot of organizations struggle these days.
Lawson: Are they pretty happy with those answers, or have you ever had people come back and say that wasn’t enough or didn’t work?
Friedman: I don’t know that people are saying it didn’t work. Obviously they're always looking for more detail. They want a perfect cookbook recipe for how to plug in some numbers and crank out an estimate of cost of poor-quality data in their enterprise.
I just don’t think that’s reasonable given the personalization thing. So we try to give them some of the basic parameters to work with and then hopefully coach them in how to map that to their specific requirements and scenarios.
The other part of the dialogue is that they need to do more to measure baseline levels of data quality: having well-formed data quality metrics, having the infrastructure to measure them in a fairly comprehensive and ongoing fashion, using that to set appropriate goals and targets, and then quantifying, in the ways I just described, the gap between where they are today and the targets they want to hit. As an industry, and in the organizations I’m talking to, we collectively need to do better at rigorously measuring and monitoring data quality as well.
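Once a baseline metric and a target exist, quantifying the gap between them can be as simple as the sketch below; the completeness figures and the cost-per-point assumption are hypothetical:

```python
# Sketch: translate the gap between a measured data quality baseline
# and a target into a dollar estimate. All figures are illustrative.

baseline_completeness_pct = 87   # measured today: % of customer records complete
target_completeness_pct = 98     # goal set by the data quality program
cost_per_point = 15_000          # assumed annual cost of each missing percentage point

gap_points = target_completeness_pct - baseline_completeness_pct
annual_gap_cost = gap_points * cost_per_point
print(f"Gap: {gap_points} points, estimated annual cost ${annual_gap_cost:,}")
# → Gap: 11 points, estimated annual cost $165,000
```

The cost-per-point figure is the hard part in practice; it comes from the kinds of efficiency, risk, and opportunity analyses described earlier in the interview.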