Adding Common Sense to Data Quality


Christopher Murphy of InformationWeek tried to start a rumble with the data quality pros this week in his Global CIO column. "IT Execs Worry Too Much About Data Quality," the headline declared, with the subhead taunting, "How's that for a bit of heresy?"


"... you know what can foul up those analytics efforts?" he wrote. "Too much focus on data quality and other forms of data management, especially as those efforts get started."


As proof, he cited IBM Vice President of Information Management Strategy Andy Warzecha, P&G CIO Filippo Passerini and an IBM survey on analytics that queried around 3,000 executives.


His thesis: Sometimes, maybe even often, "rough directional data is a great starting point for discussions," so don't let the quest for perfect data slow down your use of data.


Not surprisingly, there were responses, both direct and indirect. I personally did my best to stir the pot by sharing the link with some specific data quality pros via Twitter. What can I say? I love a good blog rumble.


Surprisingly, the responses I read hardly treated his conclusion as heresy. Instead, they were more of the "Well, yeah" variety.


David Loshin, whom I interviewed recently and who literally wrote the book on data quality, wrote an excellent response. He reviewed the conclusion paper on the IBM study cited by Murphy, and suggested the study's top three "biggest barriers to broader use of analytics" would actually apply to anything new and misunderstood, not just data quality.


That said, he also offers this:

Look at companies like Amazon, Netflix, eBay, or Orbitz, which collect reams and reams of transactions and are able to consume, analyze, and apply the results. If there are some errors in the data, it is probably OK, since a small amount of bad data has little overall effect on the aggregate results. In addition, in some scenarios, there is an ability to tolerate some incorrect conclusions because real-time performance monitoring is in place that allows the company to rapidly change direction if a decision turns out to be bad.
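
To make Loshin's point about aggregates concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration and comes from none of the articles mentioned: the order totals are made up, the "error" is a record zeroed out by a hypothetical failed load, and `aggregate_shift` is a hypothetical helper name.

```python
from statistics import mean

def aggregate_shift(clean, corrupted):
    """Relative change in the mean caused by corrupting some records."""
    clean_mean = mean(clean)
    return abs(mean(corrupted) - clean_mean) / clean_mean

# Hypothetical data: 1,000 order totals cycling from 20 through 69 dollars.
clean = [20 + (i % 50) for i in range(1000)]

# Corrupt 1% of the records (10 of 1,000) by zeroing them out,
# standing in for something like a failed load.
corrupted = [0 if i % 100 == 25 else v for i, v in enumerate(clean)]

shift = aggregate_shift(clean, corrupted)
print(f"Mean moved by {shift:.2%} after corrupting 1% of records")
```

In this made-up case the average moves by about 1 percent, roughly in line with the share of bad records. The picture changes if the bad values are extreme, such as a misplaced decimal point that inflates an order a thousandfold, which is one reason the kind of error matters as much as the count.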

Phil Simon, a consultant and author who frequently writes about data, also wrote a blog post on the topic this week. To be fair, he doesn't specifically say he's addressing Murphy's article, but given the content and timing, he certainly could be accused of it.


Simon points out that not all data errors are created equal. In other words: Sometimes the problems are no big deal, but sometimes they are. In fact, he even offers a table showing the three common types of errors, ranked by "Should You Freak Out: Probably, Kind of and No."


His parting words on the topic:

Fight the urge to treat all errors and issues as equal. They are not. Take the time to understand the nuances of your data, your information management project's constraints, and the links among different systems, tables, and applications.

Mind you, data quality does matter. As Oliver Claude, an Informatica vice president, recently noted, data quality can be an "integration killer." But, clearly, you don't have to be perfect at data quality to make it worthwhile. It depends, as Loshin notes, on several factors, including your noise tolerance level. That's not heresy. That's just common sense.