Generally, data people talk about two well-trodden paths to improving data quality:
- Getting rid of duplicates or resolving conflicts with technology. Master data management (MDM) tools are the primary way to handle this kind of data quality effort. Databases can also do some data cleaning or scrubbing, as can data integration tools during integration (a sketch of what that deduplication looks like follows this list).
- Involving business users in correcting data. That can mean anything from giving business users a way to make changes to the master data to running an awareness campaign about the problems caused by data input errors.
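To make the first path concrete, here's a minimal sketch of the kind of duplicate identification such tools automate, using simple fuzzy string matching. The records, field names and threshold are illustrative assumptions, not any vendor's actual algorithm:

```python
from difflib import SequenceMatcher

# Hypothetical customer records; rows 1 and 2 are near-duplicates.
records = [
    {"id": 1, "name": "Acme Corp.",  "city": "Springfield"},
    {"id": 2, "name": "ACME Corp",   "city": "Springfield"},
    {"id": 3, "name": "Globex Inc.", "city": "Shelbyville"},
]

def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def find_duplicate_pairs(rows, threshold=0.9):
    """Flag pairs of records whose names are near-identical."""
    pairs = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            if similarity(rows[i]["name"], rows[j]["name"]) >= threshold:
                pairs.append((rows[i]["id"], rows[j]["id"]))
    return pairs

print(find_duplicate_pairs(records))  # [(1, 2)]
```

Real MDM tools go much further (survivorship rules, phonetic matching, merging across sources), but the core idea is the same: score candidate record pairs and merge or flag the ones above a threshold.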
What do you do when those techniques fail?
Elliot King, a technology writer and researcher, says there's an often-overlooked third category of problem: conceptual data quality issues.
“Conceptual data quality problems occur when data is not well defined or it is inappropriate for its intended use,” King writes in a recent Melissa Data blog post. Melissa Data is an independent vendor specializing in address verification, geocoding, duplicate identification and other tools for improving contact data quality.
King cites the movie Moneyball. I'll sum up: Brad Pitt, baseball, and a conceptual data quality problem, in which the sport was fixating on the wrong data and, really, the wrong metrics.
King says data quality problems really break down into three general categories, which you can use as a sort of “hide-and-seek” map to pinpoint your problem:
- Operational data quality issues, which are what people typically mean when they talk about data quality problems: incomplete, corrupt or inaccurate data
- Conceptual data quality issues
- Organizational data quality issues
Start with operational issues and work your way down the list.
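To show what "operational" means at the row level, here is a minimal sketch of the kind of completeness and validity checks that surface incomplete or inaccurate records. The fields and rules are assumptions made for the example, not a prescribed audit:

```python
import re

# Hypothetical contact rows; None and malformed values stand in for
# incomplete and inaccurate data.
rows = [
    {"email": "pat@example.com", "zip": "90210"},
    {"email": None,              "zip": "90210"},   # incomplete
    {"email": "not-an-email",    "zip": "ABCDE"},   # inaccurate
]

ZIP_RE = re.compile(r"^\d{5}$")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def audit(row):
    """Return a list of operational problems found in one row."""
    problems = []
    if not row.get("email"):
        problems.append("missing email")
    elif not EMAIL_RE.match(row["email"]):
        problems.append("invalid email")
    if not row.get("zip") or not ZIP_RE.match(row["zip"]):
        problems.append("invalid zip")
    return problems

for i, row in enumerate(rows):
    print(i, audit(row))
# 0 []
# 1 ['missing email']
# 2 ['invalid email', 'invalid zip']
```

If checks like these keep flagging the same problems month after month, that's your cue to look further down the list, at conceptual and then organizational causes.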
“When operational and conceptual data problems persist over time despite repeated attempts to fix them, organizational data quality problems are usually the culprit,” King writes. “In these cases, wrong, missing and invalid data is not really the problem, but the symptom. Something has to be fixed in the organizational structure or culture.”