Money-Saving Tip: Use Your ETL Tool for Data Quality


Attention, bargain shoppers and frugal companies. If you'd like a data quality tool but just can't spare the funds for a full solution, I've found a deal for you.


You can use your ETL (extract, transform and load) tool for detecting data problems, cleansing the data and even maintaining data quality.


Granted, some of you may already know about this cost-saving data tip. Still, according to Data Quality Pro, a surprising number of organizations do not realize the humble ETL tool's potential as a data quality tool.


Now, for some of you, ETL may not be good enough (more on that later), or perhaps you've already invested in a full data quality solution. That's how it is with frugal advice: sometimes it works for you, sometimes it doesn't. For instance, I will happily hang a few shirts out to dry on a sunny day, but I'm not going to make my own baby wipes.


The Data Quality Pro piece warns that ETL isn't a replacement for a high-end data quality solution. But if you want better data and can't afford a full-blown investment, an ETL tool can be an excellent starting point for three reasons:

  1. You already have one (or more) on hand, so you won't have to buy anything new.
  2. An ETL tool can address nearly 70 percent of data quality requirements, according to Data Quality Pro, which used Arkady Maydanchik's "Data Quality Assessment" framework as a gauge. That remaining 30 percent? Only 7 percent of it is completely non-compliant. It's not perfect, but it's better than being 100 percent unsure, right?
  3. You can use the ETL data quality approach to tackle low-hanging-fruit problems, and then turn any cost savings into a down payment on a full-blown tool. At least, that's what the Data Quality Pro piece recommends. I say it's your money, so do what you want with it.


The piece offers three steps for an ETL data quality project, but to be honest, they're really more tips for succeeding with this approach: understand the underlying data quality principles, pick the low-hanging fruit, yada yada. There's not a lot of detail on how to actually make it work. That said, some of the comments offer helpful advice.


However, if you'd like more specifics, check out this excellent IT toolbox wiki on ETL data quality, with Vincent McBurney, a Deloitte manager, as its primary editor. It offers more detail on how ETL can work for detecting problems, cleansing the data and then auditing for quality control.
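To make those three stages a little more concrete, here is a minimal sketch of what they might look like inside a transform step, written in plain Python rather than any particular ETL product. Everything in it is hypothetical and for illustration only: the field names (customer_id, name, email, country), the rules, and the functions detect, cleanse and transform are assumptions, not taken from the wiki or any vendor's tool.

```python
import re
from dataclasses import dataclass, field

# Hypothetical validation rule: a deliberately loose email pattern.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class AuditLog:
    """Counts and rejections collected during a run (the auditing stage)."""
    rows_in: int = 0
    rows_out: int = 0
    rejected: list = field(default_factory=list)  # (row_index, reason) pairs


def cleanse(row):
    """Standardize fields that can be fixed automatically (cleansing stage)."""
    cleaned = dict(row)
    # Collapse repeated whitespace and title-case names.
    cleaned["name"] = " ".join((row.get("name") or "").split()).title()
    cleaned["email"] = (row.get("email") or "").strip().lower()
    cleaned["country"] = (row.get("country") or "").strip().upper()
    return cleaned


def detect(row):
    """Return a list of rule violations for one row (detection stage)."""
    problems = []
    if not row.get("customer_id"):
        problems.append("missing customer_id")
    email = row.get("email") or ""
    if email and not EMAIL_RE.match(email):
        problems.append("malformed email")
    return problems


def transform(rows):
    """Cleanse each row, then validate it, logging rejects for auditing.

    Validation runs after cleansing so that problems cleansing can fix
    (stray spaces, odd casing) don't cause a row to be rejected.
    """
    audit = AuditLog()
    loaded = []
    for i, row in enumerate(rows):
        audit.rows_in += 1
        cleaned = cleanse(row)
        problems = detect(cleaned)
        if problems:
            audit.rejected.append((i, "; ".join(problems)))
            continue
        loaded.append(cleaned)
        audit.rows_out += 1
    return loaded, audit
```

In a real ETL tool, the cleanse and detect logic would typically live in the tool's own transformation and lookup steps, and the audit counts would be written to a table so you can track quality from one load to the next.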


And if you want to get really specific, you can check out this short article (complete with a diagram) about how to perform basic data cleansing using an ETL tool.