Money Saving Tip: Use ETL Tool for Data Quality

Loraine Lawson

Attention, bargain shoppers and frugal companies. If you'd like to have a data quality tool, but just can't spare the funds for a full solution, then I've found a deal for you.


You can use your ETL (extract, transform and load) tool for detecting data problems, cleansing the data and even maintaining data quality.


Granted, some of you may already know about this cost-saving data tip-although, according to Data Quality Pro, a surprising number of organizations do not realize the humble ETL's potential as data quality tool.


Now, for some of you, ETL may not be good enough-more on that later-or perhaps you've already invested in a full data quality solution. That's how it is with frugal advice. Sometimes it works for you, sometimes it doesn't. For instance, I will happily hang a few shirts out to dry on a sunny day, but I'm not going to make my own baby wipes.


The Data Quality Pro piece warns it isn't a replacement for a high-end data quality solution, but if you want better data, but can't afford a full-blown investment, an ETL can be an excellent starting point for three reason:

  1. You already have one-or more-on hand. So, you won't have to buy anything new.
  2. An ETL can address nearly 70 percent of data quality requirements, according to Data Quality Pro, which used Arkady Maydanchik's "Data Quality Assessment" framework as a gauge. That remaining 30 percent? Only 7 percent of it is completely non-complaint. It's not perfect, but it's better than 100 percent unsure, right?
  3. You can use the ETL data quality approach to tackle low-hanging fruit problems, and then turn any cost savings into a down payment for a full-blown tool-or, at least, that's what the Data Quality Pro piece recommends. I say it's your money, do what you want with it.


The piece offers three steps for an ETL data quality project, but to be honest, it's more tips for succeeding with this approach -- understand underlying data quality principles, pick the low-hanging fruit, yadayada. There's not a lot of detail here on how to actually make it work. That said, some of the comments offer helpful advice.


However, if you'd like more specifics, check out this excellent IT toolbox wiki on ETL data quality, with Vincent McBurney, a Deloitte manager, as the major editor. It offers more detail on how ETL can work for detecting problems, cleansing the data and then auditing for quality control.


And if you want to get really specific, you can check out this short article-complete with diagram - about how to perform basic data cleansing using an ETL tool.

Add Comment      Leave a comment on this blog post
Jun 5, 2009 9:50 AM Len Dubois Len Dubois  says:

My response should be entitled "Leave Data Quality to the Data Quality Experts"

Your description of an ETL tool in the article leads the reader to believe that an ETL tool can handle 70% of an organizations data quality problems.  Nothing could be further from the truth.

1. Most organization only access 50% of their information assets via an ETL tool, so right off the bat you aren't going to get to the 70% mark that you quoted.

2. If you want to increase the amount of data that you access, I am sure that an ETL vendor will sell you more access rights, but that is going to cost you more.

3. An ETL tool will only tell you what is wrong with the data.  Most of the effort to correct the data will occur manually which will astronomically increase your development costs.

4. The concept of getting 70% of the data correct is so outdated that it almost borders on negligence.  I can hear the argument now in front of Congress, 'Your right Senator, we did make a mistake, but we did find 70% of the customers who had their identity stolen.'

5. Lastly ETL tools provide little if no functionality in preventing the corruption of incoming information to an organization.  So, what data you do clean with an ETL tool is going to end up corrupted the very next day with new incoming transactions.

Like I said, leave data quality to the experts please!!!!!!!

Jul 31, 2009 8:25 AM antonio romero antonio romero  says:

Don't overlook the answer that's already in front of you... If you're an Oracle database customer, Oracle Warehouse Builder is a natural choice for this kind of thing. The basic ETL functionality is included in the DB license. Full-featured match/merge and data deduplication are part of what comes free with every Oracle DB:

This is also a great stepping stone to a full-fledged data profiling and DQ process, which you can implement in Warehouse Builder if you license the DQ option.

Aug 9, 2009 7:21 AM Mike Mike  says:


Quite often data comes in as text files and quality is ... and guest what? we still got to load it and automate the process.

Most common causes of it

  1. Wrong fields sizes

  2. Wrong and inconsistent formats

  3. Logical inconsistency like typing zipcode into phone number box

  4. User Errors

and not all ETL tools can handle it

Dec 16, 2009 6:30 AM ETL Software ETL Software  says: in response to Mike

Adeptia ETL Suite is a comprehensive solution that provides a powerful data conversion capability.  This is a graphical, easy to use software that supports ANY TO ANY conversion. Flat file to XML, XML to CSV, Any Database to XML , Excel to Database ,XML to EDI , HIPAA to XML, HL7 to XML, Date format, string, numerical conversions etc.

Jun 1, 2010 1:41 AM Application Integration Application Integration  says: in response to Len Dubois

The bottom line here is that it's totally true that there is an increasing need for data quality and integration within the companies and trading partners.

Choosing or not an ETL tool depends on several factors and has to be evaluated thoughtfully for each company's situation.

Certainly, money is a key factor and an ETL tool can face this issue, however as Len comments, it may become a nightmare if the ETL tool is not able to support the entire data issue.


Post a comment





(Maximum characters: 1200). You have 1200 characters left.



Subscribe to our Newsletters

Sign up now and get the best business technology insights direct to your inbox.