Data Discovery: Oft-Forgotten Step Can Save Team Time, Trouble

Loraine Lawson

There's one question you should teach your data integration team to always ask business people, says William Sharp, a product manager at Informatica and author of the blog The Data Quality Chronicle.


They should ask it even if the business user has an outline of data they want you to use, and it should certainly be asked before the team writes one line of code for the ETL, he adds.


That question: What are the critical data domains we are looking to integrate into the target?


Now here's the key element: You don't just ask the question, you push for an answer.


Sharp isn't just making a theoretical recommendation - he's drawing on his own experience.

I had a marketing client that was looking to build a repository from which they could perform campaign management and analytics. They had identified what they felt were the required sources. When I asked my generic question there was a fair amount of dissent in the room and some even pointed to the source-to-target matrix (STTM) as my source of information. However, I pressed on and discovered that some of the more executive users of the analytics were interested in performing analysis on customers who were marketed to but whose address of record, for which the source system was included in the STTM, was not deliverable.

The information he needed - you won't be surprised to learn - was in a spreadsheet somewhere.


Because of this one question, he uncovered this critical data point (so to speak) and was able to include it in the ETL sources. It also helped improve the data flow, because he added another process for discovering and profiling address data within the critical applications.


Sharp is using his experience to preach the good word about data discovery. Although it's common to ignore data profiling and enhancement, this leads to project back-tracking and more work that can be avoided if teams make data discovery a first-step priority, he contends:

In effect, I was performing two critical functions left out of the original development plan, data profiling and enhancement. I feel strongly that had these two processes not been left out, I would have had a more complete and accurate ETL development experience from the get-go.

He also offers other bits of advice and tips about better ETL design. For instance, he suggests you never develop an ETL map from a specification. Instead, use the profile results to build the ETL map.
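To make "use the profile results" concrete, here is a minimal sketch of column profiling in Python. It is not Sharp's tooling; the `profile_column` helper, the field names, and the sample rows are all hypothetical, and a real project would likely use a dedicated profiling tool against the actual sources.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, blank rate, distinct values, top values."""
    values = [row.get(column) for row in rows]
    # Treat None and whitespace-only strings as blank.
    non_blank = [str(v).strip() for v in values
                 if v is not None and str(v).strip() != ""]
    blanks = len(values) - len(non_blank)
    return {
        "column": column,
        "rows": len(values),
        "blank_rate": blanks / len(values) if values else 0.0,
        "distinct": len(set(non_blank)),
        "top_values": Counter(non_blank).most_common(3),
    }


# Hypothetical sample standing in for a source extract.
customers = [
    {"customer_id": "1", "address": "12 Main St"},
    {"customer_id": "2", "address": ""},
    {"customer_id": "3", "address": None},
    {"customer_id": "4", "address": "12 Main St"},
]

address_profile = profile_column(customers, "address")
print(address_profile)
```

A blank rate this high on an address field is the kind of finding that a specification alone would not reveal, and it would argue for adding another address source to the ETL map before any mapping code is written.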


Another time-saving, strategic tip: Make sure you have a "usable data dictionary" that's created from data discovery so you don't make the mistake of "relying on assumptions and assertions made by business analysts and database administrators," he writes.
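One way to picture a data dictionary that comes from discovery rather than assertion is to derive each entry from the values actually observed. The sketch below is an illustration only: `build_data_dictionary`, `infer_type`, and the sample order rows are hypothetical names and data, and the type inference is deliberately simplistic.

```python
def infer_type(values):
    """Guess a simple type from observed non-blank string values."""
    def all_match(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if all_match(int):
        return "integer"
    if all_match(float):
        return "decimal"
    return "text"


def build_data_dictionary(rows):
    """Describe each column by what the data shows, not by what a spec claims."""
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    dictionary = {}
    for name in columns:
        present = [str(row[name]).strip() for row in rows
                   if row.get(name) is not None and str(row[name]).strip() != ""]
        dictionary[name] = {
            "observed_type": infer_type(present),
            "nullable": len(present) < len(rows),  # at least one blank seen
            "max_length": max((len(v) for v in present), default=0),
            "example": present[0] if present else None,
        }
    return dictionary


# Hypothetical sample standing in for a source extract.
orders = [
    {"order_id": "1001", "amount": "19.99", "region": "EMEA"},
    {"order_id": "1002", "amount": "5", "region": ""},
]

data_dictionary = build_data_dictionary(orders)
for column, entry in data_dictionary.items():
    print(column, entry)
```

Entries like `nullable` and `max_length` can then be compared against what analysts and database administrators assert about the same fields, with any mismatch resolved before ETL development starts.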


Data discovery isn't just useful in ETL data integration projects. It's also valuable for MDM and application lifecycle, he explains in a related blog post.


If you oversee the data management team or integration developers - or if you're an IT leader who suspects your integration team isn't as effective as it could be - this blog and these posts in particular are definitely worth reading. The advice may seem a bit more tactical, but if you think of how it could improve the end results, you'll see his recommendations are key to ensuring a project achieves its more strategic goals.
