Talend Releases Open Source Data Cleaning Tool

Loraine Lawson

For years, my cousin, Scottie, would visit from Albuquerque, New Mexico, every summer. She announced, one summer, that she now went by "Margaret," her first name. Margaret turned into Maggie, then Mags, then Maggie again, who offered dispensation to us at the ripe age of 25. We were allowed to use Scottie again, which was a relief, since I'd never been able to give up calling her Scottie in the first place. I'm not sure what prompted all the aliases, but I imagine her casual name changes would cause quite a few database headaches today.


Last Wednesday, Talend unveiled a new offering to deal with exactly these kinds of data quirks. The new product, Talend Data Quality, is certainly not the first such tool, though Talend does contend it's the first product that combines "data integration, data profiling, and data quality in a single open source suite," according to a TMCnet article.


Regardless of its standing in the pantheon of firsts, it's an open source tool for finding and fixing dirty data, including nicknames and duplicated records. It can also confirm addresses, phone numbers, ZIP codes and abbreviations by comparing the data with reference information from providers such as the U.S. Postal Service and mailing databases in other countries, according to the press release. Talend even promises the tool can pick up on variations, including, I was pleased to note, the incarnations of "Margaret."
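To make the nickname problem concrete, here is a minimal Python sketch of the general idea: map known nicknames to one canonical name, then group records that collapse to the same key. This is not Talend's code or API. The nickname table, the function names, and the sample records are all hypothetical, and a real tool would rely on far larger reference lists and fuzzier matching.

```python
from collections import defaultdict

# Hypothetical nickname table (an assumption for illustration only).
NICKNAMES = {
    "maggie": "margaret",
    "mags": "margaret",
    "meg": "margaret",
    "peggy": "margaret",
}


def canonical_name(name: str) -> str:
    """Lowercase and trim a first name, then map known nicknames to one form."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def group_duplicates(records):
    """Group (first, last, zip) records that share a canonical key."""
    groups = defaultdict(list)
    for first, last, zip_code in records:
        key = (canonical_name(first), last.strip().lower(), zip_code.strip())
        groups[key].append((first, last, zip_code))
    return dict(groups)


# Made-up sample data: three spellings of one person plus an unrelated record.
records = [
    ("Margaret", "Smith", "87101"),
    ("Maggie", "smith", "87101"),
    ("Mags", "Smith ", "87101"),
    ("Bob", "Jones", "40202"),
]

for key, members in group_duplicates(records).items():
    if len(members) > 1:
        print("Possible duplicates for", key, "->", members)
```

Exact-key matching like this only catches variations that are already in the table, which is why commercial and open source data quality tools lean on external reference data.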


Yves De Montcheuil, vice president of worldwide marketing for Talend, told Network World that the tool can also be used for cleaning product data. That article also includes a nice description of the tool's drag-and-drop graphical interface.


It'll be interesting to see how data quality offerings fit in with the growth of master data management. Perhaps tools like this will appeal to smaller companies that can't afford MDM's hefty price tag.


A recent TechTarget article reports that mid-market companies are increasingly concerned about data integrity. Michael Dortch, senior analyst at Boston-based Aberdeen Group, told TechTarget that many mid-market companies are adopting tiered storage and building structured document repositories to address data quality problems.


Dortch also said there may be SaaS versions of some MDM products announced in December, which could make for an intriguing shake-up in the data quality space.


For now, however, those who can't afford MDM or proprietary data quality tools might want to look into Talend Data Quality, which will be released at the end of September under the GPL. Subscription fees for tech support and other services will start at $15,000 per year.


Talend CEO Bertrand Diard told IT Business Edge's Lora Bentley in March that the open source model gives customers an "insurance policy" against vendor changes in the data integration space:

"... the recent M&As in the space tremendously help the open source cause. Clients are tired of being victims of product strategy shifts, price list changes and other demands of proprietary vendors."

Aug 27, 2008 10:41 AM Eric Hall says:
Glad Scottie finally got back to being Scottie... is she still an Ann Ryand devotee, or did she drop that when she became Margaret? Thanks for the notice of the Talend tool... having another one of these in the toolbox can't hurt.
May 13, 2011 3:48 AM Rose says:

Thanks for the notification of the Talend tool. I was waiting for it.

May 25, 2011 3:21 AM Matt says:

Talend tool!! It sounds interesting. I was waiting for it.

