Better Data Integration with Hadoop? It’s Possible | IT Business Edge

Written By
Loraine Lawson
Feb 7, 2013

More organizations are using Hadoop not just to process large datasets, but as a replacement for the transformation engines in ETL.

But is Hadoop capable of being a data integration platform, complete with data quality functions?

Gartner analyst Ted Friedman (@Ted_Friedman) thinks not. Friedman recently wrote a research paper, “Hadoop is Not a Data Integration Solution,” on the topic. The description sums up his point:

“As use of the Hadoop stack continues to grow, organizations are asking if it is a suitable solution for data integration. Today, the answer is no. Not only are many key data integration capabilities immature or missing from the stack, but many have not been addressed in current projects.”

I haven’t read the paper, because I’m not a client and it’s $195, but Todd Goldman has. Goldman is vice president and general manager for Enterprise Data Integration at Informatica. He wrote a response to the paper.

Goldman says many companies are turning Hadoop into a data integration platform.

“Gartner is correct in that, Hadoop, by itself, is NOT a data integration platform,” Goldman writes. “However, it can be made into a data integration platform. Lots of companies are investing in making Hadoop based integration easier.”

Informatica did this by porting its Virtual Data Machine onto Hadoop, he adds, giving companies the same integration development environment they use for ETL jobs, with Hadoop as the underlying engine.

Not surprisingly, Informatica is not the only vendor investing in full data integration platform capabilities for Hadoop.

“The market in general is moving in this direction so expect to see some exciting capabilities emerging over the next six months,” he states, adding that some companies already use a kind of graphical development environment with Hadoop instead of hand-coding MapReduce jobs. Those companies, he says, are able to create code five times faster.

Hadoop has already made it possible to run more complex transformations in substantially less time than traditional ETL tools. Some companies are even running sophisticated integration jobs, he adds, without hiring expensive data scientists or MapReduce specialists.

If you’d like to read more about Big Data integration, check out this piece by Richard Daley, industry veteran and co-founder of Pentaho. Daley looks at all the tools in the Hadoop stack and discusses supporting integration for other NoSQL solutions, such as MongoDB, Cassandra and HBase.

Loraine Lawson

Loraine Lawson is a freelance writer specializing in technology and business issues, including integration, health care IT, cloud and Big Data.
