One of the more popular, and somewhat surprising, use cases for Hadoop is speeding up batch processing jobs.
More companies, particularly retailers, are using Hadoop essentially as an ETL (extract, transform and load) tool because their datasets have grown so large that processing them takes more time than they have. In other words, that old IT standard, the overnight batch job, is no longer feasible.
It only makes sense, then, that someone would roll out an ETL tool specifically designed for Hadoop.
That’s basically what Syncsort, a long-time player in the data space, announced earlier with its DMX-h ETL Edition and DMX-h Sort Edition.
“Big Data is prompting organizations to look at Hadoop to process more data in less time and for less money, but Hadoop is not yet a complete ETL solution,” the press release points out.
The big issue here is the nodes. Hadoop spreads its work across a cluster of nodes, so Syncsort’s new offerings, which are optimized for Hadoop, are designed to make it easier to maximize the performance of each node.
It’s a smart play, according to the IT advisory firm ESG.
“Well-established ETL solutions do well with structured data from a few sources, but Hadoop’s ability to recombine large numbers of data sources of varying data structures relatively quickly make it a natural for, dare I use the term, big data ETL,” ESG states in a recent post.
But, as always, the problem with Hadoop isn’t its potential so much as the application of that potential. And that’s no different when it comes to writing ETL processes. You need to know how to code for MapReduce, whether directly or through Hive, Pig or other open source tools that sit on top of it.
That’s why Syncsort’s new tools, with their drag-and-drop GUI, are such a smart move, according to ESG:
“Compared to the graphical, visual, drag-and-drop tools available from established ETL solutions, with Hadoop you must code. Syncsort DMX-h ETL Edition will help Hadoopists take a big data step forward in terms of ETL ease of development and performance.”
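To make the "you must code" point concrete, here is a minimal sketch of the kind of logic a hand-written MapReduce-style ETL step involves. It is plain Python that imitates the map, shuffle and reduce phases locally; it does not use the Hadoop API, and the record layout (store ID, SKU, amount) and all function names are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical raw extract: "store_id,sku,amount" lines, one per sale.
RAW_LINES = [
    "s1,apple,2.50",
    "s2,pear,1.25",
    "s1,milk,3.00",
    "bad line",
    "s2,bread,2.75",
]


def map_record(line):
    """Map step: parse one line and emit (store_id, amount); skip malformed rows."""
    parts = line.split(",")
    if len(parts) != 3:
        return []
    store_id, _sku, amount = parts
    try:
        return [(store_id, float(amount))]
    except ValueError:
        return []


def shuffle(pairs):
    """Shuffle step: group values by key, which a framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_store(store_id, amounts):
    """Reduce step: total the sale amounts for one store."""
    return store_id, round(sum(amounts), 2)


def run_job(lines):
    """Run map, shuffle and reduce in sequence on a single machine."""
    mapped = [pair for line in lines for pair in map_record(line)]
    return dict(reduce_store(key, values) for key, values in shuffle(mapped).items())


totals = run_job(RAW_LINES)
print(totals)
```

Even this toy version has to handle parsing, bad rows and grouping by hand, which is the work a drag-and-drop ETL tool aims to hide.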
I do wonder about costs, because I know Syncsort focuses on the large enterprise market rather than the mid-size market. But, as always with tech vendors, that’s more of a conversation than a data point.