Why Big Data May Speed Adoption of DI Tools

The biggest competition for any integration vendor is hand-coding, although in recent years TDWI has seen signs that organizations are finally shifting to ETL tools. This incredibly slow transition may get a boost from Big Data. Hand-coding is still the main option for connecting with Hadoop stores and other Big Data solutions, but developers with the needed skills are in short supply. That makes it more likely, it seems to me, that as organizations embrace Big Data, there will be a corresponding movement to replace hand-coding in general with more automated solutions.


Richard Daley, Pentaho's founder and chief strategy officer, recently discussed the problem of hand-coding and Big Data.


"Most of the people who have Hadoop and NoSQL are relatively technical staff right now," Daley said. "How do we get the adoption rate up? How this market matures is bringing tools like Pentaho Kettle in front of the non-Java developer, right, to get things going much quicker."


Pentaho is taking an aggressive stance to secure its position in the Big Data space, including writing native connectors to Hadoop, Hive and Cassandra, which, as Daley explains, lets those connectors provide high-performance integration. Its ETL tool, Kettle, can act as a data-as-a-service layer to speed up real-time data, he says, and if you couple Kettle with Pentaho Analytics, you can immediately start running analysis in the visualization tool without further mapping. The pieces are all designed to integrate tightly.


But perhaps the more unusual move Pentaho has made is to switch both its already-open-source ETL tool, Kettle, and its commercial solution, the Pentaho Data Integration tool, to the Apache license.


Daley said moving to the Apache license gives the company a distribution and development advantage.


"The ability to have a community contribute to extend the product is going to be much faster or much greater than we've seen in the history of our company. So development is definitely there, then there's distribution," he said. "Having an open source gets you wider distribution. Why? It's free, for one thing. But that distribution eventually is going to lead to some monetization.


"The fact is that a good percentage of people who start to use the free product in open source, which is Pentaho Kettle, a percentage of those folks will come up and buy our enterprise edition analytical platform. So it's an on-ramp."


One point of proof: Last month, DataStax became the first NoSQL database provider to integrate all features and components of Pentaho Kettle, "instantly weaving Cassandra into the broader fabric of relational databases, analytic databases, Hadoop, and other NoSQL databases," according to a press release.


While opening up Big Data is a major goal for vendors like Pentaho, what may impact organizations more is the opportunity to finally abandon hand-coding and embrace data integration tools and platforms. Daley shared one example of the difference a tool can make with Big Data.


"A major Wall Street financial institution had a consulting company that was in there for three months trying to get data in and out of Hadoop," he said. "They were writing all kinds of Java code and scripts, trying to create their own schedulers and it got to be a little bit of an overkill/mess. We got involved and it was about a two week project for us to literally reproduce what a team of consultants had taken a couple months to do."