Pentaho's Unique Hadoop Integration Play

By now we know that every integration and BI vendor in the market is hustling to connect with Hadoop.


But there are some interesting plays emerging from the Big Data hype cycle.


One example: Pentaho recently moved its free, open source ETL tool, the community edition of Pentaho Kettle, from the LGPL to the Apache license. Then it went one step further and made its paid commercial product, Pentaho Data Integration, free under the Apache license as well.


Hadoop and the rest of its stack are Apache projects, so they use the Apache licensing model. GigaOm points out that the Apache license is actually more permissive than the LGPL, allowing you to put Apache software into a product and "distribute it under any other open-source license as long as the embedded Apache-licensed code is unadulterated ..." By contrast, the LGPL doesn't allow a developer to "distribute a derivative work under a less restrictive license," according to the article.


"In order to obtain broader market adoption of big data technology including Hadoop and NoSQL, Pentaho is open sourcing its data integration product under the free Apache license," Matt Casters, founder and chief architect for the Kettle Project, is quoted in the press release as saying. "This will foster success and productivity for developers, analysts and data scientists giving them one tool for data integration and access to discovery and visualization."


Pentaho's chief geek, James Dixon, points out that the move will also bring graphical design and visualization tools to Hadoop.


Right. Call me cynical, but somehow I don't think this is just about helping Hadoop users. It looks to me like a smart move to spread adoption of Pentaho, effectively giving the BI tool a piggyback ride on Hadoop's massive marketing power.


But savvy marketing moves aside, Pentaho is getting props from analysts for its Hadoop integration. In its most recent Forrester Wave: Enterprise Hadoop Solutions report (available as a free download), the research firm rates Pentaho a "strong performer," just below the "leader" category, and singles out its integration capabilities for praise:

Pentaho, an established open source data analytics solution vendor, calls Hadoop-based extract, transform, and load (ETL) jobs from its Pentaho Data Integration (PDI) 4.2 product. It has the richest functionality and most extensive integration with open source Apache Hadoop among the data integration vendors that have added Hadoop functionality to their products over the past year.

EMC Greenplum, IBM, Amazon, Cloudera, MapR and Hortonworks all ranked as leaders among the enterprise Hadoop solutions.