Ever wonder why Facebook and Yahoo are among the most prominent users of Hadoop, the open source framework for processing and storing large amounts of data? True, both companies deal with formidable amounts of data. But they also have formidable development resources and are thus well-positioned to take advantage of Hadoop. Even as the hype for Hadoop grows, vendors are just now beginning to provide tools to make Hadoop more accessible to more organizations.
Linking Hadoop to business intelligence systems seems like a no-brainer. After all, analyzing data is how most companies derive business value from it. So it's not surprising that the past few weeks have seen a number of announcements concerning vendors tweaking their BI software to work with Hadoop.
Last week BI software provider Informatica teamed with Cloudera, which offers a data platform based on Hadoop. According to a PC World article, users will be able to move data in and out of Hadoop through Informatica's graphical environment, deploy Cloudera's Sqoop tool to issue SQL commands against Hadoop data, and apply data mappings developed within Informatica to Hadoop instances, using a combination of MapReduce functions and Hadoop User Defined Functions.
IBM is also putting Hadoop at the center of its InfoSphere BigInsights platform, which runs either on-premises or in an IBM commercial development test cloud. According to ComputerWeekly.com, IBM says the cloud environment makes it easier for an organization to experiment with analyzing so-called Big Data and to determine logical business applications for it before bringing it in-house.
Last month, Pentaho rolled out BI and data integration tools for Hadoop. According to internetnews.com, Pentaho's zero-programming graphical design environment helps companies manage how data is moved into and out of Hadoop, execute and schedule Hadoop tasks in the context of existing ETL and BI workflows, and design and execute massively scalable ETL jobs in Hadoop using more than 200 out-of-the-box ETL steps. Pentaho BI Suite for Hadoop includes Pentaho Data Integration (PDI) for Hadoop. Users can perform production, operational and batch reporting against the full set of data in Hadoop using Hadoop's Hive data warehouse infrastructure, and folks with no knowledge of Hadoop or SQL can run ad hoc reports against that data, Pentaho says. Users can also create data marts for interactive analysis and dashboarding in minutes using Pentaho Agile BI.
The story quotes Enterprise Management Associates analyst Shawn Rogers, who says the Pentaho tools will be useful for senior-level architects at organizations with Big Data initiatives "or even for a DBA or ETL guy trying to get into Hadoop."