Despite the press Hadoop’s received in the past year, there’s still a lot of confusion about what it is and what it does, according to a recent TDWI report.
Last fall, TDWI ran an email survey about the Hadoop ecosystem, and of the 263 complete responses, 26 said they did not understand Hadoop. The survey also revealed only 18 percent had any experience with deploying or using Hadoop.
“Best Practices Integrating Hadoop into BI/DW” is TDWI’s response to the questions and concerns IT professionals have about Hadoop. As the title suggests, it provides a list of 10 best practice priorities for integrating Hadoop with BI systems, data warehouses and other analytics stores.
But it goes beyond that to bust the myths organizations have about Hadoop, explain the Hadoop ecosystem, and list vendors and what they’re doing to support Hadoop.
In short, it’s an absolute must-read and a starting point for anyone considering or implementing Hadoop.
One particular myth the report attacks is that Hadoop will replace enterprise data warehouses. In fact, 46 percent of Hadoop adopters have integrated it with their analytics tools and data warehouses.
While machines (52 percent) and sensors (54 percent) top the list of technologies integrated with Hadoop, other top areas of integration, in order of current adoption practice, include reporting tools, Web servers, data integration tools, analytic databases and data visualization tools.
In terms of future Hadoop integration, more organizations are looking at data management technologies, with 52 percent planning to integrate Hadoop with data quality tools and master data management. Fifty percent also plan to integrate with third-party data providers.
TDWI also offered good news about one of the most cited barriers to Hadoop adoption, staffing and skills:
The challenge with HDFS and Hadoop tools is that, in their current state, they demand a fair amount of hand coding in languages that the average BI professional does not know well, namely Java, R, and Hive. However, this is not a showstopper; TDWI has seen a number of BI/DW teams successfully acquire the skills and staffing needed for Hadoop. As more and better development tools for Hadoop arrive from vendors and the open source community, the current excess of hand coding will give way to high-level automated approaches to BI, analytics, and data management development for Hadoop.
One mistake TDWI says organizations are making with Hadoop skills is focusing on application developers. Instead, TDWI recommends companies focus on training BI professionals.
Another key takeaway: Though most organizations will develop Hadoop as a silo in order to “try it out,” TDWI recommends you make integrating Hadoop with BI, DW, data integration and other analytics systems a second-phase priority.
“After all, the goal is to integrate Hadoop with your well-integrated BI/DW environment, not proliferate twenty-first-century spreadmarts,” the report notes. “To make the integration happen, look for products (both open source and vendor built) that enable the integration points discussed elsewhere in this report.”
The 36-page report is available as a free download with the usual site registration. There’s also a webinar featuring Philip Russom, TDWI research director for data management and author of the report.