Setting up a data integration sandbox and accessing Hadoop data through vendor products are among the best practices emerging for the Hadoop ecosystem, according to a recent TDWI report.
TDWI Data Management Research Director Philip Russom wrote the report on the heels of a survey (registration required to download) that found 63 percent of organizations expect to deploy the Hadoop Distributed File System (HDFS) within three years. Right now, adoption hovers around 10 percent, so if organizations follow through, reaching that 63 percent will require a significant ramp-up.
But Russom says it will happen, thanks in part to several emerging trends.
“Most of the organizations adopting Hadoop are completely new to it, so they need to educate themselves quickly about emerging best practices,” Russom states in the best practices report. “The checklist of best practices presented here can help users make sustainable decisions as they plan their first Hadoop deployments.”
One of the challenges with Hadoop is finding someone who can write code to process data using MapReduce. This was a major barrier to adoption until recently, but the report points to three changes that are reducing the need for hand-coded MapReduce.
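To see why that barrier was real, here is a minimal, framework-free sketch of the map and reduce phases behind the classic word count, the kind of logic that historically had to be hand-coded (typically in Java against the Hadoop API) for even simple jobs. The function names below are illustrative only and are not part of any Hadoop library:

```python
# Framework-free sketch of the MapReduce word-count pattern.
# Illustrative only; real Hadoop jobs would implement Mapper and
# Reducer classes against the Hadoop Java API.
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Emit a (word, 1) pair for each word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    """Sum the emitted counts, grouping by word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["Hadoop stores data", "Hadoop processes data"]
pairs = chain.from_iterable(map_phase(l) for l in lines)
print(reduce_phase(pairs))
# → {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

Even this toy version shows the shape of the work: every job is decomposed into emit-and-aggregate steps, which is exactly the boilerplate that higher-level query tools spare developers from writing by hand.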
What I love about this report is that it fills in the gaps you might have when you think about deploying Hadoop. For instance, it outlines best practices for extending your data warehouse architecture with Hadoop and offers specific recommendations for doing so.
If you’re among that 63 percent planning a Hadoop deployment, you’ll find this 10-page checklist of eight Hadoop best practices an essential resource. It’s available for free download with basic user registration, and there’s also a webinar with Philip Russom discussing the report.