Setting up a Hadoop cluster on a physical machine is not one of the easiest jobs an IT administrator is ever likely to encounter, but setting one up on a virtual machine just might be.
VMware today launched Project Serengeti, an open source effort designed to let an IT administrator configure Hadoop on vSphere in less than 10 minutes. According to Fausto Ibarra, senior director of product management for cloud application platform at VMware, Project Serengeti extends the automated provisioning capabilities that VMware developed for vSphere out to instances of Hadoop, which are now virtual machine-aware thanks to code that VMware contributed to the Apache Hadoop project.
VMware also announced updates to Spring for Apache Hadoop, an open source project that makes it easier for enterprise developers to build distributed processing solutions with Apache Hadoop. These updates allow Spring developers to invoke the HBase distributed database, the Cascading library, and Hadoop security capabilities based on Kerberos authentication. Spring for Apache Hadoop is designed to allow developers to build Hadoop applications without having to master complex Hadoop conventions such as the MapReduce interface, says Ibarra.
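The announcement itself doesn't include code, but to give a sense of what Ibarra is describing, a minimal Spring for Apache Hadoop application context declares a MapReduce job as an ordinary Spring bean rather than through hand-written driver code. The sketch below uses the project's `hdp:` XML namespace; the cluster address, paths, and mapper/reducer class names are hypothetical placeholders:

```xml
<!-- A minimal sketch of a Spring for Apache Hadoop application context.
     The namenode address, paths, and com.example.* classes are hypothetical. -->
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xsi:schemaLocation="
         http://www.springframework.org/schema/beans
         http://www.springframework.org/schema/beans/spring-beans.xsd
         http://www.springframework.org/schema/hadoop
         http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

  <!-- Point the application at the cluster; fs.default.name is a
       standard Hadoop property -->
  <hdp:configuration>
    fs.default.name=hdfs://namenode.example.com:8020
  </hdp:configuration>

  <!-- Declare a MapReduce job as a Spring bean instead of writing a
       driver class by hand -->
  <hdp:job id="wordCountJob"
           input-path="/input/logs"
           output-path="/output/wordcount"
           mapper="com.example.WordCountMapper"
           reducer="com.example.WordCountReducer"/>

  <!-- Submit the job when the Spring context starts -->
  <hdp:job-runner id="runner" job-ref="wordCountJob" run-at-startup="true"/>
</beans>
```

The point of the abstraction is that the wiring above lives alongside the rest of an application's Spring configuration, so a Java developer already familiar with the framework can stand up a Hadoop job without learning Hadoop's own job-submission conventions first.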
One of the major issues holding back adoption of Hadoop has been a shortage of IT administrators who know how to configure it and a scarcity of developers who know how to build applications for the platform. Ibarra says VMware is moving to address both issues as part of an overall effort to make Hadoop a first-class virtual citizen in the enterprise, one that developers can easily write applications for using an existing Java application development framework.
To what degree enterprise organizations will adopt Hadoop remains to be seen, but given all the interest in Big Data these days, it's almost certain that the number of Hadoop pilot projects running on top of VMware is about to increase rapidly.