As part of an effort to ensure compatibility across multiple distributions of Hadoop, ODPi, a non-profit consortium led by many application providers, today released the ODPi Runtime Specification and associated test suite.
John Mertic, director of ODPi, says the goal is to give IT organizations a mechanism for making sure that various distributions of Hadoop don’t wind up introducing dependencies that lock those organizations into a particular Hadoop implementation. Mertic says ODPi, which is backed by Hortonworks, IBM, GE and the Pivotal arm of EMC, is designed to complement the effort of the Apache Software Foundation, which is charged with developing the raw bits that make up Hadoop, including core components such as YARN and HDFS.
Not every vendor agrees that ODPi is necessary, and how and where distributors of Hadoop add value remains an ongoing industry debate. Based on version 2.7 of Apache Hadoop, the ODPi Runtime Specification is part of the ODPi Core reference specification. At present, over 25 vendors and IT service providers are supporting ODPi.
The specification itself is designed not so much to limit Hadoop innovation as to provide a structured way to introduce innovations without locking customers into a particular distribution. To that end, the published specification also includes rules and guidelines on how to incorporate additional, non-breaking features, which are allowed provided the source code is made available via the relevant Apache community.
ODPi is also working on specifications for installation and management tools. For example, an Operations Specification covers Apache Ambari, the ASF project for provisioning, managing and monitoring Apache Hadoop clusters.
It may be too early to say to what degree any provider of a Hadoop distribution is trying to lock in customers. But given the history of enterprise IT, many IT organizations would most likely rather be safe than sorry.