When deploying Hadoop in a production environment, IT organizations often struggle with building applications that run across a Hadoop cluster. To solve that problem, developers at LinkedIn created a plug-in for the open source Gradle build tool: a set of workflow tools that makes it easier to connect multiple Hadoop jobs within the context of an application.
This week, LinkedIn announced that it is turning its Gradle plug-in for Hadoop into an open source project. Alex Bain, senior software engineer for LinkedIn, says that LinkedIn has a vested interest in making the plug-in a bigger part of a rapidly growing Hadoop ecosystem. For example, as the Apache Spark in-memory computing project continues to evolve, Bain says that LinkedIn would like to see open source contributions that extend the reach of the plug-in to both Hadoop and Spark.
At the core of the LinkedIn plug-in is a domain-specific language called Hadoop DSL, which LinkedIn created to make Hadoop more accessible to its developers who need to work with Hadoop workflow managers such as Azkaban and Apache Oozie. Hadoop DSL is written in Groovy, a language for the Java Virtual Machine that is closely related to Java, and it provides developers with a consistent method of invoking multiple application development frameworks running on top of Hadoop.
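To make the idea of declaring a workflow in code more concrete, here is a minimal sketch in Python rather than Groovy. It is not the Hadoop DSL and does not use any LinkedIn API; the `Workflow` and `Job` classes, the job names, and the job-type labels are all hypothetical. It only illustrates the general pattern of declaring jobs and their dependencies once, deriving a run order, and generating Azkaban-style `.job` property text from that single definition.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class Job:
    """One step in a workflow (hypothetical model, not the Hadoop DSL)."""
    name: str
    job_type: str  # illustrative label such as "pig", "hive" or "spark"
    depends_on: list[str] = field(default_factory=list)


@dataclass
class Workflow:
    """A named collection of jobs linked by dependencies."""
    name: str
    jobs: dict[str, Job] = field(default_factory=dict)

    def job(self, name, job_type, depends_on=()):
        # Register a job and return self so declarations can be chained.
        self.jobs[name] = Job(name, job_type, list(depends_on))
        return self

    def execution_order(self):
        # Map each job to the set of jobs it depends on, then sort so that
        # every job appears after all of its dependencies.
        graph = {j.name: set(j.depends_on) for j in self.jobs.values()}
        return list(TopologicalSorter(graph).static_order())

    def to_job_files(self):
        # Render each job as key=value text in the style of an Azkaban
        # .job file (only the "type" and "dependencies" keys are emitted).
        files = {}
        for j in self.jobs.values():
            lines = [f"type={j.job_type}"]
            if j.depends_on:
                lines.append("dependencies=" + ",".join(j.depends_on))
            files[f"{j.name}.job"] = "\n".join(lines)
        return files


# Hypothetical three-step pipeline spanning three different frameworks.
wf = (
    Workflow("daily_report")
    .job("extract", "pig")
    .job("aggregate", "hive", depends_on=["extract"])
    .job("score", "spark", depends_on=["aggregate"])
)

order = wf.execution_order()
files = wf.to_job_files()

print(order)
print(files["score.job"])
```

The point of the pattern is that the dependency graph lives in one place, so the same definition could in principle be rendered for a different workflow manager by swapping out only the output step.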
Bain says LinkedIn relies heavily on Hadoop to surface data that’s relevant to its community in real time. LinkedIn can only do that, says Bain, because it found a way to unify all the Hadoop frameworks that developers need to invoke in order to create a production application.
LinkedIn, of course, is no stranger to the open source community. It has previously launched projects such as the Galene search engine, the Pinot real-time analytics software, and the Burrow tools for monitoring the Kafka messaging system, which is often deployed alongside Hadoop. In each case, IT organizations that choose to make use of that software are relying on open source software that is core to how LinkedIn operates.
When it comes to Hadoop, many IT organizations are understandably intimidated by the dizzying array of frameworks that can be used to build Big Data applications. Over time, IT organizations working with Hadoop are going to be working with multiple instances of those frameworks. With its Gradle plug-in, LinkedIn is providing a convenient place from which to get started building Hadoop applications and, ultimately, to master those frameworks.