LinkedIn Makes Hadoop Tools Available as Open Source Project

Mike Vizard

When deploying Hadoop in a production environment, IT organizations often struggle with building applications that run across a Hadoop cluster. To solve that problem, developers at LinkedIn created a plugin for Gradle, the open source build system, that provides a set of workflow tools to make it easier to connect multiple Hadoop jobs within the context of an application.

This week, LinkedIn announced that it is releasing its Gradle plugin for Hadoop as an open source project. Alex Bain, senior software engineer for LinkedIn, says that LinkedIn has a vested interest in making the plugin a bigger part of a rapidly growing Hadoop ecosystem. For example, as the Apache Spark in-memory computing project continues to evolve, Bain says that LinkedIn would like to see open source contributions that extend the reach of the plugin to both Hadoop and Spark.

At the core of LinkedIn's Gradle plugin is a domain-specific language called Hadoop DSL, which LinkedIn created to make Hadoop more accessible to its developers who need to work with Hadoop workflow managers such as Azkaban and Apache Oozie. Hadoop DSL is written in Groovy, a language that runs on the Java Virtual Machine, and it provides developers with a consistent method of invoking multiple application development frameworks running on top of Hadoop.
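The general idea of connecting multiple jobs into one workflow can be pictured as a dependency graph. The short Python sketch below is only an illustration of that idea: it is not Hadoop DSL syntax (which is Groovy), it is not LinkedIn's code, and the job names and the `execution_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each job name maps to the set of jobs it depends on.
# These names are invented for illustration and do not come from LinkedIn.
workflow = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}


def execution_order(jobs):
    """Return the jobs in an order that respects every declared dependency."""
    return list(TopologicalSorter(jobs).static_order())


order = execution_order(workflow)
print(order)
```

A workflow manager would hand each job to the cluster in an order like this one, starting a job only after everything it depends on has finished.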

Bain says LinkedIn relies heavily on Hadoop to surface data that’s relevant to its community in real time. LinkedIn can only do that, says Bain, because it found a way to unify all the Hadoop frameworks that developers need to invoke in order to create a production application.

LinkedIn, of course, is no stranger to the open source community. It has previously launched projects such as the Galene search engine, Pinot real-time analytics software, and Burrow tools for monitoring the Kafka messaging system, which is often deployed alongside Hadoop. In all four cases, IT organizations that choose to make use of that software are relying on open source software that is core to how LinkedIn operates.

When it comes to Hadoop, many IT organizations are understandably intimidated by the dizzying array of frameworks that can be used to build Big Data applications. Over time, IT organizations working with Hadoop are going to be working with multiple instances of those frameworks. With its Gradle plugin, LinkedIn is providing a convenient place from which to start building Hadoop applications and, ultimately, to master those frameworks.

Aug 14, 2015 12:23 PM Rene Groeschke says:
Hello, Gradle is an open source multipurpose build system (see What linkedin is really open sourcing here, is just a plugin for this gradle buildsystem that provides a nice DSL for dealing with hadoop. cheers, René
Aug 17, 2015 5:51 AM Henri says:
I'm really sorry but this article is plainly wrong. Gradle isn't a build tool developed by LinkedIn. It has nothing to do with Hadoop. And it has always been open source. Groovy is a programming language. It has nothing specific for Hadoop either. Gradle is implemented using Groovy. LinkedIn has open sourced a Gradle PLUGIN for Hadoop. Which includes a DSL.
