Where Will You Find Hadoop Talent?

Susan Hall

My colleague Loraine Lawson, who has written extensively about Hadoop, including interviewing founder Doug Cutting, recently posed the question to me: Where will companies find Hadoop talent?


Job aggregator Indeed.com's graph of demand for Hadoop skills has been a hockey stick, if there ever was one, since January 2010, and the site included Hadoop among its most recent top 10 job skill trends. Open source vendor OpenLogic ranks it No. 4 on its list of the hottest open source projects of the past year. Like most hot new technologies, there aren't enough people with those skills to go around.


(And with apologies to rivals such as LexisNexis, the name of the set of Apache projects seems to have become generic, just as Xerox came to mean "to copy.")


I put Loraine's question to the Twitterverse, and James Kobielus, an analyst with Forrester Research, responded:

Companies will find Hadoop talent from within. People are teaching themselves. It's open source. It's made for that.

Kobielus has noted that companies have yet to fully embrace Hadoop, telling SDTimes, "we're at the beginning of the maturation of this market" - in other words, there's far to go yet. And since Oracle has just jumped in with Cloudera, it'll be interesting to see what comes of that.


Kobielus is quoted in this useful Computer Weekly article on what exactly these skills are. Its definition:

Hadoop allows companies to store and manage far larger volumes of structured and unstructured data than can be managed affordably by today's relational database management systems.

Though it's often thought of solely in terms of Big Data - Yahoo is reported to have installed a 50,000-node Hadoop network - this IT World post makes the point that Hadoop's scalability works in both directions: It can also scale down effectively to meet more modest business needs.


There have been many references to two roles in Big Data - managing data and interpreting it - but the Computer Weekly article lists three:

  1. Data analysts or data scientists - Those who glean useful insight from the massive amounts of stored information. The skills: multivariate statistical analysis, data mining, predictive modeling, natural language processing, content analysis, text analysis and social network analysis. It also mentions experience with tools such as SAS and IBM's SPSS, and with programming languages such as R. A lack of training or skills in this area - not just in Hadoop - was cited as a major limitation on the use of Big Data in a recent EMC survey.
  2. Data engineers - Those who create the data-processing jobs and build the distributed MapReduce algorithms for use by data analysts. Those with Java and C++ skills will have the edge.
  3. IT data management professionals - Those who choose, install, manage, provision and scale Hadoop clusters. It says these skills will be similar to those in traditional relational database and data warehouse environments.
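To make the data engineer role concrete, here is a minimal, single-process sketch of the MapReduce programming model that Hadoop jobs implement. It is not Hadoop's Java API - the function names and the word-count job are illustrative only - but it shows the map, shuffle and reduce phases a data engineer writes against:

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Apply the reducer to each key's grouped values."""
    return {key: reducer(key, values) for key, values in groups.items()}

# The canonical word-count job, expressed as a mapper and a reducer:
def word_mapper(line):
    for word in line.split():
        yield word.lower(), 1

def count_reducer(word, counts):
    return sum(counts)

lines = ["Hadoop scales up", "Hadoop scales down"]
result = reduce_phase(shuffle(map_phase(lines, word_mapper)), count_reducer)
print(result)  # {'hadoop': 2, 'scales': 2, 'up': 1, 'down': 1}
```

In a real cluster, the framework distributes the map and reduce work across nodes and handles the shuffle over the network; the engineer's job is essentially to write the two functions and tune how the job runs.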


However, the IT World article adds this:

Since Hadoop is Java-based, and MapReduce makes use of Java classes, a lot of the interaction is the kind where experience as a developer (and as a Java developer in particular) will be very handy. ... Hadoop, Hive, Sqoop, and other tools in the Hadoop ecosystem are controlled from the command line. ...
Hadoop-related jobs typically call for experience with large-scale, distributed systems, and a clear understanding of system design and development through scaling, performance, and scheduling. In addition to experience in Java, programmers should be hands-on and have a good background in data structures and parallel programming techniques. Cloud experience of any kind is a big plus.
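As the excerpt notes, most of these tools are driven from the command line. A few representative invocations give a feel for the day-to-day work; the paths, jar name, class, table and connection string below are placeholders, not from the article:

```shell
# List and load files in HDFS via the distributed filesystem shell
hadoop fs -ls /user/analyst/input
hadoop fs -put access.log /user/analyst/input/

# Submit a packaged MapReduce job (jar and class names are placeholders)
hadoop jar wordcount.jar WordCount /user/analyst/input /user/analyst/output

# Run an ad hoc Hive query from the shell
hive -e "SELECT COUNT(*) FROM access_logs"

# Import a relational table into HDFS with Sqoop (connection details are placeholders)
sqoop import --connect jdbc:mysql://dbhost/sales --table orders --username analyst
```

These fragments assume a configured cluster and are sketches of typical usage rather than a runnable session.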


At this point, though, Matt Asay, in a post at The Register, worries that lack of Hadoop talent could stunt its adoption. He tells of a London-based friend whose difficulty in finding Hadoop talent prompted him to put off that project and start a training business instead.


Indeed, Hadoop talent is overwhelmingly located in the Silicon Valley area, according to a study by North Carolina State University's Institute for Advanced Analytics. The 451 Group compared that with NoSQL and other projects, noting that the NC State project looked at the geographic distribution of LinkedIn members who mentioned Hadoop skills, calling it "by no means perfect, but an insightful measure nonetheless."


Asay, who formerly worked for Ubuntu sponsor Canonical, suggests adopting one of its successful strategies:

Canonical has managed to hire a very strong team of Linux talent by paying well and letting developers work from home, whether that home is in Des Moines, Iowa or Villa Gesell, Argentina. ... For many top-quality engineers, the greatest perk of all, and one that might steer them to a Sears instead of Zynga, is the chance to stay in Canterbury, England, rather than moving to Menlo Park, California.

As for teaching yourself Hadoop, Loren Siebert, a San Francisco entrepreneur and software developer, wrote in a post on Cloudera's site:

The big challenge in my opinion is not that any one piece of the puzzle is too difficult. Any reasonably smart (or in my case stubborn) engineer can set themselves on the task of learning about a new technology once they know that it needs to be learned. The challenge with the Hadoop ecosystem is that it presents the newbie with the meta-problem of figuring out which of these tools are appropriate for their use case at all ...


My advice ... is to break down problems into a few discrete use cases and then work on ferreting out the technologies that are designed for that use case. ... Work toward putting something simple into production. Lather, rinse, and repeat.

Those looking for outside help can check out the Hadoop Support wiki on the Apache site, as well as training courses from Cloudera University, MapR Academy, Hortonworks, IBM and others.



Jan 27, 2012 8:36 PM Flavio Villanustre says:


You are spot on in indicating that data analysts, data engineers (also known as Java MapReduce software developers) and IT operations resources are in high demand, mainly due to the hype around Hadoop.

However, I believe it is important to clarify how implementing big data solutions on the HPCC Systems platform differs.

In the case of the HPCC Systems platform, thanks to its concise and expressive high-level ECL language, data analysts themselves can implement the data transformations and queries required for a particular big data solution, eliminating the need for data engineers altogether. This is not only beneficial to the ROI of the project as a whole, but it also significantly speeds up the process of recruiting resources, as Java MapReduce software developers are not needed at all. In addition, since the HPCC Systems platform is a cohesive architecture, the operational effort to implement and maintain the system is considerably lower, requiring fewer people in IT operations.


Aug 24, 2014 8:42 PM Mike Smith says:

There is a simpler solution. Hire Hadoop engineers and data scientists from Experfy (www.experfy.com), which is a Big Data marketplace based in the Harvard Innovation Lab.




