Just as data scientist is becoming one of the up-and-coming job titles in Big Data, so, too, are the roles of Big Data architect and engineer. My colleague Loraine Lawson wrote yesterday about the people required for Big Data implementations, starting with the CIO. It seems, though, that the various roles are still evolving.
There seem to be more job postings for Big Data engineers than Big Data architects, though a post at Silicon Angle points out that at least for the engineers, companies seem to want the world:
Employers are looking for people who know MapReduce, Hadoop and related frameworks such as HBase, Pig and Hive. Programming languages in demand include Java, Ruby and C++. It really covers the gamut, which is part of the issue with using Big Data in a job description. Do you have MongoDB expertise? Yes, that's applicable. Practical hands-on experience with Bayesian models and neural networks? Yes, the job may be right for you.
Any wonder that companies say they can't find Big Data talent? The post lists the requirements from a couple of ads:
- Design, develop and support a MapReduce-based data-aggregation pipeline for processing billions of events a day
- Support data-mining and machine-learning algorithms using behavioral data
- Study state-of-the-art techniques in massively parallel frameworks and apply them to advertising problems
- Help other engineers get the most out of the platform you own
- Experience with Lisp and/or Clojure (functional programming languages)
- Experience with large-scale machine learning techniques (examples: Google PageRank, Netflix Prize, genome sequence assembly, computational finance)
- Experience with Amazon Web Services (EC2, S3, SQS, etc.)
- Deep knowledge of the Hadoop ecosystem
- Git version control
- Frequent contributor to open source projects (show us your work on GitHub!)
There's a shortage not only of analytics talent for Big Data, but of engineering talent as well. Andy Mendelsohn, Oracle senior vice president of database server technologies, talked about that in an article at businesscloud9:
... [Hadoop] is a development platform for very sophisticated Java developers to build parallel applications, so one of the big problems around Hadoop is a skill-set problem. I talk to customers all the time, and they just don't have developers who know how to write these MapReduce programs. And so one of the big challenges of Hadoop is sort of to raise the level of discourse around Hadoop so you don't have to have rocket-scientist Java parallel programming developers, but you can code at a higher level.
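Mendelsohn's point is easier to see with a toy sketch of the MapReduce model itself. This is plain Python with hypothetical function names, not Hadoop code; it only shows the map, shuffle and reduce phases that a real Hadoop job would have to express as parallel Java classes running across a cluster:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(values) for word, values in groups.items()}

docs = ["big data needs big engineers", "data engineers write mapreduce"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
# e.g. counts["big"] == 2, counts["engineers"] == 2
```

Even this word-count exercise takes three distinct phases to express; in Hadoop, each would be Java code the developer must write and debug in a distributed setting. Higher-level tools such as Hive and Pig generate these jobs from SQL-like queries, which is the "code at a higher level" Mendelsohn is describing.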
So it appears the staff shortage will continue until vendors make it easier or more people become trained. In the meantime, companies would do well to stop looking for a purple squirrel and really zero in on the essential skills they need.