In the blog post "Where's My Big Data Ninja?" at Enterprise Strategy Group, writer Julie Lockner really gets into the "images of swords and intrigue."
No longer is the data analyst the guy in a room where you "shove pizza boxes under the door and get data models and reports the next morning"... Now they are covert agents that specialize in unorthodox arts of data wars: waging war against the volumes of data that could not be penetrated using traditional weapons of enterprise data warehouse methods and tools. Slicing and dicing billions of rows of data in seconds, finding anomalies, trends, or clusters that would be considered a competitive advantage, turning that knowledge into corporate victory.
She then discusses the scarce supply of "Data Ninjas" who have the skills needed for deployments like these:
... analytical databases such as Greenplum and AsterData, map/reduce frameworks including Hadoop, integrated with an existing enterprise data warehouse.
She seems to be describing at least two different job functions here, though: data analysts on the one hand, and systems designers and integrators on the other.
After an interview with Richard Daley, CEO and founder of open source BI vendor Pentaho, my colleague Loraine Lawson wrote that the job market is hot for people skilled in deploying Hadoop. There's similar demand for data analysts, I learned from an interview with Jack Phillips, CEO of the research firm International Institute for Analytics. A recent CIO.com article called this area one of four essential skills in IT, though it described the skill as two-fold: managing the data and interpreting the data.
Are the two functions converging? Blogger Mike Vizard wrote about a tool to help analysts use SQL to query a variety of Hadoop implementations.
What is promising is that college curricula now include frameworks such as MapReduce, combining how to build and deploy multi-parallel processing/distributed computing architectures with traditional data analytics courses (aka statistics, digital signal processing, etc.). These course descriptions reference the use of open source tools such as R and Python in a Hadoop framework.
She says vendors will have to actively educate the market to accelerate adoption of their new platforms.
I asked Phillips whether the growing ease of use of analytics tools lessens the need for training in statistics. He strongly disagreed, telling me:
I would argue that it's making it more important. On the one hand, you would think it's a little like Web design, I suppose. Now you don't have to know any HTML and you can just put up a Web page, seemingly without any real effort or background or education. In analytics, I think that is not the case. The software certainly is doing the hard number crunching. It is doing the data synthesis, but it then forces the user to be that much more creative and knowledgeable, really about the questions that need to be answered and how best to answer those questions: which data sets to bring together, which data sets to correlate. It's just the opposite of what you might think.
As to whether these statistical whizzes can also configure the systems, perhaps this new generation can.