While a lot of progress has been made in making Big Data easier to manage, the fact remains that for Big Data projects to really transform the way a business operates, that business will need the services of a data scientist. The only real question is whether the business should hire a data scientist, assuming it can find one, or simply contract the services of one.
Given the general shortage of data scientists, contracting is probably the only viable option. To make it easier to find a data scientist, the Greenplum unit of EMC, which specializes in Big Data technologies, has inked an alliance with Kaggle, a social networking site that claims to have over 57,000 data scientist members. Under the terms of the agreement, Kaggle will be loosely coupled to the open source OpenChorus Project framework for collaboratively building Big Data applications, making it easier for organizations to discover and then hire a data scientist from within a Chorus environment.
The alliance was announced today at the O’Reilly Strata Conference + Hadoop World. Kaggle CEO Anthony Goldbloom says the basic idea is to marry Kaggle with OpenChorus, which EMC Greenplum also launched today to fulfill an earlier promise to make Chorus an open source project.
While most businesses would prefer to hire their own data scientist, as a practical matter most of them only need a data scientist to model their Big Data projects so that an analyst can then make use of that model. For all intents and purposes, that means in most cases they only need to hire a data scientist for a specified period of time.
What’s most important, says Michael Maxey, senior director of product marketing at EMC Greenplum, is making sure that the first Big Data project is successful. As such, Maxey recommends that organizations make sure their first Big Data project has a meaningful impact on the business without being so broad that it takes years of effort to justify an actual return on the investment.
In the meantime, like most things worth doing, Big Data projects require patience. After all, one successful project can pay for the investment multiple times over, which in turn means that every project that follows would essentially be free. The key is identifying the initial project that drives that kind of return on investment.