Big Data is still in its toddler, maybe its preschool, years. As the mother of a toddler, I can tell you that, yes, while they are amazing creatures compared to babies, with advanced capabilities like walking and climbing, they think they can do WAAAAY more than they actually can and their movements are completely devoid of grace and finesse.
"Big Data solutions are vast, fast growing, unstructured, and too new to have the measures of discipline around them," writes Rajeev Rawat, the CEO and founder of BI Results. "Big Data is innovative, but lacks the ability to scale in the areas of education, community, documentation, procedures, and security. Standardization and controls for policies are not in place yet."
So if you're planning on moving ahead with Big Data, remember that you're toddling into unmapped territory.
That said, putting the right pieces in place can help. I call this the "Big Data Support System": the disciplines and tools that experts say are critical support beams in building a Big Data program.
Let's look at the disciplines first.
Support discipline #1: Data Quality
Most data experts will tell you the first step to managing Big Data is to get your data quality "house" in order. With Big Data, it seems there's almost always a catch, and data quality is no exception.
Big Data is unstructured and it comes from nontraditional sources such as the Web, which makes it dirtier than your usual enterprise data. Much dirtier. Organizations will have to adapt to dealing with this dirty data, Andy Bechtolsheim, co-founder of Arista Networks and Sun Microsystems, advised at the recent High Performance Computing Linux for Wall Street conference.
One approach you can take is to deal with the quality issues within your data integration tool, as you're moving the data in and out of the clusters, suggests David Linthicum, who writes about SOA, integration and other data-related issues.
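To make that idea concrete, here is a minimal sketch of a cleanup step that could sit in the integration path, assuming records arrive as plain Python dictionaries. The field names (`id`, `text`), the `REQUIRED_FIELDS` tuple and the `clean_records` function are hypothetical, made up for illustration; they don't come from Linthicum or from any particular integration tool.

```python
from typing import Iterable, Iterator

# Hypothetical schema: fields every record must carry to be usable.
REQUIRED_FIELDS = ("id", "text")


def clean_records(records: Iterable[dict]) -> Iterator[dict]:
    """Yield cleaned records, skipping any that fail basic quality checks."""
    seen_ids = set()
    for record in records:
        # Trim stray whitespace from every string value.
        cleaned = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.items()
        }
        # Drop records where a required field is missing or empty.
        if any(cleaned.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        # Drop duplicates, keeping the first record seen for each id.
        if cleaned["id"] in seen_ids:
            continue
        seen_ids.add(cleaned["id"])
        yield cleaned


raw = [
    {"id": "1", "text": " hello "},
    {"id": "1", "text": "duplicate"},
    {"id": "2", "text": "   "},
    {"id": "3", "text": "ok"},
]
for row in clean_records(raw):
    print(row)
```

Because the function is a generator, it filters records as they stream through rather than loading everything into memory first, which is roughly the spirit of cleaning data while it moves in and out of the clusters.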
(If you don't have a data quality program or it lacks support, I recommend you take a look at Information Management's recent article on building a case for data quality.)
Support discipline #2: Data Governance
How to govern Big Data is still a discussion in progress, but there's no doubt about its importance. In addition to data quality, Sunil Soares of IBM contends Big Data governance should cover:
How, then, is Big Data governance any different from regular old governance? April Reeve, a business consultant in the Enterprise Information Management practice of EMC, says there are three ways Big Data governance is different:
Support discipline #3: An analytics program that includes non-IT people
In his TechTarget article, "The wrong way: Worst practices in 'big data' analytics programs," consultant Rick Sherman cautions that companies need more than software and a few IT guys to deal with Big Data. What's needed is a Big Data analytics program that includes "analytics professionals with statistical, actuarial and other sophisticated skills, which might mean new hiring for organizations that are making their first forays into advanced analytics," Sherman writes.
Next: A look at some of the supporting tools that form the foundations for Big Data.