Getting Real About Big Data: The Disciplines

Loraine Lawson

Big Data is still in its toddler, maybe its preschool, years. As the mother of a toddler, I can tell you that, yes, while they are amazing creatures compared to babies, with advanced capabilities like walking and climbing, they think they can do WAAAAY more than they actually can and their movements are completely devoid of grace and finesse.


"Big Data solutions are vast, fast growing, unstructured, and too new to have the measures of discipline around them," writes Rajeev Rawat, the CEO and founder of BI Results. "Big Data is innovative, but lacks the ability to scale in the areas of education, community, documentation, procedures, and security. Standardization and controls for policies are not in place yet."


So if you're planning on moving ahead with Big Data, remember that you're toddling into unmapped territory.


That said, putting the right pieces in place can help. I call this the "Big Data Support System": the disciplines and tools that experts say are critical support beams in building a Big Data program.


Let's look at the disciplines first.


Support discipline #1: Data Quality


Most data experts will tell you the first step to managing Big Data is to get your data quality house in order. But with Big Data, it seems there's almost always a catch, and data quality is no exception.


Big Data is unstructured and it comes from nontraditional sources such as the Web, which makes it dirtier than your usual enterprise data. Much dirtier. Organizations will have to adapt to dealing with this dirty data, Andy Bechtolsheim, co-founder of Arista Networks and Sun Microsystems, advised at the recent High Performance Computing Linux for Wall Street conference.


One approach you can take is to deal with the quality issues within your data integration tool, as you're moving the data in and out of the clusters, suggests David Linthicum, who writes about SOA, integration and other data-related issues.
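To make that idea concrete, here is a minimal Python sketch of cleaning records while they are in transit. It is not taken from Linthicum or from any particular integration tool; the field names (`source`, `text`) and the helper names `clean_record` and `clean_stream` are hypothetical, chosen only for illustration.

```python
from typing import Iterable, Iterator, Optional


def clean_record(raw: dict) -> Optional[dict]:
    """Normalize one record; return None if it cannot be salvaged."""
    text = (raw.get("text") or "").strip()
    if not text:
        return None  # drop records with no usable content
    source = (raw.get("source") or "unknown").strip().lower()
    # Collapse runs of whitespace so near-identical records compare equal.
    return {"source": source, "text": " ".join(text.split())}


def clean_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Clean and de-duplicate records as they pass through the pipeline."""
    seen = set()
    for raw in records:
        rec = clean_record(raw)
        if rec is None:
            continue
        key = (rec["source"], rec["text"].lower())
        if key in seen:
            continue  # skip repeats that match after normalization
        seen.add(key)
        yield rec


# Hypothetical sample feed: one messy record, one duplicate, one empty.
raw_feed = [
    {"source": "Web ", "text": "  Big   Data is messy "},
    {"source": "web", "text": "big data is messy"},
    {"source": None, "text": ""},
]
cleaned = list(clean_stream(raw_feed))
print(cleaned)
```

Of the three sample records, only the first survives: the second is a duplicate once case and spacing are normalized, and the third has no content. A real pipeline would need rules tuned to its own sources, and the in-memory `seen` set would have to be replaced with something that scales.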


(If you don't have a data quality program or it lacks support, I recommend you take a look at Information Management's recent article on building a case for data quality.)


Support discipline #2: Data Governance


How to govern Big Data is still a discussion in progress, but there's no doubt about its importance. In addition to data quality, Sunil Soares of IBM contends Big Data governance should cover:


  • Information Lifecycle Management, so that you're not storing large quantities of useless unstructured data.
  • Metadata, to ensure you're not paying twice for datasets you've already acquired just because they had different names and were in different repositories.
  • Privacy, which becomes an issue for regulated industries dealing with social media.
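The metadata point lends itself to a small illustration. One way to spot the same dataset living under two names is to fingerprint the content rather than trust the label. The Python sketch below is an assumption on my part, not something Soares describes; the catalog entries and the function names `fingerprint` and `find_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint(rows):
    """Hash a dataset's content, ignoring row order and the dataset's name."""
    digest = hashlib.sha256()
    for row in sorted(rows):
        digest.update(row.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def find_duplicates(catalog):
    """Group dataset names whose content is identical."""
    by_hash = defaultdict(list)
    for name, rows in catalog.items():
        by_hash[fingerprint(rows)].append(name)
    return [sorted(names) for names in by_hash.values() if len(names) > 1]


# Hypothetical catalog: two repositories hold the same rows under different names.
catalog = {
    "marketing/tweets_q1": ["a,1", "b,2"],
    "research/social_feed": ["b,2", "a,1"],
    "finance/ledger": ["x,9"],
}
duplicates = find_duplicates(catalog)
print(duplicates)
```

Here the two social datasets are flagged as one group even though their names and row order differ. This only catches exact copies; datasets that overlap partially would need a fuzzier comparison.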


How, then, is Big Data governance any different from regular old governance? April Reeve, a business consultant in the Enterprise Information Management practice of EMC, says there are three ways Big Data governance is different:


  1. You're governing more types of data, including external and unstructured data;
  2. You'll need more sophisticated tools to access and profile the data because of the large data sets. "Big Data volumes are beyond human manageable scale and the traditional approaches of profiling and managing data primarily through observation becomes unfeasible," she warns.
  3. Metadata takes on a new importance.
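Reeve's second point, that profiling by observation stops working at this scale, is easier to picture with a small example. The Python sketch below computes two basic profile statistics per field in a single pass. It is only an illustration under my own assumptions: the field names and the `profile` function are hypothetical, and this is not a stand-in for the tools Reeve has in mind.

```python
from collections import Counter


def profile(records, fields):
    """Summarize fill rate and distinct-value count for each field."""
    total = 0
    filled = Counter()
    distinct = {f: set() for f in fields}
    for rec in records:
        total += 1
        for f in fields:
            value = rec.get(f)
            if value not in (None, ""):
                filled[f] += 1
                distinct[f].add(value)
    return {
        f: {
            "fill_rate": filled[f] / total if total else 0.0,
            "distinct": len(distinct[f]),
        }
        for f in fields
    }


# Hypothetical sample: some records are missing fields or carry empty values.
records = [
    {"user": "a", "lang": "en"},
    {"user": "b", "lang": ""},
    {"user": "a"},
    {"user": None, "lang": "en"},
]
report = profile(records, ["user", "lang"])
print(report)
```

For this sample, `user` is filled in 75 percent of records with two distinct values, and `lang` is filled in 50 percent with one. Keeping every distinct value in memory will not hold up at true Big Data volumes, where an approximate counting technique would be the more realistic choice.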


Jill Dyche, a vice president of thought leadership at DataFlux, has written an analysis of how to tackle this problem, and Soares has created a new metric for measuring Big Data governance.


Support discipline #3: An analytics program that includes non-IT people


In his TechTarget article, "The wrong way: Worst practices in 'big data' analytics programs," consultant Rick Sherman cautions that companies need more than software and a few IT guys to deal with Big Data. What's needed is a Big Data analytics program that includes "analytics professionals with statistical, actuarial and other sophisticated skills, which might mean new hiring for organizations that are making their first forays into advanced analytics," Sherman writes.


Next: A look at some of the supporting tools that form the foundations for Big Data.
