Silos, Standards and the Big Secret of Big Data

Loraine Lawson

You want to know the big secret to integrating Big Data?


Metadata. It's all about the metadata, Krish Krishnan, president of the analyst consulting company Sixth Sense Advisors Inc., and an expert in high-performance data warehousing solutions, said during a recent Dataversity webinar, "Integrating Big Data Technologies."


Whatever tool you use, whatever technology you try, whatever technique you apply, metadata will be the key to the castle with Big Data, according to Krishnan.


He's not the only one saying so. In a recent in-depth look at Big Data published by GCN, Gartner analyst Anne Lapkin explained that Hadoop and other Big Data analysis tools can tackle large amounts of data because they focus on manipulating the metadata, rather than the data itself.


She also foresees that this will cause problems, given the way people are writing MapReduce queries against Hadoop. Basically, they're embedding the metadata within the analytical code, and that will eventually lead to the dreaded "S" word, Lapkin told GCN:

So you can't take that information, for example, and easily integrate it with other information in another system. People who are doing Hadoop/MapReduce implementations are building themselves a new little set of data silos to replace the ones that they had previously.

Great. Because that's what companies need: big silos for Big Data.
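To make the silo risk concrete, here is a minimal Python sketch of the difference between metadata buried in analytical code and metadata kept where other systems can read it. Everything in it is an assumption for illustration: the record layout, the field names, and the helper functions are hypothetical, and it is not taken from Hadoop, from GCN, or from any tool Lapkin described.

```python
# Hypothetical pipe-delimited record; layout and field names are assumptions.
RAW_LINE = "2013-02-11|store-17|49.95"


def map_embedded(line):
    """Map step with the metadata hard-coded: delimiter, positions, and types
    live only inside this function, so no other system can reuse them."""
    parts = line.split("|")
    return parts[1], float(parts[2])


# The same metadata held outside the analysis code, as plain data that
# another system could read (for example from a shared file or catalog).
SCHEMA = {
    "delimiter": "|",
    "fields": [
        {"name": "date", "type": "str"},
        {"name": "store_id", "type": "str"},
        {"name": "amount", "type": "float"},
    ],
}

# Maps the type names used in SCHEMA to Python conversion functions.
CASTS = {"str": str, "float": float}


def parse_with_schema(line, schema):
    """Turn one raw line into a named, typed record using the schema."""
    values = line.split(schema["delimiter"])
    return {
        field["name"]: CASTS[field["type"]](value)
        for field, value in zip(schema["fields"], values)
    }


def map_with_schema(line, schema=SCHEMA):
    """Map step that asks the schema for field names instead of positions."""
    record = parse_with_schema(line, schema)
    return record["store_id"], record["amount"]
```

Both map functions produce the same key and value for the sample line. The difference is that only the second leaves the description of the data somewhere another team or tool could find it.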


The article points out that there is a solution, that other dreaded "S" word: standards. Program Director Marion Royal, who leads the federal employees and contractors working on the site, foresaw the metadata problem early.


The site now has more than 400,000 data sets from 172 groups - so you can see how metadata could've been a big issue. Royal's team took a proactive approach by setting up a "metadata template," which they made organizations use before they submitted data sets to the site.
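A "metadata template" of this kind can be pictured as a simple gate in front of the submission process. The sketch below is only a guess at the general idea: the required field names and both functions are hypothetical and do not reflect the actual template Royal's team used.

```python
# Hypothetical required fields; the real template's contents are not given
# in the article, so these names are assumptions for illustration.
REQUIRED_FIELDS = ("title", "description", "publisher", "last_updated")


def missing_metadata(metadata, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank."""
    return [
        name for name in required
        if not str(metadata.get(name) or "").strip()
    ]


def accept_submission(metadata):
    """Reject a data set whose metadata template is incomplete."""
    missing = missing_metadata(metadata)
    if missing:
        raise ValueError("metadata template incomplete: " + ", ".join(missing))
    return True
```

Checking the template before the data arrives is the point: it is far cheaper to ask one contributor for a missing description than to reverse-engineer it later across hundreds of thousands of data sets.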


Royal said open standards would help make it easier to share data.


Managing the metadata takes on even more significance when it comes to integrating Big Data, but there's another consideration: Metadata is key to governing Big Data as well, according to David Corrigan, director of strategy for IBM's InfoSphere portfolio.


"The integration challenge would be as you take the information from various sources, then what you want to do is attach metadata through that governance process to understand where they came from," Corrigan said in an ITBE interview last year. "Is it a trusted source? Are there any rules around who may see that particular source?"


Metadata is a pretty techie issue - just try explaining it to a non-techie, or for that matter, getting techies to agree on how you're defining it. And of course it's not just an integration issue with Big Data.


However, it's important to realize that metadata will take on an even greater significance when you're trying to bring order and meaning to huge volumes of largely unstructured data.


How significant? Without knowing your metadata, Krishnan said, your integration will fail.
