Why Data Integration Is Core to Big Data and Social Web Apps

Loraine Lawson

If you want to use social networks as a source of information, do yourself a favor now and make sure your developers are thinking about data integration.


That's the advice of Robert Diana, a software engineer, database and Web developer, not to mention a self-proclaimed "data geek" who blogs at Regular Geek.


It turns out that Diana and I saw the same post about a survey, open through May 18, that asks respondents, "What are the main application areas of controlled vocabularies from your perspective?" In case you don't know (and I didn't), controlled vocabularies are a way of organizing information for retrieval, according to Wikipedia. They are used for things like taxonomies, which makes them a major consideration for semantic technology.


The post writer was surprised by the survey results thus far, which show that "data integration" is the leading area for using controlled vocabularies. In response, Diana argues that integration is very likely to be a core task, particularly when you're building social Web applications.
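To see why a controlled vocabulary helps with integration, consider a minimal Python sketch. The vocabulary entries and the function names here are hypothetical, chosen only for illustration; they do not come from the survey or from Diana's post. The idea is that each source system's label maps to one preferred term, so downstream code deals with a single name.

```python
# Hypothetical controlled vocabulary: preferred term -> variants used by
# different source systems. These entries are illustrative assumptions.
CONTROLLED_VOCABULARY = {
    "status_update": {"tweet", "status", "share", "post"},
    "person": {"user", "member", "profile"},
}


def build_lookup(vocabulary):
    """Flatten the vocabulary into a variant -> preferred-term lookup."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred] = preferred  # the preferred term maps to itself
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


LOOKUP = build_lookup(CONTROLLED_VOCABULARY)


def normalize_term(term):
    """Return the preferred term, or None if the term is not in the vocabulary."""
    return LOOKUP.get(term.strip().lower())
```

With this in place, `normalize_term("Tweet")` and `normalize_term("share")` both resolve to `"status_update"`, while a term outside the vocabulary returns `None` so it can be flagged for review rather than silently guessed.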


As an example, he looks at how you can avoid replicating your work and creating bloated applications simply by treating status updates from Facebook, Twitter and LinkedIn as one type of data, even though they are three separate applications:

Does it always make sense to store each of these in their own table/shard/partition? What benefits do you gain from having separate storage for each of these status updates when they tend to converge on the same functionality over time? This does not sound like a large issue, but it has the potential to become one, especially if it is not handled well.

According to Diana, this can be a major issue when you start to handle Big Data. He writes:

If you do not design your data storage like you would design your application, you will not be able to use your data effectively. If you can not use your data effectively, you may lose a big opportunity.
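Diana's status-update example can be sketched in a few lines of Python. This is only one possible reading of his point, not his implementation: the payload field names (`screen_name`, `message`, `comment` and so on) are assumptions and do not reflect the real Facebook, Twitter or LinkedIn APIs, and the single SQLite table stands in for whatever storage you actually use.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class StatusUpdate:
    """One shared shape for a status update, whatever network it came from."""
    network: str
    author: str
    text: str
    posted_at: str  # ISO 8601 timestamp, so string order matches time order


# Hypothetical adapters: the payload keys are illustrative assumptions,
# not the networks' actual API fields.
def from_twitter(payload):
    return StatusUpdate("twitter", payload["screen_name"],
                        payload["text"], payload["created_at"])


def from_facebook(payload):
    return StatusUpdate("facebook", payload["from"],
                        payload["message"], payload["created_time"])


def from_linkedin(payload):
    return StatusUpdate("linkedin", payload["member"],
                        payload["comment"], payload["timestamp"])


def create_store():
    """Create one table for all networks instead of one table per network."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE status_updates ("
        "network TEXT NOT NULL, author TEXT NOT NULL, "
        "text TEXT NOT NULL, posted_at TEXT NOT NULL)"
    )
    return conn


def save(conn, update):
    conn.execute(
        "INSERT INTO status_updates VALUES (?, ?, ?, ?)",
        (update.network, update.author, update.text, update.posted_at),
    )


def recent(conn, limit=10):
    """Return the newest updates across every network in one query."""
    rows = conn.execute(
        "SELECT network, author, text, posted_at FROM status_updates "
        "ORDER BY posted_at DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return [StatusUpdate(*row) for row in rows]
```

The network-specific code is confined to the three small adapters. Everything after that point (storage, queries, display) sees a single type of data, so adding a fourth network under these assumptions means writing one more adapter rather than another table and another set of queries.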

Diana's post reinforces much of what I shared last week about shifting from application-centric development to a data-centric design philosophy. In that post, I shared the four tenets of data-centric design as outlined by Rajive Joshi, an expert in high-performance, real-time distributed systems. One of Joshi's requirements is to "hide the behavior," by which he means to eliminate "any direct references to operations or code of the component interfaces." I believe that is what Diana is getting at when he says you don't have to store status updates in data silos based on the social network.


Once again, the takeaway is clear: Data integration isn't just part of a project anymore. It's a core discipline that needs to be considered from the start if you're going to make the most use of your data without having to constantly recode the hub (so to speak).



Apr 19, 2011 11:24 AM Derek Kol says:

Interested in a 'Big Data' customer story?

Coraid deployed the Queplix Virtual Data Manager and is willing to talk to you about their experience.

If you can spare 30 minutes, I'll set up a short briefing with Coraid and they can fill you in on the details.


Derek Kol

Ventana PR


Apr 21, 2011 3:20 PM datagwal says:

Loved your blog. Are there other sources related to this topic? Would love to learn more.

