The Business Impact of Big Data
Many business executives want more information than ever, even though they're already drowning in it.
If you want to use social networks as a source of information, do yourself a favor now and make sure your developers are thinking about data integration.
It turns out that Diana and I saw the same post about an active survey (open through May 18) in which respondents were asked, "What are the main application areas of controlled vocabularies from your perspective?" In case you don't know (and I didn't), controlled vocabularies are a way of organizing information for retrieval, according to Wikipedia. They are used for things like taxonomies, which means, of course, that they're a major consideration for semantic technology.
As an example, he looks at how you can avoid replicating your work and creating bloated applications simply by treating status updates from Facebook, Twitter and LinkedIn as one type of data, even though they come from three separate applications:
Does it always make sense to store each of these in their own table/shard/partition? What benefits do you gain from having separate storage for each of these status updates when they tend to converge on the same functionality over time? This does not sound like a large issue, but it has the potential to become one, especially if it is not handled well.
According to Diana, this can be a major issue when you start to handle Big Data. He writes:
If you do not design your data storage like you would design your application, you will not be able to use your data effectively. If you can not use your data effectively, you may lose a big opportunity.
Diana's post reinforces much of what I shared last week about shifting from an application-centric development philosophy to a data-centric design philosophy. In that post, I shared the four tenets of data-centric design as outlined by Rajive Joshi, an expert in high-performance, real-time distributed systems. One of Joshi's requirements is to "hide the behavior," by which he means to eliminate "any direct references to operations or code of the component interfaces." I believe that is what Diana is getting at when he says you don't have to store status updates in data silos based on the social network.
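To make the idea concrete, here is a minimal Python sketch of treating status updates as one type of data. It is my own illustration, not code from Diana's post. The StatusUpdate record, the normalize function and every field name in FIELD_MAPS are hypothetical placeholders; the real payloads from each network will look different.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StatusUpdate:
    """One record shape for a status update, whatever network it came from."""
    source: str
    author: str
    text: str
    posted_at: datetime


# Hypothetical field names for each network (assumptions for illustration only;
# these are NOT the networks' actual API fields).
FIELD_MAPS = {
    "facebook": {"author": "from_name", "text": "message", "posted_at": "created"},
    "twitter": {"author": "screen_name", "text": "text", "posted_at": "created"},
    "linkedin": {"author": "member", "text": "comment", "posted_at": "created"},
}


def normalize(source: str, payload: dict) -> StatusUpdate:
    """Map a network-specific payload onto the shared StatusUpdate shape."""
    fields = FIELD_MAPS[source]
    return StatusUpdate(
        source=source,
        author=payload[fields["author"]],
        text=payload[fields["text"]],
        # Timestamps are assumed to be Unix seconds in this sketch.
        posted_at=datetime.fromtimestamp(payload[fields["posted_at"]], tz=timezone.utc),
    )


# Made-up sample payloads, one per network.
raw = [
    ("facebook", {"from_name": "Ana", "message": "Hello from FB", "created": 1300000200}),
    ("twitter", {"screen_name": "ana_t", "text": "Hello from Twitter", "created": 1300000100}),
    ("linkedin", {"member": "Ana T.", "comment": "Hello from LinkedIn", "created": 1300000300}),
]

# Everything lands in one collection, so one query serves all three sources.
updates = sorted((normalize(s, p) for s, p in raw), key=lambda u: u.posted_at)
```

The only network-specific knowledge sits in the FIELD_MAPS lookup table. Adding a fourth network would mean adding one entry there, not a new table, shard or partition with its own handling code.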
Once again, the takeaway is clear: Data integration isn't just part of a project anymore. It's a core discipline that needs to be considered from the start if you're going to make the most of your data without having to constantly recode the hub (so to speak).