Won't Somebody Think of the Data in Big Data?



Big Data is a big deal these days.


You can't swing a mouse without running into an article exploring Big Data and its ramifications.


That's certainly exciting, but if organizations are going to tap into Big Data's potential, they're first going to have to think of the data and, inevitably, data integration.


We're going to need a different way of approaching not just data, but computer system design in general if we want to handle Big Data well, argues Rajive Joshi, an expert in high-performance, real-time distributed systems.


And this isn't just about Big Data; it's also about adapting technology systems to a world where data needs to be delivered to multiple types of devices and applications in real time, often across wireless networks.


In a recent InformationWeek article, Joshi says this new tech order will require IT to recognize that data is the key element of systems, and therefore, design should be data-centric:

The key to data-centric design is to separate data from behavior. The data and data-transfer contracts then become the primary organizing constructs. With carefully controlled data relationships and timing, the system can then be built from independent components with loosely coupled behaviors. Data changes drive the interactions between components, not vice versa as in traditional or object-oriented design.

He points out that this approach enables "integration of distributed systems from components."
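To make the idea concrete, here is a minimal sketch of components that interact only through shared data. It is not taken from Joshi's article: the `DataSpace` class, the `Temperature` type and the 30-degree threshold are all hypothetical names and values chosen for illustration, and a real system would run across a network rather than in one process.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class Temperature:
    """A data type shared by all components; it carries no behavior."""
    sensor_id: str
    celsius: float


class DataSpace:
    """Hypothetical in-process stand-in for a shared data space."""

    def __init__(self) -> None:
        self._readers: DefaultDict[type, List[Callable]] = defaultdict(list)

    def subscribe(self, data_type: type, on_data: Callable) -> None:
        self._readers[data_type].append(on_data)

    def write(self, sample) -> None:
        # A data change, not a method call on another component,
        # is what triggers downstream work.
        for on_data in self._readers[type(sample)]:
            on_data(sample)


space = DataSpace()
alerts: List[str] = []
history: List[float] = []

# Two independent components; neither knows about the other or the sensor.
space.subscribe(Temperature, lambda t: history.append(t.celsius))
space.subscribe(
    Temperature,
    lambda t: alerts.append(t.sensor_id) if t.celsius > 30.0 else None,
)

space.write(Temperature("s1", 21.5))
space.write(Temperature("s2", 35.0))
```

The logging component and the alerting component can be added, removed or replaced without touching each other or the writer, because the only thing they agree on is the shape of `Temperature`.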


It seems to me this will be no small change for IT shops, which are traditionally project-driven and application-focused.


Those of you familiar with SOA's approach to building applications will recognize a lot of the tactics used in data-centric design: separate the data from the behavior; avoid tight coupling, which includes any component-specific state or behavior; use a standards-based approach. It even calls for a data bus to orchestrate movement of the data through systems, much as SOA often relies on an enterprise service bus for routing messages between services.


Joshi identifies four basic principles of data-centric design:


  1. Expose the data and metadata.
  2. Hide the behavior, meaning "any direct references to operations or code of the component interfaces."
  3. Delegate data handling to a data bus, which also enforces quality of service contracts.
  4. Explicitly define data-handling contracts for quality of service; these contracts are specified by the application and enforced by the data bus.
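The four principles can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Joshi's design or any product's API: `DataBus`, `QosContract`, `Position` and the `history_depth` setting (how many recent samples the bus keeps for readers that join late) are hypothetical names invented for the example.

```python
from collections import deque
from dataclasses import dataclass, fields
from typing import Callable, Deque, Dict, List


@dataclass(frozen=True)
class Position:
    """Principle 1: the data and its metadata (field names, types) are public."""
    vehicle_id: str
    x: float
    y: float


@dataclass(frozen=True)
class QosContract:
    """Principle 4: an explicit contract declared by the application.

    history_depth is a hypothetical setting: how many recent samples
    the bus must keep for readers that join late.
    """
    history_depth: int = 1


class DataBus:
    """Principle 3: the bus, not the components, handles the data."""

    def __init__(self) -> None:
        self._history: Dict[type, Deque] = {}
        self._readers: Dict[type, List[Callable]] = {}

    def register(self, data_type: type, qos: QosContract) -> None:
        # A bounded deque drops the oldest sample once the depth is reached.
        self._history[data_type] = deque(maxlen=qos.history_depth)
        self._readers[data_type] = []

    def describe(self, data_type: type) -> Dict[str, str]:
        # Metadata is discoverable through the bus.
        return {
            f.name: getattr(f.type, "__name__", str(f.type))
            for f in fields(data_type)
        }

    def write(self, sample) -> None:
        self._history[type(sample)].append(sample)  # enforces history_depth
        for on_data in self._readers[type(sample)]:
            on_data(sample)

    def subscribe(self, data_type: type, on_data: Callable) -> None:
        # Late joiners first receive whatever the contract retained.
        for sample in self._history[data_type]:
            on_data(sample)
        self._readers[data_type].append(on_data)


bus = DataBus()
bus.register(Position, QosContract(history_depth=2))

for i in range(3):
    bus.write(Position("truck-7", float(i), 0.0))

# Principle 2: the reader exposes no operations; it only reacts to data.
seen: List[float] = []
bus.subscribe(Position, lambda p: seen.append(p.x))
bus.write(Position("truck-7", 3.0, 0.0))
```

Here the application states the contract once, when it registers the data type, and the bus enforces it: a reader that subscribes after three samples have been written receives only the two most recent ones, followed by every new sample.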


This design approach obviously isn't required to do Big Data analysis. I doubt, for instance, that Pete Warden worried about data-centric design when he scraped data from 500 million Web pages and processed it on $100 worth of rented Amazon EC2 power.


But Joshi is taking a long-term view of what organizations will need to do if they want to use data in a flexible, fast and, yes, big way.