As data becomes more fungible, that is, less tied to the physical infrastructure that supports higher-level virtual and cloud architectures, the overall data environment starts to exhibit new characteristics, some of which will dramatically alter the way those environments are built and operated.
Of late, the concept of data gravity has been showing up in tech conferences and discussion groups. Coined by VMware’s Dave McCrory about four years ago, it describes the way data behaves in highly distributed architectures. Rather than becoming evenly distributed across a flattened fabric, data tends to collect in pockets, with smaller bits of data gravitating toward larger sets much as particles coalesced into galaxies after the Big Bang. Part of this is due to the nature of distributed architectures, where the farther storage sits from processing centers and endpoints, the greater the cost, complexity and latency. But it is also a function of the data itself, particularly now that all information must be “contextualized” with reams of metadata to be useful.
You can already see data gravity occurring in the cloud, says Red Hat’s Joe Brockmeier. Notice how easy it is to put data into Amazon, Box and Dropbox, but how much more difficult, and expensive, it is to get it out? Any CMS migration is difficult, and many organizations find that the service offerings in the cloud are just as useful as, if not more useful than, the ones in-house, so cloud instances can quickly become highly dense. And greater density, of course, increases gravitational pull, even in cyberspace.
But don’t get the idea that data gravity will have only negative consequences for the utility of distributed data environments. Indeed, as many in the analytics industry are starting to notice, data gravity can be leveraged to deliver more meaningful insight to key decision makers in an organization. As Dell’s Joanna Schloss explains, not all analyses are created equal, and not everyone values data sets in the same way. Analytic data gravity, then, is the ability to give the proper “weight” to a data set, usually by adding metadata such as historical trends and results from related data activity, and then applying governance and business intelligence capabilities to ensure that the most important information reaches those who need it.
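To make that idea concrete, here is a minimal sketch of how such weighting might work, assuming purely hypothetical metadata fields (recent access counts, links to related data sets, freshness). It illustrates the general approach only; the field names and weights are invented, not Dell’s or any vendor’s actual scoring model.

```python
# Hypothetical sketch of "weighting" data sets by their metadata.
# Field names and weights are illustrative, not any vendor's actual model.
from dataclasses import dataclass

@dataclass
class DataSetMeta:
    name: str
    access_count_30d: int      # how often the set was queried recently
    linked_sets: int           # number of related data sets referencing it
    days_since_update: int     # freshness of the underlying data

def gravity_score(meta: DataSetMeta) -> float:
    """Combine metadata signals into a single 'weight' for prioritization."""
    freshness = 1.0 / (1 + meta.days_since_update)   # newer data weighs more
    return meta.access_count_30d * 0.5 + meta.linked_sets * 2.0 + freshness * 10.0

catalog = [
    DataSetMeta("sales_history", access_count_30d=420, linked_sets=12, days_since_update=1),
    DataSetMeta("legacy_logs",   access_count_30d=3,   linked_sets=1,  days_since_update=400),
]

# Surface the "heaviest" data sets to decision makers first.
for meta in sorted(catalog, key=gravity_score, reverse=True):
    print(f"{meta.name}: weight={gravity_score(meta):.1f}")
```

In practice the signals and their weights would come from the governance and business intelligence layers Schloss describes, but the principle is the same: the computed weight determines which data sets surface first.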
Still, the trouble with most analytics platforms these days is that they rely on large, centralized engines that can become gravity wells of their own. As Steve Kearns of the aptly named analytics firm DataGravity points out, you start with a standard Hadoop cluster and the next thing you know you have a huge conglomeration of downloaded apps and support packages that require ever-increasing resources, both human and technical. Pushing analytics out to the end user, something I’m confident DataGravity is working on, has the potential to provide faster analysis turnaround and the ability to drill down into highly specific data sets, while avoiding the mass accumulation of data that can hamper productivity.
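The underlying principle is query pushdown: move the question to the data rather than hauling the data to a central engine. The snippet below is a minimal sketch of that contrast, using an in-memory SQLite table as a stand-in for a remote source; it is not a description of DataGravity’s product or of any particular end-user tool.

```python
# Illustrative contrast, not any vendor's actual product: push a narrow query
# to the data source instead of copying the whole data set to a central cluster.
import sqlite3

def centralized_copy(conn: sqlite3.Connection) -> list:
    """Anti-pattern: pull every row, then filter locally (data piles up centrally)."""
    rows = conn.execute("SELECT region, amount FROM orders").fetchall()
    return [r for r in rows if r[0] == "EMEA"]

def pushed_down(conn: sqlite3.Connection) -> list:
    """Push the predicate to the source; only the relevant slice ever moves."""
    return conn.execute(
        "SELECT region, amount FROM orders WHERE region = ?", ("EMEA",)
    ).fetchall()

# Demo on an in-memory database with a hypothetical 'orders' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("EMEA", 100.0), ("APAC", 250.0), ("EMEA", 75.5)])

print(pushed_down(conn))                              # [('EMEA', 100.0), ('EMEA', 75.5)]
print(centralized_copy(conn) == pushed_down(conn))    # True: same answer, far less data moved
```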
This is leading to a very different vision of the data future than the one many people have when they hear talk of “flattened virtual fabrics” and fully federated data environments. Data environments will most certainly be built on fabric architectures, particularly now that the entire stack can be fully software-defined. But that fabric will not be flat like starched linen. Rather, it will be lumpy, like Grandma’s old quilt.
The challenge ahead will be to alleviate pockets of gravity where they are a hindrance, and support those that prove valuable.