For much of the past week, I've been reading a report titled "IDC Digital Universe White Paper," funded by EMC and produced by IDC, which examines the growth of data. An update of last year's report, it corrects a mistake in that earlier version, which significantly understated the growth of data. Much of this understatement is due to the identification of "shadow" data, or data captured by the massive number of sensors and digital video cameras proliferating at an incredible pace around the world.
This shadow data, according to IDC, now accounts for more than half of the data collected, and it is accelerating like an out-of-control train on a near-vertical drop. Let's talk about what shadow data is and what it means for your privacy and storage systems.
Deja Vu All over Again
In the movie Deja Vu, Denzel Washington plays a detective who has access to a system that supposedly can look collectively through every piece of surveillance equipment and weave together images of past events. In a case of science following science fiction, the IDC report finds that this kind of surveillance data is growing at a massive rate. Naturally, one of the areas under heavy development is making sense of all this sensor information.
Applications that can identify and mine this content to create visual links between this information and past events seem to be on a rapid ramp-up. A number of labs are working on making sense of this mess.
In terms of privacy, the reality is that somebody is on camera almost every minute of almost every day. This is leading to a data explosion that most of us likely can't even imagine. According to IDC, 85 percent of this information is under control of the enterprise, which is responsible for its "security, privacy, and reliability."
This certainly has implications in terms of responsibility and liability in areas ranging from safety to employee conduct to security. According to the report, getting a handle on this information is becoming an increasingly high priority.
On a personal level, this also means that you are likely being recorded much of the time. Even if no one is doing anything with the information now, they are putting it into archives where it can be mined later. Thus, coupled with the tracks you are leaving in the digital world, an apparently very clear picture of you (which also could be inaccurate) could likely be built around your activities months, or even decades, later. Suddenly I'm not thinking such warm and fuzzy thoughts about Google.
Media over All
What is also interesting in the report is the massive growth of digital media. This segment is accounting for 10 times the amount of storage that its share of the digital universe currently represents. With that kind of growth, the management, containment, security and delivery of these massive files will have to engage an ever-larger portion of the networking infrastructure. I don't even want to think what this is doing to network traffic, but I expect we are all seeing more of the related impact daily.
It isn't just the number of files. As resolutions increase, the files themselves are getting bigger at an increasing rate. Just moving from regular definition to high definition is like the reverse of "America's Biggest Loser," multiplied by a thousand and taken to near infinity.
This takes me back to the daunting, and often pointless, efforts in which the RIAA and MPAA are currently engaged to contain what appears to be overstated piracy. These efforts should likely shift to bringing people into compliance and finding a way to broadly license the media before they lose all control of it. With this kind of growth, if it can be copied digitally it likely will be, and if it isn't being stolen, it is being bypassed. The respective movie and music industries are losing their shorts regardless.
Maybe it is time for these industries to stop going to war with their customers and find a way to better embrace them, because, based on these numbers, the industries are losing the war.
What is also interesting is the diversity of data. I mentioned the gigabyte files represented by movies and TV shows proliferating on the Web, but tiny 128-bit signals from RFID tags are proliferating at a massive rate as well. While this doesn't yet create capacity problems, the sheer number of these small files (estimated to be 20 quadrillion by 2011) becomes potentially a bigger problem than the big media files. To put this in perspective, according to the report, this type of file is growing 50 percent faster than the large multimedia files.
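To see why the count matters more than the capacity, here is a rough back-of-the-envelope sketch in Python. The 128-bit size and the 20 quadrillion count come from the figures above; the 64 bytes of per-record tracking overhead is purely my own assumption for illustration, not a number from the report.

```python
# Back-of-the-envelope sizing for the RFID records described above.
RFID_PAYLOAD_BITS = 128            # from the text
RFID_RECORD_COUNT = 20 * 10**15    # 20 quadrillion, from the text

# ASSUMPTION (not from the report): bytes of index/metadata needed to
# track each record as an auditable item.
ASSUMED_OVERHEAD_BYTES = 64

PETABYTE = 10**15  # decimal petabyte


def raw_payload_bytes(count: int, bits: int) -> int:
    """Total bytes needed for the payloads alone."""
    return count * (bits // 8)


def tracked_bytes(count: int, bits: int, overhead: int) -> int:
    """Total bytes once each record carries the assumed tracking overhead."""
    return count * (bits // 8 + overhead)


payload = raw_payload_bytes(RFID_RECORD_COUNT, RFID_PAYLOAD_BITS)
tracked = tracked_bytes(RFID_RECORD_COUNT, RFID_PAYLOAD_BITS, ASSUMED_OVERHEAD_BYTES)

print(f"payload only: {payload // PETABYTE} PB")
print(f"with assumed overhead: {tracked // PETABYTE} PB")
```

Under that assumption, the bookkeeping takes several times the space of the data itself, which is the point: the burden is in keeping track of the records, not in storing them.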
This becomes a huge tracking problem because these files represent physical objects, ranging from devices we need to track as assets to inventory at every stage of the supply chain. This takes us from a capacity nightmare to an audit nightmare, and there is nothing scarier than a pissed-off auditor. I should know, as I used to be one.
To net out the recommendations, it comes down to the fact that the light at the end of the tunnel is a train and you'd better start planning for it because you sure as hell won't be able to outrun it.
Particularly if you are starting to see your storage requirements ramp significantly, you'll need to make sure you have already thought through key policies like retention, ownership and disposal. And you'll need to do this before someone shows up wanting the data and either holding your butt over the fire because you lost it, or frying your butt because you didn't have a policy to delete or adequately protect it.
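For those who think better in code, a policy like this can be as simple as a record that names the data, its owner and how long to keep it. The sketch below is a minimal illustration in Python; the class, field names and the 90-day example are all hypothetical and not drawn from the IDC report.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy record: what the data is, who owns it, how long to keep it."""
    data_class: str
    owner: str
    retain_days: int


def disposition(policy: RetentionPolicy, created: date, today: date) -> str:
    """Return 'retain' while inside the retention window, 'dispose' once it has passed."""
    expires = created + timedelta(days=policy.retain_days)
    return "dispose" if today >= expires else "retain"


# Example values are made up for illustration.
footage = RetentionPolicy(data_class="camera_footage", owner="facilities", retain_days=90)
print(disposition(footage, created=date(2008, 1, 1), today=date(2008, 3, 1)))
print(disposition(footage, created=date(2008, 1, 1), today=date(2008, 4, 1)))
```

The value is less in the code than in being forced to fill in every field: if you can't name an owner or a retention period for a class of data, you don't have a policy yet.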
IDC concludes that it is a requirement to get your arms around this data and find ways to profit from it, either by simply making better decisions or, in the case of the RIAA and MPAA, finding ways to profit from providing it. Yet the firm downplays the other side of that coin. If we don't get our arms around this data, it will be used against us, either corporately or individually. It is our responsibility to ensure that doesn't happen. Technology won't fix this without knowledgeable people to intelligently anticipate the problems and find creative ways to mitigate them.
Wrapping Up: Ride the Wave or Be a Statistic
In short, the report forecasts the equivalent of a data tsunami; we can either get our boards ready to ride the wave or get mowed down by it. Both paths are exciting, and with work, we can survive the former.