Holding the Data Line with Deduplication

Michael Vizard

A few months back, data deduplication stopped being a separate product you buy. It's now a feature you turn on when you deploy most major new storage management technologies.

And yet, hampered by tight IT budgets, most organizations have not upgraded their storage systems in the last six months. The end result is that even as surveys show data growth continuing to spiral out of control, they have yet to implement a key technology that gives them their best fighting chance of keeping their storage needs from exploding. Even as the amount of data continues to grow, data deduplication offers a significant opportunity to free up 50 percent or more of existing storage.

According to Jeff Rector, marketing director for Quantum, backing up a typical 100 GB dataset usually winds up consuming 1 TB of disk space over 10 days, because each additional backup adds about another 100 GB. With a Quantum DXi appliance, the capacity requirements are only about one-third those of a conventional disk system. From then on, for each block the DXi appliance has already stored, it keeps only a pointer, and it applies its deduplication technology to any new data sets as well, so each new backup commonly requires only 1 percent or 2 percent more capacity. At the end of 10 days, instead of taking up a terabyte of space, the backup has used about 75 GB, he said.
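To make the mechanics concrete, here is a minimal Python sketch of fixed-size block deduplication along the lines Rector describes: each block is hashed, unseen blocks are stored once, and every backup records only pointers, so near-identical nightly backups add almost no new capacity. This is a generic illustration rather than Quantum's actual DXi implementation; the 4 KB block size, the SHA-256 hashing and the simulated one-block-per-day change are all assumptions.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # fixed-size blocks for simplicity; real appliances typically use variable-length chunking

class DedupStore:
    """Toy block store: each unique block is kept once; repeats are recorded as pointers (hashes)."""

    def __init__(self):
        self.blocks = {}  # sha256 digest -> block bytes

    def backup(self, data: bytes) -> list:
        """Store one backup and return the list of block hashes ('pointers') that reconstruct it."""
        recipe = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:   # unseen block: store the actual bytes once
                self.blocks[digest] = block
            recipe.append(digest)           # the backup itself records only a pointer
        return recipe

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())

# Simulate ten nightly backups of a dataset that changes only slightly each day.
store = DedupStore()
dataset = bytearray(os.urandom(100 * BLOCK_SIZE))   # small stand-in for the 100 GB dataset
for day in range(10):
    dataset[day * BLOCK_SIZE] ^= 0xFF               # flip one byte so one block changes per day
    store.backup(bytes(dataset))

print("logical bytes backed up:", 10 * len(dataset))
print("physical bytes stored:  ", store.stored_bytes())
```

Run as-is, the ten simulated backups consume roughly a tenth of the logical total, the same shape of savings Rector describes at 100 GB scale.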

Data storage mileage will vary greatly with data deduplication, but the concept is clear. Unfortunately, judging by adoption rates, about half the market doesn't seem to get it. That's probably because there are a lot of storage hardware salespeople out there who would rather see customers continue to throw expensive storage hardware at the problem.

The other major benefit of data deduplication, notes Rector, is that it reduces the amount of data flowing around the network, which ultimately improves network performance. And for all those companies thinking about solving their problems by moving storage to the cloud, remember that those services typically charge by the amount of data being stored.

For more years than anybody wants to count, IT organizations have thrown hardware at the storage problem because it was a relatively cheap thing to do. But with IT organizations low on funds and facing a real need to reduce power consumption in order to add more server capacity, the time has come to start prioritizing data management by throwing a little software at the problem instead.

Comments
Feb 26, 2010 11:19 AM Alastair Williams says:

I am interested that de-duplication is still regarded as the solution to expanding disk and tape storage capacities and the time and management overheads this entails. Given that storage growth is a direct result of data growth, is it not more sensible to address the cause, not the symptom, and focus on archiving unused and deleting unneeded data?

Many file systems show upwards of 80 percent dormancy rates on data; archiving this information or deleting that which has no business value could give similar returns to implementing de-dupe. For more thoughts, please see my blog.
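A quick sketch of the dormancy check the commenter describes might look like the following, assuming last-modified time is the signal, 180 days is the threshold and /data is the tree to scan (all three are assumptions for illustration):

```python
import os
import time

DORMANT_AFTER_DAYS = 180                      # assumed threshold; tune to your retention policy
cutoff = time.time() - DORMANT_AFTER_DAYS * 86400

total_bytes = 0
dormant_bytes = 0
for root, _dirs, files in os.walk("/data"):   # assumed starting path
    for name in files:
        path = os.path.join(root, name)
        try:
            st = os.stat(path)
        except OSError:
            continue                          # skip files that vanish or are unreadable
        total_bytes += st.st_size
        if st.st_mtime < cutoff:              # not modified within the window: candidate for archive or deletion
            dormant_bytes += st.st_size

if total_bytes:
    print(f"dormant: {dormant_bytes / total_bytes:.0%} of {total_bytes} bytes")
```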


