Dedupe in Primary Storage: Does the Risk Match the Reward?


The number and variety of deduplication products continues to grow, and with them a movement to bring the technology out of the backup realm and into primary storage arrays. But is this really such a good idea, particularly when you consider that storage performance is now just as important as capacity when evaluating your immediate data needs?


Deduplication as a primary storage tool is promoted largely by NetApp. Earlier this month, the company announced that it had expanded the dedupe capabilities of its V-Series NAS Gateway appliance to work with the leading primary storage products from EMC, HP, IBM and Hitachi Data Systems. This comes on the heels of NetApp offering dedupe on its FAS line of primary storage, as well as the NearStore secondary line.


How effective is dedupe for reducing primary data? That's hard to say since there isn't a lot of field experience. NetApp holds up the example of Burt's Bees in Durham, NC. The honey and personal care firm reports that it has reclaimed more than half of the storage tied to its VMware ESX infrastructure.


The rise of virtual environments may be a key factor driving deduplication of primary storage data, according to GlassHouse Technologies CTO Jim Damoulakis. Dedupe seeks out and eliminates duplicate data blocks and files, which abound in the virtual world. How many identical C: drives, he asks, reside in a typical VMware cluster?
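The core idea is straightforward: fingerprint each block, store any given block only once, and keep a per-file list of fingerprints for reconstruction. Below is a minimal sketch of that mechanism, assuming fixed-size 4 KB blocks and SHA-256 fingerprints; these are illustrative choices, not a description of how NetApp's implementation actually works.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative fixed block size; real arrays vary


def dedupe(data: bytes, store: dict) -> list:
    """Split data into fixed-size blocks, keep each unique block once in
    `store`, and return the ordered fingerprint list ("recipe") needed to
    reconstruct the original bytes."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()
        store.setdefault(fp, block)  # duplicate blocks are stored only once
        recipe.append(fp)
    return recipe


def rehydrate(store: dict, recipe: list) -> bytes:
    """Reassemble the original data from the shared block store."""
    return b"".join(store[fp] for fp in recipe)


# Two "virtual disks" that share most of their content, the way identical
# guest OS images in a VMware cluster would:
disk_a = b"X" * 8192 + b"unique-to-a"
disk_b = b"X" * 8192 + b"unique-to-b"

store = {}
recipe_a = dedupe(disk_a, store)
recipe_b = dedupe(disk_b, store)

logical = len(disk_a) + len(disk_b)
physical = sum(len(b) for b in store.values())
```

Here the shared blocks of the two disks collapse to a single stored copy, so `physical` comes out far smaller than `logical` — the same effect, at array scale, behind the capacity reclamation figures vendors cite.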


Still, the performance question could be the killer here, says Warren Smith, a StorageWorks manager at HP. What you're actually doing is inserting an invasive and performance-hampering process into your critical business infrastructure. This will take away compute cycles from high-priority processes, making your entire organization less productive. Even running the dedupe process during low-activity periods can bite you later: reading a deduplicated volume means extra metadata lookups and reassembling blocks scattered across the array, and that latency shows up during regular operating hours.


NetApp is apparently aware of all this. Not only does it require customers to sign a statement acknowledging possible performance degradation, but it also lets them switch the dedupe function off if performance suffers too much.


Still, the company is confident that deduplicating primary data is appropriate in certain circumstances. You'll just have to work closely with the company to figure out which ones.