Dedupe in Primary Storage: Does the Risk Match the Reward?

Arthur Cole

Deduplication processes continue to grow in number and variety, and with them comes a growing movement to bring the technology out of the backup realm and into primary storage arrays. But is this really such a good idea, particularly when you consider that storage performance is now just as important as capacity when evaluating your immediate data needs?


Deduplication as a primary storage tool is promoted largely by NetApp. Earlier this month, the company announced that it has expanded the dedupe capabilities of its V-Series NAS Gateway appliance to work with the leading primary storage products from EMC, HP, IBM and Hitachi Data Systems. This comes on the heels of NetApp offering dedupe on its FAS line of primary storage, as well as the NearStore secondary line.


How effective is dedupe for reducing primary data? That's hard to say since there isn't a lot of field experience. NetApp holds up the example of Burt's Bees in Durham, NC. The honey and personal care firm reports that it has reclaimed more than half of the storage tied to its VMware ESX infrastructure.


The rise of virtual environments may be a key driver for deduplicating primary storage data, according to GlassHouse Technologies CTO Jim Damoulakis. Dedupe seeks out and eliminates duplicate data blocks and files, which can be found in abundance in the virtual world. How many C: drives reside in a typical VMware cluster, he asks.
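
To make the mechanism concrete, here is a minimal sketch of block-level deduplication in Python. The fixed 4 KB block size, the SHA-256 fingerprints and the in-memory index are illustrative assumptions for this sketch, not a description of any vendor's implementation; production arrays do this work at the controller or filesystem layer.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size for this sketch; real systems vary


def dedupe_blocks(volume_bytes):
    """Split data into fixed-size blocks and keep one copy of each unique block.

    Returns the store of unique physical blocks and the logical-to-physical
    map (a list of fingerprints) needed to reconstruct the original data.
    """
    store = {}       # fingerprint -> physical block (stored once)
    block_map = []   # logical block order, expressed as fingerprints

    for offset in range(0, len(volume_bytes), BLOCK_SIZE):
        block = volume_bytes[offset:offset + BLOCK_SIZE]
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in store:     # first time this content appears
            store[fingerprint] = block   # keep a single physical copy
        block_map.append(fingerprint)    # every logical block is just a reference

    return store, block_map


# Twenty virtual machines with identical system blocks collapse to one copy:
guest_image_block = (b"common-guest-os-block" * 256)[:BLOCK_SIZE]
volume = guest_image_block * 20
store, block_map = dedupe_blocks(volume)
print(len(block_map), "logical blocks ->", len(store), "physical block(s) stored")
```

That is the effect Damoulakis is pointing to: dozens of near-identical guest OS images reduce to a single set of shared blocks.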


Still, the performance question could be the killer here, says Warren Smith, a StorageWorks manager at HP. What you're actually doing is inserting an invasive and performance-hampering process into your critical business infrastructure. This will take away compute cycles from high-priority processes, making your entire organization less productive. Even running the dedupe process during low activity periods can bite you when that deduped volume is accessed during regular operating hours.
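
Smith's point about regular operating hours can be sketched in the same hypothetical terms: once a volume is deduplicated, every logical read pays an extra indirection through the fingerprint map before it reaches a shared physical block, and that lookup lands on the array whenever the volume is accessed, regardless of when the dedupe pass itself ran. This is only a toy illustration of the read path, not any vendor's actual I/O stack.

```python
# Toy deduplicated volume: three logical blocks all point at one physical block.
store = {"fp-abc123": b"shared-guest-os-block"}         # single physical copy
block_map = ["fp-abc123", "fp-abc123", "fp-abc123"]     # logical -> fingerprint


def read_logical(store, block_map, logical_index):
    """Read one logical block from a deduplicated volume.

    The extra step here -- resolving a fingerprint before fetching data --
    is the indirection a deduped volume adds to every read, even if the
    dedupe pass itself ran overnight.
    """
    fingerprint = block_map[logical_index]   # look up the logical map
    return store[fingerprint]                # fetch the shared physical block


# Two different logical blocks resolve to the same shared copy:
assert read_logical(store, block_map, 0) == read_logical(store, block_map, 2)
```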


NetApp is apparently aware of all this. Not only does it require customers to sign a statement acknowledging possible performance degradation, but it also offers the ability to turn the dedupe function off if performance suffers too much.


Still, the company is confident that deduplicating primary data is appropriate in certain circumstances. You'll just have to work closely with NetApp to figure out which ones.

Aug 19, 2008 12:41 PM Calvin Zito says:
Hi Arthur, good post! I think customers should be extremely cautious before buying into deduplication of online storage systems. The risks of data loss are significantly higher. Just imagine the customer who has multiple versions of the same file and those get deduped down to a single instance. If that record gets corrupted, all copies of the data are lost. My recommendation on disk-based deduplication is "Just say, 'No!'" To read all of Warren Smith's comments, you can see his complete post on this topic at:
Calvin Zito
HP StorageWorks
Sep 10, 2008 4:03 AM Brian McCarthy says:
....I guess our friends at GlassHouse Technologies and their CTO Jim Damoulakis have been on the Data Domain payroll way too long. It looks to me like their standard answer to any data reduction question is based on what Data Domain has told them to say. So Jim, let me help you out a bit: yes, blocks and files in "backup" have lots of duplicates, and yes, not a lot in the primary space, so far are you with me? The difference in primary storage comes from a new breed of vendors, Jim, write this down: Ocarina Networks and Storwize. Unlike backup dedupe technologies, these two companies use lossless data compression (see for more details, but don't ask GlassHouse :-) or their CTO. For more information on these vendors, visit a company that knows the difference between these technologies and is not on the Data Domain payroll.
Brian McCarthy
CEO/President, Sencilo Solutions
Oct 29, 2008 7:57 AM Mike Taylor says:
Mr. McCarthy is on target. Why do backup de-dupe when you can just start at the source? We did run a 90-day de-dupe test with several "leaders" in this space: Data Domain, FalconStor and one other. None were up to 50-terabyte backups, let alone our fulls.
MT
Sys Admin
