Tier 1 Deduplication: Ready for the Majors?

Arthur Cole

Deduplication on primary storage has long been considered something of an oddity. Certainly, it's possible, but the latency penalty was deemed too great for fast-paced Tier 1 applications.


But now that some dedupe systems are gearing up for that very purpose, it might be time to rethink the matter.


The key driver here is a new approach from Permabit called Albireo that attempts to match lower-tier dedupe performance in primary applications. The company claims that its sub-file design preserves data integrity and overall system performance even as it scales up to the petabyte level. The package includes a high-performance index engine built on a memory/disk structure that can deliver duplicate IDs in seconds by keeping the identification process out of the data read path. The system supports block, file and unified storage and features an advanced scanning system that delivers 50 to 70 percent file reduction.
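To picture how sub-file fingerprinting with an out-of-band index works, here is a minimal Python sketch. This is not Permabit's code; the 4 KB block size and the plain in-memory dictionary are stand-ins for Albireo's chunking scheme and memory/disk index. The point is simply that duplicate identification happens outside the normal read path.

```python
import hashlib
import os

BLOCK_SIZE = 4096  # hypothetical sub-file block size

def fingerprint_blocks(path, index):
    """Hash a file block by block and record any duplicates found.

    `index` maps block fingerprints to the first (file, offset) seen.
    The scan runs out of band, so the normal read path never waits on it.
    """
    duplicates = []
    with open(path, "rb") as f:
        offset = 0
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            digest = hashlib.sha256(block).hexdigest()
            if digest in index:
                duplicates.append((path, offset, index[digest]))
            else:
                index[digest] = (path, offset)
            offset += len(block)
    return duplicates

if __name__ == "__main__":
    index = {}   # stand-in for a purpose-built memory/disk index
    dupes = []
    for name in os.listdir("."):
        if os.path.isfile(name):
            dupes.extend(fingerprint_blocks(name, index))
    print(f"{len(dupes)} duplicate blocks found against {len(index)} unique blocks")
```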


The main issue with primary deduplication is what Storage Switzerland's George Crump calls the "Roll Down Hill" effect. That is, the benefits are there only as long as you maintain data in its deduped state. Functions like snapshots, replication, copying, even data removal, can be done without affecting overall capacity as long as deduplication is consistent throughout the data lifecycle. Once you "re-inflate," those benefits are lost, and you may also lose some of the metadata and integrity features of the dedupe engine. Dedupe in general also makes storage-capacity planning a bit more complicated.
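Some rough numbers make the effect easier to see. The figures below are entirely hypothetical, but they show why capacity planning gets tricky: a copy that stays deduped costs almost nothing, while re-inflating it puts the full logical size back on disk.

```python
# Hypothetical numbers showing the "Roll Down Hill" effect: the savings hold
# only while copies stay in their deduped state.
logical_tb = 10.0        # data as the applications see it
dedupe_ratio = 4.0       # assumed 4:1 reduction on this workload

physical_tb = logical_tb / dedupe_ratio
print(f"Deduped footprint: {physical_tb:.1f} TB")                  # 2.5 TB

# A snapshot or replica that stays deduped adds little more than metadata...
deduped_copy_tb = 0.1    # changed blocks plus pointers (assumed)
print(f"With a deduped copy: {physical_tb + deduped_copy_tb:.1f} TB")

# ...but re-inflate that copy and the full logical size lands back on disk.
print(f"After re-inflation: {physical_tb + logical_tb:.1f} TB")    # 12.5 TB
```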


Dedupe is also most effective on largely static data sets that change little over time -- and that's not usually what you find in primary storage, says IT consultant Behzad Behtash. For those kinds of workloads, such as disk-to-disk e-mail backup and the like, reduction ratios of 30-to-1 are not uncommon. And if you use a remote backup facility, dedupe speeds up the recovery process by reducing bandwidth requirements.
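A quick back-of-the-envelope calculation shows why that matters over a WAN. Every figure here is assumed -- a 2 TB nightly backup set, a 30-to-1 ratio, a 100 Mbps link -- but the bandwidth math is the point.

```python
# Back-of-the-envelope math for a remote backup; every figure here is assumed.
backup_tb = 2.0          # nightly disk-to-disk e-mail backup set
ratio = 30.0             # 30-to-1 reduction on highly redundant data
wan_mbps = 100.0         # available WAN bandwidth

gb_per_hour = wan_mbps * 3600 / 8 / 1024      # Mbps -> GB moved per hour
deduped_gb = backup_tb * 1024 / ratio         # data actually sent over the wire
raw_gb = backup_tb * 1024

print(f"On the wire: {deduped_gb:.0f} GB instead of {raw_gb:.0f} GB")
print(f"Transfer time: ~{deduped_gb / gb_per_hour:.1f} h vs ~{raw_gb / gb_per_hour:.0f} h")
```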


In that light, dedupe in primary storage can be helpful, but ideally it should be integrated with other data-reduction measures like compression and capacity optimization -- provided the cumulative lag from all these processes does not hamper system performance.
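Here is a toy illustration of that trade-off: each block is fingerprinted for dedupe and, if unique, compressed before it would be written, with the added processing time measured along the way. It is a sketch of the cumulative-cost argument under made-up data, not any vendor's pipeline.

```python
import hashlib
import time
import zlib

def reduce_block(block, index):
    """Fingerprint a block for dedupe, then compress it if it is unique.

    Returns None for a duplicate (only a reference would be stored),
    otherwise the compressed bytes that would go to disk.
    """
    digest = hashlib.sha256(block).digest()
    if digest in index:
        return None
    index[digest] = True
    return zlib.compress(block)

if __name__ == "__main__":
    index = {}
    blocks = [bytes([i % 7]) * 4096 for i in range(10_000)]   # synthetic 4 KB blocks
    start = time.perf_counter()
    stored = 0
    for block in blocks:
        out = reduce_block(block, index)
        if out is not None:
            stored += len(out)
    elapsed = time.perf_counter() - start
    print(f"Stored {stored} bytes out of {len(blocks) * 4096} in {elapsed * 1000:.1f} ms")
```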


One of the newest optimization systems in the channel comes from Neuxpower Solutions, a British firm that specializes in Microsoft environments. The company's NXPowerLite for File Servers is said to reduce Office and JPEG files by 95 percent through a process that strips out excess digital content without sacrificing file integrity. The system can be deployed more cheaply than dedupe, and it can also be used in a complementary role alongside it.


The thing about primary storage is that, even though it represents only a small portion of the overall storage environment in terms of capacity, it is nevertheless the most expensive. So any measure to extend capacity should be welcome. But since data in the top tier is also usually of the most critical variety, it's probably a good idea to keep a sharp eye on how primary dedupe is performing -- at least at the outset of the deployment.



Comments
Jun 10, 2010 4:08 AM Tom Cook says:

Arthur, primary storage actually dedupes very well. As you mention, 3x for many file types and up to 100x for virtual images, so the savings can be enormous.

Jun 11, 2010 12:36 PM The Storage Alchemist says:

Arthur,

Nice post - it is great to see primary storage optimization now leading the charge in how IT can best optimize its entire environment. I do have one question, which comes back to your initial question - "Tier 1 Deduplication: Ready for the Majors?" You hit the nail on the head - the two main reasons IT buys storage are performance and availability, and any vendor that is 'mixing it up' in the storage mix must maintain those characteristics 100 percent. At this juncture it will be interesting to see if the OEMs can make the deduplication SDK real-time (performance) - for deduplication this will be very hard. The first instantiation will be compression. Storwize has proven it can do real-time storage compression.

Jun 14, 2010 3:20 AM balesio AG says:

What is important in Tier 1 storage is the data volume you need to deal with, and individual file size is a main driver of that volume. If a file is saved two or three times on Tier 1 storage and you deduplicate it, you treat the symptoms but not the cause.

Native file optimization is the key to primary storage optimization: you optimize file sizes in the first place, and then all those good things like dedupe can take effect at later stages.

balesio's FILEminimizer technology has proven to optimize common office and image files by over 70 percent on average. That is great value, given that you are left with a "normal" file afterwards and do not end up with a compressed or zipped one.
