Compression Has Dedupe's Back

Arthur Cole

Could real-time compression be the missing component that kicks deduplication systems into high gear? The folks over at Storwize certainly think so.


The company recently ran a number of tests comparing leading dedupe platforms, first on their own and then linked to the company's compression technology. It reports a 200 percent improvement in overall data reduction with compression turned on, along with faster processing and better disk, CPU and network utilization.


It turns out that when you compress data at the primary storage level, you shrink file sizes at every subsequent stage of the storage process, dedupe included. The dedupe software has smaller files to work with from the start and can track and remove duplicates that much more quickly.
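To make that concrete, here is a small Python sketch of the whole-file case (illustrative only; the file names and contents are invented). Because identical files compress to identical bytes, a single-instance store keyed on a hash of the compressed file still catches every duplicate, and every hash, compare and write now touches far fewer bytes.

import zlib, hashlib

files = {
    "report_monday.doc":  b"quarterly numbers " * 5000,
    "report_copy.doc":    b"quarterly numbers " * 5000,   # exact duplicate of the first
    "report_tuesday.doc": b"different numbers " * 5000,
}

store = {}          # fingerprint -> compressed blob
bytes_written = 0
for name, data in files.items():
    blob = zlib.compress(data)               # compress at the primary tier
    fp = hashlib.sha256(blob).hexdigest()    # fingerprint the compressed file
    if fp not in store:                      # single-instance storage: keep one copy
        store[fp] = blob
        bytes_written += len(blob)

raw_total = sum(len(d) for d in files.values())
print(f"{len(files)} files in, {len(store)} unique blobs kept, "
      f"{bytes_written} bytes written vs {raw_total} bytes raw")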


For its tests, Storwize paired its technology with dedupe systems from Data Domain and NetApp running a week's worth of data. In every test, the data that was hit with real-time compression before it went to dedupe proved vastly easier to work with than raw data.


You can find the full report here.

Feb 27, 2009 12:08 PM Jered Floyd says:

Compression and deduplication are definitely "two great tastes that taste great together", but chaining them like this is a bit like eating a peanut butter sandwich and THEN a jelly sandwich -- they both taste good, but could be put together better.  For file-level deduplication (also called single-instance storage), this is a fine strategy.  With more advanced variable deduplication, however, you can lose a lot of your deduplication opportunities.

That's because of the nature of stream compression.  A single byte change of input in a large file will make the compressed output past that point very different, which means that deduplication can't eliminate redundancies later in the file.  The right place for compression is actually after deduplication has occurred, or at least the segmentation for deduplication.
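A rough way to see this, using nothing more than zlib and fixed-size chunk hashes as a stand-in for a block-level dedupe engine (illustrative only, not any vendor's code): change a single byte near the front of a file, compress both versions, and compare which chunks still match.

import zlib, hashlib

CHUNK = 4096

def chunk_hashes(data):
    # Hash fixed-size chunks, the way a simple block-level dedupe engine might.
    return {hashlib.sha256(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)}

# Non-repeating but compressible sample data (invented for illustration).
original = b"".join(b"record %08d: some payload\n" % i for i in range(40000))
modified = original[:10] + b"X" + original[11:]    # single-byte change near the front

raw_shared = chunk_hashes(original) & chunk_hashes(modified)
zip_shared = chunk_hashes(zlib.compress(original)) & chunk_hashes(zlib.compress(modified))

print("chunks still shared, raw data:       ", len(raw_shared), "of", len(chunk_hashes(original)))
print("chunks still shared, compressed data:", len(zip_shared), "of", len(chunk_hashes(zlib.compress(original))))

Against the raw data only the chunk containing the change is new; against the compressed streams essentially nothing downstream of the change lines up, which is exactly the lost deduplication opportunity described above.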

Permabit Enterprise Archive incorporates both technologies in this order for maximum benefit.  As data is being written, an in-line process breaks files up into variable-sized segments for optimal deduplication.  Then these segments are compressed, deduplicated, and written to disk.  This provides the best of both worlds.
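Here is a toy sketch of a segment-first pipeline along these lines, with a naive content-defined chunker standing in for real segmentation (illustrative only, not Permabit's actual implementation): the stream is cut into variable-sized segments, each segment is compressed on its own, and only segments with a previously unseen fingerprint are written.

import zlib, hashlib

def segments(data, mask=0x0FFF, min_size=2048, max_size=16384):
    # Naive content-defined chunking: cut where a rolling byte hash hits a pattern,
    # bounded by minimum and maximum segment sizes.
    start, rolling = 0, 0
    for i, byte in enumerate(data):
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (rolling & mask) == 0) or size >= max_size:
            yield data[start:i + 1]
            start, rolling = i + 1, 0
    if start < len(data):
        yield data[start:]

def ingest(data, store):
    # Compress each segment on its own, then write it only if its fingerprint is new.
    written = 0
    for seg in segments(data):
        blob = zlib.compress(seg)
        fp = hashlib.sha256(blob).hexdigest()
        if fp not in store:
            store[fp] = blob
            written += len(blob)
    return written

store = {}
backup = b"".join(b"record %08d\n" % i for i in range(50000))
edited = backup[:100] + b"X" + backup[101:]     # one byte changed in the second copy

print("first backup wrote:", ingest(backup, store), "bytes")
print("edited copy wrote: ", ingest(edited, store), "bytes (only the changed segments)")

Because the segment boundaries are derived from the content rather than from a compressed stream, the second ingest writes only the handful of segments touched by the change.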


  Jered Floyd

  CTO, Permabit

