Closing in on Primary Deduplication

Arthur Cole

Deduplication has done a lot to lessen the cost and complexity of backup systems. But the holy grail for dedupe remains the primary storage infrastructure, where enterprises continue to hold out hope that they can somehow lessen the capacity burden even in the face of rising data loads.

The good news is that primary dedupe is possible, although it will require a significant investment to make it happen. As tech consultant Chris Evans pointed out this week, SSD arrays are more likely to incorporate dedupe, if only to lower overall operating costs in comparison to disk-based systems. Unfortunately, many of the leading disk-based storage arrays still rely on legacy LUN structures, which make effective dedupe a dicey proposition at best.

But perhaps not for long. A small Rhode Island company called GreenBytes recently received a patent for a new primary dedupe method that overcomes many of the I/O and scalability issues that plague standard inline approaches. The dcache system, applied directly to the storage kernel and implemented in the company's HA-3000 platform, uses a specialized data structure and an advanced search algorithm that accommodate massively scaled architectures. At the same time, it provides a high level of fault tolerance, along with in-memory indexing, searching and other functions of the dedupe process.
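The patented dcache internals aren't detailed here, but the general idea behind inline primary dedupe, keeping an in-memory fingerprint index so duplicate blocks are detected before they ever hit disk, can be sketched in a few lines. Everything below (the `InlineDedupStore` class, the 4 KB block size, the use of SHA-256 as a fingerprint) is an illustrative assumption, not GreenBytes' actual design:

```python
import hashlib

class InlineDedupStore:
    """Toy inline block-level dedupe: an in-memory index maps block
    fingerprints to a single physical copy of each unique block."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.index = {}    # fingerprint -> block data (one physical copy)
        self.volume = []   # logical layout: ordered list of fingerprints

    def write(self, data):
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            # Store a physical copy only the first time this block is seen;
            # duplicates just add another logical reference.
            self.index.setdefault(fp, block)
            self.volume.append(fp)

    def read(self):
        # Reassemble the logical volume from the deduped physical store.
        return b"".join(self.index[fp] for fp in self.volume)

store = InlineDedupStore()
store.write(b"A" * 8192)   # two identical 4 KB blocks
store.write(b"A" * 4096)   # a third duplicate of the same block
print(len(store.volume), len(store.index))  # 3 logical blocks, 1 physical
```

The hard parts a production system must solve, and which a naive dict does not, are exactly the ones the patent targets: keeping the index searchable at scale, surviving crashes, and avoiding the random I/O that fingerprint lookups generate.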

Meanwhile, EMC seems to be on the verge of boosting its deduplication capabilities in solid-state environments. The company is said to be close to purchasing Israeli firm XtremIO, which specializes in all-Flash arrays packed with advanced features like automatic thin provisioning and real-time primary dedupe. The technology is likely to find its way into the upcoming Project Thunder array that is slated to provide multi-TB levels of PCIe-based storage.

Also moving ahead on the solid-state front is Permabit, which just released a Flash version of its Albireo platform. The package provides up to a 35-fold reduction in the cost of Flash infrastructure by substantially limiting the number of writes needed for high-I/O applications like database indexing, transaction processing and server/desktop virtualization. Key performance metrics include scalability to millions of dedupe operations per second, sub-millisecond latency and a memory footprint of less than 512 MB.
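The write-reduction claim is easy to reason about: if a dedupe layer fingerprints each incoming block and only the first copy reaches Flash, the savings equal the ratio of logical to unique blocks. A minimal sketch of that accounting follows; the function name, the workload, and the 10:1 ratio are illustrative assumptions, not Permabit's published numbers or Albireo's actual index design:

```python
import hashlib

def dedup_write_savings(blocks):
    """Count how many physical Flash writes an inline dedupe layer
    would avoid for a stream of logical block writes. Illustrative
    only: real engines use compact indexes, not a full in-memory set."""
    seen = set()
    physical = 0
    for block in blocks:
        fp = hashlib.sha256(block).digest()
        if fp not in seen:       # first occurrence: write to Flash
            seen.add(fp)
            physical += 1
        # duplicates cost only an index update, not a Flash write
    logical = len(blocks)
    return logical, physical, logical / physical

# Hypothetical VDI-like workload: 100 logical writes, 10 unique blocks.
blocks = [bytes([i % 10]) * 4096 for i in range(100)]
logical, physical, ratio = dedup_write_savings(blocks)
print(logical, physical, ratio)  # 100 10 10.0
```

Fewer physical writes matter doubly on Flash: they cut the capacity bill and they stretch the limited program/erase endurance of the cells.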

A cynic would argue that traditional storage providers are loath to implement primary dedupe because it eats into their volume sales of raw storage. An attitude like that would be extremely short-sighted, however. The more likely explanation is that solid-state technology is more amenable to the random nature of dedupe operations.

Companies like GreenBytes may offer up effective solutions for inline processing, but it remains to be seen whether they can be effectively implemented in legacy environments.



Comments
May 2, 2012 8:29 AM Joel Billy says:

As you pointed out, primary storage is where deduplication will make the biggest dent in a storage company's sales. They could charge more for a deduplication software license, but it wouldn't be as big a sale as an additional array. But for all-flash arrays, dedupe is a must.

There are other options besides the ones you mentioned: NexentaStor, QuadStor (quadstor.com, check it out, it's great), or you could build your own ZFS appliance.

Primary storage dedupe will be the next big thing in the next couple of years.

