The Data Deduplication Revolution
Data deduplication has evolved into an efficient, high-performance technology.
Deduplication has substantially reduced the cost and complexity of backup systems. But the holy grail for dedupe remains the primary storage infrastructure, where enterprises continue to hold out hope that they can somehow ease the capacity burden even in the face of rising data loads.
The good news is that primary dedupe is possible, although it will require significant investment. As tech consultant Chris Evans pointed out this week, SSD arrays are more likely to incorporate dedupe, if only to lower overall operating costs compared with disk-based systems. Unfortunately, many of the leading disk-based storage arrays still rely on legacy LUN structures, which makes effective dedupe a dicey proposition at best.
But perhaps not for long. A small Rhode Island company called GreenBytes recently received a patent for a new primary dedupe method that overcomes many of the I/O and scalability issues that plague standard inline approaches. The dcache system, applied directly to the storage kernel and implemented in the company's HA-3000 platform, uses a specialized data structure and an advanced search algorithm that accommodate massively scaled architectures. At the same time, it provides a high level of fault tolerance, along with in-memory indexing, searching and other functions of the dedupe process.
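To see what inline primary dedupe involves at its simplest, here is a minimal sketch of a content-addressed block store with an in-memory fingerprint index. This is a generic illustration of the technique, not GreenBytes' patented data structure; the class name and layout are hypothetical.

```python
import hashlib

class InlineDedupeStore:
    """Toy content-addressed block store illustrating inline dedupe.

    Incoming blocks are fingerprinted with a cryptographic hash; the
    in-memory index maps each fingerprint to a single stored copy, so a
    duplicate write costs only an index lookup, not more capacity.
    """

    def __init__(self):
        self.index = {}     # fingerprint -> stored block (one copy each)
        self.refcount = {}  # fingerprint -> number of logical references

    def write(self, block: bytes) -> str:
        fp = hashlib.sha256(block).hexdigest()
        if fp not in self.index:      # new content: store the one copy
            self.index[fp] = block
        self.refcount[fp] = self.refcount.get(fp, 0) + 1
        return fp                     # caller records the fingerprint

    def read(self, fp: str) -> bytes:
        return self.index[fp]

store = InlineDedupeStore()
a = store.write(b"block A")
b = store.write(b"block A")   # duplicate: no new copy is stored
c = store.write(b"block B")
assert a == b and len(store.index) == 2
```

The scalability problem the article alludes to lives in that index: at primary-storage scale it no longer fits comfortably in RAM, which is why specialized structures and search algorithms matter.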
Meanwhile, EMC seems to be on the verge of boosting its deduplication capabilities in solid-state environments. The company is said to be close to purchasing Israeli firm XtremIO, which specializes in all-Flash arrays packed with advanced features like automatic thin provisioning and real-time primary dedupe. The technology is likely to find its way into the upcoming Project Thunder array that is slated to provide multi-TB levels of PCIe-based storage.
Also moving ahead on the solid-state front is Permabit, which just released a Flash version of its Albireo platform. The package provides up to a 35-fold reduction in the cost of Flash infrastructure by substantially limiting the number of writes needed for high-I/O applications like database indexing, transaction processing and server/desktop virtualization. Key performance metrics include scalability up to millions of dedupe operations per second, sub-millisecond latency and a memory footprint of less than 512 MB.
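The write-reduction effect behind those numbers can be sketched with a toy simulation: when a workload carries heavily duplicated content (as VDI and database-index traffic often does), only the first copy of each unique block ever reaches the Flash medium. The workload below is synthetic and illustrative, not a Permabit benchmark.

```python
import hashlib
import random

# Hypothetical workload: 10,000 logical 4 KB writes drawn from only
# 8 distinct block contents, mimicking heavy duplication.
random.seed(0)
blocks = [bytes([random.randrange(8)]) * 4096 for _ in range(10_000)]

seen = set()
physical_writes = 0
for blk in blocks:
    fp = hashlib.sha256(blk).digest()
    if fp not in seen:          # only unique content is written to Flash
        seen.add(fp)
        physical_writes += 1

print(f"logical writes:  {len(blocks)}")
print(f"physical writes: {physical_writes}")
```

Because Flash cells wear out with each program/erase cycle, every duplicate write avoided extends the life of the medium, which is where the cost savings come from.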
A cynic would argue that traditional storage providers are loath to implement primary dedupe because it eats into their volume sales of raw storage. An attitude like that would be extremely short-sighted, however. The more likely explanation is that solid-state technology is simply more amenable to the random nature of dedupe operations.