Deduplication is considered by many to be the saving grace of long-term storage and backup. Why provision expensive infrastructure merely to store 15 copies of the same file? By fingerprinting content and indexing what it has already seen, a competent dedupe system can reduce those copies to a single instance and then point every request back to that one copy.
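The core mechanism can be sketched in a few lines. The toy store below is a hypothetical illustration, not any vendor's implementation: it splits files into fixed-size blocks, keys each block by its SHA-256 digest, and stores a block only the first time that content appears. Real products layer on variable-size chunking, on-disk indexes, and reference counting.

```python
import hashlib

class DedupeStore:
    """Toy content-addressed store: identical blocks are kept only once."""

    def __init__(self):
        self.blocks = {}   # SHA-256 digest -> block bytes (stored once)
        self.files = {}    # filename -> ordered list of digests

    def write(self, name, data, block_size=4096):
        digests = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            # Store the block only if this content has never been seen.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        # "Rehydrate" the file by following the links back to the blocks.
        return b"".join(self.blocks[d] for d in self.files[name])

store = DedupeStore()
payload = b"x" * 8192
store.write("copy1.bin", payload)
store.write("copy2.bin", payload)   # the second copy adds no new blocks
assert store.read("copy2.bin") == payload
print(len(store.blocks))            # prints 1: every copy shares one block
```

Note that both files, and even both halves of each file, resolve to the same stored block; only the per-file list of links grows.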
This concept is increasingly infiltrating other aspects of the data environment. Rather than simply plunking a dedupe engine at the backup facility, the technology is finding homes in the network infrastructure and even up in the database and application stacks.
Permabit kicked this trend into high gear this week with a new set of APIs for its Albireo platform, which the company says enables a "Dedupe Everywhere" strategy. The idea is to push dedupe across the entire enterprise stack so that data is subject to multiple redundancy checks from the point of creation to its final destination. In this way, you not only make more efficient use of available storage capacity, but you lessen the load on network and server infrastructure as well. The company says the new configuration maintains Albireo's 400 GBps performance rating and 20 PB scalability.
Microsoft is starting to take dedupe to heart as well. Word is that Windows Server 8 will have a built-in memory dedupe module to reduce the redundant data finding its way into RAM through the increased use of virtual machines. A more traditional storage module will also be included, working in the background to eliminate copies and then share the original across multiple files.
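The appeal of memory dedupe is easy to quantify: VMs booted from the same image hold mostly identical pages. The sketch below is a hypothetical illustration of the scan-and-count idea (Microsoft's actual implementation details aren't described here); it hashes 4 KB pages across several simulated VMs and reports how few physical frames the identical pages could share. Real hypervisors re-verify candidate pages byte for byte and break the sharing with copy-on-write when a guest writes.

```python
import hashlib
from collections import Counter

PAGE = 4096  # typical page size in bytes

def shareable_pages(vm_memories):
    """Count total guest pages vs. the unique frames they could collapse to."""
    counts = Counter()
    for mem in vm_memories:
        for i in range(0, len(mem), PAGE):
            # Fingerprint each page; identical content -> identical digest.
            counts[hashlib.sha256(mem[i:i + PAGE]).digest()] += 1
    return sum(counts.values()), len(counts)

# Three VMs from the same image: 10 identical pages plus one unique page each.
base = bytes(PAGE) * 10
vms = [base + i.to_bytes(4, "big") * (PAGE // 4) for i in range(1, 4)]
total, unique = shareable_pages(vms)
print(total, unique)   # prints "33 4": 33 guest pages fit in 4 frames
```

Here roughly 88 percent of guest RAM is redundant, which is why VM sprawl makes memory dedupe attractive.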
Cloud providers are also taking advantage of new dedupe technologies, although it's probably wise to gain a full understanding of exactly what happens to your data once it leaves the confines of your data center. Bitcasa, for example, says it can provide unlimited storage capacity for only $10 a month through the liberal use of dedupe. The idea is to use Bitcasa as regular storage while reserving local disk capacity for cache. Still, the company says it will dedupe data from multiple clients, meaning that if two identical data sets come in from two different firms, Bitcasa will keep just one copy and share it between them.
Even without the cloud, dedupe can produce its fair share of headaches if not deployed correctly, according to tech consultant Preston de Guise. Storage and network settings ranging from multiplexing to replication all need to be reviewed in order to gain the maximum benefit. And certain data sets are simply more dedupe-friendly than others. It also helps to realize that dedupe ratios are only one way to gauge the effectiveness of any platform. Equally important is the speed at which data is recovered, or "rehydrated," and how well the system holds up in long-term, archival environments.
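Ratios themselves are easy to misread, since a 4:1 ratio does not save four times as much as 2:1. A quick conversion from ratio to capacity saved, using hypothetical figures, makes the relationship concrete:

```python
def dedupe_ratio(logical_bytes, physical_bytes):
    """Ratio of data the applications wrote to data actually stored on disk."""
    return logical_bytes / physical_bytes

# Hypothetical example: 10 TB of backups reduced to 2.5 TB on disk.
ratio = dedupe_ratio(10e12, 2.5e12)
saved = 1 - 1 / ratio
print(f"{ratio:.0f}:1 ratio, {saved:.0%} capacity saved")
# prints "4:1 ratio, 75% saved" -- going to 8:1 only adds another 12.5%
```

The diminishing returns are the point: past a few-to-one, chasing a bigger ratio matters far less than rehydration speed and long-term reliability.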
Deduplication is on its way to becoming as ubiquitous in the data center as virtualization. Data loads show no sign of letting up on their own, so enterprises are being forced to throw every tool at their disposal to ease the pressure.
This isn't a simple plug-and-play technology, however, as it can affect the often delicate interplay between data sets, applications and underlying infrastructure. The more comprehensive the solution, then, the better.