Deduplication on primary storage has long been considered something of an oddity. It is certainly possible, but the latency penalty has been deemed too great for fast-paced Tier 1 applications.
But now that some dedupe systems are gearing up for that very purpose, it might be time to rethink the matter.
The key driver here is a new approach from Permabit called Albireo, which attempts to match lower-tier dedupe performance in primary applications. The company claims that its subfile design preserves data integrity and overall system performance even as it scales to the petabyte level. The package includes a high-performance index engine, built on a combined memory/disk structure, that can identify duplicates in seconds by keeping the deduplication process out of the data read path. The system supports block, file and unified storage and features an advanced scanning system that delivers 50 to 70 percent file reduction.
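To make the idea of subfile dedupe with an index kept off the read path more concrete, here is a minimal sketch. It is not Permabit's design, which the announcement does not describe in detail. It assumes fixed 4 KB chunks and SHA-256 fingerprints, and the names `DedupStore` and `CHUNK_SIZE` are hypothetical. Writes consult a fingerprint index so identical chunks are stored only once, while reads follow each file's chunk map directly and never touch the index.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products may chunk differently


class DedupStore:
    """Hypothetical toy store: identical chunks are kept once and shared."""

    def __init__(self):
        self.index = {}   # fingerprint -> chunk id (consulted only on writes)
        self.chunks = []  # physical chunk storage
        self.files = {}   # file name -> list of chunk ids

    def write(self, name, data):
        ids = []
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            fp = hashlib.sha256(chunk).digest()
            cid = self.index.get(fp)
            if cid is None:  # new content: store and index it
                cid = len(self.chunks)
                self.chunks.append(chunk)
                self.index[fp] = cid
            ids.append(cid)  # duplicate content is only referenced
        self.files[name] = ids

    def read(self, name):
        # The read path follows the chunk map; the fingerprint index is not used.
        return b"".join(self.chunks[cid] for cid in self.files[name])

    def logical_bytes(self):
        return sum(len(self.chunks[c]) for ids in self.files.values() for c in ids)

    def physical_bytes(self):
        return sum(len(c) for c in self.chunks)


store = DedupStore()
data = b"A" * 8192 + b"B" * 4096  # three chunks, two of them identical
store.write("v1", data)
store.write("v2", data)           # a full copy adds no new chunks
print(store.logical_bytes(), store.physical_bytes())
```

Here 24,576 logical bytes occupy 8,192 physical bytes. The sketch also hints at the "re-inflate" problem discussed below: the savings exist only while files remain chunk references rather than full copies.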
The main issue with primary deduplication is what Storage Switzerland's George Crump calls the "Roll Down Hill" effect. That is, the benefits are there only as long as you can keep data in its deduped state. Functions like snapshots, replication, copying and even data removal can be performed without affecting overall capacity, provided deduplication remains consistent throughout the data lifecycle. Once you "re-inflate" the data, those benefits are lost, and you may also lose some of the metadata and integrity features of the dedupe engine. Dedupe in general also makes storage-capacity planning a bit more complicated.
Dedupe is also most effective on static data sets that change very little over time -- and that's not usually what you find in primary storage, says IT consultant Behzad Behtash. For applications built on such data, like disk-to-disk e-mail backup, reduction ratios of 30-to-1 are not uncommon. And if you use a remote backup facility, dedupe speeds up recovery by reducing bandwidth requirements.
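The figures quoted here use different units: a "50 to 70 percent" reduction for primary storage and a "30-to-1" ratio for backup. The short calculation below puts them on the same scale. It assumes that "percent reduction" means the fraction of capacity saved, in which case an N-to-1 ratio saves 1 - 1/N. The function names are illustrative only.

```python
def ratio_to_savings(ratio):
    """Fraction of capacity saved by an N-to-1 reduction ratio."""
    return 1 - 1 / ratio


def savings_to_ratio(savings):
    """Reduction ratio implied by a fractional saving (0.5 means 50 percent)."""
    return 1 / (1 - savings)


print(f"30:1 saves {ratio_to_savings(30):.1%}")           # 30:1 saves 96.7%
print(f"50% reduction = {savings_to_ratio(0.5):.1f}:1")   # 50% reduction = 2.0:1
print(f"70% reduction = {savings_to_ratio(0.7):.1f}:1")   # 70% reduction = 3.3:1
```

On this reading, the primary-storage figures work out to roughly 2:1 to 3.3:1. That gap with backup workloads reflects how much more duplication static backup data contains.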
In that light, dedupe in primary storage can be helpful, but ideally it should be integrated with other data-reduction measures like compression and capacity optimization -- provided the cumulative lag from all these processes does not hamper system performance.
One of the newest optimization systems in the channel comes from Neuxpower Solutions, a British firm that specializes in Microsoft environments. The company's NXPowerLite for File Servers is said to reduce Office and JPEG files by 95 percent through a process that removes digital content without sacrificing file integrity. It can be deployed more cheaply than dedupe, but it can also be used alongside it.
The thing about primary storage is that, even though it represents only a small portion of overall storage capacity, it is nevertheless the most expensive tier. So any measure to extend capacity should be welcome. But since data in the top tier is also usually the most critical, it's probably a good idea to keep a sharp eye on how primary dedupe is performing, at least at the outset of the deployment.