New Ways to Improve Deduplication


Data Domain came out with the first deduplication system close to 10 years ago, following up with a steady stream of upgrades and improvements nearly every year. But once you get to the point where you can remove virtually all duplicate files from your storage, how can you improve the product from there?

You make it faster. The company recently unveiled DD OS 4.6, the newest version of its software, which is available as a free upgrade to existing users. The system uses a proprietary technique called Stream-Informed Segment Layout (SISL) that relies on CPU resources (read: no additional controllers or other hardware) to dramatically increase throughput for backup and recovery operations.

Depending on your existing installation, Data Domain says the upgrade can provide a 50 percent to 100 percent improvement in data throughput. For example, a DD120 that operates at 150 GB per hour will jump to 300 GB per hour with the new software. A full DDX array running at 22.4 TB per hour can see performance rise to 43.2 TB per hour.
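To see how the two quoted examples line up with the claimed 50 to 100 percent range, here is a minimal Python sketch of the arithmetic. The function name `improvement_percent` is just an illustrative helper, not anything from Data Domain's software, and the only inputs are the before-and-after rates cited above.

```python
def improvement_percent(before, after):
    """Return the throughput gain as a percentage of the original rate."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


# Figures quoted in the article: (label, old rate, new rate, unit)
quoted = [
    ("DD120", 150.0, 300.0, "GB/hour"),
    ("DDX array", 22.4, 43.2, "TB/hour"),
]

for label, before, after, unit in quoted:
    gain = improvement_percent(before, after)
    print(f"{label}: {before} -> {after} {unit} ({gain:.1f}% faster)")
```

The DD120 example works out to a full 100 percent gain, while the DDX array lands at roughly 93 percent, so both sit at the upper end of the range the company cites.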

The new software continues a steady stream of data storage improvements aimed at helping cash-strapped enterprises make the most of their existing capacity. Just this week, Bus-Tech Inc. forged an interoperability agreement with Quantum to unite its Mainframe Data Library (MDL) system with Quantum's DXi7500 backup and replication system. The deal will let Bus-Tech users tap into Quantum's policy-based dedupe technology, which operates across both mainframe and open-systems server architectures.

The name of the game here is avoiding the purchase of additional storage for as long as possible, according to Symantec's Phil Goodwin. Typical organizations use only about half of their available storage, or less, so there should be plenty of capacity to get through the next couple of years when, hopefully, revenues will start to pick up again. But even then, it's likely that the past practice of simply provisioning more storage whenever the need arises is gone for good.

That's certainly true, says Search Storage's Eric Burgener, but it's also true that dedupe should only be used as one component of an overall storage management architecture. Tools like data compression, incremental backup, copy-on-write snapshotting, and file-level instancing are all needed to keep storage usage to a minimum. But it's important to realize that not all of these technologies are complementary. It will take a fairly sophisticated management system to determine which tool or tools can best be utilized for any given data set.

With all of the advancements taking place, though, it seems dedupe is emerging as the primary weapon in the backup arsenal. And my guess is that the technology has only begun to chart new paths in both flexibility and functionality.