The Data Deduplication Revolution - Slide 3


Deduplication can occur in any of three places: "at the client," where the source data sits; "in-line," as the data travels to the target; or "on the target," after the data has been written (the latter is often referred to as "post-process"). Each location offers advantages and disadvantages, and one or more of these techniques will be found in the deduplication solutions available on the market today. The choice of which type of deduplication an organization deploys is governed by its infrastructure, its budget, and, perhaps most importantly, its business process requirements.

Post-process deduplication

This works by first capturing and storing all the data, then processing it at a later time to find the duplicate chunks. It requires more initial disk capacity than an in-line solution; however, because the duplicate data is processed after the backup completes, there is no real performance hit on the data protection process. The CPU and memory the deduplication process consumes are spent on the target, away from the original application, so it does not interfere with business operations. Because the target device may be the destination for data from many file and application servers, post-process deduplication also offers the benefit of comparing data from all sources; this "global" deduplication increases the storage savings even further.
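The post-process flow can be sketched as a second pass over chunks that have already been written to the target. This is a minimal illustration only; the fixed chunk list and SHA-256 hashing are assumptions, not any vendor's implementation:

```python
import hashlib

def post_process_dedupe(stored_chunks):
    """Walk already-written backup chunks after the backup completes,
    keeping the first instance of each chunk and indexing duplicates."""
    index = {}            # hash -> position of the first (kept) instance
    kept, references = [], []
    for pos, chunk in enumerate(stored_chunks):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest in index:
            references.append((pos, index[digest]))  # point back at the kept copy
        else:
            index[digest] = len(kept)
            kept.append(chunk)
    return kept, references

# Chunks from two backup streams landing on the same target ("global" dedupe):
chunks = [b"alpha", b"beta", b"alpha", b"gamma", b"beta"]
kept, refs = post_process_dedupe(chunks)
# Only three unique chunks remain; two positions become references.
```

Because the pass runs against data already on the target, it can compare chunks from every source that writes to that device, which is what gives post-process its "global" reach.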

In-line deduplication

The analysis of the data, the calculation of the hash value, and the comparison with the index all take place as the data travels from source to target. The benefit is that less storage is required, because duplicates are eliminated before the data is ever written to the target disk; the drawback is that all of this processing can slow the movement of the data. In reality, in-line processing has become efficient enough that the impact on the backup job is so small as to be inconsequential. Historically, the main issue with in-line deduplication was that it often focused only on the data stream being transported and did not always take data from other sources into account. This could result in less "global" deduplication and therefore more disk space being consumed than necessary.
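The in-flight hashing described above can be sketched as follows; the list-based `target` is a stand-in for a disk write, and the scope of `index` is the assumption that matters. If each stream gets its own index, only that stream is deduplicated; sharing one index across streams gives the "global" behavior the text describes:

```python
import hashlib

def inline_dedupe(stream, target, index):
    """Hash each chunk while it is in flight; write to the target only
    when the hash is not already in the index, so duplicates never
    touch disk."""
    for chunk in stream:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in index:
            index[digest] = len(target)   # record where the chunk lands
            target.append(chunk)          # stand-in for a disk write
        # duplicate chunks fall through: only a reference would be kept

target, index = [], {}
inline_dedupe([b"a1", b"b2", b"a1"], target, index)
# target holds two chunks; the repeated b"a1" was dropped in flight
```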

Client-side deduplication

Sometimes referred to as source deduplication, this takes place where the data resides. The deduplication hash calculations are performed on the client (source) machines. Chunks whose hashes match data already on the target device are not sent; the target simply creates the appropriate internal links to reference the duplicated data, so less data is transferred to the target. This efficiency does, however, come at a cost: the CPU and memory resources required to analyze the data are also needed by the application being protected, so application performance will most likely be affected negatively during the backup process.
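The exchange above amounts to "hash locally, ask the target what it already has, send only the rest." A minimal sketch, in which the `Target` class and its `missing`/`put` methods are hypothetical stand-ins for a real target's API:

```python
import hashlib

class Target:
    """Stand-in for the backup target's 'which hashes do you lack?' API."""
    def __init__(self):
        self.store = {}                   # hash -> chunk
    def missing(self, hashes):
        return [h for h in hashes if h not in self.store]
    def put(self, digest, chunk):
        self.store[digest] = chunk

def client_side_backup(chunks, target):
    """Hash on the client, then transfer only chunks the target lacks."""
    digests = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = set(target.missing(digests))
    sent = 0
    for digest, chunk in zip(digests, chunks):
        if digest in needed:
            target.put(digest, chunk)     # the only chunks on the wire
            sent += 1
    return sent

target = Target()
first = client_side_backup([b"doc", b"img"], target)   # both chunks sent
second = client_side_backup([b"doc", b"new"], target)  # only b"new" sent
```

Note that the hashing in `client_side_backup` runs on the same machine as the protected application, which is exactly the resource contention the paragraph above warns about.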

The term "data deduplication" increasingly refers to the technique of reducing data by breaking streams down into very granular components, such as blocks or bytes, storing only the first instance of each item on the destination media, and recording all other occurrences in an index. Because it works at a more granular level than single-instance storage, the resulting space savings are much higher, delivering more cost-effective solutions. The savings in space translate directly into reduced acquisition, operation, and management costs.
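A toy comparison shows why block-level granularity beats single-instance (whole-file) storage; the 4-byte block size here is an artificially small assumption chosen for illustration:

```python
import hashlib

BLOCK = 4  # tiny fixed block size, for illustration only

def unique_blocks(files):
    """Block-level dedupe: split each file into fixed-size blocks and
    count how many distinct blocks actually need to be stored."""
    seen = set()
    for data in files:
        for i in range(0, len(data), BLOCK):
            seen.add(hashlib.sha256(data[i:i + BLOCK]).hexdigest())
    return len(seen)

# Two 12-byte files that differ in only their final block:
f1 = b"AAAABBBBCCCC"
f2 = b"AAAABBBBDDDD"
blocks = unique_blocks([f1, f2])
# Single-instance storage keeps both whole files (6 blocks of data);
# block-level dedupe stores only 4 distinct blocks.
```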

Data deduplication technologies are deployed in many forms and in many places within the backup and recovery infrastructure. The technology has evolved from being delivered within specially designed disk appliances offering post-process deduplication to being a distributed capability integrated into backup and recovery software. According to CA Technologies, along the way solution suppliers have identified the strengths and weaknesses of each stage of that evolution and developed the high-performance, efficient technologies available today.

This slideshow looks at data deduplication and five areas, identified by CA Technologies, that you should consider carefully when approaching a data deduplication project.
