WAN-optimized replication of backup datasets using stream-informed delta compression
- 1 November 2012
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Storage
- Vol. 8 (4), 1-26
- https://doi.org/10.1145/2385603.2385606
Abstract
Replicating data off site is critical for disaster recovery reasons, but the current approach of transferring tapes is cumbersome and error prone. Replicating across a wide area network (WAN) is a promising alternative, but fast network connections are expensive or impractical in many remote locations, so improved compression is needed to make WAN replication truly practical. We present a new technique for replicating backup datasets across a WAN that not only eliminates duplicate regions of files (deduplication) but also compresses similar regions of files with delta compression, which is available as a feature of EMC Data Domain systems. Our main contribution is an architecture that adds stream-informed delta compression to already existing deduplication systems and eliminates the need for new, persistent indexes. Unlike techniques based on knowing a file's version or that use a memory cache, our approach achieves delta compression across all data replicated to a server at any time in the past. From a detailed analysis of datasets and statistics from hundreds of customers using our product, we achieve an additional 2X compression from delta compression beyond deduplication and local compression, which enables customers to replicate data that would otherwise fail to complete within their backup window.
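To make the distinction between deduplication and delta compression concrete, the following is a minimal Python sketch, not the architecture from the article. Everything in it is an assumption made for illustration: the names `Replica`, `sketch`, and `replicate` are hypothetical, similarity is detected with max-hash features over byte shingles, and delta encoding is approximated with zlib's preset-dictionary feature (so chunks are assumed to be at most 32 KiB). The in-memory feature dictionary is only a stand-in; the article's stated contribution is precisely to avoid adding a new persistent index of this kind.

```python
import hashlib
import zlib

WINDOW = 16  # shingle width in bytes (illustrative value)


def fingerprint(chunk: bytes) -> bytes:
    """Exact-match identity of a chunk, used for deduplication."""
    return hashlib.sha1(chunk).digest()


def sketch(chunk: bytes, salts=(1, 2, 3, 4)) -> tuple:
    """Resemblance sketch: per salt, the maximum hash over all shingles.

    Chunks that share most of their content tend to share these maxima.
    """
    last = max(1, len(chunk) - WINDOW + 1)
    shingles = [chunk[i:i + WINDOW] for i in range(last)]
    return tuple(max(zlib.crc32(s, salt) for s in shingles) for salt in salts)


def delta_encode(target: bytes, base: bytes) -> bytes:
    """Compress target using base as a preset dictionary (stand-in delta)."""
    comp = zlib.compressobj(zdict=base)
    return comp.compress(target) + comp.flush()


def delta_decode(delta: bytes, base: bytes) -> bytes:
    """Rebuild the target from the delta and the same base chunk."""
    decomp = zlib.decompressobj(zdict=base)
    return decomp.decompress(delta) + decomp.flush()


class Replica:
    """Hypothetical remote store holding chunks and their sketch features."""

    def __init__(self):
        self.chunks = {}      # fingerprint -> chunk bytes
        self.by_feature = {}  # (feature position, feature value) -> fingerprint

    def store(self, chunk: bytes) -> None:
        fp = fingerprint(chunk)
        self.chunks[fp] = chunk
        for pos, feature in enumerate(sketch(chunk)):
            self.by_feature[(pos, feature)] = fp

    def find_similar(self, chunk: bytes):
        """Return the fingerprint of a stored chunk sharing any feature."""
        for pos, feature in enumerate(sketch(chunk)):
            fp = self.by_feature.get((pos, feature))
            if fp is not None:
                return fp
        return None


def replicate(chunk: bytes, replica: Replica):
    """Send one chunk; return (kind, approximate bytes put on the wire)."""
    fp = fingerprint(chunk)
    if fp in replica.chunks:
        return ("dup", len(fp))  # duplicate: only the fingerprint travels
    base_fp = replica.find_similar(chunk)
    if base_fp is not None:
        base = replica.chunks[base_fp]
        payload = delta_encode(chunk, base)
        replica.store(delta_decode(payload, base))  # receiver rebuilds chunk
        return ("delta", len(base_fp) + len(payload))
    payload = zlib.compress(chunk)  # no similar base: local compression only
    replica.store(zlib.decompress(payload))
    return ("full", len(payload))
```

Under these assumptions, a chunk that has already been replicated costs only a fingerprint, a chunk that differs slightly from a stored one costs a base reference plus a small delta, and only genuinely new data is sent in locally compressed form.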
This publication has 13 references indexed in Scilit:
- PRESIDIO. ACM Transactions on Storage, 2011
- Efficient Deduplication Techniques for Modern Backup Operation. IEEE Transactions on Computers, 2010
- Characterizing datasets for data deduplication in backup applications. Published by Institute of Electrical and Electronics Engineers (IEEE), 2010
- The design of a similarity based deduplication system. Published by Association for Computing Machinery (ACM), 2009
- Algorithms for Delta Compression and Remote File Synchronization. Published by Elsevier BV, 2003
- A low-bandwidth network file system. Published by Association for Computing Machinery (ACM), 2001
- Delta algorithms. ACM Transactions on Software Engineering and Methodology, 1998
- Potential benefits of delta encoding and data compression for HTTP. Published by Association for Computing Machinery (ACM), 1997
- Efficient distributed backup with delta compression. Published by Association for Computing Machinery (ACM), 1997
- Copy detection mechanisms for digital documents. Published by Association for Computing Machinery (ACM), 1995