AE: An Asymmetric Extremum content defined chunking algorithm for fast and bandwidth-efficient data deduplication
- 1 April 2015
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
- p. 1337-1345
- https://doi.org/10.1109/infocom.2015.7218510
Abstract
Data deduplication, a space-efficient and bandwidth-saving technology, plays an important role in bandwidth-efficient data transmission in various data-intensive network and cloud applications. Rabin-based and MAXP-based Content-Defined Chunking (CDC) algorithms, while robust in finding suitable cut-points for chunk-level redundancy elimination, face the key challenges of (1) low chunking throughput that renders the chunking stage the deduplication performance bottleneck and (2) large chunk-size variance that decreases deduplication efficiency. To address these challenges, this paper proposes a new CDC algorithm called the Asymmetric Extremum (AE) algorithm. The main idea behind AE is based on the observation that the extreme value in an asymmetric local range is not likely to be replaced by a new extreme value in dealing with the boundaries-shift problem, which motivates AE's use of an asymmetric (rather than symmetric, as in MAXP) local range to identify cut-points and simultaneously achieve high chunking throughput and low chunk-size variance. As a result, AE simultaneously addresses the problems of low chunking throughput in MAXP and Rabin and high chunk-size variance in Rabin. The experimental results based on four real-world datasets show that AE improves the throughput performance of the state-of-the-art CDC algorithms by 3x while attaining comparable or higher deduplication efficiency.
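The abstract describes the asymmetric local range only in words, so the following is a minimal Python sketch of one way to read it, not the authors' implementation. It makes several assumptions that the abstract does not state: the "value" at each position is a single byte, the extremum is a maximum, ties do not replace the current maximum, and the fixed-size range to the right of the maximum is a hypothetical parameter named `window`. The function names `ae_chunk_boundaries` and `split_chunks` are also illustrative.

```python
from typing import List


def ae_chunk_boundaries(data: bytes, window: int) -> List[int]:
    """Return the exclusive end offset of every chunk in `data`.

    Assumed rule: scan forward from the chunk start, tracking the largest
    byte seen so far. The range to the left of that maximum is variable
    (everything since the chunk start); the range to the right is fixed at
    `window` bytes. Once `window` bytes pass without a strictly larger
    byte, the chunk is cut right after the current position.
    """
    if window < 1:
        raise ValueError("window must be at least 1")

    boundaries: List[int] = []
    n = len(data)
    start = 0
    while start < n:
        max_val = data[start]
        max_pos = start
        cut = n  # default: the tail of the input forms the last chunk
        for i in range(start + 1, n):
            if data[i] > max_val:
                # New extreme value: the fixed right-hand range restarts here.
                max_val = data[i]
                max_pos = i
            elif i == max_pos + window:
                # The maximum survived a full right-hand window: cut here.
                cut = i + 1
                break
        boundaries.append(cut)
        start = cut
    return boundaries


def split_chunks(data: bytes, window: int) -> List[bytes]:
    """Slice `data` into chunks using the boundaries computed above."""
    chunks: List[bytes] = []
    start = 0
    for end in ae_chunk_boundaries(data, window):
        chunks.append(data[start:end])
        start = end
    return chunks
```

Under these assumptions each byte needs a single comparison and no rolling hash, which is consistent with the throughput motivation in the abstract. In a small hand-checked case, inserting one byte at the front of `bytes([1, 5, 2, 3, 4, 9, 1, 1, 1])` with `window=2` changes only the first chunk, and the later chunks realign with the originals.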