Leap-based Content Defined Chunking — Theory and Implementation
- 1 May 2015
- conference paper
- Published by Institute of Electrical and Electronics Engineers (IEEE)
Abstract
Content Defined Chunking (CDC) is an important component of data deduplication, affecting both the deduplication ratio and deduplication performance. The sliding-window-based CDC algorithm and its variants have been the most popular CDC algorithms for the last 15 years. However, their performance is limited in certain application scenarios because they have to slide the window byte by byte. The authors present a leap-based CDC algorithm that significantly improves deduplication performance without compromising the deduplication ratio. Compared to the sliding-window-based CDC algorithm, the new algorithm delivers up to a two-fold improvement in performance.
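To make the byte-by-byte cost concrete, the sliding-window baseline that the abstract refers to can be sketched roughly as follows. This is a minimal illustration of generic sliding-window CDC, not the authors' leap-based algorithm, whose details are not given here. The polynomial rolling hash and every constant (`WINDOW`, `MASK`, `MIN_SIZE`, `MAX_SIZE`, `BASE`, `MOD`) are assumptions chosen for readability rather than values taken from the paper.

```python
from typing import List

# All constants below are illustrative assumptions, not values from the paper.
WINDOW = 16            # number of bytes covered by the rolling hash
MASK = (1 << 6) - 1    # cut when the low 6 bits of the hash are zero
MIN_SIZE = 32          # smallest chunk allowed (except the final one)
MAX_SIZE = 256         # a cut is forced once a chunk reaches this size
BASE = 257             # polynomial base for the rolling hash
MOD = (1 << 31) - 1    # modulus for the rolling hash
POW = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window


def chunk_boundaries(data: bytes) -> List[int]:
    """Return the end offset of every chunk, visiting the input one byte at a time."""
    boundaries = []
    start = 0   # offset where the current chunk begins
    h = 0       # rolling hash of the last (up to) WINDOW bytes of this chunk
    for i, byte in enumerate(data):
        # Slide the window by exactly one byte: drop the oldest, add the newest.
        if i - start >= WINDOW:
            h = (h - data[i - WINDOW] * POW) % MOD
        h = (h * BASE + byte) % MOD

        size = i - start + 1
        hash_cut = size >= MIN_SIZE and (h & MASK) == 0
        if hash_cut or size >= MAX_SIZE:
            boundaries.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        boundaries.append(len(data))  # trailing chunk, possibly below MIN_SIZE
    return boundaries


def split(data: bytes) -> List[bytes]:
    """Cut data into chunks at the offsets chosen by chunk_boundaries."""
    chunks = []
    prev = 0
    for end in chunk_boundaries(data):
        chunks.append(data[prev:end])
        prev = end
    return chunks
```

Because a cut depends on the content inside the window rather than on absolute offsets, chunk boundaries tend to realign after an insertion or deletion, which is what lets deduplication find the unchanged chunks again. The loop also shows the cost the abstract points to: the hash is updated and tested once per input byte, and that per-byte work is what a leap-based approach aims to reduce.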