A feature-based intelligent deduplication compression system with extreme resemblance detection
- 21 December 2020
- journal article
- research article
- Published by Taylor & Francis Ltd in Connection Science
- Vol. 33 (3), 576-604
- https://doi.org/10.1080/09540091.2020.1862058
Abstract
With the rapid development of various computing paradigms, the amount of data is increasing quickly, which brings substantial storage overhead. However, existing data deduplication techniques do not make full use of similarity detection to improve storage efficiency and data transmission rates. In this paper, we study the problem of utilising duplicate and resemblance detection techniques to further compress data. We first present the framework of FIDCS-ERD, a feature-based intelligent deduplication compression system with extreme resemblance detection. We also introduce the main components and the detailed workflow of our compression system. We propose a content-defined chunking algorithm for duplicate detection and a Bloom filter-based resemblance detection algorithm. FIDCS-ERD implements intelligent file chunking and fast duplicate and resemblance detection. Through extensive experiments on real datasets, we demonstrate that FIDCS-ERD achieves a better compression effect and more accurate resemblance detection than existing approaches.
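The abstract names two building blocks, content-defined chunking and Bloom filter-based resemblance detection, without giving their details. The Python sketch below illustrates only the general ideas and is not the FIDCS-ERD algorithm: the function and class names (`chunk_boundaries`, `BloomFilter`, `resemblance`), the window size, the cut mask, the chunk size limits, and the filter dimensions are all assumptions chosen for illustration. For clarity the window hash is recomputed at each position rather than updated incrementally as a real rolling hash would be.

```python
import hashlib


def chunk_boundaries(data: bytes, window: int = 16, mask: int = 0x3F,
                     min_size: int = 32, max_size: int = 1024):
    """Yield chunks cut where a hash of the last `window` bytes matches `mask`.

    Cut points depend on content, not on absolute offsets, so an insertion
    near the start of a file tends to disturb only nearby chunks.
    Requires window <= min_size so the window never reaches before index 0.
    """
    start = 0
    for i in range(len(data)):
        size = i - start + 1
        if size < min_size:
            continue
        win = data[i - window + 1:i + 1]
        h = int.from_bytes(hashlib.blake2b(win, digest_size=4).digest(), "big")
        # Cut on a hash match, or force a cut once the chunk reaches max_size.
        if (h & mask) == 0 or size >= max_size:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]  # trailing remainder, possibly shorter than min_size


class BloomFilter:
    """Fixed-size Bloom filter; a Python int serves as the bit array."""

    def __init__(self, num_bits: int = 4096, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item: bytes):
        # Derive independent hash functions by salting BLAKE2b with a seed.
        for seed in range(self.num_hashes):
            d = hashlib.blake2b(item, digest_size=8, salt=bytes([seed])).digest()
            yield int.from_bytes(d, "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item: bytes) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(item))


def resemblance(a: bytes, b: bytes) -> float:
    """Fraction of b's chunk fingerprints that appear in a Bloom filter of a's.

    Bloom filters give no false negatives but may give false positives,
    so this value can overestimate the true overlap.
    """
    bf = BloomFilter()
    for chunk in chunk_boundaries(a):
        bf.add(hashlib.sha1(chunk).digest())
    chunks_b = list(chunk_boundaries(b))
    if not chunks_b:
        return 0.0
    hits = sum(hashlib.sha1(c).digest() in bf for c in chunks_b)
    return hits / len(chunks_b)
```

Calling `resemblance(original, edited)` on two versions of a file returns a score between 0 and 1. Identical inputs score 1.0, and a small local edit typically leaves the score high because chunk boundaries resynchronise after the changed region.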
Funding Information
- King Saud University (RG-1441-331)
- National Natural Science Foundation of China (41971343, 61872422, 41771411)