Improving duplicate elimination in storage systems
- 1 November 2006
- journal article
- Published by Association for Computing Machinery (ACM) in ACM Transactions on Storage
- Vol. 2 (4), 424-448
- https://doi.org/10.1145/1210596.1210599
Abstract
Minimizing the amount of data that must be stored and managed is a key goal for any storage architecture that purports to be scalable. One way to achieve this goal is to avoid maintaining duplicate copies of the same data. Eliminating redundant data at the source by not writing data which has already been stored not only reduces storage overheads, but can also improve bandwidth utilization. For these reasons, in the face of today's exponentially growing data volumes, redundant data elimination techniques have assumed critical significance in the design of modern storage systems.

Intelligent object partitioning techniques identify data that is new when objects are updated, and transfer only these chunks to a storage server. In this article, we propose a new object partitioning technique, called fingerdiff, that improves upon existing schemes in several important respects. Most notably, fingerdiff dynamically chooses a partitioning strategy for a data object based on its similarities with previously stored objects in order to improve storage and bandwidth utilization. We present a detailed evaluation of fingerdiff, and other existing object partitioning schemes, using a set of real-world workloads. We show that for these workloads, the duplicate elimination strategies employed by fingerdiff improve storage utilization on average by 25%, and bandwidth utilization on average by 40%, over comparable techniques.
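The object partitioning the abstract refers to is typically content-defined chunking: boundaries are chosen from the data itself (via a rolling hash) rather than at fixed offsets, so an insertion early in an object shifts only nearby chunk boundaries and unchanged regions still produce identical, deduplicatable chunks. The sketch below illustrates that baseline idea only, not the fingerdiff algorithm itself; the hash parameters, window width, and boundary mask are illustrative assumptions.

```python
# Minimal sketch of content-defined chunking (CDC), the class of
# object-partitioning techniques the article builds on. This is NOT
# fingerdiff itself; all constants below are illustrative assumptions.

HASH_BASE = 257        # polynomial rolling-hash base (assumption)
HASH_MOD = 1 << 31     # hash modulus (assumption)
WINDOW = 16            # rolling-window width in bytes (assumption)
BOUNDARY_MASK = 0x3FF  # ~1 KiB expected chunk size (assumption)


def chunk(data: bytes) -> list[bytes]:
    """Split `data` at content-defined boundaries, so identical regions
    chunk identically even if data elsewhere is inserted or deleted."""
    chunks: list[bytes] = []
    start = 0
    h = 0
    pow_w = pow(HASH_BASE, WINDOW, HASH_MOD)  # coefficient of the byte leaving the window
    for i, b in enumerate(data):
        # slide the polynomial hash: add the new byte, drop the oldest one
        h = (h * HASH_BASE + b) % HASH_MOD
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * pow_w) % HASH_MOD
        # declare a boundary wherever the window hash matches the mask
        if i >= WINDOW and (h & BOUNDARY_MASK) == BOUNDARY_MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks
```

A deduplicating store would hash each resulting chunk (e.g. with SHA-1) and write only chunks whose hashes it has not seen before; fingerdiff's contribution, per the abstract, is to vary the partitioning granularity per object based on its similarity to previously stored objects.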
This publication has 10 references indexed in Scilit:
- Search and replication in unstructured peer-to-peer networks. ACM, 2002
- Compactly encoding unstructured inputs with differential compression. Journal of the ACM, 2002
- A low-bandwidth network file system. ACM, 2001
- OceanStore. ACM, 2000
- Delta algorithms. ACM Transactions on Software Engineering and Methodology, 1998
- Data compression. ACM Computing Surveys, 1987
- RCS — a system for version control. Software: Practice and Experience, 1985
- The string-to-string correction problem with block moves. ACM Transactions on Computer Systems, 1984
- A universal algorithm for sequential data compression. IEEE Transactions on Information Theory, 1977
- The source code control system. IEEE Transactions on Software Engineering, 1975