Sliding look-back window assisted data chunk rewriting for improving deduplication restore performance

Zhichao Cao, Shiyong Liu, Fenggang Wu, Guohua Wang, Bingzhe Li, David H.C. Du

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

28 Scopus citations

Abstract

Data deduplication is an effective way of improving storage space utilization. The data generated by deduplication is persistently stored in data chunks or data containers (a container consists of a few hundred or a few thousand data chunks). The data restore process is rather slow due to data fragmentation and read amplification. To speed up the restore process, data chunk rewrite schemes (a rewrite stores a duplicate data chunk again) have been proposed to improve data chunk locality and reduce the number of container reads needed to restore the original data. However, rewrites decrease the deduplication ratio, since more storage space is used to store the duplicate data chunks. To remedy this, we focus on reducing the data fragmentation and read amplification of container-based deduplication systems. We first propose a flexible container referenced count based rewrite scheme, which makes a better tradeoff between the deduplication ratio and the number of required container reads than capping, an existing rewrite scheme. To further improve the accuracy of rewrite candidate selection, we propose a sliding look-back window based design, which makes more accurate rewrite decisions by considering the caching effect, data chunk localities, and data chunk closeness in the current and future windows. According to our evaluation, our proposed approach always achieves a higher restore performance than capping, especially when the reduction in deduplication ratio is small.
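
The abstract contrasts two ways of choosing which duplicate chunks to rewrite. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation: the segment representation, the function names, and the fixed `min_refs` threshold are all hypothetical, and the paper's flexible scheme and sliding look-back window involve details that the abstract does not give. Capping is shown in its commonly described form, where each segment of the backup stream may reference at most `cap` old containers and duplicates that point to the remaining containers are rewritten. The threshold variant instead rewrites duplicates whose container is referenced fewer than `min_refs` times within the segment.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical representation: a chunk is (fingerprint, container_id).
# container_id is None for a unique chunk, which is written anyway and
# is therefore never a rewrite candidate.
Chunk = Tuple[str, Optional[int]]


def count_container_refs(segment: List[Chunk]) -> Counter:
    """Count how many duplicate chunks in the segment point to each old container."""
    return Counter(cid for _, cid in segment if cid is not None)


def select_rewrites_capping(segment: List[Chunk], cap: int) -> List[str]:
    """Capping: keep the `cap` most-referenced containers, rewrite the rest."""
    refs = count_container_refs(segment)
    # Ties are broken by container ID so the result is deterministic.
    ranked = sorted(refs, key=lambda cid: (-refs[cid], cid))
    keep = set(ranked[:cap])
    return [fp for fp, cid in segment if cid is not None and cid not in keep]


def select_rewrites_threshold(segment: List[Chunk], min_refs: int) -> List[str]:
    """Assumed reference-count rule: rewrite duplicates whose container is
    referenced fewer than `min_refs` times in this segment."""
    refs = count_container_refs(segment)
    return [fp for fp, cid in segment
            if cid is not None and refs[cid] < min_refs]


if __name__ == "__main__":
    segment = [("a", 1), ("b", 1), ("c", 1), ("d", 2),
               ("e", 3), ("f", 3), ("g", None)]
    print(select_rewrites_capping(segment, cap=2))
    print(select_rewrites_threshold(segment, min_refs=2))
```

In this toy segment, container 1 is referenced three times, container 3 twice, and container 2 once, so both rules with the parameters shown pick only chunk `d` for rewriting. The difference appears when segments vary: a fixed cap forces rewrites in a segment that touches many well-used containers, whereas a reference-count threshold rewrites only chunks whose containers contribute little to the segment.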

Original language: English (US)
Title of host publication: Proceedings of the 17th USENIX Conference on File and Storage Technologies, FAST 2019
Publisher: USENIX Association
Pages: 129-142
Number of pages: 14
ISBN (Electronic): 9781939133090
State: Published - 2019
Event: 17th USENIX Conference on File and Storage Technologies, FAST 2019 - Boston, United States
Duration: Feb 25 2019 to Feb 28 2019

Bibliographical note

Funding Information:
We thank all the members in CRIS group for providing the useful comments to improve our design. We would like to thank our shepherd, Keith Smith, for his useful comments, suggestions, and help in the paper revision. This work was partially supported by NSF awards 1421913, 1439622, 1525617, and 1812537.
