TDDFS: A tier-aware data deduplication-based file system

Zhichao Cao, Hao Wen, Xiongzi Ge, Jingwei Ma, Jim Diehl, David H.C. Du

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

With the rapid increase in the amount of data produced and the development of new types of storage devices, storage tiering continues to be a popular way to achieve a good tradeoff between performance and cost-effectiveness. In a basic two-tier storage system, a higher-performance and typically higher-cost tier (the fast tier) stores frequently accessed (active) data, while a large amount of less-active data are stored in a lower-performance, low-cost tier (the slow tier). Data are migrated between the two tiers according to their activity. In this article, we propose a Tier-aware Data Deduplication-based File System, called TDDFS, which can operate efficiently on top of a two-tier storage environment. Specifically, to achieve better performance, nearly all file operations are performed in the fast tier. To achieve higher cost-effectiveness, files that are no longer active are migrated from the fast tier to the slow tier, and this migration is done with data deduplication. What distinguishes our design is that it keeps the non-redundant (unique) chunks produced by data deduplication in both tiers when possible. When a file is reloaded from the slow tier to the fast tier (a reloaded file), any of its data chunks that already exist in the fast tier do not need to be migrated from the slow tier. Our evaluation shows that TDDFS achieves close to the best overall performance among various file-tiering designs for two-tier storage systems.
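The core idea above (demotion is deduplicated, and a reload copies only the chunks the fast tier lacks) can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not the TDDFS implementation: it assumes fixed-size 4-byte chunks, SHA-1 fingerprints, and reference-counted fast-tier chunks that are freed once no active file uses them. The class and method names (`TieredDedupStore`, `write`, `migrate`, `reload`, `read`) are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # assumption: tiny fixed-size chunks keep the example readable


def split_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Yield (fingerprint, chunk) pairs using fixed-size chunking and SHA-1."""
    for i in range(0, len(data), size):
        chunk = data[i:i + size]
        yield hashlib.sha1(chunk).hexdigest(), chunk


class TieredDedupStore:
    """Hypothetical in-memory model of a two-tier store that deduplicates on migration."""

    def __init__(self):
        self.fast_chunks = {}   # fingerprint -> chunk resident in the fast tier
        self.fast_refs = {}     # fingerprint -> number of fast-tier references to it
        self.slow_chunks = {}   # fingerprint -> unique chunk stored in the slow tier
        self.fast_files = {}    # name -> recipe (ordered fingerprints) of active files
        self.slow_files = {}    # name -> recipe of migrated (inactive) files
        self.chunks_copied = 0  # chunk transfers between tiers, in either direction

    def _add_fast_ref(self, fp):
        self.fast_refs[fp] = self.fast_refs.get(fp, 0) + 1

    def write(self, name, data):
        """Create a new active file in the fast tier (assumes the name is unused)."""
        recipe = []
        for fp, chunk in split_chunks(data):
            self.fast_chunks.setdefault(fp, chunk)
            self._add_fast_ref(fp)
            recipe.append(fp)
        self.fast_files[name] = recipe

    def migrate(self, name):
        """Demote an inactive file; only chunks the slow tier lacks are copied down."""
        recipe = self.fast_files.pop(name)
        for fp in recipe:
            if fp not in self.slow_chunks:
                self.slow_chunks[fp] = self.fast_chunks[fp]
                self.chunks_copied += 1
            self.fast_refs[fp] -= 1
            if self.fast_refs[fp] == 0:
                # Assumption: a fast-tier chunk is freed once no active file uses it;
                # chunks shared with active files stay resident in both tiers.
                del self.fast_refs[fp]
                del self.fast_chunks[fp]
        self.slow_files[name] = recipe

    def reload(self, name):
        """Promote a file; chunks already in the fast tier are not copied again."""
        recipe = self.slow_files.pop(name)
        for fp in recipe:
            if fp not in self.fast_chunks:
                self.fast_chunks[fp] = self.slow_chunks[fp]
                self.chunks_copied += 1
            self._add_fast_ref(fp)
        self.fast_files[name] = recipe

    def read(self, name):
        """Reassemble an active file from its fast-tier chunks."""
        return b"".join(self.fast_chunks[fp] for fp in self.fast_files[name])


store = TieredDedupStore()
store.write("a.txt", b"AAAABBBBCCCC")
store.write("b.txt", b"AAAABBBBDDDD")  # shares AAAA and BBBB with a.txt
store.migrate("a.txt")                 # copies AAAA, BBBB, CCCC down
print(store.chunks_copied)             # 3
store.reload("a.txt")                  # only CCCC must come back up
print(store.chunks_copied)             # 4
print(store.read("a.txt"))             # b'AAAABBBBCCCC'
```

In this run, reloading `a.txt` transfers only the `CCCC` chunk, because `AAAA` and `BBBB` are still held in the fast tier for the active file `b.txt`; a tiering design without deduplication would copy all three chunks back. The sketch ignores capacity limits and never garbage-collects slow-tier chunks, so the "when possible" part of the design (keeping chunks in the fast tier only while space allows) is not modeled.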

Original language: English (US)
Article number: 4
Journal: ACM Transactions on Storage
Volume: 15
Issue number: 1
DOIs: https://doi.org/10.1145/3295461
State: Published - Feb 2019

Bibliographical note

Funding Information:
This work was partially supported by NSF awards 1421913, 1439622, 1525617, and 1812537.

Authors’ addresses: Z. Cao, H. Wen, J. Diehl, and D. H. C. Du, University of Minnesota, Twin Cities, 4-192 Keller Hall, 200 Union Street SE, Minneapolis, MN, 55455; emails: {caoxx380, wenxx159, jdiehl, du}@umn.edu; X. Ge, NetApp, 7301 Kit Creek Road, Research Triangle Park, NC, 27709; email: gexx132@umn.edu; J. Ma, Baidu Online Network Technology Co., Ltd., Beijing, China; email: mjwtom@gmail.com.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2019 Association for Computing Machinery. 1553-3077/2019/02-ART4 $15.00 https://doi.org/10.1145/3295461

Publisher Copyright:
© 2019 Association for Computing Machinery.

Keywords

  • Data deduplication
  • Data migration
  • File system
  • Tiered storage