EAGER: Data Deduplication with Consideration of Data Chunk Frequency

Project: Research project

Project Details

Description

With the globalization of the economy, business data needs to be available twenty-four hours a day, seven days a week. Furthermore, in the event of a disaster, the data must be restored as quickly as possible to minimize the business's financial loss. Since our current Internet environment is truly distributed, data are copied and revised many times. Therefore, the data or portions of the data are highly redundant. Storing, preserving, and managing this enormous amount of digital data at a reasonable cost has become very challenging. Data de-duplication is used to support many data-driven applications in our daily life, and is widely deployed for data redundancy elimination so that huge volumes of data can be easily managed and better preserved. However, the theoretical understanding of the problem is still largely missing. In this project, the PI plans first to investigate several fundamental issues of data de-duplication and then to use these new insights to design new, more efficient algorithms for data archiving and backup. The anticipated prototype system will be open source and made available to others. The proposed project will enhance the education process by bringing input from industry, developing new courses at both undergraduate and graduate levels, and emphasizing the diversity of the student population. The efficiency of data de-duplication has a great impact on both long-term data preservation and the ease of managing the existing huge volume of digital data. Many crucial applications, from large-scale simulation and modeling to electronic patient records to preserving and managing our personal data, depend on both preservation and management, which enhances the impact of this work.

Status: Finished
Effective start/end date: 9/15/09 to 8/31/12

Funding

  • National Science Foundation: $300,000.00
