Collaborative Research: CNS Core: Small: Efficient Ways to Enlarge Practical DNA Storage Capacity by Integrating Bio-Computer Technologies

Project: Research project

Project Details

Description

The world's digital data grows enormously each year and is projected to reach 175 zettabytes (ZB) by 2025. Most human activities are now recorded in digital format, but digital media do not last long: with current storage technologies and devices, valuable data cannot be preserved for long durations (beyond 15 years). The capacity of existing storage media cannot keep up with the growth of digital data, and storage devices can become obsolete within several years, leaving the stored data vulnerable to loss over time. Synthetic deoxyribonucleic acid (DNA) is therefore an attractive alternative storage medium: its high density and long durability make it a strong candidate for archival storage. However, the project's preliminary study indicates that the practical capacity of a DNA storage tube based on current technologies is only around 250 GB, far less than the expected capacity. The major reason is that primer-payload collisions can drastically reduce the number of usable primers in a tube as the data payload size increases, and primers are essential for random access to DNA data. In this project, an interdisciplinary team is formed to investigate both biological and storage approaches that can improve the scalability of DNA storage. Among the many factors that can scale up DNA storage, the project plans to investigate the following questions: 1) How can more primers be identified for a primer library to be used in DNA storage? 2) Given a primer library, how can payload data be allocated efficiently to avoid primer-payload collisions and increase DNA storage capacity? 3) How can data deduplication, a technique popular in data backup applications, be used effectively to further increase the capacity of DNA storage?
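The primer-payload collision problem described above can be illustrated with a minimal Python sketch. It is not the project's method. It assumes a simplified collision model in which a primer collides with a payload when the primer sequence appears, within a chosen number of mismatches, anywhere in the payload strand or its reverse complement. The function names, the mismatch threshold, and the four-base toy sequences are all hypothetical; real primers are longer, and real screening also considers factors such as melting temperature and secondary structure.

```python
# Illustrative sketch only: a simplified model of primer-payload collisions.
# All names, thresholds, and sequences here are assumptions for illustration.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def collides(primer: str, payload: str, max_mismatches: int = 0) -> bool:
    """True if the primer matches any window of the payload (either strand)
    with at most max_mismatches mismatches."""
    n = len(primer)
    for strand in (payload, reverse_complement(payload)):
        for i in range(len(strand) - n + 1):
            if hamming(primer, strand[i:i + n]) <= max_mismatches:
                return True
    return False


def usable_primers(primers, payloads, max_mismatches=0):
    """Keep only primers that collide with none of the stored payloads."""
    return [
        p for p in primers
        if not any(collides(p, s, max_mismatches) for s in payloads)
    ]


if __name__ == "__main__":
    library = ["ACGT", "GGCC", "TTAA"]      # toy primer library
    tube = ["AAACGTAA"]                     # one stored payload
    print(usable_primers(library, tube))    # primers still usable
    tube.append("CCGGCCAA")                 # storing more data...
    print(usable_primers(library, tube))    # ...leaves fewer usable primers
```

In this toy model, every payload added to the tube can only shrink the set of usable primers, which is the scaling effect the project aims to mitigate through larger primer libraries and collision-aware payload allocation.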
With a deep understanding of molecular biology and of computer storage technologies and systems, the interdisciplinary team will pursue several innovative ways of understanding the fundamental issues of DNA storage. It will develop the necessary genome engineering, sequencing techniques, software, and new algorithms to optimize the process of converting the world's digital data to DNA storage, so that today's valuable digital data can be archived and preserved for hundreds of years. This project moves the goal of storing the world's digital data in DNA, and thereby preserving a record of human activities, one step closer. The potential research outcomes include advancing bioscience and storage technologies, preserving human activities in DNA storage for hundreds of years, and building a fundamental understanding of DNA storage, identifying its tradeoffs, and creating efficient ways of scaling it up. The project will provide an ideal environment for interdisciplinary thinking, hands-on learning, and development, teaching graduate and undergraduate students in computer science and in electrical and computer engineering the system-building and experimental skills that are critical for today's and the future IT workforce. The research outcomes will be incorporated into the team members' classroom teaching, in both class projects and core courses in computer science and electrical and computer engineering. The team plans to include the research results in a new course on Storage Technologies/Systems for Big Data for students in a Data Science Program, as well as in undergraduate senior design and directed research studies. The team also plans to disseminate the research advances to industrial collaborators and, through publications, presentations, and public release of research data, software tools, and prototype systems, to the research community.
The team is committed to recruiting underrepresented undergraduate and graduate students to the project. Research results will be made available to the general public quickly and disseminated via websites and open-source repositories such as GitHub. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status: Active
Effective start/end date: 7/15/22 to 6/30/25

Funding

  • National Science Foundation: $300,000.00
