Identifying Biomarkers from Multi-source, Multi-way Data

Project: Research project

Project Details

Description

Project Summary: In medical research, a growing number of high-content platforms and technologies are used to measure diverse but related information. Examples include sequencing of the genome, epigenome, transcriptome and translatome, metabolite profiling, and imaging modalities. Moreover, data from the same high-content platform are often measured over multiple dimensions, such as multiple tissues, body regions, or developmental time points. We refer to data measured over multiple platforms or technologies as multi-source, and data measured over multiple dimensions as multi-way. Many modern biomedical studies collect data that are both multi-source and multi-way, meaning multi-way data are collected from multiple platforms. Multi-source multi-way data have enormous potential to capture and synthesize every facet of a complex biological system. However, to date little methodology has been developed for fully integrative analysis of such data. We will focus on developing methods to identify biomarkers for a clinical outcome from multi-source multi-way data. Biomarkers are often used as a surrogate for disease progression or as an endpoint for clinical trials, and so their precision in capturing a given medical phenomenon is crucial. We propose to develop new composite biomarker methods that identify patterns across multiple sources of data, and multiple dimensions, that are associated with a clinical outcome. Our central hypothesis is that a fully integrated and multivariate approach will yield more precise biomarkers and simplify their interpretation. The novel product of this project will be a suite of methods extending common biomarker tasks to the multi-source multi-way context, including dimension reduction (Aim 1a), missing value imputation (Aim 1b), high-dimensional prediction (Aim 2) and dependent hypothesis testing (Aim 3).
This work is motivated by our involvement in several ongoing collaborative translational projects with rich multi-source multi-way data, including biomarker discovery for the development of lung cancer in chronic obstructive pulmonary disease patients, for the progression of neurodegenerative disorders such as Friedreich's Ataxia, and for brain iron deficiency in infants. We will apply and rigorously assess our multi-source multi-way approaches on these applications. All methods will be implemented in free, open-source and easily accessible software to facilitate their use by other researchers and practitioners.
Status: Finished
Effective start/end date: 3/1/19 to 11/30/23

Funding

  • National Institute of General Medical Sciences: $269,197.00
  • National Institute of General Medical Sciences: $268,956.00
  • National Institute of General Medical Sciences: $268,710.00
  • National Institute of General Medical Sciences: $299,367.00
  • National Institute of General Medical Sciences: $299,618.00
