CAREER: Predicting and Mining Phenome-genome Association across Species

Project: Research project

Project Details

Description

Understanding how genetic material determines the observable characteristics (phenotypes) of an organism relies on knowledge of phenotype-gene relations. With high-throughput genomic data, it is now possible to apply computational approaches to identify associations between individual phenotypes and genes. Because the number of known phenotype-gene associations is still very limited, no computational framework has been developed to perform large-scale cross-species analysis of the association between the whole collection of phenotypes (phenome) and genes (genome). The objective of this CAREER proposal is to develop new computational methods for predicting and understanding phenome-genome association across multiple species. With the prediction tools, a biologist or disease researcher could more reliably prioritize genes whose association with phenotypes should be tested in the laboratory. The availability of the tools will greatly expedite the discovery of new associations, especially for rare phenotypes. In collaboration with oncologists and biologists, the developed methods will be applied to study phenome-genome associations for several cancer tumor phenotypes and for the growth phenotypes of Arabidopsis thaliana. The study of the plant growth phenotypes aims to identify genes that govern seedling de-etiolation and seed development. This collaboration could lead to increased seed yield and concomitant increases in the protein and oil content per seed in crop plants. The study of the ovarian cancer and lung cancer tumor phenotypes will help reveal the driving pathways of chemoresistance, and is expected to yield useful prediction tools and drug targets for the treatment of ovarian cancer and lung cancer.
The research in this proposal will deliver a web portal called Phenome-Genome Explorer, with a collection of computational tools that use known phenotype-gene associations to predict new associations and to find conserved associations and conserved modules of associations across species. The PI has a long-term commitment to teach a summer class in the BioSMART program for Minnesota high school students. He will also create a new course, Computational Phenomics and Genomics, to support two graduate programs that train students in biomedical/health informatics with knowledge of genomics and computer science. The education plan in the proposal aims to promote high school students' early interest in careers in computing science and biomedical/health informatics, and to integrate the research on phenome-genome analysis into the training of graduate students to meet workforce needs in the growing biomedical and health informatics industry, with a focus on recruiting students from minority and under-represented groups.

This proposal targets a systematic computational study of phenome-genome association in a network context. The comparative analysis across multiple species will expand the current understanding of the evolutionary relationship between phenome and genome. The proposed research focuses on 1) how to discover patterns and predict new associations by learning from the sparse connections in a large heterogeneous network composed of a phenotype network, a gene network, and their association network; and 2) how to compare multiple heterogeneous networks to find conserved patterns and modules, and to infer new associations. Both scenarios require the development of scalable new algorithms that can handle multiple large heterogeneous networks.
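
To make the heterogeneous-network setting concrete, the following is a minimal sketch rather than the project's actual method, which this description does not specify. It merges a phenotype network, a gene network, and a sparse set of known associations into one graph, then ranks unassociated genes for a phenotype using random walk with restart, a standard technique chosen here only for illustration. All node names, edge lists, function names, and parameter values are hypothetical.

```python
from collections import defaultdict


def build_heterogeneous_network(phenotype_edges, gene_edges, associations):
    """Merge three undirected edge lists into a single adjacency map."""
    adj = defaultdict(set)
    for u, v in list(phenotype_edges) + list(gene_edges) + list(associations):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def random_walk_with_restart(adj, seed, restart=0.3, iterations=100):
    """Approximate steady-state visit probabilities of a walk restarting at seed."""
    nodes = sorted(adj)
    score = {n: 0.0 for n in nodes}
    score[seed] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            # Each node spreads the non-restarting mass evenly to its neighbors.
            share = (1.0 - restart) * score[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share
        # The restarting mass always returns to the seed node.
        nxt[seed] += restart
        score = nxt
    return score


def rank_candidate_genes(adj, phenotype, genes):
    """Rank genes not yet associated with the phenotype by walk score."""
    scores = random_walk_with_restart(adj, phenotype)
    known = adj[phenotype]
    candidates = [g for g in genes if g not in known]
    return sorted(candidates, key=lambda g: scores[g], reverse=True)


# Hypothetical toy data: two similar phenotypes, a chain of four genes,
# and two known phenotype-gene associations.
phenotype_edges = [("P:p1", "P:p2")]
gene_edges = [("G:g1", "G:g2"), ("G:g2", "G:g3"), ("G:g3", "G:g4")]
associations = [("P:p1", "G:g1"), ("P:p2", "G:g2")]
genes = ["G:g1", "G:g2", "G:g3", "G:g4"]

adj = build_heterogeneous_network(phenotype_edges, gene_edges, associations)
ranking = rank_candidate_genes(adj, "P:p1", genes)

if __name__ == "__main__":
    print(ranking)
```

In this toy graph, a gene scores higher when it is close to the query phenotype through either the gene network or a similar phenotype's known associations, which is the intuition behind learning from sparse connections. A sketch like this would not scale to genome-wide networks as written; that is the gap the proposed scalable algorithms are meant to address.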

Status: Finished
Effective start/end date: 8/1/12 to 7/31/17

Funding

  • National Science Foundation: $446,508.00
