Abstract
Spectral graph convolutional neural networks (GCNs) have been proposed to incorporate important information contained in graphs such as gene networks. In a standard spectral GCN, there is only one gene network to describe the relationships among genes. However, for genomic applications, due to condition- or tissue-specific gene function and regulation, multiple gene networks may be available; it is unclear how to apply GCNs to disease classification with multiple networks. Moreover, which gene networks provide more effective prior information for a given learning task is unknown a priori and is often not straightforward to discover. A deep multiple graph convolutional neural network is therefore developed here to meet this challenge. The new approach not only computes a feature of a gene as the weighted average of those of itself and its neighbors through spectral GCNs, but also extracts features from gene-specific expression (or other feature) profiles via a feed-forward neural network (FNN). We also provide two measures, the importance of a given gene and the relative importance score of each gene network, to quantify the contributions of the genes and the gene networks, respectively, to the learning task. To evaluate the new method, we conduct real data analyses using several breast cancer and diffuse large B-cell lymphoma datasets, incorporating multiple gene networks obtained from “GIANT 2.0.” Compared with the standard FNN, GCN, and random forest, the new method not only yields high classification accuracy but also prioritizes the most important genes confirmed to be highly associated with cancer, strongly suggesting the usefulness of the new method in incorporating multiple gene networks.
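The idea of computing a gene's feature as a weighted average over itself and its neighbors, and then combining several gene networks, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the row normalization stands in for the spectral filter, the softmax mixing of networks is an assumed stand-in for the relative importance scores, and the names `row_normalized_adjacency`, `multi_graph_average`, and `network_logits` are hypothetical.

```python
import numpy as np

def row_normalized_adjacency(adj):
    """Return D^{-1}(A + I): each row averages a gene with its neighbors."""
    a_hat = adj + np.eye(adj.shape[0])   # self-loops keep the gene's own feature
    deg = a_hat.sum(axis=1)              # >= 1 because of the self-loops
    return a_hat / deg[:, None]          # every row now sums to 1

def multi_graph_average(x, adjs, network_logits):
    """Average features over each network, then mix networks by softmax weights.

    The softmax weights are an illustrative assumption, not the paper's
    definition of the relative importance score.
    """
    w = np.exp(network_logits - network_logits.max())
    w = w / w.sum()
    out = sum(w_k * (row_normalized_adjacency(a) @ x) for w_k, a in zip(w, adjs))
    return out, w

# Toy example: 3 genes, one expression feature, two hypothetical networks.
adj_a = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # chain 0-1-2
adj_b = np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]], dtype=float)  # edge 0-2 only
x = np.array([[1.0], [2.0], [3.0]])

features, weights = multi_graph_average(x, [adj_a, adj_b], np.array([0.0, 0.0]))
```

With equal logits the two networks contribute equally; in a trained model the weights would be learned, and a larger weight would indicate a network that contributes more to the task.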
Original language | English (US) |
---|---|
Pages (from-to) | 5547-5564 |
Number of pages | 18 |
Journal | Statistics in Medicine |
Volume | 40 |
Issue number | 25 |
DOIs | |
State | Published - Nov 10 2021 |
Bibliographical note
Funding Information:
We are grateful to the reviewers for many constructive and insightful comments. We thank Haoran Xue in the School of Statistics at the University of Minnesota for helpful discussions. H.Y. was supported by grants from the National Natural Science Foundation for Distinguished Young Scholars of China (project number 71701223), the National Statistical Science Foundation of China (project number 2018LZ08), and the Central University of Finance and Economics Young Talents Training Support Project (QYP2014).
Publisher Copyright:
© 2021 John Wiley & Sons Ltd.
Keywords
- Laplacian
- deep learning
- feed-forward neural network
- gene expression data
- spectral graph theory
PubMed: MeSH publication types
- Journal Article
- Research Support, Non-U.S. Gov't