Statistical power of transcriptome-wide association studies

for the Alzheimer's Disease Neuroimaging Initiative

Research output: Contribution to journal › Article › peer-review


Abstract

Transcriptome-Wide Association Studies (TWASs) have become increasingly popular for identifying genes (or other endophenotypes or exposures) associated with complex traits. In a TWAS, one first builds a predictive model for gene expression using an expression quantitative trait loci (eQTL) data set in stage 1, then tests the association between the predicted gene expression and a trait in a large, independent genome-wide association study (GWAS) data set in stage 2. However, since the eQTL sample size is usually small and the coefficient of multiple determination (i.e., R²) of the prediction model is also small for many genes, a natural question is to what extent these factors affect the statistical power of TWAS. In addition, in contrast to a standard (univariate) TWAS (UV-TWAS), which considers only a single gene at a time, multivariate TWAS (MV-TWAS) methods have recently emerged to account simultaneously for the effects of multiple genes, or for a gene's nonlinear effects. In the absence of a power analysis for these MV-TWAS methods, it is of interest to investigate whether one gains or loses power by using the newly proposed MV-TWAS instead of UV-TWAS. In this paper, we first outline a general method for sample size/power calculations for two-sample TWAS, then use real data to address these questions empirically: the Alzheimer's Disease Neuroimaging Initiative (ADNI) eQTL data and the Genotype-Tissue Expression (GTEx) eQTL data for stage 1, and the International Genomics of Alzheimer's Project Alzheimer's disease (AD) GWAS summary data and UK Biobank (UKB) individual-level data for stage 2. Our most important conclusions are the following. First, a sample size of a few thousand (~8000) would suffice in stage 1, at which point the power of TWAS would be determined more by the cis-heritability of gene expression.
Second, as in the general case of simple versus multiple regression, the power of MV-TWAS may be higher or lower than that of UV-TWAS, depending on the specific relationships among the GWAS trait and the multiple genes (or the linear and nonlinear terms of the same gene's expression), such as their correlations and effect sizes. Interestingly, several top genes with large power gains in MV-TWAS (over UV-TWAS) were known to be associated with AD (and were even more significantly associated in our data). We reached similar conclusions in an application to GTEx whole blood gene expression data and UKB GWAS data on high-density lipoprotein cholesterol. The proposed method and conclusions are expected to be useful for planning and designing future TWAS and related studies (e.g., Proteome- or Metabolome-Wide Association Studies), particularly when determining the sample sizes for the two stages.
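The abstract describes the two-stage power question and the UV- versus MV-TWAS comparison only in words. As a rough illustration, and not the paper's own method, the following sketch approximates TWAS power under simplifying assumptions that are labeled here and in the code: a standardized trait and expression; a causal effect `beta` of expression on the trait with no pleiotropy; stage-1 prediction accuracy given by the Daetwyler-type approximation n1·h²/(n1·h² + M), where M (`m_eff`) is a hypothetical effective number of independent cis-SNPs; and a stage-2 chi-square test with noncentrality n2·q²/(1 - q²), where q² is the fraction of trait variance explained by the tested term(s). The MV part covers only two predicted-expression terms, using the standard two-predictor regression formula R² = (r1² + r2² - 2ρ·r1·r2)/(1 - ρ²). All function names and parameter values are illustrative.

```python
"""Back-of-the-envelope power sketch for two-sample TWAS (hypothetical).

Assumptions, not taken from the paper: standardized trait and expression,
no pleiotropy, a Daetwyler-type stage-1 accuracy, and a stage-2 chi-square
test with noncentrality n2 * q2 / (1 - q2).
"""
import math


def _reg_lower_gamma(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return min(1.0, total * math.exp(a * math.log(x) - x - math.lgamma(a)))


def chi2_sf(x: float, df: int, ncp: float = 0.0) -> float:
    """Upper tail of a (noncentral) chi-square, written as a Poisson(ncp/2)
    mixture of central chi-squares with df + 2j degrees of freedom."""
    if ncp == 0.0:
        return 1.0 - _reg_lower_gamma(df / 2.0, x / 2.0)
    half = ncp / 2.0
    j_max = int(half + 12.0 * math.sqrt(half) + 30)
    total = 0.0
    for j in range(j_max + 1):
        log_w = -half + j * math.log(half) - math.lgamma(j + 1)
        total += math.exp(log_w) * (1.0 - _reg_lower_gamma(df / 2.0 + j, x / 2.0))
    return total


def chi2_crit(df: int, alpha: float) -> float:
    """Critical value c with P(chi2_df > c) = alpha, found by bisection."""
    lo, hi = 0.0, 10.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power(q2: float, n2: int, df: int, alpha: float) -> float:
    """Power of a df-degree-of-freedom stage-2 test whose tested terms explain
    a fraction q2 of trait variance."""
    ncp = n2 * q2 / (1.0 - q2)
    return chi2_sf(chi2_crit(df, alpha), df, ncp)


def stage1_r2(n1: int, h2_cis: float, m_eff: int) -> float:
    """Expected squared correlation between predicted and true cis-genetic
    expression; m_eff is a hypothetical number of independent cis-SNPs."""
    return n1 * h2_cis / (n1 * h2_cis + m_eff)


def uv_q2(beta: float, h2_cis: float, n1: int, m_eff: int) -> float:
    """Trait variance explained by one gene's predicted expression when the
    standardized causal effect of expression on the trait is beta."""
    return beta ** 2 * h2_cis * stage1_r2(n1, h2_cis, m_eff)


def mv_q2_two(r1: float, r2: float, rho: float) -> float:
    """R^2 of the trait regressed on two predicted-expression terms, given
    their trait correlations r1, r2 and their mutual correlation rho."""
    return (r1 ** 2 + r2 ** 2 - 2.0 * rho * r1 * r2) / (1.0 - rho ** 2)


if __name__ == "__main__":
    alpha = 0.05 / 20000  # illustrative Bonferroni threshold for ~20,000 genes
    for n1 in (500, 2000, 8000, 32000):
        q2 = uv_q2(beta=0.1, h2_cis=0.1, n1=n1, m_eff=200)
        print(f"n1={n1:>5}: UV-TWAS power {power(q2, 50000, 1, alpha):.3f}")
    # Two correlated terms: the joint 2-df test can gain or lose power.
    for r1, r2, rho in ((0.03, -0.03, 0.5), (0.03, 0.03, 0.9)):
        uv = power(r1 ** 2, 20000, 1, alpha)
        mv = power(mv_q2_two(r1, r2, rho), 20000, 2, alpha)
        print(f"r=({r1}, {r2}), rho={rho}: UV {uv:.3f} vs MV {mv:.3f}")
```

In this toy setting, UV-TWAS power rises with n1 and then levels off as the stage-1 accuracy approaches 1. The joint 2-df test gains power when the second term carries information the first does not (here, positively correlated terms with opposite-signed trait correlations), but loses power when the two terms are highly correlated and largely redundant, because the extra degree of freedom is not offset by a meaningful increase in R².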

Original language: English (US)
Pages (from-to): 572-588
Number of pages: 17
Journal: Genetic Epidemiology
Volume: 46
Issue number: 8
State: Published - Dec 2022

Bibliographical note

Funding Information:
Data collection and sharing for this project were funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI; National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH‐12‐2‐0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica Inc.; Biogen; Bristol‐Myers Squibb Company; CereSpir Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann‐La Roche Ltd., and its affiliated company Genentech Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC; Johnson & Johnson Pharmaceutical Research & Development LLC; Lumosity; Lundbeck; Merck & Co. Inc.; Meso Scale Diagnostics, LLC; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. The Genotype‐Tissue Expression (GTEx) Project was supported by the Common Fund of the Office of the Director of the National Institutes of Health, and by NCI, NHGRI, NHLBI, NIDA, NIMH, and NINDS. The data used for the analyses described in this manuscript were obtained from dbGaP Project #26511.
Access to the UKB data was approved through UKB Application #35107. This study was supported by NIH grants R01AG065636, RF1AG067924, U01AG073079, and R01HL116720, and by the MSI. We thank the reviewers for many helpful and constructive comments.

Publisher Copyright:
© 2022 The Authors. Genetic Epidemiology published by Wiley Periodicals LLC.

Keywords

  • 2SLS
  • Alzheimer's disease
  • TWAS
  • causal inference
  • sample size

PubMed: MeSH publication types

  • Journal Article
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
