Some statistical consideration in transcriptome-wide association studies

for the Alzheimer's Disease Neuroimaging Initiative

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

The methodology of transcriptome-wide association studies (TWAS) has become popular for integrating a reference expression quantitative trait loci (eQTL) data set with an independent main GWAS data set to identify (putatively) causal genes, shedding mechanistic insight into the biological pathways from genetic variants to a GWAS trait mediated by gene expression. Statistically, TWAS is a (two-sample) two-stage least squares (2SLS) method in the framework of instrumental variables analysis for causal inference: in Stage 1 it uses the reference eQTL data to impute a gene's expression for the main GWAS data; then in Stage 2 it tests for association between the imputed gene expression and the GWAS trait. If an association is detected in Stage 2, a (putatively) causal relationship between the gene and the GWAS trait is claimed. If a nonlinear model or a generalized linear model (GLM) is fitted in Stage 2 (e.g., for a binary GWAS trait), it is known that using only the imputed gene expression, as in standard TWAS, does not in general yield a consistent (i.e., asymptotically unbiased) estimate of the causal effect; accordingly, a variation of 2SLS, called two-stage residual inclusion (2SRI), has been proposed to yield better estimates (e.g., estimates that are consistent under suitable conditions). Our main goal is to investigate whether it is necessary, or even better, to apply 2SRI instead of the standard 2SLS. In addition, because imputed gene expression is used (i.e., expression measured with error), it is known that in general some correction to the standard error of the causal effect estimate has to be applied, whereas the standard TWAS applies no correction. Is this an issue? We also compare one-sample 2SLS with two-sample 2SLS (i.e., the standard TWAS). We used the Alzheimer's Disease Neuroimaging Initiative (ADNI) data and simulated data mimicking the ADNI data to address these questions.
In the end, we conclude that in practice, with the large sample sizes and small effect sizes of genetic variants, the standard TWAS performs well and is recommended.
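The two estimators contrasted in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code, and it assumes: simulated genotypes rather than ADNI data; a linear Stage 1 fitted by ordinary least squares (real TWAS tools often use penalized models such as elastic net); a logistic Stage 2 for a binary trait; an unmeasured confounder `U`; and hypothetical effect sizes. All names (`ols`, `logistic`, `add_intercept`, `simulate`, `W`, `BETA`) are hypothetical helpers introduced only for illustration. The Stage-2 standard errors are the naive Wald ones, with no correction for Stage-1 estimation error, mirroring standard TWAS as described above.

```python
import numpy as np

rng = np.random.default_rng(2020)
N_SNPS, MAF = 10, 0.3
W = rng.normal(0.0, 0.3, size=N_SNPS)  # hypothetical true eQTL (SNP -> expression) effects
BETA = 0.3                             # hypothetical causal log-odds effect of expression


def add_intercept(*cols):
    """Stack an intercept column with the given 1-D or 2-D covariates."""
    return np.column_stack([np.ones(len(cols[0]))] + list(cols))


def ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression via Newton-Raphson; returns (coef, naive Wald SEs)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        info = X.T @ (X * (p * (1 - p))[:, None])  # Fisher information
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1 - p))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


def simulate(n):
    """Genotypes G, expression X and binary trait Y, confounded by U (hypothetical model)."""
    G = rng.binomial(2, MAF, size=(n, N_SNPS)).astype(float)
    U = rng.normal(size=n)                   # unmeasured confounder
    X = G @ W + U + rng.normal(size=n)       # gene expression
    eta = -1.0 + BETA * X + 0.8 * U          # log-odds of the binary trait
    Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)
    return G, X, Y


# Two-sample 2SLS (standard TWAS): weights from a reference eQTL sample,
# imputed expression tested against the trait in a separate GWAS sample.
G_ref, X_ref, _ = simulate(1_000)
G_gw, _, Y_gw = simulate(20_000)                 # expression not used in the GWAS sample
w_hat = ols(add_intercept(G_ref), X_ref)         # Stage 1
X_imp = add_intercept(G_gw) @ w_hat              # imputed expression
coef, se = logistic(add_intercept(X_imp), Y_gw)  # Stage 2
z_twas = coef[1] / se[1]                         # naive Wald z, no Stage-1 correction

# One-sample setting, where expression and trait are observed together,
# so the Stage-1 residual needed by 2SRI is available.
G1, X1, Y1 = simulate(20_000)
X1_hat = add_intercept(G1) @ ols(add_intercept(G1), X1)
resid = X1 - X1_hat
b_2sls, _ = logistic(add_intercept(X1_hat), Y1)        # 2SLS: imputed expression only
b_2sri, _ = logistic(add_intercept(X1, resid), Y1)     # 2SRI: expression plus residual

print(f"two-sample TWAS: beta={coef[1]:.3f}, z={z_twas:.2f}")
print(f"one-sample 2SLS: beta={b_2sls[1]:.3f}; 2SRI: beta={b_2sri[1]:.3f}; true={BETA}")
```

Because 2SRI needs the Stage-1 residual, it requires expression and trait measured on the same individuals, which is why it appears only in the one-sample part of the sketch. Since `X1 = X1_hat + resid`, fitting the 2SRI model with `X1_hat` in place of `X1` gives the same coefficient for expression; the two designs are reparameterizations of each other.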

Original language: English (US)
Pages (from-to): 221-232
Number of pages: 12
Journal: Genetic Epidemiology
Volume: 44
Issue number: 3
State: Published - Apr 1 2020

Bibliographical note

Funding Information:
We thank the reviewers for many helpful comments and suggestions. WP would like to thank Dr. Todd MacKenzie for first introducing 2SRI to him. This study was supported by NIH grants R21AG057038, R01HL116720, R01GM113250, R01GM126002, and R01HL105397, and by the Minnesota Supercomputing Institute at the University of Minnesota.

Publisher Copyright:
© 2019 Wiley Periodicals, Inc.

Keywords

  • 2SLS
  • 2SPS
  • 2SRI
  • Mendelian randomization
  • TWAS
  • causal inference
  • instrumental variables