Statistical inference with large-scale trait imputation

Jingchen Ren; Wei Pan

doi:10.1002/sim.9975

Biostatistics

Research output: Contribution to journal › Article › peer-review

Abstract

Recently, a nonparametric method called LS-imputation was proposed for large-scale trait imputation based on a GWAS summary dataset and a large set of genotyped individuals. The imputed trait values, along with the genotypes, can be treated as an individual-level dataset for downstream genetic analyses, including those that cannot be done with GWAS summary data. However, since the covariance matrix of the imputed trait values is often too large to calculate, the current method imposes a working assumption that the imputed trait values are independent and identically distributed, which does not hold in practice. Here we propose a “divide and conquer/combine” strategy to estimate and account for the covariance matrix of the imputed trait values via batches, thus relaxing the incorrect working assumption. Applications of the methods to the UK Biobank data for marginal association analysis showed some improvement by the new method in some cases, but overall the original method performed well, which was explained by the nearly constant variances of, and mostly weak correlations among, the imputed trait values.
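As a rough illustration of what accounting for a covariance matrix "via batches" can look like, the sketch below fits a marginal association slope by generalized least squares while treating the covariance as block-diagonal across batches. This is not the authors' estimator: the function name `batched_gls`, the no-intercept model, the equal batch size, and the choice to ignore correlations between batches are all assumptions made for the example, and the per-batch covariance matrices are simply passed in because the abstract does not say how they are estimated.

```python
import numpy as np


def batched_gls(x, y, covs, batch_size):
    """GLS slope of y on x (no intercept) under a block-diagonal covariance.

    x, y       : 1-D arrays (genotype of one SNP, imputed trait values)
    covs       : list of per-batch covariance matrices (assumed given)
    batch_size : number of individuals per batch (hypothetical parameter)
    """
    num = 0.0  # running sum of x_b' Sigma_b^{-1} y_b
    den = 0.0  # running sum of x_b' Sigma_b^{-1} x_b
    for b, start in enumerate(range(0, len(y), batch_size)):
        xb = x[start:start + batch_size]
        yb = y[start:start + batch_size]
        # Solve Sigma_b w = x_b instead of forming the inverse explicitly.
        w = np.linalg.solve(covs[b], xb)
        num += w @ yb
        den += w @ xb
    beta = num / den
    se = 1.0 / np.sqrt(den)  # standard error if the covs are the true covariances
    return beta, se


# Toy usage: three batches of four individuals with identity covariances,
# in which case the estimate reduces to ordinary least squares.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=12)
y_demo = 2.0 * x_demo + rng.normal(scale=0.1, size=12)
beta_demo, se_demo = batched_gls(x_demo, y_demo, [np.eye(4)] * 3, 4)
```

Only one batch-sized matrix is ever factorized at a time, which is the practical appeal of a batch approach when the full covariance matrix is too large to compute.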

Original language	English (US)
Pages (from-to)	625-641
Number of pages	17
Journal	Statistics in Medicine
Volume	43
Issue number	4
DOIs	https://doi.org/10.1002/sim.9975
State	Published - Feb 20 2024

Bibliographical note

Publisher Copyright:
© 2023 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

Keywords

GWAS
LS-imputation
SNP
least squares
linear models

PubMed: MeSH publication types

Journal Article

Cite this

@article{2d90d5d0d4f842e2a6dbf37ad43f30fe,

title = "Statistical inference with large-scale trait imputation",

keywords = "GWAS, LS-imputation, SNP, least squares, linear models",

author = "Jingchen Ren and Wei Pan",

note = "Publisher Copyright: {\textcopyright} 2023 The Authors. Statistics in Medicine published by John Wiley \& Sons Ltd.",

year = "2024",

month = feb,

day = "20",

doi = "10.1002/sim.9975",

language = "English (US)",

volume = "43",

pages = "625--641",

journal = "Statistics in Medicine",

issn = "0277-6715",

publisher = "John Wiley and Sons Ltd",

number = "4",

}

TY - JOUR

T1 - Statistical inference with large-scale trait imputation

AU - Ren, Jingchen

AU - Pan, Wei

PY - 2024/2/20

Y1 - 2024/2/20

AB - Recently a nonparametric method called LS-imputation has been proposed for large-scale trait imputation based on a GWAS summary dataset and a large set of genotyped individuals. The imputed trait values, along with the genotypes, can be treated as an individual-level dataset for downstream genetic analyses, including those that cannot be done with GWAS summary data. However, since the covariance matrix of the imputed trait values is often too large to calculate, the current method imposes a working assumption that the imputed trait values are identically and independently distributed, which is incorrect in truth. Here we propose a “divide and conquer/combine” strategy to estimate and account for the covariance matrix of the imputed trait values via batches, thus relaxing the incorrect working assumption. Applications of the methods to the UK Biobank data for marginal association analysis showed some improvement by the new method in some cases, but overall the original method performed well, which was explained by nearly constant variances of and mostly weak correlations among imputed trait values.

KW - GWAS

KW - LS-imputation

KW - SNP

KW - least squares

KW - linear models

UR - http://www.scopus.com/inward/record.url?scp=85178457575&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85178457575&partnerID=8YFLogxK

U2 - 10.1002/sim.9975

DO - 10.1002/sim.9975

M3 - Article

C2 - 38038193

AN - SCOPUS:85178457575

SN - 0277-6715

VL - 43

SP - 625

EP - 641

JO - Statistics in Medicine

JF - Statistics in Medicine

IS - 4

ER -
