Double Probability Integral Transform Residuals for Regression Models with Discrete Outcomes

Lu Yang

doi:10.1080/10618600.2024.2303336

Statistics (Twin Cities)

Research output: Contribution to journal › Article › peer-review

Abstract

The assessment of regression models with discrete outcomes is challenging and has many fundamental issues. With discrete outcomes, standard regression model assessment tools such as Pearson and deviance residuals do not follow the conventional reference distribution (normal) under the true model, calling into question the legitimacy of model assessment based on these tools. To fill this gap, we construct a new type of residuals for regression models with general discrete outcomes, including ordinal and count outcomes. The proposed residuals are based on two layers of probability integral transformation. When at least one continuous covariate is available, the proposed residuals closely follow a uniform distribution (or a normal distribution after transformation) under the correctly specified model. One can construct visualizations such as QQ plots to check the overall fit of a model straightforwardly, and the shape of QQ plots can further help identify possible causes of misspecification such as overdispersion. We provide theoretical justification for the proposed residuals by establishing their asymptotic properties. Moreover, in order to assess the mean structure and identify potential covariates, we develop an ordered curve as a supplementary tool, which is based on the comparison between the partial sum of outcomes and of fitted means. Through simulation, we demonstrate empirically that the proposed tools outperform commonly used residuals for various model assessment tasks. We also illustrate the workflow of model assessment using the proposed tools in data analysis. Supplementary materials for this article are available online.
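The abstract does not spell out the construction, so the following is only a minimal sketch of one way to read "two layers of probability integral transformation". It rests on assumptions that are not taken from this page: a Poisson outcome with known means mu_i = exp(0.5 + 0.8 x_i) (in practice fitted means would be plugged in), a first layer s_i = F(y_i | x_i), and a second layer that evaluates a leave-one-out estimate of P(F(Y | X) <= s) at s_i. All function names are hypothetical, and this is not the author's code.

```python
import math
import random


def poisson_cdf(k, mu):
    """Return P(Y <= k) for Y ~ Poisson(mu) and integer k >= 0."""
    term = math.exp(-mu)          # P(Y = 0)
    total = term
    for j in range(1, k + 1):
        term *= mu / j            # P(Y = j) obtained from P(Y = j - 1)
        total += term
    return min(total, 1.0)


def prob_pit_at_most(s, mu):
    """Return P(F(Y | mu) <= s), i.e. the largest Poisson(mu) CDF value <= s (0 if none)."""
    if s >= 1.0:
        return 1.0
    term = math.exp(-mu)
    cdf, below, k = term, 0.0, 0
    while cdf <= s and term > 0.0:
        below = cdf               # F(k | mu) is still <= s
        k += 1
        term *= mu / k
        cdf += term
    return below


def double_pit_residuals(y, mu):
    """Second-layer transform of the first-layer values F(y_i | mu_i) (illustrative)."""
    n = len(y)
    first_layer = [poisson_cdf(yi, mi) for yi, mi in zip(y, mu)]
    residuals = []
    for i in range(n):
        # Leave-one-out average over the other observations' conditional distributions.
        total = sum(prob_pit_at_most(first_layer[i], mu[j]) for j in range(n) if j != i)
        residuals.append(total / (n - 1))
    return residuals


def sample_poisson(mu, rng):
    """Draw one Poisson(mu) variate with Knuth's multiplication method."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


# Simulated data with one continuous covariate (assumed setup, true means used).
rng = random.Random(0)
n = 300
x = [rng.random() for _ in range(n)]
mu = [math.exp(0.5 + 0.8 * xi) for xi in x]
y = [sample_poisson(m, rng) for m in mu]

residuals = double_pit_residuals(y, mu)
print(min(residuals), max(residuals), sum(residuals) / n)
```

If the model is correctly specified, these values should look roughly uniform on [0, 1], so sorting them and plotting against uniform quantiles gives the kind of QQ plot the abstract mentions.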

Original language	English (US)
Journal	Journal of Computational and Graphical Statistics
DOIs	https://doi.org/10.1080/10618600.2024.2303336
State	Accepted/In press - 2024

Bibliographical note

Publisher Copyright:
© 2024 American Statistical Association and Institute of Mathematical Statistics.

Keywords

Generalized linear models
Goodness-of-fit
Model diagnostics

Cite this

@article{655f5b8bb2714a3db726c1ad26ba9840,

title = "Double Probability Integral Transform Residuals for Regression Models with Discrete Outcomes",

abstract = "The assessment of regression models with discrete outcomes is challenging and has many fundamental issues. With discrete outcomes, standard regression model assessment tools such as Pearson and deviance residuals do not follow the conventional reference distribution (normal) under the true model, calling into question the legitimacy of model assessment based on these tools. To fill this gap, we construct a new type of residuals for regression models with general discrete outcomes, including ordinal and count outcomes. The proposed residuals are based on two layers of probability integral transformation. When at least one continuous covariate is available, the proposed residuals closely follow a uniform distribution (or a normal distribution after transformation) under the correctly specified model. One can construct visualizations such as QQ plots to check the overall fit of a model straightforwardly, and the shape of QQ plots can further help identify possible causes of misspecification such as overdispersion. We provide theoretical justification for the proposed residuals by establishing their asymptotic properties. Moreover, in order to assess the mean structure and identify potential covariates, we develop an ordered curve as a supplementary tool, which is based on the comparison between the partial sum of outcomes and of fitted means. Through simulation, we demonstrate empirically that the proposed tools outperform commonly used residuals for various model assessment tasks. We also illustrate the workflow of model assessment using the proposed tools in data analysis. Supplementary materials for this article are available online.",

keywords = "Generalized linear models, Goodness-of-fit, Model diagnostics",

author = "Lu Yang",

note = "Publisher Copyright: {\textcopyright} 2024 American Statistical Association and Institute of Mathematical Statistics.",

year = "2024",

doi = "10.1080/10618600.2024.2303336",

language = "English (US)",

journal = "Journal of Computational and Graphical Statistics",

issn = "1061-8600",

publisher = "American Statistical Association",

}

TY - JOUR

T1 - Double Probability Integral Transform Residuals for Regression Models with Discrete Outcomes

AU - Yang, Lu

PY - 2024

Y1 - 2024

N2 - The assessment of regression models with discrete outcomes is challenging and has many fundamental issues. With discrete outcomes, standard regression model assessment tools such as Pearson and deviance residuals do not follow the conventional reference distribution (normal) under the true model, calling into question the legitimacy of model assessment based on these tools. To fill this gap, we construct a new type of residuals for regression models with general discrete outcomes, including ordinal and count outcomes. The proposed residuals are based on two layers of probability integral transformation. When at least one continuous covariate is available, the proposed residuals closely follow a uniform distribution (or a normal distribution after transformation) under the correctly specified model. One can construct visualizations such as QQ plots to check the overall fit of a model straightforwardly, and the shape of QQ plots can further help identify possible causes of misspecification such as overdispersion. We provide theoretical justification for the proposed residuals by establishing their asymptotic properties. Moreover, in order to assess the mean structure and identify potential covariates, we develop an ordered curve as a supplementary tool, which is based on the comparison between the partial sum of outcomes and of fitted means. Through simulation, we demonstrate empirically that the proposed tools outperform commonly used residuals for various model assessment tasks. We also illustrate the workflow of model assessment using the proposed tools in data analysis. Supplementary materials for this article are available online.

AB - The assessment of regression models with discrete outcomes is challenging and has many fundamental issues. With discrete outcomes, standard regression model assessment tools such as Pearson and deviance residuals do not follow the conventional reference distribution (normal) under the true model, calling into question the legitimacy of model assessment based on these tools. To fill this gap, we construct a new type of residuals for regression models with general discrete outcomes, including ordinal and count outcomes. The proposed residuals are based on two layers of probability integral transformation. When at least one continuous covariate is available, the proposed residuals closely follow a uniform distribution (or a normal distribution after transformation) under the correctly specified model. One can construct visualizations such as QQ plots to check the overall fit of a model straightforwardly, and the shape of QQ plots can further help identify possible causes of misspecification such as overdispersion. We provide theoretical justification for the proposed residuals by establishing their asymptotic properties. Moreover, in order to assess the mean structure and identify potential covariates, we develop an ordered curve as a supplementary tool, which is based on the comparison between the partial sum of outcomes and of fitted means. Through simulation, we demonstrate empirically that the proposed tools outperform commonly used residuals for various model assessment tasks. We also illustrate the workflow of model assessment using the proposed tools in data analysis. Supplementary materials for this article are available online.

KW - Generalized linear models

KW - Goodness-of-fit

KW - Model diagnostics

UR - http://www.scopus.com/inward/record.url?scp=85185504453&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85185504453&partnerID=8YFLogxK

U2 - 10.1080/10618600.2024.2303336

DO - 10.1080/10618600.2024.2303336

M3 - Article

AN - SCOPUS:85185504453

SN - 1061-8600

JO - Journal of Computational and Graphical Statistics

JF - Journal of Computational and Graphical Statistics

ER -