Assessment of Regression Models with Noncontinuous Outcomes

Project: Research project

Project Details

Description

Regression models are widely used to synthesize researchers' knowledge about relationships between predictors and an outcome of interest, for example, the effects of treatments on mortality. However, researchers' prior information may not adequately describe the patterns in the data. The resulting model deficiency can lead to biased parameter estimates, misleading conclusions, lack of generalizability of results, and unreliable predictions, among many other detrimental consequences. Judging a model's adequacy to describe the data is thus a routine and critical task in statistics. Currently, there is a lack of valid tools for assessing regression models with noncontinuous outcomes. Noncontinuous data are found frequently in many domains of science. Examples include stages of cancer in medical research (ordinal), the number of offspring of organisms in ecology (count), and rainfall amounts in climate research (semicontinuous, with a positive probability of exactly zero corresponding to no rain). This project will fill this gap and provide a framework for model assessment with noncontinuous outcomes. The project outputs will benefit researchers and practitioners in a wide range of areas by enabling them to draw conclusions from their models with statistical confidence and find directions for model improvements. The project will integrate research into course development and graduate student mentoring and develop free software for broad dissemination.
The assessment of regression models with noncontinuous (e.g., count, binary, ordinal, and semicontinuous) outcomes is challenging and raises several fundamental issues. In this context, standard regression model assessment tools such as Pearson and deviance goodness-of-fit tests do not follow their null distributions under the true model, calling into question the legitimacy of model assessment based on these tools. In addition, existing assessment tools might have little statistical power in detecting model misspecification.
The long-term goal of this project is to establish a principled framework for assessing regression models with noncontinuous outcomes. The envisioned framework includes first an omnibus goodness-of-fit test for regression models with general discrete outcomes. This goodness-of-fit test is based on the distance between the quasi-empirical residual distribution function and its null pattern (the identity function). It can assist analysts in obtaining p-values and confidence statements on the adequacy of their models. The theoretical properties of the test statistic will be studied using empirical process theory. Extensive simulation studies and real data analysis will be conducted for evaluation. Second, the investigator will study a new type of residuals for discrete outcomes, which can help identify potential causes of misspecification and provide directions for model improvement. Third, this framework will be extended to semicontinuous outcomes, which are scarcely studied in the literature. Completion of this research will fill the gap between regression models and an informative assessment framework for noncontinuous outcomes.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
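The idea of comparing a residual distribution function with the identity function can be illustrated with a small simulation. The sketch below does not implement the project's quasi-empirical residual distribution function, whose exact definition is not given here. It swaps in a simpler, well-known device, randomized probability integral transform (PIT) residuals, which are uniform on (0, 1) when the working model is correct, so their empirical distribution function should lie close to the identity. The mean function `exp(0.5 + x)`, the zero-inflation probability, and all function names are hypothetical choices made for illustration, and the model parameters are treated as known rather than estimated.

```python
import math
import random


def poisson_cdf(k, mu):
    """Return P(Y <= k) for Y ~ Poisson(mu); 0.0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)  # P(Y = 0)
    total = term
    for j in range(1, k + 1):
        term *= mu / j  # P(Y = j) obtained from P(Y = j - 1)
        total += term
    return min(total, 1.0)


def sample_poisson(mu, rng):
    """Draw one Poisson(mu) variate by CDF inversion (adequate for small mu)."""
    u = rng.random()
    k = 0
    term = math.exp(-mu)
    cdf = term
    while u > cdf and k < 1000:  # cap guards against round-off in the far tail
        k += 1
        term *= mu / k
        cdf += term
    return k


def randomized_pit(y, mu, rng):
    """Randomized PIT residual: a uniform draw on [F(y - 1), F(y)]."""
    lower = poisson_cdf(y - 1, mu)
    upper = poisson_cdf(y, mu)
    return lower + rng.random() * (upper - lower)


def distance_to_identity(residuals):
    """Sup distance between the residuals' empirical CDF and s -> s."""
    u = sorted(residuals)
    n = len(u)
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))


def demo(n=2000, seed=1, zero_prob=0.3):
    """Return (distance for well-specified data, distance for zero-inflated data)."""
    rng = random.Random(seed)
    ok, bad = [], []
    for _ in range(n):
        x = rng.random()
        mu = math.exp(0.5 + x)  # hypothetical mean function, treated as known
        # Data that really follow the working Poisson model.
        ok.append(randomized_pit(sample_poisson(mu, rng), mu, rng))
        # Data with extra zeros that the working Poisson model ignores.
        y = 0 if rng.random() < zero_prob else sample_poisson(mu, rng)
        bad.append(randomized_pit(y, mu, rng))
    return distance_to_identity(ok), distance_to_identity(bad)


if __name__ == "__main__":
    d_ok, d_bad = demo()
    print(f"well specified: {d_ok:.3f}, zero-inflated: {d_bad:.3f}")
```

Under these assumptions the distance should be small when the data follow the working model and noticeably larger when excess zeros are ignored. In practice the parameters are estimated from the data, which changes the null distribution of such a distance; handling that properly is part of what a formal goodness-of-fit test has to address.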
Status: Active
Effective start/end date: 7/1/22 to 6/30/25

Funding

  • National Science Foundation: $203,498.00
