Model Selection Diagnostics and Localized Model Selection/Combination

Project: Research project

Project Details

Description

Although research in the last decade has raised general awareness of how serious the statistical uncertainty due to model selection can be, much more effort is needed to reform the still-dominant practice of basing all statistical conclusions on a final selected model. From a methodological standpoint, a critical component missing from the toolbox of model selection and model combination is model selection diagnostics (as distinct from model diagnostics). The PI seeks model selection diagnostic methods that go beyond simple bootstrap uncertainty measures. These methods will address the uncertainty in variable selection and in the estimation of a quantity of interest by taking into account the distances between subsets of variables or between estimates from the candidate models. For high-dimensional or complex data, it is very likely that different candidate procedures perform best in different regions, especially when very distinct learning methods are considered. This calls for localized model selection/combination methodology and theory, which is the second major component of this project. The PI takes new approaches and derives oracle inequalities on the performance of the new methods for localized model selection or combination.

Statistical methods have become an essential ingredient in all applied sciences. Proper quantification of the true uncertainty in mathematical descriptions of natural and social phenomena is fundamentally important for drawing unbiased and accurate conclusions. Since model selection and model combination play a central role in statistical analysis, the proposed work on accurately measuring model selection uncertainty, and the resulting better tools for model selection and model combination, together with other research in the area, is expected to contribute substantially to changing the currently unsound practices of statistical model selection in the applied sciences. The improved use of data in information extraction will have broader impacts on scientific research, policy, and decision making.

Status: Finished
Effective start/end date: 6/1/07 to 5/31/11

Funding

  • National Science Foundation: $197,482.00
