Confidence sets for model selection by F-testing

Davide Ferrari; Yuhong Yang

doi:10.5705/ss.2014.110

Statistics (Twin Cities)

Research output: Contribution to journal › Article › peer-review

23 Scopus citations

Abstract

We introduce the notion of variable selection confidence set (VSCS) for linear regression based on F-testing. Our method identifies the most important variables in a principled way that goes beyond simply trusting the single winner based on a model selection criterion. The VSCS extends the usual notion of confidence intervals to the variable selection problem: A VSCS is a set of regression models that contains the true model with a given level of confidence. Although the size of the VSCS properly reflects the model selection uncertainty, without specific assumptions on the true model, the VSCS is typically rather large (unless the number of predictors is small). As a solution, we advocate special attention to the set of lower boundary models (LBMs), which are the most parsimonious models not statistically significantly inferior to the full model at a given confidence level. Based on the LBMs, variable importance and measures of co-appearance importance of predictors can be naturally defined.
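The construction described in the abstract can be illustrated with a minimal sketch. It assumes Gaussian errors, an intercept in every model, and an exhaustive search over subsets (feasible only for a handful of predictors). The function names `rss`, `vscs`, and `lower_boundary`, the simulated data, and the reading of "most parsimonious" as "no retained proper subset" are assumptions made for illustration, not the authors' implementation.

```python
from itertools import combinations

import numpy as np
from scipy import stats


def rss(X, y, cols):
    """Residual sum of squares of OLS on an intercept plus the given columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def vscs(X, y, alpha=0.05):
    """Submodels that an F-test against the full model does not reject."""
    n, p = X.shape
    full = tuple(range(p))
    rss_full = rss(X, y, full)
    df_full = n - p - 1  # residual degrees of freedom of the full model
    kept = [full]  # the full model is always retained
    for k in range(p):
        for cols in combinations(range(p), k):
            q = p - k  # number of coefficients the submodel sets to zero
            f_stat = ((rss(X, y, cols) - rss_full) / q) / (rss_full / df_full)
            if stats.f.sf(f_stat, q, df_full) > alpha:
                kept.append(cols)
    return kept


def lower_boundary(models):
    """Retained models with no retained proper subset (assumed reading of LBM)."""
    sets = [frozenset(m) for m in models]
    return [tuple(sorted(s)) for s in sets if not any(t < s for t in sets)]


# Hypothetical simulated data: only predictors 0 and 2 affect the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=200)

kept = vscs(X, y, alpha=0.05)
lbms = lower_boundary(kept)
print("models in the confidence set:", kept)
print("lower boundary models:", lbms)
```

With strong signals such as these, every retained model should include predictors 0 and 2, while the weaker predictors appear in some retained models and not others. Counting how often each predictor appears across the lower boundary models is one simple way to approximate the variable importance idea the abstract mentions.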

Original language	English (US)
Pages (from-to)	1637-1658
Number of pages	22
Journal	Statistica Sinica
Volume	25
Issue number	4
DOIs	https://doi.org/10.5705/ss.2014.110
State	Published - Oct 2015

Bibliographical note

Funding Information:
We sincerely thank the two reviewers and the AE for their very helpful comments and suggestions for improving our work. In particular, the reference of Hansen, Lunde and Nason (2011) that they brought to our attention for comparison and discussion is appreciated. The work of Yuhong Yang was partially supported by the NSF Grant DMS-1106576.

Publisher Copyright:
© 2015, Institute of Statistical Science. All rights reserved.

Keywords

Confidence set
Linear regression
Model selection
Variable selection

Cite this

@article{8488ee9deddb47fc99d3cf0a457ffa83,

title = "Confidence sets for model selection by {F}-testing",

abstract = "We introduce the notion of variable selection confidence set (VSCS) for linear regression based on F-testing. Our method identifies the most important variables in a principled way that goes beyond simply trusting the single winner based on a model selection criterion. The VSCS extends the usual notion of confidence intervals to the variable selection problem: A VSCS is a set of regression models that contains the true model with a given level of confidence. Although the size of the VSCS properly reflects the model selection uncertainty, without specific assumptions on the true model, the VSCS is typically rather large (unless the number of predictors is small). As a solution, we advocate special attention to the set of lower boundary models (LBMs), which are the most parsimonious models not statistically significantly inferior to the full model at a given confidence level. Based on the LBMs, variable importance and measures of co-appearance importance of predictors can be naturally defined.",

keywords = "Confidence set, Linear regression, Model selection, Variable selection",

author = "Davide Ferrari and Yuhong Yang",

note = "Funding Information: We sincerely thank the two reviewers and the AE for their very helpful comments and suggestions for improving our work. In particular, the reference of Hansen, Lunde and Nason (2011) that they brought to our attention for comparison and discussion is appreciated. The work of Yuhong Yang was partially supported by the NSF Grant DMS-1106576. Publisher Copyright: {\textcopyright} 2015, Institute of Statistical Science. All rights reserved.",

year = "2015",

month = oct,

doi = "10.5705/ss.2014.110",

language = "English (US)",

volume = "25",

pages = "1637--1658",

journal = "Statistica Sinica",

issn = "1017-0405",

publisher = "Institute of Statistical Science",

number = "4",

}

TY - JOUR

T1 - Confidence sets for model selection by F-testing

AU - Ferrari, Davide

AU - Yang, Yuhong

N1 - Funding Information: We sincerely thank the two reviewers and the AE for their very helpful comments and suggestions for improving our work. In particular, the reference of Hansen, Lunde and Nason (2011) that they brought to our attention for comparison and discussion is appreciated. The work of Yuhong Yang was partially supported by the NSF Grant DMS-1106576. Publisher Copyright: © 2015, Institute of Statistical Science. All rights reserved.

PY - 2015/10

Y1 - 2015/10

N2 - We introduce the notion of variable selection confidence set (VSCS) for linear regression based on F-testing. Our method identifies the most important variables in a principled way that goes beyond simply trusting the single winner based on a model selection criterion. The VSCS extends the usual notion of confidence intervals to the variable selection problem: A VSCS is a set of regression models that contains the true model with a given level of confidence. Although the size of the VSCS properly reflects the model selection uncertainty, without specific assumptions on the true model, the VSCS is typically rather large (unless the number of predictors is small). As a solution, we advocate special attention to the set of lower boundary models (LBMs), which are the most parsimonious models not statistically significantly inferior to the full model at a given confidence level. Based on the LBMs, variable importance and measures of co-appearance importance of predictors can be naturally defined.

AB - We introduce the notion of variable selection confidence set (VSCS) for linear regression based on F-testing. Our method identifies the most important variables in a principled way that goes beyond simply trusting the single winner based on a model selection criterion. The VSCS extends the usual notion of confidence intervals to the variable selection problem: A VSCS is a set of regression models that contains the true model with a given level of confidence. Although the size of the VSCS properly reflects the model selection uncertainty, without specific assumptions on the true model, the VSCS is typically rather large (unless the number of predictors is small). As a solution, we advocate special attention to the set of lower boundary models (LBMs), which are the most parsimonious models not statistically significantly inferior to the full model at a given confidence level. Based on the LBMs, variable importance and measures of co-appearance importance of predictors can be naturally defined.

KW - Confidence set

KW - Linear regression

KW - Model selection

KW - Variable selection

UR - http://www.scopus.com/inward/record.url?scp=84994845343&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84994845343&partnerID=8YFLogxK

U2 - 10.5705/ss.2014.110

DO - 10.5705/ss.2014.110

M3 - Article

AN - SCOPUS:84994845343

SN - 1017-0405

VL - 25

SP - 1637

EP - 1658

JO - Statistica Sinica

JF - Statistica Sinica

IS - 4

ER -
