PERFORMANCE ASSESSMENT OF HIGH-DIMENSIONAL VARIABLE IDENTIFICATION

Yanjia Yu, Yi Yang, Yuhong Yang

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Because model selection is ubiquitous in data analysis, the reproducibility of statistical results requires that we be able to evaluate the reliability of the employed model selection method, regardless of the model’s apparent good properties. Instability measures have been proposed for evaluating model selection uncertainty. However, low instability does not necessarily indicate that the selected model is trustworthy, because low instability can also arise when a method tends to select an overly parsimonious model. F- and G-measures have become increasingly popular for assessing variable selection performance in theoretical studies and simulation results. However, they are not computable in practice. In this work, we propose an estimation method for F- and G-measures and prove their desirable properties of uniform consistency. This gives the data analyst a valuable tool to compare different variable selection methods based on the data at hand. Extensive simulations are conducted to show the very good finite-sample performance of our approach. Lastly, we apply our methods to several microarray gene expression data sets, with intriguing results.
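
As a rough illustration of the quantities the abstract refers to, the sketch below computes F- and G-measures for a selected variable set in a simulation setting where the true support is known. It assumes the usual definitions (F as the harmonic mean and G as the geometric mean of precision and recall); the function names and the toy index sets are hypothetical, and this is not the estimation method proposed in the paper, which targets the case where the true support is unknown.

```python
from math import sqrt


def precision_recall(selected, true_support):
    """Precision and recall of a selected variable set against a known true support."""
    selected, true_support = set(selected), set(true_support)
    true_positives = len(selected & true_support)
    # Convention (an assumption here): an empty set yields 0 rather than an error.
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(true_support) if true_support else 0.0
    return precision, recall


def f_measure(selected, true_support):
    """Harmonic mean of precision and recall."""
    p, r = precision_recall(selected, true_support)
    return 2 * p * r / (p + r) if p + r > 0 else 0.0


def g_measure(selected, true_support):
    """Geometric mean of precision and recall."""
    p, r = precision_recall(selected, true_support)
    return sqrt(p * r)


# Hypothetical toy example: variables are identified by integer index.
true_support = {1, 2, 5}
selected = {1, 2, 3, 4}   # two true variables found, two false inclusions
sparse = {1}              # an overly parsimonious selection

f_sel, g_sel = f_measure(selected, true_support), g_measure(selected, true_support)
f_sparse, g_sparse = f_measure(sparse, true_support), g_measure(sparse, true_support)
```

Both measures require `true_support`, which is available in simulations but not for real data; that is the sense in which the abstract calls them not computable in practice. The `sparse` case also shows why stability alone is not enough: a method that always returns one true variable is perfectly stable and perfectly precise, yet its low recall pulls both measures well below 1.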

Original language: English (US)
Pages (from-to): 695-718
Number of pages: 24
Journal: Statistica Sinica
Volume: 32
Issue number: 2
State: Published - Apr 2022

Bibliographical note

Funding Information:
The authors thank the editor, an associate editor, and two anonymous referees for their helpful comments and suggestions. The work of Yi Yang was partially supported by NSERC RGPIN-2016-05174 and FRQNT NC-205972.

Publisher Copyright:
© 2022 Institute of Statistical Science. All rights reserved.

Keywords

  • F-measure
  • G-measure
  • gene expression
  • model averaging
  • reproducibility
  • variable selection performance
