A permutation test approach to the choice of size k for the nearest neighbors classifier

Yinglei Lai, Baolin Wu, Hongyu Zhao

Research output: Contribution to journal › Article › peer-review

Abstract

The k nearest neighbors (k-NN) classifier is one of the most popular methods for statistical pattern recognition and machine learning. In practice, the size k, the number of neighbors used for classification, is usually set arbitrarily to one or some other small number, or chosen by a cross-validation procedure. In this study, we propose a novel alternative approach to choosing the size k. Based on a k-NN-based multivariate multi-sample test, we assign each k a permutation-test-based Z-score. The number of nearest neighbors is set to the k with the highest Z-score. This approach is computationally efficient because we have derived formulas for the mean and variance of the test statistic under the permutation distribution for multiple sample groups. Several simulated and real-world data sets are analyzed to investigate the performance of our approach. The usefulness of our approach is demonstrated by evaluating prediction accuracies when the Z-score is used as the criterion for selecting the size k. We also compare our approach to the widely used cross-validation approaches. The results show that the size k selected by our approach yields high prediction accuracies when informative features are used for classification, whereas the cross-validation approach may fail in some cases.
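The selection rule described in the abstract can be illustrated with a minimal sketch. The abstract does not state the test statistic or the closed-form mean and variance, so the code below makes two assumptions: the statistic is taken to be the number of (point, neighbor) pairs that share a group label, and the permutation mean and variance are estimated by Monte Carlo label shuffling rather than by the authors' derived formulas. All function names (`knn_indices`, `same_label_count`, `select_k_by_z`) are hypothetical.

```python
import numpy as np


def knn_indices(X, k_max):
    """Return, for each row of X, the indices of its k_max nearest rows (self excluded)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbor
    return np.argsort(d, axis=1)[:, :k_max]


def same_label_count(nn, y, k):
    """Count (point, neighbor) pairs among the first k neighbors that share a label.

    Assumed test statistic; the abstract does not spell out the exact form.
    """
    return int(np.sum(y[nn[:, :k]] == y[:, None]))


def select_k_by_z(X, y, k_values, n_perm=500, seed=0):
    """Pick the k whose statistic has the highest permutation Z-score.

    The null mean and standard deviation are Monte Carlo estimates here,
    standing in for the closed-form expressions mentioned in the abstract.
    """
    rng = np.random.default_rng(seed)
    nn = knn_indices(X, max(k_values))  # neighbors depend on X only, so compute once
    observed = np.array([same_label_count(nn, y, k) for k in k_values], dtype=float)

    null = np.empty((n_perm, len(k_values)))
    for b in range(n_perm):
        y_perm = rng.permutation(y)  # shuffle group labels, keep the geometry fixed
        null[b] = [same_label_count(nn, y_perm, k) for k in k_values]

    z = (observed - null.mean(axis=0)) / null.std(axis=0, ddof=1)
    best_k = k_values[int(np.argmax(z))]
    return best_k, dict(zip(k_values, z))


# Usage on two simulated Gaussian groups (illustrative data, not from the paper).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])
y = np.repeat([0, 1], 30)
best_k, z_scores = select_k_by_z(X, y, [1, 3, 5, 7], n_perm=200)
print(best_k, z_scores)
```

Because the neighbor lists depend only on the features, they are computed once and reused across permutations; only the labels are shuffled. Replacing the Monte Carlo loop with analytic moments, as the abstract describes, would remove the dependence on `n_perm`.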

Original language: English (US)
Pages (from-to): 2289-2302
Number of pages: 14
Journal: Journal of Applied Statistics
Volume: 38
Issue number: 10
State: Published - Oct 2011

Bibliographical note

Funding Information:
We greatly appreciate the careful reading and helpful comments from the editor and two anonymous reviewers. Y.L. was supported in part by a start-up fund from GWU/CCAS and the NIH grant DK-75004. H.Z. was supported in part by the NIH grant GM59507 and the NSF grant DMS0714817.

Keywords

  • Cross-validation
  • Nearest neighbors classifier
  • Number of neighbors
  • Permutation test
  • Prediction accuracy
