To understand double descent, we need to understand VC theory

Vladimir Cherkassky, Eng Hock Lee

Research output: Contribution to journal › Article › peer-review

Abstract

We analyze the generalization performance of over-parameterized learning methods for classification under the VC-theoretical framework. Recently, practitioners in Deep Learning discovered the ‘double descent’ phenomenon, in which large networks can perfectly fit the available training data and, at the same time, achieve good generalization on future (test) data. The current consensus view is that VC-theoretical results cannot account for the good generalization performance of Deep Learning networks. In contrast, this paper shows that double descent can be explained by VC-theoretical concepts, such as VC-dimension and Structural Risk Minimization. We also present empirical results showing that double descent generalization curves can be accurately modeled using classical VC-generalization bounds. The proposed VC-theoretical analysis enables a better understanding of generalization curves for data sets with different statistical characteristics, such as low- vs high-dimensional data and noisy data. In addition, we analyze the generalization performance of transfer learning using pre-trained Deep Learning networks.
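For readers unfamiliar with the "classical VC-generalization bounds" mentioned in the abstract, the following is a minimal sketch of one commonly cited form of Vapnik's bound for classification. It bounds the expected risk by the empirical risk plus a confidence term that grows with the VC-dimension h and shrinks with the sample size n. The function name `vc_bound`, the default confidence level `eta=0.05`, and the worst-case constants `a1=4`, `a2=2` are assumptions made for illustration; the abstract does not state which constants or settings the paper itself uses.

```python
import math


def vc_bound(emp_risk, n, h, eta=0.05, a1=4.0, a2=2.0):
    """Upper bound on expected classification risk (one textbook form of Vapnik's bound).

    emp_risk : empirical (training) error rate, in [0, 1]
    n        : number of training samples
    h        : VC-dimension of the set of functions (assumed known or estimated)
    eta      : the bound holds with probability at least 1 - eta (illustrative default)
    a1, a2   : constants; a1=4, a2=2 is the usual worst-case choice (assumption here)
    """
    if n <= 0 or h <= 0:
        raise ValueError("n and h must be positive")
    # Confidence term: eps = a1 * (h * (ln(a2 * n / h) + 1) - ln(eta / 4)) / n
    eps = a1 * (h * (math.log(a2 * n / h) + 1.0) - math.log(eta / 4.0)) / n
    if eps <= 0:
        raise ValueError("confidence term is not positive for these inputs")
    # R <= R_emp + (eps / 2) * (1 + sqrt(1 + 4 * R_emp / eps))
    return emp_risk + 0.5 * eps * (1.0 + math.sqrt(1.0 + 4.0 * emp_risk / eps))


if __name__ == "__main__":
    # With the training error held fixed, the bound loosens as h grows.
    for h in (10, 50, 100):
        print(h, round(vc_bound(emp_risk=0.05, n=1000, h=h), 3))
```

With the empirical risk held fixed, the bound increases with h for h below n and decreases with n, which is the trade-off that Structural Risk Minimization is designed to control. For large h relative to n the value can exceed 1, in which case the bound is vacuous.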

Original language: English (US)
Pages (from-to): 242-256
Number of pages: 15
Journal: Neural Networks
Volume: 169
State: Published - Jan 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2023 Elsevier Ltd

Keywords

  • Complexity control
  • Deep learning
  • Double descent
  • Structural risk minimization
  • VC-dimension
  • VC-generalization bounds
