EAGER: Quantifying the error landscape of deep neural networks

  • Martiniani, Stefano (PI)

Project: Research project

Project Details

Description

The remarkable success achieved by deep learning systems across a broad range of applications can be attributed to their ability to approximate complex functions well, their aptitude for being trained efficiently, and their good performance in predicting the values of unseen inputs. This last property, known as generalization, is particularly puzzling. Deep neural networks (DNNs) trained by the optimization algorithm known as stochastic gradient descent are observed to produce models that generalize well, particularly when the number of model parameters greatly exceeds the number of samples on which the model is trained. Traditional theory fails to explain these observations, and new perspectives and means of investigation are necessary to elucidate them. To this end, statistical mechanics may provide methods and perspectives capable of addressing long-standing questions in deep learning. The energy landscape represents a common paradigm at the intersection of these fields: when training a DNN, we descend the so-called 'error landscape' toward a minimum corresponding to a particular choice of model parameters. Understanding generalization performance in DNNs amounts to understanding the interplay between the structure of the error landscape and the dynamics of the training algorithm that descends it. In particular, the concept of 'flat minima' is gaining popularity as a possible explanation for these observations, but a rigorous approach for estimating flatness is lacking. We propose to employ a new class of methods developed within statistical mechanics to answer questions concerning the structure of the error landscapes of DNNs and to identify the relationship between the probability of finding a given solution, its flatness, and its generalization performance. This line of investigation should have a significant impact on our understanding of generalization in deep learning systems, with implications for high-stakes applications such as transportation, security, and medicine.
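
As a concrete (and deliberately naive) illustration of the flatness question raised above, the sketch below trains a small overparameterized network with SGD and then probes the resulting minimum by averaging the loss increase under random Gaussian perturbations of the weights. The model, data, and perturbation-based proxy are all illustrative assumptions, not the proposal's method; indeed, the abstract's point is that such ad hoc flatness estimates lack rigor.

```python
# Minimal sketch (illustration only, not the proposal's method):
# train an overparameterized network with SGD, then probe flatness
# of the minimum via random weight perturbations.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy overparameterized regression problem: far more weights than samples.
X = torch.randn(64, 10)
y = torch.randn(64, 1)
model = nn.Sequential(nn.Linear(10, 256), nn.ReLU(), nn.Linear(256, 1))
loss_fn = nn.MSELoss()

# Descend the error landscape with plain SGD.
opt = torch.optim.SGD(model.parameters(), lr=0.05)
for _ in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

@torch.no_grad()
def flatness_proxy(model, sigma=0.01, n_probes=100):
    """Mean loss increase under isotropic Gaussian weight perturbations.

    Smaller values suggest a flatter minimum. This is a crude proxy for
    the basin-volume measures the proposal aims to compute rigorously."""
    base = loss_fn(model(X), y).item()
    params = list(model.parameters())
    originals = [p.clone() for p in params]
    total = 0.0
    for _ in range(n_probes):
        for p, p0 in zip(params, originals):
            p.copy_(p0 + sigma * torch.randn_like(p0))
        total += loss_fn(model(X), y).item() - base
    for p, p0 in zip(params, originals):  # restore trained weights
        p.copy_(p0)
    return total / n_probes

print(f"flatness proxy (mean loss rise): {flatness_proxy(model):.4e}")
```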

This proposal seeks to bring a new degree of rigor to the characterization of the error landscape of DNNs and of how the interplay between landscape structure and optimization dynamics yields generalizable solutions. As a result, we will be able to elucidate why DNNs are endowed with low estimation error (i.e., high generalization performance). Such an understanding will represent a significant step forward in the development of a theory of deep learning. We aim to do so by exploiting state-of-the-art numerical techniques to measure the volume of basins of attraction in high-dimensional parameter spaces. We will measure the distribution of basin volumes and the associated flatness as a function of the number of parameters and of the generalization performance of the network.
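
To make "volume of basins of attraction" concrete, here is a minimal sketch under a toy assumption: on a 2D double-well landscape, uniformly sampled points are relaxed by gradient descent, and the fraction converging to each minimum estimates that basin's volume. Such brute-force sampling breaks down in high-dimensional parameter spaces, which is precisely why the proposal turns to statistical-mechanics techniques.

```python
# Minimal sketch (illustration only): brute-force Monte Carlo estimate of
# basin-of-attraction volumes on a toy 2D landscape. The landscape and
# all parameters here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def grad(w):
    # Gradient of the tilted double-well loss
    # L(x, y) = (x^2 - 1)^2 + 0.5*y^2 + 0.1*x,
    # which has minima near (-1, 0) and (+1, 0).
    x, y = w
    return np.array([4.0 * x * (x**2 - 1.0) + 0.1, y])

def descend(w, lr=0.01, steps=2000):
    # Relax a starting point to its local minimum by gradient descent.
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

# Sample the box [-2, 2]^2 uniformly and attribute each point to the
# minimum it converges to; the converged fraction times the box volume
# estimates each basin's volume.
box_volume = 4.0**2
samples = rng.uniform(-2.0, 2.0, size=(5000, 2))
minima = np.array([descend(w) for w in samples])
in_left = minima[:, 0] < 0.0
print(f"basin volume near (-1, 0): {box_volume * in_left.mean():.3f}")
print(f"basin volume near (+1, 0): {box_volume * (~in_left).mean():.3f}")
```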

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Status: Finished
Effective start/end date: 9/15/21 – 7/31/22

Funding

  • National Science Foundation: $149,225.00
