Exploring Gradient Oscillation in Deep Neural Network Training

Chedi Morchdi, Yi Zhou, Jie Ding, Bei Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Understanding optimization in deep learning is a fundamental problem, and recent findings have challenged the previously held belief that gradient descent stably trains deep networks. In this study, we delve deeper into the instability of gradient descent during the training of deep networks. By employing gradient descent to train various modern deep networks, we provide empirical evidence demonstrating that a significant portion of the optimization progress occurs through the utilization of oscillating gradients. These gradients exhibit a high negative correlation between adjacent iterations. Furthermore, we make the following noteworthy observations about these gradient oscillations (GO): (i) GO manifests in different training stages for networks with diverse architectures; (ii) when using a large learning rate, GO consistently emerges across all layers of the networks; and (iii) when employing a small learning rate, GO is more prominent in the input layers compared to the output layers. These discoveries indicate that GO is an inherent characteristic of training different types of neural networks and may serve as a source of inspiration for the development of novel optimizer designs.
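The gradient oscillation described above — successive gradients that are strongly negatively correlated — already appears in the simplest possible setting: gradient descent on a quadratic loss with a step size near the stability limit of the sharpest curvature direction. The sketch below is only a minimal illustration of that mechanism, not a reproduction of the paper's deep-network experiments; the loss, spectrum, and step size are all hypothetical choices made for the example.

```python
import numpy as np

# Minimal illustration (NOT the paper's setup): minimize the quadratic
# f(w) = 0.5 * w^T A w with plain gradient descent. For a step size
# eta in (1/lambda_max, 2/lambda_max), the iterate still converges along
# the sharpest eigendirection, but its sign flips every step, so
# consecutive gradients become almost perfectly negatively correlated.

eigs = np.array([1.0, 10.0, 100.0])   # hypothetical curvature spectrum
A = np.diag(eigs)
w = np.array([1.0, 1.0, 1.0])         # arbitrary starting point

eta = 1.99 / eigs.max()               # large step: oscillatory regime

def cosine(u, v):
    """Cosine similarity between two gradient vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

prev_grad = None
cosines = []
for t in range(50):
    grad = A @ w                      # gradient of the quadratic loss
    if prev_grad is not None:
        cosines.append(cosine(grad, prev_grad))
    w = w - eta * grad                # gradient descent update
    prev_grad = grad

# The sharp direction dominates the gradient and flips sign each step,
# so the cosine similarity of adjacent gradients approaches -1.
print(cosines[-1])
```

Measuring exactly this quantity — the cosine similarity (or correlation) of gradients at adjacent iterations during training — is one simple way to detect the GO phenomenon the abstract reports in modern deep networks.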

Original language: English (US)
Title of host publication: 2023 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350328141
DOIs
State: Published - 2023
Event: 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023 - Monticello, United States
Duration: Sep 26 2023 - Sep 29 2023

Publication series

Name: 2023 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023

Conference

Conference: 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023
Country/Territory: United States
City: Monticello
Period: 9/26/23 - 9/29/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

