Exploring Gradient Oscillation in Deep Neural Network Training

Chedi Morchdi, Yi Zhou, Jie Ding, Bei Wang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Understanding optimization in deep learning is a fundamental problem, and recent findings have challenged the previously held belief that gradient descent stably trains deep networks. In this study, we delve deeper into the instability of gradient descent during the training of deep networks. By employing gradient descent to train various modern deep networks, we provide empirical evidence demonstrating that a significant portion of the optimization progress occurs through the utilization of oscillating gradients. These gradients exhibit a high negative correlation between adjacent iterations. Furthermore, we make the following noteworthy observations about these gradient oscillations (GO): (i) GO manifests in different training stages for networks with diverse architectures; (ii) when using a large learning rate, GO consistently emerges across all layers of the networks; and (iii) when employing a small learning rate, GO is more prominent in the input layers compared to the output layers. These discoveries indicate that GO is an inherent characteristic of training different types of neural networks and may serve as a source of inspiration for the development of novel optimizer designs.
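The gradient oscillation described above — successive gradients that are strongly negatively correlated — already appears in the simplest possible setting: gradient descent on a quadratic loss with a step size near the stability limit of the sharpest curvature direction. The sketch below is only a minimal illustration of that mechanism, not a reproduction of the paper's deep-network experiments; the loss, spectrum, and step size are all hypothetical choices made for the example.

```python
import numpy as np

# Minimal illustration (NOT the paper's setup): minimize the quadratic
# f(w) = 0.5 * w^T A w with plain gradient descent. For a step size
# eta in (1/lambda_max, 2/lambda_max), the iterate still converges along
# the sharpest eigendirection, but its sign flips every step, so
# consecutive gradients become almost perfectly negatively correlated.

eigs = np.array([1.0, 10.0, 100.0])   # hypothetical curvature spectrum
A = np.diag(eigs)
w = np.array([1.0, 1.0, 1.0])         # arbitrary starting point

eta = 1.99 / eigs.max()               # large step: oscillatory regime

def cosine(u, v):
    """Cosine similarity between two gradient vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

prev_grad = None
cosines = []
for t in range(50):
    grad = A @ w                      # gradient of the quadratic loss
    if prev_grad is not None:
        cosines.append(cosine(grad, prev_grad))
    w = w - eta * grad                # gradient descent update
    prev_grad = grad

# The sharp direction dominates the gradient and flips sign each step,
# so the cosine similarity of adjacent gradients approaches -1.
print(cosines[-1])
```

Measuring exactly this quantity — the cosine similarity (or correlation) of gradients at adjacent iterations during training — is one simple way to detect the GO phenomenon the abstract reports in modern deep networks.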

Original language: English (US)
Title of host publication: 2023 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350328141
DOIs
State: Published - 2023
Event: 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023 - Monticello, United States
Duration: Sep 26 2023 - Sep 29 2023

Publication series

Name: 2023 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023

Conference

Conference: 59th Annual Allerton Conference on Communication, Control, and Computing, Allerton 2023
Country/Territory: United States
City: Monticello
Period: 9/26/23 - 9/29/23

Bibliographical note

Publisher Copyright:
© 2023 IEEE.

