A Unified Analysis of AdaGrad With Weighted Aggregation and Momentum Acceleration

Li Shen, Congliang Chen, Fangyu Zou, Zequn Jie, Ju Sun, Wei Liu

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Integrating adaptive learning rates and momentum techniques into stochastic gradient descent (SGD) leads to a large class of efficient accelerated adaptive stochastic algorithms, such as AdaGrad, RMSProp, Adam, and AccAdaGrad. Despite their effectiveness in practice, there is still a large gap in their convergence theory, especially in the difficult nonconvex stochastic setting. To fill this gap, we propose weighted AdaGrad with unified momentum, dubbed AdaUSM, whose main characteristics are that: 1) it incorporates a unified momentum scheme that covers both the heavy-ball (HB) momentum and the Nesterov accelerated gradient (NAG) momentum and 2) it adopts a novel weighted adaptive learning rate that can unify the learning rates of AdaGrad, AccAdaGrad, Adam, and RMSProp. Moreover, when we take polynomially growing weights in AdaUSM, we obtain its $\mathcal{O}(\log(T)/\sqrt{T})$ convergence rate in the nonconvex stochastic setting. We also show that the adaptive learning rates of Adam and RMSProp correspond to taking exponentially growing weights in AdaUSM, thereby providing a new perspective for understanding Adam and RMSProp. Finally, comparative experiments of AdaUSM against SGD with momentum, AdaGrad, AdaEMA, Adam, and AMSGrad on various deep learning models and datasets are also carried out.
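To make the abstract's description concrete, below is a minimal Python sketch of a weighted-AdaGrad-style step combined with a heavy-ball/Nesterov-style momentum term. The function name, its arguments, and the exact normalization are illustrative assumptions rather than the paper's AdaUSM update; what the sketch mirrors is the abstract's observation that a constant per-step weight yields an AdaGrad-style accumulator while exponentially growing weights behave like the RMSProp/Adam exponential moving average.

```python
import numpy as np


def weighted_adagrad_momentum_step(param, grad, state, lr=0.05, mu=0.9,
                                   nesterov=False, weight=1.0, eps=1e-8):
    """One illustrative step of a weighted-AdaGrad update with momentum.

    NOTE: this is a hypothetical sketch, not the paper's exact AdaUSM rule.
    - weight: per-step weight w_t on the squared gradient. A constant weight
      gives AdaGrad-style accumulation; exponentially growing weights mimic
      the RMSProp/Adam exponential moving average (per the abstract).
    - mu, nesterov: momentum parameter and the HB/NAG switch.
    """
    # Weighted accumulation of squared gradients (second-moment proxy).
    state["v"] = state.get("v", np.zeros_like(param)) + weight * grad ** 2
    state["w_sum"] = state.get("w_sum", 0.0) + weight
    denom = np.sqrt(state["v"] / state["w_sum"]) + eps

    # Momentum buffer: m_t = mu * m_{t-1} + g_t.
    state["m"] = mu * state.get("m", np.zeros_like(param)) + grad
    # Heavy ball uses m_t directly; the Nesterov variant adds a look-ahead term.
    direction = grad + mu * state["m"] if nesterov else state["m"]

    return param - lr * direction / denom


if __name__ == "__main__":
    # Toy usage: exponentially growing weights w_t = (1/beta)**t on a noisy
    # quadratic objective ||x||^2, emulating an RMSProp/Adam-like denominator.
    rng = np.random.default_rng(0)
    x, state, beta = np.array([5.0, -3.0]), {}, 0.9
    for t in range(1, 201):
        g = 2 * x + 0.01 * rng.standard_normal(2)  # noisy gradient
        x = weighted_adagrad_momentum_step(x, g, state, weight=(1.0 / beta) ** t)
    print("final iterate after 200 steps:", x)
```

Setting `weight=1.0` at every step in the same loop would instead reproduce the uniform-weight, AdaGrad-style accumulation; only the choice of weight sequence changes between the two regimes in this sketch.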

Original language: English (US)
Pages (from-to): 1-9
Number of pages: 9
Journal: IEEE Transactions on Neural Networks and Learning Systems
State: Accepted/In press - 2023

Bibliographical note

Publisher Copyright:
IEEE

Keywords

  • Adaptive learning
  • Convergence
  • Convergence rate
  • Noise measurement
  • Optimization
  • Stochastic processes
  • Weight measurement
  • nonconvex optimization
  • stochastic gradient descent

PubMed: MeSH publication types

  • Journal Article
