A Unified Analysis of AdaGrad With Weighted Aggregation and Momentum Acceleration

Li Shen, Congliang Chen, Fangyu Zou, Zequn Jie, Ju Sun, Wei Liu

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

Integrating adaptive learning rates and momentum techniques into stochastic gradient descent (SGD) leads to a large class of efficient accelerated adaptive stochastic algorithms, such as AdaGrad, RMSProp, Adam, and AccAdaGrad. Despite their effectiveness in practice, there is still a large gap in their convergence theory, especially in the difficult nonconvex stochastic setting. To fill this gap, we propose weighted AdaGrad with unified momentum, dubbed AdaUSM, whose main characteristics are that: 1) it incorporates a unified momentum scheme that covers both the heavy-ball (HB) momentum and the Nesterov accelerated gradient (NAG) momentum and 2) it adopts a novel weighted adaptive learning rate that can unify the learning rates of AdaGrad, AccAdaGrad, Adam, and RMSProp. Moreover, when we take polynomially growing weights in AdaUSM, we obtain its $\mathcal{O}(\log(T)/\sqrt{T})$ convergence rate in the nonconvex stochastic setting. We also show that the adaptive learning rates of Adam and RMSProp correspond to taking exponentially growing weights in AdaUSM, thereby providing a new perspective for understanding Adam and RMSProp. Finally, comparative experiments of AdaUSM against SGD with momentum, AdaGrad, AdaEMA, Adam, and AMSGrad on various deep learning models and datasets are also carried out.
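To make the abstract's description concrete, below is a minimal Python sketch of a weighted-AdaGrad-style step combined with a heavy-ball/Nesterov-style momentum term. The function name, its arguments, and the exact normalization are illustrative assumptions rather than the paper's AdaUSM update; what the sketch mirrors is the abstract's observation that a constant per-step weight yields an AdaGrad-style accumulator while exponentially growing weights behave like the RMSProp/Adam exponential moving average.

```python
import numpy as np


def weighted_adagrad_momentum_step(param, grad, state, lr=0.05, mu=0.9,
                                   nesterov=False, weight=1.0, eps=1e-8):
    """One illustrative step of a weighted-AdaGrad update with momentum.

    NOTE: this is a hypothetical sketch, not the paper's exact AdaUSM rule.
    - weight: per-step weight w_t on the squared gradient. A constant weight
      gives AdaGrad-style accumulation; exponentially growing weights mimic
      the RMSProp/Adam exponential moving average (per the abstract).
    - mu, nesterov: momentum parameter and the HB/NAG switch.
    """
    # Weighted accumulation of squared gradients (second-moment proxy).
    state["v"] = state.get("v", np.zeros_like(param)) + weight * grad ** 2
    state["w_sum"] = state.get("w_sum", 0.0) + weight
    denom = np.sqrt(state["v"] / state["w_sum"]) + eps

    # Momentum buffer: m_t = mu * m_{t-1} + g_t.
    state["m"] = mu * state.get("m", np.zeros_like(param)) + grad
    # Heavy ball uses m_t directly; the Nesterov variant adds a look-ahead term.
    direction = grad + mu * state["m"] if nesterov else state["m"]

    return param - lr * direction / denom


if __name__ == "__main__":
    # Toy usage: exponentially growing weights w_t = (1/beta)**t on a noisy
    # quadratic objective ||x||^2, emulating an RMSProp/Adam-like denominator.
    rng = np.random.default_rng(0)
    x, state, beta = np.array([5.0, -3.0]), {}, 0.9
    for t in range(1, 201):
        g = 2 * x + 0.01 * rng.standard_normal(2)  # noisy gradient
        x = weighted_adagrad_momentum_step(x, g, state, weight=(1.0 / beta) ** t)
    print("final iterate after 200 steps:", x)
```

Setting `weight=1.0` at every step in the same loop would instead reproduce the uniform-weight, AdaGrad-style accumulation; only the choice of weight sequence changes between the two regimes in this sketch.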

Original language: English (US)
Pages (from-to): 1-9
Number of pages: 9
Journal: IEEE Transactions on Neural Networks and Learning Systems
State: Accepted/In press - 2023

Bibliographical note

Publisher Copyright:
IEEE

Keywords

  • Adaptive learning
  • Convergence
  • Convergence rate
  • Noise measurement
  • Optimization
  • Stochastic processes
  • Weight measurement
  • nonconvex optimization
  • stochastic gradient descent

PubMed: MeSH publication types

  • Journal Article
