On the Divergence of Decentralized Nonconvex Optimization

Mingyi Hong, Siliang Zeng, Junyu Zhang, Haoran Sun

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

In this work, we study a generic class of decentralized algorithms in which N agents jointly optimize the nonconvex objective function f(u) := (1/N) ∑_{i=1}^N f_i(u), while only communicating with their neighbors. This class of problems has become popular in modeling many signal processing and decentralized machine learning applications, and efficient algorithms have been proposed for this type of problem. However, most existing decentralized algorithms require that the local function gradients ∇f_i as well as the average function gradient ∇f be Lipschitz; that is, both the local Lipschitz conditions (LLC) and the global Lipschitz condition (GLC) must be satisfied. In this work, we first demonstrate the importance of these Lipschitzness assumptions for state-of-the-art decentralized algorithms. By constructing a series of examples, we show that when the LLC on the local gradients ∇f_i is not satisfied, a number of state-of-the-art decentralized algorithms diverge, even if the GLC still holds. This observation brings out a fundamental theoretical issue with existing decentralized algorithms: their convergence conditions are strictly stronger than those of centralized algorithms such as gradient descent, which requires only the GLC. Our observation raises an important open question: How should one design decentralized algorithms when the LLC, or even the GLC, is not satisfied? To address this question, we design two first-order algorithms that are capable of computing stationary solutions of the original problem under neither the LLC nor the GLC. In particular, we show that the proposed algorithms converge sublinearly to a certain ε-stationary solution, where the precise rate depends on various algorithmic and problem parameters.
In particular, if the local functions f_i are lower bounded Qth-order polynomials, then the rate becomes O(1/ε^{Q-1}) for Q ≥ 2 (where the O notation hides constants such as the dependency on the network topology). This rate is tight for the special case Q = 2, in which each f_i satisfies the LLC. To our knowledge, this is the first work to study decentralized nonconvex optimization with neither the LLC nor the GLC.
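The divergence phenomenon can be illustrated with a minimal sketch. The local functions below, f1(u) = u^4 and f2(u) = -u^4 + u^2/2, are an illustrative choice (not the paper's construction): their quartic terms cancel in the average, so the global objective f(u) = u^2/4 has a Lipschitz gradient (GLC holds), yet the local gradients are cubic and violate the LLC. Plain decentralized gradient descent (DGD) then blows up, while centralized gradient descent on f converges with the same stepsize.

```python
import numpy as np

# Illustrative 2-agent instance (hypothetical, not from the paper):
#   f1(u) = u^4            -> grad 4u^3       (not Lipschitz: no LLC)
#   f2(u) = -u^4 + u^2/2   -> grad -4u^3 + u  (not Lipschitz: no LLC)
# Average: f(u) = (f1(u) + f2(u))/2 = u^2/4 -> grad u/2 (Lipschitz: GLC holds)
def grad_f1(u):
    return 4.0 * u ** 3

def grad_f2(u):
    return -4.0 * u ** 3 + u

# DGD: mix iterates with a doubly stochastic matrix, then take local steps.
W = np.array([[0.5, 0.5],
              [0.5, 0.5]])        # complete 2-agent network
u = np.array([2.0, 0.0])          # agents start from different points
alpha = 0.1                       # a stepsize that is harmless for f itself

diverged = False
for _ in range(100):
    u = W @ u - alpha * np.array([grad_f1(u[0]), grad_f2(u[1])])
    if np.max(np.abs(u)) > 1e6:   # local cubic gradients drive a blow-up
        diverged = True
        break

# Centralized gradient descent on f(u) = u^2/4 with the same stepsize
# contracts geometrically: v <- v - alpha * (v / 2) = 0.95 * v.
v = 2.0
for _ in range(100):
    v -= alpha * 0.5 * v
```

Because the mixing step repeatedly re-injects the other agent's iterate, the uncancelled local cubic terms feed on themselves and the DGD iterates escape to infinity within a few rounds, whereas the centralized iterate `v` shrinks toward the unique stationary point at 0.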

Original language: English (US)
Pages (from-to): 2879-2908
Number of pages: 30
Journal: SIAM Journal on Optimization
Volume: 32
Issue number: 4
DOI: 10.1137/20M1353149
State: Published - Dec 2022

Bibliographical note

Funding Information:
Received by the editors July 15, 2020; accepted for publication (in revised form) June 24, 2022; published electronically November 21, 2022. https://doi.org/10.1137/20M1353149
Funding: The authors are supported in part by the NSF under grants CMMI-172775 and CIF-1910385, by the AFOSR under grant 19RT0424, and by the ARO under grant W911NF-19-1-0247.
Department of ECE, University of Minnesota Twin Cities, Minneapolis, MN 55455 USA (mhong@umn.edu, zeng0176@umn.edu, sun00111@umn.edu). Department of ISEM, National University of Singapore, 119007, Singapore (junyuz@nus.edu.sg).

Publisher Copyright:
© 2022 Society for Industrial and Applied Mathematics Publications. All rights reserved.

Keywords

  • Lipschitz continuous gradient
  • nonconvex problems
  • decentralized optimization
