Multicomposite nonconvex optimization for training deep neural networks

Ying Cui, Ziyu He, Jong Shi Pang

Research output: Contribution to journalArticlepeer-review

12 Scopus citations

Abstract

We present in this paper a novel deterministic algorithmic framework that enables the computation of a directional stationary solution of the empirical deep neural network training problem formulated as a multicomposite optimization problem with coupled nonconvexity and nondifferentiability. This is the first time to our knowledge that such a sharp kind of stationary solution is provably computable for a nonsmooth deep neural network. Allowing for arbitrary finite numbers of input samples and training layers, an arbitrary number of neurons within each layer, and arbitrary piecewise activation functions, the proposed approach combines the methods of exact penalization, majorization-minimization, gradient projection with enhancements, and the dual semismooth Newton method, each for a particular purpose in an overall computational scheme. While a routine implementation of the semismooth Newton method would be computationally expensive, we show that careful linear algebraic implementation helps to greatly reduce the computational and storage costs for problems of arbitrary dimensions. Contrary to existing stochastic approaches which provide at best very weak guarantees on the computed solutions obtained in practical implementation, our rigorous deterministic treatment provides guarantee of the stationarity properties of the computed solutions with reference to the optimization problems being solved. Numerical results from a MATLAB implementation demonstrate the effectiveness of the framework for solving reasonably sized networks with a modest number of training samples (in the low thousands).

Original languageEnglish (US)
Pages (from-to)1693-1723
Number of pages31
JournalSIAM Journal on Optimization
Volume30
Issue number2
DOIs
StatePublished - 2020

Bibliographical note

Funding Information:
The authors are grateful to Defeng Sun at the Hong Kong Polytechnic University and Kim-Chuan Toh at the National University of Singapore for discussions on this paper. They are particularly indebted to Dr. Toh for his review of our MATLAB codes for accuracy and possible improvements. We also thank two referees for constructive comments that have improved the presentation of the paper.

Funding Information:
\ast Received by the editors December 10, 2018; accepted for publication (in revised form) April 17, 2020; published electronically June 18, 2020. https://doi.org/10.1137/18M1231559 \bfF \bfu \bfn \bfd \bfi \bfn \bfg : This work was based on research supported by the National Science Foundation under grant IIS-1632971 and by the Air Force Office of Scientific Research under grant FA9550-18-1-0382. \dagger Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, MN 55414 (yingcui@umn.edu). \ddagger Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA 90089-0193 (ziyuhe@usc.edu, jongship@usc.edu).

Publisher Copyright:
© 2020 Society for Industrial and Applied Mathematics.

Keywords

  • Deep neural network
  • Exact penalization
  • Majorization-minimization
  • Nonconvexity
  • Nondifferentiablity
  • Semismooth Newton method

Fingerprint

Dive into the research topics of 'Multicomposite nonconvex optimization for training deep neural networks'. Together they form a unique fingerprint.

Cite this