Multicomposite nonconvex optimization for training deep neural networks

Ying Cui; Ziyu He; Jong Shi Pang

doi:10.1137/18M1231559

Industrial and Systems Engineering

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

We present in this paper a novel deterministic algorithmic framework that enables the computation of a directional stationary solution of the empirical deep neural network training problem formulated as a multicomposite optimization problem with coupled nonconvexity and nondifferentiability. This is the first time to our knowledge that such a sharp kind of stationary solution is provably computable for a nonsmooth deep neural network. Allowing for arbitrary finite numbers of input samples and training layers, an arbitrary number of neurons within each layer, and arbitrary piecewise activation functions, the proposed approach combines the methods of exact penalization, majorization-minimization, gradient projection with enhancements, and the dual semismooth Newton method, each for a particular purpose in an overall computational scheme. While a routine implementation of the semismooth Newton method would be computationally expensive, we show that careful linear algebraic implementation helps to greatly reduce the computational and storage costs for problems of arbitrary dimensions. Contrary to existing stochastic approaches which provide at best very weak guarantees on the computed solutions obtained in practical implementation, our rigorous deterministic treatment provides guarantee of the stationarity properties of the computed solutions with reference to the optimization problems being solved. Numerical results from a MATLAB implementation demonstrate the effectiveness of the framework for solving reasonably sized networks with a modest number of training samples (in the low thousands).

Original language	English (US)
Pages (from-to)	1693-1723
Number of pages	31
Journal	SIAM Journal on Optimization
Volume	30
Issue number	2
DOIs	https://doi.org/10.1137/18M1231559
State	Published - 2020

Bibliographical note

Funding Information:
The authors are grateful to Defeng Sun at the Hong Kong Polytechnic University and Kim-Chuan Toh at the National University of Singapore for discussions on this paper. They are particularly indebted to Dr. Toh for his review of our MATLAB codes for accuracy and possible improvements. We also thank two referees for constructive comments that have improved the presentation of the paper.

Funding Information:
Received by the editors December 10, 2018; accepted for publication (in revised form) April 17, 2020; published electronically June 18, 2020. https://doi.org/10.1137/18M1231559 Funding: This work was based on research supported by the National Science Foundation under grant IIS-1632971 and by the Air Force Office of Scientific Research under grant FA9550-18-1-0382. Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, MN 55414 (yingcui@umn.edu). Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA 90089-0193 (ziyuhe@usc.edu, jongship@usc.edu).

Publisher Copyright:
© 2020 Society for Industrial and Applied Mathematics.

Keywords

Deep neural network
Exact penalization
Majorization-minimization
Nonconvexity
Nondifferentiability
Semismooth Newton method

Cite this

@article{19c8d132176e44a1851d746b7e3bc4a3,

title = "Multicomposite nonconvex optimization for training deep neural networks",

abstract = "We present in this paper a novel deterministic algorithmic framework that enables the computation of a directional stationary solution of the empirical deep neural network training problem formulated as a multicomposite optimization problem with coupled nonconvexity and nondifferentiability. This is the first time to our knowledge that such a sharp kind of stationary solution is provably computable for a nonsmooth deep neural network. Allowing for arbitrary finite numbers of input samples and training layers, an arbitrary number of neurons within each layer, and arbitrary piecewise activation functions, the proposed approach combines the methods of exact penalization, majorization-minimization, gradient projection with enhancements, and the dual semismooth Newton method, each for a particular purpose in an overall computational scheme. While a routine implementation of the semismooth Newton method would be computationally expensive, we show that careful linear algebraic implementation helps to greatly reduce the computational and storage costs for problems of arbitrary dimensions. Contrary to existing stochastic approaches which provide at best very weak guarantees on the computed solutions obtained in practical implementation, our rigorous deterministic treatment provides guarantee of the stationarity properties of the computed solutions with reference to the optimization problems being solved. Numerical results from a MATLAB implementation demonstrate the effectiveness of the framework for solving reasonably sized networks with a modest number of training samples (in the low thousands).",

keywords = "Deep neural network, Exact penalization, Majorization-minimization, Nonconvexity, Nondifferentiability, Semismooth Newton method",

author = "Cui, Ying and He, Ziyu and Pang, {Jong Shi}",

note = "Funding Information: The authors are grateful to Defeng Sun at the Hong Kong Polytechnic University and Kim-Chuan Toh at the National University of Singapore for discussions on this paper. They are particularly indebted to Dr. Toh for his review of our MATLAB codes for accuracy and possible improvements. We also thank two referees for constructive comments that have improved the presentation of the paper. Funding Information: Received by the editors December 10, 2018; accepted for publication (in revised form) April 17, 2020; published electronically June 18, 2020. https://doi.org/10.1137/18M1231559 Funding: This work was based on research supported by the National Science Foundation under grant IIS-1632971 and by the Air Force Office of Scientific Research under grant FA9550-18-1-0382. Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, MN 55414 (yingcui@umn.edu). Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA 90089-0193 (ziyuhe@usc.edu, jongship@usc.edu). Publisher Copyright: {\textcopyright} 2020 Society for Industrial and Applied Mathematics.",

year = "2020",

doi = "10.1137/18M1231559",

language = "English (US)",

volume = "30",

pages = "1693--1723",

journal = "SIAM Journal on Optimization",

issn = "1052-6234",

publisher = "Society for Industrial and Applied Mathematics Publications",

number = "2",

}

TY - JOUR

T1 - Multicomposite nonconvex optimization for training deep neural networks

AU - Cui, Ying

AU - He, Ziyu

AU - Pang, Jong Shi

N1 - Funding Information: The authors are grateful to Defeng Sun at the Hong Kong Polytechnic University and Kim-Chuan Toh at the National University of Singapore for discussions on this paper. They are particularly indebted to Dr. Toh for his review of our MATLAB codes for accuracy and possible improvements. We also thank two referees for constructive comments that have improved the presentation of the paper. Funding Information: Received by the editors December 10, 2018; accepted for publication (in revised form) April 17, 2020; published electronically June 18, 2020. https://doi.org/10.1137/18M1231559 Funding: This work was based on research supported by the National Science Foundation under grant IIS-1632971 and by the Air Force Office of Scientific Research under grant FA9550-18-1-0382. Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, MN 55414 (yingcui@umn.edu). Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, CA 90089-0193 (ziyuhe@usc.edu, jongship@usc.edu). Publisher Copyright: © 2020 Society for Industrial and Applied Mathematics.

PY - 2020

Y1 - 2020

N2 - We present in this paper a novel deterministic algorithmic framework that enables the computation of a directional stationary solution of the empirical deep neural network training problem formulated as a multicomposite optimization problem with coupled nonconvexity and nondifferentiability. This is the first time to our knowledge that such a sharp kind of stationary solution is provably computable for a nonsmooth deep neural network. Allowing for arbitrary finite numbers of input samples and training layers, an arbitrary number of neurons within each layer, and arbitrary piecewise activation functions, the proposed approach combines the methods of exact penalization, majorization-minimization, gradient projection with enhancements, and the dual semismooth Newton method, each for a particular purpose in an overall computational scheme. While a routine implementation of the semismooth Newton method would be computationally expensive, we show that careful linear algebraic implementation helps to greatly reduce the computational and storage costs for problems of arbitrary dimensions. Contrary to existing stochastic approaches which provide at best very weak guarantees on the computed solutions obtained in practical implementation, our rigorous deterministic treatment provides guarantee of the stationarity properties of the computed solutions with reference to the optimization problems being solved. Numerical results from a MATLAB implementation demonstrate the effectiveness of the framework for solving reasonably sized networks with a modest number of training samples (in the low thousands).

AB - We present in this paper a novel deterministic algorithmic framework that enables the computation of a directional stationary solution of the empirical deep neural network training problem formulated as a multicomposite optimization problem with coupled nonconvexity and nondifferentiability. This is the first time to our knowledge that such a sharp kind of stationary solution is provably computable for a nonsmooth deep neural network. Allowing for arbitrary finite numbers of input samples and training layers, an arbitrary number of neurons within each layer, and arbitrary piecewise activation functions, the proposed approach combines the methods of exact penalization, majorization-minimization, gradient projection with enhancements, and the dual semismooth Newton method, each for a particular purpose in an overall computational scheme. While a routine implementation of the semismooth Newton method would be computationally expensive, we show that careful linear algebraic implementation helps to greatly reduce the computational and storage costs for problems of arbitrary dimensions. Contrary to existing stochastic approaches which provide at best very weak guarantees on the computed solutions obtained in practical implementation, our rigorous deterministic treatment provides guarantee of the stationarity properties of the computed solutions with reference to the optimization problems being solved. Numerical results from a MATLAB implementation demonstrate the effectiveness of the framework for solving reasonably sized networks with a modest number of training samples (in the low thousands).

KW - Deep neural network

KW - Exact penalization

KW - Majorization-minimization

KW - Nonconvexity

KW - Nondifferentiability

KW - Semismooth Newton method

UR - http://www.scopus.com/inward/record.url?scp=85095119855&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85095119855&partnerID=8YFLogxK

U2 - 10.1137/18M1231559

DO - 10.1137/18M1231559

M3 - Article

AN - SCOPUS:85095119855

SN - 1052-6234

VL - 30

SP - 1693

EP - 1723

JO - SIAM Journal on Optimization

JF - SIAM Journal on Optimization

IS - 2

ER -
