LayerPipe: Accelerating Deep Neural Network Training by Intra-Layer and Inter-Layer Gradient Pipelining and Multiprocessor Scheduling

Nanda K. Unnikrishnan; Keshab K. Parhi

doi:10.1109/ICCAD51958.2021.9643567

Electrical and Computer Engineering

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Scopus citations

Abstract

The time required for training neural networks increases with size, complexity, and depth. Training model parameters by backpropagation inherently creates feedback loops. These loops hinder efficient pipelining and scheduling of the tasks within the layer and between consecutive layers. Prior approaches, such as PipeDream, have exploited the use of delayed gradient to achieve inter-layer pipelining. However, these approaches treat the entire backpropagation as a single task; this leads to an increase in computation time and processor underutilization. This paper presents novel optimization approaches where the gradient computations with respect to the weights and the activation functions are considered independently; therefore, these can be computed in parallel. This is referred to as intra-layer optimization. Additionally, the gradient computation with respect to the activation function is further divided into two parts and distributed to two consecutive layers. This leads to balanced scheduling where the computation time of each layer is the same. This is referred to as inter-layer optimization. The proposed system, referred to as LayerPipe, reduces the number of clock cycles required for training while maximizing processor utilization with minimal inter-processor communication overhead. LayerPipe achieves an average speedup of 25% and upwards of 80% with 7 to 9 processors, with less communication overhead than PipeDream.
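The intra-layer idea in the abstract can be illustrated with a minimal sketch. It assumes a single fully connected layer y = W x and one training sample; the function names (`weight_grad`, `activation_grad`, `backward_split`) and the use of a thread pool are illustrative assumptions, not the authors' implementation or scheduler. The point is only that, once the upstream gradient `delta` = dL/dy is available, the weight gradient and the gradient passed to the previous layer do not depend on each other and can be issued as separate tasks.

```python
from concurrent.futures import ThreadPoolExecutor


def weight_grad(delta, x):
    """dL/dW for y = W x: outer product of the upstream gradient and the layer input."""
    return [[d * xj for xj in x] for d in delta]


def activation_grad(W, delta):
    """dL/dx for y = W x: W^T delta, the gradient handed to the previous layer."""
    rows, cols = len(W), len(W[0])
    return [sum(W[i][j] * delta[i] for i in range(rows)) for j in range(cols)]


def backward_split(W, x, delta):
    """Submit the two backward tasks separately; neither reads the other's result."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        fut_w = pool.submit(weight_grad, delta, x)
        fut_a = pool.submit(activation_grad, W, delta)
        return fut_w.result(), fut_a.result()


# Hypothetical toy values: a layer with 2 inputs and 3 outputs.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = [0.5, -1.0]
delta = [1.0, 0.0, 2.0]

dW, dx = backward_split(W, x, delta)
print("dL/dW =", dW)
print("dL/dx =", dx)
```

Python threads are used here only to make the task independence explicit; they do not demonstrate a speedup. The inter-layer split of the activation gradient across two consecutive layers is not sketched, since the abstract does not give enough detail to reproduce it.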

Original language	English (US)
Title of host publication	2021 40th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2021 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781665445078
DOIs	https://doi.org/10.1109/ICCAD51958.2021.9643567
State	Published - 2021
Event	40th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2021 - Munich, Germany
Duration	Nov 1 2021 → Nov 4 2021

Publication series

Name	IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
Volume	2021-November
ISSN (Print)	1092-3152

Conference

Conference	40th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2021
Country/Territory	Germany
City	Munich
Period	11/1/21 → 11/4/21

Bibliographical note

Funding Information:
This research was supported in part by the National Science Foundation under grant number CCF-1954749.

Publisher Copyright:
© 2021 IEEE.

Cite this

Unnikrishnan, N. K., & Parhi, K. K. (2021). LayerPipe: Accelerating Deep Neural Network Training by Intra-Layer and Inter-Layer Gradient Pipelining and Multiprocessor Scheduling. In 2021 40th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2021 - Proceedings (IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD; Vol. 2021-November). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCAD51958.2021.9643567

@inproceedings{ae7b93c7f89648debbc87fbc3d73d898,

title = "LayerPipe: Accelerating Deep Neural Network Training by Intra-Layer and Inter-Layer Gradient Pipelining and Multiprocessor Scheduling",

author = "Unnikrishnan, {Nanda K.} and Parhi, {Keshab K.}",

note = "Funding Information: This research was supported in part by the National Science Foundation under grant number CCF-1954749. Publisher Copyright: {\textcopyright} 2021 IEEE.; 40th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2021 ; Conference date: 01-11-2021 Through 04-11-2021",

year = "2021",

doi = "10.1109/ICCAD51958.2021.9643567",

language = "English (US)",

series = "IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "2021 40th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2021 - Proceedings",

}

TY - GEN

T1 - LayerPipe

T2 - 40th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2021

AU - Unnikrishnan, Nanda K.

AU - Parhi, Keshab K.

PY - 2021

Y1 - 2021

UR - http://www.scopus.com/inward/record.url?scp=85124160159&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85124160159&partnerID=8YFLogxK

U2 - 10.1109/ICCAD51958.2021.9643567

DO - 10.1109/ICCAD51958.2021.9643567

M3 - Conference contribution

AN - SCOPUS:85124160159

T3 - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD

BT - 2021 40th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2021 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

Y2 - 1 November 2021 through 4 November 2021

ER -
