InterGrad: Energy-Efficient Training of Convolutional Neural Networks via Interleaved Gradient Scheduling

Nanda K. Unnikrishnan; Keshab K. Parhi

doi:10.1109/TCSI.2023.3246468

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

This paper addresses the design of accelerators using systolic architectures to train convolutional neural networks using a novel gradient interleaving approach. Training the neural network involves computation and backpropagation of gradients of error with respect to the activation functions and weights. It is shown that the gradient with respect to the activation function can be computed using a weight-stationary systolic array, while the gradient with respect to the weights can be computed using an output-stationary systolic array. The novelty of the proposed approach lies in interleaving the computations of these two gradients on the same configurable systolic array. This results in the reuse of the variables from one computation to the other and eliminates unnecessary memory accesses and energy consumption associated with these memory accesses. The proposed approach leads to 1.4-2.2× savings in terms of the number of cycles and 1.9× savings in terms of memory accesses in the fully-connected layer. Furthermore, the proposed method uses up to 25% fewer cycles and memory accesses, and 16% less energy than baseline implementations for state-of-the-art CNNs. Under iso-area comparisons, for Inception-v4, compared to weight-stationary (WS), InterGrad achieves 12% savings in energy, 17% savings in memory, and 4% savings in cycles. Savings for DenseNet-264 are 18%, 26%, and 27% with respect to energy, memory, and cycles, respectively. Thus, the proposed novel accelerator architecture reduces the latency and energy consumption for training deep neural networks.
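The reuse idea in the abstract can be illustrated with a toy software sketch for a single fully-connected layer. This is not the authors' systolic-array dataflow. It assumes a layer y = x·W, where the error gradient `delta` at the layer output feeds both the gradient with respect to the input activations (`dx`) and the gradient with respect to the weights (`dw`). The function names, shapes, and the fetch counter are hypothetical, and the counter only tallies reads of `delta`, so it does not model the cycle, memory, or energy figures reported above.

```python
# Toy illustration only: names, shapes, and the fetch-count model are
# assumptions, not the accelerator dataflow described in the abstract.

def backward_separate(x, w, delta):
    """Compute dx and dw in two independent passes, reading delta twice."""
    batch, n_in, n_out = len(x), len(w), len(w[0])
    fetches = 0
    # Pass 1: gradient w.r.t. the layer input, dx = delta @ w^T.
    dx = [[0.0] * n_in for _ in range(batch)]
    for b in range(batch):
        for j in range(n_out):
            d = delta[b][j]  # read delta for the activation gradient
            fetches += 1
            for i in range(n_in):
                dx[b][i] += d * w[i][j]
    # Pass 2: gradient w.r.t. the weights, dw = x^T @ delta.
    dw = [[0.0] * n_out for _ in range(n_in)]
    for b in range(batch):
        for j in range(n_out):
            d = delta[b][j]  # read delta again for the weight gradient
            fetches += 1
            for i in range(n_in):
                dw[i][j] += x[b][i] * d
    return dx, dw, fetches


def backward_interleaved(x, w, delta):
    """Compute dx and dw in one pass, reusing each delta element once read."""
    batch, n_in, n_out = len(x), len(w), len(w[0])
    fetches = 0
    dx = [[0.0] * n_in for _ in range(batch)]
    dw = [[0.0] * n_out for _ in range(n_in)]
    for b in range(batch):
        for j in range(n_out):
            d = delta[b][j]  # single read shared by both gradients
            fetches += 1
            for i in range(n_in):
                dx[b][i] += d * w[i][j]
                dw[i][j] += x[b][i] * d
    return dx, dw, fetches


# Small example: batch of 1, 2 inputs, 3 outputs.
x = [[1.0, 2.0]]
w = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 3.0]]
delta = [[1.0, 1.0, 1.0]]

dx_s, dw_s, reads_s = backward_separate(x, w, delta)
dx_i, dw_i, reads_i = backward_interleaved(x, w, delta)
print(dx_i, dw_i, reads_s, reads_i)
```

Both functions return the same gradients; in this toy model the interleaved version reads each element of `delta` once instead of twice. How such reuse maps onto weight-stationary and output-stationary modes of a configurable systolic array is the subject of the paper itself.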

Original language	English (US)
Pages (from-to)	1949-1962
Number of pages	14
Journal	IEEE Transactions on Circuits and Systems I: Regular Papers
Volume	70
Issue number	5
DOIs	https://doi.org/10.1109/TCSI.2023.3246468
State	Published - May 1 2023
Externally published	Yes

Bibliographical note

Publisher Copyright:
© 2004-2012 IEEE.

Keywords

Neural network training
accelerator architectures
backpropagation
convolutional neural networks
gradient interleaving
interleaved scheduling
systolic array

Cite this

@article{7da7935f82f74286b42a7920edf3fc9f,

title = "InterGrad: Energy-Efficient Training of Convolutional Neural Networks via Interleaved Gradient Scheduling",

abstract = "This paper addresses the design of accelerators using systolic architectures to train convolutional neural networks using a novel gradient interleaving approach. Training the neural network involves computation and backpropagation of gradients of error with respect to the activation functions and weights. It is shown that the gradient with respect to the activation function can be computed using a weight-stationary systolic array, while the gradient with respect to the weights can be computed using an output-stationary systolic array. The novelty of the proposed approach lies in interleaving the computations of these two gradients on the same configurable systolic array. This results in the reuse of the variables from one computation to the other and eliminates unnecessary memory accesses and energy consumption associated with these memory accesses. The proposed approach leads to 1.4-2.2 × savings in terms of the number of cycles and 1.9 × savings in terms of memory accesses in the fully-connected layer. Furthermore, the proposed method uses up to 25% fewer cycles and memory accesses, and 16% less energy than baseline implementations for state-of-the-art CNNs. Under iso-area comparisons, for Inception-v4, compared to weight-stationary (WS), Intergrad achieves 12% savings in energy, 17% savings in memory, and 4% savings in cycles. Savings for Densenet-264 are 18%, 26%, and 27% with respect to energy, memory, and cycles, respectively. Thus, the proposed novel accelerator architecture reduces the latency and energy consumption for training deep neural networks.",

keywords = "Neural network training, accelerator architectures, backpropagation, convolutional neural networks, gradient interleaving, interleaved scheduling, systolic array",

author = "Unnikrishnan, {Nanda K.} and Parhi, {Keshab K.}",

note = "Publisher Copyright: {\textcopyright} 2004-2012 IEEE.",

year = "2023",

month = may,

day = "1",

doi = "10.1109/TCSI.2023.3246468",

language = "English (US)",

volume = "70",

pages = "1949--1962",

journal = "IEEE Transactions on Circuits and Systems I: Regular Papers",

issn = "1549-8328",

number = "5",

}

TY - JOUR

T1 - InterGrad: Energy-Efficient Training of Convolutional Neural Networks via Interleaved Gradient Scheduling

AU - Unnikrishnan, Nanda K.

AU - Parhi, Keshab K.

PY - 2023/5/1

Y1 - 2023/5/1

N2 - This paper addresses the design of accelerators using systolic architectures to train convolutional neural networks using a novel gradient interleaving approach. Training the neural network involves computation and backpropagation of gradients of error with respect to the activation functions and weights. It is shown that the gradient with respect to the activation function can be computed using a weight-stationary systolic array, while the gradient with respect to the weights can be computed using an output-stationary systolic array. The novelty of the proposed approach lies in interleaving the computations of these two gradients on the same configurable systolic array. This results in the reuse of the variables from one computation to the other and eliminates unnecessary memory accesses and energy consumption associated with these memory accesses. The proposed approach leads to 1.4-2.2 × savings in terms of the number of cycles and 1.9 × savings in terms of memory accesses in the fully-connected layer. Furthermore, the proposed method uses up to 25% fewer cycles and memory accesses, and 16% less energy than baseline implementations for state-of-the-art CNNs. Under iso-area comparisons, for Inception-v4, compared to weight-stationary (WS), Intergrad achieves 12% savings in energy, 17% savings in memory, and 4% savings in cycles. Savings for Densenet-264 are 18%, 26%, and 27% with respect to energy, memory, and cycles, respectively. Thus, the proposed novel accelerator architecture reduces the latency and energy consumption for training deep neural networks.

AB - This paper addresses the design of accelerators using systolic architectures to train convolutional neural networks using a novel gradient interleaving approach. Training the neural network involves computation and backpropagation of gradients of error with respect to the activation functions and weights. It is shown that the gradient with respect to the activation function can be computed using a weight-stationary systolic array, while the gradient with respect to the weights can be computed using an output-stationary systolic array. The novelty of the proposed approach lies in interleaving the computations of these two gradients on the same configurable systolic array. This results in the reuse of the variables from one computation to the other and eliminates unnecessary memory accesses and energy consumption associated with these memory accesses. The proposed approach leads to 1.4-2.2 × savings in terms of the number of cycles and 1.9 × savings in terms of memory accesses in the fully-connected layer. Furthermore, the proposed method uses up to 25% fewer cycles and memory accesses, and 16% less energy than baseline implementations for state-of-the-art CNNs. Under iso-area comparisons, for Inception-v4, compared to weight-stationary (WS), Intergrad achieves 12% savings in energy, 17% savings in memory, and 4% savings in cycles. Savings for Densenet-264 are 18%, 26%, and 27% with respect to energy, memory, and cycles, respectively. Thus, the proposed novel accelerator architecture reduces the latency and energy consumption for training deep neural networks.

KW - Neural network training

KW - accelerator architectures

KW - backpropagation

KW - convolutional neural networks

KW - gradient interleaving

KW - interleaved scheduling

KW - systolic array

UR - http://www.scopus.com/inward/record.url?scp=85149395263&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85149395263&partnerID=8YFLogxK

U2 - 10.1109/TCSI.2023.3246468

DO - 10.1109/TCSI.2023.3246468

M3 - Article

AN - SCOPUS:85149395263

SN - 1549-8328

VL - 70

SP - 1949

EP - 1962

JO - IEEE Transactions on Circuits and Systems I: Regular Papers

JF - IEEE Transactions on Circuits and Systems I: Regular Papers

IS - 5

ER -
