Abstract
This paper addresses the design of systolic-array accelerators for training convolutional neural networks using a novel gradient interleaving approach. Training a neural network involves computing and backpropagating the gradients of the error with respect to the activation functions and the weights. It is shown that the gradient with respect to the activation function can be computed using a weight-stationary systolic array, while the gradient with respect to the weights can be computed using an output-stationary systolic array. The novelty of the proposed approach lies in interleaving the computations of these two gradients on the same configurable systolic array. This allows variables to be reused from one computation to the other and eliminates unnecessary memory accesses along with the energy they consume. The proposed approach leads to 1.4-2.2× savings in the number of cycles and 1.9× savings in memory accesses in the fully-connected layer. Furthermore, the proposed method uses up to 25% fewer cycles and memory accesses, and 16% less energy, than baseline implementations for state-of-the-art CNNs. Under iso-area comparisons against a weight-stationary (WS) baseline, the proposed approach, Intergrad, achieves 12% savings in energy, 17% savings in memory accesses, and 4% savings in cycles for Inception-v4. Savings for Densenet-264 are 18%, 26%, and 27% with respect to energy, memory accesses, and cycles, respectively. Thus, the proposed accelerator architecture reduces the latency and energy consumption of training deep neural networks.
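The reuse argument in the abstract can be illustrated in software for a fully-connected layer Y = XW, where the activation gradient is dX = dY·Wᵀ and the weight gradient is dW = Xᵀ·dY, so both products consume the same upstream gradient dY. The following is a minimal sketch under stated assumptions, not the paper's hardware design: the function names, the variable names `x`, `w`, `dy`, and the "row fetch" counter are all hypothetical, and the counter is only a crude stand-in for memory traffic.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Plain triple-loop matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def separate_gradients(x, w, dy):
    """Compute dX and dW in two independent passes over dY.

    Each pass reads every row of dY once, so dY rows are fetched twice.
    """
    dx = matmul(dy, transpose(w))      # dX = dY . W^T
    dw = matmul(transpose(x), dy)      # dW = X^T . dY
    return dx, dw, 2 * len(dy)


def interleaved_gradients(x, w, dy):
    """Compute dX and dW together, fetching each row of dY only once.

    Assumption: this loop only mimics the data reuse described in the
    abstract; it does not model a systolic array or its timing.
    """
    n_in, n_out = len(w), len(w[0])
    dx = []
    dw = [[0.0] * n_out for _ in range(n_in)]
    fetches = 0
    for b, dy_row in enumerate(dy):
        fetches += 1  # one fetch of this dY row serves both gradients
        # Activation gradient: W stays fixed while dY rows stream past
        # (weight-stationary flavour).
        dx.append([sum(dy_row[j] * w[i][j] for j in range(n_out))
                   for i in range(n_in)])
        # Weight gradient: partial sums accumulate in place in dw
        # (output-stationary flavour).
        for i in range(n_in):
            for j in range(n_out):
                dw[i][j] += x[b][i] * dy_row[j]
    return dx, dw, fetches


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]             # batch of 2, 2 inputs
    w = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]   # 2 inputs, 3 outputs
    dy = [[1.0, 1.0, 1.0], [0.0, 2.0, 1.0]]  # upstream gradient
    print(separate_gradients(x, w, dy))
    print(interleaved_gradients(x, w, dy))
```

Both functions return the same gradients; only the number of times a row of dY is read differs. The actual cycle, memory, and energy savings reported in the abstract depend on the array configuration and dataflow, which this sketch does not attempt to reproduce.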
Original language | English (US) |
---|---|
Pages (from-to) | 1949-1962 |
Number of pages | 14 |
Journal | IEEE Transactions on Circuits and Systems I: Regular Papers |
Volume | 70 |
Issue number | 5 |
State | Published - May 1 2023 |
Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2004-2012 IEEE.
Keywords
- Neural network training
- accelerator architectures
- backpropagation
- convolutional neural networks
- gradient interleaving
- interleaved scheduling
- systolic array