Abstract
This paper presents an energy-efficient, deep parallel Convolutional Neural Network (CNN) accelerator. By adopting a recently proposed binary-weight method, the CNN computations are converted into multiplication-free processing. To allow parallel access and storage of data, we use two RAM banks, each composed of N RAM blocks corresponding to N-parallel processing. We also design a reconfigurable CNN computing unit in a divide-and-reuse manner to support variable-size convolutional filters. Compared with full-precision computing on the MNIST and CIFAR-10 classification tasks, the Top-1 inference accuracy of the binary-weight CNN drops by 1.21% and 1.34%, respectively. Hardware implementation results show that the proposed design achieves 2100 GOP/s with a 4.6 ms processing latency. The deep parallel accelerator exhibits 3× the energy efficiency of a GPU-based design.
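The core idea of multiplication-free processing with binary weights can be sketched in a few lines. This is an illustrative NumPy model (not the paper's hardware design): when each weight is constrained to +1 or −1, every multiply-accumulate in a convolution reduces to a sign-conditioned add or subtract. The function name `binary_conv2d` and its interface are assumptions for illustration only.

```python
import numpy as np

def binary_conv2d(x, w_sign, stride=1):
    """Multiplication-free 2D convolution with binary (+1/-1) weights.

    x:      (H, W) input feature map
    w_sign: (K, K) weight signs, entries in {+1, -1}

    Because each weight is +1 or -1, every multiply-accumulate
    becomes a conditional add or subtract -- no multiplier needed.
    """
    H, W = x.shape
    K = w_sign.shape[0]
    out_h = (H - K) // stride + 1
    out_w = (W - K) // stride + 1
    y = np.zeros((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            acc = 0.0
            for ki in range(K):
                for kj in range(K):
                    v = x[i * stride + ki, j * stride + kj]
                    # add or subtract instead of multiplying by the weight
                    acc += v if w_sign[ki, kj] > 0 else -v
            y[i, j] = acc
    return y
```

In the accelerator described above, this add/subtract structure is what removes hardware multipliers from the datapath; the N-parallel RAM-bank organization would then feed N such units concurrently.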
Original language | English (US) |
---|---|
Title of host publication | 2018 IEEE 23rd International Conference on Digital Signal Processing, DSP 2018 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781538668115 |
DOIs | |
State | Published - Jul 2 2018 |
Event | 23rd IEEE International Conference on Digital Signal Processing, DSP 2018 - Shanghai, China Duration: Nov 19 2018 → Nov 21 2018 |
Publication series
Name | International Conference on Digital Signal Processing, DSP |
---|---|
Volume | 2018-November |
Conference
Conference | 23rd IEEE International Conference on Digital Signal Processing, DSP 2018 |
---|---|
Country/Territory | China |
City | Shanghai |
Period | 11/19/18 → 11/21/18 |
Bibliographical note
Publisher Copyright: © 2018 IEEE.
Keywords
- Convolutional Neural Network (CNN)
- deep neural network
- energy efficiency
- parallel implementation