TY - JOUR
T1 - NeuPart: Using Analytical Models to Drive Energy-Efficient Partitioning of CNN Computations on Cloud-Connected Mobile Clients
T2 - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
AU - Manasi, Susmita Dey
AU - Snigdha, Farhana Sharmin
AU - Sapatnekar, Sachin S.
N1 - Publisher Copyright:
© 1993-2012 IEEE.
PY - 2020/8
Y1 - 2020/8
N2 - Data processing on convolutional neural networks (CNNs) places a heavy burden on energy-constrained mobile platforms. This article optimizes energy on a mobile client by partitioning CNN computations between in situ processing on the client and offloaded computations in the cloud. A new analytical CNN energy model is formulated, capturing all major components of the in situ computation, for ASIC-based deep learning accelerators. The model is benchmarked against measured silicon data. The analytical framework is used to determine the optimal energy partition point between the client and the cloud at runtime. On standard CNN topologies, partitioned computation is demonstrated to provide significant energy savings on the client over a fully cloud-based computation or fully in situ computation. For example, at 80 Mbps effective bit rate and 0.78 W transmission power, the optimal partition for AlexNet [SqueezeNet] saves up to 52.4% [73.4%] energy over a fully cloud-based computation and 27.3% [28.8%] energy over a fully in situ computation.
KW - Computation partitioning
KW - convolutional neural networks (CNNs)
KW - embedded deep learning
KW - energy modeling
KW - hardware acceleration
UR - http://www.scopus.com/inward/record.url?scp=85089884207&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089884207&partnerID=8YFLogxK
U2 - 10.1109/TVLSI.2020.2995135
DO - 10.1109/TVLSI.2020.2995135
M3 - Article
AN - SCOPUS:85089884207
SN - 1063-8210
VL - 28
SP - 1844
EP - 1857
JO - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
JF - IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IS - 8
M1 - 9113336
ER -