Article

MobileNets Can Be Lossily Compressed: Neural Network Compression for Embedded Accelerators

Computer Science Department, Donald Bren School of Information and Computer Sciences, University of California, Irvine, CA 92697, USA
* Author to whom correspondence should be addressed.
Electronics 2022, 11(6), 858; https://doi.org/10.3390/electronics11060858
Submission received: 1 February 2022 / Revised: 24 February 2022 / Accepted: 1 March 2022 / Published: 9 March 2022

Abstract

Although neural network quantization is an essential technique for the computational and memory efficiency of embedded neural network accelerators, simple post-training quantization incurs unacceptable accuracy degradation on some important models targeting embedded systems, such as MobileNets. While explicit quantization-aware training or re-training after quantization can often reclaim the lost accuracy, this is not always possible or convenient. We present an alternative approach to compressing such difficult neural networks, using a novel variant of the ZFP lossy floating-point compression algorithm to compress both model weights and inter-layer activations, and we demonstrate that it can be implemented efficiently on an embedded FPGA platform. Our ZFP variant, which we call ZFPe, is designed for efficient implementation on embedded accelerators such as FPGAs, requiring a fraction of the chip resources per unit bandwidth compared to state-of-the-art lossy compression accelerators. Compressing the MobileNet V2 model with ZFPe at an 8-bit budget per weight and activation yields significantly higher accuracy than 8-bit integer post-training quantization, and shows no loss of accuracy compared to the uncompressed model when given a 12-bit budget per floating-point value. To demonstrate the benefits of our approach, we implement an embedded neural network accelerator on a realistic embedded acceleration platform equipped with a low-power Lattice ECP5-85F FPGA and a 32 MB SDRAM chip. Each ZFPe module consumes less than 6% of the FPGA's LUTs while compressing or decompressing one value per cycle, requiring a fraction of the resources of state-of-the-art compression accelerators while completely removing the memory bottleneck of our accelerator.
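
For readers who want to experiment with the idea of a per-value bit budget, the sketch below is a minimal illustration using the standard ZFP library's Python bindings (zfpy) in fixed-rate mode; it is not the paper's ZFPe codec, the weight tensor is a random stand-in rather than actual MobileNet V2 weights, and the 8- and 12-bit rates simply mirror the budgets discussed in the abstract.

    import numpy as np
    import zfpy  # Python bindings for the ZFP lossy floating-point compressor

    # Hypothetical stand-in for a MobileNet V2 convolution weight tensor
    # (real weights would be loaded from a trained model).
    weights = np.random.randn(32, 32, 3, 3).astype(np.float32)

    for rate in (8, 12):  # bit budget per floating-point value
        # Fixed-rate mode: each value is stored using 'rate' bits on average.
        compressed = zfpy.compress_numpy(weights, rate=rate)
        restored = zfpy.decompress_numpy(compressed)
        max_err = float(np.abs(weights - restored).max())
        ratio = weights.nbytes / len(compressed)
        print(f"{rate}-bit budget: ~{ratio:.1f}x smaller, max abs error {max_err:.6f}")

The same fixed-rate idea applies to inter-layer activations: because the bit budget per value is known in advance, the compressed size of each tensor is predictable, which is what makes the approach attractive for bandwidth-limited embedded accelerators.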
Keywords: embedded FPGA accelerators; compression; neural networks
