Article

MobileNets Can Be Lossily Compressed: Neural Network Compression for Embedded Accelerators

Computer Science Department, Donald Bren School of Information and Computer Sciences, University of California, Irvine, CA 92697, USA
* Author to whom correspondence should be addressed.
Electronics 2022, 11(6), 858; https://doi.org/10.3390/electronics11060858
Submission received: 1 February 2022 / Revised: 24 February 2022 / Accepted: 1 March 2022 / Published: 9 March 2022

Abstract

Although neural network quantization is an essential technique for the computational and memory efficiency of embedded neural network accelerators, simple post-training quantization incurs unacceptable accuracy degradation on some important models targeting embedded systems, such as MobileNets. While explicit quantization-aware training or re-training after quantization can often reclaim the lost accuracy, this is not always possible or convenient. We present an alternative approach to compressing such difficult neural networks, using a novel variant of the ZFP lossy floating-point compression algorithm to compress both model weights and inter-layer activations, and we demonstrate that it can be implemented efficiently on an embedded FPGA platform. Our ZFP variant, which we call ZFPe, is designed for efficient implementation on embedded accelerators such as FPGAs, requiring a fraction of the chip resources per unit bandwidth compared to state-of-the-art lossy compression accelerators. Compressing the MobileNet V2 model with ZFPe at an 8-bit budget per weight and activation yields significantly higher accuracy than 8-bit integer post-training quantization, and shows no loss of accuracy compared to the uncompressed model when given a 12-bit budget per floating-point value. To demonstrate the benefits of our approach, we implement an embedded neural network accelerator on a realistic embedded acceleration platform equipped with a low-power Lattice ECP5-85F FPGA and a 32 MB SDRAM chip. Each ZFPe module consumes less than 6% of the FPGA's LUTs while compressing or decompressing one value per cycle, requiring a fraction of the resources of state-of-the-art compression accelerators while completely removing the memory bottleneck of our accelerator.
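
For readers who want to experiment with the idea of a per-value bit budget, the sketch below is a minimal illustration using the standard ZFP library's Python bindings (zfpy) in fixed-rate mode; it is not the paper's ZFPe codec, the weight tensor is a random stand-in rather than actual MobileNet V2 weights, and the 8- and 12-bit rates simply mirror the budgets discussed in the abstract.

    import numpy as np
    import zfpy  # Python bindings for the ZFP lossy floating-point compressor

    # Hypothetical stand-in for a MobileNet V2 convolution weight tensor
    # (real weights would be loaded from a trained model).
    weights = np.random.randn(32, 32, 3, 3).astype(np.float32)

    for rate in (8, 12):  # bit budget per floating-point value
        # Fixed-rate mode: each value is stored using 'rate' bits on average.
        compressed = zfpy.compress_numpy(weights, rate=rate)
        restored = zfpy.decompress_numpy(compressed)
        max_err = float(np.abs(weights - restored).max())
        ratio = weights.nbytes / len(compressed)
        print(f"{rate}-bit budget: ~{ratio:.1f}x smaller, max abs error {max_err:.6f}")

The same fixed-rate idea applies to inter-layer activations: because the bit budget per value is known in advance, the compressed size of each tensor is predictable, which is what makes the approach attractive for bandwidth-limited embedded accelerators.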
Keywords: embedded FPGA accelerators; compression; neural networks
