Improved JPEG Lossless Compression for Compression of Intermediate Layers in Neural Networks Based on Compute-In-Memory
Abstract
1. Introduction
2. Related Work
2.1. Data Compression Theory
2.2. Encoding Introduction
2.2.1. Predictive Coding
2.2.2. Statistical Coding
2.2.3. Transform Coding
3. Framework
3.1. JPEG-LS Encode
3.1.1. Gradient Value Calculation
3.1.2. Quantization of Gradient Values
3.1.3. Gradient Value Combination
3.1.4. Prediction Value Calculation
3.1.5. Prediction Error Calculation
3.1.6. Golomb Coding
3.1.7. Runlen Coding
3.1.8. Context Update
3.2. JPEG-LS Decode
4. Results
4.1. Hardware Evaluation
4.2. Compression Ratio
4.3. Comparison
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Si, X.; Chen, J.J.; Tu, Y.N.; Huang, W.H.; Wang, J.H.; Chiu, Y.C.; Wei, W.C.; Wu, S.Y.; Sun, X.; Liu, R.; et al. Twin-8T SRAM Computation-in-Memory Unit-Macro for Multibit CNN-Based AI Edge Processors. IEEE J. Solid-State Circuits 2020, 55, 189–202.
- Yue, J.; Yuan, Z.; Feng, X.; He, Y.; Zhang, Z.; Si, X.; Liu, R.; Chang, M.F.; Li, X.; Yang, H.; et al. 14.3 A 65nm Computing-in-Memory-Based CNN Processor with 2.9-to-35.8TOPS/W System Energy Efficiency Using Dynamic-Sparsity Performance-Scaling Architecture and Energy-Efficient Inter/Intra-Macro Data Reuse. In Proceedings of the 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 16–20 February 2020; pp. 234–236.
- Xue, C.X.; Chen, W.H.; Liu, J.S.; Li, J.F.; Lin, W.Y.; Lin, W.E.; Wang, J.H.; Wei, W.C.; Huang, T.Y.; Chang, T.W.; et al. Embedded 1-Mb ReRAM-Based Computing-in-Memory Macro With Multibit Input and Weight for CNN-Based AI Edge Processors. IEEE J. Solid-State Circuits 2020, 55, 203–215.
- Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv 2015, arXiv:1510.00149.
- Xie, C.; Shao, Z.; Zhao, N.; Du, Y.; Du, L. An Efficient CNN Inference Accelerator Based on Intra- and Inter-Channel Feature Map Compression. IEEE Trans. Circuits Syst. I Regul. Pap. 2023, 70, 3625–3638.
- Shao, Z.; Chen, X.; Du, L.; Chen, L.; Du, Y.; Zhuang, W.; Wei, H.; Xie, C.; Wang, Z. Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression. IEEE Trans. Circuits Syst. I Regul. Pap. 2022, 69, 668–681.
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
- Matsuo, Y. Predictive Coding Using Local Decoded Images with Different Degrees of Blurring. In Proceedings of the 2023 IEEE 13th International Conference on Consumer Electronics-Berlin (ICCE-Berlin), Berlin, Germany, 3–5 September 2023; pp. 25–28.
- Barowsky, M.; Mariona, A.; Calmon, F.P. Predictive Coding for Lossless Dataset Compression. In Proceedings of the ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 1545–1549.
- Zhang, J.; Zhao, D.; Jiang, F. Spatially directional predictive coding for block-based compressive sensing of natural images. In Proceedings of the 2013 IEEE International Conference on Image Processing, Melbourne, VIC, Australia, 15–18 September 2013; pp. 1021–1025.
- Weinberger, M.J.; Seroussi, G.; Sapiro, G. The LOCO-I lossless image compression algorithm: Principles and standardization into JPEG-LS. IEEE Trans. Image Process. 2000, 9, 1309–1324.
- Marpe, D.; Schwarz, H.; Wiegand, T. Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard. IEEE Trans. Circuits Syst. Video Technol. 2003, 13, 620–636.
- Rissanen, J.; Langdon, G.G. Arithmetic Coding. IBM J. Res. Dev. 1979, 23, 149–162.
- Jas, A.; Ghosh-Dastidar, J.; Ng, M.-E.; Touba, N.A. An efficient test vector compression scheme using selective Huffman coding. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2003, 22, 797–806.
- Wallace, G.K. The JPEG still picture compression standard. IEEE Trans. Consum. Electron. 1992, 38, xviii–xxxiv.
- Chang, C.-L.; Girod, B. Direction-Adaptive Discrete Wavelet Transform for Image Compression. IEEE Trans. Image Process. 2007, 16, 1289–1302.
- Skodras, A.; Christopoulos, C.; Ebrahimi, T. The JPEG 2000 still image compression standard. IEEE Signal Process. Mag. 2001, 18, 36–58.
- Cintra, R.J.; Bayer, F.M. A DCT Approximation for Image Compression. IEEE Signal Process. Lett. 2011, 18, 579–582.
- Klimesh, M.; Stanton, V.; Watola, D. Hardware implementation of a lossless image compression algorithm using a field programmable gate array. Mars 2001, 4, 5–72.
- Kim, B.S.; Baek, S.; Kim, D.S.; Chung, D.J. A high performance fully pipeline JPEG-LS encoder for lossless compression. IEICE Electron. Express 2013, 10, 20130348.
- Nazar, F.; Murugan, S. Implementation of JPEG-LS compression algorithm for real time applications. In Proceedings of the 2016 International Conference on Electrical, Electronics, and Optimization Techniques (ICEEOT), Chennai, India, 3–5 March 2016; pp. 2772–2774.
- Jallouli, S.; Zouari, S.; Masmoudi, N.; Masmoudi, A. An Adaptive Block-Based Histogram Packing for Improving the Compression Performance of JPEG-LS for Images with Sparse and Locally Sparse Histograms. In International Conference on Image and Signal Processing; Springer: Cham, Switzerland, 2018; pp. 63–71.
- Daryanavard, H.; Abbasi, O.; Talebi, R. FPGA implementation of JPEG-LS compression algorithm for real time applications. In Proceedings of the 2011 19th Iranian Conference on Electrical Engineering, Tehran, Iran, 17–19 May 2011; pp. 1–4.
- Ferretti, M.; Boffadossi, M. A parallel pipelined implementation of LOCO-I for JPEG-LS. In Proceedings of the 17th International Conference on Pattern Recognition, Cambridge, UK, 26 August 2004; pp. 769–772.
- Merlino, P.; Abramo, A. A fully pipelined architecture for the LOCO-I compression algorithm. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2009, 17, 967–971.
- Chen, L.; Yan, L.; Sang, H.; Zhang, T. High-Throughput Architecture for Both Lossless and Near-lossless Compression Modes of LOCO-I Algorithm. IEEE Trans. Circuits Syst. Video Technol. 2019, 29, 3754–3764.
- Si, X.; Tu, Y.N.; Huang, W.H.; Su, J.W.; Lu, P.J.; Wang, J.H.; Liu, T.W.; Wu, S.Y.; Liu, R.; Chou, Y.C. 15.5 A 28nm 64Kb 6T SRAM Computing-in-Memory Macro with 8b MAC Operation for AI Edge Chips. In Proceedings of the 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 16–20 February 2020; pp. 246–248.
- Sun, C.; Chen, C.H.; Kurian, G.; Wei, L.; Miller, J.; Agarwal, A.; Peh, L.S.; Stojanovic, V. DSENT - A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic Networks-on-Chip Modeling. In Proceedings of the 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip, Lyngby, Denmark, 9–11 May 2012; pp. 201–210.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
- Yan, B.-K.; Ruan, S.-J. Area Efficient Compression for Floating-Point Feature Maps in Convolutional Neural Network Accelerators. IEEE Trans. Circuits Syst. II Express Briefs 2023, 70, 746–750.
- Xie, C.; Shao, Z.; Xu, H.; Chen, X.; Du, L.; Du, Y.; Wang, Z. Deep Neural Network Interlayer Feature Map Compression Based on Least-Squares Fitting. In Proceedings of the 2022 IEEE International Symposium on Circuits and Systems (ISCAS), Austin, TX, USA, 27 May–1 June 2022; pp. 3398–3402.

Resource | Utilization | Available | Utilization % |
---|---|---|---|
LUT | 8728 | 274,080 | 3.18 |
LUTRAM | 156 | 144,000 | 0.11 |
FF | 1208 | 548,160 | 0.22 |
BRAM | 4.5 | 912 | 0.49 |

Metric | Without Compression | With Compression |
---|---|---|
Gate Count (M) | 1.34 | 1.46 |
Energy Consumption (pJ/MAC) | 6.41 | 5.99 |
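
The trade-off in the table above can be expressed as relative changes. The following is a worked calculation using only the tabulated values; the symbols $\Delta_{\text{energy}}$ and $\Delta_{\text{gates}}$ are introduced here for illustration and do not appear in the source.

```latex
\begin{align*}
\Delta_{\text{energy}} &= \frac{6.41 - 5.99}{6.41} \approx 6.6\% \quad \text{(reduction in pJ/MAC with compression)} \\
\Delta_{\text{gates}}  &= \frac{1.46 - 1.34}{1.34} \approx 9.0\% \quad \text{(increase in gate count with compression)}
\end{align*}
```

Under these figures, compression lowers energy per MAC by roughly 6.6% while adding roughly 9% to the gate count.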

Metric | This Work | TCASI’22 [8] | TCASII’23 [35] | ISCAS’22 [36] |
---|---|---|---|---|
Technology (nm) | 28 | 28 | 130 | 28 |
Clock Rate (MHz) | 600 | 700 | N/A | 800 |
Energy Efficiency (TOPS/W) | 1.24 | 2.16 | N/A | 1.14 |
Compression Throughput (bits/cycle) | 32 | N/A | 8 | N/A |
Compression Gate Count (K) | 122 | N/A | 2.09 | 136 |
Compression Ratio, VGG16 (61%) | 6.44 | N/A | 2.54 | 1.37 |
Compression Ratio, ResNet34 (64%) | 3.62 | N/A | 1.93 | N/A |
Compression Ratio, MobileNetV2 (70.5%) | 1.67 | 1.41 | 1.66 | N/A |
Compression Ratio, InceptionV3 (70.7%) | 2.31 | N/A | N/A | 1.28 |

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Hua, J.; Xu, H.; Du, Y.; Du, L. Improved JPEG Lossless Compression for Compression of Intermediate Layers in Neural Networks Based on Compute-In-Memory. Electronics 2024, 13, 3872. https://doi.org/10.3390/electronics13193872