Conditional Encoder-Based Adaptive Deep Image Compression with Classification-Driven Semantic Awareness
Abstract
1. Introduction
- First, we extend conditional encoder-based rate-adaptive DIC to hybrid-context scenarios. We propose a rate-distortion-classification-perception (RDCP) joint optimization framework to train the neural network model for hybrid contexts.
- Second, we propose a Qmap generation mechanism based on image complexity and semantic importance. It enables variable-rate encoding within the supported rate range and controls the trade-off between the classification and reconstruction contexts during bit allocation.
- Third, we evaluate the proposed DIC using performance metrics that correspond to the distortion, classification, and perception (DCP) objectives. We show that the proposed DIC generalizes to different datasets and downstream classifiers and achieves a superior RDCP trade-off.
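The joint optimization named in the first contribution can be pictured as one weighted objective. The sketch below is a minimal illustration under stated assumptions, not the paper's loss: it assumes a Lagrangian-style sum R + λ_d·D + λ_c·C + λ_p·P, with the rate estimated in bits per pixel from entropy-model likelihoods, distortion as MSE, classification as the cross-entropy of a downstream classifier on the reconstruction, and perception as a non-saturating adversarial term from a hypothetical discriminator. All function names and weights (`rdcp_loss`, `lambda_d`, `lambda_c`, `lambda_p`) are illustrative.

```python
import numpy as np


def bits_per_pixel(likelihoods, num_pixels):
    """Rate term: estimated code length in bits per pixel.

    Assumes `likelihoods` holds the entropy model's probability for each
    quantized latent symbol, as in hyperprior-style learned codecs.
    """
    return float(-np.sum(np.log2(likelihoods)) / num_pixels)


def mse(x, x_hat):
    """Distortion term: mean squared error between original and reconstruction."""
    return float(np.mean((x - x_hat) ** 2))


def cross_entropy(logits, label):
    """Classification term: cross-entropy of a downstream classifier's logits
    on the reconstructed image, using a numerically stable log-softmax."""
    z = logits - np.max(logits)
    log_probs = z - np.log(np.sum(np.exp(z)))
    return float(-log_probs[label])


def adversarial_perception(disc_prob_real, eps=1e-12):
    """Perception term (assumed proxy): non-saturating generator loss
    -log D(x_hat), where `disc_prob_real` is a hypothetical discriminator's
    probability that the reconstruction is a real image."""
    return float(-np.log(disc_prob_real + eps))


def rdcp_loss(likelihoods, num_pixels, x, x_hat, logits, label,
              disc_prob_real, lambda_d=1.0, lambda_c=1.0, lambda_p=1.0):
    """Hypothetical RDCP objective: R + lambda_d*D + lambda_c*C + lambda_p*P."""
    r = bits_per_pixel(likelihoods, num_pixels)
    d = mse(x, x_hat)
    c = cross_entropy(logits, label)
    p = adversarial_perception(disc_prob_real)
    return r + lambda_d * d + lambda_c * c + lambda_p * p


if __name__ == "__main__":
    # Toy 4x4 image, 8 latent symbols at probability 0.5 (1 bit each).
    x = np.zeros((4, 4))
    x_hat = np.full((4, 4), 0.1)
    loss = rdcp_loss(np.full(8, 0.5), 16, x, x_hat,
                     np.array([2.0, 0.0, 0.0]), 0, 0.5,
                     lambda_d=100.0, lambda_c=0.1, lambda_p=0.01)
    print(f"toy RDCP loss: {loss:.4f}")
```

Raising `lambda_c` relative to `lambda_d` pushes the codec to spend bits on what the classifier needs rather than on pixel fidelity. In an actual training loop these terms would be computed on framework tensors so gradients can flow through the encoder and decoder; NumPy is used here only to keep the sketch self-contained.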
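The Qmap mechanism in the second contribution combines image complexity and semantic importance into a spatial quality map that conditions the encoder. The sketch below is a hypothetical illustration, not the paper's procedure: it assumes normalized gradient magnitude as the complexity measure, takes the semantic importance map as a given input in [0, 1] (for example, a class-activation heat map from the downstream classifier), blends the two with a weight `alpha`, and shifts the result so its mean matches a requested quality level. The names `complexity_map`, `generate_qmap`, `alpha`, and `target_mean` are illustrative assumptions.

```python
import numpy as np


def complexity_map(img):
    """Per-pixel complexity proxy: gradient magnitude normalized to [0, 1].

    `img` is a 2-D grayscale array. Gradient magnitude is an assumed
    stand-in for whatever complexity measure the actual method uses.
    """
    gy, gx = np.gradient(img.astype(np.float64))  # derivatives along rows, columns
    mag = np.hypot(gx, gy)
    rng = mag.max() - mag.min()
    return (mag - mag.min()) / rng if rng > 0 else np.zeros_like(mag)


def generate_qmap(img, semantic, alpha=0.5, target_mean=0.5):
    """Hypothetical Qmap: blend semantic importance with complexity, then
    shift the map so its average equals `target_mean`.

    semantic    : 2-D array in [0, 1], same shape as `img` (assumed given).
    alpha       : weight on semantic importance (classification context);
                  1 - alpha weights complexity (reconstruction context).
    target_mean : average quality level, i.e. the operating point within
                  the codec's rate range.
    """
    q = alpha * semantic + (1.0 - alpha) * complexity_map(img)
    q = q + (target_mean - q.mean())  # move the map to the requested average
    return np.clip(q, 0.0, 1.0)       # clipping may nudge the mean slightly


if __name__ == "__main__":
    img = np.zeros((8, 8))
    img[:, 4:] = 1.0                  # a single vertical edge
    sem = np.zeros((8, 8))
    sem[2:6, 2:6] = 1.0               # a square "object" region
    print(np.round(generate_qmap(img, sem, alpha=0.5, target_mean=0.5), 2))
```

In this sketch, sweeping `target_mean` moves the average quality (and hence the bit rate) across the model's range, while `alpha` decides whether bits are concentrated on semantically important regions or on textured, hard-to-reconstruct regions.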
2. Related Work
2.1. DIC for Single Context
2.2. DIC for Hybrid Contexts
2.3. Adaptive DIC
3. Conditional Encoder-Based Adaptive DIC in Hybrid Contexts
3.1. The RDCP Trade-Off
3.2. Framework of Proposed DIC
3.3. Loss Function Design
3.4. Method for Training the Proposed DIC
3.5. Qmap-Based Bit Assignment
4. Implementation of the Proposed DIC
4.1. Network Structure
4.2. Deep Codec Model Training
4.3. Qmap Generation
5. Experimental Results
5.1. Experimental Setup
5.2. Comparison of Loss-Term Weight Settings
5.3. Comparison of Qmap Generation Policies
5.4. Performance Evaluation and Comparison
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lei, Z.; Zhang, W.; Hong, X.; Shi, J.; Su, M.; Lin, C. Conditional Encoder-Based Adaptive Deep Image Compression with Classification-Driven Semantic Awareness. Electronics 2023, 12, 2781. https://doi.org/10.3390/electronics12132781
Lei Z, Zhang W, Hong X, Shi J, Su M, Lin C. Conditional Encoder-Based Adaptive Deep Image Compression with Classification-Driven Semantic Awareness. Electronics. 2023; 12(13):2781. https://doi.org/10.3390/electronics12132781
Chicago/Turabian Style: Lei, Zhongyue, Weicheng Zhang, Xuemin Hong, Jianghong Shi, Minxian Su, and Chaoheng Lin. 2023. "Conditional Encoder-Based Adaptive Deep Image Compression with Classification-Driven Semantic Awareness" Electronics 12, no. 13: 2781. https://doi.org/10.3390/electronics12132781
APA Style: Lei, Z., Zhang, W., Hong, X., Shi, J., Su, M., & Lin, C. (2023). Conditional Encoder-Based Adaptive Deep Image Compression with Classification-Driven Semantic Awareness. Electronics, 12(13), 2781. https://doi.org/10.3390/electronics12132781