Multimodal Image Translation Algorithm Based on Singular Squeeze-and-Excitation Network
Abstract
1. Introduction
- Propose a novel multimodal image translation algorithm: We propose MASSE, a multimodal image translation algorithm based on the Singular Squeeze-and-Excitation Network. MASSE translates between image domains by recombining the content and style of the generated feature blocks.
- Develop an enhanced attention mechanism: We develop SSEnet, a channel attention mechanism augmented with singular-value features. Integrating channel weights with singular-value features strengthens the image content features (a sketch of the idea follows this list).
- Introduce a new feature cascading method: We introduce feature layer insertion (FLI), which efficiently combines traditional features with convolutional features (illustrated after Algorithm 1 below).
- Demonstrate empirical effectiveness: We empirically demonstrate the effectiveness of our method on image translation and image-to-illustration tasks. Qualitative and quantitative comparisons with state-of-the-art models validate the superior performance of MASSE.
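The exact SSEnet formulation appears in Section 3.2.2 and is not reproduced in this outline. As a rough illustration of the second contribution, the following PyTorch sketch shows one plausible reading: an SE-style excitation driven by both pooled channel statistics and per-channel singular values. The class layout, the choice of the largest singular value as the per-channel descriptor, and the inline SVD (the paper precomputes SVD features, per Algorithm 1) are assumptions, not the authors' confirmed design.

```python
import torch
import torch.nn as nn

class SSEnet(nn.Module):
    """Squeeze-and-excitation block whose excitation also sees
    per-channel singular values (hypothetical reconstruction)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Squeeze: global average pooling, as in the original SE block.
        pooled = x.mean(dim=(2, 3))                        # (b, c)
        # Per-channel singular values; keep the largest as a descriptor.
        sv = torch.linalg.svdvals(x.reshape(b * c, h, w))  # (b*c, min(h, w))
        sv_top = sv[:, 0].reshape(b, c)                    # (b, c)
        # Excitation from both descriptors, then channel-wise reweighting.
        weights = self.fc(torch.cat([pooled, sv_top], dim=1))
        return x * weights.view(b, c, 1, 1)
```

For example, `SSEnet(channels=128)(torch.randn(2, 128, 64, 64))` reweights each of the 128 feature maps by a factor computed from both its mean activation and its dominant singular value.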
2. Related Work
2.1. Generative Adversarial Networks
2.2. Image-to-Image Translation
3. Multimodal Image Translation Algorithm Based on Singular Squeeze-and-Excitation Network (MASSE)
3.1. Channel Attention Mechanism
3.2. Overview of the MASSE Model
3.2.1. The Structure of MASSE
3.2.2. The Singular Squeeze-and-Excitation Network (SSEnet) Structure
3.2.3. The Feature Layer Insertion (FLI) Generator and Discriminator of MASSE
Algorithm 1: MASSE

```
Step 1: Initialization
    Load the training images imagesA and imagesB.
    Svd_featureA = svd(imagesA)
    Svd_featureB = svd(imagesB)
Step 2: Training
    for i = 1 to iterations:
        SSEnet_featureA = SSEnet(imageA, Svd_featureA)
        ContentA = FLI(encode_content(imageA), SSEnet_featureA)
        StyleB = encode_style(imageB)
        SSEnet_featureB = SSEnet(imageB, Svd_featureB)
        ContentB = FLI(encode_content(imageB), SSEnet_featureB)
        StyleA = encode_style(imageA)
        fakeA = generator(ContentB, StyleA)
        fakeB = generator(ContentA, StyleB)
        Discriminator(realA, fakeA)
        Discriminator(realB, fakeB)
```
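FLI itself is specified in Section 3.2.3. As a rough illustration of a concatenate-then-fuse reading, consistent with the Fusion Block in the generator architecture table later in this document (128 encoder channels + 128 SSEnet channels → 256), here is a minimal PyTorch sketch; the 1×1 fusion convolution, instance normalization, and bilinear resizing are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FLI(nn.Module):
    """Feature layer insertion (hypothetical reconstruction): concatenate
    convolutional encoder features with SSEnet features along the channel
    axis, then fuse with a 1x1 convolution."""
    def __init__(self, conv_channels: int, sse_channels: int, out_channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(conv_channels + sse_channels, out_channels, kernel_size=1),
            nn.InstanceNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, conv_feat: torch.Tensor, sse_feat: torch.Tensor) -> torch.Tensor:
        # Resize the SSEnet features if the spatial sizes differ (assumption).
        if sse_feat.shape[-2:] != conv_feat.shape[-2:]:
            sse_feat = F.interpolate(sse_feat, size=conv_feat.shape[-2:],
                                     mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([conv_feat, sse_feat], dim=1))
```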
3.3. Global Loss Function
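The global loss is defined in this section of the paper and is not reproduced in this outline. For orientation only, and as an unverified assumption: given MASSE's content/style encoders and per-domain discriminators (Algorithm 1), a MUNIT-style objective combining adversarial terms with image, content, and style reconstruction terms would take the form below, with hypothetical trade-off weights $\lambda_x$, $\lambda_c$, $\lambda_s$.

$$
\mathcal{L}_{\text{total}}
= \mathcal{L}_{\text{GAN}}^{A} + \mathcal{L}_{\text{GAN}}^{B}
+ \lambda_{x}\left(\mathcal{L}_{\text{recon}}^{x_{A}} + \mathcal{L}_{\text{recon}}^{x_{B}}\right)
+ \lambda_{c}\left(\mathcal{L}_{\text{recon}}^{c_{A}} + \mathcal{L}_{\text{recon}}^{c_{B}}\right)
+ \lambda_{s}\left(\mathcal{L}_{\text{recon}}^{s_{A}} + \mathcal{L}_{\text{recon}}^{s_{B}}\right)
$$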
4. Training and Experimental Results
4.1. Parameter Details
4.2. Dataset
4.3. Ablation Experiment
4.4. Analysis of Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhang, Y.; Hu, B.; Huang, Y.; Gao, C.; Yin, J.; Wang, Q. HQ-I2IT: Redesign the optimization scheme to improve image quality in CycleGAN-based image translation systems. IET Image Process. 2024, 18, 507–522.
- Tu, H.Y.; Wang, Z.; Zhao, Y.W. Unpaired Image-to-Image Translation with Diffusion Adversarial Network. Mathematics 2024, 12, 3178.
- Hu, X.; Zhou, X.; Huang, Q.; Shi, Z.; Sun, L.; Li, Q. QS-Attn: Query-selected attention for contrastive learning in I2I translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 18291–18300.
- Wang, C.; Xu, C.; Wang, C.; Tao, D. Perceptual adversarial networks for image-to-image transformation. IEEE Trans. Image Process. 2018, 27, 4066–4079.
- Wang, C.; Zheng, H.; Yu, Z.; Zheng, Z.; Gu, Z.; Zheng, B. Discriminative region proposal adversarial networks for high-quality image-to-image translation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 770–785.
- Dou, H.; Chen, C.; Hu, X.; Peng, S. Asymmetric CycleGAN for unpaired NIR-to-RGB face image translation. In Proceedings of the ICASSP 2019—2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; pp. 1757–1761.
- Fu, X. Digital Image Art Style Transfer Algorithm Based on CycleGAN. Comput. Intell. Neurosci. 2022, 2022, 6075398.
- Dong, Y.; Tan, W.; Tao, D.; Zheng, L.; Li, X. CartoonLossGAN: Learning surface and coloring of images for cartoonization. IEEE Trans. Image Process. 2021, 31, 485–498.
- Zhang, K.; Van Gool, L.; Timofte, R. Deep unfolding network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020.
- Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134.
- Xie, Z.; Huang, Z.; Zhao, F.; Dong, H.; Kampffmeyer, M.; Liang, X. Towards scalable unpaired virtual try-on via patch-routed spatially-adaptive GAN. Adv. Neural Inf. Process. Syst. 2021, 34, 2598–2610.
- Fang, H.; Deng, W.; Zhong, Y.; Hu, J. Triple-GAN: Progressive face aging with triple translation loss. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 13–19 June 2020; pp. 804–805.
- Huang, X.; Liu, M.Y.; Belongie, S.; Kautz, J. Multimodal unsupervised image-to-image translation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 172–189.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
- Andrews, H.; Patterson, C., III. Singular value decomposition (SVD) image coding. IEEE Trans. Commun. 1976, 24, 425–432.
- Rosales, R.; Achan, K.; Frey, B. Unsupervised image translation. In Proceedings of the Ninth IEEE International Conference on Computer Vision, Nice, France, 13–16 October 2003; pp. 472–478.
- Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423.
- Taigman, Y.; Polyak, A.; Wolf, L. Unsupervised cross-domain image generation. arXiv 2016, arXiv:1611.02200.
- Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784.
- Zhu, J.Y.; Zhang, R.; Pathak, D.; Darrell, T.; Efros, A.A.; Wang, O.; Shechtman, E. Toward multimodal image-to-image translation. Adv. Neural Inf. Process. Syst. 2017, 30.
- Bao, J.; Chen, D.; Wen, F.; Li, H.; Hua, G. CVAE-GAN: Fine-grained image generation through asymmetric training. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2745–2754.
- Yang, J.; Kannan, A.; Batra, D.; Parikh, D. LR-GAN: Layered recursive generative adversarial networks for image generation. arXiv 2017, arXiv:1703.01560.
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
- Yi, Z.; Zhang, H.; Tan, P.; Gong, M. DualGAN: Unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2849–2857.
- Kim, T.; Cha, M.; Kim, H.; Lee, J.K.; Kim, J. Learning to discover cross-domain relations with generative adversarial networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 1857–1865.
- Yu, C.; Hu, D.; Zheng, S.; Jiang, W.; Li, M.; Zhao, Z.Q. An improved steganography without embedding based on attention GAN. Peer-to-Peer Netw. Appl. 2021, 14, 1446–1457.
- Tang, H.; Liu, H.; Xu, D.; Torr, P.H.; Sebe, N. AttentionGAN: Unpaired image-to-image translation using attention-guided generative adversarial networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 1972–1987.
- Wu, S.; Dong, C.; Qiao, Y. Blind image restoration based on cycle-consistent network. IEEE Trans. Multimed. 2022, 25, 1111–1124.
- Liu, M.Y.; Huang, X.; Mallya, A.; Karras, T.; Aila, T.; Lehtinen, J.; Kautz, J. Few-shot unsupervised image-to-image translation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 10551–10560.
- Saito, K.; Saenko, K.; Liu, M.Y. COCO-FUNIT: Few-shot unsupervised image translation with a content conditioned style encoder. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part III; Springer International Publishing: Cham, Switzerland, 2020; pp. 382–398.
- Li, Y.; Liang, Q.; Han, Z.; Mai, W.; Wang, Z. Few-shot face sketch-to-photo synthesis via global-local asymmetric image-to-image translation. ACM Trans. Multimed. Comput. Commun. Appl. 2024, 20, 1–24.
- Li, B.; Xue, K.; Liu, B.; Lai, Y.K. BBDM: Image-to-image translation with Brownian bridge diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 1952–1961.
- Alami Mejjati, Y.; Richardt, C.; Tompkin, J.; Cosker, D.; Kim, K.I. Unsupervised attention-guided image-to-image translation. Adv. Neural Inf. Process. Syst. 2018, 31.
- Shamsolmoali, P.; Zareapoor, M.; Das, S.; Garcia, S.; Granger, E.; Yang, J. GEN: Generative equivariant networks for diverse image-to-image translation. IEEE Trans. Cybern. 2022, 53, 874–886.
- Tu, H.; Wang, W.; Chen, J.; Wu, F.; Li, G. Unpaired image-to-image translation with improved two-dimensional feature. Multimed. Tools Appl. 2022, 81, 43851–43872.
- Hicsonmez, S.; Samet, N.; Akbas, E.; Duygulu, P. GANILLA: Generative adversarial networks for image to illustration translation. Image Vis. Comput. 2020, 95, 103886.
- Yang, S.; Jiang, L.; Liu, Z.; Loy, C.C. Unsupervised image-to-image translation with generative prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 18332–18341.
FLI generator architecture:

| Type | Kernel Size | Stride | Output Channels |
|---|---|---|---|
| Conv Block | 7 | 1 | 64 |
| Conv Block | 3 | 2 | 128 |
| Conv Block | 3 | 2 | 128 |
| Fusion Block | - | - | 256 |
| Conv Block | 3 | 1 | 256 |
| Conv Block | 3 | 1 | 256 |
| Conv Block | 3 | 1 | 256 |
| Conv Block | 3 | 1 | 256 |
| Conv Block | 3 | 1 | 256 |
| Conv Block | 3 | 1 | 256 |
| Deconv Block | 3 | 2 | 128 |
| Deconv Block | 3 | 2 | 64 |
| Deconv | 7 | 1 | 3 |
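To make the generator table concrete, here is a minimal PyTorch reconstruction of the layer stack above. Everything the table does not state is an assumption: reflection padding, instance normalization, ReLU/Tanh activations, and modeling the Fusion Block as channel concatenation of the 128-channel encoder output with 128-channel SSEnet features (the slot where FLI operates).

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int, k: int, s: int) -> nn.Sequential:
    # One "Conv Block" row of the table; padding scheme, normalization,
    # and activation are assumptions.
    return nn.Sequential(
        nn.ReflectionPad2d(k // 2),
        nn.Conv2d(in_ch, out_ch, k, stride=s),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

def deconv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # One "Deconv Block" row: 3x3 transposed conv with stride 2.
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, 3, stride=2,
                           padding=1, output_padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(
            conv_block(3, 64, 7, 1),
            conv_block(64, 128, 3, 2),
            conv_block(128, 128, 3, 2),
        )
        # Six 3x3/stride-1 blocks at 256 channels after the Fusion Block.
        self.body = nn.Sequential(*[conv_block(256, 256, 3, 1) for _ in range(6)])
        self.decode = nn.Sequential(
            deconv_block(256, 128),
            deconv_block(128, 64),
            # Final "Deconv" row: 7x7, stride 1, 3 output channels.
            nn.ConvTranspose2d(64, 3, 7, stride=1, padding=3),
            nn.Tanh(),
        )

    def forward(self, x: torch.Tensor, sse_feat: torch.Tensor) -> torch.Tensor:
        h = self.encode(x)                # (b, 128, H/4, W/4)
        # Fusion Block: 128 encoder + 128 SSEnet channels -> 256.
        h = torch.cat([h, sse_feat], dim=1)
        return self.decode(self.body(h))
```

On a 256×256 input this stack reproduces the table's widths exactly: 64, 128, 128, then 256 after fusion, six 256-channel blocks, and 128, 64, 3 on the way back up.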
Discriminator architecture:

| Type | Kernel Size | Stride | Output Channels |
|---|---|---|---|
| Conv Block | 4 | 2 | 64 |
| Conv Block | 4 | 2 | 128 |
| Conv Block | 4 | 2 | 256 |
| Conv Block | 4 | 1 | 512 |
| Conv | 4 | 1 | 1 |
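The discriminator table maps directly onto a PatchGAN-style stack (64 → 128 → 256 → 512 channels, then a 4×4 convolution to a one-channel map of real/fake scores). A minimal sketch, assuming instance normalization and LeakyReLU, which the table does not specify:

```python
import torch.nn as nn

def d_block(in_ch: int, out_ch: int, stride: int) -> nn.Sequential:
    # One "Conv Block" row: 4x4 conv; normalization and activation
    # are assumptions.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 4, stride=stride, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

# Layer widths and strides taken row by row from the table above.
discriminator = nn.Sequential(
    d_block(3, 64, 2),
    d_block(64, 128, 2),
    d_block(128, 256, 2),
    d_block(256, 512, 1),
    nn.Conv2d(512, 1, 4, stride=1, padding=1),
)
```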
Cezanne2photo

| Metric | 256 | 128 | 64 | 32 | 16 |
|---|---|---|---|---|---|
| FID | 71.59 | 57.36 | 62.19 | 74.64 | 77.15 |
| EGF | 1007.10 | 1197.49 | 1024.77 | 972.50 | 1001.83 |
Summer2winter

| Metric | Ours | UNTF | GEN | CycleGAN | UNIT |
|---|---|---|---|---|---|
| FID | 64.36 | 67.44 | 69.43 | 75.71 | 97.96 |
| EGF | 956.24 | 880.21 | 841.26 | 784.82 | 741.33 |

Apple2orange

| Metric | Ours | UNTF | GEN | CycleGAN | UNIT |
|---|---|---|---|---|---|
| FID | 72.49 | 85.76 | 86.52 | 107.44 | 107.16 |
| EGF | 891.58 | 826.22 | 779.08 | 727.01 | 784.71 |
Cezanne2photo

| Metric | Ours | GANILLA | UNTF | UNITG |
|---|---|---|---|---|
| FID | 53.99 | 64.27 | 68.48 | 70.43 |
| EGF | 1073.01 | 941.46 | 967.20 | 926.93 |
Cezanne2photo

| Category | FID (Ours) | FID (MUNIT) | EGF (Ours) | EGF (MUNIT) |
|---|---|---|---|---|
| Category 1 | 71.12 | 94.22 | 974.51 | 810.07 |
| Category 2 | 79.66 | 84.84 | 1069.27 | 904.12 |
| Category 3 | 76.46 | 104.69 | 997.66 | 878.88 |