A Visible and Synthetic Aperture Radar Image Fusion Algorithm Based on a Transformer and a Convolutional Neural Network
Abstract
1. Introduction
- (1) We propose a dual-branch Transformer–CNN framework that extracts and fuses the global and local features of visible and SAR images, addressing the insufficient feature extraction of traditional auto-encoder-based image fusion methods.
- (2) At the macroscopic level, we improve the dual-branch structure in two ways. First, instead of concatenating the global and local features of the two modalities and sending them directly to the decoder, we fuse the global features of the visible and SAR images first, and then pass the fused global features, together with the local features of both modalities, to the decoder for reconstruction (a minimal sketch of this flow follows this list). Second, we introduce a residual structure to strengthen the network's capacity to represent and extract complex features.
- (3) At the microscopic level, we refine the two feature extraction branches: the Transformer branch adopts the LT and DropKey mechanisms, and the CNN branch adds the CBAM module. These changes reduce the loss of important feature information during forward propagation of the fusion network and improve the robustness of the model.
- (4) For the two-stage training process, we design loss functions tailored to each training task and obtain good results.
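The sketch below illustrates this macroscopic flow with stock PyTorch layers. The layer sizes, the patch embedding, the shared encoder, and the simple averaging fusion of global features are illustrative assumptions only; the paper's encoder uses an LT-based Transformer branch with DropKey and a CBAM-augmented CNN branch, and its fusion layer is more elaborate than an average.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalBranch(nn.Module):                      # stand-in for the Transformer branch
    def __init__(self, dim=64, patch=4):
        super().__init__()
        self.embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)   # patch embedding
        self.attn = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)

    def forward(self, x):
        f = self.embed(x)                           # (B, C, H/4, W/4)
        b, c, h, w = f.shape
        tokens = self.attn(f.flatten(2).transpose(1, 2))   # (B, N, C) global self-attention
        f = tokens.transpose(1, 2).view(b, c, h, w)
        return F.interpolate(f, scale_factor=4, mode="nearest")  # back to input resolution

class LocalBranch(nn.Module):                       # stand-in for the CBAM-augmented CNN branch
    def __init__(self, dim=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.body(x)

class Decoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.body = nn.Sequential(                  # fused global + two local feature maps
            nn.Conv2d(3 * dim, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, 1, 3, padding=1))

    def forward(self, g_fused, l_vis, l_sar):
        return self.body(torch.cat([g_fused, l_vis, l_sar], dim=1))

# A shared encoder pair is used here for brevity; modality-specific encoders are possible.
glob, loc, dec = GlobalBranch(), LocalBranch(), Decoder()
vis = torch.rand(1, 1, 64, 64)                      # toy single-channel visible patch
sar = torch.rand(1, 1, 64, 64)                      # toy SAR patch
g_fused = 0.5 * (glob(vis) + glob(sar))             # placeholder averaging fusion of global features
fused = dec(g_fused, loc(vis), loc(sar))
print(fused.shape)                                  # torch.Size([1, 1, 64, 64])
```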
2. Related Work
2.1. CNN
2.2. Attention Mechanism
2.3. Transformer and Its Variants
2.4. Regularization Method
3. Framework and Methodology
3.1. Encoder
3.2. Fusion Strategy
3.3. Decoder
3.4. Loss Function
3.4.1. Training Stage 1
- Mutual information loss
- Structural similarity loss (a hedged sketch of this term follows this list)
- Feature decomposition loss
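As an example of the structural similarity term, the following is a minimal sketch of a loss computed as 1 − SSIM over whole images. The global (windowless) SSIM, the constants, and the function name are our illustrative choices; the paper's implementation and weighting may differ.

```python
import torch

def ssim_loss(x: torch.Tensor, y: torch.Tensor, c1: float = 0.01 ** 2, c2: float = 0.03 ** 2):
    # x, y: image batches scaled to [0, 1], shape (B, C, H, W)
    mu_x, mu_y = x.mean(dim=(2, 3)), y.mean(dim=(2, 3))
    var_x = x.var(dim=(2, 3), unbiased=False)
    var_y = y.var(dim=(2, 3), unbiased=False)
    cov = ((x - mu_x[..., None, None]) * (y - mu_y[..., None, None])).mean(dim=(2, 3))
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return (1 - ssim).mean()          # lower is better; 0 when images are identical

rec = torch.rand(2, 1, 64, 64)        # toy reconstruction
src = torch.rand(2, 1, 64, 64)        # toy source image
print(ssim_loss(rec, src))
```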
3.4.2. Training Stage 2
4. Experimental Setup and Result Analysis
4.1. Dataset Introduction
4.2. Evaluation Metrics
4.3. Experimental Setup
4.4. Comparison with SOTA Methods
4.4.1. Qualitative Comparison
4.4.2. Quantitative Comparison
4.5. Ablation Studies
- (1) Dual-branch structure: We design a dual-branch structure with a CNN-based branch and a Transformer-based branch. To verify its effectiveness, we run two ablations: (a) only the Transformer branch performs feature extraction, i.e., the CNN branch is replaced by the Transformer branch; (b) only the CNN branch performs feature extraction, i.e., the Transformer branch is replaced by the CNN branch.
- (2) Residual structure: We compare the model with and without the residual structure.
- (3) DropKey: For the Transformer branch, we compare the model with and without the DropKey mechanism (a sketch of the DropKey idea follows this list).
- (4) CBAM: For the CNN branch, we compare the model with and without the CBAM module.
- (5) Two-stage training: We introduce two-stage training to enhance fusion performance. In the ablation, a one-stage scheme trains the encoder, fusion layer, and decoder directly. The number of training epochs matches the total of the two-stage schedule, 140 epochs in both cases.
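For reference, the snippet below sketches the DropKey idea used in the Transformer branch: rather than dropping attention weights after the softmax, random key positions in the attention logits are masked before the softmax (Li et al., CVPR 2023). The tensor shapes and drop ratio are illustrative assumptions, not the paper's settings.

```python
import torch

def dropkey_attention(q, k, v, drop_ratio=0.1, training=True):
    # q, k, v: (batch, heads, tokens, dim)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)       # (B, H, N, N) logits
    if training and drop_ratio > 0:
        # Bernoulli mask over key positions; masked logits become -inf,
        # so the corresponding keys receive zero attention after softmax.
        mask = torch.bernoulli(torch.full_like(scores, drop_ratio)).bool()
        scores = scores.masked_fill(mask, float("-inf"))
    attn = scores.softmax(dim=-1)
    return attn @ v

q = k = v = torch.rand(1, 4, 64, 32)   # toy multi-head input
out = dropkey_attention(q, k, v)
print(out.shape)                        # torch.Size([1, 4, 64, 32])
```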
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Zhang, H.; Shen, H.F.; Yuan, Q.Q.; Guan, X.B. Multispectral and SAR Image Fusion Based on Laplacian Pyramid and Sparse Representation. Remote Sens. 2022, 14, 870.
- He, Y.Q.; Zhang, Y.T.; Chen, P.H.; Wang, J. Complex number domain SAR image fusion based on Laplacian pyramid. In Proceedings of the 2021 CIE International Conference on Radar (Radar), Haikou, China, 15–19 December 2021.
- Zhang, T.W.; Zhang, X.L. Squeeze-and-Excitation Laplacian Pyramid Network With Dual-Polarization Feature Fusion for Ship Classification in SAR Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 4019905.
- Dai, J.Y.; Lv, Q.; Li, Y.; Wang, W.; Tian, Y.; Guo, J.Z. Controllable Angle Shear Wavefront Reconstruction Based on Image Fusion Method for Shear Wave Elasticity Imaging. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 2022, 69, 187–198.
- Jia, H.Y. Research on Image Fusion Algorithm Based on Nonsubsampled Shear Wave Transform and Principal Component Analysis. J. Phys. Conf. Ser. 2022, 2146, 012025.
- Zhao, M.J.; Peng, Y.P. A Multi-module Medical Image Fusion Method Based on Non-subsampled Shear Wave Transformation and Convolutional Neural Network. Sens. Imaging 2021, 22, 9.
- Singh, S.; Singh, H.; Gehlot, A.; Kaur, J.; Gagandeep. IR and visible image fusion using DWT and bilateral filter. Microsyst. Technol. 2023, 29, 457–467.
- Amritkar, M.A.; Mahajan, K.J. Comparative Approach of DCT and DWT for SAR Image Fusion. Int. J. Adv. Electron. Comput. Sci. 2016, 3, 107–111.
- Cheng, C.; Zhang, K.; Jiang, W.; Huang, Y. A SAR-optical image fusion method based on DT-CWT. J. Inf. Comput. Sci. 2014, 11, 6067–6076.
- Zhang, K.; Huang, Y.D.; Zhao, C. Remote sensing image fusion via RPCA and adaptive PCNN in NSST domain. Int. J. Wavelets Multiresolut. Inf. Process. 2018, 16, 1850037.
- Liu, K.X.; Li, Y.F. SAR and multispectral image fusion algorithm based on sparse representation and NSST. In Proceedings of the 2nd International Conference on Green Energy and Sustainable Development (GESD 2019), Shanghai, China, 18–20 October 2019.
- Shen, F.Y.; Wang, Y.F.; Liu, C. Change Detection in SAR Images Based on Improved Non-subsampled Shearlet Transform and Multi-scale Feature Fusion CNN. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 1.
- An, F.P.; Ma, X.M.; Bai, L. Image fusion algorithm based on unsupervised deep learning-optimized sparse representation. Biomed. Signal Process. Control 2022, 71, 103140.
- Ma, X.L.; Hu, S.H.; Yang, D.S. SAR Image De-noising Based on Residual Image Fusion and Sparse Representation. KSII Trans. Internet Inf. Syst. 2019, 13, 3620–3637.
- Bai, L.; Yao, S.L.; Gao, K.; Huang, Y.J.; Tang, R.J.; Yan, H.; Meng, M.Q.-H.; Ren, H.L. Joint Sparse Representations and Coupled Dictionary Learning in Multi-Source Heterogeneous Image Pseudo-color Fusion. IEEE Sens. J. 2023, 23, 1.
- Wang, J.W.; Qu, H.J.; Zhang, Z.H.; Xie, M. New insights into multi-focus image fusion: A fusion method based on multi-dictionary linear sparse representation and region fusion model. Inf. Fusion 2024, 105, 102230.
- Wang, H.Z.; Shu, C.; Li, X.F.; Fu, Y.; Fu, Z.Z.; Yin, X.F. Two-Stream Edge-Aware Network for Infrared and Visible Image Fusion With Multi-Level Wavelet Decomposition. IEEE Access 2024, 12, 22190–22204.
- Zhang, T.T.; Du, H.Q.; Xie, M. W-shaped network: A lightweight network for real-time infrared and visible image fusion. J. Electron. Imaging 2023, 32, 63005.
- Luo, J.H.; Zhou, F.; Yang, J.; Xing, M.D. DAFCNN: A Dual-Channel Feature Extraction and Attention Feature Fusion Convolution Neural Network for SAR Image and MS Image Fusion. Remote Sens. 2023, 15, 3091.
- Deng, B.; Lv, H. Research on Image Fusion Method of SAR and Visible Image Based on CNN. In Proceedings of the 2022 IEEE 4th International Conference on Civil Aviation Safety and Information Technology (ICCASIT), Dali, China, 12–14 October 2022.
- Kong, Y.Y.; Hong, F.; Leung, H.; Peng, X.Y. A Fusion Method of Optical Image and SAR Image Based on Dense-UGAN and Gram–Schmidt Transformation. Remote Sens. 2021, 13, 4274.
- Li, D.H.; Liu, J.; Liu, F.; Zhang, W.H.; Zhang, A.D.; Gao, W.F.; Shi, J. A Dual-fusion Semantic Segmentation Framework with GAN for SAR Images. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022.
- Ma, C.H.; Gao, H.C. A GAN based method for SAR and optical images fusion. In Proceedings of the Seventh Asia Pacific Conference on Optics Manufacture and 2021 International Forum of Young Scientists on Advanced Optical Manufacturing (APCOM and YSAOM 2021), Shanghai, China, 28–31 October 2022.
- Liang, J.Y.; Cao, J.Z.; Sun, G.L.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image Restoration Using Swin Transformer. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, QC, Canada, 11–17 October 2021.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2021, arXiv:2010.11929.
- Wu, Z.; Liu, Z.; Lin, J.; Lin, Y.; Han, S. Lite Transformer with Long-Short Range Attention. arXiv 2020, arXiv:2004.11886.
- Li, B.; Hu, Y.H.; Nie, X.C.; Han, C.Y.; Jiang, X.J.; Guo, T.D.; Liu, L.Q. DropKey for Vision Transformer. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–20 June 2023.
- Liu, Y.; Chen, X.; Peng, H.; Wang, Z.F. Multi-focus image fusion with a deep convolutional neural network. Inf. Fusion 2017, 36, 191–207.
- Li, H.; Wu, X.J.; Kittler, J. Infrared and Visible Image Fusion using a Deep Learning Framework. In Proceedings of the 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018.
- Liu, Y.; Chen, X.; Cheng, J.; Peng, H.; Wang, Z.F. Infrared and visible image fusion with convolutional neural networks. Int. J. Wavelets Multiresolut. Inf. Process. 2018, 16, 1.
- Di, J.; Ren, L.; Liu, J.Z.; Guo, W.Q.; Zhang, H.K.; Liu, Q.D.; Lian, J. FDNet: An end-to-end fusion decomposition network for infrared and visible images. PLoS ONE 2023, 18, e0290231.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the 15th European Conference on Computer Vision (ECCV 2018), Munich, Germany, 8–14 September 2018.
- Bai, Z.X.; Zhu, R.G.; He, D.Y.; Wang, S.C.; Huang, Z.T. Adulteration Detection of Pork in Mutton Using Smart Phone with the CBAM-Invert-ResNet and Multiple Parts Feature Fusion. Foods 2023, 12, 3594.
- Wang, S.H.; Fernandes, S.; Zhu, Z.Q.; Zhang, Y.D. AVNC: Attention-based VGG-style network for COVID-19 diagnosis by CBAM. IEEE Sens. J. 2021, 22, 1.
- Jia, J.H.; Qin, L.L.; Lei, R.F. Im5C-DSCGA: A Proposed Hybrid Framework Based on Improved DenseNet and Attention Mechanisms for Identifying 5-methylcytosine Sites in Human RNA. Front. Biosci. 2023, 28, 346.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
- Wang, W.H.; Xie, E.Z.; Li, X.; Fan, D.P.; Song, K.T.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021.
- Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient Transformer for High-Resolution Image Restoration. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022.
- Zhao, Z.X.; Bai, H.W.; Zhang, J.S.; Zhang, Y.L.; Xu, S.; Lin, Z.D.; Timofte, R.; Van Gool, L. CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023.
- Wang, C.; Ruan, R.; Zhao, Z.C.; Li, C.L.; Tang, J. Category-oriented Localization Distillation for SAR Object Detection and A Unified Benchmark. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1.
- Schmitt, M.; Hughes, L.H.; Zhu, X.X. The SEN1–2 dataset for deep learning in SAR-optical data fusion. arXiv 2018, arXiv:1807.01569.
- Zhang, X.; Ye, P.; Xiao, G. VIFB: A Visible and Infrared Image Fusion Benchmark. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020.
- Li, H.; Wu, X.J. DenseFuse: A Fusion Approach to Infrared and Visible Images. IEEE Trans. Image Process. 2019, 28, 2614–2623.
- Li, H.; Wu, X.J.; Kittler, J. RFN-Nest: An end-to-end residual fusion network for infrared and visible images. Inf. Fusion 2021, 73, 72–86.
- Tang, L.F.; Yuan, J.T.; Ma, J.Y. Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network. Inf. Fusion 2022, 82, 28–42.
- Wang, Z.S.; Chen, Y.L.; Shao, W.Y.; Li, H.; Zhang, L. SwinFuse: A Residual Swin Transformer Fusion Network for Infrared and Visible Images. IEEE Trans. Instrum. Meas. 2022, 71, 1–12.
- Tang, W.; He, F.Z.; Liu, Y. YDTR: Infrared and Visible Image Fusion via Y-Shape Dynamic Transformer. IEEE Trans. Multimed. 2023, 25, 5413–5428.
| Theory | Evaluation Metrics |
|---|---|
| Information Theory | EN, MI, PSNR |
| Structural Similarity | SSIM, MSE |
| Image Feature | AG, EI, SD, SF, Qabf |
| Visual Perception | SCD, VIF |
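As a concrete illustration of how the information-theory metrics are computed, the snippet below evaluates EN (Shannon entropy of the grey-level histogram) and PSNR on 8-bit grayscale arrays. Normalisation details in the paper's implementation may differ; this only illustrates the definitions.

```python
import numpy as np

def entropy(img: np.ndarray) -> float:
    """EN: Shannon entropy of an 8-bit image's grey-level histogram."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                              # drop empty bins to avoid log(0)
    return float(-(p * np.log2(p)).sum())

def psnr(fused: np.ndarray, reference: np.ndarray) -> float:
    """PSNR between a fused image and a source image, in dB."""
    mse = np.mean((fused.astype(np.float64) - reference.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(255.0 ** 2 / mse)

# toy usage with random 8-bit images
a = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
b = np.random.randint(0, 256, (128, 128), dtype=np.uint8)
print(entropy(a), psnr(a, b))
```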
| Metric | DenseFuse | SeAFusion | RFN-Nest | SwinFusion | YDTR | Ours |
|---|---|---|---|---|---|---|
| EN | 7.02 | 7.42 | 7.06 | 7.23 | 7.09 | 7.54 |
| MI | 1.71 | 1.64 | 1.51 | 1.75 | 1.78 | 2.42 |
| PSNR | 15.47 | 12.99 | 14.50 | 12.92 | 14.63 | 14.09 |
| SSIM | 1.04 | 1.01 | 0.80 | 1.06 | 1.05 | 1.10 |
| MSE | 1974.79 | 3614.99 | 2532.79 | 3656.79 | 2451.7 | 2820.47 |
| AG | 10.15 | 15.40 | 6.92 | 15.21 | 11.96 | 17.15 |
| EI | 41.50 | 48.25 | 37.81 | 48.37 | 42.74 | 50.19 |
| SD | 36.54 | 48.64 | 39.58 | 49.50 | 39.06 | 52.88 |
| SF | 24.28 | 33.43 | 14.31 | 34.88 | 30.17 | 40.08 |
| Qabf | 0.32 | 0.44 | 0.22 | 0.41 | 0.39 | 0.56 |
| SCD | 1.21 | 1.35 | 1.20 | 1.49 | 1.16 | 1.57 |
| VIF | 0.37 | 0.35 | 0.31 | 0.36 | 0.39 | 0.58 |
Metric groups follow the table above: Information Theory (EN, MI, PSNR), Structural Similarity (SSIM, MSE), Image Features (AG, EI, SD, SF, Qabf), Visual Perception (SCD, VIF).

| Setting | EN | MI | PSNR | SSIM | MSE | AG | EI | SD | SF | Qabf | SCD | VIF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (1) Dual-branch structure | | | | | | | | | | | | |
| Transformer branch | 7.52 | 1.73 | 13.47 | 1.01 | 3142.91 | 19.72 | 50.89 | 53.39 | 47.69 | 0.48 | 1.33 | 0.38 |
| CNN branch | 7.36 | 1.02 | 11.40 | 0.41 | 4898.22 | 7.45 | 34.36 | 42.53 | 23.62 | 0.15 | 0.07 | 0.15 |
| Ours | 7.54 | 2.42 | 14.09 | 1.10 | 2820.47 | 17.15 | 50.19 | 52.88 | 40.08 | 0.56 | 1.57 | 0.58 |
| (2) Residual structure | | | | | | | | | | | | |
| Nonresidual | 7.52 | 2.41 | 13.14 | 0.98 | 3471.32 | 17.37 | 49.31 | 52.04 | 40.64 | 0.55 | 1.15 | 0.56 |
| Ours | 7.54 | 2.42 | 14.09 | 1.10 | 2820.47 | 17.15 | 50.19 | 52.88 | 40.08 | 0.56 | 1.57 | 0.58 |
| (3) DropKey | | | | | | | | | | | | |
| No DropKey | 7.51 | 1.67 | 13.64 | 0.94 | 3049.84 | 17.11 | 47.86 | 49.74 | 42.08 | 0.45 | 1.07 | 0.34 |
| Ours | 7.54 | 2.42 | 14.09 | 1.10 | 2820.47 | 17.15 | 50.19 | 52.88 | 40.08 | 0.56 | 1.57 | 0.58 |
| (4) CBAM | | | | | | | | | | | | |
| No CBAM | 7.45 | 2.95 | 12.75 | 0.99 | 3886.71 | 16.23 | 49.47 | 51.02 | 37.53 | 0.55 | 0.97 | 0.75 |
| Ours | 7.54 | 2.42 | 14.09 | 1.10 | 2820.47 | 17.15 | 50.19 | 52.88 | 40.08 | 0.56 | 1.57 | 0.58 |
| (5) Two-stage training | | | | | | | | | | | | |
| One stage | 7.53 | 2.22 | 13.23 | 1.01 | 3425.46 | 18.50 | 50.76 | 52.41 | 43.25 | 0.55 | 1.19 | 0.52 |
| Ours | 7.54 | 2.42 | 14.09 | 1.10 | 2820.47 | 17.15 | 50.19 | 52.88 | 40.08 | 0.56 | 1.57 | 0.58 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).