DMDiff: A Dual-Branch Multimodal Conditional Guided Diffusion Model for Cloud Removal Through SAR-Optical Data Fusion
Abstract
1. Introduction
- Considering the significant differences in imaging mechanisms and information characteristics between SAR and optical images, a multimodal feature extraction and fusion mechanism is designed. DMDiff incorporates an innovative dual-branch encoder and a cross-modal feature fusion encoder. In the dual-branch encoder, the SAR branch extracts spatial and radiometric information from the SAR image, while the optical branch captures optical signals from the cloud-free regions of the cloudy image. In the feature fusion encoder, cross-attention establishes complementary mapping relationships between the two branches to infer optical information for cloud-covered regions, and a de-redundancy mechanism keeps the feature representation compact, effectively guiding the diffusion model to generate high-quality cloud-free images during the progressive generation process (a minimal sketch of this design appears after this list).
- To address the limitations of the noise prediction (NP) strategy when applying diffusion models to complex remote sensing scenes, this study proposes an image adaptive prediction (IAP) strategy. Unlike the traditional NP strategy, which supervises the network on the Gaussian noise added during diffusion, IAP directly models the target image distribution, guiding the diffusion model more effectively toward the high spatial heterogeneity and complex spectral characteristics inherent in remote sensing images. This strategy notably enhances the performance of diffusion models in remote sensing cloud removal (CR) tasks (the two training objectives are contrasted in a sketch after this list).
- A comprehensive experimental validation framework is established. For airborne data, various masking modes are designed to evaluate reconstruction performance. For satellite data, a dataset approximating real-world cloud scenarios is built from actual cloud masks to analyze restoration performance across different land cover types (a toy version of this pair construction is sketched after this list). Finally, the effectiveness of DMDiff in real-world application scenarios is validated on the LuojiaSET-OSFCR real cloud dataset.
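The following is a minimal PyTorch sketch of the dual-branch idea from the first contribution, not the authors' implementation: the class name, the channel counts (two SAR polarizations, three optical bands), the single cross-attention layer, and the 1×1 compression standing in for the de-redundancy mechanism are all illustrative assumptions.

```python
# Minimal sketch of a dual-branch SAR/optical encoder with cross-attention
# fusion. This is NOT DMDiff itself; shapes and modules are assumptions.
import torch
import torch.nn as nn


class DualBranchFusion(nn.Module):
    """Toy dual-branch encoder: a SAR branch and an optical branch feed a
    cross-attention block in which cloud-contaminated optical features query
    complementary SAR features."""

    def __init__(self, sar_ch=2, opt_ch=3, dim=64, heads=4):
        super().__init__()
        # Independent convolutional stems for the two modalities.
        self.sar_branch = nn.Sequential(
            nn.Conv2d(sar_ch, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1))
        self.opt_branch = nn.Sequential(
            nn.Conv2d(opt_ch, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1))
        # Cross-attention: optical tokens as queries, SAR tokens as keys/values.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Stand-in for the de-redundancy step: a 1x1 conv compressing the
        # concatenated features back to a compact representation.
        self.compress = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, sar, opt):
        fs, fo = self.sar_branch(sar), self.opt_branch(opt)
        b, c, h, w = fo.shape
        q = fo.flatten(2).transpose(1, 2)     # (B, HW, C) optical queries
        kv = fs.flatten(2).transpose(1, 2)    # (B, HW, C) SAR keys/values
        attn, _ = self.cross_attn(q, kv, kv)  # complementary mapping
        attn = attn.transpose(1, 2).reshape(b, c, h, w)
        return self.compress(torch.cat([fo, attn], dim=1))


cond = DualBranchFusion()(torch.randn(1, 2, 64, 64), torch.randn(1, 3, 64, 64))
print(cond.shape)  # torch.Size([1, 64, 64, 64]) -- the conditioning feature
```

The fused output would then serve as the conditioning signal injected into the diffusion U-Net at each denoising step.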
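Next, a minimal sketch of how an image-prediction objective in the spirit of IAP differs from standard noise prediction, assuming a vanilla DDPM forward process; `model(x_t, t, cond)` is a placeholder for the conditioned denoiser, and the plain MSE weighting is an assumption rather than the paper's exact loss.

```python
# Hedged sketch contrasting the NP and IAP training objectives under a
# standard DDPM forward process. Not the paper's exact formulation.
import torch
import torch.nn.functional as F


def q_sample(x0, t, alphas_cumprod, noise):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    abar = alphas_cumprod[t].view(-1, 1, 1, 1)
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise


def np_loss(model, x0, t, abar, cond):
    """Classic DDPM objective: the network regresses the injected noise."""
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, abar, noise)
    return F.mse_loss(model(x_t, t, cond), noise)


def iap_loss(model, x0, t, abar, cond):
    """Image-adaptive prediction: the network regresses the clean image
    itself, so supervision acts directly on the target distribution."""
    noise = torch.randn_like(x0)
    x_t = q_sample(x0, t, abar, noise)
    return F.mse_loss(model(x_t, t, cond), x0)


# Demo with a dummy "network" that outputs zeros:
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
abar = torch.cumprod(1.0 - betas, dim=0)
x0 = torch.randn(2, 3, 32, 32)
t = torch.randint(0, T, (2,))
dummy = lambda x, t, c: torch.zeros_like(x)
print(np_loss(dummy, x0, t, abar, None), iap_loss(dummy, x0, t, abar, None))
```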
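Finally, a toy illustration of building a simulated cloudy/cloud-free pair from a real binary cloud mask, as used in the satellite evaluation protocol; the constant-brightness fill is an illustrative assumption, not the paper's exact simulation recipe.

```python
# Toy construction of a simulated training pair from a real cloud mask.
import numpy as np


def make_simulated_pair(clear_img: np.ndarray, cloud_mask: np.ndarray,
                        cloud_value: float = 1.0):
    """clear_img: (H, W, C) reflectance in [0, 1]; cloud_mask: (H, W) binary
    mask taken from a real cloudy acquisition. Returns (cloudy input,
    cloud-free target)."""
    cloudy = clear_img.copy()
    cloudy[cloud_mask.astype(bool)] = cloud_value  # paint clouds over the scene
    return cloudy, clear_img


# Example: a 4x4 RGB patch with the upper-left quadrant cloud-covered.
img = np.random.rand(4, 4, 3)
mask = np.zeros((4, 4)); mask[:2, :2] = 1
cloudy, target = make_simulated_pair(img, mask)
```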
2. Related Work
2.1. End-to-End Methods for CR
2.2. Diffusion Models for CR
3. Materials and Methods
3.1. Introduction to Diffusion Model
3.2. Overview of the Cloud Removal Network
3.3. Dual-Branch Multimodal Feature Extraction Encoder
3.4. Multimodal Feature Fusion and De-Redundancy Encoder
3.5. Image Adaptive Prediction Strategy
4. Results and Analysis
4.1. Description of Datasets
4.2. Implementation Details and Metrics
4.3. Compared Algorithms
4.4. Simulated Experiment Results
4.4.1. Analysis of Model Reconstruction Performance in Multi-Mask Scenarios
4.4.2. Analysis of Reconstruction Performance for Different Land Surface Types Using Real Cloud Masks
4.5. Real Experiment Results
5. Discussion
5.1. Ablation Studies
- DMDiff ablation experiment: after replacing IAP with NP, model performance deteriorated markedly: PSNR dropped by 21.6 dB and SSIM by 0.4443, while FID rose by 174.88 and LPIPS by 0.4726.
- Conditional diffusion transfer experiment: as shown in Table 5, integrating the IAP strategy into the conditional diffusion baseline brings a substantial improvement: PSNR increases by 18.75 dB and SSIM by 0.2755, while FID decreases by 58.74 and LPIPS by 0.3213 (a sketch of one reverse-diffusion step under each prediction strategy follows this list).
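To make the NP/IAP distinction concrete at inference time, here is a hedged sketch of one reverse-diffusion step under each prediction target; it follows the standard DDPM posterior formulas and is an assumption, not the paper's exact sampler.

```python
# Sketch of one reverse-diffusion step under each prediction target.
# Schedules and clipping follow standard DDPM conventions (assumptions).
import torch


def reverse_step(pred, x_t, t, betas, alphas, abar, mode="iap"):
    """Compute x_{t-1} from the network output `pred`.
    mode="np":  pred is the noise estimate eps_theta(x_t, t).
    mode="iap": pred is the clean-image estimate x0_theta(x_t, t)."""
    if mode == "np":
        # Recover x0 from the noise estimate, then reuse the same posterior.
        x0_hat = (x_t - (1 - abar[t]).sqrt() * pred) / abar[t].sqrt()
    else:
        x0_hat = pred
    x0_hat = x0_hat.clamp(-1, 1)
    abar_prev = abar[t - 1] if t > 0 else torch.tensor(1.0)
    # DDPM posterior mean of q(x_{t-1} | x_t, x0).
    coef_x0 = betas[t] * abar_prev.sqrt() / (1 - abar[t])
    coef_xt = (1 - abar_prev) * alphas[t].sqrt() / (1 - abar[t])
    mean = coef_x0 * x0_hat + coef_xt * x_t
    if t == 0:
        return mean
    var = betas[t] * (1 - abar_prev) / (1 - abar[t])
    return mean + var.sqrt() * torch.randn_like(x_t)
```

Both modes share the same posterior; the difference is only which quantity the network is trained to output, which is exactly what the ablation isolates.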
5.2. Computational Cost
5.3. Limitations and Future Directions
- Designing specialized feature extraction strategies for thin cloud regions, such as employing graph neural networks (GNNs) to model spatial dependencies between thin clouds and surrounding clear areas;
- Integrating multi-temporal data to leverage time-series analysis for mitigating thin cloud influences;
- Incorporating additional multimodal data sources to provide a more comprehensive observation of thin cloud-affected regions.
- Practical usability evaluation by applying cloud-free reconstructions to downstream tasks (e.g., land cover classification, change detection) and comparing them with ground truth data;
- Synergistic multi-temporal reconstruction integrating optical and SAR data from different time points to mitigate uncertainty in single-temporal estimations.
6. Conclusions
- Exploring latent diffusion models to improve computational efficiency;
- Developing specialized feature extraction strategies for thin clouds and integrating multi-temporal data;
- Validating the practical usability of reconstruction results through downstream tasks and investigating collaborative reconstruction strategies with multi-temporal data.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Ju, J.; Roy, D.P. The availability of cloud-free Landsat ETM+ data over the conterminous United States and globally. Remote Sens. Environ. 2008, 112, 1196–1211.
- King, M.D.; Platnick, S.; Menzel, W.P.; Ackerman, S.A.; Hubanks, P.A. Spatial and Temporal Distribution of Clouds Observed by MODIS Onboard the Terra and Aqua Satellites. IEEE Trans. Geosci. Remote Sens. 2013, 51, 3826–3852.
- Asner, G.P. Cloud cover in Landsat observations of the Brazilian Amazon. Int. J. Remote Sens. 2001, 22, 3855–3862.
- Jing, R.; Duan, F.; Lu, F.; Zhang, M.; Zhao, W. Denoising Diffusion Probabilistic Feature-Based Network for Cloud Removal in Sentinel-2 Imagery. Remote Sens. 2023, 15, 2217.
- Meraner, A.; Ebel, P.; Zhu, X.X.; Schmitt, M. Cloud removal in Sentinel-2 imagery using a deep residual neural network and SAR-optical data fusion. ISPRS J. Photogramm. Remote Sens. 2020, 166, 333–346.
- Li, W.; Li, Y.; Chan, J.C.W. Thick Cloud Removal With Optical and SAR Imagery via Convolutional-Mapping-Deconvolutional Network. IEEE Trans. Geosci. Remote Sens. 2020, 58, 2865–2879.
- Sui, J.; Ma, Y.; Yang, W.; Zhang, X.; Pun, M.O.; Liu, J. Diffusion Enhancement for Cloud Removal in Ultra-Resolution Remote Sensing Imagery. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–14.
- Grohnfeldt, C.; Schmitt, M.; Zhu, X. A Conditional Generative Adversarial Network to Fuse SAR and Multispectral Optical Data for Cloud Removal from Sentinel-2 Images. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 1726–1729.
- Gao, J.; Yuan, Q.; Li, J.; Zhang, H.; Su, X. Cloud Removal with Fusion of High Resolution Optical and SAR Images Using Generative Adversarial Networks. Remote Sens. 2020, 12, 191.
- Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–12 December 2020; pp. 6840–6851.
- Whang, J.; Delbracio, M.; Talebi, H.; Saharia, C.; Dimakis, A.G.; Milanfar, P. Deblurring via Stochastic Refinement. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 16272–16282.
- Ren, M.; Delbracio, M.; Talebi, H.; Gerig, G.; Milanfar, P. Multiscale Structure Guided Diffusion for Image Deblurring. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 10687–10699.
- Li, H.; Yang, Y.; Chang, M.; Chen, S.; Feng, H.; Xu, Z.; Li, Q.; Chen, Y. SRDiff: Single image super-resolution with diffusion probabilistic models. Neurocomputing 2022, 479, 47–59.
- Gao, S.; Liu, X.; Zeng, B.; Xu, S.; Li, Y.; Luo, X.; Liu, J.; Zhen, X.; Zhang, B. Implicit Diffusion Models for Continuous Super-Resolution. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 10021–10030.
- Lugmayr, A.; Danelljan, M.; Romero, A.; Yu, F.; Timofte, R.; Gool, L.V. RePaint: Inpainting using Denoising Diffusion Probabilistic Models. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 11451–11461.
- Xia, B.; Zhang, Y.; Wang, S.; Wang, Y.; Wu, X.; Tian, Y.; Yang, W.; Gool, L.V. DiffIR: Efficient Diffusion Model for Image Restoration. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 13049–13059.
- Bai, X.; Pu, X.; Xu, F. Conditional Diffusion for SAR to Optical Image Translation. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5.
- Wen, Z.; Suo, J.; Su, J.; Li, B.; Zhou, Y. Edge-SAR-Assisted Multimodal Fusion for Enhanced Cloud Removal. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5.
- Zhang, X.; Qiu, Z.; Peng, C.; Ye, P. Removing Cloud Cover Interference from Sentinel-2 Imagery in Google Earth Engine by Fusing Sentinel-1 SAR Data with a CNN Model. Int. J. Remote Sens. 2022, 43, 132–147.
- Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Gool, L.V.; Timofte, R. SwinIR: Image Restoration Using Swin Transformer. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Montreal, BC, Canada, 11–17 October 2021; pp. 1833–1844.
- Ding, H.; Zi, Y.; Xie, F. Uncertainty-Based Thin Cloud Removal Network via Conditional Variational Autoencoders. In Proceedings of the Asian Conference on Computer Vision (ACCV), Macao, China, 4–8 December 2022; pp. 469–485.
- Wu, P.; Pan, Z.; Tang, H.; Hu, Y. Cloudformer: A Cloud-Removal Network Combining Self-Attention Mechanism and Convolution. Remote Sens. 2022, 14, 6132.
- Han, S.; Wang, J.; Zhang, S. Former-CR: A Transformer-Based Thick Cloud Removal Method with Optical and SAR Imagery. Remote Sens. 2023, 15, 1196.
- Bermudez, J.D.; Happ, P.N.; Oliveira, D.A.B.; Feitosa, R.Q. SAR to Optical Image Synthesis for Cloud Removal with Generative Adversarial Networks. ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci. 2018, 4, 5–11.
- Zhang, S.; Li, X.; Zhou, X.; Wang, Y.; Hu, Y. Cloud removal using SAR and optical images via attention mechanism-based GAN. Pattern Recognit. Lett. 2023, 175, 8–15.
- Li, C.; Liu, X.; Li, S. Transformer Meets GAN: Cloud-Free Multispectral Image Reconstruction via Multisensor Data Fusion in Satellite Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13.
- Dhariwal, P.; Nichol, A. Diffusion Models Beat GANs on Image Synthesis. In Proceedings of the Advances in Neural Information Processing Systems, Online, 6–14 December 2021; pp. 8780–8794.
- Kumari, N.; Zhang, B.; Zhang, R.; Shechtman, E.; Zhu, J.-Y. Multi-Concept Customization of Text-to-Image Diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 1931–1941.
- Kim, G.; Kwon, T.; Ye, J.C. DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 2426–2435.
- Gu, S.; Chen, D.; Bao, J.; Wen, F.; Zhang, B.; Chen, D.; Yuan, L.; Guo, B. Vector Quantized Diffusion Model for Text-to-Image Synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10696–10706.
- Tan, H.; Wu, S.; Pi, J. Semantic Diffusion Network for Semantic Segmentation. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; pp. 8702–8716.
- Esser, P.; Chiu, J.; Atighehchian, P.; Granskog, J.; Germanidis, A. Structure and Content-Guided Video Synthesis with Diffusion Models. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 7312–7322.
- Tumanyan, N.; Geyer, M.; Bagon, S.; Dekel, T. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 1921–1930.
- Nichol, A.; Dhariwal, P. Improved Denoising Diffusion Probabilistic Models. In Proceedings of the International Conference on Machine Learning (ICML), Online, 18–24 July 2021; pp. 8162–8171.
- Wang, Y.; Zhang, B.; Zhang, W.; Hong, D.; Zhao, B.; Li, Z. Cloud Removal With SAR-Optical Data Fusion Using a Unified Spatial-Spectral Residual Network. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–20.
- Yang, Q.; Wang, G.; Zhao, Y.; Zhang, X.; Dong, G.; Ren, P. Multi-Scale Deep Residual Learning for Cloud Removal. In Proceedings of the 2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 4967–4970.
- Ouyang, D.; He, S.; Zhang, G.; Luo, M.; Guo, H.; Zhan, J.; Huang, Z. Efficient Multi-Scale Attention Module with Cross-Spatial Learning. In Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5.
- Chen, Z.; Zhang, Y.; Gu, J.; Kong, L.; Yang, X.; Yu, F. Dual Aggregation Transformer for Image Super-Resolution. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 12278–12287.
- Li, J.; Wen, Y.; He, L. SCConv: Spatial and Channel Reconstruction Convolution for Feature Redundancy. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 6153–6162.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Wang, Y.; Zhang, W.; Zhang, B. Cloud Removal With PolSAR-Optical Data Fusion Using A Two-Flow Residual Network. arXiv 2025, arXiv:2501.07901.
- Zhao, X.; Jia, K. Cloud Removal in Remote Sensing Using Sequential-Based Diffusion Models. Remote Sens. 2023, 15, 2861.
- Hellwich, O.; Reigber, A.; Lehmann, H. Sensor and Data Fusion Contest: Test Imagery to Compare and Combine Airborne SAR and Optical Sensors for Mapping. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Toronto, ON, Canada, 24–28 June 2002; pp. 82–84.
- Li, X.; Zhang, G.; Cui, H.; Hou, S.; Wang, S.; Li, X.; Chen, Y.; Li, Z.; Zhang, L. MCANet: A joint semantic segmentation framework of optical and SAR images for land use classification. Int. J. Appl. Earth Obs. Geoinf. 2022, 106, 102638.
- Mohajerani, S.; Krammer, T.A.; Saeedi, P. A Cloud Detection Algorithm for Remote Sensing Images Using Fully Convolutional Neural Networks. In Proceedings of the 2018 IEEE 20th International Workshop on Multimedia Signal Processing (MMSP), Vancouver, BC, Canada, 29–31 August 2018; pp. 1–5.
- Pan, J.; Xu, J.; Yu, X.; Ye, G.; Wang, M.; Chen, Y.; Ma, J. HDRSA-Net: Hybrid dynamic residual self-attention network for SAR-assisted optical image cloud and shadow removal. ISPRS J. Photogramm. Remote Sens. 2024, 218, 258–275.
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017.
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595.
- Pan, H. Cloud Removal for Remote Sensing Imagery via Spatial Attention Generative Adversarial Network. arXiv 2020, arXiv:2009.13015.
- Xu, F.; Shi, Y.; Ebel, P.; Yu, L.; Xia, G.-S.; Yang, W.; Zhu, X.X. GLF-CR: SAR-enhanced cloud removal with global-local fusion. ISPRS J. Photogramm. Remote Sens. 2022, 192, 268–278.
Reconstruction performance under six simulated mask modes (Section 4.4.1); (a)–(h) denote the compared methods of Section 4.3.

| Metric | Method | Half | Expand | Line | SR | Thin | Thick |
|---|---|---|---|---|---|---|---|
| PSNR (dB) | (a) | 21.10 | 19.59 | 27.48 | 25.21 | 26.26 | 21.60 |
| | (b) | 16.91 | 16.39 | 26.23 | 24.13 | 30.58 | 22.99 |
| | (c) | 24.50 | 22.19 | 32.38 | 29.24 | 29.48 | 22.95 |
| | (d) | 21.63 | 20.92 | 30.44 | 27.68 | 33.45 | 24.83 |
| | (e) | 20.87 | 18.93 | 30.61 | 28.48 | 20.75 | 25.12 |
| | (f) | 23.70 | 22.20 | 31.81 | 28.90 | 29.87 | 22.97 |
| | (g) | 17.75 | 17.75 | 17.75 | 17.75 | 17.75 | 17.75 |
| | (h) | 25.03 | 24.55 | 38.47 | 35.94 | 32.63 | 25.20 |
| SSIM | (a) | 0.6484 | 0.5362 | 0.8223 | 0.7198 | 0.8597 | 0.6596 |
| | (b) | 0.5778 | 0.4221 | 0.7980 | 0.6836 | 0.9343 | 0.7350 |
| | (c) | 0.7812 | 0.6660 | 0.9266 | 0.8408 | 0.9293 | 0.7395 |
| | (d) | 0.6974 | 0.6193 | 0.8975 | 0.8150 | 0.9574 | 0.8034 |
| | (e) | 0.6792 | 0.5460 | 0.9013 | 0.8337 | 0.8211 | 0.8054 |
| | (f) | 0.7761 | 0.6690 | 0.9272 | 0.8624 | 0.9313 | 0.7463 |
| | (g) | 0.6792 | 0.6792 | 0.6792 | 0.6792 | 0.6792 | 0.6792 |
| | (h) | 0.7928 | 0.7468 | 0.9788 | 0.9623 | 0.9465 | 0.8065 |
| FID | (a) | 142.10 | 209.12 | 86.26 | 128.93 | 102.34 | 170.47 |
| | (b) | 168.99 | 263.54 | 96.92 | 146.50 | 43.49 | 134.05 |
| | (c) | 115.95 | 208.29 | 25.67 | 62.77 | 56.89 | 142.87 |
| | (d) | 167.67 | 251.48 | 47.63 | 93.37 | 48.18 | 178.86 |
| | (e) | 188.25 | 279.27 | 48.66 | 83.93 | 166.65 | 156.02 |
| | (f) | 104.35 | 194.99 | 37.85 | 64.31 | 55.63 | 145.76 |
| | (g) | 81.69 | 81.69 | 81.69 | 81.69 | 81.69 | 81.69 |
| | (h) | 64.38 | 86.90 | 9.88 | 15.07 | 25.32 | 76.43 |
| LPIPS | (a) | 0.2931 | 0.3934 | 0.1634 | 0.2356 | 0.1661 | 0.3171 |
| | (b) | 0.3447 | 0.5077 | 0.1865 | 0.2650 | 0.0730 | 0.2422 |
| | (c) | 0.2021 | 0.3272 | 0.0669 | 0.1176 | 0.0896 | 0.2551 |
| | (d) | 0.3119 | 0.4535 | 0.1061 | 0.1872 | 0.0786 | 0.2921 |
| | (e) | 0.3487 | 0.5327 | 0.1043 | 0.1709 | 0.2514 | 0.2668 |
| | (f) | 0.2068 | 0.3291 | 0.0735 | 0.1270 | 0.0926 | 0.2682 |
| | (g) | 0.2350 | 0.2350 | 0.2350 | 0.2350 | 0.2350 | 0.2350 |
| | (h) | 0.1569 | 0.1931 | 0.0209 | 0.0332 | 0.0472 | 0.1558 |
Reconstruction performance for different land surface types using real cloud masks (Section 4.4.2); (a)–(h) denote the compared methods of Section 4.3.

| Metric | Method | All | City | Farmland | Forest | Road | Village | Water |
|---|---|---|---|---|---|---|---|---|
| PSNR (dB) | (a) | 25.04 | 22.31 | 25.63 | 27.44 | 24.95 | 23.44 | 26.44 |
| | (b) | 23.93 | 22.49 | 24.50 | 28.08 | 23.68 | 21.87 | 22.96 |
| | (c) | 27.09 | 24.48 | 28.67 | 28.52 | 27.44 | 25.88 | 27.54 |
| | (d) | 28.33 | 25.14 | 29.16 | 32.50 | 28.23 | 26.42 | 28.54 |
| | (e) | 28.84 | 25.51 | 29.42 | 33.11 | 28.51 | 26.65 | 29.80 |
| | (f) | 28.30 | 24.58 | 28.78 | 32.61 | 28.06 | 25.98 | 29.80 |
| | (g) | 13.38 | 12.97 | 14.35 | 12.73 | 13.56 | 12.87 | 13.80 |
| | (h) | 29.19 | 25.15 | 30.07 | 33.20 | 29.07 | 26.94 | 30.75 |
| SSIM | (a) | 0.6529 | 0.6116 | 0.6478 | 0.6525 | 0.6481 | 0.6560 | 0.7016 |
| | (b) | 0.6833 | 0.6660 | 0.6924 | 0.7120 | 0.6785 | 0.6888 | 0.6622 |
| | (c) | 0.7477 | 0.6903 | 0.7617 | 0.7491 | 0.7543 | 0.7418 | 0.7892 |
| | (d) | 0.7661 | 0.6986 | 0.7730 | 0.7991 | 0.7699 | 0.7505 | 0.8056 |
| | (e) | 0.7709 | 0.6974 | 0.7757 | 0.8133 | 0.7725 | 0.7506 | 0.8157 |
| | (f) | 0.7470 | 0.6728 | 0.7525 | 0.7927 | 0.7439 | 0.7282 | 0.7918 |
| | (g) | 0.2805 | 0.1838 | 0.2963 | 0.3194 | 0.2858 | 0.2247 | 0.3728 |
| | (h) | 0.7733 | 0.7006 | 0.7984 | 0.7933 | 0.7736 | 0.7554 | 0.8183 |
| FID | (a) | 92.53 | 150.44 | 129.74 | 116.93 | 142.97 | 137.63 | 159.91 |
| | (b) | 101.19 | 148.77 | 156.33 | 162.39 | 157.02 | 141.53 | 179.72 |
| | (c) | 88.17 | 140.34 | 132.48 | 120.47 | 142.74 | 126.32 | 153.14 |
| | (d) | 94.49 | 151.66 | 148.65 | 108.39 | 151.46 | 132.08 | 165.79 |
| | (e) | 97.77 | 161.06 | 149.48 | 117.39 | 153.76 | 132.16 | 160.32 |
| | (f) | 76.02 | 128.36 | 107.48 | 85.78 | 124.34 | 118.93 | 139.15 |
| | (g) | 192.40 | 274.44 | 260.83 | 216.58 | 274.21 | 274.69 | 267.35 |
| | (h) | 55.92 | 88.33 | 80.19 | 61.70 | 94.62 | 82.07 | 115.95 |
| LPIPS | (a) | 0.3326 | 0.3192 | 0.3179 | 0.3710 | 0.3275 | 0.3045 | 0.3553 |
| | (b) | 0.3286 | 0.3241 | 0.3244 | 0.3423 | 0.3205 | 0.2918 | 0.3684 |
| | (c) | 0.3093 | 0.3125 | 0.2900 | 0.3540 | 0.3035 | 0.2812 | 0.3148 |
| | (d) | 0.3117 | 0.3240 | 0.2938 | 0.3587 | 0.3081 | 0.2827 | 0.3029 |
| | (e) | 0.3284 | 0.3378 | 0.3149 | 0.3803 | 0.3244 | 0.2975 | 0.3152 |
| | (f) | 0.2796 | 0.2837 | 0.2704 | 0.2938 | 0.2801 | 0.2570 | 0.2926 |
| | (g) | 0.6816 | 0.6632 | 0.7001 | 0.6877 | 0.6826 | 0.6924 | 0.6636 |
| | (h) | 0.2202 | 0.2206 | 0.1985 | 0.2533 | 0.2265 | 0.2012 | 0.2211 |
Real experiment results on the LuojiaSET-OSFCR dataset (Section 4.5); (a)–(h) denote the compared methods of Section 4.3.

Metric | (a) | (b) | (c) | (d) | (e) | (f) | (g) | (h)
---|---|---|---|---|---|---|---|---
PSNR | 25.64 | 24.48 | 34.08 | 26.74 | 31.00 | 34.16 | 13.41 | 34.29 |
SSIM | 0.7303 | 0.6966 | 0.8957 | 0.7818 | 0.8637 | 0.9001 | 0.5802 | 0.9012 |
FID | 169.60 | 178.74 | 106.61 | 210.10 | 191.95 | 133.11 | 181.35 | 88.67 |
LPIPS | 0.5497 | 0.4903 | 0.3060 | 0.4667 | 0.4235 | 0.3343 | 0.6825 | 0.2852 |
Ablation study of DMDiff (Section 5.1); (a)–(i) denote ablated configurations, where (i) corresponds to the full model.

Metric | (a) | (b) | (c) | (d) | (e) | (f) | (g) | (h) | (i)
---|---|---|---|---|---|---|---|---|---
PSNR | 33.08 | 32.94 | 32.61 | 33.60 | 33.58 | 12.69 | 32.23 | 33.14 | 34.29 |
SSIM | 0.8841 | 0.8815 | 0.8803 | 0.8911 | 0.8906 | 0.4569 | 0.8779 | 0.8884 | 0.9012 |
FID | 98.99 | 100.21 | 104.55 | 93.78 | 95.64 | 263.55 | 107.34 | 98.48 | 88.67 |
LPIPS | 0.3108 | 0.3112 | 0.3124 | 0.2933 | 0.2962 | 0.7578 | 0.3169 | 0.2984 | 0.2852 |
Table 5. Transfer of the IAP strategy to a conditional diffusion baseline (Section 5.1).

Method | Strategy | PSNR (dB) | SSIM | FID | LPIPS
---|---|---|---|---|---
Conditional Diffusion | NP | 13.41 | 0.5802 | 181.35 | 0.6825
Conditional Diffusion | IAP | 32.16 | 0.8557 | 122.61 | 0.3612
Computational cost of the compared methods (Section 5.2).

Method | Params (M) | FLOPs (G)
---|---|---
SAR-Opt-cGAN | 57.23 | 92.20 |
SpA-GAN | 0.23 | 69.44 |
SF-GAN | 111.66 | 170.93 |
DSen2-CR | 18.94 | 4964.71 |
GLF-CR | 14.83 | 981.11 |
Cloud-Attention GAN | 166.11 | 243.05 |
Conditional Diffusion | 158.82 | 1340.22 |
DMDiff (Ours) | 137.86 | 5645.67 |
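For reference, the Params (M) column of a table like the one above can be reproduced for any PyTorch model with a one-liner; FLOPs depend on the input size and on the profiler's counting conventions, so only parameter counting is sketched here.

```python
# Hedged sketch for reproducing a "Params (M)" figure; the example module is
# a stand-in, not DMDiff itself.
import torch.nn as nn


def count_params_millions(model: nn.Module) -> float:
    """Total trainable parameters, in millions."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6


print(count_params_millions(nn.Conv2d(3, 64, 3)))  # ~0.0018 M
```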