Taming a Diffusion Model to Revitalize Remote Sensing Image Super-Resolution
Abstract
1. Introduction
- We propose RSDiffSR, a conditional diffusion-based framework for single remote sensing image super-resolution (SRSISR) that exploits the strong generative capability of a large diffusion model pretrained on natural images as a generative prior.
- To bridge the domain gap between natural images and remote sensing images (RSIs) caused by their different data distributions, we apply the low-rank adaptation (LoRA) technique together with a multi-stage training process, enabling efficient fine-tuning with reduced computational and data requirements.
- Given the challenges posed by small and blurry objects in RSIs, we introduce an enhanced control mechanism. This mechanism separately processes edge and content information from input images and combines them with diffusion features using the proposed content-edge joint guidance (CEJG) module, ensuring accurate and realistic reconstructions.
- Quantitative and qualitative evaluations demonstrate that our model performs favorably against state-of-the-art methods across multiple benchmarks. The adoption of a generative prior significantly enhances visual perception, enabling the super-resolved results of RSDiffSR to exhibit superior visual quality with rich details, which positively impacts downstream tasks.
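As a sketch of the low-rank adaptation idea used for fine-tuning, the following minimal NumPy example (illustrative shapes and hyperparameters, not the paper's actual configuration) shows how a frozen weight is augmented with a trainable low-rank update that leaves the base model's output unchanged at initialization:

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A.

    A sketch of the LoRA idea [63]; shapes, ranks, and scaling here are
    illustrative assumptions, not RSDiffSR's configuration.
    """
    def __init__(self, d_in, d_out, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) * 0.02  # frozen base weight
        self.A = rng.standard_normal((r, d_in)) * 0.01      # trainable, small init
        self.B = np.zeros((d_out, r))                       # trainable, zero init
        self.scale = alpha / r

    def __call__(self, x):
        # Base path is untouched; only A and B receive gradients during fine-tuning.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(d_in=16, d_out=16)
x = np.ones((1, 16))
# With B initialized to zero, the adapted layer initially equals the frozen base.
assert np.allclose(layer(x), x @ layer.W.T)
```

Because B starts at zero, fine-tuning begins exactly from the pretrained model, and only the small A and B matrices need gradients and optimizer state.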
2. Related Works
2.1. Natural Image Super-Resolution
2.2. Remote Sensing Image Super-Resolution
3. Methodology
3.1. Diffusion Framework
3.1.1. Generative Prior
3.1.2. Degradation Pre-Eliminated Encoder
3.1.3. Trainable Content Encoder and Content-Edge Joint Guidance
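The edge branch of the CEJG guidance relies on an edge detector (the paper uses Canny [64]); as a self-contained stand-in, a simple gradient-magnitude edge map can be computed as follows. This is a simplified illustration, not the paper's exact operator:

```python
import numpy as np

def edge_map(img):
    """Gradient-magnitude edge map, normalized to [0, 1].

    A simplified stand-in for the Canny detector [64] used to feed
    the edge branch of the content-edge joint guidance.
    """
    img = img.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # central differences along x
    gy[1:-1, :] = img[2:, :] - img[:-2, :]   # central differences along y
    mag = np.hypot(gx, gy)
    return mag / mag.max() if mag.max() > 0 else mag

# A vertical step edge produces a strong response along the boundary.
img = np.zeros((8, 8)); img[:, 4:] = 1.0
e = edge_map(img)
assert e[:, 3:5].max() == 1.0 and e[:, 0].max() == 0.0
```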
3.2. Model Training and Low-Rank Adaptation
Algorithm 1 Training: given the network and its inputs, repeatedly take a gradient descent step on the loss until converged.
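The training loop follows the usual diffusion recipe; a minimal sketch of one DDPM-style training step is given below. It assumes the standard noise-prediction objective and omits the LR-image and CEJG conditioning, so the paper's exact loss may differ:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule (illustrative values, not the paper's).
T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alphas_bar = np.cumprod(1.0 - betas)

def training_step(x0, denoiser):
    """One DDPM-style step: noise a clean latent x0 at a random timestep t,
    then regress the injected noise with an MSE loss."""
    t = rng.integers(T)
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps
    pred = denoiser(xt, t)
    return np.mean((pred - eps) ** 2)

# A dummy zero-predictor gives a strictly positive loss.
loss = training_step(np.zeros((4, 4)), lambda xt, t: np.zeros_like(xt))
assert loss > 0
```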
Algorithm 2 Sampling: starting from input x, iterate the reverse denoising steps (with a conditional branch inside the loop) and return the result.
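The sampling loop, with its branch inside the for loop, matches standard ancestral sampling; a minimal unconditional sketch (illustrative schedule, conditioning omitted) is:

```python
import numpy as np

def sample(denoiser, shape, T=50, seed=0):
    """DDPM-style ancestral sampling sketch: start from Gaussian noise and
    iteratively denoise; fresh noise is added at every step except the last,
    which corresponds to the if/else branch in the sampling algorithm."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 2e-2, T)
    alphas = 1.0 - betas
    alphas_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)
    for t in range(T - 1, -1, -1):
        eps = denoiser(x, t)
        mean = (x - betas[t] / np.sqrt(1.0 - alphas_bar[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                       # add noise except at the final step
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean
    return x

out = sample(lambda x, t: np.zeros_like(x), (2, 2))
assert out.shape == (2, 2) and np.all(np.isfinite(out))
```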
4. Experiments
4.1. Datasets and Implementation
4.1.1. Training and Testing Datasets
4.1.2. Implementation Details
4.1.3. Evaluation Metrics
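Of the metrics reported in the tables below, PSNR is the simplest; for reference, a minimal implementation (assuming 8-bit images with peak value 255):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images; higher is better."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

a = np.full((8, 8), 100.0)
b = np.full((8, 8), 116.0)            # uniform error of 16 gray levels
assert abs(psnr(a, b) - 20 * np.log10(255 / 16)) < 1e-9
```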
4.2. Comparisons with Existing Methods
4.2.1. Results of the DOTA and RSOD Datasets
4.2.2. Real-World Evaluations
4.2.3. Inference Time and Model Size Comparison
4.3. Ablation Study
4.4. Failure Cases
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Bandara, W.G.C.; Nair, N.G.; Patel, V.M. DDPM-CD: Remote sensing change detection using denoising diffusion probabilistic models. arXiv 2022, arXiv:2206.11892. [Google Scholar]
- Wang, X.; Yi, J.; Guo, J.; Song, Y.; Lyu, J.; Xu, J.; Yan, W.; Zhao, J.; Cai, Q.; Min, H. A review of image super-resolution approaches based on deep learning and applications in remote sensing. Remote Sens. 2022, 14, 5423. [Google Scholar] [CrossRef]
- Lei, S.; Shi, Z.; Zou, Z. Super-resolution for remote sensing images via local–global combined network. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1243–1247. [Google Scholar] [CrossRef]
- Li, Y.; Mavromatis, S.; Zhang, F.; Du, Z.; Sequeira, J.; Wang, Z.; Zhao, X.; Liu, R. Single-image super-resolution for remote sensing images using a deep generative adversarial network with local and global attention mechanisms. IEEE Trans. Geosci. Remote Sens. 2021, 60, 3000224. [Google Scholar] [CrossRef]
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; Volume 2017, pp. 4681–4690. [Google Scholar]
- Chen, X.; Wang, X.; Zhou, J.; Qiao, Y.; Dong, C. Activating more pixels in image super-resolution transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 22367–22377. [Google Scholar]
- Lei, S.; Shi, Z.; Mo, W. Transformer-based multistage enhancement for remote sensing image super-resolution. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5615611. [Google Scholar] [CrossRef]
- Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image restoration using Swin Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 1833–1844. [Google Scholar]
- Shi, S.; Bai, Q.; Cao, M.; Xia, W.; Wang, J.; Chen, Y.; Yang, Y. Region-adaptive deformable network for image quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 324–333. [Google Scholar]
- Yang, S.; Wu, T.; Shi, S.; Lao, S.; Gong, Y.; Cao, M.; Wang, J.; Yang, Y. MANIQA: Multi-dimension attention network for no-reference image quality assessment. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 1191–1200. [Google Scholar]
- Yang, F.; Yang, H.; Fu, J.; Lu, H.; Guo, B. Learning texture transformer network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5791–5800. [Google Scholar]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
- Dhariwal, P.; Nichol, A. Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst. 2021, 34, 8780–8794. [Google Scholar]
- Yang, L.; Liu, J.; Hong, S.; Zhang, Z.; Huang, Z.; Cai, Z.; Zhang, W.; Cui, B. Improving diffusion-based image synthesis with context prediction. Adv. Neural Inf. Process. Syst. 2024, 36. [Google Scholar]
- Saharia, C.; Ho, J.; Chan, W.; Salimans, T.; Fleet, D.J.; Norouzi, M. Image super-resolution via iterative refinement. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 4713–4726. [Google Scholar] [CrossRef]
- Wang, J.; Yue, Z.; Zhou, S.; Chan, K.C.; Loy, C.C. Exploiting diffusion prior for real-world image super-resolution. Int. J. Comput. Vis. 2024, 132, 5929–5949. [Google Scholar] [CrossRef]
- Lin, X.; He, J.; Chen, Z.; Lyu, Z.; Dai, B.; Yu, F.; Qiao, Y.; Ouyang, W.; Dong, C. Diffbir: Toward blind image restoration with generative diffusion prior. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 430–448. [Google Scholar]
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695. [Google Scholar]
- Podell, D.; English, Z.; Lacey, K.; Blattmann, A.; Dockhorn, T.; Müller, J.; Penna, J.; Rombach, R. SDXL: Improving latent diffusion models for high-resolution image synthesis. arXiv 2023, arXiv:2307.01952. [Google Scholar]
- Liu, J.; Yuan, Z.; Pan, Z.; Fu, Y.; Liu, L.; Lu, B. Diffusion model with detail complement for super-resolution of remote sensing. Remote Sens. 2022, 14, 4834. [Google Scholar] [CrossRef]
- Xiao, Y.; Yuan, Q.; Jiang, K.; He, J.; Jin, X.; Zhang, L. EDiffSR: An Efficient Diffusion Probabilistic Model for Remote Sensing Image Super-Resolution. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5601514. [Google Scholar] [CrossRef]
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-rank adaptation of large language models. arXiv 2021, arXiv:2106.09685. [Google Scholar]
- Xu, Z.; Baojie, X.; Guoxin, W. Canny edge detection based on OpenCV. In Proceedings of the 2017 13th IEEE International Conference on Electronic Measurement & Instruments (ICEMI), Yangzhou, China, 20–22 October 2017; pp. 53–56. [Google Scholar]
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part IV 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 184–199. [Google Scholar]
- Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar]
- Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680. [Google Scholar]
- Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Change Loy, C. ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Wang, X.; Xie, L.; Dong, C.; Shan, Y. Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 1905–1914. [Google Scholar]
- Yang, T.; Wu, R.; Ren, P.; Xie, X.; Zhang, L. Pixel-aware stable diffusion for realistic image super-resolution and personalized stylization. In Proceedings of the European Conference on Computer Vision, Milan, Italy, 29 September–4 October 2024; pp. 74–91. [Google Scholar]
- Yue, Z.; Wang, J.; Loy, C.C. ResShift: Efficient diffusion model for image super-resolution by residual shifting. Adv. Neural Inf. Process. Syst. 2024, 36, 13294–13307. [Google Scholar]
- Sun, H.; Li, W.; Liu, J.; Chen, H.; Pei, R.; Zou, X.; Yan, Y.; Yang, Y. CoSeR: Bridging Image and Language for Cognitive Super-Resolution. arXiv 2023, arXiv:2311.16512. [Google Scholar]
- Zhang, S.; Yuan, Q.; Li, J.; Sun, J.; Zhang, X. Scene-adaptive remote sensing image super-resolution using a multiscale attention network. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4764–4779. [Google Scholar] [CrossRef]
- Pan, Z.; Ma, W.; Guo, J.; Lei, B. Super-resolution of single remote sensing image based on residual dense backprojection networks. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7918–7933. [Google Scholar] [CrossRef]
- Xiao, Y.; Su, X.; Yuan, Q.; Liu, D.; Shen, H.; Zhang, L. Satellite video super-resolution via multiscale deformable convolution alignment and temporal grouping projection. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5610819. [Google Scholar] [CrossRef]
- Guo, J.; Lv, F.; Shen, J.; Liu, J.; Wang, M. An improved generative adversarial network for remote sensing image super-resolution. IET Image Process. 2023, 17, 1852–1863. [Google Scholar] [CrossRef]
- Tu, J.; Mei, G.; Ma, Z.; Piccialli, F. SWCGAN: Generative adversarial network combining swin transformer and CNN for remote sensing image super-resolution. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 5662–5673. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Han, L.; Zhao, Y.; Lv, H.; Zhang, Y.; Liu, H.; Bi, G.; Han, Q. Enhancing remote sensing image super-resolution with efficient hybrid conditional diffusion model. Remote Sens. 2023, 15, 3452. [Google Scholar] [CrossRef]
- Ali, A.M.; Benjdira, B.; Koubaa, A.; Boulila, W.; El-Shafai, W. TESR: Two-stage approach for enhancement and super-resolution of remote sensing images. Remote Sens. 2023, 15, 2346. [Google Scholar] [CrossRef]
- Wu, C.; Wang, D.; Bai, Y.; Mao, H.; Li, Y.; Shen, Q. HSR-Diff: Hyperspectral image super-resolution via conditional diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 7083–7093. [Google Scholar]
- Pinaya, W.H.L.; Vieira, S.; Garcia-Dias, R.; Mechelli, A. Autoencoders. In Machine Learning; Elsevier: Amsterdam, The Netherlands, 2020; pp. 193–208. [Google Scholar]
- Zhang, L.; Rao, A.; Agrawala, M. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 3836–3847. [Google Scholar]
- Brooks, T.; Holynski, A.; Efros, A.A. InstructPix2Pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18392–18402. [Google Scholar]
- Yang, F.; Yang, S.; Butt, M.A.; van de Weijer, J. Dynamic prompt learning: Addressing cross-attention leakage for text-based image editing. Adv. Neural Inf. Process. Syst. 2024, 36, 26291–26303. [Google Scholar]
- Yang, Q.; Chen, D.; Tan, Z.; Liu, Q.; Chu, Q.; Bao, J.; Yuan, L.; Hua, G.; Yu, N. HQ-50K: A Large-scale, High-quality Dataset for Image Restoration. arXiv 2023, arXiv:2306.05390. [Google Scholar]
- Schuhmann, C.; Beaumont, R.; Vencu, R.; Gordon, C.; Wightman, R.; Cherti, M.; Coombes, T.; Katta, A.; Mullis, C.; Wortsman, M.; et al. Laion-5b: An open large-scale dataset for training next generation image-text models. Adv. Neural Inf. Process. Syst. 2022, 35, 25278–25294. [Google Scholar]
- Ding, J.; Xue, N.; Xia, G.S.; Bai, X.; Yang, W.; Yang, M.Y.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; et al. Object detection in aerial images: A large-scale benchmark and challenges. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 7778–7796. [Google Scholar] [CrossRef]
- Xia, G.S.; Hu, J.; Hu, F.; Shi, B.; Bai, X.; Zhong, Y.; Zhang, L.; Lu, X. AID: A benchmark data set for performance evaluation of aerial scene classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3965–3981. [Google Scholar] [CrossRef]
- Long, Y.; Gong, Y.; Xiao, Z.; Liu, Q. Accurate object localization in remote sensing images based on convolutional neural networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2486–2498. [Google Scholar] [CrossRef]
- Rottensteiner, F.; Sohn, G.; Gerke, M.; Wegner, J.D.; Breitkopf, U.; Jung, J. Results of the ISPRS benchmark on urban object detection and 3D building reconstruction. ISPRS J. Photogramm. Remote Sens. 2014, 93, 256–271. [Google Scholar] [CrossRef]
- Cheng, G.; Han, J.; Zhou, P.; Guo, L. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS J. Photogramm. Remote Sens. 2014, 98, 119–132. [Google Scholar] [CrossRef]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Karras, T.; Aittala, M.; Aila, T.; Laine, S. Elucidating the design space of diffusion-based generative models. Adv. Neural Inf. Process. Syst. 2022, 35, 26565–26577. [Google Scholar]
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs trained by a two-time-scale update rule converge to a local Nash equilibrium. Adv. Neural Inf. Process. Syst. 2017, 30, 6626–6637. [Google Scholar]
- Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212. [Google Scholar] [CrossRef]
- Ke, J.; Wang, Q.; Wang, Y.; Milanfar, P.; Yang, F. MUSIQ: Multi-scale image quality transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 5148–5157. [Google Scholar]
- Wang, J.; Chan, K.C.; Loy, C.C. Exploring CLIP for assessing the look and feel of images. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 2555–2563. [Google Scholar]
- Lei, S.; Shi, Z. Hybrid-scale self-similarity exploitation for remote sensing image super-resolution. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5401410. [Google Scholar] [CrossRef]
- Xiao, Y.; Yuan, Q.; Jiang, K.; He, J.; Lin, C.W.; Zhang, L. TTST: A top-k token selective transformer for remote sensing image super-resolution. IEEE Trans. Image Process. 2024, 33, 738–752. [Google Scholar] [CrossRef]
- Meng, F.; Chen, Y.; Jing, H.; Zhang, L.; Yan, Y.; Ren, Y.; Wu, S.; Feng, T.; Liu, R.; Du, Z. A conditional diffusion model with fast sampling strategy for remote sensing image super-resolution. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5408616. [Google Scholar] [CrossRef]
Datasets | Metrics | Bicubic | SwinIR [9] | Real-ESRGAN+ [30] | HSENet [60] | TransENet [8] | TTST [61] | EDiffSR [22] | FastDiffSR [62] | StableSR [17] | DiffBIR [18] | Ours |
---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | ICCV2021 | ICCV2021 | TGRS2021 | TGRS2021 | TIP2024 | TGRS2024 | TGRS2024 | IJCV2024 | ECCV2024 | |
DOTA ×2 512→1024 | PSNR ↑ | 24.73 | 27.64 | 27.48 | 27.51 | 25.34 | 25.51 | 27.17 | 27.36 | 27.15 | 26.51 | 24.66 |
SSIM ↑ | 0.4956 | 0.6639 | 0.6934 | 0.6972 | 0.5090 | 0.5317 | 0.6188 | 0.6861 | 0.6680 | 0.5496 | 0.5632 | |
FID ↓ | 198.14 | 129.64 | 126.26 | 121.47 | 190.81 | 149.61 | 129.59 | 127.62 | 125.46 | 131.62 | 143.84 | |
LPIPS ↓ | 0.5702 | 0.4318 | 0.4662 | 0.4071 | 0.5358 | 0.5163 | 0.4414 | 0.4287 | 0.3810 | 0.4275 | 0.4258 | |
NIQE ↓ | 20.52 | 18.78 | 10.32 | 18.95 | 13.72 | 17.95 | 18.65 | 17.62 | 13.55 | 11.31 | 10.04 | |
MUSIQ ↑ | 21.68 | 27.70 | 57.01 | 28.96 | 27.95 | 29.61 | 29.50 | 34.51 | 51.35 | 54.28 | 59.29 | |
CLIP-IQA ↑ | 0.5069 | 0.5281 | 0.6440 | 0.6052 | 0.5476 | 0.5921 | 0.5800 | 0.5841 | 0.5781 | 0.6294 | 0.7254 | |
MANIQA ↑ | 0.1759 | 0.1518 | 0.3108 | 0.1937 | 0.1494 | 0.1596 | 0.1760 | 0.2061 | 0.2995 | 0.3452 | 0.3634 | |
DOTA ×4 256→1024 | PSNR ↑ | 24.07 | 26.24 | 25.86 | 26.31 | 26.26 | 26.22 | 26.19 | 25.97 | 26.53 | 25.63 | 23.66 |
SSIM ↑ | 0.4845 | 0.6137 | 0.6257 | 0.6162 | 0.6094 | 0.6181 | 0.5952 | 0.5867 | 0.6328 | 0.5496 | 0.4917 | |
FID ↓ | 215.37 | 186.81 | 196.14 | 185.52 | 186.72 | 185.60 | 184.72 | 180.68 | 163.36 | 169.85 | 171.01 | |
LPIPS ↓ | 0.5741 | 0.5055 | 0.4785 | 0.5026 | 0.5108 | 0.5006 | 0.5204 | 0.5082 | 0.4429 | 0.4581 | 0.4701 | |
NIQE ↓ | 18.10 | 16.42 | 10.49 | 19.06 | 16.69 | 18.07 | 15.19 | 13.97 | 14.10 | 11.82 | 9.78 | |
MUSIQ ↑ | 21.57 | 26.90 | 57.23 | 24.66 | 26.07 | 25.63 | 24.73 | 35.78 | 43.13 | 51.64 | 58.41 | |
CLIP-IQA ↑ | 0.5112 | 0.5836 | 0.6179 | 0.5902 | 0.5529 | 0.5849 | 0.5255 | 0.5585 | 0.5678 | 0.6050 | 0.7384 | |
MANIQA ↑ | 0.1946 | 0.2047 | 0.3207 | 0.1920 | 0.1880 | 0.1988 | 0.1462 | 0.1847 | 0.2388 | 0.3391 | 0.3544 | |
RSOD ×2 512→1024 | PSNR ↑ | 25.59 | 28.11 | 27.27 | 25.96 | 25.78 | 27.18 | 27.32 | 27.23 | 27.24 | 27.03 | 23.54 |
SSIM ↑ | 0.5533 | 0.7203 | 0.7343 | 0.5871 | 0.5642 | 0.6386 | 0.6472 | 0.6405 | 0.7178 | 0.6572 | 0.5714 | |
FID ↓ | 138.89 | 119.63 | 126.97 | 153.12 | 157.95 | 146.87 | 128.36 | 129.84 | 123.02 | 127.88 | 141.76 | |
LPIPS ↓ | 0.4722 | 0.3878 | 0.3362 | 0.4996 | 0.5125 | 0.4463 | 0.4207 | 0.4363 | 0.3608 | 0.4245 | 0.4127 | |
NIQE ↓ | 15.35 | 19.52 | 11.33 | 17.70 | 12.92 | 17.32 | 19.57 | 20.73 | 13.62 | 11.14 | 10.46 | |
MUSIQ ↑ | 21.44 | 23.83 | 57.65 | 23.17 | 22.69 | 25.91 | 26.07 | 25.14 | 48.84 | 52.95 | 62.15 | |
CLIP-IQA ↑ | 0.5052 | 0.6105 | 0.7154 | 0.6851 | 0.5688 | 0.7016 | 0.5842 | 0.5661 | 0.7116 | 0.7246 | 0.7344 | |
MANIQA ↑ | 0.1239 | 0.1212 | 0.2771 | 0.1326 | 0.1093 | 0.2054 | 0.1619 | 0.2007 | 0.2341 | 0.3102 | 0.3784 | |
RSOD ×4 256→1024 | PSNR ↑ | 25.10 | 26.62 | 25.38 | 26.64 | 26.46 | 26.65 | 26.59 | 25.71 | 26.66 | 26.02 | 22.79 |
SSIM ↑ | 0.6313 | 0.6683 | 0.6427 | 0.6681 | 0.6587 | 0.6687 | 0.6533 | 0.6378 | 0.6773 | 0.6014 | 0.4830 | |
FID ↓ | 216.43 | 177.13 | 209.32 | 177.35 | 178.72 | 177.39 | 176.53 | 177.64 | 165.25 | 180.90 | 173.72 | |
LPIPS ↓ | 0.5073 | 0.4675 | 0.4699 | 0.4638 | 0.4708 | 0.4649 | 0.4801 | 0.4725 | 0.4303 | 0.4778 | 0.4696 | |
NIQE ↓ | 17.64 | 16.52 | 11.61 | 19.67 | 16.31 | 18.59 | 15.26 | 16.42 | 13.54 | 11.39 | 11.34 | |
MUSIQ ↑ | 20.87 | 23.27 | 59.09 | 22.25 | 23.51 | 22.39 | 22.41 | 23.57 | 38.99 | 49.37 | 60.99 | |
CLIP-IQA ↑ | 0.5912 | 0.6435 | 0.6856 | 0.6647 | 0.6496 | 0.6529 | 0.6266 | 0.6364 | 0.6547 | 0.6499 | 0.6553 | |
MANIQA ↑ | 0.1363 | 0.1688 | 0.3155 | 0.1511 | 0.1515 | 0.1630 | 0.1173 | 0.1445 | 0.1802 | 0.3016 | 0.3749 |
Datasets | Metrics | Bicubic | SwinIR [9] | Real-ESRGAN+ [30] | HSENet [60] | TransENet [8] | TTST [61] | EDiffSR [22] | FastDiffSR [62] | StableSR [17] | DiffBIR [18] | Ours |
---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | ICCV2021 | ICCV2021 | TGRS2021 | TGRS2021 | TIP2024 | TGRS2024 | TGRS2024 | IJCV2024 | ECCV2024 | |
DOTA ×2 512→1024 | PSNR ↑ | 25.70 | 31.83 | 31.01 | 27.56 | 26.29 | 27.21 | 27.32 | 28.61 | 25.93 | 27.18 | 25.81 |
SSIM ↑ | 0.6268 | 0.8828 | 0.8232 | 0.7018 | 0.6550 | 0.6916 | 0.6984 | 0.7058 | 0.6514 | 0.6262 | 0.6011 | |
FID ↓ | 93.14 | 55.41 | 68.71 | 60.17 | 139.33 | 109.51 | 88.75 | 84.01 | 107.54 | 124.41 | 115.28 | |
LPIPS ↓ | 0.3557 | 0.2445 | 0.2297 | 0.2548 | 0.3612 | 0.3381 | 0.3380 | 0.3195 | 0.3274 | 0.3885 | 0.3636 | |
NIQE ↓ | 16.90 | 12.87 | 11.33 | 19.39 | 11.57 | 15.32 | 16.69 | 15.26 | 10.03 | 10.62 | 9.56 | |
MUSIQ ↑ | 41.64 | 58.14 | 56.95 | 50.94 | 53.07 | 58.81 | 46.91 | 49.81 | 64.08 | 56.20 | 61.68 | |
CLIP-IQA ↑ | 0.6017 | 0.6353 | 0.6536 | 0.6305 | 0.6573 | 0.6475 | 0.6916 | 0.6895 | 0.6022 | 0.6340 | 0.7058 | |
MANIQA ↑ | 0.2783 | 0.3562 | 0.2827 | 0.2701 | 0.2857 | 0.3104 | 0.2698 | 0.2714 | 0.4648 | 0.3491 | 0.4025 | |
DOTA ×4 256→1024 | PSNR ↑ | 24.38 | 23.92 | 26.81 | 26.39 | 25.12 | 24.89 | 24.91 | 25.51 | 26.34 | 26.49 | 25.40 |
SSIM ↑ | 0.6617 | 0.6554 | 0.6966 | 0.7055 | 0.6675 | 0.6852 | 0.5709 | 0.6072 | 0.6677 | 0.6083 | 0.6043 | |
FID ↓ | 138.69 | 118.45 | 123.74 | 78.13 | 105.92 | 104.02 | 112.51 | 109.27 | 102.31 | 124.52 | 117.17 | |
LPIPS ↓ | 0.4780 | 0.3664 | 0.3546 | 0.3687 | 0.3826 | 0.3504 | 0.4229 | 0.4083 | 0.3451 | 0.4086 | 0.3975 | |
NIQE ↓ | 18.12 | 13.74 | 12.06 | 20.02 | 16.34 | 15.18 | 16.51 | 14.84 | 12.74 | 11.66 | 11.28 | |
MUSIQ ↑ | 25.35 | 54.83 | 55.35 | 40.27 | 45.97 | 52.11 | 43.33 | 47.23 | 57.11 | 55.40 | 57.24 | |
CLIP-IQA ↑ | 0.6051 | 0.6728 | 0.6728 | 0.6113 | 0.6266 | 0.6684 | 0.7188 | 0.6361 | 0.5920 | 0.6315 | 0.7232 | |
MANIQA ↑ | 0.2531 | 0.3280 | 0.3280 | 0.2670 | 0.2828 | 0.3115 | 0.2536 | 0.2956 | 0.3566 | 0.3273 | 0.3366 | |
RSOD ×2 512→1024 | PSNR ↑ | 23.93 | 30.11 | 30.41 | 27.49 | 26.66 | 26.49 | 26.76 | 27.74 | 26.47 | 27.78 | 24.80 |
SSIM ↑ | 0.6935 | 0.8601 | 0.8534 | 0.7892 | 0.7017 | 0.7316 | 0.6818 | 0.7897 | 0.7266 | 0.6931 | 0.6709 | |
FID ↓ | 106.07 | 65.26 | 63.72 | 81.49 | 95.52 | 110.78 | 80.84 | 73.74 | 102.75 | 121.12 | 114.84 | |
LPIPS ↓ | 0.3761 | 0.2077 | 0.2018 | 0.3693 | 0.3785 | 0.3518 | 0.3556 | 0.3385 | 0.3134 | 0.3840 | 0.3357 | |
NIQE ↓ | 16.70 | 15.54 | 12.59 | 17.96 | 10.77 | 13.61 | 17.61 | 17.84 | 11.91 | 10.83 | 10.52 | |
MUSIQ ↑ | 34.13 | 43.59 | 55.54 | 37.91 | 38.16 | 53.98 | 42.72 | 42.74 | 61.68 | 56.80 | 62.04 | |
CLIP-IQA ↑ | 0.6067 | 0.6745 | 0.7360 | 0.7105 | 0.6408 | 0.7497 | 0.6922 | 0.7072 | 0.7509 | 0.7365 | 0.7560 | |
MANIQA ↑ | 0.2091 | 0.2395 | 0.2378 | 0.2085 | 0.1940 | 0.2384 | 0.2267 | 0.2751 | 0.3790 | 0.3341 | 0.3569 | |
RSOD ×4 256→1024 | PSNR ↑ | 23.09 | 24.69 | 26.54 | 26.32 | 25.29 | 25.33 | 25.76 | 25.97 | 26.63 | 26.95 | 23.79 |
SSIM ↑ | 0.6592 | 0.6857 | 0.7235 | 0.7133 | 0.6870 | 0.6996 | 0.6581 | 0.6695 | 0.7206 | 0.6750 | 0.6065 | |
FID ↓ | 175.53 | 140.12 | 142.85 | 142.75 | 135.99 | 135.07 | 139.86 | 137.74 | 123.11 | 141.40 | 134.77 | |
LPIPS ↓ | 0.4255 | 0.3976 | 0.3952 | 0.4034 | 0.3972 | 0.3960 | 0.4113 | 0.3864 | 0.3449 | 0.4187 | 0.3941 | |
NIQE ↓ | 15.09 | 12.89 | 11.50 | 18.15 | 14.85 | 14.67 | 14.53 | 15.81 | 13.85 | 9.58 | 10.58 | |
MUSIQ ↑ | 27.91 | 45.21 | 59.15 | 35.26 | 39.09 | 42.88 | 36.58 | 45.53 | 54.41 | 52.55 | 62.03 | |
CLIP-IQA ↑ | 0.6734 | 0.7492 | 0.7359 | 0.7082 | 0.6962 | 0.7487 | 0.6730 | 0.6925 | 0.7380 | 0.7133 | 0.7515 | |
MANIQA ↑ | 0.1835 | 0.2583 | 0.2822 | 0.2109 | 0.2091 | 0.2429 | 0.1954 | 0.2239 | 0.2603 | 0.2997 | 0.3551 |
Datasets | Metrics | Bicubic | SwinIR [9] | Real-ESRGAN+ [30] | HSENet [60] | TransENet [8] | TTST [61] | EDiffSR [22] | FastDiffSR [62] | StableSR [17] | DiffBIR [18] | Ours |
---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | ICCV2021 | ICCV2021 | TGRS2021 | TGRS2021 | TIP2024 | TGRS2024 | TGRS2024 | IJCV2024 | ECCV2024 | |
NWPU VHR-10 ×2 512→1024 | NIQE ↓ | 16.11 | 15.54 | 12.71 | 18.66 | 12.06 | 16.14 | 17.59 | 17.34 | 12.52 | 11.50 | 11.16 |
MUSIQ ↑ | 38.71 | 41.63 | 54.25 | 43.74 | 39.96 | 40.18 | 42.24 | 48.32 | 62.23 | 57.63 | 59.75 | |
CLIP-IQA ↑ | 0.6068 | 0.6445 | 0.7360 | 0.6715 | 0.6408 | 0.7261 | 0.6922 | 0.7017 | 0.7509 | 0.6422 | 0.7760 | |
MANIQA ↑ | 0.2093 | 0.2271 | 0.2607 | 0.2075 | 0.1962 | 0.2268 | 0.2136 | 0.2734 | 0.3857 | 0.3486 | 0.3533 | |
NWPU VHR-10 ×4 512→2048 | NIQE ↓ | 15.58 | 13.61 | 10.77 | 17.71 | 13.76 | 15.23 | 14.03 | 15.97 | 12.54 | 11.07 | 10.10 |
MUSIQ ↑ | 22.39 | 28.10 | 52.97 | 35.97 | 26.70 | 25.62 | 25.29 | 31.85 | 43.12 | 46.79 | 53.44 | |
CLIP-IQA ↑ | 0.7234 | 0.7692 | 0.7359 | 0.6647 | 0.7962 | 0.7135 | 0.7730 | 0.7613 | 0.7380 | 0.7258 | 0.7915 | |
MANIQA ↑ | 0.1834 | 0.2263 | 0.2998 | 0.2278 | 0.2007 | 0.2162 | 0.1661 | 0.2471 | 0.2474 | 0.3150 | 0.3496 | |
Potsdam ×2 512→1024 | NIQE ↓ | 15.90 | 16.23 | 14.06 | 18.38 | 15.13 | 15.94 | 18.82 | 15.75 | 14.84 | 14.93 | 14.14 |
MUSIQ ↑ | 36.67 | 37.86 | 52.68 | 47.81 | 37.45 | 37.93 | 40.03 | 45.68 | 55.73 | 55.13 | 56.72 | |
CLIP-IQA ↑ | 0.5312 | 0.5828 | 0.5892 | 0.5485 | 0.5432 | 0.6321 | 0.6352 | 0.5857 | 0.5882 | 0.5204 | 0.6678 | |
MANIQA ↑ | 0.2128 | 0.2251 | 0.2438 | 0.2153 | 0.2037 | 0.2615 | 0.2207 | 0.2753 | 0.2928 | 0.3200 | 0.3317 | |
Potsdam ×4 512→2048 | NIQE ↓ | 14.46 | 14.08 | 11.39 | 17.47 | 14.34 | 16.58 | 14.03 | 14.53 | 13.37 | 12.51 | 11.89 |
MUSIQ ↑ | 21.67 | 24.15 | 45.87 | 33.34 | 23.45 | 22.21 | 22.79 | 28.22 | 38.46 | 45.12 | 45.91 | |
CLIP-IQA ↑ | 0.7147 | 0.7742 | 0.7482 | 0.4945 | 0.7709 | 0.7280 | 0.7276 | 0.7489 | 0.4551 | 0.5642 | 0.7816 | |
MANIQA ↑ | 0.2137 | 0.2533 | 0.3151 | 0.2472 | 0.2307 | 0.2519 | 0.1697 | 0.2024 | 0.2321 | 0.3154 | 0.3236 |
Methods | SwinIR | Real-ESRGAN+ | HSENet | TransENet | TTST | EDiffSR | FastDiffSR | StableSR | DiffBIR | Ours |
---|---|---|---|---|---|---|---|---|---|---|
Inference Time | 2.4 s | 0.51 s | 6.4 s | 5.1 s | 1.6 s | 87 s | 30 s | 40 s | 34 s | 12 s |
Parameters | 11.75 M | 16.70 M | 5.58 M | 37.61 M | 18.37 M | 26.79 M | 20 M | 1.44 B | 1.71 B | 3.7 B |
Model | CEJG-Content | CEJG-Edge | ZeroConv | LoRA | NIQE ↓ | MUSIQ ↑ | CLIP-IQA ↑ | MANIQA ↑ |
---|---|---|---|---|---|---|---|---|
A | ✓ | ✓ | ✓ | 12.79 | 49.42 | 0.7861 | 0.3238 | |
B | ✓ | ✓ | ✓ | 11.31 | 48.83 | 0.7742 | 0.3141 | |
C | ✓ | ✓ | 12.86 | 46.91 | 0.7619 | 0.3091 | ||
D | ✓ | ✓ | 13.64 | 45.95 | 0.7478 | 0.3028 | ||
Ours | ✓ | ✓ | ✓ | 10.10 | 53.44 | 0.7915 | 0.3496 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhu, C.; Liu, Y.; Huang, S.; Wang, F. Taming a Diffusion Model to Revitalize Remote Sensing Image Super-Resolution. Remote Sens. 2025, 17, 1348. https://doi.org/10.3390/rs17081348