Deep Learning for Multi-Source Remote Sensing Image Interpretation: Exploring, Rethinking, and Limiting Breakthroughs

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "AI Remote Sensing".

Deadline for manuscript submissions: 28 February 2026

Special Issue Editors

Guest Editor
Radar Technology Research Institute, Beijing Institute of Technology, Beijing 100081, China
Interests: intelligent radar target detection and recognition; intelligent interpretation of remote sensing images

Guest Editor
Radar Technology Research Institute, Beijing Institute of Technology, Beijing 100081, China
Interests: target detection and recognition

Guest Editor
Radar Technology Research Institute, Beijing Institute of Technology, Beijing 100081, China
Interests: target detection and recognition

Guest Editor
Engineering Department, University of Almería, Carretera de Sacramento s/n, La Cañada de San Urbano, 04120 Almería, Spain
Interests: forest monitoring; OBIA; LiDAR; UAV; machine learning; optical satellite imagery; data fusion

Guest Editor
College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China
Interests: computer vision remote sensing; deep learning; image processing

Special Issue Information

Dear Colleagues,

Remote sensing image interpretation, which is pivotal for environmental, resource, and target monitoring, has undergone a profound transformation with the infusion of deep learning techniques. Deep learning algorithms can extract intricate patterns and features from extensive remote sensing image datasets, enabling more precise and efficient interpretation than conventional approaches. Concurrently, as sensor technologies have matured, a wealth of multi-source remote sensing image data has emerged, including data from radar, optical, and other sensors. These diverse data sources offer powerful technical means for remote sensing image interpretation.

This Special Issue is organized around three goals. First, it promotes the exploration of new tasks in remote sensing image interpretation. Conventional recognition and detection tasks alone no longer meet current development needs, so we encourage researchers to leverage the feature learning and data processing capabilities of deep learning to tackle new tasks and scenarios, thereby broadening application fields, guiding research directions, and enabling academic achievements to better serve practical demands. Second, it calls for a rethinking of existing deep learning techniques. Many established deep learning models are general purpose and overlook differences in resolution, target characteristics, and other properties among remote sensing images from different sources; rethinking these designs helps develop improved architectures that better fit the unique nature of remote sensing images and allow more precise image analysis. Finally, it focuses on dissecting the limitations of deep learning in specific scenarios. Faced with imbalanced samples, weak targets, and complex backgrounds, model performance often falls short, and we encourage researchers to propose innovative solutions that further enhance the interpretation performance of existing methods in these scenarios.

Articles may address, but are not limited to, the following topics:

  • Multi-source remote sensing image processing;
  • Remote sensing image generation;
  • Multimodal remote sensing image interpretation model;
  • Target characteristic analysis;
  • Radar signal processing;
  • New task dataset and benchmark for remote sensing image interpretation.

Dr. Xin Zhang
Dr. Xueyao Hu
Prof. Dr. Yang Li
Prof. Dr. Fernando José Aguilar
Dr. Muhammad Yasir
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and written in good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • multi-source remote sensing image processing
  • remote sensing image generation
  • multimodal remote sensing image interpretation model
  • target characteristic analysis
  • radar signal processing
  • new task dataset and benchmark for remote sensing image interpretation

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (6 papers)


Research

25 pages, 20535 KB  
Article
DWTF-DETR: A DETR-Based Model for Inshore Ship Detection in SAR Imagery via Dynamically Weighted Joint Time–Frequency Feature Fusion
by Tiancheng Dong, Taoyang Wang, Yuqi Han, Deren Li, Guo Zhang and Yuan Peng
Remote Sens. 2025, 17(19), 3301; https://doi.org/10.3390/rs17193301 - 25 Sep 2025
Abstract
Inshore ship detection in synthetic aperture radar (SAR) imagery poses significant challenges due to the high density and diversity of ships. In addition, low inter-object backscatter contrast and blurred boundaries of docked ships often degrade the performance of traditional object detection methods, especially under complex backgrounds and low signal-to-noise ratio (SNR) conditions. To address these issues, this paper proposes a novel detection framework, the Dynamically Weighted Joint Time–Frequency Feature Fusion DEtection TRansformer model (DWTF-DETR), specifically designed for SAR-based ship detection in inshore areas. The proposed model integrates a Dual-Domain Feature Fusion Module (DDFM) to extract and fuse features from both SAR images and their frequency-domain representations, enhancing sensitivity to both high- and low-frequency target features. Subsequently, a Dual-Path Attention Fusion Module (DPAFM) is introduced to dynamically weight and fuse shallow detail features with deep semantic representations. By leveraging an attention mechanism, the module adaptively adjusts the importance of different feature paths, thereby enhancing the model's ability to perceive targets with ambiguous structural characteristics. Experiments conducted on a self-constructed inshore SAR ship detection dataset and the public HRSID dataset demonstrate that DWTF-DETR achieves superior performance compared to the baseline RT-DETR: the proposed method improves mAP@50 by 1.60% and 0.72%, and F1-score by 0.58% and 1.40%, respectively. Moreover, comparative experiments show that the proposed approach outperforms several state-of-the-art SAR ship detection methods. The results confirm that DWTF-DETR achieves accurate and robust detection in diverse and complex maritime environments.
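The core dual-domain idea, extracting features from both the SAR image and a frequency-domain representation of it and fusing them with a learned weighting, can be made concrete with a short PyTorch sketch. This is a minimal illustration under assumed channel sizes and layer counts, not the paper's actual DDFM or DPAFM:

```python
import torch
import torch.nn as nn

class DualDomainFusion(nn.Module):
    """Toy dual-domain block: fuses image features with FFT-magnitude
    features via a learned per-channel gate. Channel sizes and layout
    are illustrative assumptions, not the published DDFM."""
    def __init__(self, in_ch=1, feat_ch=32):
        super().__init__()
        self.spatial = nn.Sequential(nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.spectral = nn.Sequential(nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.gate = nn.Sequential(nn.Conv2d(2 * feat_ch, feat_ch, 1), nn.Sigmoid())

    def forward(self, x):
        # frequency-domain representation: log-magnitude of the 2-D FFT
        mag = torch.log1p(torch.abs(torch.fft.fft2(x)))
        fs, ff = self.spatial(x), self.spectral(mag)
        g = self.gate(torch.cat([fs, ff], dim=1))  # dynamic weighting
        return g * fs + (1 - g) * ff

feats = DualDomainFusion()(torch.randn(2, 1, 64, 64))  # batch of SAR chips
print(feats.shape)  # torch.Size([2, 32, 64, 64])
```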

32 pages, 6397 KB  
Article
Enhancing YOLO-Based SAR Ship Detection with Attention Mechanisms
by Ranyeri do Lago Rocha and Felipe A. P. de Figueiredo
Remote Sens. 2025, 17(18), 3170; https://doi.org/10.3390/rs17183170 - 12 Sep 2025
Abstract
This study enhances Synthetic Aperture Radar (SAR) ship detection by integrating attention mechanisms (Bi-Level Routing Attention (BRA), the Swin Transformer, and the Convolutional Block Attention Module (CBAM)) into state-of-the-art YOLO architectures (YOLOv11 and YOLOv12). Addressing challenges such as small ship sizes and complex maritime backgrounds in SAR imagery, we systematically evaluate the impact of adding and replacing attention layers at strategic positions within the models. Experiments reveal that replacing the original attention layer at position 4 (C3k2 module) with the CBAM in YOLOv12 achieves optimal performance, attaining an mAP@0.5 of 98.0% on the SAR Ship Dataset (SSD), surpassing the baseline YOLOv12 (97.8%) and prior works. The optimized CBAM-enhanced YOLOv12 also reduces computational cost (5.9 GFLOPS vs. 6.5 GFLOPS in the baseline). Cross-dataset validation on the SAR Ship Detection Dataset (SSDD) confirms consistent improvements, underscoring the efficacy of targeted attention-layer replacement for SAR-specific challenges. Additionally, tests on the SADD and MSAR datasets demonstrate that this optimization generalizes beyond ship detection, yielding gains in aircraft detection and multi-class SAR object recognition. This work establishes a robust framework for efficient, high-precision maritime surveillance using deep learning.
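CBAM itself is a published, well-specified module (channel attention followed by spatial attention; Woo et al., 2018), so the replacement experiment is easy to picture from a compact implementation. The sketch below is a standard CBAM; exactly where it is wired into YOLOv12's C3k2 stage follows the paper and is not reproduced here:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention from avg- and
    max-pooled descriptors through a shared MLP, then spatial attention from
    a 7x7 conv over channel-wise avg and max maps."""
    def __init__(self, ch, reduction=16, kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(),
            nn.Conv2d(ch // reduction, ch, 1))
        self.spatial = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True))
                           + self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca  # reweight channels
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa  # reweight spatial positions
```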

24 pages, 4589 KB  
Article
Semantic Segmentation of Clouds and Cloud Shadows Using State Space Models
by Zhixuan Zhang, Ziwei Hu, Min Xia, Ying Yan, Rui Zhang, Shengyan Liu and Tao Li
Remote Sens. 2025, 17(17), 3120; https://doi.org/10.3390/rs17173120 - 8 Sep 2025
Abstract
In remote sensing image processing, cloud and cloud shadow detection is of great significance: it addresses the problems of cloud occlusion and image distortion and provides support for many downstream fields. However, traditional convolutional and Transformer models, as well as existing studies combining the two, have shortcomings such as insufficient feature fusion, high computational complexity, and difficulty in jointly extracting local and long-range dependent information. To solve these problems, this paper proposes MCloud, a model based on the Mamba architecture that takes advantage of its linear computational complexity to effectively model long-range dependencies and local features through the coordinated work of state-space and convolutional branches and a Mamba-convolutional fusion module. Experiments show that MCloud has leading segmentation performance and generalization ability on multiple datasets, providing a more accurate and efficient solution for cloud and cloud shadow detection.
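The linear computational complexity the abstract leans on comes from the state-space recurrence underlying Mamba-style blocks. A minimal, non-selective version of that recurrence is sketched below; actual Mamba additionally makes the state matrices input-dependent and evaluates the scan in parallel, which this toy loop omits:

```python
import torch

def ssm_scan(u, A, B, C):
    """Linear state-space recurrence x_k = A x_{k-1} + B u_k, y_k = C x_k,
    run in O(L) over a length-L input sequence u of shape (L, d_in)."""
    x = torch.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B @ u_k  # state update
        ys.append(C @ x)     # readout
    return torch.stack(ys)

# illustrative sizes: 8-dim state, 4-dim input, 2-dim output, length 16
y = ssm_scan(torch.randn(16, 4), 0.9 * torch.eye(8),
             torch.randn(8, 4), torch.randn(2, 8))
print(y.shape)  # torch.Size([16, 2])
```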

22 pages, 5692 KB  
Article
RiceStageSeg: A Multimodal Benchmark Dataset for Semantic Segmentation of Rice Growth Stages
by Jianping Zhang, Tailai Chen, Yizhe Li, Qi Meng, Yanying Chen, Jie Deng and Enhong Sun
Remote Sens. 2025, 17(16), 2858; https://doi.org/10.3390/rs17162858 - 16 Aug 2025
Abstract
The accurate identification of rice growth stages is critical for precision agriculture, crop management, and yield estimation. Remote sensing technologies, particularly multimodal approaches that integrate high-spatial-resolution and hyperspectral imagery, have demonstrated great potential in large-scale crop monitoring. Multimodal data fusion offers complementary and enriched spectral–spatial information, providing novel pathways for crop growth stage recognition in complex agricultural scenarios. However, the lack of publicly available multimodal datasets specifically designed for rice growth stage identification remains a significant bottleneck limiting the development and evaluation of relevant methods. To address this gap, we present RiceStageSeg, a multimodal benchmark dataset captured by unmanned aerial vehicles (UAVs) and designed to support the development and assessment of segmentation models for rice growth monitoring. RiceStageSeg contains paired centimeter-level RGB and 10-band multispectral (MS) images acquired during several critical rice growth stages, including jointing and heading. Each image is accompanied by fine-grained, pixel-level annotations that distinguish between the different growth stages. We establish baseline experiments using several state-of-the-art semantic segmentation models under both unimodal (RGB-only, MS-only) and multimodal (RGB + MS fusion) settings. The experimental results demonstrate that multimodal feature-level fusion outperforms unimodal approaches in segmentation accuracy. RiceStageSeg offers a standardized benchmark to advance future research in multimodal semantic segmentation for agricultural remote sensing. The dataset will be made publicly available on GitHub.
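The feature-level fusion used in the RGB + MS baselines can be pictured as two encoders whose feature maps are concatenated before a shared segmentation head. The sketch below is a deliberately tiny version of that pattern; encoder depth, channel counts, and the assumption of co-registered, equal-resolution inputs are all simplifications, not the paper's baseline models:

```python
import torch
import torch.nn as nn

class FeatureLevelFusion(nn.Module):
    """Toy RGB + multispectral fusion for semantic segmentation: one encoder
    per modality, feature concatenation, shared per-pixel classifier."""
    def __init__(self, ms_bands=10, n_classes=4, ch=32):
        super().__init__()
        self.enc_rgb = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.enc_ms = nn.Sequential(nn.Conv2d(ms_bands, ch, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(2 * ch, n_classes, 1)  # per-pixel class logits

    def forward(self, rgb, ms):
        # assumes rgb and ms are co-registered and share spatial resolution
        f = torch.cat([self.enc_rgb(rgb), self.enc_ms(ms)], dim=1)
        return self.head(f)  # (B, n_classes, H, W)

logits = FeatureLevelFusion()(torch.randn(1, 3, 128, 128), torch.randn(1, 10, 128, 128))
print(logits.shape)  # torch.Size([1, 4, 128, 128])
```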

23 pages, 7944 KB  
Article
BCTDNet: Building Change-Type Detection Networks with the Segment Anything Model in Remote Sensing Images
by Wei Zhang, Jinsong Li, Shuaipeng Wang and Jianhua Wan
Remote Sens. 2025, 17(15), 2742; https://doi.org/10.3390/rs17152742 - 7 Aug 2025
Abstract
Observing building changes in remote sensing images plays a crucial role in monitoring urban development and promoting sustainable urbanization. Mainstream change detection methods have demonstrated promising performance in identifying building changes. However, buildings have large intra-class variance and high similarity to other objects, limiting the generalization ability of models in diverse scenarios. Moreover, most existing methods only detect whether changes have occurred and ignore change types, such as new construction and demolition. To address these issues, we present a building change-type detection network (BCTDNet) based on the Segment Anything Model (SAM) to identify newly constructed and demolished buildings. We first construct a dual-feature interaction encoder that employs SAM to extract image features, which are then refined through trainable multi-scale adapters for learning architectural structures and semantic patterns. An interactive attention module bridges SAM with a convolutional neural network, enabling seamless interaction between fine-grained structural information and deep semantic features. Furthermore, we develop a change-aware attribute decoder that integrates building semantics into the change detection process via an extraction decoding network. An attribute-aware strategy is then adopted to explicitly generate distinct maps for newly constructed and demolished buildings, thereby establishing clear temporal relationships among change types. To evaluate BCTDNet's performance, we construct the JINAN-MCD dataset, which covers Jinan's urban core over a six-year period and captures diverse change scenarios. We also adapt the WHU-CD dataset into WHU-MCD to include multiple change types. Experimental results on both datasets demonstrate the superiority of BCTDNet: on JINAN-MCD it improves on the second-best methods by 12.64% in IoU and 11.95% in F1, and on WHU-MCD by 2.71% in IoU and 1.62% in F1. BCTDNet's effectiveness and robustness in complex urban scenarios highlight its potential for applications in land-use analysis and urban planning.
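The trainable multi-scale adapters refine features from a frozen SAM image encoder. The bottleneck-adapter pattern this builds on is shown below as a sketch; the dimensions are illustrative assumptions, and BCTDNet's interactive attention module and decoder are not reproduced:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Residual bottleneck adapter: down-project, nonlinearity, up-project.
    Only these few parameters train; the foundation encoder that produced
    the tokens stays frozen (requires_grad=False)."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, tokens):  # tokens: (B, N, dim) frozen encoder features
        return tokens + self.up(torch.relu(self.down(tokens)))

tokens = torch.randn(1, 196, 768)  # e.g. ViT-style patch tokens
print(Adapter(768)(tokens).shape)  # torch.Size([1, 196, 768])
```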

25 pages, 17505 KB  
Article
A Hybrid Spatio-Temporal Graph Attention (ST D-GAT Framework) for Imputing Missing SBAS-InSAR Deformation Values to Strengthen Landslide Monitoring
by Hilal Ahmad, Yinghua Zhang, Hafeezur Rehman, Mehtab Alam, Zia Ullah, Muhammad Asfandyar Shahid, Majid Khan and Aboubakar Siddique
Remote Sens. 2025, 17(15), 2613; https://doi.org/10.3390/rs17152613 - 28 Jul 2025
Abstract
Reservoir-induced landslides threaten infrastructure and downstream communities, making continuous deformation monitoring vital. Time-series InSAR, notably the SBAS algorithm, provides high-precision surface-displacement mapping but suffers from voids due to layover/shadow effects and temporal decorrelation. Existing deep-learning approaches often operate on fixed-size patches or ignore irregular spatio-temporal dependencies, limiting their ability to recover missing pixels. To address this, a hybrid spatio-temporal graph attention (ST D-GAT) framework was developed and trained on SBAS-InSAR values using 24 influential features. A unified spatio-temporal graph is constructed in which each node represents a pixel at a specific acquisition time. Nodes are connected via inverse-distance spatial edges to their K-nearest neighbors and via bidirectional temporal edges to themselves in adjacent acquisitions. Two spatial GAT layers capture terrain-driven influences, while two temporal GAT layers model annual deformation trends. A compact MLP with per-map bias converts the fused node embeddings into normalized LOS estimates. The SBAS-InSAR results reveal LOS deformation with 48% of pixels missing, 20% of them located near the Dasu dam. ST D-GAT reconstructed fully continuous spatio-temporal displacement fields, filling voids at critical sites. The model was validated and achieved an overall R2 of 0.907, ρ of 0.947, per-map R2 ≥ 0.807 with RMSE ≤ 9.99, and a ROC-AUC of 0.91. It also outperformed six baseline models (IDW, KNN, RF, XGBoost, MLP, simple NN) in both RMSE and R2. By combining observed LOS values with 24 covariates, the proposed model delivers physically consistent gap-filling and enables continuous, high-resolution landslide monitoring in radar-challenged mountainous terrain.
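The unified graph the abstract describes, K-nearest spatial neighbours within each acquisition plus bidirectional temporal edges linking each pixel to itself in adjacent acquisitions, is straightforward to construct. The sketch below assumes pixel coordinates in a NumPy array and omits inverse-distance edge weights, node features, and the GAT layers themselves:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def build_st_edges(coords, n_times, k=5):
    """Edge list for a spatio-temporal pixel graph. Node id = t * n_pix + i
    for pixel i at acquisition t. Returns an array of shape (2, n_edges)."""
    n_pix = len(coords)
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(coords).kneighbors(coords)
    edges = []
    for t in range(n_times):
        base = t * n_pix
        for i in range(n_pix):
            for j in idx[i, 1:]:                  # spatial K-NN (skip self)
                edges.append((base + i, base + int(j)))
            if t + 1 < n_times:                   # temporal self-edges, both ways
                edges.append((base + i, base + n_pix + i))
                edges.append((base + n_pix + i, base + i))
    return np.asarray(edges).T

edges = build_st_edges(np.random.rand(100, 2), n_times=4)
print(edges.shape)  # (2, total spatial + temporal edges)
```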
