Review

A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions

1 Department of Electronic and Communication Engineering, North China Electric Power University, Baoding 071003, China
2 Hebei Key Laboratory of Power Internet of Things Technology, North China Electric Power University, Baoding 071003, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(21), 4113; https://doi.org/10.3390/rs16214113
Submission received: 6 September 2024 / Revised: 23 October 2024 / Accepted: 31 October 2024 / Published: 4 November 2024
(This article belongs to the Section Remote Sensing Image Processing)

Abstract

Remote sensing images contain a wealth of Earth-observation information. Efficient extraction and application of the knowledge hidden in these images will greatly promote the development of resource and environment monitoring, urban planning and other related fields. Remote sensing image caption (RSIC) involves generating textual descriptions of remote sensing images by accurately capturing and describing the semantic-level relationships between the objects and attributes in the images. However, there is currently no comprehensive review summarizing the progress in deep learning-based RSIC. After defining the scope of the papers discussed and summarizing them, this paper provides a comprehensive review of recent advances in RSIC, covering six key aspects: the encoder–decoder framework, attention mechanisms, reinforcement learning, learning with auxiliary tasks, large visual language models and few-shot learning. Subsequently, the datasets and evaluation metrics for RSIC are briefly explained. Furthermore, we compare and analyze the results of the latest models and the pros and cons of different deep learning methods. Lastly, future directions for RSIC are suggested. The primary objective of this review is to offer researchers a deeper understanding of RSIC.
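Most deep learning-based RSIC models covered by this review combine an encoder (image features) and a decoder (word-by-word caption generation), often linked by an attention mechanism. The following is a minimal NumPy sketch of a single attention-weighted decoding step; all dimensions, weight matrices and variable names here are illustrative assumptions for exposition, not taken from the paper or any specific model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 49 spatial regions from a CNN encoder,
# 512-d region features, 256-d decoder state, 1000-word vocabulary.
num_regions, feat_dim, hidden_dim, vocab_size = 49, 512, 256, 1000

# Encoder output: one feature vector per image region.
features = rng.standard_normal((num_regions, feat_dim))

# Decoder hidden state at the current time step.
h = rng.standard_normal(hidden_dim)

# Attention: score each region against the decoder state, then softmax.
W_att = rng.standard_normal((feat_dim, hidden_dim)) * 0.01
scores = features @ W_att @ h                  # shape: (num_regions,)
alphas = np.exp(scores - scores.max())
alphas /= alphas.sum()                         # attention weights, sum to 1

# Context vector: attention-weighted sum of region features.
context = alphas @ features                    # shape: (feat_dim,)

# Decoder step: project [context; h] to vocabulary logits for the next word.
W_out = rng.standard_normal((vocab_size, feat_dim + hidden_dim)) * 0.01
logits = W_out @ np.concatenate([context, h])
next_word = int(np.argmax(logits))
```

In a full captioning model the decoder state would be updated by an RNN or Transformer cell and this step repeated until an end-of-sentence token is produced; the attention weights indicate which image regions the model attends to when emitting each word.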
Keywords: remote sensing; image caption; encoder–decoder framework; attention mechanism; reinforcement learning; auxiliary task; large visual language model; few-shot learning

Share and Cite

MDPI and ACS Style

Zhang, K.; Li, P.; Wang, J. A Review of Deep Learning-Based Remote Sensing Image Caption: Methods, Models, Comparisons and Future Directions. Remote Sens. 2024, 16, 4113. https://doi.org/10.3390/rs16214113


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
