
AI Interpretation of Satellite, Aerial, Ground, and Underwater Image and Video Sequences

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "AI Remote Sensing".

Deadline for manuscript submissions: closed (15 November 2022) | Viewed by 26714

Special Issue Editors


Prof. Danilo Avola
Guest Editor
Department of Computer Science, Sapienza University of Rome, 00185 Rome, Italy
Interests: computer vision (feature extraction and pattern analysis); scene and event understanding (by people and/or vehicles and/or objects); human–computer interaction (pose estimation and gesture recognition by hands and/or body); sketch-based interaction (handwriting and freehand drawing); human–behaviour recognition (actions, emotions, feelings, affects, and moods by hands, body, facial expressions, and voice); biometric analysis (person re-identification by body visual features and/or gait and/or posture/pose); artificial intelligence (machine/deep learning); medical image analysis (MRI, ultrasound, X-rays, PET, and CT); multimodal fusion models; brain–computer interfaces (interaction and security systems); signal processing; visual cryptography (by RGB images); smart environments and natural interaction (with and without virtual/augmented reality); robotics (monitoring and surveillance systems with PTZ cameras, UAVs, AUVs, rovers, and humanoids)

Prof. Daniele Pannone
Guest Editor
Department of Computer Science, Sapienza University of Rome, 00185 Rome, Italy
Interests: computer vision (feature extraction and pattern analysis); scene and event understanding (by people and/or vehicles and/or objects); human–behaviour recognition (actions, emotions, feelings, affects, and moods by hands, body, facial expressions, and voice); biometric analysis (person re-identification by body visual features and/or gait and/or posture/pose); artificial intelligence (machine/deep learning); brain–computer interfaces (interaction and security systems); signal processing; visual cryptography (by RGB images); robotics (monitoring and surveillance systems by PTZ cameras, UAVs, AUVs, rovers, and humanoids)

Dr. Alessio Fagioli
Guest Editor
Department of Computer Science, Sapienza University of Rome, 00185 Rome, Italy
Interests: computer vision (feature extraction and pattern analysis); scene and event understanding (by people and/or vehicles and/or objects); human–behaviour recognition (actions, emotions, feelings, affects, and moods by hands, body, facial expressions, and voice); biometric analysis (person re-identification by body visual features and/or gait and/or posture/pose); artificial intelligence (machine/deep learning); medical image analysis (MRI, ultrasound, X-rays, PET, and CT); multimodal fusion models; signal processing; robotics (monitoring and surveillance systems by PTZ cameras, UAVs, AUVs, rovers, and humanoids)

Special Issue Information

Dear Colleagues,

The MDPI journal Remote Sensing is inviting submissions to the Special Issue “AI Interpretation of Satellite, Aerial, Ground, and Underwater Image and Video Sequences”.

In recent years, artificial intelligence (AI) techniques, such as machine and deep learning, case-based reasoning, reasoning under uncertainty, and knowledge representation, among many others, have supported the development of a wide range of algorithms and methods to understand and interpret complex visual information coming from satellite, aerial, ground, and underwater image and video sequences. These algorithms and methods are used to implement smart applications that support different areas of interest, such as Earth observation at local and global scales, monitoring and security vision-based systems of unmanned aerial vehicles (UAVs) at any scale (i.e., small, medium, and large), video surveillance systems based on static or pan–tilt–zoom (PTZ) cameras, and inspection and analysis vision-based systems of autonomous underwater vehicles (AUVs) or remotely operated vehicles (ROVs). The main aim of this Special Issue is to collect the most innovative works in image and video processing, independent of the specific acquisition device, in support of practical and concrete problems in the civil and military fields. The Special Issue is not limited to RGB cameras, whether static or PTZ, but is open to any kind of acquisition device able to provide visual information that can be processed and interpreted by AI techniques, such as 3D cameras, time-of-flight (ToF) cameras, structured-light cameras, thermal cameras, light detection and ranging (LiDAR) sensors, side-scan sonars (SSSs), and radio detection and ranging (RADAR); data ensemble and/or data fusion systems will also be considered.

Prof. Danilo Avola
Prof. Daniele Pannone
Dr. Alessio Fagioli
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Earth observation at local and global scales
  • weather prediction
  • deforestation
  • land use mapping
  • urban growth
  • crop monitoring
  • land cover mapping
  • land and border monitoring
  • person and/or vehicle and/or object classification
  • background modelling
  • foreground detection
  • feature extractors and descriptors
  • change detection
  • novelty detection
  • saliency detection
  • mosaicking and stitching
  • video surveillance
  • person and/or vehicle re-identification
  • event recognition
  • action recognition
  • deception detection
  • affect/emotion recognition
  • data fusion
  • visual inspection
  • semantic segmentation
  • SLAM algorithms
  • environment analysis

Published Papers (9 papers)

Research

20 pages, 13817 KiB  
Article
Fault Detection via 2.5D Transformer U-Net with Seismic Data Pre-Processing
by Zhanxin Tang, Bangyu Wu, Weihua Wu and Debo Ma
Remote Sens. 2023, 15(4), 1039; https://doi.org/10.3390/rs15041039 - 14 Feb 2023
Cited by 4 | Viewed by 2409
Abstract
Seismic fault structures are important for the detection and exploitation of hydrocarbon resources. With the development and popularity of deep learning in the geophysical community, deep-learning-based fault detection methods have been proposed and have achieved state-of-the-art results. Owing to the efficiency and benefits of full spatial information extraction, 3D convolutional neural networks (CNNs) are widely used to detect faults directly on seismic data volumes. However, training on 3D data requires expensive computational resources and can be limited by hardware facilities. Although 2D CNN methods are less computationally intensive, they lose the correlation between seismic slices. To mitigate these problems, we propose to predict a 2D fault section using multiple neighboring seismic profiles, that is, 2.5D fault detection. In CNNs, convolution layers mainly extract local information, and pooling layers may disrupt the edge features in seismic data, which tends to cause fault discontinuities. To this end, we incorporate a Transformer module into U-Net for feature extraction to enhance prediction continuity. To reduce the data discrepancies between synthetic and real seismic datasets, we apply a seismic data standardization workflow that improves prediction stability on real datasets. Tests on the Netherlands F3 real dataset show that, when trained on synthetic data labels, the proposed 2.5D Transformer U-Net-based method predicts more subtle faults and faults with higher spatial continuity than the baseline full 3D U-Net model.
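
To make the 2.5D idea concrete, the sketch below assembles a multi-slice input for a 2D network: the 2k + 1 inline slices centred on the target slice are stacked as channels, so a 2D model gains cross-slice context without the memory cost of full 3D convolution. The volume layout, the neighbour count k, and the standardization step are illustrative assumptions of this sketch, not the paper's exact workflow.

```python
import numpy as np

def standardize(volume: np.ndarray) -> np.ndarray:
    """Zero-mean, unit-variance scaling: one plausible standardization step."""
    return (volume - volume.mean()) / (volume.std() + 1e-8)

def make_2p5d_input(volume: np.ndarray, i: int, k: int = 2) -> np.ndarray:
    """Stack the 2k+1 inline slices centred on slice i as input channels."""
    lo, hi = max(i - k, 0), min(i + k + 1, volume.shape[0])
    slices = volume[lo:hi]
    pad = (2 * k + 1) - slices.shape[0]
    if pad > 0:
        # Replicate edge slices when i is near the volume boundary.
        front = pad if i - k < 0 else 0
        slices = np.pad(slices, ((front, pad - front), (0, 0), (0, 0)), mode="edge")
    return slices

# Volume indexed as (inline, crossline, depth): an assumption of this sketch.
volume = standardize(np.random.randn(64, 128, 128).astype(np.float32))
x = make_2p5d_input(volume, i=0)
print(x.shape)  # (5, 128, 128): five neighbouring profiles as channels
```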

20 pages, 5229 KiB  
Article
Pixel Representation Augmented through Cross-Attention for High-Resolution Remote Sensing Imagery Segmentation
by Yiyun Luo, Jinnian Wang, Xiankun Yang, Zhenyu Yu and Zixuan Tan
Remote Sens. 2022, 14(21), 5415; https://doi.org/10.3390/rs14215415 - 28 Oct 2022
Cited by 1 | Viewed by 1743
Abstract
Natural imagery segmentation methods have been transferred to land cover classification in remote sensing imagery with excellent performance. However, two key issues have been overlooked in the transfer process: (1) some objects are easily overwhelmed by complex backgrounds; (2) interclass information for indistinguishable classes is not fully utilized. The attention mechanism in the transformer is capable of modeling long-range dependencies on each sample for per-pixel context extraction. Notably, per-pixel context from the attention mechanism can aggregate category information. Therefore, we propose a semantic segmentation method based on pixel representation augmentation. In our method, a simplified feature pyramid decodes the hierarchical pixel features from the backbone, and category representations are then decoded into learnable category object embedding queries by cross-attention in the transformer decoder. Finally, the pixel representation is augmented by an additional cross-attention in the transformer encoder under the supervision of auxiliary segmentation heads. The results of extensive experiments on the aerial image dataset Potsdam and the satellite image dataset Gaofen Image Dataset with 15 categories (GID-15) demonstrate that the cross-attention is effective, and our method achieves a mean intersection over union (mIoU) of 86.2% and 62.5% on the Potsdam test set and the GID-15 validation set, respectively. Additionally, we achieve an inference speed of 76 frames per second (FPS) on the Potsdam test dataset, higher than all the state-of-the-art models we tested on the same device.
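
The core mechanism is easier to see in code. The following PyTorch sketch shows the two cross-attention steps in their simplest form: learnable category queries first gather class context from pixel features, and pixels then query those category representations to augment their own embeddings. The dimensions, module layout, and residual connection are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CategoryCrossAttention(nn.Module):
    def __init__(self, dim: int = 256, num_classes: int = 15, heads: int = 8):
        super().__init__()
        # One learnable embedding (query) per land-cover category.
        self.category_queries = nn.Parameter(torch.randn(num_classes, dim))
        self.to_categories = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_pixels = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, pixel_feats: torch.Tensor) -> torch.Tensor:
        # pixel_feats: (B, H*W, dim), a flattened feature map from the decoder.
        b = pixel_feats.size(0)
        q = self.category_queries.unsqueeze(0).expand(b, -1, -1)
        # Step 1: category queries gather class-specific context from pixels.
        cat_repr, _ = self.to_categories(q, pixel_feats, pixel_feats)
        # Step 2: pixels query the category representations, so each pixel
        # embedding is augmented with interclass information.
        augmented, _ = self.to_pixels(pixel_feats, cat_repr, cat_repr)
        return pixel_feats + augmented  # residual connection

feats = torch.randn(2, 64 * 64, 256)
print(CategoryCrossAttention()(feats).shape)  # torch.Size([2, 4096, 256])
```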

15 pages, 19046 KiB  
Article
SS R-CNN: Self-Supervised Learning Improving Mask R-CNN for Ship Detection in Remote Sensing Images
by Ling Jian, Zhiqi Pu, Lili Zhu, Tiancan Yao and Xijun Liang
Remote Sens. 2022, 14(17), 4383; https://doi.org/10.3390/rs14174383 - 03 Sep 2022
Cited by 8 | Viewed by 1906
Abstract
Due to the cost of acquiring and labeling remote sensing images, only a limited number of images with the target objects are obtained and labeled in some practical applications, which severely limits the generalization capability of typical deep learning networks. Self-supervised learning can learn the inherent feature representations of unlabeled instances and is a promising technique for marine ship detection. In this work, we design a more-way CutPaste self-supervised task to train a feature representation network using clean marine surface images containing no ships, based on which a two-stage object detection model using Mask R-CNN is improved to detect marine ships. Experimental results show that, with a limited number of labeled remote sensing images, the designed model achieves better detection performance than supervised baseline methods in terms of mAP. In particular, the detection accuracy for small-sized marine ships is evidently improved.
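
As a rough illustration of the CutPaste family of pretext tasks the paper builds on, the sketch below cuts a random rectangle from a clean sea-surface image and pastes it elsewhere; a network trained to distinguish clean from cut-pasted views learns ship-free surface statistics without labels. The patch-size ranges and two-class setup are illustrative assumptions; the paper's more-way variant adds further transform classes.

```python
import numpy as np

rng = np.random.default_rng(0)

def cutpaste(image: np.ndarray) -> np.ndarray:
    """Cut a random rectangle and paste it at a random new position."""
    h, w = image.shape[:2]
    ph = rng.integers(h // 8, h // 4)   # patch height (illustrative range)
    pw = rng.integers(w // 8, w // 4)   # patch width (illustrative range)
    y1, x1 = rng.integers(0, h - ph), rng.integers(0, w - pw)
    y2, x2 = rng.integers(0, h - ph), rng.integers(0, w - pw)
    out = image.copy()
    out[y2:y2 + ph, x2:x2 + pw] = image[y1:y1 + ph, x1:x1 + pw]
    return out

# Self-supervised labels: 0 = clean view, 1 = cut-pasted view. A "more-way"
# variant would add extra patch transforms as additional classes.
clean = rng.random((256, 256, 3)).astype(np.float32)
views, labels = [clean, cutpaste(clean)], [0, 1]
print(views[1].shape, labels)
```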

18 pages, 19536 KiB  
Article
A Novel GAN-Based Anomaly Detection and Localization Method for Aerial Video Surveillance at Low Altitude
by Danilo Avola, Irene Cannistraci, Marco Cascio, Luigi Cinque, Anxhelo Diko, Alessio Fagioli, Gian Luca Foresti, Romeo Lanzino, Maurizio Mancini, Alessio Mecca and Daniele Pannone
Remote Sens. 2022, 14(16), 4110; https://doi.org/10.3390/rs14164110 - 22 Aug 2022
Cited by 11 | Viewed by 2934
Abstract
The last two decades have seen incessant growth in the use of Unmanned Aerial Vehicles (UAVs) equipped with HD cameras for developing aerial vision-based systems to support civilian and military tasks, including land monitoring, change detection, and object classification. To perform most of these tasks, artificial intelligence algorithms usually need to know, a priori, what to look for, identify, or recognize. However, in most operational scenarios, such as war zones or post-disaster situations, areas and objects of interest are not decidable a priori, since their shape and visual features may have been altered by events or even intentionally disguised (e.g., improvised explosive devices (IEDs)). For these reasons, in recent years, more and more research groups have been investigating the design of original anomaly detection methods, which, in short, focus on detecting samples that differ from the others in terms of visual appearance and occurrence with respect to a given environment. In this paper, we present a novel two-branch Generative Adversarial Network (GAN)-based method for low-altitude RGB aerial video surveillance to detect and localize anomalies. We chose to focus on low-altitude sequences because we are interested in complex operational scenarios where even a small object or device can represent a reason for danger or attention. The proposed model was tested on the UAV Mosaicking and Change Detection (UMCD) dataset, a one-of-a-kind collection of challenging videos whose sequences were acquired between 6 and 15 m above sea level on three types of ground (i.e., urban, dirt, and countryside). The results demonstrate the effectiveness of the model in terms of Area Under the Receiver Operating Characteristic curve (AUROC) and Structural Similarity Index (SSIM), achieving averages of 97.2% and 95.7%, respectively, suggesting that the system can be deployed in real-world applications.
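
A minimal sketch of the reconstruction-based scoring that such GAN anomaly detectors typically rely on is shown below: a generator trained only on normal frames reconstructs a test frame, and the local SSIM between input and reconstruction yields a per-pixel anomaly map. The generator is stubbed out and the threshold is arbitrary; only the scoring logic is illustrated, not the paper's two-branch model.

```python
import numpy as np
from skimage.metrics import structural_similarity

def anomaly_map(frame: np.ndarray, reconstruction: np.ndarray) -> np.ndarray:
    """Per-pixel anomaly map from the local SSIM between frame and output."""
    _, ssim_map = structural_similarity(
        frame, reconstruction, channel_axis=-1, data_range=1.0, full=True
    )
    # High values where the generator failed to reproduce the frame.
    return 1.0 - ssim_map.mean(axis=-1)

frame = np.random.rand(128, 128, 3).astype(np.float32)
recon = frame.copy()
recon[40:60, 40:60] = 0.0        # pretend the generator missed this region
amap = anomaly_map(frame, recon)
mask = amap > 0.5                # threshold chosen for illustration only
print(mask.sum(), "anomalous pixels flagged")
```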

19 pages, 12714 KiB  
Article
RAMC: A Rotation Adaptive Tracker with Motion Constraint for Satellite Video Single-Object Tracking
by Yuzeng Chen, Yuqi Tang, Te Han, Yuwei Zhang, Bin Zou and Huihui Feng
Remote Sens. 2022, 14(13), 3108; https://doi.org/10.3390/rs14133108 - 28 Jun 2022
Cited by 9 | Viewed by 1867
Abstract
Single-object tracking (SOT) in satellite videos (SVs) is a promising and challenging task in the remote sensing community. In terms of both the object itself and the tracking algorithm, the rotation of small-sized objects and tracking drift are common problems due to the nadir view coupled with a complex background. This article proposes a novel rotation adaptive tracker with motion constraint (RAMC) to explore how the hybridization of angle and motion information can boost SV object tracking with two branches: rotation and translation. We decouple the rotation and translation motion patterns. The rotation phenomenon is decomposed into a translation solution to achieve adaptive rotation estimation in the rotation branch. In the translation branch, appearance and motion information are synergized to enhance the object representations and address the tracking drift issue. Moreover, an internal shrinkage (IS) strategy is proposed to optimize the evaluation process of trackers. Extensive experiments on spaceborne SV datasets captured from the Jilin-1 satellite constellation and the International Space Station (ISS) were conducted. The results demonstrate the superiority of the proposed method over other algorithms. With an area under the curve (AUC) of 0.785 and 0.946 in the success and precision plots, respectively, the proposed RAMC achieves optimal performance while running at real-time speed.
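
One classical way to decompose rotation into a translation problem, in the spirit of the rotation branch, is the polar-resampling trick sketched below: a rotation about the patch centre becomes a shift along the angle axis of the polar image, recoverable by phase correlation. This illustrates the principle only; RAMC's actual estimator differs from this scikit-image-based sketch, and the sign of the recovered angle depends on the library's angle conventions.

```python
import numpy as np
from skimage.transform import warp_polar, rotate
from skimage.registration import phase_cross_correlation

def estimate_rotation(template: np.ndarray, patch: np.ndarray) -> float:
    """Rotation angle (degrees) of `patch` relative to `template`."""
    p1 = warp_polar(template, radius=template.shape[0] // 2)
    p2 = warp_polar(patch, radius=patch.shape[0] // 2)
    # In polar coordinates, rotation is a circular shift along the angle axis.
    shift, _, _ = phase_cross_correlation(p1, p2)
    # warp_polar spreads 360 degrees over the rows of the polar image.
    return shift[0] * 360.0 / p1.shape[0]

rng = np.random.default_rng(1)
template = rng.random((64, 64))
patch = rotate(template, angle=20.0)   # simulate an object rotation
# Magnitude is approximately 20; the sign follows the angle convention.
print(round(estimate_rotation(template, patch), 1))
```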

22 pages, 2851 KiB  
Article
Eagle-Eye-Inspired Attention for Object Detection in Remote Sensing
by Kang Liu, Ju Huang and Xuelong Li
Remote Sens. 2022, 14(7), 1743; https://doi.org/10.3390/rs14071743 - 05 Apr 2022
Cited by 6 | Viewed by 3488
Abstract
Object detection has extremely significant applications in the field of optical remote sensing images, and a great many works have achieved remarkable results in this task. However, some common problems, such as scale, illumination, and image quality, are still unresolved. Inspired by the cascade attention mechanism of the eagle-eye fovea, we propose a new attention mechanism network named the eagle-eye fovea network (EFNet), which contains two foveae for remote sensing object detection. The EFNet consists of two eagle-eye fovea modules: the front central fovea (FCF) and the rear central fovea (RCF). The FCF is mainly used to learn candidate object knowledge based on channel attention and spatial attention, while the RCF mainly aims to predict the refined objects with two anchor-free subnetworks. Three remote sensing object-detection datasets, namely DIOR, HRRSD, and AIBD, were utilized in the comparative experiments. The best results of the proposed EFNet were obtained on HRRSD, with a 0.622 AP score and a 0.907 AP50 score. The experimental results demonstrate the effectiveness of the proposed EFNet for both multi-category and single-category datasets.
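
The channel-plus-spatial attention pattern that the FCF builds on can be sketched in a few lines of PyTorch, as below. The layer sizes, pooling choices, and 7×7 spatial kernel are illustrative assumptions in the style of common attention modules, not the EFNet implementation.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention: reweight feature maps by globally pooled statistics.
        w = torch.sigmoid(self.channel_mlp(x.mean(dim=(2, 3)))).view(b, c, 1, 1)
        x = x * w
        # Spatial attention: reweight locations by pooled channel statistics.
        stats = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], 1)
        return x * torch.sigmoid(self.spatial_conv(stats))

feat = torch.randn(2, 64, 32, 32)
print(ChannelSpatialAttention(64)(feat).shape)  # torch.Size([2, 64, 32, 32])
```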

15 pages, 10634 KiB  
Article
Deep-Learning-Based Object Filtering According to Altitude for Improvement of Obstacle Recognition during Autonomous Flight
by Yongwoo Lee, Junkang An and Inwhee Joe
Remote Sens. 2022, 14(6), 1378; https://doi.org/10.3390/rs14061378 - 12 Mar 2022
Cited by 1 | Viewed by 2371
Abstract
The autonomous flight of an unmanned aerial vehicle refers to creating a new flight route through self-recognition and judgment when an unexpected situation occurs during flight. An unmanned aerial vehicle can fly at a high speed of more than 60 km/h, so obstacle recognition and avoidance must be performed in real time. In this paper, we propose to recognize objects quickly and accurately by effectively using the hardware resources of the small computers mounted on industrial unmanned aerial vehicles. Since the number of pixels in the image decreases after the resizing process, filtering and object resizing were performed according to altitude so that quick detection and avoidance could be achieved. To this end, the altitude range up to 60 m was subdivided into 20 m intervals, and objects unnecessary for detection in each band were filtered out with deep learning methods. In the 40 m to 60 m band, the average recognition speed increased by 38% without compromising object detection accuracy.
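
A minimal sketch of the altitude-banded filtering idea follows: detections of classes that cannot plausibly be obstacles in the drone's current 20 m altitude band are discarded before further processing, which is one way such filtering can cut per-frame work. The class-per-band table and data structures are invented for illustration, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float

# Obstacle classes assumed relevant in each 20 m altitude band (illustrative).
RELEVANT = {
    (0, 20): {"person", "car", "tree", "pole", "building"},
    (20, 40): {"tree", "pole", "building", "crane"},
    (40, 60): {"building", "crane", "tower"},
}

def filter_by_altitude(dets: list[Detection], altitude_m: float) -> list[Detection]:
    """Drop detections whose class is irrelevant at the current altitude."""
    for (lo, hi), classes in RELEVANT.items():
        if lo <= altitude_m < hi:
            return [d for d in dets if d.label in classes]
    return dets  # above 60 m: no band-specific filtering in this sketch

dets = [Detection("person", 0.9), Detection("crane", 0.8)]
print(filter_by_altitude(dets, altitude_m=45.0))  # only the crane remains
```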

19 pages, 12167 KiB  
Article
Building Extraction and Number Statistics in WUI Areas Based on UNet Structure and Ensemble Learning
by De-Yue Chen, Ling Peng, Wei-Chao Li and Yin-Da Wang
Remote Sens. 2021, 13(6), 1172; https://doi.org/10.3390/rs13061172 - 19 Mar 2021
Cited by 14 | Viewed by 2681
Abstract
Following the advancement and progression of urbanization, management problems in the wildland–urban interface (WUI) have become increasingly serious. WUI regional governance issues involve many factors, including climate, human activities, etc., and have attracted attention and research from all walks of life. Research on buildings plays a vital part in WUI areas: building locations are closely related to the planning and management of the area, and the number of buildings is relevant to rescue arrangements. There are two major ways to obtain this building information: one is to obtain it from relevant agencies, which is slow and lacks timeliness, while the other is to extract it from high-resolution remote sensing images, which is relatively inexpensive and offers improved timeliness. Inspired by the recent successful application of deep learning, in this paper we propose a deep-learning-based method for extracting building information from high-resolution remote sensing images, combined with ensemble learning, to extract building locations. Furthermore, we use the idea of image anomaly detection to estimate the number of buildings. After verification on two datasets, we obtain superior semantic segmentation results and achieve better building contour extraction and number estimation.
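
A minimal sketch of the ensembling step is given below: per-pixel building probabilities from several independently trained segmenters are averaged and thresholded, and connected components give a rough building count. The model count, threshold, and the use of plain connected-component counting (in place of the paper's anomaly-detection refinement) are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def ensemble_mask(prob_maps: list[np.ndarray], threshold: float = 0.5) -> np.ndarray:
    """Average per-pixel building probabilities from several models, then binarize."""
    return (np.mean(np.stack(prob_maps), axis=0) >= threshold).astype(np.uint8)

# Pretend outputs of three independently trained UNet-style segmenters.
rng = np.random.default_rng(42)
probs = [rng.random((256, 256)) for _ in range(3)]
mask = ensemble_mask(probs)

# Connected components give a rough building count; the paper instead refines
# the count with an image anomaly-detection step.
_, n_buildings = ndimage.label(mask)
print("estimated buildings:", n_buildings)
```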

17 pages, 3098 KiB  
Article
Campus Violence Detection Based on Artificial Intelligent Interpretation of Surveillance Video Sequences
by Liang Ye, Tong Liu, Tian Han, Hany Ferdinando, Tapio Seppänen and Esko Alasaarela
Remote Sens. 2021, 13(4), 628; https://doi.org/10.3390/rs13040628 - 09 Feb 2021
Cited by 25 | Viewed by 5506
Abstract
Campus violence is a common social phenomenon all over the world and the most harmful type of school bullying event. As artificial intelligence and remote sensing techniques develop, several methods have become available to detect campus violence, e.g., movement-sensor-based methods and video-sequence-based methods, using sensors and surveillance cameras. In this paper, the authors use image features and acoustic features for campus violence detection. Campus violence data were gathered by role-playing, and 4096-dimensional feature vectors were extracted from every 16 frames of video. The C3D (Convolutional 3D) neural network was used for feature extraction and classification, achieving an average recognition accuracy of 92.00%. Mel-frequency cepstral coefficients (MFCCs) were extracted as acoustic features, with three speech emotion databases involved. The C3D neural network was used for classification, and the average recognition accuracies were 88.33%, 95.00%, and 91.67%, respectively. To solve the problem of evidence conflict, the authors propose an improved Dempster–Shafer (D–S) algorithm. Compared with the existing D–S theory, the improved algorithm increases the recognition accuracy by 10.79%, ultimately reaching 97.00%.
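
For reference, the classical Dempster–Shafer combination rule that the improved algorithm modifies can be written compactly. The sketch below fuses a video-based and an audio-based basic probability assignment over the frame of discernment {violence, non-violence}; the example masses are invented, and the conflict handling shown is the standard rule, not the paper's improvement.

```python
from itertools import product

def ds_combine(m1: dict[frozenset, float], m2: dict[frozenset, float]) -> dict:
    """Combine two basic probability assignments with Dempster's rule."""
    combined: dict[frozenset, float] = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # mass assigned to contradictory hypotheses
    # Normalize by 1 - K, where K is the total conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

V, N = frozenset({"violence"}), frozenset({"non-violence"})
theta = V | N  # uncertainty: mass on the whole frame of discernment
m_video = {V: 0.7, N: 0.2, theta: 0.1}
m_audio = {V: 0.6, N: 0.3, theta: 0.1}
print(ds_combine(m_video, m_audio))  # fused belief, violence ~0.82
```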
