Advanced Deep Learning Strategies for the Analysis of Remote Sensing Images

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "Remote Sensing Image Processing".

Deadline for manuscript submissions: closed (30 November 2020) | Viewed by 105045

Printed Edition Available!
A printed edition of this Special Issue is available here.

Special Issue Editors


Dr. Yakoub Bazi, Guest Editor
Dr. Edoardo Pasolli, Guest Editor

Special Issue Information

Dear Colleagues,

Over the last two decades, remote sensing (RS) has become an essential technology for monitoring urban, atmospheric, and ecological changes. The increased availability of satellite and airborne sensors with different spatial and spectral resolutions has made this technology a key component in decision making. In addition to these traditional platforms, a new era has recently been opened by the adoption of UAVs for diverse applications such as policing, precision farming, and urban planning.

This great observation capability introduces equally great challenges in terms of information extraction: processing the massive data collected by these diverse platforms is impractical and ineffective with traditional image analysis methodologies. This calls for powerful techniques that can extract reliable and meaningful information. In this context, deep learning (DL) strategies have recently been shown to hold great promise for addressing the challenging needs of the RS community. The origins of DL date back decades, to the first steps towards building artificial neural networks. Due to limited processing resources, however, it did not achieve cutting-edge success in data representation and classification tasks until the recent advent of high-performance computing facilities, which enabled the design of sophisticated deep neural architectures and boosted the accuracy of many tasks to groundbreaking levels.

This Special Issue welcomes papers that explore novel and challenging topics in the analysis of remote sensing images acquired with diverse platforms. Topics of interest include, but are not limited to, the following:

  • Semantic segmentation;
  • Domain adaptation from single and multiple sources;
  • Continual learning;
  • Exploration of the relationship between natural language and remote sensing images (bidirectional text to image retrieval, image captioning, visual question answering);
  • Crowd estimation in UAV imagery;
  • Image generation and conversion using generative adversarial networks.

Dr. Yakoub Bazi
Dr. Edoardo Pasolli
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Multispectral/hyperspectral/UAV imagery
  • Natural language and remote sensing
  • Classification, restoration, super-resolution, retrieval, change detection
  • Convolutional neural networks (CNNs)
  • Generative adversarial networks (GANs)

Published Papers (20 papers)

Research

21 pages, 5532 KiB  
Article
Aerial Imagery Feature Engineering Using Bidirectional Generative Adversarial Networks: A Case Study of the Pilica River Region, Poland
by Maciej Adamiak, Krzysztof Będkowski and Anna Majchrowska
Remote Sens. 2021, 13(2), 306; https://doi.org/10.3390/rs13020306 - 17 Jan 2021
Cited by 6 | Viewed by 3984
Abstract
Generative adversarial networks (GANs) are a type of neural network characterized by their unique construction and training process. Utilizing the concept of the latent space and exploiting the results of a duel between different GAN components opens up interesting opportunities for computer vision (CV) activities, such as image inpainting, style transfer, or even generative art. GANs have great potential to support aerial and satellite image interpretation activities. Carefully crafting a GAN and applying it to a high-quality dataset can result in nontrivial feature enrichment. In this study, we designed and tested an unsupervised procedure capable of engineering new features by shifting real orthophotos into the GAN’s underlying latent space. Latent vectors are a low-dimensional representation of the orthophoto patches that holds information about the strength, occurrence, and interaction of the spatial features discovered during network training. Latent vectors were combined with geographical coordinates to bind them to their original location in the orthophoto. Consequently, it was possible to describe the whole research area as a set of latent vectors and perform further spatial analysis not on RGB images but on their lower-dimensional representation. To accomplish this goal, a modified version of the big bidirectional generative adversarial network (BigBiGAN) was trained on a fine-tailored orthophoto imagery dataset covering the area of the Pilica River region in Poland. The trained models, specifically the generator and the encoder, were then used for model quality assurance and feature engineering, respectively. Quality assurance was performed by measuring model reconstruction capabilities and by manually verifying artificial images produced by the generator. The feature engineering use case, in turn, is presented in a real research scenario that involves splitting the orthophoto into a set of patches, encoding the patch set into the GAN latent space, grouping the latent codes of similar patches by hierarchical clustering, and producing a segmentation map of the orthophoto. Full article
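
As a rough illustration of the encode-then-cluster workflow described in this abstract, the sketch below groups patch latent codes (bound to their map coordinates) with hierarchical clustering. The encoder output, array sizes, and number of clusters are invented placeholders, not the trained BigBiGAN setup.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical inputs: one latent vector per orthophoto patch plus the patch's
# map coordinates (in the paper these come from the trained BigBiGAN encoder).
rng = np.random.default_rng(0)
latents = rng.normal(size=(500, 120))            # 500 patches, 120-D latent codes
coords = rng.uniform(0, 10_000, size=(500, 2))   # patch centroids in map units

# Bind each latent code to its (normalized) location so clusters stay spatially coherent.
features = np.hstack([latents, coords / coords.max()])

# Hierarchical (Ward) clustering of the latent codes, cut into k groups.
Z = linkage(features, method="ward")
labels = fcluster(Z, t=8, criterion="maxclust")  # 8 pseudo land-cover groups

# Each patch now carries a cluster id that can be painted back onto the
# orthophoto grid to form a coarse segmentation map.
print(labels[:20])
```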

27 pages, 9491 KiB  
Article
Post-Disaster Building Damage Detection from Earth Observation Imagery Using Unsupervised and Transferable Anomaly Detecting Generative Adversarial Networks
by Sofia Tilon, Francesco Nex, Norman Kerle and George Vosselman
Remote Sens. 2020, 12(24), 4193; https://doi.org/10.3390/rs12244193 - 21 Dec 2020
Cited by 28 | Viewed by 5793
Abstract
We present an unsupervised deep learning approach for post-disaster building damage detection that can transfer to different typologies of damage or geographical locations. Previous advances in this direction were limited by insufficient qualitative training data. We propose to use a state-of-the-art Anomaly Detecting Generative Adversarial Network (ADGAN) because it only requires pre-event imagery of buildings in their undamaged state. This approach aids the post-disaster response phase because the model can be developed in the pre-event phase and rapidly deployed in the post-event phase. We used the xBD dataset, containing pre- and post-event satellite imagery of several disaster types, and a custom-made Unmanned Aerial Vehicle (UAV) dataset containing post-earthquake imagery. Results showed that models trained on UAV imagery were capable of detecting earthquake-induced damage. The best performing model for European locations obtained a recall, precision and F1-score of 0.59, 0.97 and 0.74, respectively. Models trained on satellite imagery were capable of detecting damage on the condition that the training dataset was devoid of vegetation and shadows. In this manner, the best performing model for (wild)fire events yielded a recall, precision and F1-score of 0.78, 0.99 and 0.87, respectively. Compared to other supervised and/or multi-epoch approaches, our results are encouraging. Moreover, in addition to image classifications, we show how contextual information can be used to create detailed damage maps without the need for a dedicated multi-task deep learning framework. Finally, we formulate practical guidelines for applying this single-epoch and unsupervised method to real-world applications. Full article

22 pages, 14200 KiB  
Article
Intelligent Mapping of Urban Forests from High-Resolution Remotely Sensed Imagery Using Object-Based U-Net-DenseNet-Coupled Network
by Shaobai He, Huaqiang Du, Guomo Zhou, Xuejian Li, Fangjie Mao, Di’en Zhu, Yanxin Xu, Meng Zhang, Zihao Huang, Hua Liu and Xin Luo
Remote Sens. 2020, 12(23), 3928; https://doi.org/10.3390/rs12233928 - 30 Nov 2020
Cited by 15 | Viewed by 2664
Abstract
The application of deep learning techniques, especially deep convolutional neural networks (DCNNs), to the intelligent mapping of very high spatial resolution (VHSR) remote sensing images has drawn much attention in the remote sensing community. However, the fragmented distribution of urban land use types and the complex structure of urban forests bring about a variety of challenges for urban land use mapping and the extraction of urban forests. Based on the DCNN algorithm, this study proposes a novel object-based U-net-DenseNet-coupled network (OUDN) method to realize urban land use mapping and the accurate extraction of urban forests. The proposed OUDN has three parts: the first part involves the coupling of the improved U-net and DenseNet architectures; then, the network is trained according to the labeled data sets, and the land use information in the study area is classified; the final part fuses the object boundary information obtained by object-based multiresolution segmentation into the classification layer, and a voting method is applied to optimize the classification results. The results show that (1) the classification results of the OUDN algorithm are better than those of U-net and DenseNet, and the average classification accuracy is 92.9%, an increase of approximately 3%; (2) for the U-net-DenseNet-coupled network (UDN) and OUDN, the urban forest extraction accuracies are higher than those of U-net and DenseNet, and the OUDN effectively alleviates the classification error caused by the fragmentation of urban distribution by combining object-based multiresolution segmentation features, making the overall accuracy (OA) of urban land use classification and the extraction accuracy of urban forests superior to those of the UDN algorithm; (3) based on the Spe-Texture (the spectral features combined with the texture features), the OA of the OUDN in the extraction of urban land use categories can reach 93.8%, thereby achieving the accurate discrimination of different land use types, especially urban forests (99.7%). Therefore, this study provides a reference for feature setting in the mapping of urban land use information from VHSR imagery. Full article
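
The final voting step, fusing object boundaries from multiresolution segmentation with a per-pixel classification, can be sketched roughly as follows. The arrays are toy placeholders and the function is illustrative only, not the OUDN implementation.

```python
import numpy as np

def object_based_voting(class_map: np.ndarray, segment_ids: np.ndarray) -> np.ndarray:
    """Assign every pixel of a segment the majority class predicted inside it.

    class_map   -- per-pixel class labels from the CNN (H x W, int)
    segment_ids -- object ids from multiresolution segmentation (H x W, int)
    """
    voted = class_map.copy()
    for seg in np.unique(segment_ids):
        mask = segment_ids == seg
        # Majority vote over the pixels belonging to this object.
        majority = np.bincount(class_map[mask]).argmax()
        voted[mask] = majority
    return voted

# Toy example: 4 classes, 3 segments.
rng = np.random.default_rng(1)
classes = rng.integers(0, 4, size=(64, 64))
segments = np.repeat(np.arange(3), 64 * 64 // 3 + 1)[: 64 * 64].reshape(64, 64)
print(object_based_voting(classes, segments).shape)
```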

17 pages, 4042 KiB  
Article
Development of an Automated Visibility Analysis Framework for Pavement Markings Based on the Deep Learning Approach
by Kyubyung Kang, Donghui Chen, Cheng Peng, Dan Koo, Taewook Kang and Jonghoon Kim
Remote Sens. 2020, 12(22), 3837; https://doi.org/10.3390/rs12223837 - 23 Nov 2020
Cited by 10 | Viewed by 2539
Abstract
Pavement markings play a critical role in reducing crashes and improving safety on public roads. As road pavements age, maintenance work for safety purposes becomes critical. However, inspecting all pavement markings at the right time is very challenging due to the lack of available human resources. This study was conducted to develop an automated condition analysis framework for pavement markings using machine learning technology. The proposed framework consists of three modules: a data processing module, a pavement marking detection module, and a visibility analysis module. The framework was validated through a case study of pavement markings training data sets in the U.S. It was found that the detection model of the framework was very precise, which means most of the identified pavement markings were correctly classified. In addition, in the proposed framework, visibility was confirmed as an important factor of driver safety and maintenance, and visibility standards for pavement markings were defined. Full article

16 pages, 12092 KiB  
Article
Wildfire-Detection Method Using DenseNet and CycleGAN Data Augmentation-Based Remote Camera Imagery
by Minsoo Park, Dai Quoc Tran, Daekyo Jung and Seunghee Park
Remote Sens. 2020, 12(22), 3715; https://doi.org/10.3390/rs12223715 - 12 Nov 2020
Cited by 50 | Viewed by 5723
Abstract
To minimize the damage caused by wildfires, a deep learning-based wildfire-detection technology that extracts features and patterns from surveillance camera images was developed. However, many studies on deep learning-based wildfire-image classification have highlighted the problem of data imbalance between wildfire images and forest images, which degrades model performance. In this study, wildfire images were generated using a cycle-consistent generative adversarial network (CycleGAN) to eliminate data imbalances. In addition, a densely-connected-convolutional-networks-based (DenseNet-based) framework was proposed and its performance was compared with pre-trained models. When trained on a set that included GAN-generated images, the proposed DenseNet-based model achieved the best performance among the compared models, with an accuracy of 98.27% and an F1 score of 98.16 on the test dataset. Finally, this trained model was applied to high-quality drone images of wildfires. The experimental results showed that the proposed framework demonstrated high wildfire-detection accuracy. Full article

18 pages, 2471 KiB  
Article
VddNet: Vine Disease Detection Network Based on Multispectral Images and Depth Map
by Mohamed Kerkech, Adel Hafiane and Raphael Canals
Remote Sens. 2020, 12(20), 3305; https://doi.org/10.3390/rs12203305 - 11 Oct 2020
Cited by 34 | Viewed by 4512
Abstract
Vine pathologies generate several economic and environmental problems, causing serious difficulties for viticultural activity. The early detection of vine disease can significantly improve the control of vine diseases and avoid the spread of viruses or fungi. Currently, remote sensing and artificial intelligence technologies are emerging in the field of precision agriculture. They offer interesting potential for crop disease management. However, despite the advances in these technologies, particularly deep learning technologies, many problems still present considerable challenges, such as the semantic segmentation of images for disease mapping. In this paper, we present a new deep learning architecture called Vine Disease Detection Network (VddNet). It is based on three parallel auto-encoders integrating different information (i.e., visible, infrared and depth). The decoder then reconstructs and retrieves the features, and assigns a class to each output pixel. An orthophoto registration method is also proposed to align the three types of images and enable processing by VddNet. The proposed architecture is assessed by comparing it with the best-known architectures: SegNet, U-Net, DeepLabv3+ and PSPNet. The deep learning architectures were trained on multispectral data from an unmanned aerial vehicle (UAV) and depth map information extracted from 3D processing. The results show that the VddNet architecture achieves higher scores than the baseline methods. Moreover, this study demonstrates that the proposed method has many advantages compared to methods that directly use the UAV images. Full article

24 pages, 2547 KiB  
Article
A High-Performance Spectral-Spatial Residual Network for Hyperspectral Image Classification with Small Training Data
by Wijayanti Nurul Khotimah, Mohammed Bennamoun, Farid Boussaid, Ferdous Sohel and David Edwards
Remote Sens. 2020, 12(19), 3137; https://doi.org/10.3390/rs12193137 - 24 Sep 2020
Cited by 13 | Viewed by 3111
Abstract
In this paper, we propose a high performance Two-Stream spectral-spatial Residual Network (TSRN) for hyperspectral image classification. The first spectral residual network (sRN) stream is used to extract spectral characteristics, and the second spatial residual network (saRN) stream is concurrently used to extract spatial features. The sRN uses 1D convolutional layers to fit the spectral data structure, while the saRN uses 2D convolutional layers to match the hyperspectral spatial data structure. Furthermore, each convolutional layer is preceded by a Batch Normalization (BN) layer that works as a regularizer to speed up the training process and to improve the accuracy. We conducted experiments on three well-known hyperspectral datasets, and we compare our results with five contemporary methods across various sizes of training samples. The experimental results show that the proposed architecture can be trained with small size datasets and outperforms the state-of-the-art methods in terms of the Overall Accuracy, Average Accuracy, Kappa Value, and training time. Full article
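
A toy PyTorch rendition of the two-stream idea described above: 1-D convolutions over the spectral axis, 2-D convolutions over the spatial axes, with Batch Normalization placed before each convolution. Layer sizes and tensor shapes are invented; this is not the authors' TSRN code.

```python
import torch
import torch.nn as nn

class SpectralResBlock(nn.Module):
    """1-D residual block over the spectral dimension (BN placed before conv)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)

class SpatialResBlock(nn.Module):
    """2-D residual block over the spatial dimensions (BN placed before conv)."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)

# A pixel's spectrum as a 1-D signal and its surrounding patch as a 2-D image.
spectrum = torch.randn(8, 16, 100)   # batch, channels, spectral bands
patch = torch.randn(8, 16, 11, 11)   # batch, channels, height, width
print(SpectralResBlock(16)(spectrum).shape, SpatialResBlock(16)(patch).shape)
```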

17 pages, 14922 KiB  
Article
Learn to Extract Building Outline from Misaligned Annotation through Nearest Feature Selector
by Yuxuan Wang, Guangming Wu, Yimin Guo, Yifei Huang and Ryosuke Shibasaki
Remote Sens. 2020, 12(17), 2722; https://doi.org/10.3390/rs12172722 - 23 Aug 2020
Cited by 2 | Viewed by 2568
Abstract
For efficient building outline extraction, many algorithms, both unsupervised and supervised, have been proposed over the past decades. In recent years, due to the rapid development of convolutional neural networks, especially fully convolutional networks, building extraction has been treated as a semantic segmentation task that deals with extremely biased positive pixels. The state-of-the-art methods, whether through direct or indirect approaches, are mainly focused on better network design. The shifts and rotations, which are coarsely present in manually created annotations, have long been ignored. Due to the limited number of positive samples, the misalignment will significantly reduce the correctness of pixel-to-pixel loss and might lead to a gradient explosion. To overcome this, we propose a nearest feature selector (NFS) to dynamically re-align the prediction and slightly misaligned annotations. The NFS can be seamlessly appended to existing loss functions and prevents the model from being misled by errors or misalignment in the annotations. Experiments on a large-scale aerial image dataset with centered buildings and corresponding building outlines indicate that the additional NFS brings higher performance when compared to existing naive loss functions. With the classic L1 loss, the addition of NFS yields gains of 8.8% in F1-score, 8.9% in kappa coefficient, and 9.8% in Jaccard index. Full article
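
The core of the re-alignment idea, matching the annotation to the prediction before computing a pixel-to-pixel loss, might look roughly like the brute-force shift search below. The search radius and arrays are hypothetical; the paper's actual selector is not reproduced here.

```python
import numpy as np

def nearest_aligned_l1(pred: np.ndarray, target: np.ndarray, max_shift: int = 3) -> float:
    """Smallest mean-L1 loss over small integer shifts of the annotation.

    pred, target -- 2-D probability / binary maps of the building footprint.
    max_shift    -- search radius in pixels for the assumed mis-registration.
    """
    best = np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(target, shift=(dy, dx), axis=(0, 1))
            best = min(best, np.abs(pred - shifted).mean())
    return best

gt = np.zeros((64, 64)); gt[20:40, 25:45] = 1.0     # annotated footprint
pred = np.roll(gt, shift=(2, -1), axis=(0, 1))      # prediction offset by (2, -1) pixels
print(nearest_aligned_l1(pred, gt))                 # 0.0: the shift search absorbs the offset
```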

27 pages, 41426 KiB  
Article
Deep Learning with Open Data for Desert Road Mapping
by Christopher Stewart, Michele Lazzarini, Adrian Luna and Sergio Albani
Remote Sens. 2020, 12(14), 2274; https://doi.org/10.3390/rs12142274 - 15 Jul 2020
Cited by 13 | Viewed by 6709
Abstract
The availability of free and open data from Earth observation programmes such as Copernicus, and from collaborative projects such as Open Street Map (OSM), enables low cost artificial intelligence (AI) based monitoring applications. This creates opportunities, particularly in developing countries with scarce economic resources, for large–scale monitoring in remote regions. A significant portion of Earth’s surface comprises desert dune fields, where shifting sand affects infrastructure and hinders movement. A robust, cost–effective and scalable methodology is proposed for road detection and monitoring in regions covered by desert sand. The technique uses Copernicus Sentinel–1 synthetic aperture radar (SAR) satellite data as an input to a deep learning model based on the U–Net architecture for image segmentation. OSM data is used for model training. The method comprises two steps: The first involves processing time series of Sentinel–1 SAR interferometric wide swath (IW) acquisitions in the same geometry to produce multitemporal backscatter and coherence averages. These are divided into patches and matched with masks of OSM roads to form the training data, the quantity of which is increased through data augmentation. The second step includes the U–Net deep learning workflow. The methodology has been applied to three different dune fields in Africa and Asia. A performance evaluation through the calculation of the Jaccard similarity coefficient was carried out for each area, and ranges from 84% to 89% for the best available input. The rank distance, calculated from the completeness and correctness percentages, was also calculated and ranged from 75% to 80%. Over all areas there are more missed detections than false positives. In some cases, this was due to mixed infrastructure in the same resolution cell of the input SAR data. Drift sand and dune migration covering infrastructure is a concern in many desert regions, and broken segments in the resulting road detections are sometimes due to sand burial. The results also show that, in most cases, the Sentinel–1 vertical transmit–vertical receive (VV) backscatter averages alone constitute the best input to the U–Net model. The detection and monitoring of roads in desert areas are key concerns, particularly given a growing population increasingly on the move. Full article
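
For binary road masks, the agreement measures mentioned above can be computed directly. A small numpy sketch with synthetic masks: completeness and correctness correspond to recall and precision on road pixels; the exact rank-distance formula used by the authors is not reproduced here.

```python
import numpy as np

def road_metrics(pred: np.ndarray, truth: np.ndarray):
    """Jaccard index, completeness (recall) and correctness (precision) for binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    jaccard = tp / (tp + fp + fn)
    completeness = tp / (tp + fn)   # share of true road pixels that were detected
    correctness = tp / (tp + fp)    # share of detected pixels that really are road
    return jaccard, completeness, correctness

rng = np.random.default_rng(0)
truth = rng.random((256, 256)) > 0.9
pred = truth & (rng.random((256, 256)) > 0.1)   # detector that misses ~10% of road pixels
print(road_metrics(pred, truth))
```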

18 pages, 2907 KiB  
Article
Deep Open-Set Domain Adaptation for Cross-Scene Classification based on Adversarial Learning and Pareto Ranking
by Reham Adayel, Yakoub Bazi, Haikel Alhichri and Naif Alajlan
Remote Sens. 2020, 12(11), 1716; https://doi.org/10.3390/rs12111716 - 27 May 2020
Cited by 27 | Viewed by 3557
Abstract
Most of the existing domain adaptation (DA) methods proposed in the context of remote sensing imagery assume the presence of the same land-cover classes in the source and target domains. Yet, this assumption is not always realistic in practice, as the target domain may contain additional classes unknown to the source, leading to so-called open-set DA. Under this challenging setting, the problem turns to reducing the distribution discrepancy between the shared classes in both domains while also detecting the unknown class samples in the target domain. To deal with the open-set problem, we propose an approach based on adversarial learning and Pareto-based ranking. In particular, the method leverages the distribution discrepancy between the source and target domains using min-max entropy optimization. During the alignment process, it identifies candidate samples of the unknown class from the target domain through a Pareto-based ranking scheme that uses ambiguity criteria based on entropy and the distance to the source class prototypes. Promising results on two cross-domain datasets consisting of very high resolution and extremely high resolution images show the effectiveness of the proposed method. Full article
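
The Pareto-based ranking of unknown-class candidates can be illustrated with two simple ambiguity criteria: the prediction entropy and the distance to the nearest source prototype. The features, prototypes, and selection rule below are placeholders standing in for the learned quantities, not the authors' settings.

```python
import numpy as np

def prediction_entropy(probs: np.ndarray) -> np.ndarray:
    """Shannon entropy of each row of class probabilities."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def pareto_front(scores: np.ndarray) -> np.ndarray:
    """Indices of samples not dominated by any other (both criteria to be maximized)."""
    keep = []
    for i, s in enumerate(scores):
        dominated = np.any(np.all(scores >= s, axis=1) & np.any(scores > s, axis=1))
        if not dominated:
            keep.append(i)
    return np.array(keep)

rng = np.random.default_rng(0)
probs = rng.dirichlet(alpha=np.ones(6), size=200)   # softmax outputs over 6 shared classes
feats = rng.normal(size=(200, 32))                  # target-domain features
protos = rng.normal(size=(6, 32))                   # source class prototypes
dist = np.linalg.norm(feats[:, None, :] - protos[None], axis=2).min(axis=1)

# Samples on the Pareto front of (high entropy, far from every prototype)
# are the most ambiguous and thus candidates for the "unknown" class.
scores = np.stack([prediction_entropy(probs), dist], axis=1)
print(len(pareto_front(scores)))
```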

25 pages, 7585 KiB  
Article
Small-Object Detection in Remote Sensing Images with End-to-End Edge-Enhanced GAN and Object Detector Network
by Jakaria Rabbi, Nilanjan Ray, Matthias Schubert, Subir Chowdhury and Dennis Chao
Remote Sens. 2020, 12(9), 1432; https://doi.org/10.3390/rs12091432 - 01 May 2020
Cited by 166 | Viewed by 16355
Abstract
The detection performance for small objects in remote sensing images has not been satisfactory compared to that for large objects, especially in low-resolution and noisy images. A generative adversarial network (GAN)-based model called enhanced super-resolution GAN (ESRGAN) has shown remarkable image enhancement performance, but reconstructed images usually miss high-frequency edge information. Therefore, object detection performance degrades for small objects in recovered noisy and low-resolution remote sensing images. Inspired by the success of edge-enhanced GAN (EEGAN) and ESRGAN, we applied a new edge-enhanced super-resolution GAN (EESRGAN) to improve the quality of remote sensing images and used different detector networks in an end-to-end manner, where the detector loss was backpropagated into the EESRGAN to improve detection performance. We propose an architecture with three components: ESRGAN, EEN, and a detection network. We used residual-in-residual dense blocks (RRDB) for both the ESRGAN and the EEN, and for the detector network, we used a faster region-based convolutional network (FRCNN; a two-stage detector) and a single-shot multibox detector (SSD; a one-stage detector). Extensive experiments on a public (car overhead with context) dataset and another self-assembled (oil and gas storage tank) satellite dataset showed the superior performance of our method compared to standalone state-of-the-art object detectors. Full article

27 pages, 15007 KiB  
Article
Semantic Labeling in Remote Sensing Corpora Using Feature Fusion-Based Enhanced Global Convolutional Network with High-Resolution Representations and Depthwise Atrous Convolution
by Teerapong Panboonyuen, Kulsawasd Jitkajornwanich, Siam Lawawirojwong, Panu Srestasathiern and Peerapon Vateekul
Remote Sens. 2020, 12(8), 1233; https://doi.org/10.3390/rs12081233 - 12 Apr 2020
Cited by 9 | Viewed by 4192
Abstract
One of the fundamental tasks in remote sensing is semantic segmentation of aerial and satellite images. It plays a vital role in applications such as agriculture planning, map updates, route optimization, and navigation. The state-of-the-art model is the Enhanced Global Convolutional Network (GCN152-TL-A) from our previous work. It is composed of two main components: (i) the backbone network to extract features and (ii) the segmentation network to annotate labels. However, the accuracy can be further improved, since the deep learning network is not designed for recovering low-level features (e.g., river, low vegetation). In this paper, we aim to improve the semantic segmentation network in three aspects, designed explicitly for the remotely sensed domain. First, we propose to employ a modern backbone network called “High-Resolution Representation (HR)” to extract features with higher quality. It repeatedly fuses the representations generated by the high-to-low subnetworks with the restoration of the low-resolution representations to the same depth and level. Second, “Feature Fusion (FF)” is added to our network to capture low-level features (e.g., lines, dots, or gradient orientation). It fuses the features from the backbone and the segmentation models, which helps to prevent the loss of these low-level features. Finally, “Depthwise Atrous Convolution (DA)” is introduced to refine the extracted features by using four multi-resolution layers in collaboration with a dilated convolution strategy. The experiment was conducted on three data sets: two private corpora from the Landsat-8 satellite and one public benchmark from the “ISPRS Vaihingen” challenge. There are two baseline models: the Deep Encoder-Decoder Network (DCED) and our previous model. The results show that the proposed model significantly outperforms all baselines. It is the winner on all data sets, exceeding an F1 of 0.90 on each: 0.9114 and 0.9362 on the two Landsat-8 data sets and 0.9111 on the ISPRS Vaihingen data set. Furthermore, it achieves an accuracy beyond 90% on almost all classes. Full article
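
The "Depthwise Atrous Convolution" ingredient is straightforward to express in PyTorch: a convolution whose group count equals its channel count (depthwise) and whose kernel is dilated (atrous). The module below is a minimal, stand-alone sketch with invented channel sizes and dilation rates, not the paper's network.

```python
import torch
import torch.nn as nn

class DepthwiseAtrousConv(nn.Module):
    """3x3 depthwise convolution with a dilation (atrous) rate, plus a 1x1 pointwise mix."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=dilation, dilation=dilation,
                                   groups=channels, bias=False)
        self.pointwise = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Four parallel dilation rates, in the spirit of multi-resolution atrous designs.
x = torch.randn(2, 64, 128, 128)
branches = [DepthwiseAtrousConv(64, d)(x) for d in (1, 2, 4, 8)]
print(torch.cat(branches, dim=1).shape)   # torch.Size([2, 256, 128, 128])
```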

21 pages, 34064 KiB  
Article
Vehicle and Vessel Detection on Satellite Imagery: A Comparative Study on Single-Shot Detectors
by Tanguy Ophoff, Steven Puttemans, Vasileios Kalogirou, Jean-Philippe Robin and Toon Goedemé
Remote Sens. 2020, 12(7), 1217; https://doi.org/10.3390/rs12071217 - 09 Apr 2020
Cited by 20 | Viewed by 8490
Abstract
In this paper, we investigate the feasibility of automatic small object detection, such as vehicles and vessels, in satellite imagery with a spatial resolution between 0.3 and 0.5 m. The main challenges of this task are the small objects, as well as the spread in object sizes, with objects ranging from 5 to a few hundred pixels in length. We first annotated 1500 km2, making sure to have equal amounts of land and water data. On top of this dataset we trained and evaluated four different single-shot object detection networks: YOLOV2, YOLOV3, D-YOLO and YOLT, adjusting the many hyperparameters to achieve maximal accuracy. We performed various experiments to better understand the performance and differences between the models. The best performing model, D-YOLO, reached an average precision of 60% for vehicles and 66% for vessels and can process an image of around 1 Gpx in 14 s. We conclude that these models, if properly tuned, can thus indeed be used to help speed up the workflows of satellite data analysts and to create even bigger datasets, making it possible to train even better models in the future. Full article

21 pages, 4865 KiB  
Article
An End-to-End and Localized Post-Processing Method for Correcting High-Resolution Remote Sensing Classification Result Images
by Xin Pan, Jian Zhao and Jun Xu
Remote Sens. 2020, 12(5), 852; https://doi.org/10.3390/rs12050852 - 06 Mar 2020
Cited by 18 | Viewed by 4001
Abstract
Since the result images obtained by deep semantic segmentation neural networks are usually not perfect, especially at object borders, the conditional random field (CRF) method is frequently utilized in the result post-processing stage to obtain the corrected classification result image. The CRF method has achieved many successes in the field of computer vision, but when it is applied to remote sensing images, overcorrection phenomena may occur. This paper proposes an end-to-end and localized post-processing method (ELP) to correct the result images of high-resolution remote sensing image classification methods. ELP has two advantages. (1) End-to-end evaluation: ELP can identify which locations of the result image are highly suspected of having errors without requiring samples. This characteristic allows ELP to be adapted to an end-to-end classification process. (2) Localization: Based on the suspect areas, ELP limits the CRF analysis and update area to a small range and controls the iteration termination condition. This characteristic avoids the overcorrections caused by the global processing of the CRF. In the experiments, ELP is used to correct the classification results obtained by various deep semantic segmentation neural networks. Compared with traditional methods, the proposed method more effectively corrects the classification result and improves classification accuracy. Full article

20 pages, 5823 KiB  
Article
Water Identification from High-Resolution Remote Sensing Images Based on Multidimensional Densely Connected Convolutional Neural Networks
by Guojie Wang, Mengjuan Wu, Xikun Wei and Huihui Song
Remote Sens. 2020, 12(5), 795; https://doi.org/10.3390/rs12050795 - 02 Mar 2020
Cited by 68 | Viewed by 6198
Abstract
The accurate acquisition of water information from remote sensing images has become important in water resources monitoring and protection, and in flood disaster assessment. However, there are significant limitations in the indices traditionally used for water body identification. In this study, we have proposed a deep convolutional neural network (CNN), based on the multidimensional densely connected convolutional neural network (DenseNet), for identifying water in the Poyang Lake area. The results from DenseNet were compared with those of classical convolutional neural networks (CNNs): ResNet, VGG, SegNet and DeepLab v3+, and also with the Normalized Difference Water Index (NDWI). Results indicate that CNNs are superior to the water index method. Among the five CNNs, the proposed DenseNet requires the shortest training time for model convergence, apart from DeepLab v3+. The identification accuracies are evaluated through several error metrics. The DenseNet performs much better than the other CNNs and the NDWI method in terms of the precision of the identification results; among these, the NDWI performance is by far the poorest. The DenseNet is also much better at distinguishing water from clouds and mountain shadows than the other CNNs. Full article
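
For reference, the water index baseline mentioned above (NDWI) is a simple band ratio, (Green − NIR) / (Green + NIR). A numpy sketch on hypothetical reflectance arrays; the threshold is a common but scene-dependent choice, not one taken from the paper.

```python
import numpy as np

def ndwi(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Normalized Difference Water Index (McFeeters): (G - NIR) / (G + NIR)."""
    return (green - nir) / (green + nir + 1e-12)

rng = np.random.default_rng(0)
green = rng.uniform(0.02, 0.4, size=(256, 256))   # surface reflectance, green band
nir = rng.uniform(0.01, 0.5, size=(256, 256))     # surface reflectance, NIR band

water_mask = ndwi(green, nir) > 0.0   # pixels where green reflectance exceeds NIR
print(water_mask.mean())
```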

19 pages, 6797 KiB  
Article
TextRS: Deep Bidirectional Triplet Network for Matching Text to Remote Sensing Images
by Taghreed Abdullah, Yakoub Bazi, Mohamad M. Al Rahhal, Mohamed L. Mekhalfi, Lalitha Rangarajan and Mansour Zuair
Remote Sens. 2020, 12(3), 405; https://doi.org/10.3390/rs12030405 - 27 Jan 2020
Cited by 49 | Viewed by 5282
Abstract
Exploring the relevance between images and their respective natural language descriptions, due to its paramount importance, is regarded as the next frontier in the general computer vision literature. Thus, several works have recently attempted to map visual attributes onto their corresponding textual tenor with some success. However, this line of research has not been widespread in the remote sensing community. On this point, our contribution is three-pronged. First, we construct a new dataset for text-image matching tasks, termed TextRS, by collecting images from four well-known scene datasets, namely the AID, Merced, PatternNet, and NWPU datasets. Each image is annotated with five different sentences, written by five different people to ensure diversity. Second, we put forth a novel Deep Bidirectional Triplet Network (DBTN) for text-to-image matching. Unlike traditional remote sensing image-to-image retrieval, our paradigm seeks to carry out the retrieval by matching text to image representations. To achieve this, we propose to learn a bidirectional triplet network composed of a Long Short Term Memory network (LSTM) and pre-trained Convolutional Neural Networks (CNNs) based on EfficientNet-B2, ResNet-50, Inception-v3, and VGG16. Third, we top the proposed architecture with an average fusion strategy to fuse the features pertaining to the five image sentences, which enables the learning of more robust embeddings. The performance of the method, expressed in terms of Recall@K (the presence of the relevant image among the top K retrieved images for the query text), is promising: it yields 17.20%, 51.39%, and 73.02% for K = 1, 5, and 10, respectively. Full article
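
Recall@K for text-to-image retrieval, as reported above, is computed from a similarity matrix between query sentences and images. In the sketch below, random embeddings stand in for the learned text and image representations; only the metric itself is shown.

```python
import numpy as np

def recall_at_k(similarity: np.ndarray, k: int) -> float:
    """similarity[i, j] = score of image j for query sentence i;
    the ground-truth match of query i is assumed to be image i."""
    ranks = np.argsort(-similarity, axis=1)   # best-scoring images first
    hits = (ranks[:, :k] == np.arange(len(similarity))[:, None]).any(axis=1)
    return hits.mean()

rng = np.random.default_rng(0)
text_emb = rng.normal(size=(100, 64))
img_emb = text_emb + 0.5 * rng.normal(size=(100, 64))   # noisy paired embeddings
sim = text_emb @ img_emb.T

for k in (1, 5, 10):
    print(k, round(recall_at_k(sim, k), 3))
```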

17 pages, 13446 KiB  
Article
Remote Sensing and Texture Image Classification Network Based on Deep Learning Integrated with Binary Coding and Sinkhorn Distance
by Chu He, Qingyi Zhang, Tao Qu, Dingwen Wang and Mingsheng Liao
Remote Sens. 2019, 11(23), 2870; https://doi.org/10.3390/rs11232870 - 03 Dec 2019
Cited by 6 | Viewed by 3595
Abstract
In the past two decades, traditional hand-crafted feature based methods and deep feature based methods have successively played the most important role in image classification. In some cases, hand-crafted features still provide better performance than deep features. This paper proposes an innovative network based on deep learning integrated with binary coding and Sinkhorn distance (DBSNet) for remote sensing and texture image classification. The statistical texture features of the image extracted by uniform local binary pattern (ULBP) are introduced as a supplement for deep features extracted by ResNet-50 to enhance the discriminability of features. After the feature fusion, both diversity and redundancy of the features have increased, thus we propose the Sinkhorn loss where an entropy regularization term plays a key role in removing redundant information and training the model quickly and efficiently. Image classification experiments are performed on two texture datasets and five remote sensing datasets. The results show that the statistical texture features of the image extracted by ULBP complement the deep features, and the new Sinkhorn loss performs better than the commonly used softmax loss. The performance of the proposed algorithm DBSNet ranks in the top three on the remote sensing datasets compared with other state-of-the-art algorithms. Full article
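
The entropy-regularized optimal transport underlying the Sinkhorn distance can be computed with a few alternating matrix scalings (Sinkhorn–Knopp iterations). The sketch below is a generic numpy version on toy histograms, with an arbitrary regularization weight; it is not the authors' loss implementation.

```python
import numpy as np

def sinkhorn_distance(a, b, cost, reg=0.1, n_iter=200):
    """Entropy-regularized OT cost between histograms a and b under a cost matrix."""
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):          # alternate row/column scalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = np.diag(u) @ K @ np.diag(v)
    return (plan * cost).sum()

# Two small discrete distributions on a 1-D grid.
x = np.linspace(0, 1, 16)
cost = (x[:, None] - x[None, :]) ** 2
a = np.exp(-((x - 0.3) ** 2) / 0.01); a /= a.sum()
b = np.exp(-((x - 0.7) ** 2) / 0.01); b /= b.sum()
print(sinkhorn_distance(a, b, cost))   # roughly the squared distance between the two modes
```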

19 pages, 18762 KiB  
Article
Lifting Scheme-Based Deep Neural Network for Remote Sensing Scene Classification
by Chu He, Zishan Shi, Tao Qu, Dingwen Wang and Mingsheng Liao
Remote Sens. 2019, 11(22), 2648; https://doi.org/10.3390/rs11222648 - 13 Nov 2019
Cited by 10 | Viewed by 3228
Abstract
Recently, convolutional neural networks (CNNs) have achieved impressive results on remote sensing scene classification, which is a fundamental problem for scene semantic understanding. However, convolution, the most essential operation in CNNs, restricts the development of CNN-based methods for scene classification. Convolution is not efficient enough for high-resolution remote sensing images and is limited in extracting discriminative features due to its linearity. Thus, there has been growing interest in improving the convolutional layer. The hardware implementation of the JPEG2000 standard relies on the lifting scheme to perform the wavelet transform (WT). Compared with the convolution-based two-channel filter bank method of WT, the lifting scheme is faster, takes up less storage and is capable of nonlinear transformation. Therefore, the lifting scheme can be regarded as a better alternative implementation for convolution in vanilla CNNs. This paper introduces the lifting scheme into deep learning and addresses the problems that only fixed and finite wavelet bases can be replaced by the lifting scheme and that the parameters cannot be updated through backpropagation. This paper proves that any convolutional layer in vanilla CNNs can be substituted by an equivalent lifting scheme. A lifting scheme-based deep neural network (LSNet) is presented to promote network applications on computation-limited platforms and to utilize the nonlinearity of the lifting scheme to enhance performance. LSNet is validated on the CIFAR-100 dataset, and the overall accuracies increase by 2.48% and 1.38% in the 1D and 2D experiments, respectively. Experimental results on the AID, one of the newest remote sensing scene datasets, demonstrate that 1D LSNet and 2D LSNet achieve 2.05% and 0.45% accuracy improvements, respectively, compared with the vanilla CNNs. Full article
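
The lifting scheme itself is easy to state: split a signal into even and odd samples, predict the odd samples from the even ones, and update the even samples with the prediction residual. The sketch below shows one lifting step of the classical Haar wavelet in numpy for illustration only; LSNet's learned lifting layers are not reproduced.

```python
import numpy as np

def haar_lifting_forward(signal: np.ndarray):
    """One Haar lifting step: split -> predict -> update."""
    even, odd = signal[0::2], signal[1::2]
    detail = odd - even            # predict: each odd sample from its even neighbour
    approx = even + detail / 2     # update: approximation holds the pairwise mean
    return approx, detail

def haar_lifting_inverse(approx: np.ndarray, detail: np.ndarray) -> np.ndarray:
    even = approx - detail / 2
    odd = detail + even
    out = np.empty(even.size + odd.size)
    out[0::2], out[1::2] = even, odd
    return out

x = np.array([2.0, 4.0, 6.0, 8.0, 5.0, 3.0, 1.0, 7.0])
a, d = haar_lifting_forward(x)
print(np.allclose(haar_lifting_inverse(a, d), x))   # perfect reconstruction: True
```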

18 pages, 3979 KiB  
Article
Road Extraction of High-Resolution Remote Sensing Images Derived from DenseUNet
by Jiang Xin, Xinchang Zhang, Zhiqiang Zhang and Wu Fang
Remote Sens. 2019, 11(21), 2499; https://doi.org/10.3390/rs11212499 - 25 Oct 2019
Cited by 81 | Viewed by 7396
Abstract
Road network extraction is a significant task for disaster emergency response, intelligent transportation systems, and real-time road network updating. Road extraction based on high-resolution remote sensing images has become a hot topic. Presently, most studies are based on traditional machine learning algorithms, which are complex and computationally demanding because impervious surfaces such as roads and buildings are hard to distinguish in the images. Given the above problems, we propose a new method to extract the road network from remote sensing images using a DenseUNet model with few parameters and robust characteristics. DenseUNet consists of dense connection units and skip connections, which strengthen the fusion of different scales through connections at various network layers. The performance of the proposed method is validated on two datasets of high-resolution images by comparison with three classical semantic segmentation methods. The experimental results show that the method can be used for road extraction in complex scenes. Full article

22 pages, 3106 KiB  
Article
Deep Multi-Scale Recurrent Network for Synthetic Aperture Radar Images Despeckling
by Yuanyuan Zhou, Jun Shi, Xiaqing Yang, Chen Wang, Durga Kumar, Shunjun Wei and Xiaoling Zhang
Remote Sens. 2019, 11(21), 2462; https://doi.org/10.3390/rs11212462 - 23 Oct 2019
Cited by 20 | Viewed by 3006
Abstract
Due to the existence of speckle, many standard optical image processing methods, such as classification, segmentation, and registration, are of limited use on synthetic aperture radar (SAR) images. In this work, an end-to-end deep multi-scale recurrent network (MSR-net) for SAR image despeckling is proposed. Multi-scale recurrence and weight sharing strategies are introduced to increase network capacity without multiplying the number of weight parameters. A convolutional long short-term memory (convLSTM) unit is embedded to capture useful information and help with despeckling across scales. Meanwhile, a sub-pixel unit is utilized to improve the network efficiency. In addition, two criteria, the edge feature keep ratio (EFKR) and the feature point keep ratio (FPKR), are proposed to evaluate despeckling capacity for SAR, assessing more effectively how well a despeckling algorithm retains edge and feature information. Experimental results show that the proposed network can remove speckle noise while preserving the edge and texture information of images with low computational costs, especially in low signal-to-noise ratio scenarios. The peak signal-to-noise ratio (PSNR) of MSR-net outperforms the traditional despeckling method SAR-BM3D (Block-Matching and 3D filtering) by more than 2 dB on the simulated image. Furthermore, the adaptability of optical image processing methods to real SAR images can be enhanced after despeckling. Full article
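
The PSNR figure quoted above is a standard fidelity measure; the numpy sketch below computes it for a despeckled image against a clean reference. The multiplicative-noise model and the stand-in "filter" are synthetic illustrations, not the paper's data or method.

```python
import numpy as np

def psnr(reference: np.ndarray, estimate: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((reference - estimate) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
clean = rng.uniform(size=(128, 128))
speckled = clean * rng.gamma(shape=4, scale=0.25, size=clean.shape)  # multiplicative speckle
despeckled = np.clip(0.5 * speckled + 0.5 * clean, 0, 1)             # stand-in filter output

print(round(psnr(clean, speckled), 2), round(psnr(clean, despeckled), 2))
```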
