A Review of Deep Learning-Based Methods for Road Extraction from High-Resolution Remote Sensing Images

Liu, Ruyi; Wu, Junhong; Lu, Wenyi; Miao, Qiguang; Zhang, Huan; Liu, Xiangzeng; Lu, Zixiang; Li, Long

doi:10.3390/rs16122056

Open AccessReview

A Review of Deep Learning-Based Methods for Road Extraction from High-Resolution Remote Sensing Images

by

Ruyi Liu

^1,2,3

,

Junhong Wu

^1,2,3,

Wenyi Lu

^1,2,3,

Qiguang Miao

^1,2,3,*

,

Huan Zhang

⁴

,

Xiangzeng Liu

^1,2,3

,

Zixiang Lu

^1,2,3

and

Long Li

⁵

¹

School of Computer Science and Technology, Xidian University, 2 Taibainan Road, Xi’an 710071, China

²

Xi’an Key Laboratory of Big Data and Intelligent Vision, Xi’an 710071, China

³

Key Laboratory of Collaborative Intelligence Systems, Ministry of Education, Xi’an 710071, China

⁴

Xi’an Research Institute of Navigation Technology, Xi’an 710071, China

⁵

Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China

^*

Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(12), 2056; https://doi.org/10.3390/rs16122056

Submission received: 20 April 2024 / Revised: 30 May 2024 / Accepted: 2 June 2024 / Published: 7 June 2024

(This article belongs to the Topic Computational Intelligence in Remote Sensing: 2nd Edition)

Download

Browse Figures

Versions Notes

Abstract

:

Road extraction from high-resolution remote sensing images has long been a focal and challenging research topic in the field of computer vision. Accurate extraction of road networks holds extensive practical value in various fields, such as urban planning, traffic monitoring, disaster response and environmental monitoring. With rapid development in the field of computational intelligence, particularly breakthroughs in deep learning technology, road extraction technology has made significant progress and innovation. This paper provides a systematic review of deep learning-based methods for road extraction from remote sensing images, focusing on analyzing the application of computational intelligence technologies in improving the precision and efficiency of road extraction. According to the type of annotated data, deep learning-based methods are categorized into fully supervised learning, semi-supervised learning, and unsupervised learning approaches, each further divided into more specific subcategories. They are comparatively analyzed based on their principles, advantages, and limitations. Additionally, this review summarizes the metrics used to evaluate the performance of road extraction models and the high-resolution remote sensing image datasets applied for road extraction. Finally, we discuss the main challenges and prospects for leveraging computational intelligence techniques to enhance the precision, automation, and intelligence of road network extraction.

Keywords:

road extraction; high-resolution remote sensing images; deep learning; fully-supervised; semi-supervised; unsupervised

Graphical Abstract

1. Introduction

Road networks are a fundamental component of urban and rural infrastructure, playing a crucial role in promoting economic development and improving the quality of life for residents. Accurate extraction of road information holds significant practical application value in various fields, including urban planning [1,2,3], traffic monitoring [4,5,6], disaster emergency response [7,8,9], and environmental monitoring [10,11,12]. With the continuous advancement of remote sensing technology, we now have access to a greater amount of clear image data [13]. The acquisition cycle for high-resolution remote sensing images is becoming shorter, which offers a rich dataset for the automatic extraction of roads [14].

High-resolution images capture finer details, and additional color bands increase the data volume. This requires more computing power and more efficient algorithm design. In addition, high-resolution images can introduce more noise and error.

Specifically, the application of high-resolution remote sensing imagery in road extraction has garnered widespread attention in recent years. However, the complexity of the ground information introduces noise from trees, buildings, vehicles, and spectral variances [15]. To address these challenges, researchers have designed various computational intelligence-based methods for road extraction.

This review utilizes the Google Scholar database, employing “road extraction” and “remote sensing” as keywords to filter relevant literature from 2012. Recent review articles summarize road extraction techniques in remote sensing imagery based on different classification criteria. For instance, based on road features and selected road models, Wang et al. [16] categorized road extraction methods into clustering, knowledge-based, morphological, active contour models, and dynamic programming. Lian et al. [17] divided them into heuristic and data-driven methods based on design principles and data types. Chen et al. [18] classified methods that use 2D earth observing images and 3D LiDAR point clouds. In 2D optical images, road targets are divided into road areas and road lines. In 3D point clouds, road extraction methods are categorized into MLS-based, ALS-based, and TLS-based. Two-dimensional optical images are characterized by their low cost and mature technical research. With the exceptional performance of deep learning in various fields, the interest in its methodological research has significantly surpassed that of traditional techniques [19]. Therefore, this review focuses on the application of deep learning in extracting road information from 2D high-resolution optical images.

According to the different deep learning models used, Abdollahi et al. [20] classified the approaches that utilize GAN, deconvolution, FCN, and patch-based CNN models. However, with the emergence of methods based on different network models, this classification approach has become insufficiently detailed. Pruthi et al. [21] divided the task of road extraction into four categories based on road features and target extraction: edge extraction, centerline extraction, surface extraction, and their combinations. Nonetheless, our analysis shows most networks concentrate on road surface extraction. Liu et al. [22] and Mo et al. [23] classified methods into fully supervised learning, semi-supervised learning, and unsupervised learning based on the type of data annotation and learning approach.

Integrating the advantages of the above reviews and analyzing their deficiencies, this review categorizes deep learning methods for road extraction from high-resolution remote sensing images based on the type of data annotation into fully supervised learning, semi-supervised learning, and unsupervised learning. Fully supervised methods are divided into six types based on network models: Patch-CNN, Encoder–Decoder, GAN, Graph, Transformer, and Mamba. Based on the different annotated data, semi-supervised methods are divided into less labeled data-based and weak labeled data-based methods. Unsupervised methods are divided into those based on the fewer parameters models and those based on large remote sensing models. Figure 1 illustrates the specific categorization of deep learning methods for road extraction in remote sensing images, and Figure 2 displays the organization of chapters in this review.

2. Background

In this section, we discuss the development of deep learning-based methods in the field of computer vision, particularly focusing on their application in extracting road networks from high-resolution remote sensing images. Figure 3 displays a roadmap of the methods used in the relevant literature, and Figure 4 illustrates the sources of literature used in this review.

Road extraction methods typically classify each pixel of the image as “road” or “non road” [116]. Typically, road areas are obscured by surrounding objects such as cars, buildings, and trees. Although these obstructions make road area identification more complex, they also provide valuable background information that aids in identifying road areas in complex scenarios.

The development of road extraction has three main stages: morphological feature-based, manual feature-based, and deep learning approaches [117]. Initial traditional methods are often costly in terms of time and resources, and their reliance on manual analysis tends to restrict their accuracy [22]. As deep learning technology advances, methods that leverage it are progressively being refined, leading to ongoing improvements in both the precision and efficiency of road extraction. Therefore, this section focuses on the development of deep learning methods in the field of road extraction.

In 2011, convolutional neural networks (CNNs) showed initial success in optical character recognition (OCR) tasks, and subsequently, in the 2012 IMAGENET competition, Hinton et al. [24] achieved remarkable results using a CNN. With the improvement of several open-source platforms and the open-sourcing of models, CNN-based methods begin to show better results in various image-related tasks. Given the focus of our review on deep learning, we collect all relevant literature from 2012 to the present. The earliest neural network-based road extraction method in the past decade was proposed by Yuan et al. [118], who designed a network called LEGION, which emphasizes local information while suppressing global information. However, there was a gap in research on road extraction methods based on deep learning between 2011 and 2017, with few related works emerging during this period. Starting from 2017, a substantial number of deep learning-based algorithms have been applied in the field of road extraction. Recently, the notion of large remote sensing models has emerged. These models leverage deep learning algorithms and extensive remote sensing data collections to markedly improve the capabilities in tasks like road extraction.

3. Fully Supervised Methods for Road Extraction

In this section, fully supervised methods are categorized into five categories based on the different backbones used, including those based on Patch-CNN, Encoder–Decoder, GAN, Graph, and Transformer.

3.1. Methods Based on Patch-CNNs

The process of road extraction from remote sensing images using a patch-based CNN model primarily involves several steps. Firstly, the image and segment are preprocessed into patches. Then, they are inputted into CNNs to extract features and identify the patches containing road information. Finally, the road patches are aggregated and the complete road network is outputted. Figure 5 illustrates the general architecture of a patch-based CNN model.

SAR images exhibit distinctive geometric and scattering properties, offering distinctive landmark information compared to optical images [119]. Popescu et al. [25] proposed a combined radiometric/structure-driven method based on spectral descriptors for SAR images. Specifically, they utilized a feature extraction approach using 200 × 200 pixel image patches to recognize targets. Li et al. [26] introduced a CNN-based framework. Initially, a CNN model was employed to extract road features from small patches of SAR images and identify candidate road areas. Subsequently, an enhanced radon transform was applied to group the candidate roads, followed by the utilization of a Markov random field (MRF) for global road network connectivity.

Alshehhi et al. [120] presented a patch-based CNN model that extracts road and building areas from remote sensing images. The model uses fully connected layers and simple linear iterative clustering (SLIC) to enhance features and refine results. Unlike the method that integrates features from both low and high layers, as presented in [120], Chen et al. [121] proposed a coarse-to-fine road extraction strategy that integrates gray-value distribution and structural features. It employs a local Dirichlet mixture model for initial segmentation and a high-order deep-learning approach to capture road context.

Saito et al. [27] devised a novel output function called channel-wise inhibited softmax (CIS) to effectively train the network. Sun et al. [28] designed experiments to analyze the influence of different patch sizes and input image resolutions on segmentation accuracy and proposed a multi-scale collective fusion (MSCF) method to extract information from multiple resolutions.

In contrast to the aforementioned methods for surface extraction, the extraction of road centerlines is also a typical task in road extraction. Li et al. [122] employed a CNN model based on

32 \times 32

patches to extract road centerlines from high-resolution remote sensing images. They combined common image processing operators to obtain the road centerlines and design line integral convolution (LIC) to optimize the extracted road network. Differing from [122], Liu et al. [123] proposed a four-stage approach for road centerline extraction, where road centerlines are extracted using Gabor filtering models and multi-directional non-maximum suppression methods.

3.2. Methods Based on Encoder–Decoder

The Encoder–Decoder network architecture is a type of deep learning model designed to efficiently extract object information from input images. Recently, it has been the most commonly used semantic segmentation model in road extraction tasks from remote sensing images. The encoder is typically a pre-trained classification network used to extract features from input remote sensing images, transforming them into high-dimensional feature representations. The decoder part, combined with the features extracted by the encoder, restores the feature map size using upsampling techniques and then reconstructs the road network label map as output. Figure 6 illustrates the general architecture of an Encoder–Decoder model. As most fully supervised methods in recent years have been implemented based on this structure, for clarity, this section further categorizes them into five groups based on the variations in their decoders.

3.2.1. Methods Based on FCNs

Long et al. [29] introduced the fully convolutional network (FCN) in 2015, which replaces the fully connected layers of existing classification networks with convolutional layers. Unlike patch-based CNN models, a FCN makes end-to-end predictions on images. It accepts images of any size. The decoder of a FCN uses bilinear interpolation filters to restore the feature map to the same size as the input image. It retains spatial information and represents the membership relationship among pixels.

Subsequently, several methods are improved based on FCNs for the task of road extraction. First, these methods standardize remote sensing images to fit network inputs. Then, a FCN is utilized for layer-wise feature extraction and output generation. Finally, post-processing is applied to improve the accuracy. According to the different methods used by the network, the classifications are elaborated as follows.

(1) Feature Fusion. Zhong et al. [124] designed a model that integrates low-level semantic information with high-level semantic information. It also adds the output of pooling layers to the final score layer to enhance the overall accuracy of the model. Fu et al. [125] proposed an improved FCN model, which is divided into segmentation and classification stages. It primarily fuses multi-scale features of roads by designing skip connections.

(2) Different Loss Functions. Wei et al. [30] introduced a RSRCNN, which constructs a unique road structure loss function. It is the first to use structure-based loss for CNN in aerial image road extraction based on the minimum Euclidean distance. Henry et al. [31] devised FCN-8s with class-weighted mean squared error (MSE) loss and control parameters for model spatial tolerance to improve network performance. To address the issue of sample imbalance between the road and background in aerial images, Zhang et al. [32] proposed an ensemble method based on a FCN with spatial consistency (SC). The main idea of this method is to increase the weight of misclassified pixels. Li et al. [33] developed a noise probability model named RDNN to tackle the problem of noise in training data. The purpose was to leverage the relationship between input images, noisy labels, and true labels to learn the noisy data. RDNN effectively trains the noisy dataset using a loss function based on regularization methods.

(3) Data Augmentation. Chen et al. [34] suggested an improved CNN named MCNNTL. This network employs data augmentation, transfer learning, data preprocessing, and backpropagation algorithms to enhance road extraction accuracy. To address the challenge of low accuracy in extracting unpaved and narrow-width roads, Babaali et al. [35] designed DAA-SSEG for extracting unpaved and narrow roads. It utilizes a novel data augmentation technique based on geometric transformation and image refinement.

(4) Innovative Architecture. Varia et al. [126] employed the FCN-32 variant and GAN for road extraction from UAV remote sensing datasets. Kestur et al. [36] introduced UFCN, a U-shaped FCN architecture characterized by symmetric convolution and deconvolution operations with skip connections to retain local information. This model is similar to the UNet discussed in Section 3.2.2. Chen et al. [127] introduced CR-HR-RoadNet, which fuses local and global information for comprehensive road network analysis. It has a specialized encoder for detail retention and uses multi-scale, residual learning for spatial detail extraction. A compact coordinate attention module enhances global context awareness and infers relationships between segments.

(5) Training Speed Enhancement. Zhang et al. [128] proposed a MFFCN for road extraction in mountainous remote sensing images. MFFCN is improved on the basis of FCN and removes six convolution layers to improve training speed. Similarly, to boost the efficiency of the model, Pan et al. [129] proposed an automatic road centerline extraction method based on a FCN, using atrous convolution instead of pooling layers to enhance efficiency.

(6) Multi-Output Network. In the field of road extraction, there are three typical tasks: road surface segmentation, road centerline extraction, and road edge detection [130]. Some studies can simultaneously obtain two or more outputs through a multi-output network. Wei et al. [131] introduced a framework for simultaneous road surface and centerline extraction. It employs a FCN for initial segmentation and refines details through the iterative application of a lightweight FCN. The method utilizes a multi-seed point-tracking mechanism for road tracking and integrates segmentation and tracking to generate the final road network. Liu et al. [37] designed RoadNet, a multitask CNN that simultaneously predicts road surfaces, edges, and centerlines. It employs a specially designed cascaded network for learning multi-scale features by end-to-end training.

3.2.2. Methods Based on UNet

The UNet, proposed by Ronneberger et al. [38] in 2015, consists of a downsampling path for capturing context and an upsampling path for precise localization, both structured symmetrically resembling a “U”. The UNet model employs convolution operations in the upsampling path to reconstruct the details and structures of images. Since the majority of recent approaches are based on this architecture, they are further divided into several categories according to the specific techniques they utilize.

(1) Tailored Loss Function. Mosinska et al. [132] designed an iterative refinement method for road topology extraction. It uses a novel loss function to identify high-order topological features of roads. He et al. [133] introduced a structural SIMilarity (SSIM) loss function to refine extraction clarity. Ding et al. [39] proposed DiResNet with a loss function utilizing angular operators for directional mapping based on road direction. Constantin et al. [134] merged the UNet and atrous convolution architectures, using binary cross-entropy (BCE) and Jaccard distance in their loss functions. Buslaev et al. [135] combined the intersection over union (IoU) with BCE. Xin et al. [136] suggested a DenseUNet model with a weighted loss function to emphasize foreground pixels and improve precision. Qi et al. [40] developed DSCNet, a U-shaped network structure-based model, incorporating a continuity constraint loss function derived from persistent homology for enhanced topological continuity extraction.

(2) Multi-scale Contextual Information Fusion. Li et al. [41] proposed a HCN that consists of three subnets that extract features of roads at different granularities. Then, a shallow convolutional subnet is used to integrate them. Zhu et al. [42] introduced GCB-Net, incorporating a global context-aware (GCA) block into the network to capture global contextual information of roads. To better utilize spatial information, Tan et al. [137] utilized scale-sensitive and fusion modules to merge multi-scale information and learn the weight tensors of features. Hu et al. [138] offered DCANet comprising a discriminative context-aware feature module, which not only captures contextual information but also aggregates local information at multiple scales. RCFSNet, designed by Yang et al. [43], consists of MSCE and FSFF modules to enhance the feature representation of roads. Gao et al. [44] proposed an improved deep residual CNN named RDRCNN. It consists of a residual connected unit (RCU) and an expanded perception unit (DPU).

Wu et al. [45] devised a DGRN aimed at improving the utilization of spatial information. The model incorporates a dense global spatial pyramid pooling (DGSPP) module based on ASPP to capture contextual information. Doshi et al. [139] summarized three approaches they employed in the 2018 DeepGlobe Road Extraction Challenge. The first model maintained a constant number of 128 feature maps throughout the entire network. This enabled the model to tolerate a reduction in representational power within the encoder, as the presence of skip connections allowed the decoder to access low-level features. Zhang et al. [140] combined the advantages of residual learning and UNet to propose a new network for road extraction. The rich skip-connection structures within the model facilitated information propagation and enhanced performance while reducing parameters. Hong et al. [46] advocated a road centerline extraction method named Road-RCF, which is based on richer convolutional features (RCF). The RCF model processes the entire image to obtain high-level semantic information. It then leverages complementary information from different convolutional layers for precise extraction of road networks. Wang et al. [141] designed a feature extraction algorithm called dual feature fusion (DFF), which is based on context fusion and self-learning sampling. This method can suppress redundant features. Furthermore, they proposed a dense feature convolutional network (DFC-UNet).

(3) Diverse Attention Mechanisms. Xu et al. [47] proposed the GL-Dense-UNet for extracting roads of different widths. The model includes feature attention blocks to extract local and global information. Dong et al. [48] put forward BMDANet, which combines cross-layer information exchange with the block multi-dimensional attention (BMDA) module. Akhtarmanesh et al. [142] utilized both hard attention and soft attention to assist in designing an improved UNet. Xiao et al. [49] recommended RATT-UNet to extract mine roads, which incorporates a RATT module that integrates residual connections and attention to reduce parameters. Dai et al. [50] advocated RADANet, which includes a road augmentation module (RAM) and a deformable attention module (DAM) to obtain multi-scale semantic information. Mei et al. [51] designed CoANet, which includes a connectivity attention module (CoA) to predict the connectivity of the eight pixels adjacent to a given pixel. Utilizing the spectral representation of images, Yang et al. [52] put forward AFUNet with modulation learning (MoL) for modulating spectral features across different granularities. Patil et al. [53] introduced Tiny-AAResUNet, a method that combines the advantages of self-attention mechanisms and the residual UNet architecture to achieve higher accuracy and long-range dependency relationships.

(4) Specialized Network Architecture. Wang et al. [54] introduced the dual decoder UNet (DDUNet), incorporating a novel dilated convolution attention module (DCAM) that facilitates the fusion of multi-scale features between the encoder and decoder. Similar to DDUNet [54], Wang et al. [143] integrated the squeeze-and-excitation mechanism into a small decoder to extract the information for roads. Then, it is passed to another standard decoder, which refines the contextual understanding of the road network. Luo et al. [55] introduced AD-RoadNet, an auxiliary decoding network for road extraction. It mainly comprises the hybrid receptive field module (HRFM) and the topological feature representation module (TFRM) to better utilize road details.

Xu et al. [144] offer a road extraction method leveraging the advantages of UNet on top of the deep residual network. It introduces a multitask network to handle remote sensing images at different scales. Fan et al. [145] presented a deep residual-based U-shaped network model to address the problem of existing methods ignoring high-dimensional features in remote sensing images.

(5) Lightweight Model. Sun et al. [56] addressed the challenge of excessive parameters in the existing models by introducing LRSR-net. This model utilizes an expanded joint convolution module to mitigate the loss associated with pooling layers and to reduce the number of parameters. Sultonov et al. [57] designed two lightweight networks for road network extraction from UAV images. They integrated UNet, depth-wise separable convolutions, ConvMixer layers, and initialization modules. Han et al. [58] introduced a lightweight target-aware network named LOANet. The encoder of LOANet utilizes a lightweight, dense connection network.

(6) Road Topology Focus. Hao et al. [59] proposed a geometric-aware deep recursive neural network called Geo-DRNN for high-spectral classification. This network is built on the foundation of UNet and recursive neural networks (RNN). Additionally, the model introduces a Net-Gated GRU and geometric-aware ResNet loss to better encode complex geometric shapes. Ge et al. [60] introduced deep FR TransNet, which was designed to improve the learning capabilities of road contours. The encoder incorporates a novel deep feature review (FR) module, which learns the contour features of roads to minimize road fragmentation resulting from weight parameter loss. Qiu et al. [61] presented a dual-branch semantic-geometric framework named SGNet. The semantic-dominant branch collects dense semantic information about roads from the input, while the geometric-dominant branch generates sparse boundary features of the image. Finally, the information generated by the two branches is adaptively fused. Shao et al. [146] designed MCTN-Net, which is capable of recognizing railways, roads, sidewalks, and bridges. The network employs a dense feature-sharing encoder (DFSE) to extract directional and semantic features. These features are integrated into the orientation-guided stacking module (OGSM) to enhance connectivity detection.

(7) Multi-source Fusion. Luo et al. [62] combined LIDAR images with high-resolution images to build a dual-encoder cross-modal complementary network named DECCFNet. The encoder includes a cross-modal feature fusion (CMFF) module designed to blend features from different sources. Furthermore, a multi-direction strip convolution (MDSC) module was created to help the network concentrate more sharply on road features. Wang et al. [147] designed the DelvMap framework, which leverages delivery courier paths and satellite data to generate complete road maps. The framework operates in two steps. It first uses the dual signal fusion network (DSFNet) to create an inferred map by merging both types of data and then applying a map completion algorithm to integrate this inferred map with the existing road map, effectively filling in any missing details.

(8) Multi-Output Network. Cheng et al. [63] highlighted the significance of road detection and centerline extraction. They proposed CasNet, a cascaded CNN that addresses both tasks concurrently. It is composed of a main sub-network designed for efficient road detection, complemented by a secondary sub-network that utilizes the feature maps generated by the primary sub-network to delineate road centerlines. The model employs a refinement algorithm to enhance the centerline output. CasEANet, an improvement of CasNet designed by Liu et al. [148], introduces an edge perception module (ESM) and an attention module (AM) to refine road edges and enhance global contextual information. Lin et al. [149] presented a dual-task CNN adapted to road shape and scale variations. This network includes a residual encoder and is equipped with a multi-scale, multi-direction strip convolutional module (MSMD-SCM) within the decoder to improve the accuracy of road extraction. Additionally, Liu et al. [150] developed LRDNet, a lightweight road detection method. It incorporates a multi-scale convolutional attention network (MSCAN) and a coupled decoder head. This design aims to achieve efficient detection and smooth edge output, addressing efficiency and connectivity issues in occluded scenes. Guo et al. [151] proposed CRIN, which extracts roads and buildings concurrently through their complementary relationship. The model features an MTI module for task-specific information exchange and a CSI module for learning varying receptive fields across different structures.

3.2.3. Methods Based on FPNs

In 2017, Lin et al. [64] proposed the feature pyramid network (FPN), which is a framework with lateral connections that operates in a top-down manner. Its introduction was primarily aimed at improving feature fusion.

For example, Gao et al. [65] proposed a network called the multi-feature pyramid network (MFPN) for road extraction. The MFPN utilizes feature pyramids and an improved pyramid pooling module to extract multi-level semantic features of roads. In the optimization phase, a weighted, balanced loss function is implemented to tackle the issue of significant variance in pixel distribution between roads and the background within images. Yu et al. [152] designed a new model called CS-CapsFPN, which integrates context enhancement techniques with self-attention capsule feature pyramid networks to enhance the representational capacity of features. The model primarily enhances the representation of road features by extracting and fusing higher-order capsule features from various levels and scales.

3.2.4. Methods Based on SegNet

In 2017, Badrinarayanan et al. [66] devised SegNet, an innovative and practical network structure based on FCN. The architecture comprises an encoder, a decoder, and a pixel-wise classification layer. In contrast to FCN, the decoder of SegNet implements non-linear upsampling by leveraging pooling indices calculated during the max-pooling steps of its encoder, which minimizes the additional overhead associated with learning upsampling modules.

Panboonyuen et al. [67] utilized SegNet as the backbone and design DCED. The model employs an exponential linear unit (ELU) instead of a rectified linear unit (ReLU), typically used to enhance network accuracy. Furthermore, landscape metric thresholds are applied to eliminate excessively detected roads. The same group of authors proposed an enhanced version of SegNet in [68], drawing parallels with DCED [67] by employing ELU activation functions and landscape metric thresholds. Distinct from DCED [67], their approach introduces a conditional random field (CRF) to hone the road network extraction by considering the low-level information gleaned from the local interactions between pixels and edges.

To confront challenges like indistinct object boundaries, erroneous classifications, and irregularities, Zhao et al. [69] proposed a model called DANet, utilizing two spatial pyramid pooling (ASPP) structures for multi-scale feature fusion. Akhtar et al. [70] replaced the basic convolution blocks with dense residual blocks to achieve context information fusion and employ geometric shape analysis to filter out non-road segments after segmentation.

3.2.5. Methods Based on LinkNet

While current approaches predominantly concentrate on enhancing model accuracy, they frequently neglect the aspect of model efficiency. Therefore, Chaurasia et al. [71] introduced LinkNet in 2017, a model specifically designed for semantic segmentation. Drawing insights from UNet, LinkNet achieves feature learning without substantially increasing parameters, ensuring both speed and precision. Specifically, ResNet18 replaces commonly used encoders like ResNet101 and VGG16. Unlike UNet, LinkNet directly transfers the extracted features of the encoder to the decoder, bypassing pooling or stride convolutions. This refined approach accelerates the process while maintaining feature richness and the accuracy of the outcomes.

In 2018, Zhou et al. [72] introduced D-LinkNet, a variant of LinkNet, by integrating cascaded stacked dilated convolutions into its central layers. This modification enables the network to achieve a larger receptive field while preserving the high resolution of the feature maps.

(1) Design of Different Modules. Li et al. [73] enhanced D-LinkNet and devised D-LinkNetPlus by incorporating a bottleneck layer and ESIPs to reduce parameters and remove isolated blocks. Xie et al. [74] introduced HsgNet, which employs bilinear pooling in the intermediate module to capture global context. Deng et al. [75] designed SPD-LinkNet with strip pooling, considering large receptive fields and distant contextual information. Wang et al. [76] proposed FE-LinkNet to handle occlusions with a modified DP-Block for the multi-scale context. Wulamu et al. [153] presented a UNet-based network equipped with ASPP and a LinkNet-like decoder. Lu et al. [77] developed a global-aware deep network (GAN) featuring a spatial-aware module (SAM) and a channel-aware module (CAM) for road detection. Jie et al. [78] constructed MECA-Net, an enhanced LinkNet that integrates multi-scale encoding and long-range context for remote sensing road images.

To address the issue of roads in high-resolution remote sensing images being easily confused with surrounding terrain and susceptible to interference from non-road features, Wu et al. [79] introduced the NL-DLinkNet. This model incorporates non-local blocks into the DLinkNet encoder to capture long-distance dependencies among features in the satellite imagery. Wang et al. [80] also proposed a DLinkNet variant named NL-LinkNet with non-local blocks for road extraction from high-resolution satellite images.

(2) Multi-source Data Fusion. Sun et al. [81] merged crowdsourced GPS data and aerial images to improve road extraction. Their model employs novel techniques such as data augmentation, GPS rendering, and 1D transpose convolution to enhance network performance. Liu et al. [82] proposed a cross-modal message propagation network (CMMPNet) by leveraging aerial images and crowdsourced trajectory data. Zhang et al. [83] introduced FND-Linknet, which merges DLinkNet with filter response normalization (FRN) layers. It also applies transfer learning from multi-source road datasets to enhance the precision of road extraction. To address the issue of the time-consuming and labor-intensive process of obtaining a large dataset with precise annotations, Zhang et al. [154] proposed a method that utilizes the GPS trajectories of floating cars as the training set.

(3) Attention Mechanisms. Wu et al. [84] recommended a dual attention network (DA-LinkNet) that combines the advantages of D-LinkNet and dual attention mechanisms. To better integrate features from different branches and reduce information loss, an attention feature fusion module is used to replace skip connections. Li et al. [155] introduced a DLinkNet-based cascaded network designed to enhance the precision of road boundary detection. The network leverages spatial attention residual blocks across various scales to maintain long-range dependencies, while channel attention mechanisms are employed to refine the integration of features. Ai et al. [85] applied variance and the coefficient of variation to the squeeze-and-excitation (SE) mechanism, designing a multi-parameter-guided SE module named MPGSE, which was then integrated into the D-LinkNet architecture. Weng et al. [156] introduced an improved D-Linknet that integrates an edge detection module for the purpose of detecting railway tracks. This module incorporates a channel-spatial dual attention mechanism to expand the receptive field, thereby reducing missed detections.

(4) Design of Specialized Network Architecture. Motivated by strategies employed in lane detection, Hu et al. [86] designed the location-guided network (LGNet), aimed at resolving the problem of disjointed extraction results common in segmentation techniques. They devised an auxiliary road location prediction (RLP) branch, which predicts road positions through row and column anchors. Yang et al. [87] designed RUW-Net, a dual-encoder structure network based on D-LinkNet. They introduced a decoder–encoder combination (DEC) module to connect the two networks and minimize the semantic gap.

(5) Multi-Output Network. Lu et al. [130] proposed CasMT, which can simultaneously perform road surface segmentation, centerline extraction, and edge detection. It leverages topology-aware learning and hard example mining (HEM) loss to enhance accuracy.

3.2.6. Methods Based on DeepLab

The Google team introduced Deeplab v1 [88], Deeplab v2 [89], Deeplab v3 [90], and Deeplab v3+ [91] during the period from 2014 to 2018. Here is a brief overview of the DeepLab series semantic segmentation algorithms.

Deeplab v1 combines DCNN and DenseCRF, using VGG16 as the base model, and employs dilated convolutions and fully connected conditional random fields to enhance the accuracy of semantic segmentation. DeepLab v2, built upon v1, replaces VGG16 with ResNet101 as the backbone and introduces the ASPP module, providing a powerful plug-and-play module for future semantic segmentation models. DeepLab v3 validates the effectiveness of the parallel ASPP modules and directly upsamples the decoder part by 16 times to obtain the output.

The latest DeepLab v3+ adopts an encoder–decoder architecture, utilizing Xception as the backbone network with fine-tuning. The decoder does not progressively restore image size or directly upsample by 16 times like v3. Instead, the encoder features are firstly upsampled by four times. They are then connected with the corresponding low-level features that have the same spatial resolution. Finally, the features are refined using convolution and upsampled to the same size as the input image using bilinear interpolation.

Building upon the Deeplab v3 framework, Lin et al. [92] introduced nested SE-Deeplab, which incorporates the SE module to refine road network extraction. In addition, the model leverages multi-scale upsampling to integrate data from various levels. Huan et al. [157] proposed SANet with strip attention. Building upon the architecture of DeepLab, they incorporated a strip attention module (SAM) to extract contextual semantic information and spatial positional information of roads. They also added a channel attention fusion module (CAF) to fuse low-level and high-level features.

Lourenço et al. [158] presented an improved method for automatically detecting rural roads. They utilized the road network output from DeepLab v3+ and refined it using morphological methods to obtain the centerlines.

Xu et al. [159] introduced P2CNet, which integrates partial maps with satellite images. The network incorporates a gated self-attention module (GSAM) to capture long-range dependencies and introduces a missing part (MP) loss function.

3.3. Methods Based on GAN

Recently, methods based on generative adversarial networks (GAN) have made significant progress in road extraction from remote sensing images. This strategy involves training a generator to produce realistic road images while simultaneously training a discriminator to distinguish between real road images and generated ones. The adversarial training process helps to enhance the accuracy and robustness of road extraction. Figure 7 illustrates the general architecture of the GAN model.

GANs are utilized in various approaches to optimize the structure of network models, thereby ultimately improving the general performance of the model. A deep convolutional generative adversarial network (DCGAN) was suggested by [93]. Both the generator and discriminator of the DCGAN are components of deep CNN architectures used to improve the performance of the entire network, such as U-Net, SegNet, FCN, and so on. Shi et al. [160] proposed an end-to-end GAN framework for road detection. In the generator, SegNet was employed to produce pixel-level classification results. Zhang et al. [161] offered a refined network that does not need large training datasets with a simpler architecture. The network uses a FCN as the generator and a CNN as the discriminator.

In addition to optimizing the network architecture, efforts have been directed towards refining the loss function to strengthen relevant constraints. Pre-processing and post-processing techniques are also utilized to further enhance the effectiveness of road extraction. Gulrajani et al. [162] leveraged the Wasserstein distance in standard GANs and introduced a gradient penalty to ensure a more stable training process. Yang et al. [94] put forward E-WGAN-GP for road extraction. This network uses the UNet and BiSeNet as generators, respectively. Furthermore, a spatial penalty term was added to the loss function to solve the class imbalance problem. Abdollahi et al. [95] devised a modified UNet architecture, denoted as MUNet, as the generator for generating road network segmentation maps. This network incorporates a simple pre-processing step involving edge-preserving filtering techniques. Cira et al. [163] presented a lightweight conditional GAN framework based on Pix2pix, which was designed to improve the extraction of road surface areas. The method incorporates a post-processing mechanism to enhance the precision of road extraction outcomes.

To extract more comprehensive features, some multi-stream networks have been proposed. Tao et al. [164] proposed a GAN-assisted two-stream neural network to enhance the effectiveness of feature extraction. The primary stream leverages high-resolution panchromatic images to retain low-level details, while the auxiliary stream uses an unsupervised approach to extract high-level features from multispectral images. Costea et al. [165] designed DH-GAN, a model that operates in two stages involving GANs. In the first stage, a pair of GANs are trained. The first generates road segmentations and the second recognizes intersections concurrently. In the second stage, a graph optimization process based on smoothness is applied to produce the final road map. Liu et al. [166] proposed a novel model called TPEGAN, which combines a segmentation model based on road pixel enhancement with graph inference. During the process of generating pixel-enhanced images, GAN leverages the consistency among road pixels to improve the segmentation accuracy. Furthermore, the multi-scale dual-branch segmentation module employs graph inference to capture the long-range dependencies of roads.

In addition to using GANs to enhance the overall network structure, some studies focus on integrating multi-scale features to improve the accuracy of road network segmentation. When applying GANs to road segmentation, a significant challenge arises when dealing with input data of uniform resolution. In such cases, the network may overlook the interrelationships between pixels, which can lead to incomplete segmentation of road objects and discrepancies in the size and shape of the segmented objects compared to the ground truth. To address this issue, Li et al. [167] put forward a network that integrates GAN with multi-scale context aggregation. By inputting three scale images (0.5n, 1n, and 2n) into the generator, corresponding scale road extraction results are obtained with identical parameters. Lin et al. [168] presented a network designed for road extraction that leverages the combination of multi-scale information. The model integrates the ASPP module and a feature fusion module within the encoder of the generator, allowing for the effective consolidation of multi-scale features and the utilization of background information. Moreover, the generator utilizes an asymmetric encoder–decoder structure to minimize feature redundancy. Zhang et al. [96] designed MsGAN, which improves topological connectivity and spectral structure through multi-scale feature fusion. The network is designed with two discriminators, each containing four sub-discriminators that take the same image at four different scales as input, enabling the network to extract roads of varying widths. Shamsolmoali et al. [169] incorporated a feature pyramid (FP) into GAN. A FP is used to extract features which contains four divisions: feature map fusion (FMF), an optimized u-shape network (OUN), feature transportation division (FTD), and scale-wise feature concatenation (SFC). They cooperate with each other to obtain the final multistage multi-scale output features.

3.4. Methods Based on Graph

Currently, the majority of road extraction methods are built upon CNNs. Although these approaches can deliver high-quality road networks, CNN-based techniques often exhibit suboptimal performance in extracting the topological connectivity of road networks due to the inherent constraints of convolution operations. To improve the quality of road network topology, numerous methods resort to sophisticated post-processing techniques for optimization. However, the efficacy of these post-processing steps is frequently limited by the quality of the initial road segmentation results.

Consequently, preserving the topological connectivity of roads remains a significant challenge. In light of this, methods based on graph structures are gaining increased attention. In this context, the term “graph” does not refer to graph neural networks (GNN) but rather emphasizes the topological relationships among roads. Figure 8 shows the general architecture of the Graph model.

3.4.1. Methods Based on Graph Representation

A road network can be represented by an undirected graph denoted as G(V,E), where V and E represent the set of road nodes and edges between nodes, respectively. Therefore, the focus of these methods is on finding the key points that make up G and the connectivity between them. The connectivity between points is usually represented by an adjacency matrix. In graph representation-based methods, the nodes and edges that delineate the roads are typically derived from CNN.

Xu et al. [170] proposed a new method for computing vector maps from remote sensing images, which is based on well-defined patched line segment (PaLiS) representations of road graphs with geometric significance. These fragments contain both the location and direction of the road. Xu et al. [171] designed csBoundary, a method that initially generates a keypoint map and subsequently utilizes AfANet to delineate the road edges by predicting the adjacency matrix of the vertices. Zao et al. [172] proposed an end-to-end road extraction approach known as Road2Graph. This method encodes road maps into a seven-dimensional representation that encompasses segmentation maps, vertex maps, midpoint maps, and their respective endpoint displacements. It refines the output by integrating multi-scale features. Finally, a decoding module is employed to recover the topological representation.

To further improve road connectivity and topology, many works combine CNN and G(V,E) to form multi-task branches to ensure the contextual semantic information of features and the connectivity of roads. For example, Li et al. [173] devised a multi-task architecture within an encoder–decoder framework to simultaneously predict the segmentation, anchor points, and connectivity maps. The latter two branches can improve road segmentation performance by enhancing road connectivity and topology. Then, the road network is constructed and simplified based on three predicted maps. Mattyus et al. [97] proposed DeepRoadMapper, a network that involves a two-step road extraction process. It initially employs a CNN to segment aerial images, followed by the generation of a graph that portrays the road topology, where nodes represent road endpoints and edges represent the curves joining them.

Beyond integration with segmentation tasks, certain methods are also designed to create a multi-branch network that encompasses additional operations such as direction extraction and node extraction. Wu et al. [174] introduced Bi-HRNet, which contains three parts: the “top-to-down” and “down-to-top” road direction prediction branch and node heatmap prediction. Chen et al. [175] suggested a multi-task network which combines three branches: a boundary auxiliary branch, a road extraction backbone, and a node inferring branch. All of them are trained together, and the latter two branches are trained with equal-weighted loss. This network incorporates the road boundary details and road junction information. Zhang et al. [176] offered a method for extracting road nodes and inferring the connectivity between them, known as NodeConnect. This method predicts road nodes by learning a confidence map and simultaneously proposes a multi-task framework to learn the connectivity map for the nodes. Zao et al. [98] proposed TopoRoad, a method that learns road topological maps to extract road networks. It comprises three main components: road vertex prediction, direction graph prediction, and segmentation graph prediction. After a unified decoding process, these three components are able to obtain the vertices and edges of the final road map. This method effectively addresses the issues of excessive parameters and low computational efficiency.

There are also some methods based on Graph neural networks (GNN) and Graph convolutional networks (GCN) for road extraction. Liu et al. [177] introduced RDPGNet, a network that integrates a CNN for feature extraction with a GCN for information interaction, centered around a GCN-based dual-view perceptor (GDVP). A GDVP includes an RFSG for reweighting regional features during graph inference and an RSHS to detect long-range road dependencies. They also implement an MVFA strategy to effectively consolidate road information. Zhou et al. [178] designed a split depth-wise (DW) separable GCN named SGCN to obtain spatial and channel features. The network then uses a GCN to capture global contextual information and constructs the adjacency matrix of the feature map with the Sobel gradient operator.

3.4.2. Methods Based on Iterative Detection

Iterative detection-based methods construct road extraction as an iterative graph generation. They start by defining an initial vertex, then iteratively predict the next vertex and ultimately obtain the entire road network. There are two issues that need to be addressed with this method: how to obtain the initial vertices and how to locate the subsequent point or the direction of advancement.

RoadTracer [179] is the first method used to employ iterative detection for road extraction. It begins at an initial vertex, predicting one of the fixed angles at each step and moving by a fixed step size. However, relying solely on the information from the current location to identify the next step may lead to a deviation between the extracted road and the actual road. To enhance road connectivity, Tan et al. [99] proposed a point-based iterative graph exploration scheme that integrates segmentation cues and a flexible step approach. This method employs a point-based detector capable of learning an appropriate step size through point-based supervisory encoding. Lian et al. [100] proposed DeepWindow, which utilizes a CNN to identify central points within patches and progressively determines subsequent center points.

To tackle the challenges of inaccuracies and inefficiencies encountered during the iterative procedure, Xu et al. [101] applied imitation learning to road detection, training agents to mimic expert policies using initial vertex candidates from segmentation and heatmaps. They introduced a training algorithm combining exploration methods for robust generation. Later, Xu et al. [180] developed a novel model called RNGDet, leveraging a CNN and transformer for feature extraction and vertex prediction. Enhanced with instance segmentation, RNGDet++ [181] refines the training process of network and reasoning by utilizing multi-scale features. Cheng et al. [182] introduced JTFN, which extracts curve-structured objects via an iterative feedback strategy. JTFN employs the object boundary to provide global topological regularization for the predicted mask. It also integrates a feature interchange model (FIM) to facilitate better feature exchange in segmentation and boundary detection. Additionally, a Gaussian attention unit (GAU) is included for feature enhancement.

3.4.3. Methods Based on Polygon Boundary

Iterative detection methods maintain road network topology but face challenges due to time-consuming vertex-by-vertex boundary generation. The narrow and elongated nature of roads necessitates global information for feature extraction, which CNNs struggle to capture. To overcome these issues, research has shifted towards treating road boundary extraction as a polygon identification problem, focusing on direct shape prediction from images.

Some models have been proposed to directly predict polygons from the input images using CNNs, such as PolygonRNN [102] and its improved variant PolygonRNN++ [183]. In the encoder, a CNN is employed to extract features that predict the initial vertex, which are then passed to a recurrent decoder. The RNN predicts additional vertices in the decoder, thereby constructing polygons incrementally. PolygonRNN++ builds upon PolygonRNN with several enhancements. It incorporates a novel CNN encoder, employs reinforcement learning for training, and utilizes a GNN to enhance the resolution of the output.

Some studies enhance global perception and continuity by adding modules or loss constraints. Hu et al. [103] introduced PolyRoad, which uses a transformer for parallel road boundary detection and proposed a polyline matching cost and additional losses for improved topology.

Numerous polyline detection methods focus on particular targets and may not perform well across a diverse range of categories. Yang et al. [184] designed TopDiG, a model that adapts to diverse boundary extractions, including road boundaries. It involves a topological-concentrated node detector for initial extraction, dynamic graph supervision for label generation, and a directional graph generator for constructing topological graphs, offering a general approach to boundary detection.

3.5. Methods Based on Transformer

Road networks in remote sensing images are extensive but relatively small in scale, which often leaves traditional methods lacking in global context and localization accuracy. The Transformer architecture excels in acquiring global information and leveraging contextual cues from the input imagery. It employs a self-attention mechanism to capture relationships across different positions in the input sequence, enabling parallel computation. Although numerous methods based on Transformers have been proposed in recent years, few rely solely on Transformer for road extraction. Most methods integrate Transformer within a neural network architecture, allowing it to interact with other components to collaboratively accomplish the task of road extraction. Figure 9 illustrates the general architecture of a neural network-fused Transformer model.

In the field of road extraction from remote sensing images, numerous studies have deeply investigated methods that combine CNNs with Transformers to extract more abundant features. This integration strategy leverages the advantages of CNN in capturing spatial information and utilizes the capabilities of Transformer in processing sequence data and addressing long-distance dependencies. Previous research has already demonstrated the effectiveness of the Encoder-Decoder network architecture. Within these methods, the majority of CNN frameworks adopt a “U”-shaped structure. Wang et al. [185] integrated a CNN-Transformer into the UNet architecture to enhance feature extraction. They connected the Transformer in succession to the CNN and introduced a dual up-sampling module to improve performance. RoadCT, designed by Liu et al. [104], fuses CNN and Transformer features in a two-step decoder for road extraction. Li et al. [186] proposed a MACN with a mixed attention and convolutional Transformer (MACT) layer for efficient feature capture. Meng et al. [105] introduced an axial Transformer module (ATM) and a multilayer attention fusion module (MLAF) on UNet for feature learning and a channel attention module (CAM) for enhanced feature representation. Jamali et al. [187] combined residual learning with UNet and ViT in ResUNetFormer, employing a neighborhood attention Transformer for local feature enhancement.

There are some works that combine Transformer into modules to extract features with multi-scale, multi-stage, and rich contextual information. Luo et al. [188] introduced BDTNet, which uses a Transformer-enhanced BDTM module to capture multi-scale contextual information of roads, followed by a feature refinement module (FRM). Hu et al. [189] proposed MDTNet, incorporating a multi-scale deformable Transformer (MDTB) module for comprehensive feature capture, blending Transformer and deformable convolution. Wang et al. [190] integrated a Transformer-based ESTM into the neck of their model for global context modeling. In addition, they introduced the GDEM for the automatic extraction of contextual information within the model. Alongside these, they proposed REF loss to improve the accuracy of road extraction under conditions of sample imbalance.

A mix-Transformer enhances the capability of road extraction in remote sensing images through its hybrid attention and local–global fusion features. Deng et al. [191] designed UMiT-Net. It consists of four mix-Transformer blocks for global feature extraction and a dilated attention module (DAM) for semantic feature fusion. The decoder employs multiscale self-adaptive modules (MSAM) to boost segmentation precision, concatenating multi-scale features and refining outputs through attention mechanisms, resulting in more connected and accurate road segmentation.

The Swin-Transformer, known for its efficient multi-head and shifted window self-attention, streamlines road extraction computations. Ge et al. [192] integrated it into a U-shaped architecture to boost global learning. TransRoadNet [193] employs the Swin-Transformer in a CIEM framework for feature map downsampling. Zhang et al. [194] presented a Transformer-based approach with modules dedicated to detailed road feature extraction and fusion of global/local contexts. Yang et al. [195] presented SSEANet, a framework that jointly trains the CNN and Swin-Transformer with the aid of consistency loss to improve their cross-supervised capabilities.

Yuan et al. [106] proposed RRSIS, a model that generates segmentation masks from natural language descriptions using a Transformer-based LAVT model with an LGCE module for better detection of small targets.

3.6. Methods Based on Mamba

In recent years, the breakthrough progress of artificial intelligence technology in large language models and basic visual models has attracted scholars’ attention to large-scale, remote sensing model technology. In the research field of road extraction from remote sensing images, methods based on VMamba [196] have been widely applied. These methods not only improve the efficiency of road network extraction through the deep learning ability of large models but also highlight the enormous potential of large models in remote sensing applications.

The VMamba-based approach maintains the superior features of ViT while utilizing linear time complexity for processing. This method effectively captures global information in two-dimensional images through variants of the cross-scanning module. For example, Chen et al. [197] proposed RSMamba for remote sensing scene classification, which includes roads. RSMamba integrates the advantages of global receptive fields and linear complexity modeling and designs a dynamic multi-path activation mechanism to enhance the modeling capability for two-dimensional image data. Zhao et al. [198] proposed RSM, a remote sensing Mamba that captures global contextual information with only linear complexity. RSM is designed for dense prediction tasks in high-resolution remote sensing imagery, including road detection. RSM mitigates the loss of contextual information caused by input image segmentation and employs an omnidirectional selective scan module for global modeling from multiple directions. Ma et al. [107] developed RS3Mamba, a dual-branch network that enhances CNNs and Transformers with an auxiliary VSS block for global information and an inter-branch collaboration completion module for feature enhancement and fusion. Zhu et al. [108] proposed Smamba for semantic segmentation of high-resolution remote sensing images. They used Samba blocks as encoders and an FPN-based UperNet as the decoder. In Samba blocks, Mamba substitute the multi-head self-attention of ViT and are combined with multiple MLPs for efficient image feature extraction. The UperNet decoder effectively captures multi-level semantic information.

3.7. Comparison of Six Models Based on Fully-Supervised Learning Methods

Specifically, each model within the fully supervised learning approach presents its own set of strengths and weaknesses. We provide a concise yet comprehensive overview in Table 1 that includes the quantity of relevant literature and clearly delineates the advantages and limitations of these six distinct models.

4. Semi-Supervised Methods for Road Extraction

Supervised-learning methods remain the predominant approach for road extraction from remote sensing imagery, continuously achieving breakthroughs in performance. However, these methods necessitate extensive datasets with clear labels, which can be both time-consuming and resource-intensive to compile. Consequently, semi-supervised learning methods have emerged as a viable alternative. Within this paradigm, weakly supervised learning techniques represent a significant branch. We categorize semi-supervised learning approaches into two types based on the nature of the training data: those utilizing partially labeled data and those using imprecisely labeled data.

4.1. Methods Based on Less Labeled Data

Methods based on less labeled data can leverage a small amount of labeled data alongside a large volume of unlabeled data to train the network. The primary idea is to mine deep and useful information from the vast pool of unlabeled data, thereby reducing annotation costs and lowering the demands on labeling expenses [199].

Xia et al. [200] focused on creating representative datasets and a semi-supervised technique to leverage deep learning for road extraction from satellite images. He et al. [201] presented ClassHyPer, a semi-supervised method using hybrid perturbation to improve model performance with limited data, incorporating boundary information and implicit pseudo-supervision without extra threshold settings.

Han et al. [202] introduced a semi-supervised learning (SSL) method for road detection using GAN alongside a weakly supervised learning (WSL) approach based on conditional GAN. IndSSL, thedgenerator produces road detection results for both labeled and unlabeled images, withdthe discriminator determining labeling. WSL predicts road shapes to guide both the generator and discriminator. Chen et al. [203] presented SemiRoadExNet, a GAN-based method that overcomes the limitations of previous SSL methods in utilizing pseudo label information. It features one generator and two discriminators on UNet, extracting features and producing road segmentation and entropy maps. The discriminators enforce feature consistency between predictions, with the generator refined through adversarial training by leveraging unlabeled data.

Yang et al. [195] designed a semi-supervised edge-aware network combining CNNs and Transformers for road segmentation named SSEANet, focusing on road edges to overcome limitations of traditional self-training methods. Cheng et al. [109] developed an algorithm that integrates semi-supervised segmentation with multi-scale filtering and multi-directional non-maximum suppression for road centerline extraction. Xiao et al. [204] proposed a semi-supervised FCN algorithm that optimizes labeled and unlabeled sample losses to prevent overfitting. Further advancing the field, You et al. [205] presented FMWDCT, which integrates road information into a dual network, combining semi-supervised training and data perturbation to address overfitting and class imbalance.

4.2. Methods Based on Weak Labeled Data

Methods based on weak labeled data do not require detailed annotations for the data, even scribble labels can suffice, significantly reducing the cost and time required for data labeling [206]. Currently, datasets based on partial road maps can provide incomplete road network labels, enabling models to learn and infer the complete road network despite missing information [159]. These methods make full use of available resources and leverage algorithmic intelligence to compensate for the lack of annotation information, thereby achieving broader road network extraction under resource constraints [206].

Wang et al. [207] proposed CRAUP, an object segmentation method based on imprecise annotation in remote sensing images, using consistency regularization (CR) and average update of pseudo labels (AUP) to refine the semantic segmentation network with pseudo and accurate labels. They enhanced CRAUP [207] with the RanPaste algorithm and mean teacher approach [208] for higher accuracy. Bonafilia et al. [110] merged weakly supervised learning and semi-supervised learning to detect buildings and roads from OpenStreetMap(OSM), using D-LinkNet with weakly-supervised methods for robust road extraction on noisy datasets. In summary, this work is considered the first research considering pre-training globally with OSM without fine-tuning. Chen et al. [209] introduced SW-GAN, which employs a weakly supervised network within a GAN framework, enhancing performance with a mix of weak and clear labels. Wu et al. [116] proposed MD-ResUNet, a weakly supervised method for road extraction, relying on OSM centerlines and outperforming fully supervised counterparts. Meng et al. [210] developed a segmentation model that leverages OSM road data and satellite imagery to mitigate the need for precise pixel-level annotations and enhance generalization. Leveraging data annotated with road center points, Lian et al. [211] designed a method based on point annotations for road extraction. They employed a CNN for the detection of road seeds and trained it solely with point annotations.

Hu et al. [111] introduced a weakly supervised GAN-based method for road extraction, using a ResNet generator and WGAN-GP optimization with threshold post-processing. Hua et al. [212] proposed a semantic segmentation framework based on sparse scribble annotations. The framework utilizes the feature and spatial relational regularization method, designing an unsupervised learning signal that combines spatial and feature term neighborhood structures to complement the supervised task. Wei et al. [213] put forward a dual-branch ScRoadExtractor based on a weakly supervised road extraction model. This model can learn features from scribble annotations, which are relatively easy to obtain, eliminating the need for large datasets with pixel-level annotations. Zhou et al. [214] designed SOC-RoadNet, a dual-branch network for weakly supervised learning based on structural and directional consistency. The segmentation branch of this network is capable of learning road surface features using only scribble labels.

In addition to these standard models, some research has also been attempted on large models. The segment anything model (SAM) [215] proposed by Meta AI is a powerful tool. It can improve segmentation efficiency without the need for completely labeled data. Some studies make adjustments based on SAM to make it suitable for road extraction in remote sensing images. For example, Osco et al. [216] tested SAM across multi-scale datasets with various input prompts and implemented an automated technique that combines text prompts derived from general examples with a single training to improve accuracy. Hetang et al. [112] improved the SAM by designing SAM-ROAD to extract road networks from remote sensing imagery. They modified the encoder of SAM and utilized the non-maximum suppression method to extract the vertices of the road map and use a lightweight Transformer-based GNN to predict the topology of the graph. Ma et al. [217] introduced a semantic segmentation model for remote sensing images that leverages SAM to integrate target and boundary constraints. The model generates SAM-generated objects (SGO) and SAM-generated boundaries (SGB) to improve accuracy through object consistency and boundary preservation losses. By incorporating a SAM-based phase into traditional models, the approach directly generates SGO and SGB, enhancing segmentation performance.

5. Unsupervised Methods for Road Extraction

Road extraction methods based on unsupervised learning do not rely on labeled datasets [116]. Instead, they explore the inherent structures and features within input images to identify road regions. Initially unsupervised road extraction methods are based on traditional image processing techniques such as edge detection, mathematical morphology, and template matching, which heavily depend on geometric and radiometric features. However, these methods often exhibit poor adaptability to complex scenes [218]. With the advent of deep learning, many approaches utilize autoencoders to learn intricate road features or leverage GAN to generate more precise road network emerge as viable options. Notably, self-supervised learning methods, a subtype of unsupervised learning method, train models by formulating prediction tasks (e.g., predicting missing image regions), thereby indirectly learning pertinent features for road extraction [219].

5.1. Methods Based on Models with Fewer Parameters

Zhang et al. [113] introduced the category-anchored guided UDA (CAG-UDA) model for semantic segmentation to mitigate bias in unsupervised domain adaptation (UDA) classifiers. It employs category-anchored feature alignment and utilizes pixel-level and discrimination losses to improve target domain identification, enhancing inter-class variance while reducing intra-class variance. To address the domain shift (DS) challenge, Zhang et al. [220] designed RoadDA, a two-stage unsupervised domain adaptation network. Initially, the generator, equipped with a feature pyramid fusion module (FPFM), predicts segmentation for unlabeled target data, with the discriminator identifying domain labels. In the subsequent stage, the model generates pseudo-labels to refine segmentation and minimize domain discrepancies.

Deng et al. [206] proposed an adversarial learning framework for semantic segmentation in remote sensing images that reduces the need for extensive labeled data. Their GAN-like framework includes a segmentation network and a discriminator to handle distribution shifts between datasets. Initially, the segmentation network is trained in a supervised manner on labeled source data, followed by unsupervised fine-tuning on the target dataset using adversarial loss from the discriminator. Similarly, Cira et al. [221] designed a cGAN to enhance road feature representation in semantic segmentation through unsupervised generative learning, validating the approach through qualitative perception.

Han et al. [222] proposed a self-supervised technique, termed segmentation and reconstruction, designed to overcome the constraints of standalone segmentation models regarding the preservation of road connectivity and the attainment of boundary smoothness. Their architecture includes a segmentation model for initial road extraction from remote sensing images and a reconstruction model based on an all-visible denoising autoencoder (AV-DAE) for refining the results. The AV-DAE, trained without additional constraints, effectively improves road topology as a post-processing step.

5.2. Methods Based on Large Remote Sensing Models

Cha et al. [223] proposed a billion-scale foundational remote sensing image model. Their research investigated how the size of model parameters affects the performance of tasks like semantic segmentation. They pretrained foundational models with varying numbers of parameters, including 86 M, 605.26 M, 1.3 B, and 2.4 B, to determine whether the performance of downstream tasks improves with an increase in parameters. Additionally, they introduced an enhanced Transformer approach that enhances parallelism.

Yan et al. [224] designed RingMo-SAM, a foundational model for multimodal remote sensing image segmentation that can handle object segmentation and target classification in both optical and SAR data. They constructed a large-scale training set using multiple open-source datasets. The model features a classification decoupling mask decoder (CDMDecoder) for accurate classification and segmentation. Furthermore, it introduces a prompt encoder that optimizes the precision of multi-object segmentation and enhances the segmentation performance of SAR images.

Sun et al. [225] developed RingMo, a foundational model for remote sensing images that leverages generative self-supervised learning. They constructed a large-scale dataset with 2 million images and used the PIMask strategy and RingMo MIM method, which effectively handle dense small targets in complex scenes. The encoder, once trained, is suitable for various optical remote sensing tasks and uses ViT and Swin Transformer architectures to optimize reconstruction accuracy through L1 regression loss.

6. Metrics

Road extraction from remote sensing images is a binary classification problem, where road pixels are positive samples and background pixels are negative samples. The performance of a model is assessed based on a series of important metrics. These metrics are typically derived from the four fundamental elements within the confusion matrix of the classification results: TP (the number of pixels correctly predicted as roads), TN (the number of pixels correctly predicted as non-roads), FP (the number of pixels incorrectly predicted as roads), and FN (the number of pixels incorrectly predicted as non-roads) [95].

This review introduces nine indicators, which include Accuracy [46], Precision [25], Recall [25], F1 score [33], intersection over union (IoU) [139], mean intersection over union (mIoU) [52], average path length similarity (APLS) [174], entropy-based connectivity metric (ECM) [226], and customized connectivity (CC) [101].

Among them, Accuracy, Precision, Recall, F1 score, IoU, and mIoU are the most commonly used evaluation metrics, which are calculated based on the elements within the confusion matrix.

6.1. Accuracy

Accuracy refers to the proportion of the number of samples correctly predicted by the model to the total number of samples, which is defined as Equation (1). A higher value indicates that the model has a stronger ability to correctly predict road pixels, meaning that the predictions of the model are more aligned with the actual road positions.

A c c u r a c y = \frac{T P + T N}{T P + T N + F P + F N} .

(1)

6.2. Precision

Precision refers to the proportion of pixels predicted by the model as roads that actually belong to roads, as defined by Equation (2). It is suitable for situations where reducing false positive rates is important. A higher value indicates higher accuracy of the model in predicting road pixels.

P r e c i s i o n = \frac{T P}{T P + F P} .

(2)

6.3. Recall

Recall refers to the proportion of pixels correctly identified by the model as roads relative to the total number of actual road pixels, as defined by Equation (3). It measures the completeness of the predictions and is suitable for scenarios where minimizing false negatives is crucial. A higher value rate indicates a stronger ability of the model to capture all actual road pixels.

R e c a l l = \frac{T P}{T P + F N} .

(3)

6.4. F1 Score

The F1 Score is the harmonic mean of Precision and Recall, considering both the accuracy and completeness of the model, making it suitable for scenarios where balancing accuracy and completeness is important. As defined by Equation (4), a high value indicates that the model has achieved a good balance in predicting road pixels, minimizing both false positives and false negatives as much as possible.

F 1 S c o r e = 2 \times \frac{P r e c i s i o n \times R e c a l l}{P r e c i s i o n + R e c a l l} .

(4)

6.5. IoU

The IoU indicator represents the degree of overlap between the road area predicted by the model and the actual road area defined as Equation (5). Higher IoU values generally correspond to higher Accuracy, Precision, Recall, and F1 Score. A higher value reflects that the model prediction results are closer to the actual situation.

I o U = \frac{T P}{T P + F P + F N} .

(5)

6.6. mIoU

The mIoU in road extraction tasks calculates the average IoU value between road pixels and background pixels. Its formula is shown as Equation (6). Similar to IoU, the value of mIoU ranges between 0 and 1, with a higher value indicating a stronger model capability in road extraction.

m I o U = \frac{1}{2} (\frac{T P}{T P + F P + F N} + \frac{F N}{F N + T P + T N}) .

(6)

6.7. APLS

APLS is used to measure the similarity between the extracted road network and the real road network. It is defined by Equation (7). By comparing the average path lengths between them, the accuracy and completeness of the road extraction results can be evaluated, determining whether the topology of the road network is consistent with the real situation. In the definition of APLS, N is number of unique paths, and

L (a, b)

is the length of path

(a, b)

. The node

a^{'}

represents the node in the predicted graph closet to the location of ground truth node a(source) and the node

b^{'}

represents the node in the predicted graph closet to the location of ground truth node b(target). A higher value indicates that the road extraction result is closer to the real road map.

A P L S = 1 - \frac{1}{N} \sum min \{1, \frac{| L (a, b) - L (a^{'}, b^{'}) |}{L (a, b)}\} .

(7)

6.8. ECM

ECM evaluates object connectivity in remote sensing road extraction by quantifying pixel relationships based on entropy. It is defined by Equation (8), where

C^{i}

denotes the connectivity of the ith ground-truth instance,

α^{i}

denotes the completion of ith ground-truth instance,

M_{i}

is the number of predicted road-boundary instances,

p_{j}

is the dominance of the jth predicted instance, and N is the total number of ground-truth instances. The larger the value, the more connected the road network.

E C M = \sum_{j = 1}^{N} α_{i} e^{- C_{i}}, C_{i} = \sum_{j = 1}^{M_{i}} - p_{j} l o g (p_{j}) .

(8)

6.9. CC

The CC metric assesses the degree of connectivity among road pixels within the segmented road network. It is calculated using Equation (9), where

N_{c}

is the total number of connected road pixels, and

N_{t}

is the total number of road pixels in the segmented region. A higher value indicates that the connectivity of the extracted road network is higher.

C C = \frac{N_{c}}{N_{t}} .

(9)

7. Datasets

Numerous datasets dedicated to road extraction from remote sensing images have emerged. They play a pivotal role in model training.

Table 2 presents a chronologically organized overview of the details for various remote sensing road datasets. It includes information such as the size and resolution of the images within the datasets, as well as the number of images in the training, testing, and validation sets for each dataset. Additionally, the table provides information on the source of the datasets and the year of publication.

Figure 10 illustrates an overview of a subset of datasets. For each dataset, columns (a) and (b) are relatively simple graphs, while columns (c) and (d) are more complex graphs.

7.1. Massachusetts

The Massachusetts dataset is a diverse collection of remote sensing images designed for road extraction tasks. It covers a range of scene types, including urban, suburban, and rural areas. It also includes a variety of terrain and landform features. The moderate scale of the dataset makes it suitable for training medium-scale models without the burden of handling large amounts of data. However, the dataset also presents challenges, especially in dealing with occlusions between roads and adjacent objects, and in maintaining high accuracy in image recognition under different lighting and weather conditions.

7.2. DeepGlobe

The DeepGlobe dataset includes over 10,000 satellite images, providing a rich resource for road extraction tasks. It covers countries such as Thailand, Indonesia, and India, encompassing environments from urban and rural to coastal and tropical rainforest. This environmental variety is beneficial for developing robust road extraction algorithms that can adapt to different conditions. However, extracting roads from satellite imagery is challenging. Roads often appear as narrow strips in these images and can be mistaken for other linear features like rivers and railways.

7.3. SpaceNet

The SpaceNet dataset is constantly expanding and updating. Currently, there are eight versions, among which v3 and v5 are datasets specifically designed for road extraction. SpaceNet v3 covers Las Vegas, Paris, Shanghai, and Khartoum. Apart from these four cities, v5 has added Moscow, Mumbai, San Juan, and a mysterious city. Compared to v3, the SpaceNet v5 dataset has advantages in resolution, coverage, and data diversity.

In the SpaceNet dataset, detailed information such as road centerline, road type, pavement type, bridges, and number of lanes are covered. However, due to the perspective of satellite images, changes in lighting conditions and the complexity of urban environments, the roads in the dataset may appear narrow or difficult to identify, which increases the challenge of accurately extracting roads from remote sensing images.

7.4. CHN6-CUG

The CHN6-CUG dataset, compiled and shared by Zhu Qiqi’s team from the China University of Geosciences, is a suite of remote sensing images centered on urban road extraction in China. It features six distinct Chinese cities: Beijing’s Chaoyang District, Shanghai’s Yangpu District, the central region of Wuhan, Shenzhen’s Nanshan District, and the Sha Tin District of Hong Kong and Macau. The dataset includes meticulously annotated road information, encompassing both covered and uncovered roads, as well as a detailed classification of road types, such as railways, highways, urban streets, and rural paths.

8. Discussion

This review categorizes road extraction tasks from remote sensing images into three primary types based on the requirements for annotated information within the datasets: fully supervised learning, semi-supervised learning, and unsupervised learning. Table 3 summarizes the quantity of relevant literature, annotation requirements, advantages, and limitations of these methods.

Fully supervised learning methods employ comprehensive annotated information for model training, leading to the efficient extraction of road details and a significant enhancement in accuracy. This approach currently predominates in technological applications. In contrast, semi-supervised and unsupervised learning methods reduce the reliance on large-scale annotated datasets, significantly lowering the cost of data preparation and bolstering the generalization of the model to new datasets. Although these methods may not match the precision of fully supervised learning, they are capable of actively exploring and mining the implicit structures and features within the input images. However, given that these methods are still in their nascent stage, there is room for improvement in terms of segmentation accuracy and adaptability.

To provide a comprehensive and intuitive assessment of the performance of various models, this review employs a suite of common performance metrics for comparison. We selected two widely recognized datasets: Deepglobe and Massachusetts, and compared the performance of several representative models on these datasets. The comparative results for the Deepglobe dataset are summarized in Table 4, while the results for the Massachusetts dataset are presented in Table 5. Since the F1 score is a comprehensive indicator adopted by most methods, both tables are sorted in descending order based on the F1 score. This allows for a clearer view of the advanced methods in the current field.

Through the analysis of these two tables, a significant difference is observed in the F1 scores of RoadCT [104] on the DeepGlobe and Massachusetts datasets. There are two reasons for this phenomenon. Firstly, there are significant differences in geographic coverage, image resolution, road types, and complexity among datasets. Secondly, the generalization ability of the model is insufficient, resulting in performance degradation on new datasets. Therefore, the model we choose should be relevant to the characteristics of the selected dataset and the actual application requirements.

9. Conclusions

This review systematically collates deep learning algorithms employed in the field of road extraction from remote sensing images over the past 13 years. We examine a series of road extraction methods proposed in approximately 232 relevant articles and categorize the deep learning-based approaches into three primary categories based on the differing requirements for annotated datasets: fully supervised learning, semi-supervised learning, and unsupervised learning. For each category, we provide a comprehensive summary and in-depth analysis. In light of the literature analysis indicating that the majority of current methods still rely on fully supervised learning, we further subdivide the fully supervised learning approach into five subcategories and conduct a detailed comparison and analysis of the performance of each subcategory. Moreover, this review summarizes the evaluation metrics and datasets that are commonly utilized in the field.

Currently, models for road extraction from remote sensing images perform well in images taken under clear and well-lit conditions but struggle when faced with road occlusion, adverse weather, and other challenging scenarios. Major challenges include the complexity of remote sensing images, the high cost associated with data annotation, model generalization ability, and robustness. In the era of large-scale models and multimodal data, road extraction from remote sensing images holds significant importance. Therefore, this review looks forward to further research and development in the following aspects.

Multi-modal Data Fusion
As technology continues to progress, the effective fusion of multi-modal data from different sensors, such as remote sensing images, LiDAR images, and videos, is becoming a focal point of current research. The integration of multi-modal data not only offers a wealth of information but also addresses the limitations of relying on a single data source, leading to a better capture of road features. For example, LiDAR data can provide highly accurate terrain information, while high-definition video data are capable of capturing dynamic changes on the roads. The combination of various data modalities allows for more precise identification of road positions, shapes, and features, thus improving the robustness and generalization of road extraction models.
Semi-supervised Networks or Unsupervised Networks
Currently, most road extraction methods are based on fully supervised models, which rely on manually annotated datasets. This process is time-consuming, labor-intensive, and the annotated data are often limited in size, leading to potential performance issues when applied to other datasets. Therefore, the exploration of semi-supervised and unsupervised approaches, which aim to understand the internal structure of data or facilitate adaptive training without human annotation, remains a prominent research focus. Presently, methods based on GAN can automatically generate data annotations to bridge the gap between synthetic and real images, making it a significant direction for future research.
Adaptive Modeling in Complex Scenarios
The adaptability of road extraction models on remote sensing images is crucial when encountering complex scenarios. This adaptability enables models to effectively extract road information in diverse environments, including urban settings with building occlusions, tree cover, and uneven lighting conditions. By learning and understanding complex scenes, models can adapt to different geographical environments and remote sensing images conditions, thereby improving the accuracy and robustness of road extraction. Techniques such as multi-modal data fusion, data augmentation, and adversarial training can be employed to continuously enhance model structures and algorithms, enabling them to better adapt to various challenges and changes.
Lightweight Networks
Many road extraction methods, such as Graph-based and Transformer-based approaches, encounter challenges related to large computational requirements. Therefore, designing lightweight networks is necessary. Lightweight networks can significantly reduce model parameters and computational complexity while maintaining high accuracy. Leveraging knowledge distillation techniques, key knowledge about road features can be extracted from large and complex models and transferred to lightweight networks, enabling them to learn effective road feature representations.

Funding

The work was jointly supported by the National Science and Technology Major Project under grant No. 2022ZD0117103, the National Natural Science Foundation of China under grant No. 62272364, the Guangxi Key Laboratory of Trusted Software under grant No. KX202061, the provincial Key Research and Development Program of Shaanxi under grant No. 2024GH-ZDXM-47, and the Fundamental Research Funds for the Central Universities under grant No. XJSJ24021.

Data Availability Statement

The original contributions presented in the study are included in the article material, further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Qian, D.; Wang, Y.; Zhang, X.; Zhao, D. Rationality Evaluation of Urban Road Network Plan Based on the EW-TOPSIS Method. In Proceedings of the 2021 13th International Conference on Measuring Technology and Mechatronics Automation (ICMTMA), Beihai, China, 16–17 January 2021; pp. 840–844. [Google Scholar]
Liu, H.; Wang, Y. The apply of urban design in the detailed planning of residential areas. In Proceedings of the 2011 International Conference on Multimedia Technology, Hangzhou, China, 26–28 July 2011; pp. 4164–4166. [Google Scholar]
Qi, H.; Shi, J.; Chen, J.; Chi, C.; Shan, H. Research on the Complete Design, Construction and Management of Urban Road in Dalian City under the Concept of “People-Oriented Traffic”. In Proceedings of the 2020 5th International Conference on Electromechanical Control Technology and Transportation (ICECTT), Nanchang, China, 15–17 May 2020; pp. 457–460. [Google Scholar]
Cruz, G.G.L.; Litonjua, A.; Juan, A.N.P.S.; Libatique, N.J.; Tan, M.I.L.; Honrado, J.L.E. Motorcycle and Vehicle Detection for Applications in Road Safety and Traffic Monitoring Systems. In Proceedings of the 2022 IEEE Global Humanitarian Technology Conference (GHTC), Santa Clara, CA, USA, 8–11 September 2022; pp. 102–105. [Google Scholar]
Shao, Z.; Zheng, J.; Yue, G.; Yang, Y. Road Traffic Assignment Algorithm Based on Computer Vision. In Proceedings of the 2023 International Conference on Integrated Intelligence and Communication Systems (ICIICS), Kalaburagi, India, 24–25 November 2023; pp. 1–5. [Google Scholar]
Seid, S.; Zennaro, M.; Libsie, M.; Pietrosemoli, E.; Manzoni, P. A Low Cost Edge Computing and LoRaWAN Real Time Video Analytics for Road Traffic Monitoring. In Proceedings of the 2020 16th International Conference on Mobility, Sensing and Networking (MSN), Tokyo, Japan, 17–19 December 2020; pp. 762–767. [Google Scholar]
Wu, J.; Han, X.; Zhou, Y.; Yue, P.; Wang, X.; Lu, J.; Jiang, W.; Li, J.; Tang, H.; Wang, F.; et al. Disaster Monitoring and Emergency Response Services in China. In Proceedings of the IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 3473–3476. [Google Scholar]
Huang, Y.; Wei, H.; Yang, J.; Wu, M. Damaged Road Extraction Based on Simulated Post-Disaster Remote Sensing Images. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 4684–4687. [Google Scholar]
Wang, J.; Qin, Q.; Zhao, J.; Ye, X.; Qin, X.; Yang, X.; Wang, J.; Zheng, X.; Sun, Y. A knowledge-based method for road damage detection using high-resolution remote sensing image. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 3564–3567. [Google Scholar]
Xu, Y.; Liu, S.; Peng, Y. Research and design of environmental monitoring and road lighting system based on the Internet of things. In Proceedings of the 2018 Chinese Automation Congress (CAC), Xi’an, China, 30 November–2 December 2018; pp. 1073–1078. [Google Scholar]
Wan, Y.; Hu, X.; Zhong, Y.; Ma, A.; Wei, L.; Zhang, L. Tailings Reservoir Disaster and Environmental Monitoring Using the UAV-ground Hyperspectral Joint Observation and Processing: A Case of Study in Xinjiang, the Belt and Road. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 9713–9716. [Google Scholar]
Dong, L. The Research on Model Framework of the Trunk Road Network Operation and Environmental Monitoring. In Proceedings of the 2012 2nd International Conference on Remote Sensing, Environment and Transportation Engineering, Nanjing, China, 1–3 June 2012; pp. 1–4. [Google Scholar]
Cheng, G.; Han, J. A survey on object detection in optical remote sensing images. ISPRS J. Photogramm. Remote Sens. 2016, 117, 11–28. [Google Scholar] [CrossRef]
Hormese, J.; Saravanan, C. Automated road extraction from high resolution satellite images. Procedia Technol. 2016, 24, 1460–1467. [Google Scholar] [CrossRef]
Nidamanuri, R.R. Hyperspectral discrimination of tea plant varieties using machine learning, and spectral matching methods. Remote Sens. Appl. Soc. Environ. 2020, 19, 100350. [Google Scholar] [CrossRef]
Wang, W.; Yang, N.; Zhang, Y.; Wang, F.; Cao, T.; Eklund, P. A review of road extraction from remote sensing images. J. Traffic Transp. Eng. (Engl. Ed.) 2016, 3, 271–282. [Google Scholar] [CrossRef]
Lian, R.; Wang, W.; Mustafa, N.; Huang, L. Road extraction methods in high-resolution remote sensing images: A comprehensive review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5489–5507. [Google Scholar] [CrossRef]
Chen, Z.; Deng, L.; Luo, Y.; Li, D.; Junior, J.M.; Gonçalves, W.N.; Nurunnabi, A.A.M.; Li, J.; Wang, C.; Li, D. Road extraction in remote sensing data: A survey. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102833. [Google Scholar] [CrossRef]
Shrestha, A.; Mahmood, A. Review of deep learning algorithms and architectures. IEEE Access 2019, 7, 53040–53065. [Google Scholar] [CrossRef]
Abdollahi, A.; Pradhan, B.; Shukla, N.; Chakraborty, S.; Alamri, A. Deep learning approaches applied to remote sensing datasets for road extraction: A state-of-the-art review. Remote Sens. 2020, 12, 1444. [Google Scholar] [CrossRef]
Pruthi, J.; Dhingra, S. A Review of Research on Road Feature Extraction Through Remote Sensing Images Based on Deep Learning Algorithms. In Proceedings of the 2023 3rd International Conference on Innovative Sustainable Computational Technologies (CISCT), Dehradun, India, 8–9 September 2023; pp. 1–5. [Google Scholar]
Liu, P.; Wang, Q.; Yang, G.; Li, L.; Zhang, H. Survey of road extraction methods in remote sensing images based on deep learning. PFG- Photogramm. Remote Sens. Geoinf. Sci. 2022, 90, 135–159. [Google Scholar] [CrossRef]
Mo, S.; Shi, Y.; Yuan, Q.; Li, M. A Survey of Deep Learning Road Extraction Algorithms Using High-Resolution Remote Sensing Images. Sensors 2024, 24, 1708. [Google Scholar] [CrossRef]
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
Popescu, A.A.; Gavat, I.; Datcu, M. Contextual Descriptors for Scene Classes in Very High Resolution SAR Images. IEEE Geosci. Remote Sens. Lett. 2012, 9, 80–84. [Google Scholar] [CrossRef]
Li, Y.; Zhang, R.; Wu, Y. Road network extraction in high-resolution SAR images based CNN features. In Proceedings of the 2017 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, 23–28 July 2017; pp. 1664–1667. [Google Scholar]
Saito, S.; Yamashita, T.; Aoki, Y. Multiple object extraction from aerial imagery with convolutional neural networks. Electron. Imaging 2016, 28, 010402-1–010402-9. [Google Scholar] [CrossRef]
Sun, G.; Yan, H. Ultra-High Resolution Image Segmentation with Efficient Multi-Scale Collective Fusion. In Proceedings of the 2022 IEEE International Conference on Visual Communications and Image Processing (VCIP), Suzhou, China, 13–16 December 2022; pp. 1–5. [Google Scholar]
Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
Wei, Y.; Wang, Z.; Xu, M. Road Structure Refined CNN for Road Extraction in Aerial Image. IEEE Geosci. Remote Sens. Lett. 2017, 14, 709–713. [Google Scholar] [CrossRef]
Henry, C.; Azimi, S.M.; Merkle, N. Road Segmentation in SAR Satellite Images With Deep Fully Convolutional Neural Networks. IEEE Geosci. Remote Sens. Lett. 2018, 15, 1867–1871. [Google Scholar] [CrossRef]
Zhang, X.; Ma, W.; Li, C.; Wu, J.; Tang, X.; Jiao, L. Fully Convolutional Network-Based Ensemble Method for Road Extraction From Aerial Images. IEEE Geosci. Remote Sens. Lett. 2020, 17, 1777–1781. [Google Scholar] [CrossRef]
Li, P.; He, X.; Qiao, M.; Cheng, X.; Li, Z.; Luo, H.; Song, D.; Li, D.; Hu, S.; Li, R.; et al. Robust Deep Neural Networks for Road Extraction From Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2021, 59, 6182–6197. [Google Scholar] [CrossRef]
Chen, J.; Liu, X.; Liu, C.; Yang, Y.; Yang, S.; Zhang, Z. A Modified Convolutional Neural Network with Transfer Learning for Road Extraction from Remote Sensing Imagery. In Proceedings of the 2018 Chinese Automation Congress (CAC), Xi’an, China, 30 November–2 December 2018; pp. 4263–4267. [Google Scholar]
Babaali, K.O.; Zigh, E.; Djebbouri, M.; Chergui, O. A new approach for road extraction using data augmentation and semantic segmentation. Indones. J. Electr. Eng. Comput. Sci. 2022, 28, 1493–1501. [Google Scholar]
Kestur, R.; Farooq, S.; Abdal, R.; Mehraj, E.; Narasipura, O.; Mudigere, M. UFCN: A fully convolutional neural network for road extraction in RGB imagery acquired by remote sensing from an unmanned aerial vehicle. J. Appl. Remote Sens. 2018, 12, 016020. [Google Scholar] [CrossRef]
Liu, Y.; Yao, J.; Lu, X.; Xia, M.; Wang, X.; Liu, Y. RoadNet: Learning to Comprehensively Analyze Road Networks in Complex Urban Scenes From High-Resolution Remotely Sensed Images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 2043–2056. [Google Scholar] [CrossRef]
Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]
Ding, L.; Bruzzone, L. DiResNet: Direction-Aware Residual Network for Road Extraction in VHR Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2021, 59, 10243–10254. [Google Scholar] [CrossRef]
Qi, Y.; He, Y.; Qi, X.; Zhang, Y.; Yang, G. Dynamic Snake Convolution Based on Topological Geometric Constraints for Tubular Structure Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 6070–6079. [Google Scholar]
Li, Y.; Guo, L.; Rao, J.; Xu, L.; Jin, S. Road Segmentation Based on Hybrid Convolutional Network for High-Resolution Visible Remote Sensing Image. IEEE Geosci. Remote Sens. Lett. 2019, 16, 613–617. [Google Scholar] [CrossRef]
Zhu, Q.; Zhang, Y.; Wang, L.; Zhong, Y.; Guan, Q.; Lu, X.; Zhang, L.; Li, D. A Global Context-aware and Batch-independent Network for road extraction from VHR satellite imagery. ISPRS J. Photogramm. Remote Sens. 2021, 175, 353–365. [Google Scholar] [CrossRef]
Yang, Z.; Zhou, D.; Yang, Y.; Zhang, J.; Chen, Z. Road Extraction From Satellite Imagery by Road Context and Full-Stage Feature. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
Gao, L.; Song, W.; Dai, J.; Chen, Y. Road extraction from high-resolution remote sensing imagery using refined deep residual convolutional neural network. Remote Sens. 2019, 11, 552. [Google Scholar] [CrossRef]
Wu, Q.; Luo, F.; Wu, P.; Wang, B.; Yang, H.; Wu, Y. Automatic Road Extraction from High-Resolution Remote Sensing Images Using a Method Based on Densely Connected Spatial Feature-Enhanced Pyramid. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 3–17. [Google Scholar] [CrossRef]
Hong, Z.; Ming, D.; Zhou, K.; Guo, Y.; Lu, T. Road Extraction From a High Spatial Resolution Remote Sensing Image Based on Richer Convolutional Features. IEEE Access 2018, 6, 46988–47000. [Google Scholar] [CrossRef]
Xu, Y.; Xie, Z.; Feng, Y.; Chen, Z. Road extraction from high-resolution remote sensing imagery using deep learning. Remote Sens. 2018, 10, 1461. [Google Scholar] [CrossRef]
Dong, S.; Chen, Z. Block Multi-Dimensional Attention for Road Segmentation in Remote Sensing Imagery. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar]
Xiao, D.; Yin, L.; Fu, Y. Open-Pit Mine Road Extraction From High-Resolution Remote Sensing Images Using RATT-UNet. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
Dai, L.; Zhang, G.; Zhang, R. RADANet: Road Augmented Deformable Attention Network for Road Extraction From Complex High-Resolution Remote-Sensing Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13. [Google Scholar] [CrossRef]
Mei, J.; Li, R.J.; Gao, W.; Cheng, M.M. CoANet: Connectivity Attention Network for Road Extraction From Satellite Imagery. IEEE Trans. Image Process. 2021, 30, 8540–8552. [Google Scholar] [CrossRef] [PubMed]
Yang, J.; Liu, H. Modulation Learning on Fourier-Domain for Road Extraction From Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
Patil, P.S.; Holambe, R.; Waghmare, L. An Attention Augmented Convolution-based Tiny-Residual UNet for Road Extraction. IEEE Trans. Artif. Intell. 2024, 1–14. [Google Scholar] [CrossRef]
Wang, Y.; Peng, Y.; Li, W.; Alexandropoulos, G.C.; Yu, J.; Ge, D.; Xiang, W. DDU-Net: Dual-Decoder-U-Net for Road Extraction Using High-Resolution Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–12. [Google Scholar] [CrossRef]
Luo, Z.; Zhou, K.; Tan, Y.; Wang, X.; Zhu, R.; Zhang, L. AD-RoadNet: An Auxiliary-Decoding Road Extraction Network Improving Connectivity While Preserving Multiscale Road Details. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 8049–8062. [Google Scholar] [CrossRef]
Sun, S.; Yang, Z.; Ma, T. Lightweight Remote Sensing Road Detection Network. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
Sultonov, F.; Park, J.H.; Yun, S.; Lim, D.W.; Kang, J.M. Mixer U-Net: An improved automatic road extraction from UAV imagery. Appl. Sci. 2022, 12, 1953. [Google Scholar] [CrossRef]
Han, X.; Liu, Y.; Liu, G.; Lin, Y.; Liu, Q. LOANet: A lightweight network using object attention for extracting buildings and roads from UAV aerial remote sensing images. PeerJ Comput. Sci. 2023, 9, e1467. [Google Scholar] [CrossRef]
Hao, S.; Wang, W.; Salzmann, M. Geometry-Aware Deep Recurrent Neural Networks for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 2448–2460. [Google Scholar] [CrossRef]
Ge, Z.; Zhao, Y.; Wang, J.; Wang, D.; Si, Q. Deep Feature-Review Transmit Network of Contour-Enhanced Road Extraction From Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
Qiu, L.; Yu, D.; Zhang, C.; Zhang, X. A Semantics-Geometry Framework for Road Extraction From Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
Luo, H.; Wang, Z.; Du, B.; Dong, Y. A Deep Cross-Modal Fusion Network for Road Extraction With High-Resolution Imagery and LiDAR Data. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–15. [Google Scholar] [CrossRef]
Cheng, G.; Wang, Y.; Xu, S.; Wang, H.; Xiang, S.; Pan, C. Automatic Road Detection and Centerline Extraction via Cascaded End-to-End Convolutional Neural Network. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3322–3337. [Google Scholar] [CrossRef]
Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
Gao, X.; Sun, X.; Zhang, Y.; Yan, M.; Xu, G.; Sun, H.; Jiao, J.; Fu, K. An End-to-End Neural Network for Road Extraction From Remote Sensing Imagery by Multiple Feature Pyramid Network. IEEE Access 2018, 6, 39401–39414. [Google Scholar] [CrossRef]
Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495. [Google Scholar] [CrossRef] [PubMed]
Panboonyuen, T.; Vateekul, P.; Jitkajornwanich, K.; Lawawirojwong, S. An enhanced deep convolutional encoder-decoder network for road segmentation on aerial imagery. In Proceedings of the Recent Advances in Information and Communication Technology 2017: Proceedings of the 13th International Conference on Computing and Information Technology (IC2IT), Bangkok, Thailand, 6–7 July 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 191–201. [Google Scholar]
Panboonyuen, T.; Jitkajornwanich, K.; Lawawirojwong, S.; Srestasathiern, P.; Vateekul, P. Road segmentation of remotely-sensed images using deep convolutional neural networks with landscape metrics and conditional random fields. Remote Sens. 2017, 9, 680. [Google Scholar] [CrossRef]
Zhao, S.; Feng, Z.; Chen, L.; Li, G. DANet: A Semantic Segmentation Network for Remote Sensing of Roads Based on Dual-ASPP Structure. Electronics 2023, 12, 3243. [Google Scholar] [CrossRef]
Akhtar, N.; Mandloi, M. DenseResSegnet: A Dense Residual Segnet for Road Detection Using Remote Sensing Images. In Proceedings of the 2023 International Conference on Machine Intelligence for GeoAnalytics and Remote Sensing (MIGARS), Hyderabad, India, 27–29 January 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–4. [Google Scholar]
Chaurasia, A.; Culurciello, E. LinkNet: Exploiting encoder representations for efficient semantic segmentation. In Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP), St. Petersburg, FL, USA, 10–13 December 2017; pp. 1–4. [Google Scholar]
Zhou, L.; Zhang, C.; Wu, M. D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 182–186. [Google Scholar]
Li, Y.; Peng, B.; Fan, K.; Yuan, L.; Tong, L.; He, L. New Neural Network and an Image Postprocessing Method for High Resolution Satellite Imagery Road Extraction. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 3935–3938. [Google Scholar]
Xie, Y.; Miao, F.; Zhou, K.; Peng, J. HsgNet: A road extraction network based on global perception of high-order spatial information. ISPRS Int. J. Geo-Inf. 2019, 8, 571. [Google Scholar] [CrossRef]
Deng, Y.; Yang, J.; Liang, C.; Jing, Y. Spd-Linknet: Upgraded D-Linknet with Strip Pooling for Road Extraction. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 2190–2193. [Google Scholar]
Wang, Q.; Bai, H.; He, C.; Cheng, J. Fe-LinkNet: Enhanced D-LinkNet with Attention and Dense Connection for Road Extraction in High-Resolution Remote Sensing Images. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 3043–3046. [Google Scholar]
Lu, X.; Zhong, Y.; Zheng, Z. A Novel Global-Aware Deep Network for Road Detection of Very High Resolution Remote Sensing Imagery. In Proceedings of the IGARSS 2020—2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 2579–2582. [Google Scholar]
Jie, Y.; He, H.; Xing, K.; Yue, A.; Tan, W.; Yue, C.; Jiang, C.; Chen, X. MECA-Net: A MultiScale feature encoding and long-range context-aware network for road extraction from remote sensing images. Remote Sens. 2022, 14, 5342. [Google Scholar] [CrossRef]
Wu, K.Y.; Wang, X.; Zhou, J.J.; Wang, X.F.; Fan, Y.P.; Yao, M. An Improved D-Linknet Method for Road Extraction from High Resolution Remote Sensing Images. In Proceedings of the 2021 IEEE 6th International Conference on Signal and Image Processing (ICSIP), Nanjing, China, 22–24 October 2021; pp. 175–180. [Google Scholar]
Wang, Y.; Seo, J.; Jeon, T. NL-LinkNet: Toward Lighter But More Accurate Road Extraction With Nonlocal Operations. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
Sun, T.; Di, Z.; Che, P.; Liu, C.; Wang, Y. Leveraging crowdsourced GPS data for road extraction from aerial imagery. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7509–7518. [Google Scholar]
Liu, L.; Yang, Z.; Li, G.; Wang, K.; Chen, T.; Lin, L. Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust Road Extraction. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 3308–3322. [Google Scholar] [CrossRef] [PubMed]
Zhang, Y.; Zhu, Q.; Zhong, Y.; Guan, Q.; Zhang, L.; Li, D. A Modified D-Linknet with Transfer Learning for Road Extraction from High-Resolution Remote Sensing. In Proceedings of the IGARSS 2020—2020 IEEE International Geoscience and Remote Sensing Symposium, Waikoloa, HI, USA, 26 September–2 October 2020; pp. 1817–1820. [Google Scholar]
Wu, K.; Cai, F. Dual Attention D-LinkNet for Road Segmentation in Remote Sensing Images. In Proceedings of the 2022 IEEE 14th International Conference on Advanced Infocomm Technology (ICAIT), Chongqing, China, 8–11 July 2022; pp. 304–307. [Google Scholar]
Ai, J.; Hou, S.; Wu, M.; Chen, B.; Yan, H. MPGSE-D-LinkNet: Multiple-Parameters-Guided Squeeze-and-Excitation Integrated D-LinkNet for Road Extraction in Remote Sensing Imagery. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
Hu, J.; Gao, J.; Yuan, Y.; Chanussot, J.; Wang, Q. LGNet: Location-Guided Network for Road Extraction From Satellite Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–12. [Google Scholar] [CrossRef]
Yang, J.; Gu, Z.; Wu, T.; Ahmed, Y.A.E. RUW-Net: A Dual Codec Network for Road Extraction From Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 1550–1564. [Google Scholar] [CrossRef]
Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv 2014, arXiv:1412.7062. [Google Scholar]
Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848. [Google Scholar] [CrossRef]
Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 801–818. [Google Scholar]
Lin, Y.; Xu, D.; Wang, N.; Shi, Z.; Chen, Q. Road extraction from very-high-resolution remote sensing images via a nested SE-Deeplab model. Remote Sens. 2020, 12, 2985. [Google Scholar] [CrossRef]
Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
Yang, C.; Wang, Z. An Ensemble Wasserstein Generative Adversarial Network Method for Road Extraction From High Resolution Remote Sensing Images in Rural Areas. IEEE Access 2020, 8, 174317–174324. [Google Scholar] [CrossRef]
Abdollahi, A.; Pradhan, B.; Sharma, G.; Maulud, K.N.A.; Alamri, A. Improving Road Semantic Segmentation Using Generative Adversarial Network. IEEE Access 2021, 9, 64381–64392. [Google Scholar] [CrossRef]
Zhang, Y.; Xiong, Z.; Zang, Y.; Wang, C.; Li, J.; Li, X. Topology-aware road network extraction via multi-supervised generative adversarial networks. Remote Sens. 2019, 11, 1017. [Google Scholar] [CrossRef]
Máttyus, G.; Luo, W.; Urtasun, R. Deeproadmapper: Extracting road topology from aerial images. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3438–3446. [Google Scholar]
Zao, Y.; Zou, Z.; Shi, Z. Topology-Guided Road Graph Extraction From Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–14. [Google Scholar] [CrossRef]
Tan, Y.Q.; Gao, S.H.; Li, X.Y.; Cheng, M.M.; Ren, B. Vecroad: Point-based iterative graph exploration for road graphs extraction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8910–8918. [Google Scholar]
Lian, R.; Huang, L. DeepWindow: Sliding Window Based on Deep Learning for Road Extraction From Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 1905–1916. [Google Scholar] [CrossRef]
Xu, Z.; Sun, Y.; Liu, M. icurb: Imitation learning-based detection of road curbs using aerial images for autonomous driving. IEEE Robot. Autom. Lett. 2021, 6, 1097–1104. [Google Scholar] [CrossRef]
Castrejon, L.; Kundu, K.; Urtasun, R.; Fidler, S. Annotating object instances with a polygon-rnn. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5230–5238. [Google Scholar]
Hu, Y.; Wang, Z.; Huang, Z.; Liu, Y. PolyRoad: Polyline Transformer for Topological Road-Boundary Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–12. [Google Scholar] [CrossRef]
Liu, W.; Gao, S.; Zhang, C.; Yang, B. RoadCT: A Hybrid CNN-Transformer Network for Road Extraction From Satellite Imagery. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5. [Google Scholar] [CrossRef]
Meng, Q.; Zhou, D.; Zhang, X.; Yang, Z.; Chen, Z. Road Extraction from Remote Sensing Images via Channel Attention and Multi-Layer Axial Transformer. IEEE Geosci. Remote Sens. Lett. 2024, 21, 5504705. [Google Scholar] [CrossRef]
Yuan, Z.; Mou, L.; Hua, Y.; Zhu, X.X. RRSIS: Referring Remote Sensing Image Segmentation. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–12. [Google Scholar] [CrossRef]
Ma, X.; Zhang, X.; Pun, M.O. Rs3mamba: Visual state space model for remote sensing images semantic segmentation. arXiv 2024, arXiv:2404.02457. [Google Scholar]
Zhu, Q.; Cai, Y.; Fang, Y.; Yang, Y.; Chen, C.; Fan, L.; Nguyen, A. Samba: Semantic segmentation of remotely sensed images with state space model. arXiv 2024, arXiv:2404.01705. [Google Scholar]
Cheng, G.; Zhu, F.; Xiang, S.; Pan, C. Road Centerline Extraction via Semisupervised Segmentation and Multidirection Nonmaximum Suppression. IEEE Geosci. Remote Sens. Lett. 2016, 13, 545–549. [Google Scholar] [CrossRef]
Bonafilia, D.; Gill, J.; Basu, S.; Yang, D. Building high resolution maps for humanitarian aid and development with weakly-and semi-supervised learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 15–20 June 2019; pp. 1–9. [Google Scholar]
Hu, A.; Chen, S.; Wu, L.; Xie, Z.; Qiu, Q.; Xu, Y. WSGAN: An improved generative adversarial network for remote sensing image road network extraction by weakly supervised processing. Remote Sens. 2021, 13, 2506. [Google Scholar] [CrossRef]
Hetang, C.; Xue, H.; Le, C.; Yue, T.; Wang, W.; He, Y. Segment Anything Model for Road Network Graph Extraction. arXiv 2024, arXiv:2403.16051. [Google Scholar]
Zhang, Q.; Zhang, J.; Liu, W.; Tao, D. Category anchor-guided unsupervised domain adaptation for semantic segmentation. Adv. Neural Inf. Process. Syst. 2019, 32, 433–443. [Google Scholar]
He, S.; Bastani, F.; Jagwani, S.; Alizadeh, M.; Balakrishnan, H.; Chawla, S.; Elshrif, M.M.; Madden, S.; Sadeghi, M.A. Sat2graph: Road graph extraction through graph-tensor encoding. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Part XXIV 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 51–67. [Google Scholar]
Ren, Y.; Yu, Y.; Guan, H. DA-CapsUNet: A dual-attention capsule U-Net for road extraction from remote sensing imagery. Remote Sens. 2020, 12, 2866. [Google Scholar] [CrossRef]
Wu, S.; Du, C.; Chen, H.; Xu, Y.; Guo, N.; Jing, N. Road extraction from very high resolution images using weakly labeled OpenStreetMap centerline. ISPRS Int. J. Geo-Inf. 2019, 8, 478. [Google Scholar] [CrossRef]
Chen, W.; Wu, A.N.; Biljecki, F. Classification of urban morphology with deep learning: Application on urban vitality. Comput. Environ. Urban Syst. 2021, 90, 101706. [Google Scholar] [CrossRef]
Yuan, J.; Wang, D.; Wu, B.; Yan, L.; Li, R. LEGION-Based Automatic Road Extraction From Satellite Imagery. IEEE Trans. Geosci. Remote Sens. 2011, 49, 4528–4538. [Google Scholar] [CrossRef]
Fu, C.; Chen, Y.; Tong, L.; Jia, M.; Tan, L.; Ji, X. Road damage information extraction using high-resolution SAR imagery. In Proceedings of the 2014 IEEE Geoscience and Remote Sensing Symposium, Quebec City, QC, Canada, 13–18 July 2014; pp. 1836–1838. [Google Scholar]
Alshehhi, R.; Marpu, P.R.; Woon, W.L.; Mura, M.D. Simultaneous extraction of roads and buildings in remote sensing imagery with convolutional neural networks. ISPRS J. Photogramm. Remote Sens. 2017, 130, 139–149. [Google Scholar] [CrossRef]
Chen, Z.; Fan, W.; Zhong, B.; Li, J.; Du, J.; Wang, C. Corse-to-Fine Road Extraction Based on Local Dirichlet Mixture Models and Multiscale-High-Order Deep Learning. IEEE Trans. Intell. Transp. Syst. 2020, 21, 4283–4293. [Google Scholar] [CrossRef]
Li, P.; Zang, Y.; Wang, C.; Li, J.; Cheng, M.; Luo, L.; Yu, Y. Road network extraction via deep learning and line integral convolution. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 1599–1602. [Google Scholar]
Liu, R.; Miao, Q.; Song, J.; Quan, Y.; Li, Y.; Xu, P.; Dai, J. Multiscale road centerlines extraction from high-resolution aerial imagery. Neurocomputing 2019, 329, 384–396. [Google Scholar] [CrossRef]
Zhong, Z.; Li, J.; Cui, W.; Jiang, H. Fully convolutional networks for building and road extraction: Preliminary results. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; pp. 1591–1594. [Google Scholar]
Fu, G.; Liu, C.; Zhou, R.; Sun, T.; Zhang, Q. Classification for high resolution remote sensing imagery using a fully convolutional network. Remote Sens. 2017, 9, 498. [Google Scholar] [CrossRef]
Varia, N.; Dokania, A.; Senthilnath, J. DeepExt: A Convolution Neural Network for Road Extraction using RGB images captured by UAV. In Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence (SSCI), Bangalore, India, 18–21 November 2018; pp. 1890–1895. [Google Scholar]
Chen, J.; Yang, L.; Wang, H.; Zhu, J.; Sun, G.; Dai, X.; Deng, M.; Shi, Y. Road extraction from high-resolution remote sensing images via local and global context reasoning. Remote Sens. 2023, 15, 4177. [Google Scholar] [CrossRef]
Zhang, Y.; Xia, G.; Wang, J.; Lha, D. A Multiple Feature Fully Convolutional Network for Road Extraction From High-Resolution Remote Sensing Image Over Mountainous Areas. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1600–1604. [Google Scholar] [CrossRef]
Pan, D.; Zhang, M.; Zhang, B. A Generic FCN-Based Approach for the Road-Network Extraction From VHR Remote Sensing Images—Using OpenStreetMap as Benchmarks. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2662–2673. [Google Scholar] [CrossRef]
Lu, X.; Zhong, Y.; Zheng, Z.; Chen, D.; Su, Y.; Ma, A.; Zhang, L. Cascaded Multi-Task Road Extraction Network for Road Surface, Centerline, and Edge Extraction. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14. [Google Scholar] [CrossRef]
Wei, Y.; Zhang, K.; Ji, S. Simultaneous Road Surface and Centerline Extraction From Large-Scale Remote Sensing Images Using CNN-Based Segmentation and Tracing. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8919–8931. [Google Scholar] [CrossRef]
Mosinska, A.; Marquez-Neila, P.; Koziński, M.; Fua, P. Beyond the pixel-wise loss for topology-aware delineation. In Proceedings of the Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3136–3145. [Google Scholar]
He, H.; Yang, D.; Wang, S.; Wang, S.; Li, Y. Road extraction by using atrous spatial pyramid pooling integrated encoder-decoder network and structural similarity loss. Remote Sens. 2019, 11, 1015. [Google Scholar] [CrossRef]
Constantin, A.; Ding, J.J.; Lee, Y.C. Accurate Road Detection from Satellite Images Using Modified U-Net. In Proceedings of the 2018 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Chengdu, China, 26–30 October 2018; pp. 423–426. [Google Scholar]
Buslaev, A.; Seferbekov, S.; Iglovikov, V.; Shvets, A. Fully Convolutional Network for Automatic Road Extraction from Satellite Imagery. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 207–210. [Google Scholar]
Xin, J.; Zhang, X.; Zhang, Z.; Fang, W. Road extraction of high-resolution remote sensing images derived from DenseUNet. Remote Sens. 2019, 11, 2499. [Google Scholar] [CrossRef]
Tan, X.; Xiao, Z.; Wan, Q.; Shao, W. Scale Sensitive Neural Network for Road Segmentation in High-Resolution Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2021, 18, 533–537. [Google Scholar] [CrossRef]
Hu, L.; Niu, C.; Ren, S.; Dong, M.; Zheng, C.; Zhang, W.; Liang, J. Discriminative Context-Aware Network for Target Extraction in Remote Sensing Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 700–715. [Google Scholar] [CrossRef]
Doshi, J. Residual Inception Skip Network for Binary Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 206–2063. [Google Scholar]
Zhang, Z.; Liu, Q.; Wang, Y. Road extraction by deep residual u-net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753. [Google Scholar] [CrossRef]
Wang, G.; Yang, W.; Ning, K.; Peng, J. DFC-UNet: A U-Net-Based Method for Road Extraction From Remote Sensing Images Using Densely Connected Features. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5. [Google Scholar] [CrossRef]
Akhtarmanesh, A.; Abbasi-Moghadam, D.; Sharifi, A.; Yadkouri, M.H.; Tariq, A.; Lu, L. Road Extraction From Satellite Images Using Attention-Assisted UNet. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 1126–1136. [Google Scholar] [CrossRef]
Wang, R.; Wei, H.; Wang, A.; Chen, J.W.; Huo, C.; Niu, Y. Robust Road Detection on High-Resolution Remote Sensing Images with Occlusion by a Dual-Decoded UNet. In Proceedings of the IGARSS 2023—2023 IEEE International Geoscience and Remote Sensing Symposium, Pasadena, CA, USA, 16–21 July 2023; pp. 5716–5719. [Google Scholar]
Xu, Y.; Feng, Y.; Xie, Z.; Hu, A.; Zhang, X. A Research on Extracting Road Network from High Resolution Remote Sensing Imagery. In Proceedings of the 2018 26th International Conference on Geoinformatics, Kunming, China, 28–30 June 2018; pp. 1–4. [Google Scholar]
Fan, J.; Yang, Z. Deep Residual Network Based Road Detection Algorithm for Remote Sensing Images. In Proceedings of the 2020 5th International Conference on Mechanical, Control and Computer Engineering (ICMCCE), Harbin, China, 25–27 December 2020; pp. 1723–1726. [Google Scholar]
Shao, C.; Li, H.; Shen, H. MCTN-Net: A Multiclass Transportation Network Extraction Method Combining Orientation and Semantic Features. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5. [Google Scholar] [CrossRef]
Wang, S.; Wang, Z.; Ruan, S.; Han, H.; Xiong, K.; Yuan, H.; Yuan, Z.; Li, G.; Bao, J.; Zheng, Y. DelvMap: Completing Residential Roads in Maps Based on Couriers’ Trajectories and Satellite Imagery. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–14. [Google Scholar] [CrossRef]
Liu, D.; Zhang, J.; Liu, K.; Zhang, Y. Aerial Remote Sensing Image Cascaded Road Detection Network Based on Edge Sensing Module and Attention Module. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar]
Lin, Y.; Jin, F.; Wang, D.; Wang, S.; Liu, X. Dual-Task Network for Road Extraction From High-Resolution Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 66–78. [Google Scholar] [CrossRef]
Liu, D.; Zhang, J.; Qi, Y.; Zhang, Y. A Lightweight Road Detection Algorithm Based on Multiscale Convolutional Attention Network and Coupled Decoder Head. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
Guo, H.; Su, X.; Wu, C.; Du, B.; Zhang, L. Building-Road Collaborative Extraction From Remote Sensing Images via Cross-Task and Cross-Scale Interaction. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–16. [Google Scholar] [CrossRef]
Yu, Y.; Wang, J.; Guan, H.; Jin, S.; Zhang, Y.; Yu, C.; Tang, E.; Xiao, S.; Li, J. CS-CapsFPN: A Context-Augmentation and Self-Attention Capsule Feature Pyramid Network for Road Network Extraction from Remote Sensing Imagery. Can. J. Remote Sens. 2021, 47, 499–517. [Google Scholar] [CrossRef]
Wulamu, A.; Shi, Z.; Zhang, D.; He, Z. Multiscale Road Extraction in Remote Sensing Images. Comput. Intell. Neurosci. 2019, 2019, 2373798. [Google Scholar] [CrossRef] [PubMed]
Zhang, J.; Hu, Q.; Li, J.; Ai, M. Learning From GPS Trajectories of Floating Car for CNN-Based Urban Road Extraction With High-Resolution Satellite Imagery. IEEE Trans. Geosci. Remote Sens. 2021, 59, 1836–1847. [Google Scholar] [CrossRef]
Li, S.; Liao, C.; Ding, Y.; Hu, H.; Jia, Y.; Chen, M.; Xu, B.; Ge, X.; Liu, T.; Wu, D. Cascaded residual attention enhanced road extraction from remote sensing images. ISPRS Int. J. Geo-Inf. 2022, 11, 9. [Google Scholar] [CrossRef]
Weng, Y.; Huang, X.; Chen, X.; He, J.; Li, Z.; Yi, H. Research on Railway Track Extraction Method Based on Edge Detection and Attention Mechanism. IEEE Access 2024, 12, 26550–26561. [Google Scholar] [CrossRef]
Huan, H.; Sheng, Y.; Zhang, Y.; Liu, Y. Strip attention networks for road extraction. Remote Sens. 2022, 14, 4516. [Google Scholar] [CrossRef]
Lourenço, M.; Estima, D.; Oliveira, H.; Oliveira, L.; Mora, A. Automatic rural road centerline detection and extraction from aerial images for a forest fire decision support system. Remote Sens. 2023, 15, 271. [Google Scholar] [CrossRef]
Xu, Q.; Long, C.; Yu, L.; Zhang, C. Road Extraction With Satellite Images and Partial Road Maps. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–14. [Google Scholar] [CrossRef]
Shi, Q.; Liu, X.; Li, X. Road Detection From Remote Sensing Images by Generative Adversarial Networks. IEEE Access 2018, 6, 25486–25494. [Google Scholar] [CrossRef]
Zhang, X.; Han, X.; Li, C.; Tang, X.; Zhou, H.; Jiao, L. Aerial image road extraction based on an improved generative adversarial network. Remote Sens. 2019, 11, 930. [Google Scholar] [CrossRef]
Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning. PMLR, Sydney, Australia, 6–11 August 2017; pp. 214–223. [Google Scholar]
Cira, C.I.; Manso-Callejo, M.Á.; Alcarria, R.; Fernandez Pareja, T.; Bordel Sanchez, B.; Serradilla, F. Generative learning for postprocessing semantic segmentation predictions: A lightweight conditional generative adversarial network based on Pix2pix to improve the extraction of road surface areas. Land 2021, 10, 79. [Google Scholar] [CrossRef]
Tao, Y.; Xu, M.; Zhong, Y.; Cheng, Y. GAN-assisted two-stream neural network for high-resolution remote sensing image classification. Remote Sens. 2017, 9, 1328. [Google Scholar] [CrossRef]
Costea, D.; Marcu, A.; Slusanschi, E.; Leordeanu, M. Creating roadmaps in aerial images with generative adversarial networks and smoothing-based optimization. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 2100–2109. [Google Scholar]
Liu, R.; Li, F.; Jiang, W.; Song, C.; Chen, Q.; Li, Z. Generating Pixel Enhancement for Road Extraction in High-Resolution Aerial Images. IEEE Trans. Intell. Veh. 2024, 1–13. [Google Scholar] [CrossRef]
Li, Y.; Peng, B.; He, L.; Fan, K.; Tong, L. Road Segmentation of Unmanned Aerial Vehicle Remote Sensing Images Using Adversarial Network With Multiscale Context Aggregation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 2279–2287. [Google Scholar] [CrossRef]
Lin, S.; Yao, X.; Liu, X.; Wang, S.; Chen, H.M.; Ding, L.; Zhang, J.; Chen, G.; Mei, Q. MS-AGAN: Road Extraction via Multi-Scale Information Fusion and Asymmetric Generative Adversarial Networks from High-Resolution Remote Sensing Images under Complex Backgrounds. Remote Sens. 2023, 15, 3367. [Google Scholar] [CrossRef]
Shamsolmoali, P.; Zareapoor, M.; Zhou, H.; Wang, R.; Yang, J. Road Segmentation for Remote Sensing Images Using Adversarial Spatial Pyramid Networks. IEEE Trans. Geosci. Remote Sens. 2021, 59, 4673–4688. [Google Scholar] [CrossRef]
Xu, J.; Xu, B.; Xia, G.S.; Dong, L.; Xue, N. Patched Line Segment Learning for Vector Road Mapping. Proc. AAAI Conf. Artif. Intell. 2024, 38, 6288–6296. [Google Scholar] [CrossRef]
Xu, Z.; Liu, Y.; Gan, L.; Hu, X.; Sun, Y.; Liu, M.; Wang, L. csBoundary: City-Scale Road-Boundary Detection in Aerial Images for High-Definition Maps. IEEE Robot. Autom. Lett. 2022, 7, 5063–5070. [Google Scholar] [CrossRef]
Zao, Y.; Zou, Z.; Shi, Z. Road Graph Extraction via Transformer and Topological Representation. IEEE Geosci. Remote Sens. Lett. 2024, 21, 2502205. [Google Scholar] [CrossRef]
Li, T.; Ye, S.; Li, R.; Fu, Y.; Yang, G.; Pan, Z. Topology-aware Road Extraction via Multi-task Learning for Autonomous Driving. In Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 24–28 September 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 2275–2281. [Google Scholar]
Wu, Z.; Zhang, J.; Zhang, L.; Liu, X.; Qiao, H. Bi-HRNet: A road extraction framework from satellite imagery based on node heatmap and bidirectional connectivity. Remote Sens. 2022, 14, 1732. [Google Scholar] [CrossRef]
Chen, X.; Sun, Q.; Guo, W.; Qiu, C.; Yu, A. GA-Net: A geometry prior assisted neural network for road extraction. Int. J. Appl. Earth Obs. Geoinf. 2022, 114, 103004. [Google Scholar] [CrossRef]
Zhang, J.; Hu, X.; Wei, Y.; Zhang, L. Road Topology Extraction From Satellite Imagery by Joint Learning of Nodes and Their Connectivity. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13. [Google Scholar] [CrossRef]
Liu, G.; Shan, Z.; Meng, Y.; Akbar, T.A.; Ye, S. RDPGNet: A road extraction network with dual-view information perception based on GCN. J. King Saud Univ.-Comput. Inf. Sci. 2024, 36, 102009. [Google Scholar] [CrossRef]
Zhou, G.; Chen, W.; Gui, Q.; Li, X.; Wang, L. Split Depth-Wise Separable Graph-Convolution Network for Road Extraction in Complex Environments From High-Resolution Remote-Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15. [Google Scholar] [CrossRef]
Bastani, F.; He, S.; Abbar, S.; Alizadeh, M.; Balakrishnan, H.; Chawla, S.; Madden, S.; DeWitt, D. RoadTracer: Automatic Extraction of Road Networks from Aerial Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4720–4728. [Google Scholar]
Xu, Z.; Liu, Y.; Gan, L.; Sun, Y.; Wu, X.; Liu, M.; Wang, L. Rngdet: Road network graph detection by transformer in aerial images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–12. [Google Scholar] [CrossRef]
Xu, Z.; Liu, Y.; Sun, Y.; Liu, M.; Wang, L. RNGDet++: Road Network Graph Detection by Transformer With Instance Segmentation and Multi-Scale Features Enhancement. IEEE Robot. Autom. Lett. 2023, 8, 2991–2998. [Google Scholar] [CrossRef]
Cheng, M.; Zhao, K.; Guo, X.; Xu, Y.; Guo, J. Joint Topology-Preserving and Feature-Refinement Network for Curvilinear Structure Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 7147–7156. [Google Scholar]
Acuna, D.; Ling, H.; Kar, A.; Fidler, S. Efficient interactive annotation of segmentation datasets with polygon-rnn++. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 859–868. [Google Scholar]
Yang, B.; Zhang, M.; Zhang, Z.; Zhang, Z.; Hu, X. TopDiG: Class-Agnostic Topological Directional Graph Extraction From Remote Sensing Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 1265–1274. [Google Scholar]
Wang, R.; Cai, M.; Xia, Z.; Zhou, Z. Remote Sensing Image Road Segmentation Method Integrating CNN-Transformer and UNet. IEEE Access 2023, 11, 144446–144455. [Google Scholar] [CrossRef]
Li, K.; Wang, D.; Wang, X.; Liu, G.; Wu, Z.; Wang, Q. Mixing Self-Attention and Convolution: A Unified Framework for Multisource Remote Sensing Data Classification. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–16. [Google Scholar] [CrossRef]
Jamali, A.; Roy, S.K.; Li, J.; Ghamisi, P. Neighborhood Attention Makes the Encoder of ResUNet Stronger for Accurate Road Extraction. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1–5. [Google Scholar] [CrossRef]
Luo, L.; Wang, J.X.; Chen, S.B.; Tang, J.; Luo, B. BDTNet: Road Extraction by Bi-Direction Transformer From Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar]
Hu, P.C.; Chen, S.B.; Huang, L.L.; Wang, G.Z.; Tang, J.; Luo, B. Road Extraction by Multiscale Deformable Transformer From Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar]
Wang, C.; Xu, R.; Xu, S.; Meng, W.; Wang, R.; Zhang, J.; Zhang, X. Toward Accurate and Efficient Road Extraction by Leveraging the Characteristics of Road Shapes. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–16. [Google Scholar] [CrossRef]
Deng, F.; Luo, W.; Ni, Y.; Wang, X.; Wang, Y.; Zhang, G. UMiT-Net: A U-Shaped Mix-Transformer Network for Extracting Precise Roads Using Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13. [Google Scholar] [CrossRef]
Ge, C.; Nie, Y.; Kong, F.; Xu, X. Improving Road Extraction for Autonomous Driving Using Swin Transformer Unet. In Proceedings of the 2022 IEEE 25th International Conference on Intelligent Transportation Systems (ITSC), Macau, China, 8–12 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1216–1221. [Google Scholar]
Yang, Z.; Zhou, D.; Yang, Y.; Zhang, J.; Chen, Z. TransRoadNet: A Novel Road Extraction Method for Remote Sensing Images via Combining High-Level Semantic Feature and Context. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar]
Zhang, X.; Ma, X.; Yang, Z.; Liu, X.; Chen, Z. A Context-Aware Road Extraction Method for Remote Sensing Imagery Based on Transformer Network. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar]
Yang, Z.X.; You, Z.H.; Chen, S.B.; Tang, J.; Luo, B. Semisupervised Edge-Aware Road Extraction via Cross Teaching Between CNN and Transformer. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 8353–8362. [Google Scholar] [CrossRef]
Liu, Y.; Tian, Y.; Zhao, Y.; Yu, H.; Xie, L.; Wang, Y.; Ye, Q.; Liu, Y. Vmamba: Visual state space model. arXiv 2024, arXiv:2401.10166. [Google Scholar]
Chen, K.; Chen, B.; Liu, C.; Li, W.; Zou, Z.; Shi, Z. Rsmamba: Remote sensing image classification with state space model. arXiv 2024, arXiv:2403.19654. [Google Scholar] [CrossRef]
Zhao, S.; Chen, H.; Zhang, X.; Xiao, P.; Bai, L.; Ouyang, W. Rs-mamba for large remote sensing image dense prediction. arXiv 2024, arXiv:2404.02668. [Google Scholar]
Van Engelen, J.E.; Hoos, H.H. A survey on semi-supervised learning. Mach. Learn. 2020, 109, 373–440. [Google Scholar] [CrossRef]
Xia, W.; Zhang, Y.Z.; Liu, J.; Luo, L.; Yang, K. Road extraction from high resolution image with deep convolution network-A case study of GF-2 image. Proceedings 2018, 2, 325. [Google Scholar]
He, Y.; Wang, J.; Liao, C.; Shan, B.; Zhou, X. ClassHyPer: ClassMix-based hybrid perturbations for deep semi-supervised semantic segmentation of remote sensing imagery. Remote Sens. 2022, 14, 879. [Google Scholar] [CrossRef]
Han, X.; Lu, J.; Zhao, C.; You, S.; Li, H. Semisupervised and Weakly Supervised Road Detection Based on Generative Adversarial Networks. IEEE Signal Process. Lett. 2018, 25, 551–555. [Google Scholar] [CrossRef]
Chen, H.; Li, Z.; Wu, J.; Xiong, W.; Du, C. SemiRoadExNet: A semi-supervised network for road extraction from remote sensing imagery via adversarial learning. ISPRS J. Photogramm. Remote Sens. 2023, 198, 169–183. [Google Scholar] [CrossRef]
Xiao, R.; Wang, Y.; Tao, C. Fine-Grained Road Scene Understanding From Aerial Images Based on Semisupervised Semantic Segmentation Networks. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
You, Z.H.; Wang, J.X.; Chen, S.B.; Tang, J.; Luo, B. FMWDCT: Foreground Mixup Into Weighted Dual-Network Cross Training for Semisupervised Remote Sensing Road Extraction. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 5570–5579. [Google Scholar] [CrossRef]
Deng, X.; Yang, H.L.; Makkar, N.; Lunga, D. Large Scale Unsupervised Domain Adaptation of Segmentation Networks with Adversarial Learning. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 4955–4958. [Google Scholar]
Wang, J.; HQ Ding, C.; Chen, S.; He, C.; Luo, B. Semi-supervised remote sensing image semantic segmentation via consistency regularization and average update of pseudo-label. Remote Sens. 2020, 12, 3603. [Google Scholar] [CrossRef]
Wang, J.X.; Chen, S.B.; Ding, C.H.Q.; Tang, J.; Luo, B. RanPaste: Paste Consistency and Pseudo Label for Semisupervised Remote Sensing Image Semantic Segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–16. [Google Scholar] [CrossRef]
Chen, H.; Peng, S.; Du, C.; Li, J.; Wu, S. SW-GAN: Road extraction from remote sensing imagery using semi-weakly supervised adversarial learning. Remote Sens. 2022, 14, 4145. [Google Scholar] [CrossRef]
Meng, S.; Di, Z.; Yang, S.; Wang, Y. Large-scale Weakly Supervised Learning for Road Extraction from Satellite Imagery. arXiv 2023, arXiv:2309.07823. [Google Scholar]
Lian, R.; Huang, L. Weakly Supervised Road Segmentation in High-Resolution Remote Sensing Images Using Point Annotations. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13. [Google Scholar] [CrossRef]
Hua, Y.; Marcos, D.; Mou, L.; Zhu, X.X.; Tuia, D. Semantic Segmentation of Remote Sensing Images With Sparse Annotations. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar]
Wei, Y.; Ji, S. Scribble-Based Weakly Supervised Deep Learning for Road Surface Extraction From Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–12. [Google Scholar] [CrossRef]
Zhou, M.; Sui, H.; Chen, S.; Liu, J.; Shi, W.; Chen, X. Large-scale road extraction from high-resolution remote sensing images based on a weakly-supervised structural and orientational consistency constraint network. ISPRS J. Photogramm. Remote Sens. 2022, 193, 234–251. [Google Scholar] [CrossRef]
Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 4015–4026. [Google Scholar]
Osco, L.P.; Wu, Q.; de Lemos, E.L.; Gonçalves, W.N.; Ramos, A.P.M.; Li, J.; Junior, J.M. The segment anything model (sam) for remote sensing applications: From zero to one shot. Int. J. Appl. Earth Obs. Geoinf. 2023, 124, 103540. [Google Scholar] [CrossRef]
Ma, X.; Wu, Q.; Zhao, X.; Zhang, X.; Pun, M.O.; Huang, B. SAM-Assisted Remote Sensing Imagery Semantic Segmentation with Object and Boundary Constraints. arXiv 2023, arXiv:2312.02464. [Google Scholar]
Guo, C.; Mita, S.; McAllester, D. Robust road detection and tracking in challenging scenarios based on Markov random fields with unsupervised learning. IEEE Trans. Intell. Transp. Syst. 2012, 13, 1338–1354. [Google Scholar] [CrossRef]
Assran, M.; Duval, Q.; Misra, I.; Bojanowski, P.; Vincent, P.; Rabbat, M.; LeCun, Y.; Ballas, N. Self-supervised learning from images with a joint-embedding predictive architecture. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 15619–15629. [Google Scholar]
Zhang, L.; Lan, M.; Zhang, J.; Tao, D. Stagewise Unsupervised Domain Adaptation With Adversarial Self-Training for Road Segmentation of Remote-Sensing Images. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13. [Google Scholar]
Cira, C.I.; Kada, M.; Manso-Callejo, M.Á.; Alcarria, R.; Bordel Sanchez, B. Improving road surface area extraction via semantic segmentation with conditional generative learning for deep inpainting operations. ISPRS Int. J. Geo-Inf. 2022, 11, 43. [Google Scholar] [CrossRef]
Han, L.; Hou, L.; Zheng, X.; Ding, Z.; Yang, H.; Zheng, K. Segmentation Is Not the End of Road Extraction: An All-Visible Denoising Autoencoder for Connected and Smooth Road Reconstruction. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–18. [Google Scholar] [CrossRef]
Cha, K.; Seo, J.; Lee, T. A billion-scale foundation model for remote sensing images. arXiv 2023, arXiv:2304.05215. [Google Scholar] [CrossRef]
Yan, Z.; Li, J.; Li, X.; Zhou, R.; Zhang, W.; Feng, Y.; Diao, W.; Fu, K.; Sun, X. RingMo-SAM: A Foundation Model for Segment Anything in Multimodal Remote-Sensing Images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–16. [Google Scholar] [CrossRef]
Sun, X.; Wang, P.; Lu, W.; Zhu, Z.; Lu, X.; He, Q.; Li, J.; Rong, X.; Yang, Z.; Chang, H.; et al. RingMo: A Remote Sensing Foundation Model With Masked Image Modeling. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–22. [Google Scholar] [CrossRef]
Xu, Z.; Sun, Y.; Liu, M. Topo-boundary: A benchmark dataset on topological road-boundary detection using aerial images for autonomous driving. IEEE Robot. Autom. Lett. 2021, 6, 7248–7255. [Google Scholar] [CrossRef]
Mnih, V. Machine Learning for Aerial Image Labeling; University of Toronto: Toronto, ON, Canada, 2013. [Google Scholar]
Mattyus, G.; Wang, S.; Fidler, S.; Urtasun, R. Enhancing road maps by parsing aerial images around the world. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1689–1697. [Google Scholar]
Kaiser, P.; Wegner, J.D.; Lucchi, A.; Jaggi, M.; Hofmann, T.; Schindler, K. Learning Aerial Image Segmentation From Online Maps. IEEE Trans. Geosci. Remote Sens. 2017, 55, 6054–6068. [Google Scholar] [CrossRef]
Demir, I.; Koperski, K.; Lindenbaum, D.; Pang, G.; Huang, J.; Basu, S.; Hughes, F.; Tuia, D.; Raskar, R. DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 172–181. [Google Scholar]
Van Etten, A.; Lindenbaum, D.; Bacastow, T.M. Spacenet: A remote sensing dataset and challenge series. arXiv 2018, arXiv:1807.01232. [Google Scholar]
Chen, Z.; Wang, C.; Li, J.; Xie, N.; Han, Y.; Du, J. Reconstruction Bias U-Net for Road Extraction From Optical Remote Sensing Images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2284–2294. [Google Scholar] [CrossRef]

Figure 1. Classification of road extraction approaches based on deep learning.

Figure 2. The organization of this review.

Figure 3. Methodology roadmap in the literature [24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115] from 2012 to 2024.

Figure 4. The proportion of literature sources in this review.

Figure 5. The general architecture of a patch-based CNN model.

Figure 6. The general architecture of an Encoder–Decoder model.

Figure 7. The general architecture of the GAN model.

Figure 8. The general architecture of the Graph model.

Figure 9. The general architecture of a neural network-fused Transformer model.

Figure 10. Schematic illustrations of representative samples from the partial dataset. Columns (a,b) are relatively simple graphs, columns (c,d) are more complex graphs.

Table 1. The quantity of relevant literature advantages and limitations of fully supervised models.

Approaches	Literature Quantity	Advantages	Limitations
PatchCNN-based	9	Local feature extraction; Handling images of varying sizes; Memory consumption reduction	Loss of contextual information; Boundary effects; High computational cost
Encoder-Decoder-based	101	Efficient information propagation; Effective utilization of contextual information; Strong interpretability	High risk of information loss; High computational cost; High model complexity
GAN-based	14	Realistic generated images; No need large amnotated datas; Learn data distribution	Training and convergence instability; High resource demand
Graph-based	22	Global information utilization; Improve road topological connectivity and integrity	Complex training; Time consuming
Transformer-based	14	Capture long-distance dependencies and context information; Improve model interpretability	Require large parameters; Strong data dependences
Mamba-based	5	Linear time complexity; More efficient across long series; Cross-scanningmodule	Direction sensitive; Complex model design

Table 2. Datasets for road extraction from remote sensing images.

Dataset	Size	Train	Test	Val	Resolution (m/pixel)	From	Time
Massachusetts [227]	$1500 \times 1500$	1108	49	14	1	Massachusetts	2013
AerialKITTI-Bavaria [228]	$512 \times 512$	360	100	-	-	AerialKITTI and Bavaria	2015
CNDS [63]	$1600 \times 1600$	180	30	14	1.2	Google Earth	2017
CITY-OSM [229]	$2611 \times 2453$	660	165	-	-	-	2017
RTDS [179]	$4096 \times 4096$	25 cities	15 cites	-	0.6	GoogleMap	2018
DeepGlobe [230]	$1024 \times 1024$	6226	1101	243	0.5	Thailand, Indonesia and India	2018
SpaceNet v3 [231]	$1300 \times 1300$	2213	567	-	0.3	Paris, Las Vegas, Shanghai, and Khartoum	2018
Conghua [136]	$6000 \times 6000$	37	10	-	0.2	-	2019
RNBD [37]	-	14 regions	6 regions	1 region	0.21	Google Earth	2019
WorldView-4 [73]	$512 \times 512$	6736	1012	-	0.31	-	2019
CityScale [114]	$2048 \times 2048$	144	27	9	1	OSM	2020
Gaofen-2 [94]	$512 \times 512$	36,000	4000	-	0.8	Fujian and Hainan	2020
ShaoShan [121]	$1589 \times 1131$	29	20	-	0.5	Shaoshan	2020
GE-Road [115]	$800 \times 800$	12,000	7000	1000	0.3–0.6	-	2020
CHN6-CUG [42]	$512 \times 512$	3608	903	-	0.5	Google Earth	2021
LRSNY [232]	$1000 \times 1000$	716	432	220	0.5	New York	2021
Topo-boundary [226]	$1000 \times 1000$	20,236	3289	1770	-	-	2021
Icurb [101]	$1000 \times 1000$	29,000	10,000	1000	0.152	NYC OpenData	2021

Table 3. Comparison of fully supervised, semi-supervised, and unsupervised learning approaches.

Type	Literature Quantity	Annotation Requirements	Advantages	Limitations
Fully-supervised	165	Complete annotated data	High level of precision	Equire a large amount of annotated data; High cost; Weak ability of model generalization
Semi-supervised	28	Limited labeled data and abundant unlabeled data	Cost-saving annotation process; Can still train more accurate models in situations where annotation information is difficult to obtain; More suitable for practical scenarios	Inaccurate model learning information increases the complexity of design and tuning
Unsupervised	11	No need for annotated data		Lack of interpretability of results; Difficulty in mastering model performance

Table 4. Comparative performance of representative models on the Deepglobe dataset.

Model	Precision	Recall	F1 (↑)	IoU	mIoU	APLS
Tiny-AAResUNet [53]	86.96	93.96	92.35	95.89	–	–
CoANet [51]	–	–	89.25	80.58	–	85.14
NodeConnect [176]	88.34	88.38	88.36	82.58	–	70.81
Bi-HRNet [174]	88.78	84.39	86.51	–	–	54.78
RoadCT [104]	85.20	83.90	84.50	73.40	–	–
Topology_aware [173]	85.55	82.43	83.96	72.36	–	77.24
AD-RoadNet [55]	84.37	82.11	83.22	71.28	83.04	–
MDTNet [189]	82.96	83.01	82.98	71.19	–	78.42
UMiT-Net [191]	82.64	84.09	82.61	71.66	–	–
RSANet [190]	78.84	86.33	82.42	70.26	–	–
Unet+ATM+MLAF [105]	80.88	85.41	82.13	70.91	–	–
Dual-Task Network [149]	82.50	81.77	82.13	69.68	–	–
TransRoadNet [193]	80.92	83.91	81.34	70.06	–	–
DFC-UNet [141]	83.62	79.14	81.32	68.52	–	–
RCFSNet [43]	78.98	85.46	81.01	69.34	–	–
LGNet [86]	91.17	87.35	80.54	68.29	–	72.69

Table 5. Comparative performance of representative models on the Massachusetts dataset.

Model	Accuracy	Precision	Recall	F1 (↑)	IoU
PropGAN [95]	–	91.54	92.92	92.20	87.43
Road-RCF [46]	96.30	85.80	98.50	91.50	–
Tiny-AAResUNet [53]	–	92.23	81.56	91.07	94.27
Mixer UNet [57]	–	89.42	87.02	88.04	–
LRSR-net [56]	–	89.67	89.20	87.48	82.64
ImproGAN [173]	98.00	93.00	82.00	87.00	–
MsGAN [96]	–	85.30	87.10	86.20	–
Nested SE-Deeplab [92]	96.70	85.80	–	85.70	73.87
NL-DlinkNet [79]	–	85.20	81.80	83.40	–
RDRCNN [149]	98.01	85.35	75.75	80.31	67.10
MDTNet [189]	98.06	81.07	79.54	80.30	67.30
DCANet [138]	98.09	80.20	79.54	79.84	66.45
DDU-Net [54]	98.04	82.54	73.99	78.03	63.98
RUW-Net [87]	–	87.70	68.10	76.70	69.10
Modified UNet [134]	97.14	74.15	75.48	74.54	–
RoadCT [104]	–	81.20	68.90	74.50	59.50
DenseUNet [136]	–	78.25	70.41	74.07	74.47

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liu, R.; Wu, J.; Lu, W.; Miao, Q.; Zhang, H.; Liu, X.; Lu, Z.; Li, L. A Review of Deep Learning-Based Methods for Road Extraction from High-Resolution Remote Sensing Images. Remote Sens. 2024, 16, 2056. https://doi.org/10.3390/rs16122056

AMA Style

Liu R, Wu J, Lu W, Miao Q, Zhang H, Liu X, Lu Z, Li L. A Review of Deep Learning-Based Methods for Road Extraction from High-Resolution Remote Sensing Images. Remote Sensing. 2024; 16(12):2056. https://doi.org/10.3390/rs16122056

Chicago/Turabian Style

Liu, Ruyi, Junhong Wu, Wenyi Lu, Qiguang Miao, Huan Zhang, Xiangzeng Liu, Zixiang Lu, and Long Li. 2024. "A Review of Deep Learning-Based Methods for Road Extraction from High-Resolution Remote Sensing Images" Remote Sensing 16, no. 12: 2056. https://doi.org/10.3390/rs16122056

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Review of Deep Learning-Based Methods for Road Extraction from High-Resolution Remote Sensing Images

Abstract

1. Introduction

2. Background

3. Fully Supervised Methods for Road Extraction

3.1. Methods Based on Patch-CNNs

3.2. Methods Based on Encoder–Decoder

3.2.1. Methods Based on FCNs

3.2.2. Methods Based on UNet

3.2.3. Methods Based on FPNs

3.2.4. Methods Based on SegNet

3.2.5. Methods Based on LinkNet

3.2.6. Methods Based on DeepLab

3.3. Methods Based on GAN

3.4. Methods Based on Graph

3.4.1. Methods Based on Graph Representation

3.4.2. Methods Based on Iterative Detection

3.4.3. Methods Based on Polygon Boundary

3.5. Methods Based on Transformer

3.6. Methods Based on Mamba

3.7. Comparison of Six Models Based on Fully-Supervised Learning Methods

4. Semi-Supervised Methods for Road Extraction

4.1. Methods Based on Less Labeled Data

4.2. Methods Based on Weak Labeled Data

5. Unsupervised Methods for Road Extraction

5.1. Methods Based on Models with Fewer Parameters

5.2. Methods Based on Large Remote Sensing Models

6. Metrics

6.1. Accuracy

6.2. Precision

6.3. Recall

6.4. F1 Score

6.5. IoU

6.6. mIoU

6.7. APLS

6.8. ECM

6.9. CC

7. Datasets

7.1. Massachusetts

7.2. DeepGlobe

7.3. SpaceNet

7.4. CHN6-CUG

8. Discussion

9. Conclusions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI