1. Introduction
The road systems connecting villages, cities, and countries are a pivotal transportation infrastructure in modern society [1], and road maps are widely used in navigation, intelligent transportation, location-based services, emergency rescue, and urban design [2]. Road extraction from remotely sensed imagery is one of the earliest remote sensing applications in the traffic industry. In medium- to low-resolution satellite imagery, a road usually appears as a linear feature, whereas in high-resolution imagery its centerline is extracted [1,2]. With the increasing availability of high-resolution remote sensing data, roads are no longer merely extracted as linear features; the imagery can also be used to evaluate the health condition of road pavements [3,4]. In the fields of computer vision and autonomous driving, the focus is mainly on the recognition and extraction of targets such as cracks, curbs, pedestrians, and cars [5]. In addition to conventional shallow machine learning and mathematical morphology methods, deep neural networks have been widely applied to road pavement distress detection and road target extraction in recent years [6,7].
The main remotely sensed data used in these studies are high-resolution RGB images captured by vehicle-mounted and handheld cameras, as in the DeepCrack [8] and RDD2022 [9] datasets. The methods for assessing pavement aging and distress conditions can be categorized into three types: image classification [10,11], object detection [5,6], and semantic segmentation [7]. A pavement management system (PMS) often relies on vehicle-mounted sensors, including CCD cameras and LiDAR, as well as ground-penetrating radar (GPR) and thermal infrared sensors. At the same time, many researchers in the field of remote sensing have attempted to use sub-meter satellite imagery for pavement aging assessment [4,11] and have applied RGB, multispectral, and hyperspectral data captured by unmanned aerial vehicles (UAVs) to road distress detection and semantic segmentation [3,12]. In addition, street view images from navigation services are used for road distress identification in urban areas [3].
From this point of view, remote sensing data of different resolutions obtained by spaceborne, UAV, and terrestrial systems offer new possibilities for road aging and distress assessment. An emerging research direction is how to integrate remote sensing data from multiple modalities to enhance the sensing capability for pavement health conditions. For example, cracks may be difficult to distinguish from gasoline stains, shadows, etc., in RGB and multispectral images, but can be easily differentiated if high-resolution thermal infrared images are acquired simultaneously [13]. Multimodal remote sensing applications also call for new deep learning models that fully utilize the spatial, spectral, depth, and thermal characteristics of road pavement distresses, yielding deep neural networks with strong generalization ability and more reliable technologies for road maintenance. Remote sensing technology has undeniably become a new tool for assessing road pavement health conditions, and it is worthy of further study. Therefore, we compiled a Special Issue for the journal Remote Sensing in 2022: “Road Extraction and Distress Assessment by Spaceborne, Airborne, and Territorial Platforms”, which received contributions from several scholars. The 12 papers published in this Special Issue are briefly introduced in the following section.
2. An Overview of Published Articles
A first analysis of the contributions shows that the majority of them utilize imagery from ground-based vehicle systems, developing both sensors and algorithms to extract pavement damage in either fully automatic or semi-automatic modes [2,3,5,7,9]. A growing number of papers, on the other hand, employ ground-penetrating radar (GPR) techniques either alone [8,10,11] or in combination with optical techniques [12]. Surveys from unmanned aerial vehicles (UAVs) are also beginning to proliferate [4,6], while the use of true remote sensing techniques appears to be more limited [1]. This may reflect the fact that satellite and UAV data are not yet widely integrated into the technical requirements of road management procedures. Nonetheless, these technologies could be efficient and promising, especially when combined with AI techniques, for examining large road networks and extracting valuable parameters to establish intervention priorities or set up preventive maintenance programs.
In Liu et al. (contribution 1), the authors introduce a lightweight dynamic addition network (LDANet) tailored for rural road extraction. To address the unique characteristics of rural roads (narrowness, complexity, and diversity), they propose an enhanced Asymmetric Convolution Block (ACB)-based Inception structure to augment low-level features in the feature extraction layer. In the deep feature association module, they leverage depth-wise separable convolution (DSC) to reduce computational complexity and design an adaptation-weighted overlay to capture salient features effectively. Additionally, they curate a rural road dataset based on the DeepGlobe Land Cover Classification Challenge dataset. Overall, LDANet shows promise for the rapid extraction and monitoring of rural roads from remote sensing imagery.
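The saving that depth-wise separable convolution offers can be illustrated by counting weights. The following is a minimal sketch, not code from LDANet: the function names and the layer sizes (3 x 3 kernel, 64 input channels, 128 output channels) are assumptions chosen only for illustration, and bias terms are ignored.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Number of weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Number of weights in a depth-wise k x k convolution followed by a
    1 x 1 point-wise convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


# Illustrative layer sizes only; they are not taken from LDANet.
std = standard_conv_params(3, 64, 128)
dsc = depthwise_separable_params(3, 64, 128)
ratio = dsc / std  # equals 1/c_out + 1/k**2

print(f"standard: {std}, separable: {dsc}, ratio: {ratio:.3f}")
```

For these assumed sizes, the separable layer needs roughly one-eighth of the weights of the standard layer, which is why the technique is common in lightweight networks.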
The article by Song et al. (contribution 2) presents the creation of the ISTD-PDS7 dataset, the first of its kind aimed at multi-type pavement distress segmentation. This dataset comprises natural charge-coupled device (CCD) images and encompasses seven types of pavement distress across nine different scenarios, including negative samples with texture similarity noise, resulting in a total of 18,527 annotated images, surpassing previous benchmarks in scale. Additionally, the authors explore the efficacy of negative samples in mitigating false positive predictions in complex scenes and propose two potential data augmentation methods to enhance segmentation accuracy. The authors expect these efforts to catalyze advancements in both academic research and industrial applications within the field.
In Inácio et al. (contribution 3), the authors introduce a straightforward system aimed at expediting road pavement surface inspection and analysis to facilitate maintenance decision-making. Leveraging a low-cost video camera mounted on a vehicle, pavement imagery was captured and processed through an automatic crack detection and classification system based on deep neural networks. The system offers a cracking percentage per road segment, alerting experts to areas requiring attention, as well as a segmentation map highlighting cracked areas on the road pavement surface. The system seems to exhibit promising performance in highway pavement analysis, and its automation and low processing time make it a valuable tool for experts engaged in road pavement maintenance activities.
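A cracking percentage per road segment can be derived from a binary segmentation map in a few lines. The sketch below is only an illustration of that idea, not the authors' system: the function names, the toy masks, the segment names, and the 10% alert threshold are all hypothetical.

```python
def cracking_percentage(mask):
    """Return the percentage of pixels labeled as crack (1) in a binary mask
    given as a list of rows."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    cracked = sum(sum(row) for row in mask)
    return 100.0 * cracked / total


def flag_segments(segment_masks, threshold_pct):
    """Return the names of segments whose cracking percentage exceeds the threshold."""
    return [
        name
        for name, mask in segment_masks.items()
        if cracking_percentage(mask) > threshold_pct
    ]


# Toy 2 x 4 masks; segment names and the 10% threshold are hypothetical.
segments = {
    "segment_A": [[0, 0, 0, 0], [0, 1, 1, 0]],
    "segment_B": [[0, 0, 0, 0], [0, 0, 0, 0]],
}
percentages = {name: cracking_percentage(m) for name, m in segments.items()}
flagged = flag_segments(segments, threshold_pct=10.0)

print(percentages, flagged)
```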
In contribution 4, the authors present a methodology for real-time road extraction and condition detection using video footage captured by UAV multispectral cameras or pre-downloaded multispectral images from satellites. The primary objective is to detect road conditions and identify emergencies to provide timely assistance to individuals in the wild. By leveraging the normalized difference vegetation index (NDVI), the UAV effectively distinguishes between bare soil roads and gravel roads, enhancing the accuracy of route planning data. In the context of low-altitude human–machine interaction, the authors utilized MediaPipe hand landmarks and machine learning techniques to develop a dataset comprising four fundamental hand gestures for dynamic gesture recognition. The experimental results demonstrate that the model achieves very high accuracy on the testing set. Through this proof-of-concept study, the authors claim that the described approach fulfills the expected tasks of UAV rescue and route planning effectively.
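For reference, NDVI is computed per pixel from the near-infrared (NIR) and red reflectances as (NIR - Red) / (NIR + Red). The sketch below shows only this standard definition; the reflectance values are invented for illustration, and no class thresholds are implied because the summary above does not state the ones used in the contribution.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index for one pixel.

    Returns 0.0 when both reflectances are zero to avoid division by zero.
    """
    denominator = nir + red
    if denominator == 0:
        return 0.0
    return (nir - red) / denominator


# Illustrative reflectance pairs (NIR, Red); not measured data.
pixels = [(0.5, 0.1), (0.3, 0.3), (0.2, 0.25)]
values = [ndvi(nir, red) for nir, red in pixels]

print([round(v, 3) for v in values])
```

The index always lies between -1 and 1 for non-negative reflectances, so surfaces can be separated by comparing their NDVI values against thresholds calibrated on the scene at hand.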
In Wang et al. (contribution 5), the authors propose an improved version of the You Only Look Once version three (YOLOv3) object detection model, integrating data augmentation and structure optimization, to achieve the intelligent and accurate measurement of pavement surface potholes. Initially, color adjustment techniques were employed to enhance the image contrast, followed by data augmentation through geometric transformations. Potholes were further divided into two categories, P1 and P2, based on the presence of water. Subsequently, the structure of the YOLOv3 model was optimized using the Residual Network (ResNet101) and Complete IoU (CIoU) loss, while the multiscale anchor sizes were refined through clustering and modification using the K-means++ algorithm. Lastly, the robustness of the proposed model was evaluated through the generation of adversarial examples. The experimental results indicate a significant improvement over the original YOLOv3 model.
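The CIoU loss extends plain IoU with a penalty on the distance between box centers and a term that rewards a consistent aspect ratio. The following is a minimal sketch of the commonly published CIoU formulation, not the authors' implementation; the box format (x1, y1, x2, y2) and the assumption that every box has positive width and height are choices made here for illustration.

```python
import math


def ciou(box_a, box_b):
    """Complete IoU between two boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive width and height.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    wa, ha = ax2 - ax1, ay2 - ay1
    wb, hb = bx2 - bx1, by2 - by1

    # Plain IoU from the overlap rectangle.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    iou = inter / (wa * ha + wb * hb - inter)

    # Squared distance between the two box centers.
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
         + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 \
       + (max(ay2, by2) - min(ay1, by1)) ** 2

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(wb / hb) - math.atan(wa / ha)) ** 2
    denominator = (1 - iou) + v
    alpha = v / denominator if denominator > 0 else 0.0

    return iou - rho2 / c2 - alpha * v


def ciou_loss(pred, target):
    """CIoU loss: 0 for identical boxes, larger for worse matches."""
    return 1.0 - ciou(pred, target)


print(ciou_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

Unlike a loss based on plain IoU, this loss keeps growing as two non-overlapping boxes move apart, which gives the optimizer a useful signal even when the predicted box misses the target entirely.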
The article by Qiu et al. (contribution 6) proposes an Adaptive Spatial Feature Fusion YOLOv5 Network (ASFF-YOLOv5) for the automatic recognition and detection of multiple multiscale road traffic elements. Initially, the K-means++ algorithm is utilized for clustering statistics on the range of multiscale road traffic elements, facilitating the determination of suitable candidate box sizes for the dataset. Subsequently, a Spatial Pyramid Pooling Fast (SPPF) structure is integrated to enhance the classification accuracy and speed, enabling richer feature information extraction. An ASFF strategy based on a Receptive Field Block (RFB) is then introduced to improve the feature scale invariance and enhance the detection of small objects. Finally, the experimental effectiveness is evaluated through mean average precision (mAP) calculations. The results demonstrate that the proposed method achieves a significant improvement over the original YOLOv5 model. 
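Anchor (candidate box) sizes are often obtained by clustering the widths and heights of the labeled boxes. The sketch below illustrates one common recipe under stated assumptions: it uses 1 - IoU between (width, height) pairs as the distance, K-means++ seeding, and a fixed number of refinement passes. The distance measure, the function names, and the toy box sizes are assumptions and are not taken from the contribution. The seeding step also assumes that the data contain more than k distinct box sizes.

```python
import random


def wh_iou(a, b):
    """IoU of two boxes given as (width, height) and aligned at one corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeanspp_init(boxes, k, rng):
    """K-means++ seeding: each new center is sampled with probability
    proportional to its squared distance from the nearest chosen center."""
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        d2 = [min((1 - wh_iou(b, c)) ** 2 for c in centers) for b in boxes]
        centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centers


def cluster_anchors(boxes, k, iters=20, seed=0):
    """Return k anchor sizes sorted by area, refined with Lloyd iterations."""
    rng = random.Random(seed)
    centers = kmeanspp_init(boxes, k, rng)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for b in boxes:
            # Assign each box to the center with the highest IoU.
            best = max(range(k), key=lambda i: wh_iou(b, centers[i]))
            groups[best].append(b)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g))
            if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return sorted(centers, key=lambda c: c[0] * c[1])


# Toy (width, height) pairs in pixels; not taken from any dataset.
boxes = [(10, 12), (12, 10), (11, 11),
         (50, 60), (55, 52), (60, 50),
         (200, 180), (190, 210), (210, 200)]
anchors = cluster_anchors(boxes, k=3)

print(anchors)
```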
Zhang et al. (contribution 7) evaluate mainstream CNN structures for road crack segmentation and propose a novel method, termed a Recurrent Adaptive Network (RAN), inspired by the second law of thermodynamics. The RAN dynamically assesses the degree of imbalance, adjusts sampling rates, and modifies loss weights during training to maintain a balanced flow between precision and recall, akin to heat conduction. The authors built a dataset of high-resolution road crack images with pixel-level annotations (HRRC) from real inspection scenes, enabling the comprehensive evaluation of CNN performance in highway patrol scenarios. The primary contribution lies in addressing data imbalance and guiding model training by analyzing precision and recall. The experimental results seem to demonstrate the effectiveness of the RAN, achieving state-of-the-art performance on the HRRC dataset.
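To make the idea of steering training by precision and recall concrete, the sketch below computes both quantities for binary pixel labels and applies a simple feedback rule to the weight of the crack class. The feedback rule is purely hypothetical and is not the RAN update from the contribution; the function names, the step size, and the toy labels are assumptions.

```python
def precision_recall(pred, truth):
    """Precision and recall for flat lists of binary labels (1 = crack)."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def adjust_positive_weight(weight, precision, recall, step=0.5):
    """Hypothetical feedback rule (not the RAN update): raise the crack-class
    loss weight when recall lags behind precision, lower it otherwise."""
    return max(0.0, weight + step * (precision - recall))


# Toy predictions and ground truth; not taken from the HRRC dataset.
pred = [1, 0, 0, 0, 1, 0]
truth = [1, 1, 1, 0, 0, 0]
precision, recall = precision_recall(pred, truth)
new_weight = adjust_positive_weight(1.0, precision, recall)

print(precision, recall, new_weight)
```

In this toy case the model misses two of three crack pixels, so recall falls below precision and the rule increases the weight on the crack class for the next training round.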
The study by Qi (contribution 8) addresses a problem specific to some road infrastructure: damage to the block-stone embankment, which is vital for stabilizing the underlying warm and ice-rich permafrost. Such embankments suffer various types of damage over time, potentially compromising their cooling function and exacerbating problems along the Qinghai–Tibet Highway (QTH). Ground-penetrating radar (GPR), a nondestructive testing technique, was employed to assess damage properties in the embankment. An analysis of GPR imagery alongside other data and methodologies revealed several damage categories: loosening of the upper sand–gravel layer, loosening of the block-stone layer, settlement of the block-stone layer, and dense filling of the block-stone layer. While the first two conditions were widespread, settlement and dense filling of the block-stone layer were less common, with occurrences of combined damages also noted. The observed correlations among different damages suggest underlying causal relationships.
Chen’s study (contribution 9) introduces LeViT, a novel Transformer method for automatic asphalt pavement image classification. LeViT comprises convolutional layers, transformer stages alternating between Multi-Layer Perceptron (MLP) and multi-head self-attention blocks using residual connections, and two classifier heads. Leveraging three different sources of pavement image datasets and pre-trained weights from ImageNet, the authors compare the performance of LeViT with six state-of-the-art (SOTA) deep learning models trained using transfer learning. The experimental results demonstrate that after training for 100 epochs with a batch size of 16, LeViT achieves good results on the Chinese asphalt pavement dataset as well as on the German asphalt pavement dataset, outperforming all tested SOTA models. Moreover, LeViT exhibits superior inference speed compared to the original ViT method as well as prominent CNN-based models like DenseNet, VGG, and ResNet. Furthermore, the authors propose a visualization method combining Grad-CAM and Attention Rollout to enhance the interpretability of LeViT, facilitating the analysis of the classification results and providing insights into the features learned in each MLP and attention block.
In the tenth study (contribution 10), the authors developed a method for the rapid target identification and comparison of time-lapse GPR profiles. A field experiment was conducted to monitor a backfill pit using three-dimensional GPR (3D GPR), with time-lapse data collected over four months. A U-Net, a convolutional neural network (CNN) architecture for image segmentation, was trained using the collected data. The trained model effectively segmented the backfill pit from inline profiles, achieving an Intersection over Union (IoU) of 0.83 on the test dataset. Additionally, the comparison of segmentation masks revealed potential changes in the southwest side of the backfill pit.
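The IoU reported above measures how well a predicted mask overlaps a reference mask, and comparing masks from two survey dates is one simple way to locate changes. The following sketch illustrates both operations on toy binary masks; the function names and the mask values are hypothetical and unrelated to the GPR data in the contribution.

```python
def mask_iou(pred, truth):
    """Intersection over Union of two binary masks given as lists of rows.

    Returns 1.0 when both masks are empty.
    """
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    return intersection / union if union else 1.0


def changed_pixels(mask_t0, mask_t1):
    """Return (row, column) positions where two masks of equal size differ."""
    return [
        (r, c)
        for r, (row0, row1) in enumerate(zip(mask_t0, mask_t1))
        for c, (a, b) in enumerate(zip(row0, row1))
        if a != b
    ]


# Toy 2 x 3 masks; not derived from any GPR profile.
predicted = [[0, 1, 1], [0, 1, 1]]
reference = [[0, 0, 1], [0, 1, 1]]
iou = mask_iou(predicted, reference)
changes = changed_pixels(reference, predicted)

print(iou, changes)
```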
The article by Ling et al. (contribution 11) proposes a road subgrade monitoring method based on the time-lapse full-coverage (TLFC) 3D GPR technique. The approach focuses on resolving key challenges related to time and spatial position mismatches in experimental data. By employing time-zero consistency correction, 3D data combination, and spatial-position-matching methods, the approach seems to significantly enhance the 3D imaging quality of underground spaces. Furthermore, the authors utilized time-lapse attribute analysis on TLFC 3D GPR data to extract detailed characteristics and overall patterns of dynamic subgrade changes.
The last contribution (contribution 12) introduces a novel approach for the inverse calculation of material parameters to determine the mechanical response of asphalt pavements. Initially, a modulus correction method is developed to minimize the error between the tested and simulated strains. Furthermore, a dual sinusoidal regression model effectively illustrates the relationship between temperature at various depths within the pavement structure and atmospheric temperature. An analysis of the pavement monitoring data reveals that increased loading weight and temperature, coupled with decreased loading speed, lead to elevated three-way strain in the asphalt layer. Consequently, a relationship model between loading conditions and three-way strain is established with high fidelity (R² > 0.95). This comprehensive methodology addresses reliability issues with pavement structure parameters and provides a quantitative assessment of structural conditions, supporting the performance prediction and maintenance analysis of asphalt pavements with a semi-rigid base.
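The coefficient of determination quoted above compares the residual error of a model with the total variance of the observations. The sketch below shows the standard calculation only; the strain values are invented for illustration and are not measurements from the contribution.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the observed values are not all identical.
    """
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented strain values (microstrain) for illustration only.
measured = [10.0, 20.0, 30.0, 40.0]
modeled = [12.0, 18.0, 31.0, 39.0]
r2 = r_squared(measured, modeled)

print(round(r2, 3))
```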
The Guest Editors would like to extend their thanks to all of the contributors to this Special Issue.