Article

Research on Asphalt Pavement Surface Distress Detection Technology Coupling Deep Learning and Object Detection Algorithms

1 Department of Civil Engineering, Chongqing Jiaotong University, Chongqing 400074, China
2 China Highway Engineering Consulting Corporation, Beijing 100089, China
3 China Highway Engineering Consulting Corporation DATA Co., Ltd., Beijing 100089, China
4 Shanxi Provincial Highway Bureau Linfen Branch, Linfen 041000, China
* Author to whom correspondence should be addressed.
Infrastructures 2025, 10(4), 72; https://doi.org/10.3390/infrastructures10040072
Submission received: 18 February 2025 / Revised: 12 March 2025 / Accepted: 17 March 2025 / Published: 24 March 2025
(This article belongs to the Special Issue Sustainable and Digital Transformation of Road Infrastructures)

Abstract

To address the challenges posed by the vast scale of highway maintenance in China and the high cost of traditional inspection vehicles, this study focuses on a routine maintenance project for national and provincial roads in Shanxi Province, with an emphasis on selecting and designing the hardware for a lightweight, portable pavement inspection device. A monocular camera was used to capture pavement surface images; from the 85,511 images collected, a dataset of 14,641 valid samples was built. The YOLOv5 object detection algorithm, combined with convolutional deep learning techniques, was then employed to classify and identify pavement surface distresses in the collected images. After multiple iterations of model tuning and validation, the proposed detection system achieved a false-negative rate of 1.13%, a recall of 97.35%, and a precision of 98.30%. This high accuracy provides a technical reference for the development and design of portable pavement distress detection devices.

1. Introduction

By the end of 2024, the total highway mileage in China had reached 5.4368 million kilometers, 99.9% of which was under maintenance. The scope of road maintenance has expanded steadily over the past several years. However, large-scale specialized inspection equipment is costly to operate and limited in data acquisition and processing, creating a major bottleneck in road maintenance management. In addition, much of China's extensive rural road network remains underserved by maintenance interventions because of inspection constraints. To address these challenges, this project aims to develop a compact, portable, and cost-effective device for the rapid collection, detection, and assessment of road conditions. The system is also intended to support timely data processing and analysis, providing a continuous flow of data for informed decision-making in road maintenance management.
In the 1970s, French road management authorities developed the GERPHO system. In the 1980s, a Japanese research team designed the Komatsu system, which was based on analog video technology [1]. By the 1990s, American researchers had introduced the Pavement Condition Evaluation Service (PCES) system, which used a line-scan digital camera for pavement distress detection [2,3]. Unlike its predecessors, the PCES system could acquire and process images simultaneously; however, its functionality was limited because it could not distinguish between different types of pavement distress [4]. During the same period, rapid advances in Charge-Coupled Device (CCD) image sensor technology enabled a Canadian company to develop the Automatic Road Analyzer (ARAN) system [5]. Despite its innovations, the ARAN system suffered from high hardware costs, and its pavement data collection was not synchronized with distress identification [6,7]. Since the early 21st century, 3D laser scanning technology has evolved rapidly. Researchers have integrated it into pavement distress detection systems such as the DHDV system in the United States, which achieved fully automated pavement distress detection; however, its high hardware requirements and maintenance costs have hindered widespread adoption [8]. More recently, studies have used 3D point cloud data and YOLOv5 detection models to identify various types of pavement distress, including longitudinal cracks, transverse cracks, alligator cracks, and potholes. Notable contributions in this field include the work of Ravi Radhika and Ayman Habib from the United States [9], as well as Sami Abdullah from Australia and Sakib Saadman from Bangladesh [10,11,12,13].
Early pavement distress detection methods relied predominantly on hand-crafted texture features, such as Local Binary Patterns (LBP) and Gabor filters, which often performed poorly in practical applications.
In contrast, recent advancements in deep learning technologies have enabled the direct learning of feature representations from data, significantly improving the accuracy of pavement distress identification. By applying deep learning-based image processing algorithms to pavement images, researchers can accurately detect and localize distresses. The detected distress regions can then be segmented to extract geometric features such as area, length, and width. Statistical analysis of all detected distresses allows for the calculation of the distress rate (DR) for the evaluated road section, ultimately leading to the derivation of the Pavement Condition Index (PCI). Several other scholars have also made significant contributions to pavement distress detection [13,14,15,16,17,18].
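The step from detected distress areas to DR and then PCI can be sketched as below. The article does not state the formulas, so this sketch assumes the form used in the Chinese highway performance assessment standard JTG 5210-2018: DR = 100 × Σ wᵢAᵢ / A and PCI = 100 − a₀·DR^a₁, with a₀ = 15.00 and a₁ = 0.412 as commonly cited for asphalt pavements. The distress weights and section size in the example are hypothetical placeholders; the coefficients and weights should be checked against the current edition of the standard before use.

```python
# Hypothetical sketch: distress rate (DR) and Pavement Condition Index (PCI).
# Assumes the JTG 5210-2018 form PCI = 100 - a0 * DR**a1 with the asphalt
# coefficients a0 = 15.00, a1 = 0.412; verify before relying on them.

def distress_rate(distresses, section_area_m2):
    """DR in percent: 100 * sum(weight * damaged area) / inspected area.

    `distresses` is a list of (damaged_area_m2, weight) pairs; the weights are
    illustrative conversion factors, not values taken from the article.
    """
    if section_area_m2 <= 0:
        raise ValueError("section area must be positive")
    weighted = sum(area * weight for area, weight in distresses)
    return 100.0 * weighted / section_area_m2


def pci_from_dr(dr_percent, a0=15.00, a1=0.412):
    """Convert DR (percent) to PCI on a 0-100 scale, clamped at 0."""
    return max(0.0, 100.0 - a0 * dr_percent ** a1)


# Example: a 100 m x 3.75 m lane section with two hypothetical distress records.
dr = distress_rate([(1.5, 1.0), (2.0, 0.6)], section_area_m2=100 * 3.75)
pci = pci_from_dr(dr)
print(f"DR = {dr:.3f} %, PCI = {pci:.1f}")
```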
Compared with other countries, China began developing automatic pavement inspection technology relatively late. However, since the early 21st century, improved hardware capabilities and significant progress in image processing have enabled domestic research institutes and universities to achieve substantial results. Chinese scholars have conducted extensive research on automatic pavement inspection systems and on image processing techniques for pavement images, greatly advancing the application of digital image processing in road inspection [19,20,21]. In China, pavement inspection technologies fall primarily into four categories: deep learning-based methods, 3D laser-based techniques, vibration signal-based detection, and pavement texture analysis [21,22,23,24,25,26,27,28].
With the advancement of the 2025 initiative, the digitalization, informatization, and automation of road maintenance have progressed rapidly. In recent years, a variety of lightweight inspection devices have appeared on the market. Companies such as Baidu, Qianxun, and Shanghai Tonglu Cloud, as well as universities including Tongji University and Southeast University, have been actively engaged in their research and development. The hardware used in these devices is largely similar: most rely on monocular cameras, stereo cameras, or industrial cameras to capture road surface data, and the collected images are then processed for various recognition tasks to generate pavement inspection data. This study develops a pavement distress detection system by selecting suitable hardware components around a monocular camera. An enhanced YOLOv5 algorithm is adopted as the object detection model, incorporating convolutional deep learning techniques to classify the pavement distress images captured by the camera. The model is trained and technically validated by constructing a distress dataset and conducting debugging, validation, and comparative experiments. Finally, the system extracts key physical characteristics of pavement distresses, including length, width, and area, to achieve accurate pavement distress detection.

2. Hardware Selection and Algorithm Principles

2.1. Hardware Selection

Based on the demand assessment, the hardware platform must meet the following requirements: (1) The CPU should have a minimum of four cores. (2) The encoding and decoding capabilities must support the processing of two channels of 4K 30 fps video streams. (3) The system should be equipped with at least 8 GB of RAM. (4) A minimum of two USB 3.0 ports should be available. (5) The platform must support hard disk read/write operations and include an HDMI interface for display connectivity.

2.1.1. Hardware Platform

A comparative analysis was performed on several embedded platforms from NVIDIA, including the TX2 and Jetson Xavier, as well as other processing platforms such as Raspberry Pi, BeagleBone, and Huawei's platforms, as shown in Figure 1. Considering both processor performance and physical size, the TX2 was deemed too large, while platforms such as the Raspberry Pi lacked sufficient computational power and stability. The Jetson AGX Xavier platform was therefore selected for this study. Because of its high cost, however, future iterations of the system may adopt the Xavier NX series to reduce costs.
Considering the installation environment, the camera will be mounted on a data collection vehicle. The vertically installed lens is typically positioned 1 to 3 m above the ground, while the side-mounted lens is generally placed 1 to 10 m from the guardrail. Given these conditions, a short-focus lens with a focal length of 8 mm is selected for the pavement camera.
Currently, positioning systems include GPS, BeiDou, Galileo, and GLONASS. The full deployment of the BeiDou satellite network in China provides robust support for positioning-related equipment and significantly improves location accuracy. In crack detection, the algorithm must accurately determine the location of detected pavement distresses, enabling detailed analysis of distress distribution across entire road segments. These data are crucial for effective road maintenance planning and decision-making. To ensure high-precision positioning, the system will employ a multi-system, multi-frequency positioning board that supports BeiDou navigation to further enhance accuracy.

2.1.2. Camera Selection

Industrial cameras are available with various interface types, including USB, Ethernet (GigE), CameraLink, IEEE 1394, and CoaXPress. For outdoor data collection, industrial cameras offer superior resistance to environmental variations, dust, and vibration, ensuring greater reliability in complex conditions. To achieve a 1 mm crack resolution across a 4 m wide pavement section, at least 4000 pixels are required across the section width (4000 mm ÷ 1 mm per pixel). Since pavement inspection typically covers a single lane per capture, a 2D area-scan camera is preferred over a line-scan camera.
Industrial camera sensors are typically classified into two types: CMOS and CCD, with CMOS sensors becoming increasingly prevalent. Additionally, industrial cameras use two exposure methods: (1) Global Shutter, which captures the entire image simultaneously; and (2) Rolling Shutter, which exposes different rows at different times, potentially causing distortion when capturing moving objects. Given the vehicle’s motion speed, the image capture response time must be as fast as possible, and the exposure time should be minimized while ensuring image clarity. Therefore, a global shutter CMOS camera is selected to guarantee distortion-free imaging.
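The trade-off between vehicle speed and exposure time can be made concrete: during an exposure of duration t, the pavement moves v·t relative to the camera, and keeping that motion within one ground pixel bounds t. This sketch assumes the 1 mm ground resolution target stated for the camera above; the vehicle speeds are illustrative, not values from the article.

```python
# Sketch: longest exposure that keeps motion blur within a given number of
# ground pixels. Speeds and the 1 mm/pixel target are illustrative assumptions.

def max_exposure_us(speed_kmh, ground_res_mm, max_blur_px=1.0):
    """Return the maximum exposure time in microseconds.

    Blur (mm) = speed (mm/us) * exposure (us); we require
    blur <= max_blur_px * ground_res_mm.
    """
    speed_mm_per_us = speed_kmh * 1e6 / 3.6e9  # km/h -> mm/us
    return max_blur_px * ground_res_mm / speed_mm_per_us


for v in (40, 60, 80):
    print(f"{v} km/h -> {max_exposure_us(v, ground_res_mm=1.0):.1f} us")
```

At 60 km/h, for example, the bound is 60 μs, which shows why a short exposure on a global-shutter sensor matters on a moving vehicle.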
FLIR offers both global shutter and rolling shutter cameras, such as the Flea3 USB3 series; however, these cameras are limited to resolutions of 1K to 2K pixels. The FLIR ORYX 10GigE can capture 4K-resolution, 12-bit images at over 60 FPS, but it requires a 10GigE Ethernet interface.
By comparison, the Huarui A7A20MU30 from Zhejiang Dahua Technology achieves 4K resolution with a USB 3.0 interface, a global shutter, and a C-mount lens interface. It supports the USB3 Vision protocol and the GenICam standard, and has a pixel size of 3.45 μm × 3.45 μm. A USB-interface camera was preferred for its plug-and-play functionality, which facilitates rapid prototype development and allows easy future upgrades to similar or higher-performance cameras. Consequently, the Dahua 4K industrial camera was chosen for this study (as shown in Figure 2).
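To relate the 8 mm lens, the 3.45 μm pixel pitch, and the 1 to 3 m mounting height to the 1 mm resolution target, a simple pinhole-camera estimate helps: the ground sampling distance (GSD) is pixel pitch × distance / focal length, and the footprint width is GSD × the number of pixels. The sketch assumes a lens looking straight down at flat pavement, ignores lens distortion, and uses the 4096-pixel sensor width implied by the 3000 × 4096 resolution given later in the article.

```python
# Pinhole estimate of ground sampling distance (GSD) and footprint width.
# Assumes a nadir-looking camera over flat pavement and no lens distortion.

PIXEL_PITCH_MM = 3.45e-3   # 3.45 um pixel size (camera specification)
FOCAL_LENGTH_MM = 8.0      # selected short-focus lens
SENSOR_WIDTH_PX = 4096     # long side of the 3000 x 4096 sensor


def gsd_mm(distance_m, pitch_mm=PIXEL_PITCH_MM, focal_mm=FOCAL_LENGTH_MM):
    """Ground size of one pixel (mm) at the given camera-to-pavement distance."""
    return pitch_mm * (distance_m * 1000.0) / focal_mm


def footprint_m(distance_m, pixels=SENSOR_WIDTH_PX):
    """Width of pavement covered by `pixels` pixels, in metres."""
    return gsd_mm(distance_m) * pixels / 1000.0


for h in (1.0, 2.0, 3.0):
    print(f"h = {h:.0f} m: GSD = {gsd_mm(h):.3f} mm/px, "
          f"footprint = {footprint_m(h):.2f} m")
```

Under these assumptions, a height of 2 m gives about 0.86 mm per pixel over a strip about 3.5 m wide, close to the 1 mm / 4 m target; at 3 m the strip widens but each pixel covers about 1.3 mm.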

2.1.3. Positioning System

Currently, the main satellite positioning systems are GPS, BeiDou, Galileo, and GLONASS, collectively referred to as global navigation satellite systems (GNSS). The full deployment of the BeiDou satellite network in China provides substantial support for positioning-related equipment and significantly improves location accuracy. For crack detection, the algorithm requires precise geolocation of each identified distress, enabling detailed analysis of distress distribution across entire road segments and providing essential data for road maintenance decision-making. To enhance the reliability and accuracy of the detection data, this study selects a high-precision, multi-system, multi-frequency positioning board that supports dual-mode BeiDou + GPS positioning. The module features low power consumption and a compact size (as shown in Figure 3), making it easy to integrate into vehicle-mounted and other automated inspection systems.
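Assigning geolocated detections to road segments can be sketched with great-circle distances between consecutive GNSS fixes. This is an illustration, not the system's actual pipeline: it assumes each image carries a latitude/longitude fix, that the images are ordered along the driven path, and a hypothetical 100 m segment length. The function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical model)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude fixes."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_counts(fixes, has_distress, segment_m=100.0):
    """Count images with detected distress per segment along the driven path.

    `fixes` is an ordered list of (lat, lon) per image; `has_distress` is a
    parallel list of booleans. Returns {segment_index: count}.
    """
    counts = {}
    chainage = 0.0  # cumulative distance travelled from the first fix
    for i, (lat, lon) in enumerate(fixes):
        if i > 0:
            chainage += haversine_m(*fixes[i - 1], lat, lon)
        if has_distress[i]:
            seg = int(chainage // segment_m)
            counts[seg] = counts.get(seg, 0) + 1
    return counts
```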

2.1.4. Movable Central Control System

The data acquisition platform functions as the central control and operation hub for the automated inspection system (as shown in Figure 4). It integrates a positioning data processing module, power module, display unit, and an embedded core processing platform, while incorporating a high-capacity 1 TB solid-state drive (SSD) to support large-scale 4K image data storage.
The device housing has been modified to include connection ports, ensuring efficient interconnectivity among all components. The display unit supports touchscreen operation and is equipped with customized software that includes functionalities such as algorithm parameter configuration, algorithm encapsulation and execution, result visualization, image list display, and processing progress tracking. Additionally, the system supports large-panel display operation for pavement distress detection, thereby significantly enhancing workflow efficiency for maintenance personnel.

2.1.5. Car Triangular Bracket

To facilitate the integration and utilization of various modular components within the pavement distress detection system—including the solid-state storage drive—and to ensure seamless interface connectivity, signal transmission, and external data output, a custom-designed mechanical enclosure was developed to house all system components in a unified structure.
A dedicated hard drive support and fixation bracket was designed, featuring a tripod with suction cups for stability. The industrial camera and high-resolution lens are screw-mounted onto the bracket, which is securely positioned in a triangular configuration at the rear of the vehicle (as shown in Figure 5). A balancing mechanism is incorporated between the camera and the mounting structure, enabling the camera to capture pavement images at a fixed angle relative to the vertical axis, ensuring optimal image acquisition.

2.2. Algorithm Principle and Selection

The computational principles employed in this study primarily include deep learning theory and object detection algorithms.

2.2.1. Deep Learning Principles

The core principle of deep learning is to extract hierarchical feature representations through layered networks, enabling the learning and abstraction of complex features. Convolutional Neural Networks (CNNs) are widely used in image classification and object detection tasks. Their architecture consists of three main components: an input layer, hidden layers (typically convolutional, pooling, and fully connected layers), and an output layer.
In this study, a CNN-based model is trained in two alternating stages. The first is forward propagation, in which data flow from lower-level to higher-level representations to produce a prediction. The second is backpropagation: the deviation between the predicted and expected outputs is measured by a loss function, and its gradient is propagated backward from higher to lower layers so that the weights can be updated, progressively improving accuracy.
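The two training stages can be illustrated on the smallest possible case: a single sigmoid neuron trained with binary cross-entropy. The forward pass computes a prediction; the backward pass applies the chain rule to get the gradient of the loss with respect to each weight, and gradient descent updates the weights. A CNN repeats the same two stages across many convolutional layers; this toy model, its data, and its learning rate are illustrative only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, lr=0.5, epochs=200):
    """Fit p = sigmoid(w . x + b) by gradient descent on binary cross-entropy.

    Returns the weights, the bias, and the mean loss of each epoch
    (measured before that epoch's update).
    """
    n_features = len(samples[0][0])
    w = [0.0] * n_features
    b = 0.0
    history = []
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        loss = 0.0
        for x, y in samples:
            # Forward propagation: input -> weighted sum -> probability.
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = sigmoid(z)
            loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
            # Backpropagation: for sigmoid + cross-entropy, dLoss/dz = p - y.
            dz = p - y
            for j, xi in enumerate(x):
                grad_w[j] += dz * xi
            grad_b += dz
        m = len(samples)
        # Gradient-descent update using the mean gradient.
        w = [wi - lr * gw / m for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / m
        history.append(loss / m)
    return w, b, history


# Toy data: label is 1 when the first feature is larger than the second.
data = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([0.8, 0.2], 1), ([0.3, 0.9], 0)]
w, b, history = train_neuron(data)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```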

2.2.2. Object Detection Algorithms

Object detection algorithms can be broadly classified into two types. The first type consists of region proposal-based methods, such as R-CNN, Fast R-CNN, and Faster R-CNN, which follow a two-stage approach: they first generate candidate regions, either with a heuristic such as Selective Search (R-CNN, Fast R-CNN) or with a learned CNN-based Region Proposal Network (RPN, in Faster R-CNN), and then perform classification and bounding-box regression on these proposals. The second type comprises one-stage methods, such as YOLO and SSD, which directly predict object categories and locations with a single CNN. A typical object detection model consists of a feature extraction backbone and a classification-detection network. The backbone design is crucial because it determines the features available to later layers and how features from different layers can be fused. With advances in deep learning, backbone networks continue to be optimized to enhance feature extraction.
The YOLOv5 network, a representative one-stage detection model, comprises four main components: the input layer, the backbone network (Backbone), the feature fusion network (Neck), and the output layer (Prediction). Compared with two-stage detectors, YOLOv5 offers higher efficiency and speed, making it particularly well suited to high-frequency pavement distress detection. The architecture of YOLOv5 is shown in Figure 6. This study performs a comparative analysis to select the optimal YOLOv5 model and integrates deep learning techniques to enhance the accuracy and robustness of pavement distress detection.
YOLOv5 offers several network architectures of varying depth and width: YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. These four structures are governed by two key parameters: the model depth and the channel width of each layer. The number of model parameters increases progressively from YOLOv5s (the smallest) to YOLOv5x (the largest). The specific parameter settings and channel configurations are provided in Table 1.
Using the same Cross Stage Partial (CSP) structure for comparison, the YOLOv5s network includes one residual block, while YOLOv5m utilizes two residual blocks, YOLOv5l employs three residual blocks, and YOLOv5x incorporates four residual blocks at the same locations. This design effectively controls the network’s depth. Additionally, the network width in YOLOv5 is regulated by adjusting the number of convolutional kernels at different stages. Taking the Focus structure as an example, YOLOv5s uses 32 convolutional kernels, YOLOv5m increases this to 48, YOLOv5l to 64, and YOLOv5x to 80.
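The depth and width scaling described above is driven by two multipliers in the official YOLOv5 model configuration files, `depth_multiple` and `width_multiple` (0.33/0.50 for YOLOv5s, 0.67/0.75 for YOLOv5m, 1.0/1.0 for YOLOv5l, 1.33/1.25 for YOLOv5x). The sketch below mirrors, in simplified form, how YOLOv5's model builder scales a layer's repeat count and output channels. The base values (3 repeats for the first CSP block, 64 channels for the Focus layer) are those of the reference configuration in the releases that use a Focus layer. It reproduces the 1/2/3/4 residual blocks and 32/48/64/80 Focus kernels quoted in the text.

```python
import math

# depth_multiple and width_multiple from the official YOLOv5 model YAML files.
SCALES = {
    "yolov5s": (0.33, 0.50),
    "yolov5m": (0.67, 0.75),
    "yolov5l": (1.00, 1.00),
    "yolov5x": (1.33, 1.25),
}


def make_divisible(x, divisor=8):
    """Round channels up to the nearest multiple of `divisor`."""
    return math.ceil(x / divisor) * divisor


def scaled_repeats(n, depth_multiple):
    """Number of repeated blocks after depth scaling (at least 1)."""
    return max(round(n * depth_multiple), 1) if n > 1 else n


def scaled_channels(c, width_multiple):
    """Output channels after width scaling."""
    return make_divisible(c * width_multiple, 8)


for name, (gd, gw) in SCALES.items():
    print(f"{name}: first CSP repeats = {scaled_repeats(3, gd)}, "
          f"Focus channels = {scaled_channels(64, gw)}")
```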

3. Research on the Construction of Distress Datasets and Scheme Design

3.1. Construction of the Road Distress Detection Dataset

In this study, the pavement distress detection dataset was divided in fixed proportions into training, validation, and test sets. After the equipment was assembled, images were collected on roads under live traffic, and the distresses were categorized into five types: transverse cracks, longitudinal cracks, alligator cracks, potholes, and patched areas. A total of 85,511 pavement images were collected from actual road surfaces, from which 14,641 valid samples were selected to build the dataset. The dataset was split into training and test sets at a 9:1 ratio, yielding 13,388 training images and 1253 test images. During training, a validation set was further drawn from the training set at a 9:1 ratio to evaluate the model as its weights were updated, yielding 1338 validation images. Annotated examples of pavement distresses in the dataset are illustrated in Figure 7.
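A reproducible two-level 9:1 split (training+validation versus test, then training versus validation) can be sketched as follows. The seeds, the floor rounding of the cut point, and the image IDs are assumptions; actual counts depend on those choices and on how samples were filtered.

```python
import random


def split_9_1(items, seed=0):
    """Shuffle a copy of `items` with a fixed seed and split it 9:1 (floor)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 9 // 10          # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


# Hypothetical image IDs; 100 images for illustration, not the study's data.
images = [f"img_{i:05d}.jpg" for i in range(100)]
train_val, test = split_9_1(images, seed=42)
train, val = split_9_1(train_val, seed=43)
print(len(train), len(val), len(test))
```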

3.2. Model Training Design

3.2.1. Platform Environment Deployment

In this study, the YOLOv5 object detection algorithm was trained using a self-constructed pavement distress dataset and applied to identify distress types on actual road surfaces. The platform and development environment consist primarily of a Linux-based system and the Python programming language.

3.2.2. Dataset Preparation

The YOLOv5 network does not directly read dataset images and annotations, as this may result in insufficient memory when processing a large number of images. To address this issue, the dataset and annotation files are first stored in a designated directory during training. The image filenames, labels, and bounding box coordinates are then extracted into a single text file, which YOLOv5 reads in batches. This approach prevents memory overflow and enhances data loading efficiency.
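The memory-saving idea described above, reading an index file of images and labels in batches rather than loading everything at once, can be sketched with a Python generator. The line format here (image path followed by space-separated `class,x1,y1,x2,y2` boxes) is a hypothetical example; YOLOv5's own loader instead takes a directory or a text file of image paths and reads one label file per image.

```python
import io
from itertools import islice


def parse_line(line):
    """Parse 'path cls,x1,y1,x2,y2 cls,x1,y1,x2,y2 ...' (hypothetical format)."""
    path, *boxes = line.split()
    parsed = [tuple(int(v) for v in box.split(",")) for box in boxes]
    return path, parsed


def iter_batches(lines, batch_size):
    """Yield lists of parsed records, reading at most `batch_size` lines at a time."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield [parse_line(line) for line in chunk if line.strip()]


# An in-memory stand-in for an opened index file.
index_file = io.StringIO(
    "img_0001.jpg 0,10,20,300,40\n"
    "img_0002.jpg 1,50,5,70,900 4,200,200,260,250\n"
    "img_0003.jpg\n"
)
for batch in iter_batches(index_file, batch_size=2):
    print(batch)
```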
During the preprocessing stage, training images are stored in the JPEGImages directory, while annotation files are placed in the Annotations directory. The voc_label.py script is employed to automatically divide the dataset into training and validation sets, generating a labels directory. Within this directory, all annotation data are recorded in text files, including image filenames, label names, and bounding box coordinates (upper-left and lower-right points).
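A conversion of the kind performed by a `voc_label.py`-style script can be sketched as follows. YOLOv5's training code expects one label line per object in the form `class x_center y_center width height`, with coordinates normalized by the image size, so the corner coordinates stored in Pascal VOC XML (`xmin`, `ymin`, `xmax`, `ymax`) are converted to that form. The class list and sample XML are hypothetical; the study's own script is not reproduced in the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical class order; it must match the `names` list in the dataset YAML.
CLASSES = ["transverse_crack", "longitudinal_crack", "alligator_crack",
           "pothole", "patch"]


def voc_to_yolo_lines(xml_text, classes=CLASSES):
    """Convert one Pascal VOC annotation to YOLO label lines.

    Each output line is 'class_id x_center y_center width height',
    with all four coordinates normalized to [0, 1] by the image size.
    """
    root = ET.fromstring(xml_text)
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in classes:
            continue  # skip labels outside the chosen class list
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(box.find(k).text)
                                  for k in ("xmin", "ymin", "xmax", "ymax"))
        xc = (xmin + xmax) / 2.0 / img_w
        yc = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{classes.index(name)} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


sample = """<annotation>
  <size><width>4096</width><height>3000</height></size>
  <object><name>pothole</name>
    <bndbox><xmin>1024</xmin><ymin>750</ymin><xmax>2048</xmax><ymax>1500</ymax></bndbox>
  </object>
</annotation>"""
print(voc_to_yolo_lines(sample))
```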
Before training, the dataset parameters must be set in the test.yaml file. These include the file paths of the training and test datasets, the number of classes, and the class names, given as a string array. The structure of each dataset is illustrated in Figures 8–11.
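A dataset file of this kind might look as follows. YOLOv5 dataset YAML files use the keys `train`, `val` (optionally `test`), `nc`, and `names`; the paths and class names below are hypothetical. The sketch renders the YAML text in Python using only the standard library.

```python
# Hypothetical paths and class names for a five-class distress dataset.
dataset = {
    "train": "data/pavement/images/train",
    "val": "data/pavement/images/val",
    "test": "data/pavement/images/test",
    "names": ["transverse_crack", "longitudinal_crack", "alligator_crack",
              "pothole", "patch"],
}


def to_yolov5_yaml(cfg):
    """Render a YOLOv5-style dataset YAML; nc is derived from the names list."""
    names = ", ".join(f"'{n}'" for n in cfg["names"])
    return (
        f"train: {cfg['train']}\n"
        f"val: {cfg['val']}\n"
        f"test: {cfg['test']}\n"
        f"nc: {len(cfg['names'])}\n"
        f"names: [{names}]\n"
    )


print(to_yolov5_yaml(dataset))
```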

3.2.3. Training Parameter Adjustments

Before initiating training, the YOLOv5 algorithm provides configurable network parameters, including the number of training iterations, the batch size for image loading per training step, the configuration file for the backbone network, and the input image size.
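The four parameters listed above correspond to command-line options of YOLOv5's `train.py`: `--epochs`, `--batch-size`, `--cfg`, and `--img-size`, alongside `--data` for the dataset YAML and `--weights` for the initial weights. The values below are illustrative, not the settings used in the study; the sketch only builds and prints the command, which could then be run from the YOLOv5 repository directory.

```python
import shlex

# Illustrative values only; the article does not report its exact settings.
params = {
    "--data": "test.yaml",               # dataset YAML (paths, nc, names)
    "--cfg": "models/yolov5s.yaml",      # backbone/model configuration
    "--weights": "yolov5s.pt",           # initial weights
    "--epochs": 100,                     # number of training epochs
    "--batch-size": 16,                  # images per training step
    "--img-size": 640,                   # input image size in pixels
}


def build_train_command(options):
    """Return the argument list for running YOLOv5's train.py."""
    args = ["python", "train.py"]
    for flag, value in options.items():
        args += [flag, str(value)]
    return args


cmd = build_train_command(params)
print(shlex.join(cmd))
```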

3.3. Methods for Analyzing Test Results

In this study, recall, precision, and the false-negative rate are used as evaluation metrics for pavement distress detection. True Positive (TP) refers to a correctly predicted positive instance, True Negative (TN) to a correctly predicted negative instance, False Positive (FP) to a negative instance incorrectly predicted as positive, and False Negative (FN) to a positive instance incorrectly predicted as negative.
(1)
Recall
Recall is the ratio of correctly identified instances to the total number of instances that should be identified in the dataset:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
(2)
Precision
Precision is the ratio of correctly identified instances to the total number of identified instances:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
(3)
False-negative rate
The false-negative rate is the ratio of missed instances (false negatives) to the total number of actual positive instances:
$$\mathrm{FNR} = \frac{FN}{TP + FN}$$
(4)
Test results
Figure 12 shows the detection results of YOLOv5 networks under different backbone networks.
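The three metrics can be computed directly from the detection counts, as in the short sketch below (the counts are made up for illustration). With the formulas as defined, the false-negative rate and recall share the denominator TP + FN, so FNR = 1 − Recall for any given set of counts.

```python
def detection_metrics(tp, fp, fn):
    """Return recall, precision and false-negative rate from raw counts."""
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("need at least one actual and one predicted positive")
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    fnr = fn / (tp + fn)
    return recall, precision, fnr


# Hypothetical counts, not taken from the article.
r, p, f = detection_metrics(tp=950, fp=20, fn=30)
print(f"recall={r:.4f} precision={p:.4f} FNR={f:.4f}")
```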

4. Analysis of Asphalt Pavement Surface Damage Detection Technology

4.1. Demonstration and Analysis of the Experimental Results of Object Detection Algorithm

The initial validation experiment in this study aimed to assess the feasibility of the algorithm. This experiment compared the detection performance of the instance segmentation algorithm Mask R-CNN and the object detection algorithm YOLOv5 using the same set of pavement images. Mask R-CNN performs instance segmentation by first locating objects with bounding boxes and then conducting pixel-wise classification within each bounding box to segment the target objects. In contrast, YOLOv5, as an object detection algorithm, only locates objects using bounding boxes without performing pixel-level segmentation.
As shown in Figure 13, the results of the validation experiment indicate that Mask R-CNN exhibits significant deviation in the bounding box localization stage, leading to inaccuracies in the segmentation phase, where it struggles to fully segment the cracks. In contrast, YOLOv5 effectively locates pavement cracks and correctly identifies them individually, demonstrating its suitability for pavement distress detection.
Through four rounds of experimental validation, this study found that the algorithm's generalization ability improves as the dataset grows. Based on these findings, the final validation phase focused on substantially expanding the pavement distress dataset while keeping the distress categories consistent.
During the validation process, pavement distresses were strictly classified into five categories: transverse cracks, longitudinal cracks, alligator cracks, potholes, and patched areas. The dataset was divided into training and validation sets in a 9:1 ratio, resulting in a total of 13,388 image samples for training.
As shown in Figure 14, the results from the four validation experiments demonstrate clear and accurate distress localization and classification. These findings confirm that the YOLOv5 algorithm achieves a high level of model fitting and generalization when applied to a comprehensive pavement distress dataset. Statistical analysis reveals an overall false-negative rate of 1.13%, a recall rate of 97.35%, and a precision rate of 98.30%.
The training results of the YOLOv5 object detection model developed in this study are shown in Figure 15. Over the 200 training iterations, the GIoU loss and the other loss terms decrease gradually, while accuracy and recall increase steadily and eventually converge to stable values.
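The GIoU loss mentioned above extends IoU with a penalty based on the smallest box enclosing both the prediction and the ground truth, so it still provides a useful signal when the boxes do not overlap. A minimal sketch for two axis-aligned boxes given as (x1, y1, x2, y2); YOLOv5's `bbox_iou` utility computes this, along with DIoU and CIoU variants, on batched tensors.

```python
def giou(box_a, box_b):
    """Generalized IoU of two axis-aligned boxes (x1, y1, x2, y2), x2 > x1, y2 > y1."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Intersection rectangle (zero if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = area_a + area_b - inter
    iou = inter / union
    # Smallest enclosing box C.
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    area_c = cw * ch
    return iou - (area_c - union) / area_c


def giou_loss(pred, target):
    """GIoU loss in [0, 2]; 0 for a perfect match."""
    return 1.0 - giou(pred, target)


print(giou((0, 0, 2, 2), (1, 1, 3, 3)))   # partial overlap
print(giou((0, 0, 1, 1), (2, 0, 3, 1)))   # disjoint boxes: negative GIoU
```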

4.2. Analysis of Camera Calibration Algorithms

The calibration images captured for the experiment are shown in Figure 16. Among the 28 images used in this study, the position and angle of the calibration board vary across images, effectively covering the entire field of view of the camera. This diverse positioning strategy significantly reduces calibration errors.
First, a corner detection algorithm was applied to identify the subpixel locations of all corner points in each image. The yellow rectangular box represents the origin corner point, while the green markers indicate the subpixel coordinates of each detected corner. Using the complete set of corner point data from all images, the camera’s intrinsic and extrinsic parameters, as well as distortion coefficients, were optimized using Zhang’s calibration method, the least squares method, and the Levenberg–Marquardt (LM) algorithm. The 3D spatial positions of the monocular camera and all calibration boards are illustrated in Figure 17.
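The quantity minimized in this calibration step is the reprojection error: each 3D board corner is projected through the camera model and compared with its detected subpixel location. The sketch shows such a model (pinhole intrinsics fx, fy, cx, cy with two radial distortion coefficients k1, k2) and the root-mean-square error, which is typically how a calibration error in pixels is expressed. It assumes points are already in camera coordinates and ignores tangential distortion; all numeric values are hypothetical. In practice, OpenCV's `cv2.calibrateCamera` performs a Zhang-style initialization followed by Levenberg–Marquardt refinement of this kind of model.

```python
import math


def project(point_cam, fx, fy, cx, cy, k1=0.0, k2=0.0):
    """Project a 3D point in camera coordinates to pixel coordinates.

    Pinhole model with two radial distortion coefficients (k1, k2);
    tangential distortion is ignored in this sketch.
    """
    X, Y, Z = point_cam
    x, y = X / Z, Y / Z                      # normalized image coordinates
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2    # radial distortion factor
    return fx * x * radial + cx, fy * y * radial + cy


def rms_reprojection_error(points_cam, observed_px, **camera):
    """Root-mean-square pixel distance between projected and detected corners."""
    sq = 0.0
    for p, (u_obs, v_obs) in zip(points_cam, observed_px):
        u, v = project(p, **camera)
        sq += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return math.sqrt(sq / len(points_cam))


# Hypothetical intrinsics for a 4096 x 3000 sensor (not the study's results).
cam = dict(fx=2320.0, fy=2320.0, cx=2048.0, cy=1500.0, k1=-0.1, k2=0.01)
corners = [(0.0, 0.0, 2.0), (0.1, 0.05, 2.0), (-0.2, 0.1, 2.0)]
detected = [project(p, **cam) for p in corners]
print(rms_reprojection_error(corners, detected, **cam))  # 0.0 for exact detections
```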

4.3. Measurement Index Table

After obtaining the accurate intrinsic and extrinsic camera parameters, along with the distortion coefficients, an index table can be established. Since this process involves only matrix operations, no additional errors are introduced, and the index table’s error remains consistent with the calibration error of 0.52 pixels.
Given that the camera has a 4K resolution of 3000 × 4096 pixels, the index table contains entries for 12,288,000 pixels (approximately 12 million), fully representing the physical spatial information of the captured image, as illustrated in Figure 18. Each entry stores four values for its pixel: the corresponding physical length, width, diagonal length, and area, as shown in Figure 19. The index table therefore comprises about 49 million (12,288,000 × 4) parameters, which serve as the reference for the subsequent geometric measurement of pavement distresses.
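One way to build such a per-pixel index table, assuming the road surface is planar so that calibration yields a 3×3 homography H from image pixels to ground coordinates in millimetres, is to map each pixel's four corners to the ground and measure the resulting quadrilateral. The homography, the corner convention (length along image rows, width along columns), and the function names are assumptions for illustration; the article does not state how its table was computed. For a full 3000 × 4096 image the same function would be evaluated for every pixel, in practice in vectorized form.

```python
import math


def apply_h(H, u, v):
    """Map image point (u, v) to ground-plane (X, Y) with a 3x3 homography H."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return x / w, y / w


def pixel_entry(H, u, v):
    """Physical length, width, diagonal (mm) and area (mm^2) of pixel (u, v).

    The pixel is the unit square with corners (u, v), (u+1, v), (u+1, v+1), (u, v+1).
    """
    p00 = apply_h(H, u, v)
    p10 = apply_h(H, u + 1, v)
    p11 = apply_h(H, u + 1, v + 1)
    p01 = apply_h(H, u, v + 1)
    length = math.dist(p00, p10)      # along the image rows (u direction)
    width = math.dist(p00, p01)       # along the image columns (v direction)
    diagonal = math.dist(p00, p11)
    quad = [p00, p10, p11, p01]
    # Shoelace formula for the area of the projected quadrilateral.
    area = 0.5 * abs(sum(quad[i][0] * quad[(i + 1) % 4][1]
                         - quad[(i + 1) % 4][0] * quad[i][1] for i in range(4)))
    return length, width, diagonal, area


def build_index_table(H, width_px, height_px):
    """Dict {(u, v): (length, width, diagonal, area)} for every pixel."""
    return {(u, v): pixel_entry(H, u, v)
            for v in range(height_px) for u in range(width_px)}


# Hypothetical homography: 0.86 mm per pixel, no perspective effect.
H = [[0.86, 0.0, 0.0], [0.0, 0.86, 0.0], [0.0, 0.0, 1.0]]
table = build_index_table(H, 4, 3)
print(len(table), table[(0, 0)])
```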

4.4. Analysis of Broken Geometry Information Measurement Algorithm

Because pavement distresses are geometrically irregular, this study first calculates the length of a distress and then derives its average width from that length. Consequently, the accuracy of the skeletonization algorithm directly affects the width measurement error. To obtain the actual physical length of a distress, the binarized mask image is processed into a skeletonized image, a single-pixel-wide representation of the distress region. The skeleton must be continuous and free of breaks or noise artifacts to ensure the most accurate length estimate.
The skeletonization algorithm used in this study is a lookup table-based method that iteratively applies logical operations to the outermost white pixels of the distress region. The algorithm determines whether a pixel belongs to the skeleton based on its relationship with neighboring pixels. If the pixel is part of the skeleton, it is retained; otherwise, it is removed. This process ultimately reduces the distress mask to a single-pixel-wide skeletonized representation.
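The article does not specify which lookup table it uses. As a stand-in, the sketch below implements the classic Zhang–Suen thinning algorithm, which follows the same pattern: boundary pixels are examined in iterative passes and kept or removed according to their 8-neighbourhood (these neighbourhood tests could equally be precomputed into a 256-entry lookup table). It works on a small 0/1 grid; libraries such as scikit-image provide optimized `skeletonize` functions for real masks.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of lists of 0/1) to a one-pixel-wide skeleton.

    Classic Zhang-Suen algorithm: two sub-iterations per pass, each marking
    removable boundary pixels first and then deleting them all at once.
    Border pixels are assumed to be background. The input is not modified.
    """
    img = [row[:] for row in image]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9, clockwise starting from the pixel above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_remove = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    b = sum(p)  # number of foreground neighbours
                    # Number of 0 -> 1 transitions in the sequence P2, ..., P9, P2.
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_remove.append((r, c))
            for r, c in to_remove:
                img[r][c] = 0
            if to_remove:
                changed = True
    return img


# A 3-pixel-thick horizontal bar, surrounded by a background border.
bar = [[0] * 9] + [[0] + [1] * 7 + [0] for _ in range(3)] + [[0] * 9]
for row in zhang_suen_thin(bar):
    print("".join("#" if v else "." for v in row))
```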
This study categorizes pavement distresses into four major types: simple cracks, alligator cracks, potholes, and patched areas. Among these, potholes and patched areas are excluded from the length and width calculations. The study further classifies simple cracks and alligator cracks into six subtypes, as illustrated in Figure 20. Simple cracks are subdivided into transverse cracks, longitudinal cracks, diagonal cracks, and herringbone cracks, while alligator cracks are divided into simple alligator cracks and complex alligator cracks.
Figure 21 presents the skeletonized images extracted from the six types of pavement distress masks using the lookup table method. As shown in the figure, the width varies across different regions of the distress masks; however, the lookup table method accurately identifies the primary direction of distress propagation. Furthermore, the extracted skeleton images remain continuous, without disruptions or artifacts, ensuring that length calculations are not affected by noise or distortions. With a well-defined skeleton image, the index table can be used to retrieve the physical length corresponding to each pixel position, and these values are summed to obtain the actual physical length and average width of the distress region.
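Following this description, the physical length can be obtained by summing index-table entries over the skeleton pixels. This sketch picks the length, width, or diagonal entry for each pixel from the local skeleton direction, and, because the article does not give the width formula, it assumes the common choice of average width = physical mask area / skeleton length. The index table is passed in as a function so it can be the calibration-derived table or, as here, a hypothetical uniform 1 mm × 1 mm pixel; all function names are hypothetical.

```python
def _direction_entry(u, v, skel):
    """Pick which index-table entry describes the local skeleton direction.

    Returns 0 (length, along u), 1 (width, along v) or 2 (diagonal).
    """
    if (u - 1, v) in skel or (u + 1, v) in skel:
        return 0
    if (u, v - 1) in skel or (u, v + 1) in skel:
        return 1
    if any((u + du, v + dv) in skel for du in (-1, 1) for dv in (-1, 1)):
        return 2
    return 0  # isolated pixel: fall back to the length entry


def crack_length_and_width(skeleton_px, mask_px, index):
    """Physical length (mm) and average width (mm) of one distress region.

    skeleton_px: (u, v) pixels of the one-pixel-wide skeleton.
    mask_px:     (u, v) pixels of the distress mask.
    index:       function (u, v) -> (length, width, diagonal, area), mm and mm^2.
    The average width is taken as mask area / skeleton length (an assumption).
    """
    skel = set(skeleton_px)
    length = sum(index(u, v)[_direction_entry(u, v, skel)] for u, v in skel)
    area = sum(index(u, v)[3] for u, v in mask_px)
    avg_width = area / length if length > 0 else 0.0
    return length, avg_width


def uniform_index(u, v, size_mm=1.0):
    """Hypothetical index table: every pixel is a size_mm x size_mm square."""
    return size_mm, size_mm, size_mm * 2 ** 0.5, size_mm * size_mm


# A straight crack 10 pixels long and 3 pixels wide, with its centre-line skeleton.
mask = [(u, v) for u in range(10) for v in range(3)]
skeleton = [(u, 1) for u in range(10)]
print(crack_length_and_width(skeleton, mask, uniform_index))
```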
In summary, the distress geometry measurement algorithm enables the precise extraction of physical spatial parameters based solely on the calibration-generated index table and the semantic segmentation-derived distress mask, providing an effective method for measuring pavement distress geometries in real-world conditions.

5. Conclusions

(1)
Through the hardware selection and design process, a lightweight and portable pavement distress detection device was assembled, offering an efficient and practical solution for on-site pavement inspection.
(2)
By integrating the YOLOv5 object detection algorithm with convolutional deep learning techniques, a model was trained on a dataset of 14,641 valid samples selected from 85,511 collected pavement images. The final statistical results show an overall false-negative rate of 1.13%, a recall of 97.35%, and a precision of 98.30%, demonstrating the model's high accuracy and reliability.
(3)
Algorithm validation and analysis confirmed that the distress geometry measurement algorithm can accurately extract physical spatial parameters using only the calibration-generated index table and the semantic segmentation-derived distress mask. The study concludes that the developed pavement distress detection device has significant potential for practical engineering applications.

6. Prospect

(1)
This study utilizes a monocular camera, which effectively identifies two-dimensional pavement distresses, such as transverse and longitudinal cracks, alligator cracking, block cracking, and patched areas. However, it currently lacks the capability to accurately detect three-dimensional distresses, such as potholes and subsidence. In the future, a stereo camera system could be implemented, incorporating existing equipment algorithms and advanced technical approaches to enable comprehensive pavement distress detection.
(2) This study addresses only the detection of asphalt pavement distresses. Because cement concrete pavement distresses differ distinctly from those of asphalt pavements, future research could draw on established distress recognition methodologies to develop an automated detection system for cement concrete pavements.

Author Contributions

Formal analysis, Y.D.; Methodology, Y.D.; Resources, Y.D. and Y.H.; Supervision, Y.D., Y.H., X.C., P.X. and K.D.; Validation, Y.D.; Writing—review & editing, H.Z. and Y.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used to support the findings of this study are included within the article.

Acknowledgments

The authors thank China Highway Engineering Consulting Corporation for its support.

Conflicts of Interest

Authors Yuanshuai Dong, Yun Hou and Xiangjun Cheng were employed by the company China Highway Engineering Consulting Corporation. Authors Yuanshuai Dong, Yun Hou, Xiangjun Cheng and Peiwen Xie were employed by the company China Highway Engineering Consulting Corporation DATA Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Sermanet, P.; Eigen, D.; Zhang, X.; Mathieu, M.; Fergus, R.; LeCun, Y. Overfeat: Integrated recognition, localization and detection using convolutional networks. In Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, 14–16 April 2014.
  2. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014, Columbus, OH, USA, 23–28 June 2014; IEEE Computer Society: Washington, DC, USA, 2014.
  3. Uijlings, J.R.; Van De Sande, K.E.; Gevers, T.; Smeulders, A.W. Selective Search for Object Recognition. Int. J. Comput. Vis. 2013, 104, 154–171.
  4. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
  5. Girshick, R. Fast R-CNN. In Proceedings of the 15th IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, 11–18 December 2015.
  6. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
  7. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, 26 June–1 July 2016.
  8. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the 14th European Conference on Computer Vision, ECCV 2016, Amsterdam, The Netherlands, 8–16 October 2016.
  9. Ravi, R.; Habib, A.; Bullock, D. Pothole Mapping and Patching Quantity Estimates using LiDAR-Based Mobile Mapping Systems. Transp. Res. Rec. J. Transp. Res. Board 2020, 2674, 124–134.
  10. Fakhri, S.A.; Satari Abrovi, M.; Zakeri, H.; Safdarinezhad, A.; Fakhri, A. Pavement crack detection through a deep-learned asymmetric encoder-decoder convolutional neural network. Int. J. Pavement Eng. 2023, 24, 2255359.
  11. Sami, A.A.; Sakib, S.; Deb, K.; Sarker, I.H. Improved YOLOv5-Based Real-Time Road Pavement Damage Detection in Road Infrastructure Management. Algorithms 2023, 16, 452.
  12. Hedeya, M.A.; Samir, E.; El-Sayed, E.; El-Sharkawy, A.A.; Abdel-Kader, M.F.; Moussa, A.; Abdel-Kader, R.F. A Low-Cost Multi-sensor Deep Learning System for Pavement Distress Detection and Severity Classification. In Proceedings of the 8th International Conference on Advanced Machine Learning and Technologies and Applications (AMLTA2022), Cairo, Egypt, 5–7 May 2022; pp. 21–33.
  13. Matarneh, S.; Elghaish, F.; Al-Ghraibah, A.; Abdellatef, E.; Edwards, D.J. An automatic image processing based on Hough transform algorithm for pavement crack detection and classification. Smart Sustain. Built Environ. 2023.
  14. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, 21–26 July 2017.
  15. Law, H.; Deng, J. CornerNet: Detecting Objects as Paired Keypoints. Int. J. Comput. Vis. 2020, 128, 642–656.
  16. Zhang, A.A.; Shang, J.; Li, B.; Hui, B.; Gong, H.; Li, L.; Zhan, Y.; Ai, C.; Niu, H.; Chu, X.; et al. Intelligent pavement condition survey: Overview of current researches and practices. J. Road Eng. 2024, 4, 257–281.
  17. Wang, S.; Cai, B.; Wang, W.; Li, Z.; Hu, W.; Yan, B.; Liu, X. Automated detection of pavement distress based on enhanced YOLOv8 and synthetic data with textured background modeling. Transp. Geotech. 2024, 48, 101304.
  18. Yuan, B.; Sun, Z.; Pei, L.; Li, W.; Zhao, K. Shuffle Attention-Based Pavement-Sealed Crack Distress Detection. Sensors 2024, 24, 5757.
  19. Wang, R.; Wang, C.; Chu, X. Research Progress on Pavement Damage Image Recognition. J. Jilin Univ. Technol. (Eng. Ed.) 2002, 32, 91–97.
  20. Li, L.; Sun, L.; Chen, C. Edge Detection Method for Pavement Damage Image Processing. J. Tongji Univ. (Nat. Sci. Ed.) 2011, 39, 688–692.
  21. Shi, L.; Dang, L.; Yang, L.; Shi, S. Pavement Damage Image Recognition Method Based on Manifold Feature Fusion. Comput. Appl. Softw. 2016, 33, 150–152 + 96.
  22. Zhang, Y.; Li, Q.; Xue, F.; Yu, L. Design of Pavement Crack Detection System Based on Jetson TX2. Highway 2023, 68, 337–344.
  23. Chen, H.; Wang, J. Infrared Asphalt Pavement Crack Detection Method Based on Improved YOLOv5. Telev. Technol. 2023, 47, 43–50.
  24. Zhou, Y.; Zhang, J.; Cao, J.; Liu, Y.; Zhang, H. Research on Pavement Pothole Detection Error Compensation Algorithm Based on 3D Laser Technology. J. Highw. Transp. Technol. 2023, 40, 17–24.
  25. Wu, C.; Ti, J.; Ma, J. Digitalization of Asphalt Pavement Maintenance Information Based on Holographic 3D Detection Technology. Guangdong Highw. Transp. 2023, 49, 1–7.
  26. Chen, M.; Zhang, M.; Liu, Z.; Han, Y.; Gu, S. Design and Implementation of a Lightweight Portable Intelligent Pavement Distress Detection System. Eng. Qual. 2022, 40, 74–79.
  27. Chen, X.; Gao, H.; Yang, Z.; Kong, T.; Che, R. Research on Pavement Crack Detection and Recognition Based on Improved Yolov5s. Softw. Guide 2014.
  28. Wang, Y.; Zhou, C.; Wang, Y.; Li, W. Digital Research on Pavement Distress Based on Improved YOLOv8 Algorithm. Highway 2024, 69, 350–356.
Figure 1. Nvidia Xavier platform.
Figure 2. 4K industrial camera.
Figure 3. Gradation curves.
Figure 4. Movable central control system.
Figure 5. Vehicle-mounted triangular bracket.
Figure 6. The network structure of YOLOv5.
Figure 7. Examples from the pavement distress dataset: (a) transverse crack; (b) longitudinal crack; (c) two instances of reticular cracking; (d) four potholes; (e) patch repair.
Figure 8. The CrackForest dataset.
Figure 9. The 4K dataset.
Figure 10. The 4K fixed-size split dataset.
Figure 11. The 4K arbitrarily cropped dataset.
Figure 12. Detection results of different models of YOLOv5: (a) YOLOv5s, (b) YOLOv5m, (c) YOLOv5l, (d) YOLOv5x.
Figure 13. Comparison of Mask-RCNN and YOLOv5: (a) Original images, (b) Mask-RCNN, (c) YOLOv5.
Figure 14. Results of the fourth validation round: (a) cracks; (b) reticular cracking; (c) potholes; (d) patch repairs.
Figure 15. Model training results.
Figure 16. Images used for camera calibration.
Figure 17. Three-dimensional spatial positions of the monocular camera and the calibration plate.
Figure 18. Index table.
Figure 19. Four parameters for each pixel.
Figure 20. Four parameters for each pixel: (a) Simple cracks, (b) Reticulated cracks.
Figure 21. Four parameters for each pixel: (a) Simple cracks, (b) Reticulated cracks.
Table 1. Scaling parameters of the YOLOv5 backbone networks at each level.

| Backbone Network | Depth Multiple | Width Multiple | Parameter Size (KB) |
|---|---|---|---|
| YOLOv5s | 0.33 | 0.5 | 14,468 |
| YOLOv5m | 0.67 | 0.75 | 42,367 |
| YOLOv5l | 1.0 | 1.0 | 93,086 |
| YOLOv5x | 1.33 | 1.25 | 173,370 |
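The depth and width multiples in Table 1 scale a single base architecture into the four variants. The sketch below follows the rule used in the Ultralytics YOLOv5 model parser as we understand it: a repeated block's repeat count is multiplied by the depth multiple and rounded (never below one), and its channel count is multiplied by the width multiple and rounded up to a multiple of 8. The base block of 9 repeats and 512 channels and the function names are hypothetical examples.

```python
import math

# Depth and width multiples from Table 1.
SCALES = {"s": (0.33, 0.50), "m": (0.67, 0.75), "l": (1.00, 1.00), "x": (1.33, 1.25)}


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_block(repeats, channels, variant):
    """Return (repeats, channels) of one block after applying a variant's multiples."""
    depth, width = SCALES[variant]
    # Single-layer blocks keep one repeat; repeated blocks are scaled and kept >= 1.
    scaled_repeats = max(round(repeats * depth), 1) if repeats > 1 else repeats
    scaled_channels = make_divisible(channels * width)
    return scaled_repeats, scaled_channels


# Hypothetical base block: 9 repeats, 512 output channels.
for v in SCALES:
    print(v, scale_block(9, 512, v))
# s (3, 256)
# m (6, 384)
# l (9, 512)
# x (12, 640)
```

Because both multiples shrink together, the parameter count falls roughly with depth times width squared, which is consistent with the steep drop in model size from YOLOv5x to YOLOv5s in Table 1.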
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhang, H.; Dong, Y.; Hou, Y.; Cheng, X.; Xie, P.; Di, K. Research on Asphalt Pavement Surface Distress Detection Technology Coupling Deep Learning and Object Detection Algorithms. Infrastructures 2025, 10, 72. https://doi.org/10.3390/infrastructures10040072

AMA Style

Zhang H, Dong Y, Hou Y, Cheng X, Xie P, Di K. Research on Asphalt Pavement Surface Distress Detection Technology Coupling Deep Learning and Object Detection Algorithms. Infrastructures. 2025; 10(4):72. https://doi.org/10.3390/infrastructures10040072

Chicago/Turabian Style

Zhang, Hong, Yuanshuai Dong, Yun Hou, Xiangjun Cheng, Peiwen Xie, and Keming Di. 2025. "Research on Asphalt Pavement Surface Distress Detection Technology Coupling Deep Learning and Object Detection Algorithms" Infrastructures 10, no. 4: 72. https://doi.org/10.3390/infrastructures10040072

APA Style

Zhang, H., Dong, Y., Hou, Y., Cheng, X., Xie, P., & Di, K. (2025). Research on Asphalt Pavement Surface Distress Detection Technology Coupling Deep Learning and Object Detection Algorithms. Infrastructures, 10(4), 72. https://doi.org/10.3390/infrastructures10040072
