Article

CrackYOLO: Rural Pavement Distress Detection Model with Complex Scenarios

Yuxuan Li, Shangyu Sun, Weidong Song, Jinhe Zhang and Qiaoshuang Teng
1 School of Geomatics, Liaoning Technical University, Fuxin 123000, China
2 Collaborative Innovation Institute of Geospatial Information Service, Liaoning Technical University, Fuxin 123000, China
3 State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(2), 312; https://doi.org/10.3390/electronics13020312
Submission received: 15 December 2023 / Revised: 9 January 2024 / Accepted: 9 January 2024 / Published: 10 January 2024

Abstract

The maintenance level of rural roads is relatively low, and the automated detection of pavement distress is easily affected by the shadows of roadside trees, weeds, soil, and disparities in distress object scale, making it difficult to accurately evaluate the distress condition of the pavement. To solve these problems, this study designed a target detection network, CrackYOLO, for pavement crack extraction on rural roads. CrackYOLO is based on an improved YOLOv5. Because the shadows cast by rows of trees lead to the loss of crack features in the feature extraction and downsampling stages of the network, CrackConv and Adapt-weight Down Sample (ADSample) were introduced to strengthen the ability to locate and identify cracks. Because disturbances such as soil and weeds cause the extraction of redundant features, the Channel And Spatial mixed attention mechanism (CAS) was introduced to enhance the weight of crack features. To address missed detections of fine cracks caused by significant scale variation among crack objects in the same image, Multi Scale Convolution (MSConv) and Multi Scale Head (MSHead) were incorporated during the feature fusion and prediction inference stages of the network, thereby improving multi-scale detection performance. To verify the effectiveness of the proposed method, CrackYOLO was evaluated on the LNTU_RDD_NC dataset, where its detection accuracy was 9.99%, 12.79%, and 4.61% higher than that of the current pavement crack detection models YOLO-LWNet, Faster R-CNN, and YOLOv7, respectively. We also compared these models on public datasets covering different scenarios, and the experimental results show that CrackYOLO performs equally strongly on urban roads and in other scenarios.

1. Introduction

Rural roads are extensive and widely distributed, and regular pavement distress detection plays a key role in prolonging road service life and enabling rural roads to contribute to local economic revitalization. Compared with national and provincial trunk highways, the management and maintenance level of rural roads is relatively low: overgrown roadside vegetation casts dense shadows, abundant ground cover such as weeds and soil is present, and cracks occur at different sizes. These factors make the automated detection of pavement distress very difficult [1,2]. The pavement condition of typical national and provincial trunk highways and rural roads in northern China is illustrated in Figure 1. Thanks to periodic maintenance and inspection operations, national and provincial trunk highways have good pavement quality and little interference, unlike rural roads, which are maintained less frequently and have many interfering items on the pavement. Furthermore, the complex shapes of the cracks are challenging for the detection network.
As illustrated in Figure 1, rural pavement is obscured by shadows from objects such as tree branches along the roadside, resulting in a complex image background and making pavement cracks difficult for the detection model to recognize [3,4,5,6,7]. In addition, rural pavements contain fine interfering items, such as weeds, tree branches, and dirt, whose texture resembles that of cracks, leading to misrecognition by the detection network. Furthermore, cracks of different scales may appear in the same image in the rural pavement crack detection task, further increasing detection difficulty. Therefore, a pavement crack detection algorithm with strong anti-interference and outstanding multi-scale detection capability is very important for rural pavement detection tasks.
With the development of image processing technology in recent years, road crack detection algorithms based on deep learning [4,8,9,10,11,12,13,14,15] have become a research focus. These algorithms can be divided into two types: those based on semantic segmentation [16,17,18,19,20] and those based on target detection. Rural pavement crack detection requires reliable accuracy and good detection efficiency. Semantic segmentation-based crack detection algorithms extract crack edge information more accurately, but their detection efficiency is lower; they are more applicable to high-grade urban pavements and cannot meet the detection needs of rural pavements. Therefore, rural pavement crack detection tasks are better suited to object detection algorithms with high detection efficiency and reliable accuracy. Yuan et al. [21] proposed the FedRD model for the rapid detection of pavement distress; it offers high accuracy and good reliability, detects quickly, and remains effective even with limited edge data. However, it does not classify pavement distress in detail, and its detection accuracy for minor pavement distress is not high. Wu et al. [22] proposed YOLO-LWNet, a lightweight detection model for mobile terminal devices, which uses a lightweight backbone and an efficient feature fusion network built from LWC basic blocks, and compared it with other models on the RDD public dataset. The model has low computational complexity, but its detection accuracy is poor. Li et al. [23] proposed a U-net-based model that outputs both the segmentation and detection results of cracks, combining a multi-head detection method to produce the two kinds of output and thereby constructing a more feature-rich detection model. However, this model favors the segmentation results and performs poorly in target classification and detection efficiency. Pham et al. [24] proposed a road crack detection algorithm based on YOLOv7 that combines coordinate attention, label smoothing, model integration, and other accuracy fine-tuning techniques to train a deep learning model, and testing on the RDD public dataset showed better detection performance. However, its training method is complex, and its practical applicability in engineering is poor. Deng et al. [25] applied the YOLOv2 recognition network to identify concrete surface cracks. Compared with region-based convolutional neural networks, YOLOv2 has better detection efficiency and accuracy, but compared with current models in the target detection field, it is more complex and less accurate. Liu et al. [26] applied nondestructive ground-penetrating radar (GPR) to collect road crack images and processed the collected data with an improved YOLOv3, achieving high recognition accuracy and detection performance; however, the sensor used is expensive to replicate and has limited practical applicability. Although pavement distress detection algorithms based on deep learning target detection technology have made progress, there are few academic studies on rural pavement distress detection, and this field lacks large-scale, multi-scenario, all-type training datasets.
A comparison between the existing pavement distress detection models and the model proposed in this paper is presented in Table 1.
To address the aforementioned issues, this study selects the YOLOv5 detection model, which is widely used for pavement crack detection [27,28], as the base model. CrackConv and ADSample are used in the backbone and downsampling stages to reduce the influence of complex background interference on the network, the CAS module is added to the feature fusion section to improve the network’s anti-interference ability for small cracks, and MSConv and MSHead are introduced in the feature fusion and prediction stages to enhance the network’s multi-scale detection ability; together, these modules compose the rural pavement detection network. To verify the effectiveness of the algorithm, this study uses a pavement condition monitoring vehicle to collect rural road images in the field, produces the LNTU_RDD_NC dataset for research on rural road crack detection technology, and validates the proposed algorithm on it.

2. CrackYOLO Detection Model

The CrackYOLO detection model is primarily composed of the backbone, feature fusion section, and head. The network architecture of the CrackYOLO model is illustrated in Figure 2.
Due to the complexity introduced by shadows obscuring rural pavement cracks, the original network exhibits weak target information extraction capability in the backbone and loses too much target information during downsampling in the feature fusion section. In this study, leveraging the elongated, tubular geometric shape of cracks, we introduce CrackConv in the backbone section of the network to extract crack features more accurately. Simultaneously, we designed ADSample with adaptive weights that can adjust the weight of cracks obscured by shadows, mitigating the loss of shadow-covered crack features during downsampling while enlarging the receptive field. To address the reduced recognition accuracy caused by the fusion of interference, e.g., branches or weeds, with crack edge textures, we introduce the CAS adaptive attention mechanism module between the backbone and neck and after up-sampling. This allows the model to focus on crack features, enhancing the model’s resistance to interference. To mitigate misidentification caused by the variable scales of cracks on rural roads, we introduce MSConv to refine the extraction of crack features, thereby enhancing the model’s ability to extract fine cracks. Simultaneously, we replace the detection head with MSHead and integrate the output of MSConv. A scale-aware mechanism is introduced, enabling the adaptive allocation of semantic weights to feature maps with significant resolution disparities. This ensures that the model adapts better to targets with significant scale differences.

2.1. Improvement for Complex Backgrounds

The improvements to the backbone address the geometric shape of cracks: CrackConv builds on the deformable convolution proposed in previous work and aims to help the model focus on key features. Cracks form minute tubular structures with an elongated, slender local construction, which has long been considered a challenging target in the field of object detection. Moreover, cracks occupy a relatively small proportion of the entire pavement image and are susceptible to complex background interference, such as shadow coverage, which increases the complexity of feature extraction. In this study, we initially attempted feature extraction with Snake Convolution [29] and deformable convolution to improve the extraction accuracy of occluded cracks; however, when we applied these models, no meaningful accuracy improvements were observed, so we reconstructed the structure and designed CrackConv to guide the model to focus on the key characteristics of the crack itself and to increase the information weight of the crack. The structure of CrackConv is shown in Figure 3.
Assuming the central coordinates are represented by Ki = (xi, yi), the 3 × 3 kernel K is expressed as follows:
K = {(x − 1, y − 1), (x − 1, y), …, (x + 1, y + 1)}
To enhance the convolution’s focus on complex geometric features and prevent the receptive field from deviating from the target, an offset correction Δ is employed. This allows the convolution, based on the morphological knowledge of tubular structures, to adapt and concentrate on the local features of elongated and curved cracks. The specific positions of each grid in K are represented as K_{i±c} = (x_{i±c}, y_{i±c}), where c ∈ {0, 1, …, 4} indicates the horizontal distance from the central grid. The selection of each grid position K_{i±c} in the convolutional kernel K is a cumulative process relative to K_i: K_{i+1} adds an offset Δ = {δ | δ ∈ [−1, 1]} with reference to K_i, so the offsets must be accumulated (Σ) to ensure that the convolutional kernel conforms to linear morphological structures. The construction of coordinates is illustrated in Figure 4.
In Figure 4, the variations along the x-axis and y-axis within the receptive field are given by the following equations:
K_{i±c} = {(x_{i+c}, y_{i+c}) = (x_i + c, y_i + Σ_{i}^{i+c} Δy), (x_{i−c}, y_{i−c}) = (x_i − c, y_i + Σ_{i−c}^{i} Δy)},
K_{j±c} = {(x_{j+c}, y_{j+c}) = (x_j + Σ_{j}^{j+c} Δx, y_j + c), (x_{j−c}, y_{j−c}) = (x_j + Σ_{j−c}^{j} Δx, y_j − c)},
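To make the accumulation in the two expressions above concrete, the following minimal sketch computes the sampling coordinates of a kernel elongated along the x-axis. The function name, the use of tanh to constrain the learned offsets to [−1, 1], and the per-grid offset layout are illustrative assumptions rather than the authors’ implementation; in the full CrackConv, the resulting fractional coordinates would then be sampled with bilinear interpolation, as in deformable convolution.

```python
import torch


def crackconv_coords(center_x, center_y, offsets_y, c_max=4):
    """Sampling coordinates of a kernel elongated along the x-axis: each grid
    keeps x = center_x +/- c while its y-offset is accumulated outward from the
    centre, mirroring the K_{i±c} expression above (illustrative sketch)."""
    coords = [(center_x, center_y)]
    acc = 0.0
    for c in range(1, c_max + 1):                 # grids to the right of the centre
        acc += offsets_y[c_max + c - 1].item()
        coords.append((center_x + c, center_y + acc))
    acc = 0.0
    for c in range(1, c_max + 1):                 # grids to the left of the centre
        acc += offsets_y[c_max - c].item()
        coords.insert(0, (center_x - c, center_y + acc))
    return coords


offsets = torch.tanh(torch.randn(2 * 4))          # learned offsets constrained to [-1, 1]
print(crackconv_coords(4.0, 4.0, offsets))        # 9 fractional sampling positions
```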
Due to the variations along the x-axis and y-axis, Figure 4 (right) illustrates the feature extraction process of CrackConv within a receptive field of 9 × 9. In addition to the improvements in the backbone, complex backgrounds can cause the detection model to lose information about cracks themselves during the downsampling process. This issue arises because convolutional kernels use the same parameters for feature extraction within each receptive field without considering the differences in target information at different positions.
The original network’s convolution operations did not sufficiently recognize the criticality of crack-specific features, which further compromised effective crack feature extraction. We designed ADSample based on the receptive-field expansion idea of RFAConv [30]. It allows the model to adjust the receptive field while preserving as much crack information as possible, and it improves the original network’s downsampling operation, strengthening the model’s resistance to complex background interference. Global information is extracted from the input features using AvgPool operations that aggregate the features within each receptive field. Subsequently, 1 × 1 convolution operations facilitate the interaction of information within the receptive field. The SoftMax function is then applied to highlight the importance of each feature within the receptive field. The resulting weights are fused with the spatial features from the receptive field to adjust the convolutional parameters and produce the output features. The implementation process is illustrated in Figure 5.
The transformation of the input feature map can be represented as follows:
F = Softmax(g^{i×i}(AvgPool(X))) × ReLU(Norm(g^{k×k}(X))) = A_rf × F_rf
In the above expression, g^{i×i} denotes a group convolution with a kernel of size i × i, k represents the size of the convolutional kernel, Norm indicates normalization, X represents the input feature map, and × denotes the multiplication of the attention map (A_rf) with the transformed spatial features (F_rf) from the receptive field.
ADSample prioritizes the spatial features within the receptive field by assigning weights to the different features in the convolution, multiplying each feature weight with the input features, and summing the results. After its shape is adjusted, the feature map produced by ADSample does not overlap with the receptive-field spatial features. Consequently, the learned attention map aggregates the feature information of each receptive-field slider and extracts the regions containing crack information.
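The following is a minimal PyTorch sketch of an ADSample-style adaptive-weight downsampling block that follows the description above: AvgPool over each receptive field, a 1 × 1 group convolution for in-field interaction, SoftMax weighting, and fusion with the receptive-field spatial features before the stride-2 reduction. Layer names, normalization choices, and the final 1 × 1 fusion are assumptions, not the authors’ code.

```python
import torch
import torch.nn as nn


class ADSample(nn.Module):
    """Sketch of an adaptive-weight downsampling block (ADSample-style)."""

    def __init__(self, in_ch, out_ch, k=3, stride=2):
        super().__init__()
        self.k = k
        # Attention branch: per-receptive-field statistics -> per-position weights.
        self.weight_branch = nn.Sequential(
            nn.AvgPool2d(kernel_size=k, stride=stride, padding=k // 2),
            nn.Conv2d(in_ch, in_ch * k * k, kernel_size=1, groups=in_ch, bias=False),
        )
        # Feature branch: unfolds each receptive field into k*k spatial features.
        self.feature_branch = nn.Sequential(
            nn.Conv2d(in_ch, in_ch * k * k, kernel_size=k, stride=stride,
                      padding=k // 2, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch * k * k),
            nn.ReLU(inplace=True),
        )
        # Fuses the reweighted receptive-field features into the output channels.
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch * k * k, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        b, c = x.shape[:2]
        w = self.weight_branch(x)                            # (B, C*k*k, H', W')
        h_out, w_out = w.shape[2:]
        attn = w.view(b, c, self.k * self.k, h_out, w_out).softmax(dim=2)
        feat = self.feature_branch(x).view(b, c, self.k * self.k, h_out, w_out)
        out = (attn * feat).reshape(b, c * self.k * self.k, h_out, w_out)
        return self.fuse(out)


if __name__ == "__main__":
    y = ADSample(64, 128)(torch.randn(1, 64, 80, 80))
    print(y.shape)  # torch.Size([1, 128, 40, 40])
```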

2.2. Improvement for Fine Interference

To mitigate the impact of fine interference such as weeds, soil, and branches on the model, this study introduces a hybrid attention mechanism, since attention mechanisms can make convolutional neural networks focus on key information [31,32,33]; however, in actual experiments, existing mechanisms did not make the network focus on the cracks themselves. This study therefore designed the CAS adaptive attention mechanism module based on their related structures. According to the crack characteristics detected by the network, CAS can adaptively increase the weight of crack information and reduce the weight of interference information, including branches and weeds. This addresses the decreased accuracy caused by the fusion of rural road crack texture information with surrounding interference, thereby improving the model’s resistance to interference. The structure of the CAS module is depicted in Figure 6.
The effective feature map F obtained from the backbone extraction network is first subjected to a channel attention mechanism to calculate a weight QC. The feature map F is then multiplied by QC, assigning a corresponding weight to each channel to obtain the channel attention feature FC. Following this, a spatial attention mechanism is applied to obtain the weight QS, by which the feature map is multiplied to yield the refined spatial feature FS. Finally, the output feature map F′ is obtained. The transformation of the feature map by the CAS hybrid attention mechanism module is expressed by Equation (5).
F′ = F · F_C · F_S
In the above equation, F represents the input feature map, FC denotes the feature map with assigned channel weights, FS represents the feature map with assigned spatial weights, and F′ represents the feature map outputted by the CAS hybrid attention mechanism. The channel attention feature FC in the CAS module is calculated as follows: the input feature F undergoes global average pooling and global max pooling, each result is processed by a shared multi-layer perceptron (MLP), the two outputs are summed, a sigmoid is applied, and the final channel attention feature FC is obtained. The computation is expressed by Equation (6).
F_C = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) = σ(W_1(W_0(F^C_avg)) + W_1(W_0(F^C_max)))
In the aforementioned equation, F represents a feature map of dimensions H × W × C; W_0 and W_1 denote the weights of the shared MLP; F^C_avg denotes the feature layer obtained after global average pooling; and F^C_max denotes the feature layer obtained after global maximum pooling. The implementation of the spatial attention feature FS in the CAS module is outlined as follows:
For each feature point in the obtained channel attention feature FC, the maximum values Fmax and average values Favg along the channel dimension are computed. Subsequently, these values are concatenated, and a convolution operation with a single channel is applied to adjust the channel dimension. Following this, a sigmoid function is employed to obtain the spatial attention feature, as expressed in Formula (7). The computational procedure is delineated as follows:
F_S = σ(f^{3×3}_Dilated([AvgPool(F); MaxPool(F)])) = σ(f^{3×3}_Dilated([F_avg; F_max]))
In the above equation, f^{3×3}_Dilated represents a 3 × 3 dilated convolutional layer, and [;] denotes the concatenation operation. F_avg signifies the feature layer obtained after average pooling along the channel dimension, while F_max denotes the feature layer obtained after maximum pooling along the channel dimension.
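A CBAM-style sketch of the CAS module consistent with Equations (5)–(7) is given below. The reduction ratio, the dilation rate of the 3 × 3 dilated convolution, and the sequential application order are assumptions where the text leaves them open; this is a re-implementation of the description, not the authors’ code.

```python
import torch
import torch.nn as nn


class CAS(nn.Module):
    """Sketch of the channel-and-spatial mixed attention (Equations (5)-(7))."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared MLP (W1(W0(.))) for the channel attention branch.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        # 3x3 dilated convolution for the spatial attention branch.
        self.spatial = nn.Conv2d(2, 1, kernel_size=3, padding=2, dilation=2, bias=False)

    def forward(self, f):
        # Channel attention: sigma(MLP(AvgPool(F)) + MLP(MaxPool(F)))
        avg = self.mlp(torch.mean(f, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(f, dim=(2, 3), keepdim=True))
        q_c = torch.sigmoid(avg + mx)
        f_c = f * q_c                                    # channel attention feature F_C
        # Spatial attention: sigma(DilatedConv3x3([mean_c(F_C); max_c(F_C)]))
        s_avg = torch.mean(f_c, dim=1, keepdim=True)
        s_max, _ = torch.max(f_c, dim=1, keepdim=True)
        q_s = torch.sigmoid(self.spatial(torch.cat([s_avg, s_max], dim=1)))
        return f_c * q_s                                 # refined output feature map


if __name__ == "__main__":
    print(CAS(128)(torch.randn(1, 128, 40, 40)).shape)   # torch.Size([1, 128, 40, 40])
```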

2.3. Improvement for Multi-Scale Target Detection

The original YOLOv5 detection model incorporates a feature pyramid structure in the neck section to address multi-scale target detection, enabling feature extraction from three different scales. However, when recognizing targets with significant scale variations, such as rural pavement cracks, the multi-scale structure of the feature pyramid may still fall short of extracting all the fine-grained details of the targets. To address the insufficient multi-scale detection capability of the original detection model, this study improves the feature pyramid section by subdividing the channels of the utilized convolutional layers to extract key information for different-sized targets. In addition, the ability to adjust the distribution of feature maps with different resolutions is increased in the Network Head section.
Inspired by the concept proposed in Scale-Aware Modulation [34], this research introduces a multi-scale convolution, MSConv, to transform the original network’s feature pyramid structure. This modification enables the finer extraction of spatial features across multiple scales. Initially, the convolution channels of the input layer are divided into four parts, and 1 × 1, 3 × 3, 5 × 5, and 7 × 7 convolutional layers are individually applied to extract features from the corresponding parts. The extracted channel feature information is then fine-tuned for multi-scale extraction using pointwise convolution [35], which exchanges channel information. Finally, the refined features are embedded into the network’s feature fusion convolution. The implementation principle is illustrated in Figure 7.
The process involves breaking down the input convolution into four equal parts along the channel dimension through average splitting. Subsequently, channel feature exchange is performed using 1 × 1 convolutions, and the resulting features are then concatenated to reconstruct the convolution.
MSConv(X) = Concat(Conv_{1×1}(x_1), Conv_{3×3}(x_2), Conv_{5×5}(x_3), Conv_{7×7}(x_4))
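A minimal sketch of MSConv as described above and in Equation (8) follows; the normalization and activation applied after the pointwise convolution are assumptions.

```python
import torch
import torch.nn as nn


class MSConv(nn.Module):
    """Sketch of the multi-scale convolution in Equation (8): channels are split
    into four equal groups, processed by 1x1, 3x3, 5x5 and 7x7 convolutions,
    concatenated, and mixed by a pointwise convolution that exchanges channel
    information."""

    def __init__(self, channels):
        super().__init__()
        assert channels % 4 == 0, "channels must split into four equal groups"
        c = channels // 4
        self.branches = nn.ModuleList(
            [nn.Conv2d(c, c, kernel_size=k, padding=k // 2, bias=False) for k in (1, 3, 5, 7)]
        )
        self.pointwise = nn.Sequential(          # channel information exchange
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(inplace=True),
        )

    def forward(self, x):
        parts = torch.chunk(x, 4, dim=1)         # average split along the channel dim
        y = torch.cat([branch(p) for branch, p in zip(self.branches, parts)], dim=1)
        return self.pointwise(y)


if __name__ == "__main__":
    print(MSConv(256)(torch.randn(1, 256, 40, 40)).shape)  # torch.Size([1, 256, 40, 40])
```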
Following the processing by MSConv, the feature maps exhibit a richer scale compared to the feature maps outputted by the original network. However, the original network’s head tends to assign lower weights to feature maps with larger or smaller resolutions, leading to a persistent failure to recognize small cracks. To address this problem, this study introduces MSHead [36], which stacks the input feature pyramid and incorporates a scale-aware mechanism. This mechanism reassigns semantic information weights to feature map layers with significant resolution disparities, thereby enhancing the multi-scale detection capability. The structure of MSHead is illustrated in Figure 8.
The input feature map tensor is denoted as F ∈ R^(L×H×W×C), and it is reshaped into a three-dimensional tensor of size L × S × C by setting S = H × W. The generalized form of self-attention can be expressed by the following formula:
W(F) = π_S(F) · F
π_S represents scale-aware attention, which fuses features across different scales based on their semantic significance. The input feature map layers first undergo global average pooling to aggregate the feature maps; a 1 × 1 convolution, approximating the linear function f, then integrates all channels, followed by the ReLU and sigmoid activation functions. π_S can be represented by the following formula:
π_S(F) · F = σ(f((1/(S·C)) Σ_{S,C} F)) · F
where f(·) is a linear function and σ(x) is the sigmoid activation function.
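The scale-aware term can be sketched as follows, applying Equation (10) level by level over a feature pyramid. The pooling granularity and layer names are assumptions; the authors’ MSHead builds on Dynamic Head [36].

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaleAwareAttention(nn.Module):
    """Sketch of pi_S: each pyramid level is globally pooled, passed through the
    linear function f (a 1 x 1 convolution), ReLU and sigmoid, and the resulting
    weight rescales that level (Equation (10))."""

    def __init__(self, channels):
        super().__init__()
        self.f = nn.Conv2d(channels, 1, kernel_size=1)      # linear function f
        self.relu = nn.ReLU(inplace=True)

    def forward(self, levels):
        # levels: list of pyramid maps, each of shape (B, C, H_l, W_l)
        out = []
        for x in levels:
            pooled = F.adaptive_avg_pool2d(x, 1)            # mean over S = H*W
            w = torch.sigmoid(self.relu(self.f(pooled)))    # (B, 1, 1, 1) level weight
            out.append(w * x)                               # reassign semantic weight
        return out


if __name__ == "__main__":
    attn = ScaleAwareAttention(256)
    pyramid = [torch.randn(1, 256, s, s) for s in (80, 40, 20)]
    print([t.shape for t in attn(pyramid)])
```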

3. Experimental Results and Analysis

3.1. Experimental Data Preparation

In the current field of pavement crack detection, the commonly used public datasets, such as RDD (Road Damage Detection), mostly cover urban road sections with minimal surface interference. Datasets specifically addressing pavement distress in rural areas with poor road conditions are relatively scarce. To address this gap, this study constructed the LNTU_RDD_NC dataset, focusing on cracks commonly found on rural roads. The dataset, created by the Traffic Spatiotemporal Big Data Research Center at Liaoning Technical University, was collected by a pavement condition monitoring vehicle driving over rural roads in Liaoning Province. The roof of the vehicle is equipped with a high-definition camera with automated collection functionality that takes a picture every 2 m, ensuring the continuity and integrity of the data and effectively avoiding missed road cracks. Distress types included in the dataset were categorized according to the description of the three crack types in the “Highway Technical Condition Assessment Standard” (JTG 5210-2018) [37], and LabelImg was used to label the dataset.
The rural road crack dataset, LNTU_RDD_NC, encompasses three types of common cracks found on rural roads: transverse cracks (tc), longitudinal cracks (lc), and reticular cracks (rc). The original images of the cracks consist of 9801 samples with a resolution of 3200 × 1800, and after cropping, the resolution is adjusted to 1200 × 800. The dataset comprises 3902 images of transverse cracks, 3448 images of longitudinal cracks, and 2451 images of reticular cracks. A subset of the LNTU_RDD_NC dataset is illustrated in Figure 9. LNTU_RDD dataset sample data have been published at https://github.com/leeyxan/LNTU_RDD (accessed on 13 December 2023).
To prove that the model in this paper can be competent for pavement detection tasks in different scenarios, the following public datasets widely used in the field of pavement distress detection are used as comparative experiments:
(a)
RDD2022 Public Dataset [38]: The RDD2022 dataset is captured using a smartphone mounted on the interior of a detection vehicle, which is equipped with a mobile phone bracket. It comprises road surface images from various countries with a resolution of 640 × 640, resulting in relatively lower image data quality. The RDD2022 dataset encompasses a total of nine road damage categories. To align with practical maintenance tasks, this study focuses on the longitudinal crack (D00), transverse crack (D10), and reticular crack (D20) categories, utilizing a total of 4805 images for training;
(b)
CN_RDD Public Dataset [39]: The CN_RDD public dataset is derived from the G303 section in China and is captured using a professional airborne camera. The dataset consists of road damage data with 4319 images, each with a resolution of 1600 × 1200 pixels. In this study, specific crack types, such as longitudinal crack (D00), transverse crack (D10), and reticular crack (D20), were selected for training;
(c)
CrackDataset_DL_HY [40]: CrackDataset_DL_HY is a road crack dataset that integrates both detection and segmentation aspects. The dataset is captured by a mobile measurement collection vehicle, and the acquired images have a resolution of up to 1280 × 960. In this study, a total of 2378 images containing transverse cracks (transverse), reticular cracks (alligator), and longitudinal cracks (longitudinal) were selected for analysis.

3.2. Model Comparative Experiments

3.2.1. Experimental Setup

This experiment was conducted on the PyTorch platform, and the primary specifications of the model training server are as follows: CPU: Intel Core i7-9700K; GPU: GeForce RTX 2070 (NVIDIA, Santa Clara, CA, USA). To evaluate the performance of the models themselves, each model was trained on the training set of LNTU_RDD_NC, followed by experimental result comparisons on the pre-defined test set. The parameters of each control model were trained using the optimal training method mentioned in its reference. The training settings are outlined in Table 2.
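For illustration, the CrackYOLO row of Table 2 maps onto a PyTorch optimizer roughly as follows; the momentum value is an assumption (a common YOLOv5 default) and is not reported in this section.

```python
import torch

# Illustrative only: maps the CrackYOLO row of Table 2 onto a PyTorch optimizer.
model = torch.nn.Conv2d(3, 16, 3)  # stand-in for the detection network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.937)  # momentum assumed
EPOCHS, BATCH_SIZE = 300, 4        # training length and batch size from Table 2
```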
The dataset partitioning for each group is presented in Table 3.

3.2.2. Experimental Results and Comparisons

This study employed Average Precision (AP) and mean Average Precision (mAP) as accuracy evaluation metrics in the field of object detection. The four models were trained with the LNTU_RDD_NC dataset, and their weights were then applied to recognize the partitioned test set according to Table 3. The recognition accuracy results are presented in Figure 10.
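For reference, the reported AP values are areas under per-class precision-recall curves, and mAP is their mean over the three crack classes. The sketch below shows the standard all-point interpolation; details such as the IoU matching threshold are left unspecified because this section does not report them, and the example curves are placeholders, not results from the paper.

```python
import numpy as np


def average_precision(recall, precision):
    """All-point interpolated AP from a sorted precision-recall curve (sketch)."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]   # monotone non-increasing precision envelope
    idx = np.where(r[1:] != r[:-1])[0]         # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))


# Hypothetical per-class curves; mAP is the mean of the per-class AP values.
ap_tc = average_precision(np.array([0.2, 0.5, 0.8]), np.array([1.0, 0.9, 0.7]))
ap_lc = average_precision(np.array([0.3, 0.6, 0.9]), np.array([0.95, 0.85, 0.6]))
ap_rc = average_precision(np.array([0.25, 0.55, 0.85]), np.array([0.9, 0.8, 0.65]))
print((ap_tc + ap_lc + ap_rc) / 3)
```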
From Figure 10, it is evident that the CrackYOLO model achieved a higher overall recognition accuracy on the LNTU_RDD_NC dataset. Compared to the YOLO-LWNet model, it demonstrated an average accuracy improvement of 9.99%, a 5.87% improvement in reticular crack recognition, a 9.71% improvement in transverse crack recognition, and a 14.38% improvement in longitudinal crack recognition. In comparison to the Faster R-CNN model, CrackYOLO exhibited an average accuracy improvement of 12.78%, an 8.66% improvement in reticular crack recognition, a 14.71% improvement in transverse crack recognition, and a 15.00% improvement in longitudinal crack recognition. Compared to the YOLOv7 model, CrackYOLO showed an average accuracy improvement of 4.61%, a 3.07% improvement in reticular crack recognition, a 4.74% improvement in transverse crack recognition, and a 6.00% improvement in longitudinal crack recognition. The recognition comparison charts for the four network models were presented, showcasing images with common rural road disturbances such as complex backgrounds, minor interferences, and multi-scale targets.
Taking Figure 11a,c as examples, YOLO-LWNet, Faster R-CNN, and YOLOv7 were all affected by shadows, leading to recognition issues with the cracks. Due to the stronger feature extraction capability of CrackYOLO’s CrackConv in the backbone network, it was more resistant to shadow interference and less affected by complex backgrounds. As highlighted in Figure 11c, YOLO-LWNet and YOLOv7 failed to recognize the shadow-covered part of the crack, and Faster R-CNN could not detect the crack at all. With ADSample’s downsampling, CrackYOLO enlarged the receptive field and increased the weight of the crack itself, preventing the cracks covered by shadows from being ignored and successfully identifying the entire web-like crack. As highlighted in Figure 11b, YOLO-LWNet and Faster R-CNN missed detections due to interference, where there were disturbances such as soil and ruts around the annotated cracks. Although YOLOv7 identified three cracks, it was affected by the disturbances, detecting two transverse cracks but failing to recognize any longitudinal cracks. Thanks to the CAS attention mechanism added in the feature fusion part, CrackYOLO focused more on detailed texture information and successfully extracted the combination of the three types of cracks. For the detection results of multiple-sized cracks in the same image in Figure 11d,e, CrackYOLO with the MSHead detection head exhibited better scale awareness in Figure 11d, resulting in higher confidence in the detected cracks. In Figure 11e, because MSConv provides a more refined detection capability, only CrackYOLO successfully identified the small transverse cracks. To validate the universality of the CrackYOLO model in road crack detection, this study applied the four detection models to train on three public datasets: RDD2022, CN_RDD, and CrackDataset_DL_HY. The models were then tested on the pre-defined test sets to evaluate their detection performance. The recognition results are presented in Figure 12.
According to Figure 12, the CrackYOLO model exhibited higher overall recognition accuracy compared to the other three road surface damage detection models. In the RDD2022 public dataset testing experiments, CrackYOLO outperformed the YOLO-LWNet model, the Faster R-CNN model, and the YOLOv7 model with an average accuracy improvement of 8.31%, 7.37%, and 4.31%, respectively. Notably, in the recognition of reticular cracks and longitudinal cracks, the CrackYOLO model achieved the highest accuracy. However, in the recognition of transverse cracks, YOLOv7 outperformed CrackYOLO by 7.44%. This is attributed to the relatively low angle and fewer perspective disturbances in this dataset, and YOLOv7 was specifically improved for such datasets.
In the CN_RDD dataset, CrackYOLO achieved an average accuracy improvement of 1.66% compared to YOLO-LWNet, 1.96% compared to Faster R-CNN, and 1.04% compared to YOLOv7. For the detection of all three types of cracks, CrackYOLO achieved the highest accuracy. This is because the CN_RDD dataset is captured by a drone in orthographic images, and there are many small cracks, for which the CrackYOLO model has a specially designed improvement module.
In the CrackDataset_DL_HY dataset, CrackYOLO demonstrated an average accuracy improvement of 4.96% compared to YOLO-LWNet, 6.55% compared to Faster R-CNN, and 0.63% compared to YOLOv7. While YOLO-LWNet achieved slightly higher accuracy in reticular crack detection and YOLOv7 had a slight edge in transverse crack detection, CrackYOLO achieved the highest accuracy in longitudinal crack detection. This is because the background in this dataset is simple and has fewer disturbances, leading to good detection performance for commonly used models.
The recognition results on each public dataset are illustrated in Figure 13.
From Figure 13, it can be observed that in the detection on the CN_RDD public dataset, the Faster R-CNN, YOLO-LWNet, and YOLOv7 detection models misclassify the shadows of pedestrians as repair-type damage, while CrackYOLO does not exhibit such an issue. In the second image, Faster R-CNN, YOLO-LWNet, and YOLOv7 detection models misclassified tree branches and their shadows as reticular cracks, while CrackYOLO was not disturbed by them. In the detection on the CrackDataset_DL_HY public dataset, the Faster R-CNN and YOLO-LWNet detection models both missed small cracks and exhibited detection issues in the presence of rut interferences. However, YOLOv7 and CrackYOLO successfully identified the cracks. In the RDD2022 public dataset, due to the severe tilt in shooting angles, Faster R-CNN and YOLO-LWNet exhibited detection issues with obvious longitudinal cracks, but such issues did not occur in the LNTU_RDD_NC dataset created in this study. YOLOv7 and CrackYOLO successfully identified the detection objects.

4. Analysis of Model Method Effectiveness

To demonstrate that the CrackYOLO detection model can meet the requirements of rural road crack detection tasks and to validate the effectiveness of the proposed model, seven experiments are conducted by building upon the YOLOv5 base model. The experiments involve the incorporation of various modules: CrackConv, ADSample, CAS module, MSConv convolution, MSHead detection head, and the complete CrackYOLO recognition network. The models are compared based on their detection performance. Additionally, two control experiments are conducted to showcase the model’s optimal ability to withstand complex backgrounds and demonstrate its best multi-scale detection capability. CrackConv and ADSample aim to reduce the impact of complex backgrounds, and they are simultaneously added to the model as the eighth control experiment. MSConv and MSHead detection heads, which both have multi-scale detection capabilities, are simultaneously added to the model as the ninth control experiment. The experiments are conducted on the test set allocated according to Table 3, and the recognition results are presented in Figure 14.
The improvements to the main feature extraction network using CrackConv, compared to the original YOLOv5 detection model, resulted in enhanced feature extraction capabilities. The average recognition accuracy increased by 2.82%, and the accuracy for reticular cracks, transverse cracks, and longitudinal cracks increased by 2.74%, 2.89%, and 2.83%, respectively.
The improvement in the neck with ADSample, compared to the original YOLOv5 detection model, reduced the loss of crack feature information under shadow coverage during downsampling. The average recognition accuracy increased by 0.58%; the accuracy for reticular cracks remained the same, that for transverse cracks increased by 0.58%, and that for longitudinal cracks increased by 1.18%.
Combining the CrackConv and ADSample modules in the base network to resist complex background interference results in an average recognition accuracy increase of 4.75%, an accuracy increase of 3.40% for reticular cracks, an increase of 5.51% for transverse cracks, and an increase of 5.36% for longitudinal cracks. This proves that the CrackConv and ADSample modules improved in this study exhibit good detection performance in rural road crack recognition.
Addressing the issue of resisting small disturbances by adding the CAS adaptive attention mechanism resulted in an average recognition accuracy increase of 0.87%, an accuracy increase of 1.73% for reticular cracks, an increase of 0.80% for transverse cracks, and an increase of 0.10% for longitudinal cracks. This improvement in the CAS module provided certain help in rural road crack detection against small interferences.
Improving multi-scale convolution (MSConv) in the neck part enhanced the model’s fine detection capability. Compared to the original YOLOv5 detection model, the average recognition accuracy increased by 1.56%, and the accuracy for reticular cracks, transverse cracks, and longitudinal cracks increased by 3.12%, 0.85%, and 0.72%, respectively.
Improving the multi-scale target detection head (MSHead) in the head resulted in an average recognition accuracy increase of 2.69%, an accuracy increase of 1.24% for reticular cracks, an increase of 3.87% for transverse cracks, and an increase of 3.02% for longitudinal cracks.
Addressing the model’s ability to recognize multi-scale cracks by adding the MSConv and MSHead modules to the base network results in an average recognition accuracy increase of 5.44%, an accuracy increase of 5.20% for reticular cracks, an increase of 6.02% for transverse cracks, and an increase of 5.11% for longitudinal cracks. This improvement in the MSConv and MSHead modules significantly enhanced rural road crack detection.
Compared to the original YOLOv5 detection model, the CrackYOLO detection model achieved an average recognition accuracy increase of 6.70% and an accuracy increase of 6.16% for reticular cracks, 7.10% for transverse cracks, and 6.85% for longitudinal cracks. This demonstrates that the improved model in this study provides higher accuracy in rural road crack detection.
To address the influence of complex backgrounds, such as shadow coverage, this study designed the CrackConv and ADSample modules to encourage the models to focus more on the features of the cracks themselves. To verify the effectiveness of these modules, the original YOLOv5 detection network was used as a base control, and the detection results of the aforementioned modules were summarized. Representative images were selected, and heat maps of the network were drawn. The network with both CrackConv and ADSample modules is abbreviated as (V5 + CC + AD), and the summarized results are as follows:
The samples shown in Figure 15a indicate that in the YOLOv5 network, there were two instances of missed recognition in cracks covered by tree branch shadows. The network with the ADSample module successfully recognized small cracks under shadow coverage, and the CrackConv module showed good resistance to shadow coverage, resulting in higher confidence in recognition.
In Figure 15b, the original network failed to recognize cracks covered by shadows, while the network with CrackConv and ADSample modules successfully identified cracks under shadow coverage.
Figure 15c shows that the networks with the CrackConv or ADSample module successfully recognized the cracks covered by shadows but failed to identify the crack branches around them. When both modules were added to the network, even the small crack branches were successfully identified.
The CAS module proposed in this study to address small disturbances such as weeds and branches was compared with the original network. The performance was analyzed based on the network heatmap, and the comparative detection results are as follows:
In the sample shown in Figure 16a, the original network failed to identify the transverse cracks surrounded by soil and animal feces. The network with the CAS module effectively avoided the interference of the small branches and soil around the cracks.
In Figure 16b, the sample showed enlarged processing of small disturbances. It can be observed that the original network misidentified branches in the image as longitudinal cracks. The network with the CAS module was not disturbed by these branches.
This study addressed the difficulty of recognizing cracks when an image presents cracks on multiple scales by designing the MSConv and MSHead modules to improve the multi-scale detection performance of the detection model. To verify the effectiveness of these modules, the original YOLOv5 detection network was used as a baseline, and the detection results of the above modules were summarized. Representative images were selected, and heat maps of the network were generated. The network using both MSConv and MSHead modules was denoted as (V5 + MSC + MSH). The summarized results are as follows:
In the sample shown in Figure 17a, the detection models from the four experiments exhibited some multi-scale detection capability. However, the original network paid less attention to detecting small cracks. With the addition of the MSConv and MSHead modules, there was an improvement in confidence for the identified small cracks.
Figure 17b shows a sample with four cracks of different sizes. The original network could only recognize the larger cracks. After adding the MSConv, it recognized two small cracks, and after adding the MSHead, it recognized three small cracks. Adding both of these modules enabled the detection of cracks at all scales. Therefore, it was evident that embedding the MSConv and MSHead modules simultaneously into the model significantly increased the multi-scale detection capability of the detection model.
In Figure 17c, the sample featured a larger transverse crack surrounded by smaller longitudinal cracks. The original network failed to identify both scales of cracks. After adding MSConv, it struggled to identify the smaller cracks close to the transverse crack. However, after adding MSHead, it successfully identified both scales of cracks.

5. Conclusions

Rural road crack detection is often hindered by complex backgrounds, disturbances, and variable crack scales, which make existing road crack detection models ineffective. To solve this problem, this study designed the CrackYOLO rural road crack detection model, introducing modules such as CrackConv, ADSample, CAS, MSConv, and MSHead to improve crack feature extraction and address problems such as shadows and variable crack scale. Compared with the standard model, the proposed model exhibits excellent performance, especially in complex rural environments. To verify the effectiveness of the proposed algorithm, this study created the LNTU_RDD_NC dataset and used it to conduct experiments. The experimental results show that the CrackYOLO model has significant advantages over other commonly used road crack detection models for crack detection tasks on rural pavement. We also conducted experiments on public pavement crack detection datasets covering different scenarios, such as urban pavement. The results show that the CrackYOLO detection model performs well not only in rural pavement scenarios but also in urban pavement and other scenarios.
However, this study has limitations, particularly in the range of targets and scenarios it covers. In future work, detection tasks such as pothole detection should be added to broaden the application scenarios of the study.

Author Contributions

Conceptualization—S.S. and Y.L.; methodology, S.S.; contributed to refining the methodology, J.Z.; data collection, W.S. and Q.T.; supervised the data collection process, Y.L.; data analysis—Y.L.; reviewed the analysis results and provided valuable insights, S.S. and Q.T.; literature review, Y.L.; contributed by identifying additional sources, critiquing literature, and refining the theoretical framework, S.S.; writing—review and editing, W.S., Q.T. and S.S.; visualization, Y.L.; contributed to the design and interpretation of visual elements, J.Z. and Q.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant No. 42071343).

Data Availability Statement

Data are available within the article.

Conflicts of Interest

The authors declare no conflicts of interest related to this research.

References

  1. Tawalare, A.; Vasudeva Raju, K. Pavement Performance Index for Indian Rural Roads. Perspect. Sci. 2016, 8, 447–451. [Google Scholar] [CrossRef]
  2. Sandamal, R.M.K.; Pasindu, H.R. Applicability of Smartphone-Based Roughness Data for Rural Road Pavement Condition Evaluation. Int. J. Pavement Eng. 2022, 23, 663–672. [Google Scholar] [CrossRef]
  3. Mohan, A.; Poobal, S. Crack Detection Using Image Processing: A Critical Review and Analysis. Alex. Eng. J. 2018, 57, 787–798. [Google Scholar] [CrossRef]
  4. Bhat, S.; Naik, S.; Gaonkar, M.; Sawant, P.; Aswale, S.; Shetgaonkar, P. A Survey on Road Crack Detection Techniques. In Proceedings of the 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, 24–25 February 2020; pp. 1–6. [Google Scholar]
  5. Gopalakrishnan, K. Deep Learning in Data-Driven Pavement Image Analysis and Automated Distress Detection: A Review. Data 2018, 3, 28. [Google Scholar] [CrossRef]
  6. Munawar, H.S.; Hammad, A.W.A.; Haddad, A.; Soares, C.A.P.; Waller, S.T. Image-Based Crack Detection Methods: A Review. Infrastructures 2021, 6, 115. [Google Scholar] [CrossRef]
  7. Hamishebahar, Y.; Guan, H.; So, S.; Jo, J. A Comprehensive Review of Deep Learning-Based Crack Detection Approaches. Appl. Sci. 2022, 12, 1374. [Google Scholar] [CrossRef]
  8. An, Q.; Chen, X.; Wang, H.; Yang, H.; Yang, Y.; Huang, W.; Wang, L. Segmentation of Concrete Cracks by Using Fractal Dimension and UHK-Net. Fractal Fract. 2022, 6, 95. [Google Scholar] [CrossRef]
  9. Hassan, S.-A.; Rahim, T.; Shin, S.-Y. An Improved Deep Convolutional Neural Network-Based Autonomous Road Inspection Scheme Using Unmanned Aerial Vehicles. Electronics 2021, 10, 2764. [Google Scholar] [CrossRef]
  10. Lee, T.; Yoon, Y.; Chun, C.; Ryu, S. CNN-Based Road-Surface Crack Detection Model That Responds to Brightness Changes. Electronics 2021, 10, 1402. [Google Scholar] [CrossRef]
  11. Xu, C.; Zhang, Q.; Mei, L.; Shen, S.; Ye, Z.; Li, D.; Yang, W.; Zhou, X. Dense Multiscale Feature Learning Transformer Embedding Cross-Shaped Attention for Road Damage Detection. Electronics 2023, 12, 898. [Google Scholar] [CrossRef]
  12. Choi, S.; Do, M. Development of the Road Pavement Deterioration Model Based on the Deep Learning Method. Electronics 2020, 9, 3. [Google Scholar] [CrossRef]
  13. Qi, Y.; Wan, F.; Lei, G.; Liu, W.; Xu, L.; Ye, Z.; Zhou, W. GMDNet: An Irregular Pavement Crack Segmentation Method Based on Multi-Scale Convolutional Attention Aggregation. Electronics 2023, 12, 3348. [Google Scholar] [CrossRef]
  14. Sasaki, T.; Shioya, R.; Sakai, T.; Kinoshita, S.; Nojiri, T.; Terabayashi, K.; Jindai, M. Position and Posture Measurements Using Laser Projection Markers for Infrastructure Inspection. Electronics 2020, 9, 807. [Google Scholar] [CrossRef]
  15. Vrochidou, E.; Sidiropoulos, G.K.; Ouzounis, A.G.; Lampoglou, A.; Tsimperidis, I.; Papakostas, G.A.; Sarafis, I.T.; Kalpakis, V.; Stamkos, A. Towards Robotic Marble Resin Application: Crack Detection on Marble Using Deep Learning. Electronics 2022, 11, 3289. [Google Scholar] [CrossRef]
  16. Xu, S.; Xu, X.; Wei, H.; Du, J. DbCrackNet: Dual-Branch Network for Crack Segmentation. In Proceedings of the 2022 China Automation Congress (CAC), Xiamen, China, 25 November 2022; pp. 690–695. [Google Scholar]
  17. Jun, F.; Jiakuan, L.; Yichen, S.; Ying, Z.; Chenyang, Z. ACAU-Net: Atrous Convolution and Attention U-Net Model for Pavement Crack Segmentation. In Proceedings of the 2022 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI), Shijiazhuang, China, 22–24 July 2022; pp. 561–565. [Google Scholar]
  18. Wang, Y.; Song, K.; Liu, J.; Dong, H.; Yan, Y.; Jiang, P. RENet: Rectangular Convolution Pyramid and Edge Enhancement Network for Salient Object Detection of Pavement Cracks. Measurement 2021, 170, 108698. [Google Scholar] [CrossRef]
  19. Zhang, A.; Wang, K.C.P.; Li, B.; Yang, E.; Dai, X.; Peng, Y.; Fei, Y.; Liu, Y.; Li, J.Q.; Chen, C. Automated Pixel-Level Pavement Crack Detection on 3D Asphalt Surfaces Using a Deep-Learning Network: Pixel-Level Pavement Crack Detection on 3D Asphalt Surfaces. Comput. Aided Civ. Infrastruct. Eng. 2017, 32, 805–819. [Google Scholar] [CrossRef]
  20. Fan, X.; Cao, P.; Shi, P.; Wang, J.; Xin, Y.; Huang, W. A Nested Unet with Attention Mechanism for Road Crack Image Segmentation. In Proceedings of the 2021 IEEE 6th International Conference on Signal and Image Processing (ICSIP), Nanjing, China, 22 October 2021; pp. 189–193. [Google Scholar]
  21. Yuan, Y.; Yuan, Y.; Baker, T.; Kolbe, L.M.; Hogrefe, D. FedRD: Privacy-Preserving Adaptive Federated Learning Framework for Intelligent Hazardous Road Damage Detection and Warning. Future Gener. Comput. Syst. 2021, 125, 385–398. [Google Scholar] [CrossRef]
  22. Wu, C.; Ye, M.; Zhang, J.; Ma, Y. YOLO-LWNet: A Lightweight Road Damage Object Detection Network for Mobile Terminal Devices. Sensors 2023, 23, 3268. [Google Scholar] [CrossRef]
  23. Li, P.; Xia, H.; Zhou, B.; Yan, F.; Guo, R. A Method to Improve the Accuracy of Pavement Crack Identification by Combining a Semantic Segmentation and Edge Detection Model. Appl. Sci. 2022, 12, 4714. [Google Scholar] [CrossRef]
  24. Pham, V.; Nguyen, D.; Donan, C. Road Damage Detection and Classification with YOLOv7. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17 December 2022; pp. 6416–6423. [Google Scholar]
  25. Deng, J.; Lu, Y.; Lee, V.C.-S. Imaging-Based Crack Detection on Concrete Surfaces Using You Only Look Once Network. Struct. Health Monit. 2021, 20, 484–499. [Google Scholar] [CrossRef]
  26. Liu, Z.; Gu, X.; Yang, H.; Wang, L.; Chen, Y.; Wang, D. Novel YOLOv3 Model with Structure and Hyperparameter Optimization for Detection of Pavement Concealed Cracks in GPR Images. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22258–22268. [Google Scholar] [CrossRef]
  27. Wang, S.; Chen, X.; Dong, Q. Detection of Asphalt Pavement Cracks Based on Vision Transformer Improved YOLO V5. J. Transp. Eng. Part B Pavements 2023, 149, 04023004. [Google Scholar] [CrossRef]
  28. Du, Y.; Pan, N.; Xu, Z.; Deng, F.; Shen, Y.; Kang, H. Pavement Distress Detection and Classification Based on YOLO Network. Int. J. Pavement Eng. 2021, 22, 1659–1672. [Google Scholar] [CrossRef]
  29. Qi, Y.; He, Y.; Qi, X.; Zhang, Y.; Yang, G. Dynamic Snake Convolution Based on Topological Geometric Constraints for Tubular Structure Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 6070–6079. [Google Scholar]
  30. Zhang, X.; Liu, C.; Yang, D.; Song, T.; Ye, Y.; Li, K.; Song, Y. RFAConv: Innovating Spatial Attention and Standard Convolutional Operation. arXiv 2023, arXiv:2304.03198. [Google Scholar]
  31. Park, J.; Woo, S.; Lee, J.-Y.; Kweon, I.S. BAM: Bottleneck Attention Module. In Proceedings of the British Machine Vision Conference (BMVC), British Machine Vision Association (BMVA), Newcastle, UK, 3–6 September 2018. [Google Scholar]
  32. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  33. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13708–13717. [Google Scholar] [CrossRef]
  34. Lin, W.; Wu, Z.; Chen, J.; Huang, J.; Jin, L. Scale-Aware Modulation Meet Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 6015–6026. [Google Scholar]
  35. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; pp. 1577–1586. [Google Scholar] [CrossRef]
  36. Dai, X.; Chen, Y.; Xiao, B.; Chen, D.; Liu, M.; Yuan, L.; Zhang, L. Dynamic Head: Unifying Object Detection Heads with Attentions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7373–7382. [Google Scholar]
  37. JTG5210-2018; Highway Performance Assessment Standards. Ministry of Transport of the People’s Republic of China, Research Institute of Highway Ministry of Transport: Beijing, China, 2018.
  38. Arya, D.; Maeda, H.; Ghosh, S.K.; Toshniwal, D.; Sekimoto, Y. RDD2020: An Annotated Image Dataset for Automatic Road Damage Detection Using Deep Learning. Data Brief 2021, 36, 107133. [Google Scholar] [CrossRef]
  39. Zhang, H.; Wu, Z.; Qiu, Y.; Zhai, X.; Wang, Z.; Xu, P.; Liu, Z.; Li, X.; Jiang, N. A New Road Damage Detection Baseline with Attention Learning. Appl. Sci. 2022, 12, 7594. [Google Scholar] [CrossRef]
  40. Huyan, J.; Li, W.; Tighe, S.; Xu, Z.; Zhai, J. CrackU-net: A Novel Deep Convolutional Neural Network for Pixelwise Pavement Crack Detection. Struct. Control Health Monit. 2020, 27, e2551. [Google Scholar] [CrossRef]
  41. Xu, X.; Zhao, M.; Shi, P.; Ren, R.; He, X.; Wei, X.; Yang, H. Crack Detection and Comparison Study Based on Faster R-CNN and Mask R-CNN. Sensors 2022, 22, 1215. [Google Scholar] [CrossRef]
Figure 1. Comparison of two pavement crack images.
Figure 2. CrackYOLO model structure.
Figure 3. Crack convolution.
Figure 4. Illustration of the coordinate calculation for CrackConv.
Figure 5. ADSample down sample.
Figure 6. CAS hybrid attention mechanism structure diagram.
Figure 7. MSConv multi-scale convolution.
Figure 8. MSHead detection head.
Figure 9. Selected LNTU_RDD_NC dataset images.
Figure 10. Different network recognition effect comparisons.
Figure 11. Dataset of different models to identify cracks in rural roads. (a) Showing transverse cracks in shadow; (b) Showing disturbing cracks; (c) Showing reticular cracks in shadow interference; (d) Showing vertical cracks of different scales; (e) Showing transverse cracks with small disturbances.
Figure 12. Public dataset identification results.
Figure 13. Open dataset recognition effect.
Figure 14. Analysis of experimental identification results for each group.
Figure 15. Shadow occlusion detection results. (a) Shows transverse cracks under shadow; (b) Shows reticular cracks under shadow; (c) Shows longitudinal cracks under shadow.
Figure 16. Minor interference detection results. (a) Shows the interference items such as branches and soil around the crack; (b) Shows the situation where the interference items of branches are misidentified.
Figure 17. Multi-scale detection results. (a) shows small transverse cracks of different scales; (b) shows vertical cracks of multiple scales; (c) shows small longitudinal cracks of different scales.
Table 1. Comparison between existing models and the proposed model.
Contrast Type | Existing Detection Methods | Our Detection Method
Performance | Fast and accurate, refined inspection | Reliable detection accuracy and strong anti-interference
Advantages | High detection accuracy | 1. The main research direction is rural roads, which have so far received less attention; 2. Constructed a unique rural pavement dataset; 3. Self-developed downsampling, multi-scale convolution, and detection head to improve model detection ability
Limitations | 1. It cannot be used to detect images depicting complex scenes; 2. Lack of research on widely distributed rural pavement | 1. The measurement accuracy of the specific damage area of the crack is not high; 2. Distress types need to be enriched
Table 2. Model training settings.
Model Name | Learning Rate | Batch Size | Epoch | Optimizer
YOLO-LWNet [22] | 10⁻⁴ | 16 | 300 | SGD
Faster-RCNN [41] | 0.005 | 8 | 300 | SGD
YOLOv7 [24] | 10⁻⁴ | 4 | 300 | SGD
CrackYOLO | 10⁻⁴ | 4 | 300 | SGD
Table 3. Data distribution table by category.
Dataset Name | Train | Val | Test | Total
LNTU_RDD_NC | 7841 | 980 | 980 | 9801
RDD2022 | 3844 | 480 | 481 | 4805
CN_RDD | 3455 | 432 | 432 | 4319
CrackDataset_DL_HY | 1902 | 238 | 238 | 2378
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

