Article

Comparison of Residual Network and Other Classical Models for Classification of Interlayer Distresses in Pavement

by Wenlong Cai 1, Mingjie Li 2,*, Guanglai Jin 1, Qilin Liu 2 and Congde Lu 2

1 Jiangsu Sinoroad Engineering Technology Research Institute Co., Ltd., Nanjing 211800, China
2 School of Mechanical and Electrical Engineering, Chengdu University of Technology, Chengdu 610059, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(15), 6568; https://doi.org/10.3390/app14156568
Submission received: 3 May 2024 / Revised: 21 July 2024 / Accepted: 24 July 2024 / Published: 27 July 2024

Abstract

Many published automatic classification methods can identify the main types of hidden distress in highways, but they cannot meet the precise needs of operation and maintenance. The classification of interlayer distresses based on ground penetrating radar (GPR) images is very important for improving maintenance efficiency and reducing cost. However, it remains to be verified which models, among models of different complexities, are suitable for interlayer distress data. Firstly, to cover a sufficiently wide range of distress samples, the collected interlayer distress dataset containing 32,038 samples was subcategorized into three types: interlayer debonding, interlayer water seepage, and interlayer loosening. Secondly, residual networks (ResNets), whose residual structure makes it easier to build shallower or deeper networks (ResNet-4, ResNet-6, ResNet-8, ResNet-10, ResNet-14, ResNet-18, ResNet-34, and ResNet-50), and five classical network models (DenseNet-121, EfficientNet B0, SqueezeNet1_0, MobileNet V2, and VGG-19) were evaluated by training and validation loss, test accuracy, and model complexity. The experimental results show that all models achieve high test accuracy with little difference between them, but ResNet-4, ResNet-6, SqueezeNet1_0, and ResNet-8 exhibit no overfitting, which indicates good generalization performance.

1. Introduction

The highway plays an important role in promoting national economic growth, improving people's quality of life, and coordinating regional development. Scientific and timely maintenance of highways can ensure traffic safety, extend service life, and provide users with a safer, more convenient, and more comfortable travel environment. In recent years, the total amount of highway infrastructure in China has increased significantly, so the efficient and accurate detection of hidden distresses in highways is a key problem that needs to be addressed urgently. So far, a variety of non-destructive testing (NDT) technologies have been applied to recognize hidden distresses in pavement and can provide a scientific basis for highway maintenance, for example, ground penetrating radar (GPR) [1], X-ray [2], and ultrasonic damage detection [3]. As a non-destructive geophysical method with high efficiency, high precision, and continuous sampling [4,5,6], GPR has become one of the mainstream detection methods for hidden distresses in highways.
At present, the amount of data obtained by vehicle-mounted GPR for highway hidden distress detection is relatively large. Manual interpretation relies heavily on manpower and practitioner experience, which is time-consuming and costly, and human error is inevitable. To avoid these problems, automatic interpretation or automatic identification of hidden distresses in highways has become a current research focus [7]. In 2012, AlexNet became the first convolutional neural network (CNN) to attract widespread attention. Since then, CNNs of different architectures have appeared successively, such as the visual geometry group network (VGG), residual network (ResNet), MobileNet, etc. Due to their powerful feature extraction capability, CNNs have been widely used in distress classification and detection. Hidden distresses of highways can be roughly divided into two types: distresses with hyperbolic characteristics, such as cracks and voids, and distresses with non-hyperbolic features, such as interlayer distress and structural loosening.
At present, most studies focus mainly on the automatic recognition of distresses with hyperbolic features. For example, a recognition CNN and a location CNN were designed to automatically identify and locate concealed cracks between the pavement surface and the base layer [8]. As a one-stage object detection framework, YOLO [9] has high detection efficiency and has evolved into a series of versions. DarkNet was used as the backbone of YOLOv3 with four-scale detection layers (FDL) to autonomously identify concealed pavement cracks, addressing to some extent the missed detection of minor cracks [10]. ResNet50vd with deformable convolution (DCN) was used as the backbone of a novel YOLOv3 for pavement crack detection [11]. A method based on pixel points was proposed to quantify and calculate the vertical height of cracks, combined with YOLOv5 with a ResNet-50 backbone to detect the cracks [12]. An automatic detection method based on YOLOv4 with a CSPDarknet53 backbone was then proposed to identify pavement subsurface cracks and voids [13]. Meanwhile, two-stage object detection frameworks have also been used to detect hidden cracks in pavement, such as Region-CNN (RCNN) [14] and Mask RCNN [15]. ResNeXt-101 with a feature pyramid network (FPN) was selected as the backbone of a Mask RCNN to detect cracks in GPR B-scans, improving detection performance [16]. Further, an improved Mask RCNN with a ResNet-101 backbone was proposed to automatically detect and segment small cracks in asphalt pavement at the pixel level [17]. With the use of 3D GPR, the performance of YOLOv3 with a DarkNet-53 backbone, YOLOv4 with a CSPDarkNet-53 backbone, and YOLOv5 with a CSPDarkNet-53 backbone in automatically detecting concealed cracks in asphalt pavement was compared using 3D GPR images [18].
There are also a small number of studies not based on hyperbolic features, such as identifying and locating debonding, water damage, looseness, and settlement. Residual networks have been widely utilized for the identification of hidden distresses due to their excellent performance. For instance, ResNet-50 was employed for subgrade distress classification and yielded favorable results [19]. Meanwhile, the combination of ResNet-50 and YOLOv2 in a detection framework showed promising performance in detecting water damage in pavement [20]. Additionally, the use of ResNet-50, EfficientNet, and ViT for distress recognition based on GPR images demonstrated reliable performance [21]. VGG16 and ResNet-50 were applied to distress classification based on 3D-GPR with satisfactory results [22]. In addition to residual networks, other network architectures are also widely used for hidden distress identification. For instance, a backpropagation neural network method was proposed to recognize loose damage in pavement based on GPR A-scans [23], which can reduce the reliance on empirical judgment in GPR image interpretation. Meanwhile, given the advantages of CNNs in classification and regression tasks, a classification network and a regression network were designed separately to classify moisture quantitatively and predict water content based on GPR A-scans [24]. Additionally, DenseNet was used as the backbone of a novel B-scan image generator, and distresses were detected based on simulated data mixed with real B-scan images [25]. A comparison among Xception, Inception, EfficientNet, and MobileNet using GPR images revealed that MobileNet exhibited the best performance [26], while another analysis indicated that SqueezeNet also performs well [27]. Based on GPR data, two network-in-network models were used to recognize and locate water-damage pits and uneven settlements, respectively [28]. An improved two-way object detection model with a Res2Net backbone for targets of different scales was proposed to detect settlement and loosening in pavement [29]. Moreover, YOLOv3 with a DarkNet-53 backbone and a U-Net model were combined to detect and segment debonding in pavement, advancing the automatic detection and localization of pavement distresses [30]. However, there is limited research on the identification of interlayer distresses.
In recent years, we have carried out distress detection on several highways in Jiangsu Province and found that most of the distresses are interlayer distresses [31,32]. Comprehensive analysis of interlayer distresses shows that the initial manifestation is a lack of adhesion between highway layers, which leads to water infiltration and subsequent destruction of the pavement structure. Ultimately, if interlayer water seepage is not repaired in time, the action of traffic load results in interlayer loosening. Therefore, interlayer distresses may be further categorized as interlayer debonding, interlayer water seepage, and interlayer loosening (refer to Section 2.1 for specific descriptions). Nevertheless, achieving precise automatic identification of these interlayer distresses remains a formidable challenge.
At present, deep CNNs have emerged as the leading approach in image analysis, significantly advancing various classification tasks with performance that often surpasses human accuracy. The deep CNN has thus become the mainstream method for distress classification in GPR data [33], with architectures such as ResNet, DenseNet, EfficientNet, SqueezeNet, MobileNet, and VGG. However, it remains to be verified which models, among models of different complexities, are suitable for interlayer distress data. Among these deep learning architectures, ResNet [34] performs well in top-1 accuracy [35], and its residual structure makes it easier to build shallower or deeper networks of different complexity. Moreover, the residual structure can effectively alleviate overfitting, and the residual representation may also have advantages on small datasets with simple features. Therefore, this paper compares eight ResNet models (ResNet-4, ResNet-6, ResNet-8, ResNet-10, ResNet-14, ResNet-18, ResNet-34, and ResNet-50) with five other models (DenseNet, EfficientNet, SqueezeNet, MobileNet, and VGG) known for their good performance on GPR images to evaluate the performance of these models on the recognition of interlayer distress.
Section 2 introduces the methodology, including the dataset description, CNN-based models, evaluation metrics, and gradient-weighted class activation mapping (Grad-CAM). Section 3 presents and analyzes the experimental results, including training and test performance, the confusion matrix, and Grad-CAM outcomes. Finally, the conclusions are summarized in Section 4.

2. Methodology

2.1. Dataset Construction and Analysis

The data collection in this paper is mainly concentrated in Jiangsu Province, including the Nanjing-Lianyungang Highway, Shanghai-Nanjing Highway, Beijing-Shanghai Highway, Coastal Highway, Nanjing-Hangzhou Highway, Changzhou-Yixing Highway, and Wuxi-Zhangjiagang Highway. The data are collected using a vehicle-mounted GPR, which emits high-frequency electromagnetic waves into the ground through a transmitting antenna. When these waves encounter media with different dielectric characteristics, they produce reflection and refraction, leading to changes in the intensity and waveform of the electromagnetic field. The reflected waves are then captured by the receiving antenna. Through data processing and analysis, it is possible to infer the spatial position, structure, and shape of underground objects. The GPR used in this paper is a fourth-generation high-dynamic ground penetrating radar, the MALA GX750. The basic parameters are shown in Table 1. The center frequency of the emitted electromagnetic wave is 750 MHz, the scanning rate is 1290 channels per second, the detection depth ranges from 0 to 1.5 m, the coupling mode is air coupling, and the weight is 3.6 kg.
In the data collected by GPR, an A-scan records the radar response at a specific position, and the image formed by stitching together the A-scans recorded along a measurement line is referred to as a B-scan. The horizontal axis of the B-scan represents spatial position, while the vertical axis represents time. The original data were sequentially processed through dewow, time-zero correction, time-varying gain, background removal, deconvolution, band-pass filtering, and moving average processing to obtain B-scan images with clearly visible distress features. Subsequently, the processed B-scan images were labeled manually. Once the distresses are labeled, a corresponding label file is generated, and the distress samples are then cropped according to the coordinate information of the distress boxes in the label file. The specific process is illustrated in Figure 1.
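To make the processing chain above concrete, the following is a minimal sketch of part of it in Python with NumPy and SciPy. The paper does not report its filter settings, so the window lengths, gain ramp, and band edges below are illustrative assumptions, and the time-zero correction and deconvolution steps are omitted for brevity.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def preprocess_bscan(bscan, fs, band=(250e6, 1500e6)):
    """bscan: 2-D array (time samples x traces); fs: sampling rate in Hz (assumed)."""
    x = bscan.astype(float)

    # Dewow: remove low-frequency drift from each trace with a running mean.
    kernel = np.ones(31) / 31
    x = x - np.apply_along_axis(lambda t: np.convolve(t, kernel, mode="same"), 0, x)

    # Time-varying gain: amplify late arrivals (linear ramp, illustrative only).
    x = x * np.linspace(1.0, 5.0, x.shape[0])[:, None]

    # Background removal: subtract the mean trace to suppress horizontal ringing.
    x = x - x.mean(axis=1, keepdims=True)

    # Band-pass filtering around the 750 MHz antenna center frequency.
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    x = sosfiltfilt(sos, x, axis=0)

    # Moving average across neighboring traces to suppress incoherent noise.
    x = np.apply_along_axis(lambda t: np.convolve(t, np.ones(3) / 3, mode="same"), 1, x)
    return x
```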
The classification of interlayer distresses can improve maintenance efficiency and reduce maintenance costs. For example, if an interlayer distress is identified as interlayer debonding, maintenance is necessary but not urgent under the owner's current regulations. If an interlayer distress is identified as interlayer water seepage, it indicates a problem with horizontal drainage, and it is necessary not only to adopt the grouting maintenance method but also to consider implementing a drainage maintenance scheme. If an interlayer distress is identified as interlayer loosening, then it is necessary to adopt the grouting maintenance scheme promptly. Therefore, the classification of interlayer distresses is a very important task for highway operation and maintenance engineering, and interlayer distresses in pavement need to be further categorized as interlayer debonding, interlayer water seepage, and interlayer loosening.
Figure 2 shows the B-scan images and core samples of interlayer distress and samples of three types of interlayer distress. In Figure 2, the red square represents the location of the distress, A represents the coring position, B represents the interface of the surface layer and the upper base layer, and C represents the interface of the upper base layer and the lower base layer. By comparing and analyzing the B-scan image and the corresponding drilled sample core, combined with the experience of experts in the field, the abnormal area, that is the distress, can be analyzed and classified as interlayer debonding (Figure 2a), interlayer water seepage (Figure 2b) and interlayer loosening (Figure 2c).
The characteristics of three distresses are described as follows [31,32]:
  • Interlayer debonding: This term refers to a continuous delamination phenomenon within the pavement interlayer structure, characterized by hollowing and debonding between layers and typically resulting from aging and excessive loading; the bond separates while good horizontal continuity is maintained. Random samples with interlayer debonding are shown in Figure 2a. Interlayer debonding is labeled as “Poor_l”.
  • Interlayer water seepage: When interlayer debonding already exists and is not promptly maintained, rain and other infiltration may lead to interlayer water seepage. The dielectric constants of water and air differ greatly, so the polarity reversal of interlayer water seepage is obvious compared with interlayer debonding. Random samples with interlayer water seepage are shown in Figure 2b. Interlayer water seepage is labeled as “Water_l”.
  • Interlayer loosening: When interlayer water seepage is not repaired in time, the action of traffic load results in interlayer loosening. The GPR B-scan images of this distress are characterized by discontinuity and curvature of the reflection events. Random samples with interlayer loosening are shown in Figure 2c. Interlayer loosening is labeled as “Loose_l”.
The dataset was established after data preprocessing. The number of samples in the training, validation, and test set of the dataset is shown in Table 2.
To visualize the distribution of the dataset, the dataset is reduced to three-dimensional space using t-distributed stochastic neighbor embedding (t-SNE). The dimension reduction results are shown in Figure 3. The red dots represent interlayer loosening (loose_l), the green dots represent interlayer debonding (poor_l), and the blue dots represent interlayer water seepage (water_l).
Figure 3 indicates that the distributions of interlayer debonding and interlayer water seepage show little overlap, while interlayer loosening is wrapped in the middle by the other two distresses and overlaps with both of them to a certain extent.
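The following is a minimal sketch of how such a three-dimensional t-SNE embedding can be produced with scikit-learn and matplotlib. The random `samples` array and `labels` are placeholders standing in for the cropped distress images and their class labels; the t-SNE settings are illustrative, not the ones used for Figure 3.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
samples = rng.random((300, 64, 64))            # placeholder for real B-scan crops
labels = rng.integers(0, 3, size=300)          # 0: loose_l, 1: poor_l, 2: water_l

X = samples.reshape(len(samples), -1)          # flatten each image to a vector
emb = TSNE(n_components=3, init="pca", random_state=0).fit_transform(X)

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
for cls, color, name in [(0, "r", "loose_l"), (1, "g", "poor_l"), (2, "b", "water_l")]:
    m = labels == cls
    ax.scatter(emb[m, 0], emb[m, 1], emb[m, 2], c=color, s=5, label=name)
ax.legend()
plt.show()
```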

2.2. CNN-Based Models

CNN architectures are mainly composed of convolutional layers, pooling layers, and fully connected layers. The convolutional layer is used to extract features. The pooling layer performs subsampling, which retains important features and reduces the number of parameters. The fully connected layer implements a linear transformation in high-dimensional space, and softmax is used as the classifier.
The structure of the VGG network is relatively simple, mainly composed of 3 × 3 convolution layers, maximum pooling layers, and fully connected layers. VGG-19 consists of sixteen 3 × 3 convolution layers, five maximum pooling layers, and three fully connected layers.
SqueezeNet is a lightweight convolutional neural network that is mainly divided into three parts. The first part is a 3 × 3 convolutional layer. The second part consists of several fire modules, and the number of filters per fire module gradually increases from the beginning to the end of the network; a fire module comprises a squeeze convolution layer (with only 1 × 1 filters) feeding into an expand layer that has a mix of 1 × 1 and 3 × 3 convolution filters. The third part consists of a 1 × 1 convolutional layer, a global average pooling layer, and a fully connected layer.
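As an illustration of the fire module just described, the following PyTorch sketch mirrors its squeeze/expand structure; the channel counts are free parameters and do not reproduce the exact SqueezeNet1_0 configuration.

```python
import torch
from torch import nn

class Fire(nn.Module):
    """Sketch of a fire module: a 1x1 squeeze layer feeding parallel 1x1 and 3x3
    expand layers whose outputs are concatenated along the channel dimension."""
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# Example: 96 input channels squeezed to 16, expanded back to 64 + 64 = 128 channels.
out = Fire(96, 16, 64, 64)(torch.randn(1, 96, 56, 56))   # shape (1, 128, 56, 56)
```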
MobileNet V2 is an efficient lightweight neural network. Its core structure is the inverted residual block. The inverted residual block first increases the number of channels through a 1 × 1 convolution, then a depthwise separable convolution is used to extract features, and finally another 1 × 1 convolution is used to reduce the number of channels. A shortcut connection between the input and output of the inverted residual block is used only when the input and output feature maps have the same size. MobileNet V2 consists of a 3 × 3 convolutional layer, multiple inverted residual blocks, a 1 × 1 convolutional layer, a global average pooling layer, and a fully connected layer.
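A corresponding PyTorch sketch of the inverted residual block is given below; the expansion ratio of 6 is the value commonly used in MobileNet V2 and is stated here as an assumption rather than taken from the paper.

```python
import torch
from torch import nn

class InvertedResidual(nn.Module):
    """Sketch of an inverted residual block: 1x1 expansion, 3x3 depthwise
    convolution, 1x1 linear projection, with a shortcut only when the stride
    is 1 and the input and output channel counts match."""
    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        hidden = in_ch * expand_ratio
        self.use_shortcut = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, hidden, 1, bias=False),             # expand channels
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, stride=stride, padding=1,
                      groups=hidden, bias=False),                 # depthwise 3x3
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, out_ch, 1, bias=False),             # linear projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_shortcut else out

# Example: same input/output shape, so the shortcut connection is active.
y = InvertedResidual(32, 32)(torch.randn(1, 32, 56, 56))   # shape (1, 32, 56, 56)
```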
The core structure of EfficientNet is the mobile inverted bottleneck convolution (MBConv) module, which combines depthwise separable convolution with squeeze-and-excitation (SE) networks. The MBConv module mainly consists of a 1 × 1 convolution, a depthwise separable convolution, an SE module, and a 1 × 1 convolution. A shortcut connection between the input and output of the MBConv module is used only when the input and output feature maps have the same size. EfficientNet B0 consists of a 3 × 3 convolutional layer, multiple MBConv modules, a 1 × 1 convolutional layer, a global average pooling layer, and a fully connected layer.
The core idea of DenseNet is to introduce direct connections from any layer to all subsequent layers. These direct connections promote feature reuse, enhance feature propagation, and help to alleviate the problem of gradient vanishing. DenseNet consists of several basic components: an initial convolutional layer, multiple dense blocks, transition layers, a global average pooling layer, and a fully connected layer. Each dense block consists of multiple dense layers, and each dense layer consists of a 1 × 1 convolution and a 3 × 3 convolution. The transition layers between dense blocks reduce the number of channels with a 1 × 1 convolution and reduce the size of the feature maps with an average pooling layer.
ResNet alleviates the problems of gradient vanishing and gradient exploding. Its core idea is to introduce shortcut connections when building deeper networks, which integrate the information extracted by shallow and deep convolutional layers and make it possible to improve model performance. The residual network is mainly divided into three parts. The first part is a 7 × 7 convolutional layer with 64 channels and a maximum pooling layer. The second part consists of several residual blocks and an average pooling layer, where the more residual blocks are stacked, the greater the number of channels. The third part is a fully connected layer, which uses the softmax activation function to output the probabilities of the predictions. The basic unit of the residual network is the residual block. Each residual block of residual networks with depths of 34 layers or fewer consists of two 3 × 3 convolutional layers, while each residual block of deeper residual networks consists of three convolutional layers with kernel sizes of 1 × 1, 3 × 3, and 1 × 1, respectively. Deeper networks can be built by stacking multiple residual blocks, as shown in Figure 4.
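The following PyTorch sketch shows the two-layer residual block described above and how stacking different numbers of such blocks yields shallower or deeper residual networks. The exact ResNet-4/6/8 configurations used in the paper are not specified here; `blocks_per_stage` and the channel widths below are illustrative assumptions.

```python
import torch
from torch import nn

class BasicBlock(nn.Module):
    """Residual block with two 3x3 convolutions and a shortcut connection."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        # 1x1 projection on the shortcut when the shape changes.
        self.shortcut = (nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
            nn.BatchNorm2d(out_ch))
            if stride != 1 or in_ch != out_ch else nn.Identity())

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))

class SmallResNet(nn.Module):
    """Stem (7x7 conv + max pool), stacked residual stages, global pooling, FC head."""
    def __init__(self, blocks_per_stage=(1, 1), channels=(64, 128), num_classes=3):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1))
        stages, in_ch = [], 64
        for n_blocks, out_ch in zip(blocks_per_stage, channels):
            for i in range(n_blocks):
                stride = 2 if i == 0 and in_ch != out_ch else 1
                stages.append(BasicBlock(in_ch, out_ch, stride=stride))
                in_ch = out_ch
        self.stages = nn.Sequential(*stages)
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(in_ch, num_classes))

    def forward(self, x):
        return self.head(self.stages(self.stem(x)))

# Varying the number of stacked blocks changes the depth/complexity of the model.
shallow = SmallResNet(blocks_per_stage=(1,), channels=(64,))                        # very shallow variant
deeper = SmallResNet(blocks_per_stage=(2, 2, 2, 2), channels=(64, 128, 256, 512))   # ResNet-18-like
print(deeper(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 3])
```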
To find network models with a complexity appropriate to the interlayer distress data, 13 models of different complexity were selected: ResNet-4, ResNet-6, ResNet-8, ResNet-10, ResNet-14, ResNet-18, ResNet-34, ResNet-50, DenseNet-121, EfficientNet B0, SqueezeNet1_0, MobileNet V2, and VGG-19.

2.3. Evaluation Metrics

2.3.1. Metrics

To evaluate the quality of the models, several metrics were used in this paper:
  • Precision: the proportion of true positives (TPs) among the predicted positives. It measures the model’s ability to identify TPs without including false positives (FPs).
  • Recall: the proportion of TPs among the actual positives. It measures the model’s ability to identify all positives.
  • F1 score: a commonly used evaluation metric that combines precision and recall to measure model performance comprehensively.
  • Accuracy: the proportion of correctly predicted samples among all samples. It is often used to evaluate the overall performance of models.
  • Confusion matrix: an evaluation tool used to intuitively visualize the performance of models. Each column of the confusion matrix represents the predicted category, and each row represents the true category.
$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (1)$$
$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (2)$$
$$\mathrm{F1\ score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (3)$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$
The precision, recall, F1 score, and accuracy can be calculated by Formulas (1), (2), (3), and (4), respectively. TP (True positive) indicates the number of positive samples correctly predicted; TN (True negative) indicates the number of negative samples correctly predicted; FP (False positive) indicates the number of negative samples incorrectly predicted as positive samples; FN (False negative) refers to the number of samples that are actually positive but incorrectly predicted as negative.
To evaluate the overall performance of the network models, this paper also introduces the macro-average F1 score (macro-F1 score), which is the arithmetic average of the F1 scores of all categories. Precision, recall, F1 score, and accuracy reflect the model's performance from different perspectives: precision and recall each measure one aspect of the model's performance, whereas the F1 score combines precision and recall and provides a more comprehensive evaluation. The macro-F1 score and accuracy evaluate the overall performance of the model. Therefore, the F1 score, macro-F1 score, and accuracy are primarily utilized for evaluating the models.
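A minimal sketch of computing these metrics with scikit-learn is shown below; `y_true` and `y_pred` are toy label lists standing in for the test-set ground truth and the model predictions.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_recall_fscore_support)

labels = ["Loose_l", "Poor_l", "Water_l"]
y_true = ["Poor_l", "Water_l", "Loose_l", "Poor_l", "Loose_l"]   # toy ground truth
y_pred = ["Poor_l", "Water_l", "Poor_l", "Poor_l", "Loose_l"]    # toy predictions

# Per-class precision, recall, and F1 score (Formulas (1)-(3)).
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, zero_division=0)
# Macro-F1 is the arithmetic mean of the per-class F1 scores; accuracy is Formula (4).
macro_f1 = f1_score(y_true, y_pred, labels=labels, average="macro")
acc = accuracy_score(y_true, y_pred)
# Rows are true categories and columns are predicted categories.
cm = confusion_matrix(y_true, y_pred, labels=labels)

for name, p, r, f in zip(labels, precision, recall, f1):
    print(f"{name}: precision={p:.4f} recall={r:.4f} F1={f:.4f}")
print(f"macro-F1={macro_f1:.4f} accuracy={acc:.4f}")
print(cm)
```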

2.3.2. Grad-CAM

Grad-CAM [36] is used to visualize the features that illustrate the most important regions of an image for model prediction and can produce visual explanations for any CNN. The central idea of this method is to use gradient information to obtain a class-discriminative localization map. Specifically, the score for the class of interest is first computed. Subsequently, the gradient of this score with respect to the convolutional feature map activations is computed to obtain the neuron importance weights. Finally, the class-discriminative localization map is obtained through a weighted combination of the forward activation maps followed by a ReLU.
Here, Grad-CAM is used to generate heatmaps for the final convolutional layer of each network. Then the heatmaps are used to analyze the prediction mechanism of the model and offer insights for improving the model.
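The paper does not describe its Grad-CAM implementation, so the following is a minimal, self-contained PyTorch sketch of the procedure outlined above, using an untrained ResNet-18 and a random input as stand-ins: the activations of the last convolutional block are captured, weighted by the spatially averaged gradients of the class score, passed through a ReLU, and upsampled to a heatmap.

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=None)      # stand-in model (untrained)
model.eval()
target_layer = model.layer4[-1]            # last residual block (final conv features)

feats = {}
def hook(module, inputs, output):
    output.retain_grad()                   # keep the gradient of this non-leaf tensor
    feats["a"] = output
handle = target_layer.register_forward_hook(hook)

x = torch.randn(1, 3, 224, 224)            # stand-in for a 224 x 224 distress sample
logits = model(x)
cls = logits.argmax(dim=1).item()
model.zero_grad()
logits[0, cls].backward()                  # gradient of the predicted class score

a = feats["a"]                             # activations, shape (1, C, H, W)
w = a.grad.mean(dim=(2, 3), keepdim=True)  # neuron importance weights
cam = torch.relu((w * a).sum(dim=1, keepdim=True)).detach()
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # heatmap in [0, 1]
handle.remove()
```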

3. Results and Analysis

The experimental environment is configured as follows: a graphics processing unit (GPU) (NVIDIA GeForce RTX 3090, 24 GB of memory), Python 3.8, PyTorch 2.1.1, and CUDA 12.1. The optimizer is Stochastic Gradient Descent (SGD) with a momentum of 0.9 and an initial learning rate of 0.01, following an exponential decay strategy (gamma = 0.9). The loss function is the cross-entropy loss. The batch size is set to 64 images. ResNet-4, ResNet-6, and ResNet-8 underwent 100 epochs of training, while the remaining models were trained for a total of 50 epochs.
All distress samples were resized to 224 × 224 as input, and transfer learning was employed to fine-tune weights pre-trained on the ImageNet dataset [37] to mitigate overfitting. The training was repeated three times for each model to mitigate the impact of sample randomness.
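A brief sketch of this training configuration in PyTorch follows. The dataset directories, the stand-in ResNet-18 model, and the choice of torchvision pre-trained weights are placeholders and assumptions; only the optimizer, scheduler, loss, batch size, input size, and best-validation-weight selection come from the description in this section.

```python
import torch
from torch import nn, optim
from torchvision import datasets, models, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tf = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
train_set = datasets.ImageFolder("data/train", transform=tf)   # placeholder paths
val_set = datasets.ImageFolder("data/val", transform=tf)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=64)

model = models.resnet18(weights="IMAGENET1K_V1")       # ImageNet pre-trained stand-in
model.fc = nn.Linear(model.fc.in_features, 3)          # 3 interlayer distress classes
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)

best_val = float("inf")
for epoch in range(50):                                 # 100 epochs for ResNet-4/6/8
    model.train()
    for imgs, labels in train_loader:
        imgs, labels = imgs.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(imgs), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()

    # Keep the weights with the lowest validation loss for testing.
    model.eval()
    val_loss, n = 0.0, 0
    with torch.no_grad():
        for imgs, labels in val_loader:
            imgs, labels = imgs.to(device), labels.to(device)
            val_loss += criterion(model(imgs), labels).item() * imgs.size(0)
            n += imgs.size(0)
    if val_loss / n < best_val:
        best_val = val_loss / n
        torch.save(model.state_dict(), "best.pt")
```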

3.1. Performance of Models

Eight ResNet models with different depths and five other classical models were trained: ResNet-4, ResNet-6, ResNet-8, ResNet-10, ResNet-14, ResNet-18, ResNet-34, ResNet-50, DenseNet-121, EfficientNet B0, SqueezeNet1_0, MobileNet V2, and VGG-19. Table 3 summarizes the size of each model, the average training time per epoch, the total number of parameters, and the floating point operations (FLOPs) of each model.
A model's training time is affected by various factors such as its architecture and FLOPs. Generally, for the same architecture, the more FLOPs a model has, the longer its training time. The maximum FLOPs and training time were observed for VGG-19, while the minimum were observed for ResNet-4. For the residual networks, a greater number of FLOPs means more training time.
The number of parameters of a model reflects its complexity. Generally, a higher number of parameters leads to a larger model size and increased complexity. Among the 13 network models, ResNet-4, ResNet-6, and SqueezeNet have the fewest parameters, none exceeding 1 M. The parameters of ResNet-8, MobileNet, EfficientNet, ResNet-10, ResNet-14, and DenseNet range from 1.227 M to 6.951 M. The parameters of ResNet-18, ResNet-34, ResNet-50, and VGG-19 are relatively high, with VGG-19 having the largest parameter count of 139.581 M.
The learning curves of the models are shown in Figure 5, where red lines indicate the accuracy and loss calculated on the training set, and blue lines represent the accuracy and loss calculated on the validation set.
In order to accurately analyze the differences between models, the training and validation loss curves are analyzed in detail. Generally, a validation loss that increases while the training loss decreases indicates overfitting. The learning curves in Figure 5 show that ResNet-4, ResNet-6, SqueezeNet1_0, and ResNet-8 exhibit no overfitting. As model complexity increases, the training loss improves, but overfitting was observed in DenseNet-121, EfficientNet B0, MobileNet V2, VGG-19, and the ResNet models from ResNet-10 to ResNet-50, and for most models the severity of overfitting increases with model complexity. In fact, increasing the complexity of a model can enhance its feature extraction capability and improve the training loss; however, an excessively complex model may lead to an increase in validation loss and an increased risk of overfitting. In short, the balance between training and validation loss is crucial. Notably, MobileNet V2 and EfficientNet B0 overfit more severely than ResNet-10 and ResNet-14, although MobileNet V2 and EfficientNet B0 have fewer parameters. This may be an advantage of the residual structure of ResNet.
To assess each model's performance on the test set, the weights with the lowest validation loss were chosen for making predictions on the test set. Each model underwent three rounds of training and testing, and the metrics were calculated as the average of the three test results to mitigate the impact of sample randomness. Table 4 summarizes the F1 score (F1) for each model and each class as well as the macro-F1 score (Macro-F1) and accuracy (Acc) for each model. The highest scores are indicated in red and the lowest scores in blue, and the models without overfitting are indicated in red.
For the interlayer loosening, the F1 score ranges from 0.9239 to 0.9366, with SqueezeNet1_0 having the lowest and ResNet-8 achieving the highest. For the interlayer debonding, the F1 scores range from 0.9812 to 0.9853, with ResNet-34 being the top-performing model, while SqueezeNet1_0 scored the lowest. For the interlayer water seepage, the F1 score ranges from 0.9798 to 0.9846, with MobileNet V2 having the lowest score, and ResNet-6 and ResNet-8 having the highest. Overall, ResNet-8 performs best while SqueezeNet1_0 performs worst for the interlayer loosening. ResNet-34 performs best while SqueezeNet1_0 performs worst for the interlayer debonding. ResNet-6 and ResNet-8 demonstrate superior performance while MobileNet V2 performs the worst for the interlayer water seepage.
In terms of macro-F1 score, SqueezeNet1_0 exhibits the lowest value, while ResNet-8 achieves the highest. For accuracy, ResNet-6, ResNet-8, and ResNet-34 achieve the highest scores, each reaching 0.9772, while SqueezeNet1_0 performs worst. In summary, ResNet-8 emerges as the top-performing model, followed by ResNet-6 and ResNet-34, while SqueezeNet1_0 performs worst. Overall, the macro-F1 scores and accuracy of these models all reach a high level, and the differences between models are small.
Generally, overfitting leads to weak generalization performance and increases predictive risk on new data. The accuracy and macro-F1 scores of the 13 models were all at a high level with small gaps between them. However, MobileNet V2, EfficientNet B0, ResNet-10, ResNet-14, DenseNet-121, ResNet-18, ResNet-34, ResNet-50, and VGG-19 showed overfitting; these models may have learned noise in the training data rather than the underlying data distribution, which poses a risk for predictions on new data.
To compare each model’s performance more intuitively, Figure 6 summarizes the confusion matrix of each model. The diagonal elements of the confusion matrix indicate the number of correct predictions made by the model for each class, while the remaining elements represent the number of incorrect predictions.
For the category of interlayer loosening, the number of correct predictions ranges from 1361 to 1426. MobileNet V2 correctly predicted the largest number of samples: 1426 out of 1501, with 39 samples incorrectly predicted as interlayer debonding and 36 as interlayer water seepage. DenseNet-121 correctly predicted the fewest samples: 1361, with 92 samples incorrectly predicted as interlayer debonding and 48 as interlayer water seepage.
For the category of interlayer debonding, the number of correct predictions ranges from 4406 to 4456. DenseNet-121 correctly predicted the largest number of samples: 4456 out of 4505, with 34 samples incorrectly predicted as interlayer loosening and 15 as interlayer water seepage. MobileNet V2 correctly predicted the fewest samples: 4406, with 64 samples incorrectly predicted as interlayer loosening and 35 as interlayer water seepage.
For the category of interlayer water seepage, the number of correct predictions ranges from 3532 to 3567. ResNet-6 correctly predicted the largest number of samples: 3567 out of 3607, with 30 samples incorrectly predicted as interlayer loosening and 10 as interlayer debonding. MobileNet V2 correctly predicted the fewest samples: 3532, with 68 samples incorrectly predicted as interlayer loosening and 7 as interlayer debonding.
It can be seen from the confusion matrices that most mispredictions involve interlayer loosening, while mispredictions between interlayer debonding and interlayer water seepage are relatively infrequent. The reason may be that, in some samples, both interlayer debonding and interlayer water seepage exhibit characteristics similar to interlayer loosening, as shown in Figure 3.

3.2. Grad-CAM Results

Grad-CAM is used to gain more insights into how each of the models made their predictions. Figure 7 shows the heatmaps of interlayer distress samples. The warm colors indicate the areas with the greatest impact on prediction, while cooler colors indicate little to no contribution to the prediction.
As shown in Figure 7, these models focus on a small area of the interlayer debonding and interlayer water seepage samples, while focusing on a large area of the interlayer loosening samples. For interlayer debonding and interlayer water seepage, there is a polarity reversal between the two distresses, which is enough for the model to distinguish them by focusing on the small lower part of the distress samples. Compared with interlayer debonding and interlayer water seepage, interlayer loosening has more complex texture features, so the model needs to extract more information to distinguish this distress. As shown by the heatmaps of interlayer loosening in Figure 7, simpler models focus on multiple smaller areas representing local features, whereas more complex models focus on larger areas representing high-level semantic features.

4. Conclusions

In order to precisely classify highway interlayer distresses, a dataset was established based on GPR B-scan images of highways in Jiangsu Province, China. The dataset is divided into a training set, a validation set, and a test set, containing 19,222, 3203, and 9613 B-scan images, respectively. To evaluate the performance of the various models, several metrics were used. The losses calculated on the training and validation sets and the corresponding learning curves were used to evaluate training performance, and the F1 score, macro-F1 score, accuracy, and confusion matrix were used to evaluate performance on the test set. Furthermore, to gain a more comprehensive understanding of the models' learning processes and predictive mechanisms, the Grad-CAM technique was employed to generate heatmaps illustrating the specific regions within the images that had the most significant impact on the model predictions. From this work, the conclusions of this paper are as follows.
Firstly, in the performance evaluation of a model, we should not only consider the test accuracy but also analyze the loss curves of the training stage, especially when there is little difference in test accuracy, so as to judge the generalization performance of the model. Secondly, all models have similar test accuracy, but ResNet-4, ResNet-6, SqueezeNet1_0, and ResNet-8 exhibit no overfitting, which indicates good generalization performance. Finally, the generalization performance of a model is not merely associated with the number of parameters within the model, but also with its architecture. For example, MobileNet V2 and EfficientNet B0 overfit more severely than ResNet-10 and ResNet-14, although MobileNet V2 and EfficientNet B0 have fewer parameters.
This paper provides satisfactory answers on the classification of interlayer distresses in pavement and lays the groundwork for more fine-grained interlayer distress classification. It also offers valuable insights into the real-time classification of highway interlayer distresses, which is crucial for highway maintenance. Meanwhile, ResNet-4, ResNet-6, SqueezeNet1_0, and ResNet-8 may be used as backbone candidates for interlayer distress detection as a more challenging downstream task. Additionally, ResNet-4, ResNet-6, SqueezeNet1_0, and ResNet-8 will be tested and evaluated on other hidden distress recognition and detection tasks based on GPR images in future work.

Author Contributions

Conceptualization, W.C. and C.L.; data curation, M.L.; formal analysis, W.C. and C.L.; funding acquisition, W.C. and C.L.; investigation, W.C.; methodology, M.L. and G.J.; project administration, W.C. and M.L.; resources, W.C. and G.J.; software, M.L.; supervision, C.L.; validation, M.L., Q.L. and C.L.; visualization, M.L.; writing—original draft, W.C. and M.L.; writing—review and editing, W.C. and C.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the project Highway Hidden Distresses Detection and Recognition of Jiangsu Sinoroad Engineering Technology Research Institute Co., Ltd., grant number JSZL-20200021.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset presented in this article is not readily available because the data are part of an ongoing study. Requests to access the dataset should be directed to the corresponding author.

Acknowledgments

We sincerely acknowledge the engineering technicians from Jiangsu Sinoroad Engineering Technology Research Institute Co., Ltd. for conducting all the fieldwork and collecting the GPR data.

Conflicts of Interest

Authors Wenlong Cai and Guanglai Jin were employed by the company Jiangsu Sinoroad Engineering Technology Research Institute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The authors declare that this study received funding from Jiangsu Sinoroad Engineering Technology Research Institute Co., Ltd. The funder was not involved in the study design, collection, analysis, interpretation of data, the writing of this article or the decision to submit it for publication.

References

  1. Guo, S.; Xu, Z.; Li, X.; Zhu, P. Detection and characterization of cracks in highway pavement with the amplitude variation of GPR Diffracted waves: Insights from forward modeling and field data. Remote Sens. 2022, 14, 976. [Google Scholar] [CrossRef]
  2. Liu, T.; Zhang, X.-n.; Li, Z.; Chen, Z.-q. Research on the homogeneity of asphalt pavement quality using X-ray computed tomography (CT) and fractal theory. Constr. Build. Mater. 2014, 68, 587–598. [Google Scholar] [CrossRef]
  3. Edwards, L.; Bell, H.P. Comparative evaluation of nondestructive devices for measuring pavement thickness in the field. Int. J. Pavement Res. Technol. 2016, 9, 102–111. [Google Scholar] [CrossRef]
  4. Chen, D.-H.; Hong, F.; Zhou, W.; Ying, P. Estimating the hotmix asphalt air voids from ground penetrating radar. NDT E Int. 2014, 68, 120–127. [Google Scholar] [CrossRef]
  5. Annan, A.P.; Diamanti, N.; Redman, J.D.; Jackson, S.R. Ground-penetrating radar for assessing winter roads. Geophysics 2016, 81, WA101–WA109. [Google Scholar] [CrossRef]
  6. Liu, H.; Sato, M. In situ measurement of pavement thickness and dielectric permittivity by GPR using an antenna array. NDT E Int. 2014, 64, 65–71. [Google Scholar] [CrossRef]
  7. Kheradmandi, N.; Mehranfar, V. A critical review and comparative study on image segmentation-based techniques for pavement crack detection. Constr. Build. Mater. 2022, 321, 126162. [Google Scholar] [CrossRef]
  8. Tong, Z.; Gao, J.; Zhang, H. Recognition, location, measurement, and 3D reconstruction of concealed cracks using convolutional neural networks. Constr. Build. Mater. 2017, 146, 775–787. [Google Scholar] [CrossRef]
  9. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  10. Liu, Z.; Gu, X.; Chen, J.; Wang, D.; Chen, Y.; Wang, L. Automatic recognition of pavement cracks from combined GPR B-scan and C-scan images using multiscale feature fusion deep neural networks. Autom. Constr. 2023, 146, 104698. [Google Scholar] [CrossRef]
  11. Liu, Z.; Gu, X.; Yang, H.; Wang, L.; Chen, Y.; Wang, D. Novel YOLOv3 model with structure and hyperparameter optimization for detection of pavement concealed cracks in GPR images. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22258–22268. [Google Scholar] [CrossRef]
  12. Zhang, B.; Cheng, H.; Zhong, Y.; Tao, X.; Li, G.; Xu, S. Automatic quantitative recognition method for vertical concealed cracks in asphalt pavement based on feature pixel points and 3D reconstructions. Measurement 2023, 220, 113296. [Google Scholar] [CrossRef]
  13. Li, Y.; Liu, C.; Yue, G.; Gao, Q.; Du, Y. Deep learning-based pavement subsurface distress detection via ground penetrating radar data. Autom. Constr. 2022, 142, 104516. [Google Scholar] [CrossRef]
  14. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  15. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  16. Hou, F.; Lei, W.; Li, S.; Xi, J. Deep learning-based subsurface target detection from GPR scans. IEEE Sens. J. 2021, 21, 8161–8171. [Google Scholar] [CrossRef]
  17. Liu, Z.; Yeoh, J.K.; Gu, X.; Dong, Q.; Chen, Y.; Wu, W.; Wang, L.; Wang, D. Automatic pixel-level detection of vertical cracks in asphalt pavement based on GPR investigation and improved mask R-CNN. Autom. Constr. 2023, 146, 104689. [Google Scholar] [CrossRef]
  18. Li, S.; Gu, X.; Xu, X.; Xu, D.; Zhang, T.; Liu, Z.; Dong, Q. Detection of concealed cracks from ground penetrating radar images based on deep learning algorithm. Constr. Build. Mater. 2021, 273, 121949. [Google Scholar] [CrossRef]
  19. Xu, Z.; Yu, X.; Liu, Z.; Zhang, S.; Sun, Q.; Chen, N.; Lv, H.; Wang, D.; Hou, Y. Safety monitoring of transportation infrastructure foundation: Intelligent recognition of subgrade distresses based on B-Scan GPR images. IEEE Trans. Intell. Transp. Syst. 2022, 24, 15468–15477. [Google Scholar] [CrossRef]
  20. Zhang, J.; Yang, X.; Li, W.; Zhang, S.; Jia, Y. Automatic detection of moisture damages in asphalt pavements from GPR data with deep CNN and IRS method. Autom. Constr. 2020, 113, 103119. [Google Scholar] [CrossRef]
  21. Rosso, M.M.; Marasco, G.; Aiello, S.; Aloisio, A.; Chiaia, B.; Marano, G.C. Convolutional networks and transformers for intelligent road tunnel investigations. Comput. Struct. 2023, 275, 106918. [Google Scholar] [CrossRef]
  22. Liang, X.; Yu, X.; Chen, C.; Jin, Y.; Huang, J. Automatic classification of pavement distress using 3D ground-penetrating radar and deep convolutional neural network. IEEE Trans. Intell. Transp. Syst. 2022, 23, 22269–22277. [Google Scholar] [CrossRef]
  23. Zhang, B.; Liu, J.; Zhong, Y.; Li, X.; Hao, M.; Li, X.; Zhang, X.; Wang, X. A BP neural network method for grade classification of loose damage in semirigid pavement bases. Adv. Civ. Eng. 2021, 2021, 1–11. [Google Scholar] [CrossRef]
  24. Zheng, J.; Teng, X.; Liu, J.; Qiao, X. Convolutional neural networks for water content classification and prediction with ground penetrating radar. IEEE Access 2019, 7, 185385–185392. [Google Scholar] [CrossRef]
  25. Wang, B.; Chen, P.; Zhang, G. Simulation of GPR B-scan data based on dense generative adversarial network. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 3938–3944. [Google Scholar] [CrossRef]
  26. Alzubaidi, L.; Chlaib, H.K.; Fadhel, M.A.; Chen, Y.; Bai, J.; Albahri, A.; Gu, Y. Reliable deep learning framework for the ground penetrating radar data to locate the horizontal variation in levee soil compaction. Eng. Appl. Artif. Intell. 2024, 129, 107627. [Google Scholar] [CrossRef]
  27. Ozkaya, U.; Melgani, F.; Bejiga, M.B.; Seyfi, L.; Donelli, M. GPR B scan image analysis with deep learning methods. Measurement 2020, 165, 107770. [Google Scholar] [CrossRef]
  28. Tong, Z.; Yuan, D.; Gao, J.; Wei, Y.; Dou, H. Pavement-distress detection using ground-penetrating radar and network in networks. Constr. Build. Mater. 2020, 233, 117352. [Google Scholar] [CrossRef]
  29. Liu, W.; Luo, R.; Xiao, M.; Chen, Y. Intelligent detection of hidden distresses in asphalt pavement based on GPR and deep learning algorithm. Constr. Build. Mater. 2024, 416, 135089. [Google Scholar] [CrossRef]
  30. Xiong, X.-L.; Meng, A.; Lu, J.; Tan, Y.; Chen, B.; Tang, J.; Zhang, C.; Xiao, S.-q.; Hu, J. Automatic detection and location of pavement internal distresses from ground penetrating radar images based on deep learning. Constr. Build. Mater. 2024, 411, 134483. [Google Scholar] [CrossRef]
  31. Jin, G.; Zang, G.; Cai, W.; Lu, H.; Zhao, J. Quantitative evaluation method of pavement structural integrity based on ground penetrating radar. Highway 2020, 5, 16–20. (In Chinese) [Google Scholar]
  32. Cai, W.; Zhou, K.; Wang, G. Research on quantitative identification method of hidden distresses in asphalt pavement structure based on ground penetrating radar. Comput. Tech. Geophys. Geochem. Explor. 2022, 44, 597–604. (In Chinese) [Google Scholar]
  33. Tong, Z.; Gao, J.; Yuan, D. Advances of deep learning applications in ground-penetrating radar: A survey. Constr. Build. Mater. 2020, 258, 120371. [Google Scholar] [CrossRef]
  34. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  35. Canziani, A.; Paszke, A.; Culurciello, E. An analysis of deep neural network models for practical applications. arXiv 2016, arXiv:1605.07678. [Google Scholar]
  36. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
  37. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
Figure 1. The establishment process of the distress dataset.
Figure 2. Comparison of GPR B-scan images of interlayer distresses with core samples and samples of three types of interlayer distress.
Figure 3. Three-dimensional distribution plot of the dataset after t-SNE dimensionality reduction.
Figure 4. Architecture of ResNet.
Figure 5. The learning curves of ResNet-4, ResNet-6, SqueezeNet1_0, ResNet-8, MobileNet V2, EfficientNet B0, ResNet-10, ResNet-14, DenseNet-121, ResNet-18, ResNet-34, ResNet-50, and VGG-19.
Figure 6. Confusion matrices of the models.
Figure 7. The Grad-CAM results of the models.
Table 1. Technical specifications of the GX750.

Technical Specification | Parameter
Antenna center frequency | 750 MHz
Scanning rate | 1290 channels/s, time window 75 ns
Depth of detection | 0–1.5 m
Coupling mode | Air coupling
Weight | 3.6 kg
Table 2. The number of samples in the training, validation, and test sets.

Set | Loose_l | Water_l | Poor_l | Total
Training set | 3001 | 7242 | 8979 | 19,222
Validation set | 500 | 1207 | 1496 | 3203
Test set | 1501 | 3607 | 4505 | 9613
Table 3. Model size, average training time per epoch, total parameters, and total FLOPs of the models.

Model | Model Size (MB) | Average Epoch Time (s) | Total Parameters (M) | Total FLOPs (M)
ResNet-4 | 0.307 | 53.8 | 0.077 | 275.57
ResNet-6 | 1.19 | 54.6 | 0.307 | 456.50
SqueezeNet | 2.80 | 63.5 | 0.728 | 620.88
ResNet-8 | 4.71 | 54.9 | 1.227 | 636.88
MobileNet | 8.77 | 63.9 | 2.228 | 318.99
EfficientNet | 15.6 | 70.7 | 4.011 | 406.64
ResNet-10 | 18.7 | 55.4 | 4.901 | 816.99
ResNet-14 | 20.1 | 56.3 | 5.270 | 1281.82
DenseNet | 27.1 | 116.3 | 6.951 | 2817.31
ResNet-18 | 42.7 | 57.5 | 11.172 | 1744.85
ResNet-34 | 81.3 | 68.7 | 21.280 | 3599.55
ResNet-50 | 90.0 | 96.2 | 23.508 | 4053.02
VGG-19 | 532.48 | 185.3 | 139.581 | 19,570.23
Table 4. Metrics for the 13 models.

Network | Loose_l F1 | Poor_l F1 | Water_l F1 | Macro-F1 | Acc
ResNet-4 | 0.9243 | 0.9827 | 0.9796 | 0.9622 | 0.9725
ResNet-6 | 0.9362 | 0.9848 | 0.9846 | 0.9685 | 0.9772
SqueezeNet1_0 | 0.9239 | 0.9812 | 0.9813 | 0.9621 | 0.9724
ResNet-8 | 0.9366 | 0.9848 | 0.9846 | 0.9687 | 0.9772
MobileNet V2 | 0.9323 | 0.9838 | 0.9798 | 0.9653 | 0.9741
EfficientNet B0 | 0.9280 | 0.9828 | 0.9825 | 0.9644 | 0.9742
ResNet-10 | 0.9341 | 0.9846 | 0.9838 | 0.9675 | 0.9765
ResNet-14 | 0.9313 | 0.9843 | 0.9827 | 0.9661 | 0.9755
DenseNet-121 | 0.9265 | 0.9830 | 0.9837 | 0.9644 | 0.9746
ResNet-18 | 0.9314 | 0.9837 | 0.9825 | 0.9659 | 0.9752
ResNet-34 | 0.9363 | 0.9853 | 0.9839 | 0.9685 | 0.9772
ResNet-50 | 0.9298 | 0.9838 | 0.9826 | 0.9654 | 0.9749
VGG-19 | 0.9291 | 0.9823 | 0.9827 | 0.9647 | 0.9741
