1. Introduction
The growth of urban traffic and the consequent increase in traffic volume over the years have made the timely maintenance of pavements extremely important. Repetitive traffic loads [1], rapid temperature changes [2], and reflection from base layers [3] are deemed to contribute directly to pavement damage. In addition, water ingress into initial pavement cracks can deepen the damage, resulting in distresses such as potholes and even structural failure of the pavement [4]. Thus, timely maintenance can not only ensure safe operation but also extend the service life of pavements. Current pavement crack detection is largely manual and relies on subjective human interpretation, and repair mainly involves filling the crack with sealant. Although automated pavement detection systems have been studied for many years, previous research has primarily focused on crack extraction. Under actual complex road conditions, however, existing methods have limited detection rates and cannot identify all kinds of cases [5]. Multi-sensor fusion has therefore been considered for complex road conditions, where acceleration sensors [6], infrared sensors [7], multi-vision cameras [8], and 3D laser scanning [9] can provide identification information in addition to the optical images of the pavement.
Automated pavement detection has undergone several significant technological changes, and digital image-based methods have been widely used for pavement crack detection and segmentation. The difference in grayscale values between crack pixels and the background of digital images makes segmentation and detection feasible [10]. However, factors such as lighting conditions, asphalt oil marks, and pavement markings make pavement crack classification and segmentation challenging. Compared with laser scanning or 3D point clouds from multi-vision cameras, image processing has the advantages of fast processing speed, low cost, and high robustness. At the same time, because only images are acquired, complete recognition may not be possible in complex pavement scenes. Image recognition is achieved through image processing and machine learning, and machine learning can be subdivided into traditional machine learning and deep learning [11]. Image processing methods do not require a model training process and usually involve filters, morphological analysis, and statistical methods to detect cracks. These methods require careful tuning for the usage scenario and may not be robust to disturbances such as lighting changes and oil stains. Zou et al. [12] proposed a crack tree method, noted the effect of lighting on pavement detection, and applied a shadow removal algorithm before crack extraction to eliminate the effect of shadows on the extraction results. Crack recognition, however, requires the assistance of machine learning algorithms such as SVM (Support Vector Machine), RBF (Radial Basis Function), KNN (K-Nearest Neighbor), and Random Decision Forest [10,13,14]. Statistical, gray-level, texture, and shape features of crack images are increasingly used for feature extraction. In addition to block-level information, multi-scale information is used in multi-scale fusion crack detection (MFCD) [13]. Furthermore, principal component analysis (PCA) reduces the number of features to speed up the identification of a single crack block [10].
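The grayscale contrast between crack pixels and background described above can be illustrated with a minimal sketch. It assumes an 8-bit grayscale image stored as a list of rows; the function name `segment_dark_pixels`, the sensitivity parameter `k`, and the "mean minus k standard deviations" rule are illustrative assumptions and are not taken from the cited works.

```python
from statistics import mean, pstdev


def segment_dark_pixels(gray, k=1.0):
    """Return a binary mask marking pixels darker than mean - k * std.

    gray: list of rows of 8-bit grayscale values (assumed input format).
    k:    assumed sensitivity parameter; a larger k marks fewer pixels.
    """
    flat = [p for row in gray for p in row]
    threshold = mean(flat) - k * pstdev(flat)
    return [[1 if p < threshold else 0 for p in row] for row in gray]


# Toy 4x4 patch: bright pavement (about 200) crossed by a dark vertical crack (about 40).
patch = [
    [200, 198, 40, 201],
    [199, 202, 42, 200],
    [201, 200, 38, 199],
    [200, 199, 41, 202],
]
mask = segment_dark_pixels(patch)
for row in mask:
    print(row)
```

A global threshold of this kind would also mark shadows and oil stains as dark pixels, which is consistent with the robustness concerns raised in the paragraph above.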
In recent years, artificial intelligence (AI) has become increasingly popular in automatic image processing and recognition, and deep learning has gained traction in object detection and segmentation [15]. Unlike traditional machine learning, deep learning replaces hand-crafted feature extraction with learned convolutions and exploits the powerful parallel computing capability of the graphics processing unit (GPU). For recognition and detection problems, deep learning greatly improves detection efficiency and recognition accuracy. Deep structures with many hidden layers, such as deep convolutional neural networks (DCNN), provide increased levels of feature abstraction to reflect the complexity of the data, and diverse convolutional layers of feature extraction increase the confidence of the features [16]. Deep learning also processes raw data in end-to-end networks with minimal human intervention and prior assumptions, which increases the possibility of recognition and detection in complex scenarios. Image recognition networks have also been transferred to the domain of pavement crack detection through transfer learning. Gopalakrishnan et al. [17] completed fully automated detection of pavement distress using the VGG16 network. Cha et al. [18] proposed a DCNN architecture for detecting concrete cracks in intensity images under uneven illumination conditions. Similarly, Cha et al. [19] modified the original fast R-CNN architecture to detect and classify five types of defective concrete cracks. Zhang et al. [20] transferred a pre-trained AlexNet to facilitate the learning process and classified image patches as cracks, sealed cracks, and background regions. These transfer learning models seek new application scenarios for established image recognition models, and their results show that the deep learning approach outperforms traditional image processing. However, these methods focus only on recognizing the cracks themselves and do not identify or detect complicating factors such as oil marks, joints, and manholes. Majidifard et al. [21] combined the YOLO (you only look once) deep learning framework and U-Net as a two-step network to identify and extract cracks from pavement images, with a detailed subdivision into eight types of cracks and potholes for complex pavement scenes. Chen et al. [6] used a fusion network that combines acceleration data processed by wavelet transform with VGG16 to identify and detect transverse cracks and manholes.
Pavement crack detection, and even quantitative crack measurement, in complex scenes has been further investigated using multiple sensors and deep learning. Zhou and Song [22] used a DCNN and laser-scanned range images to identify cracks with depth mapping information, avoiding the effect of oil stains and shadows on the pavement. Guan et al. (2021) performed automatic pavement detection based on stereo vision and deep learning; the 3D image dataset enabled effective identification of cracks and potholes, and the depth information in the 3D images enabled volume measurement of potholes. Thermal images have also been used for pavement crack detection, because the surface temperature distribution pattern is directly related to the pavement crack profile and can be used as an indicator of crack depth [23]. Seo et al. [24] conducted an experimental study on crack behavior in columns of different widths using infrared thermal images and confirmed that infrared thermal imagers can detect cracks [25]. Compared with multi-vision cameras or laser-scanned points or images, thermal imagers tend to offer better real-time efficiency, lower cost, and the ability to feed raw data directly into deep learning networks. Thermal imaging is therefore a suitable auxiliary tool for practical pavement inspection. The Georgia Tech Sensing Vehicle (GTSV) [26] was used to collect 3D pavement surface images to validate existing pavement detection methods. Mainstream models such as the Fully Convolutional Network (FCN) [27], U-Net [28], DeepCrack [29] with a VGG16 backbone, and Pix2Pix [30] based on generative adversarial networks (GAN) were used for qualitative and objective evaluation of crack detection algorithms. Ali et al. [31] proposed a local weighting factor with a sensitivity map to eliminate network bias and accurately predict sensitive pixels, and implemented a deep fully convolutional neural network with better crack segmentation performance than U-Net. Fan et al. [32] proposed an end-to-end encoder-decoder architecture for crack detection with hierarchical feature learning and dilated convolution. Wang et al. [33] proposed a semi-supervised semantic segmentation framework for crack images, which greatly reduces the data annotation workload; the proposed network can extract and merge information from multiple feature layers to improve the performance of the algorithm.
Building on previous crack detection studies, this paper analyses the statistical distribution characteristics of cracks in optical and thermal images. It also proposes two improvements for thermal image-assisted pavement detection: data augmentation and a detection system with fusion pre-processing. Because no infrared camera-assisted pavement detection database exists, a small dataset may not support sufficient recognition accuracy in complex scenes. Therefore, this paper applies data augmentation to the training set of the collected dataset to verify that augmented data can effectively increase detection accuracy. When optical sensors are used for image-based pavement damage detection, the accuracy of the detection system is affected by uneven light, shadows, and oil marks. Point clouds are not suitable for real-time detection because of their lengthy acquisition and processing time. This paper also proposes a pavement crack detection algorithm based on EfficientNet [34], which is tested using optical images, thermal images, and pre-processed fusion images to investigate, in a side-by-side comparison, the effectiveness of infrared cameras for recognizing complex pavement scenes.
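As an illustration of how a small training set can be enlarged, the following minimal sketch applies simple geometric transforms to an image stored as a list of rows. The specific transforms (flips and a 90-degree rotation) and the function names are assumptions made for illustration only; this introduction does not specify which augmentation operations were actually used.

```python
def hflip(img):
    """Mirror an image left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror an image top to bottom."""
    return img[::-1]


def rot90(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img):
    """Return the original image plus three geometric variants (assumed set)."""
    return [img, hflip(img), vflip(img), rot90(img)]


# Toy 2x3 "image"; real inputs would be RGB, thermal, or fused pavement images.
sample = [[1, 2, 3],
          [4, 5, 6]]
variants = augment(sample)
for v in variants:
    print(v)
```

When optical and thermal images are paired, the same transform would need to be applied to both images of a pair so that they remain aligned.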
3. Difference between Infrared Thermal Imaging and Optical Imaging in Complex Pavement Conditions
Conventionally, thermal imaging determines the temperature of an object by measuring the infrared radiation emitted from the surface of the target. Thermography measures temperature quickly enough for real-time detection. Infrared (IR) radiation is a band of invisible light in the electromagnetic spectrum, with wavelengths ranging from 0.75 to 100 μm [45]. Thermal infrared cameras can detect infrared radiation from the short-wave to the long-wave region: the infrared detector receives the infrared radiation energy distribution of the target, the sensor converts the radiation into an electrical signal, and the output is processed to form an infrared thermal image. This thermal image corresponds to the heat distribution field on the surface of the object. Thermal imaging has the merits of high portability, non-contact temperature measurement, absence of harmful radiation, real-time image acquisition, and independence from light intensity [46]. In addition, the relatively low cost of thermal infrared sensors makes them suitable for identification tasks in which the accuracy of the temperature field is less critical.
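To make the link between surface temperature and emitted infrared radiation concrete, the sketch below evaluates Wien's displacement law, which gives the wavelength of peak emission of an ideal blackbody. Treating the pavement as a blackbody and the chosen temperature range of 0 to 60 °C are simplifying assumptions for illustration; they are not measurements from this study.

```python
WIEN_B_UM_K = 2897.77  # Wien's displacement constant in micrometre-kelvin


def peak_wavelength_um(temp_kelvin):
    """Wavelength (micrometres) at which a blackbody at temp_kelvin emits most strongly."""
    if temp_kelvin <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    return WIEN_B_UM_K / temp_kelvin


# Assumed range of pavement surface temperatures in degrees Celsius.
for temp_c in (0, 25, 60):
    lam = peak_wavelength_um(temp_c + 273.15)
    print(f"{temp_c:>3} degC -> peak emission near {lam:.1f} um")
```

Under these assumptions the peak emission falls between roughly 8.7 and 10.6 μm, that is, in the long-wave part of the infrared band.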
The characteristics of thermal images and optical images are presented in Figure 8a–c. Figure 8a shows that the crack region of the alligator crack has a higher temperature than the normal pavement, as the exposed asphalt can have higher heat absorption. Longitudinal and transverse cracks show similar thermal distributions apart from the direction of the crack; in both, the crack area appears to have a lower temperature because the crack splits the homogeneous pavement into two parts with uneven thermal conductivity. In Figure 8b, the joint has a higher temperature than the normal pavement area, as do the repaired cracks, owing to the better heat absorption of the asphalt material. The manhole, on the other hand, shows a lower temperature than the normal pavement area because it is separated from the surrounding pavement. The pothole area appears to have higher roughness and slower heat absorption, resulting in a lower temperature profile. Similarly, in Figure 8c, the oiled areas show higher heat absorption and therefore higher temperature profiles. The pavement markings (usually white or yellow) display a lower temperature than the normal pavement because of their low heat absorption. Finally, shadowed areas display a lower temperature than the normal pavement because the shadow blocks the sunlight.
Figure 9 presents the statistical histograms of the differential distributions of the thermal and RGB images in Figure 8. In the RGB image, all types have small grayscale values except road markings, whose grayscale value is relatively large. This is the main reason why crack recognition methods based on histogram analysis tend to focus only on the grayscale differentiation of the road surface. In contrast, the thermal image shows a higher temperature profile than normal pavement for exposed asphalt and oiled areas and a lower one for road markings, shadows, transverse cracks, and longitudinal cracks, thus usefully complementing the single RGB image.
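A comparison of this kind can be sketched as follows: each region is summarised by a coarse histogram and by the difference between its mean value and that of normal pavement. The pixel samples, the number of bins, and the function names are hypothetical and serve only to show the computation; they are not data or code from this study.

```python
def gray_histogram(pixels, bins=8, max_value=256):
    """Count 8-bit pixel values into equally wide bins."""
    width = max_value / bins
    counts = [0] * bins
    for p in pixels:
        counts[min(int(p // width), bins - 1)] += 1
    return counts


def region_mean(pixels):
    """Average pixel value of a region."""
    return sum(pixels) / len(pixels)


# Hypothetical pixel samples per region (not data from the paper).
regions = {
    "normal pavement": [120, 118, 122, 121],
    "road marking": [230, 235, 228, 240],
    "shadow": [60, 58, 62, 61],
}
reference = region_mean(regions["normal pavement"])
for name, px in regions.items():
    # Positive difference: brighter (or warmer) than normal pavement.
    print(name, gray_histogram(px), round(region_mean(px) - reference, 2))
```

The same functions could be applied to thermal pixel values, in which case the sign of the difference indicates whether a region appears warmer or cooler than normal pavement.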
6. Conclusions
Deep learning methods are increasingly being applied to pavement detection; however, the prediction accuracy of previous image-based detection methods tends to be compromised by their inability to accommodate complex and large pavement features and scenes. This paper proposes an EfficientNet B4-based detection model for pavement damage recognition using a thermal image-assisted fusion of RGB images. The effects of model, data augmentation, and data source on recognition performance are illustrated in detail by comparing the results.
This paper has proposed an EfficientNet deep learning architecture to classify nine classes of pavement features from the collected pavement dataset. In terms of average accuracy and average precision, the EfficientNetB5 model achieved 98.92% accuracy and 98.22% precision on the original dataset, while the EfficientNetB4 model achieved 98.34% accuracy and 98.35% precision on the augmented dataset. Considering the performance of the other networks as well, training on the augmented data increased the recognition stability of the models. Because EfficientNetB4 also has fewer trainable parameters, it offers the best performance among the EfficientNet variants.
The IR image-enhanced RGB image provides more image information than the RGB image, IR image, or MSX image alone. Better accuracy, precision, recall, and F1 score were achieved using the fused images.
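For reference, the four metrics named above can be computed from a multi-class confusion matrix as in the minimal sketch below. It assumes one-vs-rest counts per class and macro-averaging over classes; the averaging scheme used in this paper is not stated in this section, and the confusion matrix shown is a made-up three-class example rather than a result of the study.

```python
def per_class_metrics(cm):
    """Compute one-vs-rest metrics; cm[i][j] counts true class i predicted as j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({
            "accuracy": (tp + tn) / total,
            "precision": precision,
            "recall": recall,
            "f1": f1,
        })
    return results


def macro_average(results):
    """Unweighted mean of each metric over all classes (assumed averaging scheme)."""
    return {k: sum(r[k] for r in results) / len(results) for k in results[0]}


# Made-up 3-class confusion matrix for illustration only.
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
results = per_class_metrics(cm)
avg = macro_average(results)
print({k: round(v, 4) for k, v in avg.items()})
```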
The proposed method is based on fused thermal and RGB images and, when using sensor image overlay, achieves the same accuracy as a laser scanner. Data from multiple sensors can provide more valid information for complex scenarios. Moreover, because the proposed method is entirely image-based, processing is fast. However, layer superposition of data from different sources inevitably loses some of the original information, and a fusion model that handles data from different sensors could achieve better detection accuracy.
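The information loss caused by layer superposition can be illustrated with a minimal sketch of a pixel-wise weighted overlay. The blending rule, the weight `alpha`, and the function name `blend` are assumptions for illustration and do not reproduce the fusion pre-processing used in this paper.

```python
def blend(rgb_gray, thermal, alpha=0.5):
    """Pixel-wise weighted overlay of two aligned single-channel 8-bit images.

    alpha is an assumed weight for the optical image; (1 - alpha) weights the
    thermal image.
    """
    same_shape = len(rgb_gray) == len(thermal) and all(
        len(a) == len(b) for a, b in zip(rgb_gray, thermal)
    )
    if not same_shape:
        raise ValueError("images must have the same shape")
    return [[round(alpha * a + (1 - alpha) * b) for a, b in zip(ra, rb)]
            for ra, rb in zip(rgb_gray, thermal)]


# Two pixels with opposite optical/thermal responses (made-up values).
optical = [[100, 200]]
thermal = [[200, 100]]
fused = blend(optical, thermal)
print(fused)
```

With equal weights, both pixels receive the same fused value even though their optical and thermal responses differ, which shows how a simple overlay can discard information that a learned fusion model might retain.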