Article

Nighttime Pothole Detection: A Benchmark

by Min Ling 1, Quanjun Shi 1, Xin Zhao 1, Wenzheng Chen 1, Wei Wei 1, Kai Xiao 1,*, Zeyu Yang 2, Hao Zhang 2, Shuiwang Li 2,*, Chenchen Lu 3 and Yufan Zeng 3
1 Guangxi Baining Expressway Co., Ltd., Nanning 530012, China
2 College of Computer Science and Engineering, Guilin University of Technology, Guilin 541006, China
3 Guangxi Jet Toll Technology Co., Ltd., Nanning 530022, China
* Authors to whom correspondence should be addressed.
Electronics 2024, 13(19), 3790; https://doi.org/10.3390/electronics13193790
Submission received: 6 August 2024 / Revised: 8 September 2024 / Accepted: 12 September 2024 / Published: 24 September 2024
(This article belongs to the Special Issue New Trends in AI-Assisted Computer Vision)

Abstract:
In the field of computer vision, the detection of road potholes at night represents a critical challenge in enhancing the safety of intelligent transportation systems. Ensuring road safety is of paramount importance, particularly in promptly repairing pothole issues. These abrupt road depressions can easily lead to vehicle skidding, loss of control, and even traffic accidents, especially when water has pooled in or submerged the potholes. Therefore, the detection and recognition of road potholes can significantly reduce vehicle damage and the incidence of safety incidents. However, research on road pothole detection lacks high-quality annotated datasets, particularly under low-light conditions at night. To address this issue, this study introduces a novel Nighttime Pothole Dataset (NPD), independently collected and comprising 3831 images that capture diverse scene variations. The construction of this dataset aims to counteract the insufficiency of existing data resources and strives to provide a richer and more realistic benchmark. Additionally, we develop a baseline detector, termed WT-YOLOv8, for the proposed dataset, based on YOLOv8. We also evaluate the performance of the improved WT-YOLOv8 method and eight state-of-the-art object detection methods on the NPD and the COCO dataset. The experimental results on the NPD demonstrate that WT-YOLOv8 achieves a 2.3% improvement in mean Average Precision (mAP) over YOLOv8. In terms of the key metrics, AP@0.5 and AP@0.75, it shows enhancements of 1.5% and 2.8%, respectively, compared to YOLOv8. The experimental results provide valuable insights into each method's strengths and weaknesses under low-light conditions. This analysis highlights the importance of a specialized dataset for nighttime pothole detection and shows variations in accuracy and robustness among methods, emphasizing the need for improved nighttime pothole detection techniques.
The introduction of the NPD is expected to stimulate further research, encouraging the development of advanced algorithms for nighttime pothole detection, ultimately leading to more flexible and reliable road maintenance and road safety.

1. Introduction

Pothole detection is crucial for ensuring traffic safety, preventing vehicle damage, and maintaining road infrastructure [1,2,3,4]. Potholes can cause accidents by prompting sudden swerves or posing risks to cyclists and pedestrians, as well as lead to significant vehicle repair costs due to tire, wheel, suspension, and underbody damage. Early detection allows for timely repairs, extending road lifespan, and reducing long-term maintenance expenses [2,5]. Economically, proactive detection minimizes insurance claims and repair costs while also supporting efficient resource allocation for road maintenance [1,6]. Smooth roads enhance driving comfort, reduce fuel consumption, and lower vehicle emissions, contributing to environmental benefits [7]. Additionally, pothole detection data inform urban planning and can be integrated into smart city initiatives, supporting broader infrastructure management and the safe operation of autonomous vehicles [8,9]. Manual visual inspection of roads for potholes requires skilled personnel, is time-consuming, and usually requires a significant workforce, often trained in identifying various road defects [10,11]. In recent years, many automated pothole detection algorithms have been proposed, aiming to alleviate the inefficiencies associated with manual inspection by leveraging advancements in technology such as artificial intelligence, machine learning, and sensor integration. For instance, Wang et al. [12] employed a grayscale co-occurrence matrix feature extraction algorithm in conjunction with support vector machines as a classification tool for detecting road potholes. Chen et al. [13] proposed a method for pothole detection using location-aware convolutional neural networks (CNNs). Asad et al. 
[1] focused on pothole detection using deep learning techniques from a real-time and AI-on-the-edge perspective, exploring methods to detect potholes efficiently using AI algorithms that can operate in real-time scenarios, enhancing infrastructure maintenance and road safety measures. However, these methods all focus on daytime pothole detection. This work represents the first attempt to explore pothole detection under nighttime conditions. Detecting potholes at night offers several advantages over daytime detection. For instance, roads typically have less traffic at night, which allows for safer and more efficient pothole detection and repair processes without causing significant traffic delays or disruptions. Nighttime temperatures are usually lower than daytime temperatures, especially in warmer climates. These cooler conditions can help prevent pothole detection equipment from overheating, thereby improving their effectiveness and accuracy. Moreover, nighttime often sees fewer pedestrians and cyclists using the roads, reducing the safety concerns and logistical challenges associated with conducting pothole detection and repair activities. However, nighttime pothole detection faces distinct challenges. A comparison between normal illumination and low-illumination nighttime road pothole images is shown in Figure 1. For instance, the quality and intensity of lighting can vary significantly from one location to another, causing inconsistencies in detection performance. Areas with insufficient street lighting or uneven illumination can create shadows and highlights that obscure potholes. Artificial lights, such as those from streetlights and vehicle headlights, can create glare and reflections on wet or shiny road surfaces, leading to false positives or missed detections. Therefore, automated pothole detection algorithms must overcome these challenges to ensure the accuracy and reliability of detection results.
One of the biggest challenges in advancing research on nighttime pothole detection is the lack of specialized, high-quality annotated datasets tailored for nighttime conditions. Most existing pothole datasets focus predominantly on well-illuminated daytime environments, with insufficient consideration for the unique lighting conditions and complexities at night. For example, the Pothole Mix dataset [14] consists of 4340 pairs of images and masks at various resolutions but is limited to road pothole images under normal lighting conditions. The RDD2020 dataset [15] contains 26,336 road images from India, Japan, and the Czech Republic, capturing four types of road damage—longitudinal cracks, transverse cracks, alligator cracks, and potholes. This dataset was collected during the day and considers various weather and lighting conditions at the time of image capture, but it does not include low-light night scenarios. The current bias toward daytime data in pothole detection datasets significantly limits the applicability and accuracy of existing algorithms in real-world nighttime scenarios. Nighttime conditions present unique challenges, including variable illumination, glare, and reduced image quality—factors such as these are not typically considered in datasets designed for daytime environments. Without annotated datasets that capture these nighttime complexities, it is difficult to develop and validate algorithms capable of reliable pothole detection in low-light conditions. This gap highlights the need for focused efforts to collect and annotate nighttime road data, which would greatly improve the performance and reliability of pothole detection systems in low-light conditions.
In this work, we present the Nighttime Pothole Dataset (NPD), a comprehensive benchmark specifically designed for nighttime pothole detection. The NPD encompasses a wide range of environments, including urban roadways and rural trails, and captures various weather and lighting conditions to ensure its broad applicability for diverse detection scenarios. Each image in the dataset is meticulously labeled with pixel-level annotations, establishing its state-of-the-art quality in terms of scale, diversity, and practicality. This dataset is designed to facilitate the development and enhancement of pothole detection algorithms, ensuring reliable performance in low-light conditions. The NPD’s comprehensive coverage of various lighting and weather conditions can help algorithms become more robust and accurate, ultimately contributing to safer and more efficient road maintenance practices. Moreover, nighttime images lose high-frequency details due to inadequate illumination, making it difficult for traditional convolutional neural networks to effectively detect potholes. To address this challenge, we develop a baseline detector, termed WT-YOLOv8, based on YOLOv8. This method integrates the WTConv module [16], which leverages wavelet transformation to expand the receptive field and effectively extract low-frequency information. This compensates for the lack of high-frequency information in nighttime images, enhancing the network’s performance in detecting potholes under low-light conditions. We also evaluate the efficacy of the improved WT-YOLOv8 method and eight state-of-the-art object detection methods on the Nighttime Pothole Dataset (NPD) to compare their performance in the task of nighttime pothole detection. This evaluation provides critical insights into how well these advanced detection algorithms adapt to the unique challenges posed by low-light conditions. 
By systematically benchmarking these methods, we can identify their strengths and weaknesses in handling variable illumination, glare, shadows, and other complexities specific to nighttime environments. Our comparative study underscores the importance of specialized datasets like the NPD in advancing the field of pothole detection. By leveraging the NPD to test and refine object detection methods, we can drive significant progress in developing reliable, efficient, and effective solutions for maintaining road safety and infrastructure quality during nighttime conditions.
Our contributions can be summarized as follows:
  • We are the first to extensively explore the challenges associated with nighttime pothole detection. We examine how these factors affect the performance of detection algorithms and highlight the need for specialized datasets and advanced techniques tailored to nighttime environments. By shedding light on these issues, our work aims to inspire further research in this direction, ultimately enhancing road safety and maintenance practices in nighttime conditions.
  • We present the NPD, a comprehensive benchmark specifically designed for nighttime pothole detection, covering a wide range of environments, including urban roadways and rural trails, and various weather and lighting conditions. The NPD includes 3831 meticulously collected and annotated images of nighttime potholes.
  • Based on YOLOv8, we develop a baseline detector by introducing wavelet transform convolution (WTConv) [16] to the original model for better performance.
  • We evaluate the performance of the improved WT-YOLOv8 method and eight state-of-the-art object detection methods on the NPD, providing critical insights into their efficacy and adaptability to the unique challenges of nighttime pothole detection. This systematic benchmarking identifies the strengths and weaknesses of these methods in handling variable illumination, glare, shadows, and other nighttime complexities.
The organization of this paper is as follows. In Section 2, we provide a comprehensive review of related research in the field of road pothole detection. In Section 3, we detail the construction process of the NPD, the annotation guidelines, and statistical analysis. In Section 4, we introduce the WT-YOLOv8 baseline detector for nighttime pothole detection, which enhances the YOLOv8 model by incorporating a wavelet transform convolution (WTConv) module. In Section 5, we evaluate the performance of nine object detection methods on the NPD with in-depth analysis. Finally, in Section 6, we summarize this paper and discuss the potential applications and future research directions of the NPD.

2. Related Works

A key aspect of effective road maintenance and repair planning is the timely detection and assessment of potholes [17]. However, reliance on manual visual inspection for road hazard diagnosis is inefficient, costly, and labor-intensive. In recent years, a multitude of methods have been proposed for the automated detection and estimation of potholes. These methods are broadly categorized into two main classes: traditional methods and deep learning approaches. Traditional methods primarily focus on vibration-based and 3D reconstruction techniques. However, they exhibit limited robustness in detecting potholes within complex environments and under varying lighting conditions. More recently, with the rapid advancement of deep learning, significant progress has been made in road pothole detection using deep neural networks [18,19,20]. These methods fully leverage the characteristics of annotated samples, effectively enhancing the precision and robustness of detection. This section provides an exhaustive overview of the research developments in the field of road pothole detection.

2.1. Traditional Pothole Detection Methods

Traditional methods rely on manual visual inspection for detecting road surface potholes. However, these approaches are inefficient, costly, and labor-intensive. As a result, innovative methods have emerged that leverage technologies such as vibration sensors, 3D reconstruction, and computer vision. These methods utilize manually crafted features and localization techniques to detect potholes in images. For instance, Kim and Ryu [21] proposed an algorithm that combines vibration sensors and vehicle dynamic data to effectively identify potholes by analyzing changes in road surface vibrations detected by onboard sensors. Additionally, Koch and Brilakis [22] designed an automated method for detecting asphalt road potholes. They initially used histogram-shape thresholding to segment images into defect and non-defect regions. Subsequently, geometric features are used to estimate potential pothole shapes, and their presence is verified by comparing the internal textures of potential potholes with surrounding non-defective road surfaces. In the realm of computer vision, Buza et al. [23] introduced a 2D image-based pothole detection method, utilizing shape detection and Hough transforms to analyze geometric features and texture information for pothole identification. Furthermore, Kim et al. [4] utilized laser scanners or stereoscopic vision techniques for precise 3D road reconstruction to facilitate pothole detection. Moreover, Ryu et al. [24] proposed an enhanced 2D image-based pothole detection method designed for intelligent transportation systems (ITSs) and road management systems. They utilized 2D road images collected by survey vehicles and compared the proposed method’s performance under various conditions such as road types, recording conditions, and lighting.
These innovative approaches exemplify a shift from traditional manual methods to advanced technological developments in pothole detection, offering new possibilities for enhancing road maintenance efficiency and safety management.

2.2. Deep Learning Methods for Pothole Detection

Due to their remarkable capability to effectively extract image features, many researchers are keen to leverage deep learning networks for enhanced pothole detection. Numerous methods use convolutional neural networks (CNNs) for feature extraction to accurately localize and detect pothole areas [25,26]. For example, Bhatia et al. [27] explored thermal images, which can provide different insights compared to visible spectrum images, potentially revealing potholes not detectable through standard imaging methods. Chellaswamy et al. [28] focused on a CNN-based approach for detecting both potholes and bumps, showcasing the versatility of CNNs in addressing various road anomalies. Saisree and Kumaran’s work [29] emphasized classifying road conditions into discrete categories, a method useful for broader classification tasks beyond just pothole detection. Khan et al. [30] utilized YOLO for its efficiency and speed in object detection, suitable for integration into autonomous vehicle systems where real-time processing is crucial.
These methods leverage the advanced feature extraction capabilities of deep learning networks to achieve accurate pothole detection and localization. Nonetheless, challenges persist, such as the shortage of high-quality, well-annotated datasets and the necessity for continuous algorithm enhancements to effectively address complex scenarios and, in particular, nighttime conditions. This work addresses these issues by providing a comprehensive and meticulously annotated benchmark dataset for road pothole detection, offering valuable technical support for enhancing road traffic safety and maintenance management.

3. The Construction of the NPD

This section outlines the methods and technical aspects of the pipeline used to construct the NPD. In the process of constructing the NPD, we select various scenarios and provide manual annotations for each image.

3.1. Motivation for Developing the NPD

In the era of deep learning, the quality of datasets and their precise annotations play a critical role in driving the success of various computer vision tasks. With the continuous advancement of deep learning technology, the field of road pothole detection has witnessed new challenges and opportunities. Although there are some datasets specifically designed for road pothole detection, such as Pothole-600 [31] and the Pothole Dataset [32] provided by the Indian Institute of Technology, these datasets primarily focus on daytime road conditions and provide images of potholes in different environments. The Road Damage Dataset [15], which contains various types of road damage, including potholes and cracks, is used for training and evaluating road damage detection models. However, these datasets have limited applicability in nighttime environments and are unable to meet detection requirements under complex lighting conditions.
In view of this, we developed the Nighttime Pothole Dataset (NPD), aiming to address the gap in existing datasets for nighttime applications. The NPD comprises 3831 high-quality images that specifically focus on night scenes, providing researchers with a rich and diverse dataset for nighttime pothole detection. It includes images meticulously planned and captured by us, encompassing a wide range of scenes, lighting conditions, and camera angles, thus providing an unprecedented benchmark for nighttime road pothole detection algorithms. This dataset serves as a reliable benchmark for the development and evaluation of nighttime pothole detection algorithms, contributing to advancements in research and applications in this field.

3.2. Data Collection

Our research group analyzed existing road pothole data and identified a significant deficiency: existing road pothole datasets do not fully cover the range of road conditions. Given the specific lighting environment at night, which places higher demands on identifying potholes on the road, we decided to address this gap ourselves. We conducted a specialized study focused on night observations and, based on the self-collected data, obtained 3831 high-quality images. These images were specifically taken to capture pothole conditions on night roads, covering different scenes such as urban streets, campus environments, and towns. We focused on night photographs to ensure that the dataset was comprehensive and representative of this particular scenario. To ensure the quality and diversity of the collected data, we used various types of photographic equipment, including smartphones (i.e., an iPhone 15 Plus) and professional cameras (i.e., a GoPro 12). The high performance of these devices allowed us to capture clear and high-quality images even under low-light conditions at night.
During the collection process, we performed strict data cleaning, manually removing defective images that contained repeated scenes, blurry images, or overlapping elements. This step ensured that each image was of high quality and could be effectively used for subsequent research and analysis. Ultimately, our Nighttime Pothole Dataset (NPD) comprised 3831 well-annotated images, making it the first and only dataset in this field specifically focused on nighttime pothole detection. We believe that the introduction of the NPD will greatly promote the development and application of nighttime pothole detection technology. Some representative images from the NPD are shown in Figure 2.

3.3. Data Source and Processing

The image collection locations for this dataset were various sites in Guilin City, Guangxi Zhuang Autonomous Region, including Lintang Town in Yanshan District, Guilin University of Technology, Guilin Normal University, County Road 103 in Yanshan District, and Liangfeng Road in Yanshan District. The samples included dry potholes at night, water-containing potholes at night, normal asphalt pavement at night, and potholes on mud roads at night. At the same time, we applied preprocessing operations, such as static cropping, dynamic cropping, rotation, and shearing, using Roboflow Universe to the dataset images. The purpose of these operations was to enhance the generalization capability of the model. These operations expanded our dataset from the original 3831 images to 8928 images.
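Two of the named operations can be sketched in NumPy for illustration (the actual preprocessing was performed with Roboflow Universe; arbitrary-angle rotation and shearing require interpolation and are simplified to 90-degree rotation here, and the helper names are hypothetical):

```python
import numpy as np

def random_crop(img, ch, cw, rng):
    """Static crop: cut a ch x cw window at a random offset."""
    h, w = img.shape[:2]
    y = rng.integers(0, h - ch + 1)
    x = rng.integers(0, w - cw + 1)
    return img[y:y + ch, x:x + cw]

def rotate90(img, k):
    """Rotation in 90-degree steps (a simplification of arbitrary-angle rotation)."""
    return np.rot90(img, k)

rng = np.random.default_rng(0)
patch = rotate90(random_crop(np.zeros((480, 640, 3)), 320, 320, rng), 1)
```

Applying several such randomized variants to each source image is what expands the 3831 originals into a larger augmented set.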

3.4. Data Annotation

In this section, we provide a detailed introduction to the specific practices for annotating bounding boxes and category labels during the dataset creation process. For this dataset, we defined potholes as the unique category, and bounding boxes were drawn around the clearly visible pothole areas in the images. Figure 3 shows some examples of annotated images from the NPD. To ensure consistency and accuracy in the annotation process, we referred to a set of detailed annotation guidelines [33]. These guidelines included ensuring the correct placement of bounding boxes, describing the perspective, and recording the truncation status, among others. They also emphasized the importance of accurately marking all instances of potholes and taking measures to minimize errors within the bounding boxes as much as possible. During the annotation process, we adhered to the following basic guidelines:
  • Bounding boxes were drawn as rectangles and strictly aligned with the edges of the potholes.
  • Each independent pothole was assigned a corresponding bounding box. If multiple potholes visually merged into one and were indistinguishable, they were treated as a single entity for annotation.
  • The focus of annotation was solely on the pothole itself. Even if the pothole was connected to other physical entities, only the pothole part was annotated.
  • Small potholes in the image background that resembled the main pothole but were not the primary focus of annotation were ignored.
To ensure the effective completion of the annotation task, we adopted a three-stage process consisting of manual annotation, visual inspection, and bounding box fine-tuning. In the first stage, professionals (i.e., students involved in image annotation) annotated the images according to the previously outlined guidelines. Subsequently, the annotated dataset was submitted to an audit team responsible for checking to ensure that no potholes were missed and that the annotations were complete. In the third stage, we corrected and optimized any discovered annotation errors, which were then fed back into the initial annotation process to improve quality. Through this three-step strategy, we ensured that the annotations of potholes in the dataset met a high-quality standard.

3.5. Statistical Analysis

To facilitate model training and evaluation, we divided the Nighttime Pothole Dataset (NPD) into training, validation, and test sets, containing 2625, 754, and 376 annotated images, respectively. This division ensured that the dataset was comprehensively covered and efficiently utilized.
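A minimal sketch of such a partition (hypothetical file names, and a random assignment standing in for the authors' actual split):

```python
import random

def split_dataset(paths, n_train=2625, n_val=754, n_test=376, seed=0):
    """Shuffle the image paths and partition them into the NPD split sizes."""
    paths = list(paths)
    random.Random(seed).shuffle(paths)
    return (paths[:n_train],
            paths[n_train:n_train + n_val],
            paths[n_train + n_val:n_train + n_val + n_test])

# hypothetical file names, one per annotated image in the three subsets
train, val, test = split_dataset([f"img_{i:04d}.jpg" for i in range(2625 + 754 + 376)])
```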
In Figure 4 and Figure 5, we use histograms to display the distribution of pothole features in the NPD. Figure 4 illustrates the number of potholes at different depths in the training, validation, and test subsets of the NPD. Specifically, the counts of shallow, medium, and deep potholes are 1267, 975, and 383 in the training subset; 148, 187, and 41 in the test subset; and 226, 425, and 103 in the validation subset.
Figure 5 reflects the clarity of pothole images at different depths in the NPD. The counts of low, medium, and high clarity images are 531, 895, and 215 for shallow potholes; 254, 927, and 406 for medium potholes; and 162, 283, and 82 for deep potholes. Compared to images captured under good lighting conditions, nighttime images exhibit more blurred and noisy details, further emphasizing the challenges of pothole detection in nighttime environments and providing important insights for developing more effective detection methods.

4. Baseline Detector for Nighttime Pothole Detection

The challenge of nighttime imagery stems from inadequate illumination, leading to the loss of high-frequency detail and making it difficult for traditional convolutional neural networks to effectively identify potholes. However, the shape and contour of potholes, which are low-frequency information, still persist in nighttime images and become key features for recognition. To overcome this challenge and effectively advance nighttime road pothole detection, we propose an enhanced method based on the YOLOv8 baseline detector, termed WT-YOLOv8. The core idea of WT-YOLOv8 is the integration of the WTConv module [16] into the backbone of YOLOv8. The WTConv module leverages wavelet transformation to achieve a larger receptive field with minimal parameters and effectively extract low-frequency information. This design significantly enhances the network’s responsiveness to shapes, thereby compensating for the lack of high-frequency information in nighttime images. Compared to other modules, the FFC module [34] utilizes the fast Fourier transform for feature fusion but has limited frequency mixing capability and cannot effectively extract low-frequency information. The RepLK module [35] can also achieve a large receptive field, but its parameter quantity grows quadratically with the size of the receptive field, which can easily lead to over-parameterization issues, and its ability to extract low-frequency information is not as strong as that of the WTConv module.
Therefore, by integrating the WTConv module into the YOLOv8 backbone, we provide a new approach for nighttime road pothole detection. To validate the effectiveness of the WTConv module, we conduct evaluation experiments in the task of nighttime pothole detection, as detailed in Section 5.

4.1. WT-YOLOv8

In Figure 6, we present the network architecture of WT-YOLOv8. The WT-YOLOv8 architecture introduces a wavelet transform convolution (WTConv) module [16] into the backbone of the state-of-the-art object detection model, YOLOv8. This approach aims to leverage the multi-frequency response and large receptive field provided by WTConv to enhance the model's ability to capture both local and global features in the image. The PANet [36] neck subnetwork receives these features and fuses them, using a top-down path to propagate deep semantic features to shallow levels and an additional bottom-up path to pass fine localization details back to the deeper levels. The prediction head uses the original YOLOv8 head, which consists of two concurrent task-specific heads: a classification head and a regression head. The total loss for training WT-YOLOv8 is defined as follows:
$L_{total} = L_{cls} + L_{CIoU} + L_{dfl},$

where $L_{cls}$ represents the classification loss and $L_{CIoU}$ and $L_{dfl}$ represent the regression losses. The definitions of these losses are given below.
The classification loss function used is the Binary Cross-Entropy (BCE) loss function. The calculation is as follows:
$L_{cls} = -\frac{1}{\Gamma}\sum_{i=1}^{\Gamma}\left[ y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i) \right],$

where $y_i$ represents the true label of the $i$-th sample (either 0 or 1), $\hat{y}_i$ represents the predicted value of the $i$-th sample (a real number between 0 and 1), and $\Gamma$ represents the number of samples.
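As a quick sanity check, the BCE loss above can be computed directly in NumPy (a minimal sketch, not the actual YOLOv8 implementation):

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-7):
    """Binary Cross-Entropy averaged over the Gamma samples, matching L_cls."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # guard against log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

loss = bce_loss(np.array([1.0, 0.0, 1.0]), np.array([0.9, 0.1, 0.8]))
```

Confident correct predictions drive the loss toward zero, while confident mistakes are penalized heavily by the log terms.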
$L_{CIoU} = 1 - \mathrm{IoU} + \frac{\sigma^2(m, m^{gt})}{c^2} + \lambda v,$

where IoU is the Intersection over Union, $m$ and $m^{gt}$ represent the center points of the predicted and ground-truth bounding boxes, $\sigma$ represents the Euclidean distance between these center points, $c$ denotes the diagonal length of the smallest enclosing box that covers both the predicted and ground-truth bounding boxes, $v$ measures the consistency of the aspect ratios between the two bounding boxes, and $\lambda$ is a weighting coefficient.
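For illustration, a plain-Python sketch of the CIoU loss; the aspect-ratio term $v = \frac{4}{\pi^2}(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h})^2$ and weight $\lambda = \frac{v}{(1-\mathrm{IoU})+v}$ follow the common CIoU formulation, which the text does not spell out:

```python
import math

def ciou_loss(b, g):
    """CIoU loss between predicted box b and ground-truth box g, both (x1, y1, x2, y2)."""
    # Intersection over Union
    ix1, iy1 = max(b[0], g[0]), max(b[1], g[1])
    ix2, iy2 = min(b[2], g[2]), min(b[3], g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    area_g = (g[2] - g[0]) * (g[3] - g[1])
    iou = inter / (area_b + area_g - inter)
    # squared Euclidean distance between the two centers (sigma^2)
    sigma2 = (((b[0] + b[2]) - (g[0] + g[2])) / 2) ** 2 \
           + (((b[1] + b[3]) - (g[1] + g[3])) / 2) ** 2
    # squared diagonal of the smallest enclosing box (c^2)
    c2 = (max(b[2], g[2]) - min(b[0], g[0])) ** 2 \
       + (max(b[3], g[3]) - min(b[1], g[1])) ** 2
    # aspect-ratio consistency term v and its weight lambda
    v = (4 / math.pi ** 2) * (math.atan((g[2] - g[0]) / (g[3] - g[1]))
                              - math.atan((b[2] - b[0]) / (b[3] - b[1]))) ** 2
    lam = v / ((1 - iou) + v + 1e-9)
    return 1 - iou + sigma2 / c2 + lam * v

loss = ciou_loss((0, 0, 4, 2), (1, 0, 5, 2))
```

For perfectly matching boxes all three penalty terms vanish and the loss is zero.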
$L_{dfl} = -\left( (y_{i+1} - y)\log D_i + (y - y_i)\log D_{i+1} \right),$

where $y$ represents the position of the true label and $y_i$ and $y_{i+1}$ represent the two points closest to the label $y$. $D_i$ and $D_{i+1}$ represent the ratios of the distances from these points to $y$, given by $D_i = \frac{y_{i+1} - y}{y_{i+1} - y_i}$ and $D_{i+1} = \frac{y - y_i}{y_{i+1} - y_i}$.
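Plugging the definitions of $D_i$ and $D_{i+1}$ directly into the formula gives this toy implementation (illustrative only; in the full DFL the logarithms are taken of the network's predicted distribution rather than of the target ratios themselves):

```python
import math

def dfl_loss(y, yi, yi1):
    """L_dfl with the target ratios D_i, D_{i+1} taken directly from the text."""
    d_i = (yi1 - y) / (yi1 - yi)    # weight of the left neighbor y_i
    d_i1 = (y - yi) / (yi1 - yi)    # weight of the right neighbor y_{i+1}
    return -((yi1 - y) * math.log(d_i) + (y - yi) * math.log(d_i1))

loss = dfl_loss(2.3, 2, 3)  # label 2.3 between the grid points 2 and 3
```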

4.2. Wavelet Transform Convolution (WTConv)

The WTConv module [16] utilizes the wavelet transform (WT) to enhance the receptive field of convolutional neural networks (CNNs) while avoiding issues related to excessive parameterization. In this study, we integrate WTConv into the backbone of our baseline detector to extract pothole information at night, aiming to further improve the performance of nighttime road pothole detection.
The Haar wavelet transform (WT) is selected as the basis for WTConv due to its computational efficiency. The Haar WT decomposes the image into low-frequency and high-frequency components in each spatial dimension through depthwise convolution and downsampling. The one-dimensional Haar WT can be represented by the convolution kernels $[1, 1]/\sqrt{2}$ and $[1, -1]/\sqrt{2}$, while the two-dimensional Haar WT extends these operations across both dimensions, resulting in four filters:

$f_{LL} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad f_{LH} = \frac{1}{2}\begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}, \quad f_{HL} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}, \quad f_{HH} = \frac{1}{2}\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}.$
These filters decompose the input image $X$ into a low-frequency component, $X_{LL}$, and high-frequency components, $X_{LH}$, $X_{HL}$, and $X_{HH}$.
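Applying the four filters with stride 2 yields the four half-resolution sub-bands; a NumPy sketch for a single-channel image with even dimensions (the helper name and filter sign convention are this sketch's, matching the standard Haar outer products):

```python
import numpy as np

# The four 2x2 Haar filters
f_LL = 0.5 * np.array([[1, 1], [1, 1]])
f_LH = 0.5 * np.array([[1, -1], [1, -1]])
f_HL = 0.5 * np.array([[1, 1], [-1, -1]])
f_HH = 0.5 * np.array([[1, -1], [-1, 1]])

def haar_decompose(x):
    """One-level 2D Haar WT: each filter applied with stride 2 over x (H, W even)."""
    h, w = x.shape
    # split x into non-overlapping 2x2 blocks: shape (h/2, w/2, 2, 2)
    blocks = x.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3)
    return {name: np.einsum('ijkl,kl->ij', blocks, f)
            for name, f in [('LL', f_LL), ('LH', f_LH), ('HL', f_HL), ('HH', f_HH)]}

sub = haar_decompose(np.arange(16.0).reshape(4, 4))
```

On a constant image the three high-frequency bands are exactly zero, which is the sense in which shape and contour survive in the low-frequency band of dark, detail-poor nighttime images.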
The WTConv module is designed to decompose the input signal into sub-bands of varying frequencies through wavelet transformation, which entails filtering and downsampling the low-frequency and high-frequency components of the input signal. Subsequently, small convolutional kernels (e.g., 3 × 3) are applied to these sub-bands, and the output is reconstructed using the inverse wavelet transform (IWT). Figure 7 illustrates the WTConv module for a two-level WT. This process can be encapsulated as follows:
$Y = \mathrm{IWT}\big(\mathrm{Conv}(w, \mathrm{WT}(X))\big),$
where X represents the input tensor, w denotes the convolutional kernel weights, and Conv, WT, and IWT signify the convolution operation, wavelet transformation, and its inverse, respectively.
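The WT → small conv → IWT pipeline can be sketched end to end for a single channel and a single level. This is an illustrative sketch under stated assumptions: `wtconv_sketch` and the identity `conv` placeholder are hypothetical names, not the authors' implementation, which applies learned depthwise kernels to each sub-band and cascades the decomposition over multiple levels.

```python
import numpy as np

def haar_dwt2(x):
    """Forward one-level 2D Haar transform (filters f_LL..f_HH, stride 2)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return ((a + b + c + d) / 2, (a - b + c - d) / 2,
            (a + b - c - d) / 2, (a - b - c + d) / 2)

def haar_idwt2(ll, lh, hl, hh):
    """Inverse transform: the Haar filters are orthonormal, so the original
    2x2 blocks are recovered by the transposed combination of sub-bands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2
    return x

def wtconv_sketch(x, conv=lambda s: s):
    """Y = IWT(Conv(w, WT(X))): decompose, filter each sub-band, recompose.
    `conv` stands in for the learned small-kernel convolution; the default
    identity makes the round trip exactly invertible."""
    return haar_idwt2(*(conv(s) for s in haar_dwt2(x)))
```

With the identity placeholder, the output reproduces the input exactly, confirming that any change WTConv makes to the image comes from the per-sub-band convolutions rather than the transform itself.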

5. Evaluation

5.1. Implementation Details

In the implementation, we utilized the AdamW optimizer with an initial learning rate of 0.001 and a batch size of 32. We trained the models on the NPD and the COCO dataset [33] for 300 epochs. Additionally, we applied a weight decay of 0.05 to regularize the models and prevent overfitting. All experiments were carried out on a personal computer equipped with an i9-10850K CPU, 16 GB of RAM, and an NVIDIA Titan X GPU.
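Under these settings, a training run could be configured roughly as follows. This is a hedged sketch assuming the Ultralytics YOLOv8 API; the weight file, the model scale, and the hypothetical `npd.yaml` dataset description are assumptions for illustration, not taken from the paper.

```python
from ultralytics import YOLO  # assumes the Ultralytics package is installed

model = YOLO("yolov8s.pt")   # model scale is an assumption
model.train(
    data="npd.yaml",         # hypothetical dataset config pointing at the NPD
    epochs=300,
    batch=32,
    optimizer="AdamW",
    lr0=0.001,               # initial learning rate
    weight_decay=0.05,
)
```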

5.2. Evaluation Metrics

In this work, we chose average precision (AP) and mean average precision (mAP) as the core evaluation metrics to comprehensively evaluate the performance of nighttime pothole detection models in terms of recognition accuracy. Precision measures the proportion of the model’s predicted results that were correctly identified as potholes, providing us with an intuitive performance indicator. To assess the overlap between the model’s predicted pothole bounding boxes and the true pothole bounding boxes, we utilized Intersection over Union (IoU), which has become a key tool for assessing detection accuracy and recall. For a more detailed introduction, please refer to [37].
Following the COCO evaluation protocol [33], this study used multiple IoU thresholds as reference standards for evaluation, including 0.5, 0.75, and a series of thresholds ranging from 0.5 to 0.95 with a step of 0.05. The AP values calculated under these thresholds are denoted AP@0.5, AP@0.75, and AP@[0.50:0.05:0.95], allowing the model's performance to be observed from different angles. Furthermore, by employing the COCO mAP metric, this study comprehensively assessed the nighttime pothole detection model's ability to accurately identify and locate potholes, with the "m" prefix denoting the mean of the average precision values.
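The IoU criterion underlying these thresholds can be illustrated in a few lines (a minimal sketch with boxes in (x1, y1, x2, y2) corner format):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlap area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)
```

A predicted pothole counts as a true positive at AP@0.75 only if its IoU with a ground-truth box reaches 0.75, which is why AP@0.75 is the stricter of the two fixed-threshold metrics reported here.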

5.3. Baseline Methods

To validate the practicality of the Nighttime Pothole Dataset (NPD), we evaluated the improved WT-YOLOv8 method alongside eight state-of-the-art object detection algorithms: YOLOv5, YOLOv7 [38], YOLOv8, YOLOv9 [39], YOLOX [40], DiffusionDet [41], RTMDet [42], and DINO [43]. This assessment reviews the current state of detection technology and establishes a solid baseline for future research on the NPD. The selected baselines reflect the wide application and recent technological progress in the field of object detection, serving as benchmarks for evaluating the efficacy of the NPD in this study. In the subsequent section on the experimental results, we present the test results in detail and provide a comparative analysis of the different methods, investigating their performance and potential in the task of nighttime pothole detection.

5.4. Evaluation Results

5.4.1. Overall Performance

We performed a comprehensive evaluation on the newly constructed Nighttime Pothole Dataset (NPD) to gain deep insight into the performance of the baseline detection models. Table 1 and Table 2 present the evaluation results for the precision metrics mAP, AP@0.5, and AP@0.75. Figure 8 presents a bar chart that visually compares the performance of the various detection models.
In the evaluation results, YOLOv5 achieved the best performance among the eight baseline detectors, with mAP, AP@0.5, and AP@0.75 scores of 0.568, 0.906, and 0.599, respectively. YOLOv8 and RTMDet closely followed, demonstrating their powerful capabilities in the task of nighttime pothole detection. In contrast, YOLOX performed relatively poorly, with mAP, AP@0.5, and AP@0.75 scores of 0.350, 0.708, and 0.301, respectively, trailing YOLOv5 by roughly 20 percentage points or more in each metric and highlighting its limitations in addressing the specific challenges of the NPD. On the COCO dataset, WT-YOLOv8 continued to demonstrate its robustness with an mAP of 0.450, outperforming YOLOv8's 0.442. This indicates that the integration of the WTConv module is beneficial not only on specialized datasets like the NPD but also in more generic object detection tasks.
The excellent performance of YOLOv5 and RTMDet on the NPD can be attributed to their innovative technologies and high adaptability to complex nighttime scenarios. YOLOv5 optimized its adaptability to nighttime conditions through special image enhancement techniques and fine-grained feature extraction strategies. RTMDet enhanced its recognition ability for potholes of different shapes and sizes by employing sophisticated multi-scale processing and deeply fusing contextual information. The relatively poor performance of YOLOX on the NPD can be attributed to its feature extractor’s limitations in capturing nuanced details under low-light conditions. Additionally, YOLOX’s deficiency in incorporating effective mechanisms to address the dynamic fluctuations in lighting during nighttime, coupled with its suboptimal utilization of distinctive image features during training, collectively contributed to its subpar performance on the NPD. DINO’s superior performance on the COCO dataset can be attributed to its end-to-end training framework, which incorporates advanced denoising strategies and contextual attention mechanisms, enabling it to effectively handle class imbalance and capture nuanced object features present in diverse and complex scenes.
By integrating the WTConv module into YOLOv8, we developed WT-YOLOv8. The experimental results indicate that WT-YOLOv8 outperformed the original YOLOv8 model in all three metrics. Specifically, the mAP of WT-YOLOv8 reached 0.585, representing a 2.3% improvement over YOLOv8's 0.562; in terms of AP@0.5, WT-YOLOv8's score of 0.918 showed a 1.5% increase compared to YOLOv8's 0.903; and for AP@0.75, WT-YOLOv8's 0.637 represents a 2.8% improvement over YOLOv8's 0.609. These enhancements demonstrate that WT-YOLOv8 is more accurate and robust in detecting potholes under nighttime conditions. The performance gain can be attributed to the multi-band response of the WTConv module, which enhances the model's ability to capture low-frequency information in nighttime images through the wavelet transformation. Low-frequency information often corresponds to key structural features in images, such as pothole shape and size, which are essential for detection under nighttime conditions.
When comparing the performance between the NPD and the COCO dataset, it is evident that WT-YOLOv8 adapts well to different scenarios. The consistent performance enhancement over the base YOLOv8 model across both datasets underscores the effectiveness of the WTConv module in enhancing feature extraction capabilities, particularly in challenging lighting conditions. The performance uplift on COCO, albeit slightly lower than that on the NPD, is still significant. This can be attributed to the inherent differences in dataset characteristics, where COCO’s broader range of object categories and scenarios presents a different set of challenges compared to the focused pothole detection task in the NPD.
The comprehensive evaluation of both the NPD and the COCO dataset solidifies the superiority of WT-YOLOv8. The model’s ability to excel in diverse conditions, from specific tasks like pothole detection to general object detection, validates its potential for real-world applications. The WTConv module’s contribution to enhancing low-light performance and capturing essential structural features is pivotal, offering a promising direction for future research and development in computer vision, especially under low-light conditions. These detailed performance evaluations and analyses not only provide valuable insights for research in the field of nighttime pothole detection but also pave the way for future algorithm improvements and optimizations in this area.
To further assess model performance, we analyzed the precision–recall (PR) curves, a key indicator of a model's behavior on imbalanced datasets. Figure 9 shows the PR curves for the nine models. WT-YOLOv8 achieved a score of 0.918 in the AP@0.5 metric, a 1.5% improvement over YOLOv8's 0.903. While this numerical enhancement may seem minor, in applications requiring high precision, such as nighttime pothole detection, it means that more potholes are accurately identified, which can significantly enhance road safety. Additionally, when comparing WT-YOLOv8's AP@0.5 with YOLOv5's 0.906, WT-YOLOv8 showed a similar improvement of 1.2%. This result demonstrates that incorporating the WTConv module not only enhanced the base YOLOv8 model but also surpassed other state-of-the-art object detection models in overall performance. Furthermore, WT-YOLOv8 showed a similar trend in the AP@0.75 metric, scoring 0.637 compared to YOLOv8's 0.609, a relative enhancement of 4.6%. This gain is particularly important under stricter Intersection over Union (IoU) thresholds, as it reflects the model's performance under more precise matching conditions.
In summary, the analysis of the precision–recall curve results indicates that the WT-YOLOv8 model performs exceptionally well in pothole detection for autonomous vehicles, ranking among the top in terms of both precision and recall. This validates its potential as a high-efficiency and reliable solution. These results strongly support the potential of WT-YOLOv8 for enhancing autonomous vehicle safety.

5.4.2. Qualitative Evaluation

Figure 10 and Figure 11 present the qualitative detection results of the nine baseline detectors for nighttime road pothole detection, with each method represented by different-colored bounding boxes. Figure 10 displays examples where the detectors’ performance was satisfactory. Despite the relatively dark background in these images, they feature clear textures, simple backgrounds, and standard-sized potholes, making detection easier. The detectors were able to accurately identify and localize potholes under these conditions, demonstrating high detection accuracy and reliability. In contrast, Figure 11 displays cases where the detectors deviated in their identification process. These images mainly depict scenes with darker, more complex lighting, where potholes are obscured by shadows, low contrast, or blurry boundaries, and are further complicated by cluttered backgrounds. This complexity posed a challenge for the detectors to accurately predict, leading to deviations in the identification results.
These evaluation results indicate that the baseline detectors are prone to errors when faced with challenging night scenes, especially when dealing with potholes under shadowed and low-contrast conditions. These results offer a clearer understanding of how the detectors perform in various nighttime scenarios and help pinpoint areas for improvement.
Further analysis revealed that the performance of the detectors in nighttime pothole detection was affected by factors such as image clarity, background complexity, and pothole illumination. In high-quality images, the detectors fully utilized clear features for accurate localization and identification. In complex scenes, the detectors required stronger feature extraction and context information processing capabilities to overcome background interference and the challenges posed by lighting changes. Especially under nighttime conditions, the presence of dim lighting and shadows posed higher demands on the detectors. The detectors need to adapt to varying lighting conditions and maintain effective detection in low-light environments. This requires incorporating image enhancement and data augmentation techniques tailored for nighttime scenes during model training to improve the robustness and adaptability of the models. In conclusion, these qualitative evaluation results not only provide insights into the performance of current baseline detectors in the task of nighttime road pothole detection but also offer valuable references for future algorithm improvements and optimizations in this field.

5.4.3. Ablation Study

In order to study the influence of the WTConv module on nighttime road pothole detection in the WT-YOLOv8 backbone, we evaluated WT-YOLOv8 with different levels and wavelet bases of the WTConv module on the NPD. We trained WT-YOLOv8 for 300 epochs on the NPD, using a 5 × 5 kernel size with all other configurations left unchanged. First, we compared the effect of different wavelet bases on model performance. Second, we experimented with different WT levels to see how the depth of the WTConv decomposition impacted the final result. Table 3 and Table 4 show the results of all described configurations.
When comparing the effect of different wavelet bases on model performance, we specifically focused on the Haar, db1, and db2 wavelet bases, as they are widely utilized in multi-resolution analysis and have demonstrated favorable characteristics in image processing tasks. Table 3 presents a performance comparison of each wavelet basis at one level. In terms of the mAP metric, the performance of the db1 wavelet basis (0.585) was slightly higher than that of Haar (0.581) and db2 (0.580). This indicates that db1 offers better performance when considering different IoU thresholds comprehensively. At an IoU threshold of 0.5, the performance of all wavelet bases was relatively high, with Haar being slightly superior, although Haar and db1 performed very similarly. At a more stringent IoU threshold (0.75), the performance of db1 (0.637) also surpassed that of Haar (0.630) and db2 (0.636), suggesting an advantage of db1 in precise matching. Based on the aforementioned analysis, we selected db1 as the wavelet basis for the WT-YOLOv8 model in our subsequent experiments.
To evaluate the impact of different levels on model performance, the db1 wavelet basis was utilized in the experiments due to its superior performance demonstrated in previous studies. Table 4 presents the performance of the model under various level settings. In the one-level case, the db1 wavelet basis achieved an mAP of 0.585, which was the best performance. When increased to three levels, the mAP significantly dropped to 0.562, which can be attributed to the unnecessary complexity introduced by the additional levels, leading to a decline in performance. Based on this analysis, we selected one level as the optimal setting for the WT-YOLOv8 model. This choice was based on maintaining a high mAP while keeping the model complexity and computational cost relatively low.

6. Conclusions

In this work, we introduce the Nighttime Pothole Dataset (NPD), which marks a significant advancement in the field by providing a robust collection of 3831 annotated images captured under diverse low-light conditions. This innovative dataset addresses a critical gap in high-quality annotated data, which has constrained the development and evaluation of nighttime pothole detection algorithms. Additionally, we develop a baseline detector, termed WT-YOLOv8, specifically for the proposed dataset. We conduct an extensive evaluation of the improved WT-YOLOv8 method and eight state-of-the-art object detection techniques on the NPD and the COCO dataset. Our performance analysis provides valuable insights into their effectiveness in nocturnal environments. The results reveal substantial variations in accuracy and robustness among these methods, highlighting the inherent challenges of detecting road potholes under low-light conditions. These discrepancies underscore the need for ongoing refinement and adaptation of detection algorithms to meet the specific demands of nighttime scenarios. By providing a comprehensive and realistic benchmark, the NPD is poised to drive further research and innovation in this essential area. Researchers and practitioners can utilize this dataset to enhance existing algorithms and explore new methodologies for more effective nighttime pothole detection. This, in turn, will help achieve more flexible, reliable, and accurate road maintenance solutions, significantly reducing vehicle damage and preventing accidents caused by undetected potholes.

Author Contributions

Conceptualization, S.L.; Data curation, Q.S., X.Z., W.C., W.W., Z.Y. and H.Z.; Investigation, M.L., Q.S., X.Z., W.C., W.W., Z.Y. and H.Z.; Project administration, K.X. and S.L.; Resources, C.L. and Y.Z.; Software, M.L., Z.Y. and H.Z.; Supervision, K.X., C.L. and Y.Z.; Writing—original draft, M.L., Z.Y. and H.Z.; Writing—review and editing, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset that supports the findings of this study is available at https://github.com/hhaozhang/NPD (accessed on 8 September 2024).

Conflicts of Interest

Authors Min Ling, Quanjun Shi, Xin Zhao, Wenzheng Chen, Wei Wei, and Kai Xiao were employed by the company Guangxi Baining Expressway Co., Ltd. Authors Chenchen Lu and Yufan Zeng were employed by the company Guangxi Jet Toll Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Asad, M.H.; Khaliq, S.; Yousaf, M.H.; Ullah, M.O.; Ahmad, A. Pothole Detection Using Deep Learning: A Real-Time and AI-on-the-Edge Perspective. Adv. Civil Eng. 2022, 1, 9221211. [Google Scholar] [CrossRef]
  2. Ma, N.; Fan, J.; Wang, W.; Wu, J.; Jiang, Y.; Xie, L.; Fan, R. Computer Vision for Road Imaging and Pothole Detection: A State-of-the-Art Review of Systems and Algorithms. arXiv 2022, arXiv:2204.13590. [Google Scholar] [CrossRef]
  3. Bhavan Kumar, S.B.; Guhan, S.; Manyam Kishore; Santhosh, R.; Alfred Daniel, J. Deep Learning Approach for Pothole Detection—A Systematic Review. In Proceedings of the 2023 Second International Conference on Electronics and Renewable Systems (ICEARS), Tuticorin, India, 2–4 March 2023; pp. 1410–1414. [Google Scholar]
  4. Kim, Y.-M.; Kim, Y.-G.; Son, S.; Lim, S.Y.; Choi, B.-Y.; Choi, D.-H. Review of Recent Automated Pothole-Detection Methods. Appl. Sci. 2022, 12, 5320. [Google Scholar] [CrossRef]
  5. Pandey, A.K.; Iqbal, R.; Maniak, T.; Karyotis, C.; Akuma, S.; Palade, V. Convolutional neural networks for pothole detection of critical road infrastructure. Comput. Electr. Eng. 2022, 99, 107725. [Google Scholar] [CrossRef]
  6. Bučko, B.; Lieskovská, E.; Zábovská, K.; Zábovský, M. Computer Vision Based Pothole Detection under Challenging Conditions. Sensors 2022, 22, 8878. [Google Scholar] [CrossRef]
  7. Lincy, A.; Dhanarajan, D.; Kumar, S.; Gobinath, G. Road Pothole Detection System. In Proceedings of the ITM Web of Conferences, Chapel Hill, NC, USA, 22–26 May 2023. [Google Scholar]
  8. Zhang, F.; Hamdulla, A. Research on Pothole Detection Method for Intelligent Driving Vehicle. In Proceedings of the 2022 3rd International Conference on Pattern Recognition and Machine Learning (PRML), Chengdu, China, 15–17 July 2022; pp. 124–130. [Google Scholar]
  9. Chen, X.; Zhang, J.; Chen, J. Road Pothole Detection Based on AlexNet for Autonomous Driving. In Proceedings of the 2023 International Conference on Computers, Information Processing and Advanced Education (CIPAE), Ottawa, ON, Canada, 26–28 August 2023; pp. 414–418. [Google Scholar]
  10. Bharat, R.; Ikotun, A.M.; Ezugwu, A.E.; Abualigah, L.M.; Shehab, M.; Abu Zitar, R. A real-time automatic pothole detection system using convolution neural networks. Appl. Comput. Eng. 2023, 6, 879–886. [Google Scholar] [CrossRef]
  11. Lieskovská, E.; Jakubec, M.; Bučko, B.; Zábovská, K. Automatic pothole detection. Transp. Res. Procedia 2023, 74, 1164–1170. [Google Scholar] [CrossRef]
  12. Wang, H.-F.; Zhai, L.; Huang, H.; Guan, L.-M.; Mu, K.-N.; Wang, G.-P. Measurement for cracks at the bottom of bridges based on tethered creeping unmanned aerial vehicle. Autom. Constr. 2020, 119, 103330. [Google Scholar] [CrossRef]
  13. Chen, H.; Yao, M.; Gu, Q. Pothole Detection Using Location-Aware Convolutional Neural Networks. Int. J. Mach. Learn. Cybern. 2020, 11, 899–911. [Google Scholar] [CrossRef]
  14. Thompson, E.M.; Ranieri, A.; Biasotti, S.; Chicchon, M.; Sipiran, I.; Pham, M.-K.; Nguyen-Ho, T.-L.; Nguyen, H.-D.; Tran, M.-T. SHREC 2022: Pothole and crack detection in the road pavement using images and RGB-D data. Comput. Graphics 2022, 107, 161–171. [Google Scholar] [CrossRef]
  15. Arya, D.; Maeda, H.; Ghosh, S.K.; Toshniwal, D.; Sekimoto, Y. RDD2020: An annotated image dataset for automatic road damage detection using deep learning. Data Brief 2021, 36, 107133. [Google Scholar] [CrossRef] [PubMed]
  16. Finder, S.E.; Amoyal, R.; Treister, E.; Freifeld, O. Wavelet Convolutions for Large Receptive Fields. arXiv 2024, arXiv:2407.05848. [Google Scholar]
  17. Mei, Q.; Gül, M.; Azim, M.R. Densely connected deep neural network considering connectivity of pixels for automatic crack detection. Autom. Constr. 2020, 110, 103018. [Google Scholar] [CrossRef]
  18. Peralta-López, J.-E.; Morales-Viscaya, J.-A.; Lázaro-Mata, D.; Villaseñor-Aguilar, M.-J.; Prado-Olivarez, J.; Pérez-Pinal, F.-J.; Padilla-Medina, J.-A.; Martínez-Nolasco, J.-J.; Barranco-Gutiérrez, A.-I. Speed bump and pothole detection using deep neural network with images captured through ZED camera. Appl. Sci. 2023, 13, 8349. [Google Scholar] [CrossRef]
  19. Huang, Y.-T.; Jahanshahi, M.R.; Shen, F.; Mondal, T.G. Deep learning–based autonomous road condition assessment leveraging inexpensive RGB and depth sensors and heterogeneous data fusion: Pothole detection and quantification. J. Transp. Eng. Part B Pavements 2023, 149, 04023010. [Google Scholar] [CrossRef]
  20. Babbar, S.; Bedi, J. Real-time traffic, accident, and potholes detection by deep learning techniques: A modern approach for traffic management. Neural Comput. Appl. 2023, 35, 19465–19479. [Google Scholar] [CrossRef]
  21. Kim, T.; Ryu, S.-K. Review and analysis of pothole detection methods. J. Emerg. Trends Comput. Inf. Sci. 2014, 5, 603–608. [Google Scholar]
  22. Koch, C.; Brilakis, I. Pothole detection in asphalt pavement images. Adv. Eng. Inform. 2011, 25, 507–515. [Google Scholar] [CrossRef]
  23. Buza, E.; Omanovic, S.; Huseinovic, A. Pothole detection with image processing and spectral clustering. In Proceedings of the 2nd International Conference on Information Technology and Computer Networks, Qinghai, China, 16–18 June 2013; p. 4853. [Google Scholar]
  24. Ryu, S.-K.; Kim, T.; Kim, Y.-R. Image-Based Pothole Detection System for ITS Service and Road Management System. Math. Probl. Eng. 2015, 1, 968361. [Google Scholar] [CrossRef]
  25. Saraswat, D.; Bhattacharya, P.; Verma, A.; Prasad, V.K.; Tanwar, S.; Sharma, G.; Bokoro, P.N.; Sharma, R. Explainable AI for healthcare 5.0: Opportunities and challenges. IEEE Access 2022, 10, 84486–84517. [Google Scholar] [CrossRef]
  26. Tanwar, S.; Ramani, T.; Tyagi, S. Dimensionality reduction using PCA and SVD in big data: A comparative case study. In Future Internet Technologies and Trends: First International Conference, ICFITT 2017, Surat, India, 31 August–2 September 2017; Proceedings 1; Springer: Cham, Switzerland, 2018; pp. 116–125. [Google Scholar]
  27. Bhatia, Y.; Rai, R.; Gupta, V.; Aggarwal, N.; Akula, A. Convolutional neural networks based potholes detection using thermal imaging. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 578–588. [Google Scholar]
  28. Chellaswamy, C.; Saravanan, M.; Kanchana, E.; Shalini, J. Deep learning based pothole detection and reporting system. In Proceedings of the 7th International Conference on Smart Structures and Systems (ICSSS), Chennai, India, 21–24 July 2020; pp. 1–6. [Google Scholar]
  29. Saisree, C.; Kumaran, U. Pothole detection using deep learning classification method. Procedia Comput. Sci. 2023, 218, 2143–2152. [Google Scholar] [CrossRef]
  30. Khan, M.; Raza, M.A.; Abbas, G.; Othmen, S.; Yousef, A.; Jumani, T.A. Pothole detection for autonomous vehicles using deep learning: A robust and efficient solution. Front. Built Environ. 2024, 9, 1323792. [Google Scholar] [CrossRef]
  31. Dhiman, A.; Klette, R. Pothole detection using computer vision and learning. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3536–3550. [Google Scholar] [CrossRef]
  32. Agrawal, R.; Chhadva, Y.; Addagarla, S.; Chaudhari, S. Road surface classification and subsequent pothole detection using deep learning. In Proceedings of the 2021 2nd International Conference for Emerging Technology (INCET), Belagavi, India, 21–23 May 2021; pp. 1–6. [Google Scholar]
  33. Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part V. pp. 740–755. [Google Scholar]
  34. Chi, L.; Jiang, B.; Mu, Y. Fast fourier convolution. Adv. Neural Inf. Process. Syst. 2020, 33, 4479–4488. [Google Scholar]
  35. Ding, X.; Zhang, X.; Han, J.; Ding, G. Scaling up your kernels to 31x31: Revisiting large kernel design in cnns. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2022), New Orleans, LA, USA, 18–24 June 2022; pp. 11963–11975. [Google Scholar]
  36. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Silver Spring, MD, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
  37. Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes (VOC) Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
  38. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  39. Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. Yolov9: Learning what you want to learn using programmable gradient information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
  40. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. Yolox: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
  41. Chen, S.; Sun, P.; Song, Y.; Luo, P. DiffusionDet: Diffusion model for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Vancouver, BC, Canada, 17–24 June 2023; pp. 19830–19843. [Google Scholar]
  42. Lyu, C.; Zhang, W.; Huang, H.; Zhou, Y.; Wang, Y.; Liu, Y.; Zhang, S.; Chen, K. RTMDET: An empirical study of designing real-time object detectors. arXiv 2022, arXiv:2212.07784. [Google Scholar]
  43. Zhang, H.; Li, F.; Liu, S.; Zhang, L.; Su, H.; Zhu, J.; Ni, L.M.; Shum, H.-Y. DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv 2022, arXiv:2203.03605. [Google Scholar]
Figure 1. The first row shows road pothole images under normal illumination, while the second row displays nighttime road pothole images.
Figure 2. Sample images from our NPD, showing potholes at night across various scenes. These images include potholes of different shapes, sizes, and depths.
Figure 3. Sample images with bounding boxes in the NPD. The red rectangles represent the bounding boxes in the NPD.
Figure 4. The proportion of potholes at different depths in the NPD.
Figure 5. The image clarity of pothole images at different depths in the NPD.
Figure 6. The network architecture of our baseline detector, WT-YOLOv8, which inherits its structure from YOLOv8. The integration of the wavelet transform convolution (WTConv) module into the backbone of the original YOLOv8 represents a significant difference.
Figure 7. An example of the WTConv operation on a single channel using a 2-level wavelet decomposition.
Figure 8. Bar chart showing the visual evaluation results of the nine baseline detectors on the NPD.
Figure 9. Precision–recall curves, showing the tradeoff between precision and recall of nine object detection methods under different thresholds.
Figure 10. Visual comparison of satisfactory detection results, with the bounding boxes of different methods represented by different colors, where red represents the bounding boxes in the NPD.
Figure 11. Visual comparison of the detection results with deviations. The bounding boxes of different methods are represented by different colors, where red represents the bounding boxes in the NPD.
Table 1. Comparison of the nine detectors on the NPD using mAP, AP@0.5, and AP@0.75.

Method            | Publication | mAP   | AP@0.5 | AP@0.75
WT-YOLOv8         | Ours        | 0.585 | 0.918  | 0.637
YOLOv5            | —           | 0.568 | 0.906  | 0.599
YOLOv7 [38]       | CVPR, 2023  | 0.534 | 0.897  | 0.559
YOLOv8            | —           | 0.562 | 0.903  | 0.609
YOLOv9 [39]       | arXiv, 2024 | 0.541 | 0.894  | 0.564
YOLOX [40]        | arXiv, 2021 | 0.350 | 0.708  | 0.301
DiffusionDet [41] | ICCV, 2023  | 0.496 | 0.855  | 0.528
RTMDet [42]       | arXiv, 2022 | 0.552 | 0.903  | 0.586
DINO [43]         | ICLR, 2023  | 0.505 | 0.873  | 0.533
Table 2. Comparison of the nine detectors on the COCO dataset [33] using mAP, AP@0.5, and AP@0.75.

Method            | Publication | mAP   | AP@0.5 | AP@0.75
WT-YOLOv8         | Ours        | 0.450 | 0.618  | 0.498
YOLOv5            | —           | 0.377 | 0.571  | 0.410
YOLOv7 [38]       | CVPR, 2023  | 0.375 | 0.558  | 0.402
YOLOv8            | —           | 0.442 | 0.611  | 0.479
YOLOv9 [39]       | arXiv, 2024 | 0.384 | 0.582  | 0.429
YOLOX [40]        | arXiv, 2021 | 0.327 | 0.503  | 0.348
DiffusionDet [41] | ICCV, 2023  | 0.351 | 0.522  | 0.389
RTMDet [42]       | arXiv, 2022 | 0.409 | 0.573  | 0.444
DINO [43]         | ICLR, 2023  | 0.490 | 0.664  | 0.533
Table 3. Ablation study with WT-YOLOv8: comparison of different wavelet bases in WTConv (all at 1-level decomposition).

| Level | Wavelet | mAP | mAP@0.5 | mAP@0.75 |
|---|---|---|---|---|
| 1-level | Haar | 0.581 | 0.920 | 0.630 |
| 1-level | db1 | 0.585 | 0.918 | 0.637 |
| 1-level | db2 | 0.580 | 0.917 | 0.636 |
Table 4. Ablation study with WT-YOLOv8: with db1 fixed as the wavelet basis, comparison of different decomposition levels in WTConv.

| Level | Wavelet | mAP | mAP@0.5 | mAP@0.75 |
|---|---|---|---|---|
| 1-level | db1 | 0.585 | 0.918 | 0.637 |
| 2-level | db1 | 0.583 | 0.917 | 0.635 |
| 3-level | db1 | 0.576 | 0.913 | 0.631 |
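The ablations above vary the wavelet basis and the decomposition depth used inside WTConv. As a minimal illustration (not the WTConv implementation itself, which applies learned convolutions to the resulting sub-bands), the following pure-Python sketch performs a one-level 2D Haar decomposition; `haar_dwt2_level1` is a hypothetical helper name. Note that db1 is the Daubechies-family name for the Haar wavelet, so the first two rows of Table 3 use mathematically equivalent bases and differ only by training variance.

```python
def haar_dwt2_level1(img):
    """One-level 2D Haar (db1) decomposition of a 2D list with even
    height and width. Returns the four half-resolution sub-bands:
    LL (approximation), LH, HL, HH (horizontal/vertical/diagonal detail)."""
    h, w = len(img), len(img[0])
    s = 0.5  # orthonormal Haar normalization for one 2D level
    LL, LH, HL, HH = [], [], [], []
    for i in range(0, h, 2):
        ll_row, lh_row, hl_row, hh_row = [], [], [], []
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll_row.append(s * (a + b + c + d))
            lh_row.append(s * (a - b + c - d))  # horizontal detail
            hl_row.append(s * (a + b - c - d))  # vertical detail
            hh_row.append(s * (a - b - c + d))  # diagonal detail
        LL.append(ll_row); LH.append(lh_row); HL.append(hl_row); HH.append(hh_row)
    return LL, LH, HL, HH

# A constant patch carries all its energy in LL; the detail bands vanish.
LL, LH, HL, HH = haar_dwt2_level1([[1, 1], [1, 1]])
print(LL, LH, HL, HH)  # [[2.0]] [[0.0]] [[0.0]] [[0.0]]
```

Each additional level re-decomposes the LL band, halving its resolution again; the shrinking returns in Table 4 suggest that, for pothole-scale structures, one level of frequency separation already captures most of the benefit.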
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Ling, M.; Shi, Q.; Zhao, X.; Chen, W.; Wei, W.; Xiao, K.; Yang, Z.; Zhang, H.; Li, S.; Lu, C.; et al. Nighttime Pothole Detection: A Benchmark. Electronics 2024, 13, 3790. https://doi.org/10.3390/electronics13193790