1. Introduction
Dams, serving as water conservation and hydropower infrastructure, play an important role in flood control, water storage, irrigation, power generation, and ecological environment protection [1,2,3]. Currently, many dams experience structural aging and corrosion during long-term operation, which increases the risk of damage [4,5,6]. Regularly inspecting a dam, comprehensively evaluating and monitoring its safety status, and promptly identifying and addressing potential problems are important for the smooth operation of a dam structure.
Damage such as cracks, spalling, erosion, cavities, and wear may occur during the long-term use of dams [7]. These defects significantly affect the safe and stable operation of dams. Among these hazards, cracks have received widespread attention owing to their prevalence, danger, and ability to trigger other disasters [8]. In recent years, deep learning (DL) and computer vision have made significant progress, enabling breakthroughs in artificial intelligence and the development of automatic crack detection methods. Devices equipped with high-resolution visual sensors, such as industrial cameras and unmanned aerial vehicles (UAVs), are used to collect numerous images of dam surfaces. The collected images are processed, a deep learning model is trained to learn the damage features in the images, and a crack detection method based on deep learning is then employed [9,10,11,12,13,14,15,16,17,18,19]. However, because of the complex underwater environment, dam-surface crack detection technology cannot be directly applied to underwater crack detection. Underwater images suffer from color distortion, low contrast, a low signal-to-noise ratio, and a lack of detail; hence, even when deep learning is applied, it is difficult to obtain and collect datasets of underwater crack regions. An innovative underwater crack detection system is urgently needed to ensure the safe operation of dams.
Currently, research on underwater crack detection is limited. Huang et al. [8] proposed a method to generate underwater dam crack images using the CycleGAN model, a type of adversarial learning network, to convert above-water dam crack images through image-to-image translation. They then used the converted underwater dam crack images for dataset annotation and deep learning network training to address the problem of insufficient datasets. Cao et al. [20] proposed a large-scale underwater crack detection method based on image stitching and segmentation that can adapt to complex underwater environments and performs well in different research fields. Li et al. [21] proposed a comprehensive pixel-level instance segmentation and quantification framework for multiple defects in underwater structures, based on machine vision and deep learning, for hydraulic tunnels. They also developed a video dataset of multiple defects in underwater tunnel lining structures for training and detection. Li et al. [22] proposed a real-time pixel-level automatic segmentation and quantification framework for underwater cracks in dams, using a lightweight semantic segmentation network (i.e., LinkNet) and two-stage hybrid transfer learning (TL) for the detection and segmentation of underwater cracks.
Most previous studies trained deep-learning segmentation networks directly on underwater crack datasets. However, obtaining underwater crack datasets is difficult and time-consuming. Converting underwater crack images into above-water crack images with the same crack characteristics is an effective alternative: the converted images can then be detected using a more easily accessible above-water crack dataset, which improves work efficiency. However, new image-processing technology is required to convert underwater images into above-water images.
There are three main differences between the underwater and above-water environments [23]:
- (1)
The optical imaging principle of underwater images is different from that of above-water images.
Figure 1a shows that, compared with above-water images, the image received by an underwater imaging system is affected by direct attenuation, forward scattering, and backscattering components [24].
- (2)
Underwater images are likely to be affected by bubbles, sand grains, and other impurities, resulting in more noise than that observed in above-water images [25].
- (3)
The attenuation characteristics of light in water are significantly different from those in air.
Figure 1b shows that, in an underwater environment, light of different wavelengths has different attenuation rates when propagating underwater [26].
Figure 1.
Underwater imaging principle: (a) underwater light loss diagram; (b) underwater attenuation rate graph of different light rays.
To convert underwater crack images into above-water images, targeted image processing is required to address the three aforementioned differences. Underwater images require image enhancement, color balance, and noise removal [27]. There have been several breakthroughs in the field of underwater image processing [28,29,30,31]. Xin et al. [28] proposed a method based on the affine shadow transform and adaptive histogram equalization to achieve uniform illumination in a processed image. Ma et al. [29] proposed a method based on an affine shadow transform that reduces the color deviation of the processed image and improves its clarity. Zhang et al. [30] proposed a method based on biological principles that adaptively uniformizes the global brightness of an image according to its overall brightness distribution, without manual intervention. Wang et al. [31] proposed a dual information modulation network that effectively supplements the network's structural and global perception capabilities by exploiting the differences and complementarities between features at different scales, thereby further improving the clarity and contrast of the image.
Previous research has played a significant role in the processing and optimization of underwater images. However, its effect on the crack-feature regions of underwater images is limited. Perhaps because these studies targeted different application fields, most focused only on optimizing the overall quality of underwater images and neglected the optimization and denoising of specific regions (crack regions). Therefore, a more suitable underwater image-processing method must be developed to optimize underwater images with crack features. This study proposes a new underwater image-processing technology combining a novel white balance algorithm and a bilateral filtering method that improves the quality of processed underwater crack images compared with previous image-processing methods.
After obtaining above-water-style crack images that retain the underwater crack features, a crack detection network is used to detect the crack regions. With the popularity of object detection networks such as YOLO (You Only Look Once) and U-Net, digital image detection has become a lower-cost, higher-efficiency detection method and is widely used for underwater crack detection in concrete dams. Currently, there are many studies on the improvement and innovation of crack detection networks. Li et al. [32] proposed the IDP-YOLOv9 network based on the newly launched YOLOv9, which uses a parallel architecture comprising an image denoising and enhancement module and an improved YOLOv9 object detection module that can significantly improve image quality. Wu et al. [33] proposed an integrated approach based on the convolutional neural network DeepLabv3+ for crack detection, together with a pixel-level crack quantification algorithm, and achieved high accuracy in crack detection. Abbasi et al. [34] proposed a fog-aware adaptive YOLO algorithm for object detection in foggy weather environments, and this algorithm achieved a reasonable improvement in average accuracy. Luo et al. [35] used a deep convolutional neural network for drone detection. During feature extraction in the convolutional layers of deep neural networks, features are often extracted insufficiently and imprecisely, which can decrease model accuracy. The OREPA convolutional structure addresses this problem: by integrating residual connections, edge information, and pyramid attention mechanisms, the OREPA convolutional layer enhances the model's ability to extract complex features, thereby resolving the shortcomings of traditional convolutional layers in feature extraction and significantly improving the model's recognition accuracy. A detailed introduction to the OREPA convolutional layer is provided in Section 3.
Previous studies based on deep-learning networks have made significant progress in object detection [36]. A new deep-learning network, YOLOv9 [37], was recently proposed and used for object detection and segmentation. There has been relatively little research on improving the YOLOv9 network; therefore, further research on it is required.
Building on previous research, this study proposes a novel image-processing technology to convert underwater crack images to water surface crack images and then replace the original convolutional blocks in YOLOv9 with OREPA convolutional blocks to improve the ability of the YOLOv9 network to accurately detect underwater cracks. The contributions of this study are as follows:
- (1)
This study proposes a novel white balance algorithm that improves the image quality of processed underwater crack images, compared with previous white balance algorithms.
- (2)
This study uses a bilateral filtering method to further denoise underwater crack images, reducing the impact of noise while preserving underwater crack features, and to convert underwater crack images into above-water crack images. The denoising effect of bilateral filtering is superior to that of other denoising methods and better restores the crack features of underwater images.
- (3)
The original convolutional blocks of YOLOv9 were replaced with OREPA convolutional blocks. Compared with the original YOLOv9 models, the proposed model achieves better crack detection results in terms of the loss rate and other indicators, confirming the suitability of the method proposed in this study.
The remainder of this paper is organized as follows.
Section 2 introduces the proposed underwater image-processing technology and various evaluation indicators for underwater images.
Section 3 introduces the combination of the improved YOLOv9 model and the OREPA convolutional blocks.
Section 4 describes the relevant experimental plans and results.
Section 5 provides the conclusions.
Section 6 discusses the limitations of this method and future work.
2. A Novel Underwater Image-Processing Technology
This section introduces a novel white balance algorithm and the principle of bilateral filtering and then introduces the meaning and expressions of various indicators used to evaluate image quality.
2.1. A Novel White Balance Algorithm
White balance, also known as color constancy, is the ability to maintain color consistency in a scene under different lighting conditions. In an underwater environment, owing to varying light source conditions, the light reflected from the same object may differ, leading the camera to conclude that the same object has different colors under different lighting conditions. To address this issue, a white balance adjustment function is applied to the image to increase or decrease the color values of the red, green, and blue channels at each pixel, according to the perceived light, color, and temperature information. In this manner, white objects in the image appear truly white under different lighting conditions, and the other colors maintain relative accuracy.
Four types of underwater image white balance algorithms are commonly used [38]: the average white balance method, the perfect reflection method, the image analysis-based chromatic aberration detection and correction method, and the gray world white balance method. We introduce a new white balance method based on a dynamic threshold algorithm [39] to achieve image color uniformity.
This method introduces a new color space, the YCrCb color space, where Y represents luminance and Cr and Cb represent chrominance (hue and saturation). The Y component is constructed from the RGB input signals by combining specific portions of the R, G, and B signals. Chrominance is defined by two aspects of color, hue and saturation, represented by Cr and Cb: Cr reflects the difference between the red component of the RGB input signal and the luminance of the RGB signal, whereas Cb reflects the difference between the blue component of the RGB input signal and the luminance of the RGB signal.
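The Y/Cr/Cb construction described above can be sketched as follows. The paper does not list its exact conversion constants, so the widely used ITU-R BT.601 coefficients are assumed here.

```python
# Sketch of the RGB -> YCrCb construction described above. The paper does not
# list its exact conversion constants, so the widely used ITU-R BT.601
# coefficients are assumed here.

def rgb_to_ycrcb(r, g, b):
    """Convert one RGB pixel (0-255 channels) to (Y, Cr, Cb)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # luminance: weighted mix of R, G, B
    cr = 0.713 * (r - y)                   # red component minus luminance
    cb = 0.564 * (b - y)                   # blue component minus luminance
    return y, cr, cb

# A pure gray pixel carries no chrominance; a saturated red pixel has Cr > 0.
print(rgb_to_ycrcb(128, 128, 128))
print(rgb_to_ycrcb(255, 0, 0))
```

Near-white pixels, which the dynamic-threshold method searches for, are those with high Y and both Cr and Cb close to zero.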
This method determines reference points by converting the RGB color space into the YCrCb color space for analysis. By analyzing the YCrCb coordinate space of the image, a region that is close to white and contains candidate reference points can be identified, and a threshold is set to designate certain points as reference points. Next, the input image is divided into 16 regions, each region is transformed from the RGB color space to the YCrCb space, and a threshold is set to determine the reference-point positions in each region. In Figure 2, the thirteenth region is used as an example.
The specific algorithm process is as follows.
- (1)
Transform the image from the RGB space into the YCrCb space.
- (2)
Calculate the mean values Mr and Mb of Cr and Cb for each region, where N is the number of pixels in each region and (i, j) denotes the pixel coordinates within each region.
- (3)
Find a reference point that satisfies the conditions in each region. The judgment criteria are shown in Equation (6).
- (4)
After determining the reference point position, calculate the maximum brightness Ymax of the reference point.
- (5)
Calculate the gain coefficients (Rgain, Ggain, and Bgain) of each RGB channel. In Equation (7), Ravgw, Gavgw, and Bavgw are the mean values of the reference points for the three channels in each region.
- (6)
Calculate the RGB channel values of the new image using the gain coefficients. In Equation (8), R′, G′, and B′ are the RGB channel values of the new image calculated from the gain coefficients in each region.
- (7)
Combine the newly obtained RGB channel values (R′, G′, and B′) to generate new underwater crack images.
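The gain-computation and gain-application steps above can be sketched as follows. Because Equations (6)–(8) are not reproduced in this text, the standard dynamic-threshold form is assumed: each channel gain is the maximum reference-point luminance Ymax divided by that channel's mean over the reference points, with the reference points (selected via Equation (6)) taken as given.

```python
# Hedged sketch of the gain steps. Assumed (standard dynamic-threshold) form:
#   Rgain = Ymax / Ravgw,  Ggain = Ymax / Gavgw,  Bgain = Ymax / Bavgw,
# where Ymax is the maximum luminance among the reference points and
# Ravgw/Gavgw/Bavgw are the reference-point channel means.

def white_balance_gains(ref_pixels):
    """ref_pixels: list of (R, G, B) reference points chosen via Equation (6)."""
    n = len(ref_pixels)
    r_avg = sum(p[0] for p in ref_pixels) / n   # Ravgw
    g_avg = sum(p[1] for p in ref_pixels) / n   # Gavgw
    b_avg = sum(p[2] for p in ref_pixels) / n   # Bavgw
    # Ymax: maximum luminance among reference points (BT.601 weights assumed)
    y_max = max(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in ref_pixels)
    return y_max / r_avg, y_max / g_avg, y_max / b_avg  # Rgain, Ggain, Bgain

def apply_gains(pixel, gains):
    """Scale each channel by its gain and clip to the 8-bit range."""
    return tuple(min(255, round(c * g)) for c, g in zip(pixel, gains))

# Near-white reference points with a greenish cast: the green gain comes out
# smaller than the red and blue gains, pulling the cast back toward neutral.
refs = [(180, 220, 170), (190, 230, 180)]
gains = white_balance_gains(refs)
print(apply_gains((180, 220, 170), gains))
```

After correction, the formerly green-tinted reference pixel has nearly equal R, G, and B values, which is the intended behavior of the white balance step.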
Figure 3 shows different underwater crack images generated by the four classic white balance algorithms and the white balance algorithm proposed in this study.
2.2. Bilateral Filtering Denoising
Image denoising has many common methods, such as median filtering, Gaussian filtering, and mean filtering. This paper uses the bilateral filtering method to process crack feature images [40,41]. Bilateral filtering is a non-linear filtering method that combines the spatial proximity and pixel similarity of an image, taking both spatial information and grayscale similarity into account to achieve edge-preserving denoising. The advantage of the bilateral filter is that it preserves edges: Gaussian filtering, which is commonly used for denoising, blurs edges more strongly and offers little protection for high-frequency details. The bilateral filter achieves smooth denoising while preserving edges because its filter kernel is generated by two functions: the spatial-domain kernel and the range kernel. In the spatial domain, the bilateral filter follows Equation (9), where ωd is a template weight determined by the Euclidean distance between pixel positions, q(i, j) denotes the coordinates of the other coefficients in the template window, p(k, l) is the center coordinate of the template window, and σd is the standard deviation of the Gaussian kernel function in the spatial domain, used to control the weights of pixel positions.
Within the range kernel, the bilateral filter follows Equation (10). Here, ωr is the template weight determined by the difference in pixel values, f(i, j) represents the pixel value of the image at point q(i, j), f(k, l) denotes the pixel value at point p(k, l), and σr is the standard deviation of the Gaussian kernel function in the pixel-value domain, used to control the weights of pixel values.
Here, ωd measures the distance between two points, with lower weights as the distance increases, and ωr measures the similarity of the pixel values at two points, with higher weights assigned to points that are more similar. Finally, multiplying the above two templates yields the template weights ω for the bilateral filter.
The data equation for the bilateral filter is given in Equation (12).
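The numbered equations referenced above do not appear in this version of the text; the standard bilateral-filter expressions consistent with the surrounding definitions would take the following form (a reconstruction, not the authors' exact typesetting):

```latex
% Spatial-domain kernel (Equation (9)): weight from pixel-position distance
\omega_d(i,j,k,l) = \exp\!\left(-\frac{(i-k)^2 + (j-l)^2}{2\sigma_d^2}\right)

% Range kernel (Equation (10)): weight from pixel-value difference
\omega_r(i,j,k,l) = \exp\!\left(-\frac{\lVert f(i,j) - f(k,l)\rVert^2}{2\sigma_r^2}\right)

% Combined template weight: product of the two kernels
\omega(i,j,k,l) = \omega_d(i,j,k,l)\,\omega_r(i,j,k,l)

% Data equation (Equation (12)): normalized weighted average over the window S
g(k,l) = \frac{\sum_{(i,j)\in S} f(i,j)\,\omega(i,j,k,l)}
              {\sum_{(i,j)\in S} \omega(i,j,k,l)}
```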
This study uses median filtering, mean filtering, Gaussian filtering, and bilateral filtering to denoise the input image and then compares the values of various evaluation indicators of the processed image to find the most suitable denoising method.
Figure 4 illustrates the specific process.
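As an illustration of the edge-preserving behavior described above, the following is a minimal pure-Python bilateral filter (parameter values are illustrative, not taken from the paper) applied to a sharp intensity step such as a dark crack on bright concrete:

```python
import math

# Minimal bilateral filter for a 2D grayscale image (list of lists),
# implementing the combined spatial (omega_d) and range (omega_r) weighting.
# radius, sigma_d, and sigma_r are illustrative values, not the paper's.

def bilateral_filter(img, radius=1, sigma_d=1.0, sigma_r=25.0):
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for k in range(h):
        for l in range(w):
            num = den = 0.0
            for i in range(max(0, k - radius), min(h, k + radius + 1)):
                for j in range(max(0, l - radius), min(w, l + radius + 1)):
                    wd = math.exp(-((i - k) ** 2 + (j - l) ** 2) / (2 * sigma_d ** 2))
                    wr = math.exp(-((img[i][j] - img[k][l]) ** 2) / (2 * sigma_r ** 2))
                    num += img[i][j] * wd * wr
                    den += wd * wr
            out[k][l] = num / den   # normalized weighted average (Equation (12))
    return out

# A sharp step edge stays sharp: omega_r is almost zero across the 200/10 jump,
# so pixels on the bright side barely mix with the dark side.
img = [[200, 200, 10, 10]] * 4
flt = bilateral_filter(img)
print([round(v, 1) for v in flt[1]])  # -> [200.0, 200.0, 10.0, 10.0]
```

A Gaussian filter with the same spatial kernel but no range term would instead blur the step into intermediate values, which is exactly the edge-smearing that bilateral filtering avoids.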
2.3. Image Evaluation Indicators
The main image metrics used in this study were the UIQM [23], SSIM, and PSNR [42]. UIQM includes three underwater image attribute metrics: the underwater image color metric (UICM), the underwater image clarity metric (UISM), and the underwater image contrast metric (UIConM). Below are the methods used to calculate the various evaluation indicators. The solution equations for the UICM, UISM, and UIConM terms in Equation (13) can be found in the literature [23].
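Equation (13) itself does not appear in this version of the text; in the original UIQM formulation [23] it is a weighted linear combination of the three measures, with weights c1 = 0.0282, c2 = 0.2953, and c3 = 3.5753 reported in that work:

```latex
% Equation (13): UIQM as a weighted linear combination of the three measures
\mathrm{UIQM} = c_1 \cdot \mathrm{UICM} + c_2 \cdot \mathrm{UISM} + c_3 \cdot \mathrm{UIConM}
```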
Given a reference image f and a test image g, both of size M × N, the PSNR between f and g is defined as follows:
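The equation itself does not appear in this version of the text; for 8-bit images, the standard definition, consistent with the MSE-based discussion that follows, is:

```latex
% Mean squared error between f and g, both of size M x N
\mathrm{MSE}(f,g) = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\bigl(f_{ij} - g_{ij}\bigr)^2

% PSNR for 8-bit images (peak value 255)
\mathrm{PSNR}(f,g) = 10 \log_{10}\!\left(\frac{255^{2}}{\mathrm{MSE}(f,g)}\right)
```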
The PSNR value approaches infinity as the MSE [42] approaches zero, indicating that a higher PSNR value corresponds to higher image quality. Conversely, a small PSNR value implies large numerical differences between the images.
The SSIM is a quality metric used to measure the similarity between two images. This is considered to correlate with the quality perception of the human visual system. The SSIM is defined as follows:
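The SSIM expression itself does not appear in this version of the text; the standard form, matching the l, c, and s terms the paper describes, is:

```latex
% SSIM as the product of luminance, contrast, and structure comparisons
\mathrm{SSIM}(f,g) = l(f,g)\cdot c(f,g)\cdot s(f,g)

% with the standard stabilized comparison terms (C1, C2, C3 small constants)
l(f,g) = \frac{2\mu_f\mu_g + C_1}{\mu_f^2 + \mu_g^2 + C_1},\quad
c(f,g) = \frac{2\sigma_f\sigma_g + C_2}{\sigma_f^2 + \sigma_g^2 + C_2},\quad
s(f,g) = \frac{\sigma_{fg} + C_3}{\sigma_f\sigma_g + C_3}
```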
Here, l(f, g) measures the closeness of the average brightness (μf and μg) of the two images, c(f, g) measures the closeness of their contrast (σf and σg), and s(f, g) measures the correlation coefficient between the images f and g. Having described the UIQM, PSNR, and SSIM, this study compares various underwater image-processing methods using these three indicators to determine the most suitable one.
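As a concrete illustration of how PSNR behaves, the following is a minimal sketch (not the paper's implementation) for two equal-size 8-bit grayscale images represented as flattened lists:

```python
import math

# Minimal PSNR sketch for two equal-size 8-bit grayscale images given as
# flattened pixel lists. Identical images give infinite PSNR; larger pixel
# differences give a smaller PSNR.

def mse(f, g):
    """Mean squared error between two flattened images of equal length."""
    return sum((a - b) ** 2 for a, b in zip(f, g)) / len(f)

def psnr(f, g):
    """PSNR in dB for 8-bit images (peak value 255)."""
    m = mse(f, g)
    return float('inf') if m == 0 else 10 * math.log10(255 ** 2 / m)

f = [100, 120, 140, 160]
g = [101, 119, 141, 159]        # every pixel off by exactly 1 -> MSE = 1
print(round(psnr(f, g), 2))     # -> 48.13
```

This matches the behavior described above: as the MSE shrinks toward zero, the PSNR grows without bound, and heavily distorted images produce small PSNR values.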
2.4. A New Underwater Image-Processing System
This study proposes a novel underwater image-processing technology combined with bilateral filtering and a novel white balance algorithm for underwater image denoising to achieve image enhancement and compensation. This method is used to convert underwater crack images into above-water crack images.
Figure 5 shows a detailed structural diagram. Box 1 in Figure 5 represents the five white balance algorithms, from top to bottom: the average white balance method, the perfect reflection method, the proposed white balance method, the chromatic aberration detection and correction method based on image analysis, and the gray world method. Box 2 in Figure 5 shows the four filtering and denoising methods, from top to bottom: mean, median, bilateral, and Gaussian filtering. The results of the experiments described in Section 4.1 indicate that the proposed white balance method and the bilateral filtering method have significant advantages over the other methods in processing underwater images and can convert underwater crack images into above-water crack images without damaging the crack characteristics.