1. Introduction
Underwater image enhancement, the task of restoring degraded underwater images to clarity, is challenging because light absorption and scattering in the water medium cause serious quality problems. Underwater imaging differs from imaging in air in that different wavelengths of light attenuate at different rates during transmission: red light, with the longest wavelength, attenuates fastest, while blue and green light attenuate relatively slowly, so underwater images are mostly biased toward blue-green. In addition, vision-guided robots and autonomous underwater vehicles rely heavily on this enhancement technology to observe regions of interest effectively in advanced computer vision tasks such as underwater docking [1], submarine cable and debris inspection [2], salient target detection [3], and other operational decisions. Therefore, solving the color distortion, low contrast, and blurred details of underwater images is the main challenge facing researchers today.
The solutions to these underwater image problems can be divided into two categories: traditional enhancement methods and deep learning methods [4].
The traditional methods can themselves be divided into two categories: non-physical model-based enhancement methods [5] and physical model-based enhancement methods [6]. Non-physical model methods do not need to consider the imaging process; they mainly include histogram equalization, the gray-world algorithm, the Retinex algorithm, and so on. Histogram equalization [7] distributes the image pixels evenly, which improves image quality and sharpness to some extent. The gray-world algorithm [8] removes the effect of ambient light from the image and enhances the underwater image. Fu et al. [9] proposed a Retinex-based underwater image enhancement method that applies Retinex to obtain the reflection and illumination components after correcting the underwater image color, yielding an enhanced underwater image. Ghani et al. [10] proposed Rayleigh-stretched contrast-limited adaptive histogram equalization, which normalizes global and local contrast enhancement maps to improve underwater image quality. Zhang et al. [11] used Retinex with bilateral and trilateral filtering in the CIELAB color space for underwater image enhancement. Li et al. [12] used information-loss minimization and histogram distribution to remove water haze and enhance the contrast and brightness of underwater images. In short, these non-physical model methods are simple and fast, but suffer from problems such as over-enhancement and artificial noise. Traditional enhancement methods can, to a certain extent, remove blur and sharpen edges; however, they improve underwater image quality only through single-image processing, adjusting pixel values to improve visual quality. Since the physical process of underwater image degradation is not taken into account, the achievable effect is limited, and problems such as high noise, low definition, and color distortion remain, so further improvement is needed.
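As an illustration of the non-physical model family, the gray-world algorithm mentioned above can be sketched in a few lines. This is a minimal version written for this discussion (function and variable names are ours, not from the cited works): each channel is rescaled so that its mean matches the global gray level, which pushes the attenuated red channel back up in blue-green-biased underwater images.

```python
import numpy as np

def gray_world_balance(img):
    """Gray-world white balance on an H x W x 3 float image in [0, 1]:
    scale each channel so its mean matches the global mean intensity."""
    channel_means = img.reshape(-1, 3).mean(axis=0)      # per-channel means
    gray_mean = channel_means.mean()                     # target gray level
    gains = gray_mean / np.maximum(channel_means, 1e-6)  # per-channel gains
    return np.clip(img * gains, 0.0, 1.0)
```

For a typical underwater frame the red gain comes out above 1 while the blue and green gains fall below 1, which is exactly the simple pixel-value adjustment (and its limits) described above.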
Considering these shortcomings, scholars have further proposed physical model-based approaches. The core idea is to construct a mathematical imaging model of the degradation process of underwater images, estimate the model parameters from observations and various prior assumptions, and thereby derive the undegraded image in the ideal state. In the classical dark channel prior (DCP) algorithm [13], researchers estimate the light transmittance and atmospheric light from the relationship between the hazy image and the imaging model, so as to restore the hazy image. Since underwater images are somewhat similar to hazy images, the DCP algorithm has also been applied to underwater image enhancement, but its applicable scenarios are very limited. Researchers therefore proposed the underwater dark channel prior (UDCP) algorithm [14] specifically for the underwater environment; it takes the attenuation characteristics of light underwater into account and estimates the transmittance of light in water more accurately. Peng et al. [15] proposed an underwater image restoration method to deal with underwater image blurring and light absorption, which introduces depth of field into the atmospheric scattering model and applies the dark channel prior to solve for a more accurate transmittance. In summary, physical model-based methods rely on imaging models and dark channel prior knowledge [16], but the specificity of the underwater environment limits them: they usually depend on environmental assumptions and specialized physical priors, their parameter-estimation schemes are difficult to generalize to different underwater conditions, and they therefore lack strong generalization and applicability.
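The dark channel and transmittance estimation at the heart of DCP can be sketched as follows. This is a simplified illustration (soft matting/guided-filter refinement from He et al. is omitted, and the helper names are ours); UDCP differs mainly in computing the dark channel from only the green and blue channels.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, window=15):
    """Dark channel: per-pixel minimum over the color channels,
    followed by a local minimum filter over a square window."""
    return minimum_filter(img.min(axis=2), size=window)

def estimate_transmission(img, atmospheric_light, omega=0.95, window=15):
    """DCP transmission estimate: t = 1 - omega * dark(I / A),
    where A is the estimated atmospheric (background) light."""
    normalized = img / np.maximum(atmospheric_light, 1e-6)
    return 1.0 - omega * dark_channel(normalized, window)
```

The restored image is then recovered by inverting the imaging model I = J·t + A·(1 − t) pixel by pixel; underwater, the strong wavelength-dependent attenuation is precisely what makes this transmittance estimate unreliable, motivating UDCP.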
In recent years, deep learning has attracted widespread attention for remedying the shortcomings of traditional methods [17]. Deep learning approaches can reduce the impact of the complex underwater environment on the image and achieve better enhancement results. Both convolutional neural network (CNN)-based [18] and generative adversarial network (GAN)-based [19] models require large numbers of paired or unpaired images. Chen et al. [20] proposed an underwater image enhancement method that fuses deep learning with an imaging model: it estimates the background scattered light and combines it with the imaging model through convolution operations to obtain an enhanced underwater image. Islam et al. [21] proposed a fast underwater image enhancement model (FUnIE-GAN), which establishes a new loss function to evaluate the perceptual quality of images. Fabbri et al. [22] proposed a generative adversarial network-based method (UGAN) that enhances the details of underwater images, but its results can be blurry because a Euclidean distance loss is applied. Wang et al. [23] (2021) proposed an unsupervised underwater generative adversarial network (underwater GAN, UWGAN), which synthesizes realistic underwater images (with color distortion and haze effects) from in-air image and depth map pairs based on an improved underwater imaging model, and then uses an end-to-end network trained on the synthesized dataset to reconstruct clear underwater images directly. In summary, deep learning-based underwater image enhancement algorithms improve overall performance. The techniques in the existing literature are mainly based on very deep CNN and GAN models, focusing on noise removal for image dehazing, contrast stretching, and combinations of multiple information sources with deep learning. However, these large models require substantial computation and memory, which makes real-time underwater image enhancement difficult.
In this paper, a lightweight model, Rep-UWnet, based on structural re-parameterization (RepVGG) [24], is designed to recover underwater images, addressing the color distortion, loss of detail, large memory consumption, and high computation of existing enhancement algorithms. The model also adopts some ideas from Shallow-UWnet [25]; although Shallow-UWnet has a small number of parameters, its accuracy and inference speed need further improvement. In this paper, the RepVGG Block is used instead of an ordinary convolution, which reduces the average inference time by 0.11 s. Secondly, a multi-scale hybrid convolutional attention module is designed, which improves model accuracy by about 11.1% in PSNR, 9.8% in SSIM, and 7.9% in UIQM. We also decrease the number of convolution channels to keep the model lightweight, so the overall model has approximately 0.45 M parameters, making it smaller and faster than other state-of-the-art models. Based on the experimental results, the innovations of this paper are as follows.
- (1)
A multi-scale hybrid convolutional attention module is designed. Considering the complex and diverse local features of underwater scenes, this paper uses both a spatial attention mechanism and a channel attention mechanism. The former improves the network's attention to complex regions such as the light-field distribution and color depth information in underwater images, while the latter focuses on the network's representation of the important channels in the features, thus improving the overall representation ability of the model.
- (2)
The RepVGG Block is used instead of ordinary convolution, with different network architectures in the training and inference phases. The training phase favors accuracy while the inference phase favors speed, reducing the average single-image test time by 0.11 s.
- (3)
A joint loss function combining content perception, mean square error, and structural similarity is designed, with weight coefficients applied to balance each loss term reasonably. For the perceptual loss, layers 1, 3, 5, 9, and 13 of the VGG19 model are selected to extract hidden features, generating clearer underwater images while maintaining the original texture and structure.
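The structural re-parameterization behind innovation (2) can be illustrated concretely: at training time a RepVGG Block runs parallel 3×3, 1×1, and identity branches, and at inference these collapse algebraically into a single 3×3 convolution. The sketch below (our own minimal version; batch-norm fusion, which real RepVGG also performs, is omitted for brevity) verifies that the fused kernel reproduces the three-branch output.

```python
import torch
import torch.nn.functional as F

def fuse_repvgg_branches(w3, b3, w1, b1, channels):
    """Collapse parallel 3x3, 1x1, and identity branches into one
    equivalent 3x3 conv kernel and bias (BN fusion omitted).
    w3: (C, C, 3, 3); w1: (C, C, 1, 1)."""
    w1_padded = F.pad(w1, [1, 1, 1, 1])        # embed 1x1 kernel at 3x3 center
    w_id = torch.zeros_like(w3)                # identity branch as a 3x3 kernel
    for c in range(channels):
        w_id[c, c, 1, 1] = 1.0
    return w3 + w1_padded + w_id, b3 + b1

torch.manual_seed(0)
C = 4
x = torch.randn(1, C, 8, 8)
w3, b3 = torch.randn(C, C, 3, 3), torch.randn(C)
w1, b1 = torch.randn(C, C, 1, 1), torch.randn(C)

# Training-time block: three parallel branches summed.
y_train = (F.conv2d(x, w3, b3, padding=1)
           + F.conv2d(x, w1, b1, padding=0)
           + x)

# Inference-time block: a single fused 3x3 convolution.
wf, bf = fuse_repvgg_branches(w3, b3, w1, b1, C)
y_infer = F.conv2d(x, wf, bf, padding=1)
```

Because the fused model executes one convolution per block with no extra branch memory traffic, inference is faster while training-time accuracy is preserved, which is the source of the 0.11 s speedup reported here.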
4. Experiments
4.1. Dataset and Experimental Setup
Dataset. A total of 3000 paired images from the EUVP underwater image dataset are selected for training. The EUVP dataset is chosen because of the diversity of its capture locations and perceptual quality, so that the model in this paper can generalize to other underwater datasets. In addition, 515 paired test samples from EUVP and 120 test pairs from the UFO 120 dataset are selected for testing.
Training setup. First, during training, the images are scaled to 256 × 256. Second, for the perceptual loss, layers 1, 3, 5, 9, and 13 of the VGG19 model are chosen to extract hidden features. Third, λ1, λ2, and λ3 are set to 1, 0.6, and 1.1, respectively. Fourth, the Adam optimizer is applied for 200 epochs with a learning rate of 0.0002 and a batch size of 4. Fifth, the experiments are conducted on PyTorch 2.3 with an Intel(R) Core(TM) i7-10870H CPU @ 2.20 GHz (Santa Clara, CA, USA), 16 GB of RAM, and an NVIDIA GeForce RTX 2080Ti GPU with 11 GB of memory (Santa Clara, CA, USA) for training and testing. Sixth, network training takes about 10 h.
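The weighted joint loss described in the setup above can be sketched as follows. This is an illustrative simplification, not the paper's exact implementation: the SSIM term here uses global image statistics rather than the usual windowed computation, the perceptual features are taken as given (in the paper they would come from VGG19 layers 1, 3, 5, 9, and 13), and which λ pairs with which term is our assumption.

```python
import torch
import torch.nn.functional as F

def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM from global statistics (a simplification of the
    standard 11x11 windowed SSIM)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(unbiased=False), y.var(unbiased=False)
    cov = ((x - mx) * (y - my)).mean()
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    return 1.0 - ssim

def joint_loss(pred, target, features_pred, features_target,
               lam1=1.0, lam2=0.6, lam3=1.1):
    """lam1*MSE + lam2*(1-SSIM) + lam3*perceptual, with the lambda
    defaults taken from the training setup above (term pairing assumed)."""
    mse = F.mse_loss(pred, target)
    ssim = ssim_loss(pred, target)
    perceptual = sum(F.mse_loss(fp, ft)
                     for fp, ft in zip(features_pred, features_target))
    return lam1 * mse + lam2 * ssim + lam3 * perceptual
```

When prediction and target coincide, all three terms vanish, so the loss correctly rewards both pixel-level fidelity and structural/perceptual agreement.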
Evaluation indicators. Three evaluation indicators are used to analyze the quality of the generated images: peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and the no-reference underwater image quality measure (UIQM). PSNR expresses the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation. Since many signals have a very wide dynamic range, PSNR is usually expressed on a logarithmic decibel scale. In image processing, it is primarily used to quantify the reconstruction quality of images and videos affected by lossy compression, and it is defined from the mean squared error (MSE):

PSNR = 10 · log10(MAX_I² / MSE),

where MAX_I is the maximum possible pixel value of the image.
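For concreteness, the PSNR computation reduces to a few lines (a generic sketch; the function name is ours):

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio in decibels, computed from the MSE
    between a reference image and a test image."""
    diff = (np.asarray(reference, dtype=np.float64)
            - np.asarray(test, dtype=np.float64))
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images: noise power is zero
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Higher values mean the enhanced image is closer to the reference; identical images give infinite PSNR, and a maximal per-pixel error of 255 gives 0 dB.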
The structural similarity index (SSIM) measures the similarity between two digital images. Given an undistorted image and a distorted version of it, the structural similarity between the two can serve as a measure of the perceptual quality of the distorted image. Compared with traditional measures such as PSNR, structural similarity is more consistent with the human eye's judgment of image quality. SSIM is defined as in Equation (3) above.
The no-reference underwater image quality measure (UIQM) consists of three underwater image attribute measures: a colorfulness measure (UICM), a sharpness measure (UISM), and a contrast measure (UIConM), each of which evaluates one aspect of underwater image degradation. The UIQM is given by the following equation:
UIQM = c1 × UICM + c2 × UISM + c3 × UIConM,

where the parameters c1, c2, and c3 are set according to Panetta, Gao, and Agaian [38]. In addition, this paper measures model compression and acceleration performance with the compression rate and acceleration rate:

C(N, N*) = P(N) / P(N*),   S(N, N*) = T(N) / T(N*),

where P(N) is the number of parameters of model N, T(N) is the test time per image of model N, N is the original model, and N* is the compressed model.
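These metrics are simple enough to sketch directly (the function names are ours; the default UIQM coefficients are the commonly cited values from Panetta et al., shown here for illustration):

```python
def uiqm_score(uicm, uism, uiconm,
               c1=0.0282, c2=0.2953, c3=3.5753):
    """Linear combination of the three attribute measures; the default
    coefficients are the values commonly cited from Panetta et al."""
    return c1 * uicm + c2 * uism + c3 * uiconm

def compression_rate(params_original, params_compressed):
    """How many times fewer parameters the compressed model N* has."""
    return params_original / params_compressed

def acceleration_rate(time_original, time_compressed):
    """How many times faster the compressed model N* tests per image."""
    return time_original / time_compressed
```

Note the large weight on UIConM: contrast degradation dominates the UIQM score, which matches the low-contrast failure mode of underwater images discussed in the Introduction.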
4.2. Experimental Results
The methods chosen for comparison, CLAHE, DCP, HE, IBLA, UDCP, Deep SESR, FUnIE-GAN, and U-GAN, represent a range of techniques in the field of underwater image enhancement. CLAHE is a classic technique for enhancing image contrast, but it may introduce artifacts and retain a haze-like effect. DCP, based on the dark channel prior, performs well in dehazing and image enhancement but may generate artifacts in complex scenes. HE is a simple and intuitive enhancement method but has limited effectiveness on images with high noise or uneven contrast. IBLA, an improved method for uneven lighting conditions, can enhance image contrast and details but may fail in complex scenes. UDCP, a dark channel prior method designed specifically for underwater images, exhibits some robustness but may have issues with high-contrast and multimodal images. Deep SESR, a deep learning-based super-resolution method, has good performance and generalization but requires a large amount of training data and computational resources. FUnIE-GAN and U-GAN, two generative adversarial network-based methods, enhance image clarity and contrast but require longer training times and significant computational resources. An in-depth analysis of these methods clarifies their characteristics, strengths, and limitations, providing targeted directions for improving and optimizing our proposed underwater image enhancement algorithm.
In this section, our proposed method is compared subjectively and objectively with CLAHE [11], DCP [13], HE [6], UDCP [22], IBLA [15], U-GAN [14], FUnIE-GAN [21], and Deep SESR [39], which represent a variety of underwater image enhancement algorithms. CLAHE enhances image contrast but may introduce artifacts and retain a haze-like effect. DCP has limited effectiveness in improving the quality of underwater images. HE improves image quality but has some limitations. UDCP and IBLA have limited effectiveness in recovering images with specific color tones. U-GAN, Deep SESR, and FUnIE-GAN enhance image contrast but may have limitations in color recovery and artifact avoidance. In contrast, our proposed method not only effectively restores underwater images but also corrects color bias and low contrast, resulting in more natural and clearer images. Figure 4 displays images enhanced by these methods; among them, our method achieves better subjective quality, closest to the reference image in Figure 4j.
As described in Section 4.1, the peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and underwater image quality measure (UIQM) [38] are chosen as objective indicators for quantitative evaluation; larger values indicate better generated images. As shown in Table 2, the proposed method outperforms all the other algorithms on the EUVP dataset, but achieves only the second-best results on the UFO 120 dataset, probably because the best performer, Deep SESR, was trained on UFO 120.
The experiment employed a subjective evaluation grading standard to comprehensively assess the effectiveness of the underwater image enhancement algorithms. The grading standard consists of five levels, from "very poor" to "very good," describing overall image quality and its impact on the visual experience. Five students were invited to evaluate the representative algorithms, namely CLAHE, DCP, HE, IBLA, UDCP, Deep SESR, FUnIE-GAN, U-GAN, and our proposed algorithm, yielding comprehensive subjective evaluation data. The traditional underwater image processing algorithms (CLAHE, DCP, HE, IBLA, and UDCP) received relatively average scores, with mean scores ranging from 2.8 to 3.2. In comparison, the deep learning algorithms (Deep SESR, FUnIE-GAN, and U-GAN) achieved higher mean scores, from 3.8 to 4.0, demonstrating superior performance. In particular, our algorithm obtained the highest mean score of 4.0, indicating a significant advantage. Therefore, while traditional algorithms show some effectiveness in underwater image processing, the deep learning algorithms perform better, with ours performing best across all evaluations and providing superior visual effects. The results are shown in Table 3.
For comparison, the RepVGG Block in our model was replaced with an ordinary residual block, which increased the number of parameters and reduced the testing speed. In addition, the proposed model has the fewest parameters and the shortest single-image testing time among the deep learning-based models compared. This indicates that the structural re-parameterization of the RepVGG Block helps to speed up network training and testing, yielding an average reduction of 0.11 s in single-image testing time. The results are shown in Table 4.
4.3. Ablation Experiments
4.3.1. Loss Function Ablation Experiment
To verify the effects of the mean square error, structural similarity, and content perception loss terms on the experimental results, ablation experiments are conducted on the EUVP dataset, removing one of the three loss terms in each experiment for comparison.
From the subjective aspect, the image generated by the complete method is closer to the reference image (Figure 5f), while the images generated with a loss term removed suffer from obvious color bias, as shown in Figure 5, where w/o indicates the removal of a loss term from the loss function. From the objective aspect, the complete method achieves the highest indicator values, proving each loss term effective. The objective quality comparison of the ablation experiments is shown in Table 5.
4.3.2. Attention Ablation Experiment
Another ablation experiment is conducted to demonstrate the effectiveness of the multi-scale hybrid convolutional attention module. Two variants are trained: (1) Rep-UWnet + multi-scale hybrid convolutional block w/o spatial attention, and (2) Rep-UWnet + multi-scale hybrid convolutional block w/o channel attention. Table 6 shows that the multi-scale channel attention and spatial attention allow the proposed model to better learn the features of real, complex underwater environments and obtain better indicator values.
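To make the two ablated components concrete, the sketch below shows generic channel and spatial attention in the CBAM style. This is an illustrative stand-in, not the paper's exact multi-scale hybrid module (whose internal details, kernel sizes, and reduction ratios are not given in this section); class names and hyperparameters are ours.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Pool global context per channel, then reweight channels
    (the branch removed in ablation variant (2))."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
    def forward(self, x):
        w = self.mlp(x.mean(dim=(2, 3)))       # (B, C) channel weights
        return x * w[:, :, None, None]

class SpatialAttention(nn.Module):
    """Weight each spatial location from channel-pooled statistics
    (the branch removed in ablation variant (1))."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
    def forward(self, x):
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))
```

The spatial branch lets the network emphasize regions such as uneven light-field distribution, while the channel branch emphasizes informative feature channels, which is the division of labor described for the module in Section 1.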
4.4. Application Testing Experiments
Rep-UWnet is a lightweight model suitable for various advanced vision tasks. This paper focuses on edge detection [40] and single-image color depth estimation [41] in the underwater environment. Underwater images often become blurred due to light attenuation and water-quality effects, which further reduces the accuracy of edge detection and single-image color depth estimation. To better investigate these issues, this paper selects the EUVP dark dataset, which is blurrier and allows a more accurate evaluation of algorithm performance.
In this study, we observed that MiDaS [42] is affected by the green and blue tones in single-image color depth estimation, and HED [43] faces similar issues in edge detection. As shown in Figure 6b, the color depth estimation contours are not clear in the original image, and the edge information in edge detection is insufficient. However, as illustrated in Figure 6d, both edge detection and color depth estimation improve significantly on the images enhanced by our method, further demonstrating its effectiveness.
However, conducting edge detection and color depth estimation tasks in underwater environments poses several challenges. First, light attenuation and changes in water quality can make image quality unstable, which may degrade model performance. Second, the deployment and optimization of underwater equipment are also challenging, as water flow, pressure, and temperature can affect image acquisition and sensor performance.
To overcome these challenges, models must be designed and optimized for the characteristics of underwater environments. For example, advanced image enhancement techniques can be employed to improve the quality of underwater images, thereby improving the accuracy of edge detection and color depth estimation. Additionally, high-performance sensors and stable mechanical structures can improve the stability and reliability of underwater equipment, ensuring the model's effectiveness in practical applications.