Article

Underwater Image Enhancement Algorithm Based on Adversarial Training

1
Department of Automation, Shanghai Jiao Tong University, Shanghai 200240, China
2
Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(11), 2184; https://doi.org/10.3390/electronics13112184
Submission received: 30 April 2024 / Revised: 30 May 2024 / Accepted: 31 May 2024 / Published: 3 June 2024

Abstract

The ocean's abundant resources and strategic significance are attracting increasing attention, and ocean observation is the first step in its exploration and development. Observation methods based on visual sensor networks have received great attention from researchers due to their visualization capability and high information capacity. However, below the sea surface, objective factors such as blurriness, turbulence, and underwater color casting can distort images and hinder their acquisition. In this paper, the enhancement of underwater images is tackled using an adversarial learning-based approach. First, pre-processing is applied to address the significant color casting in the dataset, thus enhancing feature learning for subsequent style transfer. Then, corresponding improvements are made to a generative adversarial network’s structure and loss functions to better restore the features of the network output. Finally, evaluations and comparisons are performed using underwater image quality assessment metrics and several public datasets. Through multidimensional experiments, the proposed algorithm is shown to exhibit excellent performance in both subjective and objective evaluation metrics compared to state-of-the-art algorithms, as well as in practical visual applications.

1. Introduction

With the accelerated progress of ocean exploration, the underwater environments observed are becoming increasingly complex, making imaging processes more challenging. In such environments, the influence of environmental conditions and the limitations of imaging devices make high-quality imaging of underwater targets difficult [1]. The acquired underwater images not only suffer from severe color distortion and significant blurring, but also exhibit uneven brightness, artifacts, and other issues. Traditional image enhancement algorithms have limited effectiveness in improving image quality in such adverse environments, and are thus unsuitable for underwater vision applications, where preserving the integrity and accuracy of image information is crucial. This necessitates the development of image enhancement techniques specifically designed for underwater scenarios [2].
A degraded underwater image is presented in Figure 1, where it is clearly evident that the main challenges that underwater image enhancement (UIE) typically needs to address are color casting and blurriness. Color casting refers to the distortion of colors in images, which is caused by light refraction, absorption, and scattering in the aquatic environment. This phenomenon results in a deviation of hues, saturation levels, and brightness of specific colors from their true representation. On the other hand, image blurring in underwater photography is primarily induced by light diffusion, water turbidity, and the movement of marine organisms. These factors contribute to the loss of sharpness and detail in underwater images, leading to a lack of crispness and clarity in the visual output. However, in more adverse imaging conditions, considering these aspects alone cannot lead to image enhancement effects of superior quality. Under such adverse underwater conditions, algorithms need to demonstrate superior generalization and feature restoration capabilities. To address these challenges, based on the previous analysis, in this paper, enhancement and restoration solutions for underwater imaging under adverse conditions are investigated using adversarial learning. On the one hand, a dataset with prominent color casting characteristics is utilized to enhance feature learning for style transfer, which is a technique that aims to apply the style characteristics of one image onto another image, resulting in a new synthesized image. On the other hand, corresponding improvements are made to the network structure and loss function to improve the network’s output feature restoration ability.
Based on the above, the work of this paper is summarized as follows:
  • To achieve better style transfer effects for UIE, in this study the training dataset is pre-processed based on color balance and fusion theories. The color balance weights assigned to images with more severe background color casting (i.e., images with higher intensities near the channel histogram centers) are relatively larger to compensate for the increased color degradation.
  • A high-dimensional semantic cyclic loss function is proposed based on the theoretical analysis of style transfer and content loss. Images are passed twice through the generators in adversarial learning (a full cycle across the two domains); features are then extracted from the cycled images with a VGG-19 [3] feature extraction network module and compared with those of the original images, forming a high-dimensional semantic cyclic loss function. The training process demonstrates good convergence.
  • Improvements are made to the generator network structure in a generative adversarial network (GAN), including the activation functions and upsampling layers. This improves loss reduction and the quality of the images generated during training.
The remainder of this paper is organized as follows: Section 2 provides an overview of the evolution of UIE technology and presents the latest research findings in the field. In Section 3, the intricacies of the algorithmic and model design are delineated. In Section 4, simulation and experimental validation are undertaken, followed by a thorough analysis of the outcomes. Finally, Section 5 summarizes the paper.

2. Related Work

In recent years, the field of underwater vision has attracted extensive attention to the issue of UIE, owing to its wide range of applications. However, due to the inherent limitations of imaging conditions and devices in the underwater environment, UIE remains a persistently challenging task [2]. Enhancement methods can be broadly divided into two categories: non-deep learning (DL)-based methods and DL-based methods. While the former possess advantages in terms of algorithmic complexity, they tend to exhibit poor visual effects and limited generalization capabilities. In contrast, the latter leverage the power of DL to learn and adapt to the features of reference datasets for the effective restoration of degraded underwater images. While DL-based methods are heavily dependent on hardware conditions and may possess high algorithmic complexity, they tend to exhibit strong migration properties and better generalization capabilities for multi-class scenes.
With the rapid improvement in graphics hardware computing power, DL methods have made great achievements in various image research and application fields, and the field of UIE is no exception. As one of the most classical DL frameworks, GANs [4] were first proposed in 2014 based on game theory. The generator and discriminator, designed according to the characteristics of the game, were simple and elegant, and the method of adversarial training was clear and understandable. GANs have been widely used in image style transfer, super-resolution, image enhancement, and other applications [5,6]. Among them, the CycleGAN [7] algorithm, which evolved from GANs, was a great success at the 2017 International Conference on Computer Vision, and achieved significant improvement in the field of style transfer. Fabbri et al. [8] applied image transfer to UIE based on the idea of CycleGAN, improved the loss function according to the characteristics of underwater imaging, and used paired datasets for enhancement training to achieve better performance. Islam et al. [9] extended CycleGAN and previous work and proposed FUGAN, whose loss function was based on the global content, color, local texture, and style information of the image and was applied to the network structure of a conditional GAN; they also created the EUVP UIE dataset. In [10], an end-to-end underwater GAN for UIE and depth estimation was proposed. First, a shallow generative network was used to roughly estimate the depth map and, then, a more detailed underwater feature layer network was used to splice the estimated shallow and coarse depth maps with the input image and calculate the final depth map. Liu et al. [11] proposed a UIE method based on object-guided twin adversarial contrastive learning, where a bilateral-constraint closed-loop adversarial enhancement module was used to alleviate the requirement of paired data in unsupervised mode through the coupling of a twin inverse mapping, thus retaining more information features. However, the algorithms mentioned above have not demonstrated notable improvements in terms of network structure design, and exhibit high computational cost and complexity during the training process. This intricacy poses a potential hindrance to their practicality, particularly in real-time applications or environments with limited computational resources.
Feedback mechanisms have also been embedded in the enhancement process to guide the algorithm to update towards the target image. Wu et al. [12] proposed a two-stage convolutional neural network UIE algorithm based on structural decomposition. By decomposing the original underwater image into high- and low-frequency components, a two-stage underwater enhancement network was proposed, which included a preliminary enhancement network and a refinement network. Qi et al. [13] proposed a UIE network with semantic attention mechanism guidance and multi-scale perception, in which semantic information was introduced as high-level feedback through regional enhancement feature learning. This multi-scale perception resulted in a better learning of local enhancement features of semantic regions. The fused features were consistent in semantics and contributed to effective visual enhancement. Panetta et al. [14] used an enhanced GAN model to improve underwater target tracking performance, and further introduced a cascaded residual network model for UIE, which improved the accuracy and success rate of the tracker. Zhou et al. [15] proposed a domain-adaptive learning UIE algorithm based on physics model feedback. A domain-adaptive mechanism was embedded in the learning framework to eliminate the gap between domains, and a physical constraint was used as a feedback controller to perform UIE. Zhang et al. [16] attempted to combine traditional methods with DL and proposed a multi-input dense connection generator network (MDNet) for underwater image enhancement. Additionally, they devised a multi-component loss function to improve the visual quality of the generated images. Guan et al. [17] changed the learning of the self-attention mechanism by introducing a trainable weight to balance the effect of the mechanism, improving the self-adaptive capability of the model. Lin et al. [18] proposed the dilated GAN (DGAN) method, which added an additional loss function using structural similarity. Yang et al. [19] proposed a triple-branch dense block-based generative adversarial network (TDGAN) for the quality enhancement of underwater images, which improved performance and feature extraction efficiency, and retained more image details. However, almost all of the image enhancement methods based on DL have a common trade-off; due to the feature influence of the reference dataset, recovering more details usually means that more noise will be generated during the enhancement process.
In conclusion, methods based on DL mainly learn and fit a data feature distribution of the reference dataset [20,21]. In general, most enhancement algorithms are based on the training and learning strategy of fitting [22], which mainly consists of two parts. The first part is to improve the loss function of the training process, design an efficient learning task objective function, and quantify the loss function for the task; the final loss function should ultimately cause the network to converge. The second part is to optimize the structure of the network to improve its feature downsampling ability during training, and feature retention and restoration during upsampling. The first key point pertains to the direction of learning and fitting, while the second focuses on feature-level implementation. However, another important factor affecting DL model performance is the characteristics of the dataset, which directly affect the network capability of learning to fit the corresponding features. To enhance the generalization ability of the neural network, the characteristics of the reference dataset are also extremely important for the training process.

3. Design of Algorithms and Models

In this section, a UIE algorithm based on image pre-processing and adversarial learning is proposed. Its main idea is as follows: for adverse underwater environment image enhancement, the training data are pre-processed using color balance and fusion methods to optimize the features. The overall approach consists of two steps: in the first step, a physics-based model targeting image feature mechanisms is adopted for pre-processing, while in the second step the performance of the UIE algorithm is further optimized using adversarial learning methods.

3.1. Color Balance and Fusion Pre-Processing

UIE mainly deals with two problems, namely color casting and blurring, both of which are particularly prevalent in underwater environments. Other issues that cause the loss of image features also need to be considered, such as uneven lighting and artifacts. In DL, the features of the training dataset have a significant impact on the mapping results of the generated images. In this paper, DL is combined with a physics-based approach to enhance underwater images. Before performing UIE, color balancing and fusion mappings are applied to the images using the white balance theory. Feature pre-processing based on mapping coefficients helps to restore the subtle changes in the images before and after enhancement. Figure 2 illustrates the steps involved in the pre-processing of the training set images.
In Figure 2, the image is input into a network trained using the training dataset, and the white balancing process is applied first to obtain an image whose colors are closer to those of the real scene. Next, sharpening is employed to enhance the edges and details of the image, making the image clearer. By applying the gamma function for correction, the perceived brightness and contrast of the image can be adjusted to output the image brightness value. Then, the global contrast coefficient is estimated through Laplacian filtering. Finally, multi-scale feature fusion is realized using a Laplacian and a Gaussian pyramid to obtain the final output image.
The pre-processing method for the underwater image dataset in this paper comprises a two-step strategy that combines white balance-based methods with image multi-scale fusion. The inconsistent imaging depth in underwater environments results in variations in color wavelength. The purpose of white balancing is to correct the color projection deviation encountered in such situations. This approach does not require explicit inversion of the optical model and can be used to correct images with relatively consistent background tones in the dataset. Following white balancing, image fusion is applied for the enhancement of the edges and details of the images, and to alleviate the contrast loss caused by backward scattering. The “gray world” method can also be used to remove the blue background tone from underwater images. However, this method can cause disruption in the red channel features, which mainly occurs because in underwater images the red channel values are too low, leading to excessive channel value compensation (since the gray world theory divides each channel based on its average value). To address this issue, it is necessary to calculate a correction for the red channel:
$$I_{rc}(x) = I_r(x) + \alpha \cdot (\bar{I}_g - \bar{I}_r) \cdot (1 - I_r(x)) \cdot I_g(x) \tag{1}$$
where $I_r$ and $I_g$ represent the values of the red and green channels of the image, respectively. The application of this equation means that channel values are not processed based on their original values but are rather normalized dynamically. Similarly, $\bar{I}_r$ and $\bar{I}_g$ represent the average values of the corresponding channels.
Underwater light is strongly absorbed and attenuated, especially in turbid waters or areas with high concentrations of plankton, resulting in significant attenuation of the blue channel. To address this issue, the compensation for the attenuation of the blue channel is calculated as follows:
$$I_{bc}(x) = I_b(x) + \alpha \cdot (\bar{I}_g - \bar{I}_b) \cdot (1 - I_b(x)) \cdot I_g(x) \tag{2}$$
In a similar fashion to the previous equation, $I_b$ and $I_g$ are the blue and green channel values of the image, and $\alpha$ in Equations (1) and (2) takes the value 1. Once the channel corrections have been applied, the image is sharpened using a sharpening factor:
$$S = \frac{I + N\{I - G * I\}}{2} \tag{3}$$
where $I$ represents the original image, $G$ represents a smoothing filter (usually a Gaussian filter), $*$ represents the convolution operation (so that $G * I$ is the smoothed image), and $N\{\cdot\}$ represents a linear normalization operator whose effect is typically histogram stretching. This operator scales the intensity of all color pixels in the image through the application of a unique scaling factor, thus improving the intensity of all color pixels to cover the entire dynamic range.
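For illustration, the following sketch (not the authors' released code) applies the channel compensation of Equations (1) and (2) and the unsharp-mask sharpening of Equation (3) to a float RGB image with NumPy and SciPy; the value $\alpha = 1$ follows the text, while the Gaussian blur radius, the function names, and the clipping to [0, 1] are our assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def compensate_channels(img, alpha=1.0):
    """Red/blue channel compensation of Eqs. (1)-(2); img is a float RGB array in [0, 1]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    r_mean, g_mean, b_mean = r.mean(), g.mean(), b.mean()
    r_c = r + alpha * (g_mean - r_mean) * (1.0 - r) * g   # Eq. (1)
    b_c = b + alpha * (g_mean - b_mean) * (1.0 - b) * g   # Eq. (2)
    return np.clip(np.stack([r_c, g, b_c], axis=-1), 0.0, 1.0)

def sharpen(img, sigma=2.0):
    """Unsharp masking of Eq. (3): S = (I + N{I - G*I}) / 2."""
    blurred = gaussian_filter(img, sigma=(sigma, sigma, 0))   # G*I
    detail = img - blurred                                    # I - G*I
    # N{.}: linear normalization (histogram stretching) of the detail layer
    detail = (detail - detail.min()) / (detail.max() - detail.min() + 1e-8)
    return np.clip((img + detail) / 2.0, 0.0, 1.0)

# usage sketch: pre = sharpen(compensate_channels(img))  # img: HxWx3 float array in [0, 1]
```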
After sharpening, the next step is fusion. The Laplacian contrast weight $W_L$ is used to estimate the global contrast and is obtained through Laplacian filtering. The calculation of the Laplacian sampling layers $L_i$ corresponding to the output image $I(x)$ is as follows:
$$\begin{aligned} I(x) &= I(x) - G_1\{I(x)\} + G_1\{I(x)\} = L_1\{I(x)\} + G_1\{I(x)\} \\ &= L_1\{I(x)\} + G_1\{I(x)\} - G_2\{I(x)\} + G_2\{I(x)\} \\ &= L_1\{I(x)\} + L_2\{I(x)\} + G_2\{I(x)\} \\ &= \cdots = \sum_{l=1}^{N} L_l\{I(x)\} \end{aligned} \tag{4}$$
where $L_l$ represents the Laplacian pyramid sampling and $G_i$ represents the Gaussian pyramid sampling, with $i$ representing the $i$-th layer. Both the Gaussian and the Laplacian pyramids have the capability of multi-scale representation and image reconstruction, yet they emphasize different aspects of feature information. The Gaussian pyramid primarily focuses on the overall structure and texture information of the image, whereas the Laplacian pyramid is more concerned with the local detailed information of the image. For underwater image deblurring, the Laplacian weight alone is insufficient for color variation discrimination and contrast restoration. To address this problem, an additional contrast evaluation metric is introduced.
The Laplacian contrast weight $W_L$ is a combination of the $L_i$. The saliency weight ($W_S$) aims to enhance textural features in underwater scenes through contrast improvement. To measure the level of image saliency, a saliency estimation method is used. In regions with high brightness, the effect of this weight is more pronounced:
$$W_S(x, y) = \sum_{(m, n) \in N} d\big[\, p(x, y),\, q(m, n) \,\big] \tag{5}$$
Larger convolution kernels have a better smoothing effect on image noise, and so a 10 × 10 kernel is used here. In Equation (5), $N$ is the small neighborhood of the pixel block $(x, y)$ obtained by filtering and downsampling the image, and $d$ is the Euclidean distance between the pixel vectors $p$ and $q$.
The saturation weight ($W_{Sat}$) is used to retain color information even when the intensity values of the image enter the saturated range, and is defined as follows:
$$W_{Sat} = \sqrt{\frac{1}{3}\Delta} \tag{6}$$
$W_{Sat}$ combines the channel values $R_k$, $G_k$, and $B_k$ with the $k$-th image's background light intensity $L_k$, while $\Delta$ is defined as follows:
$$\Delta = (R_k - L_k)^2 + (G_k - L_k)^2 + (B_k - L_k)^2 \tag{7}$$
In practical applications, each individual weight coefficient has its advantages, so a combination of the three coefficients is used. For each input $k$, the three weights $W_L$, $W_S$, and $W_{Sat}$ discussed earlier are summed to obtain the aggregated weight $W_k$. Then, the aggregated weights are normalized on a per-pixel basis by dividing each pixel's weight by the sum of the weights for the corresponding pixel across all images, resulting in the following equation:
$$\bar{W}_k = \frac{\beta W_k + \sigma}{\sum_{k=1}^{K} W_k + K\sigma} \tag{8}$$
where σ is a small regularization term to ensure that the input corresponds to the output, with a value of 0.1, and β is a correction factor with a value of 0.85.
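The weight computation can be sketched as follows. This is an illustrative implementation rather than the authors' code: the saliency-style term only approximates the neighborhood sum of Equation (5) with box and Gaussian filtering, while the saturation weight and the normalization of Equation (8) follow the definitions above with β = 0.85 and σ = 0.1.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, laplace, uniform_filter

def fusion_weights(img):
    """Aggregated weight W_k = W_L + W_S + W_Sat for one pre-processed float RGB image in [0, 1]."""
    lum = img.mean(axis=-1)                       # luminance / background light proxy L_k
    w_lap = np.abs(laplace(lum))                  # Laplacian contrast weight W_L
    # saliency-style weight: distance between a 10x10 local mean (cf. Eq. (5)) and a blurred image
    local = np.stack([uniform_filter(img[..., c], size=10) for c in range(3)], axis=-1)
    w_sal = np.linalg.norm(local - gaussian_filter(img, sigma=(5, 5, 0)), axis=-1)
    # saturation weight of Eqs. (6)-(7)
    delta = ((img - lum[..., None]) ** 2).sum(axis=-1)
    w_sat = np.sqrt(delta / 3.0)
    return w_lap + w_sal + w_sat

def normalize_weights(weights, beta=0.85, sigma=0.1):
    """Per-pixel normalization of Eq. (8) across the K aggregated weight maps."""
    total = np.sum(weights, axis=0)
    return [(beta * w + sigma) / (total + len(weights) * sigma) for w in weights]
```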
The weights obtained during the normalization process can be used to reconstruct the image $R(x)$ through weighted fusion of the inputs at each pixel position $x$:
$$R(x) = \sum_{k=1}^{K} \bar{W}_k(x)\, I_k(x) \tag{9}$$
where $I_k$ represents the input. In order to achieve a good image fusion effect, all the aforementioned operations need unified image dimensions. Therefore, in the implementation process, image pyramids are used to sample the images based on their resolutions in order to obtain the same dimensions. Following a traditional multi-scale fusion strategy [23], each input $I_k$'s features are decomposed into a Laplacian pyramid [24], while the corresponding weights are decomposed into a Gaussian pyramid. To ensure the uniformity of the feature dimensions, these two pyramid structures should have the same numbers of levels. To obtain the final weight information, the Laplacian inputs and Gaussian pyramid weights are fused, as calculated below:
$$R_l(x) = \sum_{k=1}^{K} G_l\big\{ \bar{W}_k(x) \big\}\, L_l\big\{ I_k(x) \big\} \tag{10}$$
where $l$ represents the pyramid sampling level, and $k$ indexes the $K$ input images.
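Building on the weight sketch above, a compact sketch of the fusion in Equation (10) using OpenCV pyramids is given below; the number of pyramid levels and the handling of the coarsest residual level are our assumptions rather than values stated in the text.

```python
import cv2
import numpy as np

def gaussian_pyramid(img, levels):
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    gauss = gaussian_pyramid(img, levels)
    lap = [gauss[i] - cv2.pyrUp(gauss[i + 1], dstsize=(gauss[i].shape[1], gauss[i].shape[0]))
           for i in range(levels - 1)]
    lap.append(gauss[-1])                       # keep the coarsest Gaussian level as the residual
    return lap

def multiscale_fusion(inputs, weights, levels=5):
    """Eq. (10): fuse Laplacian pyramids of the inputs with Gaussian pyramids of their weights."""
    fused = None
    for img, w in zip(inputs, weights):
        lap = laplacian_pyramid(img.astype(np.float32), levels)
        gw = gaussian_pyramid(w.astype(np.float32), levels)
        contrib = [la * g[..., None] for la, g in zip(lap, gw)]
        fused = contrib if fused is None else [f + c for f, c in zip(fused, contrib)]
    out = fused[-1]                             # collapse the fused pyramid from coarse to fine
    for i in range(levels - 2, -1, -1):
        out = cv2.pyrUp(out, dstsize=(fused[i].shape[1], fused[i].shape[0])) + fused[i]
    return np.clip(out, 0.0, 1.0)

# usage sketch: result = multiscale_fusion(inputs, normalize_weights([fusion_weights(i) for i in inputs]))
```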

3.2. Generative Adversarial Learning Principles

GANs [4] are based on the concept of a “game”, and utilize an adversarial process between a generator and a discriminator to estimate a generative model [25]. The process is similar to a scenario where a police officer tries to distinguish counterfeit money from genuine currency; the generator can be considered as the team producing counterfeit money and attempting to use it without detection, while the discriminator acts as the police officer, aiming to verify the authenticity of the currency. The training process of adversarial learning is akin to that of a competitive game where both sides improve their technique until counterfeit and genuine items become indistinguishable. Figure 3 illustrates the structure of a GAN.
In adversarial learning, the data generated by the generator and the real data are input into the discriminator, which makes a judgment based on the features of the two inputs. The loss is then fed back to both models during label verification. Adversarial learning requires training both models simultaneously: the generator G, which aims to fit the training data distribution, and the discriminator D, which estimates the probability that a sample comes from the training data rather than the generator. In the function space of G and D, the ideal scenario is when the discriminator D assigns a probability of 0.5 to both real and generated samples, thus achieving a game equilibrium. The generator and discriminator are composed of multilayer perceptrons, enabling the training of the entire system model using gradient-based back-propagation. During sample training or generation, there is no need for approximate inference networks or Markov chains; instead, the quality of network training is evaluated through a qualitative and quantitative assessment of the generated samples.
Figure 4 shows the process of data distribution changes during the training of a GAN. Training involves updating the discriminator distribution (represented by the blue dashed line) to differentiate between samples from the real data distribution (black dashed line) p x and samples from the generated distribution p g (green solid line). The lower line z represents random noise initially distributed uniformly, while the upper line represents real data. The upward arrows represent the mapping of uniformly distributed random noise z through the generator to the real data x. During the adversarial process, the real and the generated data can be easily distinguished by the discriminator D through an activation function (blue dashed line). After multiple iterations, the distribution change from (a) to (d) occurs, and the two distributions become almost identical. At this point, the discriminator cannot differentiate between the two distributions, and the output becomes a straight line with a value of 0.5, achieving an adversarial balance. Neither the generator nor the discriminator can further improve their accuracy.

3.3. Design of Adversarial Learning Loss Function

The use of DL for UIE can be considered as a style transfer process from the original domain to the target domain. Style transfer between images is a type of visual and graphic reconstruction problem, with the objective of learning a feature mapping relationship from the training set such that the input and target images have the same content style. In recent years, GANs have excelled in the field of image style transfer, with numerous representative style transfer algorithms emerging. In this paper, improvements are made based on style transfer algorithms, including the loss function and network structure.
The overall process diagram of this paper’s algorithm is shown in Figure 5. Recent algorithms used for this purpose, such as CycleGAN [7], UGAN [8], or FUGAN [9], consider only pixel-level cycle loss or unidirectional high-dimensional semantic loss. CycleGAN and VGG-19 provide good inspiration, as they take into account high-dimensional image feature information for the cycle loss function. The main difference in the proposed model is the definition of a cycle loss function based on high-dimensional semantic information, which includes both content- and pixel-level cycle losses. In the proposed function, high-dimensional semantic information is extracted from the original image using VGG-19’s feature extraction module. However, in our opinion, L2 loss is too susceptible to outliers and may result in feature blurring during training with high-dimensional semantic loss. Therefore, in this paper, an L1 loss is employed for high-dimensional semantic information, as follows:
$$L_{con}(G) = \mathbb{E}_{x \sim p_{data}(x)} \big\| \Phi(x) - \Phi(F(G(x))) \big\|_1 + \mathbb{E}_{y \sim p_{data}(y)} \big\| \Phi(y) - \Phi(G(F(y))) \big\|_1 \tag{11}$$
where $\Phi$ represents the feature extraction operation performed by the VGG-19 network, and $G$ and $F$ are the generator mappings from the original domain to the target domain and from the target domain back to the original domain, respectively. The form of the adversarial loss in this paper is as follows:
$$L_{adv}(G, D, X, Y) = \mathbb{E}_{y \sim p_{data}(y)} \big[ \log D(y) \big] + \mathbb{E}_{x \sim p_{data}(x)} \big[ \log\big(1 - D(G(x))\big) \big] \tag{12}$$
In addition to the cycle loss with high-dimensional semantics, in order to better calculate the network weights and generate multidimensional feature information in images, pixel-level loss functions are also taken into account in this paper. These include pixel-level cycle loss and content loss, which are formulated as follows:
$$L_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)} \big\| F(G(x)) - x \big\|_1 + \mathbb{E}_{y \sim p_{data}(y)} \big\| G(F(y)) - y \big\|_1 \tag{13}$$
$$L_{pixel} = \mathbb{E}_{x \sim p_{data}(x)} \big\| Y - G(x) \big\|_1 \tag{14}$$
Finally, the loss function for the adversarial learning generator is calculated as the weighted sum of the above loss functions:
$$L_G = L_{adv} + \lambda_1 L_{cyc} + \lambda_2 L_{pixel} + \lambda_3 L_{con} \tag{15}$$
where the weight coefficients $\lambda_1$, $\lambda_2$, and $\lambda_3$ are normalized, with $\lambda_1 = 0.25$, $\lambda_2 = 0.5$, and $\lambda_3 = 0.1$.
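A minimal PyTorch sketch of the composite generator objective of Equation (15) is shown below. The λ values follow the text and the semantic terms use a frozen VGG-19 feature extractor; the chosen VGG layer cut, the non-saturating binary cross-entropy form of the adversarial term, and all function and variable names (G, F_gen, D_y) are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F_nn
from torchvision.models import vgg19

# frozen VGG-19 feature extractor Phi (torchvision >= 0.13 weight API; the layer cut is an assumption)
_vgg = vgg19(weights="IMAGENET1K_V1").features[:26].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def phi(x):
    # inputs are assumed to already be scaled/normalized appropriately for VGG-19
    return _vgg(x)

def generator_loss(G, F_gen, D_y, x, y, lam_cyc=0.25, lam_pixel=0.5, lam_con=0.1):
    fake_y = G(x)                    # original domain -> target domain
    rec_x = F_gen(fake_y)            # cycled back to the original domain
    cyc_y = G(F_gen(y))              # reverse cycle

    # adversarial term: push D_y(G(x)) towards the "real" label (non-saturating BCE form)
    pred = D_y(fake_y)
    l_adv = F_nn.binary_cross_entropy(pred, torch.ones_like(pred))

    # pixel-level cycle loss (Eq. (13)) and pixel loss (Eq. (14))
    l_cyc = F_nn.l1_loss(rec_x, x) + F_nn.l1_loss(cyc_y, y)
    l_pixel = F_nn.l1_loss(fake_y, y)

    # high-dimensional semantic cyclic loss through the VGG-19 features (Eq. (11))
    l_con = F_nn.l1_loss(phi(rec_x), phi(x)) + F_nn.l1_loss(phi(cyc_y), phi(y))

    # weighted sum of Eq. (15)
    return l_adv + lam_cyc * l_cyc + lam_pixel * l_pixel + lam_con * l_con
```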
The discriminator loss mainly comes from two parts, namely the real and the generated data. The overall loss function is the weighted average of the two, calculated as follows:
$$L_D = \mathbb{E}_{x \sim p_{data}(x)} \big\| D(x) - V(x) \big\|_1 + \mathbb{E}_{y \sim p_{data}(y)} \big\| D(G(y)) - F(y) \big\|_1 \tag{16}$$
where $V(x)$ represents the labels of the real data, defined as a tensor of the same size as the input image, with all values equal to 1. Similarly, $F(x)$ represents the labels of the fake data, defined as a tensor of the same size as the input image, with all values equal to 0. When the data pass through the discriminator, the latter outputs a value in the range $[0, 1]$, which indicates the probability of the data being real.
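The discriminator objective of Equation (16) can be sketched in the same way; the equal 0.5 weighting of the real and fake terms is an assumption based on the "weighted average" wording above.

```python
import torch
import torch.nn.functional as F_nn

def discriminator_loss(D, real, fake):
    """Eq. (16): L1 distance between D's outputs and all-ones (real) / all-zeros (fake) label tensors."""
    pred_real = D(real)
    pred_fake = D(fake.detach())          # do not back-propagate into the generator here
    l_real = F_nn.l1_loss(pred_real, torch.ones_like(pred_real))
    l_fake = F_nn.l1_loss(pred_fake, torch.zeros_like(pred_fake))
    return 0.5 * (l_real + l_fake)        # equal-weight average of the two terms
```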

3.4. Adversarial Learning Network Architecture

The generator network, shown in Figure 6, plays a crucial role in ensuring the quality of the generated images. In this study, some improvements are made to the original generator network structure. On one hand, to ensure that the training feature accuracy encompasses comprehensive texture details, the ResNet network structure is utilized for training, where the upsampling layer is replaced with PixelShuffle [26]. On the other hand, considering the negative samples’ influence on gradient updates, LeakyReLU is used as the activation function, which improves the training performance in negative gradients. The generator network structure includes an initialization layer, a ResNet network, upsampling and downsampling layers, and a final output layer.
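The building blocks described here could look as follows in PyTorch; the channel widths, the use of instance normalization, and the LeakyReLU slope of 0.2 are our assumptions, since the text only specifies ResNet blocks, PixelShuffle upsampling, and LeakyReLU activations.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block for the generator body."""
    def __init__(self, channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
            nn.LeakyReLU(0.2, inplace=True),      # LeakyReLU keeps a gradient for negative responses
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.body(x)

def upsample_block(in_ch, out_ch, scale=2):
    """Upsampling via PixelShuffle instead of a transposed convolution."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch * scale ** 2, kernel_size=3, padding=1),
        nn.PixelShuffle(scale),
        nn.LeakyReLU(0.2, inplace=True),
    )
```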
The discriminator network structure is shown in Figure 7; it is the original architecture proposed in [7], which includes four discriminator modules and an output layer. In adversarial learning, the discriminator is mainly responsible for distinguishing between real and generated images. During training, especially in the early stages when the generator's capability is low, the discriminator can easily distinguish real from generated images, and its loss remains small. The training process is shown in Algorithm 1.
Algorithm 1: Adversarial learning model training algorithm
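Algorithm 1 describes the adversarial training procedure; a minimal sketch of the alternating generator/discriminator updates it implies is given below, reusing the generator_loss and discriminator_loss sketches above. The optimizer settings follow Section 4.1, while the data loader and device handling are assumed.

```python
import itertools
import torch

def train(G, F_gen, D_x, D_y, loader, epochs=200, lr=2e-4, device="cuda"):
    """One possible realization of the alternating updates in Algorithm 1 (models already on device)."""
    opt_g = torch.optim.Adam(itertools.chain(G.parameters(), F_gen.parameters()),
                             lr=lr, betas=(0.5, 0.999))
    opt_d = torch.optim.Adam(itertools.chain(D_x.parameters(), D_y.parameters()),
                             lr=lr, betas=(0.5, 0.999))
    for epoch in range(epochs):
        for x, y in loader:                       # x: pre-processed underwater image, y: reference image
            x, y = x.to(device), y.to(device)
            # 1) update the generators with the composite loss of Eq. (15)
            opt_g.zero_grad()
            generator_loss(G, F_gen, D_y, x, y).backward()
            opt_g.step()
            # 2) update the discriminators with the L1 label loss of Eq. (16)
            opt_d.zero_grad()
            loss_d = discriminator_loss(D_y, y, G(x)) + discriminator_loss(D_x, x, F_gen(y))
            loss_d.backward()
            opt_d.step()
```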

4. Overall Experimental Results and Comparative Analysis

As mentioned earlier, the comprehensive evaluation of underwater image quality should consider factors such as colorfulness, saturation, and contrast, which reflect the image’s quality. Common evaluation metrics include the underwater color image quality evaluation (UCIQE) metric [27], the underwater image quality measure (UIQM) [28], and entropy. To comprehensively evaluate the performance of the trained DL models and demonstrate the superiority of the proposed algorithm, the UIEB [2] and EUVP [9] datasets were used; the latter contains two sub-datasets corresponding to different scenes. In the UIEB dataset, which contained reference images, the structural similarity (SSIM) and peak signal-to-noise ratio (PSNR) were used as evaluation metrics to assess the quality of the generated images. The compared algorithms include four DL algorithms, namely CycleGAN [7], FUGAN [9], Shallow-Uwnet [29], and MDNet [16], as well as three non-DL algorithms, which were BLOT [30], CB [31], and RGHS [32].

4.1. Design and Testing of Adversarial Learning Model Training

For the experiments, the adversarial learning network structure was first designed in Spyder software and the logic of the code implementation was initially verified. To improve efficiency, all code models were trained on the GPU. The specific experimental platform parameters were as follows: the processor was an Intel Xeon E5-2640v4 with four GTX1080Ti graphics cards. The operating system was Ubuntu 16.04 and the system RAM was 32 GB at 2400 MHz. To achieve sufficient training accuracy, the number of training iterations (epochs) was set to 200. A large batch size would cause significant fluctuations in the training process loss, making it difficult to observe the parameter updates during training. Additionally, a large batch size may lead to weak feature learning from the batch. Therefore, we set the batch size to 1. The ADAM network optimizer was also used, with a learning rate of 0.0002, and the coefficient range for calculating running averages and squared gradients was ( 0.5 , 0.999 ) . Learning rate decay occurred after 100 iterations. The image transformation size was set to 256 × 256 with three channels. The input data were normalized before training.
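The stated settings translate into roughly the following PyTorch configuration; the normalization constants and the linear form of the learning-rate decay after epoch 100 are assumptions, as the text only states that decay begins at that point.

```python
import torch
from torchvision import transforms

# 256 x 256, three-channel, normalized inputs (normalization constants are assumed)
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5)),
])

def make_optimizer_and_scheduler(params, epochs=200, decay_start=100):
    """ADAM with lr 0.0002 and betas (0.5, 0.999); the rate is held constant and then decayed linearly."""
    opt = torch.optim.Adam(params, lr=2e-4, betas=(0.5, 0.999))
    lam = lambda e: 1.0 if e < decay_start else 1.0 - (e - decay_start) / float(epochs - decay_start)
    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lam)
    return opt, sched
```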
To verify that the proposed loss function logic met the training requirements, the overall loss function and the content loss of the high-dimensional semantic information ($L_{con}$) were observed during the training process. Figure 8 shows the convergence of the loss functions during the training of the adversarial learning model.
From Figure 8, it can be seen that the loss of high-dimensional semantic information during the training process initially fluctuated significantly but stabilized and decreased to a stable value as the number of iterations increased. This indicates that the designed loss function demonstrates good convergence during model training. In addition, the overall generation loss followed a similar trend. The process transitions from oscillation to stability and eventually convergence, and is therefore suitable for model training.

4.2. Evaluation Metrics

The clarity and color richness of underwater images are highly correlated. The UCIQE is a commonly used underwater image quality metric that quantifies the color, saturation, and contrast aspects of underwater optical images. The UCIQE avoids excessive reliance on saturation deviation because an excessive emphasis on saturation tends to highlight dark areas, which are often present in underwater images due to low lighting conditions. Overcompensating in these areas can lead to evaluations where a high index value does not reflect the perceived image quality.
Compared to other color image evaluation metrics, UCIQE encompasses quantified degradation metrics of underwater images through three quality quantization indicators: chromaticity, contrast, and saturation. UCIQE has higher correlation with the prediction and perceived quality of underwater images, and it can be calculated using an efficient real-time algorithm with low complexity.
UCIQE is used to evaluate the quality of underwater images in terms of color chromaticity, saturation, and contrast. However, a single evaluation metric cannot comprehensively reflect the quality of underwater images. To objectively evaluate whether the output image is in line with human perception, the UIQM evaluation metric was also used to evaluate the performance of the proposed model [28]. The UIQM is a quantification metric based on the human visual system, which is highly correlated with human visual perception and the perceived quality of underwater images. The UIQM takes into account three dimensions of underwater image characteristics, namely the Underwater Image Colorfulness Measure (UICM), the Underwater Image Sharpness Measure (UISM), and the Underwater Image Contrast Measure (UIConM).
The UIQM is a linear combination of UICM, UISM, and UIConM:
$$UIQM = c_1 \cdot UICM + c_2 \cdot UISM + c_3 \cdot UIConM \tag{17}$$
It is worth noting that the UIQM in Equation (17) has three parameters, $c_1$, $c_2$, and $c_3$. The selection of these parameters is application-dependent. For general applications, the combination coefficients are obtained using multiple linear regression. For the results shown in this paper, the generic coefficient set $c_1 = 0.0282$, $c_2 = 0.2953$, and $c_3 = 3.5753$ was used for the calculation of UIQM.
Information entropy represents the amount of information in an image, and its expression is as follows:
$$H = -\sum_{i=0}^{255} P_i \log P_i \tag{18}$$
where $P_i$ represents the proportion of pixels with a grayscale value of $i$ in the total pixels.
In order to comprehensively evaluate the quality of images generated through DL, it is necessary to evaluate their color feature composition in an unreferenced manner, and also to compare the texture features of the generated images to those of the reference images. Therefore, two additional evaluation metrics were adopted: the SSIM and the PSNR. The SSIM is used to measure the structural similarity between two images and is calculated as follows:
$$SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \tag{19}$$
where $\mu$ represents the mean, $\sigma$ represents the standard deviation, $\sigma_{xy}$ represents the covariance of $x$ and $y$, and $c_1$ and $c_2$ are constants that ensure stability. SSIM evaluates the similarity of two images from the perspectives of brightness, contrast, and structure.
The PSNR is commonly used to measure the reconstruction quality of an image. It is based on the Mean Squared Error (MSE) between the generated image x and the ground truth image y, and its equation is as follows:
$$PSNR = 10 \log_{10} \frac{MAX_I^2}{MSE} \tag{20}$$
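For reference, the metrics of Equations (17), (18), and (20) can be computed as in the following sketch; UICM, UISM, and UIConM are assumed to be provided by separate routines, and the base-2 logarithm in the entropy is our choice since the text does not specify the base.

```python
import numpy as np

def uiqm(uicm, uism, uiconm, c1=0.0282, c2=0.2953, c3=3.5753):
    """Eq. (17): linear combination with the generic coefficient set given in the text."""
    return c1 * uicm + c2 * uism + c3 * uiconm

def entropy(gray_u8):
    """Eq. (18): Shannon entropy of an 8-bit grayscale image (base-2 logarithm)."""
    hist = np.bincount(gray_u8.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def psnr(x, y, max_i=255.0):
    """Eq. (20): peak signal-to-noise ratio from the MSE between image x and reference y."""
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(max_i ** 2 / mse))
```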

4.3. Generalization Comparison Experiment

In the first experiment, the generalization ability of the trained adversarial learning model was tested and its enhancement performance was evaluated across multiple classes of scenes. Experiments were conducted on the validation set of the IMAGENET sub-dataset in EUVP, which contains various underwater image features, including color distortion and different object characteristics. Randomly selected enhanced images from the experiments are shown in Figure 9, while Table 1 shows the average metric values for each of the compared algorithms on the respective sub-datasets.
Next, subjective and objective experiments were conducted on the SCENES sub-dataset of EUVP. The comparison of enhancement results and metrics is presented in Table 2 and Figure 10, while Table 3 and Table 4, and Figure 11 are the corresponding results from the UIEB RAW dataset.
Observing the enhancement results from the three datasets, it is evident that, compared to the CB algorithm, the algorithm in this paper exhibits better contrast and color saturation, resulting in an overall brighter image. Compared to the BLOT algorithm, the proposed algorithm performed better in color bias reduction, overexposure avoidance, and blurriness minimization. Compared to the CycleGAN algorithm, the proposed algorithm also shows superior results in residual color bias and overall enhancement; while the FUGAN enhancement is effective, it tends to overcorrect colors, leading to other color bias issues in the image. Compared to RGHS, which has the best performance among the various control groups, the proposed algorithm exhibited lesser residual color bias. In summary, the images enhanced by the proposed algorithm generally demonstrated optimal color bias correction, color saturation, brightness, and contrast, and relatively low noise levels.
From Table 1, Table 2, Table 3 and Table 4, it can be observed that, on the IMAGENET dataset, the proposed algorithm achieved the highest average values for the UIQM and entropy metrics. In terms of subjective and objective evaluations, the proposed algorithm performed best on this sub-dataset. In the SCENES dataset, the proposed algorithm also yielded the highest average values for the UIQM and entropy metrics, indicating its superior overall enhancement performance. In the UIEB RAW dataset, since reference images were available, the two additional metrics SSIM and PSNR were included. The RGHS and CB algorithms achieved the highest average PSNR and SSIM values, respectively, while the proposed algorithm obtained the highest average values for the UIQM and entropy metrics. Therefore, on this dataset, each algorithm has its own strengths, with the proposed algorithm remaining competitive on the reference-based metrics.
Considering the three datasets collectively, the proposed algorithm performed optimally on two sub-datasets and demonstrated its advantages in the other sub-dataset compared to the other algorithms. Thus, the proposed algorithm exhibits good generalization ability across both synthetic and real-world datasets.

4.4. Harsh Underwater Environment Comparison Experiment

The generalization ability of the proposed algorithm shows its versatility in enhancing underwater images in general scenes. To evaluate the algorithm’s superior performance in imaging under harsh underwater environments, a comparative experiment on the CHALLENGING (CHAL) sub-dataset from the UIEB dataset was conducted.
From Table 5, it can be seen that, compared to most of the other algorithms, the proposed algorithm exhibited an improvement in the UCIQE metric, and it surpassed all of them by a clear margin in terms of UIQM and entropy values. However, relying solely on objective evaluation metrics has certain limitations. As shown in Figure 12, although RGHS achieves a high UCIQE score, its enhancement effect is not as good as that of the proposed algorithm in certain challenging underwater environments. RGHS exhibited issues such as excessive color correction, residual color distortion, and poor exposure in dark areas. In contrast, the proposed algorithm performed better in terms of contrast, color saturation, and clarity in the majority of image enhancements, aligning more with human perception and producing images of better quality. Thus, it can be concluded that the proposed algorithm outperforms the other algorithms in terms of enhancement effectiveness in harsh underwater environments, both subjectively and objectively.

4.5. Ablation Experiment

In this paper, the white balance and the multi-scale fusion algorithms were adopted for image pre-processing. In order to verify the roles of the pre-processing module in the entire network model, it is necessary to conduct ablation experiments.
The ablation experiments were designed to exclude one or both of the pre-processing modules from the network model, and then evaluate the performance of the modified network model on a benchmark dataset. By comparing the evaluation results of the modified network model with those of the original network model, the effectiveness of the pre-processing module and the DL module can be quantitatively assessed.
The experiments were conducted using the UIEB RAW dataset. The experimental environment and parameter settings were consistent with those described in the previous sections. The experimental results are shown in Figure 13 and Table 6.
According to the results of the ablation experiments, it can be concluded that incorporating the pre-processing modules leads to better overall performance. First, the addition of the pre-processing modules has a significant effect on improving image quality. By applying the white balance and multi-scale fusion algorithms, color balance and detail richness are improved. Compared to the absence of pre-processing, the images with pre-processing are clearer, more natural, and reproduce colors more accurately. Second, the introduction of the pre-processing modules also has a positive impact on the model’s prediction accuracy. By optimizing the images, the model is less disturbed during feature extraction and prediction, thereby improving prediction accuracy. Compared to the absence of pre-processing, the model with pre-processing achieved improvements in all evaluation metrics. Finally, comparing the results of the ablation experiments, it can be inferred that the pre-processing modules have a substantial impact on the performance of the entire system. In this experiment, the white balance and multi-scale fusion algorithms were shown to be two important pre-processing modules, and their presence greatly improved the performance of the model in image processing tasks. In addition, the incorporation of the DL module also made a positive contribution to the model’s performance, demonstrating the importance of DL in image feature extraction and prediction tasks.
To this end, the results of the ablation experiments indicate that the incorporation of pre-processing modules is crucial for improving the performance of image processing systems. The application of white balance and multi-scale fusion algorithms can significantly improve image quality and prediction accuracy, while the incorporation of DL modules further enhances the model’s ability. These experimental results provide valuable guidance for the selection and optimization of pre-processing algorithms for similar image processing tasks.

4.6. Multidimensional Comprehensive Evaluation

To comprehensively evaluate the algorithm's performance, after verifying its superiority across multiple datasets using subjective and objective evaluations, comparative experiments were conducted on feature detection in practical scenarios. First, feature detection was performed using the SIFT [33] algorithm; the number of features it detects is positively correlated with image properties such as contrast and color saturation. In this experiment, feature detection is facilitated when the image textures are clearer, as this results in correspondingly clear transformation results. Therefore, the number of SIFT key-points is a quantitative indicator reflecting the richness of the features in an image. Seven underwater images, named Scene 1 to Scene 7 and obtained from harsh environments, as presented in Figure 14 and described in Table 7, were used to demonstrate the key-points detected after the application of the different algorithms.
From Table 7, it can be observed that the proposed algorithm ranked second in terms of the number of detections in two scenes, and first in the remaining five scenes as well as in the overall count. Moreover, the total number of detected key-points clearly exceeded that of the other compared algorithms. In harsh underwater imaging environments, the proposed algorithm significantly enhances the image quality, resulting in approximately a 4.46-fold increase in the number of detected key-points compared to the original images. SIFT is commonly used in feature matching scenarios.
Next, the RANSAC algorithm was used to filter feature points for image pairs with the same features and then SIFT feature matching was performed. Figure 15 shows the SIFT feature matching results for a pair of original underwater images and a pair of images enhanced using the proposed algorithm. From the feature matching comparison, it can be seen that the original images had 102 matches, while the enhanced images had 153 matches. Therefore, the proposed algorithm achieved a significant enhancement of the features in the underwater images, which is a key step for underwater imaging applications.
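A sketch of this evaluation pipeline with OpenCV is given below; Lowe's ratio test threshold and the RANSAC reprojection threshold are our assumptions, as the text does not report the exact matching parameters.

```python
import cv2
import numpy as np

def sift_match_count(img1, img2, ratio=0.75, ransac_thresh=5.0):
    """Count SIFT matches between two BGR images that survive the ratio test and RANSAC filtering."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY), None)
    k2, d2 = sift.detectAndCompute(cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY), None)
    good = []
    for pair in cv2.BFMatcher(cv2.NORM_L2).knnMatch(d1, d2, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    if len(good) < 4:
        return 0
    src = np.float32([k1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    _, mask = cv2.findHomography(src, dst, cv2.RANSAC, ransac_thresh)
    return int(mask.sum()) if mask is not None else 0
```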

5. Conclusions

This paper focused on the enhancement of images obtained in harsh underwater environments. Due to the limited color correction information contained in the reference images of different datasets, the feature learning ability of style transfer algorithms is inhibited in UIE applications. To address this, traditional image processing methods were introduced to pre-process the training data and reduce the significant color distortion of features in the underwater image dataset, thus enhancing the algorithm's scene transfer learning ability. Additionally, an adversarial learning algorithm was proposed with a high-dimensional semantic cycle loss to tackle issues such as blurriness, color distortion, and brightness artifacts in harsh underwater environments. Based on this algorithm, corresponding DL network structures were designed, which improved the algorithm's generalization ability and feature restoration accuracy across multiple scene categories during training. As evidenced by the multidimensional experiments presented, the proposed algorithm exhibits excellent performance in both subjective and objective evaluation tests, as well as in practical visual applications.

Author Contributions

Methodology and writing—original draft preparation, M.Z.; writing—review and editing, Y.L.; supervision, W.Y.; All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62373246, Grant 62203299, and Grant 62388101; in part by the Research Fund of Henan Key Laboratory of Underwater Intelligent Equipment under Grant KL02B2301; in part by the Oceanic Interdisciplinary Program of Shanghai Jiao Tong University under Grant SL2023MS007 and Grant SL2022MS008; and in part by the STI2030-Major Projects under Grant 2022ZD0213100.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author, [[email protected]], upon reasonable request.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Sun, S.; Wang, H.; Zhang, H.; Li, M.; Xiang, M.; Luo, C.; Ren, P. Underwater Image Enhancement With Reinforcement Learning. IEEE J. Ocean. Eng. 2024, 49, 249–261. [Google Scholar] [CrossRef]
  2. Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An Underwater Image Enhancement Benchmark Dataset and Beyond. IEEE Trans. Image Process. 2020, 29, 4376–4389. [Google Scholar] [CrossRef]
  3. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556. [Google Scholar]
  4. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  5. Karras, T.; Aittala, M.; Laine, S.; Härkönen, E.; Hellsten, J.; Lehtinen, J.; Aila, T. Alias-Free Generative Adversarial Networks. In Proceedings of the Neural Information Processing Systems (NeurIPS), Online, 6–14 December 2021; pp. 852–863. [Google Scholar]
  6. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
  7. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
  8. Fabbri, C.; Islam, M.J.; Sattar, J. Enhancing Underwater Imagery Using Generative Adversarial Networks. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 7159–7165. [Google Scholar] [CrossRef]
  9. Islam, M.J.; Xia, Y.; Sattar, J. Fast Underwater Image Enhancement for Improved Visual Perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234. [Google Scholar] [CrossRef]
  10. Hambarde, P.; Murala, S.; Dhall, A. UW-GAN: Single-Image Depth Estimation and Image Enhancement for Underwater Images. IEEE Trans. Instrum. Meas. 2021, 70, 5018412. [Google Scholar] [CrossRef]
  11. Liu, R.; Jiang, Z.; Yang, S.; Fan, X. Twin Adversarial Contrastive Learning for Underwater Image Enhancement and Beyond. IEEE Trans. Image Process. 2022, 31, 4922–4936. [Google Scholar] [CrossRef]
  12. Wu, S.; Luo, T.; Jiang, G.; Yu, M.; Xu, H.; Zhu, Z.; Song, Y. A Two-Stage Underwater Enhancement Network Based on Structure Decomposition and Characteristics of Underwater Imaging. IEEE J. Ocean. Eng. 2021, 46, 1213–1227. [Google Scholar] [CrossRef]
  13. Qi, Q.; Li, K.; Zheng, H.; Gao, X.; Hou, G.; Sun, K. SGUIE-Net: Semantic Attention Guided Underwater Image Enhancement with Multi-Scale Perception. IEEE Trans. Image Process. 2022, 31, 6816–6830. [Google Scholar] [CrossRef]
  14. Panetta, K.; Kezebou, L.; Oludare, V.; Agaian, S. Comprehensive Underwater Object Tracking Benchmark Dataset and Underwater Image Enhancement With GAN. IEEE J. Ocean. Eng. 2022, 47, 59–75. [Google Scholar] [CrossRef]
  15. Zhou, Y.; Yan, K.; Li, X. Underwater Image Enhancement via Physical-Feedback Adversarial Transfer Learning. IEEE J. Ocean. Eng. 2022, 47, 76–87. [Google Scholar] [CrossRef]
  16. Zhang, S.; Zhao, S.; An, D.; Li, D.; Zhao, R. MDNet: A Fusion Generative Adversarial Network for Underwater Image Enhancement. J. Mar. Sci. Eng. 2023, 11, 1183. [Google Scholar] [CrossRef]
  17. Guan, F.; Lu, S.; Lai, H.; Du, X. AUIE–GAN: Adaptive Underwater Image Enhancement Based on Generative Adversarial Networks. J. Mar. Sci. Eng. 2023, 11, 1476. [Google Scholar] [CrossRef]
  18. Lin, J.C.; Hsu, C.B.; Lee, J.C.; Chen, C.H.; Tu, T.M. Dilated generative adversarial networks for underwater image restoration. J. Mar. Sci. Eng. 2022, 10, 500. [Google Scholar] [CrossRef]
  19. Yang, P.; He, C.; Luo, S.; Wang, T.; Wu, H. Underwater Image Enhancement via Triple-Branch Dense Block and Generative Adversarial Network. J. Mar. Sci. Eng. 2023, 11, 1124. [Google Scholar] [CrossRef]
  20. Lu, J.; Yuan, F.; Yang, W.; Cheng, E. An Imaging Information Estimation Network for Underwater Image Color Restoration. IEEE J. Ocean. Eng. 2021, 46, 1228–1239. [Google Scholar] [CrossRef]
  21. Wang, Y.; Zhang, J.; Cao, Y.; Wang, Z. A deep CNN method for underwater image enhancement. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 1382–1386. [Google Scholar] [CrossRef]
  22. Yang, H.H.; Huang, K.C.; Chen, W.T. LAFFNet: A Lightweight Adaptive Feature Fusion Network for Underwater Image Enhancement. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 685–692. [Google Scholar] [CrossRef]
  23. Mertens, T.; Kautz, J.; Van Reeth, F. Exposure fusion: A simple and practical alternative to high dynamic range photography. In Computer Graphics Forum; Wiley Online Library: Hoboken, NJ, USA, 2009; pp. 161–171. [Google Scholar]
  24. Burt, P.J.; Adelson, E.H. The Laplacian pyramid as a compact image code. In Readings in Computer Vision; Elsevier: Amsterdam, The Netherlands, 1987; pp. 671–679. [Google Scholar]
  25. Karras, T.; Aittala, M.; Laine, S.; Härkönen, E.; Hellsten, J.; Lehtinen, J.; Aila, T. Alias-Free Generative Adversarial Networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: New York, NY, USA, 2021; Volume 34, pp. 852–863. [Google Scholar]
  26. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883. [Google Scholar] [CrossRef]
  27. Yang, M.; Sowmya, A. An Underwater Color Image Quality Evaluation Metric. IEEE Trans. Image Process. 2015, 24, 6062–6071. [Google Scholar] [CrossRef] [PubMed]
  28. Panetta, K.; Gao, C.; Agaian, S. Human-Visual-System-Inspired Underwater Image Quality Measures. IEEE J. Ocean. Eng. 2016, 41, 541–551. [Google Scholar] [CrossRef]
  29. Naik, A.; Swarnakar, A.; Mittal, K. Shallow-uwnet: Compressed model for underwater image enhancement (student abstract). In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 15853–15854. [Google Scholar]
  30. Song, W.; Wang, Y.; Huang, D.; Liotta, A.; Perra, C. Enhancement of Underwater Images with Statistical Model of Background Light and Optimization of Transmission Map. IEEE Trans. Broadcast. 2020, 66, 153–169. [Google Scholar] [CrossRef]
  31. Ancuti, C.O.; Ancuti, C.; De Vleeschouwer, C.; Bovik, A.C. Single-Scale Fusion: An Effective Approach to Merging Images. IEEE Trans. Image Process. 2017, 26, 65–78. [Google Scholar] [CrossRef]
  32. Huang, D.; Wang, Y.; Song, W.; Sequeira, J.; Mavromatis, S. Shallow-Water Image Enhancement Using Relative Global Histogram Stretching Based on Adaptive Parameter Acquisition. In MultiMedia Modeling; Springer: Cham, Switzerland, 2018; pp. 453–465. [Google Scholar]
  33. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
Figure 1. Underwater degradation image.
Figure 2. Image pre-processing.
Figure 3. Generative adversarial network (GAN) structural diagram.
Figure 4. Change in data distributions during training. Blue dashed line: discriminator distribution. Green solid line: generated distribution. Black dashed line: real data distribution. Figures (ad) depict the trend of data distribution throughout the iterations. As the number of iterations increases, the distribution of the final generated data aligns closely with the real data. This indicates that the discriminator converges to a straight line with an output value of 0.5, signifying the attainment of adversarial equilibrium.
Figure 5. Proposed adversarial learning model framework with cyclic loss for high-dimensional semantic information: content- and pixel-level cycle losses.
Figure 6. Generator network structure, which includes an initialization layer, the ResNet network, upsampling and downsampling layers, and a final output layer.
Figure 7. Discriminator network structure, which includes four discriminator modules and an output layer.
Figure 8. Loss of high-dimensional semantic information and total generation during training.
Figure 9. Comparison of image enhancement algorithms on the IMAGENET dataset.
Figure 10. Comparison of image enhancement algorithms on the SCENES dataset.
Figure 11. Comparison of image enhancement algorithms on the UIEB RAW dataset.
Figure 12. Comparison of UIE algorithms on the UIEB CHAL dataset.
Figure 13. Ablation experiment result on the UIEB RAW dataset.
Figure 14. Comparison of enhanced images and SIFT feature detection.
Figure 15. Comparison of SIFT-based feature matching.
Table 1. Average index value of each algorithm on the IMAGENET dataset.
Evaluation Index | Original Image | CB | BLOT | Cycle-GAN | RGHS | FU-GAN | The Proposed Method
UCIQE (↑) | 0.569 | 0.575 | 0.631 | 0.579 | 0.623 | 0.583 | 0.571
UIQM (↑) | 2.592 | 3.010 | 2.192 | 2.814 | 2.200 | 2.763 | 3.093
Entropy (↑) | 7.438 | 7.612 | 7.340 | 7.542 | 7.322 | 7.580 | 7.655
Table 2. Average index value of each algorithm on the SCENES dataset.
Evaluation Index | Original Image | CB | BLOT | Cycle-GAN | RGHS | FU-GAN | The Proposed Method
UCIQE (↑) | 0.566 | 0.589 | 0.657 | 0.589 | 0.624 | 0.588 | 0.594
UIQM (↑) | 2.466 | 3.067 | 2.246 | 2.785 | 2.155 | 2.767 | 3.066
Entropy (↑) | 7.531 | 7.579 | 7.446 | 7.579 | 7.430 | 7.579 | 7.606
Table 3. Average index value of each algorithm on the UIEB RAW dataset (1).
Evaluation Index | Original Image | CB | BLOT | Cycle-GAN | RGHS | FU-GAN | The Proposed Method
UCIQE (↑) | 0.548 | 0.589 | 0.626 | 0.580 | 0.628 | 0.576 | 0.586
UIQM (↑) | 2.364 | 3.003 | 2.242 | 2.949 | 2.391 | 3.075 | 3.301
Entropy (↑) | 7.273 | 7.481 | 6.971 | 7.489 | 7.509 | 7.521 | 7.614
SSIM (↑) | 0.795 | 0.853 | 0.768 | 0.752 | 0.827 | 0.707 | 0.790
PSNR (↑) | 20.823 | 21.245 | 18.739 | 2.947 | 23.484 | 20.457 | 22.467
Table 4. Average index value of each algorithm on the UIEB RAW dataset (2).
Evaluation Index | Original Image | Shallow-Uwnet | MDNet | The Proposed Method
UIQM (↑) | 2.364 | 3.090 | 2.078 | 3.301
SSIM (↑) | 0.795 | 0.810 | 0.820 | 0.790
PSNR (↑) | 20.823 | 20.780 | 22.270 | 22.467
Table 5. Average index value of each algorithm on the UIEB CHAL dataset.
Evaluation Index | Original Image | CB | BLOT | Cycle-GAN | RGHS | FU-GAN | The Proposed Method
UCIQE (↑) | 0.518 | 0.550 | 0.586 | 0.555 | 0.612 | 0.546 | 0.556
UIQM (↑) | 1.985 | 2.381 | 1.836 | 2.645 | 1.916 | 2.876 | 3.099
Entropy (↑) | 6.934 | 7.195 | 6.854 | 7.219 | 7.325 | 7.215 | 7.447
Table 6. Ablation experiment result on the UIEB RAW dataset.
Pre-Processing Module | DL Module | UCIQE (↑) | UIQM (↑) | Entropy (↑) | SSIM (↑) | PSNR (↑)
✓ | – | 0.574 | 3.111 | 7.384 | 0.847 | 19.935
– | ✓ | 0.579 | 2.984 | 7.476 | 0.754 | 21.119
✓ | ✓ | 0.586 | 3.301 | 7.614 | 0.790 | 22.467
Table 7. Comparison of feature point detection numbers in selected underwater images.
Evaluation Index | Original Image | CB | BLOT | Cycle-GAN | RGHS | FU-GAN | The Proposed Method
Scene 1 | 232 | 549 | 630 | 650 | 725 | 600 | 791
Scene 2 | 16 | 189 | 98 | 70 | 181 | 25 | 391
Scene 3 | 20 | 129 | 65 | 83 | 98 | 60 | 133
Scene 4 | 174 | 313 | 299 | 315 | 449 | 287 | 428
Scene 5 | 64 | 241 | 234 | 235 | 314 | 73 | 516
Scene 6 | 19 | 29 | 45 | 86 | 73 | 88 | 94
Scene 7 | 9 | 8 | 14 | 37 | 29 | 22 | 30
Summary | 534 | 1458 | 1385 | 1476 | 1869 | 1155 | 2383
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
