MEvo-GAN: A Multi-Scale Evolutionary Generative Adversarial Network for Underwater Image Enhancement
Abstract
1. Introduction
2. Related Work
2.1. Underwater Image Enhancement
2.2. Genetic Algorithms with GAN
3. Proposed Method
3.1. Generators and Discriminators
3.2. Genetic Algorithm
Algorithm 1: The MEvo-GAN algorithm
3.3. Loss Functions
- (1) Adversarial loss trains the generator and discriminator against each other: the generator learns to produce realistic target-domain images, while the discriminator learns to distinguish generated images from real ones. A commonly used least-squares form of the generator objective is sketched after this list.
- (2) Cycle consistency loss ensures that an image, after being mapped to the target domain and then mapped back, recovers its original form. This helps the generator learn the mapping between the source and target domains and prevents mode collapse. The loss has two terms, one for the source-to-target direction and one for the reverse.
- (3) Identity consistency loss ensures that the input image retains its own characteristics after passing through the generator, i.e., the input and the generated image remain similar to a certain degree. This reduces information loss during image transformation.
- (4) Perceptual loss is introduced to further improve image quality by reducing detail loss and blur, making the enhanced images more realistic. Because the VGG network is pre-trained on large-scale datasets such as ImageNet, its features align well with human visual perception; a VGG-based loss therefore keeps the generated images perceptually consistent with the real images and improves their subjective quality.
4. Experimental Results and Analysis
4.1. Datasets
4.2. Training Details
4.3. Comparison of Visual Quality of Enhancement
4.4. Multi-Scale Visualization
4.5. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Correction Statement
References
- Liang, Z.; Zhang, W.; Ruan, R.; Zhuang, P.; Xie, X.; Li, C. Underwater image quality improvement via color, detail, and contrast restoration. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 1726–1742.
- He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353.
- Chiang, J.Y.; Chen, Y.C. Underwater image enhancement by wavelength compensation and dehazing. IEEE Trans. Image Process. 2011, 21, 1756–1769.
- Galdran, A.; Pardo, D.; Picón, A.; Alvarez-Gila, A. Automatic red-channel underwater image restoration. J. Vis. Commun. Image Represent. 2015, 26, 132–145.
- Drews, P.L.; Nascimento, E.R.; Botelho, S.S.; Campos, M.F.M. Underwater depth estimation and image restoration based on single images. IEEE Comput. Graph. Appl. 2016, 36, 24–35.
- Peng, Y.T.; Cosman, P.C. Underwater image restoration based on image blurriness and light absorption. IEEE Trans. Image Process. 2017, 26, 1579–1594.
- Hou, G.; Li, N.; Zhuang, P.; Li, K.; Sun, H.; Li, C. Non-uniform illumination underwater image restoration via illumination channel sparsity prior. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 799–814.
- Yao, X.; He, F.; Wang, B. Deep learning-based recurrent neural network for underwater image enhancement. In Proceedings of the Sixth Conference on Frontiers in Optical Imaging and Technology: Imaging Detection and Target Recognition, Nanjing, China, 30 April 2024; Volume 13156, pp. 368–378.
- Zhang, M.; Li, Y.; Yu, W. Underwater Image Enhancement Algorithm Based on Adversarial Training. Electronics 2024, 13, 2184.
- Jiang, X.; Yu, H.; Zhang, Y.; Pan, M.; Li, Z.; Liu, J.; Lv, S. An underwater image enhancement method for a preprocessing framework based on generative adversarial network. Sensors 2023, 23, 5774.
- Guo, Y.; Li, H.; Zhuang, P. Underwater image enhancement using a multiscale dense generative adversarial network. IEEE J. Ocean. Eng. 2019, 45, 862–870.
- Li, J.; Skinner, K.A.; Eustice, R.M.; Johnson-Roberson, M. WaterGAN: Unsupervised generative network to enable real-time color correction of monocular underwater images. IEEE Robot. Autom. Lett. 2017, 3, 387–394.
- Yang, Y.; Lu, H. Single image deraining using a recurrent multi-scale aggregation and enhancement network. In Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 8–12 July 2019; pp. 1378–1383.
- Li, Q.Z.; Bai, W.X.; Niu, J. Underwater image color correction and enhancement based on improved cycle-consistent generative adversarial networks. Acta Autom. Sin. 2023, 49, 820–829.
- Cong, R.; Yang, W.; Zhang, W.; Li, C.; Guo, C.L.; Huang, Q.; Kwong, S. PUGAN: Physical model-guided underwater image enhancement using GAN with dual-discriminators. IEEE Trans. Image Process. 2023, 32, 4472–4485.
- Wang, Z.; Shen, L.; Wang, Z.; Lin, Y.; Jin, Y. Generation-based joint luminance-chrominance learning for underwater image quality assessment. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 1123–1139.
- Li, K.; Fan, H.; Qi, Q.; Yan, C.; Sun, K.; Wu, Q.J. TCTL-Net: Template-free Color Transfer Learning for Self-Attention Driven Underwater Image Enhancement. IEEE Trans. Circuits Syst. Video Technol. 2023.
- Wang, C.; Xu, C.; Yao, X.; Tao, D. Evolutionary Generative Adversarial Networks. IEEE Trans. Evol. Comput. 2019, 23, 921–934.
- Chen, S.; Wang, W.; Xia, B.; You, X.; Peng, Q.; Cao, Z.; Ding, W. CDE-GAN: Cooperative dual evolution-based generative adversarial network. IEEE Trans. Evol. Comput. 2021, 25, 986–1000.
- Mu, J.; Zhou, Y.; Cao, S.; Zhang, Y.; Liu, Z. Enhanced evolutionary generative adversarial networks. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp. 7534–7539.
- He, C.; Huang, S.; Cheng, R.; Tan, K.C.; Jin, Y. Evolutionary multiobjective optimization driven by generative adversarial networks (GANs). IEEE Trans. Cybern. 2020, 51, 3129–3142.
- Zhang, L.; Zhao, L. High-quality face image generation using particle swarm optimization-based generative adversarial networks. Future Gener. Comput. Syst. 2021, 122, 98–104.
- Liu, F.; Wang, H.; Zhang, J.; Fu, Z.; Zhou, A.; Qi, J.; Li, Z. EvoGAN: An evolutionary computation assisted GAN. Neurocomputing 2022, 469, 81–90.
- Xue, Y.; Zhang, Y.; Neri, F. A method based on evolutionary algorithms and channel attention mechanism to enhance cycle generative adversarial network performance for image translation. Int. J. Neural Syst. 2023, 33, 2350026.
- Zhang, Z.; Chen, L.; Zhang, C.; Shi, H.; Li, H. GMA-DRSNs: A novel fault diagnosis method with global multi-attention deep residual shrinkage networks. Measurement 2022, 196, 111203.
- Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.; Wang, Z.; Paul Smolley, S. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2794–2802.
- Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212.
- Nagarajan, V.; Kolter, J.Z. Gradient descent GAN optimization is locally stable. Adv. Neural Inf. Process. Syst. 2017, 30.
- Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An underwater image enhancement benchmark dataset and beyond. IEEE Trans. Image Process. 2019, 29, 4376–4389.
- Yuzhen, L.; Meiyi, L.; Sen, L.; Zhiyong, T. Underwater Image Enhancement Based on Multi-Scale Feature Fusion and Attention Network. J. Comput.-Aided Des. Comput. Graph. 2023, 35, 685–695.
- Islam, M.J.; Luo, P.; Sattar, J. Simultaneous enhancement and super-resolution of underwater imagery for improved visual perception. arXiv 2020, arXiv:2002.01155.
- Saleh, A.; Sheaves, M.; Jerry, D.; Azghadi, M.R. Adaptive uncertainty distribution in deep learning for unsupervised underwater image enhancement. arXiv 2022, arXiv:2212.08983.
- Han, J.; Shoeiby, M.; Malthus, T.; Botha, E.; Anstee, J.; Anwar, S.; Wei, R.; Petersson, L.; Armin, M.A. Single underwater image restoration by contrastive learning. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Brussels, Belgium, 11–16 July 2021; pp. 2385–2388.
- Islam, M.J.; Xia, Y.; Sattar, J. Fast underwater image enhancement for improved visual perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234.
- Naik, A.; Swarnakar, A.; Mittal, K. Shallow-UWnet: Compressed model for underwater image enhancement (student abstract). In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 15853–15854.
- Ren, T.; Xu, H.; Jiang, G.; Yu, M.; Luo, T. Reinforced Swin-Convs transformer for underwater image enhancement. arXiv 2022, arXiv:2205.00434.
- Fabbri, C.; Islam, M.J.; Sattar, J. Enhancing underwater imagery using generative adversarial networks. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 7159–7165.
- Peng, W.; Zhou, C.; Hu, R.; Cao, J.; Liu, Y. RAUNE-Net: A Residual and Attention-Driven Underwater Image Enhancement Method. arXiv 2023, arXiv:2311.00246.

Method | PSNR (UIEBD) | SSIM (UIEBD) | UCIQE (UIEBD) | PSNR (EUVP) | SSIM (EUVP) | UCIQE (EUVP) | PSNR (UFO-120) | SSIM (UFO-120) | UCIQE (UFO-120)
---|---|---|---|---|---|---|---|---|---
UGAN | 19.4947 | 0.7496 | 0.6476 | 19.6102 | 0.8131 | 0.6169 | 23.1764 | 0.7959 | 0.6487 |
WaterNet | 21.1659 | 0.8290 | 0.6414 | 19.4398 | 0.8492 | 0.6628 | 19.6768 | 0.7704 | 0.6373 |
FunieGAN | 16.3028 | 0.7045 | 0.6434 | 20.3005 | 0.7721 | 0.6451 | 23.4593 | 0.7959 | 0.6487 |
CWR | 16.8157 | 0.7451 | 0.5334 | 16.2670 | 0.6820 | 0.6230 | 16.3482 | 0.6120 | 0.6346 |
Shallow-UWnet | 16.9228 | 0.6857 | 0.5457 | 18.9380 | 0.8288 | 0.5367 | 22.2391 | 0.7796 | 0.5682 |
UDnet | 18.3965 | 0.7959 | 0.5509 | 20.0486 | 0.8251 | 0.5594 | 19.4468 | 0.7560 | 0.6206 |
URSCT | 17.8031 | 0.6609 | 0.5432 | 17.1730 | 0.8114 | 0.4231 | 21.3893 | 0.7930 | 0.4314 |
RAUnet | 22.9179 | 0.8148 | 0.6467 | 19.9144 | 0.8092 | 0.5809 | 24.0392 | 0.8224 | 0.5961 |
Ours | 21.2758 | 0.8662 | 0.6597 | 20.0502 | 0.8255 | 0.6727 | 19.4011 | 0.7989 | 0.7001 |

Model | PSNR | SSIM | UCIQE
---|---|---|---
w/o multi-scale network | 19.0107 | 0.7950 | 0.6053
w/o Evo mechanism | 20.0486 | 0.8352 | 0.5694
w/o VGG loss | 20.8675 | 0.8251 | 0.6420
MEvo-GAN | 21.2758 | 0.8662 | 0.6597 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).