Article

AMSMC-UGAN: Adaptive Multi-Scale Multi-Color Space Underwater Image Enhancement with GAN-Physics Fusion

by Dong Chao 1,2,3, Zhenming Li 4, Wenbo Zhu 4,*, Haibing Li 4, Bing Zheng 1,2,3, Zhongbo Zhang 4 and Weijie Fu 4

1 Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519000, China
2 South China Sea Marine Survey Center, Ministry of Natural Resources of the People’s Republic of China, Guangzhou 510300, China
3 Key Laboratory of Marine Environmental Survey Technology and Application, Ministry of Natural Resources of the People’s Republic of China, Guangzhou 510300, China
4 College of Mechanical Engineering and Automation, Foshan University, Foshan 528200, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(10), 1551; https://doi.org/10.3390/math12101551
Submission received: 22 April 2024 / Revised: 10 May 2024 / Accepted: 13 May 2024 / Published: 16 May 2024

Abstract

Underwater vision technology is crucial for marine exploration, aquaculture, and environmental monitoring. However, challenging underwater conditions, including light attenuation, color distortion, reduced contrast, and blurring, make it difficult to acquire high-quality underwater image signals, and current deep learning models and traditional image enhancement techniques only partially address these problems. To overcome these limitations, this study proposes an approach called adaptive multi-scale multi-color space underwater image enhancement with GAN-physics fusion (AMSMC-UGAN). AMSMC-UGAN leverages multiple color spaces (RGB, HSV, and Lab) for feature extraction, compensating for the limitations of RGB in underwater environments and making fuller use of image information. By integrating a membership degree function that selects physical models to guide deep learning, the model's performance is improved across different underwater scenes. In addition, the introduction of a multi-scale feature extraction module deepens the granularity of image information, learns the degradation distributions of different kinds of image information within the same image content more comprehensively, and thus provides richer guidance for image enhancement. AMSMC-UGAN achieved maximum scores of 26.04 dB, 0.87, and 3.2004 for the PSNR, SSIM, and UIQM metrics, respectively, on real and synthetic underwater image datasets, corresponding to gains of at least 6.5%, 6%, and 1% for these metrics. Empirical evaluations on real and artificially distorted underwater image datasets demonstrate that AMSMC-UGAN outperforms existing techniques, showing superior quantitative metrics and strong generalization capabilities.

1. Introduction

In the 21st century, underwater image enhancement technology has found extensive applications in marine exploration, research, and monitoring, providing crucial support for resource development, ecological conservation, scientific inquiry, and security defense. However, adverse underwater conditions such as light absorption, scattering, color attenuation, flow, and turbulence result in uneven brightness, low contrast, blurring, and distortion, hindering the acquisition of high-quality underwater imagery. In recent years, conventional and deep learning-based techniques for enhancing underwater imagery have advanced significantly and handle the inherent low-contrast issues of underwater images with a degree of maturity. However, these methods remain constrained by the intricate underwater environment and therefore exhibit limited adaptability; they also demand substantial data and computational resources, which contributes to information degradation concerns. Addressing these limitations requires high-quality underwater image enhancement methods that optimize the acquisition of superior image signals in submerged settings. To enhance underwater image quality, this study puts forward an adaptive multi-scale multi-color space underwater image enhancement method, referred to as AMSMC-UGAN. This approach integrates generative adversarial networks (GANs) with physical models to learn the mapping between distorted underwater images and their high-quality counterparts. The primary contributions of this study are as follows.
  • This study employs the RGB, HSV, and Lab color spaces for cross-color-space feature extraction, specifically addressing the limitations of using RGB alone in underwater settings. By transforming RGB images into the additional color spaces and integrating all three color spaces as input to the network, these compensatory measures effectively overcome the color limitations of a single color space.
  • This study introduces a novel approach involving the adaptive selection of diverse physical models enabled by a membership degree function based on evaluation metrics. These models leverage prior image information to guide deep learning and integrate information from multiple color spaces at the input data level. This innovative approach significantly enhances image quality, thereby improving the generalization performance of underwater image enhancement models across diverse scenarios.
  • This study presents a novel multi-scale feature module aimed at refining the granularity of image information to enhance image details. Operating at three scales, this module extracts features tailored to each scale. By integrating diverse image details within each scale, it facilitates a more comprehensive understanding of degradation patterns and effectively separates degradation from image content. This approach enables a thorough analysis of degradation distribution related to various image details while preserving the integrity of the original image content.
The study's contributions include utilizing the RGB, HSV, and Lab color spaces for cross-color-space feature extraction, and introducing the adaptive selection of different physical models and a multi-scale feature module to enhance underwater images. These advancements significantly improve the performance of underwater image enhancement models. Nevertheless, there remain directions for future research and improvement. Future work can explore the following application areas: applying this method to underwater robotics, enhancing visualization in marine biology research, preprocessing for image retrieval [1], and improving image quality in underwater monitoring systems. These application studies are expected to further drive the development of the underwater image enhancement field, providing better solutions for practical applications.

2. Related Work

In underwater image enhancement, there are two main approaches: conventional methods and deep learning-driven methods. Conventional methods can be further divided into non-physics-based and physics-based enhancements.

2.1. Non-Physics Module-Based Methods

Methods for enhancing underwater images that rely on non-physical models often overlook the unique degradation mechanisms specific to underwater imagery. These methods primarily involve image processing in either the spatial or frequency domain to improve contrast or correct colors. Common approaches include enhanced algorithms using histogram equalization. Ghani et al. [2] introduced an adaptive histogram enhancement method using Rayleigh stretch for contrast improvement, enhancing contrast, refining details, and reducing issues like over-enhancement, oversaturation, and noise. Li et al. [3] proposed an integrated approach for underwater image enhancement by combining an improved underwater white balance algorithm with histogram stretching, demonstrating superior outcomes in color correction, haze reduction, and detail enhancement. However, these methods have not accounted for image degradation factors. Directly processing underwater images may lead to problems such as color distortion and incomplete dehazing.

2.2. Physics Module-Based Methods

The image restoration method based on a physics-based model utilizes the underwater scattering model as its foundation, transforming image enhancement into an estimation of model parameters. One of the most classic underwater scattering models [4,5] can be described as follows:
E_c(x) = J_c(x) T_c(x) + A (1 - T_c(x)), \quad T_c(x) = e^{-\alpha d(x)}
In the equation, c ∈ {R, G, B} indexes the three channels of a standard RGB image, E_c(x) is the underwater image to be restored, J_c(x) is a clear aerial image, A is the ambient light, T_c(x) is the transmittance at each pixel, α is the scattering coefficient, and d(x) is the scene depth at pixel x.
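To make the formation model concrete, the following NumPy sketch synthesizes a degraded underwater image from a clear image according to the equation above. The function name, the per-channel parameter shapes, and the example coefficients are illustrative assumptions rather than values from this paper.

```python
import numpy as np

def synthesize_underwater(J, A, alpha, depth):
    """Forward model E_c(x) = J_c(x) T_c(x) + A (1 - T_c(x)), with T_c(x) = exp(-alpha d(x)).

    J: clear image, float array of shape (H, W, 3) in [0, 1]
    A: per-channel ambient light, shape (3,)
    alpha: per-channel attenuation coefficient, shape (3,)
    depth: scene depth map d(x), shape (H, W)
    """
    T = np.exp(-alpha[None, None, :] * depth[..., None])  # per-pixel, per-channel transmittance
    E = J * T + A[None, None, :] * (1.0 - T)              # attenuated signal plus veiling light
    return np.clip(E, 0.0, 1.0)

# Example with stronger attenuation in the red channel, as is typical underwater:
# E = synthesize_underwater(J, A=np.array([0.7, 0.8, 0.9]),
#                           alpha=np.array([0.6, 0.2, 0.1]), depth=np.full(J.shape[:2], 2.0))
```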
He et al. [6] introduced an image restoration method based on the underwater imaging model known as the dark channel prior (DCP) algorithm. Li et al. [7] proposed a technique employing red channel correction and blue–green channel dehazing, utilizing a gray-world algorithm for red channel color correction and adaptive exposure for handling overexposed and underexposed regions, thus enhancing visibility and contrast. Han et al. [8] accounted for backscattering effects during imaging, altering the light source to mitigate scattering and acquiring orthogonal polarized images. They introduced a polarization-based point spread estimation method. Meng et al. [9] integrated color balance and volume methods into underwater image enhancement, recovering images using color balance when the red channel value approximates the blue channel. The maximum a posteriori (MAP)-based restoration using DCP and a sharpening-based approach reduces blurriness, enhancing visibility and foreground texture preservation. While this category of methods has achieved some efficacy, accurate restoration of clear underwater images proves challenging in the absence of valid prior assumptions.
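As a point of reference for the DCP-style restoration discussed above, the sketch below estimates a transmission map from the dark channel. It is a simplified illustration of the general idea only; the window size and omega value are conventional choices, not parameters reported in this paper.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(image, patch=15):
    # Per-pixel minimum over the color channels, followed by a local minimum filter
    return minimum_filter(image.min(axis=2), size=patch)

def estimate_transmission(image, ambient, omega=0.95, patch=15):
    # t(x) = 1 - omega * dark_channel(I / A): haze-free regions have a dark channel near zero
    normalized = image / np.maximum(ambient[None, None, :], 1e-6)
    return 1.0 - omega * dark_channel(normalized, patch)
```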

2.3. Deep Learning-Based Methods

The underwater image enhancement technique based on deep learning involves training convolutional neural networks (CNNs) and generative adversarial networks (GANs) on extensive datasets to predict clear underwater images. Perez et al. [10] proposed a CNN-based model for enhancing underwater images, creating a dataset of degraded and clear image pairs to establish the mapping using deep learning. Yang et al. [11] proposed an end-to-end neural model with color correction and adaptive refinement, resulting in high-quality enhancements in details, contrast, and color; however, it struggles with style and texture preservation. Another approach by Yang et al. [12] utilized GANs to generate clear underwater images using a multi-scale structure and dual discriminators to capture local and global semantic information. Wu et al. [13] introduced fusion water-GAN (FW-GAN), which refines and encodes original images using multiple inputs and achieves satisfactory enhancement results through multi-scale fusion features and an attention mechanism. These end-to-end methods can learn the mapping without relying on physical models, but face challenges such as limited receptive fields and loss of structural information. Enhancing the network's ability to aggregate features from various scales is therefore crucial.

2.4. Literature Analysis

To conclude this section, a literature analysis is conducted, reviewing the contributions and limitations of recent advances in the field of underwater image enhancement, as summarized in Table 1.
Through an in-depth literature analysis, it is evident that both traditional underwater image methods and pure deep learning approaches currently face three significant challenges.
  • The limitation of a single color space in underwater environments, failing to fully capture complex color features and affecting the effectiveness of image processing.
  • Image quality variations and feature uncertainties across diverse underwater scenes, making it difficult for a single model to adapt to this diversity effectively.
  • The diverse degradation phenomena in underwater images, posing challenges in accurately modeling the degradation distribution for different image information.
The analysis above highlights the three challenges shared by traditional underwater image methods and pure deep learning approaches: the limitations of a single color space, variations in image quality and feature uncertainty across different underwater scenes, and the diverse degradation phenomena present in underwater images. Multi-color spaces possess color characteristics that a single color space lacks; extracting features from multiple color spaces effectively overcomes the constraints of a single color space and enables models to address color deviation issues more effectively. Adaptively integrating physical models with deep learning decisions offers good scene adaptability while retaining the automatic feature learning of image content; by combining physical models and deep learning models, the model achieves better generalization on images obtained from different underwater scenes. Multi-scale feature extraction, which captures features with different distributions and characteristics at different scales, effectively models random degradation factors and thereby helps to learn comprehensively the degradation distributions of different image information under the same content. Therefore, by incorporating three modules (multi-color space, adaptive physical model integration with deep learning decisions, and multi-scale feature extraction), the proposed AMSMC-UGAN method provides effective solutions to the challenges identified in the literature analysis above.

3. Materials and Methods

Inspired by recent underwater image enhancement methods, this study addresses the problems identified in Section 2.4; the proposed solutions are elaborated in this section.

3.1. Network Architecture

AMSMC-UGAN, illustrated in Figure 1, employs an encoder–decoder generator. The input image undergoes restoration based on a physical model as well as conversion to the Lab and HSV color spaces, generating the respective outputs. The network selectively applies physical model-based restoration algorithms, including total variation [14], dark channel prior [6], contrast improvement based on inverse filtering [15], and resolution reconstruction via sparse representation [16]. The underwater depth estimation parameters of the image are then estimated by the underwater physical model and passed to the multi-scale module as a fusion factor for the deep learning model, as shown in line 23 of Algorithm 1, to guide and constrain the deep model's output. Next, the multi-scale module processes the various color spaces and employs attention mechanisms to enhance critical features, significantly improving the model's performance. The output is up-sampled and reconstructed using deconvolution layers.
The AMSMC-UGAN discriminator is modeled as PatchGAN [17]. The input of the discriminator block is a pair consisting of the reconstructed image generated by the generator and the original image. The output of the discriminator block is a discrimination result, which measures the degree of difference between the generated reconstructed image and the original image. The definition of the AMSMC-UGAN is elaborated in Algorithm 1.
Algorithm 1: Definition process of AMSMC-UGAN
1: function UnderwaterTransmissionEstimator(observed_image, recovered_image):
2:   Initialize ω, a parameter controlling the effect of the atmosphere.
3:   TransmissionEstimator(observed_image, recovered_image) = 1 − ω · (recovered_image / max(observed_image))
4: function MultiScaleModule(x):
5:   scale1(x) = [ReLU(BN(Conv2D(x, 3→64, 3 × 3, stride 1, padding 1)))] × 3
6:   scale2(x) = [ReLU(BN(Conv2D(x, 3→64, 5 × 5, stride 1, padding 2)))] × 3
7:   scale3(x) = [ReLU(BN(Conv2D(x, 3→64, 7 × 7, stride 1, padding 3)))] × 3
8: function RGBtoHSV(x):
9:   Initialize maxc and minc, the maximum and minimum values among r, g, b.
10:  Initialize c = maxc − minc.
11:  RGBtoHSV(rgb) = (H, S, V), where H = 60° × (g − b)/c if maxc = r; H = 60° × (2 + (b − r)/c) if maxc = g; H = 60° × (4 + (r − g)/c) if maxc = b; S = c/maxc; V = maxc
12: function RGBtoLab(x):
13:  normalized_rgb = x / 255.0
14:  linear_rgb = ((normalized_rgb + 0.055)/1.055)^2.4 if normalized_rgb > 0.04045; normalized_rgb/12.92 otherwise
15:  Switch to XYZ and normalize XYZ to obtain f_x, f_y, f_z
16:  Switch to Lab: L = 116 × f_y − 16 if Y > 0.008856451679035631; 0.9032962962962963 × Y otherwise; a = 500 × (f_x − f_y); b = 200 × (f_y − f_z)
17:  RGBtoLab(rgb) = stack(L, a, b)
18: function AMSMC-UGAN(x):
19:  Input raw picture data X and ground-truth picture Y
20:  Calculate PSNR: PSNR(X, Y) = 10 × log10(255² / MSE)
21:  Define image quality grades: Grade = poor if 0 < PSNR < 15; fair if 10 < PSNR < 20; average if 15 < PSNR < 30; good if 25 < PSNR < 40
22:  Select an underwater restoration model to obtain the recovered image: recovered_image = TV(X) if Grade = poor; CED(X) if Grade = fair; DCP(X) if Grade = average; SR(X) if Grade = good
23:  Obtain the underwater physical estimate: Pe = TransmissionEstimator(X, recovered_image)
24:  Initialize Contr1–Contr5 and CBAM.
     Generator_decoder(x) = Contr5(Contr4(Contr3(Contr2(Contr1(x)))))
     Generator_encoder(x) = CBAM(cat(scale1(rgb), scale1(hsv), scale1(lab), scale1(Pe)), cat(scale2(rgb), scale2(hsv), scale2(lab), scale2(Pe)), cat(scale3(rgb), scale3(hsv), scale3(lab), scale3(Pe)))
     Generator(x) = Generator_decoder(Generator_encoder(x))
     Initialize the discriminator blocks DB1, DB2, DB3, DB4.
     Discriminator(x) = divide_into_patches(DB4(DB3(DB2(DB1(Generator(x) + Y)))), 16 × 16)
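For reference, a minimal per-pixel implementation of the RGB-to-HSV mapping used in lines 8–11 of Algorithm 1 is sketched below; a wrap-around of the hue is added so that H stays in [0°, 360°), and in practice a vectorized library routine (e.g., OpenCV's cvtColor) would be used instead.

```python
def rgb_to_hsv_pixel(r, g, b):
    """Scalar RGB -> HSV following Algorithm 1, lines 8-11 (r, g, b in [0, 1])."""
    maxc, minc = max(r, g, b), min(r, g, b)
    v = maxc
    c = maxc - minc                      # chroma
    if c == 0:
        return 0.0, 0.0, v               # achromatic pixel: hue is undefined, returned as 0
    s = c / maxc
    if maxc == r:
        h = 60.0 * (((g - b) / c) % 6)   # modulo keeps the hue non-negative
    elif maxc == g:
        h = 60.0 * ((b - r) / c + 2)
    else:
        h = 60.0 * ((r - g) / c + 4)
    return h, s, v

# Example: a pure blue pixel maps to hue 240 degrees
# print(rgb_to_hsv_pixel(0.0, 0.0, 1.0))   # (240.0, 1.0, 1.0)
```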

3.2. Generator

The generator comprises an encoder–decoder structure. Next, the components of the encoder and decoder will be detailed.

3.2.1. Encoder

The encoder is composed of an adaptive decision enhancement module, a multi-color-space module, and a multi-scale feature extraction-fusion module.
Adapting process. Figure 2 shows the strategy for adaptively matching physical models with the deep learning network for image enhancement.
To mitigate excessive enhancement and ensure the appropriate selection of physical models, this study uses membership degree functions to adaptively choose the physical models to fuse with the GAN.
In this study, image quality is divided into four levels.
Poor image quality typically indicates severe distortion between the processed image and the original image, with PSNR values in the range of (0, 15).
Fair image quality is often attributed to low brightness and contrast in the image, resulting in a certain degree of distortion between the processed image and the original image, indicating relatively poor image quality, with PSNR values in the range of (10, 25).
When image quality is average, it indicates relatively good image quality, approaching the level of imperceptible distortion, yet factors affecting the PSNR value still exist, such as haze, with PSNR values in the range of (15, 30).
When image quality is good, it indicates very high image quality, approaching the quality of the original image, with PSNR values in the range of (25, 40).
As expressed in Algorithm 1 lines 21 and 22, the image quality levels were determined by PSNR value calculation. Four different physical models were selected based on the quality assessment:
  • “Poor”-quality images employ total variation (TV) reconstruction, ideal for denoising and reconstruction purposes.
  • “Fair”-quality images utilize CED (inverse filtering-based image contrast restoration physical model) to address contrast issues.
  • “Average”-quality images make use of dark channel prior (DCP) for haze removal and detail enhancement.
  • “Good”-quality images utilize sparse representation reconstruction, enhancing resolution and detail.
The fuzzy logic toolbox defines membership functions for fuzzy sets representing quality levels (“poor”, “fair”, “average”, “good”, “excellent”). Adjusting the physical model and implementing deep learning techniques based on these quality ratings prevents excessive enhancement.
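A crisp (non-fuzzy) sketch of the PSNR grading and model selection in Algorithm 1, lines 20–22, is given below. The paper resolves the overlapping grade intervals with fuzzy membership functions; the hard thresholds used here are a simplifying assumption, and the TV/CED/DCP/SR restorers are left as named placeholders.

```python
import numpy as np

def psnr(x, y, peak=255.0):
    # Peak signal-to-noise ratio between a raw image x and its reference y
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / max(mse, 1e-12))

def select_restoration_model(psnr_value):
    """Map a PSNR-based quality grade to one of the four physical restoration models."""
    if psnr_value < 15:
        return "TV"    # total variation reconstruction for "poor" quality
    elif psnr_value < 20:
        return "CED"   # inverse-filtering contrast restoration for "fair" quality
    elif psnr_value < 30:
        return "DCP"   # dark channel prior for "average" quality
    else:
        return "SR"    # sparse-representation super-resolution for "good" quality
```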
Multi-color space module. The approach in this study employs a multi-color space encoder. It densely connects features extracted from the RGB channel with those from the HSV and Lab channels, thereby providing specific image characteristics for each color space and enriching various aspects of image information.
Figure 3a–c showcases visual representations of the channels extracted from different color spaces. These visuals offer an intuitive display of how different color spaces impact image processing in underwater settings, highlighting the limitations of the RGB channel in terms of color distortion and reduced contrast.
Multi-scale feature extraction fusion module. This study feeds the images processed through the three color spaces and the physical model into a multi-scale feature extraction module. Figure 4 illustrates its structure, where scale1, scale2, and scale3 correspond to feature extraction at large, medium, and small scales, respectively. Each scale extraction block consists of three convolutional layers (Conv2d) that differ in kernel size, stride, and padding: all Conv2d layers in scale1 use a 3 × 3 kernel with stride 1 and padding 1; scale2 uses 5 × 5 kernels with stride 1 and padding 2; and scale3 uses 7 × 7 kernels with stride 1 and padding 3. Within each scale, the number of channels progresses from the initial 3 to 16, then 32, and finally 64. Extracting features at different scales captures different hierarchical information and yields a richer feature representation for subsequent processing and analysis, while the varying kernel sizes and channel counts enhance the model's capability and diversity in expressing image features. Unlike other methods, the module preserves information integrity by not using pooling layers for down-sampling, so the output feature maps at every scale remain 256 × 256 × 64 and no up-sampling is needed at the fusion stage. This design avoids the loss of feature information, retains fine details and spatial structure, and reduces the computational burden, making the module more feasible and effective for image enhancement in practice. To determine the number of layers per scale, this study conducted validation experiments; as shown in the line chart in Figure 4, image generation achieves the best results when each scale uses three layers, with the highest PSNR evaluation scores on both the UIEB and EUVP datasets.
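A PyTorch sketch consistent with the description above is shown below: three parallel branches with 3 × 3, 5 × 5, and 7 × 7 kernels, stride 1, "same"-style padding, and a 3→16→32→64 channel progression, with no pooling so the 256 × 256 spatial size is preserved. The exact layer ordering is an assumption; the authors' implementation may differ.

```python
import torch.nn as nn

def _scale_branch(in_channels, kernel_size, padding):
    # Three Conv-BN-ReLU layers; channels progress in_channels -> 16 -> 32 -> 64,
    # stride 1 with "same"-style padding, so the spatial resolution is unchanged.
    chans = [in_channels, 16, 32, 64]
    layers = []
    for cin, cout in zip(chans[:-1], chans[1:]):
        layers += [nn.Conv2d(cin, cout, kernel_size, stride=1, padding=padding),
                   nn.BatchNorm2d(cout),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class MultiScaleModule(nn.Module):
    """Large/medium/small-scale feature extraction (scale1/scale2/scale3 in Figure 4)."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.scale1 = _scale_branch(in_channels, 3, 1)   # 3x3 kernels, padding 1
        self.scale2 = _scale_branch(in_channels, 5, 2)   # 5x5 kernels, padding 2
        self.scale3 = _scale_branch(in_channels, 7, 3)   # 7x7 kernels, padding 3

    def forward(self, x):
        # Each branch returns a 64-channel map at the input resolution (e.g., 256 x 256 x 64)
        return self.scale1(x), self.scale2(x), self.scale3(x)
```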
The RGB color space is processed using a physical model, and the output, along with data from the other color spaces, is fed into the multi-scale feature extraction module. Each scale extraction block captures diverse image features, enhancing the model's expressiveness, and the distinct image features at various scales are integrated to obtain a fused feature representation (see Figure 4 and Figure 5).

3.2.2. Decoder

The attention mechanism’s output is up-sampled via deconvolution layers, integrating multi-scale features for enriched representation. This deepens the information granularity of the image, improving spatial awareness by emphasizing both local details and global context. The final result is obtained through five layers of deconvolution, ensuring spatial consistency.
Figure 6 illustrates the fusion of features at different scales, which are then entered into the channel spatial attention module (CBAM) [18] before undergoing the decoding process. Firstly, at the stage of fusion between the physical model and deep learning features, the output of the physical model is concatenated with multi-scale features extracted from different color spaces (HSV, RGB, Lab) at various scales. At each scale, the concatenated features result in feature maps of size 256 × H × W, where H and W represent the height and width of the feature maps, respectively. Secondly, at the input attention mechanism (CBAM) stage, CBAM operations are performed on the concatenated features at each scale to enhance feature representation. Following CBAM, the size of the feature maps at each scale remains 256 × H × W. Lastly, at the up-sampling stage, spatial dimensions of the feature maps are progressively increased using transposed convolutional layers. After the first transposed convolution (convtr1), the size of the feature maps becomes 128 × 2 H × 2 W, due to the doubling of height and width. In the second transposed convolution (convtr2), the size of the feature maps is 64 × 4 H × 4 W, achieved by concatenating with fused features from the preceding scale. Similarly, after the third transposed convolution (convtr3), the size of the feature maps becomes 32 × 8 H × 8 W, and after the final transposed convolution (convtr4), the size is 16 × 16 H × 16 W. Ultimately, the fifth transposed convolution (convtr5) yields the final output with the desired number of output channels.
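The sketch below follows the channel schedule described above (256 → 128 → 64 → 32 → 16 → output) with stride-2 transposed convolutions. The skip concatenations with fused features from preceding scales and the CBAM block are omitted, and the final activation (assumed here to be Tanh) is not stated in the paper, so this is a schematic of the decoding path rather than the authors' exact decoder.

```python
import torch.nn as nn

class DecoderSketch(nn.Module):
    """convtr1..convtr5 stages: each ConvTranspose2d(kernel 4, stride 2, padding 1)
    doubles the spatial size while channels shrink 256 -> 128 -> 64 -> 32 -> 16 -> 3."""
    def __init__(self, out_channels=3):
        super().__init__()
        chans = [256, 128, 64, 32, 16, out_channels]
        blocks = []
        for cin, cout in zip(chans[:-1], chans[1:]):
            blocks += [nn.ConvTranspose2d(cin, cout, kernel_size=4, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
        blocks[-1] = nn.Tanh()          # assumed output activation on the final image layer
        self.deconv = nn.Sequential(*blocks)

    def forward(self, fused_features):
        # fused_features: CBAM-refined concatenation of multi-scale, multi-color features
        return self.deconv(fused_features)
```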

3.3. Discriminator

The AMSMC-UGAN discriminator scores input image blocks to determine authenticity. Each element in the final output matrix corresponds to a relatively large receptive field in the original image, i.e., a patch of the original image. The patch structure (as illustrated in Figure 7) preserves high-frequency information and is computationally efficient. The discrimination block (Figure 7) contains a 4 × 4 convolution layer, batch normalization (BN), spectral normalization (SN) [19], and leaky ReLU activation. SN enforces the k-Lipschitz condition, stabilizing GAN training and balancing the generator and discriminator; by controlling the gradient norm, it further improves training stability.
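A minimal PatchGAN-style discriminator block in PyTorch is sketched below: a 4 × 4 strided convolution wrapped in spectral normalization, optional batch normalization, and LeakyReLU, stacked four times over a concatenated (generated, reference) pair. The channel widths, the LeakyReLU slope, and the final 1-channel patch head are common conventions assumed here, not values taken from the paper.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def disc_block(cin, cout, use_bn=True):
    """One discriminator block: SN(4x4 strided conv) + optional BN + LeakyReLU."""
    layers = [spectral_norm(nn.Conv2d(cin, cout, kernel_size=4, stride=2, padding=1))]
    if use_bn:
        layers.append(nn.BatchNorm2d(cout))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return nn.Sequential(*layers)

class PatchDiscriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            disc_block(6, 64, use_bn=False),   # input: concatenated (generated, reference) pair
            disc_block(64, 128),
            disc_block(128, 256),
            disc_block(256, 512),
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),  # per-patch realness scores
        )

    def forward(self, generated, reference):
        return self.body(torch.cat([generated, reference], dim=1))
```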

3.4. Objective Function Formulation

In order to improve the visual, structural, and perceptual resemblance of generated images to real images within AMSMC-UGAN, four types of loss functions are employed for training. This approach enables the evaluation of the quality of the enhanced images in this study. The objective function is denoted as follows:
L_{total} = L_D + L_G + \lambda_e L_{edge} + \lambda_1 L_1 + \lambda_s L_{SSIM}
In the equation, L_D is the loss of the discriminator D, L_G is the loss of the generator G, L_edge is the edge loss [20], L_1 is the L1 distance loss, L_SSIM is the SSIM loss [21], and λ_e, λ_1, and λ_s correspond to the weights of the edge loss, L1 distance loss, and similarity (SSIM) loss functions, respectively. These functions are defined in detail below.
Discriminator D loss: The overall loss of the discriminator network L_D is composed of two components: loss_real and loss_fake. The loss_real term represents the adversarial loss for real samples, while loss_fake represents the adversarial loss for generated samples.
L_D = 0.5 \times (loss_{real} + loss_{fake}) \times 10
The processing approach aims to balance real and generated samples to train the discriminator to accurately distinguish between them, providing useful feedback for the generator.
Generator G loss: The total loss of the generator network L_G is obtained by summing three loss terms: loss_GAN, 7 times loss_1, and 3 times loss_con. The loss_GAN term is computed as the mean squared error between the discriminator's output D(G(X)) and the labels of real samples, pred_real. The loss_1 term is the L1 distance loss between the generator's output G(X) and the real reference data Y. The loss_con term is computed by applying the VGG19 pretrained (perceptual) loss function to the generator's output G(X) and the real reference data Y. The loss function L_G is represented as:
L_G = loss_{GAN} + 7 \times loss_1 + 3 \times loss_{con}
The varied loss terms guide generator training to meet adversarial objectives, minimize L1 differences with the target, and capture content features.
Edge loss: Underwater images frequently experience high-frequency information loss. The edge loss is employed to constrain the high-frequency components between the real and generated enhanced images, aiming to retain the high-frequency information.
L_{edge} = \| Y_x - G(x)_x \|^2 + \| Y_y - G(x)_y \|^2
In this equation, Y represents the target image, G(x) denotes the output image from the generator, Y_x and Y_y represent the horizontal and vertical gradients of the target image, respectively, and G(x)_x and G(x)_y represent the horizontal and vertical gradients of the generated output image. The edge loss is computed from the squared differences of the gradients of the target and generated images.
L1 loss: This study employs the L1 distance to supervise the model in generating results that are globally similar to the reference image. The L1 loss function is represented as:
L_1 = \mathbb{E}\left[ \| Y - G(x) \|_1 \right]
SSIM loss: This study utilizes the SSIM loss to constrain the brightness, contrast, and structure between the generated images and the ground-truth real images:
SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}
where μ_x (μ_y) represents the mean value, σ_x² (σ_y²) represents the variance of x (y), and σ_xy refers to the covariance of x and y. In addition, c_1 = (255 × 0.01)² and c_2 = (255 × 0.03)² are two constants set to avoid division by zero. The SSIM loss is computed from the SSIM values and is represented as:
L_{SSIM} = \mathbb{E}\left[ 1 - SSIM(Y, G(x)) \right]
Here, Y represents the target image and G(X) denotes the output image from the generator. Finally, this study sets the weight of each loss function based on the literature: λ_e = 0.5, λ_1 = 0.1, λ_s = 0.1.
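The sketch below computes the image-similarity part of the objective (the edge, L1, and SSIM terms with the weights quoted above); the adversarial terms L_D and L_G follow the earlier equations and are omitted. The ssim_loss_fn argument is a placeholder for any implementation that returns 1 − SSIM (for example, kornia's SSIM loss, if available).

```python
import torch.nn.functional as F

def image_gradients(img):
    # Horizontal and vertical finite differences of an (N, C, H, W) tensor
    gx = img[:, :, :, 1:] - img[:, :, :, :-1]
    gy = img[:, :, 1:, :] - img[:, :, :-1, :]
    return gx, gy

def edge_loss(target, generated):
    # Squared differences between the gradients of the target and generated images
    tx, ty = image_gradients(target)
    gx, gy = image_gradients(generated)
    return F.mse_loss(gx, tx) + F.mse_loss(gy, ty)

def similarity_losses(target, generated, ssim_loss_fn, lam_e=0.5, lam_1=0.1, lam_s=0.1):
    """Weighted edge + L1 + SSIM terms of L_total (weights as quoted in the text)."""
    return (lam_e * edge_loss(target, generated)
            + lam_1 * F.l1_loss(generated, target)
            + lam_s * ssim_loss_fn(generated, target))
```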

4. Experiments

4.1. Implementation Details

This study implemented AMSMC-UGAN using the PyTorch library in Python on an NVIDIA GeForce RTX 4090 GPU with 24 GB of memory. The software versions used were Python 3.8 and Torch 1.13.1. Training used 13,920 paired samples from EUVP [22] and 890 from UIEB [23], with the data further split for validation and testing. Images were resized to 256 × 256. Layer filter weights were initialized from a normal distribution; the Adam optimizer was used with a learning rate of 0.003 and a batch size of 16, and the model was trained for 200 epochs to obtain the final version.
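The implementation details state that layer filter weights are initialized from a normal distribution; a minimal helper in the spirit of that setting is sketched below. The standard deviation of 0.02 and the BatchNorm treatment are common GAN conventions assumed here, not values reported in the paper.

```python
import torch.nn as nn

def init_weights_normal(module, mean=0.0, std=0.02):
    """Initialize conv / transposed-conv filters from a normal distribution (assumed std)."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(module.weight, mean=mean, std=std)
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.BatchNorm2d):
        nn.init.normal_(module.weight, mean=1.0, std=std)
        nn.init.zeros_(module.bias)

# Usage: model.apply(init_weights_normal), then train with Adam(lr=0.003), batch size 16, 200 epochs.
```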

4.2. Experiment Settings

4.2.1. Dataset Splitting

This study employed two datasets: enhancing underwater visual perception (EUVP) with 13,920 samples and the underwater image evaluation benchmark (UIEB) with 890 images. Due to the larger quantity and wider variability of the synthetic dataset, more samples were allocated to the training set to enhance the model’s understanding of underwater environment characteristics. Validation and test sets received smaller allocations to evaluate performance under synthetic underwater conditions. The real underwater dataset, though smaller, was more representative and received fewer samples in the training set. Larger allocations for validation and test sets facilitated performance evaluation under real underwater conditions, improving the model’s generalization to real-world scenarios. The dataset is divided as in Table 2.
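The split sizes in Table 2 can be reproduced with a fixed-seed random split, as sketched below; the dummy TensorDataset objects stand in for the real paired EUVP/UIEB datasets, and whether the authors split randomly or by a fixed file list is not stated, so the random split is an assumption.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholders with the reported dataset sizes; replace with the real paired datasets.
euvp = TensorDataset(torch.zeros(13920, 1))
uieb = TensorDataset(torch.zeros(890, 1))

gen = torch.Generator().manual_seed(0)  # fixed seed so the split is reproducible
euvp_train, euvp_val, euvp_test = random_split(euvp, [9600, 1920, 2400], generator=gen)
uieb_train, uieb_val, uieb_test = random_split(uieb, [600, 120, 170], generator=gen)
```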

4.2.2. Comparison Method

In this study, AMSMC-UGAN is compared with five UIE methods: UGAN [24], FUnie-GAN [22], Ucolor [25], U-shape [26], and SyrealNet [27].

4.2.3. Evaluation Metrics

The primary image quality evaluation metrics used were PSNR and SSIM, which were quantitatively compared with other methods at both pixel and structural levels. Higher PSNR and SSIM values indicate better image quality. Additionally, UIQM was employed as an evaluation metric, considering brightness, contrast, and structure. Higher UIQM values indicate good perceptual effects in different image scenarios.
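For the full-reference metrics, PSNR and SSIM can be computed per image pair as sketched below using scikit-image (version 0.19 or later for the channel_axis argument); UIQM has no standard library implementation and is therefore omitted from this sketch.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(reference, enhanced):
    """PSNR and SSIM for one uint8 RGB image pair; higher is better for both."""
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced, channel_axis=-1, data_range=255)
    return psnr, ssim
```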

4.3. Visual Comparison

Table 3 presents the quantitative results of comparing different UIE algorithms on the UIEB dataset and the EUVP dataset, including UGAN, FUnie-GAN, Ucolor, U-shape, and SyrealNet. This study primarily employed PSNR, SSIM, and UIQM as evaluation metrics for the UIEB dataset, and PSNR and SSIM for the EUVP dataset.
The results in Table 3 indicate that the AMSMC-UGAN algorithm achieves state-of-the-art performance on three full-reference image quality assessment metrics for UIE tasks, significantly outperforming several other UIE methods, thereby validating the robustness of the proposed AMSMC-UGAN in underwater image enhancement. Additionally, visual comparisons with other UIE methods are demonstrated separately on the UIEB dataset and the EUVP dataset in Figure 8 and Figure 9.
This study reproduced all of the comparison methods mentioned, maintaining their original hyperparameter settings to ensure optimal training effectiveness. The optimizers used for SyrealNet, UGAN, and FUnie-GAN are all Adam, with a learning rate of 0.001. Ucolor employs the AdamW optimizer with a learning rate of 1 × 10⁻⁶, while U-shape uses SGD with a learning rate of 0.1. AMSMC-UGAN utilizes the Adam optimizer with a learning rate of 0.0001. This setting aids in accelerating convergence, enhancing training stability, and mitigating issues such as gradient explosion and vanishing gradients. Additionally, the lower learning rate facilitates fine-tuning of the optimization process, preventing the oversight of optimal solutions.
Figure 8 depicts five randomly selected cases from the UIEB dataset, while Figure 9 showcases five randomly selected cases from the EUVP dataset. In Figure 8, the proposed method (AMSMC-UGAN) was visually compared with other underwater image enhancement (UIE) methods on real underwater image data, where it demonstrates clear advantages in visual presentation. For instance, as observed in Figure 8a–e, the visual results of UGAN, FUnie-GAN, Ucolor, U-shape, and SyreaNet still exhibit issues such as haze, color deviation, and low contrast, whereas AMSMC-UGAN effectively addresses these issues. Similarly, in Figure 9, the proposed method was visually compared with other UIE methods on synthetic underwater image data and again shows advantages in visual presentation. For example, Figure 9b,e reveal that the comparative methods fail to handle the green and yellow color deviations in synthetic images, whereas AMSMC-UGAN removes these color deviations so that the images better represent the appearance of the scene in air. Furthermore, Figure 9a,c,d show that AMSMC-UGAN preserves more image details and comes closer to the paired reference images. The results obtained from AMSMC-UGAN consistently demonstrate visual superiority and a more natural appearance on test images, indicating the robust generalization capability of AMSMC-UGAN for real-world applications.
To further verify the effectiveness of AMSMC-UGAN in detail and color restoration, this study conducted RGB distribution tests and feature point matching tests, as shown in Figure 10. As shown in Figure 10a, after underwater image enhancement with AMSMC-UGAN, the RGB distribution of the output is close to that of the ground-truth reference image, which indicates that AMSMC-UGAN is effective in color correction for underwater images. As shown in Figure 10b, comparing feature point matching with the comparison methods listed in Table 3, AMSMC-UGAN has the best feature matching performance, with the most feature points matched to the ground-truth reference image. This also indicates that AMSMC-UGAN is effective in detail restoration for underwater images.

4.4. Adaptive Control Verification Experiment

This experiment validates the adaptive control strategy for underwater image enhancement using the fuzzy membership function on various test datasets. By integrating different physical models with deep learning models for enhancement processing of the same underwater image, this experiment aimed to validate whether the integration of four selected physical models into deep learning models can improve image quality for various underwater conditions. These conditions include low contrast, haze, color deviation, blurred detail information, and low resolution. The validation is performed on data sampled from real underwater images (Test-UP100), as shown in Figure 11a–e.
In Figure 11, (a) shows the original input image with low contrast, haze, and color deviation. The proposed adaptive image enhancement control strategy (Section 3) addresses these issues: (b) reconstructs (a), recovering its information; (c) improves contrast; (d) removes haze; and (e) enhances resolution. The experiments in Figure 11 demonstrate that, by adaptively incorporating physical models into deep models, image quality gradually improves when dealing with underwater images suffering from low contrast, haze, color deviation, blurred details, and low resolution, as evidenced by the results in Table 4. Compared to the initial images, the adaptive integration of physical models into deep models through AMSMC-UGAN leads to progressively increasing gains in the quality evaluation metric (PSNR), with improvements of 41.6%, 42%, 52%, and 70%, respectively. Real underwater image experiments (Test-UP100) validate this strategy's effectiveness in improving contrast, haze, and color deviation.
The adaptive image enhancement control strategy effectively improves underwater image quality and avoids excessive or subtle enhancements. It also enhances the network’s generalization performance. Table 4 supports these conclusions.

4.5. Ablation Experiment

In order to evaluate the effectiveness of various key components within the network, this study conducted ablation experiments on both UIEB and EUVP, comparing the performance of each network in enhancing underwater images, as demonstrated in Figure 12.
The strategy involves removing key components from the original model to obtain five models lacking these critical components. As the key components are progressively removed, the image enhancement results of the network output fall far short of those of the original network. Based on the performance on the UIEB and EUVP data shown in Figure 12, reducing the involvement of key components leads to color deviation issues (either biased towards blue or green) in underwater images, as well as problems such as blurred details and low contrast. The extent of image enhancement is far inferior to networks in which all components are involved, which demonstrates the effectiveness of each component for underwater image enhancement. Additionally, this study continued to use PSNR and SSIM as evaluation metrics to score each network in the ablation experiment. Table 5 and Table 6 indicate that the key components designed in this study are effective for underwater image enhancement tasks.

5. Conclusions

This paper introduces a novel underwater image enhancement model, leveraging multi-color space features and physically-based restoration. The model combines multi-scale feature extraction from various color spaces and integrates physical models to guide the enhancement process, aligning outputs with real-world characteristics. An adaptive decision control mechanism prevents over- or under-enhancement. Experimental results showcase the effectiveness and advantages of this approach, validating the importance of key components through ablation studies.

Author Contributions

Data curation, Z.L.; formal analysis, Z.L.; investigation, Z.L.; methodology, Z.L.; resources, Z.L.; writing—original draft, D.C., Z.L., W.Z., H.L., B.Z., Z.Z. and W.F.; writing—review and editing, D.C., Z.L., W.Z., H.L., B.Z., Z.Z. and W.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai) under Grant SML2022SP101.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Kelishadrokhi, M.K.; Ghattaei, M.; Fekri-Ershad, S. Innovative local texture descriptor in joint of human-based color features for content-based image retrieval. Signal Image Video Process. 2023, 17, 4009–4017. [Google Scholar] [CrossRef]
  2. Ghani, A.S.A.; Isa, N.A.M. Enhancement of low quality underwater image through integrated global and local contrast correction. Appl. Soft Comput. 2015, 37, 332–344. [Google Scholar] [CrossRef]
  3. Li, X.J.; Hou, G.J.; Tan, L.; Liu, W.Q. A Hybrid Framework for Underwater Image Enhancement. IEEE Access 2020, 8, 197448–197462. [Google Scholar] [CrossRef]
  4. McGlamery, B. A Computer Model for Underwater Camera Systems; SPIE: Bellingham, WA, USA, 1980; Volume 0208. [Google Scholar]
  5. Jaffe, J. Computer modeling and the design of optimal underwater imaging systems. IEEE J. Ocean. Eng. 1990, 15, 101–111. [Google Scholar] [CrossRef]
  6. He, K.M.; Sun, J.; Tang, X.O. Single Image Haze Removal Using Dark Channel Prior. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 2341–2353. [Google Scholar] [PubMed]
  7. Li, C.; Quo, J.; Pang, Y.; Chen, S.; Wang, J. Single underwater image restoration by blue-green channels dehazing and red channel correction. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 1731–1735. [Google Scholar]
  8. Han, P.; Liu, F.; Yang, K.; Ma, J.; Li, J.; Shao, X. Active underwater descattering and image recovery. Appl. Opt. 2017, 56, 6631–6638. [Google Scholar] [CrossRef] [PubMed]
  9. Meng, H.Y.; Yan, Y.J.; Cai, C.T.; Qiao, R.J.; Wang, F. A hybrid algorithm for underwater image restoration based on color correction and image sharpening. Multimed. Syst. 2022, 28, 1975–1985. [Google Scholar] [CrossRef]
  10. Perez, J.; Attanasio, A.C.; Nechyporenko, N.; Sanz, P.J. A Deep Learning Approach for Underwater Image Enhancement. In Biomedical Applications Based on Natural and Artificial Computing; Springer: Cham, Switzerland, 2017; pp. 183–192. [Google Scholar]
  11. Yang, X.; Li, H.; Chen, R. Underwater image enhancement with image colorfulness measure. Signal Process. Image Commun. 2021, 95, 116225. [Google Scholar] [CrossRef]
  12. Yang, M.; Hu, K.; Du, Y.X.; Wei, Z.Q.; Sheng, Z.B.; Hu, J.T. Underwater image enhancement based on conditional generative adversarial network. Signal Process. Image Commun. 2020, 81, 141002. [Google Scholar] [CrossRef]
  13. Wu, J.J.; Liu, X.L.; Lu, Q.H.; Lin, Z.Q.; Qin, N.W.; Shi, Q.W. FW-GAN: Underwater image enhancement using generative adversarial network with multi-scale fusion. Signal Process. Image Commun. 2022, 109, 116855. [Google Scholar] [CrossRef]
  14. Liu, X.W. Total generalized variation and wavelet frame-based adaptive image restoration algorithm. Vis. Comput. 2019, 35, 1883–1894. [Google Scholar] [CrossRef]
  15. Guo, L.Q.; Zha, Z.Y.; Ravishankar, S.; Wen, B.H. Exploiting Non-Local Priors via Self-Convolution for Highly-Efficient Image Restoration. IEEE Trans. Image Process. 2022, 31, 1311–1324. [Google Scholar] [CrossRef] [PubMed]
  16. Zhu, Z.L.; Guo, F.D.; Yu, H.; Chen, C. Fast Single Image Super-Resolution via Self-Example Learning and Sparse Representation. IEEE Trans. Multimed. 2014, 16, 2178–2190. [Google Scholar] [CrossRef]
  17. Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
  18. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  19. Miyato, T.; Kataoka, T.; Koyama, M.; Yoshida, Y. Spectral Normalization for Generative Adversarial Networks. arXiv 2018, arXiv:1802.05957. [Google Scholar]
  20. Jiang, K.; Wang, Z.; Yi, P.; Chen, C.; Huang, B.; Luo, Y.; Ma, J.; Jiang, J. Multi-scale progressive fusion network for single image deraining. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8346–8355. [Google Scholar]
  21. Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Proceedings, Part II 14, Amsterdam, The Netherlands, 11–14 October 2016; pp. 694–711. [Google Scholar]
  22. Islam, M.J.; Xia, Y.Y.; Sattar, J. Fast Underwater Image Enhancement for Improved Visual Perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234. [Google Scholar] [CrossRef]
  23. Li, C.Y.; Guo, C.L.; Ren, W.Q.; Cong, R.M.; Hou, J.H.; Kwong, S.; Tao, D.C. An Underwater Image Enhancement Benchmark Dataset and Beyond. IEEE Trans. Image Process. 2020, 29, 4376–4389. [Google Scholar] [CrossRef] [PubMed]
  24. Fabbri, C.; Islam, M.J.; Sattar, J. Enhancing underwater imagery using generative adversarial networks. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 7159–7165. [Google Scholar]
  25. Li, C.; Anwar, S.; Hou, J.; Cong, R.; Guo, C.; Ren, W. Underwater image enhancement via medium transmission-guided multi-color space embedding. IEEE Trans. Image Process. 2021, 30, 4985–5000. [Google Scholar] [CrossRef] [PubMed]
  26. Peng, L.; Zhu, C.; Bian, L. U-shape transformer for underwater image enhancement. In Computer Vision—ECCV 2022 Workshops; Springer: Cham, Switzerland, 2023. [Google Scholar]
  27. Wen, J.; Cui, J.; Zhao, Z.; Yan, R.; Gao, Z.; Dou, L.; Chen, B.M. Syreanet: A physically guided underwater image enhancement framework integrating synthetic and real images. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 5177–5183. [Google Scholar]
Figure 1. Architecture of AMSMC-UGAN.
Figure 2. Adaptive matching of physical models with deep learning decisions.
Figure 3. Multi-color space transformation. (a) RGB. (b) HSV. (c) Lab.
Figure 4. Multi-scale feature extraction module and validation of layer distribution across multiple scale modules.
Figure 5. Multi-scale feature fusion process.
Figure 6. Architecture of decoder.
Figure 7. Architecture of discrimination block and the patch.
Figure 8. Visual results compared with other UIE methods on the UIEB dataset. Five randomly selected cases from the UIEB dataset (a–e).
Figure 9. Visual results compared with other UIE methods on the EUVP dataset. Five randomly selected cases from the EUVP dataset (a–e).
Figure 10. (a) Visualization curve comparing the output of AMSMC-UGAN with the RGB distributions of the input and ground truth. The vertical axis is measured in color value/255, and the horizontal axis in sample points/1000. (b) Feature point matching comparison between AMSMC-UGAN and other UIE methods, from top to bottom: AMSMC-UGAN, UGAN, FUnie-GAN, Ucolor, U-shape, SyreaNet.
Figure 11. Validation experiments of adaptive control for underwater image enhancement. (a) is the original image, and (b–e) are four different physical models fused into GAN networks, namely, TV + GAN, CED + GAN, DCP + GAN, and SR + GAN.
Figure 12. Visualization comparison of ablation experiments.
Table 1. Contributions and limitations of past research in the field of underwater image enhancement.

Methods | Author | Algorithm | Limitations
Non-physics module-based | Ghani et al. | Adaptive histogram equalization based on Rayleigh stretching | The Rayleigh distribution is not universally suitable for all underwater imaging conditions.
Non-physics module-based | Li et al. | Hybrid approach for enhanced underwater image enhancement | In challenging underwater conditions, excessive amplification of noise and artifacts.
Physics module-based | He et al. | Restoring images using the underwater imaging model with the dark channel prior (DCP) algorithm | In challenging underwater conditions, recovery may yield unsatisfactory results with potential for artifacts.
Physics module-based | Li et al. | A method using red channel correction and blue–green channel dehazing | This approach may not fully adapt to unstable underwater conditions regarding light intensity and color distribution.
Physics module-based | Han et al. | A polarization-based point spread estimation method | Light direction and surface properties affect accuracy in complex conditions.
Physics module-based | Meng et al. | Hybrid underwater image restoration through color correction and sharpening | Possible noise and artifact overemphasis during enhancement.
Deep learning-based | Perez et al. | A CNN-based underwater image enhancement model | Restricted adaptability to different underwater conditions and varied image quality.
Deep learning-based | Yang et al. | A trainable end-to-end neural model | It struggles with overall style and texture preservation.
Deep learning-based | Yang et al. | A GAN-based underwater image enhancement method | Higher complexity hinders real-time efficiency.
Deep learning-based | Wu et al. | A multi-scale fusion generative adversarial network | Prior knowledge may limit enhancement in challenging underwater scenarios.
Table 2. Dataset setting.

Dataset | Total Images | Training Set | Validation Set | Test Set
EUVP | 13,920 | 9600 (70%) | 1920 (14%) | 2400 (16%)
UIEB | 890 | 600 (67%) | 120 (13%) | 170 (20%)

Test Subset | Description
Test-EP1000 | Paired synthetic underwater images with ground-truth real images (2400)
Test-UP100 | Paired real underwater images with ground-truth real images (170)
Table 3. Quantitative comparison on the UIEB dataset and the EUVP dataset. ↑ indicates higher values are preferable.

Method | PSNR ↑ (UIEB) | SSIM ↑ (UIEB) | UIQM ↑ (UIEB) | PSNR ↑ (EUVP) | SSIM ↑ (EUVP)
UGAN | 19.85 | 0.72 | 3.1686 | 24.80 | 0.76
FUnie-GAN | 19.18 | 0.66 | 3.1162 | 25.87 | 0.77
Ucolor | 20.81 | 0.82 | 2.9426 | 22.86 | 0.75
U-shape | 17.18 | 0.63 | 2.9004 | 24.91 | 0.77
SyrealNet | 18.60 | 0.78 | 3.1439 | 20.71 | 0.75
AMSMC-UGAN | 23.66 | 0.87 | 3.2004 | 26.04 | 0.77
Table 4. Validation using data sampled from real underwater images (Test-UP100).

Method | PSNR (dB) (Increase %) | SSIM (Increase %)
Raw | 18.51 | 0.77
TV + Generator | 26.22 (41.6%) | 0.89 (15%)
CED + Generator | 26.30 (42%) | 0.89 (15%)
DCP + Generator | 28.20 (52%) | 0.87 (12%)
SR + Generator | 31.55 (70%) | 0.89 (15%)
Table 5. Ablation experiments on the UIEB dataset. ↑ indicates higher values are better. Here, APM represents the adaptive physics encoder, and MC represents the multi-color space encoder. √ represents the existence of this component, × represents the absence of this component.

Module | PM | MS | MCS | CBAM | PSNR ↑ (UIEB) | SSIM ↑ (UIEB)
1 | √ | √ | √ | √ | 23.66 | 0.87
2 | √ | √ | √ | × | 22.01 | 0.81
3 | √ | √ | × | × | 21.64 | 0.80
4 | √ | × | × | × | 21.15 | 0.80
5 | × | × | × | × | 20.61 | 0.77
Table 6. Ablation experiments on the EUVP dataset. ↑ denotes higher values are better. Here, APM represents the adaptive physics encoder, and MC represents the multi-color space encoder. √ represents the existence of this component, × represents the absence of this component.

Module | PM | MS | MCS | CBAM | PSNR ↑ (EUVP) | SSIM ↑ (EUVP)
1 | √ | √ | √ | √ | 26.04 | 0.77
2 | √ | √ | √ | × | 24.47 | 0.76
3 | √ | √ | × | × | 23.94 | 0.77
4 | √ | × | × | × | 23.57 | 0.75
5 | × | × | × | × | 23.13 | 0.74
