Article

GFRENet: An Efficient Network for Underwater Image Enhancement with Gated Linear Units and Fast Fourier Convolution

1 Beijing Institute of Space Mechanics and Electricity, Beijing 100094, China
2 School of Marine Science and Technology (SMST), Tianjin University (TJU), Tianjin 300072, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(7), 1175; https://doi.org/10.3390/jmse12071175
Submission received: 16 June 2024 / Revised: 4 July 2024 / Accepted: 8 July 2024 / Published: 13 July 2024
(This article belongs to the Section Ocean Engineering)

Abstract

Underwater image enhancement is critical for a variety of marine applications such as exploration, navigation, and biological research. However, underwater images often suffer from quality degradation caused by light absorption, scattering, and color distortion. Although current deep learning methods achieve strong results, they struggle to balance enhancement performance and computational efficiency in practical applications, and some methods degrade on high-resolution, large-size input images. To address these issues, this paper proposes GFRENet, an efficient network for underwater image enhancement built on gated linear units (GLUs) and fast Fourier convolution (FFC). GLUs help to selectively retain the most relevant features, thereby improving the overall enhancement performance. FFC enables efficient and robust frequency domain processing that effectively addresses the unique challenges posed by the underwater environment. Extensive experiments on benchmark datasets show that our approach significantly outperforms existing state-of-the-art techniques in both qualitative and quantitative comparisons. The proposed network provides a promising solution for real-time underwater image enhancement, making it suitable for practical deployment in various underwater applications.

1. Introduction

Underwater imaging plays a pivotal role in many marine applications such as underwater navigation [1], marine biology [2], and archaeological exploration [3]. However, acquiring high-quality images underwater is inherently challenging due to the physical characteristics of the underwater environment. Light absorption and scattering can significantly degrade image quality, leading to reduced visibility and color distortion. In the face of these challenges, it is necessary to develop robust image enhancement techniques to obtain clear and accurate visual information from underwater scenes.
In general, existing underwater image enhancement (UIE) methods fall into three groups. The first consists of traditional enhancement methods, which typically rely on histogram equalization [4], white balance adjustment [5], and dehazing techniques [6]. Although these methods can provide improvements, they usually generalize poorly across different underwater conditions and may introduce artifacts. The second consists of physical model-based methods [7], which focus on accurately estimating the transmission map to generate enhanced images; however, their effectiveness is limited to less complex environments. The third consists of deep learning-based methods [8], which use large datasets and complex models to learn effective enhancement strategies. Recent advances have shown that their performance far exceeds that of non-deep-learning methods, even in complex environments, and significant progress has been made in this direction [9,10,11,12].
However, deep learning models typically require significant computational resources for training and inference, especially when using complex Generative Adversarial Networks (GANs) [13] and Vision Transformers [14]. In practical applications, real-time processing is often required, but complex models may be unable to meet real-time constraints. Meanwhile, the optical properties of different waters vary greatly, and it is difficult for a single model to adapt to different underwater environments; image properties differ markedly across depths, turbidity levels, and lighting conditions. In addition, the enhancement performance of a model can be inconsistent across input images of different sizes.
To address these challenges, we propose an efficient network architecture, GFRENet, which combines gated linear units (GLUs) and fast Fourier convolution (FFC) for underwater image enhancement. GLUs act as a selective mechanism that allows the network to retain essential features while discarding irrelevant information. This selective feature retention enhances the network’s ability to generate high-quality images. At the same time, the FFC component facilitates efficient frequency domain processing, which is particularly effective in dealing with the specific distortions prevalent in underwater images. Also, capturing the global frequency features of the image through the Fourier transform allows the structure to adapt to input images of different sizes. Validation on numerous full-reference and reference-free datasets demonstrates that the proposed method achieves superior performance with lower computational requirements and better generalization than other state-of-the-art (SOTA) methods.
The main contributions of this paper are as follows:
(1)
We introduce a novel underwater image enhancement network, GFRENet, capable of end-to-end adaptation to input images of different sizes. Extensive experiments on standard underwater image datasets demonstrate the superior performance of our proposed network, which is a significant improvement over existing methods;
(2)
We introduce gated linear units combined with adaptive channel weights to construct a gated convolutional layer module, which can effectively aggregate spatial information and transform features efficiently;
(3)
Fast Fourier convolution and pixel attention are used to construct the frequency feature enhancement module for further feature refinement. The distortions of underwater images can be effectively handled through efficient frequency domain processing, and the module maintains its performance on large-size input images.
In the next sections, we review related work, describe the proposed network architecture, present experimental results, and discuss the implications of our findings. Our approach aims to bridge the gap between high-quality underwater image enhancement and practical real-time applications, contributing to the advancement of underwater imaging technology.

2. Related Work

2.1. Underwater Image Enhancement

Underwater image enhancement has been an active research area because of its importance in various marine applications. Traditional underwater image enhancement methods mainly include histogram equalization [4], white balance adjustment [5], and dehazing techniques [6]. Histogram equalization aims to improve the overall contrast of an image by redistributing pixel intensities but often leads to over-enhancement and loss of details. White balance adjustment corrects color distortion by adjusting the color balance based on estimated light levels but has limited effectiveness in complex underwater environments with variable lighting conditions. Dehazing techniques originally developed for terrestrial images have been used for underwater images to minimize the effects of light scattering. These methods, such as the dark channel prior (DCP), often require additional assumptions and may introduce artifacts when applied to underwater scenes.
Recent advances in deep learning have greatly improved underwater image enhancement. Convolutional neural networks (CNNs) are used to learn end-to-end mapping from degraded to enhanced images. Promising results have been achieved based on data-driven approaches by utilizing large-scale datasets to train models capable of enhancing underwater images. An underwater convolutional neural network (UWCNN) based on a priori underwater scenes was proposed by Li et al. [15]. Unlike traditional methods for estimating parameters of underwater imaging models, UWCNN directly reconstructs clear underwater images using an underwater scene prior, which is used to synthesize underwater training data, avoiding the need for explicit parameter estimation and focusing on end-to-end image reconstruction. Subsequently, Li et al. [16] introduced Water-Net, an underwater image enhancement network trained on the Underwater Image Enhancement Benchmark (UIEB). Water-Net generates three inputs based on the characteristics of underwater image degradation via white balance, histogram equalization, and gamma correction. The gated fusion network then learns the three confidence maps and combines these inputs into an enhanced image. Li et al. [17] also developed Ucolor, an underwater image enhancement network guided by medium transfer and multi-color space embedding. Ucolor has a network of multi-color space encoders to enrich the feature representations by fusing features from different color spaces. The network incorporates an attention mechanism that adaptively integrates and highlights the most discriminative features in these color spaces. A media-transmission-guided decoder network enhances the network’s response to regions of degraded quality. Fu et al. [18] proposed a probabilistic network, PUIE-Net, that uses a conditional variational autoencoder combined with adaptive instance normalization to learn the enhancement distribution of degraded underwater images. By predicting deterministic results based on a set of samples in the distribution through a consensus process, the method copes to some extent with the bias introduced in the reference map labeling. Qi et al. [19] introduced SGUIENet, an underwater image enhancement network that uses semantic information as a high-level guide. The network includes a semantic region enhancement module that is able to sense and enhance the degradation of different semantic regions at multiple scales. Guo et al. [20] proposed URanker, a ranking-based method for underwater image quality assessment based on an efficient convolutional attention image Transformer. In addition, a tail normalization technique was proposed to significantly improve the performance of underwater image enhancement networks.
Chi et al. [21] proposed Trinity-Net, a new model that combines a priori and deep learning strategies for recovering realistic surface information. Trinity-Net incorporates a priori information into a CNN and Swin Transformer to accurately estimate haze parameters. The model includes a gradient-guided module that inherits the structural prior of the gradient map and guides the deep model to generate visually satisfying details. Peng et al. [22] introduced a U-shaped Transformer network for underwater image enhancement that integrates the Channel Multiscale Feature Fusion Transformer (CMSFFT) and Spatial Global Feature Modeling Transformer (SGFMT). This approach enhances the network’s focus on color channels and heavily attenuated spatial regions. A novel loss function combining the RGB, LAB, and LCH color spaces was designed to further improve contrast and saturation. Wang et al. [23] proposed TUDA, a two-stage underwater domain adaptation network designed to minimize both inter-domain and intra-domain gaps simultaneously. TUDA employs a bi-aligned network for the translation and enhancement phases and performs both image-level and feature-level adaptation through joint adversarial learning to bridge the inter-domain gaps. Real data are further classified as easy or hard according to the assessed quality of the enhanced images: a rank-based underwater quality assessment method is embedded, and the implicit quality information learned from the ranking is used to more accurately evaluate the perceived quality of the enhanced image. Khan et al. [24] introduced Spectroformer, a multi-domain query cascaded Transformer network for underwater image enhancement. The method includes a multi-domain query cascade attention mechanism that integrates local transmission features and global illumination features. An attention block based on spatial-spectral fusion is proposed, and a hybrid Fourier spatial upsampling module is introduced to effectively enhance the feature resolution. Qi et al. [25] developed CCMSR-Net to decompose the underwater image enhancement task into two processes, namely, color correction and visibility enhancement. The CCMSR-Net consists of a color correction sub-network (CC-Net) and a multi-scale Retinex sub-network (MSR-Net), based on a hybrid convolutional axial attention block. This structure can effectively capture local features and global context for efficient underwater image enhancement.
However, these models typically involve high computational costs and require large amounts of labeled training data, which limits their applicability in real-time scenarios and across varied underwater conditions. Lightweight networks focus on achieving comparable enhancement quality with limited computational resources. These models are designed for real-time applications and platforms with limited computational power, and a great deal of research has been devoted to this topic:
Naik et al. [26] proposed Shallow-UWnet, a shallow neural network architecture that maintains performance and has fewer parameters than state-of-the-art models. The network is computationally efficient while providing high-quality enhancement. Jiang et al. [27] developed FA + Net, an efficient and lightweight real-time underwater image enhancement network with only ~9 k parameters and ~0.01 s processing time. FA + Net uses a two-stage enhancement architecture: a powerful pre-stage that decomposes challenging underwater degradation into sub-problems, and a fine-grained stage combining multiple branch color enhancement and pixel attention modules to enhance the perception of details. Liu et al. [28] introduced Boths, an ultra-lightweight underwater image enhancement neural network with only 0.0064 M parameters. Boths focuses on structural and detail features, pixel and channel dimensions, and high and low frequency information. The network uses 3D attention to consider both channel and pixel dimensions, and a novel loss function based on the wavelet transform to focus on high and low frequency information.
Our proposed network builds on these advances by integrating gated linear units (GLUs) that selectively retain important features to improve the overall performance of the enhancement process. This selective feature retention mechanism allows the network to focus on the most relevant aspects of the image, resulting in an enhancement quality that outperforms traditional and existing deep learning-based methods.

2.2. Fast Fourier Convolution

Fast Fourier convolution (FFC) is a recent development in deep learning that utilizes the Fourier transform for efficient feature extraction and processing. Traditional spatial-domain convolution operations are computationally expensive, especially for high-resolution images. FFC addresses this problem by transforming the input to the frequency domain using the Fast Fourier Transform (FFT), performing the convolution operation there, and then transforming the result back to the spatial domain using the inverse FFT. Owing to its frequency-domain operation and global receptive field, FFC has been widely used in several low-level vision tasks in recent years.
Chi et al. [29] proposed a novel convolution operator called fast Fourier convolution (FFC), which is mainly characterized by a nonlocal receptive field and cross-scale fusion within the convolution unit. According to the spectral convolution theorem in Fourier theory, a point-wise update in the spectral domain globally affects all the input features involved in the Fourier transform, which provides clues for the design of neural architectures with nonlocal receptive fields. Huang et al. [30], on the other hand, introduced a new perspective of using spatial-frequency interactions for exposure correction. They proposed a deep Fourier-based exposure correction network (FECNet), which consists of a magnitude sub-network and a phase sub-network to progressively reconstruct the representation of luminance and structure components. The frequency characteristics of differently exposed images are revisited through the Fourier transform, and a Spatial-Frequency Interaction block is proposed to interactively process local spatial features and global frequency information to encourage complementary learning. The study by Sinha et al. [31] focuses on the problem of upsampling images to high resolution. Most methods learn upsampling by using convolution in the spatial domain, which is usually limited to local features, thus limiting the receptive field of the network. To alleviate this problem, Sinha et al. proposed an architecture that combines nonlocal attention-assisted fast Fourier convolution (NL-FFC) to expand the receptive field and learn long-range dependencies to generate high-quality images. Suvorov et al. [32] proposed a new approach called Large Mask Inpainting (LaMa). LaMa employs an inpainting network architecture using FFC with image-wide receptive fields. In addition, the use of a receptive-field-aware loss and large training masks further unleashes the potential of the inpainting network to achieve the state of the art on a range of datasets, enabling excellent performance even in challenging scenes. To further improve the efficiency of capturing global information, Zhang et al. [33] proposed SwinFIR, which extends SwinIR by replacing its convolution component with FFC, which has an image-wide receptive field. The method demonstrates significant performance gains across different tasks. To alleviate the problem of unclear texture details in images reconstructed by CNNs, Zhuang et al. [34] proposed a novel module using Fourier coefficients, which can recover high-quality texture details and complement the spatial domain under the constraints of frequency-phase semantics. Unlike existing low-light image enhancement methods that address the problem in the spatial domain, Li et al. [35] proposed a new solution, UHDFour, which embeds the Fourier transform into a cascade network. By embedding the Fourier transform into the network, the magnitude and phase of low-light images can be processed separately to avoid amplifying noise when enhancing luminance. In addition, UHDFour can be extended to ultra-high-resolution images by performing magnitude and phase enhancement at low resolution and then adjusting the high-resolution output with a small amount of computation.
In summary, the application of fast Fourier convolution to different low-level vision tasks demonstrates its advantages in processing global information and improving image quality. These studies provide theoretical and practical support for our use of FFC in underwater image enhancement. The Fast Fourier Transform has been shown to significantly reduce computational complexity while maintaining or even improving the quality of the results. By operating in the frequency domain, FFC can more efficiently capture global information and handle variations in the input data. Therefore, it is particularly suitable for applications such as image enhancement, where global features play a crucial role.
In underwater image enhancement, FFC has several advantages. Unique distortions present in underwater images, such as different light absorption and scattering effects, are usually more pronounced in the frequency domain. By applying FFC, our proposed network can handle these distortions more efficiently, thus improving the enhancement performance. In addition, the computational efficiency of FFC makes it well suited for real-time applications, a key requirement for many underwater imaging tasks.
Compared to traditional spatial domain convolution, our study takes advantage of FFC to enhance underwater images more efficiently. By combining FFC with gated linear units, our network achieves a balance between computational efficiency and enhancement quality, providing a practical solution for underwater image enhancement under various conditions. In the next sections, we will detail the structure of our proposed network, describe the training process, and show experimental results to demonstrate the effectiveness of our approach.

3. Proposed Method

Underwater images often suffer from various distortions such as blurring, color degradation and reduced visibility due to light absorption and scattering. Enhancement of these images is critical for numerous applications such as marine biology, underwater detection, and navigation. Conventional methods, while useful, often fail to provide the necessary clarity and color accuracy. To address these issues, we propose a novel underwater image enhancement network that utilizes advanced deep learning techniques to improve the visual quality of underwater images.

3.1. Network Overall Architecture

Our proposed network GFRENet is a 7-stage variant of the U-Net [36] architecture that integrates our two proposed base modules: a gated convolutional layer module (GCLM) and a frequency feature enhancement module (FFEM), as shown in Figure 1. The network first uses a convolution to create low-level feature embeddings $F_0 \in \mathbb{R}^{H \times W \times C}$ from the degraded input image $I \in \mathbb{R}^{H \times W \times 3}$, where $H$ and $W$ denote the spatial dimensions and $C$ denotes the number of channels. These shallow features are then converted into deeper features by a three-stage symmetric encoder-decoder, with the base modules used extensively at each stage.
The encoder starts from the high-resolution input, gradually increasing the channel capacity while reducing the spatial dimensions. The decoder takes the low-resolution latent features as input and progressively recovers the high-resolution representation. For feature downsampling we use a conventional convolutional layer with a stride of 2 that doubles the number of output channels relative to the input channels; for upsampling we use a 1 × 1 convolution followed by PixelShuffle [37]. As shown in the experimental section, these design choices yield observable quality improvements. Finally, a 3 × 3 convolutional layer transforms the feature dimension from $C$ to 3, generating the residual image $R \in \mathbb{R}^{H \times W \times 3}$, which is added to the degraded input to obtain the restored image $\hat{I} = I + R$ at the same resolution as the input.
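To make the data flow concrete, the following PyTorch sketch shows the stride-2 downsampling convolution (which doubles the channels), the 1 × 1 convolution + PixelShuffle upsampling, and the residual output head described above. The stage count, channel widths, and the placeholder `block` are illustrative assumptions and do not reproduce the exact 7-stage GFRENet configuration.

```python
import torch
import torch.nn as nn


class Downsample(nn.Module):
    """Stride-2 convolution: halves the spatial size and doubles the channels."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels * 2, kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return self.conv(x)


class Upsample(nn.Module):
    """1x1 convolution followed by PixelShuffle: doubles the spatial size, halves the channels."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels * 2, kernel_size=1)  # C -> 2C, shuffle -> C/2
        self.shuffle = nn.PixelShuffle(2)

    def forward(self, x):
        return self.shuffle(self.conv(x))


class EncoderDecoderSketch(nn.Module):
    """Illustrative symmetric encoder-decoder with a residual head (fewer stages than GFRENet)."""
    def __init__(self, base_channels=16, block=nn.Identity):
        super().__init__()
        c = base_channels
        self.embed = nn.Conv2d(3, c, 3, padding=1)            # shallow feature embedding F0
        self.enc1, self.down1 = block(), Downsample(c)
        self.enc2, self.down2 = block(), Downsample(c * 2)
        self.bottleneck = block()
        self.up2, self.dec2 = Upsample(c * 4), block()
        self.up1, self.dec1 = Upsample(c * 2), block()
        self.out = nn.Conv2d(c, 3, 3, padding=1)              # residual image R

    def forward(self, img):
        x = self.embed(img)
        s1 = self.enc1(x)                                     # H x W x C
        s2 = self.enc2(self.down1(s1))                        # H/2 x W/2 x 2C
        x = self.bottleneck(self.down2(s2))                   # H/4 x W/4 x 4C
        x = self.dec2(self.up2(x) + s2)                       # skip connection
        x = self.dec1(self.up1(x) + s1)
        return img + self.out(x)                              # restored image I_hat = I + R


x = torch.rand(1, 3, 256, 256)
print(EncoderDecoderSketch()(x).shape)                        # torch.Size([1, 3, 256, 256])
```

In the real network, the `block` placeholder would be replaced by the GCLM and FFEM modules described in the following subsections.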

3.2. Gating Convolutional Layer Module

The gated convolutional layer module is designed to selectively retain the most relevant features, thereby improving overall enhancement performance. The module uses a gating mechanism to control the flow of information through the network, allowing it to focus on important details and discard irrelevant information. In this way, the network can better handle complex variations in underwater images, significantly improving clarity and color accuracy.
Following previous enhancement networks, the underlying blocks are broadly categorized into two types: CNN blocks that incorporate channel attention and spatial attention, and Transformer blocks that use self-attention. Enhancing underwater images depends on the model’s global and local perceptual capabilities, which means that the network should encode globally degraded information and therefore requires a large effective receptive field. In conventional CNN and Transformer blocks, enlarging the receptive field by increasing the convolutional kernel size leads to a quadratic increase in parameters and computation. However, related studies [38,39] have shown that image restoration performance can be improved simply by replacing the activation function in the baseline with GLUs. Moreover, GLUs have lower computational overhead and can efficiently capture global information from features. We therefore build the gated convolutional layer module (GCLM) on GLUs; its details are shown in Figure 2.
Letting $x$ be the feature map, we first normalize it with BatchNorm: $\hat{x} = \mathrm{BatchNorm}(x)$.
$$x_1 = \mathrm{Sigmoid}(\mathrm{PWConv}_1(\hat{x})), \quad x_2 = \mathrm{DWConv}(\mathrm{PWConv}_2(\hat{x})),$$
where PWConv denotes a pointwise (1 × 1) convolutional layer and DWConv denotes a depthwise convolutional layer. We then use $x_1$ as the gating signal for $x_2$ and project the gated result with another 1 × 1 convolution, which can be represented as
$$x_3 = \mathrm{PWConv}_3(x_1 \odot x_2).$$
Then, adaptive channel attention is computed from $x_3$, the resulting channel weights are multiplied element-wise with $x_3$, and the output is added to the input $x$ via a shortcut connection, which can be expressed as
$$\mathrm{ACA} = \mathrm{Sigmoid}(\mathrm{Conv1d}(\mathrm{GAP}(x_3))), \quad y = x + x_3 \odot \mathrm{ACA},$$
where $\mathrm{GAP}$ denotes global average pooling and $\odot$ denotes element-wise multiplication.
In traditional channel attention, the kernel size is fixed and does not change during training. This means that some features may be over-smoothed (due to a large kernel size) or under-smoothed (due to a small kernel size) by a fixed kernel size, which results in the loss of important information, thus degrading performance. To avoid this problem, adaptive channel attention (ACA) adaptively selects the kernel size based on the number of input feature channels. The adaptive kernel size k is determined by the following equation:
$$k = \psi(C) = \left| \frac{\log_2(C)}{\gamma} + \frac{b}{\gamma} \right|_{\mathrm{odd}},$$
where $|t|_{\mathrm{odd}}$ denotes the odd number closest to $t$. We set $\gamma$ and $b$ to 2 and 1, respectively, in all experiments. Through the nonlinear mapping $\psi$, channels in high-dimensional feature maps interact over longer ranges, while channels in low-dimensional feature maps interact over shorter ranges.
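A minimal PyTorch sketch of the GCLM, following the equations above, is given below. The depthwise kernel size (3 × 3) and any hyperparameters not stated in the text are assumptions made for illustration rather than the exact GFRENet settings.

```python
import math
import torch
import torch.nn as nn


class GCLM(nn.Module):
    """Sketch of the gated convolutional layer module (Section 3.2)."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        self.norm = nn.BatchNorm2d(channels)
        self.pw1 = nn.Conv2d(channels, channels, 1)                              # gating branch
        self.pw2 = nn.Conv2d(channels, channels, 1)                              # feature branch
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)   # depthwise conv
        self.pw3 = nn.Conv2d(channels, channels, 1)                              # output projection
        # Adaptive channel attention: kernel size k = |log2(C)/gamma + b/gamma|_odd
        k = int(abs(math.log2(channels) / gamma + b / gamma))
        k = k if k % 2 == 1 else k + 1
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.aca_conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):
        xh = self.norm(x)
        x1 = torch.sigmoid(self.pw1(xh))                   # gating signal
        x2 = self.dw(self.pw2(xh))                         # spatial features
        x3 = self.pw3(x1 * x2)                             # gated projection
        # adaptive channel attention over the pooled channel descriptor
        w = self.gap(x3).squeeze(-1).transpose(-1, -2)     # (B, 1, C)
        w = torch.sigmoid(self.aca_conv(w)).transpose(-1, -2).unsqueeze(-1)  # (B, C, 1, 1)
        return x + x3 * w                                  # shortcut connection


print(GCLM(32)(torch.rand(1, 32, 64, 64)).shape)           # torch.Size([1, 32, 64, 64])
```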

3.3. Frequency Feature Enhancement Module

The frequency feature enhancement module is designed to capture and integrate local and global features. The module operates in both the spatial and frequency domains, utilizing the strengths of both to improve feature representation. By processing features in the frequency domain, the network can capture global patterns and structures that are critical for enhancing underwater images. The integration of spatial and frequency domain features allows the network to effectively cope with unique distortions in underwater images.
Recent research [24,27] has shown that global background illumination and texture in underwater images can be partially decomposed in the Fourier domain. However, current methods for recovering degraded images mainly rely on spatial-domain processing, and traditional convolutional methods often ignore the rich global information present in the Fourier domain. To address this issue, we propose the frequency feature enhancement module (FFEM) for underwater image enhancement, the details of which are shown in Figure 3. FFEM achieves receptive field coverage over the entire image by fusing feature information in the Fourier domain, thereby improving the perceptual quality and parametric efficiency of the network.
Letting $x$ be the feature map, we similarly first normalize it with BatchNorm, $\hat{x} = \mathrm{BatchNorm}(x)$, and then apply the FFT to obtain the magnitude component (mag) and phase component (pha) as follows:
$$F_{\mathrm{MAG}}, F_{\mathrm{PHA}} = f_{\mathrm{FFT}}(\hat{x}).$$
These two components are then fed into two separate convolutional layers with 1 × 1 kernels. When processing the magnitude and phase, we use only 1 × 1 convolution kernels to avoid corrupting the structural information, and then transform the result back to the spatial domain via the IFFT:
$$x_1 = f_{\mathrm{IFFT}}\big(f_{\mathrm{FC}}(F_{\mathrm{MAG}}),\; f_{\mathrm{FC}}(F_{\mathrm{PHA}})\big),$$
where $f_{\mathrm{FC}}$ denotes the PWConv-GELU-PWConv operation applied to the frequency-domain features. The spatial weights are then obtained using pixel attention, which is effective at extracting location-dependent information such as the degree of degradation of different pixels in the image:
$$\mathrm{PA} = \mathrm{Sigmoid}(\mathrm{PWConv}(\mathrm{GELU}(\mathrm{PWConv}(x_1)))), \quad x_2 = x_1 \odot \mathrm{PA}.$$
The final output is connected to the input $x$ through a weighted residual to adaptively fuse the spatial and frequency features:
$$y = x + \alpha \cdot x_2,$$
where $\alpha$ is the residual weighting factor.
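The following PyTorch sketch illustrates the FFEM computation described above, using torch.fft for the forward and inverse transforms. The use of a real-valued FFT, the orthonormal normalization, and treating $\alpha$ as a learnable scalar are assumptions made for illustration, not the exact GFRENet implementation.

```python
import torch
import torch.nn as nn


class FFEM(nn.Module):
    """Sketch of the frequency feature enhancement module (Section 3.3)."""
    def __init__(self, channels):
        super().__init__()
        self.norm = nn.BatchNorm2d(channels)
        # f_FC: PWConv-GELU-PWConv applied separately to magnitude and phase
        self.mag_fc = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.GELU(),
                                    nn.Conv2d(channels, channels, 1))
        self.pha_fc = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.GELU(),
                                    nn.Conv2d(channels, channels, 1))
        # pixel attention: PWConv-GELU-PWConv followed by a sigmoid
        self.pa = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.GELU(),
                                nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.alpha = nn.Parameter(torch.zeros(1))            # assumed learnable residual weight

    def forward(self, x):
        xh = self.norm(x)
        freq = torch.fft.rfft2(xh, norm='ortho')              # frequency-domain representation
        mag = self.mag_fc(torch.abs(freq))                    # enhance magnitude component
        pha = self.pha_fc(torch.angle(freq))                  # enhance phase component
        real, imag = mag * torch.cos(pha), mag * torch.sin(pha)
        x1 = torch.fft.irfft2(torch.complex(real, imag), s=xh.shape[-2:], norm='ortho')
        x2 = x1 * self.pa(x1)                                  # pixel attention weighting
        return x + self.alpha * x2                             # weighted residual fusion


print(FFEM(32)(torch.rand(1, 32, 64, 64)).shape)               # torch.Size([1, 32, 64, 64])
```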

3.4. Loss Function

To train the network, we use the L1 loss function, defined as the mean absolute error between the predicted image and the reference image. The L1 loss is particularly effective for image enhancement tasks because it encourages the network to generate clear and accurate details without introducing excessive smoothing. The loss function is formulated as follows:
$$\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \left\| \hat{I}_i - I_i \right\|_1,$$
where $\hat{I}_i$ and $I_i$ denote the enhanced image and the reference image, respectively, and $N$ is the number of training samples.
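This loss corresponds directly to PyTorch's built-in L1 criterion; the tensors below are placeholders for the network output and the ground-truth reference.

```python
import torch
import torch.nn as nn

criterion = nn.L1Loss()                  # mean absolute error, matching the equation above
enhanced = torch.rand(4, 3, 256, 256)    # placeholder for the network output I_hat
reference = torch.rand(4, 3, 256, 256)   # placeholder for the reference image I
loss = criterion(enhanced, reference)    # averaged over pixels and samples
```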

4. Experimental Results

4.1. Experimental Settings

4.1.1. Datasets

We conducted experiments on several publicly available underwater image datasets, including the LSUI (Large Scale Underwater Image) dataset [22] and the UIEB (Underwater Image Enhancement Benchmark) dataset [16]. The LSUI dataset contains a large number of underwater images covering a wide range of underwater scenes and objects, collected from diverse water environments such as oceans, lakes, and rivers. The UIEB dataset contains 950 real underwater images, of which 890 have corresponding reference images and 60 are unreferenced. The U45 dataset consists of 45 real underwater images carefully selected by Li et al. [40] according to the color casts, low contrast, and haze effects of underwater degradation, and is divided into three subsets: green, blue, and haze. In our experiments, we use the LSUI training split Train-L (3879 images) for training and test on the LSUI test split Test-L400 (400 images). In addition, to evaluate the generalization performance of GFRENet, we use the C60 (the 60 unreferenced degraded images in the UIEB dataset) and U45 datasets for testing.

4.1.2. Implementation Details

We implemented the proposed network using Python and the PyTorch framework (version 1.11.0). All experiments were performed on a workstation equipped with eight NVIDIA TITAN XP graphics cards (Nvidia Corporation, Santa Clara, CA, USA). The model was trained with the AdamW optimizer [41] for a total of 10,000 iterations. The initial learning rate was 0.001 and was gradually reduced to 1 × 10−6 using a cosine annealing schedule [42]. The batch size per GPU was 16, and the training patch size was 256 × 256. Data augmentation techniques such as random cropping and horizontal flipping were used during training to improve the generalization ability of the model.
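The training configuration can be summarized by the sketch below. The model and data tensors are placeholders, and details such as stepping the scheduler once per iteration are assumptions rather than the authors' exact training script.

```python
import torch
import torch.nn as nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

model = nn.Conv2d(3, 3, 3, padding=1)              # placeholder standing in for GFRENet
criterion = nn.L1Loss()
total_iters = 10_000
optimizer = AdamW(model.parameters(), lr=1e-3)     # initial learning rate 0.001
scheduler = CosineAnnealingLR(optimizer, T_max=total_iters, eta_min=1e-6)  # decay to 1e-6

for it in range(total_iters):
    # each step: 16 randomly cropped (and possibly flipped) 256 x 256 patches per GPU
    degraded = torch.rand(16, 3, 256, 256)         # placeholder for sampled training patches
    reference = torch.rand(16, 3, 256, 256)
    optimizer.zero_grad()
    loss = criterion(model(degraded), reference)
    loss.backward()
    optimizer.step()
    scheduler.step()                               # cosine annealing, assumed per iteration
```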

4.1.3. Evaluation Metric and Benchmark Methods

In order to evaluate the performance of the model, we adopt commonly used image quality evaluation metrics, including PSNR (peak signal-to-noise ratio) and SSIM (structural similarity index) [43]. PSNR measures the error between the enhanced image and the reference image, with larger values indicating better image quality. SSIM evaluates the structural similarity of the images, with values closer to 1 indicating more similar image structures. In addition, we use the UIQM (Underwater Image Quality Measure) [44] and UCIQE (Underwater Color Image Quality Evaluation) [45] as reference-free metrics to evaluate the quality of the recovered underwater images; for both, larger values indicate higher quality.
PSNR is calculated using the following formula:
$$\mathrm{PSNR} = 10 \log_{10}\!\left( \frac{\max^2}{\mathrm{MSE}} \right).$$
Here, $\max$ denotes the maximum possible pixel value of the image; for an 8-bit image, $\max$ is 255. $\mathrm{MSE}$ denotes the mean squared error between the two images being compared.
SSIM evaluates three basic aspects of an image: brightness, contrast, and structure. SSIM is able to quantify the degree of distortion within an image and the degree of similarity between two images. Therefore, it is more in line with the intuitive visual perception of the human eye. SSIM is defined as follows:
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},$$
where $\mu_x$ and $\mu_y$ are the mean values of images $x$ and $y$; $\sigma_{xy}$ is the covariance between $x$ and $y$; $\sigma_x^2$ and $\sigma_y^2$ are the variances of $x$ and $y$; and $C_1$ and $C_2$ are constants that ensure numerical stability and prevent the denominator from being zero.
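Both full-reference metrics can be computed with standard tooling. The snippet below uses scikit-image (version 0.19 or later for the `channel_axis` argument) on placeholder 8-bit RGB images; in practice the enhanced and reference images would be loaded from the test set.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Placeholder 8-bit RGB images standing in for the enhanced and reference images.
enhanced = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
reference = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)

psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)                   # higher is better
ssim = structural_similarity(reference, enhanced, data_range=255, channel_axis=-1)    # closer to 1 is better
print(f"PSNR: {psnr:.4f} dB, SSIM: {ssim:.4f}")
```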
UIQM is used to evaluate the quality of underwater images, taking into account the contrast, color, and sharpness. Its calculation formula is as follows:
$$\mathrm{UIQM} = c_1 \cdot \mathrm{UICM} + c_2 \cdot \mathrm{UISM} + c_3 \cdot \mathrm{UIConM},$$
where UICM (Underwater Image Colorfulness Measure) measures the colorfulness of the image, UISM (Underwater Image Sharpness Measure) measures its sharpness, UIConM (Underwater Image Contrast Measure) measures its contrast, and $c_1$, $c_2$, and $c_3$ are weighting factors set to 0.0282, 0.2953, and 3.5753, respectively. UCIQE evaluates the quality of an underwater image mainly in terms of color, contrast, and saturation. Its calculation formula is as follows:
$$\mathrm{UCIQE} = c_1 \cdot \sigma_c + c_2 \cdot \mathrm{con}_l + c_3 \cdot \mu_s,$$
where $\sigma_c$ is the standard deviation of the image chroma; $\mathrm{con}_l$ is the contrast of the image luminance; $\mu_s$ is the mean of the image saturation; and $c_1$, $c_2$, and $c_3$ are weighting factors, usually set to 0.4680, 0.2745, and 0.2576, respectively.
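As an illustration, a common way to approximate UCIQE from an RGB image is sketched below. The exact definitions of the chroma, luminance-contrast, and saturation terms follow widely used open implementations and may differ in detail from the original formulation [45]; only the weighting coefficients are taken from the equation above.

```python
import numpy as np
from skimage import color


def uciqe(rgb, c1=0.4680, c2=0.2745, c3=0.2576):
    """Approximate UCIQE: weighted sum of chroma spread, luminance contrast, and mean saturation.

    Component definitions follow common open implementations and are an approximation,
    not the authors' exact evaluation code.
    """
    lab = color.rgb2lab(rgb)                          # CIELab conversion
    L, a, b = lab[..., 0], lab[..., 1], lab[..., 2]
    chroma = np.sqrt(a ** 2 + b ** 2)
    sigma_c = chroma.std()                            # standard deviation of chroma
    con_l = np.percentile(L, 99) - np.percentile(L, 1)  # spread of luminance (assumed 1% tails)
    mu_s = (chroma / (L + 1e-6)).mean()               # mean per-pixel saturation
    return c1 * sigma_c + c2 * con_l + c3 * mu_s


score = uciqe(np.random.rand(256, 256, 3))            # placeholder RGB image in [0, 1]
print(f"UCIQE (approx.): {score:.4f}")
```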
In addition, the number of parameters (#Param) and the number of floating point operations (FLOPs) are used to measure the computational complexity of the model. The number of parameters is the total number of parameters in the model, typically weights and biases; it directly determines the model size and complexity, as well as the memory the model occupies during inference. FLOPs measure computational complexity as the number of floating point operations required for one forward pass. In our experiments, FLOPs are computed for an input image size of 1280 × 720.
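The parameter count can be read directly from the model, while FLOPs are typically estimated with a profiling tool. The sketch below uses the third-party thop package and a placeholder model; the choice of tool is an assumption about tooling, not the authors' measurement setup, and note that profilers usually report multiply-accumulate operations (MACs), which different papers convert to FLOPs with different conventions.

```python
import torch
import torch.nn as nn
from thop import profile                               # third-party pytorch-OpCounter package (assumed installed)

model = nn.Conv2d(3, 3, 3, padding=1)                  # placeholder standing in for GFRENet
n_params = sum(p.numel() for p in model.parameters())  # total number of weights and biases

# Complexity of a single forward pass at the evaluation resolution of 1280 x 720.
dummy = torch.rand(1, 3, 720, 1280)
macs, _ = profile(model, inputs=(dummy,))
print(f"#Param: {n_params / 1e6:.4f} M, MACs: {macs / 1e9:.4f} G")
# FLOPs are sometimes reported as MACs and sometimes as 2 x MACs, depending on convention.
```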
We selected several state-of-the-art underwater image enhancement methods for comparison, including UWCNN [15], Shallow-UWnet [26], NU2Net [20], FANet [27], and Spectroformer [24]. All compared methods are run with their publicly released code to ensure the fairness and reproducibility of the results.

4.2. Quantitative Evaluations

After comparing and analyzing the performance metrics of different methods on the LSUI test set and the unreferenced C60 and U45 datasets, we comprehensively evaluated the performance of GFRENet, especially its advantages under various evaluation criteria. Table 1 and Table 2 show the specific performance of each method on the LSUI test set and on the C60 and U45 datasets, respectively, where the best and second-best results are marked in red and blue.
As can be seen from the tables, our proposed GFRENet demonstrates significant advantages in several aspects. First, on the full-reference test set Test-L400, GFRENet achieves a PSNR of 26.4995, the highest among all methods, indicating that it performs best in restoring image quality. Although its SSIM of 0.8793 is slightly lower than Spectroformer’s 0.8850, it remains in the high-score range, indicating an advantage in preserving image structure. Second, GFRENet achieves a UIQM score of 2.7576, higher than all other methods, showing its superiority in color reproduction and contrast. Although slightly lower than FANet (0.5805) on the UCIQE metric, it is still higher than most other methods, showing good overall image quality.
On the no-reference dataset C60, GFRENet’s UIQM score of 2.392 is the highest among all methods, indicating its excellent performance in color reproduction and contrast. Meanwhile, the UCIQE score is 0.547, which is slightly lower than FANet but still higher than most other methods, showing its superiority in color and contrast evaluation. In the U45 dataset, GFRENet’s UIQM score of 2.708 is second only to NU2Net, showing high image quality, while its UCIQE score of 0.575 is close to the highest score, showing superior color and contrast performance.
Most notably, GFRENet demonstrates great advantages in terms of the number of parameters and computational efficiency. Its parameter count is only 0.3972 M, much lower than NU2Net’s 3.1459 M and Spectroformer’s 2.4062 M, which indicates that the model is more lightweight. In addition, the number of floating point operations (FLOPs) of GFRENet is 16.4209 G, which is significantly lower than that of UWNet’s 304.1722 G and Spectroformer’s 221.5794 G, which means that while maintaining high performance, GFRENet significantly reduces the demand for computational resources, reflecting extremely high computational efficiency. This enables GFRENet to not only provide high-quality image restoration, but also to operate efficiently in environments with limited computing resources.
In summary, GFRENet’s overall performance on the LSUI test set and the C60 and U45 unreferenced datasets is excellent. It is close to or reaches the best values in several key metrics (PSNR, SSIM, UIQM, UCIQE), especially in image restoration quality and comprehensive image quality evaluation. Meanwhile, GFRENet has significant advantages in the number of parameters and computational efficiency, demonstrating extremely high efficiency and practical application potential. As can be seen in Figure 4, our proposed method GFRENet significantly outperforms other methods in PSNR and SSIM metrics and has significantly lower computational overhead.

4.3. Qualitative Comparisons

To validate the effectiveness of GFRENet, this subsection compares the enhancement effects of multiple existing methods in different underwater image scenes. The performance, strengths, and weaknesses of the algorithms can be evaluated comprehensively and objectively by visualizing the quality assessment of the processed images.
First, the first set of images shows a submersible underwater scene. The original image suffers from color distortion and blurring due to light refraction and scattering (see Figure 5a); UWCNN improves in color balance, but the image is still blurred (see Figure 5b); UWNet improves in clarity but suffers from a slight color distortion (see Figure 5c); NU2Net improves in image clarity but the color contrast is not satisfactory (see Figure 5d); FANet has enhanced contrast but introduces some color artifacts (see Figure 5e); and Spectroformer has a balanced enhancement effect, but the image is slightly blurred (see Figure 5f). In contrast, GFRENet performs well in terms of color correction and sharpness enhancement, with a natural image and no noticeable artifacts (see Figure 5g). GFRENet effectively preserves geometric features and provides a natural color balance, resulting in a clearer and more realistic image.
The second set of images shows an underwater reef scene. The original image has a distinct greenish tint with missing details due to the absorption and scattering effects of the water (see Figure 6a); UWCNN reduces the greenish tint but introduces some blurring (see Figure 6b); UWNet, despite the improved clarity, still has less accurate colors (see Figure 6c); the NU2Net image is better in terms of clarity but the colors are not vivid enough (see Figure 6d); FANet reduces the greenish tint but introduces other color artifacts (see Figure 6e); and Spectroformer’s enhancement is balanced but not sharp enough (see Figure 6f). GFRENet performs well in color correction and detail retention, resulting in a naturally crisp image (see Figure 6g). GFRENet is particularly good at correcting color distortion and preserving detail, making the images visually pleasing and accurate.
The third set of images shows a scene of a seafloor biotope containing corals and other marine organisms. The original image has low visibility and a strong green tint due to the scattering effect of the water (see Figure 7a); UWCNN reduces the green tint but introduces noise (see Figure 7b); UWNet improves visibility but the colors are still dark (see Figure 7c); NU2Net improves visibility and color balance but the results are less than satisfactory (see Figure 7d); FANet reduces the green tint but loses some detail (see Figure 7e); Spectroformer’s enhancement is balanced but not sharp enough (see Figure 7f). GFRENet significantly improves the green tint problem, enhances visibility, and retains detail without introducing noise (see Figure 7g). GFRENet excels in color correction, visibility enhancement, and detail retention, making it an effective tool for underwater image enhancement.
The fourth set of images shows a scene with an underwater statue and divers. The original image has a noticeable greenish tint and blurring due to light refraction and scattering (see Figure 8a); UWCNN reduces the greenish tint but the image is still blurred (see Figure 8b); UWNet improves the clarity but the colors are still distorted (see Figure 8c); NU2Net has better clarity but unsatisfactory color contrast (see Figure 8d); FANet shows contrast enhancement but introduces some color artifacts (see Figure 8e); Spectroformer enhancement is balanced, but the image is slightly blurred (see Figure 8f). GFRENet excels in color correction and clarity enhancement, with a natural image and no noticeable artifacts (see Figure 8g). GFRENet efficiently preserves geometric features and provides a natural color balance, resulting in a clearer and more realistic image. In particular, it excels in removing green tones and enhancing details, resulting in more distinct silhouettes of the divers and statues.
The fifth set of images shows a scene of an underwater coral reef and other marine life. The original image has low visibility and exhibits large color distortions due to the scattering effect of the water (see Figure 9a). UWCNN provides some color correction but introduces some noise (see Figure 9b); UWNet has improved clarity but the colors are not sufficiently saturated (see Figure 9c); the NU2Net result has better clarity but unsatisfactory color contrast (see Figure 9d); FANet exhibits color enhancement but loses some detail (see Figure 9e); Spectroformer’s enhancement is balanced but not sharp enough (see Figure 9f). GFRENet significantly improves color distortion, enhances visibility and detail, and the image is naturally clear (see Figure 9g). GFRENet performs well in correcting color distortion, improving visibility, and retaining detail; in particular, GFRENet’s enhanced images have higher clarity and realism, allowing the details of underwater creatures and corals to be fully displayed.
In summary, GFRENet demonstrates significant superiority in color correction, detail retention, and noise reduction. Compared with other methods (e.g., UWCNN, UWNet, NU2Net, FANet, and Spectroformer), GFRENet provides more natural and clearer image enhancement. Through detailed comparative analysis, this paper demonstrates the excellent performance of GFRENet in underwater image enhancement tasks and provides an effective solution for the field of underwater image processing.

4.4. Ablation Analysis

To delve deeper into the role of each module, this section compares model variants with and without the adaptive channel attention, the gated convolutional layer module, and the frequency feature enhancement module to assess the effectiveness of these components. The quantitative results for each metric are shown in Table 3. The table shows that all the proposed modules significantly improve model performance, and the best performance is achieved when all modules are used simultaneously.
(1)
Effectiveness of gated convolutional layer module (w/o GCLM)
From the data in the table, it can be seen that when the gated convolutional layer module is removed, the PSNR value of the model is 25.715 and the SSIM value is 0.871. Compared to the full GFRENet model (PSNR of 26.500 and SSIM of 0.879), the removal of the GCLM results in a reduction in the PSNR by 0.785 and in the SSIM by 0.008. This indicates that the gated convolutional layer module contributes significantly to the model performance improvement, especially in terms of image clarity and structural similarity.
(2)
Effectiveness of frequency feature enhancement module (w/o FFEM)
When the frequency feature enhancement module is removed, the PSNR value of the model is 25.845 and the SSIM value is 0.870. Compared to the full GFRENet model, the PSNR is reduced by 0.655 and the SSIM is reduced by 0.009. Although the effect of removing the FFEM on the model’s performance is slightly smaller than that of removing the GCLM, it still demonstrates the module’s importance in enhancing image details and overall quality.
The complete GFRENet model has the highest PSNR value of 26.500 and the highest SSIM value of 0.879, indicating that the model combining the two modules, GCLM and FFEM, achieves the best performance in terms of image enhancement effects. These results validate the effectiveness and necessity of these two modules.

5. Discussion

The experimental results show that the proposed method has significant advantages in underwater image enhancement tasks. The combination of the gated convolutional layer module and the frequency feature enhancement module effectively improves the image quality and verifies the rationality and effectiveness of our design. Deep learning-based underwater image enhancement methods have made significant progress in recent years, mainly due to data-driven learning capabilities and powerful feature extraction. However, these methods still face many challenges and research hotspots in terms of dataset production and algorithm real-time performance. These two aspects are discussed below.
In terms of dataset production, the performance of a deep learning model relies heavily on the quality and quantity of the dataset. For underwater image enhancement, the dataset needs to cover a variety of underwater environments and conditions, such as different depths, light, turbidity, and water composition. Due to the complexity and variability of underwater environments, constructing a diverse and large-scale dataset is a difficult task. Underwater image enhancement requires high-quality labeled data, including clear reference images and corresponding degraded images. However, obtaining high-quality reference images is often very difficult, especially in natural waters. As a result, researchers often use simulated degradation methods to generate training data, but this may lead to inconsistencies between the training data and the real scene, thus affecting the generalization ability of the model.
In terms of real-time performance, deep learning models usually have a large number of parameters and complex computational structures, resulting in high computational costs in practical applications. To improve the real-time performance of the algorithms, researchers need to explore lightweight model designs, using techniques such as depthwise separable convolution, pruning, and quantization to reduce the number of parameters and the amount of computation. In addition, in order to cope with the variations among different underwater environments, online learning and adaptive techniques are particularly important. These techniques allow the model to be continuously updated and optimized during actual operation, thus maintaining efficient real-time performance. Meanwhile, adaptive methods also enhance the robustness and applicability of the model in various environments.
Deep learning-based underwater image enhancement methods still face many challenges in terms of dataset production and algorithm real-time performance. The performance and usefulness of the algorithms can be further enhanced by constructing diverse and high-quality underwater image datasets, as well as optimizing the model structure and inference framework. Future research should focus on the disclosure and sharing of datasets, lightweight design of models, and efficient real-time inference techniques to promote the development and application of underwater image enhancement techniques.

6. Conclusions

In this study, we propose a novel underwater image enhancement network that combines a gated convolutional layer module and a frequency feature enhancement module, which significantly improves the clarity and color restoration of underwater images by effectively capturing and integrating local and global features. The experimental results show that our method outperforms existing state-of-the-art methods on the LSUI dataset and achieves the best score in the PSNR metric. Meanwhile, its enhancement performance is also superior on other datasets. This is attributable to our proposed modules: specifically, the gated convolutional layer module effectively reduces image distortion and improves the detail and quality of the enhanced image by selectively preserving relevant features, while the frequency feature enhancement module captures global patterns and structures by processing features in the frequency domain, enabling the network to better cope with complex variations and distortions in underwater images. Our ablation experiments further validate the contribution of each module to the overall performance, showing that the combination of the two is a key factor in improving image quality. The visual evaluation results also show that our method has obvious advantages in detail preservation and color restoration, and the enhanced images are clearer and more natural. In conclusion, the method proposed in this study demonstrates significant performance gains in the field of underwater image enhancement and provides an effective solution for dealing with complex underwater imaging problems. Future work will continue to optimize the model structure, improve the computational efficiency, and explore performance in more application scenarios, such as underwater video enhancement and the development of real-time processing systems. Through this research, we hope to contribute to the development of underwater image processing technology and promote its widespread use in scientific research, ocean exploration, and industrial applications.

Author Contributions

Conceptualization, B.Z., Y.W. and J.F.; methodology, J.F.; software, J.F.; validation, B.Z., J.F. and X.W.; formal analysis, Y.L. and Y.W.; investigation, X.W.; resources, B.Z.; data curation, Q.Z.; writing—original draft preparation, J.F.; writing—review and editing, J.F., Y.L. and Q.Z.; visualization, J.F.; supervision, B.Z., J.F. and X.W.; project administration, B.Z. and Y.W.; funding acquisition, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Chinese Ministry of Science and Technology (MOST) and the European Space Agency (ESA) within the DRAGON 5 Cooperation under grant ID 57192.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the need to prevent their use in illegal ways.

Acknowledgments

The authors would like to express their gratitude to the researchers who provided public codes and datasets, as well as the editors and anonymous reviewers for their insightful comments and ideas.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Xie, K.; Pan, W.; Xu, S. An Underwater Image Enhancement Algorithm for Environment Recognition and Robot Navigation. Robotics 2018, 7, 14. [Google Scholar] [CrossRef]
  2. Sun, S.; Wang, H.; Zhang, H.; Li, M.; Xiang, M.; Luo, C.; Ren, P. Underwater Image Enhancement With Reinforcement Learning. IEEE J. Ocean. Eng. 2024, 49, 249–261. [Google Scholar] [CrossRef]
  3. Bruno, F.; Barbieri, L.; Mangeruga, M.; Cozza, M.; Lagudi, A.; Čejka, J.; Liarokapis, F.; Skarlatos, D. Underwater Augmented Reality for Improving the Diving Experience in Submerged Archaeological Sites. Ocean Eng. 2019, 190, 106487. [Google Scholar] [CrossRef]
  4. Hitam, M.S.; Yussof, W.N.J.H.W.; Awalludin, E.A.; Bachok, Z. Mixture Contrast Limited Adaptive Histogram Equalization for Underwater Image Enhancement. In Proceedings of the 2013 International Conference on Computer Applications Technology (ICCAT), Sousse, Tunisia, 20–22 January 2013; pp. 1–5. [Google Scholar]
  5. Ancuti, C.O.; Ancuti, C.; De Vleeschouwer, C.; Bekaert, P. Color Balance and Fusion for Underwater Image Enhancement. IEEE Trans. Image Process. 2018, 27, 379–393. [Google Scholar] [CrossRef] [PubMed]
  6. Li, C.-Y.; Guo, J.-C.; Cong, R.-M.; Pang, Y.-W.; Wang, B. Underwater Image Enhancement by Dehazing with Minimum Information Loss and Histogram Distribution Prior. IEEE Trans. Image Process. 2016, 25, 5664–5677. [Google Scholar] [CrossRef]
  7. Wang, Y.; Song, W.; Fortino, G.; Qi, L.-Z.; Zhang, W.; Liotta, A. An Experimental-Based Review of Image Enhancement and Image Restoration Methods for Underwater Imaging. IEEE Access 2019, 7, 140233–140251. [Google Scholar] [CrossRef]
  8. Raveendran, S.; Patil, M.D.; Birajdar, G.K. Underwater Image Enhancement: A Comprehensive Review, Recent Trends, Challenges and Applications. Artif. Intell. Rev. 2021, 54, 5413–5467. [Google Scholar] [CrossRef]
  9. An, S.; Xu, L.; Deng, Z.; Zhang, H. HFM: A Hybrid Fusion Method for Underwater Image Enhancement. Eng. Appl. Artif. Intell. 2024, 127, 107219. [Google Scholar] [CrossRef]
  10. Wang, H.; Zhang, W.; Bai, L.; Ren, P. Metalantis: A Comprehensive Underwater Image Enhancement Framework. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–19. [Google Scholar] [CrossRef]
  11. Wang, H.; Zhang, W.; Ren, P. Self-Organized Underwater Image Enhancement. ISPRS J. Photogramm. Remote Sens. 2024, 215, 1–14. [Google Scholar] [CrossRef]
  12. Zhang, D.; Wu, C.; Zhou, J.; Zhang, W.; Lin, Z.; Polat, K.; Alenezi, F. Robust Underwater Image Enhancement with Cascaded Multi-Level Sub-Networks and Triple Attention Mechanism. Neural Netw. 2024, 169, 685–697. [Google Scholar] [CrossRef] [PubMed]
  13. Wu, J.; Liu, X.; Lu, Q.; Lin, Z.; Qin, N.; Shi, Q. FW-GAN: Underwater Image Enhancement Using Generative Adversarial Network with Multi-Scale Fusion. Signal Process. Image Commun. 2022, 109, 116855. [Google Scholar] [CrossRef]
  14. Huang, Z.; Li, J.; Hua, Z.; Fan, L. Underwater Image Enhancement via Adaptive Group Attention-Based Multiscale Cascade Transformer. IEEE Trans. Instrum. Meas. 2022, 71, 1–18. [Google Scholar] [CrossRef]
  15. Li, C.; Anwar, S.; Porikli, F. Underwater Scene Prior Inspired Deep Underwater Image and Video Enhancement. Pattern Recognit. 2020, 98, 107038. [Google Scholar] [CrossRef]
  16. Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An Underwater Image Enhancement Benchmark Dataset and Beyond. IEEE Trans. Image Process. 2020, 29, 4376–4389. [Google Scholar] [CrossRef] [PubMed]
  17. Li, C.; Anwar, S.; Hou, J.; Cong, R.; Guo, C.; Ren, W. Underwater Image Enhancement via Medium Transmission-Guided Multi-Color Space Embedding. IEEE Trans. Image Process. 2021, 30, 4985–5000. [Google Scholar] [CrossRef] [PubMed]
  18. Fu, Z.; Wang, W.; Huang, Y.; Ding, X.; Ma, K.-K. Uncertainty Inspired Underwater Image Enhancement. In Computer Vision—ECCV 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2022; Volume 13678, pp. 465–482. ISBN 978-3-031-19796-3. [Google Scholar]
  19. Qi, Q.; Li, K.; Zheng, H.; Gao, X.; Hou, G.; Sun, K. SGUIE-Net: Semantic Attention Guided Underwater Image Enhancement With Multi-Scale Perception. IEEE Trans. Image Process. 2022, 31, 6816–6830. [Google Scholar] [CrossRef]
  20. Guo, C.; Wu, R.; Jin, X.; Han, L.; Zhang, W.; Chai, Z.; Li, C. Underwater Ranker: Learn Which Is Better and How to Be Better. Proc. AAAI Conf. Artif. Intell. 2023, 37, 702–709. [Google Scholar] [CrossRef]
  21. Chi, K.; Yuan, Y.; Wang, Q. Trinity-Net: Gradient-Guided Swin Transformer-Based Remote Sensing Image Dehazing and Beyond. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–14. [Google Scholar] [CrossRef]
  22. Peng, L.; Zhu, C.; Bian, L. U-Shape Transformer for Underwater Image Enhancement. IEEE Trans. Image Process. 2023, 32, 3066–3079. [Google Scholar] [CrossRef]
  23. Wang, Z.; Shen, L.; Xu, M.; Yu, M.; Wang, K.; Lin, Y. Domain Adaptation for Underwater Image Enhancement. IEEE Trans. Image Process. 2023, 32, 1442–1457. [Google Scholar] [CrossRef] [PubMed]
  24. Khan, M.R.; Mishra, P.; Mehta, N.; Phutke, S.S.; Vipparthi, S.K.; Nandi, S.; Murala, S. Spectroformer: Multi-Domain Query Cascaded Transformer Network For Underwater Image Enhancement. In Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2024; pp. 1443–1452. [Google Scholar]
  25. Qi, H.; Zhou, H.; Dong, J.; Dong, X. Deep Color-Corrected Multiscale Retinex Network for Underwater Image Enhancement. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–13. [Google Scholar] [CrossRef]
  26. Naik, A.; Swarnakar, A.; Mittal, K. Shallow-UWnet: Compressed Model for Underwater Image Enhancement (Student Abstract). Proc. AAAI Conf. Artif. Intell. 2021, 35, 15853–15854. [Google Scholar] [CrossRef]
  27. Jiang, J.; Ye, T.; Bai, J.; Chen, S.; Chai, W.; Jun, S.; Liu, Y.; Chen, E. Five A+ Network: You Only Need 9K Parameters for Underwater Image Enhancement. arXiv 2023. [Google Scholar] [CrossRef]
  28. Liu, X.; Lin, S.; Chi, K.; Tao, Z.; Zhao, Y. Boths: Super Lightweight Network-Enabled Underwater Image Enhancement. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
  29. Chi, L.; Jiang, B.; Mu, Y. Fast Fourier Convolution. In Proceedings of the Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H., Eds.; Curran Associates, Inc.: New York, NY, USA, 2020; Volume 33, pp. 4479–4488. [Google Scholar]
  30. Huang, J.; Liu, Y.; Zhao, F.; Yan, K.; Zhang, J.; Huang, Y.; Zhou, M.; Xiong, Z. Deep Fourier-Based Exposure Correction Network with Spatial-Frequency Interaction. In Proceedings of the Computer Vision—ECCV 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer Nature: Cham, Switzerland, 2022; pp. 163–180. [Google Scholar]
  31. Sinha, A.K.; Manthira Moorthi, S.; Dhar, D. NL-FFC: Non-Local Fast Fourier Convolution for Image Super Resolution. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, 19–20 June 2022; pp. 466–475. [Google Scholar]
  32. Suvorov, R.; Logacheva, E.; Mashikhin, A.; Remizova, A.; Ashukha, A.; Silvestrov, A.; Kong, N.; Goka, H.; Park, K.; Lempitsky, V. Resolution-Robust Large Mask Inpainting with Fourier Convolutions. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2022; pp. 2149–2159. [Google Scholar]
  33. Zhang, D.; Huang, F.; Liu, S.; Wang, X.; Jin, Z. SwinFIR: Revisiting the SwinIR with Fast Fourier Convolution and Improved Training for Image Super-Resolution. arXiv 2022. [Google Scholar] [CrossRef]
  34. Zhuang, Y.; Zheng, Z.; Lyu, C. DPFNet: A Dual-Branch Dilated Network with Phase-Aware Fourier Convolution for Low-Light Image Enhancement. arXiv 2022. [Google Scholar] [CrossRef]
  35. Li, C.; Guo, C.-L.; Zhou, M.; Liang, Z.; Zhou, S.; Feng, R.; Loy, C.C. Embedding Fourier for Ultra-High-Definition Low-Light Image Enhancement. arXiv 2023. [Google Scholar] [CrossRef]
  36. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2015; Volume 9351, pp. 234–241. ISBN 978-3-319-24573-7. [Google Scholar]
  37. Shi, W.; Caballero, J.; Huszar, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883. [Google Scholar]
  38. Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.-H. Restormer: Efficient Transformer for High-Resolution Image Restoration. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 5718–5729. [Google Scholar]
  39. Chen, L.; Chu, X.; Zhang, X.; Sun, J. Simple Baselines for Image Restoration. In Computer Vision—ECCV 2022; Lecture Notes in Computer Science; Springer Nature: Cham, Switzerland, 2022; Volume 13667, pp. 17–33. ISBN 978-3-031-20070-0. [Google Scholar]
  40. Li, H.; Li, J.; Wang, W. A Fusion Adversarial Underwater Image Enhancement Network with a Public Test Dataset. arXiv 2019, arXiv:1906.06819. [Google Scholar]
  41. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  42. Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
  43. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
  44. Panetta, K.; Gao, C.; Agaian, S. Human-Visual-System-Inspired Underwater Image Quality Measures. IEEE J. Ocean. Eng. 2016, 41, 541–551. [Google Scholar] [CrossRef]
  45. Yang, M.; Sowmya, A. An Underwater Color Image Quality Evaluation Metric. IEEE Trans. Image Process. 2015, 24, 6062–6071. [Google Scholar] [CrossRef] [PubMed]
Figure 1. GFRENet network architecture.
Figure 2. Structure of the gated convolutional layer module (GCLM).
Figure 3. Structure of the frequency feature enhancement module (FFEM).
Figure 4. Comparison of methods on the L400 test set. The size of each circle represents the number of floating-point operations (FLOPs) of the corresponding method.
Figure 5. Visual comparison of underwater scenes of submersibles in L400. (a) Input; (b) UWCNN; (c) UWNet; (d) NU2Net; (e) FANet; (f) Spectroformer; (g) GFRENet; (h) GT.
Figure 6. Visual comparison of underwater scenes of reef in L400. (a) Input; (b) UWCNN; (c) UWNet; (d) NU2Net; (e) FANet; (f) Spectroformer; (g) GFRENet; (h) GT.
Figure 7. Visual comparison of underwater scenes of marine organisms in L400. (a) Input; (b) UWCNN; (c) UWNet; (d) NU2Net; (e) FANet; (f) Spectroformer; (g) GFRENet; (h) GT.
Figure 8. Visual comparison of underwater scenes of statue and divers in U45. (a) Input; (b) UWCNN; (c) UWNet; (d) NU2Net; (e) FANet; (f) Spectroformer; (g) GFRENet.
Figure 9. Visual comparison of underwater coral reef in U45. (a) Input; (b) UWCNN; (c) UWNet; (d) NU2Net; (e) FANet; (f) Spectroformer; (g) GFRENet.
Table 1. Evaluation metrics and computational efficiency of the compared methods on the L400 test set; ↑ indicates that higher values are better.
                   ------------------ L400 ------------------    ------ Efficiency ------
Method             PSNR↑      SSIM↑      UIQM↑      UCIQE↑       Params (M)    FLOPs (G)
UWCNN              23.1760    0.8372     2.8399     0.5706       0.0400        36.7027
UWNet              22.8273    0.8429     2.8107     0.5567       0.2195        304.1722
NU2Net             24.5102    0.8598     2.7313     0.5621       3.1459        147.4634
FANet              24.5060    0.8585     2.7411     0.5877       0.0090        8.2299
Spectroformer      26.4237    0.8850     2.6988     0.5780       2.4062        221.5794
GFRENet (ours)     26.4995    0.8793     2.7576     0.5805       0.3972        16.4209
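For readers who wish to check the full-reference scores in Table 1, the snippet below is a minimal sketch (not the authors' evaluation code) that computes PSNR and SSIM with scikit-image, as defined in [43]. The file paths are hypothetical, and it assumes the enhanced result and its ground-truth reference are same-sized 8-bit RGB images.

```python
# Minimal sketch for computing the full-reference metrics in Table 1.
# Assumes enhanced output and ground-truth reference are uint8 RGB images
# of identical size; paths below are placeholders.
from skimage import io
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(enhanced_path, reference_path):
    """Return (PSNR, SSIM) between an enhanced image and its reference."""
    enhanced = io.imread(enhanced_path)
    reference = io.imread(reference_path)
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim

# Example usage (hypothetical file names):
# psnr, ssim = evaluate_pair("enhanced/0001.png", "gt/0001.png")
```

The no-reference scores (UIQM [44] and UCIQE [45]) require the metric-specific implementations described in the corresponding references.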
Table 2. Evaluation metrics of the compared methods on the C60 and U45 test sets; ↑ indicates that higher values are better.
                   --------- C60 ---------    --------- U45 ---------
Method             UIQM↑       UCIQE↑         UIQM↑       UCIQE↑
UWCNN              2.311       0.526          2.609       0.551
UWNet              2.261       0.510          2.579       0.527
NU2Net             2.310       0.527          2.845       0.569
FANet              2.223       0.552          2.759       0.570
Spectroformer      2.196       0.541          2.673       0.576
GFRENet (ours)     2.392       0.547          2.708       0.575
Table 3. Ablation study on the L400 test set. Bold numbers represent the best results.
Setting        PSNR      SSIM
w/o GCLM       25.715    0.871
w/o FFEM       25.845    0.870
GFRENet        26.500    0.879