Article

Two-Branch Underwater Image Enhancement and Original Resolution Information Optimization Strategy in Ocean Observation

1 School of Information Science and Technology, Dalian Maritime University, Dalian 116026, China
2 Department of Computer Science, National Chengchi University, Taipei 116011, China
3 Department of Electrical and Electronic Engineering, University of Western Australia, Perth, WA 6009, Australia
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
J. Mar. Sci. Eng. 2023, 11(7), 1285; https://doi.org/10.3390/jmse11071285
Submission received: 10 May 2023 / Revised: 21 June 2023 / Accepted: 22 June 2023 / Published: 25 June 2023

Abstract

In complex marine environments, underwater images often suffer from color distortion, blur, and poor visibility. Existing underwater image enhancement methods predominantly rely on the U-net structure, which assigns the same weight to different resolution information. However, this approach lacks the ability to extract sufficient detailed information, resulting in problems such as blurred details and color distortion. We propose a two-branch underwater image enhancement method with an optimized original resolution information strategy to address this limitation. Our method comprises a feature enhancement subnetwork (FEnet) and an original resolution subnetwork (ORSnet). FEnet extracts multi-resolution information and utilizes an adaptive feature selection module to enhance global features in different dimensions. The enhanced features are then fed into ORSnet as complementary features, which extract local enhancement features at the original image scale to achieve semantically consistent and visually superior enhancement effects. Experimental results on the UIEB dataset demonstrate that our method achieves the best performance compared to the state-of-the-art methods. Furthermore, through comprehensive application testing, we have validated the superiority of our proposed method in feature extraction and enhancement compared to other end-to-end underwater image enhancement methods.

1. Introduction

The quality of underwater images is significantly degraded by light absorption and scattering [1], which cause color distortion and reduced contrast. For ocean engineering [2] and underwater archaeology, acquiring high-visibility underwater images is crucial [3].
Many techniques based on image enhancement, image restoration, and deep learning have been thoroughly investigated to improve the visual quality of single underwater images. Enhancement-based approaches [4,5] directly process image pixel values to improve particular image attributes, such as color, contrast, and brightness. Restoration-based approaches [6,7] treat image quality improvement as an inverse problem, relying on physical imaging models and prior constraints. Deep learning has recently achieved exceptional performance [8,9,10], and image processing [11,12,13,14] benefits from the robust modeling capability of neural networks and the rich feature information extracted from enormous amounts of training images. Many deep learning-based techniques [15,16,17] are therefore designed to extract the information needed to improve the visual quality of underwater images.
At present, the best-performing underwater image enhancement methods mostly use the U-net structure as their main framework. However, the U-net structure gives the same weight to information at every resolution and thus underestimates the importance of the original resolution, even though image enhancement relies heavily on the representation of original-resolution pixels. This observation prompts us to propose a multi-resolution information enhancement network (MIEN), which contains a feature enhancement subnetwork (FEnet, based on U-net) and an original resolution subnetwork (ORSnet). The feature enhancement branch exploits the characteristics of the U-net structure to focus the network on multi-resolution features and to extract the features of different objects in different scenes, which facilitates subsequent image recovery. The original resolution subnetwork contains no downsampling operations and concentrates on the dependencies between pixels to promote detail enhancement. Compared with other enhancement methods, we not only consider global information at different scales during feature enhancement but also attend to the detailed information at the original size, which gives the enhanced image better visual quality and better supports subsequent tasks such as object recognition. The main contributions are as follows:
(1) To enhance the extraction of detailed information, we design an original resolution subnetwork (ORSnet) that extracts scene detail information at the original resolution without any up-sampling or down-sampling, thereby retaining detail features at the original resolution to the greatest extent.
(2) Our proposed adaptive feature selection module (AFSM) extracts the global information of underwater images at different scales and allows the network to process different kinds of information. We introduce adaptive coefficients that help AFSM fuse long-range context information effectively.
(3) To better address image blurring, we develop the Semantic Feature Reconstruction Module (SFRM), which uses convolutions with different kernel sizes to obtain semantic feature information and then reconstructs the semantic features of the image through a fusion mechanism, bringing the visual appearance of the image closer to the ground truth.
The remainder of the paper is structured as follows: Section 2 reviews learning-based and conventional UIE techniques. Section 3 introduces the proposed MIEN in detail. In Section 4, we compare several recent methods on various datasets and conduct ablation experiments and application tests to evaluate the capability of MIEN. Section 5 concludes the paper and outlines directions for further work.
We demonstrate that our method outperforms existing techniques in enhancing underwater images, supporting this claim with quantitative image quality assessment scores and subjective evaluations. The superior performance of our method illustrates the value of the proposed approach.

2. Related Works

Many UIE techniques, which can be generally separated into traditional methods [18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35] and deep learning techniques [2,36,37,38,39,40,41,42,43,44,45,46], have recently been presented to address blurring and color cast.

2.1. Traditional Underwater Image Enhancement Methods

Early work achieved visual enhancement of underwater images mostly by modifying pixel values in RGB space. Ghani and Isa [18] stretched the blue channel, which predominates in underwater images, toward the lower intensity levels following the Rayleigh distribution, whereas the weaker red channel was stretched toward the higher levels. Such techniques produce vivid-looking results, but they are prone to overexposure and cannot properly handle fine scene content and suspended materials [18]. As is well known, refraction and absorption degrade underwater image quality. Underwater restoration techniques frequently employ the Jaffe-McGlamery model [19], and numerous researchers have developed transmission estimation techniques based on it to invert the degradation and recover the scene. To optimize visual contrast, Liu and Chau [20] developed a loss function based on quadtree subdivision and searched for the optimal transmission map. Li et al. [21] suggested an underwater dehazing technique that combines a global underwater light estimation algorithm with a regression-model-based medium transmission estimation algorithm. Wang et al. [22] proposed an adaptive attenuation-curve prior, a non-local prior derived from the statistical distribution of pixel values. To lessen the effects of water interference, Xie et al. [23] included a forward scattering component in the formulation model together with normalized total variation and sparse prior information.
However, there is a crucial gap between the scattering model and the real degraded underwater appearance that cannot be overlooked, and the optical characteristics of water further increase the complexity of underwater imaging. Other widely used conventional techniques exploit statistical patterns in the appearance of underwater images. The dark channel prior was first proposed in [24] by analyzing the statistics of outdoor images. To accomplish visual enhancement, Chiang and Chen [25] extended the dark channel prior and coupled it with wavelength-dependent compensation techniques. Drews et al. [26] then derived an underwater dark channel by examining the applicability of absorption in underwater images. Liang et al. [27] introduced an underwater dark channel prior to estimate a more reliable transmission. Peng et al. [28] proposed a generalized dark channel prior for underwater image restoration and incorporated adaptive color correction into the formulation.
Moreover, Carlevaris-Bianco et al. [29] suggested a straightforward prior that estimates the transmission using the stark differences in attenuation across the RGB color channels in water, observing that the red wavelength attenuates more quickly than the green and blue ones. Galdran et al. [30] created a red channel technique that restores the colors associated with short wavelengths. Li et al. [31,47] developed a histogram distribution prior based on a minimum-information-loss assumption. To rectify image color, Akkaynak and Treibitz [32] incorporated depth information and revised the underwater image formation model.
Although the model-based approaches mentioned above alleviate the color cast problem to some degree, traditional underwater image enhancement methods are only effective for particular kinds of underwater images because of the influence of lighting. Additionally, because of their fixed formulations and models, image restoration approaches struggle to assess the degradation accurately.

2.2. Learning-Based Underwater Image Enhancement Methods

Deep learning has advanced significantly in many low-level vision tasks, and a growing number of researchers are applying it to underwater image enhancement [36]. In recent years, researchers have put much effort into developing novel sample generation techniques, more efficient learning strategies, and network topologies.
Following the success of image-to-image translation techniques, Fabbri et al. [37] suggested using CycleGAN [38] to create synthetic underwater images. GAN-based deep models were then extensively investigated, including FGAN [39], DenseGAN [40], MLFcGAN [41], and FUnIE-GAN [42]. A real-world underwater image enhancement benchmark (UIEB), made up of 890 paired real-world underwater images, was later created by Li et al. [45]; the associated enhancement references were manually chosen among enhancement candidates according to the subjects' preferred visual style. The UIEB dataset has considerably benefited both reference-based performance evaluation [48] and the investigation of deep learning-based techniques for UIE [2,43].
Researchers have also explored novel learning techniques and network topologies to make the most of the few high-quality underwater images available. Li et al. [45] proposed the WaterNet gated fusion network together with the UIEB dataset, which fuses the inputs with three predicted confidence maps to produce an improved result. Qi et al. [44] suggested the Underwater Image Co-Enhancement Network (UICoE-Net), introducing correlation feature matching units to transmit the mutual correlation between the two input branches. Li et al. [43] proposed learning rich feature information from multiple color spaces with attention weights derived from transmission maps, and accordingly designed the Ucolor encoder-decoder enhancement network.
Combining the characteristics of underwater imaging, Wu et al. [49] proposed a two-stage underwater image convolutional neural network (CNN) based on structure decomposition (UWCNN-SD): the first stage is a preliminary enhancement network containing high-frequency and low-frequency enhancement branches that produce an initial enhancement of the underwater image, and the second stage is a refinement network that further optimizes the color to obtain higher visual quality. Ding et al. [50] proposed an efficient dual-stream method to improve underwater image quality, in which a significant-area refinement algorithm addresses chromatic aberration and a global appearance adjustment algorithm improves clarity to obtain a sharper enhanced image. Yan et al. [51] proposed an attention-guided dynamic multi-branch neural network (ADMNNet) to acquire high-quality underwater images, in which attention-guided dynamic multi-branch blocks combine attributes under different receptive fields into a single-stream structure to improve the diversity of feature representations. Fu et al. [52] proposed a two-branch network that compensates for global distortion and local contrast reduction, respectively; the global-local design greatly simplifies the learning problem and allows a lightweight architecture to process underwater images. Lin et al. [53] proposed a new two-stage network that divides the recovery process into horizontal and vertical distortion recovery so that the network can effectively handle scattering and absorption. In the first stage, they propose a model-based network that embeds underwater physical models directly into the network to deal with horizontal distortion; in the second stage, they propose an Attenuation Coefficient Priority Attention Block (ACPAB) to adaptively recalibrate the RGB channel feature maps of images affected by vertical distortion. Yu et al. [54] proposed a multi-attention path aggregation network (APAN), which uses a path aggregation structure with a backbone network and bottom-up path extension features to enhance semantic features, strengthen feature extraction, and improve the visual quality of underwater images. Because existing underwater image enhancement methods primarily rely on the U-net structure, which emphasizes the extraction of global information, they often lose image details. To address this limitation, we propose a multi-resolution information enhancement network that enhances features at various resolutions, with particular emphasis on the original resolution, thereby mitigating the loss of image detail.

3. Proposed Method

The proposed MIEN is divided into two branches: the feature enhancement subnetwork and the original resolution subnetwork. The feature enhancement subnetwork focuses on extracting global information; within it, AFSM is used to strengthen the extraction of global features of underwater images and address the color-cast problem. To emphasize the preservation of detailed information, we devise the original resolution subnetwork, which focuses on retaining the feature details of the original resolution layer in underwater images. As a result, our MIEN can produce excellent results for image recovery. The MIEN framework is shown in Figure 1.
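As a reading aid, the sketch below shows one way the two branches could be wired together in PyTorch; the class names `FEnet` and `ORSnet` and the way ORSnet consumes the complementary FEnet features are assumptions for illustration rather than the authors' reference implementation (the exact connections are given in Figure 1).

```python
import torch
import torch.nn as nn

class MIEN(nn.Module):
    """Two-branch sketch: FEnet supplies globally enhanced, multi-resolution
    features that ORSnet consumes alongside the full-resolution input."""
    def __init__(self, fenet: nn.Module, orsnet: nn.Module):
        super().__init__()
        self.fenet = fenet    # U-net-style feature enhancement subnetwork
        self.orsnet = orsnet  # original resolution subnetwork (no down/upsampling)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        global_feats = self.fenet(x)         # complementary global features
        return self.orsnet(x, global_feats)  # detail refinement at full resolution

# Usage (hypothetical): MIEN(FEnet(), ORSnet())(torch.rand(1, 3, 256, 256))
```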

3.1. Feature Enhancement Subnetwork

We develop a feature enhancement subnetwork (FEnet) based on the U-net structure, which focuses on multi-resolution features. In FEnet, we propose the adaptive feature selection module to handle the various scenes in an underwater image and to extract global features at low resolutions. As a result, FEnet obtains better feature representations for high-quality image reproduction.

Adaptive Feature Selection Module

The adaptive feature selection module (AFSM) is made up of the enhancing feature selection module, the adaptive channel selection module, and the adaptive spatial selection module. These three branches work together to extract global information and give our network the capacity to handle different kinds of information in different ways. Additionally, by incorporating an adjustable coefficient that forces AFSM to concentrate on the information essential to the current task, we can tackle the issue of information overload and enhance the effectiveness and precision of task processing. Thus, for high-quality image restoration, our AFSM obtains improved feature representations. The module framework is shown in Figure 2.
(1) ASSM
The UIE task relies heavily on spatial context relationships, so to further enhance the spatial information of features, we introduce ASSM. We also incorporate an adaptive parameter ($\alpha$) in the ASSM to dynamically adjust the weighting of the entire ASSM branch. The initial value of $\alpha$ is set to 0, and its optimal value is gradually learned during training. By incorporating this adaptive parameter, our AFSM effectively integrates long-range spatial contextual information, thus enhancing its learning capacity.
First, we compress channel features with convolutional layers and use Softmax to obtain spatial attention weights:
$$\omega_j^s = \frac{\exp(A_j^s)}{\sum_{m=1}^{N} \exp(A_m^s)}, \tag{1}$$
where $N = H \times W$ is the number of pixels, $A^s$ denotes the channel-wise squeezed features, and $j$ indexes the $j$-th pixel. The input features are reshaped into $\mathbb{R}^{HW \times 1 \times 1}$ after a convolution, and Softmax then generates the spatial attention weights $\omega^s \in \mathbb{R}^{HW \times 1 \times 1}$. These weights are multiplied with the input features to generate the spatial long-range contextual features $D^s \in \mathbb{R}^{C \times 1 \times 1}$.
The attention weights and the original features are then matrix-multiplied to acquire long-range spatial context information:
$$D^s = \sum_{j=1}^{N} \omega_j^s B_j^s, \tag{2}$$
where the input $X \in \mathbb{R}^{C \times H \times W}$ is reshaped to obtain $B^s \in \mathbb{R}^{C \times HW}$, and $D^s \in \mathbb{R}^{C \times 1 \times 1}$ represents the spatial context features.
Finally, $\alpha$ is used for feature transformation and feature fusion, as in Formula (3):
$$\mathrm{Out}_{\mathrm{ASSM}} = \alpha\, W_{s2}\!\left( W_{s1} \sum_{j=1}^{N} \frac{\exp(W_k X_j)}{\sum_{m=1}^{N} \exp(W_k X_m)} X_j \right) \times \frac{1}{2}\left[ 1 + \mathrm{erf}\!\left( \frac{W_{s1} \sum_{j=1}^{N} \frac{\exp(W_k X_j)}{\sum_{m=1}^{N} \exp(W_k X_m)} X_j}{\left\| W_{s1} \sum_{j=1}^{N} \frac{\exp(W_k X_j)}{\sum_{m=1}^{N} \exp(W_k X_m)} X_j \right\|_2 \sqrt{2}} \right) \right], \tag{3}$$
where $W_{s1}$ stands for a $1 \times 1$ convolutional layer, $W_{s2}$ represents a GeLU activation layer followed by a $1 \times 1$ convolutional layer, $\alpha$ is an adaptive learning weight, $W_k$ represents a convolutional layer that squeezes the channel dimension ($A^s = W_k X$, $A^s \in \mathbb{R}^{1 \times H \times W}$), and $\mathrm{Out}_{\mathrm{ASSM}} \in \mathbb{R}^{C \times 1 \times 1}$ is the output of ASSM.
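A minimal PyTorch sketch of the ASSM branch (Equations (1)-(3)) is given below; it treats the explicit erf term as the standard GeLU gating and keeps the channel width unchanged through the transformation, both of which are assumptions beyond what the text specifies.

```python
import torch
import torch.nn as nn

class ASSM(nn.Module):
    """Adaptive spatial selection: global-context pooling over all pixels,
    scaled by a learnable weight alpha (initialized to 0)."""
    def __init__(self, channels: int):
        super().__init__()
        self.squeeze = nn.Conv2d(channels, 1, kernel_size=1)   # W_k: C -> 1
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),       # W_s1
            nn.GELU(),                                          # erf-based gating
            nn.Conv2d(channels, channels, kernel_size=1),       # W_s2
        )
        self.alpha = nn.Parameter(torch.zeros(1))               # adaptive weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Spatial attention weights over all H*W positions (Eq. 1).
        attn = torch.softmax(self.squeeze(x).view(b, 1, h * w), dim=-1)
        # Long-range spatial context D^s in R^{C x 1 x 1} (Eq. 2).
        context = torch.bmm(x.view(b, c, h * w), attn.transpose(1, 2)).view(b, c, 1, 1)
        # Feature transformation scaled by alpha (Eq. 3).
        return self.alpha * self.transform(context)
```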
(2) ACSM
The underwater image color recovery task relies heavily on context relationships across channels, so to further enhance the channel information of the features, we introduce ACSM. We also introduce an adaptive parameter ($\beta$) in ACSM to adjust the weighting of the entire ACSM branch; the initial value of $\beta$ is set to 0, and its optimal value is gradually learned. Owing to this adaptive learning weight, AFSM efficiently fuses long-range channel-wise contextual information.
First, the spatial features are compressed with average pooling, and the channel attention weights are obtained using Softmax. As in Equation (4):
$$\omega_j^c = \frac{\exp(A_j^c)}{\sum_{m=1}^{C} \exp(A_m^c)}, \tag{4}$$
where $C$ is the number of channels, $A^c$ denotes the spatially squeezed features, $j$ indexes the $j$-th channel, and $\omega^c$ represents the channel-wise attention weights.
The attention weights and the original features are then matrix-multiplied to generate the long-range channel context, similar to Formula (5):
$$D^c = \sum_{j=1}^{C} \omega_j^c B_j^c, \tag{5}$$
where the input $X \in \mathbb{R}^{C \times H \times W}$ is reshaped to obtain $B^c \in \mathbb{R}^{C \times HW}$, and $D^c \in \mathbb{R}^{1 \times H \times W}$ indicates the channel context features.
Finally, we introduce an adaptive weight for feature transformation and feature fusion. As in Formula (6):
$$\mathrm{Out}_{\mathrm{ACSM}} = \beta\, W_{c2}\!\left( W_{c1} \sum_{j=1}^{C} \frac{\exp(\mathrm{Pool}_{avg}(X_j))}{\sum_{m=1}^{C} \exp(\mathrm{Pool}_{avg}(X_m))} X_j \right) \times \frac{1}{2}\left[ 1 + \mathrm{erf}\!\left( \frac{W_{c1} \sum_{j=1}^{C} \frac{\exp(\mathrm{Pool}_{avg}(X_j))}{\sum_{m=1}^{C} \exp(\mathrm{Pool}_{avg}(X_m))} X_j}{\left\| W_{c1} \sum_{j=1}^{C} \frac{\exp(\mathrm{Pool}_{avg}(X_j))}{\sum_{m=1}^{C} \exp(\mathrm{Pool}_{avg}(X_m))} X_j \right\|_2 \sqrt{2}} \right) \right], \tag{6}$$
where $W_{c1}$ represents a $3 \times 3$ convolutional layer, $W_{c2}$ represents a GeLU activation layer followed by a $3 \times 3$ convolutional layer, $\beta$ is an adaptive learning weight, and $\mathrm{Out}_{\mathrm{ACSM}} \in \mathbb{R}^{1 \times H \times W}$ is the output of ACSM.
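A matching sketch of the ACSM branch (Equations (4)-(6)) follows; as above, the erf term is realized with a standard GeLU, and the single-channel width of the $3 \times 3$ transformation layers is an assumption chosen to match the stated output shape $\mathbb{R}^{1 \times H \times W}$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ACSM(nn.Module):
    """Adaptive channel selection: channel-wise attention from pooled
    statistics, scaled by a learnable weight beta (initialized to 0)."""
    def __init__(self, channels: int):
        super().__init__()
        self.transform = nn.Sequential(
            nn.Conv2d(1, 1, kernel_size=3, padding=1),   # W_c1
            nn.GELU(),                                   # erf-based gating
            nn.Conv2d(1, 1, kernel_size=3, padding=1),   # W_c2
        )
        self.beta = nn.Parameter(torch.zeros(1))         # adaptive weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Channel attention weights from spatially pooled features (Eq. 4).
        pooled = F.adaptive_avg_pool2d(x, 1).view(b, c)
        attn = torch.softmax(pooled, dim=1)
        # Long-range channel context aggregated into a single map (Eq. 5).
        context = torch.einsum('bc,bchw->bhw', attn, x).unsqueeze(1)
        # Feature transformation scaled by beta (Eq. 6).
        return self.beta * self.transform(context)
```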
(3) EFSM
To enhance the global information of the image, we introduce EFSM and aggregate it with the other two branches, ensuring that the main information of the input image is retained while the spatial and channel context features are obtained.
The main features are first extracted using convolutional layers, and the weights of the extracted features are obtained using Sigmoid. As in Formula (7):
$$\omega_j^P = \sum_{j=1}^{P} \sigma\!\left( W_{p2}\!\left( W_{p1}\!\left( X_j \times \frac{1}{2}\left[ 1 + \mathrm{erf}\!\left( \frac{X_j}{\| X_j \|_2 \sqrt{2}} \right) \right] \right) \right) \right), \tag{7}$$
where $X \in \mathbb{R}^{C \times H \times W}$ indicates the input and $P = C \times H \times W$ is the number of pixels. $W_{p1}$ represents a GeLU activation layer followed by a $3 \times 3$ convolutional layer, $W_{p2}$ represents a $3 \times 3$ convolutional layer, $\sigma$ represents a Sigmoid activation layer, and $\omega_j^P$ represents the detailed feature weights.
Finally, the corresponding element point multiplication operation is performed between the weight and original feature to get the global feature. As in Formula (8):
$$\mathrm{Out}_{\mathrm{EFSM}} = \sum_{j=1}^{P} \omega_j^p B_j^p, \tag{8}$$
where $B \in \mathbb{R}^{C \times H \times W}$, and $\mathrm{Out}_{\mathrm{EFSM}} \in \mathbb{R}^{C \times H \times W}$ represents the output of EFSM.
By combining the three branches of the ASSM, ACSM, and EFSM, our proposed method achieves a more comprehensive enhancement of both global and local information. This enables the FEnet to effectively process different types of features, ensuring that the network prioritizes the extraction of global information while addressing the color bias issue in images. AFSM can be formulated as follows:
$$\mathrm{Out}_{\mathrm{AFSM}} = \mathrm{Out}_{\mathrm{ASSM}} + \mathrm{Out}_{\mathrm{ACSM}} + \mathrm{Out}_{\mathrm{EFSM}}. \tag{9}$$
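The EFSM gating (Equations (7) and (8)) and the three-branch sum (Equation (9)) can be sketched as follows, reusing the `ASSM` and `ACSM` classes from the sketches above; summing relies on broadcasting the $C \times 1 \times 1$ and $1 \times H \times W$ branch outputs over the full-size EFSM output, which is our reading of Equation (9) rather than a detail stated explicitly in the text.

```python
import torch
import torch.nn as nn

class EFSM(nn.Module):
    """Enhancing feature selection: sigmoid-gated pointwise re-weighting
    of the input features (Eqs. 7 and 8)."""
    def __init__(self, channels: int):
        super().__init__()
        self.extract = nn.Sequential(
            nn.GELU(),                                                # erf-based gating
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),  # W_p1
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),  # W_p2
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.sigmoid(self.extract(x))   # per-element weights (Eq. 7)
        return weights * x                         # Out_EFSM in R^{C x H x W} (Eq. 8)

class AFSM(nn.Module):
    """Adaptive feature selection: sum of the three branches (Eq. 9)."""
    def __init__(self, channels: int):
        super().__init__()
        self.assm, self.acsm, self.efsm = ASSM(channels), ACSM(channels), EFSM(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (b, c, 1, 1) + (b, 1, h, w) + (b, c, h, w) broadcast to (b, c, h, w).
        return self.assm(x) + self.acsm(x) + self.efsm(x)
```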

3.2. Original Resolution Subnetwork

Most existing methods [40,41,43] adopt the U-net architecture, extracting image information at different resolutions through downsampling. However, underwater image enhancement strongly relies on features in the original resolution space, and U-net-based methods treat original-resolution features and low-resolution features to the same extent. According to the characteristics of image enhancement, we add an original resolution branch alongside the U-net-based branch; it processes the original-resolution image directly and better preserves details and features, avoiding the loss of detail caused by reduced resolution. To keep the feature resolution fixed, ORSnet does not use any upsampling or downsampling operations, so the network retains spatial details to the greatest extent. It is composed of Semantic Feature Reconstruction Groups (SFRGs).

Semantic Feature Reconstruction Module

The SFRG comprises a Semantic Feature Reconstruction Module (SFRM) and an AFSM; with the original resolution unchanged, it retains the global feature information from the input map to the output map to generate high-resolution underwater images. We introduce a multi-scale structure, which alleviates the loss of detail features caused by using a single convolution. The module structure is shown in Figure 3.
Firstly, $3 \times 3$ and $5 \times 5$ convolution kernels are used to extract essential features, respectively, as in Formula (10), where $k \in \{3, 5\}$:
$$F_k = \sum_{j=1}^{N} W_k\!\left( X_j \times \frac{1}{2}\left[ 1 + \mathrm{erf}\!\left( \frac{X_j}{\| X_j \|_2 \sqrt{2}} \right) \right] \right), \tag{10}$$
where $X \in \mathbb{R}^{C \times H \times W}$ indicates the input, and $W_k$ represents a GeLU activation layer followed by a $k \times k$ convolutional layer.
The generated feature map is convoluted to achieve fusing features:
$$G_f = W_3\!\left( \mathrm{cat}\!\left( F_3, F_5 \right) \right), \tag{11}$$
where $W_3$ represents a GeLU activation layer followed by a $3 \times 3$ convolutional layer, $F_3$ and $F_5$ represent the feature maps produced by the $3 \times 3$ and $5 \times 5$ convolution kernels, respectively, and $\mathrm{cat}$ denotes the concatenation operation.
Extract features from the fused feature and utilize Sigmoid to obtain the weights of important features:
$$\omega_f = \sigma\!\left( W_3\!\left( G_f \times \frac{1}{2}\left[ 1 + \mathrm{erf}\!\left( \frac{G_f}{\| G_f \|_2 \sqrt{2}} \right) \right] \right) \right), \tag{12}$$
where $\sigma$ indicates the Sigmoid function, and $\omega_f$ represents the weights of the extracted features.
Finally, the point multiplication operation is performed between the weights and the original features to preserve the global feature information:
$$\mathrm{Out}_{\mathrm{SFRM}} = \sum_{j=1}^{N} \omega_j^N B_j^N, \tag{13}$$
where $N$ is the number of pixels and $\mathrm{Out}_{\mathrm{SFRM}}$ represents the output of SFRM. The feature map generated by SFRM is then fed into the AFSM module, which outputs the feature map of the SFRG.
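A sketch of the SFRM, and of one SFRG as SFRM followed by AFSM (reusing the `AFSM` class sketched earlier), is given below; the padding choices and the application of the sigmoid weights to the block input are assumptions consistent with Equations (10)-(13), not details spelled out in the text.

```python
import torch
import torch.nn as nn

class SFRM(nn.Module):
    """Semantic feature reconstruction at the original resolution:
    parallel 3x3 / 5x5 branches, fusion, and sigmoid re-weighting
    (Eqs. 10-13). No down- or up-sampling is used."""
    def __init__(self, channels: int):
        super().__init__()
        self.branch3 = nn.Sequential(nn.GELU(), nn.Conv2d(channels, channels, 3, padding=1))
        self.branch5 = nn.Sequential(nn.GELU(), nn.Conv2d(channels, channels, 5, padding=2))
        self.fuse = nn.Sequential(nn.GELU(), nn.Conv2d(2 * channels, channels, 3, padding=1))
        self.weight = nn.Sequential(nn.GELU(), nn.Conv2d(channels, channels, 3, padding=1),
                                    nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f3, f5 = self.branch3(x), self.branch5(x)        # Eq. (10)
        fused = self.fuse(torch.cat([f3, f5], dim=1))    # Eq. (11)
        w = self.weight(fused)                           # Eq. (12)
        return w * x                                     # Eq. (13)

class SFRG(nn.Module):
    """One semantic feature reconstruction group: SFRM followed by AFSM."""
    def __init__(self, channels: int):
        super().__init__()
        self.sfrm, self.afsm = SFRM(channels), AFSM(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.afsm(self.sfrm(x))
```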

3.3. Loss Function

Underwater images suffer from limitations such as low brightness, blurring, and color deviation, and previous underwater image enhancement approaches that rely solely on a single loss function may not yield satisfactory restoration results. We therefore jointly consider the absolute error loss $L_1$ and the minimization of the perceptual loss $L_P$, and design a loss function suited to underwater images. The $L_2$ loss is often used in image recovery tasks, but it tends to over-penalize larger errors and can reduce the quality of the recovered image. We instead minimize the total absolute error, defined as follows:
$$L_{I1} = \left\| I_{out} - I_{label} \right\|_1, \tag{14}$$
where $I_{out}$ and $I_{label}$ represent the enhanced image and the labeled reference image from UIEB, respectively. The $L_1$ loss compares the two images pixel by pixel and accumulates the absolute differences.
The second loss is the perceptual loss, which compares the features extracted from the enhanced image with those extracted from the reference image, bringing their high-level representations closer together. This perceptual loss helps improve the performance of our network and yields images with better visual quality. Defined on a VGG-19 network pre-trained on ImageNet, it can be expressed as:
$$L_P = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} \left\| \varphi\left( I_{out} \right)_{i,j} - \varphi\left( I_{label} \right)_{i,j} \right\|^2, \tag{15}$$
where $\varphi(\cdot)$ denotes the feature map of the pool-3 layer of a VGG-19 network trained on ImageNet, and $H$ and $W$ denote the height and width of the feature map.
The total loss function is as follows:
$$L_t = \lambda_1 L_{I1} + \lambda_2 L_P, \tag{16}$$
where $\lambda_1$ and $\lambda_2$ are hyperparameters set empirically to 1 and 0.2, respectively.
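A minimal PyTorch sketch of this total loss (Equations (14)-(16)) is shown below; slicing torchvision's VGG-19 feature stack at index 19 to obtain the pool-3 output, and averaging the squared feature differences, are assumptions about details the text leaves implicit.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights

class MIENLoss(nn.Module):
    """Total loss L_t = lambda1 * L_I1 + lambda2 * L_P, with the perceptual
    term taken at the pool-3 output of an ImageNet-pretrained VGG-19."""
    def __init__(self, lambda_l1: float = 1.0, lambda_perc: float = 0.2):
        super().__init__()
        vgg = vgg19(weights=VGG19_Weights.IMAGENET1K_V1).features[:19].eval()
        for p in vgg.parameters():
            p.requires_grad_(False)          # frozen feature extractor
        self.vgg = vgg
        self.lambda_l1, self.lambda_perc = lambda_l1, lambda_perc

    def forward(self, out: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
        l1 = torch.abs(out - label).mean()                          # Eq. (14)
        perc = torch.mean((self.vgg(out) - self.vgg(label)) ** 2)   # Eq. (15)
        return self.lambda_l1 * l1 + self.lambda_perc * perc        # Eq. (16)
```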

4. Experiments and Analysis

4.1. Preparation

4.1.1. Data

In contrast to other computer vision applications, obtaining clear underwater reference images is difficult, and only a small amount of data is publicly available. We adopt the recently proposed UIEB [45] as the training dataset; it includes 890 image pairs consisting of distorted underwater images and their ground-truth references. We randomly choose 800 image pairs as the training set, and the remaining 90 images are used for validation.

4.1.2. Training Settings

We train the network for 500 epochs with a batch size of 8. The network is optimized using AdamW with a learning rate of 0.0001, and the training images have a resolution of $256 \times 256 \times 3$. The UIEB validation set [45], UIEB_test60, MABLs [55], UFO120 [56], EUVP [42], and U45 [57] were used during the testing phase.
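For reference, a training-loop sketch that follows these settings is shown below; the `model`, `criterion`, and `train_loader` arguments are assumed to be supplied by the user (for example, the MIEN and loss sketches above and a `DataLoader` with `batch_size=8` over the 800 UIEB training pairs).

```python
import torch

def train(model, criterion, train_loader, epochs=500, lr=1e-4, device=None):
    """Training loop per Section 4.1.2: AdamW, lr 1e-4, 500 epochs,
    batch size 8 (set on the DataLoader), 256x256x3 inputs."""
    device = device or ('cuda' if torch.cuda.is_available() else 'cpu')
    model, criterion = model.to(device), criterion.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for raw, label in train_loader:          # paired (distorted, reference) images
            raw, label = raw.to(device), label.to(device)
            optimizer.zero_grad()
            loss = criterion(model(raw), label)
            loss.backward()
            optimizer.step()
    return model
```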

4.1.3. Methods for Comparison

We compared our method with six representative techniques on UIEB, namely IBLA [58], SMBL [55], GDCP [28], HLRP [59], Ucolor [43], and WaterNet [45], to confirm the improvements obtained by our method. The deep learning-based comparison methods were retrained on the same dataset partition.

4.2. Evaluation of Underwater Images

4.2.1. Objective Evaluation Metrics

We use the non-reference underwater image quality measure (UIQM) [60], CCF [61], the naturalness image quality evaluator (NIQE) [62], contrast enhancement image quality (CEIQ) [63], Shannon entropy (SH), mean-square error (MSE), the feature similarity index measure (FSIM) [64], peak signal-to-noise ratio (PSNR) [65], and structural similarity (SSIM) [66] to evaluate underwater image quality quantitatively. CCF is a feature-weighted combination of a colorfulness index, a contrast index, and a fog density index, which respectively predict the color loss caused by absorption, the blurring caused by forward scattering, and the fogging caused by backward scattering; in addition, for CCF we also removed the color component. MSE measures the degree of difference between an estimator and the estimated quantity. FSIM uses feature similarity for quality evaluation. PSNR computes the average error between the input and output and is an error-sensitive image quality metric. SSIM assesses three components of an image: luminance, contrast, and structure. The underwater image colorfulness measure (UICM) [60], underwater image sharpness measure (UISM) [60], and underwater image contrast measure (UIConM) [60] are the three attribute measures that make up UIQM. The UIQM formula is as follows:
$$\mathrm{UIQM} = c_1 \times \mathrm{UICM} + c_2 \times \mathrm{UISM} + c_3 \times \mathrm{UIConM}, \tag{17}$$
where $c_1$, $c_2$, and $c_3$ are weighting coefficients set to $c_1 = 0.0282$, $c_2 = 0.2953$, and $c_3 = 3.5753$.
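As a small illustration, the weighted combination in Equation (17) can be computed as follows once the three component scores are available; computing UICM, UISM, and UIConM themselves follows [60] and is outside this sketch.

```python
def uiqm(uicm: float, uism: float, uiconm: float) -> float:
    """Weighted combination of the three attribute measures (Eq. 17)."""
    c1, c2, c3 = 0.0282, 0.2953, 3.5753
    return c1 * uicm + c2 * uism + c3 * uiconm
```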

4.2.2. Underwater Image Evaluation of Different Scenarios

The proposed method is tested on real underwater images with three types of color deviation. Figure 4 displays several subjective sample results from the various techniques. In Figure 4b, IBLA enhances contrast but is unable to correct the yellow- and green-toned underwater images. GDCP introduces reddish or greenish hues, as seen in Figure 4c. In Figure 4d, the result produced by WaterNet is darker than those of the other techniques, indicating poor visual contrast; this may also introduce color bias, since the white balance channel created by the WaterNet approach is sometimes inaccurate. Because its estimated transmission map depends on information from the red channel, SMBL considerably improves the yellow-toned images but fails on images with strong red-channel attenuation, as seen in Figure 4e. Ucolor employs a multi-color space encoder to make the enhanced colors more authentic; however, as the subjective results show, it does not solve the problem of blurred edges. From a color-correction standpoint, Ucolor's results offer good visual quality. With the HLRP approach, the enhanced image is closer to the ground truth and has a pleasing appearance in blue-green scenes, but in green scenes the enhanced image darkens and finer details become difficult to discern, as illustrated in Figure 4g. The explanation might be that these physical model-based methods require a precise prior model, which is very challenging to obtain. In contrast, our approach efficiently removes the greenish, blue, and bluish-green tones without introducing any artificial colors, demonstrating that the proposed approach generalizes well across various underwater scenarios. Additionally, the technique suggested in this paper removes the majority of the haze-like effects and raises the clarity of the details in the image.
Haze-like underwater images are affected more strongly by scattering effects, which are caused by wavelength-dependent attenuation, than underwater images with distinct color deviations. As seen in Figure 5, the comparison methods are not very effective at dehazing, and some even cause overexposure problems. In contrast, our technique successfully enhances the visibility of underwater images by removing haze while restoring true color.
In Figure 6, GDCP and HLRP still exhibit color casts and residual water effects when processing these images, and the GDCP results are overexposed. The image enhanced by the IBLA method has a reddish cast, which affects the visual effect, while the image enhanced by SMBL appears overly shadowed, leaving some details blurred. The WaterNet and Ucolor methods improve the visual effect of the image, and our method achieves this effect as well.
Figure 7 shows a subjective comparison of our method with the comparative methods on the UFO120 dataset. In Figure 7, IBLA, GDCP, and HLRP all over-enhance during image enhancement, making the shadows too heavy and the image edges unclear. The WaterNet results in the third row of Figure 7 still retain some water effects. Ucolor produces better visuals than the other methods but still introduces some fog-like effects, resulting in blurry images that are not sharp enough. Our method is superior to the others both in eliminating the effects of water and in color recovery, and it does not introduce other post-enhancement artifacts.
In Figure 8, we compare the methods on the MABLs dataset. Compared with the other methods, ours eliminates the water effect during processing, making the result closer to the ground truth and sharper.
In Figure 9, the IBLA and Ucolor methods preserve details well during enhancement but do not eliminate the influence of water, which directly affects the visual effect. The GDCP method introduces a blue-green color cast during enhancement, and its excessive enhancement leads to the loss of some detail. The SMBL method produces overexposed images with loss of detail, while WaterNet and our method handle these problems well and deliver good visual effects.
We assessed the proposed strategy using both full-reference and non-reference objective metrics. As shown in Table 1, our technique performs well on the PSNR and SSIM measures, while the UIQM indicator shows only moderate performance. As discussed alongside Table 1, the objective indicators currently developed for underwater images sometimes conflict with visual perception, because human vision did not evolve for the conditions of aquatic environments. Relying on human visual perception alone to judge the color correction of underwater images can be unreliable: when people view an underwater image, they tend to focus on the center of the scene or on anything that appears vivid or interesting. Although this works well for images with good aesthetic appeal, it does not account for attenuation and backscattering. Furthermore, Table 1 demonstrates that our strategy is more advantageous than the alternatives. Compared with previous studies on underwater image enhancement, our proposed approach, which enhances contrast and corrects color casts, shows competitive performance.
In addition, we chose Ucolor, the comparison method with the highest UIQM, and compared it subjectively with our method. As can be seen in Figure 10, although our method scores lower than Ucolor on the UIQM indicator, our visual results are superior to those of Ucolor.

4.3. Ablation Experiments

In Table 2, we perform an ablation study with the following configurations to better demonstrate the contribution of each enhancement module in the MIEN architecture:
1. MIEN without the AFSM operation (w/o AFSM);
2. MIEN without the SFRM operation (w/o SFRM);
3. MIEN without the FEnet branch (w/o FEnet);
4. MIEN without the ORSnet branch (w/o ORSnet).
Table 2 gives the PSNR, SSIM, MSE, FSIM, CEIQ, and UIQM scores when different modules are removed. The quantitative results drop significantly when any component is removed, while the full MIEN achieves the best performance.
Inside AFSM, we performed the w/o ASSM, w/o ACSM, w/o EFSM, and w/o $\alpha$, $\beta$ ablation experiments to prove the effectiveness of the AFSM module. As shown in Table 3, the w/o ASSM configuration removes only ASSM and its associated parameter $\alpha$; similarly, w/o ACSM removes only ACSM and its associated parameter $\beta$, and w/o EFSM removes EFSM. The relevant indicators of w/o ASSM, w/o ACSM, and w/o EFSM all decrease relative to the full AFSM, which is evidence of the efficacy of ACSM, ASSM, and EFSM. The adaptive learning weights $\alpha$ and $\beta$ are removed in the w/o $\alpha$, $\beta$ configuration; its indicators also drop compared with the full AFSM, demonstrating the efficiency of the adaptive weights. Thus, at the cost of only a few additional settings, our AFSM significantly improves image restoration.
Inside SFRM, we set up the w/o $3 \times 3$, w/o $5 \times 5$, w/o one-layer multiscale, w/o two-layer multiscale, and w/o AFSM ablation experiments to prove the effectiveness of the SFRM module. As shown in Table 4, the w/o $3 \times 3$ configuration removes the $3 \times 3$ convolution from the multiscale part; similarly, we created w/o $5 \times 5$, and the w/o two-layer multiscale configuration removes the multiscale part entirely. The relevant indicators of w/o $3 \times 3$, w/o $5 \times 5$, w/o one-layer multiscale, and w/o two-layer multiscale all decrease compared with the full SFRM, demonstrating the efficiency of the multiscale section. In addition, we remove the Adaptive Feature Selection Module, abbreviated as w/o AFSM; the fact that w/o AFSM scores lower than the full SFRM shows how effective the Adaptive Feature Selection Module is. In conclusion, our SFRM significantly improves image color restoration.
We compared the proposed enhancement framework and its two enhancement modules on real underwater images; Figure 11 displays the results. When the original resolution subnetwork is removed and only the U-net containing the Adaptive Feature Selection Module is used, the underwater image enhanced by the Adaptive Feature Selection Module retains some detailed information: the colors are normal, but the background still carries some underwater color style. When the Adaptive Feature Selection Module is disabled and only the original resolution subnetwork is employed, the underwater images enhanced by the Semantic Feature Reconstruction Module show greater contrast and clarity. Nonetheless, as seen in Figure 11, these images still exhibit some hazy effects and a distinct underwater color pattern. Our full technique uses the Adaptive Feature Selection Module for image color correction and the original resolution subnetwork for texture detail restoration, which significantly raises the quality of underwater images.

4.4. Application Test

Edges are essential image features: an edge is a discontinuity in local image characteristics, such as an abrupt change in grayscale, color, or texture structure. An edge marks the end of one region and the beginning of another, and this property can be used for feature extraction and image segmentation. To show that our method works well for feature extraction, we compare edge detection results across different methods, as shown in Figure 12. It is clear in Figure 12 that our method depicts the outlines and details of the objects in greater detail than the other comparison methods and is closer to the ground truth. Images enhanced by MIEN allow the contour and feature information of the target to be recognized, which shows that MIEN delivers superior performance in feature extraction.
We also measured the number of essential feature matches obtained from the raw underwater images and from the images output by MIEN. The algorithm we use is the Scale-Invariant Feature Transform (SIFT), whose descriptors for the same local feature remain consistent across images of different sizes. In Figure 13, the enhanced image group yields significantly more critical feature-matching points than the original image group, which demonstrates that MIEN enables the enhanced image to expose more detailed information.
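A sketch of this measurement with OpenCV is given below; the ratio-test filtering and the way image pairs are chosen for matching are choices of this sketch rather than details from the paper, which only reports the number of matches per group.

```python
import cv2

def count_sift_matches(img_a_path: str, img_b_path: str, ratio: float = 0.75) -> int:
    """Count SIFT matches between two images that survive a ratio test,
    as a proxy for how much detail each image exposes."""
    a = cv2.imread(img_a_path, cv2.IMREAD_GRAYSCALE)
    b = cv2.imread(img_b_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, des_a = sift.detectAndCompute(a, None)
    _, des_b = sift.detectAndCompute(b, None)
    matches = cv2.BFMatcher().knnMatch(des_a, des_b, k=2)
    return sum(1 for pair in matches
               if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance)

# e.g. compare a raw pair against its MIEN-enhanced counterpart (hypothetical paths):
# count_sift_matches('raw_1.png', 'raw_2.png') vs.
# count_sift_matches('enhanced_1.png', 'enhanced_2.png')
```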

5. Conclusions

In this paper, we present the multi-resolution information enhancement network (MIEN) for UIE. Our proposed MIEN uses a two-branch structure to enhance the global and local information of underwater images; in particular, the original resolution subnetwork is designed to address the insufficient attention paid to original-resolution information in the U-net structure. Compared with current network architectures, MIEN extracts more scale-specific information, and global and low-level features are combined at each scale. Extensive tests and quantitative and qualitative comparisons with state-of-the-art methods demonstrate that our design delivers considerable gains. In future work, we will address the trade-off between intensity values and detail, enhancing image details while eliminating the overexposure of enhanced images, and further optimize our method to produce clearer results. Our approach might also be investigated for other computer vision topics such as segmentation and salient object recognition, since feature aggregation is crucial for solving computer vision problems with deep learning.
Future research should focus on addressing the limitations of paired datasets and the scarcity of reliable ground truth in underwater image enhancement. Innovative methods need to be developed to generate realistic and effective ground truth data specifically tailored for the underwater domain. This will improve the training and evaluation of models, leading to more accurate and reliable performance in underwater imaging applications.

6. Discussion

We evaluated the performance of our proposed MIEN in classical underwater image datasets, using both referenced and unreferenced metrics to verify that our approach outperforms other advanced methods. In future work, we will focus on optimizing the run time of our approach and improving the accuracy of the enhancement effect.

Author Contributions

Conceptualization, D.Z.; methodology, D.Z.; software, W.C. and J.Z.; validation, J.Z. and Y.-T.P.; formal analysis, J.Z. and Z.L.; investigation, W.C. and J.Z.; resources, J.Z. and W.Z.; data curation, J.Z., Z.L. and W.C.; writing—original draft preparation, J.Z.; writing—review and editing, W.C., J.Z. and Y.-T.P.; visualization, J.Z. and W.Z.; supervision, J.Z. and W.Z.; project administration, J.Z. and W.Z.; funding acquisition, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (No. 61702074), the Liaoning Provincial Natural Science Foundation of China (No. 20170520196), the Fundamental Research Funds for the Central Universities (Nos. 3132019205 and 3132019354), and the Cultivation Program for the Excellent Doctoral Dissertation of Dalian Maritime University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Written informed consent has been obtained from the patients to publish this paper.

Data Availability Statement

The references [42,45,57] provide open access to the data used in this research.

Acknowledgments

We would like to express our sincere appreciation to the anonymous reviewers and the editor for their valuable comments and constructive suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
UIE	Underwater Image Enhancement
FEnet	Feature Enhancement Subnetwork
ORSnet	Original Resolution Subnetwork
MIEN	Multi-resolution Information Enhancement Network
AFSM	Adaptive Feature Selection Module
ASSM	Adaptive Spatial Selection Module
ACSM	Adaptive Channel Selection Module
EFSM	Enhancing Feature Selection Module
SFRG	Semantic Feature Reconstruction Group
SFRM	Semantic Feature Reconstruction Module
UIQM	Underwater Image Quality Measure
NIQE	Naturalness Image Quality Evaluator
CEIQ	Contrast Enhancement Image Quality
MSE	Mean-square Error
FSIM	Feature Similarity Index Measure
PSNR	Peak Signal-to-noise Ratio
SSIM	Structural Similarity
UICM	Underwater Image Colorfulness Measure
UISM	Underwater Image Sharpness Measure
UIConM	Underwater Image Contrast Measure

References

1. Zhou, J.; Zhuang, J.; Zheng, Y.; Li, J. Area Contrast Distribution Loss for Underwater Image Enhancement. J. Mar. Sci. Eng. 2023, 11, 909.
2. Jiang, Q.; Gu, Y.; Li, C.; Cong, R.; Shao, F. Underwater image enhancement quality evaluation: Benchmark dataset and objective metric. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 5959–5974.
3. Johnson-Roberson, M.; Bryson, M.; Friedman, A.; Pizarro, O.; Troni, G.; Ozog, P.; Henderson, J.C. High-resolution underwater robotic vision-based mapping and three-dimensional reconstruction for archaeology. J. Field Robot. 2017, 34, 625–643.
4. Li, T.; Rong, S.; He, B.; Chen, L. Underwater image deblurring framework using a generative adversarial network. In Proceedings of the OCEANS 2022-Chennai, Chennai, India, 21–24 February 2022; pp. 1–4.
5. Fan, G.D.; Fan, B.; Gan, M.; Chen, G.Y.; Chen, C.P. Multiscale low-light image enhancement network with illumination constraint. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 7403–7417.
6. Peng, Y.T.; Lu, Z.; Cheng, F.C.; Zheng, Y.; Huang, S.C. Image haze removal using airlight white correction, local light filter, and aerial perspective prior. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 1385–1395.
7. Berman, D.; Treibitz, T.; Avidan, S. Diving into haze-lines: Color restoration of underwater images. In Proceedings of the British Machine Vision Conference (BMVC), London, UK, 4–7 September 2017; Volume 1.
8. Li, K.; Wu, L.; Qi, Q.; Liu, W.; Gao, X.; Zhou, L.; Song, D. Beyond single reference for training: Underwater image enhancement via comparative learning. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 2561–2576.
9. Gao, R.; Li, R.; Hu, M.; Suganthan, P.N.; Yuen, K.F. Dynamic ensemble deep echo state network for significant wave height forecasting. Appl. Energy 2023, 329, 120261.
10. Gao, R.; Cheng, W.X.; Suganthan, P.; Yuen, K.F. Inpatient discharges forecasting for singapore hospitals by machine learning. IEEE J. Biomed. Health Inform. 2022, 26, 4966–4975.
11. Ren, W.; Liu, S.; Ma, L.; Xu, Q.; Xu, X.; Cao, X.; Du, J.; Yang, M.H. Low-light image enhancement via a deep hybrid network. IEEE Trans. Image Process. 2019, 28, 4364–4375.
12. Ren, W.; Pan, J.; Zhang, H.; Cao, X.; Yang, M.H. Single image dehazing via multi-scale convolutional neural networks with holistic edges. Int. J. Comput. Vis. 2020, 128, 240–259.
13. Liu, J.; Shang, J.; Liu, R.; Fan, X. Attention-guided global-local adversarial learning for detail-preserving multi-exposure image fusion. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 5026–5040.
14. Liu, J.; Fan, X.; Jiang, J.; Liu, R.; Luo, Z. Learning a deep multi-scale feature ensemble and an edge-attention guidance for image fusion. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 105–119.
15. Zhou, J.; Sun, J.; Zhang, W.; Lin, Z. Multi-view underwater image enhancement method via embedded fusion mechanism. Eng. Appl. Artif. Intell. 2023, 121, 105946.
16. Wang, Y.; Zhang, J.; Cao, Y.; Wang, Z. A deep CNN method for underwater image enhancement. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 1382–1386.
17. Zhang, Y.; Chen, D.; Zhang, Y.; Shen, M.; Zhao, W. A Two-Stage Network Based on Transformer and Physical Model for Single Underwater Image Enhancement. J. Mar. Sci. Eng. 2023, 11, 787.
18. Ghani, A.S.A.; Isa, N.A.M. Underwater image quality enhancement through integrated color model with Rayleigh distribution. Appl. Soft Comput. 2015, 27, 219–230.
19. Jaffe, J.S. Computer modeling and the design of optimal underwater imaging systems. IEEE J. Ocean. Eng. 1990, 15, 101–111.
20. Liu, H.; Chau, L.P. Underwater image restoration based on contrast enhancement. In Proceedings of the 2016 IEEE International Conference on Digital Signal Processing (DSP), Beijing, China, 16–18 October 2016; pp. 584–588.
21. Li, C.; Guo, J.; Guo, C.; Cong, R.; Gong, J. A hybrid method for underwater image correction. Pattern Recognit. Lett. 2017, 94, 62–67.
22. Wang, Y.; Liu, H.; Chau, L.P. Single underwater image restoration using adaptive attenuation-curve prior. IEEE Trans. Circuits Syst. I Regul. Pap. 2017, 65, 992–1002.
23. Xie, J.; Hou, G.; Wang, G.; Pan, Z. A variational framework for underwater image dehazing and deblurring. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 3514–3526.
24. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353.
25. Chiang, J.Y.; Chen, Y.C. Underwater image enhancement by wavelength compensation and dehazing. IEEE Trans. Image Process. 2011, 21, 1756–1769.
26. Drews, P.L.; Nascimento, E.R.; Botelho, S.S.; Campos, M.F.M. Underwater depth estimation and image restoration based on single images. IEEE Comput. Graph. Appl. 2016, 36, 24–35.
27. Liang, Z.; Ding, X.; Wang, Y.; Yan, X.; Fu, X. GUDCP: Generalization of underwater dark channel prior for underwater image restoration. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 4879–4884.
28. Peng, Y.T.; Cao, K.; Cosman, P.C. Generalization of the dark channel prior for single image restoration. IEEE Trans. Image Process. 2018, 27, 2856–2868.
29. Carlevaris-Bianco, N.; Mohan, A.; Eustice, R.M. Initial results in underwater single image dehazing. In Proceedings of the Oceans 2010 Mts/IEEE Seattle, Washington, DC, USA, 20–23 September 2010; pp. 1–8.
30. Galdran, A.; Pardo, D.; Picón, A.; Alvarez-Gila, A. Automatic red-channel underwater image restoration. J. Vis. Commun. Image Represent. 2015, 26, 132–145.
31. Li, C.; Guo, J.; Chen, S.; Tang, Y.; Pang, Y.; Wang, J. Underwater image restoration based on minimum information loss principle and optical properties of underwater imaging. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 1993–1997.
32. Akkaynak, D.; Treibitz, T. Sea-thru: A method for removing water from underwater images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1682–1691.
33. Zhou, J.; Pang, L.; Zhang, D.; Zhang, W. Underwater Image Enhancement Method via Multi-Interval Subhistogram Perspective Equalization. IEEE J. Ocean. Eng. 2023, 48, 474–488.
34. Zhuang, P.; Ding, X. Underwater image enhancement using an edge-preserving filtering retinex algorithm. Multimed. Tools Appl. 2020, 79, 17257–17277.
35. Yuan, J.; Cao, W.; Cai, Z.; Su, B. An underwater image vision enhancement algorithm based on contour bougie morphology. IEEE Trans. Geosci. Remote Sens. 2020, 59, 8117–8128.
36. Anwar, S.; Li, C. Diving deeper into underwater image enhancement: A survey. Signal Process. Image Commun. 2020, 89, 115978.
37. Fabbri, C.; Islam, M.J.; Sattar, J. Enhancing underwater imagery using generative adversarial networks. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 7159–7165.
38. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232.
39. Li, H.; Zhuang, P. DewaterNet: A fusion adversarial real underwater image enhancement network. Signal Process. Image Commun. 2021, 95, 116248.
40. Guo, Y.; Li, H.; Zhuang, P. Underwater image enhancement using a multiscale dense generative adversarial network. IEEE J. Ocean. Eng. 2019, 45, 862–870.
41. Liu, X.; Gao, Z.; Chen, B.M. MLFcGAN: Multilevel feature fusion-based conditional GAN for underwater image color correction. IEEE Geosci. Remote. Sens. Lett. 2019, 17, 1488–1492.
42. Islam, M.J.; Xia, Y.; Sattar, J. Fast underwater image enhancement for improved visual perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234.
43. Li, C.; Anwar, S.; Hou, J.; Cong, R.; Guo, C.; Ren, W. Underwater image enhancement via medium transmission-guided multi-color space embedding. IEEE Trans. Image Process. 2021, 30, 4985–5000.
44. Qi, Q.; Zhang, Y.; Tian, F.; Wu, Q.J.; Li, K.; Luan, X.; Song, D. Underwater image co-enhancement with correlation feature matching and joint learning. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 1133–1147.
45. Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An underwater image enhancement benchmark dataset and beyond. IEEE Trans. Image Process. 2019, 29, 4376–4389.
46. Zhou, J.; Zhang, D.; Zhang, W. Cross-view enhancement network for underwater images. Eng. Appl. Artif. Intell. 2023, 121, 105952.
47. Li, C.Y.; Guo, J.C.; Cong, R.M.; Pang, Y.W.; Wang, B. Underwater image enhancement by dehazing with minimum information loss and histogram distribution prior. IEEE Trans. Image Process. 2016, 25, 5664–5677.
48. Hou, G.; Li, Y.; Yang, H.; Li, K.; Pan, Z. UID2021: An underwater image dataset for evaluation of no-reference quality assessment metrics. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 19, 1–24.
49. Wu, S.; Luo, T.; Jiang, G.; Yu, M.; Xu, H.; Zhu, Z.; Song, Y. A two-stage underwater enhancement network based on structure decomposition and characteristics of underwater imaging. IEEE J. Ocean. Eng. 2021, 46, 1213–1227.
50. Ding, D.; Gan, S.; Chen, L.; Wang, B. Learning-based underwater image enhancement: An efficient two-stream approach. Displays 2023, 76, 102337.
51. Yan, X.; Qin, W.; Wang, Y.; Wang, G.; Fu, X. Attention-guided dynamic multi-branch neural network for underwater image enhancement. Knowl.-Based Syst. 2022, 258, 110041.
52. Fu, X.; Cao, X. Underwater image enhancement with global–local networks and compressed-histogram equalization. Signal Process. Image Commun. 2020, 86, 115892.
53. Lin, Y.; Shen, L.; Wang, Z.; Wang, K.; Zhang, X. Attenuation coefficient guided two-stage network for underwater image restoration. IEEE Signal Process. Lett. 2020, 28, 199–203.
54. Yu, H.; Li, X.; Feng, Y.; Han, S. Multiple attentional path aggregation network for marine object detection. Appl. Intell. 2023, 53, 2434–2451.
55. Song, W.; Wang, Y.; Huang, D.; Liotta, A.; Perra, C. Enhancement of underwater images with statistical model of background light and optimization of transmission map. IEEE Trans. Broadcast. 2020, 66, 153–169.
56. Islam, M.J.; Luo, P.; Sattar, J. Simultaneous enhancement and super-resolution of underwater imagery for improved visual perception. arXiv 2020, arXiv:2002.01155.
57. Li, H.; Li, J.; Wang, W. A fusion adversarial underwater image enhancement network with a public test dataset. arXiv 2019, arXiv:1906.06819.
58. Peng, Y.T.; Cosman, P.C. Underwater image restoration based on image blurriness and light absorption. IEEE Trans. Image Process. 2017, 26, 1579–1594.
59. Zhuang, P.; Wu, J.; Porikli, F.; Li, C. Underwater image enhancement with hyper-laplacian reflectance priors. IEEE Trans. Image Process. 2022, 31, 5442–5455.
60. Panetta, K.; Gao, C.; Agaian, S. Human-visual-system-inspired underwater image quality measures. IEEE J. Ocean. Eng. 2015, 41, 541–551.
61. Wang, Y.; Li, N.; Li, Z.; Gu, Z.; Zheng, H.; Zheng, B.; Sun, M. An imaging-inspired no-reference underwater color image quality assessment metric. Comput. Electr. Eng. 2018, 70, 904–913.
62. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212.
63. Yan, J.; Li, J.; Fu, X. No-reference quality assessment of contrast-distorted images using contrast enhancement. arXiv 2019, arXiv:1904.08879.
64. Zhang, L.; Zhang, L.; Mou, X.; Zhang, D. FSIM: A feature similarity index for image quality assessment. IEEE Trans. Image Process. 2011, 20, 2378–2386.
65. Jagalingam, P.; Hegde, A.V. A review of quality metrics for fused image. Aquat. Procedia 2015, 4, 133–142.
66. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
Figure 1. The structure of MIEN. MIEN contains two branches: the upper branch is the feature enhancement subnetwork (FEnet) for extracting features at different resolutions, and the lower branch is the original resolution subnetwork (ORSnet) for extracting local information.
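The two-branch layout described in the Figure 1 caption can be summarized in a short PyTorch-style sketch. The branch bodies, the channel-concatenation hand-off from the feature enhancement branch to the original resolution branch, and the final sigmoid are illustrative placeholders, not the exact MIEN operations.

```python
import torch
import torch.nn as nn

class MIENSketch(nn.Module):
    """Illustrative two-branch layout only; FEnet/ORSnet internals are placeholders."""
    def __init__(self, channels=64):
        super().__init__()
        # Feature enhancement branch: extracts global, multi-resolution cues.
        self.fenet = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Original resolution branch: processes the full image scale plus FEnet features.
        self.orsnet = nn.Sequential(
            nn.Conv2d(3 + channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, x):
        global_feat = self.fenet(x)                 # global / multi-resolution features
        fused = torch.cat([x, global_feat], dim=1)  # hand FEnet output to ORSnet as a complement
        return torch.sigmoid(self.orsnet(fused))    # enhanced image in [0, 1]
```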
Figure 2. The overall architecture of AFSM. The AFSM is divided into three branches: ASSM, ACSM, and EFSM.
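Figure 2 and the ablations in Table 3 indicate that AFSM involves spatial attention (SAM), channel attention (CAM), and learnable scalars α and β. The sketch below wires these pieces together in one plausible way; the branch internals and the α/β fusion are assumptions, not the published design.

```python
import torch
import torch.nn as nn

class AFSMSketch(nn.Module):
    """Plausible AFSM wiring: a spatial-selection branch, a channel-selection branch,
    and a fusion branch weighted by learnable alpha/beta. Details are assumptions."""
    def __init__(self, channels=64):
        super().__init__()
        # Spatial attention over channel-pooled maps (SAM-style).
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        # Channel attention via a squeeze-and-excitation style bottleneck (CAM-style).
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // 4, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, channels, 1),
        )
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.ones(1))
        self.fuse = nn.Conv2d(channels, channels, 3, padding=1)  # EFSM placeholder

    def forward(self, x):
        pooled = torch.cat([x.mean(1, keepdim=True), x.max(1, keepdim=True)[0]], dim=1)
        s = x * torch.sigmoid(self.spatial(pooled))   # spatially selected features
        c = x * torch.sigmoid(self.channel(x))        # channel-wise selected features
        return self.fuse(self.alpha * s + self.beta * c)
```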
Figure 3. The overall architecture of the Semantic Feature Reconstruction Module. We use 3 × 3 and 5 × 5 convolutions to obtain receptive fields of different sizes, and then enhance the features with the obtained weights.
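Following the Figure 3 caption, a minimal sketch of such a module is given below: parallel 3 × 3 and 5 × 5 convolutions gather context at two receptive-field sizes, and a 1 × 1 convolution turns the concatenated context into per-pixel weights. The sigmoid gating and the residual connection are assumptions, not the exact SFRM design.

```python
import torch
import torch.nn as nn

class SFRMSketch(nn.Module):
    """Two receptive-field sizes (3x3 and 5x5), then weight-based feature enhancement."""
    def __init__(self, channels=64):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)
        self.to_weights = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):
        ctx = torch.cat([self.branch3(x), self.branch5(x)], dim=1)  # multi-scale context
        weights = torch.sigmoid(self.to_weights(ctx))               # per-pixel, per-channel weights
        return x + x * weights                                      # enhance features with the weights
```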
Figure 4. Qualitative comparison on 90 randomly selected images from UIEB. The compared methods are (a) Raw, (b) IBLA [58], (c) GDCP [28], (d) WaterNet [45], (e) SMBL [55], (f) Ucolor [43], (g) HLRP [59], (h) Ours, (i) Ground-Truth.
Figure 5. Comparison of results on the haze-like dataset. The PSNR and SSIM scores are listed below each image.
Figure 6. Visual comparisons on images. (a) shows the original image from UIEB_test; (b–h) show the results obtained by IBLA [58], GDCP [28], WaterNet [45], SMBL [55], Ucolor [43], HLRP [59], and our proposed method, respectively.
Figure 7. Visual comparisons on images. (a) shows the original image from UFO120; (b–h) show the results obtained by IBLA [58], GDCP [28], WaterNet [45], SMBL [55], Ucolor [43], HLRP [59], and our proposed method, respectively.
Figure 8. Visual comparisons on images. (a) shows the original image from MABLs; (b–h) show the results obtained by IBLA [58], GDCP [28], WaterNet [45], SMBL [55], Ucolor [43], HLRP [59], and our proposed method, respectively.
Figure 9. Visual comparisons on images. (a) shows the original images from U45 and EUVP; (b–h) show the results obtained by IBLA [58], GDCP [28], WaterNet [45], SMBL [55], Ucolor [43], HLRP [59], and our proposed method, respectively.
Figure 10. Subjective comparison between Ucolor [43] and our method on the UIEB dataset.
Figure 11. Comparison of ablation results for the different components. The yellow and red boxes correspond to the enlarged regions shown below each image.
Figure 12. Edge detection results on UIEB. From left to right: (a) Raw, (b) IBLA [58], (c) GDCP [28], (d) WaterNet [45], (e) SMBL [55], (f) Ucolor [43], (g) HLRP [59], (h) Ours.
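An edge-map comparison of this kind can be reproduced, approximately, by running a standard Canny detector on each method's output. The detector and thresholds below are illustrative choices (not necessarily those used for Figure 12), and the file paths are hypothetical.

```python
import cv2

def edge_map(path, low=100, high=200):
    """Grayscale Canny edges for one image; thresholds are illustrative."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return cv2.Canny(gray, low, high)

# Hypothetical paths: compare edges of the raw input and an enhanced output.
edges_raw = edge_map("uieb/raw/0001.png")
edges_ours = edge_map("uieb/ours/0001.png")
cv2.imwrite("edges_raw.png", edges_raw)
cv2.imwrite("edges_ours.png", edges_ours)
```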
Figure 13. Saliency detection of the feature-matching results. The first row shows the feature-matching result of the original image group, and the second row shows that of the enhanced image group.
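A feature-matching comparison in the spirit of Figure 13 can be run with ORB keypoints and brute-force Hamming matching, as sketched below. The detector, matcher, and file paths are illustrative assumptions; the exact setup used for the figure may differ.

```python
import cv2

def match_pair(path_a, path_b, keep=50):
    """ORB + brute-force matching between two images of the same scene."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=1000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)[:keep]
    # Draw the strongest matches side by side for visual inspection.
    return cv2.drawMatches(img_a, kp_a, img_b, kp_b, matches, None)

# Hypothetical paths: an original image pair vs. the corresponding enhanced pair.
cv2.imwrite("matches_raw.png", match_pair("raw_view1.png", "raw_view2.png"))
cv2.imwrite("matches_enhanced.png", match_pair("enh_view1.png", "enh_view2.png"))
```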
Table 1. Objective comparison of different enhancement techniques. Red in the table denotes the best performance, blue denotes inferior performance.

Dataset     Metric    IBLA      GDCP      WaterNet  SMBL      Ucolor    HLRP      Ours
UIEB_val    PSNR      17.9884   13.3856   17.3488   16.5970   20.9615   16.4516   23.1424
            SSIM      0.8048    0.7474    0.8132    0.7950    0.8635    0.6720    0.9119
            MSE       0.0891    0.2281    0.1445    0.1601    0.0972    0.1650    0.0729
            FSIM      0.9326    0.8988    0.9185    0.9229    0.9395    0.8464    0.9556
            CEIQ      3.2835    3.2076    3.1008    3.3067    3.2090    3.2763    3.3734
            UIQM      2.4900    2.6697    2.9165    2.5430    3.0495    2.1772    2.9566
            Average   4.2350    3.4468    4.1589    4.0008    4.8210    3.8764    5.2112
UIEB_test   CEIQ      3.1802    3.1207    2.9826    3.1425    3.0533    2.7885    3.1624
            UIQM      1.8344    2.1100    2.3986    1.9039    2.4813    1.9850    2.5254
            Average   2.5073    2.6154    2.6906    2.5232    2.7673    2.3868    2.8439
U45         CEIQ      3.2491    3.1914    3.1863    3.2491    3.2826    3.2986    3.3178
            UIQM      2.3877    2.2750    2.9570    2.3877    3.1481    2.7960    2.9153
            Average   2.8184    2.7332    3.0717    2.8184    3.2154    3.0473    3.1166
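For reference, the full-reference scores in Table 1 (PSNR, SSIM, MSE) can be computed with scikit-image as sketched below, assuming image pairs are loaded as RGB arrays normalized to [0, 1]; FSIM and the no-reference CEIQ and UIQM metrics require separate implementations, and the composition of the "Average" column is not reproduced here. The file paths are hypothetical.

```python
from skimage import io, img_as_float
from skimage.metrics import peak_signal_noise_ratio, structural_similarity, mean_squared_error

def full_reference_scores(enhanced_path, reference_path):
    """PSNR / SSIM / MSE for one image pair, with both images scaled to [0, 1]."""
    enh = img_as_float(io.imread(enhanced_path))
    ref = img_as_float(io.imread(reference_path))
    return {
        "PSNR": peak_signal_noise_ratio(ref, enh, data_range=1.0),
        # channel_axis=-1 treats the last axis as color (scikit-image >= 0.19).
        "SSIM": structural_similarity(ref, enh, channel_axis=-1, data_range=1.0),
        "MSE": mean_squared_error(ref, enh),
    }

# Hypothetical paths into the UIEB validation split.
print(full_reference_scores("uieb_val/ours/0001.png", "uieb_val/gt/0001.png"))
```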
Table 2. Quantitative comparison of the different components of the proposed method. The bold font indicates the best.

Metric    w/o FEnet   w/o ORSnet   w/o AFSM   w/o SFRM   Ours
PSNR      20.7856     21.2991      22.8955    23.0597    23.1424
SSIM      0.8582      0.8711       0.9066     0.9037     0.9119
MSE       0.1025      0.0963       0.1141     0.1138     0.0729
FSIM      0.8912      0.8943       0.9465     0.9480     0.9556
CEIQ      3.2510      3.3271       3.3097     3.3540     3.3734
UIQM      2.8512      2.9375       2.9009     2.9148     2.9566
Average   4.7558      4.8721       5.1409     5.1777     5.2112
Table 3. Ablation study in AFSM. The bold font indicates the best.

Metric    w/o ASSM   w/o ACSM   w/o EFSM   w/o SAM   w/o CAM   w/o α     w/o β     w/o α, β   Ours
PSNR      22.4899    22.5163    22.6754    23.0010   21.9513   22.0468   22.8911   22.8080    23.0597
SSIM      0.8794     0.8990     0.8987     0.8115    0.9010    0.8905    0.8872    0.9024     0.9037
MSE       0.1257     0.1191     0.1209     0.1300    0.1257    0.1180    0.1140    0.1148     0.1138
FSIM      0.9169     0.9254     0.9447     0.9351    0.9194    0.9267    0.9099    0.9418     0.9480
CEIQ      2.9867     2.9987     3.1182     3.0081    2.9971    3.1007    3.2016    3.2004     3.3540
UIQM      2.8927     2.9001     2.8871     2.8775    2.9068    2.9111    2.9081    2.9051     2.9148
Average   5.0067     5.0201     5.0672     5.0839    4.9250    4.6411    5.1140    5.1072     5.1777
Table 4. Ablation study in SFRM. The bold font indicates the best.

Metric    w/o 3 × 3   w/o 5 × 5   w/o OLMS   w/o TLMS   w/o AFSM   Ours
PSNR      22.8857     22.6449     21.9864    22.3551    22.0011    22.8955
SSIM      0.9030      0.9022      0.8816     0.8927     0.8993     0.9066
MSE       0.1187      0.1253      0.1307     0.1220     0.1146     0.1141
FSIM      0.9305      0.9129      0.9317     0.9188     0.9092     0.9465
CEIQ      3.2774      3.2835      3.0943     3.1742     3.1904     3.3097
UIQM      2.6421      2.3569      2.9004     2.8928     2.8803     2.9009
Average   5.0867      4.9959      4.9440     5.0186     4.9610     5.1409
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
