Article

DBFNet: A Dual-Branch Fusion Network for Underwater Image Enhancement

Kaichuan Sun and Yubo Tian
1 Ocean College, Jiangsu University of Science and Technology, Zhenjiang 212100, China
2 School of Computer and Information Engineering, Chuzhou University, Chuzhou 239000, China
3 School of Information and Communication Engineering, Guangzhou Maritime University, Guangzhou 510725, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2023, 15(5), 1195; https://doi.org/10.3390/rs15051195
Submission received: 5 January 2023 / Revised: 17 February 2023 / Accepted: 19 February 2023 / Published: 21 February 2023
(This article belongs to the Special Issue Advancement in Undersea Remote Sensing)

Abstract:
Due to the absorption and scattering effects of light propagating through water, underwater images inevitably suffer from severe degradation, such as color casts and losses of detail. Many existing deep learning-based methods have demonstrated superior performance for underwater image enhancement (UIE). However, accurate color correction and detail restoration still present considerable challenges for UIE. In this work, we develop a dual-branch fusion network, dubbed the DBFNet, to eliminate the degradation of underwater images. We first design a triple-color channel separation learning branch (TCSLB), which balances the color distribution of underwater images by learning the independent features of the different channels of the RGB color space. Subsequently, we develop a wavelet domain learning branch (WDLB) and design a discrete wavelet transform-based attention residual dense module to fully employ the wavelet domain information of the image to restore clear details. Finally, a dual attention-based selective fusion module (DASFM) is designed for the adaptive fusion of latent features of the two branches, in which both pleasing colors and diverse details are integrated. Extensive quantitative and qualitative evaluations of synthetic and real-world underwater datasets demonstrate that the proposed DBFNet significantly improves the visual quality and shows superior performance to the compared methods. Furthermore, the ablation experiments demonstrate the effectiveness of each component of the DBFNet.

1. Introduction

Underwater robots are important tools for the development and utilization of marine resources. They are of great importance in supporting resource detection and engineering applications, such as wreck salvage, pipeline inspection, aquatic life observation, and deep-water aquaculture operations. The underwater robot vision system is essential in the working process, as it serves as the eyes of the underwater robot. Nevertheless, images captured underwater are affected by the selective attenuation of the light propagating through the water, resulting in varying degrees of color deviation. Meanwhile, underwater images are also affected by the scattering of particles in the water, such as gravel and plankton, resulting in uneven haze effects and blurry details. Underwater image enhancement (UIE) aims to obtain high-quality and clear images, so that the robot vision system can better utilize them for analysis and decision-making.
Due to the complexity and variability of the underwater scene, the visual enhancement of underwater images has always been a challenging issue. Recently, many UIE methods have been developed, and they can be roughly classified into two types: traditional model-based methods [1,2,3] and deep learning (DL)-based methods [4,5,6,7,8,9,10,11]. The traditional model-based methods generally depend on a mathematical model of the underwater imaging process, which estimates the model parameters through prior knowledge to produce a clear image. However, due to the influence of environmental factors such as the medium of water and light propagation in water, traditional model-based methods have difficulties in dealing with complex and variable underwater scenes.
Compared with the traditional methods that use manually designed features, the data-driven DL-based methods that learn latent features from data and directly map degraded images to clear images are more effective. Recently, DL technology has demonstrated impressive performance in machine vision tasks, and many network modules have been applied to UIE, such as attention mechanisms [4,5], residual learning [6,7], encoder–decoders [8,9], and generative adversarial networks [10,11]. These methods can enhance the visual quality of underwater images, but they achieve only limited success in challenging natural scenes.
However, the existing DL-based methods still face many challenges in complex underwater scenes. The absorption of light as it propagates through water leads to color casts in underwater images. Most DL methods take the raw image as the input and do not consider the different absorption coefficients of the individual color channels; therefore, the color correction of underwater images is often unsatisfactory. Meanwhile, many end-to-end DL methods do not preserve image details, resulting in blurred details after enhancement and degraded visual quality.
To meet the aforementioned challenges, this paper proposes a dual-branch fusion network, abbreviated as DBFNet. Light is absorbed as it propagates through water, and different color channels have different absorption coefficients, which leads to color casts, as shown in Figure 1a. To deal with this issue, we first design a triple-color channel separation learning branch (TCSLB) inspired by [12], which divides the input image into R, G, and B channels to learn the color distribution independently using a multi-scale-based attention res-dense module (MSARDM). The MSARDM comprises a residual dense block and a multi-scale channel attention sub-module. The learning of residual information contributes to improving the color mapping performance, and the multi-scale attention module enhances the acquisition of local context features. As illustrated in Figure 1b, the color distribution of the enhanced image is well-balanced.
Another important factor affecting the visual quality of underwater images is the level of detail. To enhance the detail of the restored image, we propose a wavelet domain learning branch (WDLB), which is mainly composed of a convolution block and a discrete wavelet transform (DWT)-based attention res-dense module (DARDM) in an encoder–decoder structure. The DARDM can recover clear texture details by retaining sufficient high-frequency information. Finally, we propose a dual attention-based selective fusion module (DASFM) to integrate the latent features of the TCSLB and WDLB to obtain the reconstruction result, which adaptively emphasizes the feature information from the different modules to achieve their reasonable fusion. With the dual-branch fusion network, we can obtain visually pleasing enhancement results.
We summarize the main contributions of this work as follows:
We propose a dual-branch network termed the DBFNet for UIE. Our method is more effective for color correction and detail restoration, thanks to the use of the triple-color channel separation learning branch and wavelet domain learning branch;
In the TCSLB, we design an effective MSARDM consisting of dense residual blocks and a multi-scale channel attention sub-module, which can improve the color mapping performance;
In the WDLB, we design an effective DARDM consisting of dense residual blocks and a DWT-based attention module, which can provide more detailed features in the wavelet domain;
We design the dual attention-based selective fusion module to achieve the effective fusion of the TCSLB and WDLB output features, which can adaptively emphasize the informative parts of the different latent results;
We validate the effectiveness of the DBFNet by comparing it with recent DL-based and model-based methods on different datasets. Moreover, we provide detailed ablation experiments and visual and quantitative evaluations.

2. Related Work

The underwater robot vision system plays a significant role in exploiting marine resources. However, it is very difficult for these systems to capture high-quality images due to absorption and scattering. Before the emergence of DL-based methods, traditional model-based methods were mainly adopted. Recently, convolutional neural network (CNN)- and transformer-based methods have achieved encouraging enhancement results. Considering the model’s overall structure, these methods can be roughly classified into two types: single-branch and multi-branch methods.
Single-branch methods: The single-branch methods usually use a single-path network structure to map the original degraded underwater image directly to the clear one. Liu et al. [6] proposed a residual learning model for UIE tasks and adopted an asynchronous training mode to boost the performance of the loss function. Chen et al. [13] developed a UIE algorithm based on DL and the image formation model. They constructed the backscatter estimation module and the direct-transmission estimation module using convolutional neural network operations and restored the image using the modified imaging model. Gangisetty et al. [7] designed a novel CNN architecture that improves the residual network structure by leveraging both global and local residual learning approaches. Guo et al. [10] designed a multi-scale dense block for the generator under the framework of generative adversarial networks. A U-shape transformer network was developed by [14], which integrates a channel-wise multi-scale feature fusion transformer module and a spatial-wise global feature modeling transformer module. Although the above methods achieved satisfactory results for UIE, they did not comprehensively consider the complex factors of underwater image degradation, including the color casts caused by absorption and the loss of image details caused by scattering during light propagation in water, making it difficult to mitigate the degradation of complex scenes.
Multi-branch methods: Unlike the single-branch method, the multi-branch method mainly fuses the feature information processed by different branches separately to obtain comprehensive enhancement results. Xue et al. [15] developed a collaborative learning network for luminance and chrominance, which redefines the UIE task as haze removal and color correction sub-tasks by splitting the luminance and chrominance of underwater images. In their subsequent work, Xue et al. [16] designed a multi-branch aggregation network and trained the model to learn a degradation factor to simultaneously achieve color correction and contrast enhancement. Yan et al. [17] developed a multi-branch neural network for the UIE task, which involved the design of an attention-guided dynamic multi-branch block for learning feature representations from different branches. A novel two-branch deep neural network was designed for UIE [18], which could remove the color shifts and improve the visual contrast by fully using the valuable properties of the HSV color space. Jiang et al. [19] considered the factors that affected degradation in underwater images regarding turbidity and chromatism. The authors designed a multi-scale dense boosted module and a deep aesthetic render module to enhance the visual contrast and perform color correction, respectively. Although the aforementioned multi-branch methods have made great progress in terms of visual enhancements, there are still many deficiencies in complex underwater scenes.
DWT-based methods: The discrete wavelet transform technology has good local characteristics in the wavelet domain, and is widely employed in image processing. In recent years, many researchers have integrated DWT technology into DL models to gain more diversified feature information in the wavelet domain. Jamadandi et al. [20] designed an encoder–decoder network with wavelet pooling and unpooling units to solve the issue of UIE. Aytekin et al. [21] developed a denoising network that applied a split convolutional layer to each sub-band of the DWT. Huo et al. [22] proposed a multi-stage model to ameliorate the hybrid degradations progressively, and decomposed the features through the wavelet transform to enhance the details. A wavelet-based two-stream network was designed by [23], which used the DWT to decompose the original signals into low-frequency and high-frequency sub-bands to address the color cast and blurred details of underwater images, respectively. The above-mentioned DWT-based DL network has been successfully applied in image processing tasks, which provides an idea for the model construction in our work.

3. Proposed Method

In this section, the proposed DBFNet framework is introduced in detail. First, an overview of the DBFNet architecture is presented, which comprises three parts: the TCSLB, WDLB, and DASFM. Subsequently, the details of each component of our model are described. Finally, the hybrid loss function adopted during the training of this framework is introduced.

3.1. Overall Architecture

Underwater images inevitably suffer from scattering and absorption when light travels in water, resulting in color casts and detail losses. The UIE task aims to obtain an image with bright colors and clear details from a given input degraded image. However, the underwater scene is intricate, and it is difficult to obtain satisfactory results with general network architectures. As Figure 2 shows, this paper designs a dual-branch fusion architecture. An MSARDM is proposed in the TCSLB, which is applied to the R, G, and B channels, respectively, to learn effective color features. In the WDLB, the detailed features are learned in the wavelet domain by the proposed DARDM. Finally, the two features are reliably merged by the DASFM to produce an underwater image with bright colors and clear details. Mathematically, the whole process can be formulated as:
$I_{output} = \Phi_{DASFM}(\Phi_{TCSLB}(I_{input}),\ \Phi_{WDLB}(I_{input}))$
where $I_{input}$ and $I_{output}$ represent the degraded image and the enhanced image, respectively. The symbols $\Phi_{TCSLB}(\cdot)$, $\Phi_{WDLB}(\cdot)$, and $\Phi_{DASFM}(\cdot)$ represent the triple-color channel separation learning branch, the wavelet domain learning branch, and the dual attention-based selective fusion module, respectively.
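As a structural illustration only (not the authors' released code), the following PyTorch-style sketch mirrors the equation above: the same degraded input is passed through the two branches, and their latent features are merged by the fusion module. The TCSLB, WDLB, and DASFM arguments are placeholders for the modules detailed in the following subsections.

```python
import torch
import torch.nn as nn

class DBFNet(nn.Module):
    """Dual-branch fusion: I_out = DASFM(TCSLB(I_in), WDLB(I_in))."""
    def __init__(self, tcslb: nn.Module, wdlb: nn.Module, dasfm: nn.Module):
        super().__init__()
        self.tcslb = tcslb   # triple-color channel separation learning branch
        self.wdlb = wdlb     # wavelet domain learning branch
        self.dasfm = dasfm   # dual attention-based selective fusion module

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f_color = self.tcslb(x)    # color-oriented latent features
        f_detail = self.wdlb(x)    # detail-oriented wavelet-domain features
        return self.dasfm(f_color, f_detail)
```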

3.2. Triple-Color Channel Separation Learning Branch (TCSLB)

Since light is absorbed when traveling in water, different color channels have different absorption coefficients, which leads to an unbalanced color distribution and, in turn, color casts. To alleviate this issue, we design a TCSLB inspired by [12] that uses an MSARDM to process each of the three color channels. Consequently, the network can adaptively adjust the color distribution of the different color channels. For the TCSLB, we use a residual structure and a series of MSARDM groups, which can learn feature representations according to the different characteristics of the R, G, and B channels and can adaptively emphasize the latent color information of the three channels. Mathematically, the TCSLB can be formulated as follows:
$F_{in}^{i} = \varphi(I^{i}), \quad i \in \{R, G, B\}$
$F_{T}^{i} = Conv(MSARDM_{n}(MSARDM_{n-1}(\cdots MSARDM_{1}(F_{in}^{i}))) + F_{in}^{i})$
$F_{T} = Concat(F_{T}^{R}, F_{T}^{G}, F_{T}^{B}) + I$
where $I$ denotes the input image; $I^{R}$, $I^{G}$, and $I^{B}$ denote the R, G, and B channels of the input image; $\varphi$ denotes the $PReLU(Conv(\cdot))$ operation sequence; and $Concat(\cdot)$ denotes the concatenation operation. The symbol $MSARDM_{n}(\cdot)$ denotes the feature map produced by the n-th MSARDM, and $F_{T}^{R}$, $F_{T}^{G}$, and $F_{T}^{B}$ denote the feature maps obtained after feature extraction for the R, G, and B channels, respectively.
Figure 3 illustrates the detailed structure of the MSARDM, which comprises a residual dense block (RDB) and a multi-scale channel attention sub-module. Previous studies have shown that the RDB is effective for feature extraction and can improve the color mapping performance. As Figure 3 shows, the RDB consists of convolution and PReLU operation sequences. The first N convolution and PReLU layers aim to boost the number of feature maps, and the purpose of the last layer is to concatenate all feature maps produced by the previous N layers. In our work, N is set to four. A multi-scale channel attention sub-module is introduced at the end of the RDB to enhance the local context feature-capturing ability. Specifically, we first perform multi-scale scaling of the feature map $f_{RDB}$ obtained by the RDB, where the scale factors are 1, 1/2, and 1/4. Subsequently, we apply channel attention at each scale to focus on the context features at different scales. Finally, we aggregate the features through upsampling operations. Given an input feature map $f_{in}$ and output feature map $f_{out}$, the MSARDM is represented as follows:
$f_{RDB} = RDB(f_{in})$
$f_{out} = Conv(Concat(CA(f_{RDB}),\ Up_{2}(CA(Down_{1/2}(f_{RDB}))),\ Up_{4}(CA(Down_{1/4}(f_{RDB}))))) + f_{in}$
where $Up$ and $Down$ denote the upsampling and downsampling operations, respectively, $RDB(\cdot)$ denotes the residual dense block operation sequence, and $CA$ denotes the channel attention, which multiplies its input element-wise by the output of the $sigmoid(Conv(ReLU(Conv(GAP(\cdot)))))$ operation sequence.
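For concreteness, a minimal PyTorch sketch of the MSARDM described above is given below. The growth rate, channel-attention reduction ratio, bilinear rescaling, and the local residual inside the RDB are illustrative assumptions not specified in the text, following common residual dense block practice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """CA(x) = sigmoid(Conv(ReLU(Conv(GAP(x))))) * x."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(x)

class RDB(nn.Module):
    """Residual dense block: N = 4 Conv+PReLU layers with dense connections,
    followed by a 1x1 fusion convolution and a local residual (common RDB practice)."""
    def __init__(self, ch, growth=32, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Conv2d(ch + i * growth, growth, 3, padding=1), nn.PReLU())
            for i in range(n_layers))
        self.fuse = nn.Conv2d(ch + n_layers * growth, ch, 1)

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))
        return self.fuse(torch.cat(feats, dim=1)) + x

class MSARDM(nn.Module):
    """RDB followed by channel attention at scales 1, 1/2 and 1/4, then fusion."""
    def __init__(self, ch):
        super().__init__()
        self.rdb = RDB(ch)
        self.ca = nn.ModuleList(ChannelAttention(ch) for _ in range(3))
        self.out = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, x):
        f = self.rdb(x)
        size = f.shape[-2:]
        branches = []
        for scale, ca in zip((1.0, 0.5, 0.25), self.ca):
            g = f if scale == 1.0 else F.interpolate(f, scale_factor=scale,
                                                     mode='bilinear', align_corners=False)
            g = ca(g)                          # channel attention at this scale
            if scale != 1.0:
                g = F.interpolate(g, size=size, mode='bilinear', align_corners=False)
            branches.append(g)
        return self.out(torch.cat(branches, dim=1)) + x   # outer residual connection
```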

3.3. Wavelet Domain Learning Branch (WDLB)

The wavelet transform is extensively used in traditional image processing methods for denoising and local feature extraction. Recently, deep learning methods based on the DWT techniques have been widely applied in visual processing tasks, such as image deblurring [24], image enhancement [25], and image denoising [26]. For two-dimensional image signals, the horizontal and vertical filtering methods can be used to achieve two-dimensional wavelet decomposition, which results in low-frequency and high-frequency sub-bands. The high-frequency signals usually contain edge and texture information, and the low-frequency signals contain image background information. The DWT can decompose an image into a series of sub-band signals with different frequency characteristics. These sub-signals can supply essential information for the subsequent feature representation and analysis of the model. In addition, the receptive field can be increased while simultaneously preventing information loss, thanks to the reversibility and downsampling properties of the DWT. In this work, we decompose an input using the Haar wavelet transform, which consists of four filters as follows:
$f_{LL} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad f_{LH} = \begin{bmatrix} -1 & -1 \\ 1 & 1 \end{bmatrix}, \quad f_{HL} = \begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix}, \quad f_{HH} = \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$
Thus, given an input I, we can convolve it with the above filters and then downsample to obtain four sub-bands $I_{LL}$, $I_{LH}$, $I_{HL}$, and $I_{HH}$, i.e., $I_{i} = (I \ast f_{i}) \downarrow_{2}$, $i \in \{LL, LH, HL, HH\}$. According to the bi-orthogonal property of the DWT, the input I can be restored using the IDWT. Therefore, the DWT can be regarded as a convolution operation with a kernel size of $2 \times 2$, a stride of 2, and fixed weights, while the IDWT is its transposed convolution operation.
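The paragraph above can be made concrete with a small PyTorch sketch that implements the Haar DWT as a stride-2 convolution with fixed weights and the IDWT as the corresponding transposed convolution; scaling both filter banks by 1/2 (an assumption, one common normalization) makes the round trip exact.

```python
import torch
import torch.nn.functional as F

# Fixed 2x2 Haar analysis filters, one per sub-band (LL, LH, HL, HH).
_HAAR = torch.tensor([[[ 1.,  1.], [ 1.,  1.]],   # f_LL
                      [[-1., -1.], [ 1.,  1.]],   # f_LH
                      [[-1.,  1.], [-1.,  1.]],   # f_HL
                      [[ 1., -1.], [-1.,  1.]]])  # f_HH

def dwt_haar(x: torch.Tensor) -> torch.Tensor:
    """2-D Haar DWT as a stride-2 convolution with fixed weights.
    Input (B, C, H, W) -> output (B, 4C, H/2, W/2); the four sub-bands of each
    input channel occupy four consecutive output channels."""
    c = x.shape[1]
    weight = (_HAAR.unsqueeze(1) / 2.0).to(x).repeat(c, 1, 1, 1)   # (4C, 1, 2, 2)
    return F.conv2d(x, weight, stride=2, groups=c)

def idwt_haar(y: torch.Tensor) -> torch.Tensor:
    """Inverse transform realized as the transposed convolution of dwt_haar."""
    c = y.shape[1] // 4
    weight = (_HAAR.unsqueeze(1) / 2.0).to(y).repeat(c, 1, 1, 1)
    return F.conv_transpose2d(y, weight, stride=2, groups=c)

# Round-trip check: the input is recovered up to floating-point error.
x = torch.randn(1, 3, 8, 8)
assert torch.allclose(idwt_haar(dwt_haar(x)), x, atol=1e-6)
```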
Inspired by previous work [22,23,27,28], we design a wavelet domain learning branch (WDLB) to extract wavelet domain features, which uses an encoder–decoder architecture and embeds the DARDM, as shown in Figure 2. Specifically, the WDLB contains three stages: the encoder, bottleneck, and decoder. In the encoder stage, a convolution block (convolution and PReLU) and the designed DARDM are employed to extract the features. In addition, at each encoding stage, the size of the output feature maps is halved and the number of channels is doubled. After the encoder stage, a series of DARDMs are cascaded to further refine the encoded features. Finally, a decoder is introduced to restore the feature maps, where each stage consists of a DARDM and a deconvolution block (deconvolution and PReLU). The encoder and decoder operations can be represented as follows:
$f_{en}^{i} = DARDM(\varphi(f_{en}^{i-1})), \quad i \in \{1, 2, 3\}$
$f_{de}^{i} = \varphi(DARDM(f_{de}^{i-1})) + f_{en}^{3-i}, \quad i \in \{1, 2, 3\}$
where $f_{en}^{i}$ and $f_{de}^{i}$ are the latent features from the encoder and decoder at the i-th stage, respectively; $f_{en}^{0}$ represents the input image; $f_{de}^{0}$ represents the output of the bottleneck; and $\varphi$ denotes the $PReLU(Conv(\cdot))$ sequence in the encoder and the $PReLU(DeConv(\cdot))$ sequence in the decoder.
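A skeleton of this encoder–bottleneck–decoder structure is sketched below. Here `dardm` is assumed to be a factory returning a DARDM (described next) for a given channel width; the channel widths are illustrative, while the five cascaded bottleneck modules follow the setting reported in Section 4.1.

```python
import torch
import torch.nn as nn

class WDLB(nn.Module):
    """Encoder-bottleneck-decoder skeleton of the wavelet domain learning branch.
    `dardm(ch)` must return a module that preserves the channel count `ch`."""
    def __init__(self, dardm, in_ch=3, chs=(32, 64, 128), n_bottleneck=5):
        super().__init__()
        enc_in = (in_ch,) + chs[:-1]
        # Encoder: stride-2 Conv + PReLU halves the resolution, then a DARDM refines the features.
        self.enc = nn.ModuleList(
            nn.Sequential(nn.Conv2d(enc_in[i], chs[i], 3, stride=2, padding=1),
                          nn.PReLU(), dardm(chs[i]))
            for i in range(3))
        # Bottleneck: cascaded DARDMs that further refine the deepest features.
        self.bottleneck = nn.Sequential(*[dardm(chs[-1]) for _ in range(n_bottleneck)])
        dec_out = (chs[1], chs[0], in_ch)
        # Decoder: DARDM followed by stride-2 deconvolution + PReLU.
        self.dec = nn.ModuleList(
            nn.Sequential(dardm(chs[2 - i]),
                          nn.ConvTranspose2d(chs[2 - i], dec_out[i], 4, stride=2, padding=1),
                          nn.PReLU())
            for i in range(3))

    def forward(self, x):
        skips = [x]                        # f_en^0 is the input image
        f = x
        for stage in self.enc:
            f = stage(f)
            skips.append(f)                # f_en^1, f_en^2, f_en^3
        f = self.bottleneck(skips[-1])     # f_de^0
        for i, stage in enumerate(self.dec):
            f = stage(f) + skips[2 - i]    # skip connection to f_en^{3-i} (paper indexing, i = 1..3)
        return f
```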
Figure 4 shows the detailed structure of the DARDM, which comprises a dense residual block and a DWT-based attention module. Unlike the RDB in the MSARDM, the base block of this dense residual block comprises a 3 × 3 convolution layer, a BatchNorm layer, and a PReLU activation function. In the WDLB, the dense residual block comprises four base blocks and a 1 × 1 convolution layer. Each prior base block is directly connected to the current base block, and the 1 × 1 convolution layer adaptively fuses the output information. Next, a DWT-based attention module (DAM) is applied at the end of the dense residual block to encourage the network to learn features in the wavelet domain, as illustrated in Figure 4.
Given an intermediate feature map $X_{in} \in \mathbb{R}^{C \times H \times W}$, the DWT decomposes it into a half-resolution low-frequency sub-band ($F_{LL}$) and three high-frequency sub-bands ($F_{LH}$, $F_{HL}$, $F_{HH}$). Next, these sub-bands are fed to the pixel attention [29] and spatial attention sub-modules, respectively. The low-frequency signal mainly contains structural information, so we use pixel attention to make the model concentrate more on structural features. The high-frequency signals mainly contain texture and other details, so we use spatial attention to make the model concentrate more on spatial features. After applying the pixel and spatial attention, we obtain the feature maps $F_{LL}'$ and $[F_{LH}', F_{HL}', F_{HH}']$, respectively. Subsequently, we integrate these sub-bands and reconstruct them to the original size by performing the IDWT. Next, we perform average pooling, convolution, and PReLU operations on the input $X_{in}$ to control the weights of the channel-wise features. Subsequently, these features are passed through a 3 × 3 convolution layer and a PReLU layer to obtain the residual feature $F_{r}$. Finally, the shortcut feature is added to the residual feature $F_{r}$ to obtain the output feature map $X_{out} \in \mathbb{R}^{C \times H \times W}$, which carries the DWT-based attention feature information. The mathematical process of the DARDM is as follows:
$[F_{LL},\ F_{LH},\ F_{HL},\ F_{HH}] = DWT(X_{in})$
$F_{LL}' = sigmoid(Conv(ReLU(Conv(F_{LL})))) \otimes F_{LL}$
$[F_{LH}', F_{HL}', F_{HH}'] = sigmoid(Conv([GAP([F_{LH}, F_{HL}, F_{HH}]);\ GMP([F_{LH}, F_{HL}, F_{HH}])])) \otimes [F_{LH}, F_{HL}, F_{HH}]$
$F_{d} = IDWT([F_{LL}', F_{LH}', F_{HL}', F_{HH}'])$
$F_{e} = F_{d} \otimes \varphi(Avgpool(X_{in}))$
$X_{out} = \varphi(F_{e}) + X_{in}$
where $\varphi$ denotes the $PReLU(Conv(\cdot))$ operation sequence, $\otimes$ denotes element-wise multiplication, and $GAP(\cdot)$ and $GMP(\cdot)$ denote the global average pooling and max pooling, respectively.
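A sketch of this DWT-based attention module is shown below, reusing the dwt_haar/idwt_haar helpers from the earlier sketch. The reduction ratio, the 7 × 7 spatial-attention kernel, and the interpretation of GAP/GMP as pooling along the channel dimension (the usual spatial-attention formulation) are assumptions rather than details stated in the paper.

```python
import torch
import torch.nn as nn

class DAM(nn.Module):
    """DWT-based attention sub-module of the DARDM.
    Assumes the dwt_haar and idwt_haar helpers sketched in Section 3.3."""
    def __init__(self, ch, reduction=8):
        super().__init__()
        # Pixel attention applied to the low-frequency sub-band.
        self.pa = nn.Sequential(
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // reduction, 1, 1), nn.Sigmoid())
        # Spatial attention on the stacked high-frequency sub-bands:
        # mean/max pooling along the channel dimension, then a 7x7 convolution.
        self.sa = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())
        # Channel-wise gate derived from the module input (AvgPool + Conv + PReLU).
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.PReLU())
        self.out = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.PReLU())

    def forward(self, x):
        b, c, h, w = x.shape
        sub = dwt_haar(x).view(b, c, 4, h // 2, w // 2)           # per-channel (LL, LH, HL, HH)
        low = sub[:, :, 0]
        high = sub[:, :, 1:].reshape(b, 3 * c, h // 2, w // 2)
        low = self.pa(low) * low                                   # pixel attention on F_LL
        stat = torch.cat([high.mean(dim=1, keepdim=True),
                          high.max(dim=1, keepdim=True)[0]], dim=1)
        high = self.sa(stat) * high                                # spatial attention on F_LH/HL/HH
        merged = torch.cat([low.unsqueeze(2),
                            high.reshape(b, c, 3, h // 2, w // 2)], dim=2)
        f_d = idwt_haar(merged.reshape(b, 4 * c, h // 2, w // 2))  # back to full resolution
        f_e = f_d * self.gate(x)                                   # channel-wise re-weighting
        return self.out(f_e) + x                                   # residual connection to the input
```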

3.4. Dual Attention-Based Selective Fusion Module (DASFM)

In our model, the results from the TCSLB and WDLB contribute differently to the acquisition of high-quality underwater images. In order to fuse these two branches, we design a DASFM inspired by [19,30], which adaptively emphasizes the beneficial feature information of the different branches. Figure 5 shows the corresponding structure. Specifically, the results $F_{T}$ and $F_{W}$ from the TCSLB and WDLB, respectively, are first fed into convolution layers to extract the shallow features $F_{t}$ and $F_{w}$, which are then concatenated to obtain $F_{c}$. Second, pixel attention, channel attention, and a convolution layer (convolution and PReLU activation function) are employed to extract the features and obtain the feature map $F_{cp}$. Third, we feed $F_{cp}$ into convolution layers with sigmoid activation functions to obtain two attention maps $W_{t}$ and $W_{w}$, which are used to adaptively adjust each branch. Last, the weighted feature maps from the dual branches are summed, and the fused feature map is adjusted to the output size via a convolution operation. Mathematically, the DASFM can be formulated as:
$F_{t} = Conv(F_{T}), \quad F_{w} = Conv(F_{W})$
$F_{c} = Concat(F_{t},\ F_{w})$
$F_{cp} = \varphi(Concat(CA(F_{c}),\ PA(F_{c})))$
$W_{t} = sigmoid(Conv(F_{cp})), \quad W_{w} = sigmoid(Conv(F_{cp}))$
$F_{out} = Conv((W_{t} \otimes F_{t}) + (W_{w} \otimes F_{w}))$
where $Concat(\cdot)$ denotes the concatenation operation; $\otimes$ denotes element-wise multiplication; $PA$ denotes the pixel attention, which multiplies its input element-wise by the output of the $sigmoid(Conv(ReLU(Conv(\cdot))))$ operation sequence; and $CA$ denotes the channel attention, which multiplies its input element-wise by the output of the $sigmoid(Conv(ReLU(Conv(GAP(\cdot)))))$ operation sequence.
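The fusion module can be sketched as follows; the input channel count, internal width, reduction ratio, and output channel count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DASFM(nn.Module):
    """Dual attention-based selective fusion of the TCSLB and WDLB outputs."""
    def __init__(self, in_ch=3, ch=32, out_ch=3, reduction=8):
        super().__init__()
        self.proj_t = nn.Conv2d(in_ch, ch, 3, padding=1)    # shallow features F_t
        self.proj_w = nn.Conv2d(in_ch, ch, 3, padding=1)    # shallow features F_w
        # Channel attention: sigmoid(Conv(ReLU(Conv(GAP(x))))) * x
        self.ca = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                nn.Conv2d(2 * ch, 2 * ch // reduction, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(2 * ch // reduction, 2 * ch, 1), nn.Sigmoid())
        # Pixel attention: sigmoid(Conv(ReLU(Conv(x)))) * x
        self.pa = nn.Sequential(nn.Conv2d(2 * ch, 2 * ch // reduction, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(2 * ch // reduction, 1, 1), nn.Sigmoid())
        self.fuse = nn.Sequential(nn.Conv2d(4 * ch, ch, 3, padding=1), nn.PReLU())
        self.w_t = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.Sigmoid())
        self.w_w = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.Sigmoid())
        self.out = nn.Conv2d(ch, out_ch, 3, padding=1)

    def forward(self, f_T, f_W):
        f_t, f_w = self.proj_t(f_T), self.proj_w(f_W)        # shallow feature extraction
        f_c = torch.cat([f_t, f_w], dim=1)                   # concatenation
        f_cp = self.fuse(torch.cat([self.ca(f_c) * f_c,      # channel- and pixel-attended features
                                    self.pa(f_c) * f_c], dim=1))
        w_t, w_w = self.w_t(f_cp), self.w_w(f_cp)            # branch-wise attention maps
        return self.out(w_t * f_t + w_w * f_w)               # selective weighted fusion
```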

3.5. Hybrid Loss Function

Our DBFNet is trained with a loss function $L_{total}$ that combines the L1 loss $L_{l1}$ and the perceptual loss $L_{per}$ as follows:
$L_{total} = L_{l1} + \lambda L_{per}$
where $\lambda$ is set to 0.1 to balance the two losses; the details are presented in the ablation study section. Concretely, the L1 loss evaluates the difference between the reconstructed image $I_{recon}$ and the corresponding reference image $I_{ref}$ as:
$L_{l1} = \sum_{x=1}^{H} \sum_{y=1}^{W} \left| I_{recon}(x, y) - I_{ref}(x, y) \right|$
The perceptual loss [31] is computed based on the VGG-16 network [32] pre-trained on ImageNet. Let $\phi_{j}(\cdot)$ be the j-th convolutional layer; then $\phi_{j}(I)$ is a feature map of shape $C_{j} \times H_{j} \times W_{j}$ when the image I is processed by $\phi_{j}(\cdot)$. The perceptual loss is the Euclidean distance between the feature representations of the reconstructed image $I_{recon}$ and the corresponding reference image $I_{ref}$, and is expressed as follows:
$L_{per} = \frac{1}{C_{j} H_{j} W_{j}} \left\| \phi_{j}(I_{recon}) - \phi_{j}(I_{ref}) \right\|_{2}^{2}$
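A hedged sketch of the hybrid loss is given below. The specific VGG-16 layer (relu3_3, i.e., features[:16] in torchvision) is an assumption, since the paper only specifies "the j-th convolutional layer"; the L1 term here is mean-reduced, whereas the formula above sums over pixels (a constant factor), and ImageNet normalization of the VGG inputs is omitted for brevity.

```python
import torch
import torch.nn as nn
from torchvision import models

class HybridLoss(nn.Module):
    """L_total = L_l1 + lambda * L_per, with a frozen pre-trained VGG-16 feature extractor."""
    def __init__(self, lam=0.1):
        super().__init__()
        self.lam = lam
        self.l1 = nn.L1Loss()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:16]
        for p in vgg.parameters():
            p.requires_grad = False          # the feature extractor is not trained
        self.vgg = vgg.eval()

    def forward(self, recon, ref):
        loss_l1 = self.l1(recon, ref)
        # Squared Euclidean distance between feature maps, normalized by C_j * H_j * W_j.
        loss_per = torch.mean((self.vgg(recon) - self.vgg(ref)) ** 2)
        return loss_l1 + self.lam * loss_per
```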

4. Experiments

In this section, the experimental implementation details are described first. Then, to verify the performance of our DBFNet, we compare it with nine recent methods under the described experimental setup and present the visual and quantitative results. Finally, the effectiveness of each component of the DBFNet is verified through an extensive ablation study.

4.1. Experimental Implementation Details

Datasets: The effectiveness of the DBFNet is verified by experiments utilizing publicly available synthetic and real-world underwater datasets. The synthetic dataset is generated following [33] and contains ten water types, labeled as type 1, 3, 5, 7, 9, I, IA, IB, II, and III. Each type consists of 1449 images from the RGB-D NYU-v2 dataset. In our experiments, we select nine water types as the training and testing data, excluding type 9, which is too turbid. We randomly select 1100 samples from each water type, so a total of 9900 samples are employed for training, and the remaining 3141 samples are employed as the test set. For the evaluation on real-world underwater images, 800 images are randomly selected from the UIEB dataset [34] for training, and the remaining 90 images are employed as the test set, denoted as Test-90. In addition to the 890 degraded and high-quality paired images, the UIEB dataset also contains 60 challenging images for which no corresponding reference images are available. We use these images as an additional test set, denoted as Test-C60. The images are resized to a unified resolution in both the training and testing stages.
Experimental settings: The DBFNet is implemented in PyTorch and trained on an Nvidia Tesla V100 GPU with 32 GB of VRAM. The network is trained using the Adam optimizer with a momentum rate of 0.9. The learning rate is scheduled according to the cosine annealing strategy [35] with an initial value of 1 × 10−4. The batch size and the number of epochs are set to 8 and 200, respectively. We empirically set the number of MSARDMs and DARDMs to 5.
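The reported optimization settings translate into roughly the following training loop; `model`, `train_loader`, and the HybridLoss sketch above are assumed to exist, and CosineAnnealingLR is used here as a simple stand-in for the cosine annealing strategy of [35].

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

# Illustrative configuration matching the reported settings:
# Adam with beta1 = 0.9, initial learning rate 1e-4, cosine annealing, batch size 8, 200 epochs.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = CosineAnnealingLR(optimizer, T_max=200)   # one cosine cycle over 200 epochs
criterion = HybridLoss(lam=0.1)

for epoch in range(200):
    for degraded, reference in train_loader:          # mini-batches of size 8
        optimizer.zero_grad()
        loss = criterion(model(degraded), reference)
        loss.backward()
        optimizer.step()
    scheduler.step()                                   # update the learning rate once per epoch
```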
Evaluation metrics: For the test sets of the synthetic dataset and the UIEB dataset, we choose three full-reference evaluation metrics, namely the PSNR [36], SSIM [37], and MSE [38], and two non-reference evaluation metrics, namely the UIQM [39] and UCIQE [40]. The PSNR and MSE are employed to measure the content similarity between the output and reference images; a larger PSNR score or a smaller MSE indicates that the result is more similar to the reference image in terms of content. The SSIM is employed to measure the contrast and structural similarity; a larger SSIM value indicates that the result is closer to the reference image in terms of texture and structure. Its definition is given by:
$SSIM(x, y) = \frac{(2\mu_{x}\mu_{y} + c_{1})(2\sigma_{xy} + c_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + c_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + c_{2})}$
where $\mu_{x}$ and $\sigma_{x}^{2}$ represent the mean and variance of x, respectively; $\mu_{y}$ and $\sigma_{y}^{2}$ represent the mean and variance of y, respectively; $\sigma_{xy}$ represents the covariance between x and y; and $c_{1}$ and $c_{2}$ are fixed constants.
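For reproducibility, the full-reference scores can be computed with scikit-image as sketched below (assuming 8-bit RGB images and scikit-image >= 0.19 for the channel_axis argument).

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def full_reference_scores(result: np.ndarray, reference: np.ndarray):
    """PSNR / SSIM / MSE between an enhanced image and its reference.
    Both inputs are uint8 RGB arrays of shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(reference, result, data_range=255)
    ssim = structural_similarity(reference, result, channel_axis=-1, data_range=255)
    mse = np.mean((reference.astype(np.float64) - result.astype(np.float64)) ** 2)
    return psnr, ssim, mse
```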
The UIQM has a stronger correlation with human visual perception, which includes three attribute measures, i.e., the underwater image color metric (UICM), underwater image sharpness metric (UISM), and underwater image contrast metric (UIConM). Its definition is given by:
$UIQM = c_{1} \times UICM + c_{2} \times UISM + c_{3} \times UIConM$
where $c_{1} = 0.0282$, $c_{2} = 0.2953$, and $c_{3} = 3.5753$ are set according to [39].
The UIQM and UCIQE are employed to measure the non-uniform color shift and contrast of output images; a larger UIQM value or a larger UCIQE value indicates a better visual effect.
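As a worked example of the UIQM formula, the sketch below combines the three attribute measures with the coefficients from [39]; plugging in the components reported for our method in Table 3 recovers its tabulated UIQM up to rounding.

```python
def uiqm(uicm: float, uism: float, uiconm: float) -> float:
    """Linear combination of the three attribute measures with the coefficients of [39]."""
    return 0.0282 * uicm + 0.2953 * uism + 3.5753 * uiconm

# Components reported for our method in Table 3: prints 2.7324
# (Table 3 lists 2.7326 because the components there are themselves rounded).
print(round(uiqm(5.1320, 5.5205, 0.2678), 4))
```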
Comparison methods: The performance achieved by DBFNet is demonstrated by comparing it with nine recent state-of-the-art (SOTA) methods, including traditional methods (UDCP [1], IBLA [2]) and DL-based methods (Shallow-UWnet [41], Chen et al.’s method [13], UResnet [6], WaterNet [34], Deep-WaveNet [42], UGAN [43], Ma et al.’s method [23]).

4.2. Comparisons on Synthetic Datasets

As reference underwater images are difficult to obtain, many researchers train and test deep learning models using synthetic underwater image datasets. Table 1 presents the quantitative results of the PSNR, SSIM, and MSE for underwater images of nine different water types, where the bold and underlined scores show the optimal and sub-optimal results. It can be observed that our DBFNet obtains the best PSNR, SSIM, and MSE scores for all water types. This shows that our DBFNet can effectively enhance the image contrast, correct the image color, and perform well in detail retention. In addition, in terms of these three evaluation metrics, our DBFNet is clearly superior to the compared methods by a considerable margin.
Figure 6 presents the visual results of our DBFNet versus the comparison methods on the synthesized underwater dataset. It can be observed that the UDCP and IBLA perform poorly in terms of image contrast improvement and color correction, and even aggravate the image degradation, which may be because these methods rely on prior knowledge. Although the DL-based methods can obtain relatively better results, a few methods, such as Shallow-UWnet, UResnet, and Chen et al.’s method, are prone to artifacts and blurring. In contrast, WaterNet, Deep-WaveNet, UGAN, Ma et al.’s method, and our proposed method can achieve relatively good visual results. However, WaterNet and Deep-WaveNet show poor color correction performance, as presented in the second column of Figure 6. It is obvious that our result is more similar to the reference image than those of the comparison methods, and our DBFNet performs well in color correction and detail preservation. Moreover, as presented in Figure 6, our model produces the highest PSNR and SSIM scores among the compared methods, which further demonstrates the effectiveness of the DBFNet.

4.3. Comparisons on Real-World Datasets

Visual comparisons: Due to the absorption and scattering of light traveling through the water, underwater images often exhibit color casts and suffer from a loss of detail. To verify the performance of our DBFNet in color correction and detail restoration, inspired by [44], we divide the images of the Test-90 dataset into five categories: bluish, greenish, yellowish, shallow water, and low-illuminated images, as presented in Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11.
From the perspective of the color correction performance, the UDCP and IBLA are ineffective in eliminating the color offset and enhancing the visual contrast, which may be due to the low robustness of the traditional methods in complex underwater scenes. The Shallow-UWnet and UResnet methods slightly improve the color shifts but do not eliminate the severe color bias. In addition, UResnet introduces extra background noise. The other methods show varying color correction performance, but the WaterNet method easily over-enhances the colors, and Chen et al.’s method, the Deep-WaveNet method, the UGAN method, and Ma et al.’s method cannot completely eliminate the color casts. Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11 show that our DBFNet produces the best color correction performance.
From the perspective of the detail restoration performance, the UDCP, IBLA, and Shallow-UWnet methods perform poorly due to the interference from color casts. The UResnet is affected by the additional background noise it introduces, and the details are not clearly restored. The remaining methods restore the details to some extent, although a few methods such as Chen et al.’s method, the WaterNet method, the UGAN method, and Ma et al.’s method are affected by color offset and haze, rendering poor results, as shown in Figure 7 and Figure 9. The comparison results demonstrate that our DBFNet produces the best detail restoration results and improves the brightness and saturation of the underwater images.
The robustness of the proposed DBFNet is verified by conducting comparative experiments on the challenging test set, i.e., Test-C60, whose visual results are shown in Figure 12. We can observe that the traditional methods perform poorly: the UDCP reduces the image contrast, and the IBLA aggravates the color casts. The Shallow-UWnet has only a slight enhancement effect, UResnet gives rise to extra background noise, and Chen et al.’s method results in over-enhancement. The UGAN method, Ma et al.’s method, and our method show good enhancement performance. Among these methods, our method performs better in brightness improvement.
Quantitative evaluation: For the quantitative evaluation, we use the full-reference evaluation metrics for the reference images in Test-90 and the non-reference evaluation metrics for the images in Test-C60. We calculate the average scores of all images in the corresponding dataset, and the results are presented in Table 2 and Table 3. As Table 2 shows, our proposed DBFNet obtains the best PSNR, SSIM, and MSE scores. Compared with Deep-WaveNet (the suboptimal method), our DBFNet achieves gains of 8.2% and 0.8% in terms of the PSNR and SSIM metrics, respectively. Table 3 presents the comparison results for the non-reference measures. It can be observed that our DBFNet obtains the third-best results with respect to the UISM, UIQM, and UCIQE metrics. Although the UResnet and UGAN methods perform well on these metrics, from the perspective of visual results, there is serious background noise in the results obtained using the UResnet, which is obviously unwanted. It should be noted that there is a certain difference between the visual effects and the quantitative values in some cases, which is also confirmed in [45,46]. Therefore, based on the combined visual comparison and quantitative evaluation, our DBFNet achieves better performance.

4.4. Ablation Studies

In this section, we perform ablation studies on the UIEB dataset using an Nvidia Tesla V100 GPU with 32 GB of VRAM. The same training and testing datasets and parameter settings are used for all ablation studies.
Parameter selection of the loss function: The validity of the parameters of the loss function used in our work is verified by calculating the PSNR and SSIM values under various values of λ on the Test-90 dataset, and the quantitative results are presented in Figure 13. It can be clearly seen that the best performance is obtained when λ is equal to 0.1.
Effectiveness of the network components: The validity of each of our developed modules is verified by designing appropriate ablation studies. The visual and quantitative results are presented in Figure 14 and Table 4, respectively. In these results, w/o WDLB, w/o TCSLB, and w/o DARDM denote the proposed model without the WDLB, without the TCSLB, and without the DARDM, respectively. As presented in Table 4, our full model achieves the best scores on all evaluation metrics among the ablated models. As shown in Figure 14, especially in the red rectangular box, our full model achieves the best visual effect in terms of detail preservation and color correction. It can be further observed from Figure 14 that although the color can be mostly corrected when the WDLB, TCSLB, or DARDM is removed from the full model, different degrees of haze are produced and the details are unclear. Combining the quantitative and qualitative results, it can be observed that each of the proposed components plays a significant role in the performance of the full model.
Effectiveness of the fusion method: We verify the effectiveness of the DASFM by comparing it with two other commonly used fusion methods, i.e., element-wise summation and concatenation. As Table 5 shows, the DASFM achieves the highest PSNR and SSIM scores among the fusion methods, while its MSE is slightly worse than that of the concatenation-based fusion. As Figure 15 shows, especially in the red rectangular box, compared with the other two fusion methods, the DASFM achieves better visual performance: its result is more similar to the reference image, and the interference of the blue haze is effectively eliminated.

5. Conclusions

In this work, we developed a novel dual-branch fusion network for UIE, which contains three components: a triple-color channel separation learning branch, a wavelet domain learning branch, and a dual attention-based selective fusion module. Firstly, an MSARDM was designed to learn the color features of the three channels independently to balance the color distribution. Secondly, a DARDM was designed to fully exploit the wavelet domain information, which helps preserve image details. Finally, a DASFM was designed to adaptively fuse the beneficial results from the two branches. Our extensive quantitative and visual evaluations on synthetic and real-world underwater datasets have demonstrated that the proposed DBFNet performs better than the compared methods.
However, the DBFNet still has some limitations. The dual attention-based selective fusion module shows limited improvements compared with the concatenation operation. In future studies, we plan to further improve the fusion module to obtain better visual results.

Author Contributions

Conceptualization, methodology, software, validation, writing—original draft preparation, K.S.; methodology, software, writing—review and editing, project administration, funding acquisition, Y.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of Guangdong Province of China under grant no. 2023A1515011272, the Tertiary Education Scientific Research Project of Guangzhou Municipal Education Bureau of China under no. 202234598, the Special Project in Key Fields of Guangdong Universities of China under no. 2022ZDZX1020, and the University Natural Science Research Project of Anhui Province under no. KJ2020B07.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Drews, P.; Nascimento, E.; Moraes, F.; Botelho, S.; Campos, M. Transmission Estimation in Underwater Single Images. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Sydney, Australia, 1–8 December 2013; pp. 825–830. [Google Scholar]
  2. Peng, Y.-T.; Cosman, P.C. Underwater image restoration based on image blurriness and light absorption. IEEE Trans. Image Process. 2017, 26, 1579–1594. [Google Scholar] [CrossRef] [PubMed]
  3. Zhou, J.; Liu, D.; Xie, X.; Zhang, W. Underwater image restoration by red channel compensation and underwater median dark channel prior. Appl. Optics 2022, 61, 2915–2922. [Google Scholar] [CrossRef] [PubMed]
  4. Liu, S.; Fan, H.; Lin, S.; Wang, Q.; Ding, N.; Tang, Y. Adaptive Learning Attention Network for Underwater Image Enhancement. IEEE Robot. Autom. Lett. 2022, 7, 5326–5333. [Google Scholar] [CrossRef]
  5. Li, Y.; Chen, R. UDA-Net: Densely Attention Network for Underwater Image Enhancement. IET Image Process. 2021, 15, 774–785. [Google Scholar] [CrossRef]
  6. Liu, P.; Wang, G.; Qi, H.; Zhang, C.; Zheng, H.; Yu, Z. Underwater image enhancement with a deep residual framework. IEEE Access 2019, 7, 94614–94629. [Google Scholar] [CrossRef]
  7. Gangisetty, S.; Rai, R.R. FloodNet: Underwater image restoration based on residual dense learning. Signal Process. Image Commun. 2022, 104, 116647. [Google Scholar] [CrossRef]
  8. Yang, H.H.; Huang, K.C.; Chen, W.T. LAFFNet: A lightweight adaptive feature fusion network for underwater image enhancement. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 685–692. [Google Scholar]
  9. Qi, Q.; Zhang, Y.; Tian, F.; Wu, Q.M.J.; Li, K.; Luan, X.; Song, D. Underwater image co-enhancement with correlation feature matching and joint learning. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 1133–1147. [Google Scholar] [CrossRef]
  10. Guo, Y.; Li, H.; Zhuang, P. Underwater image enhancement using a multiscale dense generative adversarial network. IEEE J. Oceanic Eng. 2019, 45, 862–870. [Google Scholar] [CrossRef]
  11. Yang, M.; Hu, K.; Du, Y.; Wei, Z.; Sheng, Z.; Hu, J. Underwater image enhancement based on conditional generative adversarial network. Signal Process. Image Commun. 2020, 81, 115723. [Google Scholar] [CrossRef]
  12. Zhang, D.; Shen, J.; Zhou, J.; Chen, E.; Zhang, W. Dual-path joint correction network for underwater image enhancement. Opt. Express 2022, 30, 33412–33432. [Google Scholar] [CrossRef]
  13. Chen, X.; Zhang, P.; Quan, L.; Yi, C.; Lu, C. Underwater image enhancement based on deep learning and image formation model. arXiv 2021, arXiv:2101.00991. [Google Scholar]
  14. Peng, L.; Zhu, C.; Bian, L. U-shape Transformer for Underwater Image Enhancement. arXiv 2021, arXiv:2111.11843. [Google Scholar]
  15. Xue, X.; Hao, Z.; Ma, L.; Wang, Y.; Liu, R. Joint luminance and chrominance learning for underwater image enhancement. IEEE Signal Process. Lett. 2021, 28, 818–822. [Google Scholar] [CrossRef]
  16. Xue, X.; Li, Z.; Ma, L.; Jia, Q.; Liu, R.; Fan, X. Investigating intrinsic degradation factors by multi-branch aggregation for real-world underwater image enhancement. Pattern Recognit. 2023, 133, 109041. [Google Scholar] [CrossRef]
  17. Yan, X.; Qin, W.; Wang, Y.; Wang, G.; Fu, X. Attention-guided dynamic multi-branch neural network for underwater image enhancement. Knowl.-Based Syst. 2022, 258, 110041. [Google Scholar] [CrossRef]
  18. Hu, J.; Jiang, Q.; Cong, R.; Gao, W.; Shao, F. Two-branch deep neural network for underwater image enhancement in HSV color space. IEEE Signal Process. Lett. 2021, 28, 2152–2156. [Google Scholar] [CrossRef]
  19. Jiang, Z.; Li, Z.; Yang, S.; Gao, W.; Shao, F. Target Oriented Perceptual Adversarial Fusion Network for Underwater Image Enhancement. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6584–6598. [Google Scholar] [CrossRef]
  20. Jamadandi, A.; Mudenagudi, U. Exemplar-based underwater image enhancement augmented by wavelet corrected transforms. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 15–21 June 2019; pp. 11–17. [Google Scholar]
  21. Aytekin, C.; Alenius, S.; Paliy, D.; Gren, J. A Sub-band Approach to Deep Denoising Wavelet Networks and a Frequency-adaptive Loss for Perceptual Quality. In Proceedings of the IEEE International Workshop on Multimedia Signal Processing, Tampere, Finland, 6–8 October 2021; pp. 1–6. [Google Scholar]
  22. Huo, F.; Li, B.; Zhu, X. Efficient Wavelet Boost Learning-Based Multi-stage Progressive Refinement Network for Underwater Image Enhancement. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1944–1952. [Google Scholar]
  23. Ma, Z.; Oh, C. A Wavelet-Based Dual-Stream Network for Underwater Image Enhancement. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; pp. 2769–2773. [Google Scholar]
  24. Zou, W.; Jiang, M.; Zhang, Y.; Chen, L.; Lu, Z.; Wu, Y. SDWnet: A straight dilated network with wavelet transformation for image deblurring. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1895–1904. [Google Scholar]
  25. Fan, C.M.; Liu, T.J.; Liu, K.H. Half Wavelet Attention on M-Net+ for Low-Light Image Enhancement. arXiv 2022, arXiv:2203.01296. [Google Scholar]
  26. Peng, Y.; Cao, Y.; Liu, S.; Yang, J.; Zuo, W. Progressive training of multi-level wavelet residual networks for image denoising. arXiv 2020, arXiv:2010.12422. [Google Scholar]
  27. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
  28. Sun, K.; Meng, F.; Tian, Y. Underwater image enhancement based on noise residual and color correction aggregation network. Digit. Signal Process. 2022, 129, 103684. [Google Scholar] [CrossRef]
  29. Qin, X.; Wang, Z.; Bai, Y.; Xie, X.; Jia, H. FFA-Net: Feature fusion attention network for single image dehazing. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 11908–11915. [Google Scholar]
  30. Yang, H.; Zhou, D.; Cao, J.; Zhao, Q. DPNet: Detail-preserving image deraining via learning frequency domain knowledge. Digit. Signal Process. 2022, 130, 103740. [Google Scholar] [CrossRef]
  31. Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 694–711. [Google Scholar]
  32. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  33. Anwar, S.; Li, C.; Porikli, F. Deep underwater image enhancement. arXiv 2018, arXiv:1807.03528. [Google Scholar]
  34. Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An underwater image enhancement benchmark dataset and beyond. IEEE Trans. Image Process. 2019, 29, 4376–4389. [Google Scholar] [CrossRef] [Green Version]
  35. Loshchilov, I.; Hutter, F. SGDR: Stochastic gradient descent with warm restarts. arXiv 2016, arXiv:1608.03983. [Google Scholar]
  36. Hore, A.; Ziou, D. Image Quality Metrics: PSNR vs. SSIM. In Proceedings of the International Conference on Pattern Recognition, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar]
  37. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [Green Version]
  38. Hunt, B.R. The Application of Constrained Least Squares Estimation to Image Restoration by Digital Computer. IEEE Trans. Comput. 1973, 100, 805–812. [Google Scholar] [CrossRef]
  39. Panetta, K.; Gao, C.; Agaian, S. Human-visual-system-inspired Underwater Image Quality Measures. IEEE J. Ocean. Eng. 2015, 41, 541–551. [Google Scholar] [CrossRef]
  40. Yang, M.; Sowmya, A. An Underwater Color Image Quality Evaluation Metric. IEEE Trans. Image Process. 2015, 24, 6062–6071. [Google Scholar] [CrossRef]
  41. Naik, A.; Swarnakar, A.; Mittal, K. Shallow-UWnet: Compressed Model for Underwater Image Enhancement. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; pp. 15853–15854. [Google Scholar]
  42. Sharma, P.K.; Bisht, I.; Sur, A. Wavelength-based Attributed Deep Neural Network for Underwater Image Restoration. arXiv 2021, arXiv:2106.07910. [Google Scholar] [CrossRef]
  43. Fabbri, C.; Islam, M.J.; Sattar, J. Enhancing underwater imagery using generative adversarial networks. In Proceedings of the IEEE International Conference on Robotics and Automation, Brisbane, Australia, 21–25 May 2018; pp. 7159–7165. [Google Scholar]
  44. Wang, Y.; Guo, J.; Gao, H.; Yue, H. UIEC2Net: CNN-based Underwater Image Enhancement Using Two Color Space. Signal Process. Image Commun. 2021, 96, 116250. [Google Scholar] [CrossRef]
  45. Chen, L.; Jiang, Z.; Tong, L.; Liu, Z.; Zhao, A.; Zhang, Q.; Dong, J.; Zhou, H. Perceptual underwater image enhancement with deep learning and physical priors. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 3078–3092. [Google Scholar] [CrossRef]
  46. Liu, R.; Fan, X.; Zhu, M.; Hou, M.; Luo, Z. Real-world underwater enhancement: Challenges, benchmarks, and solutions under natural light. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 4861–4875. [Google Scholar] [CrossRef]
Figure 1. An example of the color distribution of an underwater raw image and its corresponding enhanced image: (a) raw image and its tricolor histogram and three-dimensional color distribution; (b) enhanced image and its tricolor histogram and three-dimensional color distribution.
Figure 2. The architecture of our proposed DBFNet for UIE.
Figure 3. Illustration of the MSARDM in the TCSLB.
Figure 4. Illustration of the DARDM in the WDLB.
Figure 5. Illustration of the DASFM.
Figure 6. Visual comparisons of synthetic underwater images.
Figure 7. The visual comparisons of various UIE approaches on the bluish underwater images. The number on each image represents its PSNR score.
Figure 8. The visual comparisons of various UIE approaches on greenish underwater images. The number on each image represents its PSNR score.
Figure 9. The visual comparisons of various UIE approaches on the yellowish underwater images. The number on each image represents its PSNR score.
Figure 10. The visual comparisons of various UIE approaches on shallow water images. The number on each image represents its PSNR score.
Figure 11. The visual comparisons of various UIE approaches on the low-illuminated underwater images. The number on each image represents its PSNR score.
Figure 12. The visual comparisons of various UIE approaches on the Test-C60 dataset.
Figure 13. The ablation study for parameter λ of the loss function.
Figure 14. Ablation study of the performance of the model components. The number on each image represents its PSNR score.
Figure 15. Visual results of different fusion methods. The number on each image represents its PSNR score.
Table 1. Quantitative comparison results of various UIE approaches on the synthetic underwater dataset in terms of PSNR (dB), SSIM, and MSE ( × 10 3 ) values. We present the optimal and suboptimal results with values in bold and underlined, respectively.
Type | Metric | UDCP | IBLA | Shallow-UWnet | UResnet | Chen et al. | WaterNet | Deep-WaveNet | UGAN | Ma et al. | Ours
1 | PSNR | 13.68 | 14.70 | 18.85 | 19.50 | 20.07 | 20.67 | 23.08 | 25.45 | 27.82 | 31.93
1 | SSIM | 0.6547 | 0.6881 | 0.7789 | 0.7158 | 0.7518 | 0.8361 | 0.8651 | 0.8874 | 0.8873 | 0.9165
1 | MSE | 3.3332 | 2.8291 | 1.3059 | 0.9665 | 0.8480 | 0.7702 | 0.4172 | 0.1914 | 0.1279 | 0.0464
3 | PSNR | 11.84 | 13.03 | 15.04 | 16.71 | 17.43 | 17.97 | 19.76 | 24.74 | 24.58 | 29.68
3 | SSIM | 0.5444 | 0.0639 | 0.6863 | 0.6492 | 0.6908 | 0.7825 | 0.7972 | 0.8616 | 0.8434 | 0.8962
3 | MSE | 4.9485 | 3.6553 | 2.2795 | 1.7687 | 1.4022 | 1.3724 | 0.9074 | 0.2353 | 0.3010 | 0.0895
5 | PSNR | 10.00 | 11.16 | 13.80 | 13.34 | 15.40 | 15.13 | 16.52 | 22.61 | 20.91 | 24.95
5 | SSIM | 0.3992 | 0.4531 | 0.6118 | 0.5079 | 0.6059 | 0.6955 | 0.6997 | 0.7863 | 0.7671 | 0.8356
5 | MSE | 7.4906 | 5.7544 | 3.0015 | 3.3493 | 2.1646 | 2.4177 | 1.7984 | 0.4646 | 0.7468 | 0.3420
7 | PSNR | 8.99 | 9.73 | 13.05 | 11.33 | 13.97 | 13.44 | 14.43 | 19.11 | 17.43 | 20.01
7 | SSIM | 0.2775 | 0.3197 | 0.5285 | 0.3884 | 0.5272 | 0.6025 | 0.5920 | 0.6322 | 0.6590 | 0.7185
7 | MSE | 9.7561 | 8.5102 | 3.5799 | 5.1834 | 3.0144 | 3.4888 | 2.8472 | 1.1833 | 1.6848 | 1.1682
I | PSNR | 16.53 | 14.13 | 21.98 | 20.70 | 23.67 | 24.06 | 25.95 | 25.57 | 29.72 | 33.00
I | SSIM | 0.7640 | 0.5591 | 0.8495 | 0.7374 | 0.8339 | 0.8847 | 0.9149 | 0.8934 | 0.9069 | 0.9247
I | MSE | 1.6850 | 2.9865 | 0.5588 | 0.7715 | 0.3316 | 0.3295 | 0.1998 | 0.1856 | 0.0769 | 0.0353
IA | PSNR | 16.72 | 14.37 | 22.14 | 21.22 | 23.74 | 24.16 | 26.03 | 25.63 | 29.90 | 32.89
IA | SSIM | 0.7747 | 0.5804 | 0.8554 | 0.7532 | 0.8405 | 0.8884 | 0.9197 | 0.8964 | 0.9106 | 0.9257
IA | MSE | 1.6099 | 2.9442 | 0.4847 | 0.6348 | 0.3161 | 0.3140 | 0.1940 | 0.1836 | 0.0740 | 0.0360
IB | PSNR | 16.44 | 14.53 | 21.95 | 21.10 | 23.29 | 23.41 | 25.75 | 25.59 | 29.78 | 32.76
IB | SSIM | 0.7661 | 0.5998 | 0.8448 | 0.7479 | 0.8260 | 0.8770 | 0.9109 | 0.8951 | 0.9060 | 0.9231
IB | MSE | 1.7255 | 2.8851 | 0.5305 | 0.6578 | 0.3587 | 0.3832 | 0.2074 | 0.1842 | 0.0768 | 0.0372
II | PSNR | 15.55 | 15.29 | 21.01 | 21.06 | 21.81 | 22.78 | 24.90 | 25.58 | 29.29 | 32.70
II | SSIM | 0.7384 | 0.6763 | 0.8216 | 0.7494 | 0.7978 | 0.8657 | 0.8992 | 0.8957 | 0.9015 | 0.9227
II | MSE | 2.1428 | 2.5699 | 0.7549 | 0.6544 | 0.5353 | 0.4471 | 0.2567 | 0.1843 | 0.0888 | 0.0380
III | PSNR | 13.67 | 14.92 | 18.2 | 19.43 | 19.84 | 20.13 | 22.64 | 25.46 | 27.44 | 31.91
III | SSIM | 0.6639 | 0.7035 | 0.776 | 0.7220 | 0.7578 | 0.8345 | 0.8681 | 0.8906 | 0.8869 | 0.9186
III | MSE | 3.3143 | 2.5748 | 1.4835 | 1.0180 | 0.9028 | 0.8556 | 0.4515 | 0.1908 | 0.1392 | 0.0461
Table 2. Quantitative comparison results of various UIE approaches on the Test-90 dataset in terms of PSNR (dB), SSIM, and MSE ( × 10 3 ). We present the optimal and sub-optimal results with values in bold and underlined, respectively.
Methods | PSNR | SSIM | MSE
UDCP | 11.51 | 0.5212 | 5.1332
IBLA | 15.81 | 0.6651 | 2.8412
Shallow-UWnet | 17.79 | 0.7403 | 1.6002
UResnet | 18.32 | 0.7175 | 1.1126
Chen et al. | 21.32 | 0.8260 | 0.6588
WaterNet | 20.88 | 0.8418 | 0.7840
Deep-WaveNet | 22.34 | 0.8656 | 0.7030
UGAN | 20.43 | 0.8255 | 0.6836
Ma et al. | 20.04 | 0.8305 | 0.8495
Ours | 24.18 | 0.8729 | 0.4054
Table 3. Quantitative comparison results of various UIE approaches on the Test-C60 dataset. We present the best, second-best, and third-best results with values in bold, underlined, and double underlined, respectively.
Methods | UICM | UISM | UIConM | UIQM | UCIQE
UDCP | 5.3511 | 3.8881 | 0.0472 | 1.4679 | 0.5364
IBLA | 5.8522 | 4.3957 | 0.1627 | 2.0448 | 0.5685
Shallow-UWnet | 2.0769 | 4.2078 | 0.2842 | 2.3172 | 0.4677
UResnet | 6.7992 | 6.4352 | 0.1976 | 2.7986 | 0.5974
Chen et al. | 4.5519 | 5.3269 | 0.2821 | 2.7099 | 0.5466
WaterNet | 4.1166 | 5.2974 | 0.2620 | 2.6172 | 0.5698
Deep-WaveNet | 4.2254 | 5.1885 | 0.2499 | 2.5450 | 0.5729
UGAN | 5.4232 | 6.0859 | 0.2591 | 2.8766 | 0.6037
Ma et al. | 3.8633 | 5.2574 | 0.2851 | 2.6809 | 0.5473
Ours | 5.1320 | 5.5205 | 0.2678 | 2.7326 | 0.5827
Table 4. Quantitative comparison results of the ablation study of the components of DBFNet in terms of PSNR (dB), SSIM, and MSE ( × 10 3 ) values. Bold values indicate the best results.
Metrics | w/o WDLB | w/o TCSLB | w/o DARDM | Full Model
PSNR | 21.39 | 23.40 | 22.89 | 24.18
SSIM | 0.8434 | 0.8573 | 0.8584 | 0.8729
MSE | 0.6466 | 0.4501 | 0.5006 | 0.4054
Table 5. Quantitative comparison results of ablation study of the dual-branch fusion methods in terms of PSNR (dB), SSIM, and MSE ( × 10 3 ) values. Bold values indicate the best results.
Metrics | Summation | Concatenate | DASFM
PSNR | 23.25 | 23.95 | 24.18
SSIM | 0.8688 | 0.8710 | 0.8729
MSE | 0.4446 | 0.3934 | 0.4054