Article

Underwater Image Enhancement Network Based on Dual Layers Regression

Huidi Jia, Yeqing Xiao, Qiang Wang, Xiai Chen, Zhi Han and Yandong Tang
1 State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang 110016, China
2 Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang 110169, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
4 Key Laboratory of Manufacturing Industrial Integrated, Shenyang University, Shenyang 110044, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2024, 13(1), 196; https://doi.org/10.3390/electronics13010196
Submission received: 5 December 2023 / Revised: 26 December 2023 / Accepted: 30 December 2023 / Published: 2 January 2024

Abstract

Due to the absorption and scattering of light in water, captured underwater images often suffer from degradation such as color cast, blur, and low contrast. This degradation in turn impairs the performance of underwater computer vision methods and tasks. To address these problems, we propose a multi-stage, gradually optimized deep network for underwater image enhancement, named DLRNet, based on dual-layer regression. Our network emphasizes important information by aggregating features of different depths in a channel attention module, and a dual-layer regression module regresses the ambient light and the scene light transmission for an underwater image. With the underwater imaging model, an enhanced image with normal color, higher clarity, and better contrast can then be obtained from a degraded input. Qualitative and quantitative experiments on several datasets validate our network and show that it outperforms some state-of-the-art approaches.

1. Introduction

Underwater computer vision systems play an irreplaceable role in the exploitation, protection, and utilization of marine resources and are important tools for both manned and unmanned underwater vehicles in perceiving and recognizing underwater environments [1,2,3]. Underwater optical imaging, the information source of underwater computer vision systems, usually suffers from quality degradation such as color distortion, low contrast, and blur as a result of the scattering and absorption of light in the water medium. This degradation in turn affects the performance of vision-based tasks in water, such as object detection, recognition, and underwater visual surveys. Underwater image restoration and enhancement [4,5,6,7] are therefore significant for comprehending the underwater environment and visually improving image quality in the underwater computer vision field.
The degradation factors can be summarized as selective light attenuation and light scattering in water. Selective attenuation gives underwater images a blue or green color cast; the main reason is that, within the visible spectrum, red light has the longest wavelength and is attenuated fastest in water. Light scattering causes the light reflected from a scene to be absorbed and scattered by the water or by suspended particles before reaching the camera, which results in image blur and low illumination.
Traditional methods for restoring and enhancing degraded underwater images fall into two categories: data-based methods and physical model-based methods. Classic data-based image processing methods are committed to improving visual quality; they include the Gray World algorithm [8], contrast-limited histogram equalization [9], and specialized color correction [10,11] for particular underwater settings. These methods enhance underwater images by correcting image contrast or color. Their drawback is that they do not consider the special imaging conditions in water and are not always effective for color restoration and deblurring. Based on a simplified physical model of underwater optical imaging [12], many methods for single underwater image restoration have been proposed [13,14,15]. These physical model-based approaches typically estimate the underwater ambient light and the scene light transmission map to obtain a clear image with normal color. Their effectiveness depends on the estimation accuracy of the underwater light transmission map, and they often require subsequent denoising and enhancement.
Recently, deep learning has shown outstanding performance in image enhancement [16], detection [17], image segmentation [18], and related tasks. Similarly, some deep learning-based methods for underwater image restoration and enhancement have been proposed and have achieved remarkable success compared with most non-learning methods. These deep learning methods can be classified into physical model-based networks [19,20,21] and purely data-driven networks [22,23,24,25], both of which require abundant ground-truth images as training data. Datasets therefore play a significant role in training. However, paired images are difficult to acquire because of the severe underwater conditions, so some researchers [24] have used synthetic methods to generate paired underwater images. Pairs of underwater images and reference images are used as inputs and ground truth (GT), respectively, to train the target network.
Considering the above deep learning-based methods, some of them are in fact purely data-driven even though they use depth information or transmission maps: they directly learn the mapping between inputs and outputs and do not consider the characteristics of underwater imaging. Furthermore, most of these methods only emphasize the loss between inputs and final outputs and do not constrain the outputs of each stage, although constraints on the intermediate outputs of the network are also significant for enhancement. Motivated by these problems, we combine a deep learning network with the physical imaging model, constrain the output of each stage of the network, and propose a multi-stage enhancement network named DLRNet. The ambient light and transmission map are obtained by a dual-layer regression module in DLRNet. As the network deepens and the number of stages increases, we constrain the outputs of each stage so that the intermediate outputs become progressively closer to the reference and each stage produces enhanced images with better visual and metric quality than the previous one. Moreover, we design the network to fuse the features of different stages, so the previous stages also help to deepen the understanding of features in the next stage and the distorted images can be enhanced stage by stage. As a result, this multi-stage, gradually optimized network improves image quality and solves the problem of color distortion.
The main contributions are summarized as follows:
  • We propose a multi-stage progressive optimization network model, named the Dual Layers Regression Network (DLRNet). In our model, the enhancement of an underwater image is decomposed into multiple controllable processes and optimized gradually, so that a degraded underwater image can be enhanced stage by stage.
  • We propose a fusion mechanism to integrate the features from every stage. Coupled with an attention module, shallow features are fused to continuously deepen the network’s understanding of features, which is beneficial for gradual optimization of the network.
  • Under the supervision of the previous outputs, the network continuously explores more effective enhanced features on the basis of ensuring the integrity of feature information.
  • The qualitative and quantitative evaluations on different datasets show that, compared with some state-of-the-art methods, our DLRNet can more effectively correct color distortion and enhance contrast.

2. Related Work

In general, underwater image restoration and enhancement methods consist of two categories: traditional restoration and enhancement methods and deep learning-based enhancement methods.

2.1. Traditional Approaches

Traditional underwater image approaches include traditional underwater image enhancement methods and physical model-based restoration methods.

2.1.1. Traditional Enhancement Approaches

In addition to physical model-based methods, some researchers have enhanced degraded images and improved visual quality by directly adjusting pixel values. Ancuti et al. [26] proposed a fusion algorithm: a color-corrected version and a contrast-enhanced version of the degraded image are first obtained as inputs, the corresponding Laplacian contrast, local contrast, saliency, and exposedness weights are then calculated, and the enhanced image is finally obtained by multi-scale fusion. Ancuti et al. [27] subsequently improved this strategy by modifying the white balance step: compensating the red channel with the green channel in a gray world algorithm removes the blue-green tone of underwater images and solves the red artifact problem introduced by the traditional gray world algorithm. Huang et al. [28] proposed RGHS (Relative Global Histogram Stretching) to enhance shallow-water images; however, it fails to enhance seriously distorted images from deeper water. Starosolski [29] presented new color space transformations for lossless image compression that are simpler than existing transformations. Wiseman [30] proposed a two-step method to reduce the amount of transmitted data by lowering the color resolution of the image and modifying the quantization table. Zhuang et al. [31] proposed a Bayesian retinex approach for single underwater image enhancement, which applies a maximum a posteriori formulation to the preprocessed image by imposing reflectance and illumination priors. Although non-physical model methods can achieve good results, these algorithms ignore the underwater imaging mechanism and are prone to over-enhancement, under-enhancement, and even artifacts.
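To make these pixel-level corrections concrete, the following is a small NumPy sketch of a gray world correction and of the red channel compensation used by Ancuti et al. [27]; it is illustrative only, and the strength factor `alpha` is an assumed value.

```python
import numpy as np

def gray_world(img):
    """Gray World correction [8]: scale each channel so its mean matches the global mean."""
    means = img.reshape(-1, 3).mean(axis=0)            # per-channel means (R, G, B)
    return np.clip(img * (means.mean() / means), 0.0, 1.0)

def compensate_red(img, alpha=1.0):
    """Red channel compensation in the spirit of Ancuti et al. [27]:
    I_rc(x) = I_r(x) + alpha * (mean(I_g) - mean(I_r)) * (1 - I_r(x)) * I_g(x)."""
    r, g = img[..., 0], img[..., 1]
    out = img.copy()
    out[..., 0] = np.clip(r + alpha * (g.mean() - r.mean()) * (1.0 - r) * g, 0.0, 1.0)
    return out

# Toy usage on a random image in [0, 1].
img = np.random.rand(64, 64, 3)
balanced = gray_world(compensate_red(img))
```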

2.1.2. Traditional Restoration Approaches

Traditional physical model-based methods estimate the scene transmission map and ambient light through underwater imaging models to recover degraded images, and researchers have proposed various estimation methods in recent years. He et al. [32] proposed the dark channel prior (DCP) algorithm to remove haze. Because of the similarities between underwater image enhancement and image dehazing, many researchers apply the DCP algorithm to underwater image restoration. However, absorption and scattering in the underwater environment make the DCP algorithm not directly applicable to underwater images, so many scholars use improved DCP algorithms instead. Since red light disappears fastest in underwater scenes, Galdran et al. [13] proposed the red channel prior (RDCP) algorithm, which substitutes the inverse of the red channel for the dark channel used in the DCP algorithm. Peng et al. [33] developed an improved DCP method that uses the image blurriness and the difference in light absorption to estimate the underwater ambient light and transmission. Akkaynak et al. [34,35] derived a more complex underwater imaging model after extensive simulations and experiments, which introduces more parameters and recovers underwater images by combining prior information. In underwater image restoration tasks, the accurate parameter estimation of a complex underwater imaging model remains a great challenge for current physical model-based methods. For instance, the prior used by Peng et al. [33] does not always work when handling clear images.
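For illustration, the dark channel and red channel priors mentioned above can be sketched in a few lines; the 15-pixel patch size is an assumed, commonly used value.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(img, patch=15):
    """Dark channel prior [32]: per-pixel minimum over the RGB channels,
    followed by a local minimum filter over a patch."""
    return minimum_filter(img.min(axis=2), size=patch)

def red_channel(img, patch=15):
    """Red channel prior [13]: the red channel is replaced by its inverse (1 - R)
    before taking the per-pixel and local minima."""
    modified = np.stack([1.0 - img[..., 0], img[..., 1], img[..., 2]], axis=2)
    return minimum_filter(modified.min(axis=2), size=patch)

img = np.random.rand(128, 128, 3)    # toy underwater image in [0, 1]
dc, rc = dark_channel(img), red_channel(img)
```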

2.2. CNN-Based Approaches

In recent years, underwater image enhancement approaches based on deep learning have been shown to perform well. Islam et al. [24] translated images from domain X to domain Y using CycleGAN (Cycle Generative Adversarial Network) to generate degraded images corresponding to clear images, where domain X represents underwater images without degradation and domain Y represents distorted images. By distorting the images in X, they generated a new dataset of paired underwater images, named the EUVP (Enhancement of Underwater Visual Perception) dataset. Anwar et al. [23] designed the UWCNN network, which has an end-to-end structure and uses a data-driven training mechanism to directly reconstruct clear underwater images; it preserves the original structural and texture information by jointly optimizing the mean squared error and structural similarity losses. Li et al. [36] made the UWCNN (Underwater Convolutional Neural Network) lightweight, which allows it to be more easily applied to underwater video enhancement. Yang et al. [37] developed a lightweight adaptive feature fusion network, named LAFFNet, which reduces the number of parameters by approximately 94% to save memory and model parameter costs. Wang et al. [38] constructed UIECˆ2-Net (Underwater Image Enhancement Convolution Neural Network using 2 Color Space), which effectively integrates the RGB (Red, Green, Blue) and HSV (Hue, Saturation, Value) color spaces. However, such data-driven end-to-end methods easily introduce artifacts due to the lack of physical model constraints. Since depth information or the transmission map plays a significant role in restoration and enhancement, some methods utilize this information to improve performance. For example, Li et al. [19] generated paired images with WaterGAN (Water Generative Adversarial Network) to realize underwater image enhancement. WaterGAN first takes clear images and corresponding depth maps as input and generates degraded underwater images as output, which are then sent to a two-stage fully convolutional restoration network: the first stage learns a relative depth map from an underwater image, and the second stage takes the degraded image and the depth map to obtain an enhanced image. Li et al. [21] presented a deep learning-based network called Ucolor. Ucolor first gathers features from three color spaces into a unified structure to enrich the feature representation, then integrates and highlights the distinct features with an attention module in an encoder network; a transmission-guided decoder network, which also takes the reverse medium transmission (RMT) map as input, is employed to enhance the response to degraded regions. Although these methods take the depth map into account, they are still purely data-driven enhancement methods for underwater images and do not consider the physical model that reflects underwater imaging characteristics. Thus, we propose a multi-stage strategy that estimates the ambient light and scene light transmission map of the underwater imaging model; the parameters are learned by training on abundant paired datasets, and reconstructed underwater images with high quality are then acquired. Compared with other state-of-the-art approaches, our method achieves the best performance.

3. Proposed Method

3.1. Underwater Optical Imaging Model

In the underwater environment, due to the selective absorption of light and the scattering caused by suspended particles, captured underwater images often suffer from degradation such as blur and color distortion. According to the characteristics of underwater imaging, the Jaffe-McGlamery underwater optical imaging model [39,40] was proposed. In this model, the underwater image captured by the camera can be expressed as a linear combination of a direct component, a forward scattering component, and a backscattering component. As shown in Figure 1, the direct component is the reflected light that directly enters the camera, the forward scattering component is the reflected light from the surface of an object that is scattered by suspended particles in the water before entering the camera, and the backscattering component is the natural light that enters the water, is scattered by suspended particles, and then reaches the camera. The forward scattering component can usually be ignored when the distance between the target object and the camera is short. Therefore, the simplified underwater imaging model [12] can be expressed as
$$I(x) = J(x)\,t(x) + A\,(1 - t(x)) \tag{1}$$
where $x$ denotes a pixel in the image, $I(x)$ denotes the degraded image captured by the camera, $J(x)$ denotes the clear, normal image, $t(x)$ denotes the scene transmission, $A$ denotes the underwater ambient light, $J(x)t(x)$ denotes the direct component, and $A(1 - t(x))$ denotes the backscattering component.
In this paper, the scene light transmission and the underwater ambient light are estimated from features of different depths through network training and are gradually optimized to obtain clearer enhanced images.
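The following is a minimal NumPy sketch of Equation (1) and its inversion; it is only illustrative, since in DLRNet $A$ and $t(x)$ are regressed by the network, whereas the values below are toy assumptions.

```python
import numpy as np

def degrade(J, t, A):
    """Simplified underwater imaging model, Equation (1): I = J * t + A * (1 - t)."""
    return J * t + A * (1.0 - t)

def restore(I, t, A, t_min=0.1):
    """Invert Equation (1): J = (I - A * (1 - t)) / t, clamping t to avoid division by zero."""
    t = np.clip(t, t_min, 1.0)
    return (I - A * (1.0 - t)) / t

J = np.random.rand(4, 4, 3)       # toy clear scene radiance in [0, 1]
t = np.full((4, 4, 1), 0.6)       # toy per-pixel transmission map
A = np.array([0.10, 0.40, 0.60])  # toy bluish ambient light (R, G, B)
I = degrade(J, t, A)
assert np.allclose(restore(I, t, A), J)
```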

3.2. Architecture

Figure 2 depicts the details of our framework, which includes two modules: the Feature Extraction Module and the Dual Layers Regression Module. Features extracted from the degraded image by the Feature Extraction Module are fed into the Dual Layers Regression Module to estimate the scene light transmission map and the underwater ambient light, from which the underwater image enhancement results are obtained. In the Feature Extraction Module, we employ a three-layer U-Net [41] structure: the degraded image goes through two 2× downsamplings, implemented by convolutions with a stride of 2, and is then forwarded to two deconvolution operations to transform the features from low resolution back to high resolution. In the Dual Layers Regression Module, a dual-layer structure is adopted to estimate the two parameters, the ambient light and the transmission map. Here, we employ dilated convolutions to achieve a comprehensive understanding of the features. The extracted features first pass through two dilated convolutions with a dilation rate of 1 and are then fed into both regression paths simultaneously. After an Attention Module (AM), the underwater ambient light $A_0$ and the scene light transmission map $t_0$ are generated, so that a preliminary enhancement result $J_0$ is obtained using the underwater imaging model. Since $J_0$ may still exhibit color cast and blur, we further forward the features to a stage similar to the previous one; in this stage, we increase the dilation rate of the convolutions and integrate the features of the first stage and the current stage before forwarding them to the AM, which yields a more effective enhanced image. Finally, we design DLRNet with three stages. This multi-stage network realizes progressive optimization: it not only retains more of the original information but also obtains rich feature representations, and the previous stages impose a supervisory constraint on the current stage. The qualitative and quantitative evaluations on different datasets show that our DLRNet can correct color distortion and enhance low contrast more effectively than other methods.
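As an illustration of the Feature Extraction Module, the following is a hedged TensorFlow/Keras sketch of a three-level U-Net-style extractor with two stride-2 downsamplings and two deconvolutions; the channel widths, kernel sizes, activations, and skip connections are assumptions, since the paper does not list them.

```python
import tensorflow as tf
from tensorflow.keras import layers

def feature_extraction_module(x, base_ch=32):
    """Three-level U-Net-style extractor: two 2x downsamplings via stride-2 convolutions,
    then two deconvolutions back to the input resolution (channel widths are illustrative)."""
    e1 = layers.Conv2D(base_ch, 3, padding="same", activation="relu")(x)
    e2 = layers.Conv2D(base_ch * 2, 3, strides=2, padding="same", activation="relu")(e1)  # 2x down
    e3 = layers.Conv2D(base_ch * 4, 3, strides=2, padding="same", activation="relu")(e2)  # 4x down
    d2 = layers.Conv2DTranspose(base_ch * 2, 3, strides=2, padding="same", activation="relu")(e3)
    d2 = layers.Concatenate()([d2, e2])                                                   # skip connection
    d1 = layers.Conv2DTranspose(base_ch, 3, strides=2, padding="same", activation="relu")(d2)
    d1 = layers.Concatenate()([d1, e1])                                                   # skip connection
    return layers.Conv2D(base_ch, 3, padding="same", activation="relu")(d1)

inp = layers.Input(shape=(256, 256, 3))
extractor = tf.keras.Model(inp, feature_extraction_module(inp))
# The extracted features are then passed to the Dual Layers Regression Module.
```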
In the following, the Dual Layers Regression Module is introduced in detail.

3.2.1. Dilated Convolution

In the Dual Layers Regression Module, features are encoded by dilated convolutions [42] with increasing dilation rates. Dilated convolution not only enlarges the receptive field but also captures more contextual information, which provides richer features and deepens the network's understanding of them. There are six dilated convolution layers in the Dual Layers Regression Module, with dilation rates set to 1, 1, 2, 4, 8, and 16. Since color cast and blur usually cover the whole image, we expect to capture comprehensive feature information, and the increasing dilation rates help the network understand the overall characteristics at different levels. Furthermore, experiments show that the enhancement effect decreases when the dilation rates in the Dual Layers Regression Module are increased further.
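A minimal Keras sketch of this dilated encoding is shown below; the channel width is an assumption, and the grouping of the six rates into three stages of two convolutions each follows the stage descriptions in Sections 3.2 and 4.4.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dilated_encoder(x, ch=32, rates=(1, 1, 2, 4, 8, 16)):
    """Six dilated convolutions with the dilation rates used in the Dual Layers Regression
    Module; consecutive pairs of rates (1,1), (2,4), (8,16) serve the three stages."""
    intermediates = []
    for r in rates:
        x = layers.Conv2D(ch, 3, padding="same", dilation_rate=r, activation="relu")(x)
        intermediates.append(x)   # these features are later aggregated by the attention modules
    return x, intermediates

feats = tf.random.normal([1, 256, 256, 32])   # illustrative extracted features
out, intermediates = dilated_encoder(feats)
```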

3.2.2. Features Integration

In this part, shallow and deep features of the dual-layer structure are aggregated and fed into the AM so that richer combined information can be learned across all levels. As Figure 3 shows, each AM takes as input the aggregated features of different dilation rates in the current stage together with the attention module features of previous stages, which concentrates shallow and deep features and deepens the understanding of them.

3.2.3. Attention Mechanism

Features incorporating different levels should make significant contributions to the regression of the underwater ambient light and the scene light transmission map. Thus, each AM adopts the Squeeze-and-Excitation block (SE-block) [43], which strengthens the learning of convolutional features by explicitly modeling the interdependencies between channels and increases the network's sensitivity to feature information. In this way, global information can be used to selectively emphasize informative features and suppress less useful ones.
The details of the AM are shown in Figure 3. The input of the AM is the aggregated feature $F = \mathrm{Concat}(f_1, f_2, \ldots, f_C) \in \mathbb{R}^{C \times H \times W}$, where $C$ is the number of feature channels, $\mathrm{Concat}$ is the aggregation operation, and $H \times W$ is the size of the input feature maps. The SE-block consists of two steps: Squeeze and Excitation. The goal of Squeeze is to obtain the global information of the feature map in each channel; this is achieved by global average pooling, which yields a channel descriptor $z \in \mathbb{R}^{C}$ of the global spatial information whose $c$-th element is given by Equation (2) below. The role of Excitation is to fully capture channel-wise dependencies, and it must meet two criteria: first, it should be flexible, and second, it should be able to learn a non-mutually-exclusive relationship. To meet these criteria, we adopt a gating mechanism that applies a ReLU activation followed by a Sigmoid activation and, to limit model complexity and improve generalization, we parameterize the gating mechanism with a bottleneck formed by two fully connected layers around the non-linearity. The gating mechanism computes the gating unit $s \in \mathbb{R}^{C}$, that is, a new set of weights for each channel. As shown in Equation (3), the Excitation process consists of two fully connected layers, where $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ are the weight matrices of the two layers, $r$ is the reduction ratio (the hidden layer has $C/r$ nodes), $\delta(\cdot)$ is the ReLU activation function, and $\sigma(\cdot)$ is the Sigmoid activation function.
$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} f_c(i, j), \quad c \in [1, C] \tag{2}$$
$$s = \sigma\left(W_2\,\delta(W_1 z)\right) \tag{3}$$
Then, $F$ is reassigned new weights. In order to preserve the properties of the original features, we also map $F$ to the output features $U = (u_1, u_2, \ldots, u_C) \in \mathbb{R}^{C \times H \times W}$ through a residual connection, as depicted in Equation (4). The feature representation of each channel is then given by Equation (5). In our DLRNet, the features produced by the AM are used to estimate the scene transmission map and the ambient light.
$$U = F \oplus (F \otimes s) \tag{4}$$
$$u_c = f_c + s_c \cdot f_c, \quad c \in [1, C] \tag{5}$$
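A hedged Keras sketch of the attention module defined by Equations (2)-(5) is given below; the reduction ratio $r = 16$ and the toy feature sizes are assumptions not stated in the text.

```python
import tensorflow as tf
from tensorflow.keras import layers

class AttentionModule(tf.keras.layers.Layer):
    """SE-style channel attention with a residual reweighting, following Equations (2)-(5)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = layers.Dense(max(channels // reduction, 1), activation="relu")  # W1 with ReLU
        self.fc2 = layers.Dense(channels, activation="sigmoid")                    # W2 with Sigmoid

    def call(self, F):
        z = tf.reduce_mean(F, axis=[1, 2])          # Eq. (2): squeeze by global average pooling
        s = self.fc2(self.fc1(z))                   # Eq. (3): channel gating weights s
        s = s[:, tf.newaxis, tf.newaxis, :]         # broadcast s over the spatial dimensions
        return F + F * s                            # Eqs. (4)-(5): u_c = f_c + s_c * f_c

# Example: aggregate features from two dilation levels and attend over their channels.
f1 = tf.random.normal([1, 64, 64, 32])
f2 = tf.random.normal([1, 64, 64, 32])
U = AttentionModule(channels=64)(tf.concat([f1, f2], axis=-1))
```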

3.2.4. Parameters Estimation and Output

As described above, the Dual Layers Regression Module adopts a dual-layer structure to estimate the transmission map and the ambient light, respectively. As shown in Figure 2, the upper layer estimates the ambient light and the lower layer estimates the scene light transmission map. Since we adopt a three-stage module, three pairs of parameters $(A_m, t_m)$, $m = 0, 1, 2$, are regressed. Each pair $(A_m, t_m)$ is then put into Equation (1) to acquire the enhanced image $J_m$, $m = 0, 1, 2$. As expected, $J_2$ is the best image compared with $J_0$ and $J_1$. With the supervision and constraint of the previous stages, we can output a high-quality enhancement without color cast. Moreover, to verify that the three-stage regression module is a reasonable choice, we also use a four-stage module to process the degraded images; further analysis is given in the ablation study.

3.3. Loss Function

In order to achieve good visual effects and quantitative results, we use a linear combination of the $\ell_2$ loss $L_{\ell_2}$ and the multi-layer perceptual loss $L_{per}$ [44] to train DLRNet.
In training, combined with the stage-by-stage optimization strategy, the $\ell_2$ loss and the multi-layer perceptual loss of each stage, $L_{\ell_2\_m}$ and $L_{per\_m}$, are calculated. To constrain the results of each stage, the loss of every stage is added to the training loss function:
$$L_{loss} = \lambda L_{\ell_2} + L_{per} = \lambda \sum_{m=0}^{2} L_{\ell_2\_m} + \sum_{m=0}^{2} L_{per\_m} \tag{6}$$
where $\lambda$ is set to 0.01 to balance $L_{\ell_2}$ and $L_{per}$ so that the two losses are on the same order of magnitude.
Specifically, the $\ell_2$ loss describes the difference between the enhanced image $J_m$ and the underwater clear image (ground truth) $J$:
$$L_{\ell_2\_m} = \sum_{i=1}^{H} \sum_{j=1}^{W} \left( J_m(i, j) - J(i, j) \right)^2 \tag{7}$$
The multi-layer perceptual loss $L_{per}$ is computed with the VGG-19 network [45] pretrained on the ImageNet dataset [46]. We take the feature difference between the generated image $J_m$ and the underwater clear image $J$ in the $\mathrm{CONV}_k$ $(k = 0, 1, \ldots, 5)$ layers of the VGG-19 network as the perceptual loss, measured by the $\ell_1$ distance with a weight of $\lambda_l$:
$$L_{per\_m} = \sum_{k=0}^{5} \lambda_l \left\lVert \Phi_k(J_m) - \Phi_k(J) \right\rVert_1 = \sum_{k=0}^{5} \lambda_l \sum_{i=1}^{H} \sum_{j=1}^{W} \left| \Phi_k(J_m)(i, j) - \Phi_k(J)(i, j) \right| \tag{8}$$
where $\Phi_k$ denotes the $\mathrm{CONV}_k$ convolutional layer of the VGG-19 network pretrained on ImageNet, and $k = 0$ corresponds to the image itself rather than a convolutional feature.
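Below is a hedged TensorFlow sketch of this stage-wise loss. Only $\lambda = 0.01$ and the summation over three stages come from the text; the specific VGG-19 layer names, the sum reduction, and the perceptual weight $\lambda_l = 1$ are assumptions.

```python
import tensorflow as tf

# Assumed choice of one convolutional output per VGG-19 block for CONV_1..CONV_5.
_VGG_LAYERS = ["block1_conv2", "block2_conv2", "block3_conv4", "block4_conv4", "block5_conv4"]
_vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
_features = tf.keras.Model(_vgg.input, [_vgg.get_layer(n).output for n in _VGG_LAYERS])
_features.trainable = False

def stage_loss(J_m, J, lam=0.01, lam_l=1.0):
    """Per-stage loss: lam * l2 loss (Eq. (7)) + multi-layer perceptual l1 loss (Eq. (8))."""
    l2 = tf.reduce_sum(tf.square(J_m - J))
    per = tf.reduce_sum(tf.abs(J_m - J))                     # k = 0 term: the images themselves
    pre_m = tf.keras.applications.vgg19.preprocess_input(J_m * 255.0)
    pre_gt = tf.keras.applications.vgg19.preprocess_input(J * 255.0)
    for fm, fgt in zip(_features(pre_m), _features(pre_gt)):
        per += tf.reduce_sum(tf.abs(fm - fgt))               # k = 1..5 terms: VGG-19 features
    return lam * l2 + lam_l * per

def total_loss(stage_outputs, J):
    """Equation (6): sum the stage losses so every intermediate output is constrained."""
    return tf.add_n([stage_loss(J_m, J) for J_m in stage_outputs])
```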

4. Experiments and Discussion

In order to verify the effectiveness of our DLRNet, we compare it with other methods, including traditional approaches and deep learning-based methods, and analyze the results both subjectively and objectively. In this part, we first describe the datasets and experimental settings, then present the qualitative and quantitative evaluations, and finally report the ablation experiments that further prove the effectiveness of DLRNet.

4.1. Datasets and Experimental Settings

We select two public datasets, UIEBD (Underwater Image Enhancement Benchmark Dataset) [25] and EUVP (Enhancement of Underwater Visual Perception) [24], to train our network. UIEBD was proposed by Li et al. in 2019. It is a real-world underwater image dataset covering various resolutions and diverse scenes; it contains 890 pairs of degraded underwater images with corresponding high-quality reference images and 60 additional poor-quality underwater images. In 2020, Islam et al. proposed the EUVP dataset, which includes 12,000 paired and 8000 unpaired underwater images. It is divided into three groups according to the degree of degradation and the difference in scenes and sources: EUVP-underwater dark, EUVP-underwater imagenet, and EUVP-underwater scenes.
Our DLRNet is implemented in the TensorFlow framework, and training and testing are completed on an NVIDIA TITAN Xp GPU. We use the Adam optimizer with a learning rate of 0.00005 for model optimization. During the stage-by-stage training, the number of epochs for each stage is set to 150 and the batch size is 1. The resolution of the images in all test datasets is 256 × 256, and the processing time is about 0.18 s per image.
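For illustration, a minimal training-step sketch under these settings follows; `dlrnet` is a placeholder for the full three-stage model, and the simple stage-wise mean-squared-error stands in for the combined loss of Section 3.3.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)   # learning rate stated above

def train_step(dlrnet, I, J):
    """One optimization step on a paired sample (degraded I, reference J), batch size 1."""
    with tf.GradientTape() as tape:
        stage_outputs = dlrnet(I, training=True)            # [J_0, J_1, J_2], one per stage
        # Placeholder loss: every stage output is constrained against the reference.
        loss = tf.add_n([tf.reduce_mean(tf.square(o - J)) for o in stage_outputs])
    grads = tape.gradient(loss, dlrnet.trainable_variables)
    optimizer.apply_gradients(zip(grads, dlrnet.trainable_variables))
    return loss
```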

4.2. Qualitative Evaluation

To evaluate the effectiveness of DLRNet, we compare it with the traditional underwater image restoration and enhancement algorithms, IBLA (Image Blurriness and Light Absorption) algorithm [33], RGHS algorithm [28], ULAP (Underwater Light Attenuation Prior) algorithm [47], and MLLE (Minimal Color Loss and Locally Adaptive Contrast Enhancement) [48], as well as the deep learning-based enhancement algorithms, FUnIE_GAN (Fully Convolutional Conditional Generative Adversarial Network-based Model for Underwater Image Enhancement) algorithm [24], UGAN (Underwater Generative Adversarial Network) algorithm [22], UResNet (Underwater Residual Network) algorithm [49], WaterNet algorithm [25], and Ucolor [21], for comparative analysis.
We randomly select several samples from each dataset to verify the color restoration and contrast enhancement effect of underwater images. Figure 4, Figure 5, Figure 6 and Figure 7 show the results of these algorithms on UIEBD, EUVP-underwater scenes, EUVP-underwater imagenet, and EUVP-underwater dark, respectively.
In UIEBD, traditional methods tend to produce artifacts when processing underwater images. For example, in Figure 4, when the IBLA and ULAP algorithms process Image1, the output images present blue and yellow tones. In addition, although the other deep learning-based methods can remove the blue-green cast of underwater images, they do not achieve the expected effect in terms of clarity; as shown in Figure 4, the outputs of WaterNet are still blurred. In contrast, DLRNet not only solves the blue-green cast but also effectively removes fog-like blur, enhancing image contrast and clarity. Similarly, on the EUVP-underwater scenes and EUVP-underwater imagenet datasets, DLRNet still outputs satisfactory enhanced images.
As the results in all figures show, the traditional methods are not as effective as the deep learning-based methods. The results of the RGHS algorithm on each dataset show that it not only fails to remove the color cast but also leads to oversaturation; for example, when processing Image2 and Image3 of the EUVP-underwater scenes dataset in Figure 5, the light-colored area on the turtle's back and the red area on the diver's head are over-enhanced. It is worth noting that the results of MLLE remove the blue-green bias, but their colors differ from those of the reference images, which explains the low PSNR and SSIM values of MLLE; nevertheless, its results retain rich image details.
On the EUVP-underwater imagenet dataset, most compared methods cannot solve the color cast problem. In Image1 and Image4, WaterNet and Ucolor do not recover accurate colors compared with the reference images, and in cases such as Image3 and Image4 in Figure 6, the results of IBLA, RGHS, ULAP, UGAN, and WaterNet still retain a green color cast.
The images in the EUVP-underwater dark dataset have a high degree of degradation, which is extremely challenging for all methods. In Figure 7, the IBLA, ULAP, FUnIE_GAN, and UResNet algorithms all introduce different degrees of color cast. Taking Image3 as an example, the results of the IBLA and ULAP algorithms show a purple tone in the water, and the output of the FUnIE_GAN algorithm shows a yellow tone; even the UResNet algorithm cannot completely remove the blue color cast from the water. Furthermore, the WaterNet algorithm produces a green water color cast when processing Image4. The outputs of DLRNet are closer to the real clear images than those of the above algorithms: DLRNet effectively removes the color casts without excessive enhancement or saturation while also improving image clarity.

4.3. Quantitative Evaluation

In this section, we select two full reference indicators, Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM), to analyze the performance of our DLRNet and other algorithms objectively.
PSNR uses the Mean Squared Error (MSE) between the generated image and the real clear image to evaluate the quality of the enhanced image. SSIM mainly considers three key characteristics of an image: luminance, contrast, and structure. The larger the PSNR and SSIM values, the better the enhancement result.
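Both metrics can be computed with TensorFlow's built-in image operations; the short sketch below assumes images scaled to [0, 1] with a leading batch dimension.

```python
import tensorflow as tf

def evaluate_pair(enhanced, reference, max_val=1.0):
    """Full-reference metrics used in Table 1: PSNR (based on MSE) and SSIM; higher is better."""
    return tf.image.psnr(enhanced, reference, max_val=max_val), \
           tf.image.ssim(enhanced, reference, max_val=max_val)

# Toy example on a random 256 x 256 image pair.
a = tf.random.uniform([1, 256, 256, 3])
b = tf.random.uniform([1, 256, 256, 3])
psnr, ssim = evaluate_pair(a, b)
```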
In Table 1, we can see that the PSNR and SSIM values of DLRNet are consistently higher than those of the other methods on all datasets, so the quantitative evaluation is consistent with the qualitative one. Visually, the results of MLLE appear better than those of several other methods, yet the quantitative evaluation ranks MLLE very low. This is because PSNR and SSIM are full-reference metrics that judge image quality by the similarity between the enhanced image and the reference image; although the visual effect of MLLE is good, its results are not close to the reference images, so its quantitative ranking is low. The PSNR and SSIM values of the traditional methods are lower than those of the deep learning approaches, which agrees with the discussion in the previous section. Moreover, the PSNR and SSIM values of our DLRNet are the best among all deep learning-based methods. In summary, through this multi-stage gradual optimization, DLRNet not only outputs clear enhanced underwater images but also produces images that are closer to the non-degraded reference images in terms of clarity, structure, and contrast.

4.4. Ablation Experiments

In order to verify the effectiveness of the proposed multi-stage strategy, we experimentally analyze the enhancement effect of different numbers of stages. Firstly, we consider a single-stage network without supervision from later outputs: the parameters $A_0$ and $t_0$ regressed after the attention mechanism are fed into the underwater imaging model to obtain the first-stage enhancement results. We then perform a contrast experiment with two stages: in the two-stage network, the features obtained in the first stage are encoded by two dilated convolutions with increasing dilation rates and, similar to the previous stage, forwarded to an attention module to regress the parameters $A_1$ and $t_1$, from which $J_1$ is acquired under the supervision and constraint of the previous stage. Finally, we add a fourth stage to examine whether the three-stage strategy is optimal; in this fourth stage, the dilation rates of the convolutions are 32 and 64.
The PSNR values of each stage are presented in Table 2, from which we can conclude that $J_1$ and $J_2$ both achieve better results than the previous outputs, and the enhanced images obtained in these stages have more details and higher clarity, which means that the concept of multi-stage constraints is valid for underwater enhancement. Compared with supervision that only considers the loss between the input and the final output, this strategy of constraining each stage of the network obtains better enhanced images stage by stage as the network deepens, so that the intermediate outputs become progressively closer to the reference and the final enhancement is improved. However, the PSNR of $J_3$ decreases on most datasets, which indicates that continuously adding stages does not yield better enhancement. In contrast, our three-stage regression module achieves the most satisfactory results.

5. Conclusions

We propose a deep learning-based method combined with a physical model to enhance underwater images. Specifically, this paper presents a multi-stage optimization model that emphasizes important information by aggregating features of different depths in a channel attention module. The dual-layer regression module regresses the ambient light and the scene light transmission map and feeds them into the underwater imaging model to obtain an enhanced underwater image. The channel attention module is progressively optimized, so as the network deepens, the dual-layer regression module obtains increasingly important and accurate features, and the enhanced images it produces gradually approach the ground truth. In addition, as the network deepens, the results of shallow stages impose a supervisory constraint on the deeper stages, leading to clearer output images without color casts. The experimental results show that our DLRNet, which combines a physical model with deep learning, outperforms both physical model-based and purely data-driven deep learning approaches, which also proves that our strategy of constraining the output of each stage is effective. On different datasets, the PSNR and SSIM values of DLRNet are better than those of the other algorithms, and clear underwater images can be effectively restored. In future work, we will consider making the network more lightweight for underwater image enhancement and cascading the underwater enhancement network with other networks, such as those for object detection, recognition, and classification, to improve the accuracy of various underwater tasks.

Author Contributions

Conceptualization, Q.W., H.J. and Y.X.; methodology, H.J. and Y.X.; software, H.J. and Y.X.; validation, H.J. and Q.W.; formal analysis, X.C.; investigation, X.C.; resources, Y.T. and Z.H.; data curation, H.J.; writing—original draft preparation, H.J. and Y.X.; writing—review and editing, Y.T., Q.W. and H.J.; visualization, H.J.; supervision, X.C.; project administration, X.C.; funding acquisition, Y.T., Q.W. and Z.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China under Grant 61991413 and Grant 62073205, the Natural Science Foundation of Liaoning Province of China under Grant 2021-KF-12-07, the Youth Innovation Promotion Association of the Chinese Academy of Sciences under Grant 2022196 and Grant Y202051, and the National Science Foundation of Liaoning Province under Grant 2021-BS-023.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, Z.; Zhang, G.; Luan, K.; Yi, C.; Li, M. Image-Fused-Guided Underwater Object Detection Model Based on Improved YOLOv7. Electronics 2023, 12, 4064.
  2. Chen, X.; Yuan, M.; Fan, C.; Chen, X.; Li, Y.; Wang, H. Research on an Underwater Object Detection Network Based on Dual-Branch Feature Extraction. Electronics 2023, 12, 3413.
  3. Jiang, Z.; Li, Z.; Yang, S.; Fan, X.; Liu, R. Target Oriented Perceptual Adversarial Fusion Network for Underwater Image Enhancement. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 6584–6598.
  4. Zhang, W.; Li, X.; Xu, S.; Li, X.; Yang, Y.; Xu, D.; Liu, T.; Hu, H. Underwater Image Restoration via Adaptive Color Correction and Contrast Enhancement Fusion. Remote Sens. 2023, 15, 4699.
  5. Zhou, J.; Liu, Q.; Jiang, Q.; Ren, W.; Lam, K.; Zhang, W. Underwater Camera: Improving Visual Perception via Adaptive Dark Pixel Prior and Color Correction. Int. J. Comput. Vis. 2023, 1, 1–19.
  6. Zhou, J.; Li, B.; Zhang, D.; Yuan, J.; Zhang, W.; Cai, Z.; Shi, J. UGIF-Net: An Efficient Fully Guided Information Flow Network for Underwater Image Enhancement. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–17.
  7. Zhou, J.; Wang, Y.; Li, C.; Zhang, W. Multicolor Light Attenuation Modeling for Underwater Image Restoration. IEEE J. Ocean. Eng. 2023, 48, 1322–1337.
  8. Buchsbaum, G. A Spatial Processor Model for Object Colour Perception. J. Frankl. Inst. 1980, 310, 1–26.
  9. Pizer, S.; Johnston, R.; Ericksen, J.; Yankaskas, B.; Muller, K. Contrast-Limited Adaptive Histogram Equalization: Speed and Effectiveness. In Proceedings of the First Conference on Visualization in Biomedical Computing, Atlanta, GA, USA, 22–25 May 1990; pp. 337–345.
  10. Li, C.; Guo, J. Underwater Image Enhancement by Dehazing and Color Correction. J. Electron. Imaging 2015, 24, 33023.
  11. Henke, B.; Vahl, M.; Zhou, Z. Removing Color Cast of Underwater Images through Non-Constant Color Constancy Hypothesis. In Proceedings of the 2013 8th International Symposium on Image and Signal Processing and Analysis (ISPA), Trieste, Italy, 4–6 September 2013; pp. 20–24.
  12. Trucco, E.; Olmos-Antillon, A.T. Self-Tuning Underwater Image Restoration. IEEE J. Ocean. Eng. 2006, 31, 511–519.
  13. Galdran, A.; Pardo, D.; Picón, A.; Alvarez-Gila, A. Automatic Red-Channel Underwater Image Restoration. J. Vis. Commun. Image Represent. 2015, 26, 132–145.
  14. Emberton, S.; Chittka, L.; Cavallaro, A. Hierarchical Rank-based Veiling Light Estimation for Underwater Dehazing. In Proceedings of the British Machine Vision Conference (BMVC), Swansea, UK, 7–10 September 2015; pp. 125.1–125.12.
  15. Berman, D.; Levy, D.; Avidan, S.; Treibitz, T. Underwater Single Image Color Restoration using Haze-Lines and a New Quantitative Dataset. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 2822–2837.
  16. Zhao, R.; Han, Y.; Zhao, J. End-to-End Retinex-Based Illumination Attention Low-Light Enhancement Network for Autonomous Driving at Night. Comput. Intell. Neurosci. 2022, 2022, 4942420.
  17. Liu, P.; Feng, J.; Sang, J.; Kim, Y. Fusion Attention Mechanism for Foreground Detection Based on Multiscale U-Net Architecture. Comput. Intell. Neurosci. 2022, 2022, 7432615.
  18. Wu, J.; Zheng, X.; Liu, D.; Ai, L.; Tang, P.; Wang, B.; Wang, Y. WBC Image Segmentation Based on Residual Networks and Attentional Mechanisms. Comput. Intell. Neurosci. 2022, 2022, 1610658.
  19. Li, J.; Skinner, K.A.; Eustice, R.M.; Johnson-Roberson, M. WaterGAN: Unsupervised Generative Network to Enable Real-Time Color Correction of Monocular Underwater Images. IEEE Robot. Autom. Lett. 2017, 3, 387–394.
  20. Liu, X.; Gao, Z.; Chen, B.M. IPMGAN: Integrating Physical Model and Generative Adversarial Network for Underwater Image Enhancement. Neurocomputing 2021, 453, 538–551.
  21. Li, C.; Anwar, S.; Hou, J.; Cong, R.; Guo, C.; Ren, W. Underwater Image Enhancement via Medium Transmission-Guided Multi-Color Space Embedding. IEEE Trans. Image Process. 2021, 30, 4985–5000.
  22. Fabbri, C.; Islam, M.J.; Sattar, J. Enhancing Underwater Imagery using Generative Adversarial Networks. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018; pp. 7159–7165.
  23. Anwar, S.; Li, C.; Porikli, F. Deep Underwater Image Enhancement. arXiv 2018, arXiv:1807.03528.
  24. Islam, M.J.; Xia, Y.; Sattar, J. Fast Underwater Image Enhancement for Improved Visual Perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234.
  25. Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An Underwater Image Enhancement Benchmark Dataset and Beyond. IEEE Trans. Image Process. 2019, 29, 4376–4389.
  26. Ancuti, C.; Ancuti, C.O.; Haber, T.; Bekaert, P. Enhancing Underwater Images and Videos by Fusion. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 81–88.
  27. Ancuti, C.O.; Ancuti, C.; De Vleeschouwer, C.; Bekaert, P. Color Balance and Fusion for Underwater Image Enhancement. IEEE Trans. Image Process. 2017, 27, 379–393.
  28. Huang, D.; Wang, Y.; Song, W.; Sequeira, J.; Mavromatis, S. Shallow-Water Image Enhancement using Relative Global Histogram Stretching based on Adaptive Parameter Acquisition. In Proceedings of the International Conference on Multimedia Modeling, Bangkok, Thailand, 5–7 February 2018; pp. 453–465.
  29. Starosolski, R. New simple and efficient color space transformations for lossless image compression. J. Vis. Commun. Image Represent. 2014, 25, 1056–1063.
  30. Wiseman, Y. Adapting the H.264 Standard to the Internet of Vehicles. Technologies 2023, 11, 103.
  31. Zhuang, P.; Li, C.; Wu, J. Bayesian Retinex Underwater Image Enhancement. Eng. Appl. Artif. Intell. 2021, 101, 104171.
  32. He, K.; Sun, J.; Tang, X. Single Image Haze Removal using Dark Channel Prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353.
  33. Peng, Y.T.; Cosman, P.C. Underwater Image Restoration based on Image Blurriness and Light Absorption. IEEE Trans. Image Process. 2017, 26, 1579–1594.
  34. Akkaynak, D.; Treibitz, T. A Revised Underwater Image Formation Model. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6723–6732.
  35. Akkaynak, D.; Treibitz, T. Sea-thru: A Method for Removing Water from Underwater Images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1682–1691.
  36. Li, C.; Anwar, S.; Porikli, F. Underwater Scene Prior Inspired Deep Underwater Image and Video Enhancement. Pattern Recognit. 2020, 98, 107038.
  37. Yang, H.H.; Huang, K.C.; Chen, W.T. Laffnet: A Lightweight Adaptive Feature Fusion Network for Underwater Image Enhancement. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 685–692.
  38. Wang, Y.; Guo, J.; Gao, H.; Yue, H. UIECˆ2-Net: CNN-based Underwater Image Enhancement using Two Color Space. Signal Process. Image Commun. 2021, 96, 116250.
  39. McGlamery, B.L. A Computer Model for Underwater Camera Systems. In Proceedings of the Ocean Optics VI, Monterey, CA, USA, 23–25 October 1979; Volume 208, pp. 221–231. Available online: https://www.spiedigitallibrary.org/conference-proceedings-of-spie/0208/0000/A-Computer-Model-For-Underwater-Camera-Systems/10.1117/12.958279.short (accessed on 2 December 2023).
  40. Jaffe, J.S. Computer Modeling and the Design of Optimal Underwater Imaging Systems. IEEE J. Ocean. Eng. 1990, 15, 101–111.
  41. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
  42. Yu, F.; Koltun, V. Multi-scale Context Aggregation by Dilated Convolutions. arXiv 2015, arXiv:1511.07122.
  43. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
  44. Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual Losses for Real-Time Style Transfer and Super-Resolution. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 694–711.
  45. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
  46. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. Imagenet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
  47. Song, W.; Wang, Y.; Huang, D.; Tjondronegoro, D. A Rapid Scene Depth Estimation Model based on Underwater Light Attenuation Prior for Underwater Image Restoration. In Proceedings of the Pacific Rim Conference on Multimedia, Hefei, China, 21–22 September 2018; pp. 678–688.
  48. Zhang, W.; Zhuang, P.; Sun, H.H.; Li, G.; Kwong, S.; Li, C. Underwater Image Enhancement via Minimal Color Loss and Locally Adaptive Contrast Enhancement. IEEE Trans. Image Process. 2022, 31, 3997–4010.
  49. Liu, P.; Wang, G.; Qi, H.; Zhang, C.; Zheng, H.; Yu, Z. Underwater Image Enhancement with a Deep Residual Framework. IEEE Access 2019, 7, 94614–94629.
Figure 1. The model of underwater optical imaging.
Figure 2. Structure of the DLRNet. The left part is the Feature Extraction Module and the right part is the Dual Layers Regression Module, which has three optimization units.
Figure 3. Attention Module. The Attention Module captures more channel features after the squeeze and excitation operations. We utilize global average pooling to realize the squeeze and use two fully connected layers to acquire new weights in the excitation.
Figure 4. Qualitative results in UIEBD. IBLA, RGHS, ULAP, and MLLE are traditional methods. FUnIE_GAN, UGAN, UResNet, WaterNet, and Ucolor are deep learning-based methods. Ground Truth is the high-quality reference image.
Figure 5. Qualitative results in the EUVP-underwater scenes dataset. IBLA, RGHS, and ULAP are traditional methods. FUnIE_GAN, UGAN, UResNet, and WaterNet are deep learning-based methods. Ground Truth is the high-quality reference image.
Figure 6. Qualitative results in the EUVP-underwater imagenet dataset. IBLA, RGHS, and ULAP are traditional methods. FUnIE_GAN, UGAN, UResNet, and WaterNet are deep learning-based methods. Ground Truth is the high-quality reference image.
Figure 7. Qualitative results in the EUVP-underwater dark dataset. IBLA, RGHS, and ULAP are traditional methods. FUnIE_GAN, UGAN, UResNet, and WaterNet are deep learning-based methods. Ground Truth is the high-quality reference image.
Table 1. Image quantitative evaluation (PSNR/SSIM) of different methods on the EUVP and UIEBD datasets. The best results are bolded.

Methods         | EUVP-Dark  | EUVP-Imagenet | EUVP-Scenes | UIEBD
Original        | 16.10/0.82 | 16.98/0.74    | 20.93/0.82  | 17.75/0.77
IBLA [33]       | 16.66/0.77 | 16.09/0.63    | 19.65/0.72  | 15.31/0.65
RGHS [28]       | 15.91/0.79 | 16.51/0.71    | 18.43/0.75  | 19.72/0.84
ULAP [47]       | 17.37/0.77 | 18.39/0.71    | 19.93/0.75  | 16.33/0.76
FUnIE_GAN [24]  | 21.17/0.88 | 22.21/0.77    | 25.48/0.83  | 19.82/0.83
UResnet [49]    | 20.99/0.87 | 23.07/0.81    | 26.63/0.87  | 19.31/0.83
UGAN [22]       | 21.11/0.87 | 24.19/0.83    | 25.27/0.84  | 22.78/0.83
WaterNet [25]   | 20.80/0.86 | 22.50/0.82    | 22.65/0.82  | 23.82/0.89
Ucolor [21]     | 20.56/0.86 | 23.12/0.78    | 26.21/0.87  | 22.28/0.90
MLLE [48]       | 14.28/0.59 | 15.44/0.58    | 14.98/0.63  | 18.22/0.73
DLRNet          | 21.55/0.89 | 24.67/0.85    | 27.04/0.90  | 24.15/0.91
Table 2. PSNR values of the outputs in each stage of DLRNet. The best results are bolded.

Dataset        | J_0   | J_1   | J_2   | J_3
EUVP-dark      | 20.92 | 21.17 | 21.55 | 21.08
EUVP-imagenet  | 24.40 | 24.56 | 24.67 | 24.71
EUVP-scenes    | 26.15 | 26.74 | 27.04 | 26.12
UIEBD          | 22.74 | 23.42 | 24.15 | 23.85
