Article

Enhanced U-Net for Underwater Laser Range-Gated Image Restoration: Boosting Underwater Target Recognition

1 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
2 University of Chinese Academy of Sciences, Beijing 101408, China
3 College of Engineering, Southern University of Science and Technology, Shenzhen 518055, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2025, 13(4), 803; https://doi.org/10.3390/jmse13040803
Submission received: 6 March 2025 / Revised: 9 April 2025 / Accepted: 15 April 2025 / Published: 17 April 2025

Abstract: Underwater optical imaging plays a crucial role in maritime safety, enabling reliable navigation, efficient search and rescue operations, precise target recognition, and robust military reconnaissance. However, conventional underwater imaging methods often suffer from severe backscattering noise, limited detection range, and reduced image clarity—challenges that are exacerbated in turbid waters. To address these issues, Underwater Laser Range-Gated Imaging has emerged as a promising solution. By selectively capturing photons within a controlled temporal gate, this technique effectively suppresses backscattering noise, enhancing image clarity, contrast, and detection range. Nevertheless, residual noise within the imaging slice can still degrade image quality, particularly in challenging underwater conditions. In this study, we propose an enhanced U-Net neural network designed to mitigate noise interference in underwater laser range-gated images and improve target recognition performance. Built upon the U-Net architecture with added residual connections, our network combines a VGG16-based perceptual loss with Mean Squared Error (MSE) as the loss function, effectively capturing high-level semantic features while preserving critical target details during reconstruction. Trained on a semi-synthetic grayscale dataset containing synthetically degraded images paired with their reference counterparts, the proposed approach outperforms several existing underwater image restoration methods in our experimental evaluations. Through comprehensive qualitative and quantitative evaluations, underwater target detection experiments, and real-world oceanic validations, our method demonstrates significant potential for advancing maritime safety and related applications.

1. Introduction

Underwater optical imaging [1,2] plays a vital role in maritime safety, enabling critical applications including marine navigation, search and rescue operations, infrastructure inspection, ecological monitoring, and military reconnaissance [3,4,5,6,7,8]. High-quality underwater imaging is essential for detecting submerged objects, monitoring marine ecosystems, and ensuring the operational security of underwater vehicles. However, traditional underwater optical imaging methods face significant limitations due to severe backscattering noise, rapid light attenuation, and the presence of suspended particles, all of which degrade image clarity and contrast. These challenges become even more pronounced in turbid waters, where light scattering and absorption drastically reduce visibility and detection range. Additionally, conventional imaging techniques often struggle to distinguish targets from background noise, making them less effective in complex underwater environments. To overcome these drawbacks, Underwater Laser Range-Gated Imaging (ULRGI) has emerged as a critical optical imaging technology. By utilizing pulsed laser illumination and high-speed gating, ULRGI effectively filters out scattered light from the return signal, significantly enhancing the contrast and clarity of underwater images [9,10,11,12]. Compared to conventional optical imaging, Underwater Laser Range-Gated Imaging provides the additional advantage of long-range imaging, enabling the capture of high-quality images from greater distances; this capability is crucial for maritime search and rescue operations, as well as target detection [13,14,15]. However, in real-world oceanic and underwater environments, the presence of suspended particles, plankton, and micro-organisms often increases water turbidity, presenting significant challenges to imaging systems [16]. Turbid environments introduce two-fold degradation: first, increased imaging noise directly degrades image quality; second, intensified light scattering and absorption reduce the Signal-to-Noise Ratio (SNR), fundamentally limiting effective information extraction from background interference [17,18,19]. Furthermore, under turbid conditions, the resolution and contrast of the imaging system are compromised, making target identification and localization increasingly difficult. In military reconnaissance and security surveillance, this degradation in image quality can delay detection and response to potential threats [20]. In maritime search and rescue operations, it can hinder the timely identification of rescue targets, ultimately affecting the efficiency and success rate of rescue efforts. Therefore, improving the SNR and reducing noise interference are of paramount importance for enhancing underwater target recognition and detection capabilities.
Traditional underwater image restoration techniques primarily rely on physical models or image processing algorithms to enhance image quality. He [21] proposed the Dark Channel Prior (DCP) method for image dehazing. Building on DCP, Drews [22] introduced an adaptive method to estimate light transmission in underwater environments, known as the Underwater Dark Channel Prior (UDCP) image dehazing algorithm. Peng [23] developed a technique based on Image Blurriness and Light Absorption (IBLA) to extract depth maps from degraded underwater images, subsequently using an exponential decay model to estimate transmission. Carlevaris [24] proposed a novel Maximum Intensity Prior (MIP) method, which innovatively explores significant attenuation differences across various color channels in underwater images, to estimate the transmission map. Chiang [25] proposed a new underwater image enhancement algorithm called Wavelength Compensation and Dehazing (WCID), which addresses light scattering, color changes, and the influence of artificial light by compensating for attenuation and restoring color balance. Hou [26] proposed an Illumination Channel Sparsity Prior (ICSP)-guided variational framework for non-uniform illumination underwater image restoration. This method enhances brightness, corrects color distortion, and reveals fine-scale details by integrating ICSP into an extended underwater image formation model. However, the inherent complexity and diversity of underwater environments pose significant challenges in deriving accurate and universally applicable prior knowledge. The domain-specific nature of these priors often results in suboptimal restoration performance when encountering unlearned environmental conditions.
In recent years, deep learning technologies have shown significant potential in underwater image restoration, particularly in handling complex multidimensional and non-linear signals. Fabbri [27] introduced an Underwater Generative Adversarial Network (UGAN) designed for underwater image restoration, which addresses issues such as light refraction, absorption, and color distortion in underwater visual data. Liu [28] introduced the Underwater ResNet (UResnet), a residual learning model for underwater image enhancement. It utilizes CycleGAN for data generation and VDSR for resolution improvement, with innovative loss functions and training modes to achieve superior color correction and detail enhancement. Li [29] proposed a new underwater restoration model called UnderWater Convolutional Neural Network (UWCNN). This model utilizes a synthetic underwater image database and employs an end-to-end, data-driven automatic training mechanism to achieve high-resolution restoration of underwater images while effectively suppressing the green tint in the images. Han [30] presented a method called Contrastive underWater Restoration (CWR), based on an unsupervised image-to-image translation framework. It utilizes contrastive learning and generative adversarial networks to maximize mutual information between raw and restored images, achieving state-of-the-art results. Fu [31] introduced an UnSupervised Underwater Image Restoration (USUIR) method that leverages image homology to estimate latent components and generate re-degraded images for effective restoration, demonstrating promising results in both speed and quality. While demonstrating promising restoration performance, existing deep learning approaches predominantly target color image restoration—a fundamental mismatch with the grayscale nature of Underwater Laser Range-Gated Imaging systems.
The lack of color information in grayscale images means that traditional deep learning models, which are based on color images, may not fully exploit their advantages in feature extraction and restoration. Therefore, it is essential to appropriately adjust or redesign these models to better accommodate the characteristics of underwater grayscale images for effective restoration. In this paper, we propose a perceptually enhanced network architecture based on U-Net [32], termed U-Net with Perceptual Enhancement (UP-Net). Building upon the U-Net framework, our network adopts a four-layer structure with residual connections to enhance gradient flow and mitigate noise interference, combined with a hybrid loss function integrating pixel-level fidelity and high-level semantic consistency. This design explicitly addresses three critical challenges in underwater laser range-gated image restoration: (1) suppression of backscattering noise while preserving weak target signals, (2) accurate reconstruction of fine textures and structural details, and (3) robustness in turbid environments with complex degradation patterns. Experimental validation demonstrates that UP-Net significantly outperforms existing methods in noise suppression, perceptual quality, and downstream target recognition tasks, achieving reliable generalization in real-world marine scenarios.

2. Methods

2.1. Underwater Laser Range-Gated Imaging

The Underwater Laser Range-Gated Imaging technique primarily consists of a nanosecond-pulsed laser source and a synchronously gated Intensified Charge-Coupled Device (ICCD) camera. The timing of the laser pulse emission and the opening of the camera shutter are precisely controlled to separate the target signal from scattered light [33,34]. The detailed process is illustrated in Figure 1. A pulsed laser generates a very short-duration pulse, which travels through the water and reaches the target. After being reflected by the target, the pulse travels back through the water to the gated camera. Before the laser pulse reaches the camera, the shutter remains closed, preventing scattered light, generated during the laser’s transmission through the water, from entering the camera. When the reflected pulse arrives at the camera, the shutter opens briefly for an interval known as the gate time, capturing the reflected light from the target before closing again. The gate time is typically slightly longer than the laser pulse duration, ensuring that the target signal is received while minimizing the amount of backscattered light entering the camera. Due to the short duration of the laser pulse, ICCDs are commonly used as the imaging device to enhance detection.
In the context of range-gated underwater imaging systems, the total detected optical power by the system at any given time is modeled as the sum of different components [35]:
$$P(t) = P_{\mathrm{BSN}}(t) + P_{\mathrm{S}}(t) + P_{\mathrm{SMM}}(t).$$
The total detected optical power $P(t)$ at any given time can be expressed as the sum of three components: $P_{\mathrm{BSN}}(t)$, the energy contribution from backscattered photons within the Field Of View (FOV) that do not include target reflections; $P_{\mathrm{S}}(t)$, the energy from photons directly reflected by the target, which represents the desired signal in range-gated imaging; and $P_{\mathrm{SMM}}(t)$, the energy from multiply scattered photons that results in the loss of target information.
The backscattering noise P BSN ( t ) is modeled using a convolution of the Temporal Point Spread Function (TPSF) and the emitted pulse profile [35]:
$$P_{\mathrm{BSN}}(t) = \int_{\frac{v(t-2t_0)}{2}}^{\frac{vt}{2}} S(r)\, P_0\!\left(t - \frac{2r}{v}\right) \mathrm{d}r,$$
Here, $P_0(t)$ represents the emitted pulse profile, while $S(r)$ is the kernel function that incorporates the Temporal Point Spread Function (TPSF), radiation decay, and receiver efficiency. The parameter $t_0$ denotes the duration of the emitted pulse, and $v$ is the speed of light in the medium.
The target-reflected signal P S ( t ) received by the camera is modeled as a convolution of the arriving pulse with the target reflectivity. This can be expressed as [35]
$$P_{\mathrm{S}}(t) = B \cdot P_0\!\left(t - \frac{2r_0}{v}\right), \quad \text{for } \frac{2r_0}{v} \le t \le \frac{2r_0}{v} + 2t_0,$$
where $B$ is a coefficient representing the combined effects of target reflectivity, signal attenuation, and radiation decay. Here, $P_0(t)$ is the emitted pulse profile, $r_0$ is the distance to the target, $t_0$ is the duration of the emitted pulse, and $v$ is the speed of light in the medium. This equation defines the temporal profile of the signal reflected by the target and accounts for the attenuation and scattering effects of the medium.
In scenarios with short attenuation lengths, multiply scattered photons are rapidly absorbed during propagation. Similarly, in narrow fields of view, most multiply scattered photons are scattered outside the receiver’s field of view. As a result, we assume that the contribution of $P_{\mathrm{SMM}}(t)$ to the total signal received by the system can be neglected [36]. By leveraging range-gated imaging technology, the impact of backscattering noise can be effectively reduced, thereby enhancing the prominence of the target-reflected signal $P_{\mathrm{S}}(t)$.
However, as water turbidity increases, backscattering noise $P_{\mathrm{BSN}}(t)$ intensifies significantly, further weakening the target-reflected signal $P_{\mathrm{S}}(t)$ and degrading the quality of the images received by the system. In highly turbid environments, severe backscattering and signal attenuation make it extremely challenging to capture clear target information. To address this, we integrate deep learning with Underwater Laser Range-Gated Imaging technology. By leveraging the strong adaptive learning capabilities of neural networks, this approach bypasses traditional imaging models and directly performs “end-to-end” learning on underwater degraded images, producing clear underwater images as output.
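To make the gating mechanism above concrete, the following sketch simulates the received-power model $P(t) = P_{\mathrm{BSN}}(t) + P_{\mathrm{S}}(t)$ for a single exposure and compares the energy collected with and without a temporal gate. It is an illustration only: the rectangular pulse shape, the exponential scattering kernel $S(r)$, the target range, and the reflectivity coefficient are assumed values, not parameters of our system.

```python
# Illustrative sketch (not the authors' code): 1-D simulation of the gated
# return model with hypothetical pulse, kernel, and target parameters.
import numpy as np

V_WATER = 2.25e8          # speed of light in water (m/s)
T0 = 1.5e-9               # pulse duration (s), as in the system description
R_TARGET = 6.0            # assumed target range (m)
B = 0.05                  # lumped target-reflectivity coefficient (assumed)

t = np.linspace(0.0, 120e-9, 4000)            # time axis (s)

def pulse(tau):
    """Rectangular emitted pulse profile P0(t) of duration T0."""
    return np.where((tau >= 0.0) & (tau <= T0), 1.0, 0.0)

def backscatter(t, beta=0.3):
    """P_BSN(t): integrate S(r) * P0(t - 2r/v) over range, using an
    exponentially decaying scattering kernel S(r) (assumed form)."""
    r = np.linspace(0.0, 15.0, 1500)
    S = np.exp(-2.0 * beta * r)               # attenuation-like kernel
    out = np.zeros_like(t)
    for i, ti in enumerate(t):
        out[i] = np.trapz(S * pulse(ti - 2.0 * r / V_WATER), r)
    return out

P_bsn = backscatter(t)
P_s = B * pulse(t - 2.0 * R_TARGET / V_WATER)   # target-reflected signal
P_total = P_bsn + P_s

# Range gating: only accept photons inside a gate centred on the target echo.
gate_open = 2.0 * R_TARGET / V_WATER
gate_close = gate_open + 2.0 * T0
gate = (t >= gate_open) & (t <= gate_close)

print("energy without gate:", np.trapz(P_total, t))
print("energy inside gate :", np.trapz(P_total[gate], t[gate]))
print("signal fraction inside gate:",
      np.trapz(P_s[gate], t[gate]) / max(np.trapz(P_total[gate], t[gate]), 1e-30))
```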

2.2. Hardware Architecture of the Underwater Laser Range-Gated Imaging System

Based on the laser range-gated imaging principle described in the previous section, we designed and integrated an Underwater Laser Range-Gated Imaging system. Figure 2a illustrates its three-dimensional structural model. The system utilizes a laser with a wavelength of 532 nm, a repetition rate of 2 kHz, a pulse width of 1.5 ns, and a maximum pulse energy of 2 mJ as the active light source. In our experiments, we set the laser repetition rate to 1 kHz, with a pulse width of 1.5 ns and a pulse energy of 1.5 mJ, resulting in an average power of approximately 1.5 W and a peak power of about 1 MW. For image acquisition, we employed an ICCD camera with a resolution of 1388 × 1038 pixels and a pixel size of 6.45 µm. To ensure reliable operation in underwater environments, we enclosed the system in a sealed, waterproof housing. The fully integrated system is depicted in Figure 2b. To assess its performance, we conducted experiments in a laboratory water tank made of stainless steel, measuring 8.9 m × 1.7 m × 1.0 m. Figure 2c presents a photograph of the system in operation, with laser illumination inside the tank.

2.3. Dataset Preparation

In supervised deep learning network training, degraded images and their corresponding clear reference images are typically used as image pairs, allowing the model to learn how to recover clear reference images from blurred or noisy inputs. However, acquiring real clear underwater reference images is challenging due to the complex nature of underwater environments and the unpredictable characteristics of underwater noise, which can degrade image quality. To address this challenge, researchers often simulate degraded images or generate them as substitutes for image pairs, or they control imaging conditions to create these pairs [37,38]. Since underwater range-gated images obtained with active laser illumination are grayscale, and most publicly available underwater image datasets are in color [37,38,39,40,41,42], direct application of these datasets is not feasible for laser-based restoration tasks. Additionally, the noise distribution in images captured by traditional underwater cameras differs significantly from the fixed-pattern noise in range-gated images, creating a domain shift that critically degrades the generalization ability of data-driven models trained on conventional datasets.
To construct a comprehensive dataset, we captured 800 clear images of various toy models at different distances, using an Underwater Laser Range-Gated Imaging system in our water tank filled with clear tap water (as shown in Figure 2c); these images served as reference images. To simulate turbid water conditions, we incrementally added pure milk to the tank and captured multiple background-only images (containing only background noise) at varying turbidity levels. Four background images with significant noise variations were selected and pixel-wise added to each reference image to synthesize degraded images, as illustrated in Figure 3. To validate the statistical relevance of our synthetic degradation approach, we conducted comparative analyses between synthetic and real degraded images. First, we compared the histogram distributions of pixel intensities in the synthetic and real degraded images. As shown in Figure 3, the histograms exhibited strong alignment, particularly in the noise-dominated background regions, confirming that our additive noise model replicates the statistical characteristics of underwater laser range-gated image degradation in turbid conditions (specifically captured by our system).
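The following sketch illustrates the additive synthesis step described above, in which turbid background frames are pixel-wise added to clear reference frames and clipped to the 8-bit range. The directory layout and file names are hypothetical placeholders, not the structure of our dataset.

```python
# Minimal sketch of degraded-image synthesis: reference + background noise.
# Assumed layout: 8-bit grayscale PNGs in ./reference and ./backgrounds.
import glob
import os
import cv2
import numpy as np

ref_paths = sorted(glob.glob("reference/*.png"))
bg_paths = sorted(glob.glob("backgrounds/*.png"))   # four turbidity levels
os.makedirs("degraded", exist_ok=True)

for ref_path in ref_paths:
    ref = cv2.imread(ref_path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
    for k, bg_path in enumerate(bg_paths):
        bg = cv2.imread(bg_path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
        bg = cv2.resize(bg, (ref.shape[1], ref.shape[0]))
        # Pixel-wise addition of background noise, clipped to the 8-bit range.
        degraded = np.clip(ref + bg, 0, 255).astype(np.uint8)
        name = os.path.splitext(os.path.basename(ref_path))[0]
        cv2.imwrite(f"degraded/{name}_turbidity{k}.png", degraded)
```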
Spatial Frequency (SF) is a quantitative metric that characterizes the overall activity level of gray-level variations in an image. This metric comprehensively captures high-frequency components including edges, textures, and noise patterns. It has been widely adopted in applications such as image quality assessment, sensor noise analysis, and algorithm performance benchmarking. SF is computed as the root mean square of the gradient magnitudes across the entire image:
$$\mathrm{SF} = \sqrt{\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left[f_x(i,j)^2 + f_y(i,j)^2\right]}$$
with the following definitions:
  • $\mathrm{SF}$: Spatial Frequency value (higher values indicate more intensive texture or noise);
  • $M \times N$: Dimensions of the digital image in pixels;
  • $f_x(i,j)$: Horizontal gradient (approximated by central differences);
  • $f_y(i,j)$: Vertical gradient (approximated by central differences).
Through Spatial Frequency analysis, the indices of the synthetic underwater degraded image and the real degraded image in Figure 3 were 2.3512 and 2.3566, respectively, demonstrating a high degree of consistency. This result indicates that the synthetic underwater degraded image exhibited statistical similarity to the real degraded image, in terms of spatial texture complexity and grayscale variation intensity. This validates the effectiveness of our synthetic method for underwater degraded images in accurately replicating the typical noise characteristics of Underwater Laser Range-Gated Imaging systems.
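For reference, a minimal implementation of the Spatial Frequency metric defined above is shown below; it uses central differences (via np.gradient) and assumes 8-bit grayscale inputs.

```python
# Sketch of the Spatial Frequency (SF) metric: root mean square of the
# gradient magnitudes, with gradients approximated by central differences.
import numpy as np

def spatial_frequency(img: np.ndarray) -> float:
    img = img.astype(np.float64)
    fy, fx = np.gradient(img)          # central differences along rows/columns
    return float(np.sqrt(np.mean(fx**2 + fy**2)))

# Usage: compare a synthetic degraded image with a real degraded one, e.g.
# sf_synth = spatial_frequency(cv2.imread("synthetic.png", cv2.IMREAD_GRAYSCALE))
# sf_real  = spatial_frequency(cv2.imread("real.png", cv2.IMREAD_GRAYSCALE))
```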
We synthesized a total of 3200 degraded images, resulting in 3200 image pairs, each consisting of a reference image and its corresponding degraded image. To mitigate overfitting risks, we selected 2560 image pairs (80% of the image pairs) as the training set, which included images of four different targets captured from various distances, angles, and positions. The remaining pairs were used to create the test set, from which 104 pairs were randomly selected. The test set featured six different targets, including two that were not present in the training set. All the images in the dataset were grayscale. To construct a comprehensive Underwater Laser Range-Gated Imaging dataset, every training image was divided into smaller regions, and random augmentations, such as flipping and rotation, were applied to each image, generating eight sub-images with dimensions of 256 × 256 pixels. After data augmentation and expansion, the training set contained a total of 20,480 image pairs, with 19,456 pairs used for training and 1024 pairs used for validation during model training. Additionally, we prepared a real degraded set consisting of 100 underwater laser range-gated images captured in a turbid environment (250 mL of pure milk added) within our experimental water tank. This dataset included seven different targets, three of which were unseen during training. The comprehensive dataset statistics are summarized in Table 1, which quantifies the scale, diversity, and quality control measures implemented throughout our data preparation pipeline.
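The patch extraction and augmentation steps can be sketched as follows. The random cropping strategy and the fixed count of eight sub-images per source image are assumptions about how the described pipeline might be reproduced, not our exact implementation.

```python
# Illustrative augmentation sketch: aligned 256x256 sub-images with random
# flips/rotations applied identically to each degraded/reference pair.
import numpy as np

RNG = np.random.default_rng(0)

def random_augment(img, ref):
    """Apply the same random flip/rotation to a degraded/reference pair."""
    k = RNG.integers(0, 4)                 # number of 90-degree rotations
    img, ref = np.rot90(img, k), np.rot90(ref, k)
    if RNG.random() < 0.5:
        img, ref = np.fliplr(img), np.fliplr(ref)
    return img, ref

def extract_patches(img, ref, patch=256, n_patches=8):
    """Randomly crop n_patches aligned patches from a pair (images assumed
    to be at least patch x patch pixels)."""
    h, w = img.shape
    pairs = []
    for _ in range(n_patches):
        y = RNG.integers(0, h - patch + 1)
        x = RNG.integers(0, w - patch + 1)
        a, b = img[y:y+patch, x:x+patch], ref[y:y+patch, x:x+patch]
        pairs.append(random_augment(a, b))
    return pairs
```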

2.4. U-Net with Perceptual Enhancement Network (UP-Net)

2.4.1. The Structure of UP-Net

Inspired by the network architecture of U-Net [32,43,44], we propose a perceptually enhanced network structure based on U-Net, called U-Net with Perceptual Enhancement (UP-Net), to improve the quality of underwater image restoration. UP-Net builds upon U-Net by integrating perceptual loss and introducing cross-layer residual connections, which improve gradient flow and enable deeper feature learning. Notably, UP-Net employs a four-layer deep encoder–decoder structure for effective feature extraction and image reconstruction.
As shown in Figure 4, the encoding path consists of four sets of double convolution modules. Each module contains two Conv2D layers with 3 × 3 convolution kernels (the number of channels gradually increases to 256), followed by a ReLU activation after each convolution layer. At the end of each module, spatial downsampling is performed, using 2 × 2 max pooling (with the resolution decreasing step by step), forming a hierarchical representation that includes multi-scale features. In the bottleneck stage, the feature maps are further expanded to 512 channels through two Conv2D layers, resulting in compact high-dimensional features with a resolution of 16 × 16. The decoding path adopts a symmetric structure, where each module first uses a 2 × 2 upsampling layer to restore the spatial resolution (gradually increasing) and then performs skip connections with the corresponding feature maps from the encoding path. Each decoding module contains two 3 × 3 transposed convolution layers, with the number of channels gradually decreasing to 32. The final output layer uses a 1 × 1 convolution kernel to compress the channels to 1, with a linear activation to generate a predicted image of the same size as the input. Additionally, the architecture incorporates cross-layer residual connections by directly adding the original input to the network’s final output, to enhance gradient flow and promote deeper feature learning.
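A minimal Keras sketch of the layout described above is given below: four encoder/decoder levels, a 512-channel bottleneck at 16 × 16, skip connections, and a global input-to-output residual. Channel counts follow the text (increasing to 256 in the encoder, decreasing to 32 in the decoder); the starting width of 32, the padding mode, and other unstated details are assumptions.

```python
# Sketch of the UP-Net layout under stated assumptions (not the exact code).
import tensorflow as tf
from tensorflow.keras import layers, Model

def double_conv(x, filters, transposed=False):
    """Two 3x3 (transposed) convolutions with ReLU, keeping spatial size."""
    Conv = layers.Conv2DTranspose if transposed else layers.Conv2D
    for _ in range(2):
        x = Conv(filters, 3, padding="same", activation="relu")(x)
    return x

def build_upnet(input_shape=(256, 256, 1)):
    inp = layers.Input(shape=input_shape)
    skips, x = [], inp
    for filters in (32, 64, 128, 256):           # encoder path
        x = double_conv(x, filters)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = double_conv(x, 512)                       # bottleneck: 16x16x512
    for filters, skip in zip((256, 128, 64, 32), reversed(skips)):  # decoder
        x = layers.UpSampling2D(2)(x)
        x = layers.Concatenate()([x, skip])       # skip connection
        x = double_conv(x, filters, transposed=True)
    out = layers.Conv2D(1, 1, activation="linear")(x)
    out = layers.Add()([out, inp])                # global residual connection
    return Model(inp, out, name="up_net")

model = build_upnet()
model.summary()
```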

2.4.2. Loss Function

In this study, we propose an appropriate combination of loss functions for the training process, integrating the perceptual loss from the pre-trained VGG16 network (Visual Geometry Group 16-layer network) [45,46,47] with pixel loss—Mean Squared Error (MSE) [48], which is one of the most commonly used metrics for measuring accuracy, particularly in image restoration tasks. It calculates the pixel-level differences between the input image and the ground-truth reference image, making it a “per-pixel loss function”. By squaring the differences, MSE emphasizes larger errors, ensuring that the model minimizes significant deviations while striving for accurate pixel-wise reconstruction.
VGG16 focuses on extracting high-level features from images, and it uses the differences between synthesized images and target images in the feature space as a perceptual loss. This approach emphasizes the overall structure, texture, and visual quality of the images, rather than just pixel-level differences. The perceptual loss function helps maintain fine-grained details in the synthesized image by comparing the high-level semantic features extracted from intermediate layers of the VGG16 network. The use of VGG16 helps the model focus on preserving content and structure during tasks such as denoising, which requires balancing the preservation of high-level features and reducing pixel-wise errors.
The VGG16 model [47] used here is pre-trained on the ImageNet [49,50,51] dataset and comprises a 16-layer deep convolutional network designed for image classification. The network extracts hierarchical features from images, with the deeper layers capturing high-level semantic information, such as textures and object shapes. In this case, the perceptual loss was calculated using the activations from the ‘block3-conv3’ layer of the VGG16 network, which provides a rich feature map for evaluating image similarity in a perceptual sense.
This combination of loss functions effectively enhances the quality of the output images by ensuring that the denoising process not only focuses on pixel-level accuracy but also maintains the consistency of the image content. The two losses are weighted to balance their contributions to the optimization, with the perceptual loss providing a stronger focus on image content and structure, and the MSE loss ensuring fine pixel-wise accuracy.
The relationship between these two loss components can be mathematically expressed as
$$L = \lambda \cdot L_{\mathrm{MSE}} + (1 - \lambda)\, L_{\mathrm{perceptual}}$$
where $L_{\mathrm{MSE}}$ is the Mean Squared Error loss component, and $L_{\mathrm{perceptual}}$ is the perceptual loss component. The parameter $\lambda$ is a weighting factor that balances the contributions of the two loss components. The MSE loss can be expressed as
$$L_{\mathrm{MSE}} = \frac{1}{N}\sum_{i=1}^{N}\left(I_{\mathrm{pred}}(i) - I_{\mathrm{ref}}(i)\right)^2$$
where $I_{\mathrm{pred}}(i)$ represents the value of the $i$-th pixel in the image synthesized by the model, and $I_{\mathrm{ref}}(i)$ is the value of the $i$-th pixel in the reference image. $N$ denotes the total number of pixels in the image.
The perceptual loss uses high-level feature maps from the pre-trained VGG16 network to compute the differences between the predicted image and the reference image, ensuring that they are perceptually similar. The intermediate layers of the VGG16 network extract high-level semantic features from the images, making this loss more effective at capturing visual features than simple pixel-based losses. The principle of perceptual loss can be expressed as follows:
$$L_{\mathrm{perceptual}} = \frac{1}{CHW}\sum_{c=1}^{C}\sum_{h=1}^{H}\sum_{w=1}^{W}\left(\phi_l(I_{\mathrm{pred}})_{chw} - \phi_l(I_{\mathrm{ref}})_{chw}\right)^2$$
where $\phi_l$ represents the feature map after the $l$-th layer of the VGG16 network, $I_{\mathrm{pred}}$ is the synthesized image, and $I_{\mathrm{ref}}$ is the reference image. $C$, $H$, and $W$ represent the number of channels, height, and width of the feature map at that layer, respectively. Perceptual loss measures the similarity between the synthesized image and the reference image in terms of high-level semantic features. This loss is particularly effective when the pixel differences between the reference and synthesized images are small, as it captures the semantic similarity between the two images more effectively.
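The combined loss can be sketched as follows, with the perceptual term computed on the ‘block3_conv3’ activations of an ImageNet-pre-trained VGG16. Replicating the single grayscale channel to three channels and scaling [0, 1] inputs to the VGG16 preprocessing range are assumptions, since the text does not specify how the single-channel images are adapted.

```python
# Sketch of the combined loss: lambda * MSE + (1 - lambda) * perceptual loss.
import tensorflow as tf
from tensorflow.keras.applications import VGG16

vgg = VGG16(weights="imagenet", include_top=False)
feature_extractor = tf.keras.Model(vgg.input,
                                   vgg.get_layer("block3_conv3").output)
feature_extractor.trainable = False

LAMBDA = 0.7   # best-performing weight in the ablation study

def combined_loss(y_true, y_pred):
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    # Tile the grayscale channel to RGB and apply VGG16 preprocessing
    # (inputs assumed to be normalized to [0, 1]).
    t3 = tf.keras.applications.vgg16.preprocess_input(
        tf.image.grayscale_to_rgb(y_true) * 255.0)
    p3 = tf.keras.applications.vgg16.preprocess_input(
        tf.image.grayscale_to_rgb(y_pred) * 255.0)
    perceptual = tf.reduce_mean(
        tf.square(feature_extractor(t3) - feature_extractor(p3)))
    return LAMBDA * mse + (1.0 - LAMBDA) * perceptual
```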
The model was trained for 200 epochs, using Python 3.8 and TensorFlow on an NVIDIA Tesla V100S-PCIE-32GB GPU. To ensure efficient memory usage and optimize the learning process, training was performed using mini-batch stochastic gradient descent with a batch size of 16. The initial learning rate was set to 0.001, and the Adam optimizer was chosen for its adaptive learning rate capability, facilitating effective convergence.
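A corresponding training configuration, reusing the model and loss sketches above, might look like the following; the dataset tensors are placeholders for the 256 × 256 grayscale image pairs.

```python
# Training configuration as described (Adam, initial LR 0.001, batch size 16,
# 200 epochs); train/validation tensors are placeholders.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=combined_loss)
# history = model.fit(train_degraded, train_reference,
#                     validation_data=(val_degraded, val_reference),
#                     batch_size=16, epochs=200)
```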

2.5. Imaging Quality

2.5.1. Full-Reference Metrics

To evaluate the quality of imaging results against reference images, several quantitative metrics are commonly employed [52]. Among these, Mean Squared Error (MSE), Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM) [53], and Learned Perceptual Image Patch Similarity (LPIPS) [54] are particularly significant. These metrics assess the fidelity and perceptual quality of images by comparing them to their corresponding reference images, providing valuable insights into the performance of various imaging techniques.
MSE is a widely used metric for evaluating the quality of reconstructed images when a reference image is available. It computes the average of the squared differences between the corresponding pixel values of the reference and the predicted image. A lower MSE value indicates better image quality. The formula for MSE is given by
$$\mathrm{MSE} = \frac{1}{xy}\sum_{i=1}^{x}\sum_{j=1}^{y}\left(I_{\mathrm{pred}}(i,j) - I_{\mathrm{ref}}(i,j)\right)^2$$
where $x$ and $y$ represent the width and height of the image, respectively, $I_{\mathrm{pred}}(i,j)$ is the pixel value at position $(i,j)$ in the predicted image, and $I_{\mathrm{ref}}(i,j)$ is the pixel value at position $(i,j)$ in the reference image.
PSNR is a metric derived from MSE and is expressed in decibels (dBs). It measures the peak error between the reference and the predicted images. Higher PSNR values indicate better image quality, as they suggest a lower amount of noise in the predicted image relative to the reference. The formula for PSNR can be expressed as
$$\mathrm{PSNR} = 10 \cdot \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$$
where $\mathrm{MAX}$ is the maximum possible pixel value of the image (for an 8-bit image, $\mathrm{MAX} = 255$).
The Structural Similarity Index Measure (SSIM) is a widely used metric for assessing image quality by comparing the structural information between two images. Unlike traditional methods, such as Mean Squared Error (MSE) or Peak Signal-to-Noise Ratio (PSNR), which focus on pixel-wise differences, SSIM evaluates the perceptual similarity between two images by considering three key aspects: luminance, contrast, and structure. The combination of these three components is expressed as
$$\mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha}\cdot[c(x,y)]^{\beta}\cdot[s(x,y)]^{\gamma}$$
where $\alpha$, $\beta$, and $\gamma$ are positive parameters used to adjust the relative importance of luminance, contrast, and structure, respectively. Typically, these parameters are set to 1, simplifying the SSIM calculation as follows:
$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$
where $\mu_x$ and $\mu_y$ are the mean intensities of the image patches $x$ and $y$, respectively, $\sigma_x^2$ and $\sigma_y^2$ are the variances of the image patches $x$ and $y$, respectively, and $\sigma_{xy}$ is the covariance between $x$ and $y$. $C_1$ and $C_2$ are small constants added to stabilize the division in the case of weak denominators.
The SSIM index ranges from 0 to 1, where a value of 1 indicates perfect structural similarity between the two images. SSIM is computed locally for image patches and then averaged over the entire image to provide a global similarity score, making it a more perceptually relevant metric for assessing image quality.
LPIPS is a perceptual metric designed to evaluate the similarity between two images based on deep features extracted from a pre-trained network (such as AlexNet [55]). Unlike pixel-wise metrics, LPIPS measures the perceptual differences by comparing the representations of images at multiple layers of the network. A lower LPIPS value indicates higher perceptual similarity. The formula for LPIPS is given by
$$\mathrm{LPIPS}(x,y) = \sum_{l}\frac{1}{H_l W_l}\sum_{h=1}^{H_l}\sum_{w=1}^{W_l}\left\| w_l \odot \left(\hat{f}_l(x)_{hw} - \hat{f}_l(y)_{hw}\right)\right\|_2^2$$
where $x$ and $y$ denote the two images being compared, $l$ indexes the layers of the pre-trained network, $H_l$ and $W_l$ represent the height and width of the feature maps at layer $l$, $\hat{f}_l(x)$ and $\hat{f}_l(y)$ are the normalized feature activations at layer $l$ for images $x$ and $y$, respectively, and $w_l$ is a learned weight for layer $l$ that scales the contribution of the feature differences.
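For completeness, the full-reference metrics above can be computed with standard library routines, as in the sketch below; scikit-image implementations are used for PSNR and SSIM, an 8-bit data range is assumed, and LPIPS (which requires a pre-trained network such as AlexNet) is only indicated in a comment.

```python
# Sketch of full-reference metrics using scikit-image equivalents of the
# formulas above; LPIPS would additionally require the 'lpips' package.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def full_reference_metrics(pred: np.ndarray, ref: np.ndarray) -> dict:
    pred = pred.astype(np.float64)
    ref = ref.astype(np.float64)
    mse = np.mean((pred - ref) ** 2)
    psnr = peak_signal_noise_ratio(ref, pred, data_range=255)
    ssim = structural_similarity(ref, pred, data_range=255)
    # lpips_value = lpips.LPIPS(net="alex")(to_tensor(pred), to_tensor(ref))
    return {"MSE": mse, "PSNR": psnr, "SSIM": ssim}
```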

2.5.2. Non-Reference Metrics

In real underwater environments, it is often challenging to obtain real reference images, rendering full-reference metrics unsuitable for evaluating degraded images. As a result, no-reference image quality assessment metrics must be used. However, since all the images in our dataset are in grayscale, commonly used no-reference metrics for color images [52,56], such as UIQM [57] and UCIQE [56], may not provide accurate evaluations. Most no-reference metrics are designed for conventional images, whereas images acquired from the Underwater Laser Range-Gated Imaging system in this dataset typically feature black backgrounds. The degradation in these images is generally caused by background noise. Therefore, this study adopted the following four no-reference metrics specifically for grayscale images: Standard Deviation (SD) [58], Signal-to-Noise Ratio (SNR) [58], No-Reference Quality Metric (NRQM) [59], and contrast.
In image processing, Standard Deviation (SD) measures the variation in pixel values, reflecting the level of detail and noise in an image. The formula for Standard Deviation is [58]
$$\mathrm{SD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(I_i - \mu\right)^2}$$
where $I_i$ is the pixel value, $\mu$ is the mean of the image, and $N$ is the total number of pixels. A larger Standard Deviation indicates more noise or complex details in the image, while a smaller Standard Deviation suggests a smoother, less noisy image. In denoising tasks, background noise typically appears as fluctuations in pixel values. After denoising, the background should become smoother, with less noise, resulting in a smaller Standard Deviation. Therefore, a smaller Standard Deviation of background noise in a grayscale image usually indicates better denoising performance and a clearer image restoration.
In image processing, the Signal-to-Noise Ratio (SNR) is used to measure the quality of an image relative to its noise. It compares the mean of the target signal to the Standard Deviation of the noise, with a higher SNR indicating better image quality. The combined formula for the SNR is [58]
$$\mathrm{SNR} = 10\log_{10}\!\left(\frac{\mu_s^2}{\sigma_n^2}\right)$$
where $\mu_s$ is the mean of the target signal, and $\sigma_n$ is the Standard Deviation of the noise. In this study, the mean of the target signal is calculated over a selected region of the restored image, while the noise Standard Deviation is computed from a pure background area without the target signal. A higher SNR indicates a clearer image with less noise, while a lower SNR suggests more noise and a poorer quality image.
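A short sketch of the SD and SNR computations described above follows; the target and background regions are selected here with hypothetical bounding boxes, standing in for the manually chosen regions used in our evaluation.

```python
# Sketch of the no-reference SD and SNR metrics over user-selected regions.
import numpy as np

def region_sd(img, box):
    """Standard deviation of pixel values inside a (y0, y1, x0, x1) box."""
    y0, y1, x0, x1 = box
    return float(np.std(img[y0:y1, x0:x1].astype(np.float64)))

def region_snr(img, signal_box, noise_box):
    """SNR in dB: mean of the target region vs. std of a background region."""
    y0, y1, x0, x1 = signal_box
    mu_s = np.mean(img[y0:y1, x0:x1].astype(np.float64))
    sigma_n = region_sd(img, noise_box)
    return float(10.0 * np.log10((mu_s ** 2) / (sigma_n ** 2 + 1e-12)))
```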
The No-Reference Quality Metric (NRQM) is a no-reference image quality metric based on visual perception [59]. This method quantifies artifacts and distortions in a single image by extracting low-level statistical features from both the spatial and frequency domains. These features are then used to train a two-stage regression model. The model predicts an image quality score that aligns with human visual perception, without relying on high-resolution ground truth images. This technique does not rely on RGB color channels and can be used for evaluating grayscale images. A higher score indicates that the image quality is closer to ideal human visual perception, meaning the image is clearer, contains richer details, and exhibits fewer artifacts. Conversely, a lower score suggests poor image quality, with potential issues such as blurriness, artifacts, or distortions.

2.5.3. Evaluation Metrics for Object Detection Performance

In object detection tasks, common evaluation metrics [60,61]—such as Precision (P), Recall (R), Average Precision (AP), and mean Average Precision (mAP)—are used to objectively evaluate a model’s performance. These metrics assess not only the accuracy of object detection but also the model’s sensitivity to positive samples and its overall detection performance. Below are the detailed definitions, formulas, and significance of each metric.
Precision (P) measures the proportion of true positives among all samples predicted as positive by the model, reflecting the accuracy of its positive predictions. The formula for Precision is
$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$$
Here, $\mathrm{TP}$ (True Positive) denotes the number of samples correctly predicted as positive, and $\mathrm{FP}$ (False Positive) denotes the number of samples incorrectly predicted as positive. Precision reflects the reliability of the model’s positive predictions; higher Precision indicates fewer False Positives.
Recall (R) measures the model’s ability to identify positive samples, defined as the proportion of actual positives correctly predicted by the model. The formula for Recall is
$$R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$
Here, $\mathrm{FN}$ (False Negative) represents the number of actual positive samples that are incorrectly predicted as negative by the model. Recall reflects the model’s sensitivity; higher Recall indicates better detection of positive targets. However, increasing Recall may increase False Positives, so it must be evaluated alongside Precision for a balanced assessment of model performance.
The mean Average Precision (mAP) is a comprehensive metric used to assess the overall performance of multi-class detection models. It is computed by averaging the Average Precision (AP) scores across all classes. Here, the AP for each class is defined as the area under the Precision–Recall curve, as follows:
$$\mathrm{AP} = \int_{0}^{1} P(R)\,\mathrm{d}R$$
where $P(R)$ denotes the Precision corresponding to a given Recall $R$. The mAP is then given by
$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_i$$
with $N$ being the total number of classes and $\mathrm{AP}_i$ the Average Precision for the $i$-th class. A higher mAP score indicates better overall detection performance.
In evaluation reports, two commonly used mAP variants are mAP@50 and mAP@50:95; mAP@50 is computed using a fixed IoU threshold of 0.5, which tolerates moderate overlaps between predicted and ground truth boxes, while mAP@50:95 averages the mAP scores over IoU thresholds ranging from 0.5 to 0.95 in steps of 0.05, providing a more comprehensive and rigorous evaluation.
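The AP and mAP definitions above can be sketched as follows. This simplified version integrates the raw precision-recall curve with the trapezoidal rule and assumes that predictions have already been matched to ground-truth boxes at a given IoU threshold; production implementations (e.g., COCO-style evaluation) additionally interpolate the precision envelope.

```python
# Simplified AP/mAP sketch (trapezoidal area under the precision-recall curve).
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """scores: confidence of each prediction; is_true_positive: bool array
    marking predictions matched to a ground-truth box; num_gt: GT count."""
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)
    # Approximate the integral of P(R) over recall.
    return float(np.trapz(precision, recall))

def mean_average_precision(per_class_aps):
    """Average the per-class AP values to obtain mAP."""
    return float(np.mean(per_class_aps))
```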

3. Experimental Results

In this section, we provide both quantitative and qualitative comparisons of our proposed method (UP-Net) with several representative underwater image restoration techniques, including UDCP [22], WCID [25], UGAN [27], UResnet [28], and CWR [30]. Except for the traditional methods (UDCP and WCID), all neural network-based methods were trained on the same dataset. Although these neural network models were originally designed for color image restoration, they inherently support grayscale input. To ensure a fair comparison, we modified the input layers of UGAN, UResnet, and CWR—specifically adjusting the channel dimension—to adapt them to our training set. All the networks were re-implemented and trained under consistent grayscale settings prior to evaluation.

3.1. Qualitative Evaluation

First, we performed image restoration tests on the synthetic test set of our dataset, which consists of 104 image pairs and includes six different targets, two of which were not present in the training set. The restoration results of various methods, along with their corresponding reference images, are shown in Figure 5 (note that the figure shows only the relevant target regions extracted from the original images).

3.1.1. The Restoration Results on the Test Set

As shown in Figure 5, most of the methods effectively enhanced the quality of the synthetic degraded underwater images. Although UDCP and WCID preserved the shape, texture, and details of the target signal relatively well, significant background noise remained, resulting in residual noise around the target signal and degrading the overall image quality. Although UGAN and CWR effectively reduced background noise, they could suffer from over-smoothing, which resulted in the loss of some target signal details and information in the restored images. Additionally, these methods produced darker images overall. UResnet preserved the texture and details of the target signal well, losing only a few weaker target signals; however, some noise remained at the edges of the target, and minor artifacts were introduced. In contrast, our proposed method not only retained the target signal’s details, texture, and contours but also effectively filtered out most of the background noise. Even weak target signals were well preserved, resulting in a more natural and smoother overall appearance. Furthermore, our method showed promising generalization capability by effectively removing background noise while preserving the essential details of the target signal, even in the untrained samples (d) and (e). However, we acknowledge that these initial results, based on a limited number of test targets, warrant further validation to confirm the method’s general applicability.

3.1.2. Restoration Results on the Real Degraded Set

After completing the qualitative evaluation on the test set, we conducted an additional evaluation on the previously mentioned real degraded set to validate the model’s generalization capability. Since these are real underwater degraded images, no corresponding reference images are available. The same restoration methods from the previous section were then applied to the images in this set. The restoration results are shown in Figure 6 (note that only the relevant target regions are displayed, extracted from the original images). For the trained targets (samples (a) and (b)), most of the methods enhanced the quality of the original images. However, the results of UDCP and WCID still contained significant background noise, and they failed to separate the target signal from the background effectively. For the untrained targets (samples (c), (d), and (e)), the restoration results varied considerably across the methods. Both UGAN and CWR lost some weak target signals, resulting in incomplete restorations with overall darker tones and poor contrast. Similarly, UDCP and WCID were unable to separate the target signal from the background, leaving noticeable noise around the targets. UResnet preserved most of the image information and features, losing only a few weaker target signals, but some noise remained near the target edges. In contrast, our method achieved excellent restoration performance, fully separating the target signal from the background while preserving even the weakest target signals. The restored images appear natural and smooth overall.

3.2. Quantitative Evaluation

3.2.1. Full-Reference Evaluation

We begin with a full-reference evaluation, using four commonly used metrics: MSE, PSNR, SSIM, and LPIPS. Although the ground truth images may differ slightly from the reference images, the evaluation results still provide a reasonable indication of the performances of the different methods. Table 2 presents the quantitative results of various methods, in terms of MSE, PSNR, SSIM, and LPIPS, on the test set. These results were obtained by comparing the output of each method with the corresponding reference image, and the scores represent the average values across all the images in the test set. As shown in Table 2, our method achieved the best performance in the full-reference image quality evaluation, outperforming all the other methods in all four metrics.

3.2.2. Non-Reference Evaluation

Due to the absence of real reference images, we evaluated the real degraded set using three no-reference image quality metrics: SD, SNR, and NRQM. For SD, the Standard Deviation was computed in the background region, which contained no target. A lower SD indicated less noise and a cleaner image. The SNR was calculated by comparing the signal region (containing the target) to the background region, where a higher SNR signified a better Signal-to-Noise Ratio and improved image quality. A higher NRQM score reflected better consistency with human visual perception.
Table 3 reports the quantitative results of the different methods, in terms of SD, SNR, and NRQM, on the real degraded set, with scores representing the average values across all the images in the set. As shown in Table 3, our method outperformed the others in all three metrics, producing clearer images with less background noise, higher Signal-to-Noise Ratios, and results more consistent with human visual perception.

3.2.3. Evaluation on Object Detection Task

Object detection methods can struggle to achieve precise target identification in underwater environments, due to the degradation effects commonly present in subaquatic imagery. Our proposed UP-Net effectively restores degraded underwater laser range-gated images, leading to significant improvements in target recognition accuracy. The well-known YOLO (You Only Look Once) [62] has been broadly used for object detection because of its fast speed and high accuracy. YOLO11 [63] achieves balanced accuracy-speed performance through enhanced small object detection, multi-scale feature fusion, and a lightweight design, enabling real-time detection in complex scenarios.
For comparative evaluation, we used YOLO11 to assess the performance of the restoration methods described in the previous section. Specifically, we selected 650 clear “real reference” images from the UP-Net training set as our training data, and we classified all the targets into two categories: fish and diver. Additionally, we chose 90 severely degraded real images along with their restored versions—produced by various restoration methods—to serve as input for YOLO11’s object detection. Figure 7 provides an example comparing the object detection performance on the restored images with that on the real degraded images. Table 4 shows the object detection performance using YOLO11 across different restoration methods. Notably, the raw images (i.e., the real degraded images without any restoration processing), as well as the UDCP and WCID methods, yielded zero scores across all metrics, indicating that neither the unprocessed images nor these restoration approaches support effective object detection. Both UGAN and UResnet offered modest improvements, while CWR achieved substantially better results. However, our proposed method outperformed all the alternatives, achieving the highest Precision, Recall, and mAP scores. This clearly demonstrates the superior capability of our approach to restoring degraded images for object detection.
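For reference, the detection evaluation can be reproduced with the Ultralytics package along the lines of the sketch below. The weight file name, dataset configuration files, and training settings shown here are placeholders and assumptions about the API usage, not our exact configuration.

```python
# Hedged sketch of YOLO11 training and validation with the Ultralytics API
# (assumed usage; file names and hyperparameters are placeholders).
from ultralytics import YOLO

model = YOLO("yolo11n.pt")                              # pre-trained YOLO11 weights
model.train(data="data.yaml", epochs=100, imgsz=640)    # fish / diver classes

# Validate separately on raw degraded images and on UP-Net-restored images by
# pointing two data configs at the corresponding image folders.
metrics_restored = model.val(data="data_restored.yaml")
print(metrics_restored.box.map50, metrics_restored.box.map)  # mAP@50, mAP@50:95
```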

3.3. Ablation Study

3.3.1. Loss Function Ablation

In the combined loss function we proposed, the weighting factor λ balances the contributions of the MSE and perceptual loss components. First, we conducted an ablation study to assess the impact of varying λ on image restoration performance, using metrics such as MSE, PSNR, SSIM, and LPIPS. The experimental results presented in Table 5 reveal a clear trend: as λ increased from 0.1 to 0.7, the MSE decreased significantly, indicating enhanced reconstruction fidelity. Additionally, both the PSNR and SSIM metrics achieved their highest values at λ = 0.7, suggesting superior image quality and structural similarity at this setting. Although the LPIPS score was slightly lower (better) at λ = 0.9, the overall performance across all metrics favored λ = 0.7. In summary, our ablation study demonstrates that λ = 0.7 provides the best balance between fidelity and perceptual quality, thereby validating the effectiveness of our combined loss design in improving image restoration outcomes.
The most commonly used loss function for image restoration is the Mean Squared Error (MSE, or L2 loss). However, the effectiveness of different loss functions, especially for underwater grayscale image restoration, has not been thoroughly investigated, and their ability to provide optimal results remains uncertain. For this paper, we conducted an ablation study, inspired by related work, to evaluate the impact of various loss functions on the performance of our model for underwater image restoration. Table 6 summarizes the loss functions used in our experiments, including both basic loss functions and their combinations. In the loss function formulas, y represents the reference image, while y ^ denotes the predicted image generated by the model. The total number of pixels in the image is represented by N. For perceptual loss, ϕ ( · ) denotes the feature extraction function from the pre-trained VGG16 network, which maps images to a feature space for comparison. Additionally, α and β are weight parameters used to balance the different components of the combined loss functions.
We trained our network using the different loss functions and tested it on our dataset; a sample image was selected from the test set. As shown in Figure 8, the restored results in Figure 8c–h were obtained using different loss functions, and the second row shows the details from the red boxed area of the image. From the results in the first row, it is evident that Figure 8c,e–g exhibit significant noise around the edges that was not fully removed. In the second row, Figure 8d,e show artifacts along the edges, while Figure 8c,g suffer from uneven brightness, obscuring some detailed textures. In contrast, the details in Figure 8f,h appear more natural, and the images overall are clearer.
Table 7 presents the quantitative evaluation of various loss functions on the test dataset, using full-reference metrics: MSE, PSNR, SSIM, and LPIPS. The results show that the combined loss function, L2 + VGG16, outperformed the others in PSNR, SSIM, and LPIPS, achieving the highest PSNR, of 40.8241 dB, the highest SSIM, of 0.9572, and the lowest LPIPS score, of 0.0095, indicating superior image quality and reconstruction accuracy. Additionally, for MSE, the perceptual VGG16 loss alone yielded the lowest MSE (12.2351), closely followed by L2 + VGG16 (13.1962), suggesting strong preservation of image details. Overall, our UP-Net model, which incorporates the combined L2 + VGG16 loss function, achieved the best performance in image restoration.

3.3.2. Architectural Choices

While our main focus was on the loss function, we also evaluated the robustness of the baseline U-Net architecture, using our proposed combined loss function. Table 8 presents experiments conducted on U-Net architectures with depths of 3, 4, and 5, both with and without residual connections. For the shallowest network (depth 3), adding residual connections led to noticeable improvements in PSNR and SSIM, although the overall performance remained suboptimal compared to deeper architectures. The best performance was achieved at a depth of 4 with residual connections, yielding the lowest MSE (13.1962), the highest PSNR (40.8241 dB), and the best SSIM (0.9572), along with a competitive LPIPS score (0.0095). In contrast, increasing the network depth to 5 led to performance degradation, particularly when residual connections were applied, suggesting that excessive depth may introduce unnecessary complexity that hinders effective feature learning. These results indicate that a U-Net architecture with a moderate depth of 4 combined with residual connections provides the optimal balance between model complexity and image restoration quality under our combined loss function.

3.4. Applications in Real Marine Environments

To validate the restoration performance of our proposed method in real marine environments, we conducted a brief sea trial near a pier in Lingshui Li Autonomous County, Hainan Province, China, as shown in Figure 9. We prepared a custom-made target wrapped with tapes of different colors and lowered it by rope to approximately 2 m below the water surface, about 2 m from the system. The environmental conditions involved high water turbidity (attributed to rainfall), with an average turbidity of approximately 5.53 NTU, and the experiment was conducted during the daytime with overcast weather. The laser was activated, and the gating time and laser power were adjusted to capture underwater images using an Underwater Laser Range-Gated Imaging system. For comparison, we also captured images of the target at the same location, using a standard visible light camera.
As shown in Figure 9, due to rainfall the previous day, the seawater was highly turbid; the visible light camera failed to detect the target, while the laser range-gated system successfully captured the target’s degraded underwater image. However, the image exhibited a relatively low Signal-to-Noise Ratio, with significant noise surrounding the target signal. We used the Underwater Laser Range-Gated Imaging system to continuously capture multiple degraded underwater images of the target, and these images were restored using several methods, such as UDCP, WCID, UResnet, UGAN, CWR, and our UP-Net. Some of the results are shown in Figure 10.
Due to the limited number of collected samples and the absence of reference images, quantitative evaluation using common image quality metrics might have yielded inaccurate results, lacking sufficient reliability and representativeness. Thus, we relied on visual perception to compare the performances of the different methods. As shown in Figure 10, while UDCP, WCID, and UResnet preserved some details and contours of the target signal, they introduced significant noise around the target, resulting in a low Signal-to-Noise Ratio and degraded quality. UGAN, on the other hand, removed parts of the target signal’s details and contours as noise, leaving only a fraction of the signal, leading to suboptimal restoration. Among all the methods, CWR and our approach performed the best. Both eliminated most background noise while preserving the target signal’s details and contours. However, CWR lost weak target signals, such as the rope above the target in the original image. In contrast, our method restored weak signals more effectively, retaining these subtle features almost entirely.
We acknowledge that our current marine environment experiments have limitations regarding sample size and task-specific evaluation. Therefore, while the results are promising, we refrain from making overgeneralizing claims about the performance improvements in practical underwater target recognition. Future work will aim to provide a more comprehensive evaluation with additional quantitative analysis and a broader range of testing conditions.

4. Discussion and Conclusions

In this paper, we propose an enhanced U-Net image restoration neural network (called UP-Net), specifically designed to improve the performance of Underwater Laser Range-Gated Imaging systems by reducing noise interference in their captured images. Based on the U-Net architecture, the network integrates both pixel loss and perceptual loss, enabling it to effectively extract high-level semantic features and preserve critical target details during the reconstruction process—features that are particularly suited to the unique characteristics of Underwater Laser Range-Gated Imaging. To train the network, we constructed a semi-synthetic grayscale dataset based on underwater laser range-gated images, and we conducted comprehensive quantitative and qualitative evaluations.
The experimental results clearly demonstrate that, compared to several existing underwater image restoration methods, our UP-Net contributes to improving the imaging quality of underwater laser range-gated systems. Furthermore, we validated the positive impact of UP-Net on improving underwater laser range-gated image quality through underwater target detection experiments. By suppressing residual noise and reinforcing key semantic information through a VGG16-based perceptual loss, the proposed approach not only restores image quality but also facilitates more accurate detection and recognition of underwater targets. This improvement is critical for maritime safety, search and rescue operations, and military reconnaissance, where the reliable performance of Underwater Laser Range-Gated Imaging systems is paramount.
While the results are promising, the current study has several limitations: the semi-synthetic dataset, although carefully constructed, may not fully capture the variability and complexity of real-world underwater conditions. Additionally, the evaluation in real marine environments was conducted under limited conditions with only a small number of targets, which may not entirely represent broader operational scenarios. In future work, we plan to address these limitations by expanding the dataset to include a broader variety of underwater scenes and conditions, thereby improving the generalizability of the network. Additionally, we intend to conduct extended experiments in real ocean environments, to further validate and refine our approach. Overall, this work establishes a solid foundation for advanced Underwater Laser Range-Gated Imaging techniques and opens promising avenues for enhancing target recognition. These improvements will contribute to ensuring maritime safety, supporting underwater search and rescue, and strengthening military reconnaissance capabilities.

Author Contributions

Data curation, P.L. and S.C.; methodology, P.L., W.H., Y.T. and G.J.; software, P.L., J.W. and D.L.; formal analysis, P.L. and L.C.; investigation, S.C.; resources, W.H. and W.C.; writing—original draft preparation, P.L.; visualization, P.L.; supervision, D.L. and W.C.; project administration, W.C.; funding acquisition, G.J. and W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shenzhen Science and Technology Program (Grant No. JCYJ20220531100612027), the Youth Innovation Promotion Association CAS, the Shenzhen Science and Technology Program (Grant No. GJHZ20210705141403009), and the Water Conservancy Science and Technology Innovation Project of Guangdong Province (Grant No. 2022-03).

Data Availability Statement

The datasets presented in this article are not readily available because they were collected under highly specific experimental conditions using our self-developed Underwater Laser Range-Gated Imaging system. Owing to the unique technical parameters and environmental constraints of this proprietary setup, the data may not be of direct use to other researchers without access to equivalent hardware or contextual calibration.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shen, Y.; Zhao, C.; Liu, Y.; Wang, S.; Huang, F. Underwater Optical Imaging: Key Technologies and Applications Review. IEEE Access 2021, 9, 85500–85514. [Google Scholar] [CrossRef]
  2. Zhang, J.; Han, F.; Han, D.; Yang, J.; Zhao, W.; Li, H. Integration of Sonar and Visual–Inertial Systems for SLAM in Underwater Environments. IEEE Sens. J. 2024, 24, 16792–16804. [Google Scholar] [CrossRef]
  3. Zhao, H.; Xu, X.; Qian, Z.; Shi, H.; Sun, W.; Zhai, J.; Wu, H. High precision underwater 3D imaging of non-cooperative target with frequency comb. Opt. Laser Technol. 2022, 148, 107749. [Google Scholar] [CrossRef]
  4. Eken, İ.C.; Çetin, Y.Y. Underwater target detection with hyperspectral imagery for search and rescue missions. In Proceedings of the Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XXIV; Velez-Reyes, M., Messinger, D.W., Eds.; International Society for Optics and Photonics, SPIE: Bellingham, WA, USA, 2018; Volume 10644, p. 106441Z. [Google Scholar] [CrossRef]
  5. González-Sabbagh, S.P.; Robles-Kelly, A. A Survey on Underwater Computer Vision. ACM Comput. Surv. 2023, 55, 268. [Google Scholar] [CrossRef]
  6. Sudevan, V.; Mankovskii, N.; Javed, S.; Karki, H.; Masi, G.D.; Dias, J. Multisensor fusion for marine infrastructures’ inspection and safety. In Proceedings of the OCEANS 2022, Hampton Roads, VA, USA, 17–20 October 2022; pp. 1–7. [Google Scholar] [CrossRef]
  7. Yang, J.; Li, C.; Lo, L.S.H.; Zhang, X.; Chen, Z.; Gao, J.; U, C.; Dai, Z.; Nakaoka, M.; Yang, H.; et al. Artificial Intelligence-Assisted Environmental DNA Metabarcoding and High-Resolution Underwater Optical Imaging for Noninvasive and Innovative Marine Environmental Monitoring. J. Mar. Sci. Eng. 2024, 12, 1729. [Google Scholar] [CrossRef]
  8. Tegdan, J.; Ekehaug, S.; Hansen, I.M.; Aas, L.M.S.; Steen, K.J.; Pettersen, R.; Beuchel, F.; Camus, L. Underwater hyperspectral imaging for environmental mapping and monitoring of seabed habitats. In Proceedings of the OCEANS 2015—Genova, Genova, Italy, 18–21 May 2015; pp. 1–6. [Google Scholar] [CrossRef]
  9. Yang, Y.; Wang, X.; Sun, L.; Zhong, X.; Lei, P.; Chen, J.; He, J.; Zhou, Y. Binning-based local-threshold filtering for enhancement of underwater 3D gated range-intensity correlation imaging. Opt. Express 2021, 29, 9385–9395. [Google Scholar] [CrossRef]
  10. Risholm, P.; Thorstensen, J.; Thielemann, J.T.; Kaspersen, K.; Tschudi, J.; Yates, C.; Softley, C.; Abrosimov, I.; Alexander, J.; Haugholt, K.H. Real-time super-resolved 3D in turbid water using a fast range-gated CMOS camera. Appl. Opt. 2018, 57, 3927–3937. [Google Scholar] [CrossRef]
  11. Lin, H.; Ma, L.; Hu, Q.; Zhang, X.; Xiong, Z.; Han, H. Single Image Deblurring for Pulsed Laser Range-Gated Imaging System with Multi-Slice Integration. Photonics 2022, 9, 642. [Google Scholar] [CrossRef]
  12. Hu, Y.; Hou, A.; Zhang, X.; Han, F.; Zhao, N.; Xu, S.; Ma, Q.; Gu, Y.; Dong, X.; Chen, Y.; et al. Assessment of Lateral Structural Details of Targets Using Principles of Full-Waveform Light Detection and Ranging. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5704116. [Google Scholar] [CrossRef]
  13. Wang, M.; Wang, X.; Zhang, Y.; Sun, L.; Lei, P.; Yang, Y.; Chen, J.; He, J.; Zhou, Y. Range-intensity-profile prior dehazing method for underwater range-gated imaging. Opt. Express 2021, 29, 7630–7640. [Google Scholar] [CrossRef]
  14. Mariani, P.; Quincoces, I.; Haugholt, K.H.; Chardard, Y.; Visser, A.W.; Yates, C.; Piccinno, G.; Reali, G.; Risholm, P.; Thielemann, J.T. Range-Gated Imaging System for Underwater Monitoring in Ocean Environment. Sustainability 2019, 11, 162. [Google Scholar] [CrossRef]
  15. Lin, H.; Zhang, X.; Ma, L.; Hu, Q.; Jin, D. Estimation of water attenuation coefficient by imaging modeling of the backscattered light with the pulsed laser range-gated imaging system. Opt. Contin. 2022, 1, 989–1002. [Google Scholar] [CrossRef]
  16. Yang, X.; Liu, Y.; Mou, X.; Hu, T.; Yuan, F.; Cheng, E. Imaging in turbid water based on a Hadamard single-pixel imaging system. Opt. Express 2021, 29, 12010–12023. [Google Scholar] [CrossRef] [PubMed]
  17. Liu, F.; Han, P.; Wei, Y.; Yang, K.; Huang, S.; Li, X.; Zhang, G.; Bai, L.; Shao, X. Deeply seeing through highly turbid water by active polarization imaging. Opt. Lett. 2018, 43, 4903–4906. [Google Scholar] [CrossRef]
  18. Yu, T.; Wang, X.; Xi, S.; Mu, Q.; Zhu, Z. Underwater polarization imaging for visibility enhancement of moving targets in turbid environments. Opt. Express 2023, 31, 459–468. [Google Scholar] [CrossRef]
  19. Jin, X.; Du, D.; Jin, J.; Fan, Y. Time-of-flight based imaging in strong scattering underwater environments. Opt. Express 2024, 32, 37247–37259. [Google Scholar] [CrossRef]
  20. Liu, L.; Li, X.; Yang, J.; Tian, X.; Liu, L. Target recognition and segmentation in turbid water using data from non-turbid conditions: A unified approach and experimental validation. Opt. Express 2024, 32, 20654–20668. [Google Scholar] [CrossRef]
  21. He, K.; Sun, J.; Tang, X. Single Image Haze Removal Using Dark Channel Prior. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 2341–2353. [Google Scholar] [CrossRef]
  22. Drews, P.L.; Nascimento, E.R.; Botelho, S.S.; Montenegro Campos, M.F. Underwater Depth Estimation and Image Restoration Based on Single Images. IEEE Comput. Graph. Appl. 2016, 36, 24–35. [Google Scholar] [CrossRef]
  23. Peng, Y.T.; Cosman, P.C. Underwater Image Restoration Based on Image Blurriness and Light Absorption. IEEE Trans. Image Process. 2017, 26, 1579–1594. [Google Scholar] [CrossRef]
  24. Carlevaris-Bianco, N.; Mohan, A.; Eustice, R.M. Initial results in underwater single image dehazing. In Proceedings of the OCEANS 2010 MTS/IEEE SEATTLE, Seattle, WA, USA, 20–23 September 2010; pp. 1–8. [Google Scholar] [CrossRef]
  25. Chiang, J.Y.; Chen, Y.C. Underwater Image Enhancement by Wavelength Compensation and Dehazing. IEEE Trans. Image Process. 2012, 21, 1756–1769. [Google Scholar] [CrossRef] [PubMed]
  26. Hou, G.; Li, N.; Zhuang, P.; Li, K.; Sun, H.; Li, C. Non-Uniform Illumination Underwater Image Restoration via Illumination Channel Sparsity Prior. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 799–814. [Google Scholar] [CrossRef]
  27. Fabbri, C.; Islam, M.J.; Sattar, J. Enhancing Underwater Imagery Using Generative Adversarial Networks. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 7159–7165. [Google Scholar] [CrossRef]
  28. Liu, P.; Wang, G.; Qi, H.; Zhang, C.; Zheng, H.; Yu, Z. Underwater Image Enhancement With a Deep Residual Framework. IEEE Access 2019, 7, 94614–94629. [Google Scholar] [CrossRef]
  29. Li, C.; Anwar, S.; Porikli, F. Underwater scene prior inspired deep underwater image and video enhancement. Pattern Recognit. 2020, 98, 107038. [Google Scholar] [CrossRef]
  30. Han, J.; Shoeiby, M.; Malthus, T.; Botha, E.; Anstee, J.; Anwar, S.; Wei, R.; Petersson, L.; Armin, M.A. Single Underwater Image Restoration by Contrastive Learning. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021; pp. 2385–2388. [Google Scholar] [CrossRef]
  31. Fu, Z.; Lin, H.; Yang, Y.; Chai, S.; Sun, L.; Huang, Y.; Ding, X. Unsupervised Underwater Image Restoration: From a Homology Perspective. Proc. AAAI Conf. Artif. Intell. 2022, 36, 643–651. [Google Scholar] [CrossRef]
  32. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Munich, Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar] [CrossRef]
  33. Zhang, Y.; Wang, X.; Sun, L.; Lei, P.; Chen, J.; He, J.; Zhou, Y.; Liu, Y. Mask-guided deep learning fishing net detection and recognition based on underwater range gated laser imaging. Opt. Laser Technol. 2024, 171, 110402. [Google Scholar] [CrossRef]
  34. Wang, M.; Wang, X.; Sun, L.; Yang, Y.; Zhou, Y. Underwater 3D deblurring-gated range-intensity correlation imaging. Opt. Lett. 2020, 45, 1455–1458. [Google Scholar] [CrossRef]
  35. Tan, C.S.; Sluzek, A.; Seet, G.G. Model of gated imaging in turbid media. Opt. Eng. 2005, 44, 116002. [Google Scholar] [CrossRef]
  36. Walker, R.E.; McLean, J.W. Lidar equations for turbid media with pulse stretching. Appl. Opt. 1999, 38, 2384–2397. [Google Scholar] [CrossRef]
  37. Li, C.; Guo, C.; Ren, W.; Cong, R.; Hou, J.; Kwong, S.; Tao, D. An Underwater Image Enhancement Benchmark Dataset and Beyond. IEEE Trans. Image Process. 2020, 29, 4376–4389. [Google Scholar] [CrossRef]
  38. Duarte, A.; Codevilla, F.; Gaya, J.D.O.; Botelho, S.S.C. A dataset to evaluate underwater image restoration methods. In Proceedings of the OCEANS 2016—Shanghai, Shanghai, China, 10–13 April 2016; pp. 1–6. [Google Scholar] [CrossRef]
  39. Liu, R.; Fan, X.; Zhu, M.; Hou, M.; Luo, Z. Real-World Underwater Enhancement: Challenges, Benchmarks, and Solutions Under Natural Light. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 4861–4875. [Google Scholar] [CrossRef]
  40. Islam, M.J.; Xia, Y.; Sattar, J. Fast Underwater Image Enhancement for Improved Visual Perception. IEEE Robot. Autom. Lett. 2020, 5, 3227–3234. [Google Scholar] [CrossRef]
  41. Li, J.; Skinner, K.A.; Eustice, R.M.; Johnson-Roberson, M. WaterGAN: Unsupervised Generative Network to Enable Real-Time Color Correction of Monocular Underwater Images. IEEE Robot. Autom. Lett. 2018, 3, 387–394. [Google Scholar] [CrossRef]
  42. Berman, D.; Levy, D.; Avidan, S.; Treibitz, T. Underwater Single Image Color Restoration Using Haze-Lines and a New Quantitative Dataset. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 2822–2837. [Google Scholar] [CrossRef]
  43. Hashisho, Y.; Albadawi, M.; Krause, T.; von Lukas, U.F. Underwater Color Restoration Using U-Net Denoising Autoencoder. In Proceedings of the 2019 11th International Symposium on Image and Signal Processing and Analysis (ISPA), Dubrovnik, Croatia, 23–25 September 2019; pp. 117–122. [Google Scholar] [CrossRef]
  44. Weigert, M.; Schmidt, U.; Boothe, T.; Müller, A.; Dibrov, A.; Jain, A.; Wilhelm, B.; Schmidt, D.; Broaddus, C.; Culley, S.; et al. Content-Aware Image Restoration: Pushing the Limits of Fluorescence Microscopy. Nat. Methods 2018, 15, 1090–1097. [Google Scholar] [CrossRef]
  45. Marcos, L.; Alirezaie, J.; Babyn, P. Low Dose CT Image Denoising Using Boosting Attention Fusion GAN with Perceptual Loss. In Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Virtual Conference, 1–5 November 2021; pp. 3407–3410. [Google Scholar] [CrossRef]
  46. Fu, L.; Majeed, Y.; Zhang, X.; Karkee, M.; Zhang, Q. Faster R–CNN–based apple detection in dense-foliage fruiting-wall trees using RGB and depth features for robotic harvesting. Biosyst. Eng. 2020, 197, 245–256. [Google Scholar] [CrossRef]
  47. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  48. Yang, Q.; Yan, P.; Zhang, Y.; Yu, H.; Shi, Y.; Mou, X.; Kalra, M.K.; Zhang, Y.; Sun, L.; Wang, G. Low-Dose CT Image Denoising Using a Generative Adversarial Network With Wasserstein Distance and Perceptual Loss. IEEE Trans. Med. Imaging 2018, 37, 1348–1357. [Google Scholar] [CrossRef]
  49. Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar] [CrossRef]
  50. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  51. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
  52. Bosse, S.; Maniry, D.; Müller, K.R.; Wiegand, T.; Samek, W. Deep Neural Networks for No-Reference and Full-Reference Image Quality Assessment. IEEE Trans. Image Process. 2018, 27, 206–219. [Google Scholar] [CrossRef]
  53. Wang, Z.; Bovik, A.; Sheikh, H.; Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  54. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595. [Google Scholar] [CrossRef]
  55. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  56. Yang, M.; Sowmya, A. An Underwater Color Image Quality Evaluation Metric. IEEE Trans. Image Process. 2015, 24, 6062–6071. [Google Scholar] [CrossRef]
  57. Panetta, K.; Gao, C.; Agaian, S. Human-Visual-System-Inspired Underwater Image Quality Measures. IEEE J. Ocean. Eng. 2016, 41, 541–551. [Google Scholar] [CrossRef]
  58. Jagalingam, P.; Hegde, A.V. A Review of Quality Metrics for Fused Image. Aquat. Procedia 2015, 4, 133–142. [Google Scholar] [CrossRef]
  59. Ma, C.; Yang, C.Y.; Yang, X.; Yang, M.H. Learning a no-reference quality metric for single-image super-resolution. Comput. Vis. Image Underst. 2017, 158, 1–16. [Google Scholar] [CrossRef]
  60. Song, P.; Zhao, L.; Li, H.; Xue, X.; Liu, H. RSE-YOLOv8: An Algorithm for Underwater Biological Target Detection. Sensors 2024, 24, 6030. [Google Scholar] [CrossRef]
  61. Song, G.; Chen, W.; Zhou, Q.; Guo, C. Underwater Robot Target Detection Algorithm Based on YOLOv8. Electronics 2024, 13, 3374. [Google Scholar] [CrossRef]
  62. Jocher, G.; Qiu, J. Ultralytics YOLO11. Version 11.0.0. License: AGPL-3.0. 2024. Available online: https://github.com/ultralytics/ultralytics (accessed on 3 December 2024).
  63. Mao, M.; Hong, M. YOLO Object Detection for Real-Time Fabric Defect Inspection in the Textile Industry: A Review of YOLOv1 to YOLOv11. Sensors 2025, 25, 2270. [Google Scholar] [CrossRef]
Figure 1. Schematic of the Underwater Laser Range-Gated Imaging technique.
Figure 2. The Underwater Laser Range-Gated Imaging system: (a) A 3D model overview of the system. (b) Photograph of the system in the air. (c) The system being tested in the water tank.
Figure 3. The synthesis process of the degraded images.
Figure 4. The structure of UP-Net.
Figure 5. Qualitative comparisons for samples from the test set. From left to right: the Synthetic Degraded underwater Images (SDIs), the underwater reference images, and the results of UDCP, WCID, UGAN, UResnet, CWR, and our method. Samples (a–c) show targets that were included in the training set, while samples (d,e) show previously unseen targets that were not part of the training set.
Figure 6. Qualitative comparisons for samples from the real degraded set. From left to right: the raw images (real degraded images) and the results of UDCP, WCID, UGAN, UResnet, CWR, and our method. Samples (a,b) show targets included in the training set, whereas samples (c–e) show targets that were not part of the training set.
Figure 7. Visual comparison of underwater object detection on real degraded images (raw) and on images restored by UDCP, WCID, UGAN, UResnet, CWR, and our UP-Net. Panels (a–d) show four different targets from the real degraded set.
Figure 8. Restored results of the sample image using different loss functions. Panels (c–h) show, respectively, the results restored with the L1, L2, SSIM, VGG16, L2 + SSIM, and L2 + VGG16 loss functions.
Figure 9. Real marine environment testing: (a) 2D views of the sea trial location; (b) location of our system during marine testing (red arrow); (c) image of the target captured in air with a standard color camera; (d) degraded image of the target captured underwater by the visible-light camera in our sensor; (e) degraded image of the target captured by the Underwater Laser Range-Gated Imaging system.
Figure 10. Results of testing in real marine environments.
Table 1. Composition and characteristics of Underwater Laser Range-Gated Imaging dataset.

| Category | Description | Quantity | Characteristics |
|---|---|---|---|
| Synthetic Data Generation | Reference images (clear water) | 800 | 8-bit grayscale, 1388 × 1038 px |
| | Background noise (no target) | 4 | milk-induced turbidity |
| | Degraded image pairs | 3200 | additive noise |
| Data Partitioning | Training set | 2560 | four targets, multi-view, multi-distance |
| | Test set | 104 | six targets (two new targets) |
| | Real degraded set | 100 | seven targets (three new targets) |
| Preprocessing | Patch extraction | 8 per image | 256 × 256 px |
| | Geometric augmentation | 8 types | random rotations, flips |
| | Final training pairs | 20,480 | 19,456 train + 1024 val |
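As an illustration of the preprocessing summarized in Table 1, the sketch below shows one way the aligned patch extraction and the eight geometric augmentations could be implemented. It is a minimal sketch under assumptions: the paper does not state whether patches are cropped randomly or on a regular grid, so random aligned crops and the eight rotation/flip combinations of the square are used here as plausible choices (eight patches from each of the 2560 training images account for the 20,480 training pairs, with the augmentations presumably applied at random during training).

```python
import numpy as np


def extract_patch_pairs(degraded, reference, n_patches=8, size=256, rng=None):
    """Crop n aligned size x size patches from a degraded/reference image pair."""
    rng = rng or np.random.default_rng()
    h, w = degraded.shape
    pairs = []
    for _ in range(n_patches):
        y = int(rng.integers(0, h - size + 1))
        x = int(rng.integers(0, w - size + 1))
        pairs.append((degraded[y:y + size, x:x + size],
                      reference[y:y + size, x:x + size]))
    return pairs


def geometric_augment(patch, mode):
    """Apply one of the eight rotation/flip combinations (mode in 0..7)."""
    rotated = np.rot90(patch, k=mode % 4)
    return np.fliplr(rotated) if mode >= 4 else rotated
```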
Table 2. Full-reference image quality evaluation of the test set.

| Method | UDCP | WCID | UGAN | UResnet | CWR | Ours |
|---|---|---|---|---|---|---|
| MSE ↓ | 277.0137 | 251.5621 | 82.6599 | 30.9333 | 34.2218 | 13.1962 |
| PSNR (dB) ↑ | 26.0802 | 26.1879 | 30.4068 | 35.2844 | 37.7635 | 40.8241 |
| SSIM ↑ | 0.2961 | 0.4446 | 0.9301 | 0.8140 | 0.9560 | 0.9572 |
| LPIPS ↓ | 0.4782 | 0.4633 | 0.1200 | 0.1452 | 0.0303 | 0.0095 |
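For reproducibility, the full-reference scores in Table 2 correspond to standard metric definitions; the sketch below computes MSE, PSNR, and SSIM for 8-bit grayscale image pairs with scikit-image (LPIPS additionally requires a learned perceptual model, e.g., the lpips package, and is omitted here). The function name is illustrative.

```python
import numpy as np
from skimage.metrics import (mean_squared_error,
                             peak_signal_noise_ratio,
                             structural_similarity)


def full_reference_scores(restored: np.ndarray, reference: np.ndarray) -> dict:
    """MSE / PSNR / SSIM between a restored image and its clear-water reference."""
    return {
        "MSE": mean_squared_error(reference, restored),
        "PSNR_dB": peak_signal_noise_ratio(reference, restored, data_range=255),
        "SSIM": structural_similarity(reference, restored, data_range=255),
    }
```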
Table 3. Non-reference image quality evaluation of the real degraded set.

| Method | UDCP | WCID | UGAN | UResnet | CWR | Ours |
|---|---|---|---|---|---|---|
| SD ↓ | 3.7773 | 1.3443 | 5.4470 | 0.6650 | 0.4909 | 0.4670 |
| SNR (dB) ↑ | 3.2825 | 16.7009 | 0.8951 | 21.2008 | 11.1036 | 22.6883 |
| NRQM ↑ | 2.6789 | 2.7319 | 2.6732 | 2.7657 | 2.7647 | 2.8053 |
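The exact definitions of SD and SNR used for the no-reference evaluation in Table 3 are not reproduced here; a common convention, sketched below, takes SD as the intensity standard deviation of a target-free background region and SNR as the mean target intensity relative to that background noise level in decibels. The region masks and the 20·log10 convention are assumptions, not the paper's confirmed formulas.

```python
import numpy as np


def background_sd(img: np.ndarray, bg_mask: np.ndarray) -> float:
    """Residual-noise level: intensity standard deviation inside a target-free region."""
    return float(img[bg_mask].std())


def snr_db(img: np.ndarray, target_mask: np.ndarray, bg_mask: np.ndarray) -> float:
    """Mean target intensity relative to background noise, expressed in dB."""
    noise = img[bg_mask].std() + 1e-8  # guard against division by zero
    return float(20.0 * np.log10(img[target_mask].mean() / noise))
```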
Table 4. Object detection performance of the images restored by different restoration methods.

| Method | Raw | UDCP | WCID | UGAN | UResnet | CWR | Ours |
|---|---|---|---|---|---|---|---|
| P (%) ↑ | 0 | 0 | 0 | 0.243 | 0.233 | 0.672 | 0.868 |
| R (%) ↑ | 0 | 0 | 0 | 0.174 | 0.167 | 0.553 | 0.732 |
| mAP@50 (%) ↑ | 0 | 0 | 0 | 0.132 | 0.216 | 0.566 | 0.835 |
| mAP@50–95 (%) ↑ | 0 | 0 | 0 | 0.032 | 0.047 | 0.232 | 0.385 |
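The detection scores in Table 4 (precision, recall, mAP@50, and mAP@50–95) are standard detector-evaluation outputs; the sketch below shows how they might be collected with the Ultralytics YOLO API, assuming a YOLO-family detector trained on range-gated target images. The weight file and dataset YAML names are placeholders, not artifacts released with this paper.

```python
from ultralytics import YOLO

# Hypothetical paths: a detector trained on range-gated target images, and a dataset
# YAML whose validation split points at images restored by the method under test.
model = YOLO("gated_target_detector.pt")
metrics = model.val(data="restored_val.yaml")

print(f"P          : {metrics.box.mp:.3f}")   # mean precision over classes
print(f"R          : {metrics.box.mr:.3f}")   # mean recall over classes
print(f"mAP@50     : {metrics.box.map50:.3f}")
print(f"mAP@50-95  : {metrics.box.map:.3f}")
```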
Table 5. Quantitative comparison of different λ values in the combined loss function.

| Loss | λ = 0.1 | λ = 0.3 | λ = 0.5 | λ = 0.7 | λ = 0.9 |
|---|---|---|---|---|---|
| MSE ↓ | 34.9015 | 19.0582 | 16.6284 | 13.1962 | 13.2337 |
| PSNR (dB) ↑ | 38.5171 | 40.1177 | 39.8979 | 40.8241 | 39.6251 |
| SSIM ↑ | 0.7897 | 0.8018 | 0.7638 | 0.9572 | 0.8418 |
| LPIPS ↓ | 0.0202 | 0.0083 | 0.0089 | 0.0095 | 0.0079 |
Table 6. Different loss functions for underwater image restoration.

| Loss Function | Mathematical Formula |
|---|---|
| L1 loss (MAE) | $L_{L1} = \frac{1}{N} \sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert$ |
| L2 loss (MSE) | $L_{L2} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$ |
| SSIM loss | $L_{SSIM} = 1 - \mathrm{SSIM}(y, \hat{y})$ |
| VGG16 loss (perceptual loss) | $L_{perceptual} = \frac{1}{N} \sum_{i=1}^{N} \lVert \phi(y_i) - \phi(\hat{y}_i) \rVert_2^2$ |
| L2 + SSIM loss | $L_{L2+SSIM} = \alpha L_{L2} + (1 - \alpha) L_{SSIM}$ |
| L2 + VGG16 loss | $L_{L2+perceptual} = \lambda L_{L2} + (1 - \lambda) L_{VGG16}$ |
Table 7. Quantitative results of different loss functions evaluated on the test dataset.

| Loss | L1 | L2 | SSIM | VGG16 | L2 + SSIM | L2 + VGG16 |
|---|---|---|---|---|---|---|
| MSE ↓ | 44.8974 | 18.1050 | 23.4942 | 13.0532 | 12.2351 | 13.1962 |
| PSNR (dB) ↑ | 38.3087 | 37.7510 | 37.2081 | 40.8113 | 39.9549 | 40.8241 |
| SSIM ↑ | 0.8810 | 0.9404 | 0.8169 | 0.8406 | 0.9156 | 0.9572 |
| LPIPS ↓ | 0.0465 | 0.0283 | 0.0642 | 0.0187 | 0.0176 | 0.0095 |
Table 8. Impact of network depth and residual connections.

| Depth | Residual | MSE ↓ | PSNR (dB) ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|
| 3 | No | 26.7505 | 38.5892 | 0.6920 | 0.0192 |
| 3 | Yes | 26.2029 | 39.5384 | 0.8071 | 0.0228 |
| 4 | No | 16.9187 | 40.2322 | 0.7742 | 0.0110 |
| 4 | Yes | 13.1962 | 40.8241 | 0.9572 | 0.0095 |
| 5 | No | 23.0809 | 38.2186 | 0.6909 | 0.0152 |
| 5 | Yes | 31.6297 | 38.2175 | 0.7107 | 0.0261 |
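To make the "Residual" column in Table 8 concrete, the sketch below shows one plausible form of a U-Net convolution block with a residual connection wrapped around its two 3 × 3 convolutions; the exact block layout of UP-Net (normalization, activation order, channel widths) is not specified here, so these choices are assumptions. "Depth" in the table presumably counts the number of down-sampling stages built from such blocks.

```python
import torch
import torch.nn as nn


class ResidualDoubleConv(nn.Module):
    """U-Net encoder/decoder block with a residual connection around two 3x3 convolutions."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection so the identity path matches the output channel count.
        self.proj = nn.Conv2d(in_ch, out_ch, kernel_size=1) if in_ch != out_ch else nn.Identity()
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.body(x) + self.proj(x))
```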