1. Introduction
The detection and identification of underwater targets such as shipwrecks, aircraft wreckages, pipelines, and reefs are crucial tasks in marine science research, resource exploration, and ocean mapping [1]. These tasks have significant implications for maritime traffic safety, marine fishery development, sonar detection, and military operations. Side-scan sonar, known for its high-resolution imaging capabilities, has long been the preferred technology for underwater target detection and identification [2,3,4,5]. However, due to limitations in range and the complexity of the measurement environment, a large number of low-resolution side-scan sonar images still exist, greatly hindering the development of underwater target identification. Super-resolution (SR), a technique that enhances image resolution through reconstruction, can improve the quality of underwater sonar images and plays a crucial role in enhancing target detection accuracy [2,3,4,5], target segmentation [6,7,8], and other imaging scenarios [9,10,11,12].
Super-resolution (SR) is an image processing technique aimed at reconstructing high-resolution (HR) images from low-resolution (LR) images or videos [13]. Depending on the number of input images, SR can be classified into single-image SR (SISR) or multi-image SR (MISR). SISR generates a high-resolution image from a single low-resolution image and is widely used in environmental monitoring, medical image processing, video surveillance, and security. The key to SISR lies in learning the mapping relationship between LR and HR images, and previous studies have proposed various methods to learn this mapping. Traditional filtering and interpolation techniques such as linear interpolation [14,15] generate HR images from neighborhood information. Although these methods are computationally efficient, they oversimplify the LR-to-HR mapping, producing overly smooth images that lack important details, especially texture and target edges.
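To make this limitation concrete, the following minimal NumPy sketch (an illustration, not one of the cited methods) implements bilinear upsampling: every output pixel is a convex combination of at most four LR neighbors, so the result can only smooth existing values and never recover lost texture or sharp target edges.

```python
import numpy as np

def bilinear_upsample(img: np.ndarray, scale: int) -> np.ndarray:
    """Upsample a 2-D grayscale image by `scale` using bilinear interpolation."""
    h, w = img.shape
    out_h, out_w = h * scale, w * scale
    # Map each output pixel centre back to fractional source coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) / scale - 0.5, 0, h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) / scale - 0.5, 0, w - 1)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    # Blend the four nearest LR neighbours; no new intensities can appear.
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

lr = np.array([[0.0, 100.0], [100.0, 0.0]])
hr = bilinear_upsample(lr, 2)   # 4x4 result: smooth gradients, no new texture
```

Because the output is always bounded by the input values, a sharp LR edge becomes a gradual ramp in the HR result, which is exactly the over-smoothing described above.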
With the development of deep learning, convolutional neural networks (CNNs) were introduced to the SR field by Dong et al. [16]. As deep neural networks evolved, many researchers designed powerful architectures such as ResNet [17], DenseNet [18], and Residual Dense Blocks (RDBs) to further enhance SR models. These include multi-layer networks such as VDSR, the enhanced deep SR (EDSR) network [19], the deep recursive convolutional network (DRCN) [20], and very deep residual channel attention networks (RCANs) [21].
Despite the significant breakthroughs achieved by complex CNNs in image super-resolution reconstruction, issues such as blurred edges, high noise levels, and poor perceptual quality persist. The advent of generative adversarial networks (GANs) led to their introduction into the SR domain by Ledig et al. [22]. Since the emergence of SRGAN [22], numerous GAN-based models have been applied to super-resolution image generation. Enhanced SRGAN (ESRGAN) reduced image artifacts by extending residual blocks into Residual-in-Residual Dense Blocks (RRDBs), while fine-grained-attention GANs (FASRGANs) [23] improved the capability to generate high-quality images through image scoring.
Despite the significant achievements of super-resolution (SR) in optical images [24,25,26,27], the effectiveness of SR generation for underwater samples remains unsatisfactory. The complexity and diversity of interference in underwater environments result in substantial differences in texture and detail between sonar images and optical images. Existing algorithms such as RCAN [21], EDSR [19], SRResNet [28], and SRGAN have demonstrated suboptimal performance in the super-resolution reconstruction of side-scan sonar images. Specifically, EDSR and SRGANs exhibit limited capabilities in image restoration and texture reproduction, leading to unclear images. Furthermore, RCANs and SRResNet suffer from noise amplification while enhancing image resolution.
To address these issues and improve the quality of super-resolution reconstruction for side-scan sonar images, this paper proposes a multi-scale generative adversarial network (SIGAN). The network details are as follows. The generator uses Residual Dense Blocks (RDBs) to build a five-layer Residual Dense Network (RDN), extracting rich local features through densely connected convolutional layers [28,29,30,31]. The final layer of the generator incorporates a Convolutional Block Attention Module (CBAM) to capture detailed texture information by focusing on different scales and channels [32,33]. The discriminator adopts a multi-scale discriminative structure to enhance the detail perception of both generated and high-resolution (HR) images through comprehensive scale-aware judgment of sonar images [34,35]. Additionally, considering the increased noise in super-resolved sonar images, our loss function emphasizes the peak signal-to-noise ratio (PSNR) to reduce noise and improve output image quality. We also constructed a side-scan sonar dataset (DNASI-I) for the experiments. Subjective qualitative comparisons and objective quantitative analyses against current state-of-the-art super-resolution reconstruction methods were conducted on the public dataset KLSG-II and our dataset DNASI-I. The experimental results demonstrated the significant effectiveness and superiority of SIGAN in the super-resolution reconstruction of side-scan sonar images.
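To illustrate the dense-connection idea behind the generator's RDBs, the following minimal PyTorch sketch shows one such block; the channel count, growth rate, and depth here are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class RDB(nn.Module):
    """Residual Dense Block (sketch): densely connected convs + local fusion.

    Hyperparameters below are illustrative, not the SIGAN configuration.
    """
    def __init__(self, channels: int = 64, growth: int = 32, layers: int = 4):
        super().__init__()
        self.convs = nn.ModuleList()
        for i in range(layers):
            # Each conv sees the block input plus all earlier layer outputs.
            self.convs.append(
                nn.Sequential(
                    nn.Conv2d(channels + i * growth, growth, 3, padding=1),
                    nn.ReLU(inplace=True)))
        # 1x1 conv fuses all concatenated features back to `channels`.
        self.fuse = nn.Conv2d(channels + layers * growth, channels, 1)

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(conv(torch.cat(feats, dim=1)))  # dense connectivity
        return x + self.fuse(torch.cat(feats, dim=1))    # local residual learning

x = torch.randn(1, 64, 32, 32)
y = RDB()(x)        # spatial size and channel count are preserved
```

Because the block preserves its input shape, several such blocks can be stacked into an RDN, with each conv layer reusing every earlier feature map.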
3. Experimental Validation
To demonstrate the practicality and efficacy of the SIGAN in the super-resolution reconstruction of side-scan sonar images, we conducted comparative analyses with several state-of-the-art super-resolution (SR) networks, including EDSR, the SRGAN, the RCAN, and MSRResNet. To ensure a fair comparison, these models were evaluated under the same dataset conditions. The hardware setup for model training comprised two Intel Xeon Silver 4410T processors and four NVIDIA GeForce RTX 4090 graphics cards. The software environment was configured with PyTorch 1.6.0, CUDA 11.8, and Python 3.10 on Windows 10.
3.1. Datasets
Given the scarcity of publicly available side-scan sonar datasets, we collected underwater sonar images from diverse scenarios and locations across the country. The data sources included publicly available datasets (e.g., KLSG-II), datasets obtained through collaborations with other universities, and data from in situ ocean exploration experiments. The dataset encompassed a wide range of content, including seabed shipwrecks, aircraft wreckages, underwater rocks, schools of fish, divers, and seabed sand dunes. We compiled these data to create our dataset, named DNASI-I. To validate our methodology, we selected two datasets for experimentation: the publicly available side-scan sonar dataset KLSG-II and our proprietary dataset DNASI-I. The KLSG-II dataset is available for download on GitHub (https://github.com/HHUCzCz/-SeabedObjects-KLSG--II, accessed on 3 June 2024). Both datasets underwent identical processing. Specifically, we randomly cropped all images to 128 × 128 pixels, covering three categories: seabed shipwrecks, aircraft wreckages, and underwater rocks. Finally, we selected 100 images with distinct target texture structures and easy comparability from our dataset DNASI-I to serve as the test set (Set100). Sample images from the datasets are shown in Figure 4.
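The random 128 × 128 cropping step can be sketched as follows; this is a minimal NumPy illustration, with a random array standing in for a raw side-scan image.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(img: np.ndarray, size: int = 128) -> np.ndarray:
    """Randomly crop a `size` x `size` patch from a larger 2-D image."""
    h, w = img.shape[:2]
    assert h >= size and w >= size, "image smaller than crop size"
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

sonar = rng.random((600, 600))      # stand-in for a raw side-scan sonar image
patch = random_crop(sonar, 128)     # one 128 x 128 dataset sample
```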
3.2. Objective Evaluation Metrics
The evaluation metrics adopted in this study included peak signal-to-noise ratio (PSNR), the Structural Similarity Index (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS). The PSNR was used to calculate the mean squared error between two images, deriving a peak signal-to-noise ratio to assess the similarity between the training and generated images. The SSIM quantifies structural information from the perspective of image composition, independent of brightness and contrast, reflecting the structural integrity of objects within the scene. LPIPS measures the perceptual differences between two images by learning the inverse mapping from the generated to real images, prioritizing perceptual similarity. These metrics collectively assist in comprehensively assessing the quality of generated images.
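As a quick illustration of the PSNR definition used above (a sketch on synthetic arrays, not sonar data): PSNR is the log-ratio of the squared peak value to the mean squared error between the two images.

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.full((8, 8), 120, dtype=np.uint8)
test = ref + 10                      # constant error of 10 grey levels -> MSE = 100
print(round(psnr(ref, test), 2))     # 28.13
```

SSIM and LPIPS are more involved (windowed statistics and a learned feature distance, respectively) and are typically computed with library implementations rather than by hand.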
3.3. Visual Comparison with Other SR Methods
We trained the models using the RCAN, EDSR, MSRResNet, the SRGAN, and our method (the SIGAN). Subjective visual comparisons were conducted for magnification factors of r = 2, r = 4, and r = 8.
Figure 5 shows the results generated by training our SIGAN model on the KLSG-II dataset.
Figure 6 presents the results generated by different networks trained on the KLSG-II dataset.
A comparative analysis was performed using images of underwater rocks, aircraft, and shipwrecks. At a magnification of r = 2, images generated by the RCAN, EDSR, and the SRGAN exhibited pronounced sharpening effects, resulting in overly harsh edges and image distortions. Conversely, images from MSRResNet appeared overly smooth, leading to a loss in high-frequency information. At a magnification of r = 4, while EDSR-based methods managed to restore some textures, they also introduced artifacts not present in the original images, causing distortions. The RCAN and SRGAN methods produced images with unclear edges and blurring. The images from MSRResNet continued to suffer from excessive smoothness, resulting in distortions. At a magnification of r = 8, a detailed image of a sunken ship was analyzed. Comparisons of the ship’s surface structure revealed that MSRResNet and EDSR restored images with uneven edges, though some high-frequency details were preserved. The RCAN and SRGAN handled edge details well but resulted in overall blurry images.
In contrast, our SIGAN method, utilizing attention mechanisms, focused more adeptly on the super-resolution generation of targets, thus offering significant advantages in handling edge details. For instance, at a magnification of r = 4 in Figure 6, the SIGAN effectively managed the edge details of the aircraft wreckage, enhancing edge resolution while preserving high-frequency image details. Due to the loss function’s emphasis on the PSNR metric, our method produced noticeably fewer image artifacts compared to other methods, successfully avoiding the introduction of noise while enhancing resolution.
Training was also conducted on the DNASI-I dataset, with the results shown in Figure 7.
Representative images were selected for comparison. At a magnification of r = 2, images generated by the RCAN, EDSR, and the SRGAN still exhibited pronounced sharpening effects. At a magnification of r = 4, a noisier image was selected for comparison, revealing significant noise increase in MSRResNet and the SRGAN, while our method managed noise effectively, not exacerbating it with increased resolution. At a magnification of r = 8, an image with detailed horizontal features of a sunken ship showed that the deck railing structures restored by the RCAN and SRGAN were blurry and discontinuous, yielding suboptimal results.
In contrast, our SIGAN approach provided clear advantages, especially noticeable at higher magnifications. The attention mechanism in the SIGAN allowed for superior detail preservation and clarity, demonstrating enhanced capability in handling complex image textures without compromising on noise, which is crucial for practical applications in underwater imaging.
3.4. Objective Evaluation with Other Methods
Table 1 presents the quantitative analysis of our SIGAN method on the KLSG-II dataset, using magnification factors of r = 2, r = 4, and r = 8 for different categories such as shipwrecks, aircraft, and underwater rocks. We calculated the average results for PSNR, SSIM, and LPIPS. The most outstanding performance indicators are highlighted in bold.
An analysis of Table 1 revealed that our SIGAN method consistently outperformed other methods across all scales, demonstrating significantly better average values for each evaluation metric. Specifically:
PSNR: The SIGAN and EDSR generally showed higher PSNR values, particularly at a magnification factor of 2, where both algorithms demonstrated strong performance across multiple categories. Notably, at r = 4, the SIGAN achieved an average PSNR of 25.95 dB on Set100, producing results visually close to the high-resolution images.
SSIM: The SIGAN consistently performed well across different magnification factors and categories, particularly in the “Shipwreck” and “Aircraft” categories, where the SSIM values were high, indicating good structural fidelity of the images.
LPIPS: Low LPIPS values indicated that the perceived quality of the images was closer to the original. EDSR stood out in the “Set100” category, showcasing its advantages in perceived image quality. Similarly, the SIGAN showed lower LPIPS values at lower magnification factors (such as 2×) in the “Shipwreck” and “Aircraft” categories, indicating superior perceptual quality.
We also conducted a quantitative analysis of the various methods on the DNASI-I dataset, with the results presented in Table 2. The best-performing data are highlighted in bold.
From the analysis of Table 2, it is evident that the SIGAN continued to excel on the DNASI-I dataset. Although performance generally declined with increasing magnification factors, our method maintained a certain level of stability without significant drops. In Table 2, the SIGAN’s performance on the Set100 test set significantly surpassed that of the other methods on every evaluation metric. The quantitative comparisons clearly demonstrate that our method is superior. At r = 4, the average PSNR on Set100 was higher by 3.5 dB, illustrating the clarity and detail preserved by the SIGAN.
Comparing Table 1 and Table 2, it is apparent that models trained on the DNASI-I dataset perform better than those trained on the KLSG-II dataset at the same scale, whether for our SIGAN approach or for other networks such as EDSR. This indirectly attests to the higher quality of our dataset for super-resolution tasks.
4. Discussion
The essence of deep-learning-based side-scan sonar image super-resolution is to extract edge detail features from targets, such as the bows, masts, and shadows of shipwrecks, from complex models. These features, learned from high-resolution images, are then applied to enhance the detail in low-resolution images. Considering that the ultimate goal of image super-resolution reconstruction is to improve the performance of downstream target recognition tasks, this discussion focuses on analyzing the performance of SIGAN-generated super-resolution reconstruction images in target recognition tasks. Additionally, we conducted ablation studies to explore the contributions of various components of the network, aiming to understand how super-resolution enhances the accuracy of target recognition.
4.1. Performance in Target Recognition
To assess the effectiveness of super-resolution reconstructed side-scan sonar images in target recognition tasks, we selected a dataset of 200 high-resolution side-scan shipwreck images, each measuring 600 × 600 pixels, for testing. As a baseline, we used 2181 low-resolution side-scan sonar images of 128 × 128 pixels, reconstructed using EDSR, the SRGAN, the RCAN, MSRResNet, and our proposed SIGAN, followed by annotation processing. The detection model employed was YOLOv5, chosen for its lightweight design, accuracy, and ease of deployment. The experimental groups are detailed in Table 3.
Testing was conducted using 200 real high-resolution side-scan sonar shipwreck target images, and the models were evaluated using precision, recall, and average precision (AP), widely used metrics in the field of object detection. The results are as follows:
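For reference, precision and recall follow directly from the detection counts; the counts in this sketch are hypothetical, chosen only to illustrate the formulas.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Detection precision and recall from true/false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical counts: 180 shipwrecks detected correctly, 20 false alarms, 20 missed.
p, r = precision_recall(tp=180, fp=20, fn=20)
print(p, r)   # 0.9 0.9
```

AP then summarizes the precision-recall curve over all confidence thresholds, which is why it is reported alongside single-threshold precision and recall.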
From the results in Table 4, it was observed that introducing high-resolution reconstructed images does not always directly improve target detection accuracy. For example, compared to Group G1 (using only low-resolution images), Group G3 (using high-resolution images reconstructed by the SRGAN) saw a 2% decrease in precision. This highlights that super-resolution algorithms may negatively impact detection performance under certain conditions, potentially due to the loss of complex textures or excessive smoothing during reconstruction. Notably, the super-resolution approach based on EDSR (G2) achieved a precision of 91.1%, indicating a relatively high accuracy rate, yet its recall was lower than that of the control group (G1). This suggests that although this method can increase the probability of correctly identifying targets, it does not detect as many targets as desired, which may reflect limitations of EDSR in reconstructing details, such as losing some critical information while preserving local image features. In contrast, our method (G6, using high-resolution images reconstructed by the SIGAN) performed excellently in both precision and recall, with a 2.6% increase in precision over the original low-resolution images (G1), clearly demonstrating the effectiveness of our approach. The high precision and recall indicated that images reconstructed by the SIGAN not only retained crucial textures and structural details but also enhanced the authenticity and diversity of the images, which are vital for applications in side-scan sonar imaging.
To prevent dataset bias toward a single detection model, multiple detection models (YOLOv5n, YOLOv5s, and YOLOv5m) were used for a precision comparison experiment (Table 5). The analysis of the results showed differences in precision among the detection models due to their varying complexities and training durations. However, a horizontal comparison across the three experimental groups consistently showed the best performance in Group G6, further validating the effectiveness of the experimental data.
4.2. Ablation Study
To verify the role of each component in image super-resolution, we conducted ablation studies on the CBAM, the multi-scale structure of the discriminator, and the L2 loss function using the method of controlled variables. The evaluation metrics employed were PSNR, SSIM, and LPIPS. Four groups were designed for comparative experiments, with the experimental setup, training dataset, and evaluation data consistent with Section 3. The results are shown in Table 6.
From the table, it is evident that, compared to the control group (T1), Group T2, which integrated the CBAM, focused on important channels and spatial regions, helping the model concentrate on crucial information in the image. This significantly improved the similarity of the reconstructed image on the target, as evidenced by the substantial improvement in the LPIPS metric. Comparing T1 and T3 shows that the multi-scale structure helped the discriminator distinguish the authenticity of images, thereby enhancing the quality of the generated images, which was reflected in the improved PSNR and SSIM metrics. Comparing T1 with T4 shows the importance of the L2 loss function: the PSNR increased by 4.3 dB, significantly reducing the noise in the generated images.
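Because PSNR is logarithmic in the mean squared error, a 4.3 dB gain corresponds to reducing the MSE by a factor of about 2.7, which can be checked directly:

```python
import math

def psnr_gain_db(mse_before: float, mse_after: float) -> float:
    """PSNR improvement (in dB) implied by reducing the mean squared error."""
    return 10.0 * math.log10(mse_before / mse_after)

factor = 10 ** 0.43                           # MSE ratio equivalent to 4.3 dB
print(round(psnr_gain_db(factor, 1.0), 2))    # 4.3
print(round(factor, 2))                       # 2.69
```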
5. Conclusions
Addressing the challenges of limited underwater target samples, small dataset sizes, high acquisition difficulty, and low resolution in side-scan sonar imagery, we proposed a super-resolution reconstruction method named the SIGAN, targeting shipwrecks, aircraft wreckages, and underwater reefs. We conducted comparative experiments with current state-of-the-art super-resolution reconstruction methods. The experimental results demonstrated that our method exhibited significant superiority in both subjective performance and objective data analysis. The generated images have clear edges and complete structures, achieving a PSNR of 32 dB on the DNASI-I dataset, significantly higher than that of other comparable super-resolution algorithms.
Additionally, we discussed the results of target detection and ablation experiments. The results indicated that using the SIGAN for target detection improved accuracy by 2% compared to general methods and by 3.6% compared to the original images. These findings preliminarily demonstrated the applicability and superiority of the SIGAN method in handling the super-resolution reconstruction of underwater side-scan sonar images, providing an effective solution to the issues of underwater target sample scarcity and low image resolution. However, due to the current small scale of side-scan sonar datasets, our method’s applicability to all side-scan sonar data remains unverified. Therefore, the experimental results of this study have certain limitations. More publicly available data and extensive experiments are needed in the future to further refine and validate our method.