Article

Side-Scan Sonar Image Augmentation Method Based on CC-WGAN

1 School of Electrical Engineering, Naval University of Engineering, Wuhan 430033, China
2 People’s Liberation Army Unit 91650, Guangzhou 510220, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(17), 8031; https://doi.org/10.3390/app14178031
Submission received: 29 July 2024 / Revised: 3 September 2024 / Accepted: 6 September 2024 / Published: 8 September 2024

Abstract
The utilization of deep learning algorithms for side-scan sonar target detection is impeded by the restricted quantity and representativeness of side-scan sonar (SSS) samples. To address this issue, this paper proposes a method for image augmentation using a CC-WGAN network. First, the generator incorporates the Convolutional Block Attention Module (CBAM) to enhance the assimilation of global information and local features in the input images. This integration also improves stability and avoids mode collapse problems associated with the original Generative Adversarial Network. Subsequently, the CBAM is incorporated into the discriminator to facilitate a better understanding of the relevance and significance of input data, thereby enhancing the model’s generalization ability. Finally, based on this model, existing few-sample SSS images are augmented, and we utilize the augmented images for discrimination and detection with YOLOv5. The experimental results show that following training with the SSS dataset that is augmented by this network, the accuracy of target detection increased by 7.6%, validating the feasibility of our proposed method. This method presents a novel solution to the problem of low model accuracy in underwater target detection with side-scan sonar due to limited samples.

1. Introduction

With the continuous exploitation of marine resources, an increasing number of detection technologies have been applied to underwater target detection [1], maritime search and rescue [2], navigation safety, marine engineering [3,4], seabed mapping, and other fields. Owing to their affordability and high scanning precision, side-scan sonar systems have clear advantages for seabed target identification and are currently the main technology for underwater target detection [5]. The advancement of deep learning in computer vision has significantly improved target detection performance, and deep learning is now widely used in underwater intelligent detection. However, deep convolutional neural network (DCNN) technology requires a substantial amount of high-quality, representative sample data to train target detection models [6]. Owing to the high cost of measurements and the limited occurrence of maritime events, side-scan sonar image samples are scarce and lack representativeness. This dearth constrains the progress of DCNNs in underwater target detection. Consequently, there is an urgent need for research on sample augmentation for few-sample underwater targets in SSS images.
To address the issue of limited SSS images, some scholars have used image transformation augmentation techniques to expand the number of samples [7,8]. However, these artificially generated samples lack representativeness because they do not account for variations in imaging conditions and environmental factors, which limits the achievable gains in underwater target recognition accuracy. Inspired by sample augmentation techniques in optical imaging, many researchers have used transfer learning methods for SSS underwater target sample augmentation. However, these methods do not consider the imaging mechanism of side-scan sonar when training the transfer conversion model [9,10], so the generated samples have a single style and poor representativeness, providing limited improvement in target detection performance. In recent years, with the rapid advancement of GANs, they have been widely employed to produce large numbers of high-quality image samples. The core idea of a GAN is to adversarially train a generator and a discriminator so that the generator produces images the discriminator cannot distinguish from real ones, giving GANs stronger and more comprehensive feature extraction and learning capabilities than traditional machine learning. Huang et al. [11] presented a thorough sample augmentation approach considering the target, texture, resolution, noise, and background to address the shortage of training samples for shipwreck detection; however, the overall implementation process is complex. Tang et al. [12] introduced a technique for augmenting underwater target images in side-scan sonar data through cross-domain mapping of corresponding target images; however, because actual sea trials are required, this method is costly and complex. Xu et al. [13] introduced a multi-feature fusion self-attention network (MFSANet) for creating novel side-scan sonar image varieties. Bai et al. [14] introduced a global context external attention network (GCEANet) for producing side-scan sonar images that correspond to absent categories in image classification. Yang et al. [15] proposed an SSS image augmentation approach suitable for multi-task scenarios. However, cross-domain transformation augmentation methods produce samples with weak representativeness because targets in out-of-domain images are only similar to, not exact matches of, the actual underwater targets to be detected. Zhao et al. [16] proposed a method that uses the imaging mechanism and image features of side-scan sonar to simulate SSS image samples by incorporating a ray model and the sonar equation; however, the simulation treats a shipwreck as having a uniform reflection coefficient across its entire structure and therefore cannot reflect material differences between different parts of the wreck.
As a result, few GAN networks can meet the data augmentation needs of few-sample scenarios [17,18]. In 2017, Gulrajani et al. [19] proposed the Wasserstein GAN with a gradient penalty (WGAN-GP), an improved GAN algorithm with several advantages over traditional GANs. It resolves the instability of traditional GAN training, making the generator and discriminator easier to train, and its gradient penalty mechanism makes the training process smoother and more stable. Its more effective loss function between the generator and discriminator enables the generator to learn better distributions of generated samples, yielding higher-quality and more realistic images or data. It also improves the stability of the optimization process and addresses issues such as vanishing gradients. The WGAN-GP network was therefore chosen as the principal approach for augmenting side-scan sonar data. However, applying the WGAN-GP directly to augment underwater target images from black-and-white SSS is impractical, because the original network’s training data consist mainly of everyday and natural scenery images, resulting in unrealistic-looking targets (Figure 1).
To improve the network’s ability to learn targets, and considering the characteristics of black-and-white waterfall images in SSS, this paper proposes a method for augmenting underwater target samples based on the CC-WGAN. First, the generator incorporates the CBAM to compensate for the convolution operation’s limitation of only calculating correlations within local pixel regions, thereby improving the clarity of the generated images. It also provides better stability, avoiding the mode collapse problem, in which the generated images concentrate on a few modes and lack diversity. Then, the discriminator incorporates the CBAM to help the model better understand the relevance and importance of input data, enhance target learning, reduce information diffusion, and improve the model’s generalization ability. Finally, based on this model, existing few-sample SSS images are augmented, and we utilize the augmented images for discrimination and detection with YOLOv5. The experiments demonstrate that the CC-WGAN-based augmentation approach can effectively produce a substantial number of high-quality samples that align with the characteristics of SSS images, compensating to some extent for the limited number of underwater samples. This method provides a novel solution to the problem of low model accuracy in underwater target detection with side-scan sonar due to limited samples.

2. Materials and Methods

2.1. CC-WGAN Network Structure

The training approach of the GAN involves a competitive interaction between two networks. The generator network maps random noise into the data space, while the discriminator network is tasked with distinguishing between generated samples and real data samples, i.e., it determines whether its input comes from real data or generated data. The generator is trained to fool the discriminator. The objective function of the competition between the generator and the discriminator is shown in Equation (1).
\[
\min_G \max_D L_{GAN}(G, D) = \mathbb{E}_{x \sim P_r}[\log D(x)] + \mathbb{E}_{\tilde{x} \sim P_g}[\log(1 - D(\tilde{x}))] \tag{1}
\]
where $P_r$ is the data distribution, $P_g$ is the model distribution implicitly defined by $\tilde{x} = G(z)$ with $z \sim p(z)$, and the generator’s input $z$ is drawn from a simple noise distribution $p$, such as a uniform or spherical Gaussian distribution.
If the discriminator is trained to optimality before each update of the generator’s parameters, then minimizing the function is equivalent to minimizing the Jensen–Shannon (JS) divergence between $P_r$ and $P_g$. However, the saturation of the discriminator can lead to vanishing gradients. In practice, the generator is usually trained to maximize $\mathbb{E}_{\tilde{x} \sim P_g}[\log D(\tilde{x})]$, which can partially mitigate this issue. Nevertheless, even this modified loss function can perform poorly with a well-trained discriminator.
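As a concrete illustration of Equation (1) and the non-saturating generator loss discussed above, the following is a minimal PyTorch sketch (not the authors’ implementation); it assumes a discriminator D that outputs probabilities in [0, 1], and the function names are illustrative.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, real, fake):
    # Binary cross-entropy form of Equation (1) from the discriminator's side:
    # maximize E[log D(x)] + E[log(1 - D(G(z)))].
    d_real = D(real)
    d_fake = D(fake.detach())  # do not backpropagate into the generator here
    return (F.binary_cross_entropy(d_real, torch.ones_like(d_real)) +
            F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake)))

def generator_loss_saturating(D, fake):
    # Minimize E[log(1 - D(G(z)))]; gradients vanish once D becomes confident.
    return torch.log(1.0 - D(fake) + 1e-8).mean()

def generator_loss_nonsaturating(D, fake):
    # Maximize E[log D(G(z))] instead, the usual practical workaround.
    d_fake = D(fake)
    return F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
```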
The advent of the Wasserstein GAN (WGAN) addresses these issues by using the Wasserstein distance instead of JS divergence as the loss function. The Wasserstein distance can reflect the distance between two distributions even when P r and P g do not overlap (where JS divergence remains constant, leading to zero gradients and causing gradient vanishing and mode collapse problems). However, in practical applications, the WGAN handles the Lipschitz continuity constraint by directly employing a weight clipping strategy [19]. This often results in the discriminator’s learned weights being concentrated at the maximum and minimum values, leading to gradient explosion and non-convergence. Therefore, a penalty term is added to the original network’s loss function, as shown in Equation (2).
\[
\min_G \max_D L_{GP}(G, D) = \underbrace{L_{GAN}(G, D)}_{\text{original critic loss}} + \underbrace{\lambda\, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}\!\left[\left(\left\| \nabla_{\hat{x}} D(\hat{x}) \right\|_2 - 1\right)^2\right]}_{\text{gradient penalty}} \tag{2}
\]
In this equation, $\|\cdot\|_2$ represents the L2 norm, $\nabla$ denotes the gradient operator, $\lambda$ is the penalty coefficient, $P_{\hat{x}}$ indicates the sampling distribution of the gradient penalty term, and $\hat{x}$ represents a linear interpolation between real sample data and generated data. This network model, which includes the gradient penalty term, is referred to as the WGAN-GP. By independently applying the gradient penalty to each sample, the WGAN-GP satisfies the Lipschitz continuity while ensuring that the discriminator’s weights are evenly distributed within a certain range, thus improving the model’s convergence speed and addressing the gradient explosion and non-convergence issues that are often encountered in the WGAN. The structure of the WGAN-GP network is illustrated in Figure 2. In this structure, G is the generator network that generates images. It receives a random noise z and produces an image denoted as G(z). D is the discriminator network, tasked with evaluating the authenticity of an image. The generator is used to generate data distributions that match real SSS images. The information input is evaluated and classified by the discriminator. During model training, the generator G creates synthetic data that closely mirror real data to mislead the discriminator D. Conversely, D attempts to discern the authenticity of the input data. Through continuous iterative training of the model, new data are generated.
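The gradient penalty term in Equation (2) can be computed per sample as in the following minimal PyTorch sketch (an illustrative implementation, not the authors’ code); the penalty coefficient lambda_gp = 10 is a commonly used default rather than a value reported in this paper.

```python
import torch

def gradient_penalty(D, real, fake, lambda_gp=10.0):
    # x_hat: random linear interpolation between real and generated samples.
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1.0 - eps) * fake.detach()).requires_grad_(True)
    d_hat = D(x_hat)
    # Gradient of the critic output with respect to the interpolated input.
    grads = torch.autograd.grad(outputs=d_hat, inputs=x_hat,
                                grad_outputs=torch.ones_like(d_hat),
                                create_graph=True, retain_graph=True)[0]
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    # Penalize deviation of the per-sample gradient norm from 1 (Lipschitz constraint).
    return lambda_gp * ((grad_norm - 1.0) ** 2).mean()

# The critic update then minimizes:
#   D(fake).mean() - D(real).mean() + gradient_penalty(D, real, fake)
```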
In this paper, the following optimizations were made to the structure of the WGAN-GP network: First, the CBAM was integrated into the generator. This module improves the acquisition of specific attributes, providing better stability and avoiding the inherent mode collapse problem. Additionally, it improves the clarity of the generated images. Second, the CBAM was integrated into the discriminator. This integration helps the model better understand the relevance and importance of the input data, consequently enhancing its generalization capacity. It also enhances target learning and reduces information diffusion. The CC-WGAN is an optimization of the WGAN-GP network structure, i.e., a clever integration of the WGAN-GP and CBAM.

2.2. Integration of CBAM in Generator and Discriminator

It is essential to have a thorough understanding of the specific characteristics and contextual background in side-scan sonar images in order to produce images of a high quality. To compensate for the convolution operation’s limitation of only calculating correlations within local pixel regions, improve the integration of global and local information in input images for enhanced learning, and provide better stability while avoiding the inherent mode collapse problem, this paper introduces the CBAM into the generator, as illustrated in Figure 3.
Figure 3 depicts the entire structure after incorporating the attention module into the generator. The CBAM comprises both the Channel Attention Module (CAM) and the Spatial Attention Module (SAM), as shown in Figure 4. The CAM focuses on the importance of individual channels in the feature map. It captures the information of the entire feature map through a global pooling operation and compresses it into a vector, which is then fed into a multilayer perceptron (MLP). The MLP generates weight values for each channel, which are used to recalibrate each channel in the original feature map, thus emphasizing the important features and suppressing the less important features. The CAM learns the dependencies between different channels by evaluating the significance of each channel’s weight to adjust the response in the feature map. This helps the network focus on the most important feature channels, enhancing the feature representation capability. The SAM, on the other hand, focuses on different spatial locations in the feature map. It generates a weight map by computing the spatial distribution of the input feature map, and this weight map is used to indicate which spatial locations are more important. The SAM typically learns these spatial weights by using a small convolutional network. The SAM focuses on the spatial relationships within the feature map by computing the importance weight of each spatial position to adjust the spatial distribution of the feature map. This allows the network to focus on the spatial locations that are most crucial, thereby enhancing the spatial feature representation. This method enables convenient insertion at different locations in the network and conserves computational resources. In contrast to attention mechanisms that concentrate only on spatial or channel characteristics, this method achieves superior performance.
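A standard PyTorch implementation of the CBAM described above (channel attention followed by spatial attention) is sketched below. It follows the commonly published CBAM design; the reduction ratio of 16 and the 7×7 spatial kernel are typical defaults rather than values specified in this paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CAM: global pooling -> shared MLP -> per-channel weights."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling branch
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * w                         # recalibrate each channel

class SpatialAttention(nn.Module):
    """SAM: channel-wise average/max maps -> small conv -> per-position weights."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w                         # reweight each spatial position

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in Figure 4."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.cam = ChannelAttention(channels, reduction)
        self.sam = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.sam(self.cam(x))
```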
To help the model better understand the relevance and importance of input data, improve its generalization ability, enhance target learning, and reduce information diffusion, this paper also introduces the CBAM into the discriminator, as illustrated in Figure 5.
This model enhances the interaction between channel dimensions and spatial dimensions, improving network performance and accuracy, which in turn increases the clarity of the generated images. It also provides better stability, avoiding the mode collapse problem of the original Generative Adversarial Network, in which the generated images are concentrated in a few modes and lack diversity. Together, these changes improve the model’s capacity to generalize and contribute to the production of high-quality output images.

3. Experiments

The sample augmentation of SSS images based on the CC-WGAN network is a crucial component of this method. To assess the practicality and efficiency of the proposed methodology, multiple comparative experiments were designed to evaluate the performance of the CC-WGAN network.

3.1. Dataset

Currently, publicly available side-scan sonar image datasets are significantly fewer and smaller in scale than datasets in other target detection fields. Therefore, this study uses the openly accessible KLSG side-scan sonar image dataset for the relevant research. The KLSG dataset includes two categories, ships and aircraft, with a total of 447 images. Among them, there are 395 ship targets and 62 aircraft targets. The network generates 2430 augmented images. The augmented images, in conjunction with authentic SSS images, are utilized in the comparative experiments. Model training is carried out using an Intel(R) Core(TM) i5-13500H CPU and a 6 GB NVIDIA GeForce RTX 4050 GPU. The software environment consists of PyTorch 2.1.2, CUDA 11.8, and Python 3.11.7 on a Windows 11 system. Figure 6 displays a portion of the dataset used for this study.

3.2. Evaluation Metrics

Image generation quality is primarily evaluated based on image clarity, feature diversity, and structural similarity. In this study, we chose the Fréchet Inception Distance (FID), Maximum Mean Discrepancy (MMD), Peak Signal-to-Noise Ratio (PSNR), and Structural Similarity Index Measure (SSIM) to evaluate the quality of the generated images. The FID quantifies the dissimilarity between the feature vectors of authentic and generated image sets, reflecting the similarity between the two sets of images. It involves extracting features from both image sets using a network pre-trained on a publicly available dataset, modeling the feature space with a Gaussian model, and determining the distance from the mean and covariance of the Gaussian model. This approach captures the differences between the two distributions more effectively by considering both the mean and the covariance matrix. The MMD is a statistical test used to quantify the similarity between two sets of features. It transforms the real and generated datasets into a kernel space using a predetermined kernel function and then computes the average disparity between the two distributions to determine whether the samples originate from different distributions. The PSNR is a metric used to quantify the disparity between two images. It first determines the mean squared error between the images and then derives the PSNR, which is used to assess the training set against the generated images. The SSIM takes into account the structural details present in the image. It defines structural information as a characteristic that represents the arrangement of objects in the scene, regardless of their brightness and contrast. The SSIM characterizes distortion by considering three distinct elements: brightness, contrast, and structure. The mean value is used to estimate brightness, the standard deviation serves as an indicator of contrast, and covariance measures structural similarity. For the FID and MMD, smaller values indicate better performance, suggesting that the distribution of the generated images is more similar to that of real images and hence a higher quality. For the PSNR, larger values indicate smaller differences between the images, signifying a better quality of the generated images. Similarly, higher SSIM values denote a greater similarity between the images.
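To make two of these metrics concrete, the following minimal sketch shows a direct PSNR computation and a kernel-based MMD estimate between feature sets (illustrative code, not the authors’ evaluation scripts). The RBF bandwidth sigma, the pixel range max_val, and the choice of feature extractor are assumptions; in practice, the FID and SSIM are usually computed with standard library implementations.

```python
import torch

def psnr(real, generated, max_val=255.0):
    """PSNR from the mean squared error between two images (higher is better)."""
    mse = torch.mean((real.float() - generated.float()) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

def mmd_rbf(real_feats, gen_feats, sigma=1.0):
    """Squared MMD between two feature sets under a Gaussian (RBF) kernel (lower is better)."""
    def kernel(a, b):
        dists = torch.cdist(a, b) ** 2           # pairwise squared distances
        return torch.exp(-dists / (2.0 * sigma ** 2))
    k_rr = kernel(real_feats, real_feats).mean()
    k_gg = kernel(gen_feats, gen_feats).mean()
    k_rg = kernel(real_feats, gen_feats).mean()
    return k_rr + k_gg - 2.0 * k_rg
```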

3.3. Assessment of Augmented Image Quality

In this study, image augmentation was applied to ship and aircraft targets. Diverse background settings were used to select representative images for the training process. Figure 7 presents several examples of the augmented samples.
The FID, MMD, PSNR, and SSIM metrics for the two types of augmented targets were calculated to evaluate the image quality for each category, as shown in Table 1.
From Table 1, it can be seen that all categories have relatively high FID values; it should be kept in mind that FID values for side-scan sonar images may need to be judged against different benchmarks than usual. The FID values for the “Ship” and “Aircraft” categories are similar, indicating a high generation quality for both categories. For the MMD metric, both categories have values around 0.2, suggesting a significant statistical similarity between the generated images and their authentic counterparts. The “Aircraft” category has the highest PSNR value, indicating better generation quality for this category. For the SSIM metric, both categories have relatively low values, revealing a notable variance in structural characteristics between the generated and authentic images. Taken together, these assessment criteria show that the underwater target augmentation technique presented in this study exhibits varying effectiveness across different targets while successfully generating high-quality augmented images.

3.4. Performance of Target Detection Model

Given that the purpose of this study is to augment images of underwater targets acquired from SSS in order to improve the effectiveness of deep learning-based target detection models, this section conducts performance evaluation experiments using a deep learning approach for target detection. Numerous target detection models are currently available; considering its high speed, high precision, and lightweight design, this study employs the YOLOv5 model.
Using images of ships and aircraft, three experimental sets with different dataset sizes were designed to train the YOLOv5 model. The datasets comprise a collection of authentic images, a compilation of augmented images, and a blend of both authentic and augmented images; their distribution is shown in Table 2. The augmented ship and aircraft images were selected after filtering out low-quality images. To assess the effectiveness of the target detection model, 200 authentic SSS images were chosen.
After training the model, 200 authentic SSS images were employed to evaluate the model. The evaluation criteria utilized comprise precision, recall, and mean average precision. The results of the detection are detailed in Table 3.
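For reference, a trained YOLOv5 detector can be applied to the held-out SSS images through the Ultralytics torch.hub interface, as in the hedged sketch below; the weights and image paths are placeholders, and the exact training and evaluation configuration used in this study is not reproduced here.

```python
import torch

# Load YOLOv5 with custom trained weights (path is a placeholder for illustration).
model = torch.hub.load('ultralytics/yolov5', 'custom',
                       path='runs/train/exp/weights/best.pt')

# Run inference on an authentic SSS image and inspect the detections.
results = model('sss_test_image.png')      # placeholder image path
results.print()                            # summary of detected ships/aircraft
detections = results.pandas().xyxy[0]      # DataFrame: xmin, ymin, xmax, ymax, confidence, class, name
print(detections)
```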
From Table 3, it can be observed that, compared to YOLOv5-1 and YOLOv5-2, the target detection model trained using the image augmentation technique proposed in this study achieves superior precision, recall, and mean average precision for aircraft and ship targets. This indicates that augmented images play a crucial role in improving the model’s detection capabilities. YOLOv5-1, which exclusively utilizes authentic images, has the lowest precision, recall, and mean average precision, highlighting that the amount of data is a key factor affecting detection performance. After image augmentation, the YOLOv5-3 configuration, which includes both real and augmented images, shows significant improvements in precision, recall, and mean average precision. This demonstrates that the performance improvement is mainly attributable to the proposed method for generating augmented SSS images, which effectively augments the underwater SSS image data.
To further illustrate the impact of various training datasets on the identification of authentic side-scan sonar target images, some detection results using different training sets are presented in Figure 8. As shown in Figure 8, the confidence in detecting the corresponding categories is highest when using the third dataset and lowest when using the first dataset.

4. Discussion

4.1. Ablation Study

To assess the contribution of each module to the model’s performance, we performed ablation experiments on CBAM1 and CBAM2 (the CBAM in the generator and in the discriminator, respectively) using the FID, MMD, PSNR, and SSIM evaluation metrics. Four sets of experiments were designed, as outlined in Table 4.
As shown in Table 4, compared to the first group (control group), the FID value is higher after integrating the CBAM into the generator. However, the FID value is lower than the control group when the CBAM is added to the discriminator (third group). The FID value is the lowest when the CBAM is integrated into both the generator and the discriminator, indicating that the proposed model can produce higher-quality images. For the PSNR metric, the fourth group has the highest PSNR, suggesting that the proposed model produces images with a higher level of useful information relative to noise, resulting in better image quality. For the SSIM metric, the fourth group also has the highest SSIM value, indicating a smaller structural variance between the generated images and the authentic images.

4.2. Impact of Different Models on Target Detection Results

Since different models may yield varying results in target detection, in order to mitigate the potential bias introduced by a single detection model affecting the performance evaluation of the generated images, precision comparison experiments were conducted using multiple detection models (YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l).
As shown in Table 5, owing to differences in the complexity and training time of the detection models, the target detection results vary across models. However, the results of the second and third groups, which used datasets containing augmented images, were superior to those of the first group (which did not use augmented images), further demonstrating the effectiveness of the augmented data.

4.3. Impact of Different Generation Algorithms

To test the performance of the generative models considered in this paper, the CC-WGAN is compared with the basic GAN and WGAN models. Section 4.2 shows that the YOLOv5l detection model performs best for target detection in sonar images, so in this section we use YOLOv5l to evaluate the performance of the three generative models.
The FID, MMD, PSNR, and SSIM metrics are calculated to evaluate the quality of images generated by the three algorithms, as shown in Table 6.
As can be seen from Table 6, the CC-WGAN generates images with the lowest FID value, indicating that the proposed algorithm produces higher-quality images, and its MMD is the smallest, indicating a significant statistical similarity between the generated images and the real images. For the PSNR metric, the proposed algorithm generates images with the highest value, again indicating better image quality. For the SSIM metric, the three groups are similar.
Next, YOLOv5l was used to evaluate the performance of target detection on the images generated by the three algorithms, as shown in Table 7.
As can be seen from Table 7, the images generated by the CC-WGAN algorithm are optimal in terms of precision, recall, and mean average precision for target detection, further reflecting the advantages of the proposed algorithm.

5. Conclusions

To address the scarcity, limited sample numbers, and high acquisition cost of SSS underwater target images, which lead to suboptimal outcomes when detecting underwater targets with deep learning techniques, a CC-WGAN-based side-scan sonar underwater target augmentation method is proposed. This approach increases the diversity and quality of the generated images. By integrating the CBAM into the generator and discriminator, the correlation between the channel and spatial dimensions is strengthened, target learning is enhanced, information diffusion is reduced, and the model’s performance and accuracy are improved, thereby increasing the quality of the generated images. For side-scan sonar datasets with too few samples, image augmentation aids sample expansion and improves target detection performance. Additionally, this method considers the relationship between targets and shadows, which is one of the crucial pieces of information contained in SSS images. Finally, we assessed the augmented images using the FID, MMD, PSNR, and SSIM criteria and conducted target detection performance evaluation experiments using deep learning models. Ablation experiments were also set up to validate the contribution of each module to the model’s performance. It is worth mentioning that image generation with the CC-WGAN takes about five days to run, which is slow; the running speed of the algorithm should be improved in the future. This study explores the generation of numerous augmented images from a limited dataset, demonstrating that the use of augmented images can enhance the precision of detecting and recognizing underwater targets. This method partially mitigates the problem of limited underwater sample numbers.

Author Contributions

Conceptualization, J.Z. and H.L.; methodology, J.Z.; software, H.L.; validation, Y.P.; formal analysis, J.H.; data curation, J.Z.; writing—original draft preparation, J.Z.; writing—review and editing, H.L. and P.Q.; visualization, Y.P.; supervision, P.Q.; funding acquisition, H.L. and J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science Fund for Excellent Young Scholars, grant number 42122025; the National Natural Science Foundation of China, grant number 42374050; and the Natural Science Foundation for Scholars of Hubei Province of China, grant number 2022CFB865.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We would like to thank Guojun Zhai for his valuable suggestions and discussions that contributed to the development of this work. We appreciate the constructive feedback provided by the anonymous reviewers, which has significantly enhanced the quality of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yu, Y.; Zhao, J.; Gong, Q.; Huang, C.; Zheng, G.; Ma, J. Real-time underwater maritime object detection in side-scan sonar images based on transformer-YOLOv5. Remote Sens. 2021, 13, 3555.
  2. Cvikel, D.; Grøn, O.; Boldreel, L.O. Detecting the Ma’agan Mikhael B shipwreck. Underw. Technol. 2016, 34, 93–98.
  3. Xiaonan, C.; Minyan, L.; Longjun, H. Shipwreck statistical analysis and suggestions for ships carrying liquefiable solid bulk cargoes in China. Proc. Eng. 2014, 84, 188–194.
  4. Piccinelli, M.; Gubian, P. Modern ships voyage data recorders: A forensics perspective on the Costa Concordia shipwreck. Digit. Investig. 2013, 10, 41–49.
  5. Greene, A.; Rahman, A.F.; Kline, R.; Rahman, M.S. Side scan sonar: A cost-efficient alternative method for measuring seagrass cover in shallow environments. Estuar. Coast. Shelf Sci. 2018, 207, 250–258.
  6. Tan, C.; Sun, F.; Kong, T.; Zhang, W.; Yang, C.; Liu, C. A survey on deep transfer learning. In Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, 4–7 October 2018; pp. 270–279.
  7. Nayak, N.; Nara, M.; Gambin, T.; Wood, Z.; Clark, C.M. Machine learning techniques for AUV side-scan sonar data feature extraction as applied to intelligent search for underwater archaeological sites. Field Serv. Robot. 2021, 16, 219–233.
  8. Nguyen, H.-T.; Lee, E.-H.; Lee, S. Study on the classification performance of underwater sonar image classification based on convolutional neural networks for detecting a submerged human body. Sensors 2019, 20, 94.
  9. Huo, G.; Wu, Z.; Li, J. Underwater object classification in sidescan sonar images using deep transfer learning and semisynthetic training data. IEEE Access 2020, 8, 47407–47418.
  10. Ge, Q.; Ruan, F.; Qiao, B.; Zhang, Q.; Zuo, X.; Dang, L. Side-scan sonar image classification based on style transfer and pre-trained convolutional neural networks. Electronics 2021, 10, 1823.
  11. Huang, C.; Zhao, J.; Yu, Y.; Zang, H. Comprehensive sample augmentation by fully considering SSS imaging mechanism and environment for shipwreck detection under zero real samples. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14.
  12. Tang, Y.; Wang, L.; Bian, S.; Jin, S.; Dong, Y.; Li, H.; Ji, B. SSS underwater target image samples augmentation based on the cross-domain mapping relationship of images of the same physical object. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 6393–6410.
  13. Xu, H.; Bai, Z.; Zhang, X.; Ding, Q. MFSANet: Zero-shot side-scan sonar image recognition based on style transfer. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5.
  14. Bai, Z.; Xu, H.; Ding, Q.; Zhang, X. Side-scan sonar image classification with zero-shot and style transfer. IEEE Trans. Instrum. Meas. 2024, 73, 1.
  15. Yang, Z.; Zhao, J.; Yu, Y.; Huang, C. A sample augmentation method for side-scan sonar full-class images that can be used for detection and segmentation. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–11.
  16. Zhao, X.; Zhao, J.; Zhu, W. Side-scan sonar image simulation considering imaging mechanism and marine environment for zero-shot shipwreck detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13.
  17. Liu, W.; Piao, Z.; Tu, Z.; Luo, W.; Ma, L.; Gao, S. Liquid warping GAN with attention: A unified framework for human image synthesis. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 5114–5132.
  18. Jiang, W.; Liu, S.; Gao, C.; Cao, J.; He, R.; Feng, J.; Yan, S. PSGAN: Pose and expression robust spatial-aware GAN for customizable makeup transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5194–5202.
  19. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved training of Wasserstein GANs. arXiv 2017, arXiv:1704.00028.
Figure 1. The unrealistic-looking targets.
Figure 2. Structure of the WGAN-GP network.
Figure 3. Integration of the CBAM into the generator.
Figure 4. Structure of the attention module.
Figure 5. Integration of the CBAM into the discriminator.
Figure 6. Part of the dataset.
Figure 7. Augmented images.
Figure 8. Partial detection results using different training sets. The first row corresponds to dataset group 1, the second row to dataset group 2, and the third row to dataset group 3.
Table 1. Computed FID, MMD, PSNR, and SSIM metrics for augmented images.

Target   | FID    | MMD    | PSNR    | SSIM
Aircraft | 125.94 | 0.2479 | 19.6539 | 0.5994
Ship     | 123.50 | 0.2358 | 16.0296 | 0.5956
Table 2. Composition of datasets and validation sets for the YOLOv5 detection model.

Set              | Authentic Images | Augmented Images
1                | 50               | -
2                | -                | 450
3                | 50               | 450
Detection Images | 200              | -
Table 3. Influence of diverse training sets on the detection of authentic SSS target images.

Group    | Class    | Precision | Recall | mAP@0.5 | mAP@0.5:0.95
YOLOv5-1 | ship     | 90.1%     | 93.8%  | 0.925   | 0.697
YOLOv5-1 | aircraft | 89.5%     | 86.4%  | 0.911   | 0.653
YOLOv5-1 | all      | 89.8%     | 90.1%  | 0.918   | 0.675
YOLOv5-2 | ship     | 92%       | 93.6%  | 0.955   | 0.824
YOLOv5-2 | aircraft | 98.7%     | 95%    | 0.946   | 0.743
YOLOv5-2 | all      | 95.3%     | 94.3%  | 0.951   | 0.784
YOLOv5-3 | ship     | 96.8%     | 96.1%  | 0.961   | 0.879
YOLOv5-3 | aircraft | 98%       | 94.1%  | 0.948   | 0.733
YOLOv5-3 | all      | 97.4%     | 95.1%  | 0.955   | 0.806
Table 4. Quantification of visual parameters in ablation experiments.

Set | CBAM1 (generator) | CBAM2 (discriminator) | FID    | MMD    | PSNR    | SSIM
1   | -                 | -                     | 155.69 | 0.2585 | 14.6373 | 0.5896
2   | ✓                 | -                     | 174.58 | 0.2221 | 15.3456 | 0.5948
3   | -                 | ✓                     | 151.72 | 0.2239 | 15.5694 | 0.5934
4   | ✓                 | ✓                     | 123.50 | 0.2358 | 16.0296 | 0.5956
Table 5. Precision comparison of different models.

Detection Model | Group 1 | Group 2 | Group 3
YOLOv5n         | 88.1%   | 96.8%   | 85.1%
YOLOv5s         | 88.2%   | 97.2%   | 95.1%
YOLOv5m         | 89.7%   | 97.1%   | 94.8%
YOLOv5l         | 89.8%   | 95.3%   | 97.4%
Table 6. FID, MMD, PSNR, and SSIM metrics for augmented images generated by the three algorithms.

Algorithm | FID    | MMD    | PSNR    | SSIM
basic GAN | 165.16 | 0.2899 | 14.0637 | 0.5940
WGAN      | 155.69 | 0.2585 | 14.6373 | 0.5896
CC-WGAN   | 123.50 | 0.2358 | 16.0296 | 0.5956
Table 7. Target detection performance based on images from the three generation algorithms.

Algorithm | Class    | Precision | Recall | mAP@0.5 | mAP@0.5:0.95
basic GAN | ship     | 84.2%     | 92.7%  | 0.948   | 0.624
basic GAN | aircraft | 84.9%     | 81.2%  | 0.864   | 0.515
basic GAN | all      | 84.5%     | 87%    | 0.906   | 0.57
WGAN      | ship     | 94.6%     | 87.4%  | 0.954   | 0.636
WGAN      | aircraft | 96.7%     | 66.7%  | 0.878   | 0.598
WGAN      | all      | 95.6%     | 77%    | 0.916   | 0.617
CC-WGAN   | ship     | 96.8%     | 96.1%  | 0.961   | 0.879
CC-WGAN   | aircraft | 98%       | 94.1%  | 0.948   | 0.733
CC-WGAN   | all      | 97.4%     | 95.1%  | 0.955   | 0.806
