Side-Scan Sonar Image Generator Based on Diffusion Models for Autonomous Underwater Vehicles
Abstract
1. Introduction
- We built an AUV-based SSS data collection platform and collected a large amount of raw sonar data from different marine areas. We also developed a nonlinear gain enhancement algorithm that compensates for spherical wave propagation loss, producing more evenly balanced sonar images and improving SSS imaging quality.
- We created a five-category SSS image dataset covering common seabed backgrounds and targets. Based on this dataset, we built a category-controllable SSS image generator that produces images of a specified category without relying on an additional classifier, effectively expanding the SSS image dataset.
- We evaluated the SSS image generator both quantitatively and qualitatively, using the FID, the IS, and Haralick texture features. We also applied the generative model to target detection; the cross-validation results show that it improves detection accuracy, providing data support for subsequent SSS image-based research.
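The first contribution mentions a nonlinear gain that compensates for spherical wave propagation loss. The exact gain law is not given in this section, so the sketch below substitutes the textbook time-varied gain (TVG): each sample's two-way travel time is converted to slant range r, and the signal is multiplied by the inverse of a two-way spherical spreading loss of 40 log10(r/r0) dB plus an optional 2αr dB absorption term, followed by a log compression for display. The function names, the 1500 m/s sound speed, and the reference range r0 are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def tvg_compensate(pings, sample_rate_hz, sound_speed_mps=1500.0,
                   alpha_db_per_m=0.0, r0_m=1.0):
    """Undo two-way spherical spreading (and optional absorption) along each ping.

    pings: amplitude samples, shape (..., n_samples), sample 0 nearest the transducer.
    Assumed gain law (textbook TVG, not the paper's): 40*log10(r / r0) + 2*alpha*r dB.
    """
    pings = np.asarray(pings, dtype=np.float64)
    n = pings.shape[-1]
    t = (np.arange(n) + 1) / sample_rate_hz            # two-way travel time of each sample, s
    r = np.maximum(sound_speed_mps * t / 2.0, r0_m)    # slant range, m; clamped so r < r0 gets no spreading gain
    gain_db = 40.0 * np.log10(r / r0_m) + 2.0 * alpha_db_per_m * r
    return pings * 10.0 ** (gain_db / 20.0)            # dB -> amplitude factor, broadcast over all pings

def to_display(img, eps=1e-6):
    """Nonlinear (log) compression plus a robust 1-99 percentile stretch to [0, 1]."""
    logged = np.log1p(np.abs(img))
    lo, hi = np.percentile(logged, [1, 99])
    return np.clip((logged - lo) / (hi - lo + eps), 0.0, 1.0)
```

Because the gain depends only on the sample index, one vector is computed per ping length and broadcast across a whole waterfall of pings; real systems often use other spreading exponents, which would only change the constant 40.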
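The second contribution states that the generator produces a requested category without an additional classifier. The mechanism is not spelled out here; one standard way to achieve this with a DDPM is classifier-free guidance, where a single noise-prediction network is queried with and without the category label and the two predictions are combined as (1 + w)·ε(x_t, c) − w·ε(x_t). The sketch below shows ancestral DDPM sampling with that combination, assuming the linear β schedule from the original DDPM paper and σ_t² = β_t. `eps_model` is a hypothetical trained network, and `label=None` stands for the null label; neither is taken from the paper.

```python
import numpy as np

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule as in the original DDPM paper."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def guided_eps(eps_cond, eps_uncond, w):
    """Classifier-free guidance: extrapolate away from the unconditional prediction."""
    return (1.0 + w) * eps_cond - w * eps_uncond

def sample_category(eps_model, shape, label, w=2.0, T=1000, rng=None):
    """Ancestral DDPM sampling conditioned on a category label.

    eps_model(x_t, t, label) is a hypothetical noise-prediction network;
    label=None requests the unconditional (null-label) prediction.
    """
    rng = np.random.default_rng() if rng is None else rng
    betas, alphas, alpha_bars = linear_beta_schedule(T)
    x = rng.standard_normal(shape)                                   # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = guided_eps(eps_model(x, t, label), eps_model(x, t, None), w)
        # mean of p(x_{t-1} | x_t) given the guided noise estimate
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0         # no noise on the final step
        x = mean + np.sqrt(betas[t]) * noise
    return x
```

Setting w = 0 recovers plain conditional sampling; larger w typically strengthens adherence to the requested category at some cost in sample diversity.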
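The third contribution uses Haralick texture features, which are computed from a gray-level co-occurrence matrix (GLCM): the normalized frequency with which gray level i occurs next to gray level j at a fixed pixel offset. The sketch below builds a symmetric GLCM with NumPy and derives five of Haralick's features. The quantization to 16 levels, the one-pixel horizontal offset, and the chosen feature subset are assumptions for illustration; the section does not state which settings the authors used.

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=16):
    """Symmetric, normalized GLCM for a non-negative pixel offset (dy, dx)."""
    img = np.asarray(img, dtype=np.float64)
    lo, hi = img.min(), img.max()
    # quantize intensities into `levels` equal-width bins
    q = np.clip(np.floor((img - lo) / (hi - lo + 1e-12) * levels).astype(int), 0, levels - 1)
    h, w = q.shape
    ref = q[: h - dy, : w - dx]                      # reference pixels
    nbr = q[dy:, dx:]                                # their neighbours at offset (dy, dx)
    P = np.zeros((levels, levels))
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1.0)    # count co-occurring gray-level pairs
    P = P + P.T                                      # count each pair in both directions
    return P / P.sum()

def haralick(P):
    """A subset of Haralick's texture features from a normalized GLCM."""
    i, j = np.indices(P.shape)
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum(P * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(P * (j - mu_j) ** 2))
    nz = P[P > 0]
    return {
        "asm": float(np.sum(P ** 2)),                              # angular second moment
        "contrast": float(np.sum(P * (i - j) ** 2)),
        "homogeneity": float(np.sum(P / (1.0 + (i - j) ** 2))),    # inverse difference moment
        "correlation": float(np.sum(P * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j + 1e-12)),
        "entropy": float(-np.sum(nz * np.log2(nz))),
    }
```

For example, a one-pixel checkerboard gives contrast 1 and correlation close to −1 for a horizontal offset, while a flat image gives contrast 0 and ASM 1; comparing such feature vectors between real and generated images is one way to check whether generated seabed textures are plausible.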
2. Related Work
2.1. Basic Image Generation Model
2.2. SSS Image Generation Model
3. Materials and Methods
3.1. Data Acquisition and Processing
3.2. SSS Diffusion Model
3.2.1. Target Modeling
3.2.2. Parameter Estimation Network
4. Experiments and Analysis
4.1. SSS Imaging Results
4.2. SSS Image Generation
4.3. SSS Image Detection
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
SSS | Side-Scan Sonar |
AUV | Autonomous Underwater Vehicle |
FID | Fréchet Inception Distance |
IS | Inception Score |
Sonar | Sound Navigation and Ranging |
USBL | Ultra-Short Baseline |
VAEs | Variational Autoencoders |
GANs | Generative Adversarial Networks |
KL | Kullback–Leibler |
VQVAE | Vector-Quantized VAE |
LSGAN | Least Squares GAN |
RNNs | Recurrent Neural Networks |
LSTM | Long Short-Term Memory |
MFSANet | Multi-Feature Fusion Self-Attention Network |
SSAM | Simplified Self-Attention Module |
GPS | Global Positioning System |
DDPM | Denoising Diffusion Probabilistic Model |
XTF | Extended Triton Format |
GLCM | Gray-Level Co-occurrence Matrices |
MDS | Multidimensional Scaling |
SOTA | State of the Art |
NMS | Non-Maximum Suppression |
Category | Background | Topography | Acoustic Interference | Submarine Line | Target |
---|---|---|---|---|---|
Samples | 1317 | 342 | 402 | 1712 | 399 |
| Scale | FID ↓ (InceptionV3) | FID ↓ (ResNet) | IS ↑ (InceptionV3) | IS ↑ (ResNet) |
|---|---|---|---|---|
| | 17.5257 | 27.7368 | 2.6612 | 2.5764 |
| | 17.5456 | 28.2699 | 2.8712 | 2.8424 |
| | 17.6471 | 29.3880 | 3.1743 | 3.1884 |
| | 17.5502 | 31.2412 | 3.4934 | 3.6395 |
| | 17.5741 | 32.4902 | 3.6551 | 3.9083 |
| | 17.7355 | 32.6387 | 3.7988 | 4.0885 |
| | 17.7328 | 33.2389 | 3.8120 | 4.1173 |
| | 18.2058 | 32.7953 | 3.8921 | 4.1597 |
| | 19.6987 | 33.3931 | 3.8971 | 4.2231 |
| | 20.0235 | 33.4207 | 3.9059 | 4.2294 |
| | 20.1990 | 32.9504 | 3.9128 | 4.2487 |
Dataset | Training Set (Original) | Training Set (Generated) | Test Set (Original) | Test Set (Generated) | Validation Set (Original) | Validation Set (Generated) |
---|---|---|---|---|---|---|
A | 0 | 1500 | 103 | 0 | 153 | 0 |
B | 143 | 0 | 103 | 0 | 153 | 0 |
C | 143 | 300 | 103 | 0 | 153 | 0 |
D | 143 | 500 | 103 | 0 | 153 | 0 |
E | 143 | 800 | 103 | 0 | 153 | 0 |
F | 143 | 1000 | 103 | 0 | 153 | 0 |
Dataset | YOLOv10n | YOLOv10s | YOLOv10m | YOLOv10b | YOLOv10l | YOLOv10x | Average |
---|---|---|---|---|---|---|---|
A | 0.629 | 0.597 | 0.625 | 0.633 | 0.669 | 0.602 | 0.626 |
B | 0.723 | 0.718 | 0.743 | 0.739 | 0.728 | 0.777 | 0.738 |
C | 0.785 | 0.759 | 0.793 | 0.777 | 0.761 | 0.781 | 0.776 |
D | 0.838 | 0.813 | 0.851 | 0.850 | 0.831 | 0.834 | 0.836 |
E | 0.757 | 0.812 | 0.788 | 0.805 | 0.805 | 0.800 | 0.795 |
F | 0.815 | 0.820 | 0.801 | 0.802 | 0.806 | 0.805 | 0.808 |
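The dataset-composition table and the detection table can be cross-read with a few lines of arithmetic: average each dataset's scores over the six YOLOv10 variants and compare against B, the dataset with original training images only. The values below are copied from the two tables; the metric is whatever detection score the table reports, and the variable names are illustrative.

```python
import numpy as np

# Per-model scores from the detection table (columns: YOLOv10 n, s, m, b, l, x)
scores = {
    "A": [0.629, 0.597, 0.625, 0.633, 0.669, 0.602],
    "B": [0.723, 0.718, 0.743, 0.739, 0.728, 0.777],
    "C": [0.785, 0.759, 0.793, 0.777, 0.761, 0.781],
    "D": [0.838, 0.813, 0.851, 0.850, 0.831, 0.834],
    "E": [0.757, 0.812, 0.788, 0.805, 0.805, 0.800],
    "F": [0.815, 0.820, 0.801, 0.802, 0.806, 0.805],
}
# Generated training images per dataset, from the dataset-composition table
generated = {"A": 1500, "B": 0, "C": 300, "D": 500, "E": 800, "F": 1000}

means = {name: float(np.mean(row)) for name, row in scores.items()}
baseline = means["B"]                      # B: 143 original training images, none generated
best = max(means, key=means.get)           # dataset with the highest mean score

for name in scores:
    delta = means[name] - baseline
    print(f"{name}: generated={generated[name]:4d}  mean={means[name]:.3f}  change vs B={delta:+.3f}")
print("best:", best)
```

On the tabulated values this agrees with the Average column to within rounding: D (143 original plus 500 generated images) has the highest mean, about 0.098 above B, while A, trained on generated images alone, has the lowest.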
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, F.; Hou, X.; Wang, Z.; Cheng, C.; Tan, T. Side-Scan Sonar Image Generator Based on Diffusion Models for Autonomous Underwater Vehicles. J. Mar. Sci. Eng. 2024, 12, 1457. https://doi.org/10.3390/jmse12081457
Chicago/Turabian Style: Zhang, Feihu, Xujia Hou, Zewen Wang, Chensheng Cheng, and Tingfeng Tan. 2024. "Side-Scan Sonar Image Generator Based on Diffusion Models for Autonomous Underwater Vehicles" Journal of Marine Science and Engineering 12, no. 8: 1457. https://doi.org/10.3390/jmse12081457