Deep Learning Based Underwater Acoustic Target Recognition: Introduce a Recent Temporal 2D Modeling Method
Abstract
1. Introduction
- We train models using time-domain signals and time-frequency representations, obtain network structures suitable for UATR, and analyze the performance and applicable scenarios of these two types of inputs.
- We adopt a recent temporal modeling method that transforms the time-domain feature vectors of underwater acoustic signals, extracted by 1D convolution, into 2D tensors, and then uses 2D convolution to further extract periodic characteristics. By adding Timesblocks to two high-performing model structures, the models break through the recognition bottlenecks of the original structures.
2. Methods
2.1. Temporal 2D-Variation Modeling
- Perform a fast Fourier transform (FFT) on the time series to convert it into a frequency-domain sequence. Only the first half of the frequency-domain sequence is retained, because the spectrum of a real-valued signal obtained by the FFT is conjugate-symmetric. The dimension of the original series is T × C, where T is the length of the time series and C is the number of channels.
- Calculate the average amplitude of the frequency-domain sequence over all channels. The first (DC) amplitude is set to 0, considering the characteristics of the FFT.
- Assuming that the raw time series has k types of periods, each of a different length, record the Top-(k + 1) amplitudes and the corresponding positions f_1, …, f_k. Some modifications are made here, which are summarized in subsequent experiments.
- The position f_i determines the period length p_i ≈ T/f_i, and the number of periods in a signal sequence is f_i = ⌈T/p_i⌉, where ⌈·⌉ is the rounding-up operation, ensuring that all sampling points are counted.
- Normalize amplitudes using the softmax function to obtain weights that represent the importance of each period.
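The period-detection steps above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: `detect_periods` is a hypothetical helper, and the mapping from frequency position to period length follows the TimesNet convention (p_i = ⌈T/f_i⌉).

```python
import numpy as np

def detect_periods(x, k=2):
    """Find the k dominant periods of a multichannel series via FFT.

    x : array of shape (T, C) -- T time points, C channels.
    Returns (period_lengths, softmax_weights).
    """
    T = x.shape[0]
    # FFT along time; rfft already keeps only the first half of the
    # spectrum, since a real signal's spectrum is conjugate-symmetric.
    spec = np.fft.rfft(x, axis=0)
    # Average amplitude over all channels.
    amp = np.abs(spec).mean(axis=-1)
    amp[0] = 0.0  # zero the DC component, as in the second step
    # Positions (frequency indices) of the k largest amplitudes.
    top = np.argsort(amp)[-k:][::-1]
    # Period length p_i = ceil(T / f_i), so all sampling points are covered.
    periods = np.ceil(T / top).astype(int)
    # Softmax over the selected amplitudes gives per-period weights.
    a = amp[top]
    w = np.exp(a - a.max())
    return periods, w / w.sum()
```

For example, a sinusoid completing 8 cycles in 200 samples yields a dominant period length of 25.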
2.2. TimesNet and Timesblock
- Based on the temporal 2D-variation modeling mentioned in Section 2.1, k types of periods and weights of the input signal can be calculated.
- Transform the raw time series into a set of 2D tensors {X_i^2D}, i = 1, …, k, each with p_i rows and f_i columns, where p_i is the i-th period length, indicating that each column contains the time points within one period, and f_i is the number of the i-th period, representing that each row contains the time points at the same phase across different periods. Note that T/p_i is often not an integer, which means that the number of sampling points in the last period is less than the period length; so, to obtain a complete 2D tensor, it is necessary to pad zeros at the end of most signal sequences before reshaping.
- Input these tensors into two inception blocks in series, which contain multi-scale convolutional kernels, to extract feature maps of the intraperiod and interperiod variations.
- Reshape the extracted feature maps back into 1D feature vectors, removing the previously padded tails.
- Calculate the weighted average feature vectors for all periods, with weights derived from the algorithm in the first step.
- The final feature vector, i.e., the output of one Timesblock, is obtained by adding the weighted average feature vector from the previous step as a residual to the original series.
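The 1D-to-2D folding with zero padding, the weighted aggregation, and the residual connection described above can be sketched as follows. This is an illustrative sketch, not the authors' code: `feature_fn` is a hypothetical stand-in for the two inception blocks, and `timesblock_sketch` is an assumed name.

```python
import numpy as np

def to_2d(x, p):
    """Fold a length-T series into a (p, f) tensor: column j holds the
    j-th period, each row holds points at the same phase across periods.
    The tail is zero-padded when T is not a multiple of p."""
    T = len(x)
    f = int(np.ceil(T / p))      # number of periods (columns)
    pad = f * p - T              # zeros needed to complete the last period
    return np.pad(x, (0, pad)).reshape(f, p).T, pad

def from_2d(x2d, pad):
    """Unfold back to a 1D vector and drop the zero-padded tail."""
    flat = x2d.T.reshape(-1)
    return flat[:flat.size - pad] if pad else flat

def timesblock_sketch(x, periods, weights, feature_fn=lambda m: m):
    """Weighted average over the per-period 2D views, added back to the
    original series as a residual; feature_fn stands in for the
    inception convolutions (identity here)."""
    outs = []
    for p in periods:
        x2d, pad = to_2d(x, p)
        outs.append(from_2d(feature_fn(x2d), pad))
    return x + np.tensordot(weights, np.stack(outs), axes=1)
```

With the identity `feature_fn`, the block reduces to `x + x`, which makes the residual structure easy to check in isolation.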
3. Preparation
3.1. Data Source
3.2. Dataset Construction
3.3. Protocol
- ResNet is a well-known neural network that counters the degradation problem and greatly eases the training of very deep networks by adding shortcut connections. Activation defaults to ReLU.
- SE ResNet adds the squeeze-and-excitation (SE) module [37] to the original residual block; the module mainly comprises two linear layers that compute weights for the different channels, introducing a channel attention mechanism. Activation defaults to ReLU.
- CamResNet adapts the SE module by replacing the linear layers with 1D convolutions, and goes further into the attention mechanism by adding a spatial attention module as an independent branch beside the SE module to synthesize the signal characteristics across all channels. Activation defaults to ReLU.
- DenseNet is a variation of ResNet that converts the skip connection from addition to concatenation, which performs well on certain datasets. Every block in DenseNet contains three layers and uses ELU activation.
- MSRDN is composed of stacked multi-scale residual units, each containing four parallel convolutional layers with different kernel sizes to generate and combine feature maps at multiple resolutions. A soft-threshold learning module on top of each unit generates a threshold for every channel by nonlinear transformation, enhancing the effective channel components. The model uses SiLU activation.
- The backbone of each model is a stack of several convolutional blocks: the bottom layer is a convolutional block that expands the input from 1 channel to the specified number, and the top layers contain an average-pooling layer, which significantly reduces the number of parameters in the final linear layer and effectively prevents overfitting. A single linear layer at the end outputs the class probabilities.
Learning rate = 0.001, epochs = 50, training:testing split = 4:1.
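The squeeze-and-excitation weighting described above can be sketched in NumPy. This is a minimal illustration under assumptions: the two-layer shapes (with a reduction ratio of 2) and the function names are hypothetical, and a trained model would learn `w1`, `b1`, `w2`, `b2` rather than receive them as arguments.

```python
import numpy as np

def se_weights(feature_map, w1, b1, w2, b2):
    """Channel weights of a squeeze-and-excitation module.

    feature_map : (C, L) array -- C channels, L time points.
    Squeeze: global average pool per channel; excitation: two linear
    layers (ReLU, then sigmoid) yield one weight per channel.
    """
    z = feature_map.mean(axis=1)               # squeeze: (C,)
    h = np.maximum(0.0, w1 @ z + b1)           # first linear layer + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))   # second layer + sigmoid
    return s

def se_apply(feature_map, s):
    """Rescale each channel by its learned weight (channel attention)."""
    return feature_map * s[:, None]
```

With all parameters at zero, every channel weight is sigmoid(0) = 0.5, which is a convenient sanity check on the shapes.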
4. Results and Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
UATR | Underwater acoustic target recognition
DNN | Deep neural network
ML | Machine learning
DEMON | Detection of envelope modulation on noise
LOFAR | Low-frequency analysis and recording
CQT | Constant-Q transform
MFCC | Mel-frequency cepstral coefficients
GFCC | Gammatone-frequency cepstral coefficients
SNR | Signal-to-noise ratio
STFT | Short-time Fourier transform
t-SNE | t-distributed stochastic neighbor embedding
References
- Urick, R.J. Principles of Underwater Sound; McGraw-Hill Book Co.: Los Angeles, CA, USA, 1983; 423p.
- Yang, J.; Yang, Y.; Li, Y.; Shi, L.; Yang, X. Subsea Broadband Reverberation Modeling and Simulation of High-speed Motion Sonar. J. Unmanned Undersea Syst. 2023, 31, 285–290.
- Sun, R.; Shu, X.; Qu, D. Multipath Effect of Sonar Pulse Waveforms in Shallow Water. J. Sichuan Ordnance 2013, 34, 56–59. (In Chinese)
- Ranjani, G.; Sadashivappa, G. Analysis of Doppler Effects in Underwater Acoustic Channels using Parabolic Expansion Modeling. Int. J. Adv. Comput. Sci. Appl. 2019, 10.
- Wang, N.; He, M.; Sun, J.; Wang, H.; Zhou, L.; Chu, C.; Chen, L. IA-PNCC: Noise processing method for underwater target recognition convolutional neural network. Comput. Mater. Contin. 2019, 58, 169–181.
- Chen, C. The present situation and developing trend of target discrimination techniques. Technol. Acoust. 1999, 4, 185–188.
- Shiliang, F.; Shuanping, D.; Xinwei, L.; Ning, H.; Xiaonan, X. Development of Underwater Acoustic Target Feature Analysis and Recognition Technology. Bull. Chin. Acad. Sci. 2019, 34, 297–305.
- Aksuren, I.G.; Hocaoglu, A.K. Automatic target classification using underwater acoustic signals. In Proceedings of the 2022 30th Signal Processing and Communications Applications Conference (SIU), Safranbolu, Turkey, 15–18 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–4.
- Cheng, Y.; Qiu, J.; Liu, Z.; Li, H. Challenges and prospects of underwater acoustic passive target recognition technology. J. Appl. Acoust. 2019, 38, 653–659. (In Chinese)
- Luo, X.; Chen, L.; Zhou, H.; Cao, H. A Survey of Underwater Acoustic Target Recognition Methods Based on Machine Learning. J. Mar. Sci. Eng. 2023, 11, 384.
- Tian, S.; Chen, D.; Wang, H.; Liu, J. Deep convolution stack for waveform in underwater acoustic target recognition. Sci. Rep. 2021, 11, 9614.
- Yuan, F.; Ke, X.; Cheng, E. Joint Representation and Recognition for Ship-Radiated Noise Based on Multimodal Deep Learning. J. Mar. Sci. Eng. 2019, 7, 380.
- Doan, V.S.; Huynh-The, T.; Kim, D.S. Underwater acoustic target classification based on dense convolutional neural network. IEEE Geosci. Remote Sens. Lett. 2020, 19, 1–5.
- Zhang, Q.; Da, L.; Zhang, Y.; Hu, Y. Integrated neural networks based on feature fusion for underwater target recognition. Appl. Acoust. 2021, 182, 108261.
- Xue, L.; Zeng, X.; Jin, A. A Novel Deep-Learning Method with Channel Attention Mechanism for Underwater Target Recognition. Sensors 2022, 22, 5492.
- Wang, Q.; Zeng, X.; Wang, L.; Wang, H.; Cai, H. Passive moving target classification via spectra multiplication method. IEEE Signal Process. Lett. 2017, 24, 451–455.
- Yang, J.; Yan, S.; Zeng, D.; Yang, B. Underwater time-domain signal recognition network with improved channel attention mechanism. J. Signal Process. 2023, 39, 1025. (In Chinese)
- Ke, X.; Yuan, F.; Cheng, E. Underwater Acoustic Target Recognition Based on Supervised Feature-Separation Algorithm. Sensors 2018, 18, 4318.
- Liu, J.; He, Y.; Liu, Z.; Xiong, Y. Underwater target recognition based on line spectrum and support vector machine. In Proceedings of the 2014 International Conference on Mechatronics, Control and Electronic Engineering (MCE-14), Shenyang, China, 29–31 August 2014; Atlantis Press: Amsterdam, The Netherlands, 2014.
- Chen, Y.; Xu, X. The research of underwater target recognition method based on deep learning. In Proceedings of the 2017 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC), Xiamen, China, 22–25 October 2017; IEEE: Piscataway, NJ, USA, 2017.
- Yue, H.; Zhang, L.; Wang, D.; Wang, Y.; Lu, Z. The classification of underwater acoustic targets based on deep learning methods. In Proceedings of the 2017 2nd International Conference on Control, Automation and Artificial Intelligence (CAAI 2017), Sanya, China, 25–26 June 2017; Atlantis Press: Amsterdam, The Netherlands, 2017.
- Wang, P.; Peng, Y. Research on feature extraction and recognition method of underwater acoustic target based on deep convolutional network. In Proceedings of the 2020 IEEE International Conference on Advances in Electrical Engineering and Computer Applications (AEECA), Dalian, China, 25–27 August 2020; IEEE: Piscataway, NJ, USA, 2020.
- Wang, B.; Zhang, W.; Zhu, Y.; Wu, C.; Zhang, S. An Underwater Acoustic Target Recognition Method Based on AMNet. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5.
- Zhu, P.; Zhang, Y.; Huang, Y.; Zhao, C.; Zhao, K.; Zhou, F. Underwater acoustic target recognition based on spectrum component analysis of ship radiated noise. Appl. Acoust. 2023, 211, 109552.
- Feng, S.; Zhu, X. A Transformer-Based Deep Learning Network for Underwater Acoustic Target Recognition. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5.
- Lian, Z.; Xu, K.; Wan, J.; Li, G.; Chen, Y. Underwater acoustic target recognition based on gammatone filterbank and instantaneous frequency. In Proceedings of the 2017 IEEE 9th International Conference on Communication Software and Networks (ICCSN), Guangzhou, China, 6–8 May 2017; IEEE: Piscataway, NJ, USA, 2017.
- Wang, Y.; Zhang, H.; Xu, L.; Cao, C.; Gulliver, T.A. Adoption of hybrid time series neural network in the underwater acoustic signal modulation identification. J. Frankl. Inst. 2020, 357, 13906–13922.
- Hsieh, T.Y.; Wang, S.; Sun, Y.; Honavar, V. Explainable multivariate time series classification: A deep neural network which learns to attend to important variables as well as time intervals. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, Virtual, 8–12 March 2021.
- Liu, M.; Ren, S.; Ma, S.; Jiao, J.; Chen, Y.; Wang, Z.; Song, W. Gated transformer networks for multivariate time series classification. arXiv 2021, arXiv:2103.14438.
- Kamal, S.; Chandran, C.S.; Supriya, M.H. Passive sonar automated target classifier for shallow waters using end-to-end learnable deep convolutional LSTMs. Eng. Sci. Technol. Int. J. 2021, 24, 860–871.
- Hu, G.; Wang, K.; Liu, L. Underwater acoustic target recognition based on depthwise separable convolution neural networks. Sensors 2021, 21, 1429.
- Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. arXiv 2022, arXiv:2210.02186.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems 30; NeurIPS: La Jolla, CA, USA, 2017.
- Santos-Domínguez, D.; Torres-Guijarro, S.; Cardenal-López, A.; Pena-Gimenez, A. ShipsEar: An underwater vessel noise database. Appl. Acoust. 2016, 113, 64–69.
- Irfan, M.; Zheng, J.; Ali, S.; Iqbal, M.; Masood, Z.; Hamid, U. DeepShip: An underwater acoustic benchmark dataset and a separable convolution based autoencoder for classification. Expert Syst. Appl. 2021, 183, 115270.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
- Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
| | Cargo | Tanker | Tug | Passenger | Total |
|---|---|---|---|---|---|
| Duration (s) | 33,530 | 38,346 | 35,392 | 38,609 | 145,877 |
| Number of Frames | 67,060 | 76,692 | 70,784 | 77,217 | 291,753 |
| Number of Segments | 9580 | 10,956 | 10,112 | 11,031 | 41,679 |
| Hyper-Parameters | Considered Values | Best Values |
|---|---|---|
| Kernel Size 1D | 7, 11, 15, 19, 21, 23 | 21 |
| Kernel Size 2D | 3, 5, 7, 9 | 7 |
| Channels | 32, 64, 128, 196, 256 | 128 |
| Number of Blocks nb | 2, 3, 4, 5 | 4 |

| Hyper-Parameters | Selected Value |
|---|---|
| Multi-Scale Kernel mk | |
| Stride s | 1, 2 for MSRDN |
| Max pooling | kernel size = 3, stride = 2 |
| Input | Model | Accuracy (%) | Params (M) | Avg. Time (s) |
|---|---|---|---|---|
| T | ResNet | 93.76 | 16.09 | 0.0495 |
| | SE ResNet | 95.71 | 16.98 | 0.0506 |
| | CamResNet | 93.53 | 136.75 | 0.1841 |
| | MSRDN | 93.99 | 4.12 | 0.0432 |
| | DenseNet | 91.47 | 62.42 | 0.0557 |
| T-F | ResNet | 96.33 | 37.11 | 0.4003 |
| | SE ResNet | 96.86 | 37.37 | 0.4032 |
| | CamResNet | 95.72 | 352.93 | 1.3394 |
| | MSRDN | 97.59 | 6.38 | 0.3531 |
| | DenseNet | 93.32 | 656.90 | 0.4226 |
| Model | Timesblock | Accuracy (%) | Params (M) | Avg. Time (s) |
|---|---|---|---|---|
| SE ResNet | - | 95.71 | 16.98 | 0.0506 |
| | k = 1, nk = 1 | 95.81 | 11.41 | 0.0585 |
| | k = 1, nk = 2 | 96.62 | 14.23 | 0.0744 |
| | k = 2, nk = 2 | 95.92 | 14.23 | 0.1110 |
| | k = 3, nk = 2 | 96.10 | 14.23 | 0.1343 |
| MSRDN | - | 93.99 | 4.12 | 0.0432 |
| | k = 1, nk = 1 | 94.31 | 4.11 | 0.0422 |
| | k = 1, nk = 2 | 87.27 | 6.37 | 0.0423 |
| | k = 2, nk = 1 | 93.26 | 4.11 | 0.0431 |
| | k = 2, nk = 2 | 93.33 | 6.37 | 0.0436 |
| | k = 3, nk = 1 | 94.21 | 4.11 | 0.0432 |
| | k = 4, nk = 1 | 94.36 | 4.11 | 0.0433 |
| True \ Predicted | Cargo | Tanker | Tug | Passenger | Recall |
|---|---|---|---|---|---|
| Cargo | 12,665 | 380 | 175 | 192 | 0.944 |
| Tanker | 434 | 14,868 | 21 | 16 | 0.969 |
| Tug | 139 | 14 | 13,879 | 125 | 0.980 |
| Passenger | 169 | 160 | 148 | 14,967 | 0.969 |
| Precision | 0.945 | 0.964 | 0.977 | 0.978 | 0.966 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Tang, J.; Gao, W.; Ma, E.; Sun, X.; Ma, J. Deep Learning Based Underwater Acoustic Target Recognition: Introduce a Recent Temporal 2D Modeling Method. Sensors 2024, 24, 1633. https://doi.org/10.3390/s24051633