*Article* **A Novel Deep-Learning Method with Channel Attention Mechanism for Underwater Target Recognition**

**Lingzhi Xue, Xiangyang Zeng \* and Anqi Jin**

School of Marine Science and Technology, Northwestern Polytechnical University, Xi'an 710072, China; 2018100384@mail.nwpu.edu.cn (L.X.); jinaq@mail.nwpu.edu.cn (A.J.)

**\*** Correspondence: zenggxy@nwpu.edu.cn

**Abstract:** The core of underwater acoustic recognition is extracting the spectral features of targets. The running speed and track of a target usually produce a Doppler shift, which makes it challenging to recognize targets observed at different Doppler frequencies. This paper proposes a deep-learning approach with a channel attention mechanism for underwater acoustic recognition. It is based on three crucial designs: the feature structure, the feature extraction model, and the feature classification approach. The feature structure provides high-dimensional underwater acoustic data. The feature extraction model is the most important of the three. First, we develop a ResNet to extract deep abstract spectral features of the targets. Then, a channel attention mechanism is introduced, giving the camResNet (ResNet with channel attention mechanism), to enhance the energy of the stable spectral features of the residual convolution. This helps to subtly represent the inherent characteristics of the targets. Moreover, a feature classification approach based on one-dimensional convolution is applied to recognize the targets. We evaluate our approach on challenging data containing four kinds of underwater acoustic targets under different working conditions. Our experiments show that the proposed approach achieves the best recognition accuracy (98.2%) among the compared approaches. Moreover, the proposed approach outperforms the ResNet with a widely used channel attention mechanism on data with different working conditions.

**Keywords:** feature extraction; target recognition; neural networks; underwater acoustic signals

## **1. Introduction**

Traditional methods of target recognition include feature extraction techniques based on mathematical modeling [1]. Using entropy theory [2,3] to extract features from the radiated noise of a ship is one of the most common mathematical modeling methods. Another critical approach to recognition is to analyze the peaks of the spectrum to obtain physical features, such as the propeller speed cavitation noise of the engine [4,5]. The spectrum is distorted by the Doppler effect when the ship moves toward the hydrophone receivers [6]. Wang proposes multi-method spectra based on auditory feature extraction from the human ear and effectively extracts stable feature points under the Doppler effect [7]. Modeling the Doppler power spectrum of non-stationary underwater acoustic channels is another way to reduce the impact of the Doppler effect on underwater acoustic target recognition [8]. Li [9] uses the square root unscented Kalman filter to attenuate Doppler phenomena in underwater acoustic signals. However, the information extracted by traditional methods is limited when the spectrum of the signal changes with the Doppler effect.
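To give a sense of scale for the Doppler distortion discussed above, the following minimal sketch applies the classical formula for a moving source and a stationary receiver, f_obs = f · c / (c − v). The nominal sound speed of 1500 m/s, the 100 Hz spectral line, the 10 m/s radial speed, and the function name are illustrative assumptions, not values or code from this paper.

```python
# Assumed nominal speed of sound in seawater (m/s); real values vary with
# temperature, salinity, and depth.
SOUND_SPEED_SEAWATER = 1500.0


def doppler_shifted_frequency(f_source_hz, radial_speed_ms, c=SOUND_SPEED_SEAWATER):
    """Frequency seen by a stationary hydrophone for a moving source.

    radial_speed_ms > 0 means the source approaches the hydrophone,
    radial_speed_ms < 0 means it recedes.
    """
    if abs(radial_speed_ms) >= c:
        raise ValueError("radial speed must be smaller than the sound speed")
    return f_source_hz * c / (c - radial_speed_ms)


# Hypothetical example: a 100 Hz line radiated by a ship moving at 10 m/s.
line_hz = 100.0
speed_ms = 10.0
approaching_hz = doppler_shifted_frequency(line_hz, speed_ms)
receding_hz = doppler_shifted_frequency(line_hz, -speed_ms)

print(f"approaching: {approaching_hz:.3f} Hz, receding: {receding_hz:.3f} Hz")
```

Under these assumptions the same line moves by roughly ±0.7 Hz depending on the direction of travel, which is enough to displace narrow spectral peaks between recordings of the same target.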

Deep learning has an advantage over traditional methods in extracting spectral features. However, it is often difficult to collect enough underwater acoustic signal data for training, which significantly limits the performance of deep neural networks in underwater target recognition. Nevertheless, researchers continue to explore the application of deep learning to underwater target recognition within the constraints of the available underwater acoustic data. Yang et al. [10] use deep auto-encoder networks combined with

**Citation:** Xue, L.; Zeng, X.; Jin, A. A Novel Deep-Learning Method with Channel Attention Mechanism for Underwater Target Recognition. *Sensors* **2022**, *22*, 5492. https://doi.org/10.3390/s22155492

Academic Editors: Yuxing Li and Luca Fredianelli

Received: 6 July 2022 Accepted: 21 July 2022 Published: 23 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

long short-term memory networks to extract target features, setting different gates according to the temporal characteristics of underwater acoustic signals to extract feature information effectively. Auto-encoder networks can reduce high-dimensional data to low-dimensional data while retaining sufficient feature information, but the number of parameters is enormous. The convolutional neural network (CNN) significantly reduces the number of parameters compared with the deep neural network (DNN). Hence, the CNN is better suited to underwater acoustic signals with limited samples. Hu [11] uses a CNN to reduce the number of parameters and obtains better experimental results. Wang [12] investigates the intrinsic mechanism of convolutional networks for underwater acoustic signals and shows the relationship between the waveform of the original data and the convolution kernel. Hu builds an underwater acoustic recognition model based on separable convolution operations according to the information collection mechanism of the auditory system, which is the first application of grouped convolution models to underwater acoustic recognition. Tian [13] applies a deep convolution stack to optimize CNNs, which addresses the lack of depth and the structural imbalance of the networks. However, the above CNN models extract single-scale features with a fixed convolution kernel size, which loses much feature information. Hong [14] proposes a deep convolution stack network with a multi-scale residual unit (MSRU) to extract multi-scale features, while also exploring generative adversarial networks (GAN) to synthesize underwater acoustic waveforms. The method modifies two advanced GAN models and improves their performance. A GAN, composed of a generator model and an adversary model, uses the idea of a game to optimize the network. The generator can produce underwater acoustic samples once the Nash equilibrium between the generator and adversary models is reached.

We proposed [15] an underwater acoustic target recognition model based on GAN, optimizing the recognition model with two adversarial models. The experiment verifies that the GAN recognizes targets better than the other networks with small samples. The number of neural network layers has increased due to the urgent need to identify underwater acoustic data under different spatial and temporal conditions. Doan [16] applies a dense convolutional neural network to identify the target class, which addresses the over-fitting problem of deep convolutional neural networks with a limited number of samples. Gao [17] increases the number of samples using a GAN model, extracting underwater acoustic features from small samples with deeper network layers. To solve the recognition problem with a limited number of samples in deep networks, He [18,19] first proposes the ResNet model for image recognition, which uses the residual function to effectively mitigate vanishing gradients. Wu [20] studies the depth and width of ResNet models in more detail. Liu [21] applies the ResNet model to underwater acoustic signals and obtains good experimental results.

Hu [22] first proposes the SE (squeeze-and-excitation) network, which uses channel weighting to discriminate the importance of the information in different channels of a ResNet. This network is a channel attention mechanism that assigns weights to the channels according to their effectiveness and effectively removes channels with similar features. The channel attention mechanism can adaptively optimize neural network models, and different research objects require different channel attention mechanisms. Because of the low-pass filtering properties of the underwater acoustic channel, the high-frequency spectrum of the signal is attenuated as the distance between the target and the hydrophone increases. Therefore, the underwater acoustic signal contains a low-frequency spectrum and a continuous spectrum. The continuous spectrum contains the ocean background noise, and the low-frequency spectrum contains the ship's radiation noise, propeller noise, machine noise, and other hull self-noise. Yang [23,24] uses an auditory-inspired approach for ship type classification. The core of underwater acoustic target recognition is to extract the low-frequency spectrum [25,26]. However, changes in the distance between the target and the hydrophone lead to a Doppler shift, which makes the information in the low-frequency spectrum disappear. This paper designs a camResNet (ResNet with channel attention mechanism) model to extract the

low-frequency spectrum of underwater acoustic signals when the Doppler shift occurs. The channel attention mechanism of camResNet is divided into two parts. First, the signal channels are weighted by analysis of channel information. Second, the valid information points in each channel are extracted, and the complete information is weighted.

This paper is organized as follows. Section 2 introduces the structure of the SE\_ResNet network. Section 3 describes the details of the underwater acoustic target recognition method based on camResNet. Section 4 describes the experimental data and shows the experimental results. Section 5 concludes the advantages and disadvantages of the proposed method.
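As a rough illustration of the SE-style channel weighting [22] discussed in this introduction, the sketch below implements the standard squeeze (global average pooling) and excitation (two dense layers with ReLU and sigmoid) steps on toy one-dimensional feature maps. It is not the camResNet attention module; the function names, layer sizes, and all weight values are hypothetical and chosen only so the example can be followed by hand.

```python
import math


def squeeze(feature_maps):
    """Global average pooling: one scalar descriptor per channel."""
    return [sum(channel) / len(channel) for channel in feature_maps]


def dense(vec, weights, bias):
    """Plain fully connected layer: weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def excite(descriptor, w1, b1, w2, b2):
    """Bottleneck MLP: ReLU after the first layer, sigmoid after the second."""
    hidden = [max(0.0, h) for h in dense(descriptor, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2, b2)]


def se_block(feature_maps, w1, b1, w2, b2):
    """Rescale every channel by its learned weight in (0, 1)."""
    scales = excite(squeeze(feature_maps), w1, b1, w2, b2)
    reweighted = [[s * v for v in channel] for s, channel in zip(scales, feature_maps)]
    return reweighted, scales


# Toy input: 4 channels, each a 1-D feature map of length 3 (hypothetical values).
feature_maps = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0], [4.0, 4.0, 4.0], [-1.0, 1.0, 0.0]]
# Hypothetical weights: 4 channels -> 2 hidden units -> 4 channel scales.
w1 = [[0.5, 0.0, 0.25, 0.0], [0.0, 0.5, 0.0, 0.25]]
b1 = [0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
b2 = [0.0, 0.0, 0.0, 0.0]

reweighted, scales = se_block(feature_maps, w1, b1, w2, b2)
print([round(s, 3) for s in scales])
```

In a trained network the two dense layers are learned, so the scales reflect how useful each channel is for the task; here they only demonstrate the data flow.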

## **2. Structure of ResNet**

The ResNet model deals with network degradation caused by network layer deepening using residual learning methods. Hong [27] studied the characteristics of underwater acoustic signals and increased the recognition rate with an 18-layer residual network (ResNet18), which contains an embedding layer.

The ResNet model consists of many residual modules; the input of a module is *x*, and the output of the multi-layer stacked convolutional structure is *H*(*x*), called the learned features. The learned features are difficult to optimize by backward gradient propagation when the network has too many layers, even if the nonlinear activation function performs very well. He finds that the function *F*(*x*) = *H*(*x*) − *x*, called the residual function, is easier to optimize than *H*(*x*). The output of a residual module is the complex feature function *F*(*x*) + *x*, which is the residual function learned by the network summed with the original signal, and the output of one residual module is the input of the following residual module. Figure 1 shows the architecture of the ResNet model, in which *H*(*x*) is the output of the residual module, and the mathematical expression is defined as

$$H(\mathbf{x}) = \mathbf{x} + w\_N \delta(w\_{N-1}(\delta(\dots \delta(w\_1 \mathbf{x})))) \tag{1}$$

The *w*<sub>1</sub> ⋯ *w*<sub>*N*</sub> in this equation denote the weights of each module in the residual network. The derivative of *H*(*x*) with respect to *x* is defined as

$$\frac{\partial H(\mathbf{x})}{\partial \mathbf{x}} = 1 + \frac{\partial (w\_N \delta(w\_{N-1}(\delta(\dots \delta(w\_1 \mathbf{x})))))}{\partial \mathbf{x}} \tag{2}$$

The first term of Equation (2) equals 1, and the second term is the gradient of the weight function with respect to *x*. Because of the constant 1, the gradient ∂*H*(*x*)/∂*x* does not vanish even if the second term is small.
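The effect of the identity term can be checked numerically. The following minimal sketch evaluates a scalar version of Equation (1) and estimates the gradient in Equation (2) by central differences. It assumes that δ is the nonlinear activation and uses tanh as a stand-in, with six hypothetical scalar weights of 0.1; none of these choices come from the paper.

```python
import math


def stacked_branch(x, weights):
    """w_N * delta(w_{N-1} * delta(... delta(w_1 * x))), with delta = tanh (assumed)."""
    h = x
    for w in weights[:-1]:
        h = math.tanh(w * h)
    return weights[-1] * h


def residual_module(x, weights):
    """Scalar form of Equation (1): H(x) = x + stacked branch."""
    return x + stacked_branch(x, weights)


def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


# Hypothetical small weights so that the stacked branch has a tiny gradient.
weights = [0.1] * 6
x0 = 0.5

grad_plain = numerical_grad(lambda x: stacked_branch(x, weights), x0)
grad_residual = numerical_grad(lambda x: residual_module(x, weights), x0)

print(f"gradient without shortcut: {grad_plain:.3e}")
print(f"gradient with shortcut:    {grad_residual:.6f}")
```

With these values the gradient of the stacked branch alone is on the order of 10⁻⁶, while the gradient of the residual module stays close to 1, which is the behaviour Equation (2) describes.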

**Figure 1.** The architecture of the ResNet model.
