Design of Siamese Network for Underwater Target Recognition with Small Sample Size

Liu, Dali; Shen, Wenhao; Cao, Wenjing; Hou, Weimin; Wang, Baozhu

doi:10.3390/app122010659


by Dali Liu ¹,*, Wenhao Shen ¹, Wenjing Cao ¹, Weimin Hou ²,* and Baozhu Wang ²

¹ School of Electronics and Information Engineering, Tiangong University, Tianjin 300387, China
² School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang 050018, China
* Authors to whom correspondence should be addressed.

Appl. Sci. 2022, 12(20), 10659; https://doi.org/10.3390/app122010659

Submission received: 5 October 2022 / Revised: 16 October 2022 / Accepted: 19 October 2022 / Published: 21 October 2022

(This article belongs to the Section Computing and Artificial Intelligence)


Abstract:

The acquisition of target data for underwater acoustic target recognition (UATR) is difficult and costly. Although deep neural networks (DNNs) have been applied to UATR with some success, their performance is unsatisfactory when recognizing underwater targets with different Doppler shifts, signal-to-noise ratios (SNRs), and interferences. To address this, this paper proposes a Siamese network with two identical one-dimensional convolutional neural networks (1D-CNNs) that recognizes the detection of envelope modulation on noise (DEMON) spectra of underwater target-radiated noise. The parameters of underwater samples are diverse, but the states of the collected samples are very homogeneous. Traditional underwater target recognition uses multi-state samples to train the network, which is costly. This article trained the network using samples from a single state, with the expectation that it could identify samples with different parameters. Datasets of targets with different Doppler shifts, SNRs, and interferences were designed to evaluate the generalization performance of the proposed Siamese network. The experimental results showed that when recognizing samples with Doppler shifts, the classification accuracy of the proposed network reached 95.3%. For SNRs, the classification accuracy reached 85.5%. The generalization ability of the proposed model suggests that it is suitable for practical engineering applications.

Keywords:

small-scale sample; DEMON spectrum analysis; Siamese network; deep learning; contrast loss function

1. Introduction

UATR is an important branch of general-purpose marine technology, and UATR systems have emerged in many countries around the world. Underwater target recognition is widely used in marine geological reconnaissance, underwater defense, and other fields [1]. Optimizing sonar signal collection and improving hydroacoustic signal processing and feature extraction, and thereby enhancing target recognition accuracy, have been the main goals of researchers [2]. A traditional underwater target recognition system performs only simple processing of the collected sonar signal; the sonar operator then judges the characteristics of the hydroacoustic signal (timbre, fluctuation, and beat) from personal experience, which leads to the low efficiency and accuracy that have long troubled hydroacoustic researchers. UATR is divided according to the medium: optical imaging and sonar [3]. Sonar recognition is either active or passive and is used to identify objects at close range, but a complex environment results in low image contrast. In active sonar recognition, a sonar waveform signal is transmitted into the water and reflected from the target; then, the sonar system receives and processes the echoes. By contrast, passive sonar receives noise radiated by a target and then extracts the feature information for target recognition. Because ship-radiated noise contains important target-specific information, passive sonar is also widely used [4].

Underwater acoustic signals received by passive sonar contain considerable noise. Therefore, these signals need to be analyzed to extract their features. The DEMON method extracts the invariant features of a target from its radiated noise and is widely employed in passive sonar recognition [5]. After the features of line spectra in the target-radiated noise are analyzed, the target is recognized from the result.

In recent years, the deep neural network (DNN), the most popular deep-learning model, has attracted the interest of scholars in the field of UATR [6]. Yang et al. combined the auditory perception principle with a CNN to propose an auditory perception-inspired deep convolutional neural network (ADCNN) [7], which used a CNN to extract features of different frequency components from signals and merged them at the fusion layer to classify acoustic targets. Choi et al. used the absolute values of matrix elements in the cross-spectrum density matrix (CSDM) to generate two additional matrices as input data and used them for training the CNN model [8,9]. Cao et al. combined the CNN architecture with second-order pooling (SOP) and used a convolution layer to learn the local features of the data extracted by the constant-Q transform (CQT), achieving an end-to-end network for the classification of underwater targets. Zhou et al. proposed a compound convolutional neural network based on the shared latent sparse (SLS) feature and deep belief network (DBN) [10,11], using these two functions to learn about fringe-based sonar images to improve the accuracy of classification. Chen et al. proposed a method based on a convolutional neural network with residual units to recognize a time-frequency image of ship-radiated noise [12]. Wang et al. combined improved antinoise power-normalized cepstral coefficients (ia-PNCC) with a CNN and applied multitaper and normalized Gammatone filter banks to improve the antinoise capacity [13,14,15,16]. Complex-valued neural networks can process the amplitude and phase of the spectrum simultaneously and have been used to process and analyze wireless, image, and audio signals. They have the potential to be used for multitarget ship recognition. At the same time, multilayer neural networks and robust adaptive controllers have also optimized the target-tracking control of underactuated autonomous underwater vehicles (AUVs) [17,18], improving the anti-interference ability of underwater target recognition in complex marine environments. Multidimensional fusion networks and the support vector machine (SVM) are also used for UATR tasks.

This study applied the DEMON method to analyze underwater target-radiated noise to obtain line spectra containing the features to be used in recognizing a target. Due to the variability of the underwater environment, collecting representative high-quality data is challenging. A huge number of samples are necessary for a traditional deep neural network to do this. Therefore, the need for a network model that can operate from a small sample has become urgent. In practical applications, Doppler shifts [19], due to the relative velocity between the sonar platform and the target, cannot be ignored. Moreover, the distance between the platform and target and the environmental noise affects the signal-to-noise ratio (SNR). Due to environmental or human interferences, redundant spectrum lines may appear in the DEMON spectrum of the received radiated noise, or some spectrum lines may weaken or even disappear, so using a neural network requires excellent generalization.

A Siamese network, which is widely used for identification and classification problems with small sample sizes, is a promising candidate. Zagoruyko et al. used one for face verification and recognition [20]. Yuan et al. used this network to extract classification features effectively from a few samples and to determine the relationships between the features. Lee proposed a one-shot Siamese network called Siam-OS for fast and effective tracking of visual objects [21,22].

In this paper, we designed a network model for measuring the similarity of underwater target samples based on a Siamese (twin) network architecture and 1D-CNNs. The model was trained with simulated samples; the trained model calculated the similarity between two samples and then determined whether they belonged to the same class of targets, which effectively addressed the difficulty of identifying underwater targets from small samples. First, the modeling and feature extraction methods of underwater target-radiated noise were studied, and seven types of underwater target samples were designed for training the network model, based on real data. Second, to evaluate the generalization ability of the designed network model, samples with different Doppler shifts, different signal-to-noise ratios, and different numbers of interference spectral lines were designed from the above seven types of underwater target samples as the evaluation dataset of the network model.

In the training and evaluation process, positive and negative sample pairs were input to the network, and the network extracted features of the sample pairs, calculated Euclidean distances, and achieved satisfactory target classification and recognition. The effects of Doppler shift, signal-to-noise ratio and mutual interference on the network were obtained by evaluating the network performance [23,24]. Finally, in order to test the recognition capability of the network model, the network model was tested using simulation samples and fewer real underwater target samples.

Traditional methods use multiple classes and a large number of samples to train the network. The parameters of underwater samples are diverse, but the state of the collected samples is very homogeneous. This article used a single state to train the network to recognize samples with different states and achieved better recognition results.

2. Model of Ship-Radiated Noise and Sample Generation

Because underwater target data are costly to acquire, this study developed, trained, and evaluated a Siamese network using simulated data. We achieved high recognition accuracy on a small number of real samples, but because the real samples were few and limited in variety, those results are not reported here. A model of underwater ship-radiated noise was built, and the radiated noise was then processed to obtain its DEMON spectra. The DEMON spectra were used as the samples to generate the datasets for evaluating the Siamese network.

2.1. Ship-Radiated Noise and DEMON Processing

Ship-radiated noise is a non-stationary random signal, and the power spectra consist of discrete and continuous spectra. Line spectra generated by periodic vibration of the ship contain the key features of the ship. The signals of line spectra are composed of a series of sine waves that can be expressed as

S(t) = [1 + \sum_{i=1}^{M} A_{i} \sin(2 π f_{i} t + Φ_{i})] \cdot \cos ω t    (1)

where M is the number of line spectra; f_{i}, Φ_{i}, and A_{i} are the frequency, the initial phase, and the amplitude of the i-th sinusoidal signal, respectively; and ω is the carrier frequency.

Considering the underwater environmental noise, the model of ship-radiated noise L(t) can be written as

L(t) = S(t) + N(t)    (2)

where N(t) is the underwater environmental noise.
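
As a minimal sketch of Equations (1) and (2), the following Python function evaluates one sample of L(t). The function name `ship_noise`, the Gaussian stand-in for N(t), and all numerical values are illustrative assumptions, not parameters from this study; the carrier is given in Hz, so ω = 2π · `carrier_hz`.

```python
import math
import random

def ship_noise(t, lines, carrier_hz, noise_std, rng):
    """One sample of L(t) = S(t) + N(t), following Equations (1) and (2).

    lines: list of (A_i, f_i, phi_i) tuples, one per line spectrum.
    noise_std: standard deviation of the Gaussian stand-in for N(t) (assumption).
    """
    envelope = 1.0 + sum(a * math.sin(2 * math.pi * f * t + phi)
                         for a, f, phi in lines)
    s = envelope * math.cos(2 * math.pi * carrier_hz * t)   # Equation (1)
    return s + rng.gauss(0.0, noise_std)                    # Equation (2)

# Illustrative parameters, not taken from the paper.
fs = 44_100                                   # sampling rate in Hz
lines = [(0.3, 8.0, 0.0), (0.2, 16.0, 0.5)]   # (A_i, f_i, phi_i), so M = 2
rng = random.Random(0)
signal = [ship_noise(n / fs, lines, 5_000.0, 0.1, rng) for n in range(fs)]
print(len(signal))
```

The low-frequency envelope carries the line-spectrum information, which is why the DEMON processing described next demodulates the signal before taking its spectrum.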

The DEMON method is a classical method in feature extraction for demodulating broadband signals to obtain low-frequency spectra. Unchanging characteristics, such as the propeller’s shaft frequency, blade number, and paddle frequency, are extracted by further detection [25]. The process of the DEMON method is shown in Figure 1.

To facilitate the analysis, S(t) was assumed to have one component, which can be expressed as

S'(t) = A (1 + b \sin Ω t) \cos ω t    (3)

where A is the amplitude of the signal, b is the modulation factor with 0 < b < 1, and Ω is the frequency of the modulation signal.

To obtain the frequency of the modulation signal \sin Ω t, signal S'(t) was processed by square-law demodulation, and [S'(t)]^{2} can be written as

\begin{aligned}
[S'(t)]^{2} &= A^{2} (1 + b \sin Ω t)^{2} \cos^{2} ω t \\
&= A^{2} \cos^{2} ω t + 2 b A^{2} \cos^{2} ω t \sin Ω t + b^{2} A^{2} \cos^{2} ω t \sin^{2} Ω t \\
&= \frac{A^{2}}{2} + \frac{A^{2}}{2} \cos 2 ω t + b A^{2} \sin Ω t + \frac{b A^{2}}{2} \sin (Ω + 2 ω) t + \frac{b A^{2}}{2} \sin (Ω - 2 ω) t + \frac{b^{2} A^{2}}{4} \\
&\quad - \frac{b^{2} A^{2}}{4} \cos 2 Ω t + \frac{b^{2} A^{2}}{4} \cos 2 ω t - \frac{b^{2} A^{2}}{8} \cos (2 Ω + 2 ω) t - \frac{b^{2} A^{2}}{8} \cos (2 Ω - 2 ω) t
\end{aligned}    (4)

The cutoff frequency F_{lpf} of the low-pass filter satisfies

2 Ω < F_{lpf} < 2 ω - Ω    (5)

After low-pass filtering, the signal X(t) contained the low-frequency modulated signal, which can be written as

X(t) = \frac{A^{2}}{2} + \frac{b^{2} A^{2}}{4} + b A^{2} \sin Ω t - \frac{b^{2} A^{2}}{4} \cos 2 Ω t    (6)
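
The low-frequency content in Equation (6) can be checked numerically. The sketch below squares a single-component signal of the form in Equation (3) and projects the result onto a constant, \sin Ω t, and \cos 2 Ω t. The values of A, b, the two frequencies, and the helper `project` are assumptions chosen so that every component completes a whole number of periods in one second, which makes the projections exact.

```python
import math

# Illustrative values (assumptions): 1 s of data, all components on integer Hz.
fs, n_samples = 44_100, 44_100
A, b = 1.5, 0.6
f_mod, f_car = 10.0, 5_000.0          # Ω / 2π and ω / 2π in Hz

def squared(t):
    s = A * (1 + b * math.sin(2 * math.pi * f_mod * t)) * math.cos(2 * math.pi * f_car * t)
    return s * s

x = [squared(n / fs) for n in range(n_samples)]

def project(samples, f_hz, basis):
    """Amplitude of basis(2*pi*f_hz*t) contained in samples."""
    return 2 / len(samples) * sum(
        v * basis(2 * math.pi * f_hz * n / fs) for n, v in enumerate(samples))

dc = sum(x) / n_samples
amp_sin_mod = project(x, f_mod, math.sin)         # coefficient of sin Ωt
amp_cos_2mod = project(x, 2 * f_mod, math.cos)    # coefficient of cos 2Ωt

print(dc, A**2 / 2 + b**2 * A**2 / 4)
print(amp_sin_mod, b * A**2)
print(amp_cos_2mod, -b**2 * A**2 / 4)
```

Each printed pair compares a measured coefficient with the corresponding term of Equation (6); the components near 2ω do not contribute to these projections.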

The discrete form of signal X(t) is expressed as x(n). The discrete Fourier transform of x(n) can be written as

F(k) = \sum_{n=0}^{N-1} x(n) e^{-\frac{2 π i}{N} k n}, \quad k = 0, \dots, N - 1    (7)

F(k) is the DEMON spectrum of ship-radiated noise, and it is also the sample for the neural network.

2.2. Design of Samples and Datasets

2.2.1. Sample Design

Taking typical sonar as an example, the sampling rate of the ship-radiated noise was set to 44.1 kHz. After band-pass filtering, square-law detection, and low-pass filtering, the received signal was down-sampled by a factor of 50, reducing the sampling rate to 882 Hz. The DEMON spectra were obtained by Fourier transform, and the amplitude was normalized between 0 and 1. Finally, samples of 2048 data points were obtained.
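
The processing chain can be sketched in pure Python as follows. This is an illustration under stated assumptions rather than the processing used in this study: a 50-sample block average stands in for the low-pass filter and down-sampling, the band-pass stage is omitted, the mean is removed before the transform, the test input uses a pure cosine carrier, and only the non-negative-frequency half of the 2048-point spectrum is returned for readability. The names `fft` and `demon_spectrum` are hypothetical.

```python
import cmath
import math

FS, DECIM, N_FFT = 44_100, 50, 2048      # values from the text
FS_LOW = FS / DECIM                      # 882 Hz after down-sampling

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + tw, even[k] - tw
    return out

def demon_spectrum(signal):
    """Square-law detect, low-pass and down-sample, FFT, normalize to [0, 1]."""
    squared = [v * v for v in signal]
    # Assumption: block averaging replaces the unspecified low-pass filter
    # and performs the down-sampling by 50 in one step.
    low = [sum(squared[i:i + DECIM]) / DECIM
           for i in range(0, N_FFT * DECIM, DECIM)]
    mean = sum(low) / len(low)
    spec = [abs(c) for c in fft([v - mean for v in low])]   # DC removed (assumption)
    peak = max(spec)
    return [v / peak for v in spec[:N_FFT // 2]]            # one-sided half only

# Illustrative input: one modulation line placed exactly on frequency bin 100.
f_mod = 100 * FS_LOW / N_FFT
sig = [(1 + 0.5 * math.sin(2 * math.pi * f_mod * n / FS))
       * math.cos(2 * math.pi * 4_410.0 * n / FS)
       for n in range(N_FFT * DECIM)]
spectrum = demon_spectrum(sig)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin, round(peak_bin * FS_LOW / N_FFT, 2))
```

The strongest line appears at the bin of the modulation frequency, and a weaker line at twice that bin corresponds to the \cos 2 Ω t term of Equation (6).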

Dataset A comprised 21,000 standard samples of 7 target types and was used for neural network training. To evaluate the effects of Doppler shifts, SNRs, and interference on network performance, samples with different parameters were designed based on the standard samples.

1. Generation of the dataset with different Doppler shifts (dataset B)

When a target is moving, the ship-radiated noise received by sonar has a Doppler shift. Ship-radiated noise with a Doppler shift can be written as

S_{1}(t) = S(λ t)    (8)

where λ = c / (c + v) is the scaling factor, c is the speed of sound in water, and v is the relative velocity between the sonar and target. The value of v / c was adjusted between −0.02 and 0.02 in 0.001 steps so that 41 types of ship-radiated noise with Doppler shifts were generated. These samples constituted dataset B, where there were 7 types of targets, with 500 samples for each target.
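
The Doppler grid can be written out explicitly. The sketch below lists the 41 values of v / c and the corresponding scaling factors λ; the helper `doppler_line_frequency` is a hypothetical name that applies the time scaling of Equation (8) to a single spectral line.

```python
# Relative velocities v/c from -0.02 to 0.02 in 0.001 steps. An integer grid
# avoids floating-point drift from repeatedly adding the step.
ratios = [k / 1000 for k in range(-20, 21)]
scales = [1 / (1 + r) for r in ratios]        # λ = c / (c + v) = 1 / (1 + v/c)

def doppler_line_frequency(f_hz, v_over_c):
    """Frequency of a line at f_hz after the time scaling S(λt)."""
    return f_hz / (1 + v_over_c)

print(len(ratios))
print(doppler_line_frequency(10.0, 0.02))
```

Because S(λt) rescales time, every frequency in the signal, including each DEMON line, is multiplied by λ.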

2. Generation of the dataset with different SNRs (dataset C)

The SNR of ship-radiated noise signals varies with changes in the target distance and underwater environmental noise. Ship-radiated noise with various SNRs is written as

S_{2}(t) = S(t) + μ N(t)    (9)

where N(t) is Gaussian white noise. The value of μ was adjusted between 1.0 and 4.0 in 0.2 steps, and then 16 types of ship-radiated noise with different SNRs were generated. These samples constituted dataset C, which had 7 types of targets, with 500 samples for each target.
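
Since μ scales the noise amplitude in Equation (9) while S(t) is unchanged, the noise power grows as μ² and the SNR falls by 20 log10(μ) dB relative to μ = 1. The short sketch below lists the 16 values of μ and this relative change; `snr_change_db` is a hypothetical helper name.

```python
import math

mus = [k / 10 for k in range(10, 41, 2)]      # 1.0, 1.2, ..., 4.0

def snr_change_db(mu):
    """SNR change in dB relative to mu = 1 when the noise amplitude is scaled by mu."""
    return -20 * math.log10(mu)

for mu in (1.0, 2.0, 4.0):
    print(mu, round(snr_change_db(mu), 2))
```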

3. Generation of the dataset with different interference (dataset D)

When there is environmental or human interference, some redundant line spectra may appear, or some primary line spectra may weaken and even disappear. This increases the difficulty of feature extraction. The number of line spectra M was adjusted between M - 3 and M + 3 in steps of 1 so that 7 types of ship-radiated noise with different numbers of line spectra were generated. These samples constituted dataset D, and the target category number and sample number were the same as datasets B and C.

Datasets E, F, and G were generated to test the performance of a neural network in recognizing new samples. The samples for datasets E, F, and G were derived from 10 types of new targets with different Doppler shifts, SNRs, and interference.

Typical DEMON spectra of samples with different Doppler shifts, SNRs, and interferences are shown in Figure 2. In Figure 2, the vertical axis represents the normalized value, and the horizontal axis represents the frequency bin. Normalization puts all samples on a common amplitude scale so that their statistical distributions are uniform. Because the sampling rate was 882 Hz and the number of frequency bins was 2048, each frequency bin was 882/2048 ≈ 0.43 Hz wide.

Dataset A was used for network training, and datasets B, C, and D were used to evaluate the network performance for Doppler shifts, SNRs, and interference. Datasets E, F, and G with 10 types of new targets were used for network performance testing. The detailed information on datasets is shown in Table 1.

2.2.2. Generation of Positive and Negative Sample Pairs

A pair of samples was used as training data for a Siamese network. The difference between them was obtained, making it possible to recognize targets from a small sample.

When the sample pairs were from the same target, they were called positive sample pairs. There were 3000 samples in dataset A for each type of target. The number of combinations is

C_{m}^{n} = \frac{m!}{n! (m - n)!}    (10)

where m is the total number of samples, and n is the number of samples selected from the m samples. Positive sample pairs were formed by randomly selecting two samples of the same target from its 3000 samples, so the number of possible positive sample pairs was N_{A}^{+} = 7 \cdot C_{3000}^{2}. The training dataset AA+ for the Siamese network was generated by selecting 7000 random pairs from these N_{A}^{+} pairs.

There were 500 samples in datasets B, C, and D for each type. Their positive sample pairs were generated in the same way as dataset A. Consequently, the number of positive sample pairs was N_{B}^{+} = 41 \cdot 7 \cdot C_{500}^{2}, N_{C}^{+} = 16 \cdot 7 \cdot C_{500}^{2}, and N_{D}^{+} = 7 \cdot 7 \cdot C_{500}^{2}. Datasets BB+, CC+, and DD+ were generated by selecting 500 pairs for N_{B}^{+}, N_{C}^{+}, and N_{D}^{+}.

In the same way as datasets B, C, and D, the numbers of positive sample pairs for datasets E, F, and G were N_{E}^{+} = 41 \cdot 10 \cdot C_{500}^{2}, N_{F}^{+} = 16 \cdot 10 \cdot C_{500}^{2}, and N_{G}^{+} = 7 \cdot 10 \cdot C_{500}^{2}, respectively. Datasets EE+, FF+, and GG+ were generated by selecting 1000 pairs for each type.
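
The sizes of these pair pools follow directly from Equation (10). The sketch below evaluates them with Python's `math.comb`; the dictionary layout is only for display.

```python
from math import comb

per_target_a = comb(3000, 2)      # pairs within one target of dataset A
per_group = comb(500, 2)          # pairs within one 500-sample group

counts = {
    "A": 7 * per_target_a,
    "B": 41 * 7 * per_group,
    "C": 16 * 7 * per_group,
    "D": 7 * 7 * per_group,
    "E": 41 * 10 * per_group,
    "F": 16 * 10 * per_group,
    "G": 7 * 10 * per_group,
}
for name, n in counts.items():
    print(f"N_{name}+ = {n:,}")
```

Every pool contains millions of candidate pairs, so the selected pair sets are small random subsets of what is available.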

Similarly, when the sample pairs were from different targets, they were called negative sample pairs. Datasets AA−, BB−, CC−, DD−, EE−, FF−, and GG− were generated by the same method.

Positive sample pairs were labeled “1”, and negative sample pairs were labeled “0.” The numbers of positive and negative sample pairs for training, validation, and testing are shown in Table 2.
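
A minimal sketch of this pair generation is given below. The function `sample_pairs`, the toy sample identifiers, and the choice to draw pairs independently (so a pair may occasionally repeat) are assumptions for illustration; the procedure used in this study may differ in such details.

```python
import random

def sample_pairs(samples_by_target, n_pairs, positive, rng):
    """Draw n_pairs random (x1, x2, label) triples; label 1 means same target."""
    targets = list(samples_by_target)
    pairs = []
    for _ in range(n_pairs):
        if positive:
            t = rng.choice(targets)
            x1, x2 = rng.sample(samples_by_target[t], 2)   # two distinct samples
        else:
            t1, t2 = rng.sample(targets, 2)                # two distinct targets
            x1 = rng.choice(samples_by_target[t1])
            x2 = rng.choice(samples_by_target[t2])
        pairs.append((x1, x2, 1 if positive else 0))
    return pairs

# Toy stand-in data: sample IDs instead of 2048-point DEMON spectra.
rng = random.Random(0)
data = {t: [f"target{t}_sample{i}" for i in range(5)] for t in range(7)}
train = sample_pairs(data, 10, True, rng) + sample_pairs(data, 10, False, rng)
rng.shuffle(train)
print(train[0])
```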

3. Design of the Siamese Network

Siamese networks can evaluate the similarity between two samples and are based on metric learning. Unlike traditional machine learning schemes, Siamese networks can recognize new targets from a small sample after being trained by a large number of other samples [26].

The architecture of the proposed Siamese network is shown in Figure 3 and has two parts. One is feature extraction, which consists of two convolutional neural networks (CNNs) with shared weights. Compared with recurrent neural networks (RNNs) and artificial neural networks (ANNs), the feature detection layers of a CNN learn features implicitly from the training data, so a CNN is more suitable for UATR. To evaluate the network performance more comprehensively, an MLP network was also designed for comparison with the CNN. The other part is a similarity calculator, which calculates the Euclidean distance between the two outputs of the Siamese sub-networks. Samples X1 and X2 form an input sample pair, which can be positive or negative. The output indicates the difference between the two samples.

The Euclidean distance between two samples can be written as

D_{w}(X1, X2) = ∥ G_{w}(X1) - G_{w}(X2) ∥    (11)

where G_{w}(X1) is the feature vector of sample X1, G_{w}(X2) is the feature vector of sample X2, and D_{w}(X1, X2) represents the similarity between sample X1 and sample X2. In principle, the larger D_{w}(X1, X2) is, the less similar samples X1 and X2 are; the smaller D_{w}(X1, X2) is, the more similar they are [27].
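
The shared-weight idea behind Equation (11) can be illustrated with a toy embedding. In the sketch below, `embed` is a hypothetical stand-in for G_w consisting of one 1D convolution, a ReLU, and two pooled outputs; the kernel values and the inputs are made up, and this is far smaller than the network in this study. The point is only that both inputs pass through the same weights before the distance is taken.

```python
import math

def embed(x, kernel, bias):
    """Toy G_w: one valid 1D convolution, ReLU, then mean and max pooling."""
    k = len(kernel)
    conv = [max(0.0, bias + sum(kernel[j] * x[i + j] for j in range(k)))
            for i in range(len(x) - k + 1)]
    return [sum(conv) / len(conv), max(conv)]

def distance(x1, x2, kernel, bias):
    """D_w(X1, X2) = ||G_w(X1) - G_w(X2)||; both branches share (kernel, bias)."""
    g1, g2 = embed(x1, kernel, bias), embed(x2, kernel, bias)
    return math.dist(g1, g2)

kernel, bias = [0.2, 0.5, 1.0, 0.5, 0.2], 0.0   # hypothetical weights, size 5
a = [0.0] * 20
a[5] = 1.0                                      # spectrum with one line at bin 5
b = [0.0] * 20
b[5] = 0.4                                      # same line, weaker
print(distance(a, a, kernel, bias))
print(distance(a, b, kernel, bias))
```

Identical inputs always give a distance of zero because the two branches are the same function.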

Next, the Siamese network is trained using a contrastive loss function to extract differentiated features. Raia Hadsell et al. first proposed the contrastive loss function to determine whether two inputs were positive or negative [28].

A dataset with L sample pairs is \{((x_{1}, x_{2})^{l}, y_{l})\}, l = 1, 2, \dots, L, where (x_{1}, x_{2})^{l} is the l-th sample pair and y_{l} is the label of the sample pair. If the sample pair is positive, y_{l} = 1; otherwise, y_{l} = 0. The contrastive loss function is expressed as

Loss = \frac{1}{2 L} \sum_{l=1}^{L} [ y_{l} d_{l}^{2} + (1 - y_{l}) \max(margin - d_{l}, 0)^{2} ]    (12)

where d_{l} is the Euclidean distance of the l-th sample pair, and margin is the threshold. According to (12), d is minimized if samples X1 and X2 are a positive pair and increased until it reaches the margin if X1 and X2 are a negative pair. This makes the differences between the extracted features more significant.
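
Equation (12) translates directly into code. The sketch below is a plain-Python version; the distances, labels, and margin value are hypothetical and serve only to show how each kind of pair contributes.

```python
def contrastive_loss(distances, labels, margin):
    """Equation (12): labels are 1 for positive pairs and 0 for negative pairs."""
    total = 0.0
    for d, y in zip(distances, labels):
        total += y * d ** 2 + (1 - y) * max(margin - d, 0.0) ** 2
    return total / (2 * len(distances))

# Hypothetical distances for two positive and two negative pairs, margin = 1.
d = [0.1, 0.8, 0.3, 1.5]
y = [1, 1, 0, 0]
print(contrastive_loss(d, y, margin=1.0))
```

In this example, the negative pair at distance 1.5 lies beyond the margin and contributes nothing, whereas the negative pair at distance 0.3 and the positive pair at distance 0.8 dominate the loss.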

A CNN is a feed-forward neural network often employed in image feature extraction and classification [29]. The architecture of the Siamese network proposed in this paper is shown in Figure 4. Network parameters, such as convolution layers and kernels, were all optimized by exhaustive search.

In this study, the inputs of the two identical sub-networks were a pair of samples, each with 2048 × 1 data points. More specifically, through multiple convolution, pooling, and fully connected layers, output vectors were obtained to calculate the Euclidean distance, and the contrastive loss function was used to train the network to judge whether the input pair was positive.

The Siamese network with CNN had three convolution layers, with four convolution kernels per layer. The size of convolution kernels was five. The output parameters and dimensions of each layer are shown in Table 3.

When evaluating network performance, in order to understand it more comprehensively, an MLP network was designed to be compared with 1D-CNN. When the complexity of the network reached a certain level, the increase in the number of layers was not helpful for performance improvement. Considering that a simple structure reduces the number of calculations for network propagation, the structural parameters of the MLP listed in Table 4 were finally selected.

4. Experiments and Results

4.1. Network Training

Before training the network, all parameters were initialized using Glorot initialization. The rectified linear unit (ReLU) [30] was the activation function for each convolutional layer. The batch size was 128, the network was trained for 20 epochs, and the network parameters were optimized by the Adam optimization algorithm. Sample pairs were input to the two sub-networks to obtain feature vectors of the same dimension. Whether a sample pair was positive was determined by calculating the Euclidean distance between the output vectors and comparing it with the threshold.
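
The threshold decision can be sketched as follows. The function `pair_accuracy`, the example distances, and the threshold values are illustrative assumptions, including the convention that a distance below the threshold is read as a positive pair.

```python
def pair_accuracy(distances, labels, threshold):
    """Fraction of pairs classified correctly; distance < threshold means 'same target'."""
    predictions = [1 if d < threshold else 0 for d in distances]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

distances = [0.1, 0.8, 0.3, 1.5]      # hypothetical network outputs
labels = [1, 1, 0, 0]                 # 1 = positive pair, 0 = negative pair
print(pair_accuracy(distances, labels, threshold=0.5))
print(pair_accuracy(distances, labels, threshold=1.0))
```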

After training the Siamese network with dataset AA and evaluating the performance with datasets BB, CC, and DD, the network parameters were tuned. The optimized parameters of the Siamese network are shown in Table 3.

4.2. Network Performance Test

Datasets EE, FF, and GG were used to test the performance of the Siamese network to recognize new targets with different parameters.

Performance evaluation of Doppler shifts.

Dataset EE was used to assess the performance of the Siamese network in recognizing new targets with different Doppler shifts. Positive and negative sample pairs were input to test the network accuracy for moving targets with different velocities. The results are shown in Figure 5. The horizontal axis is the relative Doppler shift v / c [31], and the vertical axis is the accuracy of the Siamese network.

Figure 5 shows the high accuracy of the network for 10 types of new targets and 41 Doppler shifts. The result shows that the proposed network with CNN can recognize new targets with different Doppler shifts. The recognition rate of the Siamese network with MLP is relatively low.

To further evaluate the recognition performance with dataset EE, we analyzed the results of samples with the maximum relative Doppler shift (f_d = 0.02). The classification results are shown in Figure 6, where the vertical axis is the actual label, and the horizontal axis is the predicted label. There are 10 types of targets, and the number of samples in each type is 1000. The color depth reflects the accuracy of recognition.

Precision [32] is defined here as the probability of being a positive sample among all predicted positive samples, which can be expressed as

precision_{n} = \frac{TP_{n}}{TP_{n} + FP_{n}}    (13)

Recall [33] is the probability of being predicted as a positive sample among the actual positive samples, which can be expressed as

recall_{n} = \frac{TP_{n}}{TP_{n} + FN_{n}}    (14)

where n is the label of the n-th type, TP represents true positives, FP denotes false positives, and FN denotes false negatives.

The F1-score [34] balances the two measures: it is the harmonic mean of precision and recall, so it is high only when both are high. The F1-score is expressed as

F1_{n} = \frac{2 \cdot precision_{n} \cdot recall_{n}}{precision_{n} + recall_{n}}    (15)

Averaging the F1-scores of all types gives F1_{average}, which can be written as

F1_{average} = \frac{1}{N} \sum_{n=0}^{N-1} F1_{n}    (16)

where N is 10.
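
Equations (13) to (16) can be computed from a confusion matrix as sketched below. The function name `macro_f1` and the 3-class matrix are made up for illustration; they are not results from this study.

```python
def macro_f1(confusion):
    """Per-class precision, recall, and F1, plus the average F1 of Equation (16).

    confusion[i][j] = number of samples with actual label i predicted as j.
    """
    n = len(confusion)
    f1_scores = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp
        fn = sum(confusion[c]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / n, f1_scores

# Made-up 3-class confusion matrix (rows: actual, columns: predicted).
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 1, 9]]
average, per_class = macro_f1(cm)
print(round(average, 4), [round(f, 4) for f in per_class])
```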

When the Doppler shift (f_d = 0.02) was at maximum, the calculated value of F1_average was 0.8462.

Performance evaluation of SNRs.

Dataset FF was used to evaluate the performance of the Siamese network in recognizing new targets with 16 types of SNRs. Positive and negative sample pairs were input to the network to assess the accuracy of the network for targets with different SNRs. The results of the evaluation are shown in Figure 7. The horizontal axis is the factor μ that controls the SNR, and the vertical axis is the accuracy of the network.

As shown in Figure 7, the Siamese network has a high accuracy for 10 types of new targets and 16 SNRs when μ \leq 1.8. As μ increases, the accuracy of the network decreases rapidly, but it remains above 80% for μ \leq 2.0. Experiments showed that when the SNR is within a certain range, the network with CNN identifies underwater targets accurately. The recognition rate of the Siamese network with MLP is relatively low.

The classification results for when μ was 2.2 are shown in Figure 8. The calculated value of F1_average was 0.6403 using the same method.

Performance evaluation of interference.

Dataset GG was used to evaluate the performance of the Siamese network in recognizing new targets with different levels of interference. Positive and negative sample pairs were input to the network to assess the accuracy of the network for targets with different parameters. The results of the evaluation are shown in Figure 9. The horizontal axis shows the number of interference occurrences (negative data indicate the loss of line spectra), and the vertical axis shows the accuracy of the network.

As shown in Figure 9, our proposed network identifies targets well with various interference levels. Experiments showed that the network could accurately identify underwater targets when the number of line spectra increased or decreased within a certain range. The recognition rate of the Siamese network with MLP was relatively low.

The classification results when the number of interference occurrences was 2 are shown in Figure 10. The calculated F1_average was 0.7423 using the same method.

5. Conclusions

Because of the high cost of underwater target data acquisition, a Siamese network for identifying DEMON spectra of underwater targets was developed in this study. We used simulated data to train and evaluate the network, used simulated and real samples to test it, and then added more real samples to evaluate its performance. The proposed network recognized targets accurately from a small sample. During the training process, the parameters of the Siamese network were tuned by evaluating datasets with different Doppler shifts, SNRs, and interference to obtain a network with good generalization performance. This article did not study underwater target-radiated noise deeply enough, and the difference between the simulated samples and the real samples affects the recognition performance of the network. Research on the modeling of underwater target-radiated noise should be strengthened to reduce the differences between the simulated samples and the real samples. In addition, the number of real target samples used in network training should be increased to improve the recognition performance. The experimental results showed that a network trained on samples without a Doppler shift and interference has a high Doppler tolerance and can recognize underwater targets with different SNRs and levels of interference. The contrastive loss function enabled the differences between input sample pairs to be determined accurately so that targets can be recognized from small samples. Although convolutional neural networks need a large number of training samples to achieve the ideal recognition effect, the Siamese-network-based small-sample UATR algorithm effectively improves recognition accuracy when training samples are insufficient; however, improving the recognition rate when there are very few underwater target samples, or even a single sample, is the next problem to be seriously considered.

The approach in this paper differs from the traditional approach of using a large number of multi-state samples to train the network and then identifying multi-state samples. We used single-state samples to train the network, and the results showed that multi-state samples could still be identified well. More simulation samples were used for evaluation, and fewer real samples were used for testing. In our next study, we will evaluate the network with more real samples so that new real samples are identified with better results. If necessary, all simulation samples for training and evaluation will be replaced with real samples. This work depends on the subsequent collection of a sufficient amount of real data. In addition, CPU-based UATR is slow, and we will port the network model to an FPGA for accelerated recognition. The training and recognition speed of the network model for a large number of samples can be greatly improved by the ARM architecture.

Author Contributions

Conceptualization, D.L.; methodology, D.L., W.S. and W.C.; software, formal analysis, W.H., B.W. and W.S.; validation, W.S. and W.C.; writing—original draft preparation, W.H. and B.W.; writing—review and editing, W.S. and W.C.; supervision, D.L.; project administration, D.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Program of Tianjin, grant number 21YDTPJC00180, and the Science and Technology Program of Hebei, grant number 22350901D.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. The process of the DEMON method.

Figure 2. DEMON spectra for different parameters. (a) DEMON spectrum of a reference sample; (b) DEMON spectrum of a sample with Doppler shifts; (c) DEMON spectrum of a sample with low SNR; (d) DEMON spectrum of a sample with interferences.

Figure 3. Siamese network architecture.

Figure 4. The proposed Siamese network architecture.

Figure 5. The performance of the proposed Siamese network for Doppler shifts.

Figure 6. The classification results of samples with f_d = 0.02.

Figure 7. The performance of the proposed Siamese network for SNRs.

Figure 8. The classification results of samples with μ = 2.2.

Figure 9. The performance of the proposed Siamese network for interferences.

Figure 10. The classification results of samples with number = 2.

Table 1. The dimensions of datasets A to G.

	Datasets	Ship Status	No. Ships	Sample Dimension	Total No. Samples
Training	A	1	7	1 × 2048	1 × 7 × 3000
Verification	B	41	7	1 × 2048	41 × 7 × 500
	C	16	7	1 × 2048	16 × 7 × 500
	D	7	7	1 × 2048	7 × 7 × 500
Test	E	41	10	1 × 2048	41 × 10 × 500
	F	16	10	1 × 2048	16 × 10 × 500
	G	7	10	1 × 2048	7 × 10 × 500
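
The "Total No. Samples" column of Table 1 is given as a product. The short sketch below evaluates those products, reading the three factors as ship states × ships × samples per ship and state. That reading, and the variable names, are assumptions made for illustration.

```python
# (split, dataset, ship states, ships, samples per ship and state), from Table 1.
DATASETS = [
    ("Training", "A", 1, 7, 3000),
    ("Verification", "B", 41, 7, 500),
    ("Verification", "C", 16, 7, 500),
    ("Verification", "D", 7, 7, 500),
    ("Test", "E", 41, 10, 500),
    ("Test", "F", 16, 10, 500),
    ("Test", "G", 7, 10, 500),
]

# Total number of 1 x 2048 samples in each dataset.
totals = {name: states * ships * per_state
          for _, name, states, ships, per_state in DATASETS}

# Totals aggregated per split, to compare training size with test size.
split_totals = {}
for split, name, *_ in DATASETS:
    split_totals[split] = split_totals.get(split, 0) + totals[name]

for name, count in totals.items():
    print(f"Dataset {name}: {count} samples")
print(split_totals)
```

Under this reading, the single-state training set is much smaller than the multi-state verification and test sets, which matches the paper's aim of training on one state and recognizing many.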

Table 2. The dimensions of positive and negative sample pairs.

	Datasets	Sub-Dataset	Ship Status	No. Ships	No. Pairs
Training	AA	AA+	1	7	1 × 7 × 1000
Training	AA	AA−	1	7	1 × 7 × 1000
Validation	BB	BB+	41	7	41 × 7 × 500
	BB	BB−	41	7	41 × 7 × 500
	CC	CC+	16	7	16 × 7 × 500
	CC	CC−	16	7	16 × 7 × 500
	DD	DD+	7	7	7 × 7 × 500
	DD	DD−	7	7	7 × 7 × 500
Test	EE	EE+	41	10	41 × 10 × 1000
	EE	EE−	41	10	41 × 10 × 1000
	FF	FF+	16	10	16 × 10 × 1000
	FF	FF−	16	10	16 × 10 × 1000
	GG	GG+	7	10	7 × 10 × 1000
	GG	GG−	7	10	7 × 10 × 1000
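
Table 2 lists equal numbers of positive and negative pairs for every dataset. The sketch below shows one common way to draw such balanced pairs, assuming that a positive pair holds two samples of the same ship and a negative pair holds samples of two different ships. The function name, the 1/0 labels, and the toy data are hypothetical; the exact sampling procedure used by the authors is not reproduced here.

```python
import random


def make_pairs(samples_by_ship, n_pairs, rng):
    """Draw n_pairs positive and n_pairs negative pairs.

    samples_by_ship maps a ship id to a list of its samples.
    Each pair is returned as (sample_a, sample_b, label), where
    label 1 means "same ship" and 0 means "different ships".
    """
    ships = sorted(samples_by_ship)
    positives, negatives = [], []
    for _ in range(n_pairs):
        # Positive pair: two distinct samples of one ship.
        ship = rng.choice(ships)
        a, b = rng.sample(samples_by_ship[ship], 2)
        positives.append((a, b, 1))
        # Negative pair: one sample from each of two distinct ships.
        ship_a, ship_b = rng.sample(ships, 2)
        negatives.append((rng.choice(samples_by_ship[ship_a]),
                          rng.choice(samples_by_ship[ship_b]), 0))
    return positives, negatives


# Toy data: 7 ships with 5 placeholder samples each. A real sample would be
# a 1 x 2048 DEMON spectrum; here each sample is just (ship id, index).
toy = {ship: [(ship, i) for i in range(5)] for ship in range(7)}
pos, neg = make_pairs(toy, n_pairs=10, rng=random.Random(0))
print(len(pos), len(neg))
```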

Table 3. The parameters of the proposed Siamese network with CNN.

Layer Type	Output Shape	No. Kernels	Kernel Size	Activation Function
Input	2048 × 1
Conv_1D	2048 × 4	4	1 × 5	ReLU
MaxPooling1D	512 × 4	1
Conv_1D	512 × 4	4	1 × 5	ReLU
MaxPooling1D	128 × 4	1
Conv_1D	128 × 4	4	1 × 5	ReLU
MaxPooling1D	32 × 4	1
Flatten	128 × 1
Dense	64 × 1			ReLU
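
The output shapes in Table 3 can be checked with a small shape trace. The sketch below assumes stride-1 convolutions with "same" padding and a pooling size of 4, both inferred from the listed shapes rather than stated in the table, and it assumes every layer has a bias term when counting trainable parameters.

```python
def conv1d_same(length, in_channels, filters, kernel_size):
    """Output shape and parameter count of a stride-1, 'same'-padded 1D convolution."""
    params = kernel_size * in_channels * filters + filters  # weights + biases
    return length, filters, params


def max_pool1d(length, channels, pool_size):
    """Output shape of non-overlapping max pooling (no trainable parameters)."""
    return length // pool_size, channels


def trace_cnn(input_len=2048, filters=4, kernel_size=5, pool_size=4, dense_units=64):
    """Trace the shapes of one branch of the Siamese CNN described in Table 3."""
    length, channels, total_params = input_len, 1, 0
    shapes = [("Input", (length, channels))]
    for _ in range(3):
        length, channels, params = conv1d_same(length, channels, filters, kernel_size)
        total_params += params
        shapes.append(("Conv_1D", (length, channels)))
        length, channels = max_pool1d(length, channels, pool_size)
        shapes.append(("MaxPooling1D", (length, channels)))
    flat = length * channels
    shapes.append(("Flatten", (flat, 1)))
    total_params += flat * dense_units + dense_units
    shapes.append(("Dense", (dense_units, 1)))
    return shapes, total_params


shapes, n_params = trace_cnn()
for layer, shape in shapes:
    print(f"{layer:13s} {shape[0]} x {shape[1]}")
print("trainable parameters (assumed biases):", n_params)
```

With these assumptions the traced shapes reproduce the "Output Shape" column of Table 3, which supports the inferred padding and pooling size.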

Table 4. The parameters of the proposed Siamese network with MLP.

Layer Type	Neuron Size	Output Shape
Input		2048 × 1
Dense_1	1024	1024 × 1
Dense_2	256	256 × 1
Dropout_1		256 × 1
Dense_3	128	128 × 1
Dropout_2		128 × 1
Dense_4	64	64 × 1
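
As a rough indication of the size of the MLP in Table 4, the sketch below counts the trainable parameters of its four dense layers. It assumes fully connected layers with bias terms; the dropout layers contribute no trainable parameters. The helper names are illustrative.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def mlp_params(layer_sizes):
    """Total trainable parameters for consecutive dense layers."""
    return sum(dense_params(n_in, n_out)
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Input width followed by the neuron sizes of Dense_1 to Dense_4 (Table 4).
MLP_SIZES = [2048, 1024, 256, 128, 64]

per_layer = [dense_params(a, b) for a, b in zip(MLP_SIZES, MLP_SIZES[1:])]
mlp_total = mlp_params(MLP_SIZES)
print(per_layer)
print(mlp_total)
```

Most of the parameters sit in the first dense layer, because it connects all 2048 input points to 1024 neurons.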

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
