Article

Surface and Underwater Acoustic Source Discrimination Based on Machine Learning Using a Single Hydrophone

College of Meteorology and Oceanography, National University of Defense Technology, Changsha 410073, China
*
Authors to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2022, 10(3), 321; https://doi.org/10.3390/jmse10030321
Submission received: 8 January 2022 / Revised: 8 February 2022 / Accepted: 22 February 2022 / Published: 24 February 2022
(This article belongs to the Section Physical Oceanography)

Abstract:
In shallow water, passive sonar usually has great difficulty in discriminating a surface acoustic source from an underwater one. To solve this problem, a supervised machine learning method using only one hydrophone is implemented in this paper. Firstly, simulated training data are generated by the normal mode model KRAKEN with the same environmental setup as in the SACLANT 1993 experiment. Secondly, k-nearest neighbor (kNN) classifiers are trained and evaluated using the scores of precision, recall, F1 and accuracy. Thirdly, random subspace kNN classifiers are finely trained on three hyperparameters (the number of nearest neighbors, the number of predictors selected at random and the number of learners in the ensemble) to obtain the best model. Fourthly, a deep learning method, ResNet-18, is also applied; it reaches the best balance between precision and recall, with accuracies of 1.0 on both the simulation and experimental data. Further, data from all 48 hydrophones of the vertical linear array (VLA) are analyzed using the three kinds of machine learning methods (kNN, random subspace kNN and ResNet-18) separately, and the results are compared. It is concluded that the performance of random subspace kNN is the best. Both the simulation and experimental results suggest the feasibility of machine learning as a surface and underwater acoustic source discrimination method, even with only a single hydrophone.

1. Introduction

Surface and underwater acoustic source discrimination is one of the most important applications in passive sonar systems. One difficult task is to discriminate a surface acoustic source (ships, towed arrays, etc.) from an underwater acoustic one (submarines, UUVs, etc.) [1]. The difficulty arises because underwater target radiated noise is mixed with ambient ocean noise and is very complex to deal with, especially in a shallow water environment where the multipath effect exists.
Various recognition methods have been studied and tested theoretically, but their practical results were not as good as their simulation counterparts. A common way to solve this problem is to estimate the source depth directly. Matched field processing (MFP) [2,3,4,5,6,7] is a traditional method for source localization as well as depth estimation. MFP is achieved by matching the measured pressure fields with the replica ones, while the generated replica fields require accurate a priori environmental information and an appropriate propagation model. However, the modeling depends on obtaining exact information in the spatially varying ocean environment, and the environment mismatch can significantly degrade the performance of MFP.
Considering that accurate source depth is not easily estimated, it is a good idea to reformulate the problem as a binary classification problem for surface and underwater acoustic source discrimination. Related works [8,9,10,11,12,13,14] have performed binary classification with model-driven methods. In 2007, Premus et al. [8] presented a matched subspace approach for depth discrimination in a shallow water waveguide. Two hypotheses for the low-order and the high-order mode subspaces were tested to discriminate two acoustic sources of depth 9 and 54 m, respectively. In 2013, Premus et al. [9] used mode subspace projections to discriminate acoustic sources of depth 9 and 60 m with a horizontal linear array (HLA). In 2014, Yang [10] demonstrated a data-based matched-mode source localization method for a moving source and estimated the source depth directly from the data, without requiring any environmental acoustic information or assuming any propagation model. In 2016, Du et al. [11] performed passive acoustic source depth discrimination with two hydrophones in shallow water. They used the fact that when there is a significant thermocline in the ocean environment, the numerical values of the waveguide invariant differ depending on the source and receiver depths. Based on this characteristic, they presented a method of source depth discrimination through LOFAR diagram analysis of two hydrophones placed above and below the thermocline separately. In the same year, Conan [12] considered source depth discrimination and aimed to evaluate whether the source was near the surface or submerged with a VLA. These two hypotheses were formulated in terms of normal modes, using the concept of trapped and free modes. The method was tested on simulated data for a vertical array spanning half of the water column in a shallow water environment.
In 2017, Conan [13] used the trapped energy ratio for source depth discrimination with an HLA, and experimental results showed that a surface ship and a submerged towed source were successfully identified. In 2018, Liang et al. [14] conducted depth discrimination for low-frequency sources using an HLA with acoustic vector sensors based on mode extraction. The simulation analyses showed that a large number of vector sensors were required to achieve a high probability of correct discrimination (PCD).
Recently, underwater acoustic source ranging has been obtained by machine learning [15,16], which achieved a lower range estimation error than the classical MFP [4,5,7]. These studies demonstrated that machine learning trained on observational datasets performed well in the ship range positioning. This approach is particularly beneficial when historical data can be used to train machine learning models. Although more data can be collected using automatic identification systems (AISs), there are still not enough databases available for some applications. In addition, it would be impractical to collect acoustic data for each source location (i.e., range and depth) over large ocean areas.
In machine learning, surface and underwater acoustic source discrimination is the assignment of a label to a given input value. An example of discrimination is classification, which attempts to assign each input value to one of a given set of binary classes. In supervised learning cases, the discrimination systems are trained from labeled "training" data. When applied to underwater target radiated noise classification, machine learning can be regarded as an alternative tool that assigns a label to a given input value. The machine learning method can extract understandable information from data, making it unnecessary to build a physical model, such as the classic MFP or model-driven methods used for surface and underwater acoustic source discrimination. In 2019, Jonekwon et al. [17] used four data-driven methods—random forest (RF), support vector machine (SVM), feed-forward neural network (FNN) and convolutional neural network (CNN)—to discriminate surface and underwater vessels in the ocean using simulated low-frequency acoustic pressure data. Acoustic data were modeled for a vertical linear array (VLA) by a Monte Carlo simulation using KRAKEN in the ocean environment of the East Sea of Korea. The processed cross-spectral density matrix (CSDM) was used as input data for supervised machine learning. Only simulation array data were used to conduct both training and testing.
In this paper, real experimental data are used to validate that machine learning can solve the surface and underwater acoustic source discrimination problem well, even using a single hydrophone. Interestingly, the training data are simulated acoustic data under the experimental environment, while the test data are the real experimental data. Compared with the traditional methods, such as MFP and model-driven methods, the machine learning method uses the same information to deal with the discrimination problem. Traditional machine learning methods play an important role in classification. In recent years, some deep learning models such as VGGNet [18], AlexNet [19] and ResNet [20] have greatly improved the performance of the deep neural networks in image recognition and have also been applied to more fields. In this paper, two kinds of classic machine learning methods (kNN and random subspace kNN) and one kind of deep learning method (ResNet-18 of ResNet) are adopted.
The rest of this paper is divided into six sections. Section 2 describes the whole architecture of surface and underwater acoustic source discrimination based on machine learning. In Section 3, the basic theory of the normal mode model and the three kinds of machine learning classifiers (kNN, random subspace kNN, ResNet-18) used to classify the targets are introduced. In Section 4, the simulation environments for the simulated acoustic data and the experimental data are presented. Only a single hydrophone is used to conduct the discrimination. In Section 5, firstly, the k-nearest neighbor (kNN) classifiers are trained and evaluated using the scores of precision, recall, F1 and accuracy. Secondly, the random subspace kNN classifiers are finely trained on three hyperparameters (the number of nearest neighbors, the number of predictors selected at random and the number of learners in the ensemble) to obtain the best model. Thirdly, a deep learning method called ResNet-18 is applied to conduct the discrimination. Fourthly, data from all 48 hydrophones of the VLA are analyzed using the three kinds of machine learning methods separately. At last, Section 6 provides the conclusions and discussion.

2. Whole Architecture

The whole architecture of surface and underwater acoustic source discrimination based on machine learning using only a single hydrophone is shown in Figure 1. The main steps are as follows: (1) Setting the underwater acoustic environment. In this paper, the SACLANT 1993 experimental data are analyzed to validate the proposed method. This experiment is viewed as a benchmark for source localization using MFP and is openly accessible online [21]. The settings of the environment are described in detail in Section 4.1. (2) Simulating surface and underwater acoustic source signals received by a single hydrophone using KRAKEN. The simulating process is described in detail in Section 4.2. (3) Data preprocessing. This step can also be called data normalization. The generated data should be normalized before being sent for model training based on machine learning. The experimental data should also be normalized using the same method to be consistent with the simulation data. The preprocessing is described in Section 4.2 and Section 4.3. (4) Model training using the kNN classifier. The purpose is to compare all the performance results and then evaluate the trained model using the scores of precision, recall, F1 and accuracy, as explained in Section 3.2 and Section 5.1. (5) Model fine training using the random subspace kNN classifier. The fine model training is implemented on three hyperparameters: the number of nearest neighbors, the number of predictors selected at random and the number of learners in the ensemble. The fine training is described in Section 3.3 and Section 5.2. (6) Model training using the ResNet-18 classifier. The main purpose is to explore a deep learning method to compare with the two classic machine learning methods. The ResNet-18 training is described in Section 3.4 and Section 5.3. (7) Classifying the experimental data of SACLANT 1993 by using the trained model. The results are presented in Section 5.1, Section 5.2 and Section 5.3.
(8) Finally, data from all 48 hydrophones of the VLA are analyzed using the three kinds of machine learning methods (kNN, random subspace kNN and ResNet-18) separately. The classification results are shown in Section 5.4.

3. Theory and Method

3.1. Normal-Mode Model

The normal-mode model provides analytic expressions of the received data at the array, and the pressure of the nth sensor is a sum of M propagating normal modes given by [22]
$$p(f, r_n) = \frac{i}{\rho(z_s)\sqrt{8\pi r_n}}\, e^{-i\pi/4} \sum_{m=1}^{M} \Psi_m(z_s)\,\Psi_m(z_r)\, \frac{\exp\{i k_m r_n\}}{\sqrt{k_m}} \quad (1)$$
where $f$ is the signal frequency; $\rho(z_s)$ is the water density at the source depth; $r_n$ is the range from the source to the $n$th sensor; $k_m$ and $\Psi_m$ are the modal wavenumber and the eigenfunction of mode $m$, respectively; $k_m = \kappa_m + i\alpha_m$ $(\kappa_m, \alpha_m > 0)$ is a complex wavenumber whose imaginary part $\alpha_m$ is the modal attenuation coefficient; and $\kappa_m = \omega / c_{pm}$, with $c_{pm}$ the phase velocity of the $m$th mode.
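The modal sum above can be evaluated numerically once the modal wavenumbers and mode functions are known; a minimal sketch, with hypothetical single-mode values standing in for KRAKEN output:

```python
import numpy as np

def modal_pressure(rho_zs, r_n, k_m, psi_zs, psi_zr):
    """Coherent normal-mode sum for the pressure at the nth sensor.

    rho_zs : water density at the source depth
    r_n    : range from the source to the nth sensor
    k_m    : complex modal wavenumbers kappa_m + i*alpha_m, shape (M,)
    psi_zs : mode functions evaluated at the source depth, shape (M,)
    psi_zr : mode functions evaluated at the receiver depth, shape (M,)
    """
    prefactor = 1j / (rho_zs * np.sqrt(8 * np.pi * r_n)) * np.exp(-1j * np.pi / 4)
    return prefactor * np.sum(psi_zs * psi_zr * np.exp(1j * k_m * r_n) / np.sqrt(k_m))

# Hypothetical single-mode example (illustrative values, not KRAKEN output)
p = modal_pressure(rho_zs=1000.0, r_n=5600.0,
                   k_m=np.array([1.3 + 1e-4j]),
                   psi_zs=np.array([0.5]), psi_zr=np.array([0.4]))
```

The positive imaginary part of each wavenumber produces the expected exponential modal attenuation with range.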

3.2. kNN Classifier

The acronym kNN is short for k-nearest neighbor classification. Categorizing query points based on their distance to points (or neighbors) in a training dataset can be a simple yet effective way of classifying new points. The core idea of kNN is that if the majority of the k-most similar samples in the feature space belong to a certain class, the sample also belongs to this class. Simply speaking, each sample can be classified by its k neighbors.
Given two points $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$ and $x_j = (x_{j1}, x_{j2}, \ldots, x_{jn})$, the Euclidean distance between them is
$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2} \quad (2)$$
Given a set $X$ of $p$ points $(x_1, x_2, \ldots, x_p)$ and a distance function (such as the Euclidean distance in Equation (2)), kNN search lets us find the $k$ closest points in $X$ to a query point or set of points $Y$. The kNN search technique and kNN-based algorithms are widely used as benchmark learning rules.
Nearest neighbor classifiers typically have good predictive accuracy in low dimensions but might not be as accurate in high dimensions. Moreover, they have high memory usage and are not easy to interpret. Both the simulation and experimental data, described in Section 4.2 and Section 4.3, are low-dimensional.
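For illustration, a 1-nearest-neighbor classifier of the kind used here can be set up with scikit-learn; the Gaussian features below are synthetic placeholders for the 101-bin normalized spectra, not the paper's data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
# Toy stand-in: rows are samples, columns are the 101 frequency-bin features;
# label 0 = surface source, label 1 = underwater source.
X_train = np.vstack([rng.normal(0.0, 1.0, (50, 101)),
                     rng.normal(2.0, 1.0, (50, 101))])
y_train = np.array([0] * 50 + [1] * 50)

knn = KNeighborsClassifier(n_neighbors=1)  # Euclidean distance by default
knn.fit(X_train, y_train)
```

A query spectrum is then labeled by its single nearest training sample, e.g. `knn.predict(x_query.reshape(1, -1))`.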

3.3. Random Subspace kNN Classifier

A classification ensemble is a predictive model composed of a weighted combination of multiple classification models. In general, combining multiple classification models increases predictive performance. Random subspace kNN classifier is a kind of classification ensemble and can improve the accuracy of kNN classifiers. It also has the advantage of using less memory than ensembles with all predictors, and it can also handle missing values.
The basic random subspace algorithm requires a few hyperparameters. The important ones are the number of nearest neighbors $k$, the number of predictors selected at random $d$ and the number of learners $m$ in the ensemble; they are explored in Section 5.2.
The diagram of the basic random subspace algorithm is shown in Figure 2. The original dataset $X$ has $p$ samples $(x_1, x_2, \ldots, x_p)$; each sample has $n$ features $x_{i1}, x_{i2}, \ldots, x_{in}$ $(i = 1, 2, \ldots, p)$. There are $n$ predictors in $X$, and each predictor has $p$ numbers $x_{1j}, x_{2j}, \ldots, x_{pj}$ $(j = 1, 2, \ldots, n)$. First, choose, without replacement, a random set of $d$ predictors from the $n$ possible ones. Second, train a kNN weak learner using just the $d$ chosen predictors. Third, repeat the first and second steps until there are $m$ weak classification learners. Fourth, predict by taking an average of the scores of the weak learners using 10-fold cross-validation, and classify each sample into the class with the highest average score.

3.4. ResNet-18 Classifier

ResNet-18 [20] is a form of convolutional neural network (CNN) that uses layers of filters to learn spatial features. In deep neural network learning, one problem is that as network depth increases, the accuracy becomes saturated and then degrades rapidly, which is called the “degradation” problem [20]. ResNet solves this dilemma by adding the residual block shown in Figure 3. In this way, the residual network overcomes the “degradation” problem without increasing the number of parameters or the computational complexity, allowing networks to be deeper. Figure 3 denotes the desired underlying mapping as $H(x)$, and the stacked nonlinear layers fit another mapping $F(x) = H(x) - x$. ResNet adds the identity mapping to the convolutional output by using a shortcut connection, so the mapping output is recast as $F(x) + x$. Here, the rectified linear unit (ReLU) is used as the activation function. The architecture of ResNet-18 is shown in Table 1.
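The residual computation can be sketched framework-free with a toy dense block (a simplification of the convolutional blocks of ResNet-18): with zero weights the block reduces to the identity, which illustrates why stacking residual blocks cannot make a deeper network worse than its shallower counterpart.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Toy dense residual block: out = ReLU(F(x) + x), with
    F(x) = W2 @ ReLU(W1 @ x) and an identity shortcut (dimensions match)."""
    fx = w2 @ relu(w1 @ x)
    return relu(fx + x)

x = np.array([1.0, 2.0, 3.0])
w_zero = np.zeros((3, 3))
out = residual_block(x, w_zero, w_zero)  # identity on non-negative input
```

This mirrors the solid-line shortcuts of ResNet-18, where the input and output dimensions of a block agree.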
In this paper, kNN is used in Section 5.1 to train and evaluate the model using the scores of precision, recall, F1 and accuracy. Then, random subspace kNN classifier is explored in Section 5.2 to perform finely detailed distinctions between classes. ResNet-18 is applied in Section 5.3 to evaluate the trained model using the scores of precision, recall, F1 and accuracy. Section 5.4 discusses the results of data from all the 48 hydrophones using kNN, random subspace kNN and ResNet-18.

4. Simulation and Experiment Data

4.1. The Environment of SACLANT 1993

On 26 and 27 October 1993, an experiment was conducted by the SACLANT Centre in a shallow water area north of the island of Elba [23,24,25]. The experiment deployed a 48-element vertical linear array (VLA) with 2 m spacing. The nominal experimental geometry and the sound speed profile measured near the VLA are shown in Figure 4. On the VLA, the topmost hydrophone is No. 1 at a depth of 18.7 m, and the bottommost is No. 48 at a depth of 112.7 m. On 26 October, a stationary source at a depth of about 79 m was deployed from a surface buoy approximately 5.6 km from the VLA. The surface buoy was tethered to a ballast on the bottom so that the source position stayed within a 200 m circle around the ballast. Based on the known uncertainties in the GPS positions of the vertical array and the source buoy, the source range with respect to the VLA was approximately 5.6 km. The transmitted signal was pseudorandom noise (PRN) with a frequency band from 300 to 350 Hz.
Generally speaking, the typical draft of surface ships in shallow waters is usually less than 20 m. Thus, as in [17], 30 m is used as the discrimination depth in this paper, dividing all acoustic sources into surface and underwater classes. Acoustic sources at depths between 0 and 30 m are labeled as surface ones, while those at depths between 31 and 127 m are labeled as underwater ones.

4.2. The Simulation Data

Firstly, hydrophone No. 24 with a depth of 63.7 m (in the middle of the VLA) is selected for analyses. Without loss of generality, the other hydrophones are used in Section 5.4. However, due to the multipath effect in shallow water, the energy distribution of the sound field on the VLA fluctuates, resulting in different SNRs at the receivers. Thus, the SNR of the chosen hydrophones should not be too small. The simulation data are generated by KRAKEN [26] under the realistic SACLANT 1993 ocean environment described above. In order to obtain sufficient data for training, the distance range is from 4.0 to 7.0 km with 0.1 km interval, and the number of discrete ranges is 31. The simulated source depth is from 1 to 90 m with 1 m interval, and the number of discrete depths is 90. So, the number of the spatial samples is 2790 (=31 × 90). The acoustic signal is broadband and the frequency bandwidth is from 300 to 350 Hz with 0.5 Hz interval. Thus, the number of discrete frequencies is 101, which is also the number of features of each sample. At last, the total size of simulation data is 2790 × 101, within which the first 930 (=31 × 30) rows of the array correspond to surface sources and the remaining 1860 (=31 × 60) rows correspond to underwater sources. No noise is added to the simulation data.
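The simulation grid and its labels can be sketched as follows (assuming the samples are ordered by depth, with all 31 ranges listed per depth, so that the first 930 rows form the surface class):

```python
import numpy as np

ranges_km = np.linspace(4.0, 7.0, 31)      # 4.0 to 7.0 km, 0.1 km interval
depths_m = np.arange(1, 91)                # 1 to 90 m, 1 m interval
freqs_hz = np.linspace(300.0, 350.0, 101)  # 300 to 350 Hz, 0.5 Hz interval

# Label 0 for surface sources (depth <= 30 m), 1 for underwater sources (> 30 m)
labels = np.repeat((depths_m > 30).astype(int), ranges_km.size)
```

The 2790 labels (930 zeros followed by 1860 ones) pair row-for-row with the 2790 × 101 spectral array produced by KRAKEN.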
The simulation data are spectra and the units are dB. They should be normalized before training. The purpose of normalization is to make the preprocessed data lie within a certain range (such as [0,1] or [−1,1]), so as to eliminate the adverse effects of singular sample data. Here, the 2790 × 101 simulation array data are normalized to [−1,1] by row. The 1st and 930th surface samples, corresponding to 1 m depth at 4.0 km range and to 30 m depth at 7.0 km range, respectively, are shown in Figure 5a,b. The 931st and 2790th underwater samples, corresponding to 31 m depth at 4.0 km range and to 90 m depth at 7.0 km range, respectively, are shown in Figure 5c,d. All the simulation data are very smooth because no noise is added.
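A minimal sketch of the row-wise normalization to [−1, 1], where each row is one spectral sample in dB:

```python
import numpy as np

def normalize_rows(spec_db):
    """Map each row of a (samples x features) spectral array linearly onto [-1, 1]."""
    lo = spec_db.min(axis=1, keepdims=True)
    hi = spec_db.max(axis=1, keepdims=True)
    return 2.0 * (spec_db - lo) / (hi - lo) - 1.0

X = np.array([[10.0, 20.0, 30.0],
              [-5.0,  0.0,  5.0]])
X_norm = normalize_rows(X)  # every row now spans exactly [-1, 1]
```

Because each row is scaled by its own minimum and maximum, the same function applies unchanged to the experimental spectra.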

4.3. The Experimental Data

For the experimental data analyses, hydrophone No. 24 (in the middle of the VLA) is selected, the same as used for the simulations. The number of points sampled in the experiment is 301,056, the sampling rate is 1000 Hz and the duration is 301.056 s, that is, about 5 min. The spectrogram of the experimental data is shown in Figure 6, where the line spectra are visible. In this paper, 2000 time points (equal to 2 s) are taken as a sample to process the sea trial data in sections, with 1800 overlapping points between consecutive sections. The resulting samples form a matrix of size 1496 × 2000. Taking the Fourier transform of each row and retaining only the frequency points within the range of [300, 350] Hz yields the final sample matrix of size 1496 × 101. The selected frequency range must be the same for both the simulation and experimental data.
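The segmentation and band selection can be sketched as follows, with random noise standing in for the recorded time series; the shapes reproduce the 1496 × 2000 and 1496 × 101 matrices described above:

```python
import numpy as np

fs, nfft, hop = 1000, 2000, 200            # 2 s windows, 1800-point overlap
x = np.random.randn(301_056)               # stand-in for the hydrophone record

starts = range(0, len(x) - nfft + 1, hop)
segments = np.stack([x[s:s + nfft] for s in starts])      # shape (1496, 2000)

freqs = np.fft.rfftfreq(nfft, d=1 / fs)                   # 0.5 Hz bin spacing
band = (freqs >= 300) & (freqs <= 350)                    # 101 bins in [300, 350] Hz
spectra = np.abs(np.fft.rfft(segments, axis=1))[:, band]  # shape (1496, 101)
```

A 2000-point FFT at 1000 Hz sampling gives 0.5 Hz bin spacing, so the [300, 350] Hz band contains exactly the 101 features used throughout.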
The hydrophone data are broadband and should be preprocessed to make the feature distributions of the simulation and experimental data consistent. Here, the 1496 × 101 experimental array data are also normalized to [−1,1] by row. The 1st and 1496th experimental samples, both corresponding to about 79 m depth and 5.6 km range, are shown in Figure 7a,b. Compared with the simulation data in Figure 5, the experimental samples are noisy but still reflect reality. Discriminating such noisy data is challenging when the ideal, noise-free simulation data are used to train the model.
The normalized matrix is then the final experimental test data, with 1496 samples and 101 features per sample. There are 101 predictors, and each predictor has 1496 numbers. The structure of the experimental data is consistent with that of the simulation data described in Section 4.2.

5. Results and Analyses

The training dataset is the simulated 2790 × 101 array data, which are described in Section 4.2. The first 930 and the remaining 1860 rows of the array correspond to surface (labeled as 0) and underwater (labeled as 1) sources, respectively. Because the trained data used here have low dimensions, the kNN classifier is expected to have good predictive accuracy and is chosen to first explore the data. In summary, the trained data have 2790 observations (or samples), 101 predictors and 2 response classes. Each observation (or sample) has 101 features, while each predictor has 2790 numbers (the first 930 numbers are labeled 0 and the remaining 1860 numbers are labeled 1).

5.1. Results of kNN

Firstly, the model created by kNN classifier is chosen as the trained model to make predictions on new testing data. Ten-fold cross-validation is used to overcome overfitting. Different hyperparameters lead to different results. The number of neighbors is the key hyperparameter of kNN and is explored.
The testing data are the SACLANT 1993 experimental data with the size of 1496 × 101, which are described in Section 4.3. All the 1496 samples correspond to underwater sources (labeled as 1), which means the true labels are all ones.
The results (precision, recall, F1 score and accuracy) of simulation and experimental data versus the number of neighbors of kNN are shown in Figure 8 and Figure 9. The number of neighbors versus precision (in blue) and recall (in red) of simulation data is shown in Figure 8. The number of neighbors versus accuracy of simulation data (in blue) and experimental data (in red) is displayed in Figure 9.
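The four scores can be computed with scikit-learn's metric functions. A toy sketch with hypothetical predictions follows; note that because the experimental labels are all 1, accuracy coincides with recall on the test set:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = np.ones(10, dtype=int)        # all experimental samples are underwater (label 1)
y_pred = np.array([1] * 8 + [0] * 2)   # hypothetical classifier output

acc = accuracy_score(y_true, y_pred)   # 0.8
rec = recall_score(y_true, y_pred)     # 0.8 -- equals accuracy: only positives exist
pre = precision_score(y_true, y_pred)  # 1.0 -- no true negatives to mislabel as 1
f1 = f1_score(y_true, y_pred)          # harmonic mean 2*pre*rec/(pre+rec)
```

On the simulation validation set, by contrast, both classes are present, so precision and recall can trade off against each other as the number of neighbors changes.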
For this research, if the main purpose is to detect an underwater target, such as a UUV or a submarine, a false alarm is preferable to missing an important target. Therefore, more attention should be paid to the recall score, even at the expense of the precision score. This explains why the accuracy of the experimental data increases while the accuracy of the simulation data decreases, as shown in Figure 9. Meanwhile, the recall of the simulation data increases while the precision of the simulation data decreases in Figure 8. Because all the experimental data samples are at 79 m depth, which is far below the 30 m cutoff between the surface and underwater classes, the accuracy of the experimental data can increase to or near 1.0.
The simulation scores tend to decrease as the number of neighbors increases, but the experimental accuracy increases. This is likely because (1) the experimental data only have “underwater” sources at a single depth of about 79 m that is significantly below the 30 m cutoff between the two classes and (2) the simulations have all depths, including many that are very close to the cutoff, and the majority of the inaccuracy is coming from simulated samples with source depth around 30 m.
Because the performance of the kNN classifier cannot balance among precision, recall, F1 score and accuracy, a better method called random subspace kNN is also used.

5.2. Results of Random Subspace kNN

A random subspace ensemble is now used to improve the classification performance, and a description of how to use cross-validation to determine good hyperparameters (the number of nearest neighbors, the number of predictors selected at random and the number of learners in the ensemble) to find an appropriate model is presented.
The first step is to choose the number of nearest neighbors k in the random subspace kNN classifier. There are 101 predictors for each sample. For each learner, d predictors are selected at random, with the same predictor indices used for every sample. This process is repeated m times, forming an ensemble of m kNN classification learners as shown in Figure 2. Here, d and m are set to 20 and 100. Ten-fold cross-validation is used to prevent overfitting: the dataset is randomly divided into 10 pieces, 9 of which are used for training and 1 for testing, and the process is repeated 10 times with a different test piece each time. The 10-fold classification error is the average of the 10 classification errors, expressed as a percentage. The candidate numbers of neighbors are chosen approximately evenly spaced on a logarithmic scale. The results shown in Figure 10 indicate that the lowest 10-fold cross-validated error occurs for k = 1. Thus, k = 1 for this work.
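The neighbor-count sweep can be sketched with `cross_val_score` over log-spaced k values; synthetic stand-in data are used below, and the shape of the error curve over k, not these toy numbers, is what Figure 10 reports:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (60, 101)),
               rng.normal(1.0, 1.0, (60, 101))])
y = np.array([0] * 60 + [1] * 60)

# Candidate k values approximately evenly spaced on a log scale
ks = np.unique(np.round(np.logspace(0, np.log10(50), 10)).astype(int))
errors = [1.0 - cross_val_score(KNeighborsClassifier(n_neighbors=k),
                                X, y, cv=10).mean() for k in ks]
best_k = int(ks[np.argmin(errors)])
```

The same sweep pattern applies to the later steps, varying d with k and m fixed, and then m with k and d fixed.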
The second step is to choose the number of predictors d selected at random in the random subspace kNN classifier. Here, k and m are set to 1 and 100. The ensemble for 1-nearest neighbor classification with various numbers of dimensions is created, and the 10-fold cross-validation loss of the resulting ensembles is examined. The results are shown in Figure 11. There seems to be no advantage in an ensemble with more than 20 or so predictors. It is possible that 22 predictors give good predictions. Thus, the number of predictors selected at random is set to be 22 for the following.
The third step is to choose the number of classification learners m in the ensemble in the random subspace kNN classifier. Here, k and d are set to 1 and 22. The ensembles for 1-nearest neighbor classification and 22 predictors are created, and the 10-fold cross-validated loss of the resulting ensembles is examined. The purpose is to find the smallest number of learners in the ensemble that still gives good classification. The results are shown in Figure 12. There seems to be no advantage in an ensemble with more than 60 or so learners. It is possible that 70 learners give good predictions. Thus, the number of learners is set to 70.
The last step is to create a model and examine its performance. The model uses 1-nearest neighbor, 22 predictors chosen at random and 70 learners. The scores of precision, recall, F1, accuracy of simulation data and accuracy of experimental data are 0.9817, 0.9785, 0.9800, 0.9735 and 1.0, respectively, as shown in Table 2. The final model reaches a good balance between precision and recall and has a very good experimental accuracy. The confusion matrix plot is shown in Figure 13, where labels 0 and 1 indicate surface and underwater sources, respectively. Here, 96.3% of surface sources are classified correctly as surface sources, while 3.7% are classified falsely as underwater sources; 97.8% of underwater sources are classified correctly as underwater sources, while 2.2% are classified falsely as surface sources. The receiver operating characteristic (ROC) plot is shown in Figure 14. The true positive rate is 0.9785 and the false positive rate is 0.03656.
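The reported TPR/FPR pair follows directly from the confusion matrix; a toy sketch with hypothetical labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0] * 4 + [1] * 6)               # hypothetical ground truth
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 1, 0])  # hypothetical predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
tpr = tp / (tp + fn)   # true positive rate (= recall), here 5/6
fpr = fp / (fp + tn)   # false positive rate, here 1/4
```

With labels 0 (surface) and 1 (underwater), the off-diagonal confusion-matrix percentages quoted above are exactly these false-positive and false-negative fractions.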

5.3. Results of ResNet-18

In this part, ResNet-18 is applied to explore the performance of classification further. To obtain sufficient training data for ResNet-18, the same conditions are used as those in Section 4.2 except that the intervals of the distance range and the depth are 0.01 km and 0.1 m instead of 0.1 km and 1 m, respectively. The distance range is from 4.0 to 7.0 km with 0.01 km interval, and the number of discrete ranges is 301. The simulated source depth is from 1 to 30 m for surface sources and 31 to 90 m for underwater sources with 0.1 m interval, and the number of discrete depths is 882 (291 + 591). So, the number of the spatial samples is 265,482 (=301 × 882). Considering each sample has 101 features, the total size of the simulation data is 265,482 × 101, within which the first 87,591 (=301 × 291) rows of the array correspond to surface sources and the remaining 177,891 (=301 × 591) rows correspond to underwater sources. No noise is added to the simulation data. Here, 75% of the 265,482 samples are used as the training data, while the remaining 25% are used as the validation data. The 1496 experimental samples are used as the test data.
After the ResNet-18 is trained, scores are calculated for the classification of the validation data. The scores of precision, recall, F1, accuracy of simulation data and accuracy of experimental data are all 1.0, as shown in Table 2. The ResNet-18 reaches the best balance between precision and recall and has a very good experimental accuracy.

5.4. Results of kNN, Random Subspace kNN and ResNet-18 for All 48 Hydrophones

In the above parts, only data from hydrophone No. 24 were analyzed. In this part, data from all 48 hydrophones of the VLA are analyzed using the three kinds of machine learning methods (kNN, random subspace kNN and ResNet-18) separately, in the same way as described in Section 5.1, Section 5.2 and Section 5.3. On the VLA in Figure 4, the depths of the 48 hydrophones span from 18.7 m at the top to 112.7 m at the bottom. The sound propagation loss at different hydrophones may be distinct due to the multipath effect in shallow water, and the SNRs of different hydrophones in the array may differ.
For each hydrophone, the simulation data of surface and underwater acoustic sources received by the corresponding single hydrophone are generated using KRAKEN. The simulation data and the corresponding network training are completed separately for each hydrophone on the VLA. For each hydrophone, the testing experimental data are the 1496 × 101 array described in Section 4.3. A threshold of 0.9 is set empirically to determine whether the training is a success or a failure; this threshold applies to the scores shown in Figure 15, Figure 16, Figure 17, Figure 18, Figure 19 and Figure 20.

5.4.1. kNN

For kNN, the simulation data are the 2790 × 101 array described in Section 4.2. Based on the kNN results of Section 5.1, the number of neighbors is set to 1 for all 48 hydrophones in the following.
The validation results (precision, recall and F1) on the simulation data for all 48 hydrophones are shown in Figure 15, and the validation accuracy on simulation data together with the testing accuracy on experimental data is shown in Figure 16. From Figure 15 and Figure 16, all 48 hydrophones reach the threshold of 0.9 on the simulation data. However, for the accuracy on experimental data, 19 hydrophones (Nos. 8–12, 16, 18, 20–24, 26, 27, 33, 34, 40–42) fail to meet the threshold. For hydrophone No. 24, the testing accuracy is 0.8951, also below 0.9.
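As a minimal, hypothetical sketch of the per-hydrophone kNN evaluation (assuming scikit-learn; a reduced synthetic dataset stands in for the 2790 × 101 simulated spectra, so the numbers are illustrative only):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

rng = np.random.default_rng(0)
# Synthetic stand-in features: class 0 = surface source, class 1 = underwater
# source, separated by a mean shift across the 101 feature dimensions.
X = np.vstack([rng.normal(0.0, 1.0, (300, 101)),
               rng.normal(2.0, 1.0, (600, 101))])
y = np.concatenate([np.zeros(300, dtype=int), np.ones(600, dtype=int)])

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)   # k = 1, as selected in Section 5.1
knn.fit(X_tr, y_tr)
pred = knn.predict(X_val)

scores = {"precision": precision_score(y_val, pred),
          "recall": recall_score(y_val, pred),
          "f1": f1_score(y_val, pred),
          "accuracy": accuracy_score(y_val, pred)}
print(scores)
```

In the paper's workflow, this fit/score loop would be repeated once per hydrophone, with the experimental 1496 × 101 array taking the place of `X_val` for testing.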

5.4.2. Random Subspace kNN

For random subspace kNN, the simulation data are the 2790 × 101 array described in Section 4.2. Based on the random subspace kNN results of Section 5.2, the hyperparameters of the model are set to 1 nearest neighbor, 22 randomly selected predictors and 70 learners.
The validation results (precision, recall and F1) on the simulation data for all 48 hydrophones are shown in Figure 17, and the validation accuracy on simulation data together with the testing accuracy on experimental data is shown in Figure 18. From Figure 17 and Figure 18, all 48 hydrophones reach the threshold of 0.9 on the simulation data. However, for the accuracy on experimental data, 11 hydrophones (Nos. 11, 12, 20–22, 26–28, 33, 40, 42) fail to meet the threshold. For hydrophone No. 24, the testing accuracy reaches 1.0.
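One way to realize a random subspace kNN ensemble with the hyperparameters above is scikit-learn's `BaggingClassifier` with feature subsampling only; this is a hedged sketch on synthetic stand-in data, not the authors' implementation:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
# Reduced synthetic stand-in for the 2790 x 101 feature array (Section 4.2).
X = np.vstack([rng.normal(0.0, 1.0, (300, 101)),
               rng.normal(2.0, 1.0, (600, 101))])
y = np.concatenate([np.zeros(300, dtype=int), np.ones(600, dtype=int)])

# Random subspace kNN: each of the 70 learners is a 1-NN classifier trained on
# 22 features drawn at random, and the ensemble votes on the final label.
rs_knn = BaggingClassifier(
    KNeighborsClassifier(n_neighbors=1),
    n_estimators=70,       # number of learners in the ensemble
    max_features=22,       # predictors selected at random per learner
    max_samples=1.0,       # every learner sees all training samples
    bootstrap=False,       # only the features are subsampled, not the samples
    random_state=0,
)
rs_knn.fit(X, y)
print(rs_knn.score(X, y))
```

Subsampling features rather than samples is what distinguishes the random subspace method from ordinary bagging: each weak learner sees a different 22-dimensional projection of the spectra, which decorrelates their errors.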

5.4.3. ResNet-18

For ResNet-18, the simulation data are the 265,482 × 101 array described in Section 5.3 (75% used for training, 25% for validation), a much denser sampling than that used for kNN and random subspace kNN.
The validation results (precision, recall and F1) on the simulation data for all 48 hydrophones are shown in Figure 19, and the validation accuracy on simulation data together with the testing accuracy on experimental data is shown in Figure 20. From Figure 19 and Figure 20, all 48 hydrophones reach the threshold of 0.9, and all 48 models are well trained, as every score (precision, recall, F1 and accuracy) is 1.0. However, for the accuracy on experimental data, as many as 21 hydrophones (Nos. 3, 5–7, 13–18, 21, 25, 26, 28, 30, 31, 33, 41, 42, 47, 48) fail to meet the threshold. For hydrophone No. 24, the testing accuracy is 1.0.
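For reference, the identity-shortcut computation of a residual block (Figure 3), with the 3 × 1 kernels listed in Table 1, can be sketched in plain NumPy; the weights here are random placeholders, not the trained model:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1d_same(x, w):
    """'Same'-padded 1-D convolution of a single-channel signal x with kernel w."""
    pad = len(w) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(w)], w) for i in range(len(x))])

def residual_block(x, w1, w2):
    # y = ReLU(F(x) + x): the block learns the residual F(x) on top of the
    # identity shortcut, which is what makes very deep ResNets trainable.
    fx = conv1d_same(relu(conv1d_same(x, w1)), w2)
    return relu(fx + x)

rng = np.random.default_rng(0)
x = rng.normal(size=101)        # one 101-point spectral feature vector
w1 = rng.normal(size=3) * 0.1   # 3x1 kernels, as in Table 1
w2 = rng.normal(size=3) * 0.1
y = residual_block(x, w1, w2)
print(y.shape)

# With zero weights the residual path vanishes and the block reduces to
# ReLU(x): the shortcut passes the input through unchanged.
assert np.allclose(residual_block(x, np.zeros(3), np.zeros(3)), relu(x))
```

The zero-weight check illustrates why residual learning helps: a block can always fall back to (near-)identity, so stacking many blocks does not degrade the signal path.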

5.4.4. Comparison

The performances of all 48 hydrophones of the VLA using the three machine learning methods are analyzed and compared. The hydrophones that fail to reach the threshold of 0.9 are listed in Table 3.
In kNN, 19 hydrophones have testing accuracy below 0.9; ten of them (Nos. 11, 12, 20–22, 26, 27, 33, 40, 42) also fail in random subspace kNN.
In random subspace kNN, only 11 hydrophones have testing accuracy below 0.9. Compared with kNN, nine hydrophones that fail in kNN (Nos. 8–10, 16, 18, 23, 24, 34, 41) succeed in random subspace kNN, while one hydrophone (No. 28) fails in random subspace kNN but not in kNN. Overall, random subspace kNN performs better than kNN.
In ResNet-18, 21 hydrophones have testing accuracy below 0.9, ten more than in random subspace kNN. The main reason is that ResNet-18 is a deep learning method trained on much denser, noise-free simulation data than random subspace kNN. The trained ResNet-18 model therefore fits the simulated environment more precisely, but its generalization ability is poorer than that of the trained random subspace kNN model, especially when the testing data are contaminated by noise. Consequently, ResNet-18 performs worse than random subspace kNN on the experimental data.
Four failed channels (Nos. 21, 26, 33, 42) appear in all three methods. The SNRs of these channels are likely too low for surface and underwater discrimination. Further details of the experiment, which are not available, would be needed to determine the underlying physical reasons.
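The overlaps above follow directly from the failed-hydrophone lists in Table 3 and can be verified with simple set operations:

```python
# Failed-hydrophone lists from Table 3.
knn_failed = {8, 9, 10, 11, 12, 16, 18, 20, 21, 22, 23, 24,
              26, 27, 33, 34, 40, 41, 42}
rs_knn_failed = {11, 12, 20, 21, 22, 26, 27, 28, 33, 40, 42}
resnet_failed = {3, 5, 6, 7, 13, 14, 15, 16, 17, 18, 21, 25, 26,
                 28, 30, 31, 33, 41, 42, 47, 48}

print(sorted(knn_failed & rs_knn_failed))                   # 10 shared failures
print(sorted(knn_failed & rs_knn_failed & resnet_failed))   # [21, 26, 33, 42]
```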
Generally speaking, random subspace kNN performs best among the three machine learning methods for surface and underwater discrimination, in terms of training-sample size, training and prediction time, accuracy on experimental testing data, number of successful hydrophones in the 48-element VLA, and generalization ability. It is a simple machine learning method that needs only a small number of training samples; its training and testing times are short, while its accuracies on experimental testing data are high for the successful channels. The number of hydrophones in the 48-element VLA that pass the test is also the highest, and random subspace kNN generalizes well to noisy experimental testing data.

6. Conclusions and Discussion

From this work on surface and underwater acoustic source discrimination based on machine learning using only one hydrophone, the following conclusions are drawn from the simulation and experimental results:
(1) The training data are generated by an acoustic model under the experimental environment, and the test data are real experimental data. Compared with traditional approaches, such as MFP and other model-driven methods, machine learning methods do not need additional information and may be an alternative way to address the surface and underwater acoustic source discrimination problem.
(2) The kNN and random subspace kNN classifiers are trained on the simulation data and tested on the experimental data, and the best model is chosen by evaluating the scores of precision, recall, F1 and accuracy. In general, the random subspace classifier improves predictive performance over the plain kNN classifier.
(3) The classification results for the hydrophone No. 24 experimental data using the final best random subspace kNN model show that the surface and underwater discrimination problem is solved well by a model trained purely on simulation data. The final best model also reaches a good balance among precision, recall, F1 and accuracy on the simulation data, with very good experimental accuracy.
(4) The classification results for the hydrophone No. 24 experimental data using ResNet-18 reach the best balance between precision and recall, with very good experimental accuracy.
(5) Considering the surface and underwater discrimination results of all 48 hydrophones of the VLA using the three machine learning methods (kNN, random subspace kNN and ResNet-18), random subspace kNN performs best.
It needs only a relatively small number of training samples; its training and testing times are short, while its accuracies on experimental testing data are high for the successful channels. The number of hydrophones in the 48-element VLA that successfully pass the test is also high. Furthermore, it generalizes well to noisy experimental testing data.
Both the simulation and experimental results suggest the feasibility of machine learning as a surface and underwater acoustic source discrimination method using only a single hydrophone. Moreover, if the discrimination problem is more complicated than a binary classification, more than one hydrophone of the VLA can be employed in further study.
In this work, the SACLANT 1993 experiment is a good case study because all the main environmental parameters were thoroughly measured by the scientists conducting the experiment. In practice, variations in the water sound speed profile (SSP), the sea surface and the seafloor can dramatically change sound propagation in shallow water. For a new shallow-water region, the conclusions may differ, because the precision with which the ocean acoustic environment is measured is unknown. Nevertheless, the method described in this paper may also be applicable in real-world settings if the environmental uncertainty is treated properly.

Author Contributions

Conceptualization, W.Z. and Y.W.; methodology, W.Z.; software, W.Z. and Y.W.; validation, J.S., Y.Z., H.L. and J.G.; formal analysis, W.Z.; investigation, W.Z. and Y.W.; resources, Y.W.; data curation, W.Z.; writing—original draft preparation, W.Z.; writing—review and editing, W.Z. and Y.W.; visualization, J.S.; supervision, Y.Z., H.L. and J.G.; project administration, Y.W.; funding acquisition, W.Z. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (Grant No. 6210012506).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The reader can ask for all the related data from the first author ([email protected]) and the corresponding author ([email protected]).

Acknowledgments

The authors would like to thank SPIB (http://spib.linse.ufsc.br/, accessed on 23 February 2022) where the related data originally come from.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Nicolas, B.; Mars, J.I.; Lacoume, J. Source depth estimation using a horizontal array by matched-mode processing in the frequency-wavenumber domain. EURASIP J. Appl. Signal Process. 2006, 2006, 65901.
  2. Tolstoy, A. Matched Field Processing for Underwater Acoustics; World Scientific: Singapore, 1993.
  3. Wilson, G.R.; Koch, R.A.; Vidmar, P.J. Matched mode localization. J. Acoust. Soc. Am. 1988, 84, 310–320.
  4. Westwood, E.K. Broadband matched-field source localization. J. Acoust. Soc. Am. 1992, 91, 2777–2789.
  5. Baggeroer, A.B.; Kuperman, W.A.; Mikhalevsky, P.N. An overview of matched field methods in ocean acoustics. IEEE J. Ocean. Eng. 1993, 18, 401–424.
  6. Smith, G.B.; Feuillade, C.; Del Balzo, D.R.; Byrne, C.L. A nonlinear matched-field processor for detection and localization of a quiet source in a noisy shallow-water environment. J. Acoust. Soc. Am. 1989, 85, 1158–1166.
  7. Michalopoulou, Z.H.; Porter, M.B. Matched-field processing for broad-band source localization. IEEE J. Ocean. Eng. 1996, 21, 384–392.
  8. Premus, V.E.; Backman, D. A matched subspace approach to depth discrimination in a shallow water waveguide. In Proceedings of the Forty-First Asilomar Conference on Signals, Systems & Computers, Pacific Grove, CA, USA, 4–7 November 2007.
  9. Premus, V.; Helfrick, M.N. Use of mode subspace projections for depth discrimination with a horizontal line array: Theory and experimental results. J. Acoust. Soc. Am. 2013, 133, 4019–4031.
  10. Yang, T.C. Data-based matched-mode source localization for a moving source. J. Acoust. Soc. Am. 2014, 135, 1218–1230.
  11. Du, J.; Zheng, Y.; Wang, Z.; Cui, H.; Liu, Z. Passive acoustic source depth discrimination with two hydrophones in shallow water. In Proceedings of the OCEANS, Shanghai, China, 10–13 April 2016.
  12. Conan, E.; Bonnel, J.; Chonavel, T.; Nicolas, B. Source depth discrimination with a vertical line array. J. Acoust. Soc. Am. 2016, 140, EL434–EL440.
  13. Conan, E.; Bonnel, J.; Nicolas, B.; Chonavel, T. Using the trapped energy ratio for source depth discrimination with a horizontal line array: Theory and experimental results. J. Acoust. Soc. Am. 2017, 142, 2776–2786.
  14. Liang, G.; Zhang, Y.; Zhang, G.; Feng, J.; Zheng, C. Depth discrimination for low-frequency sources using a horizontal line array of acoustic vector sensors based on mode extraction. Sensors 2018, 18, 3962.
  15. Niu, H.; Reeves, E.; Gerstoft, P. Source localization in an ocean waveguide using supervised machine learning. J. Acoust. Soc. Am. 2017, 142, 1176–1188.
  16. Niu, H.; Ozanich, E.; Gerstoft, P. Ship localization in Santa Barbara Channel using machine learning classifiers. J. Acoust. Soc. Am. 2017, 142, EL455–EL460.
  17. Choi, J.; Choo, Y.; Lee, K. Acoustic classification of surface and underwater vessels in the ocean using supervised machine learning. Sensors 2019, 19, 3419.
  18. Wang, L.; Guo, S.; Huang, W.; Qiao, Y. Places205-VGGNet models for scene recognition. arXiv 2015, arXiv:1508.01667.
  19. Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360.
  20. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  21. Signal Processing Information Base (SPIB). Available online: http://spib.linse.ufsc.br/ (accessed on 10 October 2021).
  22. Jensen, F.B.; Kuperman, W.; Porter, M.B.; Schmidt, H. Computational Ocean Acoustics, 2nd ed.; Springer Science & Business Media: New York, NY, USA, 2011; pp. 3–4, 337–341.
  23. Gingras, D.F.; Gerstoft, P. Inversion for geometric and geoacoustic parameters in shallow water: Experimental results. J. Acoust. Soc. Am. 1995, 97, 3589–3598.
  24. Akal, T.; Gehin, C.; Matteucci, B.; Tonarelli, B. Measured and Computed Physical Properties of Sediment Cores: Island of Elba Zone; SACLANT ASW Research Centre: La Spezia, Italy, 1972; Special Report No. M-82.
  25. Krolik, J.L. The performance of matched-field beamformers with Mediterranean vertical array data. IEEE Trans. Signal Process. 1996, 44, 2605–2611.
  26. Porter, M.B. The KRAKEN Normal Mode Program. Available online: https://oalib.hlsresearch.com/Modes/AcousticsToolbox/manualtml/kraken.html (accessed on 9 October 2020).
Figure 1. The whole architecture of surface and underwater acoustic source discrimination based on machine learning using only a single hydrophone.
Figure 2. The diagram of the basic random subspace algorithm.
Figure 3. A residual block in ResNet.
Figure 4. The experimental environment of SACLANT 1993.
Figure 5. The features of four samples of the simulation data, where the horizontal axis represents the frequency from 300 to 350 Hz and the vertical axis represents the normalized amplitude from −1 to 1: (a) the 1st features of the surface source; (b) the 930th features of the surface source; (c) the 931st features of the underwater source; (d) the 2790th features of the underwater source.
Figure 6. The spectrogram of the experimental data of hydrophone No. 24.
Figure 7. The features of two samples of the experimental data, where the horizontal axis represents the frequency from 300 to 350 Hz and the vertical axis is the normalized amplitude from −1 to 1: (a) the 1st features of the experimental source; (b) the 1496th features of the experimental source.
Figure 8. Number of neighbors vs. precision (in blue) and recall (in red) of simulation data.
Figure 9. Number of neighbors vs. accuracy of simulation data (in blue) and experimental data (in red).
Figure 10. Number of nearest neighbors vs. 10-fold classification error.
Figure 11. Number of predictors selected at random vs. 10-fold classification error for k = 1.
Figure 12. Number of learners in ensemble vs. 10-fold classification error for k = 1 and d = 22.
Figure 13. The confusion matrix plot of the simulation data of the final model produced by the random subspace kNN classifier.
Figure 14. The ROC plot of the simulation data of the best model produced by the random subspace kNN classifier.
Figure 15. The kNN validation results (precision, recall and F1) of simulation data of all 48 hydrophones.
Figure 16. The kNN validation accuracy of simulation data and testing accuracy of experimental data of all 48 hydrophones.
Figure 17. The random subspace kNN validation results (precision, recall and F1) of simulation data of all 48 hydrophones.
Figure 18. The random subspace kNN validation accuracy of simulation data and testing accuracy of experimental data of all 48 hydrophones.
Figure 19. The ResNet-18 validation results (precision, recall and F1) of simulation data of all 48 hydrophones.
Figure 20. The ResNet-18 validation accuracy of simulation data and testing accuracy of experimental data of all 48 hydrophones.
Table 1. Architecture of ResNet-18.
Layer Name | [Kernel Size, Filters] | No. of Blocks
Conv1 | [7 × 1, 64] | 1
Pool | [3 × 1] | 1
Conv2_x | [3 × 1, 64; 3 × 1, 64] | 2
Conv3_x | [3 × 1, 128; 3 × 1, 128] | 2
Conv4_x | [3 × 1, 256; 3 × 1, 256] | 2
Conv5_x | [3 × 1, 512; 3 × 1, 512] | 2
Average pooling | | 
Fully connected layer | | 
Softmax classifier | | 
Table 2. The results (precision, recall, F1 score and accuracy) of simulation and experimental data of the final model using random subspace kNN and ResNet-18.
Method | Number of Neighbors | Simulation Precision | Simulation Recall | Simulation F1 Score | Simulation Accuracy | Experiment Accuracy
Random subspace kNN | 1 | 0.9817 | 0.9785 | 0.9800 | 0.9735 | 1.0
ResNet-18 | – | 1.0 | 1.0 | 1.0 | 1.0 | 1.0
Table 3. Lists of hydrophones failing to reach the threshold of 0.9 using three kinds of machine learning methods (kNN, random subspace kNN and ResNet-18).
Machine Learning Method | Hydrophones Failing to Reach Threshold | Number Failing
kNN | 8, 9, 10, 11, 12, 16, 18, 20, 21, 22, 23, 24, 26, 27, 33, 34, 40, 41, 42 | 19
Random subspace kNN | 11, 12, 20, 21, 22, 26, 27, 28, 33, 40, 42 | 11
ResNet-18 | 3, 5, 6, 7, 13, 14, 15, 16, 17, 18, 21, 25, 26, 28, 30, 31, 33, 41, 42, 47, 48 | 21
Zhang, W.; Wu, Y.; Shi, J.; Leng, H.; Zhao, Y.; Guo, J. Surface and Underwater Acoustic Source Discrimination Based on Machine Learning Using a Single Hydrophone. J. Mar. Sci. Eng. 2022, 10, 321. https://doi.org/10.3390/jmse10030321