3.1. Feature Parameter Rules and Sample Set Construction
This article mainly studied the prediction of underwater acoustic transmission loss, focusing on how different underwater acoustic characteristic parameters influence the prediction results of acoustic propagation models. Underwater sound transmission is mainly studied by two methods. The first is wave theory, which examines the changes in the amplitude and phase of the acoustic signal in the sound field. The second is ray theory, applicable in the high-frequency range, in which the sound wave is regarded as a bundle of rays and the variation of sound intensity in the sound field is studied along those rays. From these two theories, several classic underwater acoustic transmission models have been developed to characterize the underwater acoustic propagation process, including the NM, RM, and PE models [3].
In underwater acoustics, the standard measure of sound field signal strength versus distance is the transmission loss (TL), defined in terms of the ratio of the sound intensity I(r, z) at a certain point in the sound field to the sound intensity I_0 at a distance of 1 m from the sound source. It can be expressed as Equation (9):

$$TL = 10\lg\frac{I_0}{I(r,z)} = 20\lg\left|\frac{p_0}{p(r,z)}\right| \tag{9}$$

where p(r, z) is the sound pressure at a certain point in the sound field; r is the distance from the sound source position to that point; z is the receiving depth of that point; p_0 is the sound pressure value at a distance of 1 m from the sound source; and TL is the value of the transmission loss of the sound field in decibels (dB).
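As a quick illustration of Equation (9), the following minimal Python sketch computes TL from pressure amplitudes; the function name and example values are our own, not from the paper.

```python
import numpy as np

def transmission_loss(p_rz, p0=1.0):
    """Transmission loss in dB: TL = 20 * lg(|p0| / |p(r, z)|)."""
    return 20.0 * np.log10(np.abs(p0) / np.abs(p_rz))

# Example: a pressure amplitude 1000 times smaller than at 1 m
# from the source corresponds to a 60 dB transmission loss.
print(transmission_loss(1e-3))  # 60.0
```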
This paper selected the Bellhop model for the ray method. The model was developed by Porter and Bucker [21] in 1987 and uses Gaussian beam tracing to calculate the sound field in a horizontally non-uniform (range-dependent) environment. Experimental data for this model in the frequency range of 600 Hz–30 kHz are consistent with the theoretical model [22]. It is suitable for near-field conditions, and the important characteristic length of the water depth or sound velocity profile should be greater than 20 wavelengths [23].
For the normal mode method, this paper selected the Kraken model. This model was jointly developed by the U.S. Naval Ocean Systems Center (NOSC) and the U.S. Naval Research Laboratory (NRL) and uses the finite difference method to solve the normal mode equation. It is suitable for low-frequency (below 500 Hz), shallow-sea, far-field calculations under horizontally uniform (range-independent) marine environmental conditions, and the seabed structure must be specified in the calculation, that is, the parameters Cs, Kd, p, and Mt in Table 1.
For the fast field program (FFP) method, we chose the Scooter model. This model applies the fast Fourier transform (FFT) to solve the wave equation. It is limited to a horizontally layered, homogeneous (range-independent) medium and near-field environments, and it is suitable for calculations in the low-frequency range (below 500 Hz).
For the parabolic equation (PE) method, the RAMGeo model was selected. This model is a narrow-angle (<±20°) approximate solution of the wave equation that considers the coupling effect of acoustic diffraction and normal modes of various orders. Its calculation speed for low-frequency (below 500 Hz), shallow-sea scenarios is very fast, but as the frequency and depth increase, the calculation time grows rapidly, and the model is no longer applicable.
The above far-field and near-field rules are constrained by the characteristic parameter horizontal distance (R). Suppose that the total aperture of a uniform linear array is D = m × d, where m is the number of element spacings and d is the element spacing, and let λ_min be the wavelength at the highest frequency of the sound source (the minimum wavelength of the source). If the horizontal distance R from the sound source to the array satisfies R > 2D²/λ_min, the far-field model applies; otherwise, the near-field model applies. Under the near-field model, sound waves are regarded as spherical waves, and under the far-field model, sound waves are regarded as plane waves. The calculation of underwater acoustic transmission loss mainly involves characteristic parameters such as the signal frequency, sea depth, and seabed environment. The specific parameters and their ranges are shown in Table 1.
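To make the near-/far-field rule concrete, here is a small sketch (our own helper, not from the paper) that classifies the field region from the array aperture and the minimum source wavelength, assuming the conventional criterion R > 2D²/λ_min and a nominal seawater sound speed:

```python
def field_region(R, m, d, c=1500.0, f_max=1000.0):
    """Classify near field vs. far field for a uniform linear array.

    R      : horizontal source-to-array distance (m)
    m, d   : number of element spacings and element spacing (m), so D = m * d
    c      : sound speed (m/s), nominal 1500 m/s in seawater
    f_max  : highest source frequency (Hz); lambda_min = c / f_max
    """
    D = m * d                       # total array aperture
    lambda_min = c / f_max          # minimum wavelength of the source
    threshold = 2.0 * D**2 / lambda_min
    if R > threshold:
        return "far field (plane waves)"
    return "near field (spherical waves)"

print(field_region(R=5000.0, m=15, d=0.5))  # far field (plane waves)
```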
The optimal characteristic parameter ranges of the above four types of underwater acoustic transmission loss models are shown in
Table 2.
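The qualitative applicability rules above can be summarized as a simple lookup. The sketch below encodes the frequency and range-dependence constraints described in this section as illustrative Python rules; it is our own simplification, and the exact parameter ranges should be taken from Table 2.

```python
def candidate_models(freq_hz, range_dependent, far_field):
    """Return the propagation models plausibly applicable to a scenario,
    based on the qualitative rules described in this section."""
    candidates = []
    if 600.0 <= freq_hz <= 30000.0 and range_dependent and not far_field:
        candidates.append("Bellhop (ray, Gaussian beam tracing)")
    if freq_hz < 500.0 and not range_dependent and far_field:
        candidates.append("Kraken (normal mode)")
    if freq_hz < 500.0 and not range_dependent and not far_field:
        candidates.append("Scooter (fast field / FFP)")
    if freq_hz < 500.0:
        candidates.append("RAMGeo (narrow-angle PE)")
    return candidates

print(candidate_models(freq_hz=300.0, range_dependent=False, far_field=True))
```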
Sample data generation is very important for neural network training, and the distribution and statistical characteristics of the samples directly affect the training efficiency and prediction performance of the neural network. Based on the above constraints, we built a normally distributed dataset to train the neural network model. The training set was generated as follows (a minimal sketch follows the list):
- (1)
Select an underwater acoustic propagation model and, following the constraint conditions of the selected model's parameters described above, generate data according to a normal distribution within the constraint ranges.
- (2)
Set labels for the samples in the established training set, with a different label for each model class. Usually, integer labels from 0 to K (where K is the number of models) are used. The labels form a new set corresponding to the input data.
- (3)
Combine different marine environment characteristics and signal characteristics according to the above steps to form a large training set. Then, after establishing a one-to-one correspondence between the input data and the labels, randomly shuffle the order of the input data and recombine the training set. The purpose of shuffling the data is to prevent too many noisy samples of the same type from clustering together, which would cause over-fitting.
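A minimal sketch of steps (1)–(3), assuming truncated-normal sampling within each parameter's constraint range; the helper name, the example ranges, and the two-parameter layout are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_param(low, high, n):
    """Step (1): draw n values from a normal distribution centered on the
    constraint range, clipped so every sample stays inside [low, high]."""
    mean, std = (low + high) / 2.0, (high - low) / 6.0
    return np.clip(rng.normal(mean, std, n), low, high)

# Illustrative constraint ranges for two parameters of one model.
n = 1000
X = np.column_stack([sample_param(600.0, 30000.0, n),   # Fs (Hz)
                     sample_param(10.0, 100.0, n)])      # H (m)

y = np.zeros(n, dtype=int)   # Step (2): label 0 for this model class

# Step (3): shuffle inputs and labels with the same permutation.
perm = rng.permutation(n)
X, y = X[perm], y[perm]
```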
Different feature parameters have different weights in the adaptation model. In the DBN neural network, the weight of each feature parameter is automatically optimized according to the training data, and the optimal weights are finally reached through iterative training. For the network to learn these rules in depth, we built more data for the feature parameters with larger weights, while the remaining feature parameters were left for the neural network to iterate toward their optimal weights. As shown in Table 2, we extracted a total of nine feature parameters. Compared with the other models, Fs is a significant feature parameter with a larger weight in the ray model, so we increased the number of Fs samples in the ray-model training data: we constructed 50 different values of Fs within its constraint range, and the remaining eight feature parameters, taken as a group, were given a total of 20 groups of different values within their constraint ranges. Combining the feature parameter Fs with the other feature parameters gave 1000 data samples for the ray model. For the normal mode model, H and R had larger weights. We first generated 50 different values for the feature parameter H, generated 20 groups of sample values for the remaining eight feature parameters within their constraint ranges, and combined them into 1000 sets of data. We then generated 50 different values for the feature parameter R, again with 20 groups for the remaining eight feature parameters, and combined them into another 1000 sets, giving a total of 2000 sets of sample data for training the normal mode model. For the fast field model, as in the normal mode model, the H and R parameters had larger weights but different constraint ranges, so 2000 sets of sample data were likewise generated. For the parabolic equation (PE) model, the feature parameter SA had a larger weight, and, as with the ray model, 1000 sets of data were generated. In total, 6000 sets of sample data were generated for DBN training.
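The weighted combination scheme for the ray model (50 values of the heavily weighted parameter Fs crossed with 20 groups of the remaining eight parameters, giving 1000 samples) can be sketched as follows; the ranges are placeholders, and for brevity uniform draws stand in for the clipped-normal sampling shown earlier:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

fs_values = rng.uniform(600.0, 30000.0, 50)      # 50 values of Fs
other_groups = rng.uniform(0.0, 1.0, (20, 8))    # 20 groups of the remaining
                                                 # 8 (normalized) parameters

# Cross every Fs value with every parameter group: 50 * 20 = 1000 samples.
samples = np.array([np.concatenate(([fs], grp))
                    for fs, grp in product(fs_values, other_groups)])
print(samples.shape)  # (1000, 9)
```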
3.2. Model Initialization and Training
First, we initialized the model to determine the momentum factor, learning rate, maximum number of iterations, and number of hidden-layer nodes involved in RBM training. In addition, because the DBN does not have a clearly defined optimal number of layers, the depth of the DBN model must be determined from experimental evaluation results. We then set the relevant RBM parameters: the initial learning rate was set to 0.1 based on experience, and the initial momentum was set to 0.5. The number of input nodes equals the number of characteristic parameters in Table 1, and the number of output nodes is four, equal to the number of acoustic models. The number of pre-training iterations (epochs) was set to 200, and the number of global reverse fine-tuning iterations to 100. To prevent over-fitting, we randomly selected 3/4 of each type of data as training samples and used the rest as test samples. The random-sample training batch size was 20.
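Collected in one place, the initialization described above might look like the following configuration sketch; the dictionary layout is ours, but the values are those stated in the text:

```python
dbn_config = {
    "n_visible": 9,            # input nodes = number of feature parameters
    "n_output": 4,             # output nodes = number of acoustic models
    "learning_rate": 0.1,      # initial RBM learning rate (from experience)
    "momentum": 0.5,           # initial momentum factor
    "pretrain_epochs": 200,    # unsupervised pre-training iterations
    "finetune_epochs": 100,    # global reverse fine-tuning iterations
    "batch_size": 20,          # random mini-batch size
    "train_fraction": 0.75,    # 3/4 for training, 1/4 held out for testing
}
```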
Table 3 shows the performance of the DBN neural network at different depths. As the depth of the DBN model increases, the time cost roughly doubles, but when the depth is greater than 5, the classification accuracy no longer increases significantly while the training time continues to grow by a large margin. Considering both training time and classification accuracy, five layers is therefore a reasonable depth choice.
The training of the DBN consists of two parts. The first is unsupervised, bottom-up, layer-by-layer pre-training, which propagates the feature vector from the input to the output. The second is supervised, top-down reverse fine-tuning: the output value is compared with the given data label to obtain the error, and the error is back-propagated from the output end to the input end to adjust the DBN parameters. The overall training process is shown in Figure 4.
The specific training steps are as follows:
- Step 1:
Preprocess the dataset constructed in
Section 3.1 and input the data into the model in batches for training.
- Step 2:
Set the initial values of the visible-layer bias a, the hidden-layer bias b, and the weights W between the neurons, where a represents the bias of the input data (visible) layer, b represents the bias of each hidden layer or the data output layer, and W represents the connection weights of each layer. According to the research of Hinton et al., the connection weights of each layer are initialized to obey the normal distribution N(0, 1), and the initial values of the visible-layer and hidden-layer biases are set to 0 (see the initialization in the sketch after these steps).
- Step 3:
Next, we perform iterative training on the DBN. In the DBN model, each RBM layer learns the characteristics of the input data in a forward, unsupervised manner, using a greedy unsupervised learning algorithm to update the bias of each unit in each layer and the weights between layers (one such update is sketched after these steps).
- Step 4:
After completing the network pre-training, we add the Softmax output layer and use fine-tuning to transform the entire network from a generative model into a discriminative model. The network loss function can be defined as:

$$E = -\sum_{n=1}^{K} 1\{\mathrm{label}(n) = 1\}\,\lg\frac{e^{w_n^{T}x + b_n}}{\sum_{j=1}^{K} e^{w_j^{T}x + b_j}}$$

In the formula, 1{label(n) = 1} equals 1 when the condition in brackets is satisfied and 0 otherwise; w_n and b_n are the connection weight vector of the nth node in the output layer and its bias; and x is the response of the upper-layer nodes. The fine-tuning process generally uses the gradient descent method, with the goal of minimizing the loss function, and uses error back propagation (BP) to fine-tune the network. Fine-tuning compares, in turn, the correct classification labels corresponding to the different feature parameters (Table 1); if the expected output value is not obtained, the difference between the actual output and the expected output (the error) is calculated, and the weights of each neuron are successively modified as the error propagates back to the input layer (a numerically stable version of this loss is sketched after these steps).
- Step 5:
When the iterations end, the training of the entire model is complete.
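As referenced in Steps 2 and 3, the sketch below shows an RBM layer with N(0, 1) weight initialization, zero biases, and one contrastive-divergence (CD-1) update; it is a simplified NumPy illustration of the pre-training described above, not the authors' implementation, and the hidden-layer size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, lr=0.1):
        # Step 2: weights ~ N(0, 1); visible/hidden biases start at 0.
        self.W = rng.normal(0.0, 1.0, (n_visible, n_hidden))
        self.a = np.zeros(n_visible)   # visible-layer bias
        self.b = np.zeros(n_hidden)    # hidden-layer bias
        self.lr = lr

    def cd1_update(self, v0):
        """Step 3: one CD-1 step on a batch v0 of shape (batch, n_visible)."""
        h0 = sigmoid(v0 @ self.W + self.b)                  # up: hidden probs
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h0_sample @ self.W.T + self.a)         # down: reconstruction
        h1 = sigmoid(v1 @ self.W + self.b)                  # up again
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / n     # weight update
        self.a += self.lr * (v0 - v1).mean(axis=0)          # visible-bias update
        self.b += self.lr * (h0 - h1).mean(axis=0)          # hidden-bias update

rbm = RBM(n_visible=9, n_hidden=64)
batch = rng.random((20, 9))        # one mini-batch of 20 feature vectors
rbm.cd1_update(batch)
```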
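For the fine-tuning loss in Step 4, a numerically stable softmax cross-entropy, with the indicator 1{·} realized by selecting the true-class log-probability, might look as follows (again a generic sketch, assuming K = 4 model classes):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """logits: (batch, K) outputs w_k^T x + b_k; labels: (batch,) ints in 0..K-1."""
    z = logits - logits.max(axis=1, keepdims=True)   # stabilize the exponent
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    # The indicator 1{label(n) = k} picks out the true class's log-probability.
    return -log_prob[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, -1.0, 0.0]])
print(softmax_cross_entropy(logits, np.array([0])))  # small loss: class 0 dominates
```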
After the training of the DBN model is completed, all training samples can be accurately classified under their assigned labels; that is, the error between the predicted label and the true label converges to 0. The loss is used to evaluate the degree of convergence of the model: a loss tending to 0 means that the model converges well. As shown in the loss curve of Figure 5, the loss tends to 0 as the number of iterations increases.
This training used 75% of the data to train the model; the loss function and accuracy are shown in Figure 5. It can be seen from the figure that as the amount of training increases, the model's prediction accuracy becomes higher and higher while the error loss curve declines, finally settling at a stable value; at this point, further training may lead to over-fitting. Finally, the remaining 25% of the sample set was used for testing, and the resulting model-adaptation accuracy was 94.86%. This shows that the recognition accuracy is already very high and close to the distribution of the sample data; the underwater acoustic transmission loss is then calculated with the matched, adapted underwater acoustic propagation model.