#### 2.2.1. Sampling

When a 1D-CNN is used to learn the variation features of the side scan data sequence, each one-ping backscatter data sequence must be divided into regional sub-sequences that serve as samples, as illustrated in Figure 2. The sub-sequence/sample size should be properly selected to accurately reflect the variation characteristics, as discussed in Section 4.1. An improper sample size would cause the network to learn the wrong information and misjudge the results.


**Figure 2.** Data sequence samples from one ping of data, showing the positive (bottom) and negative (noise, water column, and seabed) samples.

To establish the sample sets, the positive and negative samples need to be selected from raw sonar backscatter strength data sequences. The positive samples can be detected using the traditional method with manual intervention, whereas the negative samples should contain the samples in the water column area, those containing noise, and those in the seabed area, as shown in Figure 2.
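The windowing step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window size, the labelling rule (a window centred on the manually verified bottom index is positive, all others negative), and the synthetic ping are all assumptions for demonstration.

```python
import numpy as np

def extract_samples(ping, window, bottom_idx):
    """Slice one ping of backscatter strengths into fixed-size sub-sequences.

    Illustrative rule: a window whose centre lies at the (manually verified)
    bottom index is a positive sample (1); windows elsewhere, covering noise,
    the water column, and the seabed, are negative samples (0).
    """
    half = window // 2
    samples, labels = [], []
    for centre in range(half, len(ping) - half):
        samples.append(ping[centre - half:centre + half])
        labels.append(1 if centre == bottom_idx else 0)
    return np.array(samples), np.array(labels)

# Synthetic 16-bit ping with an assumed bottom position at index 80.
ping = np.random.default_rng(0).integers(0, 2**16, size=200)
X, y = extract_samples(ping, window=32, bottom_idx=80)
```

In practice the positive windows would come from the traditional bottom-tracking result with manual checking, as the text describes, rather than from a known index.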

#### 2.2.2. Normalization of Sonar Data Sequences

As shown in Figure 2, the samples span various strength ranges and need to be normalized into the same range for network training. These samples can be normalized with the *z*-score to ensure that they are in the same range [28]. Given that side scan data are usually recorded in a fixed range (e.g., 0 to $2^{16}-1$), the samples can be simply normalized by the equation below.

$$dB\_{norm} = \frac{dB}{Max\_{dB}} \Big( \text{e.g., } \frac{dB}{32767} \Big). \tag{1}$$

After the normalization, the samples fall within the range (0~1), as shown in Figure 3.

**Figure 3.** Normalization of the data sequence samples.
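The max-normalization of Equation (1) reduces to a single division; a minimal sketch, assuming the fixed recording maximum of 32767 mentioned in the text and illustrative raw values:

```python
import numpy as np

# Raw backscatter strengths (illustrative values within the recording range).
raw = np.array([1024.0, 8192.0, 32767.0])

max_db = 32767.0              # fixed recording maximum assumed from the text
normalized = raw / max_db     # Equation (1): every sample mapped into (0, 1)
```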

#### 2.2.3. Network

The bottom tracking of the side scan sonar data aims to recognize special bottom backscatter strength sequences, and can be fulfilled via the 1D-CNN recognition of the normalized data sequences.

The 1D-CNN is the one-dimensional version of the common CNN and likewise contains an input layer, convolution layers, pooling layers, and an output layer. Given the characteristics of our problem, the input layer of the 1D-CNN contains backscatter strength sequences, whereas the output layer contains the positive (1) and negative (0) results, as shown in Figure 4.

The input layer contains the one-dimensional normalized backscatter strength samples, and the intermediate layers are combinations of convolution and pooling layers. The one-dimensional convolution operation *s* of the data sequences in discrete form is shown below.

$$s(t) = (d \ast w)(t) = \sum\_{a = -\infty}^{\infty} d(a)w(t - a) \tag{2}$$

where *d* is the input data sequence, *w* is the convolution kernel (the weight function), and *t* is the *t*th value of *d*.
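Equation (2) can be checked numerically: `numpy.convolve` implements exactly this flip-and-slide sum. The sequence and kernel values below are illustrative.

```python
import numpy as np

# Discrete 1-D convolution of Equation (2): s(t) = sum_a d(a) * w(t - a).
d = np.array([1.0, 2.0, 3.0, 4.0])   # input data sequence
w = np.array([0.5, 0.5])             # convolution kernel (weights)
s = np.convolve(d, w)                # full convolution, length len(d)+len(w)-1
```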

The following rectified linear unit (ReLU) *h* is selected as the activation function for the convolution layers.

$$h\_{w,b}(X) = \max(X \cdot w + b, 0) \tag{3}$$

where *w* and *b* are the trainable parameters, and *X* is the input data.
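A direct transcription of Equation (3), with illustrative (untrained) values for the parameters *w* and *b*:

```python
import numpy as np

# ReLU unit of Equation (3): h_{w,b}(X) = max(X . w + b, 0).
def relu_unit(X, w, b):
    return np.maximum(X @ w + b, 0.0)

# Illustrative inputs and parameters (not trained values).
X = np.array([[1.0, -2.0],
              [0.5,  4.0]])
w = np.array([1.0, 1.0])
b = -0.5
out = relu_unit(X, w, b)   # negative pre-activations are clipped to zero
```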

**Figure 4.** The structure of the one-dimensional convolution neural network (1D-CNN) with the positive and negative input samples and the corresponding output results.

After the convolution and pooling layers, a flatten layer reshapes the tensors into vectors, and a fully-connected layer, usually with ReLU activation, connects to the output layer.
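The layer sequence described above (convolution with ReLU, max-pooling, flatten, fully-connected) can be sketched as a single forward pass. All shapes, kernel sizes, and weights here are illustrative random values, not the trained network:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(32)                  # one normalized sample

# Convolution layer with ReLU activation (assumed kernel size 5).
k = rng.standard_normal(5)
conv = np.maximum(np.convolve(x, k, mode="valid"), 0.0)   # length 32-5+1 = 28

# Max-pooling with an assumed pool size of 2.
pooled = conv.reshape(-1, 2).max(axis=1)     # length 14

# Flatten layer: reshape the tensor into a vector.
flat = pooled.ravel()

# Fully-connected layer with ReLU, feeding the output layer (8 features).
W = rng.standard_normal((14, 8))             # illustrative untrained weights
dense = np.maximum(flat @ W, 0.0)
```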

The last layer is the output layer, with the following sigmoid activation function σ:

$$\sigma(\mathbf{x}) = \frac{1}{1 + e^{-\mathbf{x}}} \tag{4}$$

where *x* is the input data.
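Equation (4) squashes the final score into (0, 1), which can be read as the probability that a sample is a bottom sequence; a direct transcription:

```python
import numpy as np

# Sigmoid activation of Equation (4).
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))
```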

After each training loop, the loss function is used to calculate the difference between the predicted results and the ground truth. Given that bottom tracking is a binary classification problem, the cross-entropy function is selected as the loss function *H*.

$$H\_{y'}(y) = -\sum\_{i} y\_{i}' \log(y\_i) \tag{5}$$

where *y<sub>i</sub>* is the predicted result and *y<sub>i</sub>′* is the ground truth.
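Equation (5) as written sums the ground-truth-weighted log of the predictions; a direct transcription, with a small epsilon added as a standard guard against log(0) (an implementation detail not stated in the text):

```python
import numpy as np

# Cross-entropy loss of Equation (5): H = -sum_i y'_i * log(y_i).
def cross_entropy(y_pred, y_true):
    eps = 1e-12                              # guard against log(0)
    return -np.sum(y_true * np.log(y_pred + eps))

# Illustrative prediction/label pair: confident, mostly correct prediction.
loss = cross_entropy(np.array([0.9, 0.1]), np.array([1.0, 0.0]))
```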

The root mean square propagation (RMSProp) optimizer is chosen to update the parameters. After several loops, if the network learns the variation features of the samples properly, the loss function reaches a stable low value, and the training and validation accuracies reach stable high values. The well-trained 1D-CNN serves as the basis of the real-time bottom tracking method.

### *2.3. Bottom Tracking Using the Trained 1D-CNN*

In this section, the trained network is used for the bottom tracking of the side scan data. The complete procedures of real-time bottom tracking are explained in detail, along with the auxiliary methods that can improve real-time performance and recognition accuracy.
