2.2.3. Loss Function and Network Parameters

In the mapping result, the points within the search area are positive samples, and the points outside the area are negative samples. The loss function for each point in the mapping result is:

$$L\_p(x, s) = \log\left(1 + e^{-xs}\right) \tag{5}$$

where *s* is the real-valued similarity score at the point and *x* is the label of the point, with *x* ∈ {+1, −1}. The overall loss of the mapping result is the average of the losses of all points, that is:

$$L\_{ALL}(x, s) = \frac{1}{|D|}\sum\_{z \in D} L\_p(x(z), s(z)) \tag{6}$$

where *z* is a position in the mapping result *D*, and *x*(*z*) is defined as:

$$x(z) = \begin{cases} +1 & h|z-c| \le R \\ -1 & \text{otherwise} \end{cases} \tag{7}$$

where *h* is the stride (step size) of the network, *c* is the center point of the mapping result, and *R* is the radius of the search area.
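A minimal sketch of this labeling and loss, following Equations (5)–(7) and assuming a square mapping result; the implementation framework, map size, stride, and radius values are illustrative, not taken from the paper:

```python
import numpy as np

def label_map(size, stride, radius):
    """Assign +1 to points within `radius` of the map center, -1 elsewhere (Eq. 7)."""
    c = (size - 1) / 2.0  # center point c of the mapping result
    ys, xs = np.mgrid[0:size, 0:size]
    dist = stride * np.sqrt((xs - c) ** 2 + (ys - c) ** 2)  # h * |z - c|
    return np.where(dist <= radius, 1.0, -1.0)

def loss_all(scores, labels):
    """Mean logistic loss over all points of the mapping result (Eqs. 5 and 6)."""
    return np.mean(np.log1p(np.exp(-labels * scores)))

# Example: a 17x17 mapping result with network stride 8 and radius 16
x = label_map(17, stride=8, radius=16)
s = np.random.randn(17, 17)  # placeholder similarity scores
print(loss_all(s, x))
```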

The weight coefficients of the similarity discriminant function are solved for by gradient descent, minimizing the error between the sample labels *x* and the output of the similarity discriminant function. The specific parameters of the network are given in Table 1. A max-pooling layer follows each of the first two convolution layers. The ReLU nonlinear activation function follows every convolution layer except the last. A batch normalization (BN) layer is embedded after each linear layer. No padding is used anywhere in the network.

**Table 1.** Network parameters.
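Since the values in Table 1 are not reproduced here, the following PyTorch sketch only illustrates the layer ordering described above; the channel counts and kernel sizes are placeholders, not the paper's parameters:

```python
import torch.nn as nn

# Illustrative backbone: max pooling after the first two convolution layers,
# ReLU after every convolution layer except the last, BN after each layer,
# and no padding anywhere. Sizes are placeholders for the Table 1 values.
class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=2),   # conv1
            nn.BatchNorm2d(96), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),                    # pool after conv1
            nn.Conv2d(96, 256, kernel_size=5),            # conv2
            nn.BatchNorm2d(256), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),                    # pool after conv2
            nn.Conv2d(256, 384, kernel_size=3),           # conv3
            nn.BatchNorm2d(384), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3),           # conv4: last layer, no ReLU
            nn.BatchNorm2d(256),
        )

    def forward(self, x):
        return self.features(x)
```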


### *2.3. Workflow*

After the initial target image on the velocity spectrum is given, every subsequent image is compared with the initial target image for similarity. The Siamese network performs a full convolution over the search image: to locate the target, all possible positions are tested exhaustively, and the position with the greatest similarity to the target is selected. The triple Siamese structure computes the cross-correlation between each of the two input branches and the search branch, determines the weight coefficients for fusing the branches according to the similarity of the images, and adapts to lateral changes in the velocity spectrum by updating the initial target. The main implementation steps of the method are as follows (a condensed sketch of the per-position loop is given after the steps):

Step 1: generate velocity spectrum images of all positions.

Step 2: extract the target image feature *Hi* at the specified position and within the specified window by using the initial branch.

Step 3: extract the image feature *Hc* of the search area at the current position by using the search branch.

Step 4: calculate the cross-correlation between the features *Hi* and *Hc* to obtain the target response *R*1.

Step 5: extract the target image feature *Hr* at the specified position and within the specified window by using the update branch.

Step 6: calculate the cross-correlation between the features *Hr* and *Hc* to obtain the target response *R*2, and take *R*2 as the current position tracking result.

Step 7: determine the fusion weight coefficients *a*1 and *a*2, and obtain *Rf* = *a*1*R*1 + *a*2*R*2 as the new input of the update branch.

Step 8: move the window, repeat steps 3–7 until the velocity spectrum of this position is traversed, and then end the current position task.

Step 9: move the position, repeat steps 2–8 until the velocity spectra of all positions are traversed, and then the whole task is completed.
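A condensed PyTorch sketch of Steps 2–8 for a single position. The shared backbone `net`, the list of search windows, the fixed fusion weights `a1`/`a2` (the paper determines them from image similarity), and the `crop_around_peak` helper are illustrative assumptions, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def xcorr(template_feat, search_feat):
    """Cross-correlate template features over search features (response map)."""
    return F.conv2d(search_feat, template_feat)

def crop_around_peak(img, response, size):
    """Hypothetical helper: cut a size x size window centered on the response
    peak; the stride mapping back to image coordinates is simplified away."""
    _, _, _, rw = response.shape
    py, px = divmod(torch.argmax(response).item(), rw)
    _, _, H, W = img.shape
    y0 = min(max(py, 0), H - size)  # clamp the window inside the image
    x0 = min(max(px, 0), W - size)
    return img[:, :, y0:y0 + size, x0:x0 + size]

def track_position(net, init_target, windows, a1=0.5, a2=0.5):
    """Steps 2-8 for one position; a1/a2 stand in for Step 7's
    similarity-based weighting."""
    Hi = net(init_target)              # Step 2: initial-branch features
    update_target = init_target        # update branch starts from the initial target
    results = []
    for search_img in windows:         # Step 8: slide the window over this position
        Hc = net(search_img)           # Step 3: search-branch features
        R1 = xcorr(Hi, Hc)             # Step 4: response against the initial target
        Hr = net(update_target)        # Step 5: update-branch features
        R2 = xcorr(Hr, Hc)             # Step 6: tracking result for this window
        results.append(R2)
        Rf = a1 * R1 + a2 * R2         # Step 7: fused response
        # Step 7 (cont.): re-center the update target on the fused peak
        update_target = crop_around_peak(search_img, Rf, init_target.shape[-1])
    return results
```

Running `track_position` once per position, as in Step 9, completes the traversal of all velocity spectra.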
