**4. Experiment and Analysis**

Our tracker is implemented in PyTorch on a PC with an Intel i7-9700 3.0 GHz CPU and a single NVIDIA GeForce RTX 2060 GPU. The proposed algorithm uses the VGG-16 [14] network as the feature extractor for both the target and the search region, and the outputs of the Conv4-1 and Conv4-3 layers are used for target appearance modeling. Each of these outputs has 512 channels. The features then pass through the lightweight network, which assigns a different weight to each feature channel, and the number of channels is reduced to 380. The kernel of the lightweight target-aware attention learning network is set to match the size of the target template. This network is trained online with the attention learning loss function only in the first frame of each video sequence, with a maximum of 100 iterations, a momentum of 0.9, and a convergence loss threshold of 0.01. To handle scale variations, we search for the object over three scales (0.957, 1, 1.047) and update the scale using the scale weights (0.99, 1, 1.005). We evaluate the proposed algorithm on the OTB-50 [15], OTB-100 [16], TC-128 [17], UAV123 [18], VOT2016 [19], and LaSOT [20] datasets.
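The channel reduction and the three-scale search described above can be illustrated with a minimal NumPy sketch. It is not the implementation used in our experiments. It assumes that the 380 channels are the ones with the largest learned weights and that each scale weight multiplies the peak response of the corresponding scale before the best scale is chosen. The function names `select_channels` and `pick_scale` and their argument layouts are hypothetical.

```python
import numpy as np


def select_channels(features, channel_weights, keep=380):
    """Keep the `keep` highest-weighted channels of a (C, H, W) feature map.

    Assumption: channels are ranked by their learned weight and the
    retained channels are rescaled by that weight.
    """
    top = np.argsort(channel_weights)[::-1][:keep]
    top = np.sort(top)  # preserve the original channel order
    weighted = features[top] * channel_weights[top, None, None]
    return weighted, top


def pick_scale(response_maps,
               scale_factors=(0.957, 1.0, 1.047),
               scale_weights=(0.99, 1.0, 1.005)):
    """Choose the scale whose weighted peak response is largest.

    Assumption: one response map per scale, and each scale weight
    multiplies the peak of the matching response map.
    """
    peaks = np.array([r.max() for r in response_maps])
    weighted_peaks = peaks * np.asarray(scale_weights)
    best = int(np.argmax(weighted_peaks))
    return scale_factors[best], best
```

For example, a feature map of shape `(512, H, W)` is reduced to `(380, H, W)` by `select_channels`, and `pick_scale` returns the scale factor to apply to the current target size.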

#### *4.1. Ablation Studies*

To validate the proposed method, we conduct an ablation study on the OTB-100 dataset using one-pass evaluation (OPE). Our algorithm consists of a base Siamese tracker and the proposed lightweight target-aware attention network. Figure 5 shows the precision and success rate of the baseline without the proposed attention network and of our full method.
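For reference, the two metrics can be computed per sequence as in the sketch below. It follows the common OTB convention, which is assumed here rather than stated in this section: precision is the fraction of frames whose center location error is at most 20 pixels, and the success score is the area under the curve of the fraction of frames whose overlap (IoU) exceeds a threshold swept from 0 to 1. Boxes are assumed to be in `(x, y, w, h)` format, and the function names are illustrative.

```python
import numpy as np


def center_error(pred, gt):
    """Euclidean distance between box centers; boxes are (x, y, w, h)."""
    pred_c = pred[:, :2] + pred[:, 2:] / 2.0
    gt_c = gt[:, :2] + gt[:, 2:] / 2.0
    return np.linalg.norm(pred_c - gt_c, axis=1)


def iou(pred, gt):
    """Per-frame intersection over union of (x, y, w, h) boxes."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-12)


def precision_at(pred, gt, threshold=20.0):
    """Fraction of frames with center error within `threshold` pixels."""
    return float(np.mean(center_error(pred, gt) <= threshold))


def success_auc(pred, gt, thresholds=np.linspace(0.0, 1.0, 21)):
    """Mean of the success curve over the overlap thresholds."""
    overlaps = iou(pred, gt)
    curve = np.array([np.mean(overlaps > t) for t in thresholds])
    return float(curve.mean())
```

Under one-pass evaluation, the tracker is initialized with the ground-truth box in the first frame and run once through the sequence; the predicted boxes are then compared with the annotations using functions like these.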

As Figure 5 shows, adding the proposed attention network improves both the precision and the success rate of the tracker. The network mines online how well each channel of the deep target features represents the target, and thereby removes redundant features and part of the background information, which leads to better tracking performance. These results confirm that the proposed attention network contributes to the performance of the tracking algorithm.

**Figure 5.** Ablation study on the OTB-100 dataset.
