**1. Introduction**

Visual target tracking is a branch of computer vision, and thanks to the development of deep learning techniques, especially the application of neural networks [1], target tracking has entered a new phase. In the tracking task the target can be arbitrary, and traditional trackers designed around hand-crafted features [2] model such targets only moderately well. Deep features, by contrast, have a powerful generalization ability and can model all kinds of targets well, so deep-feature-based trackers [3–5] have achieved excellent results in recent years.

Although the existing deep-feature-based trackers perform well, we find that pre-trained deep features still introduce some interference when modeling arbitrary targets. First, because the tracked target is arbitrary, the dataset used to train the deep feature model may not contain such targets; the model has then never learned information about them, and when extracting target features it can only extrapolate from what it has seen, which brings considerable uncertainty and more disturbance into the target model. Second, even when the deep feature model has learned such targets, a general tracker that extracts target features from only the last layer or layers introduces further disturbing factors because of the huge amount of data involved. Finally, existing pre-trained deep feature models are built mainly for the object recognition task, whose goal is to identify all similar targets appearing in each frame. The target tracking task is different: it must identify the same target in subsequent frames, so a tracker based on pre-trained features may fail when faced with interference from similar targets in the same frame.

**Citation:** Zhao, Y.; Zhang, J.; Duan, R.; Li, F.; Zhang, H. Lightweight Target-Aware Attention Learning Network-Based Target Tracking Method. *Mathematics* **2022**, *10*, 2299. https://doi.org/10.3390/math10132299

Academic Editor: Tao Zhou

Received: 12 May 2022; Accepted: 26 June 2022; Published: 30 June 2022

Some trackers use a hand-designed lightweight network as a memory module and update its parameters with the target's appearance in every frame, achieving good appearance memory performance. In this paper, a lightweight target-aware attention learning network is designed to learn the most effective channel features of the target online: using the target information in the first-frame template, it learns a set of weighted channel features and recombines them into a compact and effective deep representation that distinguishes the object from the background better than the pre-trained features do. At the same time, a new attention learning loss function is developed, and the proposed network is trained with the Adam optimization method. Unlike other methods, the lightweight network designed in this paper requires no complex parameters and is easy to implement: it learns the most salient features from the reliable information in the first frame alone and does not require large temporary memory, which benefits the efficient use of hardware resources. Finally, the lightweight target-aware attention learning network is unified into the Siamese tracking network framework to achieve effective target tracking. Figure 1 shows that our tracker yields better tracking performance than the compared trackers.
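The channel-reweighting idea described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-channel importance is approximated here by global average pooling of the template feature followed by a softmax, standing in for the weights that the proposed attention network would learn online; the array shapes and function names are assumptions for the example.

```python
import numpy as np

def target_aware_channel_weights(template_feat):
    """Estimate per-channel importance from the first-frame template.

    template_feat: (C, H, W) deep feature map of the template region.
    Channels whose activations respond strongly on the target are taken
    as informative; importance is approximated by global average pooling
    plus a numerically stable softmax (a simplification of the learned
    attention weights described in the text).
    """
    energy = template_feat.mean(axis=(1, 2))     # (C,) pooled activation
    exp = np.exp(energy - energy.max())          # stable softmax
    return exp / exp.sum()

def reweight(search_feat, weights):
    """Recombine the channels of the search-region feature with the
    target-aware weights, yielding a compact representation in which
    target-relevant channels dominate."""
    return search_feat * weights[:, None, None]

# Hypothetical feature maps from a pre-trained backbone.
rng = np.random.default_rng(0)
template = rng.standard_normal((256, 7, 7))      # first-frame template
search = rng.standard_normal((256, 31, 31))      # search region

w = target_aware_channel_weights(template)
z = reweight(search, w)
print(w.shape, z.shape)
```

In the full tracker, the reweighted feature would then be matched against the template inside the Siamese framework (e.g., by cross-correlation) to localize the target.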

**Figure 1.** Comparison of our tracker with other trackers for Bolt (**top**), Basketball (**bottom**).

The main contributions of this article are summarized as follows:

