#### *4.3. Tracklet Growing*

If the frame gap between $T_m^{t_2}$ and $T_n^{t_4}$ was small, the variations in appearance and motion from $T_m^{t_2}$ to $T_n^{t_4}$ were not obvious, and the PAN could work well. Otherwise, a long frame gap brought large variations in appearance and motion, which may reduce the performance. PAN considers more local elements of the tracklet to enhance the performance. To make tracklet association more reliable, it is effective to reduce the time interval between tracklets in the sliding windows as much as possible. Therefore, a tracklet growing process was used to extend each tracklet with estimated bounding boxes in frames where detections were missing. It contained forward and backward growth.

To extend the tracklet $T_m^{t_2}$ forward, the center position $\mathbf{p}_{f1}(T_m^{t_2}) = (\hat{x}, \hat{y})$ in frame $t_2 + 1$ was first estimated by quadratic fitting. Then, the optimal estimated bounding box was searched for as follows:

$$\begin{aligned} d^* &= \arg\min_{d \in \mathcal{C}} \left\| \mathbf{H}(T_m^{t_2}(e)) - \mathbf{H}(d) \right\|_2^2 \\ \text{s.t.} \quad & \left\| \mathbf{H}(T_m^{t_2}(e)) - \mathbf{H}(d) \right\| \le \varepsilon_1 \end{aligned} \tag{14}$$

where $\mathcal{C}$ is the set of candidate bounding boxes, whose center positions $x$ and $y$ are sampled according to the distribution $G(0, \sigma_m)$ and whose size is equal to that of $T_m^{t_2}(e)$. $\mathbf{H}(\cdot)$ denotes the color histogram of a bounding box, here of the detection $T_m^{t_2}(e)$ and of each candidate $d$, and $\varepsilon_1$ bounds the allowed histogram distance. The goal was to find the estimate most similar to $T_m^{t_2}(e)$. If the optimal estimate $d_o^{t_2+1}$ (the solution of Equation (14)) was found, a conflict check was also required to avoid false alarms. If the overlap between $d_o^{t_2+1}$ and an existing detection $d_i^{t_2+1}$ exceeded the threshold, the forward growth of $T_m^{t_2}$ stopped. Otherwise, $T_m^{t_2}$ was updated to $T_m^{t_2+1}$ with $d_o^{t_2+1}$, and the growing process continued to frame $t_2 + 2$. The backward extension was similar to the forward process. For isolated tracklets, random sampling was used to form the candidate estimates. After this compensation for missing detections, the extended tracklets improved the discrimination performance of PAN, and more reliable associations could be made.
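One forward-growth step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: frames are taken to be H×W×3 `uint8` arrays, boxes are `(x, y, w, h)` tuples with a top-left corner, $G(0, \sigma_m)$ is read as a zero-mean Gaussian offset around the predicted center, and the overlap in the conflict check is measured by intersection over union. The names `predict_center`, `color_hist`, `search_estimate`, `has_conflict`, and the parameters `bins`, `n_candidates`, and `max_overlap` are hypothetical.

```python
import numpy as np

def predict_center(frames, centers, next_frame):
    """Fit x(t) and y(t) with polynomials of degree <= 2 and evaluate at next_frame."""
    t = np.asarray(frames, dtype=float)
    c = np.asarray(centers, dtype=float)
    deg = min(2, len(t) - 1)           # lower degree for very short tracklets
    t0 = t[-1]                         # shift the time axis for numerical stability
    fx = np.polyfit(t - t0, c[:, 0], deg)
    fy = np.polyfit(t - t0, c[:, 1], deg)
    dt = next_frame - t0
    return np.array([np.polyval(fx, dt), np.polyval(fy, dt)])

def color_hist(image, box, bins=8):
    """Normalized per-channel histogram of a box; None if the box lies off the image."""
    x, y, w, h = box
    rows, cols = image.shape[:2]
    x0, x1 = np.clip(np.round([x, x + w]), 0, cols).astype(int)
    y0, y1 = np.clip(np.round([y, y + h]), 0, rows).astype(int)
    patch = image[y0:y1, x0:x1].reshape(-1, image.shape[2])
    if patch.shape[0] == 0:
        return None
    hist = np.concatenate([
        np.histogram(patch[:, ch], bins=bins, range=(0, 256))[0]
        for ch in range(patch.shape[1])
    ]).astype(float)
    return hist / hist.sum()

def search_estimate(image, ref_hist, center, size, sigma_m, eps1,
                    n_candidates=200, rng=None):
    """Eq. (14): candidate with the smallest squared histogram distance within eps1."""
    rng = np.random.default_rng() if rng is None else rng
    w, h = size                        # same size as the last tracklet element
    offsets = rng.normal(0.0, sigma_m, size=(n_candidates, 2))  # assumed Gaussian
    best_box, best_cost = None, np.inf
    for dx, dy in offsets:
        box = (center[0] + dx - w / 2, center[1] + dy - h / 2, w, h)
        hist = color_hist(image, box)
        if hist is None:
            continue
        dist = np.linalg.norm(ref_hist - hist)
        if dist <= eps1 and dist ** 2 < best_cost:
            best_box, best_cost = box, dist ** 2
    return best_box                    # None means no feasible estimate

def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)

def has_conflict(estimate, detections, max_overlap):
    """True if the estimate overlaps an existing detection too much (growth stops)."""
    return any(iou(estimate, d) > max_overlap for d in detections)
```

In this sketch, growth continues to the next frame only when `search_estimate` returns a box and `has_conflict` is false for the detections already present in that frame. Backward growth would apply the same steps with the time axis reversed.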

#### *4.4. Tracklet Association in Sliding Windows*

Tracklet association was the last module in MOT and generated the final trajectories of objects. The main task was to link tracklets belonging to the same object into a complete trajectory based on the similarities among tracklets. Solutions such as min-cost networks, energy minimization, successive shortest paths, and the Hungarian algorithm are widely used to generate tracking results. Global optimization is an ideal scheme because previous decisions can be revised to achieve the overall optimal result. In cases where it is difficult to distinguish objects, this dynamic scheme can achieve better tracking performance than a greedy strategy. Similar to the learning-based feature extraction tracking method [15], network flow methods were no longer used to get the tracking result. The MAP problem shown in Equation (10) was directly mapped to a generalized linear assignment:

$$\begin{aligned} \max_{L} \quad & \sum_{i=1}^{N} \sum_{j=1}^{N} \Lambda(T_i, T_j) L_{ij} \\ \text{s.t.} \quad & \sum_{i=1}^{N} L_{ij} \le 1; \quad \sum_{j=1}^{N} L_{ij} \le 1 \end{aligned} \tag{15}$$

To solve Equation (15), the similarity $\Lambda(T_i, T_j)$ between tracklets was used; it is equal to the linking probability, which is mainly based on PAN features. $\Lambda(T_i, T_j)$ was computed by Equation (13). However, PAN features cannot be extracted from tracklets with fewer than two elements. For this particular case, $\Lambda(T_i, T_j)$ degenerated into the traditional weighted combination of appearance and motion. $L_{ij}$ is the association indicator, where 1 indicates connection and 0 means disconnection. The constraints guaranteed the uniqueness of each association. Given the better discriminative ability of PAN, the similarity matrix $\mathbf{\Lambda}$ was normalized, and Equation (15) was solved by a greedy iterative algorithm.
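A greedy iterative solution of Equation (15) can be sketched as below. The text does not specify the exact greedy rule or the normalization, so this is only one plausible reading: the matrix is assumed to be already normalized to $[0, 1]$, the largest remaining similarity is linked at each iteration, and the corresponding row and column are then removed so that both constraints hold. The function name `greedy_association` and the stopping threshold `min_similarity` are hypothetical.

```python
import numpy as np

def greedy_association(similarity, min_similarity=0.0):
    """Return a 0/1 matrix L with at most one link per row and per column."""
    S = np.array(similarity, dtype=float)   # copy, so the input is not modified
    np.fill_diagonal(S, -np.inf)            # a tracklet is never linked to itself
    L = np.zeros(S.shape, dtype=int)
    while True:
        i, j = np.unravel_index(np.argmax(S), S.shape)
        if S[i, j] <= min_similarity:       # nothing worth linking is left
            break
        L[i, j] = 1                         # link tracklet i to tracklet j
        S[i, :] = -np.inf                   # tracklet i already has a successor
        S[:, j] = -np.inf                   # tracklet j already has a predecessor
    return L

# Three tracklets in temporal order: 0 -> 1 -> 2 is the expected chain.
Lambda = [[0.0, 0.9, 0.3],
          [0.0, 0.0, 0.8],
          [0.0, 0.0, 0.0]]
links = greedy_association(Lambda)
```

Unlike a global solver such as the Hungarian algorithm, this greedy loop never revises an earlier link, so it relies on the similarities being discriminative enough for the largest entry to be the correct one.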
