*Spatial Distance Estimation*

Although the 2-D Euclidean distance measure works well between frames with similar angles across similar LSUs, there are cases where the angle and zoom change across similar LSUs. The topological information contained within the frame is also lost, making it impossible to obtain a realistic distance estimate. To compensate for the lost topological information, we propose using depth maps, in combination with the location graph, to estimate a more realistic spatial distance between the objects in a frame. To obtain depth information, we use Dense Depth [24], pre-trained on the NYU Depth V2 dataset [25]. The estimated depth is used as a third dimension, and the Euclidean measure is recalculated accordingly, as shown in Figure 10.

**Figure 9.** Spatial location graph generated for a frame using the centre of the bounding box coordinates and Euclidean distance between them.

**Figure 10.** Comparison of frame 26070 with its estimated depth. Using the depth and distance measures, the actual distance between the two objects can be estimated.

Let *x* and *y* be the centre points of the objects *O<sub>x</sub>* and *O<sub>y</sub>*, respectively, in a frame. Then the distance between them is given by:

$$distance = |\mathbf{x} - \mathbf{y}|\tag{7}$$

The estimated depth has a range of values that are clipped between 10 and 1000, where 10 is the closest and 1000 is the farthest. If the depth values at points *x* and *y* are denoted *δ*(*x*) and *δ*(*y*), the depth between the objects can be estimated by:

$$depth = |\delta(\mathbf{x}) - \delta(\mathbf{y})|\tag{8}$$

Finally, from Equations (7) and (8), the actual distance between the objects can be calculated as follows:

$$D_x^y = \sqrt{(distance)^2 + (depth)^2} \tag{9}$$
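Equations (7)–(9) can be combined into a single helper. The sketch below assumes the depth map is a 2-D array aligned with the frame, indexed as `depth_map[row, col]`, and that centre points are given as `(col, row)` pixel coordinates; `spatial_distance` is an illustrative name, not from the original implementation.

```python
import numpy as np

def spatial_distance(x, y, depth_map):
    """Estimate the 3-D distance between two object centres (Equations 7-9).

    x, y      : (col, row) centre points of the two bounding boxes
    depth_map : 2-D array of estimated depth values
    """
    # Clip depth to the stated range: 10 = closest, 1000 = farthest
    depth_map = np.clip(depth_map, 10, 1000)
    # Eq. (7): 2-D Euclidean distance between the centre points
    distance = np.linalg.norm(np.asarray(x, float) - np.asarray(y, float))
    # Eq. (8): absolute difference of the depths at the two centres
    depth = abs(float(depth_map[x[1], x[0]]) - float(depth_map[y[1], y[0]]))
    # Eq. (9): treat depth as a third dimension
    return float(np.hypot(distance, depth))
```

For example, two centres 40 px apart whose depths differ by 30 units yield an estimated distance of 50, the familiar 3-4-5 relation scaled by 10.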

## *Spatial Location Graph*

For every frame with multiple-instance objects, the spatial location graph is estimated from the pairwise distances between the objects in the frame, using Equation (9). Let *G<sub>i</sub>*(*O*, *D*) and *G<sub>j</sub>*(*O*, *D*) be the graphs with objects as nodes and their distances as edges for two similar frames *i* and *j*. The objects in frame *j* are matched with the objects in frame *i* by comparing the inter-object distances in *j* and *i* such that the difference between the distances is always minimal. For instance, suppose frame *i* has four objects, *O<sub>i1</sub>*, *O<sub>i2</sub>*, *O<sub>i3</sub>*, *O<sub>i4</sub>*, of which *O<sub>i1</sub>* and *O<sub>i2</sub>* belong to the same class, and *D<sub>i</sub>*<sup>12</sup>, *D<sub>i</sub>*<sup>13</sup> denote the distances between the corresponding object pairs. Then, to re-identify object *O*<sub>1</sub> in frame *j*, the sub-graph distances of *G<sub>i</sub>*[*O*<sub>1</sub>] and *G<sub>i</sub>*[*O*<sub>2</sub>] are compared with those of *G<sub>j</sub>*[*O*<sub>1</sub>]; *O<sub>j1</sub>* is deduced to be the same as the object in *i* for which the difference between distances is minimal. The overall object re-ID algorithm is shown in Algorithm 2, while the complete re-ID pipeline is shown in Figure 11.
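The sub-graph comparison above can be sketched as a small assignment search. This is a minimal illustration, not the paper's implementation: `graph_compare` here takes the candidate same-class objects in both frames, their pairwise-distance tables (Equation (9) values stored in dicts), and a set of already-identified anchor objects, and returns the matching that minimises the total difference between corresponding edge lengths.

```python
from itertools import permutations

def graph_compare(cands_i, cands_j, dist_i, dist_j, anchors):
    """Match same-class objects of frame j to those of frame i.

    cands_i, cands_j : IDs of the ambiguous same-class objects in frames i and j
    dist_i, dist_j   : pairwise distances, e.g. dist_i[(o, a)] = D (Eq. 9)
    anchors          : objects already identified in both frames
    Returns {object_in_j: object_in_i} with minimal total edge-length difference.
    """
    best, best_cost = None, float("inf")
    # Try every assignment of frame-i candidates to frame-j candidates
    for perm in permutations(cands_i, len(cands_j)):
        cost = sum(abs(dist_j[(oj, a)] - dist_i[(oi, a)])
                   for oj, oi in zip(cands_j, perm) for a in anchors)
        if cost < best_cost:
            best, best_cost = dict(zip(cands_j, perm)), cost
    return best
```

The brute-force permutation search is adequate here because a frame rarely contains more than a handful of same-class instances; a Hungarian-style solver would be the natural replacement at larger scales.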

```
Algorithm 2: Multi-object re-ID.
   Input:  objects list per frame, shot boundaries and LSU similarity
   Output: object IDs per frame
   for object in object_list do
       if count(object) across all frames <= 1 then
           single_instance.append(object)
       else
           multi_instance.append(object)
   for object in single_instance do
       id = 1
       for each class do
           for n = 1:N do
               if n == 1 then
                   object_id = id
                   id = id + 1
               else if similarity(frame_n, frame_1:frame_n-1) > threshold then
                   let frame a be the frame most similar to frame n
                   object_id = O_a.id
               else
                   object_id = id
                   id = id + 1
   for object in multi_instance do
       id = 1
       for each class do
           for n = 1:N do
               if n == 1 then
                   object_id = id
                   id = id + 1
               else if similarity(frame_n, frame_1:frame_n-1) > threshold then
                   let frame a be the frame most similar to frame n
                   object_list = graph_compare(G_n[O_class], G_a[O_class])
                   for object_id in object_list do
                       assign object_id
               else
                   object_id = id
                   id = id + 1
```
**Figure 11.** Proposed pipeline for multi-object re-ID. Given the input video, we estimate the LSUs and the objects per frame. Based on the number of occurrences of each object in a frame, the objects are categorised as single- and multi-instance objects. Subsequently, using the inter-frame similarity and graph-based algorithms, object IDs are created and visualised.
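The single-instance branch of Algorithm 2 can be sketched as runnable code. This is an illustrative reconstruction under simplifying assumptions: each frame is a list of detected class labels, and `similarity(n, m)` is a stand-in for the LSU-based inter-frame similarity described earlier; the function name and signature are hypothetical.

```python
def assign_single_instance_ids(frames, similarity, threshold=0.8):
    """Sketch of the single-instance branch of Algorithm 2.

    frames     : list of per-frame object-class lists, e.g. [["car"], ["car"]]
    similarity : similarity(n, m) -> score between frames n and m (assumed given)
    Returns a per-frame {class: id} assignment.
    """
    next_id = 1
    assignments = []
    for n, classes in enumerate(frames):
        frame_ids = {}
        for cls in classes:
            if n == 0:
                # First frame: every object starts a new ID
                frame_ids[cls] = next_id
                next_id += 1
            else:
                # Find the most similar earlier frame containing this class
                scores = [(similarity(n, m), m) for m in range(n)
                          if cls in assignments[m]]
                best_score, best_frame = max(scores, default=(0.0, None))
                if best_score > threshold:
                    # Reuse the ID from the most similar frame
                    frame_ids[cls] = assignments[best_frame][cls]
                else:
                    frame_ids[cls] = next_id
                    next_id += 1
        assignments.append(frame_ids)
    return assignments
```

The multi-instance branch follows the same control flow but resolves ambiguous same-class objects through the sub-graph comparison of the spatial location graphs instead of reusing a single ID.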
