## 3.4.2. Single Instance

If the object occurs only once in every frame in which it appears, then by Hypothesis 1 the *id* of object *O* at frame *n* is given by Equation (6):

$$O_{\mathit{id}}^{n} = \begin{cases} \mathit{id} = 1, & n = 0 \\ O_{\mathit{id}'}^{n}, & S_{a}^{n} > \mathit{threshold}_{a} \\ \mathit{id} + 1, & S_{a}^{n} < \mathit{threshold}_{a} \end{cases} \quad \text{where } a = 1, \dots, n - 1 \tag{6}$$

where $O_{\mathit{id}}^{n}$ is the object *id* at frame *n*, and $S_{a}^{n}$ is the context-similarity measure between frames *a* and *n*, as calculated in Equation (5).
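The case structure of Equation (6) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `assign_single_instance_ids`, the `similarity` callback (standing in for the measure of Equation (5)) and the single scalar `threshold` (in place of the per-frame threshold of Equation (6)) are assumptions made for illustration. The sketch also assumes that the most similar earlier frame decides the match, that all earlier frames are candidates, and that a similarity exactly equal to the threshold yields a new *id*; Equation (6) does not state these details.

```python
from typing import Callable, List


def assign_single_instance_ids(
    num_frames: int,
    similarity: Callable[[int, int], float],
    threshold: float,
) -> List[int]:
    """Assign one object id per frame, following the cases of Equation (6).

    `similarity(a, n)` is a hypothetical stand-in for the context-similarity
    measure between an earlier frame `a` and the current frame `n`.
    """
    ids: List[int] = []
    next_id = 1
    for n in range(num_frames):
        if n == 0:
            ids.append(next_id)  # first frame: id = 1
            continue
        # Assumption: the most similar earlier frame decides the match.
        best_a = max(range(n), key=lambda a: similarity(a, n))
        if similarity(best_a, n) > threshold:
            ids.append(ids[best_a])  # similar context: reuse that frame's id
        else:
            next_id += 1
            ids.append(next_id)  # no similar context: issue a new id
    return ids
```

With such a function, a frame whose context matches an earlier shot inherits that shot's *id*, while a frame with no sufficiently similar predecessor starts a new one.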

## 3.4.3. Multiple Instance

If the object occurs more than once in a frame, we propose a graph-based approach to correctly localise the object in the frame. An example of this problem is shown in Figure 8. When multiple objects of the same class are present, it is important to know not only whether the shots/LSUs of the frames are similar, but also the spatial position of each object in the frame, so that it can be re-identified correctly.

Therefore, based on the bounding box co-ordinates of the detected objects, a location graph is estimated using spatial distances between the objects, as shown in Figure 9. The idea here is to generate and compare the graphs such that the IDs of the objects can be matched.
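As a rough illustration of this idea, the following Python sketch builds a location graph from bounding boxes and compares two such graphs. The text above does not specify how the graphs are compared, so everything beyond the use of spatial distances is an assumption: the box format `(x_min, y_min, x_max, y_max)`, the use of box centroids as nodes, the brute-force search over node orderings, and the names `location_graph` and `match_ids` are all hypothetical.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def centroid(box: Box) -> Tuple[float, float]:
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def location_graph(boxes: Sequence[Box]) -> List[List[float]]:
    """Dense graph: entry [i][j] is the distance between centroids i and j."""
    centres = [centroid(b) for b in boxes]
    return [[math.dist(p, q) for q in centres] for p in centres]


def match_ids(
    ref_boxes: Sequence[Box], ref_ids: Sequence[int], new_boxes: Sequence[Box]
) -> List[int]:
    """Assign reference ids to new_boxes by comparing location graphs.

    Assumption: both frames contain the same number of objects, and the
    best match is the node ordering whose edge weights differ least.
    """
    if len(ref_boxes) != len(new_boxes) or len(ref_boxes) != len(ref_ids):
        raise ValueError("both frames need the same number of objects and ids")
    n = len(new_boxes)
    g_new = location_graph(new_boxes)
    best_perm: Tuple[int, ...] = tuple(range(n))
    best_cost = math.inf
    # Brute force over orderings; only practical for a handful of objects.
    for perm in itertools.permutations(range(n)):
        g_ref = location_graph([ref_boxes[k] for k in perm])
        cost = sum(
            abs(g_ref[i][j] - g_new[i][j])
            for i in range(n)
            for j in range(i + 1, n)
        )
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return [ref_ids[k] for k in best_perm]
```

Pairwise distances alone cannot distinguish two objects from each other, nor resolve symmetric layouts, so a practical comparison would likely need additional cues such as the absolute positions of the nodes.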
