### 5.1. Quantitative Results

#### 5.1.1. Shot Boundary Detection

To evaluate shot boundary detection, we compared our framework with a state-of-the-art CNN-based fast shot boundary detection method [20], using 10 random Internet Archive videos from the RAI dataset. Table 1 compares the precision, recall, and F-score of our pipeline with this algorithm. The experimental results show that the state-of-the-art model performs extremely well on normal transitions but comparatively poorly on complex transitions, whereas our approach obtained similar precision values for both complex and normal transitions. On average, our approach outperformed the state-of-the-art with an F-score of 0.92.
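As an illustration, the boundary-level precision, recall, and F-score reported in Table 1 can be computed along the following lines. This is a minimal sketch: the frame tolerance and the greedy matching of detected to ground-truth boundaries are our assumptions, not details taken from the evaluation protocol above.

```python
def boundary_prf(detected, ground_truth, tolerance=2):
    """Precision/recall/F-score for shot boundaries given frame indices.

    A detected boundary counts as a true positive if it lies within
    `tolerance` frames of an unmatched ground-truth boundary
    (tolerance value is an assumption for illustration).
    """
    unmatched = set(ground_truth)
    tp = 0
    for d in detected:
        match = next((g for g in unmatched if abs(g - d) <= tolerance), None)
        if match is not None:
            tp += 1
            unmatched.discard(match)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score
```

With `boundary_prf([10, 50, 90], [10, 51, 200])`, the first two detections match within the tolerance and the third does not, giving precision, recall, and F-score of 2/3 each.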

#### 5.1.2. LSU Boundary Detection

We also evaluated LSU boundary detection by comparing the results against two scene detection algorithms: [26], which integrates a variety of visual and audio features in a Shot Transition Graph (STG); and [27], which detects scenes using a spectral clustering algorithm and a Deep Siamese network-based model. We used the same 10 videos from the RAI dataset for validation. Table 2 reports the coverage and overflow measures for these methods. Our experimental results indicate that the model in [26] has the highest coverage, 0.8, but also a very high overflow. Ref. [27] provides a comparatively better overflow result and better overall performance than [26]. Although our approach achieved a lower coverage measure, it obtained a very good overflow measure, resulting in a higher *F-score*: with an average *F-score* of 0.74, our approach outperformed the other methods by more than 10%.
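The frame-level coverage and overflow measures in Table 2 can be sketched as follows. This follows one common formulation of the metrics for scene segmentation (coverage: fraction of a ground-truth scene covered by its best-matching detected scene; overflow: spill of overlapping detected scenes into neighbouring ground-truth scenes); the length-weighted averaging and neighbour normalisation here are our assumptions, not necessarily the exact protocol used above.

```python
def _frames(intervals):
    # Expand (start, end) inclusive intervals into frame sets.
    return [set(range(a, b + 1)) for a, b in intervals]

def coverage_overflow(detected, ground_truth):
    """Length-weighted coverage and overflow over ground-truth scenes.

    `detected` and `ground_truth` are lists of (start, end) frame
    intervals, inclusive, in temporal order.
    """
    det = _frames(detected)
    gt = _frames(ground_truth)
    total = sum(len(t) for t in gt)
    coverage = overflow = 0.0
    for i, t in enumerate(gt):
        # Coverage: best single detected scene's overlap with t.
        c_t = max((len(s & t) for s in det), default=0) / len(t)
        # Overflow: frames of detected scenes overlapping t that
        # spill into the adjacent ground-truth scenes.
        neighbours = set()
        if i > 0:
            neighbours |= gt[i - 1]
        if i + 1 < len(gt):
            neighbours |= gt[i + 1]
        spill = sum(len((s - t) & neighbours) for s in det if s & t)
        o_t = min(spill / len(neighbours), 1.0) if neighbours else 0.0
        w = len(t) / total
        coverage += w * c_t
        overflow += w * o_t
    return coverage, overflow
```

A combined score can then be formed as the harmonic mean of coverage and (1 − overflow), analogous to the *F-score* reported above; a perfect segmentation yields coverage 1.0 and overflow 0.0.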

**Table 1.** Performance comparison for shot detection using boundary-level metrics.


**Table 2.** Performance comparison for LSU detection using frame-level metrics.


#### 5.1.3. Object Re-ID

To evaluate object re-ID, we applied the algorithm to 10 random episodes from Season 4 of *New Girl* and 10 random episodes from Season 3 of *Friends*. The dataset does not possess ground-truth labels, so the approach was validated manually: if an object was re-identified correctly, it was marked *True*; otherwise, it was marked *False*. The *True* and *False* values were consolidated per object class across all episodes of *New Girl* and *Friends* separately, and only object classes with a minimum of 20 occurrences across all episodes of a SOAP were used to estimate accuracy. The accuracy was then calculated for each SOAP separately. Table 3 shows the accuracy results for object re-ID applied to the two SOAP series. These experimental results show that our object re-ID algorithm performs at an average accuracy of 0.87.
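The per-class aggregation described above can be sketched as follows, assuming the manual validation produces a list of (object class, correct?) records per SOAP. The record format and function name are illustrative assumptions.

```python
from collections import defaultdict

def per_class_accuracy(records, min_occurrences=20):
    """Accuracy per object class, keeping only classes with enough samples.

    `records` is an iterable of (object_class, correct) pairs, where
    `correct` is the manually assigned True/False re-ID verdict.
    Classes with fewer than `min_occurrences` records are dropped,
    mirroring the 20-occurrence threshold described above.
    """
    counts = defaultdict(lambda: [0, 0])  # class -> [num_true, num_total]
    for cls, ok in records:
        counts[cls][1] += 1
        if ok:
            counts[cls][0] += 1
    return {cls: t / n for cls, (t, n) in counts.items()
            if n >= min_occurrences}
```

For example, a class with 17 correct verdicts out of 20 occurrences yields an accuracy of 0.85, while a class seen only 5 times is excluded from the table.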


**Table 3.** Performance evaluation of object re-ID.
