**1. Introduction**

The number of surveillance cameras installed indoors and outdoors to monitor various aspects of industrial and public safety is growing rapidly. Previously, watching the cameras and recognizing emergency situations was usually the operator's job: the operator had to activate an alarm or take other action. Even when there were relatively few cameras, recordings of an incident were most often extracted from the archive afterwards, provided information was available about where and when to look. Now that there are so many cameras that an operator cannot monitor them all, video surveillance systems of varying degrees of complexity are in great demand, from systems that merely detect movement in the frame to those that can recognize complex emergency situations [1].

In this paper, the detection of anomalies in video based on a sparse optical flow is considered. The input is a sequence of extracted video frames, and the output is an area, or a set of areas, in the frame where abnormal movement occurs.

There are several approaches to anomaly detection, based on grid patterns [2], global patterns [3], trajectories [4], etc. Each has advantages and disadvantages, but all are relatively computationally expensive. There are somewhat simpler algorithms that perform just as well.

This paper considers an implementation of the method based on the analysis of local binary tracklets [5], which requires detecting feature points and constructing their trajectories using an optical flow method.

**Citation:** Fomin, I.; Rezets, Y.; Smirnova, E. Anomaly Detection on Video by Detecting and Tracking Feature Points. *Eng. Proc.* **2023**, *33*, 34. https://doi.org/10.3390/engproc2023033034

Academic Editors: Askhat Diveev, Ivan Zelinka, Arutun Avetisyan and Alexander Ilin

Published: 19 June 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **2. Related Works**

#### *2.1. Feature Points Detection Methods*

To detect features, algorithms are designed to find points where the image value changes sharply in more than one direction.

The Harris corner detector [6] uses the change of the image within a sliding window of a certain size when the window is shifted relative to a given point in any direction.

GFTT (Good Features To Track) [7] works on the same principle but computes the feature measure differently, defining it as the minimum of the two eigenvalues of the structure matrix of the sliding window: if both are large enough, a window shift strongly changes the image in both directions. FAST [8] compares a pixel with those lying on a circle of radius 3 pixels; the point is a feature if 12 contiguous pixels on the circle differ noticeably from the central one. AGAST [9] uses a binary decision tree so that each point can be checked in as few as two comparisons, significantly speeding up the procedure. MSER [10] creates several binary (black and white) images with different thresholds and uses regions that change little as the threshold changes as features.
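To make the GFTT feature measure concrete, the following is a minimal NumPy sketch (ours, not a production detector): it forms the structure matrix over each sliding window and returns the smaller eigenvalue, which is large only where the image changes in both directions.

```python
import numpy as np

def box_sum(a, win):
    # win x win sliding-window sum via a summed-area table;
    # output[i, j] covers the window with top-left corner (i, j)
    c = np.cumsum(np.cumsum(a, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))
    return c[win:, win:] - c[:-win, win:] - c[win:, :-win] + c[:-win, :-win]

def min_eigen_response(img, win=3):
    # GFTT-style measure: smaller eigenvalue of the structure matrix
    # [[Sxx, Sxy], [Sxy, Syy]] accumulated over each window
    f = img.astype(float)
    Ix = np.gradient(f, axis=1)
    Iy = np.gradient(f, axis=0)
    Sxx = box_sum(Ix * Ix, win)
    Syy = box_sum(Iy * Iy, win)
    Sxy = box_sum(Ix * Iy, win)
    tr = Sxx + Syy
    disc = np.sqrt((Sxx - Syy) ** 2 + 4 * Sxy ** 2)
    return 0.5 * (tr - disc)  # lambda_min for each window position
```

On a synthetic square, the response is high at the corner (both eigenvalues large), near zero along an edge (one eigenvalue large, one small) and zero in flat regions, which is exactly the behavior GFTT thresholds on.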

The SIFT descriptor-based detector [11] uses Gaussian smoothing to simulate different image scales and highlight feature points more effectively on both large and small objects. BRIEF [12] develops the idea of SIFT, significantly simplifying the computation of feature point descriptors. ORB [13] is a newer alternative to SIFT and BRIEF; it uses the FAST detector and a different descriptor to describe feature points effectively. BRISK [14] raises the rate of successful matching even higher across scales, using the AGAST detector and taking into account additional maxima on each octave of the Gaussian pyramid during comparison. The KAZE method [15] avoids the disadvantages of Gaussian filtering and achieves high localization accuracy and distinctiveness. AKAZE [16] speeds up KAZE using the more computationally efficient FED (Fast Explicit Diffusion) framework and uses the rotation- and scale-invariant M-LDB [16] as a descriptor.

#### *2.2. Methods of Local Optical Flow Construction*

To track feature points and, in particular, to build an optical flow between frames, several tracking methods have been developed. The best-known, the Lucas–Kanade algorithm [17], rests on the assumption that the pixel values of an object vary only slightly between frames; using the least squares method, it finds the position that minimizes the discrepancy between pixel values in neighboring frames.
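The core Lucas–Kanade least-squares step can be sketched for a single point as follows (a minimal, single-level illustration in NumPy, ours; real implementations add image pyramids and iterative refinement):

```python
import numpy as np

def lk_step(prev, curr, pt, win=7):
    # One Lucas-Kanade step for one point (y, x): solve the
    # least-squares system  [Ix Iy] d = -It  over a win x win window
    y, x = pt
    r = win // 2
    p = prev.astype(float)
    c = curr.astype(float)
    Ix = np.gradient(p, axis=1)[y - r:y + r + 1, x - r:x + r + 1].ravel()
    Iy = np.gradient(p, axis=0)[y - r:y + r + 1, x - r:x + r + 1].ravel()
    It = (c - p)[y - r:y + r + 1, x - r:x + r + 1].ravel()
    A = np.stack([Ix, Iy], axis=1)
    d, *_ = np.linalg.lstsq(A, -It, rcond=None)  # d = (dx, dy)
    return d
```

For a smooth pattern shifted by one pixel between frames, a single step already recovers the displacement to good accuracy; for larger motions the pyramid and iterations become essential.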

The method of G. Farneback [18] expands the change of image intensities around a feature point in a Taylor series up to the second term, using a weight function to approximate the values of neighboring pixels. Based on an extensive analysis of the available data, a new modification of the Lucas–Kanade method, the RLOF algorithm [19], was created; it uses a modified Hampel estimator with robust characteristics.

#### *2.3. Anomaly Detection Algorithms*

Let us also discuss the existing algorithms for detecting abnormal behavior in a video sequence. Optical flow is widely used directly for behavior analysis [20,21], but it cannot analyze spatial relations. Particle flow [22] copes with the task a little better, but it is unable to fully analyze objects in space. Methods based on local spatio-temporal features, in the form of gradients [23] and motion histograms [24], can be applied to the analysis of poorly structured scenes (crowded scenes, for example). Methods based on tracklets (for example, [25,26]) are well suited to detecting short-term anomalies and analyzing dense crowds. In trajectory methods, tracking algorithms follow the trajectory of an object, which is then analyzed through clustering of string kernels [27], a one-class SVM [28] or semantic scene learning [29]. Methods based on global patterns analyze the entire sequence using a dense optical flow [30]; there are methods based on Gaussian Mixture Models (GMM) [31], models of social interactions [3], Principal Component Analysis (PCA) [32] and several others. Grid-pattern-based methods split the frame into blocks (grid cells) and perform block-based analysis; there are methods based on the reconstruction of sparse textures [2], a motion context descriptor [33] or spatio-temporal Laplacian maps. In [34,35] hierarchical sparse coding is applied, and in [36] multilevel sparse coding is utilized, on top of which an SVM is applied for classification.

#### **3. The Tracklet Analysis Algorithm**

In order to study the influence of feature point detection and tracking algorithms on the quality of anomaly detection, we selected the LBT (Local Binary Tracklet) analysis algorithm [5]. The overall pipeline is shown in Figure 1.

**Figure 1.** Data pipeline of the LBT algorithm.

The input is an ordered set of images extracted from the video sequence. First, feature points are detected using one of the detectors described in Section 2.1. Then, the points are tracked between frames using one of the tracking algorithms (trackers), e.g., Lucas–Kanade or RLOF, described in Section 2.2. From the tracking results, the trajectory of each point is formed sequentially, and its last section, with a length of a given number of points (a parameter), is used as a tracklet for anomaly detection. Tracklets that are too short, as well as those with a sharp change of velocity or direction, are discarded as unreliable: we assume that the objects giving rise to the tracklets have some inertia, so if a point abruptly changes direction or velocity within the analysis window, a tracking failure has occurred. Next, anomaly detection is performed. To achieve this, the frame is evenly divided by a grid into a given number of areas horizontally and vertically (the number of areas is a parameter), and a certain number of characteristic directions is identified for each area. If a new characteristic direction of movement appears in an area, or the magnitude of the characteristic speed of movement changes significantly (the threshold is a parameter), the algorithm marks that area of the frame as abnormal.
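The grid-based decision step can be sketched as follows. This is a simplified illustration of the idea, not the authors' exact LBT implementation; the function names, the 8-bin direction histogram and the speed-ratio threshold are ours.

```python
import numpy as np

def cell_of(pt, shape, rows=8, cols=12):
    # map an (x, y) point to its grid cell (row, col)
    h, w = shape
    r = min(int(pt[1] * rows / h), rows - 1)
    c = min(int(pt[0] * cols / w), cols - 1)
    return r, c

def tracklet_features(tracklet):
    # direction bin (one of 8) and mean speed of a tracklet of (x, y) points
    p = np.asarray(tracklet, dtype=float)
    steps = np.diff(p, axis=0)
    v = steps.mean(axis=0)
    speed = float(np.linalg.norm(steps, axis=1).mean())
    ang = np.arctan2(v[1], v[0]) % (2 * np.pi)
    return int(ang // (2 * np.pi / 8)), speed

def flag_anomalous_cells(tracklets, shape, known, speed_ratio=2.0):
    # `known` maps a cell to (set of characteristic direction bins,
    # characteristic speed). A cell is flagged when a tracklet shows
    # an unseen direction, or a speed far above the characteristic one.
    flagged = set()
    for t in tracklets:
        cell = cell_of(t[-1], shape)
        d, s = tracklet_features(t)
        dirs, ref_speed = known.get(cell, (set(), None))
        if d not in dirs or (ref_speed and s > speed_ratio * ref_speed):
            flagged.add(cell)
    return flagged
```

A tracklet moving in a direction already characteristic for its cell, at a comparable speed, leaves the cell unflagged; a tracklet in a cell with no matching characteristic direction flags it.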

### **4. Experiments**

#### *4.1. Test Datasets*

When selecting test datasets for the analysis of optical flow anomalies, the UMN [37] and UCSD [38] datasets were found. For the experimental study of the algorithm, a video sequence from the UMN dataset was chosen, in which a group of people runs away, simulating panic (the UMN sequence in the tests). A sequence was also selected from the UCSD dataset, in which vehicles are present in a pedestrian zone (the Ped sequence in the tests). Example frames from both sequences are shown in Figure 2.

**Figure 2.** Examples of frames from selected video sequences.

#### *4.2. Quality Metrics*

To evaluate the influence of the algorithms on quality and performance, quality metrics must be defined. Performance was calculated as follows: during the processing of a sequence of frames, the total processing time was measured and then divided by the number of frames in the video sequence. This gives the average duration per frame, from which the number of frames processed per second (FPS) is determined. The following approach was chosen to measure quality. Since the task is to draw attention to a certain moment in the video (or to a certain camera at some point in time), the markup was made for each frame as a whole: a sequence of values was compiled for all frames, indicating whether an anomaly was present or absent in each frame. The output of the algorithm has the same format. It is thus possible to count the correctly detected abnormal frames (TP, true positives), the erroneously detected abnormal frames (FP, false positives, type I errors) and the erroneously rejected abnormal frames (FN, false negatives, type II errors). From these values, precision, recall and the F1-score (1), their harmonic mean, can be calculated.

$$precision = \frac{TP}{TP + FP}, \text{ recall} = \frac{TP}{TP + FN}, \text{ F1} = \frac{2TP}{2TP + FP + FN} \tag{1}$$
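The frame-level counting and the metrics of (1) can be computed directly from the two label sequences; a minimal sketch (function name is ours):

```python
def frame_metrics(gt, pred):
    # gt, pred: per-frame anomaly labels (1 = anomalous, 0 = normal)
    tp = sum(g and p for g, p in zip(gt, pred))          # correctly detected
    fp = sum((not g) and p for g, p in zip(gt, pred))    # false alarms (type I)
    fn = sum(g and (not p) for g, p in zip(gt, pred))    # missed anomalies (type II)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1
```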

It is also possible to build a Precision–Recall Curve (PRC) and compute the area under it (PR-AUC). Methods for plotting this curve and calculating the area under it are available in many packages; in this case, the implementation from the Python library scikit-learn [39] was used.
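For illustration, an equivalent minimal NumPy computation of PR-AUC might look like the following (our own sketch, assuming real-valued anomaly scores and at least one positive label; the experiments themselves used scikit-learn):

```python
import numpy as np

def pr_auc(labels, scores):
    # sweep the threshold over scores in descending order,
    # accumulate TP/FP counts, and integrate precision over recall
    order = np.argsort(-np.asarray(scores, dtype=float))
    y = np.asarray(labels)[order]
    tp = np.cumsum(y)
    fp = np.cumsum(1 - y)
    precision = tp / (tp + fp)
    recall = tp / y.sum()                 # assumes y.sum() > 0
    # start the curve at recall 0 with precision 1
    precision = np.concatenate([[1.0], precision])
    recall = np.concatenate([[0.0], recall])
    dr = recall[1:] - recall[:-1]         # trapezoid rule over recall
    return float(np.sum(dr * (precision[1:] + precision[:-1]) / 2))
```

A scorer that ranks every anomalous frame above every normal one yields an area of 1.0; interleaved scores yield a smaller area.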

Some characterizing parameters can also be proposed for tracklets and feature points. Although only a small set of the last trajectory points of each feature point is used for analysis, the total number of frames over which the point is successfully tracked is calculated for the comparison of algorithms. This indicator is averaged over all frames of the video and over all tracklets, giving the average lifetime of a tracklet, which reflects the overall robustness of tracking with a given method. The number of detected and tracked feature points in each frame is also summed, after removing points whose position does not change and with the detector's point limit set sufficiently high (several thousand, for example). The value is averaged over the number of frames in the sequence to obtain the average number of detected points, which characterizes how well each algorithm detects feature points.
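These two indicators reduce to simple averages; a small sketch (ours, with an illustrative displacement threshold for discarding static points):

```python
import numpy as np

def avg_lifetime(lifetimes):
    # mean number of frames over which each point was successfully tracked
    return float(np.mean(lifetimes))

def avg_moving_points(per_frame_displacements, eps=0.5):
    # per frame, count only points that actually moved (displacement
    # above eps pixels); static detections are discarded, then the
    # counts are averaged over the frames of the sequence
    counts = [sum(np.hypot(dx, dy) > eps for dx, dy in frame)
              for frame in per_frame_displacements]
    return float(np.mean(counts))
```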

#### *4.3. Configuring Algorithm Parameters*

Before analyzing the dependence of performance and quality on the choice of algorithm, the parameters of the algorithms had to be pre-configured. Note that the chosen videos and the algorithms themselves ensure that the parameters are practically independent, so the parameters of the point detection, tracking and anomaly detection algorithms can be configured sequentially.

At the first stage, with the parameters of the feature point detection and tracking algorithms fixed, the parameters of the anomaly detection algorithm were configured. To evaluate the performance of the algorithm, the following metrics were used, in descending order of priority: F1-score, PR-AUC, FPS and the average lifetime of a tracklet (a higher value means a better result for all of them), as well as the average number of detected points (with the same maximum for all methods). The results showed that the number of grid cells has the main influence: the maximum accuracy is obtained with 8 cells vertically by 12 horizontally. Neither the maximum tracklet length nor the discretization of the direction histogram has a significant effect; these parameters were selected for maximal accuracy and FPS. At the second stage, the parameters of each feature point detection and tracking algorithm were configured. Two algorithms for feature point tracking were considered: Lucas–Kanade (LK) and RLOF, described above in Section 2.2. Eight classical feature point detection algorithms were also evaluated: GFTT, FAST, AGAST, SimpleBlob, SIFT, MSER, KAZE and AKAZE, all described in Section 2.1. For each algorithm, permissible limits of parameter changes were selected based on the information provided in the corresponding article. A grid search over the combinations of parameters was then performed for each algorithm, yielding a set of best parameters for each algorithm to compare them with each other.
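The grid search over parameter combinations can be sketched generically (ours; `evaluate` stands for running the pipeline with a given parameter set and returning the priority metric):

```python
from itertools import product

def grid_search(param_grid, evaluate):
    # exhaustively try every combination of parameter values and
    # keep the combination with the highest evaluation score
    best_score, best_params = float("-inf"), None
    for combo in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), combo))
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```

The cost grows as the product of the per-parameter value counts, which is why the permissible limits taken from the original articles matter: they keep the grid small.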

#### *4.4. Comparison of Detection and Tracking Algorithms*

Using the configured methods for detecting and tracking feature points, together with the tuned anomaly detection algorithm, the quality metrics described above were studied. It was determined that the choice of method has practically no effect on the maximal length of the trajectory; therefore, this parameter was discarded from the study of the results. The average number of detected points also does not directly depend on the feature point detection method and does not correlate with quality: for any method, it can be changed depending on the requirements of the tracking algorithm or the anomaly detection algorithm, so this parameter was also omitted. The summary tables of the results for the Lucas–Kanade tracking algorithm (Table 1) and the RLOF tracking algorithm (Table 2) are shown below.

**Table 1.** Results for the Lucas–Kanade method.



**Table 2.** Results for the RLOF method.

Based on these tables, it is possible to conclude which feature point detection method is best suited for the anomaly detection task. To do this, we present normalized graphs of the PR-AUC and FPS values for all combinations of dataset and tracking method (see Figure 3). The F1-score graph, apart from insignificant details, coincides with the PR-AUC graph and is therefore omitted here.

**Figure 3.** Normalized PR-AUC and FPS graphs.

Several conclusions can be drawn from these graphs, which show the ratio of quality metrics for different pairs of tracking method and dataset with each detector. The most significant difference between the methods is observed in the resulting FPS, while the dataset and the chosen tracker do not have a significant effect on the relative values. Therefore, when the algorithm must work faster, it is generally recommended to use one of the detectors from the left part of the figure: GFTT, FAST, AGAST or SimpleBlob. For the UMN dataset, due to the relative simplicity of the problem being solved, the choice of detector did not have a significant impact on quality. For the Ped dataset, the FAST and AGAST detectors, as simpler methods, showed slightly lower quality. We also noted that the quality of the algorithm when tracking points with RLOF depends more on the detected feature points and the detector quality than when using the Lucas–Kanade method. Comparison of the absolute values of the quality metrics (Tables 1 and 2) shows no significant differences in quality between the trackers, although for some items the quality is higher with RLOF. FPS differs insignificantly, by units and fractions of a percent: the Lucas–Kanade method works slightly faster, while RLOF is a little slower but more accurate for a number of detectors. It is possible that the absence of significant quality differences between the detection methods is due to the selected datasets or quality metrics; studying this set of methods on other tasks may become a goal of further research.

### **5. Conclusions**

This paper considered several well-known algorithms for feature point detection and tracking and for anomaly detection in an optical flow constructed from a video sequence. After the analytical review, the following algorithms were chosen: LBT for anomaly detection in optical flow; GFTT, FAST, AGAST, SimpleBlob, SIFT, MSER, KAZE and AKAZE for detecting feature points; and RLOF and Lucas–Kanade for feature point tracking. The study on the selected video sequences revealed significant differences in the performance of the algorithm depending on the selected feature point detector. There was no difference in quality, except for the FAST and AGAST detectors, which performed worse than the others on the Ped dataset with vehicles in the crowd. For use in such scenes, GFTT, SimpleBlob and AKAZE can be recommended as the better methods in terms of FPS, precision and robustness. Both tracking methods have almost the same quality; the Lucas–Kanade method has better performance and is more robust to poor-quality feature points, so it can be recommended for scenes similar to the considered datasets. In further research on this topic, we plan to expand the variety of video sequences used for analyzing the methods, and to study existing neural network algorithms for feature point detection and tracking, as well as other methods for optical flow anomaly detection.

**Author Contributions:** Conceptualization, I.F. and Y.R.; methodology, Y.R.; software, Y.R.; validation, Y.R., I.F. and E.S.; formal analysis, Y.R. and I.F.; investigation, Y.R.; data curation, Y.R. and I.F.; writing—original draft preparation, I.F.; writing—review and editing, E.S.; visualization, I.F.; supervision, E.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** The work was carried out as the part of the state task of the Russian Ministry of Education and Science for 2023 "Research of methods for creating self-learning video surveillance systems and video analytics based on the integration of technologies for spatiotemporal filtering of video stream and neural networks" (FNRG 2022 0015 1021060307687-9-1.2.1 №075-01595-23-00).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
