*3.1. Comparison Results from Experiments with UMN Dataset and Avenue Dataset*

For the UMN dataset, methods based on optical-flow features [8], a Bayesian model [25], chaotic invariants [11], the social force model [10], and sparse reconstruction cost [12] were compared with the proposed method. The UMN dataset contains three crowd escape scenes in both indoor and outdoor environments. The normal events depict people wandering in groups, while the abnormal events depict a crowd escaping quickly. The dataset contains 11 video sequences captured against three different backgrounds: Scene 1 and Scene 3 are outdoor scenes (lawn, plaza) and Scene 2 is an indoor scene. Accuracy was defined as the percentage of correctly identified frames, calculated by comparison with the ground truth.
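To make the evaluation measure concrete, the frame-level accuracy defined above can be sketched as follows; `pred` and `gt` are hypothetical per-frame binary labels (1 = abnormal), not the paper's actual outputs.

```python
def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted label matches the ground truth."""
    if len(pred) != len(gt):
        raise ValueError("label sequences must have equal length")
    # Count frames where prediction agrees with ground truth.
    correct = sum(1 for p, g in zip(pred, gt) if p == g)
    return 100.0 * correct / len(gt)
```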

Table 2 presents the accuracy comparison of six methods on the three scenes of the UMN dataset for identifying escape events.


**Table 2.** Accuracy comparison with state-of-the-art methods on the UMN dataset.

The methods in [8,25] were previously tested on the whole UMN dataset, and the published results were used to compute the corresponding accuracy. The same evaluation settings as described in [25] were used. Overall, the proposed method achieves the best average accuracy of 96.50%, higher than that of all comparison methods. Notably, the proposed method outperforms them even though it does not employ any learning process. Methods that rely on learning depend on the training dataset and require substantial computation, which makes them difficult to use in a real-time surveillance system; the proposed method avoids these drawbacks and can therefore be applied in practice.

Figures 8–10 show examples of detection results for abnormality from the UMN dataset (No. 1 to No. 3) using the proposed method. In these videos, the suspicious behavior is the sudden movement of people. When people run away in multiple directions at the same time, the direction of movement becomes very irregular and the magnitude of movement increases dramatically. The proposed method responded appropriately to this kind of motion: the area in which people are running away was properly detected. In scenes where several people run away, the motion vectors increase greatly, and all of the suspicious behavior is detected. The proposed method generates reactivity images using feature information extracted from the optical flow of the video and detects anomalous regions based on temporal saliency obtained through a weighted combination of those images. Feature information based on the magnitude and gradient of movement, the most important factors that constitute a behavior, is extracted, and strongly reactive regions are detected through a weighting condition formula. The results demonstrate that the suspicious behavior was reasonably detected.
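The pipeline described above can be sketched as follows, assuming the dense optical-flow fields are already available (e.g., from OpenCV's Farneback method). The map names, min-max normalization, and equal weights are illustrative assumptions, not the paper's exact weighting condition formula.

```python
import numpy as np

def temporal_saliency(flow, prev_flow, w_mag=0.5, w_grad=0.5):
    """Combine a motion-magnitude reactivity map and a temporal-gradient
    reactivity map into one saliency map via a weighted sum.

    flow, prev_flow: dense optical-flow fields of shape (H, W, 2).
    """
    mag = np.linalg.norm(flow, axis=2)                       # motion magnitude
    grad = np.abs(mag - np.linalg.norm(prev_flow, axis=2))   # temporal change

    def minmax(m):
        # Normalize each map to [0, 1] before combining.
        rng = m.max() - m.min()
        return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

    return w_mag * minmax(mag) + w_grad * minmax(grad)
```

Regions where the combined map is high would then be thresholded and localized as suspicious behavior regions.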

For the Avenue dataset, the methods described in [21,26,29] were compared with the proposed method. The Avenue dataset contains 16 training videos and 21 testing videos. The only normal behavior in the dataset is people walking in front of the camera; the abnormal behaviors are unusual actions such as running, jumping, and walking in the wrong direction. Table 3 presents the AUC (area under the curve) values of both the proposed method and the state-of-the-art comparison methods [21,26,29] on the Avenue dataset. The method in [26] was previously tested on the whole Avenue dataset, and the published results were used in the comparison. The comparison results in Table 3 show that the proposed method outperforms the comparison methods.
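For reference, a frame-level AUC such as the one reported in Table 3 can be computed from per-frame anomaly scores with the rank (Mann-Whitney) formulation: the probability that a randomly chosen abnormal frame scores higher than a normal one. `scores` and `labels` here are hypothetical values, not the paper's data.

```python
def frame_auc(scores, labels):
    """AUC via the rank statistic; labels are 1 (abnormal) or 0 (normal)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    # Each abnormal/normal pair contributes 1 for a win, 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```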

**Figure 8.** Examples of detection results for abnormality from No. 1 data (UMN Scene 1: Lawn).

**Figure 9.** Examples of detection results for abnormality from No. 2 data (UMN Scene 2: Indoor).

**Figure 10.** Examples of detection results for abnormality from No. 3 data (UMN Scene 3: Plaza).

**Table 3.** AUC (area under the curve) comparison with state-of-the-art methods on the Avenue dataset.



**Table 4.** The overall performance evaluation result of the proposed method.

Due to the unpredictability of abnormal events, most previous approaches employ a learning process; most of them learn only normal event models in an unsupervised or semi-supervised manner, and abnormal events are considered to be patterns that deviate significantly from the learned normal event models [29]. The method in [21] uses spatio-temporal convolutional neural networks to extract and learn various features, and the method in [29] employs an online dictionary learning and sparse reconstruction framework. The method in [26] uses both training data and testing data to build a global grid motion template (GGMT). As mentioned before, even though the proposed method does not employ any learning process and uses simple motion features, it outperforms the comparison methods.

Figure 11 shows examples of detection results for abnormality from the No. 4 data, chosen from the Avenue dataset. In this video sequence, most people in front of the building are walking to the right, while a child jumps to the left. The proposed method properly detected the area where the child jumps. The proportion of the image occupied by the child increases because the child is much closer to the camera than the other people. As a result, from the moment the child jumps, the average direction was computed as leftward. Since the child is moving to the left during the jump, the overall direction was calculated correctly. This shows that different types of abnormality, such as running and jumping, can be accurately detected and localized.
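The "average direction" reasoning above can be sketched by summing the flow-vector components and taking the angle of the resultant vector, so that opposing motions cancel rather than averaging angles naively. This is an illustrative sketch under that assumption, not the paper's exact computation.

```python
import math

def mean_direction(vectors):
    """Average motion direction (radians) of 2-D flow vectors.

    Summing components first makes a dominant leftward motion yield an
    angle near pi even when small opposing vectors are present.
    """
    sx = sum(v[0] for v in vectors)
    sy = sum(v[1] for v in vectors)
    return math.atan2(sy, sx)
```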

**Figure 11.** Examples of detection results for abnormality from the No. 4 data (Avenue).

*3.2. Analysis of Examples of Detection Results for Abnormalities with 10 Different Types of Video Sequences*

Figures 8–17 show examples of detection results for abnormalities from the 10 different types of video sequences. The results for the No. 1 to No. 4 data are shown in Figures 8–11 and are explained in detail in Section 3.1.

**Figure 12.** Examples of detection results for abnormality from the No. 5 data.

**Figure 13.** Examples of detection results for abnormality from the No. 6 data.

**Figure 14.** Examples of detection results for abnormality from the No. 7 data.

**Figure 15.** Examples of detection results for abnormality from the No. 8 data.

**Figure 16.** Examples of detection results for abnormality from the No. 9 data.

**Figure 17.** Examples of detection results for abnormality from the No. 10 data.

Figure 12 shows examples of detection results for abnormality from the No. 5 data. As mentioned before, this video sequence records people with no anomalous behavior; everyone is moving normally. This experiment checks whether the proposed method responds to ordinary behavior. As a result, the method did not react at all to normal walking (reaction rate 0%).

Figure 13 shows examples of detection results for abnormality from the No. 6 data, in which a man walking while looking at his cell phone falls over an obstacle. The video includes everything from his normal walking to the fall. The results show that the system does not react at all to ordinary walking but reacts strongly from the moment the man falls over the obstacle.

The video shown in Figure 14 is similar to the video shown in Figure 13. As two men walk together, the man on the left falls over an obstacle. The experiment shows that the area of the man on the left is correctly detected from the moment he falls, while the man on the right, although similar in size to the fallen man, was not detected.

Figure 15 shows examples of detection results for abnormality from the No. 8 data, in which a man falls into the water. The falling man was correctly detected, but another man's foot moving on the left side was erroneously detected. This is because the closer a scene is to the camera, the larger the magnitude of its motion vectors becomes.

Figure 16 shows examples of detection results for abnormality from the No. 9 data, in which a man falls down a stairway. Similarly, as the responses to the magnitude and direction of the action grow, the falling behavior is detected.

Figure 17 shows examples of detection results from the No. 10 data. This is a video of a violent robbery that happened in South Kensington and was reported in US news. The sequence contains a scene in which two men assault one man. Even though the video is of very low quality and contains a lot of noise due to illumination, the proposed system detects both the scene where one man is running and the scene where the two men together commit violence against the other man.

#### *3.3. Overall Performance Evaluation Results of 10 Different Types of Video Sequences*

The results of the proposed method are compared with the actual suspicious behavior regions, which are regarded as the ground truth. Frames in which a region containing suspicious behavior was successfully detected are counted as true positives (*tp*); frames in which a suspicious behavior region was detected even though the frame contained no suspicious behavior are counted as false positives (*fp*); and frames in which no region was detected even though the frame contained suspicious behavior are counted as false negatives (*fn*). The proposed method achieved a 100% true negative (*tn*) rate, because nothing was detected as a suspicious behavior region in the experiments with the No. 5 data, which contains no suspicious behavior. The accuracy, precision, recall, and FNR (False Negative Rate) are calculated as follows.

$$\begin{array}{c} \text{accuracy} = \frac{tp + tn}{tp + fp + fn + tn}, \quad \text{precision} = \frac{tp}{tp + fp} \\ \text{recall (True Positive Rate)} = \frac{tp}{tp + fn}, \quad \text{FNR (False Negative Rate)} = \frac{fn}{fn + tp} \end{array} \tag{5}$$
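A minimal helper computing the four measures from frame-level counts; the counts used below are made-up numbers for illustration.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall (TPR), and false negative rate."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)   # true positive rate
    fnr = fn / (fn + tp)      # false negative rate = 1 - recall
    return accuracy, precision, recall, fnr
```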

Table 4 summarizes the overall performance evaluation result of the proposed method.

The reason for the lower performance on the No. 9 data was analyzed as follows. In the No. 9 data, not all motion vectors were detected; only some of them were, because the walking man is too close to the camera, making it a difficult environment in which to measure the motion vectors properly. For this reason, false positives increased, lowering the performance compared with that of the other data.

In summary, the proposed system detects various suspicious behaviors captured in various environments with high performance, and it is also robust to differences in brightness caused by weather and time of day. However, given the results for the No. 9 data, a suitable shooting distance must be secured for the proposed method to run accurately.

#### **4. Conclusions**

In this paper, a new surveillance system for detecting suspicious behavior regions in real time was proposed. The proposed method generates reactivity images using feature information extracted from the optical flow of CCTV video and detects anomalous regions based on temporal saliency obtained through a weighted combination of those images. Feature information based on the magnitude and gradient of movement, the most important factors that constitute a behavior, is extracted, and strongly reactive regions are detected through a weighting condition formula.

Extensive experiments on challenging public datasets as well as on eight different types of video sequences collected online were conducted to demonstrate the effectiveness of the proposed method. Quantitative and qualitative analyses of the experimental results showed that the proposed method outperformed traditional methods in suspicious behavior detection and was comparable to state-of-the-art methods without using complicated training approaches. In addition, the experimental results showed that the proposed system is suitable for detecting suspicious behaviors such as violent actions, falls, jumping, sudden running, and bumps, and that it can detect instantaneous events and accidents.

In the proposed method, two reactivity maps, of motion magnitude and motion gradient, were generated, then weighted and combined into a temporal saliency map. However, combining only these two features is not sufficient for detecting more complex behaviors. It is necessary to understand the relations among the objects in the video and the situation before and after the moment the event occurs, and capturing such complicated behavioral relationships requires richer motion patterns. The proposed method is structurally easy to combine with other features: adding a new algorithm that extracts additional features is straightforward, and with this extension the method can be applied not only to detecting a wider variety of abnormal behaviors but also to various other fields.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The author declares no conflict of interest.
