#### *2.1. Data Acquisition*

Video sequences containing various kinds of suspicious behavior were used as the dataset for the proposed method. The UMN (University of Minnesota) crowd dataset and the Avenue dataset, both of which are used in various behavior recognition and detection papers [8,10–12,25,28,29], were collected. The video sequences in the UMN dataset were filmed against three different backgrounds (lawn, indoor, and plaza) and feature scenes in which multiple people run away simultaneously upon hearing explosions. The video sequences in the Avenue dataset were filmed in front of a building; in them, a few people run or jump while most people walk. The Walk dataset, which consists of videos of people simply walking along the street and contains no suspicious behavior, was also used to check for false detections by the proposed method. In addition, as listed in No. 6 to No. 10 of Table 1, YouTube video sequences containing various kinds of suspicious behavior that could lead to a real accident were collected.


**Table 1.** Dataset and types of suspicious behaviors used in the proposed method.

Table 1 lists the types of suspicious behavior collected in each dataset. Actual suspicious behaviors that can be captured by CCTV cameras installed on the street, such as violence, tumbling, falling, jumping, and sudden running, were designated as ground truth. All of these behaviors are characterized by large changes in motion or irregular directions of motion.

#### *2.2. Description of Proposed Method*

The proposed method was developed to detect suspicious behavior in real time using CCTV. It is designed to detect abrupt, large changes in the magnitude and direction of motion, such as those caused by collisions, sudden running, falling, and assault, all of which can occur frequently in real life.

Figure 1 shows the overall process of the proposed method. After preprocessing such as grayscale conversion and median filtering, two motion components, the magnitude (size of motion) and the gradient (direction of motion), are extracted by optical flow calculation. The extracted motion vectors are then converted into the polar coordinate system, and the magnitude feature map (*F*mag) and the gradient feature map (*F*grad) are generated. Next, two reactivity maps (*R*mag, *R*grad) are generated using the mean and variance of each feature map and are combined into one temporal saliency map (*TS*). The temporal saliency map shows the area finally detected as suspicious behavior. This process is described in detail below.

**Figure 1.** The overall process of the proposed method.
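
As a rough illustration of the path from motion vectors to a temporal saliency map, the following minimal Python sketch starts from an already computed dense flow field, given as a grid of (dx, dy) displacements. It is not the authors' implementation. The function names (`to_polar`, `reactivity`, `temporal_saliency`) are hypothetical, the squared standardized deviation used for the reactivity maps is only one plausible way of using the mean and variance, and the element-wise product used to combine the two maps is likewise an assumption. Preprocessing and the optical flow calculation itself are omitted, and angle wrap-around is ignored for simplicity.

```python
import math
from statistics import fmean, pvariance


def to_polar(flow):
    """Convert a grid of (dx, dy) flow vectors into polar feature maps."""
    # F_mag: size of motion at each pixel.
    f_mag = [[math.hypot(dx, dy) for dx, dy in row] for row in flow]
    # F_grad: direction of motion at each pixel, in radians.
    f_grad = [[math.atan2(dy, dx) for dx, dy in row] for row in flow]
    return f_mag, f_grad


def reactivity(feature_map):
    """Assumed reactivity: squared deviation from the map mean over its variance."""
    values = [v for row in feature_map for v in row]
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    if var == 0:
        # A perfectly uniform map has no pixel that stands out.
        return [[0.0 for _ in row] for row in feature_map]
    return [[(v - mean) ** 2 / var for v in row] for row in feature_map]


def temporal_saliency(flow):
    """Combine R_mag and R_grad into one map (assumed: element-wise product)."""
    f_mag, f_grad = to_polar(flow)
    r_mag = reactivity(f_mag)
    r_grad = reactivity(f_grad)
    return [[m * g for m, g in zip(rm, rg)] for rm, rg in zip(r_mag, r_grad)]


# Toy 3x3 flow field: uniform slow motion to the right, except the centre
# pixel, which moves faster and in a different direction.
flow = [[(1.0, 0.0)] * 3 for _ in range(3)]
flow[1][1] = (0.0, 5.0)
ts = temporal_saliency(flow)
# The largest saliency value sits at row 1, column 1.
```

Under these assumptions, a pixel scores highly only when both its magnitude and its direction deviate from the rest of the frame. A sum or maximum of the two reactivity maps would instead flag pixels that deviate in either one.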
