**1. Introduction**

Most animals, including humans, obtain more information from vision than from any other sense, and they use it to recognize and judge situations [1]. Visual information is therefore important for judging not only general circumstances but also special situations [2,3]. Although image processing technology and computer performance have improved dramatically, analyzing and judging a situation as comprehensively as a human does is still difficult [3]. As technologies based on image processing continue to be developed, the scope of intelligent image security technology in the video security market is expanding rapidly, and market share is shifting from hardware to software such as intelligent image analysis [4]. Image security requires suspicious behavior detection technology to prevent public security issues, incidents, and accidents. Examples of such behavior include attempting to enter private property, entering a subway station without paying the fare, kidnapping a child, beating a person, and the sudden collapse of a person walking along the road.

This kind of image analysis technology can help cope with threats to individuals and society at large from terrorism, crime, and disasters. In the wake of recent terrorist incidents in many countries, governments have been actively investing in the video security market and in security systems to keep people safe [5,6]. In recent years, the number of CCTV installations in the public sector, such as transportation and crime prevention CCTV, has increased in order to cope with safety accidents and violent incidents [7]. Although the spread of CCTV has increased the number of areas under surveillance, the application of smart technology remains insufficient. CCTV is already installed in many areas and records automatically, but the video must still be read and checked manually by a person. Human evaluation of CCTV footage is not ideal, because the task requires a high level of concentration over long periods. An automated monitoring system is therefore needed that can recognize crimes such as robbery and violence, as well as other situations that require an urgent response, and then notify the proper parties. To date, in the field of intelligent CCTV research [4–7], relatively few studies on behavior recognition or suspicious behavior detection have been carried out compared with the many studies on object classification and segmentation. In most cases, CCTV is used for security reasons. In particular, when constructing a public place such as an airport, a train station, or a park, ensuring the safety and security of the people using that place is mandatory. If CCTV can automatically detect people who are acting abnormally rather than simply recording them, it will greatly aid accident prevention and response.

There are various patterns of suspicious behavior that we want to detect through CCTV, but their common factor is that the magnitude of the movement is large and its direction is irregular [8–10]. For example, while violence is being committed, the speed of movement generally increases sharply, and the direction of movement becomes very irregular. When a person bumps into something or falls, the direction of movement at that moment differs from that of a person moving normally, and the magnitude of motion becomes irregularly large. Beyond these cases, running or jumping indoors, such as in a classroom, can also be considered suspicious behavior, and it has characteristics similar to those described above.
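The two cues named above, large motion magnitude and irregular motion direction, can be illustrated with a minimal NumPy sketch. It assumes a dense flow field stored as an `(H, W, 2)` array of per-pixel displacements; the function name `motion_statistics`, the `min_magnitude` cutoff, and the use of circular variance as the irregularity measure are illustrative assumptions, not quantities defined in this paper.

```python
import numpy as np

def motion_statistics(flow, min_magnitude=0.5):
    """Summarize one dense flow field (hypothetical helper, for illustration).

    flow: array of shape (H, W, 2) holding per-pixel displacement (dx, dy).
    Returns (mean magnitude of moving pixels, direction irregularity in [0, 1]).
    """
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)

    # Ignore near-static pixels so that background noise does not dominate.
    moving = magnitude > min_magnitude
    if not moving.any():
        return 0.0, 0.0

    # Circular variance: 1 - length of the mean unit direction vector.
    # It is 0 when all moving pixels share one direction and approaches 1
    # when the directions are scattered.
    angle = np.arctan2(dy[moving], dx[moving])
    resultant = np.hypot(np.cos(angle).mean(), np.sin(angle).mean())
    return float(magnitude[moving].mean()), float(1.0 - resultant)
```

Under these assumptions, a frame in which everything drifts the same way yields an irregularity near 0, while a frame with motion scattered in all directions yields a value near 1. How such values are turned into a decision is a separate design choice.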

Suspicious behavior detection is one of the most actively studied topics in computer vision areas such as video analysis and surveillance [8–27]. Ordinary behavior refers to actions that do not attract attention when people perceive some sort of movement [8]. Surveillance systems therefore detect suspicious behavior using characteristic patterns of behaviors that generally contrast with ordinary behavior. There have been many studies on abnormal behavior detection using different approaches, such as spatio–temporal features [8–16] and machine learning techniques [17–25].

Because a high-dimensional feature is essential for representing suspicious behavior patterns well, many methods based on spatio–temporal information have been studied, such as optical flow [8], the spatio–temporal gradient [9], the social force model [10], chaotic invariants [11], and sparse representation [12]. The method in [8] does not require a training process, so it involves less computation and can be used for real-time detection [8]. The method described in [9] first extracts moving objects from a video sequence and then tracks them to detect where they overlap. Once an overlapping area is detected, a clutter model is built from the changes in spatio–temporal features to detect abnormal behavior. A method for detecting abnormal patterns based on spatio–temporal volumes has been presented in [13]. It calculates the likelihood by analyzing the area occupying a relatively large part of the periphery and transforms it into the form of a codebook, thereby reducing the time required for the calculation. This method is competitive with other methods because it does not require background/foreground segmentation or tracking. However, it is difficult to use on images in which various kinds of abnormal conditions may exist, because the threshold value needed to detect abnormal patterns has to be determined experimentally for each image. The method described in [14] detects abnormal crowd behavior by combining an energy model with a threshold. It uses optical flow to estimate the displacement vectors of a moving crowd and to compute the crowd motion energy, which is then further modified by the crowd motion intensity. The method described in [15] also extracts motion vectors using optical flow, in this case from an image segmented into foreground and background; motion vectors with a large change are then detected and learned by principal component analysis (PCA).
However, data loss can occur due to noise when the foreground and background are separated in real images. Abnormal behavior detection that uses interest points and simply monitors changes in topological structure has been presented in [16]. Two new methods were introduced, one for analyzing the boundary point structure and one for extracting critical points from the partial motion fields, and both were used to build the global topological structure of the crowd motion.

Machine learning techniques for detecting unusual events have been presented in [17–26]. These methods also employ a feature extraction process but use models obtained from a learning process. The method described in [17] detects multiple anomalous activities using key features such as speed, direction, centroid, and dimensions, which help to track an object across video frames. It also employs rules based on problem domain knowledge to distinguish activities and their dominant behavior. In [18], a video frame is divided into several segments of equal size, and the features extracted from each segment are clustered using unsupervised learning. Clusters below a certain size are then classified as abnormal behavior. In this method, unusual phenomena that do not follow the general statistics are judged to be abnormal behavior. However, when a scene contains only abnormal behavior and no ordinary behavior, the abnormal behavior is highly unlikely to be detected. To solve the above-mentioned problems of the method presented in [18,19], Hamid et al. analyzed the whole structure information using statistical information about behavior classes and then defined and detected abnormal behavior based on subclasses. However, this approach has a scalability problem when applied to various images because of the discontinuous sequence and because the spatio–temporal patch must be stored in the same form at every moment. In addition, since the data is processed in batches, it cannot cope with real-time environmental change. A method that uses violent flows (ViF) feature points for real-time processing has been presented in [20]. After the motion vectors are extracted, those whose magnitude exceeds a threshold value are learned by a support vector machine (SVM) [8]. However, this method is not applicable to surveillance cameras used in real life because it deals only with images taken from a distance.
Convolutional neural network (CNN)-based algorithms have been presented in [21]. Using fully convolutional neural networks (FCNs) and temporal data, a pre-trained supervised FCN is transferred into an unsupervised FCN to detect anomalies in scenes. The method described in [22] treats as ordinary behavior the successive chunks observed in segments taken from a database that contains no suspicious behavior. These successive chunks are then used for learning, and the parts for which the magnitude of the feature is small, or which were not included in the learning, are detected as suspicious behavior. However, methods that rely on a learning process lack versatility because they cannot detect behaviors that were not used in learning. In [23], a unified framework for anomaly detection in video based on the restricted Boltzmann machine (RBM), a recent powerful method for unsupervised learning and representation learning, has been introduced. Unsupervised learning techniques are also employed in the method described in [24], and a Bayesian model is employed in the method described in [25]. Further significant work related to abnormal behavior detection is described in the review paper [27].

Methods that rely on manually applying a threshold value, or on background removal with its possible data loss, are not versatile. In addition, methods that use a learning process depend on the training data and require a large amount of computation, so they are hard to use in a real-time surveillance system. In this paper, a new suspicious behavior detection method that addresses these issues and can be used in real life is presented. The proposed method infers suspicious behavior patterns using only simple motion features, which allows real-time anomaly detection. As humans, we generally focus our attention on behaviors that vary in the magnitude or direction of motion and that follow different rules of motion from those of other objects. The proposed method makes use of this observation. The system developed with the proposed method attempts to detect behaviors that differ significantly from other behaviors in order to find suspicious behaviors. To this end, motion features are extracted using optical flow, and these features are then integrated to create a temporal saliency. Finally, abnormal behavior is detected based on the temporal saliency.
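As a rough illustration of the idea of combining motion features into a saliency map, the following NumPy sketch scores each pixel by how much its flow magnitude and direction differ from the rest of the frame and then averages the scores over a short window of frames. The functions `frame_saliency` and `temporal_saliency`, the equal weighting of the two cues, and the threshold of 0.5 are assumptions made for this example only and should not be read as the formulation used in this paper.

```python
import numpy as np

def frame_saliency(flow, eps=1e-6):
    """Per-pixel contrast of one flow field (hypothetical, illustrative only).

    flow: array of shape (H, W, 2) holding per-pixel displacement (dx, dy).
    Returns an (H, W) map with values in [0, 1].
    """
    dx, dy = flow[..., 0], flow[..., 1]
    mag = np.hypot(dx, dy)

    # Magnitude cue: distance of each pixel's speed from the frame average,
    # scaled by the largest such distance in the frame.
    mag_contrast = np.abs(mag - mag.mean())
    mag_contrast = mag_contrast / (mag_contrast.max() + eps)

    # Direction cue: 0 when a pixel moves along the dominant direction,
    # 1 when it moves directly against it; static pixels contribute 0.
    dominant = np.array([dx.sum(), dy.sum()])
    dominant = dominant / (np.linalg.norm(dominant) + eps)
    cos_sim = (dx * dominant[0] + dy * dominant[1]) / (mag + eps)
    dir_contrast = 0.5 * (1.0 - cos_sim) * (mag > eps)

    # Assumed equal weighting of the two cues.
    return 0.5 * (mag_contrast + dir_contrast)

def temporal_saliency(flows):
    """Average the per-frame maps over a window of flow fields (T, H, W, 2)."""
    return np.mean([frame_saliency(f) for f in flows], axis=0)

def salient_mask(flows, threshold=0.5):
    """Boolean (H, W) mask of pixels whose averaged saliency exceeds threshold."""
    return temporal_saliency(flows) > threshold
```

In this sketch, a small region that moves faster than and against an otherwise uniform scene receives a high score in both cues, while the uniformly moving background scores close to zero. The fixed threshold is used only to keep the example short.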

This paper is organized as follows. Section 2 presents the proposed method, in which a temporal saliency is created by extracting and combining motion features obtained with optical flow, and suspicious behavior is detected based on it. The test datasets used to evaluate the proposed method are also described there. Section 3 describes the experimental results and discusses them in order to evaluate the performance of the method. Finally, Section 4 draws conclusions, with some general observations and recommendations for ongoing work.
