1. Introduction
As more metro lines are put into operation in cities, the frequency of operational interruptions and passenger casualties also increases [1]. The economic losses and casualties caused by frequent metro accidents not only reduce the economic benefits of operating enterprises but also undermine social stability to a certain extent. Human factors are the leading cause of metro accidents. Among the 128 operational cases collected by Wanxin et al. [2], 85 accidents were caused by passenger behavior, accounting for 66% of the total.
In order to strengthen the management of metro passenger behavior, video surveillance devices have been installed in densely populated areas such as metro entrances, turnstiles, and escalators. Traditional detection and recognition rely mainly on monocular vision [3]. However, these devices suffer from low recognition accuracy and poor stability, cannot identify unsafe passenger behaviors [4,5], and provide no warning of abnormal behavior. Staff must take turns on duty to supervise passenger activity in the station, and after an accident they can only collect evidence by reviewing the video, so incidents cannot be dealt with in time [6]. The emergence of depth camera sensors, such as the Kinect launched by Microsoft, is of theoretical significance and practical value for research on the intelligent recognition of metro passengers’ unsafe behavior.
Behavior recognition detects and tracks targets in acquired video sequences and uses computer vision, image processing and related technologies to describe and recognize specific actions. The accuracy of the recognition results depends on the extraction of behavior features and the selection of an appropriate recognition method [7,8,9,10,11,12,13,14,15,16]. To describe behavior features accurately for subsequent action recognition, different action features are extracted according to the type of basic data acquired. The feature parameters commonly used by recognition systems include static and dynamic feature parameters. Typical unsafe behavior recognition systems and their characteristic parameters are compared in Table 1.
As shown in Table 1, behavior recognition based on video and image information is easily disturbed by external environmental factors such as lighting and object occlusion, and requires preprocessing of the raw data, which increases the overall recognition difficulty, complicates implementation, and lowers repeatability. Kinect skeletal information recognition based on infrared structured light detection can effectively avoid the interference of environmental factors [17]. Considering the complexity of the metro environment, identifying unsafe passenger behaviors from Kinect skeletal information is more suitable than obtaining data by video methods.
However, because unsafe passenger behavior must be identified accurately and in real time, the choice of algorithm has a great impact on the performance of Kinect-based behavior recognition. Selecting an algorithm suited to the characteristics of metro passenger behavior is therefore a topic that needs further discussion.
Current Kinect-based behavior recognition algorithms fall mainly into two categories: state space methods and template-matching methods. State space methods include the hidden Markov model [18], the Bayesian network [19], the BP neural network [20] and transfer learning models [21,22]. Their input data are easy to obtain, since conventional video and image information is sufficient for subsequent behavior recognition. However, these methods require a large amount of training data to learn the model parameters, usually in combination with machine learning and deep learning; the observed values of the model parameters are often determined with some error; and simplifications made for the convenience of model solving sometimes affect how well the model reflects the generation process of the real data [23,24].
The other approach is template matching, which includes frame-to-frame matching [25], frame fusion matching [26] and key frame matching [27]; the dynamic time warping (DTW) algorithm is the most typical example. This method matches the behavior feature sequence of a sample against that of the corresponding standard behavior and uses the distance between the two sequences to obtain the action similarity. Its advantage is that it does not need a large number of samples as the recognition basis, and the algorithm is simple and easy to implement [28]. Because passengers’ unsafe behaviors are mostly simple actions rather than complex continuous actions, the DTW algorithm can detect them quickly and easily, which meets the metro operation requirements of faster accident prevention and rapid emergency response.
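To make the sequence-matching idea concrete, the following is a minimal sketch of the classic DTW recurrence in Python; it is not the implementation used in this study. The function name `dtw_distance`, the absolute-difference frame cost and the toy sequences are assumptions chosen only for illustration.

```python
import math


def dtw_distance(seq_a, seq_b, cost=lambda x, y: abs(x - y)):
    """Return the accumulated cost of the optimal DTW alignment.

    seq_a and seq_b are sequences of per-frame features (scalars here);
    cost is an assumed per-frame distance, not the one used in the paper.
    """
    n, m = len(seq_a), len(seq_b)
    # acc[i][j]: minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = cost(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            acc[i][j] = d + min(acc[i - 1][j],      # seq_a advances alone
                                acc[i][j - 1],      # seq_b advances alone
                                acc[i - 1][j - 1])  # both advance
    return acc[n][m]


# Hypothetical feature sequences: a standard action, the same action
# performed more slowly, and an unrelated constant pose.
template = [0.0, 1.0, 2.0, 1.0, 0.0]
slow_sample = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.0]
constant_pose = [2.0, 2.0, 2.0, 2.0, 2.0]

same_action = dtw_distance(template, slow_sample)    # 0.0
other_action = dtw_distance(template, constant_pose)  # 6.0
```

The slowed-down sample aligns with the template at zero cost even though the two sequences differ in length, which is the property that lets a single standard action serve as the template for samples performed at different speeds.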
Previous DTW studies have mostly focused on improving the accuracy of behavior recognition by means such as replacing the Euclidean distance with the Mahalanobis distance or introducing the coupled hidden Markov model [29,30,31], but few have considered the effect of the bone feature vector construction method and the selection of DTW model parameters on the recognition results.
With regard to building bone feature vectors, most approaches based on skeletal information use the vectors formed between adjacent joint points as the extracted action feature vectors. For example, YU Ruiyun [32], LU Zhongqiu [33] and others select the joint coordinates of the main parts of the body and then calculate the vectors between adjacent joint points to obtain the similarity between standard actions and test actions. However, this joint selection method includes redundant joints and is also affected by factors such as the height and build of the recognized person and body offset during recognition, resulting in low robustness of the recognition results.
As for the selection of DTW model parameters, most current studies [34,35] determine the joint angle feature by calculating the mean of the angle set. However, the DTW algorithm searches for the optimal path, and using only the mean of the angle set may not achieve the best recognition performance. In other words, it is unknown which angle difference attribute is most conducive to recognition. Furthermore, the influence of the coupling between bone feature vectors and angle difference parameters on the recognition results needs further study.
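The difference between these attributes can be illustrated with a small Python sketch. It assumes that each frame pair yields one angle difference per feature vector and that the set is reduced to a single frame cost by the mean, the maximum or the sum; the helper name `frame_cost` and the numbers are hypothetical and are not taken from the experiments.

```python
# Candidate ways of reducing a set of per-vector angle differences
# (in degrees) to one frame-level feature quantity.
AGGREGATORS = {
    "mean": lambda diffs: sum(diffs) / len(diffs),
    "max": max,
    "sum": sum,
}


def frame_cost(angle_diffs, attribute):
    """Aggregate the angle differences of one frame pair (hypothetical helper)."""
    if not angle_diffs:
        raise ValueError("angle_diffs must not be empty")
    return AGGREGATORS[attribute](angle_diffs)


# Made-up frame pair: two feature vectors agree almost exactly with the
# standard action, while one deviates by 30 degrees.
diffs = [2.0, 0.0, 30.0]
costs = {name: frame_cost(diffs, name) for name in AGGREGATORS}
# costs["max"] is 30.0 and costs["sum"] is 32.0; the mean is about 10.7,
# so averaging dilutes the single large deviation.
```

Because DTW accumulates these frame costs along the warping path, the choice of attribute changes which alignments look cheap, which is one reason the mean alone may not give the best recognition performance.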
In summary, two problems remain to be solved. (1) Motion recognition using the overall adjacent joint method yields low similarity, and individual unsafe behaviors are mistakenly recognized as safe behaviors, which reduces recognition accuracy. This study therefore adopts the local pelvis divergence method in order to improve the similarity of motion recognition and thus the accuracy of behavior recognition. (2) A comparative study of the mean angle, the maximum angle and the angle sum is needed to obtain the optimal parameter.
In this study, based on the skeletal information obtained by Kinect, four feature vector construction methods are compared: the overall adjacent joint method, the overall pelvis divergence method, the local adjacent joint method and the local pelvis divergence method. The cosine method is used to obtain the joint angle differences between actions, and a similarity calculation model of the action feature quantity is established from these angle differences. The DTW algorithm is then used to obtain the optimal path between actions, which is converted into an action similarity. An unsafe behavior recognition experiment on metro passengers is carried out to obtain the similarity score of each action, and the optimal recognition model is identified by comparing the experimental results. Finally, an identification and early warning system for metro passengers’ unsafe behavior is designed with MATLAB App Designer to verify the recognition accuracy of the optimal model. The method is also compared with traditional identification methods and with similar studies by other scholars to verify the validity of the identification model.
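The feature extraction step can be sketched in Python as follows. This is an illustrative reading of the pelvis divergence idea and the cosine method, not the code of this study: the joint names, the choice of end joints and the helper functions are assumptions, and the poses are made-up coordinates.

```python
import math


def vector(start, end):
    """Vector pointing from joint `start` to joint `end` (3D coordinates)."""
    return tuple(e - s for s, e in zip(start, end))


def angle_deg(u, v):
    """Angle between two vectors via the cosine formula, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("zero-length feature vector")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def pelvis_divergence_vectors(skeleton, end_joints, root="pelvis"):
    """Feature vectors that all start at the pelvis and end at chosen joints."""
    return [vector(skeleton[root], skeleton[name]) for name in end_joints]


def angle_differences(standard, sample, end_joints):
    """Angle between corresponding feature vectors of two poses."""
    vs = pelvis_divergence_vectors(standard, end_joints)
    vt = pelvis_divergence_vectors(sample, end_joints)
    return [angle_deg(u, v) for u, v in zip(vs, vt)]


# Hypothetical poses: the sample stands at a different position, has a longer
# left arm, and holds the right hand in a different direction.
standard = {"pelvis": (0.0, 0.0, 0.0),
            "left_hand": (1.0, 0.0, 0.0),
            "right_hand": (0.0, 1.0, 0.0)}
sample = {"pelvis": (0.5, 0.5, 0.0),
          "left_hand": (2.5, 0.5, 0.0),
          "right_hand": (1.5, 0.5, 0.0)}

diffs = angle_differences(standard, sample, ["left_hand", "right_hand"])
# diffs[0] is 0 degrees (same direction despite offset and arm length),
# diffs[1] is 90 degrees (the right hand points elsewhere).
```

Since only directions relative to the pelvis enter the angle, a shift of the whole body or a difference in limb length leaves the feature unchanged, which is consistent with the robustness concerns raised above about height, build and body offset.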
The main contributions of this paper are as follows. (1) Through comparative analysis of the metro passengers’ unsafe behavior identification experiments, an optimal identification method is determined, its feasibility and accuracy are verified, and a new technical means for managing and controlling metro passengers’ unsafe behavior is provided. (2) The improved Kinect recognition method is applied to the metro station to address the unsatisfactory recognition of passengers’ unsafe behaviors in the complex station environment.
5. Conclusions
In this study, a Kinect-based recognition model of metro passengers’ unsafe behavior is proposed. In building the spatial skeleton model, a new skeleton model construction method and a recognition model based on the DTW similarity algorithm are proposed. Specifically, feature vectors are extracted by taking the pelvis as the starting point of each vector and the high-frequency moving joint points of the action as the end points, the maximum angle difference attribute is used as the feature quantity, and the result is combined with the DTW similarity algorithm. This approach addresses the low action similarity caused by redundant joints, the unclear quantification of DTW results, and the problem of practical application. In addition, the recognition score of each model is calculated in MATLAB, and a group comparison experiment is carried out. The results show that the overall methods recognize poorly and cannot achieve the recognition purpose. The local “pelvis divergence method” models generally outperform the local “adjacent joint method” models; among them, the local “pelvis divergence method” model with the maximum angle difference attribute achieves the highest recognition results, and the recognition results for the five unsafe behaviors are 86.9%, 89.2%, 85.5%, 86.7%, and 88.3%, all of which are greater than 80%, indicating the feasibility of the model. Additionally, the recognition results are more concentrated and more stable, which significantly improves the recognition rate of metro passengers’ unsafe behavior.
The focus of this research is to achieve effective action recognition from a small sample size without resorting to complex algorithms. However, the experiments showed that the recognition accuracy of the double-arm waving and calling for help was generally low. In addition to system error, the faster movement speed and the overlapping joint points were the reasons for this problem, which can be further improved and discussed in follow-up research.