3.1. Vehicle Detection and Tracking
The algorithm employed in this paper combines YOLOv5s with DeepSORT. In a closed traffic environment such as a highway, achieving real-time detection of rapidly moving vehicles under high frame-rate camera surveillance requires a detection algorithm that processes frames in a timely manner and minimizes missed detections in the image sequence. The original DeepSORT model, based on the Tracking-by-Detection (TBD) strategy, uses the Faster R-CNN object detector. While accurate, it is comparatively slow, rendering it unsuitable for real-time applications. Consequently, this paper couples the multi-object tracking algorithm DeepSORT with the YOLOv5s object detector in place of Faster R-CNN, thereby enhancing the multi-vehicle tracking performance of the model.
The algorithm comprises two main steps. First, for the detections produced by the YOLOv5s network, the cosine distance between appearance feature vectors and the Mahalanobis distance between detection boxes and Kalman-predicted boxes are computed; the two measures are fused by weighted summation to generate an association matrix. Subsequently, the Hungarian algorithm is employed to match vehicle detection boxes with tracking boxes. If a successful match is established, the track is output; if not, a Kalman filter tracker must be reinitialized.
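To make the association step concrete, the following is a minimal sketch of the weighted distance fusion and Hungarian matching; the fusion weight, gating thresholds, and function names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a DeepSORT-style association step.
# `trk_feats`/`det_feats` are assumed L2-normalized appearance embeddings;
# `maha` is a precomputed Mahalanobis distance matrix (tracks x detections).
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(trk_feats, det_feats, maha, w=0.5, cos_gate=0.4, maha_gate=9.4877):
    cos = 1.0 - trk_feats @ det_feats.T          # cosine distance matrix
    cost = w * cos + (1.0 - w) * maha            # weighted fusion (weight is illustrative)
    # Gate out implausible pairs before running the Hungarian algorithm.
    cost[(cos > cos_gate) | (maha > maha_gate)] = 1e5
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < 1e5]
    unmatched = set(range(det_feats.shape[0])) - {c for _, c in matches}
    return matches, unmatched   # unmatched detections initialize new Kalman trackers
```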
3.2. Image-Based Vehicle Re-Identification
The task of vehicle re-identification aims to match the same vehicle across different cameras or time instances through image matching. The key to solving vehicle re-identification lies in feature extraction and similarity measurement computation. Feature extraction based on deep learning is currently the mainstream approach. Similarity measurement typically involves mapping images into a feature space where image similarity can be directly quantified, and then utilizing the extracted feature differences to assess the similarity between two targets. A smaller distance between two targets indicates higher similarity and a higher likelihood of being the same target; a larger distance signifies lower similarity and a lower likelihood of being the same target. Generally, fully connected layers are employed to flatten image features into one-dimensional vectors, followed by the use of suitable distance metrics to compute the disparities between image features. Commonly used distance metrics include Euclidean distance, Manhattan distance, cosine distance, and Mahalanobis distance.
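As a brief illustration of similarity measurement, the snippet below ranks gallery features against a query by distance; the feature dimension and all names are hypothetical.

```python
# Smaller distance -> higher similarity -> more likely the same vehicle.
import numpy as np

def euclidean_distance(a, b):
    return np.linalg.norm(a - b)

def cosine_distance(a, b):
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

gallery = np.random.rand(100, 512)   # 100 gallery feature vectors (e.g., FC output)
query = np.random.rand(512)          # one query feature vector
ranks = np.argsort([euclidean_distance(query, g) for g in gallery])
print("best match index:", ranks[0])
```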
In this paper, when constructing a vehicle re-identification model based on attribute information, the vehicle attribute information extracted from each attribute branch is incorporated into the global features generated by the main network. This incorporation aims to enhance the interaction between attribute features and vehicle re-identification features, thereby producing more distinctive and representative features [30,31].
As shown in Figure 2, in the improved vehicle re-identification network architecture focusing on attribute information, the network model first extracts image feature information through the main network, generating a feature map F. Subsequently, the global feature map F is fed into attribute branches based on attention modules, producing the individual attribute feature maps. These attribute feature maps are then combined using an attribute re-weighting module to produce a unified and comprehensive attribute feature map A, encompassing all attributes. The generated attribute feature map A undergoes global average pooling and fully connected layers to yield attribute feature vectors for attribute recognition. Simultaneously, useful attribute information is extracted through convolutional operations on the attribute feature map A. This information is subsequently integrated back into the global features, resulting in the final fusion feature map R. The spatial average pooling feature vector r of R is employed for the ultimate vehicle re-identification task.
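The following PyTorch sketch mirrors the described flow (attribute branches, re-weighting into A, an attribute recognition head, and fusion back into the global features); the module structure, sigmoid attention, and channel sizes are our assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class AttributeFusion(nn.Module):
    def __init__(self, c=2048, n_attrs=3, n_classes=10):
        super().__init__()
        # One attention-style branch per attribute, applied to the global map F.
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(c, c, 1), nn.Sigmoid()) for _ in range(n_attrs)]
        )
        self.reweight = nn.Conv2d(n_attrs * c, c, 1)   # attribute re-weighting module
        self.attr_fc = nn.Linear(c, n_classes)         # attribute recognition head
        self.fuse = nn.Conv2d(c, c, 1)                 # conv used for fusion into F

    def forward(self, F):
        attr_maps = [F * b(F) for b in self.branches]      # per-attribute feature maps
        A = self.reweight(torch.cat(attr_maps, dim=1))     # unified attribute map A
        attr_vec = self.attr_fc(A.mean(dim=(2, 3)))        # GAP + FC for recognition
        R = F + self.fuse(A)                               # fused feature map R
        r = R.mean(dim=(2, 3))                             # pooled vector r for re-ID
        return attr_vec, r
```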
3.3. Anomaly Trajectory Recognition Based on Trajectory Rules
In this paper, anomalous trajectory recognition consists of five main parts: mathematical modeling of vehicle information, mathematical modeling of vehicle behavior, trajectory acquisition in the visible area, trajectory prediction in the blind area, and discrimination of anomalies. First, we model the physical information and driving behavior of the vehicle to achieve a comprehensive perception of the moving vehicle. Then, vehicle trajectories are acquired and predicted in the two actual scene regions (the visible region and the blind zone), respectively. Finally, according to the defined anomaly rules, each vehicle trajectory is determined to be anomalous or not.
3.3.1. Mathematical Description of Vehicle Information
The vehicle trajectory information obtained from the tracking boxes under a given surveillance camera includes the following main aspects: the time duration T for which the trajectory exists, the total number of frames f in the trajectory over time T, and the world coordinates (x, y, z) of the vehicle's position in the frame sequence. Generally, z is assumed to be either 0 or a constant. The model parameters can be described as follows:
Set of vehicle positions $P = \{p_1, p_2, \ldots, p_f\}$, where $p_i = (x_i, y_i, z_i)$;
Set of vehicle movement paths $L = \{l_2, l_3, \ldots, l_f\}$, where $l_i$ represents the change in vehicle position, i.e., the path length the vehicle has traveled from the $(i-1)$-th frame to the $i$-th frame. It is calculated as shown in Equation (1):

$l_i = \sqrt{(x_i - x_{i-1})^2 + (y_i - y_{i-1})^2}$  (1)
Set of vehicle movement angles $A = \{a_2, a_3, \ldots, a_f\}$, where $a_i$ represents the angle of vehicle movement between the $(i-1)$-th frame and the $i$-th frame in the calibrated coordinate system, and is calculated as shown in Equation (2):

$a_i = \arctan\dfrac{y_i - y_{i-1}}{x_i - x_{i-1}}$  (2)
Set of slopes of vehicle movement angles $K = \{k_2, k_3, \ldots, k_f\}$, where $k_i$ represents the slope of the vehicle trajectory from the $(i-1)$-th frame to the $i$-th frame, measured against the calibrated vertical (road) axis. It is calculated as shown in Equation (3):

$k_i = \dfrac{x_i - x_{i-1}}{y_i - y_{i-1}}$  (3)
Set of average vehicle movement speeds $V = \{v_k, v_{k+1}, \ldots, v_f\}$, where $v_i$ represents the average speed of the vehicle over the past $k$ frames, from the $(i-k)$-th frame to the $i$-th frame. With the frame interval $T/f$, it is calculated as shown in Equation (4):

$v_i = \dfrac{f}{kT} \sum_{j=i-k+1}^{i} l_j$  (4)
In the aforementioned mathematical descriptions, the parameter L can be employed to assess whether the vehicle is in motion, while parameters A and K are utilized to evaluate whether the vehicle's travel direction is correct. Additionally, parameter V serves to evaluate whether the vehicle's speed adheres to road regulations.
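Under the reconstructed Equations (1)-(4), the descriptor sets can be computed from a world-coordinate trajectory as sketched below; the slope convention (lateral over longitudinal displacement) reflects our reading of the rules in Section 3.3.5, and the function name is ours.

```python
import numpy as np

def trajectory_descriptors(pts, T, k=10):
    """Compute the sets L, A, K, V from a world-coordinate trajectory."""
    pts = np.asarray(pts, dtype=float)            # shape (f, 2): points (x_i, y_i)
    f = len(pts)
    d = np.diff(pts, axis=0)                      # per-frame displacements
    L = np.hypot(d[:, 0], d[:, 1])                # Eq. (1): path lengths l_i
    A = np.arctan2(d[:, 1], d[:, 0])              # Eq. (2): movement angles a_i
    K = d[:, 0] / np.where(d[:, 1] == 0, 1e-9, d[:, 1])  # Eq. (3): slopes k_i
    dt = T / f                                    # frame interval from T and f
    # Eq. (4): average speed over the most recent k frames, for each frame i >= k.
    V = np.array([L[i - k:i].sum() / (k * dt) for i in range(k, f)])
    return L, A, K, V
```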
3.3.2. Mathematical Model of Vehicle Driving Behavior
Taking into consideration the characteristics of highway scenarios and vehicle motion, this study categorizes common highway vehicle motion behaviors into normal and abnormal behaviors. Normal behaviors encompass compliant driving actions such as straight-line driving and lane changing that adhere to road traffic safety regulations. Abnormal behaviors, on the other hand, encompass actions like driving in the opposite direction, speeding, moving at a slow pace, stopping, and making hazardous lane changes. A schematic diagram is illustrated in Figure 3 for reference.
From the vehicle behavior schematic, it can be observed that different abnormal vehicle behaviors correspond to distinct motion characteristics in vehicle trajectories. For a behavior recognition model, the key factors to consider are changes in vehicle position and velocity, as well as vehicle motion direction and angle.
We define vehicles that drive in the correct direction prescribed by road regulations and move away from the camera's imaging view as “downward”, and those approaching the camera as “upward”.
Based on the aforementioned rules and considering the constraints of the application scenario, a mathematical model for vehicle driving behavior can be established. The mathematical model for recognizing downward vehicle behavior is presented in Table 1. With the mathematical model of vehicle behavior in place, we can establish a trajectory-rule-based vehicle behavior detection method to perform behavior detection on the extracted highway vehicle driving trajectories.
3.3.3. Trajectory Acquisition Method for Visible Scenes
In the visible range of the camera, based on the vehicle tracking results and vehicle modeling information, we can obtain accurate vehicle position information and trajectories over the time series. The positional information of vehicles in camera images is represented in the form of pixel coordinates. These pixel coordinates cannot be directly used to analyze the vehicles' trajectories in the real world. Therefore, it is necessary to map the pixel coordinates of vehicles in camera images to their corresponding real-world coordinates, thus obtaining the vehicles' position information in the real world. The mapping between any point in the world coordinate system and its corresponding point in the pixel coordinate system is shown in Equation (5):

$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}, \quad M = \begin{bmatrix} m_{11} & m_{12} & m_{13} & m_{14} \\ m_{21} & m_{22} & m_{23} & m_{24} \\ m_{31} & m_{32} & m_{33} & m_{34} \end{bmatrix}$  (5)
In Equation (5), the projection matrix M is defined only up to scale, so $m_{34}$ can be set to 1; there then remain 11 unknowns in the projection transformation matrix M, which can be solved from the known pixel and world coordinates of reference points. In practical scenarios, the bridge deck can be approximated as a horizontal plane in the world coordinate system. The mapping reference points between the pixel coordinate system and the world coordinate system are illustrated in Figure 4.
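As an illustration of this calibration step, a generic direct linear transform (DLT) sketch for recovering the 11 unknowns of M from reference points is given below; it is a standard least-squares formulation, not the paper's calibration code.

```python
import numpy as np

def solve_projection(world_pts, pixel_pts):
    """Estimate M in Eq. (5) with m34 fixed to 1, from n >= 6 point pairs."""
    rows, rhs = [], []
    for (X, Y, Z), (u, v) in zip(world_pts, pixel_pts):
        # From s*u = row1 . [X,Y,Z,1] and s = row3 . [X,Y,Z,1] with m34 = 1:
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        rhs.append(u)
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        rhs.append(v)
    m, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.append(m, 1.0).reshape(3, 4)     # reattach m34 = 1
```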
When performing vehicle detection and tracking, precise vehicle location information has already been obtained. After coordinate transformation, vehicle trajectories can be accurately represented in two-dimensional space. In this section, the collection of bottom-center coordinates of the tracked vehicle bounding boxes in each frame is selected as the approximate travel trajectory of the vehicle. The specific process of trajectory extraction can be described as follows: the vehicle bounding boxes detected in each frame are used as the basis for trajectory extraction. The bottom-center coordinates are computed as shown in Equation (6):

$(x_b, y_b) = \left(x + \dfrac{w}{2},\ y + h\right)$  (6)

where $(x, y)$ represents the coordinates of the origin, i.e., the upper-left corner of the rectangle, and $w$ and $h$ denote the width and height of the rectangle, respectively. The obtained trajectory is illustrated in Figure 5.
In the $P$-th camera view, the trajectory of a certain vehicle across a consecutive sequence of $n$ frames can be represented by an ordered set of $n$ pixel coordinate points, as shown in Equation (7):

$\mathrm{Traj}^{P} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$  (7)
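A minimal sketch of this extraction step, under the box convention of Equation (6), might look as follows (function and variable names are hypothetical):

```python
def bottom_center(box):
    x, y, w, h = box                   # upper-left origin, width, height
    return (x + w / 2.0, y + h)        # Eq. (6): bottom-center point

def extract_trajectory(boxes_per_frame):
    # One box per frame for a single tracked vehicle -> n points, Eq. (7).
    return [bottom_center(b) for b in boxes_per_frame]
```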
3.3.4. Blind Zone Trajectory Prediction Based on Bidirectional LSTM
In long-distance tunnels, there is a certain blind spot between multiple cameras, which makes it impossible for us to directly obtain vehicle trajectory information and poses a safety hazard in road traffic management. Long Short-Term Memory (LSTM) is a specialized type of Recurrent Neural Network (RNN) architecture that effectively addresses the vanishing gradient and exploding gradient problems. LSTM incorporates input gates, forget gates, and output gates on top of the basic RNN structure. The architecture of LSTM is depicted in Figure 6.
Bidirectional LSTM can simultaneously consider both the forward and backward information of a sequence, thereby enhancing the sequence modeling capability. The network architecture of Bidirectional LSTM is illustrated in Figure 7.
Compared to a traditional LSTM, the Bidirectional LSTM adds a reverse LSTM layer to its structure. The input sequence is processed separately by the forward and backward LSTM layers, and the outputs of the two layers are then concatenated at each time step along the feature dimension. This allows the model to utilize information both before and after the current time step, enabling more comprehensive sequence modeling and prediction.
During the training of the blind-spot vehicle trajectory prediction model based on Bidirectional LSTM, the data are first preprocessed: the feature sequences, composed of time steps, lateral and longitudinal positions of vehicle trajectories, vehicle speed, acceleration, and other information, are transformed into arrays. Each trajectory's feature sequence is partitioned into three segments: the front and rear segments are used as inputs, while the middle segment serves as the expected output. Subsequently, the Bidirectional LSTM model is trained. During each forward pass, the hidden states of the model are computed and predictions are generated. The error between the predictions and the ground truth is then calculated and used in the backward pass, and the model's parameters are updated by gradient descent to minimize the objective function and complete the training process.
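A minimal Bidirectional LSTM predictor and a single training step, reflecting the scheme above, might look as follows in PyTorch; the feature layout (time step, lateral/longitudinal position, speed, acceleration), layer sizes, and segment lengths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BlindZonePredictor(nn.Module):
    def __init__(self, in_dim=5, hidden=64, out_len=20, out_dim=2):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True, bidirectional=True)
        # Forward and backward hidden states are concatenated -> 2 * hidden.
        self.head = nn.Linear(2 * hidden, out_len * out_dim)
        self.out_len, self.out_dim = out_len, out_dim

    def forward(self, x):                    # x: (batch, seq, in_dim)
        h, _ = self.lstm(x)                  # (batch, seq, 2 * hidden)
        y = self.head(h[:, -1])              # predict the missing middle segment
        return y.view(-1, self.out_len, self.out_dim)

model = BlindZonePredictor()
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# One gradient-descent update: input is the front and rear segments
# concatenated along time; target is the middle (blind-zone) segment.
x, target = torch.randn(8, 40, 5), torch.randn(8, 20, 2)
opt.zero_grad()
loss_fn(model(x), target).backward()
opt.step()
```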
3.3.5. Definition of Abnormal Trajectory Rule
In this paper, we mainly define four types of abnormal vehicle trajectory behaviors: wrong-way driving, abnormal vehicle speed, abnormal vehicle parking, and dangerous vehicle lane changes.
Identification of wrong-way driving behavior: this refers to the abnormal behavior of a vehicle whose actual direction of travel is opposite to the prescribed road direction. Determining wrong-way behavior primarily involves assessing both the position and the angle of vehicle movement. To determine whether a vehicle is driving the wrong way at frame i, it is first necessary to ascertain that the set of vehicle movement paths is non-empty and whether the monitoring camera is in an upstream or downstream state. The determination of wrong-way behavior is then based on the set of vehicle movement angles. In the downstream state, normally driving vehicles move away from the camera, so the road-regulation direction, the vehicle movement direction, and the positive y-axis of the coordinate calibration are consistent, and the vehicle movement angles fall within the normal range defined for that direction. A wrong-way frame counter Num is set up and incremented by 1 upon detecting a wrong-way frame. When Num exceeds a preset threshold, the vehicle is classified as driving the wrong way. Considering the limited availability of wrong-way datasets, this study inverts the upstream/downstream configuration and uses the normal dataset to detect wrong-way behavior in the reverse direction.
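A hedged sketch of this rule is given below; the normal angle ranges and the counter threshold are placeholders, since the paper's exact values are not reproduced here.

```python
import math

def is_wrong_way(angles, downstream=True, num_thresh=10):
    """Count wrong-way frames from the movement-angle set A (radians)."""
    num = 0
    for a in angles:
        # For downstream cameras, normal motion roughly follows the +y axis,
        # i.e., angles in (0, pi) under the arctan2 convention used above.
        normal = (0.0 < a < math.pi) if downstream else (-math.pi < a < 0.0)
        if not normal:
            num += 1                  # wrong-way frame counter Num
    return num > num_thresh
```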
Identification of abnormal vehicle speed behavior: this refers to instances where a vehicle does not adhere to the prescribed speed limits, encompassing both speeding and excessively slow driving. Both behaviors pose significant risks in real-world traffic scenarios. In the experiment for detecting abnormal vehicle speed behavior, given the diversity of vehicle types, distinct permissible speed thresholds apply. Thus, it is necessary to first determine the current vehicle category and travel lane, and then establish the minimum speed “m” and maximum speed “n” allowed by the road. Subsequently, based on the speed calculation method defined in the mathematical description of vehicle trajectories, the vehicle speed is computed to ascertain whether abnormal speed behavior is present. Considering the driving characteristics of vehicles in the research scenario, abrupt changes in vehicle speed are unlikely to occur within a brief period, so speeding and slow-driving anomalies are unlikely to occur simultaneously. For the calculation, the set of average vehicle speeds V is computed with k = 10, i.e., using the average speed over the most recent 10 frames. A counter is incremented by 1 for each speed-anomalous frame. If 5 consecutive frames exhibit speed anomalies, the vehicle is concluded to be engaged in abnormal speed behavior.
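The speed rule can be sketched as follows; the limits m and n are placeholders that would in practice come from the vehicle class and travel lane.

```python
def speed_anomaly(avg_speeds, m=60.0, n=120.0, consec=5):
    """Flag speeding or slow driving from the speed set V (k = 10 averages)."""
    run = 0
    for v in avg_speeds:
        run = run + 1 if (v < m or v > n) else 0   # reset on a normal frame
        if run >= consec:
            return True          # 5 consecutive anomalous frames -> abnormal speed
    return False
```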
Identification of vehicle parking anomalies: in highway scenarios, normal vehicle speeds are very high, making parking behavior highly perilous. Although parking is a continuous process involving gradual speed reduction and slow trajectory changes, we focus solely on identifying the eventual parked outcome. Parking behavior is recognized by the condition that the vehicle's position no longer changes: when the position remains unchanged for 5 consecutive frames, the vehicle is deemed to have parked.
Identification of dangerous vehicle lane changes: in highway scenarios, dangerous lane changes are highly hazardous driving behaviors that often result in severe traffic accidents. Rule-based recognition of dangerous lane-change behavior is primarily concerned with the vehicle's steering angle during travel; it mainly detects actions such as U-turns and excessively large steering angles during lane changes, which are highly likely to lead to dangerous situations. A threshold value for the vehicle trajectory path slope, denoted as “u”, is established. When the path slope exceeds this threshold, the vehicle's travel direction is recognized as having deviated significantly from the defined coordinate vertical axis, indicating a dangerous lane change. The maximum steering angle during a lane change generally depends on the vehicle's speed and type.
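The parking and lane-change rules can be sketched together as follows; the static-position tolerance and the slope threshold u are placeholder values, not the paper's settings.

```python
import numpy as np

def is_parked(pts, frames=5, eps=1e-2):
    """Parked if the position stays (numerically) unchanged for 5 consecutive frames."""
    pts = np.asarray(pts[-frames:])
    return len(pts) == frames and np.ptp(pts, axis=0).max() < eps

def dangerous_lane_change(slopes, u=2.0):
    # Per the rule above, a slope k_i beyond u means the heading has deviated
    # significantly from the calibrated vertical (road) axis.
    return any(abs(k) > u for k in slopes)
```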