3.3.1. Dynamic Node Edge Generation Based on Relative Angles
Dynamic nodes usually refer to moving objects such as vehicles on the road. In vehicle trajectory prediction, the positions and speeds of dynamic nodes change over time, and these dynamics and their trends must be taken into account; the edge connection scheme for dynamic nodes therefore needs to reflect these node characteristics.
GAT is a graph neural network model that is widely used for trajectory prediction tasks. GAT can learn the dynamic relationships between nodes (e.g., vehicles or pedestrians) in a traffic network and predict their motion trajectories [31]. In the GAT model, nodes denote traffic vehicles and edges denote interactions between nodes. Through a multilayer attention mechanism, GAT dynamically learns how important each neighboring node is to a given node and aggregates information and predicts trajectories accordingly [32]. The learning formula is as follows:
In Equations (3) and (4), W is a learnable matrix, and the activation function is generally a two-layer multilayer perceptron in a graph neural network. After the edge weights are calculated, they are normalized, and the required attention is computed according to Equation (5). The overall formula is as follows:
where the transformed feature is a linear transformation of the node feature at the previous time step. Equation (5) thus focuses on the influence of distance on the attention coefficient, whereas in real traffic scenes distance is not the only key factor. Moreover, under the GAT formulation every node takes part in the graph computation, which distracts attention and increases the computational cost.
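For reference, the standard single-head GAT attention can be sketched in NumPy as below. This is a minimal sketch of the textbook formulation (LeakyReLU scoring on concatenated projected features, softmax over neighbors), not a transcription of Equations (3)–(5); the function name `gat_layer`, the self-loop convention, and the parameter shapes are illustrative assumptions.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(h, adj, W, a):
    """Single-head GAT attention (standard formulation, illustrative sketch).

    h:   (N, F) node features at the previous time step
    adj: (N, N) boolean adjacency; adj[i, j] is True if j is a neighbor of i
    W:   (F, F2) learnable projection
    a:   (2 * F2,) learnable attention vector
    """
    adj = adj | np.eye(len(h), dtype=bool)       # self-loops so every row has a neighbor
    z = h @ W                                    # linear transformation of node features
    f2 = z.shape[1]
    # e_ij = LeakyReLU(a^T [z_i || z_j]), split into the z_i and z_j halves of a
    e = leaky_relu((z @ a[:f2])[:, None] + (z @ a[f2:])[None, :])
    e = np.where(adj, e, -np.inf)                # non-neighbors get zero weight
    e = e - e.max(axis=1, keepdims=True)         # numerical stability
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)   # softmax over each node's neighbors
    return alpha @ z, alpha                      # aggregated features (before final nonlinearity)
```

Every node listed in `adj` competes in the softmax, which illustrates the point above: without a filter on the edges, attention is spread over all connected nodes.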
In this paper, we address this problem by introducing a relative position factor into the above GAT equation. The relative position information between vehicles is integrated into the computation of the attention weights according to Equations (6) and (7), and the angular relationship between dynamic vehicles is added to the model as a feature. The specific formulas are as follows:
where the activation function is again generally a two-layer multilayer perceptron in a graph neural network. The query, key, and value (Q, K, and V) of the attention are then obtained through three learnable projection matrices according to Equation (8), and the weights of the edges are computed as
where the angular term is the angular characterization defined in Equation (9):
where the matrix in Equation (9) is learnable and the angular term captures the effect of the relative angle factor. Based on the understanding of long-range interactions between vehicles, when the relative angle between two vehicles is small (e.g., 0, when the vehicles are moving in parallel), less attention is required; when the relative angle between the two vehicles is large, more attention is required.
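The mechanism can be illustrated with a scaled dot-product attention in which a relative-angle term is added to the Q–K score. This is a minimal sketch under stated assumptions, not the paper's Equations (8) and (9): the angle factor is assumed to be `w_angle * (1 - cos θ) / 2`, which is 0 for parallel vehicles and grows with the relative angle, and `w_angle`, `angle_aware_attention`, and the convention that row 0 is the target vehicle are all hypothetical.

```python
import numpy as np

def relative_angle(heading_i, heading_j):
    """Absolute heading difference wrapped to [0, pi] (radians)."""
    d = np.abs(heading_i - heading_j) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def angle_aware_attention(x, headings, Wq, Wk, Wv, w_angle, neighbors):
    """Attention of one target vehicle over its neighbors with a relative-angle bias.

    x:          (N, F) vehicle features; row 0 is the target vehicle (assumption)
    headings:   (N,) heading angles in radians
    Wq, Wk, Wv: (F, D) learnable projections
    w_angle:    hypothetical learnable scalar scaling the angle factor
    neighbors:  indices of candidate neighbor vehicles
    """
    q = x[0] @ Wq                              # query of the target vehicle
    k = x[neighbors] @ Wk                      # keys of the neighbors
    v = x[neighbors] @ Wv                      # values of the neighbors
    scores = k @ q / np.sqrt(q.shape[0])       # scaled dot-product scores
    theta = relative_angle(headings[0], headings[neighbors])
    # Assumed angle factor: 0 when parallel, larger for larger relative angles
    scores = scores + w_angle * (1 - np.cos(theta)) / 2
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over neighbors
    return weights @ v, weights
```

With `w_angle = 0` the sketch reduces to plain dot-product attention; a positive `w_angle` shifts weight toward neighbors whose headings differ more from the target's.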
By introducing the angular features between vehicles, we optimize and improve the original Transformer model. In the original model, the attention mechanism relies only on distance, which makes it difficult to fully capture vehicles whose paths intersect. The relative angle factor lets the model account more fully for the positional relationship between vehicles and thus improves the edge weights, which in turn improves the model's understanding of scenarios such as intersections and increases the accuracy and robustness of trajectory prediction.
As shown in Figure 5, three vehicles can be observed: green vehicle A, blue vehicle B, and orange vehicle C. Vehicle A is closer to vehicle B than vehicle C is. Under the traditional attention mechanism, when the attention weights of the edges of vehicle B (the vehicle to be predicted) are calculated, the edge between vehicles A and B therefore receives a larger weight than the edge between vehicles B and C. However, in real traffic scenarios, vehicle B needs to pay more attention to the trajectory of vehicle C to avoid a possible collision and to adjust its position for a reasonable turning path. The figure shows the approximate angles between vehicles A and B and between vehicles B and C, the latter being larger. By calculating the angles of vehicle B with vehicles A and C and adding them to the attention mechanism, the attention weights can be adjusted so that the attention between vehicles B and C is moderately increased.
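The Figure 5 scenario can be reproduced numerically with made-up coordinates. In this sketch the positions, headings, the distance-only score `-dist / 5.0`, and the angle factor `3.0 * (1 - cos θ) / 2` are all hypothetical choices for illustration; only the qualitative effect (weight shifting from A toward C once the angle is included) mirrors the text.

```python
import numpy as np

# Hypothetical scene loosely following Figure 5: positions in metres, headings in radians.
pos = {"A": np.array([3.0, 1.0]), "B": np.array([0.0, 0.0]), "C": np.array([8.0, -6.0])}
heading = {"A": 0.0, "B": 0.0, "C": np.pi / 2}   # A parallel to B, C crossing B's path

def relative_angle(h1, h2):
    """Absolute heading difference wrapped to [0, pi]."""
    d = abs(h1 - h2) % (2 * np.pi)
    return min(d, 2 * np.pi - d)

def softmax(s):
    e = np.exp(s - np.max(s))
    return e / e.sum()

others = ["A", "C"]
dist = np.array([np.linalg.norm(pos[o] - pos["B"]) for o in others])
theta = np.array([relative_angle(heading["B"], heading[o]) for o in others])

distance_only = softmax(-dist / 5.0)              # assumed score: closer means larger weight
angle_bias = 3.0 * (1 - np.cos(theta)) / 2        # assumed angle factor, 0 for parallel
with_angle = softmax(-dist / 5.0 + angle_bias)

for name, w0, w1 in zip(others, distance_only, with_angle):
    print(f"B->{name}: distance only {w0:.3f}, with angle {w1:.3f}")
```

With distance alone, vehicle A dominates B's attention; adding the angle term moves the larger share to vehicle C, while the distance term still keeps the change moderate.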
After the edge weights are calculated, a first-order subgraph centered on the vehicle to be predicted is generated according to Equations (10)–(12) as follows:
where the neighbor set contains the neighbors of the predicted vehicle, and a learnable matrix fuses the edge-weight features between vehicles with the vehicles' own features through an activation function. Finally, edges are connected through a selection gate function [33], which produces the output edges. The graph obtained after gate filtering is a first-order adjacency graph centered on the predicted target. The gate function controls the number of edges and makes the generated edges more reasonable.
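A gated first-order subgraph of this kind can be sketched as follows. This is an assumption-laden illustration rather than Equations (10)–(12): the fusion is taken to be a ReLU of a learnable matrix applied to the concatenated edge-weight feature and the two vehicles' features, the gate is a sigmoid, and `gated_first_order_edges`, `W_f`, `w_g`, and the `threshold` of 0.5 are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_first_order_edges(target, neighbors, h, edge_w, W_f, w_g, threshold=0.5):
    """Keep edges neighbor -> target whose gate value exceeds `threshold`.

    h:      (N, F) vehicle features
    edge_w: dict mapping neighbor index -> edge-weight feature vector of shape (E,)
    W_f:    (H, E + 2F) learnable fusion matrix (hypothetical shape)
    w_g:    (H,) learnable gate vector (hypothetical)
    """
    kept = []
    for j in neighbors:
        z = np.concatenate([edge_w[j], h[target], h[j]])
        fused = np.maximum(W_f @ z, 0.0)       # fuse edge and node features (ReLU assumed)
        gate = sigmoid(w_g @ fused)            # selection gate value in (0, 1)
        if gate > threshold:
            kept.append((j, target, float(gate)))
    return kept                                # edges of the star graph centered on `target`
```

Because every kept edge points at `target`, the result is a first-order (star-shaped) subgraph, and raising `threshold` directly reduces the number of edges.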
Graph neural networks suffer from state-space explosion when dealing with large-scale data. This is because graph convolution is performed by having each node aggregate information from its neighborhood to update its own state, so a linear increase in the number of nodes and edges in the graph may lead to an exponential increase in computation. The dynamic edge generation module limits the number of generated edges by using the physical state of the vehicles (e.g., speed and relative angle): a gate function is applied to the computed edge weights to filter out edges that need not be generated for dynamic objects such as vehicles. Reducing the number of connected edges greatly reduces the number of updates and correspondingly reduces the floating point operations and training time required, which mitigates the state-space explosion problem. The experimental section compares the floating point operations and testing times before and after applying the method and provides a graph comparing the number of updated nodes with the total number of nodes.
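A rough cost estimate makes the effect of edge filtering concrete. Assuming one message-passing layer with feature dimension d, a fully connected graph over N vehicles, and at most k edges kept per vehicle by the gate (N = 50 and k = 8 are hypothetical values, not results from this paper):

```latex
\[
\text{cost per layer} \approx \underbrace{|V|\,d^{2}}_{\text{projections}} + \underbrace{|E|\,d}_{\text{messages}},
\qquad |E|_{\text{full}} = N(N-1), \qquad |E|_{\text{gated}} \le Nk,
\]
\[
N = 50,\; k = 8:\quad |E|_{\text{full}} = 50 \times 49 = 2450,
\qquad |E|_{\text{gated}} \le 50 \times 8 = 400 .
\]
```

Under these assumptions the message term shrinks by a factor of more than six, while the per-node projection term is unchanged.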
3.3.2. Static Node Edge Generation Based on Length Thresholding
In traffic scenarios, static nodes are objects or locations with a fixed position in space, such as road junctions, buildings, and traffic signs. Unlike dynamic nodes, the positions of static nodes do not change over time, so their motion characteristics usually do not need to be considered in trajectory prediction or related tasks. Among static nodes, local map information, such as the position and direction of lane lines, has a strong impact on the future trajectory of the vehicle; that is, local map information can represent the vehicle's future intention. Given these characteristics, the distance between a static node and the predicted object is an important factor for vehicle trajectory prediction, so a length threshold is adopted as the edge generation strategy for static nodes: whether to connect an edge between the vehicle and a lane line is decided by comparing their distance with a maximum edge length. The specific steps are as follows:
First, the relative position information between the vehicle and the map lane lines is calculated, as shown in Equation (13):
where Equation (13) uses the MLP encoder of the lane segment together with the start position, end position, and feature vector of that lane segment. The spatio-temporal features of the predicted vehicle are used as the query input [34], the MLP-encoded lane segment features are used as the key/value inputs, and the weights between the vehicle and the lane lines are calculated.
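The query/key step can be sketched as a cross-attention between one vehicle and M lane segments. This is a minimal sketch, not Equation (13): it assumes the segment endpoints are expressed relative to the vehicle position before a two-layer ReLU MLP encodes them, and `lane_attention_scores`, `mlp`, and the parameter shapes are hypothetical.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """Two-layer MLP with ReLU, applied row-wise."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def lane_attention_scores(veh_pos, veh_feat, seg_start, seg_end, seg_feat, params, Wq):
    """Cross-attention scores between one vehicle (query) and M lane segments (keys).

    veh_pos: (2,) vehicle position; veh_feat: (D,) vehicle spatio-temporal feature
    seg_start, seg_end: (M, 2) segment endpoints; seg_feat: (M, F) segment features
    params: (W1, b1, W2, b2) of the lane MLP encoder with output dimension D (hypothetical)
    Wq: (D, D) query projection (hypothetical)
    """
    # Assumption: endpoints are made relative to the vehicle, so scores are translation-invariant
    rel = np.concatenate([seg_start - veh_pos, seg_end - veh_pos, seg_feat], axis=1)
    keys = mlp(rel, *params)                   # (M, D) encoded lane segments (keys/values)
    q = veh_feat @ Wq                          # (D,) vehicle query
    return keys @ q / np.sqrt(q.shape[0]), keys
```

Encoding positions relative to the vehicle is one common design choice; it means shifting the whole scene leaves the scores unchanged.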
Next, a hyperparameter threshold L sets the maximum length of the edges connecting the vehicle and a lane line, and l denotes the distance between the vehicle and the lane line. When l ≤ L, the weight between the vehicle and the lane line is computed by softmax, the edges between the vehicle node and the lane-line nodes are generated based on these weights, and a first-order subgraph centered on the predicted vehicle is built.
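The thresholding step can be sketched as a mask followed by a softmax over the surviving lane segments. The sketch assumes that l is the Euclidean distance from the vehicle to the nearest point of each lane segment; the paper does not specify how l is measured, and `threshold_lane_edges` and `point_segment_distance` are hypothetical names.

```python
import numpy as np

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab (2-D)."""
    ab = b - a
    denom = ab @ ab
    t = 0.0 if denom == 0 else np.clip((p - a) @ ab / denom, 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def threshold_lane_edges(veh_pos, seg_start, seg_end, scores, L):
    """Connect the vehicle to lane segments with l <= L; softmax over the kept ones."""
    dists = np.array([point_segment_distance(veh_pos, a, b)
                      for a, b in zip(seg_start, seg_end)])
    keep = np.flatnonzero(dists <= L)          # edges only where l <= L
    if keep.size == 0:
        return {}                              # no lane line within the threshold
    s = scores[keep]
    w = np.exp(s - s.max())
    w /= w.sum()                               # softmax restricted to kept segments
    return {int(j): float(wj) for j, wj in zip(keep, w)}
```

Segments beyond L are dropped before the softmax, so a distant lane line cannot receive weight however high its score is; L thus bounds both the number and the length of static edges.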
Compared with the traditional GAT, this static edge generation strategy determines the edge connections between vehicles and lane lines with a length threshold. Because only lane lines within the threshold are connected, the strategy is computationally efficient. In addition, the hyperparameter L flexibly controls the maximum distance of the edges between vehicles and lane lines, and thus the number and length of the generated static edges, which further optimizes the performance of the prediction model.