1. Introduction
In recent years, with the rapid development of autonomous driving technology, lane change intention recognition has become a prominent topic in research on advanced driver assistance systems (ADASs). As in other domains, such as aerial vision-and-language navigation [1], lane change intention recognition requires intelligent systems to process large-scale sequential data and to reason about spatial and temporal relationships. Accurate lane change intention recognition plays a crucial role in autonomous driving systems, directly affecting driving safety and system decision-making [2]. A deep understanding of the motion intentions of traffic participants is therefore essential for enabling autonomous vehicles to make better decisions and plan paths, particularly in mixed-traffic environments, thereby enhancing the effectiveness of collision avoidance systems and overall traffic safety [3].
Current driving intention recognition methods can be broadly categorized into two types: traditional methods and data-driven methods. Traditional approaches often employ probabilistic models, but these are usually constrained by their underlying assumptions. With advances in machine learning and deep learning, data-driven methods have shown superior predictive accuracy [4]. For example, Kumar et al. [5] proposed an online lane change intention prediction method based on support vector machines (SVMs) and Bayesian filtering, but this approach does not provide posterior probability values. To address this issue, Yang et al. [6] introduced a lane change intention classifier based on relevance vector machines (RVMs), which predicts surrounding vehicles' lane changes more quickly and accurately than radar sensors and SVMs while improving model convergence. Additionally, Song et al. [7] proposed a hybrid HMM-SVM model to improve lane change intention recognition rates using driver operation data. Other models include HMM-BF [8] and GA-HMM [9].
Beyond the progress achieved with machine learning, the development of deep learning has brought further advances to lane change intention recognition. Recurrent neural networks, such as Long Short-Term Memory (LSTM) networks, have shown excellent performance on sequential tasks and have improved the accuracy of driving intention recognition [10]. Numerous studies have built LSTM-based intention prediction models [11,12,13,14]. Patel et al. [15] proposed a method based on recurrent neural networks (RNNs) and graphical models to predict the future lane change intentions of other vehicles on highways. Altché et al. [16] introduced an LSTM-based recurrent neural network model that fuses GPS, IMU, and odometry data to recognize driver intentions when entering intersections. Ji et al. [17] designed an LSTM-based driving intention recognition and vehicle trajectory prediction model that outperformed traditional methods. Yang et al. [4] utilized an LSTM network with spatiotemporal attention mechanisms to accurately predict lane change behavior within five seconds on the NGSIM dataset. Izquierdo et al. [11] proposed an integrated method using convolutional neural networks (CNNs) and LSTM to predict the lane change intentions of surrounding vehicles, demonstrating good generalizability. Gao et al. [18] introduced a lane change intention prediction algorithm that combines a multi-head attention mechanism with CNN and LSTM, significantly improving prediction accuracy by processing lateral position information and surrounding environmental data on the NGSIM dataset. Li et al. [19] proposed a temporal recognition model based on a Bi-GLSTM network that improved lane change intention recognition accuracy by incorporating a graph attention neural network, and validated the model's effectiveness on the HighD dataset. Gao et al. [20] developed a dual Transformer model that combines multi-head attention mechanisms with LSTM, significantly enhancing prediction accuracy on the NGSIM and HighD datasets; however, such methods primarily focus on the spatial correlations of interacting vehicles while neglecting temporal dependencies. Other researchers [14] proposed a framework based on inverse reinforcement learning and bidirectional recurrent neural networks that focuses on the intentions and trajectory predictions of vulnerable road users in urban traffic environments, achieving an average displacement error significantly lower than that of other baseline models. Deep learning has also shown great potential for other complex transportation problems. For instance, residual neural networks have been applied to estimate origin–destination (O-D) trip matrices from link flow data, effectively addressing challenges such as limited sensor deployment and indirect O-D flow data [21]. Another study introduced a stacked sparse autoencoder (SAE) framework to simultaneously estimate O-D flows and optimize sensor placement, achieving accurate predictions with fewer sensors [22].
Although research on driving behavior perception and intention recognition has made progress, there remain challenges in modeling methods. These include (1) the need for improved model scalability and generalization capabilities; (2) the inability to account for interactions between vehicles, resulting in reduced recognition accuracy in special scenarios; and (3) the need for enhanced robustness in data-driven methods despite some success. Therefore, considering the advantages and disadvantages of existing models and the importance of interaction information, this paper proposes a combined model integrating the Transformer encoder and BiGRU, along with an additive attention mechanism, to identify vehicle lane change intentions.
2. Method
In the context of intelligent connected transportation environments, the advancement of sensor technology has significantly expanded the scale of vehicle feature parameter sequences. This phenomenon has led to an increase in potentially irrelevant information within the data, thereby imposing higher demands on the effectiveness of models. Consequently, models must not only possess the capability to handle large-scale data, but also effectively extract temporal features to ensure accurate recognition of lane change intentions. To address this, this paper proposes a novel lane change intent recognition model, Model_TA, which integrates a Transformer encoder, a bidirectional gated recurrent unit (BiGRU), and an additive attention mechanism, aimed at enhancing the accuracy and robustness of lane change intent recognition. Its structure is illustrated in Figure 1.
First, the Transformer encoder utilizes a self-attention mechanism for effective extraction of global features when processing sequential data. This mechanism captures dependencies between different time steps, thereby identifying key vehicle behavioral characteristics within complex traffic scenarios. Through parallel computation, the Transformer significantly improves the training efficiency of the model, reducing the distance of information transmission and laying the groundwork for real-time applications. Secondly, the BiGRU demonstrates unique advantages in capturing temporal information. By considering both historical and future information, the BiGRU enables the model to comprehensively understand the context of vehicle behavior. This bidirectional characteristic enhances the model’s ability to capture long-range dependencies, thereby improving the accuracy of lane change intent recognition and the timeliness of decision-making. Furthermore, the additive attention mechanism endows the model with dynamic weight allocation capabilities, allowing it to focus on the most relevant features during the feature extraction process. This mechanism effectively reduces the interference of irrelevant information on the model’s judgment, optimizing the identification of key features and enhancing the model’s adaptability in complex environments. Finally, the Softmax function is employed to compute the driving intentions of target vehicles, providing probability distributions for behaviors such as left lane change, straight driving, and right lane change. Through this probabilistic output, Model_TA not only provides clear decision-making foundations for intelligent driving systems, but also enhances the system’s interpretability.
While the Transformer encoder, BiGRU, and attention mechanisms have been applied in various contexts, the novelty of Model_TA lies in its tailored integration for lane change intent recognition. Specifically, the use of the Transformer encoder enables effective global feature extraction from sequential data, addressing long-range dependencies in vehicle behavior. The BiGRU component enhances temporal understanding by incorporating both historical and future information, while the additive attention mechanism dynamically assigns weights to relevant features, mitigating the impact of noisy or irrelevant information. This unified framework is specifically designed to handle the challenges of large-scale, noisy, and temporally complex traffic datasets. Furthermore, the probabilistic outputs of Model_TA provide clear and interpretable decision foundations for intelligent driving systems, distinguishing it from existing approaches in terms of accuracy, robustness, and applicability to real-world traffic scenarios.
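To make the pipeline concrete, the following is a minimal PyTorch sketch of how the three components could be chained. The class name, layer sizes, and the omission of positional encoding are illustrative assumptions for readability, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ModelTA(nn.Module):
    """Sketch of the Model_TA pipeline:
    Transformer encoder -> BiGRU -> additive attention -> classifier.
    Sizes are illustrative; positional encoding is omitted for brevity."""
    def __init__(self, feat_dim=18, d_model=128, n_heads=4,
                 gru_hidden=64, n_classes=3):
        super().__init__()
        self.input_proj = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dropout=0.1, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.bigru = nn.GRU(d_model, gru_hidden,
                            batch_first=True, bidirectional=True)
        # additive attention over the BiGRU outputs
        self.att_w = nn.Linear(2 * gru_hidden, 2 * gru_hidden)
        self.att_v = nn.Linear(2 * gru_hidden, 1, bias=False)
        self.classifier = nn.Linear(2 * gru_hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, feat_dim)
        h = self.encoder(self.input_proj(x))   # global features via self-attention
        h, _ = self.bigru(h)                   # bidirectional temporal context
        scores = self.att_v(torch.tanh(self.att_w(h)))  # (batch, time, 1)
        alpha = torch.softmax(scores, dim=1)            # attention weights
        context = (alpha * h).sum(dim=1)                # weighted context vector
        return self.classifier(context)                 # class logits

model = ModelTA()
logits = model(torch.randn(8, 30, 18))   # 8 sequences, 30 steps, 18 features
probs = torch.softmax(logits, dim=-1)    # left change / straight / right change
```

The final softmax corresponds to the probabilistic output described above; during training, the raw logits would typically be passed to a cross-entropy loss instead.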
2.1. Transformer Encoder
The Transformer model was originally proposed by Vaswani et al. [23]. It consists of two parts: an encoder for processing the input sequence and a decoder for generating the output sequence. This paper primarily utilizes the encoder. The Transformer encoder is mainly composed of a multi-head attention mechanism and a feedforward neural network.
The basic building block of the multi-head attention layer is the scaled dot-product attention unit. When the sequence is passed through this unit, the attention weights are computed simultaneously across each row. The result not only contains information about the row itself, but also includes a weighted combination of other relevant rows. In this paper, the time series is passed through positional encoding before entering the multi-head attention area of the Transformer encoder. It undergoes three independent linear transformations to obtain the query matrix Q, the key matrix K, and the value matrix V, calculated as follows:
$$Q = XW^{Q}, \quad K = XW^{K}, \quad V = XW^{V}$$
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
In the formulas, $d_k$ represents the dimension of the key vectors and is used to scale the dot-product results.
The multi-head self-attention mechanism concatenates the outputs of multiple attention heads and applies a linear transformation to obtain the final multi-head attention output:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^{O}$$
In the formula, $W^{O}$ represents the output weight matrix, which linearly transforms the concatenated outputs of the multiple attention heads into the final output of the multi-head attention mechanism.
The multi-head attention mechanism allows for the parallel computation of several independent attention heads, enabling the model to simultaneously focus on different parts of the input sequence across various representation subspaces. This capability helps capture richer and more diverse features. Such a mechanism significantly enhances the performance of the Transformer encoder in handling complex sequence data tasks.
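As a concrete illustration of this parallelism, the short sketch below runs self-attention with PyTorch's built-in nn.MultiheadAttention; the dimensions are assumptions for demonstration.

```python
import torch
import torch.nn as nn

seq = torch.randn(8, 30, 128)   # (batch, time steps, d_model)
mha = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
# Self-attention: Q, K, and V all come from the same sequence, and each of
# the 4 heads attends over its own 32-dimensional subspace in parallel.
out, weights = mha(seq, seq, seq)
print(out.shape, weights.shape)  # (8, 30, 128) and (8, 30, 30)
```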
2.2. Bidirectional Gated Recurrent Unit
The GRU model replaces the forget gate and input gate of the LSTM model with an update gate. The calculation formulas for the GRU are as follows:
$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$$
$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$$
$$\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t])$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
In the GRU model, $z_t$ represents the update gate state at time t, while $r_t$ denotes the reset gate state at the same time. The sigmoid function $\sigma$ is used to compute these gates, with $W_z$, $W_r$, and $W$ serving as the corresponding weight matrices. The hidden state from the previous time step is $h_{t-1}$, and the current input is $x_t$. The hyperbolic tangent function tanh generates the candidate hidden state $\tilde{h}_t$, which, together with the update gate, produces the final hidden state $h_t$ at time t.
For a unidirectional GRU neural network, the state is propagated from the past to the future, which can be a disadvantage when processing sequential information, because in some cases the information in the sequence is influenced by both previous and subsequent inputs. To address this issue, a bidirectional gated recurrent unit (BiGRU) is employed. The bidirectional GRU neural network builds on the advantages of bidirectional RNNs and LSTMs, leading to further improvements. Its structure is illustrated in Figure 2.
The calculation formulas for the bidirectional GRU are as follows:
$$\overrightarrow{h}_t = \mathrm{GRU}(x_t, \overrightarrow{h}_{t-1})$$
$$\overleftarrow{h}_t = \mathrm{GRU}(x_t, \overleftarrow{h}_{t+1})$$
$$h_t = W_t \overrightarrow{h}_t + V_t \overleftarrow{h}_t + b_t$$
In the equations, $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ represent the outputs of the forward and backward hidden layers, respectively; $W_t$ and $V_t$ are the weights for the forward and backward hidden states at time t; and $b_t$ is the corresponding bias.
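A minimal PyTorch sketch of a bidirectional GRU follows. Note that nn.GRU concatenates the forward and backward hidden states at each step, so the learned combination with $W_t$, $V_t$, and $b_t$ above corresponds to an additional linear layer; all sizes are illustrative.

```python
import torch
import torch.nn as nn

# bidirectional=True runs one GRU forward in time and one backward,
# concatenating the two hidden states at every step.
bigru = nn.GRU(input_size=128, hidden_size=64,
               batch_first=True, bidirectional=True)
combine = nn.Linear(2 * 64, 64)   # plays the role of W_t, V_t, b_t above

x = torch.randn(8, 30, 128)       # (batch, time, features)
out, _ = bigru(x)                 # out: (8, 30, 128) = forward 64 + backward 64
h = combine(out)                  # combined hidden state per time step
```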
2.3. Additive Attention
The additive attention mechanism was selected due to its ability to mitigate scaling issues and improve model stability compared to standard dot-product attention mechanisms. By calculating attention scores through a learned feedforward layer, additive attention avoids large variance in vector magnitudes, which is particularly advantageous for processing noisy and heterogeneous data in real-world traffic scenarios.
Additive attention primarily serves to select the most relevant information from the encoder’s output for decoding at the current time step. It achieves this by calculating attention weights, which dynamically weigh different parts of the input sequence to generate a context vector. This mechanism enables the model to selectively focus on the most pertinent sections when processing long sequences, thereby enhancing the accuracy of the task. The expression for additive attention is as follows:
$$e_{ij} = v^{T}\tanh(W s_{i-1} + U h_j)$$
$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T}\exp(e_{ik})}$$
$$c_i = \sum_{j=1}^{T} \alpha_{ij} h_j$$
In the expressions, T represents the length of the input sequence, while $e_{ij}$ denotes the degree of association between the hidden state $s_{i-1}$ of the decoder at time i − 1 and the hidden state $h_j$ of the encoder at position j. The term $\alpha_{ij}$ represents the attention probability distribution, and $c_i$ indicates the context vector at the current time step. Together, these components enable the model to effectively focus on the most relevant parts of the input sequence during the decoding process.
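A minimal sketch of this mechanism is shown below. Because Model_TA is a classifier rather than a sequence-to-sequence decoder, this sketch folds the decoder state $s_{i-1}$ into the learned layers so the scores depend only on the encoder states; that simplification is our assumption.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Additive (Bahdanau-style) attention: scores come from a small
    feedforward layer, v . tanh(W h), rather than a dot product."""
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, 1, bias=False)

    def forward(self, h):                       # h: (batch, T, dim)
        scores = self.v(torch.tanh(self.w(h)))  # e_j, one score per step
        alpha = torch.softmax(scores, dim=1)    # attention probabilities
        return (alpha * h).sum(dim=1)           # context vector c

context = AdditiveAttention(128)(torch.randn(8, 30, 128))  # (8, 128)
```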
3. Data Preprocessing
This study selects data from the NGSIM (Next-Generation Simulation) dataset, specifically from the I-80 segment, for training and testing the model. The NGSIM dataset, developed by the Federal Highway Administration (FHWA) of the United States, aims to support traffic simulation and research. It has a sampling frequency of 10 Hz and records various information, including vehicle coordinates, speed, acceleration, vehicle type, and lane number. The selected data cover a 1200-foot segment that includes six regular lanes and one merging ramp.
3.1. Trajectory Data Processing
The NGSIM dataset is collected from real traffic environments and is subject to various noise influences, such as sensor errors and environmental interference, which can adversely affect subsequent data analysis and model training. To address this, missing data points were first interpolated using a linear method to ensure continuity. Then, a Savitzky–Golay (SG) filter was applied to the raw data for smoothing. The SG filter is a fitting method based on polynomial least squares that helps produce smoother and more continuous data, facilitating feature extraction and model training. A filtering window length of 41 and a polynomial fitting order of 3 were chosen after considering the dataset characteristics and balancing noise reduction against data fidelity. As an example, the trajectory data of a lane changing vehicle (vehicle number 12) are presented, and the comparison between the filtered data and the original data is shown in Figure 3.
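A minimal sketch of this preprocessing step is shown below, using SciPy's savgol_filter with the stated window length of 41 and polynomial order of 3. The file and column names follow common NGSIM conventions but are assumptions here, and each vehicle's trajectory is assumed to contain at least 41 samples.

```python
import pandas as pd
from scipy.signal import savgol_filter

df = pd.read_csv("ngsim_i80.csv")   # hypothetical extract of the I-80 data

for col in ["Local_X", "Local_Y", "v_Vel", "v_Acc"]:
    df[col] = df.groupby("Vehicle_ID")[col].transform(
        lambda s: savgol_filter(
            s.interpolate(method="linear").to_numpy(),  # fill gaps first
            window_length=41, polyorder=3))             # then smooth
```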
3.2. Feature Selection
In real-world scenarios, a vehicle's lane changing behavior results from the interplay of the target vehicle's state, surrounding vehicle information, and road conditions. To enable the model to better capture this interaction, the input features include Target Features (TFs), Surrounding Vehicle Features (SFs), and Lane Features (LFs). Detailed information about these features is provided in Table 1.
Surrounding Vehicle Features consist of the historical trajectory information of neighboring vehicles positioned relative to the target vehicle: left front (S1), left rear (S2), right front (S3), right rear (S4), directly in front (S5), and directly behind (S6), as illustrated in Figure 4. In practical scenarios, there may be fewer than six neighboring vehicles. In such cases, the information for the unavailable vehicles can be set to default values, such as zero vectors or placeholders, to ensure consistent input dimensions for the model.
In this context, the following notations are used: sgn represents the sign function; // denotes floor division; $v_x$ and $v_y$ are the lateral and longitudinal velocities of the target vehicle; $a_x$ and $a_y$ are its lateral and longitudinal accelerations; and d represents the lane width, which is approximately 3.66 m.
Figure 4. Diagram of target vehicle and surrounding vehicles.
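A minimal sketch of the zero-padding scheme for absent neighbors follows; the slot names mirror Figure 4, while the per-vehicle feature dimension is a hypothetical value.

```python
import numpy as np

SLOTS = ["S1", "S2", "S3", "S4", "S5", "S6"]  # neighbor positions (Figure 4)

def neighbor_features(neighbors, feat_dim=3):
    """Stack per-slot features; absent neighbors become zero vectors so the
    model input always has a fixed width (illustrative sketch)."""
    return np.concatenate(
        [neighbors.get(slot, np.zeros(feat_dim)) for slot in SLOTS])

# only the lead vehicle (S5) is present here; the other slots are zero-padded
x = neighbor_features({"S5": np.array([12.3, 0.4, -0.1])})  # shape (18,)
```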
3.3. Lane Change Trajectory Annotation
The task of recognizing a vehicle’s lane changing intent is fundamentally a time series classification problem. By analyzing segments of vehicle trajectories, it is possible to capture the vehicle’s motion characteristics at different time points, leading to a more accurate identification of its lane changing intent. The extracted trajectory segments are classified into three categories: left lane change, lane keeping, and right lane change, with corresponding labels assigned.
There are two mainstream methods for defining the lane changing process [24]. The first method considers a segment of the vehicle's trajectory before it crosses the lane boundary line as part of the lane changing process. The second method includes both the segments before and after crossing the critical point as part of the lane changing process. This study employs the second method, using a heading angle threshold approach to label lane changing intent, which is primarily based on changes in the vehicle's heading angle.
The lane changing labeling process is illustrated in Figure 5. The sequence labeling steps used in this study are as follows:
- (1) Filter out all lane changing vehicle IDs based on the lane changes of the vehicles.
- (2) Determine the intersection points between the trajectories of lane changing vehicles and the lane boundaries. Then, calculate the vehicle's heading angle from the position coordinates (x, y).
- (3) The heading angle $\theta$ of each sampling point is traversed backward along the time axis from the lane change point. If three consecutive sampling points in the trajectory sequence satisfy $\theta \geq \theta_{start}$ (the threshold of the initial heading angle for lane changing), the position where the threshold $\theta_{start}$ is first reached is determined as the starting point of the lane change. A similar procedure with $\theta_{end}$ (the threshold of the heading angle at the end of the lane change) determines the end point of the lane change, as shown in Figure 5. A code sketch of this step follows the list.
- (4) If the extracted trajectory sequence includes points of lane changing events, it is defined as a lane change sequence; otherwise, it is categorized as a straight-driving sequence. Lane change sequences are further classified as left or right lane changes based on changes in the vehicle's lane ID.
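Below is a minimal sketch of step (3). The threshold value and the exact run-length bookkeeping are assumptions, since the rule is specified only qualitatively above.

```python
import numpy as np

def lane_change_start(theta, cross_idx, theta_th=0.02):
    """Walk backward along the time axis from the boundary-crossing index and
    return the earliest sample that still belongs to a run of three
    consecutive points whose heading angle exceeds the threshold."""
    start = cross_idx
    i = cross_idx
    while i >= 2 and all(abs(theta[j]) >= theta_th for j in (i, i - 1, i - 2)):
        start = i - 2
        i -= 1
    return start

theta = np.array([0.0, 0.0, 0.01, 0.03, 0.04, 0.05, 0.06])
print(lane_change_start(theta, cross_idx=6))   # -> 3
```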
Figure 5. Criteria for trajectory classification.
3.4. Trajectory Sequence Extraction
In this study, we utilized the sliding window method to extract sequences of target vehicles and their neighboring vehicles over a specified prediction time length, thereby obtaining samples of the form
$$S_i = (X_i, y_i)$$
In the formula, $X_i$ is the extracted feature sequence, and $y_i$ is the corresponding label. A straight-driving sequence is labeled 0, a vehicle changing lanes to the left is labeled 1, and a vehicle changing lanes to the right is labeled 2.
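A minimal sketch of the sliding-window extraction is given below; the window length, stride, and the choice of labeling each window by its final step are illustrative assumptions.

```python
import numpy as np

def sliding_windows(features, labels, window=30, stride=1):
    """Cut a vehicle's feature time series into fixed-length samples.
    Each window is paired with the label at its final step
    (0 = straight, 1 = left change, 2 = right change). Sketch only."""
    X, y = [], []
    for start in range(0, len(features) - window + 1, stride):
        X.append(features[start:start + window])
        y.append(labels[start + window - 1])
    return np.asarray(X), np.asarray(y)

X, y = sliding_windows(np.random.randn(500, 18), np.zeros(500, dtype=int))
print(X.shape, y.shape)   # (471, 30, 18), (471,)
```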
This study extracted a total of 151,602 left lane change sequences, 2,588,920 straight movement sequences, and 26,807 right lane change sequences. To prevent model overfitting during the experiment, it was necessary to maintain a consistent data volume across the three types of driving behaviors. Therefore, we selected 26,807 sequences from each category, referencing the number of right lane change sequences for uniformity. Of these, 70% were designated as the training set, while 30% were allocated to the testing set.
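A minimal sketch of this balancing and splitting step, given feature/label arrays X and y from the extraction step and assuming scikit-learn is available:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def balance_classes(X, y, n_per_class=26807, seed=42):
    """Randomly undersample every class to the size of the rarest one
    (the 26,807 right lane change sequences). Sketch only."""
    rng = np.random.default_rng(seed)
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), n_per_class, replace=False)
        for c in np.unique(y)])
    return X[keep], y[keep]

Xb, yb = balance_classes(X, y)
X_train, X_test, y_train, y_test = train_test_split(
    Xb, yb, test_size=0.3, stratify=yb, random_state=42)
```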
4. Experiments and Results
The experiments were conducted on an NVIDIA GeForce RTX 3090 GPU, utilizing the PyTorch deep learning framework (version 1.13.0) for implementation. The training process consisted of 200 epochs with a batch size of 64. The loss function used was Cross Entropy Loss, which is well suited for multi-class classification tasks. The Adam optimizer, with a learning rate of 0.0005, was employed for its proven ability to achieve fast convergence and stable training in deep learning models.
For the model architecture, the Transformer encoder was configured with two layers, a dropout rate of 0.1, and a hidden dimension of 128, following recommendations from prior studies on Transformer-based sequence modeling. The BiGRU module consisted of six layers, with the number of neurons in each layer decreasing progressively to capture hierarchical temporal features effectively. Specifically, the layers contained 256, 128, 64, 32, 16, and 8 neurons, respectively.
These hyperparameters were determined based on prior research and fine-tuned through grid search optimization. The final configuration was selected according to the best F1-score on the validation set. Additionally, early stopping was applied during training to prevent overfitting, with training halted if the validation loss did not improve for 10 consecutive epochs.
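The training procedure described above could look like the following sketch; data-loader construction is omitted, and the model is assumed to return raw class logits.

```python
import torch

def train(model, train_loader, val_loader,
          epochs=200, lr=5e-4, patience=10):
    """Adam + cross-entropy training with early stopping on validation loss,
    mirroring the stated setup (illustrative sketch)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    best_val, wait = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(xb), yb).item()
                      for xb, yb in val_loader)
        if val < best_val:
            best_val, wait = val, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:      # no improvement for 10 epochs
                break
```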
4.1. Evaluation Metrics
The model was trained using the training set and evaluated for performance using the testing set. The performance evaluation metrics selected in this study are as follows (a short computation sketch is given after the list):
- (1) Accuracy: the proportion of correctly predicted positive and negative samples in the testing set relative to the total number of samples.
- (2) Precision: the proportion of correctly predicted positive samples among all samples predicted as positive.
- (3) Recall: the proportion of correctly predicted positive samples out of the total actual positive samples.
- (4) F1-score: the harmonic mean of precision and recall.
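These four metrics can be computed directly with scikit-learn. Macro averaging (an assumption here) treats the three classes equally, which suits the balanced dataset used in this study.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = np.array([0, 1, 2, 1, 0, 2])   # placeholder labels
y_pred = np.array([0, 1, 2, 0, 0, 2])   # placeholder predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))
```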
4.2. Comparative Experiments
In this study, we introduced a Transformer encoder into a BiGRU-based model and named it Model_T. We then incorporated an additive attention mechanism into Model_T, designating it Model_TA. To validate the effectiveness of the proposed Model_TA, we conducted a comparative analysis with SVM, LSTM, and Transformer models on the NGSIM dataset. The experimental results are presented in Table 2.
As shown in Table 2, the proposed Model_TA outperforms the other three models in terms of accuracy, achieving improvements of 20.3%, 4.73%, and 1.73% over SVM, LSTM, and the Transformer model, respectively, and reaching an accuracy of 97.01%. This indicates that the model can accurately identify vehicle driving behaviors and is capable of handling most lane change intention recognition tasks. The recognition of right lane changes is the best, with a recall rate of 99.82%, while the recognition of straight-moving vehicles is comparatively lower, with a recall rate of 94.81%.
The lower recall for straight movement may be attributed to significant fluctuations in some straight-driving data, causing the model to misclassify such samples as lane changes. This issue may stem from overlapping feature spaces, where certain straight-driving behaviors, such as slight steering adjustments or speed variations, exhibit characteristics similar to those observed in the early stages of lane changing. Furthermore, the limited diversity of the training data may fail to fully capture the variations in lane changing behaviors, and insufficient consideration of vehicle-to-vehicle interactions could also contribute to this problem.
Figure 6 presents the confusion matrix of the model's performance, illustrating the relationship between true labels and predicted labels for three classes: left lane change, straight driving, and right lane change. The diagonal elements represent the number of correctly classified samples. From the confusion matrix generated by Model_TA (Figure 6), it is evident that when lane change categories are misclassified, they are rarely misidentified as the other lane change category; instead, they are more likely to be misclassified as straight movement. This could be because the features generated in the early stages of a lane change are not sufficiently distinct from those of vehicles maintaining straight movement.
The ROC curve is a common model performance evaluation tool that reflects the trade-off between the true positive rate (TPR) and the false positive rate (FPR), illustrating the classification model's ability to distinguish between classes at various thresholds. The ROC curve for model training validation is shown in Figure 7. The area under the curve (AUC) is as high as 0.993, approaching the ideal value of 1.0, indicating that the model performs excellently in distinguishing between different lane change intentions. The curve rises steeply toward the ideal top-left corner of the ROC plot, further confirming the model's high accuracy in correctly identifying lane change intentions.
This ROC comparison curve reveals that the vehicle lane change intention recognition model possesses exceptional classification capabilities. The high AUC value and ideal curve shape signify that the model can accurately differentiate vehicle lane change intentions, providing reliable decision support for advanced driver-assistance systems and autonomous driving applications.
4.3. Ablation Experiments
Although the comparative experimental results indicate that the Model_TA model overall outperforms the other three models, the necessity of each internal component of the model lacks persuasive support. To address this, we conducted ablation experiments by removing specific components of the model and observing the changes in performance to confirm the importance of each part and assess its impact on the vehicle lane change intention recognition task.
The results of the ablation experiments are presented in Table 3. Compared to BiGRU and Model_T, Model_TA achieved accuracy improvements of 1.35% and 0.45%, respectively. This suggests that the components introduced in Model_TA contribute positively to the model's overall performance in recognizing lane change intentions.
4.4. Anticipatory Judgment Capability
To validate the accuracy of the proposed Model_TA in predicting lane change intentions at different prediction times, we conducted a horizontal comparison with the well-performing models from the comparative and ablation experiments: BiGRU, Model_T, and the Transformer model. Additionally, we selected various prediction times (the time from the endpoint of the lane change sequence to the vehicle’s lane change point) of 0.5 s, 1.0 s, 1.5 s, 2.0 s, 2.5 s, and 3.0 s for vertical comparison.
Accuracy was used as the evaluation metric for the models' ability to recognize lane change intentions at different prediction times. The results are presented in Table 4. This analysis helps clarify how the performance of each model varies with the prediction time, providing insights into the effectiveness of Model_TA in real-time driving scenarios.
Table 4 illustrates the comparison of model recognition performance under different prediction times. The prediction time is defined as the time before the lane change event occurs. As the prediction time increases, the classification accuracy of all models inevitably decreases. However, the Model_TA maintains a relatively high accuracy across all prediction times, demonstrating its practical utility.
As shown in Table 4, when the prediction time is within 1 s, all models demonstrate high accuracy in recognizing lane change behaviors, exceeding 93.5%. Notably, Model_TA achieves the highest recognition accuracy of 95.66% at a prediction time of 0.5 s, while the Transformer model reaches 95.92% at 1.0 s. However, once the prediction time exceeds 1.0 s, Model_TA exhibits a significant advantage in recognition accuracy, especially at prediction times of 2.0 s or longer, where its accuracy is substantially higher than that of the other models. For instance, at a prediction time of 2.0 s, Model_TA achieves an accuracy of 90.15%, significantly surpassing BiGRU's 82.49% and Model_T's 85.27%. This indicates that the inclusion of the additive attention mechanism allows Model_TA to better capture the key features of vehicle lane change intentions in long-term prediction tasks, enhancing the overall performance of the model.
The Transformer encoder plays a crucial role in the model’s effectiveness. Its multi-head self-attention mechanism effectively captures long-term dependencies and global features within the input sequences, which is particularly important for lane change intention recognition. Compared to traditional RNN or GRU models, the Transformer encoder provides a more comprehensive understanding of the temporal dynamics of vehicle behavior, thereby improving recognition accuracy. The Transformer model’s accuracy of 95.92% at a prediction time of 1.0 s indicates its efficiency in extracting key information within shorter time windows.
Additionally, as the prediction time increases, the recognition accuracy of all models tends to decline; however, Model_TA shows the smallest decrease, demonstrating strong robustness. At a prediction time of 3.0 s, Model_TA maintains an accuracy of 83.13%, while BiGRU’s accuracy drops to 70.25%. This further validates the advantage of the Transformer encoder in handling long-term dependencies, allowing the model to sustain high accuracy in long-term predictions.
5. Discussion
The model presented in this paper is trained on driving data from straight highway sections, which limits its applicability to this single type of scenario. Additionally, this study only examines lane changes between adjacent lanes and does not consider situations such as consecutive lane changes or abandoned lane changes. These limitations may affect the generalizability of the model to other traffic scenarios, such as urban intersections or multi-lane highways. Moreover, the baseline models used for comparison in this study are relatively fundamental and may not fully reflect the precision of state-of-the-art models. Future research will address this limitation by incorporating more advanced models for comparison.
To mitigate the impact of dataset imbalance, we employed oversampling techniques for the minority class (lane change sequences) and undersampling techniques for the majority class (straight-driving sequences). In future work, we plan to experiment with alternative techniques, such as generative adversarial networks (GANs) to synthesize realistic lane change sequences, and explore more advanced data augmentation methods to further improve the dataset balance.
Although the model achieves an accuracy of 97.01% on the current dataset, its robustness under real-world scenarios remains to be evaluated. Specifically, edge cases, such as rare driving behaviors, and unseen traffic conditions, like weather changes or unfamiliar vehicle types, have not been explicitly tested. To address these concerns, future work will focus on validating the model in real-world traffic conditions through the collection of real-world driving data under various conditions, such as different weather conditions, road types, and traffic densities. Additionally, domain adaptation techniques will be explored to bridge the gap between simulated data and real-world environments, enabling the model to transfer knowledge effectively and adapt to unseen scenarios. These efforts aim to improve the model’s robustness and practical utility in complex traffic environments.
6. Conclusions
To tackle the issue of poor performance in long-term lane change intention recognition prediction in highway scenarios, this paper proposes a hybrid model based on Transformer encoders, BiGRU, and an additive attention mechanism. The Transformer encoder effectively captures long-term dependencies and global features in input sequences through its multi-head self-attention mechanism, enhancing the model’s understanding of dynamic changes in vehicle behavior. BiGRU further integrates temporal information, while the additive attention mechanism improves the model’s focus on key features, optimizing overall performance.