1. Introduction
In recent years, with the rapid development of autonomous driving technology, lane change intention recognition has become a prominent topic in research on advanced driver assistance systems (ADASs). As in other domains, such as aerial vision-and-language navigation [1], lane change intention recognition requires intelligent systems to process large-scale sequential data and to reason about spatial and temporal relationships. Accurate lane change intention recognition plays a crucial role in autonomous driving systems, directly affecting driving safety and system decision-making [2]. A deep understanding of the motion intentions of traffic participants is therefore essential for enabling autonomous vehicles to make better decisions and plan paths, particularly in mixed-traffic environments, thereby enhancing the effectiveness of collision avoidance systems and overall traffic safety [3].
Current driving intention recognition methods can be broadly categorized into two types: traditional methods and data-driven methods. Traditional approaches often employ probabilistic models, but these are usually constrained by their underlying assumptions. With advances in machine learning and deep learning, data-driven methods have shown superior predictive accuracy [4]. For example, Kumar et al. [5] proposed an online lane change intention prediction method based on support vector machines (SVMs) and Bayesian filtering, but this approach does not provide posterior probability values. To address this issue, Yang et al. [6] introduced a lane change intention classifier based on relevance vector machines (RVMs), which predicts surrounding vehicles' lane changes more quickly and accurately than radar sensors and SVMs while improving model convergence. Additionally, Song et al. [7] proposed a hybrid HMM-SVM model to improve lane change intention recognition rates using driver operation data. Other models include HMM-BF [8] and GA-HMM [9].
Beyond the progress achieved with machine learning, the development of deep learning has brought further advances to lane change intention recognition. Recurrent neural networks, such as Long Short-Term Memory (LSTM) networks, have shown excellent performance on sequential tasks and have improved the accuracy of driving intention recognition [10]. Numerous studies have built LSTM-based intention prediction models [11,12,13,14]. Patel et al. [15] proposed a method based on recurrent neural networks (RNNs) and graphical models to predict the future lane change intentions of other vehicles on highways. Altché et al. [16] introduced an LSTM-based recurrent neural network model that fuses GPS, IMU, and odometry data to recognize driver intentions when entering intersections. Ji et al. [17] designed an LSTM-based driving intention recognition and vehicle trajectory prediction model that outperformed traditional methods. Yang et al. [4] utilized an LSTM network with spatiotemporal attention mechanisms to accurately predict lane change behavior within five seconds on the NGSIM dataset. Izquierdo et al. [11] proposed an integrated method using convolutional neural networks (CNNs) and LSTM to predict the lane change intentions of surrounding vehicles, demonstrating good generalizability. Gao et al. [18] introduced a lane change intention prediction algorithm that combines a multi-head attention mechanism with CNN and LSTM, significantly improving prediction accuracy by processing lateral position information and surrounding environmental data on the NGSIM dataset. Li et al. [19] proposed a temporal recognition model based on a Bi-GLSTM network that improved lane change intention recognition accuracy by incorporating a graph attention neural network, and validated the model's effectiveness on the HighD dataset. Gao et al. [20] developed a dual Transformer model that combines multi-head attention mechanisms with LSTM, significantly enhancing prediction accuracy on the NGSIM and HighD datasets; however, such methods primarily focus on the spatial correlations of interacting vehicles while neglecting temporal dependencies. Other researchers [14] proposed a framework based on inverse reinforcement learning and bidirectional recurrent neural networks that focuses on the intentions and trajectory predictions of vulnerable road users in urban traffic environments, achieving an average displacement error significantly lower than that of other baseline models. Deep learning has also shown great potential for other complex transportation problems. For instance, residual neural networks have been applied to estimate origin–destination (O-D) trip matrices from link flow data, effectively addressing challenges such as limited sensor deployment and indirect O-D flow data [21]. Another study introduced a stacked sparse autoencoder (SAE) framework to simultaneously estimate O-D flows and optimize sensor placement, achieving accurate predictions with fewer sensors [22].
Although research on driving behavior perception and intention recognition has made progress, there remain challenges in modeling methods. These include (1) the need for improved model scalability and generalization capabilities; (2) the inability to account for interactions between vehicles, resulting in reduced recognition accuracy in special scenarios; and (3) the need for enhanced robustness in data-driven methods despite some success. Therefore, considering the advantages and disadvantages of existing models and the importance of interaction information, this paper proposes a combined model integrating the Transformer encoder and BiGRU, along with an additive attention mechanism, to identify vehicle lane change intentions.
2. Method
In the context of intelligent connected transportation environments, the advancement of sensor technology has significantly expanded the scale of vehicle feature parameter sequences. This phenomenon has led to an increase in potentially irrelevant information within the data, thereby imposing higher demands on the effectiveness of models. Consequently, models must not only possess the capability to handle large-scale data, but also effectively extract temporal features to ensure accurate recognition of lane change intentions. To address this, this paper proposes a novel lane change intent recognition model, Model_TA, which integrates a Transformer encoder, a bidirectional gated recurrent unit (BiGRU), and an additive attention mechanism, aimed at enhancing the accuracy and robustness of lane change intent recognition. Its structure is illustrated in Figure 1.
First, the Transformer encoder utilizes a self-attention mechanism for effective extraction of global features when processing sequential data. This mechanism captures dependencies between different time steps, thereby identifying key vehicle behavioral characteristics within complex traffic scenarios. Through parallel computation, the Transformer significantly improves the training efficiency of the model, reducing the distance of information transmission and laying the groundwork for real-time applications. Secondly, the BiGRU demonstrates unique advantages in capturing temporal information. By considering both historical and future information, the BiGRU enables the model to comprehensively understand the context of vehicle behavior. This bidirectional characteristic enhances the model’s ability to capture long-range dependencies, thereby improving the accuracy of lane change intent recognition and the timeliness of decision-making. Furthermore, the additive attention mechanism endows the model with dynamic weight allocation capabilities, allowing it to focus on the most relevant features during the feature extraction process. This mechanism effectively reduces the interference of irrelevant information on the model’s judgment, optimizing the identification of key features and enhancing the model’s adaptability in complex environments. Finally, the Softmax function is employed to compute the driving intentions of target vehicles, providing probability distributions for behaviors such as left lane change, straight driving, and right lane change. Through this probabilistic output, Model_TA not only provides clear decision-making foundations for intelligent driving systems, but also enhances the system’s interpretability.
While the Transformer encoder, BiGRU, and attention mechanisms have been applied in various contexts, the novelty of Model_TA lies in its tailored integration for lane change intent recognition. Specifically, the use of the Transformer encoder enables effective global feature extraction from sequential data, addressing long-range dependencies in vehicle behavior. The BiGRU component enhances temporal understanding by incorporating both historical and future information, while the additive attention mechanism dynamically assigns weights to relevant features, mitigating the impact of noisy or irrelevant information. This unified framework is specifically designed to handle the challenges of large-scale, noisy, and temporally complex traffic datasets. Furthermore, the probabilistic outputs of Model_TA provide clear and interpretable decision foundations for intelligent driving systems, distinguishing it from existing approaches in terms of accuracy, robustness, and applicability to real-world traffic scenarios.
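To make the pipeline concrete, the following is a minimal PyTorch sketch of how the three components could be chained. The class name, layer sizes, and the omission of positional encoding are illustrative assumptions for readability, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ModelTA(nn.Module):
    """Sketch of the Model_TA pipeline:
    Transformer encoder -> BiGRU -> additive attention -> classifier.
    Sizes are illustrative; positional encoding is omitted for brevity."""
    def __init__(self, feat_dim=18, d_model=128, n_heads=4,
                 gru_hidden=64, n_classes=3):
        super().__init__()
        self.input_proj = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dropout=0.1, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.bigru = nn.GRU(d_model, gru_hidden,
                            batch_first=True, bidirectional=True)
        # additive attention over the BiGRU outputs
        self.att_w = nn.Linear(2 * gru_hidden, 2 * gru_hidden)
        self.att_v = nn.Linear(2 * gru_hidden, 1, bias=False)
        self.classifier = nn.Linear(2 * gru_hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, feat_dim)
        h = self.encoder(self.input_proj(x))   # global features via self-attention
        h, _ = self.bigru(h)                   # bidirectional temporal context
        scores = self.att_v(torch.tanh(self.att_w(h)))  # (batch, time, 1)
        alpha = torch.softmax(scores, dim=1)            # attention weights
        context = (alpha * h).sum(dim=1)                # weighted context vector
        return self.classifier(context)                 # class logits

model = ModelTA()
logits = model(torch.randn(8, 30, 18))   # 8 sequences, 30 steps, 18 features
probs = torch.softmax(logits, dim=-1)    # left change / straight / right change
```

The final softmax corresponds to the probabilistic output described above; during training, the raw logits would typically be passed to a cross-entropy loss instead.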
2.1. Transformer Encoder
The Transformer model was originally proposed by Vaswani et al. [23]. It consists of two parts: an encoder for processing the input sequence and a decoder for generating the output sequence. This paper primarily utilizes the encoder. The Transformer encoder is mainly composed of a multi-head attention mechanism and a feedforward neural network.
The basic building block of the multi-head attention layer is the scaled dot-product attention unit. When the sequence is passed through this unit, the attention weights are computed simultaneously across each row. The result not only contains information about the row itself, but also includes a weighted combination of other relevant rows. In this paper, the time series is passed through positional encoding before entering the multi-head attention area of the Transformer encoder. It undergoes three independent linear transformations to obtain the query matrix Q, the key matrix K, and the value matrix V, calculated as follows:
$$Q = XW^{Q}, \quad K = XW^{K}, \quad V = XW^{V}$$
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$
In the formulas, $d_k$ represents the dimension of the key vectors and is used to scale the dot-product results.
The multi-head self-attention mechanism concatenates the outputs of multiple attention heads and applies a linear transformation to obtain the final multi-head attention output:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^{O}$$
In the formula, $W^{O}$ represents the output weight matrix, which linearly transforms the concatenated outputs of the multiple attention heads into the final output of the multi-head attention mechanism.
The multi-head attention mechanism allows for the parallel computation of several independent attention heads, enabling the model to simultaneously focus on different parts of the input sequence across various representation subspaces. This capability helps capture richer and more diverse features. Such a mechanism significantly enhances the performance of the Transformer encoder in handling complex sequence data tasks.
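As a concrete illustration of this parallelism, the short sketch below runs self-attention with PyTorch's built-in nn.MultiheadAttention; the dimensions are assumptions for demonstration.

```python
import torch
import torch.nn as nn

seq = torch.randn(8, 30, 128)   # (batch, time steps, d_model)
mha = nn.MultiheadAttention(embed_dim=128, num_heads=4, batch_first=True)
# Self-attention: Q, K, and V all come from the same sequence, and each of
# the 4 heads attends over its own 32-dimensional subspace in parallel.
out, weights = mha(seq, seq, seq)
print(out.shape, weights.shape)  # (8, 30, 128) and (8, 30, 30)
```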
2.2. Bidirectional Gated Recurrent Unit
The GRU model replaces the forget gate and input gate of the LSTM model with an update gate. The calculation formulas for the GRU are as follows:
$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$$
$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$$
$$\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t])$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
In the GRU model, $z_t$ represents the update gate state at time t, while $r_t$ denotes the reset gate state at the same time. The sigmoid function $\sigma$ is used to compute these gates, with $W_z$, $W_r$, and $W$ serving as the corresponding weight matrices. The hidden state from the previous time step is $h_{t-1}$, and the current input is $x_t$. The hyperbolic tangent function tanh generates the candidate hidden state $\tilde{h}_t$, which, together with the update gate, produces the final hidden state $h_t$ at time t.
For a unidirectional GRU neural network, the state is propagated from the past to the future, which can be a disadvantage when processing sequential information, because in some cases the information in the sequence is influenced by both previous and subsequent inputs. To address this issue, a bidirectional gated recurrent unit (BiGRU) is employed. The bidirectional GRU neural network builds on the advantages of bidirectional RNNs and LSTMs, leading to further improvements. Its structure is illustrated in Figure 2.
The calculation formulas for the bidirectional GRU are as follows:
$$\overrightarrow{h}_t = \mathrm{GRU}(x_t, \overrightarrow{h}_{t-1})$$
$$\overleftarrow{h}_t = \mathrm{GRU}(x_t, \overleftarrow{h}_{t+1})$$
$$h_t = W_t \overrightarrow{h}_t + V_t \overleftarrow{h}_t + b_t$$
In the equations, $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ represent the outputs of the forward and backward hidden layers, respectively; $W_t$ and $V_t$ are the weights for the forward and backward hidden states at time t; and $b_t$ is the corresponding bias.
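A minimal PyTorch sketch of a bidirectional GRU follows. Note that nn.GRU concatenates the forward and backward hidden states at each step, so the learned combination with $W_t$, $V_t$, and $b_t$ above corresponds to an additional linear layer; all sizes are illustrative.

```python
import torch
import torch.nn as nn

# bidirectional=True runs one GRU forward in time and one backward,
# concatenating the two hidden states at every step.
bigru = nn.GRU(input_size=128, hidden_size=64,
               batch_first=True, bidirectional=True)
combine = nn.Linear(2 * 64, 64)   # plays the role of W_t, V_t, b_t above

x = torch.randn(8, 30, 128)       # (batch, time, features)
out, _ = bigru(x)                 # out: (8, 30, 128) = forward 64 + backward 64
h = combine(out)                  # combined hidden state per time step
```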
2.3. Additive Attention
The additive attention mechanism was selected due to its ability to mitigate scaling issues and improve model stability compared to standard dot-product attention mechanisms. By calculating attention scores through a learned feedforward layer, additive attention avoids large variance in vector magnitudes, which is particularly advantageous for processing noisy and heterogeneous data in real-world traffic scenarios.
Additive attention primarily serves to select the most relevant information from the encoder’s output for decoding at the current time step. It achieves this by calculating attention weights, which dynamically weigh different parts of the input sequence to generate a context vector. This mechanism enables the model to selectively focus on the most pertinent sections when processing long sequences, thereby enhancing the accuracy of the task. The expression for additive attention is as follows:
$$e_{ij} = v^{T}\tanh(W s_{i-1} + U h_j)$$
$$\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T}\exp(e_{ik})}$$
$$c_i = \sum_{j=1}^{T} \alpha_{ij} h_j$$
In the expressions, T represents the length of the input sequence, while $e_{ij}$ denotes the degree of association between the hidden state $s_{i-1}$ of the decoder at time i − 1 and the hidden state $h_j$ of the encoder at position j. The term $\alpha_{ij}$ represents the attention probability distribution, and $c_i$ indicates the context vector at the current time step. Together, these components enable the model to effectively focus on the most relevant parts of the input sequence during the decoding process.
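A minimal sketch of this mechanism is shown below. Because Model_TA is a classifier rather than a sequence-to-sequence decoder, this sketch folds the decoder state $s_{i-1}$ into the learned layers so the scores depend only on the encoder states; that simplification is our assumption.

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Additive (Bahdanau-style) attention: scores come from a small
    feedforward layer, v . tanh(W h), rather than a dot product."""
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, 1, bias=False)

    def forward(self, h):                       # h: (batch, T, dim)
        scores = self.v(torch.tanh(self.w(h)))  # e_j, one score per step
        alpha = torch.softmax(scores, dim=1)    # attention probabilities
        return (alpha * h).sum(dim=1)           # context vector c

context = AdditiveAttention(128)(torch.randn(8, 30, 128))  # (8, 128)
```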
3. Data Preprocessing
This study selects data from the NGSIM (Next-Generation Simulation) dataset, specifically from the I-80 segment, for training and testing the model. The NGSIM dataset, developed by the Federal Highway Administration (FHWA) of the United States, aims to support traffic simulation and research. It has a sampling frequency of 10 Hz and records various information, including vehicle coordinates, speed, acceleration, vehicle type, and lane number. The selected data cover a 1200-foot segment that includes six regular lanes and one merging ramp.
3.1. Trajectory Data Processing
The NGSIM dataset is collected from real traffic environments and is subject to various noise influences, such as sensor errors and environmental interference, which can adversely affect subsequent data analysis and model training. To address this, missing data points were first interpolated using a linear method to ensure continuity. Then, a Savitzky–Golay (SG) filter was applied to the raw data for smoothing. The SG filter is a fitting method based on polynomial least squares that helps produce smoother and more continuous data, facilitating feature extraction and model training. A filtering window length of 41 and a polynomial fitting order of 3 were chosen after considering the dataset characteristics and balancing noise reduction against data fidelity. As an example, the trajectory data of a lane changing vehicle (vehicle number 12) are presented, and the comparison between the filtered data and the original data is shown in Figure 3.
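A minimal sketch of this preprocessing step is shown below, using SciPy's savgol_filter with the stated window length of 41 and polynomial order of 3. The file and column names follow common NGSIM conventions but are assumptions here, and each vehicle's trajectory is assumed to contain at least 41 samples.

```python
import pandas as pd
from scipy.signal import savgol_filter

df = pd.read_csv("ngsim_i80.csv")   # hypothetical extract of the I-80 data

for col in ["Local_X", "Local_Y", "v_Vel", "v_Acc"]:
    df[col] = df.groupby("Vehicle_ID")[col].transform(
        lambda s: savgol_filter(
            s.interpolate(method="linear").to_numpy(),  # fill gaps first
            window_length=41, polyorder=3))             # then smooth
```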
3.2. Feature Selection
In real-world scenarios, a vehicle's lane changing behavior results from the interplay of the target vehicle's state, surrounding vehicle information, and road conditions. To enable the model to better capture this interaction, the input features include Target Features (TFs), Surrounding Vehicle Features (SFs), and Lane Features (LFs). Detailed information about these features is provided in Table 1.
Surrounding Vehicle Features consist of the historical trajectory information of neighboring vehicles positioned relative to the target vehicle: left front (S1), left rear (S2), right front (S3), right rear (S4), directly in front (S5), and directly behind (S6), as illustrated in Figure 4. In practical scenarios, there may be fewer than six neighboring vehicles. In such cases, the information for the unavailable vehicles can be set to default values, such as zero vectors or placeholders, to ensure consistent input dimensions for the model.
In this context, the following notations are used: sgn represents the sign function; // denotes floor division; $v_x$ and $v_y$ are the lateral and longitudinal velocities of the target vehicle; $a_x$ and $a_y$ are its lateral and longitudinal accelerations; and d represents the lane width, which is approximately 3.66 m.
Figure 4. Diagram of target vehicle and surrounding vehicles.
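A minimal sketch of the zero-padding scheme for absent neighbors follows; the slot names mirror Figure 4, while the per-vehicle feature dimension is a hypothetical value.

```python
import numpy as np

SLOTS = ["S1", "S2", "S3", "S4", "S5", "S6"]  # neighbor positions (Figure 4)

def neighbor_features(neighbors, feat_dim=3):
    """Stack per-slot features; absent neighbors become zero vectors so the
    model input always has a fixed width (illustrative sketch)."""
    return np.concatenate(
        [neighbors.get(slot, np.zeros(feat_dim)) for slot in SLOTS])

# only the lead vehicle (S5) is present here; the other slots are zero-padded
x = neighbor_features({"S5": np.array([12.3, 0.4, -0.1])})  # shape (18,)
```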
3.3. Lane Change Trajectory Annotation
The task of recognizing a vehicle’s lane changing intent is fundamentally a time series classification problem. By analyzing segments of vehicle trajectories, it is possible to capture the vehicle’s motion characteristics at different time points, leading to a more accurate identification of its lane changing intent. The extracted trajectory segments are classified into three categories: left lane change, lane keeping, and right lane change, with corresponding labels assigned.
There are two mainstream methods for defining the lane changing process [24]. The first method considers a segment of the vehicle's trajectory before it crosses the lane boundary line as part of the lane changing process. The second method includes both the segments before and after crossing the critical point as part of the lane changing process. This study employs the second method, using a heading angle threshold approach to label lane changing intent, which is primarily based on changes in the vehicle's heading angle.
The lane changing labeling process is illustrated in Figure 5. The sequence labeling steps used in this study are as follows:
- (1) Filter out all lane changing vehicle IDs based on the lane changes of the vehicles.
- (2) Determine the intersection points between the trajectories of lane changing vehicles and the lane boundaries. Then, calculate the vehicle's heading angle from the position coordinates (x, y).
- (3) The heading angle $\theta$ of each sampling point is traversed backward along the time axis from the lane change point. If three consecutive sampling points in the trajectory sequence satisfy $\theta \geq \theta_{start}$ (the threshold of the initial heading angle for lane changing), the position where the threshold $\theta_{start}$ is first reached is determined as the starting point of the lane change. A similar procedure with $\theta_{end}$ (the threshold of the heading angle at the end of the lane change) determines the end point of the lane change, as shown in Figure 5. A code sketch of this step follows the list.
- (4) If the extracted trajectory sequence includes points of lane changing events, it is defined as a lane change sequence; otherwise, it is categorized as a straight-driving sequence. Lane change sequences are further classified as left or right lane changes based on changes in the vehicle's lane ID.
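Below is a minimal sketch of step (3). The threshold value and the exact run-length bookkeeping are assumptions, since the rule is specified only qualitatively above.

```python
import numpy as np

def lane_change_start(theta, cross_idx, theta_th=0.02):
    """Walk backward along the time axis from the boundary-crossing index and
    return the earliest sample that still belongs to a run of three
    consecutive points whose heading angle exceeds the threshold."""
    start = cross_idx
    i = cross_idx
    while i >= 2 and all(abs(theta[j]) >= theta_th for j in (i, i - 1, i - 2)):
        start = i - 2
        i -= 1
    return start

theta = np.array([0.0, 0.0, 0.01, 0.03, 0.04, 0.05, 0.06])
print(lane_change_start(theta, cross_idx=6))   # -> 3
```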
Figure 5. Criteria for trajectory classification.
3.4. Trajectory Sequence Extraction
In this study, we utilized the sliding window method to extract sequences of target vehicles and their neighboring vehicles over a specified prediction time length, thereby obtaining samples of the form
$$S_i = (X_i, y_i)$$
In the formula, $X_i$ is the extracted feature sequence, and $y_i$ is the corresponding label. A straight-driving sequence is labeled 0, a vehicle changing lanes to the left is labeled 1, and a vehicle changing lanes to the right is labeled 2.
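A minimal sketch of the sliding-window extraction is given below; the window length, stride, and the choice of labeling each window by its final step are illustrative assumptions.

```python
import numpy as np

def sliding_windows(features, labels, window=30, stride=1):
    """Cut a vehicle's feature time series into fixed-length samples.
    Each window is paired with the label at its final step
    (0 = straight, 1 = left change, 2 = right change). Sketch only."""
    X, y = [], []
    for start in range(0, len(features) - window + 1, stride):
        X.append(features[start:start + window])
        y.append(labels[start + window - 1])
    return np.asarray(X), np.asarray(y)

X, y = sliding_windows(np.random.randn(500, 18), np.zeros(500, dtype=int))
print(X.shape, y.shape)   # (471, 30, 18), (471,)
```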
This study extracted a total of 151,602 left lane change sequences, 2,588,920 straight movement sequences, and 26,807 right lane change sequences. To prevent model overfitting during the experiment, it was necessary to maintain a consistent data volume across the three types of driving behaviors. Therefore, we selected 26,807 sequences from each category, referencing the number of right lane change sequences for uniformity. Of these, 70% were designated as the training set, while 30% were allocated to the testing set.
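A minimal sketch of this balancing and splitting step, given feature/label arrays X and y from the extraction step and assuming scikit-learn is available:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def balance_classes(X, y, n_per_class=26807, seed=42):
    """Randomly undersample every class to the size of the rarest one
    (the 26,807 right lane change sequences). Sketch only."""
    rng = np.random.default_rng(seed)
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), n_per_class, replace=False)
        for c in np.unique(y)])
    return X[keep], y[keep]

Xb, yb = balance_classes(X, y)
X_train, X_test, y_train, y_test = train_test_split(
    Xb, yb, test_size=0.3, stratify=yb, random_state=42)
```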
4. Experiments and Results
The experiments were conducted on an NVIDIA GeForce RTX 3090 GPU, utilizing the PyTorch deep learning framework (version 1.13.0) for implementation. The training process consisted of 200 epochs with a batch size of 64. The loss function used was Cross Entropy Loss, which is well suited for multi-class classification tasks. The Adam optimizer, with a learning rate of 0.0005, was employed for its proven ability to achieve fast convergence and stable training in deep learning models.
For the model architecture, the Transformer encoder was configured with two layers, a dropout rate of 0.1, and a hidden dimension of 128, following recommendations from prior studies on Transformer-based sequence modeling. The BiGRU module consisted of six layers, with the number of neurons in each layer decreasing progressively to capture hierarchical temporal features effectively. Specifically, the layers contained 256, 128, 64, 32, 16, and 8 neurons, respectively.
These hyperparameters were determined based on prior research and fine-tuned through grid search optimization. The final configuration was selected according to the best F1-score on the validation set. Additionally, early stopping was applied during training to prevent overfitting, with training halted if the validation loss did not improve for 10 consecutive epochs.
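The training procedure described above could look like the following sketch; data-loader construction is omitted, and the model is assumed to return raw class logits.

```python
import torch

def train(model, train_loader, val_loader,
          epochs=200, lr=5e-4, patience=10):
    """Adam + cross-entropy training with early stopping on validation loss,
    mirroring the stated setup (illustrative sketch)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    best_val, wait = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for xb, yb in train_loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(xb), yb).item()
                      for xb, yb in val_loader)
        if val < best_val:
            best_val, wait = val, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:      # no improvement for 10 epochs
                break
```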
4.1. Evaluation Metrics
The model was trained using the training set and evaluated for performance using the testing set. The performance evaluation metrics selected in this study are as follows (a short computation sketch is given after the list):
- (1) Accuracy: the proportion of correctly predicted positive and negative samples in the testing set relative to the total number of samples.
- (2) Precision: the proportion of correctly predicted positive samples among all samples predicted as positive.
- (3) Recall: the proportion of correctly predicted positive samples out of the total actual positive samples.
- (4) F1-score: the harmonic mean of precision and recall.
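These four metrics can be computed directly with scikit-learn. Macro averaging (an assumption here) treats the three classes equally, which suits the balanced dataset used in this study.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = np.array([0, 1, 2, 1, 0, 2])   # placeholder labels
y_pred = np.array([0, 1, 2, 0, 0, 2])   # placeholder predictions

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro"))
print("Recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))
```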
4.2. Comparative Experiments
In this study, we introduced a Transformer encoder into a BiGRU-based model and named it Model_T. We then incorporated an additive attention mechanism into Model_T, designating it Model_TA. To validate the effectiveness of the proposed Model_TA, we conducted a comparative analysis with SVM, LSTM, and Transformer models on the NGSIM dataset. The experimental results are presented in Table 2.
As shown in Table 2, the proposed Model_TA outperforms the other three models in terms of accuracy, achieving improvements of 20.3%, 4.73%, and 1.73% over SVM, LSTM, and the Transformer model, respectively, and reaching an accuracy of 97.01%. This indicates that the model can accurately identify vehicle driving behaviors and is capable of handling most lane change intention recognition tasks. The recognition of right lane changes is the best, with a recall rate of 99.82%, while the recognition of straight-moving vehicles is comparatively lower, with a recall rate of 94.81%.
The lower recall for straight movement may be attributed to significant fluctuations in some straight-driving data, causing the model to misclassify such samples as lane changes. This issue may stem from overlapping feature spaces, where certain straight-driving behaviors, such as slight steering adjustments or speed variations, exhibit characteristics similar to those observed in the early stages of lane changing. Furthermore, the limited diversity of the training data may fail to fully capture the variations in lane changing behaviors, and insufficient consideration of vehicle-to-vehicle interactions could also contribute to this problem.
Figure 6 presents the confusion matrix of the model's performance, illustrating the relationship between true labels and predicted labels for three classes: left lane change, straight driving, and right lane change. The diagonal elements represent the number of correctly classified samples. From the confusion matrix generated by Model_TA (Figure 6), it is evident that when lane change categories are misclassified, they are rarely misidentified as the other lane change category; instead, they are more likely to be misclassified as straight movement. This could be because the features generated in the early stages of a lane change are not sufficiently distinct from those of vehicles maintaining straight movement.
The ROC curve is a common model performance evaluation tool that reflects the trade-off between the true positive rate (TPR) and the false positive rate (FPR), illustrating the classification model's ability to distinguish between classes at various thresholds. The ROC curve for model training validation is shown in Figure 7. The area under the curve (AUC) is as high as 0.993, approaching the ideal value of 1.0, indicating that the model performs excellently in distinguishing between different lane change intentions. The curve rises steeply toward the ideal top-left corner of the ROC plot, further confirming the model's high accuracy in correctly identifying lane change intentions.
This ROC comparison curve reveals that the vehicle lane change intention recognition model possesses exceptional classification capabilities. The high AUC value and ideal curve shape signify that the model can accurately differentiate vehicle lane change intentions, providing reliable decision support for advanced driver-assistance systems and autonomous driving applications.
4.3. Ablation Experiments
Although the comparative experimental results indicate that the Model_TA model overall outperforms the other three models, the necessity of each internal component of the model lacks persuasive support. To address this, we conducted ablation experiments by removing specific components of the model and observing the changes in performance to confirm the importance of each part and assess its impact on the vehicle lane change intention recognition task.
The results of the ablation experiments are presented in Table 3. Compared to BiGRU and Model_T, Model_TA achieved accuracy improvements of 1.35% and 0.45%, respectively. This suggests that the components introduced in Model_TA contribute positively to the model's overall performance in recognizing lane change intentions.
4.4. Anticipatory Judgment Capability
To validate the accuracy of the proposed Model_TA in predicting lane change intentions at different prediction times, we conducted a horizontal comparison with the well-performing models from the comparative and ablation experiments: BiGRU, Model_T, and the Transformer model. Additionally, we selected various prediction times (the time from the endpoint of the lane change sequence to the vehicle’s lane change point) of 0.5 s, 1.0 s, 1.5 s, 2.0 s, 2.5 s, and 3.0 s for vertical comparison.
Accuracy was used as the evaluation metric for the models' ability to recognize lane change intentions at different prediction times. The results are presented in Table 4. This analysis helps clarify how the performance of each model varies with the prediction time, providing insights into the effectiveness of Model_TA in real-time driving scenarios.
Table 4 illustrates the comparison of model recognition performance under different prediction times. The prediction time is defined as the time before the lane change event occurs. As the prediction time increases, the classification accuracy of all models inevitably decreases. However, the Model_TA maintains a relatively high accuracy across all prediction times, demonstrating its practical utility.
As shown in Table 4, when the prediction time is within 1 s, all models demonstrate high accuracy in recognizing lane change behaviors, exceeding 93.5%. Notably, Model_TA achieves the highest recognition accuracy of 95.66% at a prediction time of 0.5 s, while the Transformer model reaches 95.92% at 1.0 s. However, once the prediction time exceeds 1.0 s, Model_TA exhibits a significant advantage in recognition accuracy, especially at prediction times of 2.0 s or longer, where its accuracy is substantially higher than that of the other models. For instance, at a prediction time of 2.0 s, Model_TA achieves an accuracy of 90.15%, significantly surpassing BiGRU's 82.49% and Model_T's 85.27%. This indicates that the inclusion of the additive attention mechanism allows Model_TA to better capture the key features of vehicle lane change intentions in long-term prediction tasks, enhancing the overall performance of the model.
The Transformer encoder plays a crucial role in the model’s effectiveness. Its multi-head self-attention mechanism effectively captures long-term dependencies and global features within the input sequences, which is particularly important for lane change intention recognition. Compared to traditional RNN or GRU models, the Transformer encoder provides a more comprehensive understanding of the temporal dynamics of vehicle behavior, thereby improving recognition accuracy. The Transformer model’s accuracy of 95.92% at a prediction time of 1.0 s indicates its efficiency in extracting key information within shorter time windows.
Additionally, as the prediction time increases, the recognition accuracy of all models tends to decline; however, Model_TA shows the smallest decrease, demonstrating strong robustness. At a prediction time of 3.0 s, Model_TA maintains an accuracy of 83.13%, while BiGRU’s accuracy drops to 70.25%. This further validates the advantage of the Transformer encoder in handling long-term dependencies, allowing the model to sustain high accuracy in long-term predictions.
5. Discussion
The model presented in this paper is trained on driving data from straight highway sections, which limits its applicability to this single type of scenario. Additionally, this study only examines lane changes between adjacent lanes and does not consider situations such as consecutive lane changes or abandoned lane changes. These limitations may affect the generalizability of the model to other traffic scenarios, such as urban intersections or multi-lane highways. Moreover, the baseline models used for comparison in this study are relatively fundamental and may not fully reflect the precision of state-of-the-art models. Future research will address this limitation by incorporating more advanced models for comparison.
To mitigate the impact of dataset imbalance, we employed oversampling techniques for the minority class (lane change sequences) and undersampling techniques for the majority class (straight-driving sequences). In future work, we plan to experiment with alternative techniques, such as generative adversarial networks (GANs) to synthesize realistic lane change sequences, and explore more advanced data augmentation methods to further improve the dataset balance.
Although the model achieves an accuracy of 97.01% on the current dataset, its robustness under real-world scenarios remains to be evaluated. Specifically, edge cases, such as rare driving behaviors, and unseen traffic conditions, like weather changes or unfamiliar vehicle types, have not been explicitly tested. To address these concerns, future work will focus on validating the model in real-world traffic conditions through the collection of real-world driving data under various conditions, such as different weather conditions, road types, and traffic densities. Additionally, domain adaptation techniques will be explored to bridge the gap between simulated data and real-world environments, enabling the model to transfer knowledge effectively and adapt to unseen scenarios. These efforts aim to improve the model’s robustness and practical utility in complex traffic environments.
6. Conclusions
To tackle the issue of poor performance in long-term lane change intention recognition prediction in highway scenarios, this paper proposes a hybrid model based on Transformer encoders, BiGRU, and an additive attention mechanism. The Transformer encoder effectively captures long-term dependencies and global features in input sequences through its multi-head self-attention mechanism, enhancing the model’s understanding of dynamic changes in vehicle behavior. BiGRU further integrates temporal information, while the additive attention mechanism improves the model’s focus on key features, optimizing overall performance.