Article

A Novel Trajectory Prediction Method Based on CNN, BiLSTM, and Multi-Head Attention Mechanism

Yue Xu, Quan Pan, Zengfu Wang and Baoquan Hu

1 School of Automation, Northwestern Polytechnical University, Xi’an 710129, China
2 School of Mechanical and Electrical Engineering, Lanzhou University of Technology, Lanzhou 730050, China
3 School of Engineering, Xi’an International University, Xi’an 710077, China
* Author to whom correspondence should be addressed.
Aerospace 2024, 11(10), 822; https://doi.org/10.3390/aerospace11100822
Submission received: 7 September 2024 / Revised: 3 October 2024 / Accepted: 7 October 2024 / Published: 8 October 2024
(This article belongs to the Section Aeronautics)

Abstract

A four-dimensional (4D) trajectory is a multi-dimensional time series that embodies rich spatiotemporal features. However, its high complexity and inherent uncertainty pose significant challenges for accurate prediction. In this paper, we present a novel 4D trajectory prediction model that integrates convolutional neural networks (CNNs), bidirectional long short-term memory networks (BiLSTMs), and multi-head attention mechanisms. This model effectively addresses the characteristics of aircraft flight trajectories and the difficulties associated with simultaneously extracting spatiotemporal features using existing prediction methods. Specifically, we leverage the local feature extraction capabilities of CNNs to extract key spatial and temporal features from the original trajectory data, such as geometric shape information and dynamic change patterns. The BiLSTM network is employed to consider both forward and backward temporal orders in the trajectory data, allowing for a more comprehensive capture of long-term dependencies. Furthermore, we introduce a multi-head attention mechanism that enhances the model’s ability to accurately identify key information in the trajectory data while minimizing the interference of redundant information. We validated our approach through experiments conducted on a real ADS-B trajectory dataset. The experimental results demonstrate that the proposed method significantly outperforms comparative approaches in terms of trajectory estimation accuracy.

1. Introduction

With the rapid advancement of aviation, navigation, and unmanned technology, trajectory estimation has emerged as a crucial technology in these fields [1]. The accuracy and real-time performance of trajectory estimation are vital for ensuring navigation safety, enhancing navigation efficiency, and enabling intelligent autonomous navigation [2]. Traditional trajectory estimation methods, such as Kalman filtering and particle filtering, have demonstrated effectiveness in certain scenarios. However, their performance is often constrained when confronted with complex and dynamic navigation environments and uncertainties [3]. Consequently, the development of more advanced and efficient trajectory estimation methods has become a prominent research focus.
Recently, deep learning techniques [4], particularly CNNs [5] and long short-term memory networks (LSTMs) [6], have made significant strides in various domains, including image processing, natural language processing, and time series analysis [7]. The robust feature extraction and learning capabilities inherent in these network models provide distinct advantages in managing complex data patterns and sequence information [8]. Motivated by these advancements, numerous researchers have begun applying deep learning methods to the field of trajectory estimation, overcoming the limitations of traditional approaches and enhancing both the accuracy and robustness of trajectory predictions.
For instance, Rahman et al. [9] transformed trajectory data into images of varying sizes and subsequently classified these images using CNNs, achieving high accuracy on test sets involving two aircraft. Similarly, Wang et al. [10] optimized the trajectories of hypersonic aircraft during the re-entry phase by meticulously designing and constructing deep learning models. Their approach facilitated the efficient and rapid generation of optimal flight trajectories, thereby offering substantial technical support for stable re-entry under high-speed flight conditions.
Li et al. [11] developed a three-dimensional hypersonic trajectory prediction method leveraging deep neural networks (DNNs), addressing the challenge of deriving optimal flight paths under complex control scenarios. Han et al. [12] introduced a novel 4D trajectory prediction method utilizing LSTM, enabling the precise acquisition of 4D information related to predicted trajectories. Zeng et al. [13] employed regularization techniques to reconstruct historical trajectory data and proposed a data-driven LSTM model capable of effectively capturing the temporal dependencies of flight trajectories, resulting in high prediction accuracy.
Furthermore, Schimpf et al. [14] trained a recurrent neural network (RNN) on multiple flight path trajectories to enhance the accuracy of trajectory predictions. Liu et al. [15] developed a trajectory prediction model based on DNNs that leverages the dynamic characteristics of aircraft. This model generates extensive flight trajectory data through established dynamic equations, conducts offline training on the generated data, and subsequently applies it to real-time online predictions. This integration into flight control systems significantly improves aircraft safety.
While the aforementioned methods excel in the complex task of predicting aircraft tracks, single models often struggle to fully capture the spatial characteristics, temporal dependencies, and dynamic changes inherent in track data. This limitation can lead to inadequate prediction accuracy and generalization ability [16,17,18,19]. In contrast, composite models can comprehensively capture the features of trajectory data by leveraging the strengths of various models, thereby improving adaptability to new data and scenarios [20]. Consequently, researchers have started to implement composite models for trajectory estimation to enhance prediction accuracy and overall performance.
For instance, Wu et al. [21] developed a track prediction model that integrates CNNs, RNNs, and fully connected neural networks, validating the model with real data under severe convective weather conditions. Their results demonstrated that the prediction variance of the composite model was significantly lower than that of single models. Similarly, Tran et al. [22] achieved a 30% reduction in prediction error by combining encoder–decoder architectures, CNNs, and gated recurrent units (GRUs) for trajectory prediction. Furthermore, Shafienya et al. [23] proposed a hybrid deep learning model that incorporates CNNs, GRUs, and 3D-CNNs. This model specifically extracts spatial and temporal features from the input data using the CNN–GRU combination and employs a 3D-CNN for prediction. Experimental validation using automatic dependent surveillance–broadcast (ADS-B) data revealed that this method yielded lower measurement errors compared to traditional trajectory prediction techniques.
Although the above methods have shown certain advantages in trajectory prediction, the following issues remain: (1) Traditional trajectory estimation methods often have limitations in processing spatial features, making it difficult to fully capture the complex spatial characteristics of trajectory data [24]. (2) When handling trajectory data, some anomalies or critical events may involve multiple positions in the sequence, and traditional single models often find it hard to simultaneously focus on these global dependencies [25]. In the method proposed in this paper, a CNN is used as an initial spatial feature extractor, efficiently capturing spatial features of the trajectory data, such as the aircraft’s position, speed, direction, and other key information. A BiLSTM not only processes forward temporal sequence information but also leverages backward temporal sequence information, thereby gaining a more comprehensive understanding of the temporal characteristics of trajectory data. The multi-head attention mechanism is used to further enhance the model’s ability to model global dependencies. It dynamically adjusts the importance of different features, allowing the model to focus more accurately on critical information during prediction. Specifically, the main contributions of this paper are as follows:
(1) A specific trajectory estimation framework has been proposed, which fully utilizes the modeling capability of BiLSTMs for capturing long-term dependencies in sequential data, especially considering the temporal continuity of trajectory data and its correlation with past and future states. Meanwhile, CNNs are employed to extract key local features such as curvature and velocity changes from the trajectory data, which are crucial for capturing the dynamic characteristics of the trajectories.
(2) To further enhance the accuracy of trajectory estimation, we introduced a multi-head attention mechanism into the framework to weight and fuse the extracted features, thereby highlighting key information within the trajectories. The advantage of this mechanism lies in its ability to allow the model to simultaneously focus on multiple important parts of the sequence across various representation subspaces, which is particularly critical for handling potential turning points or anomalies that may appear within the trajectories.
(3) In addition, the proposed trajectory estimation framework emphasizes not only high-accuracy predictions but also the improvement of real-time performance and robustness. Through carefully designed network architecture and parameter optimization, as well as the introduction of efficient parallel computing techniques, this framework can process large-scale trajectory data and respond quickly, while also demonstrating good noise resistance and generalization performance.
The remainder of this paper is organized as follows: Section 2 provides a brief introduction to the fundamental theoretical concepts of CNNs, LSTMs, and BiLSTMs. Section 3 presents a detailed explanation of the proposed trajectory estimation method. Section 4 validates the effectiveness and accuracy of this method using real ADS-B trajectory data. Finally, Section 5 concludes the paper.

2. Theoretical Foundations

2.1. Convolutional Neural Networks (CNNs)

A CNN is a typical feed-forward neural network composed primarily of convolutional layers, pooling layers, and fully connected layers. The typical structure of a CNN is illustrated in Figure 1 [26].
The convolutional layer applies convolution operations to the feature map generated by the previous layer using convolutional kernels, resulting in a new feature map. The calculation process is as follows:
$$g_j^l = \sigma\left(\sum_{i=1}^{n} x_i^{l-1} * w_{ji}^l + b_j^l\right) \tag{1}$$

where $i$ is the index of the input feature map, $j$ is the index of the output feature map, $l$ is the layer number, $x_i^{l-1}$ is the $i$th input feature map of layer $l-1$, $w_{ji}^l$ is the weight of the convolution kernel, $b_j^l$ is the bias term, $*$ denotes the convolution operation, and $\sigma$ is the activation function.
The pooling layer is employed to reduce the number of internal parameters in a CNN and includes two methods: max pooling and average pooling [27]. This paper utilizes max pooling, and the calculation formula is as follows:
$$x_j^l = \max\left(x_i^{l-1}\right) \tag{2}$$

where $\max(\cdot)$ is the max pooling function, which returns the maximum value of the feature map $x_i^{l-1}$ within the pooling region.
The fully connected layer transforms the two-dimensional (2D) feature maps generated by the convolutional and pooling layers into one-dimensional (1D) vectors, thereby completing the feature extraction task. The calculation formula is as follows:
$$o_j^l = \sum_{i=1}^{n}\left(x_i^{l-1} \times w_{ji}^l + b_j^l\right) \tag{3}$$
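To make these building blocks concrete, the following is a minimal Keras sketch of a 1D CNN with one convolutional layer (Equation (1)), one max-pooling layer (Equation (2)), and one fully connected layer (Equation (3)). Keras is the framework used in Section 4; the layer sizes here are illustrative only and are not the configuration adopted later in Table 1.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative 1D CNN: convolution -> max pooling -> fully connected layer
cnn = models.Sequential([
    # Convolutional layer (Eq. (1)): kernels slide over the input,
    # add a bias term, and pass through an activation function
    layers.Conv1D(filters=32, kernel_size=3, activation='relu',
                  input_shape=(10, 1)),
    # Max-pooling layer (Eq. (2)): keep the maximum of each pooled region
    layers.MaxPooling1D(pool_size=2),
    # Flatten the feature maps into a 1D vector, then fully connect (Eq. (3))
    layers.Flatten(),
    layers.Dense(16, activation='relu'),
])
cnn.summary()  # prints the resulting shapes and parameter counts
```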

2.2. Long Short-Term Memory Network (LSTM)

An LSTM is a specialized structure of an RNN designed to address the gradient vanishing and exploding problems that traditional RNNs encounter when processing long sequences [28]. By incorporating gating mechanisms, the LSTM has achieved significant success in various sequence processing tasks, allowing the network to effectively learn long-term dependencies within the sequence. Unlike traditional RNNs, LSTMs introduce three gating units: the input gate, forget gate, and output gate, as illustrated in Figure 2. The calculation process of LSTMs is as follows:
The input gate controls the addition of information and is calculated as follows:
$$i_t = \varsigma\left(x_t w_{xi} + h_{t-1} w_{hi} + b_i\right) \tag{4}$$
The forget gate controls the discarding of information and is calculated as:

$$f_t = \varsigma\left(x_t w_{xf} + h_{t-1} w_{hf} + b_f\right) \tag{5}$$

$\tilde{C}_t$ denotes the candidate cell state and $C_t$ the internal cell state; they are calculated as:

$$\tilde{C}_t = \tanh\left(x_t w_{xc} + h_{t-1} w_{hc} + b_c\right) \tag{6}$$

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{7}$$
The output gate controls the output of information and is calculated as follows:

$$o_t = \varsigma\left(x_t w_{xo} + h_{t-1} w_{ho} + b_o\right) \tag{8}$$

$$h_t = o_t \odot \tanh\left(C_t\right) \tag{9}$$
where $i_t$ is the input gate; $f_t$ is the forget gate; $o_t$ is the output gate; $x_t$ denotes the input at time $t$; $\varsigma$ is the sigmoid activation function; $w_{xi}, w_{xf}, w_{xc}, w_{xo}$ denote the weight matrices associated with $x_t$; $w_{hi}, w_{hf}, w_{hc}, w_{ho}$ denote the weight matrices associated with $h_{t-1}$; $b_i, b_f, b_c, b_o$ are the bias vectors; and $\odot$ denotes the Hadamard product.
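For readers who prefer code to gate equations, the following is a self-contained NumPy transcription of one LSTM time step, Equations (4)–(9). The dimensions and random initialization are illustrative assumptions, not values from this paper.

```python
# One LSTM time step in NumPy, transcribing Equations (4)-(9).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, w, b):
    """w: dict of weight matrices, b: dict of bias vectors (keys i, f, c, o)."""
    i_t = sigmoid(x_t @ w['xi'] + h_prev @ w['hi'] + b['i'])      # input gate, Eq. (4)
    f_t = sigmoid(x_t @ w['xf'] + h_prev @ w['hf'] + b['f'])      # forget gate, Eq. (5)
    c_tilde = np.tanh(x_t @ w['xc'] + h_prev @ w['hc'] + b['c'])  # candidate state, Eq. (6)
    c_t = f_t * c_prev + i_t * c_tilde                            # cell state, Eq. (7)
    o_t = sigmoid(x_t @ w['xo'] + h_prev @ w['ho'] + b['o'])      # output gate, Eq. (8)
    h_t = o_t * np.tanh(c_t)                                      # hidden state, Eq. (9)
    return h_t, c_t

# Toy usage: input size 5, hidden size 8 (illustrative only)
rng = np.random.default_rng(0)
w = {k: rng.normal(size=(5, 8)) for k in ('xi', 'xf', 'xc', 'xo')}
w.update({k: rng.normal(size=(8, 8)) for k in ('hi', 'hf', 'hc', 'ho')})
b = {k: np.zeros(8) for k in ('i', 'f', 'c', 'o')}
h, c = lstm_step(rng.normal(size=5), np.zeros(8), np.zeros(8), w, b)
```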

2.3. Bidirectional Long Short-Term Memory Network (BiLSTM)

A BiLSTM combines the principles of an LSTM with a bidirectional RNN. It consists of two parallel LSTM layers: a forward LSTM and a backward LSTM, as illustrated in Figure 3. At each time step, the forward LSTM processes the sequence in chronological order, carrying past information forward, whereas the backward LSTM processes the sequence in reverse order, carrying future information backward [29]. The outputs from these two LSTM layers are then merged using an activation function to produce a final output that accounts for both preceding and following information in the sequence.
This method can effectively capture both the forward and backward information of trajectory data. Here, $\overrightarrow{h_t}$ and $\overleftarrow{h_t}$ denote the hidden vectors of the forward and backward LSTM layers at time $t$; they are independent of each other and depend only on their respective LSTM layers. The output $y_t$ is obtained by a weighted connection of these two hidden states, and $\varsigma$ denotes the activation function. The specific calculation process is shown in Equations (10) and (11).

$$\overrightarrow{h_t} = \mathrm{LSTM}\left(x_t, \overrightarrow{h_{t-1}}\right) \tag{10}$$

$$\overleftarrow{h_t} = \mathrm{LSTM}\left(x_t, \overleftarrow{h_{t+1}}\right) \tag{11}$$
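In Keras, Equations (10) and (11) correspond to the Bidirectional wrapper: one LSTM reads the window in chronological order, a second reads it in reverse, and their hidden sequences are merged (concatenated by default). A minimal sketch follows, with the window length taken from Table 2 and the first-layer width from Table 1; everything else is an illustrative assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    # Forward LSTM (Eq. (10)) and backward LSTM (Eq. (11)) over the window
    layers.Bidirectional(
        layers.LSTM(128, activation='relu', return_sequences=True),
        input_shape=(10, 1)),  # window of 10 time steps, 1 feature
])
# Forward and backward hidden states are concatenated,
# so the output shape is (None, 10, 256).
```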

3. Proposed Method

3.1. CNN-BiLSTM Incorporating Multi-Head Attention Mechanism

When processing sequential data, particularly complex sequences with both spatial and temporal dependencies, a single network structure often struggles to capture all essential information. To address this challenge, we propose a new model that combines BiLSTMs and CNNs, integrating a multi-head attention mechanism. This network architecture aims to simultaneously capture both spatial and sequential features of the data while enhancing the focus on key information through the multi-head attention mechanism, thereby improving the overall performance of the model.
The proposed model structure is illustrated in Figure 4. We first employ a BiLSTM to capture bidirectional dependencies in the sequence data. The BiLSTM consists of a forward LSTM and a backward LSTM, which, respectively, capture forward and backward information in the sequence, fusing this information to obtain a more comprehensive representation of the sequence. Next, we use the output of the BiLSTM as the input for the CNN, leveraging the CNN’s convolution operation to capture the local spatial features of the data. The introduction of the CNN not only enhances the model’s understanding of the spatial structure of the data but also further enriches the model’s representational capability.
Although the combination of a BiLSTM and a CNN can effectively capture spatial features and bidirectional dependencies in sequences, certain parts of the data may hold greater importance than others in specific cases. To enhance the model’s focus on key information, we introduced a multi-head attention mechanism. This mechanism allows the model to concentrate on different segments of the input sequence from multiple perspectives and assign varying weights to each segment. By effectively highlighting key information and suppressing less relevant parts, the multi-head attention mechanism improves the overall performance of the model. A detailed explanation of the multi-head attention mechanism will be provided in the next section.

3.2. Multi-Head Attention Mechanism

Self-attention is a mechanism that enables models to allocate varying levels of attention to different positions in the input sequence when processing sequential data. In this mechanism, the model calculates the correlation between each position in the input sequence and all other positions, generating a new weighted representation based on these correlation scores.
The core idea of the self-attention mechanism is to capture the dependency relationships between different positions in the sequence by calculating the similarity score between each position in the input sequence and all other positions. Specifically, given an input sequence x i , each position in x i is first linearly transformed to generate three vectors: query (q), key (k), and value (v). The query vector is used to match other positions in the sequence, the key vector is used to calculate similarity with the query vector, and the value vector contains the information that needs to be extracted. The calculation formula is as follows:
$$q_i = w_q \cdot x_i, \qquad k_i = w_k \cdot x_i, \qquad v_i = w_v \cdot x_i \tag{12}$$

where $x_i$ is the input sequence data; $w_q$, $w_k$, $w_v$ are three different learnable weight matrices; and $q_i$, $k_i$, $v_i$ are the computed query, key, and value vectors.
After obtaining the query vector and key vector, the self-attention mechanism calculates the dot product between the query vector and all key vectors in the sequence. These dot product values indicate the similarity or correlation between the query position and other positions in the sequence. To obtain normalized attention weights, the dot product values are transformed using a SoftMax function. The SoftMax function converts the original dot product values into a probability distribution, ensuring that the sum of weights for all positions equals 1. Consequently, the weight of each position represents its importance in the current context. The calculation formula is as follows:
$$a_{i,j} = \mathrm{Softmax}\left(q_i \cdot k_j\right) \tag{13}$$

where $a_{i,j}$ is the normalized attention weight, $\cdot$ is the dot product operator, and $\mathrm{Softmax}$ normalizes the dot-product scores, calculated as follows:

$$\mathrm{Softmax}\left(x_i\right) = \frac{e^{x_i}}{\sum_{i=1}^{m} e^{x_i}} \in (0, 1) \tag{14}$$
After obtaining the attention weights, the self-attention mechanism weights and sums these weights with the corresponding value vectors. Specifically, this involves multiplying the value vectors of each position by their corresponding weights and then summing these weighted value vectors, as illustrated in Figure 5. This process serves as a weighted aggregation of information from all positions in the sequence, with more important positions receiving greater weight. The final output representation is the result of this weighted aggregation for the current position, incorporating information from all positions in the sequence, filtered and fused according to the significant attention weights. The calculation formula is as follows:
$$O_j = \sum_{i=1}^{m} a_{j,i} \, v_i \tag{15}$$
The multi-head attention mechanism is an enhancement and extension of the self-attention mechanism. By introducing multiple attention heads, it enables the model to learn different aspects of the input data in parallel across various representation subspaces. This approach allows for a more effective capture of the dependency relationships between different positions in the input sequence, as illustrated in Figure 6.
Specifically, the multi-head attention mechanism first divides the input trajectory data into multiple smaller segments, each corresponding to a different attention head. Each head then independently performs self-attention calculations to generate its own output tensor. Finally, the outputs from all heads are concatenated and transformed into the final output representation through a linear transformation. Its expression is as follows:
$$\mathrm{MultiHead}(q, k, v) = \mathrm{Concatenate}\left(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_n\right) w_o \tag{16}$$

where $\mathrm{head}_i = \mathrm{Attention}\left(q w_q^i, k w_k^i, v w_v^i\right)$; $w_q^i$, $w_k^i$, $w_v^i$ are the per-head mapping matrix weights; and $w_o$ is the output weight matrix.
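The following NumPy sketch walks through Equations (12)–(16): per-head query/key/value projections, dot-product attention weights, the weighted sum of values, and the final concatenation and output projection. All dimensions are illustrative assumptions; like the equations above, the sketch omits the dot-product scaling factor used in some attention variants.

```python
import numpy as np

def softmax(z, axis=-1):                         # Eq. (14)
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo):
    """x: (seq_len, d_model); wq/wk/wv: lists of per-head matrices; wo: output matrix."""
    heads = []
    for w_q, w_k, w_v in zip(wq, wk, wv):
        q, k, v = x @ w_q, x @ w_k, x @ w_v      # Eq. (12): q/k/v projections
        a = softmax(q @ k.T)                     # Eq. (13): pairwise dot-product weights
        heads.append(a @ v)                      # Eq. (15): weighted sum of values
    return np.concatenate(heads, axis=-1) @ wo   # Eq. (16): concatenate + project

# Toy usage: sequence of 10 steps, model width 16, 4 heads of width 4
rng = np.random.default_rng(1)
d, h, dk = 16, 4, 4
x = rng.normal(size=(10, d))
wq = [rng.normal(size=(d, dk)) for _ in range(h)]
wk = [rng.normal(size=(d, dk)) for _ in range(h)]
wv = [rng.normal(size=(d, dk)) for _ in range(h)]
wo = rng.normal(size=(h * dk, d))
out = multi_head_attention(x, wq, wk, wv, wo)    # shape (10, 16)
```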

3.3. Network Structure Parameters

The structural parameters of the network are presented in Table 1. Its core components consist of two BiLSTM layers and two convolutional layers, with 128, 256, 32, and 64 neurons, respectively. This design facilitates the network’s ability to extract and abstract features from the input data in a layered manner.
To enhance the model’s performance, we introduced both Dropout and batch normalization (BN) layers. The Dropout layer operates by randomly “turning off” or “discarding” a portion of neurons during training, which means that in each iteration, the network learns with a slightly different structure. This approach compels the model to learn more robust features, as it must depend on the remaining active neurons for predictions. Consequently, this not only helps prevent overfitting but also increases the model’s diversity and improves its generalization ability on unseen data.
Conversely, the BN layer serves to standardize the inputs of each layer within the network, adjusting them to a relatively stable distribution. This helps mitigate the issue of internal covariate shift during training, making it easier for the network to learn the data distribution. Additionally, the BN layer can accelerate the training process by allowing the use of higher learning rates and alleviating the vanishing or exploding gradient problems. However, it is important to note that BN is not commonly used within LSTM layers, as LSTMs inherently manage changes in sequence data through their internal gating mechanisms. Therefore, we only apply a BN layer after the convolutional layers.
The hyperparameter settings of the network are detailed in Table 2. We employed sliding window technology, setting the window size to 10, to capture relevant contextual information when processing sequence data. To optimize the parameters within the network, we selected the Adam optimizer, which adaptively adjusts the learning rate based on gradient information during training. This choice accelerates the training process and enhances the model’s performance. For evaluating the model’s performance, we utilized the mean squared error (MSE) as the loss function, which quantifies the average difference between the model’s predicted and actual values.
In addition, we set the maximum number of iterations to 500 to ensure that the model has enough training epochs to converge to a better solution. To effectively utilize computational resources and accelerate the training process, we determined the optimal batch size and test set proportion for the model through experimentation, as shown in Figure 7. Specifically, the batch size is set to 32, meaning that the network will process 32 samples simultaneously in each iteration. Finally, regarding the dataset partitioning, we set the proportion of the test set to 0.3, which means we will use 70% of the input data as the training set to train the model, while the remaining 30% will be used as the test set to evaluate the model’s generalization ability.
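As a concrete illustration of these settings, the following is a minimal sketch of the sliding-window construction and the 70/30 chronological split, assuming each signal is handled as a univariate series; the variable names and the synthetic stand-in data are ours.

```python
import numpy as np

def make_windows(series, window_size=10):
    """Slide a fixed-length window over the series; the next value is the label."""
    X, y = [], []
    for i in range(len(series) - window_size):
        X.append(series[i:i + window_size])   # context window of 10 steps
        y.append(series[i + window_size])     # value to be predicted
    return np.array(X), np.array(y)

series = np.sin(np.linspace(0, 20, 500))      # stand-in for one ADS-B signal
X, y = make_windows(series, window_size=10)   # window_size from Table 2
split = int(0.7 * len(X))                     # testing_split = 0.3
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
```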

3.4. Network Training Process

Based on the proposed att-CNN-BiLSTM model, combined with trajectory data sample partitioning and model testing, a complete trajectory prediction process can be obtained, as shown in Algorithm 1 and Figure 8. The main steps can be summarized as follows:
Algorithm 1. att-CNN-BiLSTM
Require: Trajectory data; learning rate: η; batch size: m; max epoch: θ; window_size: δ; optimizer: Adam.
While epoch ≤ max epoch:
1:  for i in range(len(data) - window_size):
2:    X.append(data[i:i + window_size])
3:    y.append(data[i + window_size])
4:  end for
5:  Normalization of input trajectory data;
6:  Split the dataset into a training set and a testing set, where the testing set is 30%;
7:  Reshaping the input data to match the input shape of the LSTM;
8:  Construct att-CNN-BiLSTM model and input training set data for training;
9:  The forward and reverse information of the trajectory data is captured through Equations (10) and (11);
10: Enhancing the model’s focus on key information through Equations (12)–(16);
11: Local spatial features in the data are captured through Equation (1);
12: Prediction using testing set;
13: Output MSE, RMSE, MAE, R2;
14: epoch += 1
end while
(1) Divide the preprocessed trajectory into training and test sets using a sliding time window based on the test set share determined in Figure 7.
(2) Construct the att-CNN-BiLSTM model based on the structural parameters of each layer in Table 1. Specifically, two BiLSTM layers serve as the first two layers of the model to capture the bidirectional dependencies of the sequence. After the BiLSTMs, a multi-head attention layer is added to calculate the attention weights of different parts of the input sequence. Next, CNN layers capture local features.
(3) Set the learning rate, optimizer, and loss function of the model.
(4) Input the training set data into the att-CNN-BiLSTM model for training, update the parameters in the network using the Adam optimizer, calculate the loss of the model using the MSE loss function, and end the training when the loss value is less than or equal to the set value.
(5) Judge whether the number of training times reaches the set epoch threshold; if not, repeat step 4; if yes, end the training.
(6) Finally, the trained model makes predictions on the test set, and several performance evaluation metrics are output, including the mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and coefficient of determination (R2). These metrics comprehensively assess the prediction accuracy and generalization ability of the model and help us understand its real performance.
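A condensed Keras sketch of steps (2) and (3), following the layer parameters in Tables 1 and 2, is given below. It should be read as an illustrative reconstruction rather than the authors’ exact code: the built-in MultiHeadAttention layer (available in TensorFlow releases newer than the 2.3 listed in Section 4) stands in for the attention block, and its num_heads and key_dim values are our assumptions, since the paper does not state them.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_att_cnn_bilstm(window_size=10, n_features=1):
    inp = layers.Input(shape=(window_size, n_features))
    # Two BiLSTM layers capture bidirectional dependencies (Table 1)
    x = layers.Bidirectional(layers.LSTM(128, activation='relu',
                                         return_sequences=True))(inp)
    x = layers.Dropout(0.2)(x)
    x = layers.Bidirectional(layers.LSTM(256, activation='relu',
                                         return_sequences=True))(x)
    x = layers.Dropout(0.2)(x)
    # Multi-head self-attention over the BiLSTM output (query = key = value);
    # num_heads and key_dim are assumptions, not values from the paper
    x = layers.MultiHeadAttention(num_heads=4, key_dim=64)(x, x)
    # Two Conv1D blocks extract local features; BN only after convolutions
    for filters in (32, 64):
        x = layers.Conv1D(filters, kernel_size=3, activation='relu')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Dropout(0.2)(x)
        x = layers.MaxPooling1D(pool_size=2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(32, activation='relu')(x)
    out = layers.Dense(1)(x)
    model = models.Model(inp, out)
    model.compile(optimizer='adam', loss='mse')  # step (3): Adam + MSE
    return model

model = build_att_cnn_bilstm()
# model.fit(X_train, y_train, epochs=500, batch_size=32)  # step (4), per Table 2
```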

4. Experimental Validation

In this section, we validate the performance of the proposed method through experiments on real ADS-B trajectory data. The experiments were conducted on a computer equipped with an Intel i7-10750H CPU, an Nvidia GTX 1650Ti GPU, and 16 GB of memory. The network was implemented and trained in Python 3.7 with TensorFlow 2.3 and Keras 2.7.

4.1. Dataset Description

The ADS-B dataset [25] represents a crucial advancement in air traffic management technology. It offers real-time insights into air traffic for managers by automatically transmitting essential information, such as an aircraft’s position, speed, and heading. This automatic broadcasting surveillance system enhances the efficiency and safety of air traffic operations while also serving as a valuable resource for aviation research and development.
In this study, we analyzed the actual flight trajectory data of the aircraft with registration number B1520 from May 2024. To gain a deeper understanding of the aircraft’s flight characteristics, we selected a comprehensive dataset that includes various flight-related information, such as aircraft identification codes, timestamps, altitude, speed, heading angle, longitude, and latitude.
For our model training and experimental analysis, we concentrated on five core parameters: altitude, speed, heading angle, longitude, and latitude. These parameters not only illustrate the aircraft’s dynamic flight state but are also essential for understanding the interplay of various factors throughout the flight process.
Table 3 presents detailed statistical information for these five parameters, including key indicators such as minimum, maximum, mean, and standard deviation. These data provide an overall view and the distribution characteristics of the dataset. Additionally, we created visual representations of these data (as shown in Figure 9) to intuitively illustrate the distribution patterns and trends.

4.2. Experimental Results and Analysis

4.2.1. Effect of Different Numbers of Neurons on Modeling

When constructing a deep learning model, the number of neurons is a critical hyperparameter that significantly influences the model’s capacity for representation and learning. Different configurations of neuron counts can profoundly affect the overall performance of the model.
In the case of a BiLSTM layer, a small number of neurons limits the model’s ability to represent complex patterns and capture long-term dependencies within sequence data. The fewer the neurons, the more the model’s information-processing capacity is restricted, which may lead to underfitting, characterized by poor performance on the training set. Conversely, as the number of neurons increases, the model’s representation capacity improves. However, an excessive number of neurons can lead to overfitting, where the model performs well on the training set but struggles on the test set due to its tendency to learn noise and outliers present in the training data.
Similarly, for CNN layers, a limited number of neurons may hinder the extraction of essential features from the input data. Just like in the BiLSTM layer, having too many neurons in the CNN layer can also result in overfitting.
Therefore, to determine the optimal neuron configuration, we compared model performance under different configurations experimentally. Specifically, we fixed the number of BiLSTM and CNN layers at 2 and, following the literature [30], chose from the commonly used values 32, 64, 128, and 256. To keep the number of configurations manageable, we set the number of neurons in the second layer of both the BiLSTM and the CNN to twice that of the first layer. The experimental results are shown in Table 4. Here, {32, 64, 128, 256} indicates that the two BiLSTM layers have 32 and 64 neurons, respectively, and the two CNN layers have 128 and 256 neurons, respectively. We chose four metrics, MSE, RMSE, MAE, and R2, to evaluate the performance of the model, calculated as follows:
If the true values are $y = \{y_1, y_2, \ldots, y_n\}$ and the predicted values are $\hat{y} = \{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_n\}$, then:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2 \tag{17}$$

MSE measures the deviation between the predicted and true values of a model. Its range is $[0, +\infty)$; the larger the value, the greater the error.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2} \tag{18}$$

RMSE is the square root of MSE and represents the sample standard deviation of the differences between predicted and true values. Its range is $[0, +\infty)$; it equals 0 when the predicted values exactly match the true values.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right| \tag{19}$$

MAE also has a range of $[0, +\infty)$; the smaller the value, the better the accuracy of the predictive model.

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \tag{20}$$

The value of $R^2$ indicates the quality of the model fit; the larger the value, the better the fitting effect of the model.
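These four metrics are straightforward to compute; the following is a self-contained NumPy sketch of Equations (17)–(20).

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Return the four metrics of Eqs. (17)-(20) for two equal-length arrays."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    err = y_pred - y_true
    mse = np.mean(err ** 2)                                           # Eq. (17)
    rmse = np.sqrt(mse)                                               # Eq. (18)
    mae = np.mean(np.abs(err))                                        # Eq. (19)
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2) # Eq. (20)
    return {'MSE': mse, 'RMSE': rmse, 'MAE': mae, 'R2': r2}

print(evaluate([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))  # toy usage
```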
In Table 4, we evaluate the five signals—height, speed, angle, longitude, and latitude—using four performance indicators: MSE, RMSE, MAE, and R2. To emphasize the best performance for each signal across these indicators, we present the optimal values in bold. The model exhibits optimal performance when the number of neurons is set to {128, 256, 32, 64}. Consequently, we selected these neuron configurations for our model, and the detailed parameters are outlined in Table 1.

4.2.2. Comparative Analysis of Different Methods

To further validate the effectiveness of the proposed method, we selected several related techniques for comparative analysis, specifically: (1) real data: ADS-B real trajectories; (2) proposed: our method; (3) encoder–decoder [22]; (4) CNN-3DCNN-GRU [23]; (5) TCN-Informer [20]; and (6) attention-TCN-GRU [31]. The experimental results are presented in Figure 10. Panels (a) to (e) illustrate the comparison between predicted and actual trajectories for each model across five different signals: altitude, speed, angle, longitude, and latitude. The predicted trajectory of our proposed method aligns closely with the overall trend of the actual trajectory, whereas the other methods exhibit noticeable errors. This further demonstrates the effectiveness and robustness of our approach.
To provide a clearer evaluation of the performance of various methods, we plotted the results of the proposed method alongside those of the comparative methods across four indicators: MSE, RMSE, MAE, and R2, as illustrated in Figure 11. For indicators (a) to (c), larger values correspond to greater errors, indicating that the proposed method exhibits smaller prediction errors across five datasets: altitude, speed, angle, longitude, and latitude. Conversely, for indicator (d), larger values signify higher prediction accuracy. The proposed method demonstrated superior prediction accuracy for all five datasets, further confirming its enhanced trajectory prediction performance compared to the other methods.

4.2.3. Ablation Experiment

To verify the effectiveness of the improvements presented in this paper, we designed a series of ablation experiments, outlined as follows: (1) proposed: our method; (2) CNN-BiLSTM: removal of the multi-head attention mechanism; (3) att-CNN: replacing the BiLSTM in the model with a CNN; (4) att-BiLSTM: replacing the CNN in the model with a BiLSTM; (5) att-CNN-LSTM: replacing the BiLSTM in the model with an LSTM; (6) att1-CNN-BiLSTM: replacing the multi-head attention mechanism in the model with a self-attention mechanism.
The experimental results are illustrated in Figure 12. First, we focused on the three indicators shown in panels (a) to (c), which measure the magnitude of prediction error. In the ablation experiments, we systematically removed each key component or improvement point from the method and re-evaluated the model’s performance. By comparing the error metrics before and after the removal of each component, we found that eliminating any key element led to a significant increase in prediction error. This clearly indicates that each improvement in our method plays a crucial role in reducing prediction errors.
Next, we analyzed the indicator shown in panel (d), which represents the accuracy of the model’s predictions. During the ablation experiments, we observed that as key components of our method were gradually reintroduced, the value of this indicator also steadily increased, demonstrating continuous improvement in the model’s prediction accuracy. This highlights the effectiveness of our enhanced method in boosting prediction accuracy.
From this series of ablation experiments, we concluded that the proposed method demonstrates superior predictive performance across five datasets: altitude, speed, angle, longitude, and latitude. Whether evaluated from the perspective of reducing prediction errors or enhancing prediction accuracy, our method significantly outperforms the comparative methods. This further validates that our improvements to existing trajectory prediction methods are effective and possess significant practical application value.

5. Conclusions

In this paper, we propose an innovative trajectory estimation method that integrates CNNs, BiLSTMs, and multi-head attention mechanisms. This method not only highlights the advantages of CNNs in accurately capturing local features (such as curvature and velocity changes) within trajectory data, but also fully utilizes the capability of BiLSTMs in modeling long-term dependencies, especially considering the temporal characteristics of trajectory data and its associations with past and future states. Additionally, by incorporating a multi-head attention mechanism, the method can globally focus on key information within the trajectories, thereby enhancing the accuracy and robustness of trajectory estimation.
To validate the effectiveness of the proposed method, we conducted experiments using a real ADS-B trajectory dataset. The experimental results demonstrate that this method significantly outperforms traditional approaches in both accuracy and robustness of trajectory estimation. It not only overcomes the limitations faced by conventional trajectory estimation methods when dealing with complex trajectory data but also brings substantial technical breakthroughs to the field of trajectory estimation.
In the future, we will continue to explore and improve the proposed trajectory estimation method. On one hand, we will attempt to introduce more deep learning techniques and strategies, such as temporal convolutional networks and graph neural networks, to further enhance the performance of the model. On the other hand, we will also focus on the challenges and needs in practical applications to promote the effective application and further development of the proposed method in the field of trajectory estimation.

Author Contributions

Methodology, Y.X.; software, B.H.; writing—original draft preparation, Y.X.; writing—review and editing, Z.W.; supervision, Z.W. and Q.P.; data curation, B.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61790552, 62203358, 62233014).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

ADS-B datasets can be downloaded from https://flightadsb.variflight.com/track-data (accessed on 24 February 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hu, Y.; Yi, J.; Cheng, F.; Wan, X.; Hu, S. 3-D Target Tracking for Distributed Heterogeneous 2-D–3-D Passive Radar Network. IEEE Sens. J. 2023, 23, 29502–29512.
  2. Bartusiak, E.R.; Jacobs, M.A.; Chan, M.W.; Comer, M.L.; Delp, E.J. Predicting Hypersonic Glide Vehicle Behavior with Stochastic Grammars. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 1208–1223.
  3. Hashemi, S.M.; Botez, R.M.; Grigorie, T.L. New Reliability Studies of Data-Driven Aircraft Trajectory Prediction. Aerospace 2020, 7, 145.
  4. Chuya-Sumba, J.; Alonso-Valerdi, L.M.; Ibarra-Zarate, D.I. Deep-Learning Method Based on 1D Convolutional Neural Network for Intelligent Fault Diagnosis of Rotating Machines. Appl. Sci. 2022, 12, 2158.
  5. Ruan, D.; Wang, J.; Yan, J.; Gühmann, C. CNN Parameter Design Based on Fault Signal Analysis and Its Application in Bearing Fault Diagnosis. Adv. Eng. Inform. 2023, 55, 101877.
  6. Kosova, F.; Altay, Ö.; Ünver, H.Ö. Structural Health Monitoring in Aviation: A Comprehensive Review and Future Directions for Machine Learning. Nondestruct. Test. Eval. 2024, 1–60.
  7. Zhu, Z.; Lei, Y.; Qi, G.; Chai, Y.; Mazur, N.; An, Y.; Huang, X. A Review of the Application of Deep Learning in Intelligent Fault Diagnosis of Rotating Machinery. Measurement 2023, 206, 112346.
  8. Huang, J.; Li, Z.; Liu, D.; Yang, Q.; Zhu, J. An Adaptive State Estimation for Tracking Hypersonic Glide Targets with Model Uncertainties. Aerosp. Sci. Technol. 2023, 136, 108235.
  9. Rahman, S.; Lapasset, L.; Mothe, J. Aircraft Conflict Resolution Using Convolutional Neural Network on Trajectory Image. In Intelligent Systems Design and Applications, Proceedings of the 21st International Conference on Intelligent Systems Design and Applications (ISDA 2021), Online, 13–15 December 2021; Springer: Cham, Switzerland, 2022.
  10. Wang, J.; Wu, Y.; Liu, M.; Yang, M.; Liang, H. A Real-Time Trajectory Optimization Method for Hypersonic Vehicles Based on a Deep Neural Network. Aerospace 2022, 9, 188.
  11. Li, H.; Chen, H.; Tan, C.; Jiang, Z.; Xu, X. Fast Trajectory Generation with a Deep Neural Network for Hypersonic Entry Flight. Aerospace 2023, 10, 931.
  12. Han, P.; Yue, J.; Fang, C.; Shi, Q.; Yang, J. Short-Term 4D Trajectory Prediction Based on LSTM Neural Network. In Proceedings of the Second Target Recognition and Artificial Intelligence Summit Forum, Changchun, China, 20–22 August 2019.
  13. Zeng, W.; Quan, Z.; Zhao, Z.; Xie, C.; Lu, X. A Deep Learning Approach for Aircraft Trajectory Prediction in Terminal Airspace. IEEE Access 2020, 8, 151250–151266.
  14. Schimpf, N.; Wang, Z.; Li, S.; Knoblock, E.J.; Li, H.; Apaza, R.D. A Generalized Approach to Aircraft Trajectory Prediction via Supervised Deep Learning. IEEE Access 2023, 11, 116183–116195.
  15. Liu, Z.; Yan, J.; Ai, B.; Fan, Y.; Luo, K.; Cai, G.; Qin, J. An Online Generation Method of Terminal-Area Trajectories for Wave-Rider Using Deep Neural Networks. Aerospace 2023, 10, 654.
  16. Zhang, A.; Zhang, B.; Bi, W.; Mao, Z. Attention Based Trajectory Prediction Method under the Air Combat Environment. Appl. Intell. 2022, 52, 17341–17355.
  17. Liu, Z.; Wang, Z.; Yang, Y.; Lu, Y. A Data-Driven Maneuvering Target Tracking Method Aided with Partial Models. IEEE Trans. Veh. Technol. 2024, 73, 414–425.
  18. Zeng, W.; Chu, X.; Xu, Z.; Liu, Y.; Quan, Z. Aircraft 4D Trajectory Prediction in Civil Aviation: A Review. Aerospace 2022, 9, 91.
  19. Pepper, N.; Thomas, M. Learning Generative Models for Climbing Aircraft from Radar Data. J. Aerosp. Inf. Syst. 2024, 21, 474–481.
  20. Dong, Z.; Fan, B.; Li, F.; Xu, X.; Sun, H.; Cao, W. TCN-Informer-Based Flight Trajectory Prediction for Aircraft in the Approach Phase. Sustainability 2023, 15, 16344.
  21. Wu, Y.; Yu, H.; Du, J.; Liu, B.; Yu, W. An Aircraft Trajectory Prediction Method Based on Trajectory Clustering and a Spatiotemporal Feature Network. Electronics 2022, 11, 3453.
  22. Tran, P.N.; Nguyen, H.Q.V.; Pham, D.-T.; Alam, S. Aircraft Trajectory Prediction With Enriched Intent Using Encoder-Decoder Architecture. IEEE Access 2022, 10, 17881–17896.
  23. Shafienya, H.; Regan, A.C. 4D Flight Trajectory Prediction Using a Hybrid Deep Learning Prediction Method Based on ADS-B Technology: A Case Study of Hartsfield–Jackson Atlanta International Airport (ATL). Transp. Res. Part C Emerg. Technol. 2022, 144, 103878.
  24. Tong, Q.; Hu, J.; Chen, Y.; Guo, D.; Liu, X. Long-Term Trajectory Prediction Model Based on Transformer. IEEE Access 2023, 11, 143695–143703.
  25. Yang, Z.; Kang, X.; Gong, Y.; Wang, J. Aircraft Trajectory Prediction and Aviation Safety in ADS-B Failure Conditions Based on Neural Network. Sci. Rep. 2023, 13, 19677.
  26. Cao, H.; Shao, H.; Zhong, X.; Deng, Q.; Yang, X.; Xuan, J. Unsupervised Domain-Share CNN for Machine Fault Transfer Diagnosis from Steady Speeds to Time-Varying Speeds. J. Manuf. Syst. 2022, 62, 186–198.
  27. Hu, B.; Liu, J.; Zhao, R.; Xu, Y.; Huo, T. A New Fault Diagnosis Method for Unbalanced Data Based on 1DCNN and L2-SVM. Appl. Sci. 2022, 12, 9880.
  28. Smagulova, K.; James, A.P. A Survey on LSTM Memristive Neural Network Architectures and Applications. Eur. Phys. J. Spec. Top. 2019, 228, 2313–2324.
  29. Gao, Y.; Wang, C.; Shen, J.; Wang, Z.; Liu, Y.; Chai, Y. Systematic Review and Network Meta-Analysis of Machine Learning Algorithms in Sepsis Prediction. Expert Syst. Appl. 2024, 245, 122982.
  30. Jia, P.; Chen, H.; Zhang, L.; Han, D. Attention-LSTM Based Prediction Model for Aircraft 4-D Trajectory. Sci. Rep. 2022, 12, 15533.
  31. Ma, L.; Meng, X.; Wu, Z. Data-Driven 4D Trajectory Prediction Model Using Attention-TCN-GRU. Aerospace 2024, 11, 313.
Figure 1. CNN structure.
Figure 2. LSTM structure.
Figure 3. BiLSTM structure.
Figure 4. Structure of the proposed model.
Figure 5. Self-attention mechanism.
Figure 6. Multi-head attention mechanism.
Figure 7. Batch size and test set percentage settings.
Figure 8. Flowchart of network training.
Figure 9. Five signals over time. (a) Height. (b) Speed. (c) Angle. (d) Longitude. (e) Latitude.
Figure 10. Comparison of predicted and true trajectories of different methods. (a) Height. (b) Speed. (c) Angle. (d) Longitude. (e) Latitude.
Figure 11. Performance of different methods under four metrics. (a) MSE. (b) RMSE. (c) MAE. (d) R2.
Figure 12. Results of ablation experiments. (a) MSE. (b) RMSE. (c) MAE. (d) R2.
Table 1. Structural parameters of the network layers.

Layers | Network Parameters | Output Shape | #Param
bidirectional_1 | LSTM (128, activation = 'relu', return_sequences = True), input_shape = 10 × 1 | 10 × 256 | 133,120
dropout_1 | 0.2 | 10 × 256 | 0
bidirectional_2 | LSTM (256, activation = 'relu', return_sequences = True) | 10 × 512 | 1,050,624
dropout_2 | 0.2 | 10 × 512 | 0
multi-head attention | - | 10 × 512 | 0
conv1d_1 | filters = 32, kernel_size = 3, activation = 'relu' | 8 × 32 | 49,184
BN_3 | - | 8 × 32 | 128
dropout_3 | 0.2 | 8 × 32 | 0
max_pooling1d_1 | pool_size = 2 | 4 × 32 | 0
conv1d_2 | filters = 64, kernel_size = 3, activation = 'relu' | 2 × 64 | 6208
BN_4 | - | 2 × 64 | 256
dropout_4 | 0.2 | 2 × 64 | 0
max_pooling1d_2 | pool_size = 2 | 1 × 64 | 0
dense_1 | 32, activation = 'relu' | 1 × 32 | 2080
dense_2 | 1 | 1 × 1 | 33
Table 2. Network hyperparameter settings.

Parameters | Value
window_size | 10
optimizer | Adam
loss function | MSE
epochs | 500
batch_size | 32
testing_split | 0.3
Table 3. Trajectory data statistics.

Parameters | Minimum | Maximum | Mean | Standard Deviation
Height (m) | 312.42 | 10,393.68 | 8424.562 | 2372.032
Speed (m/s) | 37.04 | 850.068 | 745.8833 | 124.0183
Angle (°) | 111 | 329 | 288.1862 | 31.27066
Longitude (°) | 87.46532 | 114.073 | 101.886 | 7.601263
Latitude (°) | 33.9939 | 44.11853 | 37.81195 | 3.028017
Table 4. The impact of different numbers of neurons on the model (bold is the best value).

Number of Neurons | Error | Height | Speed | Angle | Longitude | Latitude
{32, 64, 32, 64} | MSE | 411.156 | 212.227 | 11.936 | 0.375 | 6.614
 | RMSE | 20.277 | 14.568 | 3.454 | 0.613 | 2.571
 | MAE | 19.419 | 14.169 | 3.270 | 0.595 | 2.562
 | R2 | 0.354 | 0.401 | 0.400 | 0.632 | −5.270
{32, 64, 64, 128} | MSE | 105.540 | 79.569 | 15.654 | 2.103 | 1.198
 | RMSE | 10.273 | 8.920 | 3.956 | 1.450 | 1.095
 | MAE | 10.002 | 8.720 | 3.862 | 1.449 | 0.063
 | R2 | 0.527 | 0.404 | 0.214 | −1.063 | −0.136
{32, 64, 128, 256} | MSE | 187.838 | 117.381 | 14.541 | 1.457 | 2.599
 | RMSE | 13.705 | 10.834 | 3.813 | 1.207 | 1.612
 | MAE | 13.381 | 10.649 | 3.457 | 1.156 | 1.607
 | R2 | 0.705 | 0.121 | 0.270 | −0.429 | −1.464
{64, 128, 32, 64} | MSE | 271.576 | 1487.105 | 11.358 | 75.858 | 40.001
 | RMSE | 16.480 | 38.563 | 3.370 | 8.998 | 6.324
 | MAE | 16.004 | 14.976 | 3.186 | 8.008 | 6.324
 | R2 | 0.573 | 0.888 | 0.429 | 0.206 | −36.920
{64, 128, 64, 128} | MSE | 188.321 | 3803.365 | 24.553 | 0.526 | 0.278
 | RMSE | 13.723 | 61.671 | 4.955 | 0.725 | 0.527
 | MAE | 13.515 | 59.810 | 4.765 | 0.651 | 0.496
 | R2 | 0.704 | 0.715 | −0.232 | 0.484 | 0.736
{64, 128, 128, 256} | MSE | 60.829 | 133.116 | 6.507 | 2.238 | 2.083
 | RMSE | 7.780 | 11.537 | 2.551 | 1.496 | 1.443
 | MAE | 7.581 | 11.269 | 1.401 | 1.476 | 1.439
 | R2 | 0.904 | 0.804 | 0.673 | −1.196 | −0.974
{128, 256, 32, 64} | MSE | 40.900 | 67.990 | 14.873 | 0.765 | 0.026
 | RMSE | 6.395 | 8.246 | 3.856 | 0.875 | 0.162
 | MAE | 5.894 | 8.124 | 3.547 | 0.865 | 0.277
 | R2 | 0.936 | 0.791 | 0.854 | 0.948 | 0.907
{128, 256, 64, 128} | MSE | 134.479 | 100.129 | 5.756 | 42.808 | 0.097
 | RMSE | 11.597 | 10.006 | 2.399 | 6.542 | 0.311
 | MAE | 11.137 | 9.613 | 1.586 | 6.306 | 0.109
 | R2 | 0.789 | 0.651 | 0.711 | 0.892 | 0.975
{128, 256, 128, 256} | MSE | 50.791 | 3685.085 | 10.937 | 111.709 | 0.534
 | RMSE | 7.127 | 60.704 | 3.307 | 10.569 | 0.731
 | MAE | 6.886 | 57.098 | 3.112 | 10.116 | 0.658
 | R2 | 0.920 | 0.724 | 0.451 | −0.095 | 0.493