Inspired by related work [31,32,35,36], dynamic spatio-temporal learning is divided into three stages. In the first stage, we assume that the taxi time of an individual taxiing segment is directly influenced by the traffic conditions of the segment itself, as well as those of its neighboring taxiing segments [31]. Spatial and temporal relationships between and within taxiing segments are captured using spatial and temporal attention, respectively [35]. This process generates the hidden states of the dynamic spatio-temporal tensor of taxiing segments. In the second stage, a 3D cross spatio-temporal attention network [36] adaptively captures the joint dynamic spatio-temporal correlation between the hidden states of flight properties and the hidden states of the taxiing segments through which the taxi route passes, together with their neighbors. This process generates a higher-level dynamic spatio-temporal representation of the taxi route. In the third stage, the final representation of the taxi route is obtained by adding sequence information to the links using a positional embedding technique [32].
Attention mechanisms are widely used techniques in deep learning. By incorporating attention mechanisms, neural networks can automatically learn to focus selectively on important information in the input, thus enhancing the model's performance and generalization ability. The well-known self-attention mechanism was proposed in the Transformer [38] and consists of a query–key–value (QKV) structure: attention scores are computed from the queries and keys, after which important information is extracted from the values according to these scores. In this module, we use different attention mechanisms to capture the complex spatio-temporal correlations in airport traffic data.
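The QKV structure described above can be sketched as a minimal NumPy example. The batch size, sequence lengths, and embedding dimension below are illustrative placeholders, not the model's actual dimensions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Standard QKV attention: scores from Q·K^T, then a weighted sum of V."""
    d = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d)  # (batch, n_queries, n_keys)
    weights = softmax(scores, axis=-1)              # attention scores
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4, 8))   # 2 samples, 4 queries, embedding dim 8
K = rng.normal(size=(2, 6, 8))   # 6 keys per sample
V = rng.normal(size=(2, 6, 8))   # one value per key
out, w = scaled_dot_product_attention(Q, K, V)
```

Each output row is a convex combination of the value rows, with the weights given by the softmax-normalized query–key scores.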
Static spatio-temporal tensors of the taxiing segments need to be constructed first; here, we leverage the concept of [31]. The dynamic link hidden states, one per link, incorporate the historical and future traffic-condition features of the taxiing segments (generated by the time series block) and are encoded through a fully connected layer. The static link hidden states are encoded from discrete features, such as the link ID, using a lookup-table technique, and from continuous features, such as link length, using a fully connected layer. The hidden states of temporal information are encoded through a fully connected layer. The dynamic, static, and temporal matrices are then combined into static spatio-temporal matrices by expanding the static and temporal matrices to a common shape.
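The two encoders can be sketched as follows. The link count, embedding dimension, number of time steps, and length feature are hypothetical placeholders; the lookup table is simply a row index into an embedding matrix, and the fully connected layer is a single linear map with a ReLU activation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_links, d = 5, 8            # illustrative: 5 taxiing segments, embedding dim 8

# Lookup table for the discrete link ID: one learnable row per link.
id_table = rng.normal(size=(n_links, d))

# Fully connected layer for continuous features (here: link length only).
W_len, b_len = rng.normal(size=(1, d)), np.zeros(d)

link_ids = np.arange(n_links)
link_length = rng.uniform(50, 300, size=(n_links, 1))   # metres, illustrative

# Static link hidden states: lookup-table embedding + FC-encoded length.
static_h = id_table[link_ids] + np.maximum(link_length @ W_len + b_len, 0)

# Expand the static states along the time axis so they can be combined
# with the dynamic (per-time-step) hidden states into one tensor.
T = 4                                                    # time steps
static_st = np.broadcast_to(static_h[None], (T, n_links, d))
```

The broadcast in the last step mirrors the expansion of the static and temporal matrices to the dynamic matrix's shape described above.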
3.5.1. Spatial Attention and Temporal Attention
(a) Spatial attention
The term “dynamic spatial correlation” describes how the influence weights of other taxiing segments in the airport network change over time, affecting the traffic state of the target taxiing segment. For example, as illustrated in Figure 4, the upstream taxiing segment in light blue may negatively affect the traffic state of the green taxiing segment during rush hour, and this impact may diminish when the congestion eases. To capture these dynamic spatial correlations, we design a spatial attention network, a multi-head graph attention network (MGAT) [39], to adaptively model the spatial correlations between neighboring taxiing segments.
In the airport network, for a taxiing segment at time step k, the set of spatially related taxiing segments includes the segment itself and its neighboring segments at time step k. We then use the following graph attention settings to compute the dynamic spatial attention hidden representation of the taxiing segment.
The static spatio-temporal hidden representation of taxiing segment i at time step k is taken as the query of the attention mechanism. The static spatio-temporal hidden representations of the spatially related taxiing segments are taken as the keys and values of the attention mechanism. To be specific, the attention mechanism is formulated as follows:
where M is the number of heads, the heads' outputs are combined by concatenation, each head's learnable weight matrix is shared by all taxiing segments, a further learnable output weight matrix is likewise shared by all taxiing segments, the exponential function appears in the attention-score normalization, d refers to the embedding dimension, and BN is the batch normalization operation. The dynamic spatial attention hidden representation of taxiing segment i at time step k can then be encoded by Equation (9); the residual connection [40] and batch normalization (BN) [41] are added to prevent feature loss and internal covariate shift.
The initial input of the spatial attention network is the static spatio-temporal hidden representation of the taxiing segments, and the output is the dynamic spatial attention hidden representation of the taxiing segments.
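A minimal sketch of the per-segment neighbor attention (a single layer with two heads, omitting the residual connection and batch normalization described above). The toy graph, head count, and dimensions are assumptions for illustration only.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def graph_attention(H, neighbors, W_heads):
    """Each segment attends over itself and its neighbours, per head,
    and the head outputs are concatenated (multi-head graph attention)."""
    n, d = H.shape
    M, _, dh = W_heads.shape                   # M heads, each projecting d -> dh
    out = np.zeros((n, M * dh))
    for i in range(n):
        related = [i] + neighbors[i]           # segment itself + its neighbours
        for m in range(M):
            q = H[i] @ W_heads[m]              # query: the target segment
            KV = H[related] @ W_heads[m]       # keys/values: related segments
            scores = softmax(KV @ q / np.sqrt(dh))
            out[i, m * dh:(m + 1) * dh] = scores @ KV
    return out

rng = np.random.default_rng(2)
H = rng.normal(size=(4, 8))                    # 4 segments, dim 8, one time step
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # toy taxiway adjacency
W = rng.normal(size=(2, 8, 4))                 # 2 heads, each projecting 8 -> 4
H_spatial = graph_attention(H, neighbors, W)
```

Restricting the keys and values to each segment's neighbor set is what distinguishes this from dense self-attention over all segments.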
(b) Temporal attention
The term “dynamic temporal correlation” refers to how the influence weights of a given taxiing segment’s historical traffic conditions on its future traffic state change over time in the airport network. For example, morning rush hour congestion might be influenced by preceding traffic conditions, and this influence may accumulate over time until it diminishes. To dynamically capture relationships across multiple time steps, we implement a temporal attention network based on the mask mechanism of the Transformer [38], as depicted in Figure 5.
In the airport network, for a taxiing segment at time step k, the set of temporally related taxiing segments includes the segment itself at time step k and the same segment at the time steps before k. We then use the following temporal attention settings to compute the dynamic temporal attention hidden representation of the taxiing segment.
The static spatio-temporal hidden representation of taxiing segment i at time step k is taken as the query of the attention mechanism. The static spatio-temporal hidden representations of the temporally related taxiing segments are taken as the keys and values of the attention mechanism. To be specific, the attention mechanism is formulated as follows:
where each head's learnable weight matrix is shared by all taxiing segments, as is the learnable output weight matrix. The meanings of the other symbols remain consistent with those defined previously. The dynamic temporal attention hidden representation of taxiing segment i at time step k can then be encoded by Equation (16).
The initial input of the temporal attention network is the static spatio-temporal hidden representation of the taxiing segments, and the output is the dynamic temporal attention hidden representation of the taxiing segments.
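A sketch of masked temporal self-attention for one segment's sequence of time steps, using a causal mask in the style of the Transformer so that step k attends only to steps up to k. The sequence length and dimension are illustrative, and the multi-head structure is omitted for brevity.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_temporal_attention(H):
    """Self-attention over one segment's time steps with a causal mask:
    step k attends only to steps <= k, as in the masked Transformer."""
    T, d = H.shape
    scores = H @ H.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)   # future positions
    scores[mask] = -np.inf                             # masked out of softmax
    return softmax(scores) @ H

rng = np.random.default_rng(3)
H_t = rng.normal(size=(6, 8))      # 6 time steps for one segment, dim 8
H_temporal = causal_temporal_attention(H_t)
```

Because of the mask, the first time step can only attend to itself, so its output equals its input exactly.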
(c) Fusion network
The traffic states on a taxiing segment at a given time step are related to the traffic states at previous time steps and the neighboring taxiing segments. To obtain a dynamic spatio-temporal hidden representation of the taxiing segment, the dynamic spatial attention hidden representation and dynamic temporal attention hidden representation of a taxiing segment are fused using a fully connected layer. To be specific, the fusion mechanism is formulated as follows:
where the spatial and temporal representations are first concatenated and then passed through fully connected layers with activation functions. The initial inputs of the fusion network are the dynamic spatial attention hidden representation and the dynamic temporal attention hidden representation of a taxiing segment; the output is its dynamic spatio-temporal hidden representation.
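The fusion step can be sketched as a concatenation followed by one fully connected layer with a ReLU activation. The number of segments, dimension, random weights, and choice of ReLU are placeholders; the paper's exact layer count and activation are not specified here.

```python
import numpy as np

rng = np.random.default_rng(4)
n_seg, d = 4, 8
H_spatial = rng.normal(size=(n_seg, d))    # dynamic spatial attention states
H_temporal = rng.normal(size=(n_seg, d))   # dynamic temporal attention states

# Fusion: concatenate the two representations, then a fully connected
# layer with a ReLU activation maps them back to dimension d.
W_f, b_f = rng.normal(size=(2 * d, d)), np.zeros(d)
H_st = np.maximum(np.concatenate([H_spatial, H_temporal], axis=-1) @ W_f + b_f, 0)
```

Concatenation (rather than summation) lets the learned weights decide how much each branch contributes per output dimension.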
3.5.2. Three-Dimensional Cross Spatio-Temporal Attention
Intuitively, the taxi time for an aircraft within a given taxiing segment is related to the historical and future traffic states of the taxiing segment itself and its neighbors, as well as the flight properties. For instance, aircraft of different types may require different amounts of time on the same taxiing segment, even under the same traffic conditions. Therefore, in the second stage, we design a 3D cross spatio-temporal attention network [31,36] to adaptively model the joint dynamic spatio-temporal correlations between the hidden states of flight properties and the hidden states of the taxiing segments (generated in the first stage) through which the taxi route passes, as well as those of their neighbors. We then obtain the higher-level spatio-temporal representation of the taxi route, as illustrated in Figure 6.
The hidden states of flight properties, which incorporate features of aircraft type, airline affiliation, arrival or departure status, and total taxi distance, are encoded using a lookup-table technique and a fully connected layer. The hidden states of start-time information are encoded through a fully connected layer.
Then, the combination of the representation of the flight properties and the representation of the start time is taken as the query of the attention mechanism. The dynamic spatio-temporal hidden states of the taxiing segments through which the taxi route passes and their neighbors, over the last and future time slots, are taken as the keys and values of the attention mechanism. To be specific, the attention mechanism is formulated as follows:
where each head's learnable query weight matrix is shared by all aircraft, and each head's learnable key and value weight matrices are shared by all taxiing segments. The meanings of the other symbols remain consistent with those defined previously. The higher-level dynamic spatio-temporal attention hidden states of taxiing segment i can then be encoded by Equation (25).
The initial inputs of the 3D cross spatio-temporal attention network are the hidden states of flight properties, the hidden states of start time, and the dynamic spatio-temporal hidden states of the taxiing segments through which the taxi route passes and their neighbors over the last and future time slots. The output is the higher-level dynamic spatio-temporal representation of the taxiing segments through which the taxi route passes.
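A single-head sketch of the cross attention: the flight-level vector (properties plus start time) forms the query, while the flattened (segment, time slot) states form the keys and values. All dimensions, the flattening choice, and the projection matrices are illustrative assumptions about the network's structure.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(5)
d, n_seg, T = 8, 3, 4

flight = rng.normal(size=(d,))            # flight properties + start time
# Dynamic spatio-temporal states of the route's segments and neighbours
# over past/future slots, flattened to one key/value per (segment, slot).
seg_states = rng.normal(size=(n_seg * T, d))

Wq = rng.normal(size=(d, d))              # query projection (flight side)
Wk = rng.normal(size=(d, d))              # key projection (segment side)
Wv = rng.normal(size=(d, d))              # value projection (segment side)

q = flight @ Wq
scores = softmax((seg_states @ Wk) @ q / np.sqrt(d))
route_repr = scores @ (seg_states @ Wv)   # higher-level route representation
```

Because the query comes from the flight and the keys/values from the traffic states, the same traffic situation can be weighted differently for different aircraft, matching the aircraft-type example above.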
3.5.3. Positional Encoding
The taxiing segments along each taxi route exhibit a clear sequential structure, which is significant for training a highly accurate taxi time prediction model. However, recurrent neural networks are very time-consuming during inference, especially with large sequence lengths, and may not meet the real-time requirements of taxi time prediction tasks. Therefore, in this study, a positional encoding technique [32] designed to capture the sequence relationship of taxiing segments is used to incorporate sequence information into the taxiing segments. For the sequence encoding problem, we generate a series of cosinusoids:
where the position index gives the taxiing segment's position in a taxi route, the dimension index runs over the dimensions of the taxiing segment hidden representation, and the scaling constant is the size of the taxiing segment hidden representation. We denote the resulting vector as the positional encoding for position i. Then, the final hidden representation of the taxi route can be encoded as follows:
where a hyper-parameter controls the effect of the sequence information. The initial input of the positional encoding is the higher-level dynamic spatio-temporal representation of the taxiing segments through which the taxi route passes, and the output is the final hidden representation of the taxi route.
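A sketch of the positional encoding and the weighted addition. The implementation below uses the familiar interleaved sine/cosine form and a weighting hyper-parameter (here called `alpha`); both are assumptions about the cited technique's exact formulation.

```python
import numpy as np

def positional_encoding(n_pos, d):
    """Sinusoidal positional encoding: even dims use sine, odd dims cosine."""
    pos = np.arange(n_pos)[:, None]          # (n_pos, 1) position index
    dim = np.arange(d)[None, :]              # (1, d) dimension index
    angle = pos / np.power(10000.0, (2 * (dim // 2)) / d)
    return np.where(dim % 2 == 0, np.sin(angle), np.cos(angle))

rng = np.random.default_rng(6)
n_seg, d = 5, 8
H_route = rng.normal(size=(n_seg, d))  # higher-level segment representations
alpha = 0.1                            # hyper-parameter weighting sequence info
H_final = H_route + alpha * positional_encoding(n_seg, d)
```

Because the encoding is computed in closed form for all positions at once, it adds sequence information without the sequential inference cost of a recurrent network.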