To evaluate the performance of the proposed prediction model, we conduct our case study at Qingdao Jiaodong International Airport (IATA: TAO; ICAO: ZSQD). Three types of data are used: operations and stand-allocation records provided by the airport authority/AODB, meteorological data from CMA METAR/TAF reports (coded according to CMA documentation), and historical capacity statistics. All timestamps are converted to Beijing Time for consistency. A total of 80 stands are distributed along the U-shaped corridor (pier) area; the five corridors contain 10, 10, 20, 20, and 20 stands, respectively. The study covers a 12-month span from 1 January 2023 to 31 December 2023, sampled at a 30 min resolution.
Weather phenomena are extracted from the meteorological messages; message times are converted from Coordinated Universal Time (UTC) to the local time of the airport meteorological observation station (Beijing Time), and the coded reports are semantically transformed into model features. Part of the meteorological data are graded and encoded by referring to official documents of the China Meteorological Administration and expert judgment, using real-valued, integer, and binary (0–1) encodings.
Let T denote the number of half-hourly timestamps after alignment. With look-back L = 24, horizon H = 6, and stride s = 1, a complete 2023 calendar (T = 17,520) yields T − L − H + 1 = 17,491 windows per stand; with N = 80 stands, this gives up to 1,399,280 windows before filtering. We report effective train/validation/test counts after removing windows that overlap data gaps and after chronological splitting (60/20/20, consistent with our experimental setup). A leakage-safe preprocessing pipeline is applied: forward fill (≤2 steps), KNN imputation (k = 5), winsorization at 1%/99%, circular encoding of wind direction, and standardization fitted on the training set only. The actual capacity values are derived from historical hourly arrival/departure statistics at the target airport, preprocessed for noise filtering and time alignment.
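The following sketch illustrates, under stated assumptions, how such a leakage-safe pipeline and window construction could be implemented; the function names, column layout, and DataFrame interfaces are illustrative rather than the project's actual code.

```python
# Minimal sketch of the leakage-safe preprocessing and windowing described above.
# All thresholds and statistics are fitted on the training split only.
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

def preprocess(df_train: pd.DataFrame, df_eval: pd.DataFrame):
    # 1) Short-gap forward fill (at most 2 consecutive 30 min steps).
    df_train, df_eval = df_train.ffill(limit=2), df_eval.ffill(limit=2)

    # 2) KNN imputation (k = 5) for remaining gaps, fitted on training data only.
    imputer = KNNImputer(n_neighbors=5)
    train_vals = imputer.fit_transform(df_train)
    eval_vals = imputer.transform(df_eval)

    # 3) Winsorization at the 1%/99% quantiles, thresholds taken from training data.
    lo, hi = np.percentile(train_vals, [1, 99], axis=0)
    train_vals, eval_vals = np.clip(train_vals, lo, hi), np.clip(eval_vals, lo, hi)

    # 4) Standardization fitted on the training split only.
    scaler = StandardScaler().fit(train_vals)
    return scaler.transform(train_vals), scaler.transform(eval_vals), scaler

def encode_wind_direction(deg: np.ndarray) -> np.ndarray:
    # Circular encoding: map wind direction (degrees) to (sin, cos) components.
    rad = np.deg2rad(deg)
    return np.stack([np.sin(rad), np.cos(rad)], axis=-1)

def make_windows(x: np.ndarray, look_back: int = 24, horizon: int = 6, stride: int = 1):
    # Sliding windows: T - L - H + 1 windows per stand (17,491 for T = 17,520).
    T = x.shape[0]
    idx = range(0, T - look_back - horizon + 1, stride)
    X = np.stack([x[i:i + look_back] for i in idx])
    Y = np.stack([x[i + look_back:i + look_back + horizon] for i in idx])
    return X, Y
```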
3.1. Experimental Environment and Parameter Configuration
The model implementation was based on Python 3.9 and PyTorch 2.1.0, leveraging PyTorch Geometric for graph-based learning components. Feature importance and selection were performed in the preprocessing phase using XGBoost (v1.7.6) in conjunction with SHAP (v0.41.0), both of which are widely adopted in interpretable machine learning pipelines.
The proposed ST-GTNet model incorporates multi-layer Graph Convolutional Networks (GCNs) followed by temporal encoding through a Transformer architecture. Specifically, the GCN module comprises two stacked layers, each with 64 hidden units and ReLU activation, enabling the extraction of localized spatial dependencies within the airport network topology. The Transformer module consists of two encoder layers, each employing 4 attention heads and a hidden dimension of 128, designed to capture long-range temporal dependencies from the sequence of historical features.
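A minimal sketch of this GCN-plus-temporal-Transformer stack is given below. The layer sizes follow the text (two GCN layers with 64 units and ReLU, two Transformer encoder layers with 4 heads and width 128); the class name, projection layer, and exact wiring are illustrative assumptions rather than the published implementation.

```python
# Hedged architecture sketch: per-time-slice GCN over the stand graph,
# followed by temporal-only self-attention per node.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class STGTNetSketch(nn.Module):
    def __init__(self, in_dim: int = 6, gcn_dim: int = 64, d_model: int = 128,
                 n_heads: int = 4, horizon: int = 6, dropout: float = 0.25):
        super().__init__()
        # Two stacked GCN layers (64 hidden units, ReLU) over the airport stand graph.
        self.gcn1 = GCNConv(in_dim, gcn_dim)
        self.gcn2 = GCNConv(gcn_dim, gcn_dim)
        self.proj = nn.Linear(gcn_dim, d_model)
        # Two Transformer encoder layers (4 heads, width 128) applied along the time axis.
        enc = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                         dropout=dropout, batch_first=True)
        self.temporal = nn.TransformerEncoder(enc, num_layers=2)
        self.head = nn.Linear(d_model, horizon)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        # x: (L, N, F) look-back window of node features; edge_index: sparse graph edges.
        L, N, _ = x.shape
        h = torch.stack([torch.relu(self.gcn2(torch.relu(self.gcn1(x[t], edge_index)),
                                              edge_index)) for t in range(L)])  # (L, N, 64)
        z = self.proj(h).permute(1, 0, 2)   # (N, L, 128): one temporal sequence per node
        z = self.temporal(z)                # temporal-only self-attention
        return self.head(z[:, -1])          # (N, H): next-H capacity values per node
```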
The model was trained using the Adam optimizer with an initial learning rate of 0.0005. A cosine annealing scheduler with warm restarts was employed to dynamically adjust the learning rate during training, following best practices for spatiotemporal sequence modeling. The batch size was set to 32, and dropout with a rate of 0.25 was applied to mitigate overfitting. The maximum number of training epochs was fixed at 20, with early stopping enabled (patience = 5) based on validation loss to ensure generalization and avoid unnecessary computation.
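The training configuration can be summarized by the sketch below, which matches the stated hyperparameters (Adam, learning rate 5e-4, cosine annealing with warm restarts, batch size 32, dropout 0.25, at most 20 epochs, early stopping with patience 5). The scheduler periods (T_0, T_mult), the data loaders, and the evaluate helper are illustrative assumptions not specified in the text, and batching is schematic.

```python
# Training-loop sketch with the stated optimizer, scheduler, and early stopping.
import torch

model = STGTNetSketch(dropout=0.25)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=5, T_mult=2)
criterion = torch.nn.MSELoss()

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(20):                       # maximum of 20 training epochs
    model.train()
    for xb, yb, edge_index in train_loader:   # batch size 32 configured in the DataLoader
        optimizer.zero_grad()
        loss = criterion(model(xb, edge_index), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()

    val_loss = evaluate(model, val_loader)    # validation loss drives early stopping
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best.pt")
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break                             # early stopping (patience = 5)
```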
All experiments were repeated with three random seeds to ensure statistical robustness and reproducibility. Model performance was evaluated using Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R²) on both takeoff and landing capacity prediction tasks.
3.4. Benchmark Models Comparison
In this section, the performance of the proposed ST-GTNet model is compared with three widely used benchmark models: Long Short-Term Memory (LSTM), the Gated Recurrent Unit (GRU), and the Transformer. These models are chosen for their proven success in sequential forecasting tasks. The goal is to demonstrate the advantages of the ST-GTNet in airport capacity forecasting by integrating both spatial and temporal dependencies.
Hardware and protocol: NVIDIA A100 (40 GB); PyTorch 2.1.0; CUDA 12.1; fixed random seed; 50 warm-up iterations; timings averaged over 1000 forward passes, with torch.cuda.synchronize() called before and after the timed region; identical input shapes across models (N ≈ 80, L = 24, batch size 32).
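A minimal sketch of this timing protocol is shown below, reusing the STGTNetSketch class from the earlier architecture sketch; the random input tensors and placeholder edge list are assumptions standing in for real preprocessed windows.

```python
# Latency measurement sketch: fixed seed, 50 warm-up passes, then the mean over
# 1000 forwards with torch.cuda.synchronize() bracketing the timed region.
import time
import torch

torch.manual_seed(0)
device = torch.device("cuda")
model = STGTNetSketch().to(device).eval()
x = torch.randn(24, 80, 6, device=device)                    # one look-back window (L, N, F)
edge_index = torch.randint(0, 80, (2, 320), device=device)   # placeholder sparse stand graph

with torch.no_grad():
    for _ in range(50):                                       # warm-up forwards
        model(x, edge_index)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(1000):                                     # timed forwards
        model(x, edge_index)
    torch.cuda.synchronize()
    ms_per_step = (time.perf_counter() - t0) / 1000 * 1e3
print(f"{ms_per_step:.2f} ms/step")
```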
To compare the differences between predicted values and real values of the ST-GTNet and the benchmark models, three standard indicators are used:
Mean Absolute Error (MAE): $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$
Mean Absolute Percentage Error (MAPE): $\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$
Root Mean Square Error (RMSE): $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$
where $y_i$ and $\hat{y}_i$ denote the actual and predicted capacity at sample $i$, and $n$ is the number of evaluated samples.
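These indicators correspond to straightforward NumPy implementations, sketched below; the small epsilon guarding MAPE against zero-capacity intervals is an added safeguard, not part of the definitions above.

```python
# NumPy implementations of the three evaluation indicators.
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred, eps=1e-8):
    # Percentage error; eps guards against division by zero-capacity intervals.
    return 100.0 * np.mean(np.abs((y_true - y_pred) / (y_true + eps)))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```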
As shown in Table 5, the ST-GTNet combines Graph Convolutional Networks (GCNs) and the Transformer to capture spatial dependencies between airport gates and terminals while also modeling temporal dependencies across the time-series data of flight schedules. By leveraging both types of dependencies, the ST-GTNet outperforms traditional models, which typically focus only on temporal dependencies.
LSTM is a well-established model that captures long-range temporal dependencies in sequential data through a gating mechanism [22]. While effective for time-series forecasting, it does not model spatial interactions between entities (e.g., gates and terminals) in the airport network, which limits its performance in tasks where spatial context is crucial [23].
The GRU is a simplified version of LSTM, with fewer gates, making it computationally more efficient [24]. It performs similarly to LSTM on many tasks but, like LSTM, does not account for spatial dependencies, which are critical in airport capacity prediction [25].
The Transformer utilizes self-attention to capture long-range temporal dependencies [26]. It is highly efficient in learning complex sequential patterns and has become a dominant model for tasks that require processing long sequences of data. However, like LSTM and the GRU, it does not incorporate spatial relationships and is limited in tasks where spatial context is essential [27].
In addition to the classical sequence models, we incorporated three recent spatiotemporal learning approaches as benchmarks: the MF-Transformer, GAT-LSTM, and the DMCSTN. These models represent the state-of-the-art in dynamic capacity estimation and spatiotemporal modeling.
The MF-Transformer extends the traditional Transformer by incorporating multi-feature fusion across meteorological and operational variables [11].
GAT-LSTM leverages graph attention mechanisms to model spatial dependencies among gates and combines them with temporal learning via LSTM [12].
The DMCSTN (Dynamic Multi-Graph Convolutional Spatiotemporal Network) utilizes dynamic graph convolution along with hierarchical temporal attention to adaptively learn from evolving spatiotemporal patterns [13].
All models were trained using the same set of six input features, with a sliding window of historical observations and consistent gate-level adjacency graphs.
Table 4 summarizes the performance metrics across four representative months. Results show that the MF-Transformer, GAT-LSTM, and the DMCSTN outperform traditional temporal models (LSTM, GRU, Transformer), confirming the advantage of incorporating spatial dependencies and enhanced attention mechanisms. Among them, GAT-LSTM achieves the lowest RMSE among the new benchmarks, but the ST-GTNet consistently surpasses all others, achieving an average improvement of 12.8% in RMSE over GAT-LSTM and 20.2% over the MF-Transformer. These results validate the effectiveness of the ST-GTNet’s unified spatiotemporal representation and interpretable structure, especially under complex and dynamic traffic conditions such as in September and December.
We train three controlled variants under identical settings: w/o the GCN (Transformer-only), w/o the Transformer (GCN-only), and the full ST-GTNet. Metrics are reported per month (Table 6). Given the per-month MAE of each variant, we attribute the improvement of the full ST-GTNet over the single-module baselines via complementary marginal gains: $\Delta_{\mathrm{GCN}} = \mathrm{MAE}_{\text{w/o GCN}} - \mathrm{MAE}_{\text{full}}$ and $\Delta_{\mathrm{Trans}} = \mathrm{MAE}_{\text{w/o Trans}} - \mathrm{MAE}_{\text{full}}$, with attribution weights $\omega_{\mathrm{GCN}} = \Delta_{\mathrm{GCN}}/(\Delta_{\mathrm{GCN}} + \Delta_{\mathrm{Trans}})$ and $\omega_{\mathrm{Trans}} = \Delta_{\mathrm{Trans}}/(\Delta_{\mathrm{GCN}} + \Delta_{\mathrm{Trans}})$.
Results (MAE), using the values in Table 6 (w/o GCN / w/o Transformer / full), are as follows:
March: 1.30/1.38/1.09 → $\omega_{\mathrm{GCN}}$ = 0.42; $\omega_{\mathrm{Trans}}$ = 0.58;
June: 0.75/0.78/0.54 → 0.47/0.53;
September: 1.77/1.91/1.41 → 0.42/0.58;
December: 1.72/1.84/1.39 → 0.42/0.58.
Average: $\omega_{\mathrm{GCN}}$ = 0.43; $\omega_{\mathrm{Trans}}$ = 0.57. RMSE yields a similar split (≈0.42/0.58). These ablations confirm that the GCN and the temporal-only Transformer make complementary contributions.
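The attribution arithmetic can be reproduced directly from the reported MAE values, as in the short sketch below; the helper name and dictionary layout are illustrative.

```python
# Marginal-gain attribution from the ablation MAE values (w/o GCN, w/o Transformer, full).
def attribute(mae_wo_gcn: float, mae_wo_trans: float, mae_full: float):
    d_gcn = mae_wo_gcn - mae_full      # gain from adding the GCN to the temporal model
    d_trans = mae_wo_trans - mae_full  # gain from adding the Transformer to the GCN model
    total = d_gcn + d_trans
    return d_gcn / total, d_trans / total

months = {"Mar": (1.30, 1.38, 1.09), "Jun": (0.75, 0.78, 0.54),
          "Sep": (1.77, 1.91, 1.41), "Dec": (1.72, 1.84, 1.39)}
for month, vals in months.items():
    w_gcn, w_trans = attribute(*vals)
    print(f"{month}: w_GCN = {w_gcn:.2f}, w_Trans = {w_trans:.2f}")
# Mar 0.42/0.58, Jun 0.47/0.53, Sep 0.42/0.58, Dec 0.42/0.58 -> average 0.43/0.57
```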
3.5. Performance Evaluation
To assess the predictive effectiveness of the proposed ST-GTNet model in dynamic airport capacity estimation, a comprehensive evaluation is conducted against six benchmark models (LSTM, the GRU, the Transformer, the MF-Transformer, GAT-LSTM, and the DMCSTN), with the actual capacity series serving as the reference. Four representative months (March, June, September, and December) are selected to capture seasonal variability. Three standard metrics are used: Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). The results are summarized in Table 6, and time-series prediction performance is visually illustrated in Figure 5. The actual capacity values are derived from real-world historical hourly arrival/departure rates at the target airport, after data cleaning and temporal alignment.
As shown in Table 6, the ST-GTNet achieves the best performance across all metrics and all months. In March, the model records a MAE of 1.09, RMSE of 1.37, and MAPE of 7.9%, compared to LSTM (1.60, 2.10, 12.1%), the GRU (1.42, 1.78, 10.5%), and the Transformer (1.30, 1.66, 9.2%). Other advanced models such as the MF-Transformer, GAT-LSTM, and the DMCSTN exhibit MAEs between 1.18 and 1.22 and RMSEs between 1.49 and 1.55, still falling short of the ST-GTNet.
In June, a relatively stable operational month, the ST-GTNet achieves outstanding results with a MAE of 0.54, RMSE of 0.68, and MAPE of only 2.4%, significantly outperforming LSTM (0.95, 1.18, 4.3%) and the Transformer (0.75, 0.93, 3.2%). Competing models such as the GAT-LSTM and DMCSTN yield RMSEs of 0.84 and 0.86, respectively, but do not match the overall accuracy of the ST-GTNet.
In September, where dynamic fluctuations are more pronounced, the ST-GTNet maintains superior robustness with a MAE of 1.41, RMSE of 1.77, and MAPE of 12.8%, which are lower than those of the GRU (1.90, 2.39, 17.4%), the Transformer (1.77, 2.21, 15.8%), and the DMCSTN (1.63, 2.01, 14.1%).
In December, the ST-GTNet continues to lead with a MAE of 1.39, RMSE of 1.74, and MAPE of 8.0%, again outperforming the best baselines such as the GAT-LSTM (1.56, 1.94, 8.5%) and DMCSTN (1.59, 1.98, 8.7%).
As shown in Figure 5, on average across the four months, the ST-GTNet achieves substantial reductions in RMSE compared to all baseline models: 34.6% lower than LSTM, 26.3% lower than the GRU, 21.2% lower than the Transformer, 13.2% lower than the DMCSTN, and 11.5% lower than GAT-LSTM. These consistent improvements demonstrate the proposed model's strong generalization capability across both stable and high-variance seasonal conditions.
In addition to the quantitative metrics, Figure 5 presents a full time-series comparison of predicted capacity versus actual values over 1500 timesteps (each representing a 30 min interval) for the four selected months. To enhance interpretability, the figure is presented in two stacked panels. The top panel compares the ST-GTNet with three classical models (LSTM, the GRU, and the Transformer), while the bottom panel shows the ST-GTNet alongside three advanced baselines (the MF-Transformer, GAT-LSTM, and the DMCSTN).
In both panels, the gray curve denotes the actual observed airport capacity, while each colored curve represents the output of a specific model. The ST-GTNet (highlighted in bold red) consistently exhibits the closest alignment with the actual series across all four months. It accurately captures daily periodicity, multi-peak patterns, and sudden capacity changes, particularly during morning ramp-ups and evening slowdowns. In contrast, traditional models like LSTM and the GRU show visible time lags during sharp transitions and tend to oversmooth the amplitude of fluctuations. The Transformer improves short-term tracking but introduces instability in low-capacity zones. Among the advanced models, GAT-LSTM and the DMCSTN reduce lag but still miss peak alignment and underestimate volatility in some intervals.
The performance differences are especially notable during high-variance periods, such as midday peaks in September or morning troughs in December. The ST-GTNet maintains both temporal responsiveness and amplitude accuracy, reinforcing the numerical findings in Table 4 and confirming its robustness for real-time, fine-grained airport capacity forecasting.
Model-level complexity. Let $N$ be the number of gates, $L$ the look-back length, $d$ the hidden width, $k$ the average degree of the sparse graph, and $h$ the number of attention heads. Per inference step (predicting one 30 min horizon vector):
GCN (2 layers, per time slice): $O(Nkd + Nd^2)$, i.e., $O\bigl(L\,(Nkd + Nd^2)\bigr)$ over the look-back window.
Temporal-only Transformer (2 layers): $O(NL^2d + NLd^2)$ per layer; $O(hNL^2)$ memory for attention maps.
By confining attention to the temporal axis and using a sparse physical graph, the ST-GTNet avoids the $O\bigl((NL)^2 d\bigr)$ attention cost of fully spatiotemporal transformers.
Concrete deployment shape (TAO case). $N = 80$, $L = 24$, $d = 64$ for the GCN and $128$ for the Transformer, $h = 4$. The total arithmetic per step is the sum of the GCN and temporal-attention terms above; attention memory ≈ $hNL^2 = 4 \times 80 \times 24^2 \approx 1.8 \times 10^5$ elements (≲1 MB in FP32).
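A back-of-the-envelope check of these terms at the TAO deployment shape is given below; the average graph degree k is an assumed placeholder, since the text does not state it.

```python
# Rough operation and memory counts for the complexity terms above.
N, L, d_gcn, d_model, h, k = 80, 24, 64, 128, 4, 4   # k is an assumed average degree

gcn_macs = 2 * L * N * (k * d_gcn + d_gcn ** 2)            # two GCN layers over L slices
attn_macs = 2 * N * (L ** 2 * d_model + L * d_model ** 2)  # two temporal-attention layers
attn_mem_elems = h * N * L ** 2                            # attention maps

print(f"GCN ~ {gcn_macs / 1e6:.1f} M MACs, attention ~ {attn_macs / 1e6:.1f} M MACs per step")
print(f"attention memory ~ {attn_mem_elems:,} elements "
      f"~ {attn_mem_elems * 4 / 1e6:.2f} MB in FP32")
# ~184,320 attention-map elements (~0.74 MB FP32), consistent with the <1 MB figure above.
```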
Measured latency and resources. On an NVIDIA A100-40GB, the ST-GTNet runs at 0.98 ms/step (batch = 1) with a parameter count of 0.302 M. This is orders of magnitude below the decision cadence (rolling 30 min updates) and leaves ample headroom for what-if simulation or multi-terminal batched scoring [29].
Real-time terminal-area prediction has been emphasized by, e.g., MST-WA (Zeng et al., AEI 2024), which integrates multimodal weather and spatiotemporal dependencies [30]. Our architecture is complementary: by keeping attention temporal-only and spatial modeling GCN-based, we achieve lower asymptotic complexity and lower empirically measured latency while retaining interpretability through SHAP-guided features.
The effectiveness of the ST-GTNet is primarily due to its integration of spatiotemporal graph modeling and temporal attention mechanisms. These enable the model to capture nonlinear dependencies across input features while remaining sensitive to short-term variations. Moreover, the model balances accuracy and computational efficiency, making it suitable for real-time capacity forecasting in airport operations.
In summary, the quantitative results in Table 4, the rolling-inference procedure in Algorithm 3, and the visual alignment shown in Figure 5 confirm that the ST-GTNet significantly outperforms existing models in terms of accuracy, stability, and adaptability, making it a highly effective tool for dynamic airport capacity prediction across diverse operational scenarios.
Algorithm 3. Rolling Inference.
1  loop:
2    x_now ← FetchLatest()
3    x_now ← ForwardFillOne(x_now, max_gap = 2) → KNNImpute(k = 5)
4    x_now ← EncodeWindDir(x_now) → Winsorize(1%, 99%)
5    x_now ← Apply(scaler, x_now) → KeepFeatures(x_now, F_sel)
6    X_ctx ← Push(X_ctx, x_now, maxlen = 24)
7    if len(X_ctx) == 24:
8      H_ctx ← [GCN_stack(X_ctx[t], A) for t = 1..24]
9      Z_ctx ← Transformer_timeonly(H_ctx)
10     Ŷ_next ← DecoderMLP(Z_ctx)   # predicts next H = 6 steps
11     Emit(Ŷ_next); SleepUntilNext30min()
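A runnable Python rendering of this loop is sketched below. The fetch_latest() and emit() stubs and the random edge list stand in for the airport data interfaces and stand-adjacency graph (they are assumptions, not part of the published code), preprocessing is assumed to happen inside fetch_latest(), and STGTNetSketch is the architecture sketch given earlier.

```python
# Rolling-inference sketch: maintain a 24-step context and predict the next 6 steps
# every 30 minutes.
import time
from collections import deque
import torch

model = STGTNetSketch().eval()
edge_index = torch.randint(0, 80, (2, 320))   # placeholder stand-adjacency graph
ctx = deque(maxlen=24)                        # rolling 24-step (12 h) context window

def fetch_latest() -> torch.Tensor:
    # Stub: in deployment this returns the newest preprocessed 30 min feature slice.
    return torch.randn(80, 6)                 # (N stands, F selected features)

def emit(y: torch.Tensor) -> None:
    print("next-6-step capacity per stand:", y.shape)

while True:
    ctx.append(fetch_latest())
    if len(ctx) == 24:                        # full look-back window available
        window = torch.stack(list(ctx))       # (L, N, F)
        with torch.no_grad():
            y_next = model(window, edge_index)   # predicts next H = 6 half-hour steps
        emit(y_next)
    time.sleep(30 * 60)                       # wait for the next 30 min update
```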