Two kinds of disruption were simulated in the experiment: two-hour weather events from 08:00 to 10:00, and four-hour weather events from 08:00 to 12:00. These intervals were selected for their high traffic density. Observations from the Rapid Developing Thunderstorms satellite product, provided by the Nowcasting and Very Short Range Forecasting Satellite Application Facility, indicate that convective weather can occur at any time during the summer. The chosen time window therefore focuses on a critical operational period in which demand frequently exceeds capacity, creating a relevant scenario for assessing the impact of disruptions. Six instances of the targeted weather events are considered.
Figure 6 shows the affected waypoints for each of these disruptions along with the computed normalised centrality sum, which indicates the scale of each disruption. In the two smallest instances, the affected waypoints lie entirely in the southeastern part of the network, between London and mainland Europe. However, once 20 or more of the most central nodes are disabled, waypoints north of London are also affected. By the last two instances, almost all waypoints in the southeastern part of the network are affected, representing around 44% of the total centrality of the network.
Unlike the targeted disruptions, the random disruptions are not concentrated in particular locations and instead affect many small parts of the network; thus, the normalised centrality sums for the random disruptions are significantly smaller than for any of the targeted ones. This indicates that, while the random disruptions affect a similar number of arcs, the affected arcs may not be in high-traffic areas.
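The scale measure used above can be illustrated with a minimal sketch. It assumes that the normalised centrality sum is the centrality of the disabled waypoints divided by the total centrality of the network; the function names and the toy centrality values are hypothetical and are not taken from the study.

```python
import random


def normalised_centrality_sum(centrality, disabled):
    """Share of the total network centrality held by the disabled waypoints."""
    total = sum(centrality.values())
    if total <= 0:
        raise ValueError("total centrality must be positive")
    return sum(centrality[w] for w in disabled) / total


def targeted_disruption(centrality, k):
    """Select the k most central waypoints (assumed targeting rule)."""
    ranked = sorted(centrality, key=centrality.get, reverse=True)
    return set(ranked[:k])


def random_disruption(centrality, k, seed=0):
    """Select k waypoints uniformly at random, reproducibly."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(centrality), k))


# Toy centrality values for five hypothetical waypoints (illustrative only).
centrality = {"A": 0.40, "B": 0.30, "C": 0.15, "D": 0.10, "E": 0.05}

targeted_scale = normalised_centrality_sum(
    centrality, targeted_disruption(centrality, 2)
)
random_scale = normalised_centrality_sum(
    centrality, random_disruption(centrality, 2)
)
print(f"targeted: {targeted_scale:.2f}, random: {random_scale:.2f}")
```

For the same number of disabled waypoints, the targeted selection maximises this sum by construction, so a random selection can never exceed it. This is consistent with the smaller values reported for the random disruptions.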
4.1. Two-Hour Disruptions
The results from the two-hour weather event disruptions are displayed in Table 1. The first column gives the label of each disruption. The first character of the label indicates whether the disruption is targeted, ‘t’, or random, ‘r’. The second character identifies the disruption instance as indicated in Figure 6 and Figure 7. The last character gives the duration in hours, here two. For example, disruption “t-d-2” refers to the targeted two-hour disruption in which the 30 most central nodes are disabled.
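The labelling convention can be made explicit with a small parser. This is only a sketch of the convention as described in the text; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisruptionLabel:
    targeted: bool       # True for 't', False for 'r'
    instance: str        # instance letter used in the figures
    duration_hours: int  # length of the weather event in hours


def parse_label(label: str) -> DisruptionLabel:
    """Split a label such as 't-d-2' into its three components."""
    kind, instance, hours = label.split("-")
    if kind not in ("t", "r"):
        raise ValueError(f"unknown disruption type: {kind!r}")
    return DisruptionLabel(kind == "t", instance, int(hours))


print(parse_label("t-d-2"))
```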
No flights are cancelled under the two-hour disruption events. This means that the model provides sufficient rerouting capabilities to manage a fairly large weather event, even in the areas with the highest traffic. To analyse the impact of the scale of the disruption on the ATFM-related KPIs, the relations are plotted in Figure 8.
The plots show that, as the size of the disruption increases, the numbers of rerouted, ground-held and late-arriving flights increase as well. The number of rerouted flights increases very steeply at first; at larger scales, however, the increase is marginal. By contrast, the number of delayed flights rises more steadily and does not level off as much, even at large disruption scales. This suggests that the model’s first priority is to reroute a flight and that the flight is delayed only if rerouting is not possible. Additionally, the numbers of flights experiencing departure and arrival delays are very similar, suggesting that ground holds, rather than air delays, dominate the total delay. This is made further evident by the cost breakdown, as departure delays account for over 95% of the total cost. Therefore, even though the number of rerouted flights is large, the rerouting is mostly insignificant in terms of the change in distance and flight duration.
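The argument that ground holds dominate can be illustrated by splitting each flight's arrival delay into a ground part and an airborne part. The sketch below assumes that the arrival delay equals the departure delay plus any delay accrued in the air; the function name and the toy delay values are hypothetical and are not results from the model.

```python
def ground_delay_share(flights):
    """Fraction of total delay minutes accrued on the ground.

    `flights` is a list of (departure_delay_min, arrival_delay_min) pairs.
    Airborne delay is taken as arrival delay minus departure delay,
    floored at zero (assumption).
    """
    ground = sum(dep for dep, _ in flights)
    airborne = sum(max(arr - dep, 0) for dep, arr in flights)
    total = ground + airborne
    return ground / total if total else 0.0


# Toy delays in minutes (illustrative only).
flights = [(30, 30), (45, 50), (0, 10), (60, 60)]
share = ground_delay_share(flights)
print(f"ground share of delay: {share:.0%}")
```

When departure and arrival delays nearly coincide for most flights, as reported above, this share approaches one.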
The largest cost increase is seen between disruptions “t-b-2” and “t-c-2”, which also marks the largest increase in disruption scale between successive disruptions. Based on Figure 6, “t-c-2” is the first instance where the disruption spreads to the north of the London area, hence impacting more flight routes.
Arrival punctuality is used to monitor the performance of the network and assess its resilience. Figure 9 compares the punctuality and average arrival delays of flights for the different disruptions. The punctuality performance is directly related to the scale of the disruption, with the trough of the punctuality curve deepening as the disruption becomes more significant. Interestingly, the time at which the performance reaches its low point and starts to recover is very consistent across all targeted disruptions. The rate of decline of the performance is also consistent once the disruption reaches a certain scale. Overall, even in the worst-case two-hour weather event, the punctuality stays above 80% and recovers to around 90% by the end of the day. For reference, from 1 October 2023 to 1 June 2024, the average daily arrival punctuality in the UK was around 70%. Therefore, despite the two-hour disruption to the highest-traffic areas, the network performs quite well in terms of arrival punctuality over the whole day of operations. The average arrival delay is also under 15 min for the whole day of operations. However, if only the delayed flights are considered, the average arrival delay is around 58 min with a standard deviation of 28 min. This is also consistent across all scales of targeted disruptions.
In contrast, for the random disruptions, the punctuality rate is barely affected and the average delay for the whole day of operations is very small. This can be attributed to several factors. A primary reason is the network’s inherent design, characterized by multiple interconnected nodes and flexible routing options. This interconnected structure facilitates rerouting and traffic redistribution when certain sectors face disruptions, effectively mitigating their impact. Additionally, the relatively short duration of these disruptions allows the network to maintain its capacity to absorb and adapt. Together, these factors contribute to the network’s ability to maintain efficiency despite disruptions.
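The punctuality and delay statistics discussed above can be computed from per-flight arrival delays, as in the following minimal sketch. It assumes that a flight counts as delayed when it arrives more than 15 min late; this threshold, the function name and the toy delays are assumptions made for illustration and are not taken from the study.

```python
from statistics import mean, pstdev

PUNCTUALITY_THRESHOLD_MIN = 15  # assumed threshold, not stated in the text


def arrival_stats(arrival_delays, threshold=PUNCTUALITY_THRESHOLD_MIN):
    """Punctuality rate and delay statistics from arrival delays in minutes."""
    delayed = [d for d in arrival_delays if d > threshold]
    return {
        "punctuality": 1 - len(delayed) / len(arrival_delays),
        "mean_delay_all": mean(arrival_delays),
        # Statistics restricted to delayed flights; None when none are delayed.
        "mean_delay_delayed": mean(delayed) if delayed else None,
        "std_delay_delayed": pstdev(delayed) if delayed else None,
    }


# Toy arrival delays in minutes (illustrative only).
stats = arrival_stats([0, 0, 5, 10, 20, 40, 60, 0, 0, 0])
print(stats)
```

The two averages answer different questions: the whole-day mean is diluted by the many punctual flights, whereas the mean over delayed flights describes how severe a delay is once it occurs.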
The punctuality curves can be used to compute the GRI as described in Section 3. The obtained values are plotted with a curve of best fit in Figure 10. The SWD is plotted in the same manner in Figure 11. These plots show that there is a strong negative relationship between the GRI and the scale of the disruption. Conversely, there is a strong positive relationship between the SWD and the scale of the disruption.
Based on these metrics, the model can help the system recover from random disruptions, but targeted disruptions have a much greater impact on system performance. At large disruption scales, there is no significant further change in GRI. This indicates that once the scale of the disruption exceeds a certain threshold, the remaining airspace is too limited to accommodate a large volume of traffic. Because of this inherent constraint, the model’s ability to support recovery within the disrupted airspace is limited.
4.2. Four-Hour Disruptions
The results from the four-hour weather event disruptions are displayed in Table 2.
The main difference from the two-hour disruption is that, even at the smallest scale of disruption and even for random disruptions, the model can only maintain operations by cancelling flights. The results show that all flight cancellations occur within the first two hours of the four-hour storm. For the largest disruption, “t-f-4”, 195 of the 604 flights between 08:00 and 10:00 (about 32%) are cancelled. This is a very large portion of all flights; hence, it can be concluded that the network is very susceptible to extended weather events in the high-traffic areas. Under the model’s constraints, flights have a maximum ground delay of two hours, after which they are cancelled. This could explain why no cancellations occur for the two-hour disruption and why all the cancellations occur in the first two hours of the four-hour event. During the shorter disruption, many flights are able to essentially wait out the storm and depart once capacities have returned to normal. With the extended weather event, however, these flights would exceed their maximum ground-holding time and are cancelled instead.
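The effect of the two-hour ground-hold limit can be illustrated with a deliberately simplified sketch. It considers only the worst case of a flight that cannot be rerouted and must wait on the ground until the weather event ends; the actual model is an optimisation that also reroutes flights and respects capacities, which this sketch ignores. The function name and the outcome labels are hypothetical.

```python
MAX_GROUND_HOLD_MIN = 120  # maximum ground delay stated in the text


def outcome_if_held(dep_min, start_min, end_min, max_hold=MAX_GROUND_HOLD_MIN):
    """Outcome for a flight that can only wait out the event on the ground.

    All times are minutes after midnight. A flight departing outside the
    event window is unaffected; otherwise it must hold until the event ends.
    """
    if not (start_min <= dep_min < end_min):
        return "unaffected"
    required_hold = end_min - dep_min
    return "cancelled" if required_hold > max_hold else "held"


START = 8 * 60  # 08:00
for end_hour in (10, 12):
    end = end_hour * 60
    outcomes = {dep: outcome_if_held(dep, START, end) for dep in range(START, end, 30)}
    cancelled = [f"{d // 60:02d}:{d % 60:02d}" for d, o in outcomes.items() if o == "cancelled"]
    print(f"event 08:00-{end_hour}:00, cancelled departures: {cancelled}")
```

Under these assumptions, no departure in the 08:00 to 10:00 event needs to hold for more than 120 min, whereas in the 08:00 to 12:00 event every such flight scheduled before 10:00 would. This matches the observation that all cancellations fall within the first two hours of the four-hour event.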
The punctuality rates and the average arrival delays are plotted in Figure 12.
The observed patterns are similar to those of the two-hour event. The punctuality rates reach their low point around 80 min after the beginning of the disruption, just as with the two-hour disruption. Again, the performance is closely correlated with the scale of the disruption. The punctuality rates observed are slightly higher than in the two-hour event; however, this is because cancelled flights are excluded from the calculations. Therefore, in terms of punctuality and delay, the network performance is similar to that under the shorter disruption. In fact, the average delay of delayed flights is lower than for the two-hour disruptions. The average arrival delay is 56 min across all disruption scales, which is 6 min less than observed in the two-hour disruptions. The standard deviation is similar, at 31 min. However, for the random disruptions, the average arrival delay of delayed flights is 45 min, which is 20 min higher than observed during the two-hour disruptions.
Overall, as with the two-hour event, the random disruptions have a significantly lower impact than the targeted disruptions. The relationship between the scale of the disruption and the GRI, as well as the SWD, is plotted in Figure 13 and Figure 14. Once again, there is a strong correlation between the scale of the disruption and the performance indicators. However, for the GRI, the difference between the random and targeted disruptions is not as large as in the two-hour disruption results. Nonetheless, the aforementioned phase transition is still visible. Note in Figure 13 that as the disruption scale increases, GRI generally exhibits a declining trend, but there are instances where it rises. This increase in GRI at higher scale values is primarily due to the nature of random disruptions. Unlike targeted disruptions, which focus on high-traffic areas or critical nodes, random disruptions are dispersed throughout the network. Consequently, although a comparable number of arcs is impacted at a given disruption scale, these arcs are not necessarily associated with high-traffic or strategically critical areas. This dispersion can lead to an increase in GRI even as the disruption scale rises.
In terms of total cost, the four-hour weather events are significantly more disruptive to the system than the two-hour events. Certain four-hour random disruptions can even be more detrimental than targeted two-hour disruptions. The results indicate that, for a two-hour disruption targeting high-traffic areas, the model can effectively help the system gradually return to normal operations. Even at the maximum disruption scale, the model can optimize the schedule without any flight cancellations. For weather events lasting four hours, however, the model’s optimization capacity decreases significantly, leading to a substantial increase in the number of flight cancellations and the associated costs. Flight cancellations may occur even in areas that are not high-traffic zones.