### *3.1. Traffic and Trajectory Analysis*

In the first step of the evaluation, the traffic data were analysed to determine an appropriate research horizon and to measure the lengths of the flown trajectories. Figures 7–9 present in detail the results of the validation trials and the effects described above.

**Figure 7.** Flight trajectory results obtained under the simulation conditions (green) and compared with real traffic reference values for flows of 40 aircraft per hour (blue) and 20 aircraft per hour (orange).

In Figure 7, the vertical axis shows the average distance flown by arriving aircraft within a 100 NM radius of the EDDM reference point. The horizontal axis shows the validation trials executed by the five ATCOs (C1–C5) for the two traffic scenarios, which differ in the distribution of 3D-FMS and 4D-FMS flights; the labels 30 and 60 correspond to the S30 and S60 scenarios, respectively. The simulation results are marked as a green line. They can be compared directly with real traffic data for a flow of 40 aircraft per hour (R1, marked in blue) and for a flow of 20 aircraft per hour (R2, marked in orange). Both reference distances were calculated as averages over 10 h of arrival traffic at EDDM extracted from the OpenSky database.
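The distance metric described above can be illustrated with a minimal sketch. It assumes that each trajectory is available as an ordered list of (latitude, longitude) points in degrees and that segment lengths are approximated by great-circle distances; the function names, the Earth-radius constant and the rule for handling the 100 NM boundary are illustrative assumptions, not the processing chain used in the study.

```python
import math

EARTH_RADIUS_NM = 3440.065  # approximate mean Earth radius in nautical miles


def haversine_nm(p, q):
    """Great-circle distance in NM between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(math.sqrt(a))


def flown_distance_within(track, ref, radius_nm=100.0):
    """Sum the lengths of segments whose endpoints both lie within radius_nm of ref.

    Assumption: a segment crossing the boundary is dropped entirely
    rather than clipped at the circle.
    """
    total = 0.0
    for a, b in zip(track, track[1:]):
        if haversine_nm(a, ref) <= radius_nm and haversine_nm(b, ref) <= radius_nm:
            total += haversine_nm(a, b)
    return total


def mean_flown_distance(tracks, ref, radius_nm=100.0):
    """Average flown distance over several trajectories."""
    if not tracks:
        raise ValueError("at least one trajectory is required")
    distances = [flown_distance_within(t, ref, radius_nm) for t in tracks]
    return sum(distances) / len(distances)
```

Because boundary-crossing segments are discarded in this sketch, the result slightly underestimates the flown distance for sparsely sampled tracks; clipping the crossing segment at the circle would be a natural refinement.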

Taking these values as a reference, it can be observed that, even with a smaller total number of arrivals in R1 and R2, the introduction of the innovative airspace structure, the new FMS procedures and the ATCO support systems reduced the flight distance for all ATCOs and all scenarios. In each case, the simulation results were lower than both reference values.

Figure 8 presents the distances flown by aircraft as a cumulative occurrence curve divided into 5 NM intervals, where again the green line refers to the simulation results and the blue and orange lines show the data of the reference scenarios. In the simulation scenarios, the number of flights covering shorter distances was slightly higher than in the reference scenarios. This is particularly evident in the first two peaks in Figure 8, which are substantially higher: together they represent well over half of the scheduled flights (68.3% in total) that reached the airport after 100–115 NM, in contrast to 36.9% in the orange reference (R2). In addition, the real traffic data show that a significant number of flights (corresponding to 25% of occurrences) needed a distance of 125 NM to reach the airport.

**Figure 8.** Cumulative occurrence curve for flight-distance results obtained in validation trials (green) and real traffic reference values for flows of 40 aircraft per hour (blue) and 20 aircraft per hour (orange).
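The binning behind an occurrence curve of this kind can be sketched as follows. The sketch assumes a flat list of flown distances in NM and bins that start at multiples of 5 NM; the function names and the choice of bin edges are illustrative assumptions, since the exact binning used for Figure 8 is not stated.

```python
from collections import Counter


def occurrence_shares(distances_nm, bin_width=5.0):
    """Return {bin lower edge: percentage of flights} for fixed-width bins."""
    if not distances_nm:
        raise ValueError("at least one distance is required")
    counts = Counter(int(d // bin_width) * bin_width for d in distances_nm)
    n = len(distances_nm)
    return {edge: 100.0 * c / n for edge, c in sorted(counts.items())}


def cumulative_shares(shares):
    """Accumulate the per-bin percentages in order of increasing distance."""
    running = 0.0
    cumulative = {}
    for edge, pct in shares.items():
        running += pct
        cumulative[edge] = running
    return cumulative
```

Summing the per-bin shares over a distance range (for example the bins starting at 100, 105 and 110 NM) gives the kind of combined percentage quoted in the text for the 100–115 NM range.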

The last set of results concerns the number of approach operations performed, shown in Figure 9. The figure compares the numbers of approaches executed in the two simulation scenarios, which differ in the distribution of 3D-FMS and 4D-FMS operations. The blue bars represent the S30 scenario, and the orange bars the S60 scenario. For each ATCO, more landed aircraft were recorded in the S60 scenario than in the S30 scenario.

### *3.2. Mental Workload*

Alongside the capacity analysis, a mental workload analysis was conducted. Figure 10 shows the mean ISA ratings as a function of the share of 4D-FMS-equipped aircraft (30% vs. 60% vs. 80%). The results were averaged over all participants and assessment times.

**Figure 10.** Mean ISA ratings in the different scenarios (S30 vs. S60 vs. S80) summarised over all participants and assessment times. Error bars represent standard deviations.

Mean ISA ratings were highest in the S30 scenario, followed by the S60 scenario and then the S80 scenario. For both the S30 and S60 scenarios, mean ISA ratings fell between 2 (relaxed) and 3 (comfortable), indicating a slightly lower than mid-level mental workload. For the S80 scenario, mean ISA ratings were below 2 (relaxed), pointing towards mental underload. It is worth noting that the S80 scenario was slightly shorter than the other runs, hence its smaller sample size.
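The aggregation described above, in which ratings are pooled over all participants and assessment times within a scenario, can be sketched as below. The sketch assumes the sample standard deviation (n − 1 denominator); whether the error bars in Figure 10 use the sample or the population form is not stated, and the function name is hypothetical.

```python
from statistics import mean, stdev


def summarise_isa(ratings_by_scenario):
    """Return {scenario: (mean, sample SD)} for pooled ISA ratings.

    Each value in ratings_by_scenario is a flat list of ratings pooled
    over participants and assessment times (at least two ratings).
    """
    return {
        scenario: (mean(ratings), stdev(ratings))
        for scenario, ratings in ratings_by_scenario.items()
    }
```

A shorter run, such as S80, contributes fewer ratings to its list, which is why its summary rests on a smaller sample.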

In addition, the NASA-TLX results were assessed. Figure 11 shows the mean raw NASA-TLX scores for scenarios S30 and S60. The mean global score and all mean sub-scores were higher in the S30 scenario than in the S60 scenario, indicating, on a descriptive level, a higher overall workload in S30. This is in line with the ISA ratings. Standard deviations were especially high for the frustration and performance sub-scales.

**Figure 11.** Mean raw TLX scores for the scenarios (S30 vs. S60 vs. S80). Error bars represent standard deviations.
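For readers unfamiliar with the raw variant of the NASA-TLX, the global score is the unweighted mean of the six sub-scale ratings, without the pairwise weighting step of the original procedure. The sketch below illustrates this for one questionnaire; the dictionary keys and the function name are illustrative assumptions, and the rating scale is left to the caller.

```python
from statistics import mean

# The six NASA-TLX sub-scales; the key spellings are an assumption of this sketch.
TLX_SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings):
    """Global raw TLX score: unweighted mean of the six sub-scale ratings."""
    missing = [s for s in TLX_SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing sub-scales: {missing}")
    return mean(ratings[s] for s in TLX_SUBSCALES)
```

Averaging these per-questionnaire scores over participants within a scenario would then give scenario-level means of the kind shown in Figure 11.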
