#### *3.3. Air Traffic Database*

The last step in determining the input variables is to add, to the time variables, the traffic variables that allow the air traffic to be evaluated. To do this, the first step is to structure the traffic according to its main traffic flows using the CRIDA methodology.

Once the traffic has been structured into flows, matrices will be filled in with information on this traffic, estimated before the operation. These matrices are presented in Figure 3.



**Figure 3.** Air Traffic arrays.

In matrices such as the one on the left, the following information shall be included:


With all this information analysed, the input information of the model is complete. The full format of the model is therefore presented in Table 2, together with an example use case. In this example the variable values are random: it does not represent a real case, but it serves to illustrate the format of the data used to generate the machine learning model. The values of the variables will vary depending on the actual operating scenario:

**Table 2.** Format of machine learning model.


With Flow1, ..., FlowN being the identifiers of the flows within the sector, where N is the total number of flows. Each flow shall be identified with 0 when there are no aircraft of that flow within the sector and with 1 when there are. Similarly, 0 shall identify when the sector is not regulated and 1 when it is regulated.
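The record format described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the field names (`hour`, `Flow1`, ..., `regulated`) and the sample values are assumptions chosen only to show the binary encoding.

```python
# Minimal sketch of one training record in the Table 2 format.
# All field names and values are illustrative, not taken from the paper.

def make_record(hour, flow_presence, regulated):
    """Build one sample: a time variable, binary flow indicators, binary label."""
    record = {"hour": hour}
    # 1 if any aircraft of flow i is inside the sector, 0 otherwise.
    for i, present in enumerate(flow_presence, start=1):
        record[f"Flow{i}"] = 1 if present else 0
    # Label: 1 = sector regulated, 0 = not regulated.
    record["regulated"] = 1 if regulated else 0
    return record

sample = make_record(hour=14, flow_presence=[True, False, True], regulated=True)
print(sample)  # {'hour': 14, 'Flow1': 1, 'Flow2': 0, 'Flow3': 1, 'regulated': 1}
```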

#### *3.4. Machine Learning Model Evaluation*

With the machine learning model in place, part of the methodology is to determine which methods will best evaluate the performance of the model. Evaluating a machine learning model is a fundamental step. In machine learning applications in ATM, it is as important to correctly evaluate the model as it is to emphasise its explainability [30]. The methods for evaluating and validating the model, based on a binary classification, are presented below.

The first method of evaluating the model will be Accuracy. This is the most important criterion for determining the performance of a classification algorithm, as it shows the percentage of proper classifications over the total set of the experimental record [31]. The Accuracy formula is:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
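Equation (1) can be sketched directly from the four confusion-matrix counts; the count values below are illustrative only.

```python
# A minimal sketch of Equation (1): Accuracy from the four
# confusion-matrix counts (illustrative values).

def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified cases among all cases."""
    return (tp + tn) / (tp + tn + fp + fn)

print(accuracy(tp=40, tn=45, fp=10, fn=5))  # 0.85
```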

In Table 3, the meaning of *TP*, *TN*, *FP*, and *FN* is explained:

**Table 3.** Accuracy description.


In this case, T and F stand for "True" and "False". This classification is used in the literature to indicate which elements of the training set are correctly (True) or incorrectly (False) evaluated by the machine learning model. The terms "Negative" and "Positive" refer to whether the sector is not regulated (Negative) or regulated (Positive).

This indicator will give a picture of the overall performance of the application. Based on applications of a similar nature, an Accuracy threshold of 0.85 out of 1 is set [32]. Since Accuracy is defined as correctly classified cases among total cases, its maximum is 1 (all elements correctly classified). In the literature studied, a machine learning model is considered good at classifying when it classifies 85% of the cases correctly, which corresponds to an Accuracy of 0.85 [32].

This limit is independent of the training of the model: the model is trained and tested separately, and its Accuracy is then compared with the defined threshold. If the Accuracy of the model is greater than 0.85, the model is considered valid; if it is less than 0.85, the model is not valid and cannot be used in real applications.
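The acceptance rule above amounts to a simple threshold check; a minimal sketch, where the function name and constant are ours but the 0.85 value comes from the text:

```python
# Illustrative sketch of the validation rule from the text:
# a model is accepted only if its test Accuracy exceeds 0.85.

ACCURACY_THRESHOLD = 0.85

def model_is_valid(test_accuracy):
    """Return True when the test-set Accuracy clears the threshold."""
    return test_accuracy > ACCURACY_THRESHOLD

print(model_is_valid(0.91))  # True
print(model_is_valid(0.80))  # False
```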

To evaluate the two labels independently, the Recall, Precision, and F1-score parameters for each of the classes are added to the analysis [33] (0 when the sector is not regulated and 1 when it is regulated). Recall indicates the ability of the algorithm to detect all the cases of a class, i.e., to accurately detect when the sector will be regulated or not. Precision indicates the proportion of the cases assigned to a class that actually belong to it. The F1-score is the harmonic mean of Recall and Precision [31]. The formulae are [34]:

$$Recall = \frac{TP}{TP + FN} \tag{2}$$

$$Precision = \frac{TP}{TP + FP} \tag{3}$$

$$F1\text{-}score = \frac{2 \cdot Recall \cdot Precision}{Precision + Recall} \tag{4}$$
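Equations (2)-(4) can likewise be sketched from the confusion-matrix counts; the count values are illustrative.

```python
# Minimal sketch of Equations (2)-(4) from confusion-matrix counts.

def recall(tp, fn):
    """Share of actual positives that the model detects (Eq. 2)."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Share of predicted positives that are actually positive (Eq. 3)."""
    return tp / (tp + fp)

def f1_score(tp, fp, fn):
    """Harmonic mean of Recall and Precision (Eq. 4)."""
    r, p = recall(tp, fn), precision(tp, fp)
    return 2 * r * p / (r + p)

print(recall(tp=40, fn=10))                   # 0.8
print(precision(tp=40, fp=10))                # 0.8
print(round(f1_score(tp=40, fp=10, fn=10), 6))  # 0.8
```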

In addition, to represent this information in a visual and summarised form, the confusion matrix is presented [35].
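For the binary labels used here (0 = not regulated, 1 = regulated), the confusion matrix is a 2x2 table of actual versus predicted classes. A minimal sketch, with illustrative label lists:

```python
# Sketch of a 2x2 confusion matrix for the binary regulation labels.
# Rows index the actual class, columns the predicted class.

def confusion_matrix(y_true, y_pred):
    """Count (actual, predicted) label pairs for labels in {0, 1}."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix

y_true = [0, 0, 1, 1, 1, 0]  # illustrative ground truth
y_pred = [0, 1, 1, 1, 0, 0]  # illustrative predictions
print(confusion_matrix(y_true, y_pred))  # [[2, 1], [1, 2]]
```

The diagonal holds the correct classifications (TN at position [0][0], TP at [1][1]); the off-diagonal cells hold FP ([0][1]) and FN ([1][0]).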

Moreover, emphasis will be placed on the explainability of the model. This analysis will provide insight into the learning and prediction process of the developed model. In this case, the analysis of explainability was carried out with graphs made in the Shapley Additive exPlanations (SHAP) library. This library is used for the explainability of machine learning models and its use is widespread in the industry [36,37].
