*4.2. Machine Learning Model Evaluation*

Now that the behaviour of both traffic and regulations in LECMPAU in 2019 has been studied, the machine learning model that attempts to predict when the sector will be regulated is evaluated. For this model, the full dataset has been divided into 80% for training and 20% for testing. This ratio is commonly used in machine learning studies across different academic fields [38]. First, the Accuracy of the model is reported as an overall measure of its performance.

$$Accuracy = 0.868\tag{5}$$
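The 80/20 division of the dataset described above can be illustrated with a minimal sketch. The text does not state whether the split was random, stratified or chronological, so the random shuffle used here is an assumption, and the function name `split_80_20` and the toy records are hypothetical.

```python
import random

def split_80_20(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and split them into training and test subsets."""
    rng = random.Random(seed)            # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical records (identifier, regulated flag); not the data used in this study.
records = [(i, i % 10 == 0) for i in range(1000)]
train, test = split_80_20(records)
print(len(train), len(test))
```

With a highly unbalanced sample, a stratified split that preserves the class ratio in both subsets may be preferable to a plain shuffle.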

The Accuracy is above 0.85, so the model appears to meet the established threshold. However, to evaluate the model in more detail, the Confusion Matrix is presented in Figure 7 and the indicators derived from it are given in Table 4. The confusion matrix is a visual indicator that shows the number of cases in which the model predicts that the sector is regulated (1) or not regulated (0) and compares them with the actual labels. It is complementary to the indicators defined in Section 3.4 and presents the same information visually.
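To make the relation between the confusion matrix and its derived indicators concrete, the following sketch counts the four cells for a binary problem and computes accuracy, precision, recall and F1 for the regulated class. It assumes that the indicators of Section 3.4 are these standard ones; the function names and the ten toy labels are hypothetical and are not results of this study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 = regulated."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def indicators(tn, fp, fn, tp):
    """Accuracy plus precision, recall and F1 for the positive (regulated) class."""
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels for illustration only.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)   # (6, 1, 2, 1)
report = indicators(*counts)
print(counts, report)
```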

**Figure 7.** Confusion matrix of unbalanced machine learning model.



These indicators show that the model only predicts well when the sector is not regulated. Because the sample is so unbalanced and the sector is unregulated most of the time, the model usually predicts that the sector will not be regulated. In doing so, it is correct most of the time, which yields an Accuracy above the minimum. However, the model is biased by the imbalance of the sample, and the indicators for the regulated class are well below what is considered acceptable.
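The effect described above, where a high Accuracy hides poor performance on the minority class, can be reproduced with a trivial classifier. The sketch below uses a hypothetical test set with 10% regulated cases (a ratio chosen only for illustration, not the ratio of this study) and a model that never predicts a regulation.

```python
# Hypothetical unbalanced test set: 10 regulated cases (1) and 90 unregulated cases (0).
y_true = [1] * 10 + [0] * 90
# Trivial classifier that always predicts "not regulated".
y_pred = [0] * len(y_true)

accuracy = sum(a == p for a, p in zip(y_true, y_pred)) / len(y_true)

tp = sum(a == 1 and p == 1 for a, p in zip(y_true, y_pred))
fn = sum(a == 1 and p == 0 for a, p in zip(y_true, y_pred))
recall_regulated = tp / (tp + fn)

# Accuracy is 0.9 although no regulated case is detected (recall 0.0).
print(accuracy, recall_regulated)
```

This is why per-class indicators, and not only Accuracy, are needed to judge a model trained on an unbalanced sample.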

As the sample is highly unbalanced, it has been balanced with the Synthetic Minority Oversampling TEchnique (SMOTE), which generates additional minority class samples [39]. This technique creates samples of the minority class based on the behaviour of neighbouring elements. In particular, the creators of the method state: "synthetic samples are generated in the following way: Take the difference between the sample under consideration and its nearest neighbor. Multiply this difference by a random number between 0 and 1 and add it to the feature vector under consideration" [40].
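The interpolation step quoted above can be sketched in a few lines. This is a simplified illustration and not the implementation used in this study: the function names, the choice of k, the random selection of the base sample and the toy minority points are all assumptions.

```python
import random

def nearest_neighbours(sample, others, k):
    """Return the k points in `others` closest to `sample` (squared Euclidean distance)."""
    def sq_dist(point):
        return sum((a - b) ** 2 for a, b in zip(sample, point))
    return sorted(others, key=sq_dist)[:k]

def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating towards neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        sample = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbour = rng.choice(nearest_neighbours(sample, others, k))
        gap = rng.random()  # random number in [0, 1)
        # sample + gap * (neighbour - sample), applied feature by feature
        synthetic.append(tuple(s + gap * (n - s) for s, n in zip(sample, neighbour)))
    return synthetic

# Hypothetical minority class points with two features each.
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_sketch(minority, n_new=6, k=2)
print(new_points)
```

Each synthetic point lies on the segment between an existing minority sample and one of its neighbours, so the new samples stay within the region already occupied by the minority class.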

With this sample balancing, the Accuracy of the model is:

$$Accuracy = 0.917\tag{6}$$

Accuracy has increased from 0.868 to 0.917, that is, by about 5 percentage points (roughly 6% in relative terms). At first sight, this suggests a better model. However, it is necessary to check that the model performs correctly when predicting each of the classes. For this purpose, Figure 8 shows the Confusion Matrix and Table 5 the classification report.

These indicators are all above 0.9, exceeding the 0.85 threshold, so the model performs very well both in predicting that the sector will be regulated and in predicting that it will not be regulated.

**Figure 8.** Confusion matrix of balanced machine learning model.

**Table 5.** Classification report of balanced machine learning model.


The balanced machine learning model seems to have found behavioural patterns in seasonality or air traffic and correctly predicts when the sector will be regulated. Given these results, the model can be considered validated for the case of the LECMPAU sector.
