Peer-Review Record

Study of the Impact of Traffic Flows on the ATC Actions

Aerospace 2022, 9(8), 467; https://doi.org/10.3390/aerospace9080467
by Guillermo Gutiérrez Teuler 1,*, Rosa María Arnaldo Valdés 1, Victor Fernando Gómez Comendador 1, Patricia María López de Frutos 2 and Rubén Rodríguez Rodríguez 2
Submission received: 7 July 2022 / Revised: 10 August 2022 / Accepted: 18 August 2022 / Published: 22 August 2022
(This article belongs to the Special Issue Advances in Air Traffic and Airspace Control and Management)

Round 1

Reviewer 1 Report

The aim of this paper is to predict the controller’s actions in order to estimate their workload. It has certain research significance and application value.

(1) In Section 2.3, the authors have mentioned that “Two algorithms have been used in this work: Random Forest and XGBoost. Both of which are decision trees algorithms. There are other interesting machine learning algorithms which have not been used in this work, such us neural networks [18] and support vector machine (SVM) [19]. These algorithms have been used successfully in the ATM field [20,21].” As far as I know, SVM and NN should work better than the random forest method. Why is there no comparison?

(2) In the abstract, the authors also mentioned that “Several machine learning models are tested and two are chosen.” If you have tested several methods, I recommend that the experimental results of the other methods be analyzed and discussed as well.

Author Response

First, I would like to thank you for your comments. I am happy to know that you liked our paper and think that it has research significance. I will now respond to your comments.

The changes made based on your comments are coloured in blue in the new version of the paper. They have been pasted here too.

1. You are right. It was not clear enough, so a comparison has been added explaining why we did not use those algorithms in our work.

"Neural networks are considered deep learning algorithms. Deep Learning is a type of machine learning whose algorithms are inspired by the human brain, mimicking the way biological neurons signal each other. They have not been chosen in this work because they usually decimate the interpretability of the features to the point where they become meaningless. Instead, we wanted to focus on explainability, as it is more relevant for practical use in the ATM field."

"SVM are well suited for the classification of complex datasets but because of the tabular nature of our data, Random Forest was way more accessible for creating the ML models. Early exploratory data analysis showed that the data was not very sparse and easy to classify. The best approach was not clear, and SVM were discarded."

2. We did not express ourselves correctly. We were talking about different trained models combining several features and algorithms. The algorithms used are random forest and XGBoost, which are explained in Section 2.3 of Materials and Methods.

The phrase has been reworded as follows: "Several machine learning models are tested to try different combinations of features and the selected algorithms and two models are finally chosen."

Reviewer 2 Report

Overall

An interesting paper in an important research area. I believe that it does add value to the field; however, some additional consideration and justification of some of the arguments and scientific basis is required. Improvement of the grammar and presentation of the paper is also needed prior to it being acceptable for publication in my opinion. Please see my general and specific notes below for your consideration. Thank you for the interesting read and good luck!

General comments:

I would recommend the use of gender-neutral pronouns throughout, e.g. they/them/their rather than he/him/his.

English grammar generally requires some work (see specific examples below), particularly regarding tenses and plurals. Tenses example, line 225: to see what patterns the machine has learned (past) and how it calculates the predictions (present).

Try to avoid the use of ambiguous text, and always be specific. For example, line 66: “A model is built”, by who? And when? Why is this relevant?

Ensure acronyms are expanded when they are first used during the paper.

Generally, I find the introduction confusing. It’s not clear what is previous work and what are the aims of the current paper. It’s also not particularly clear how you have learnt from that previous work to propose this novel approach.

I think Section 2 provides a good overview of the available models for someone who is not an expert in ML. Although it’s not always clear why some approaches have been chosen over others.

In Section 3, I strongly approve of the use of exploratory data analysis before applying ML models.

It’s not clear if you’ve used the same set of data to train the model and then validate its accuracy. Best practice when training ML algorithms is to split the data into training and testing data sets; can you please confirm if that’s the approach you took?

Generally, the figures need improving as they are not very clear in some cases.

Section 4: Very well written. I still think there is a missing link between the number of actions and the complexity of each action.

Specific comments:

Line 2 “estimate controller workload”

Line 2 “So far, works of research have…”

Line 3 “We have enough data to be…”

Line 14: the complexity and number of controllers’ tasks is what drives workload, not just complexity.

Line 19: It’s not clear what “the objective” refers to. The previous study in 3?

Line 26: More accurate should be defined. Accurate in what sense?

Line 27: It’s not clear why you have chosen to focus on performance measures if other measures are better

Line 40: Sentence starts “First an investigation…” but I don’t see a list of points.

Line 44: should it say, “machine learning has been used”?

Line 45: not sure what “it” refers to. The previous reference?

Line 50: some context regarding the planner / executive / controller teams would be useful for the general reader.

Line 51: Duc-Thinh Pham <- reference required

Line 68: SHAP (also used in this work) and LIME <- are these acronyms? They should be spelt out on first read.

Line 72: don’t capitalise ‘The’

Line 76/77: Predicting actions is not a metric itself. You could say something like predicting actions to estimate future workload would be a useful metric to ATM.

Line 89: “from four sectors” will mean nothing to readers who don’t have an ATC background.

Section 2.1.1: Flows of traffic which are used very infrequently (or traffic that is not following a defined flow) can cause the most complexity and increase workload considerably due to their irregularity, so please consider this factor.

Section 2.1.3: This data is recorded manually? I’m quite surprised that the system doesn’t record this.

Section 2.3: Why Random Forest/XGBoost and not some of the other available models?

Line 234: what is API?

Line 280: should be “at the same flight level”

Section 3.1.1. I don’t think the trends that you describe in the text are obvious from Fig 1, all sectors appear to have both entry and exit between FL0 and FL500. You could show a distribution of the change in FL, e.g. entry FL – exit FL to show the average ascending vs descending traffic (I now see you’ve done this for Fig 5.). There will be other ways too so please consider this.

Section 3.1.2. What do you consider to be an ‘action’? Is this a single instruction (ATC clearance) given to an aircraft, e.g. climb to FL300? Sometimes ATCOs will instruct an aircraft to follow two actions at the same time, e.g. climb to FL300 and follow heading 270. Would that be one or two actions? Ultimately, some actions are more complex than others.

Figure 2: the colours on the bar chart seem wrong. On the 3 and 4 event bars I cannot see the blue outline. Is cruise the top or bottom portion of the bar? Shouldn’t your Y axis be ‘number of aircraft’ rather than probability, so the reader can infer the total number of aircraft given 1, or 2, or 3+ clearances?

Section 3.1.3. Do you need this section? I think if you add more information to Section 3.1.1. then you don’t need this short paragraph.

Figure 3: again, I don’t think this is the most appropriate type of chart to show the trends. I believe a density function or histogram would represent the data better. It is unlikely that you would see aircraft cruising below FL300, but I see a number of outliers in the diagram – are you able to explain these?

Line 318: what do your codes mean?

Line 323: why do you need new variables?

Line 328: Accuracy in what sense?

Figure 5: really hard to interpret. Why not split into 4 graphs like Figure 1?

Section 3.2.2.: How did you decide flight times > 1500 s are outliers? Is this based only on visual observation? You could look at the size of the sector also. Why would some flights be up to 3,500? Are these training flights that circle within the sector for an extended period of time or are they erroneous data points?

Line 350/351: I’m not sure from Section 3.1.3. what you mean by ‘trends’. Also, how have you defined ‘flows’?

Line 362: What is the error relative to? Did the model correctly predict the number of actions for 58% of flights? Or was the average error 42%? Average error is a useful indicator, but so is the standard deviation / variance of the error (assuming a normal distribution) which I think is important to report.

Model 3: I don’t understand why removing some flights would increase the error, unless your model is better at predicting low complexity flights than high complexity ones? All controller actions require workload, but some actions require more than others. Can you weight your events so that different levels of workload are accounted for?

Line 381: what impact does replacing the RandomForest algorithms with their XGBoost equivalent have on the relative error?

Line 404: I understand that an individual flow is not important, but surely the number of flows, and especially the interactions between those flows would drive complexity and consequently workload.

Line 407: Absolutely, the longer a flight stays in a sector, the more actions it may receive. However, I would argue the number of actions per unit time is more important on workload. If one flight requires 4 actions in 10 minutes, and another 4 actions in 1 minute, the second flight has a higher impact on workload.

Section 3.4.2.: You refer to ‘allocating’ resources, but sectors are assigned a controller team. Specifically, if you know a sector will be high workload you would either, split the sector into more sections (i.e. sector splitting), or you might choose to rotate through more controllers so that they can have longer rest periods.

Figure 12: Should be in English for the reader to understand.

Figure 12: Your model predicts low load much better than it predicts high load, this is expected, but note that the high load flights are those that are more important to predict as these are the flights that increase complexity and workload.

Author Response

First, we would like to thank you for your comments and for your in-depth review, because we think it has improved the quality of this paper considerably. It has helped us improve the English grammar and style and add more context for readers who do not have an ATC background. Thank you for your time.

Some figures and tables were left out because we were worried about the paper being too long. But they were probably necessary for better explaining some parts. We are happy to include them in this new version, and we hope they help to better understand the paper.

I am happy to know that you liked our paper and that you believe that it does add value to the field. I will now respond to your comments.

The changes made based on your comments are coloured in green in the new version of the paper. Some have been pasted here too.

General comments:

I would recommend the use of gender-neutral pronouns throughout, e.g. they/them/their rather than he/him/his.

A general review of the whole paper was made. I only found one "his", in the abstract, and it has been replaced by "their". I am sorry, I was not familiar with this topic.

English grammar generally requires some work (see specific examples below), particularly regarding tenses and plurals. Tenses example, line 225: to see what patterns the machine has learned (past) and how it calculates the predictions (present).

English grammar has been improved throughout the text. This particular error has been corrected too.

Try to avoid the use of ambiguous text, and always be specific. For example, line 66: “A model is built”, by who? And when? Why is this relevant?

Duly noted. Reworded to try to reduce the ambiguity of the text.

Ensure acronyms are expanded when they are first used during the paper.

Duly noted. Several acronym expansions have been added.

Generally, I find the introduction confusing. It’s not clear what is previous work and what are the aims of the current paper. It’s also not particularly clear how you have learnt from that previous work to propose this novel approach.

Some parts of the introduction have been modified trying to achieve this goal. Two subsections have been added (introduction now is split in two: Background and related work / Focus and structure of the document) to attempt to separate previous work vs aims of the current paper.

I think Section 2 provides a good overview of the available models for someone who is not an expert in ML. Although it’s not always clear why some approaches have been chosen over others.

New explanations have been added (in blue). Another reviewer raised this point too.

In Section 3, I strongly approve of the use of exploratory data analysis before applying ML models.

Thanks. We think this is crucial too.

It’s not clear if you’ve used the same set of data to train the model and then validate its accuracy. Best practice when training ML algorithms is to split the data into training and testing data sets; can you please confirm if that’s the approach you took?

Yes, we took this approach. We randomly split the data into two sets: 80% train, 20% test. This information has been added to Section 3.3, Machine learning models.
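
As an illustration of the split described in this response, the following is a minimal sketch of a random 80/20 partition using only the Python standard library. The function name `split_train_test`, the fixed seed, and the toy flight identifiers are assumptions made for the example; this record does not state which tool the authors used.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Randomly partition rows into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical flight identifiers standing in for the real records.
flights = [f"flight_{i}" for i in range(100)]
train, test = split_train_test(flights)
```

Shuffling before slicing keeps the two sets disjoint, so no flight used for training is also used to measure the error.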

Generally, the figures need improving as they are not very clear in some cases.

Some figures have been changed, or they have been explained better in the text.

Section 4: Very well written. I still think there is a missing link between the number of actions and the complexity of each action.

Thanks. It is one of the most challenging parts we had to address, and we are thinking of better approaches for future work. 

Specific comments:

Line 2 “estimate controller workload”

Correction added.

Line 2 “So far, works of research have…”

Correction added.

Line 3 “We have enough data to be…”

Correction added.

Line 14: the complexity and number of controllers’ tasks is what drives workload, not just complexity.

Correction added.

Line 19: It’s not clear what “the objective” refers to. The previous study in 3?

Yes, it refers to the new mental models that are being developed in the ATM field. Clarification added.

Line 26: More accurate should be defined. Accurate in what sense?

The advantages lead to more accurate reports. Rephrased for better clarity.

Line 27: It’s not clear why you have chosen to focus on performance measures if other measures are better

We have worked on performance measures because of the nature of our data. We had operational data from Spain's ANSP. Added for more clarity.

Line 40: Sentence starts “First an investigation…” but I don’t see a list of points.

Wrong use of a connector. Corrected.

Line 44: should it say, “machine learning has been used”?

Typo, corrected.

Line 45: not sure what “it” refers to. The previous reference?

Typo, the previous line was modified in a draft, and that phrase is no longer needed.

Line 50: some context regarding the planner / executive / controller teams would be useful for the general reader.

Context added: "Controllers usually work in pairs, the main responsibility of the planner controller is to coordinate with the neighbouring sectors while the executive controller is responsible for issuing clearances and instructions to aircraft that are in their sector."

Line 51: Duc-Thinh Pham <- reference required

Reworded; "the authors" is used instead.

Line 68: SHAP (also used in this work) and LIME <- are these acronyms? They should be spelt out on first read.

Acronyms added.

Line 72: don’t capitalise ‘The’

Typo, corrected.

Line 76/77: Predicting actions is not a metric itself. You could say something like predicting actions to estimate future workload would be a useful metric to ATM.

Correction added.

Line 89: “from four sectors” will mean nothing to readers who don’t have an ATC background.

Context added: "Countries' airspaces are divided in sectors and assigned to controllers in order to break down the provision of air traffic services into tasks with a manageable workload."

Section 2.1.1: Flows of traffic which are used very infrequently (or traffic that is not following a defined flow) can cause the most complexity and increase workload considerably due to their irregularity, so please consider this factor.

This factor was partially taken into account in "Improving the quality of the data, especially the geometrical part" (Section 4.2, Future lines of research). But it should appear in Section 2.1.1 as you pointed out, so we added it there.

Section 2.1.3: This data is recorded manually? I’m quite surprised that the system doesn’t record this.

Yes. The actions taken by controllers can be complex, so the best way to capture the maximum information is by supervising the controllers. Not all actions could be extracted from computer data. That is what makes this data valuable. A new table has been added with all the actions that ENAIRE's program can record (Table 1. IDs of each possible ATCo action).

Section 2.3: Why Random Forest/XGBoost and not some of the other available models?

Another reviewer made this comment as well. We only described the models that were finally used in this work. We added more context on why we did not use SVM and neural networks:

"Neural networks are considered deep learning algorithms. Deep Learning is a type of machine learning whose algorithms are inspired by the human brain, mimicking the way biological neurons signal each other. They have not been chosen in this work because they usually decimate the interpretability of the features to the point where they become meaningless. Instead, we wanted to focus on explainability, as it is more relevant for practical use in the ATM field."

"SVM are well suited for the classification of complex datasets but because of the tabular nature of our data, Random Forest was way more accessible for creating the ML models. Early exploratory data analysis showed that the data was not very sparse and easy to classify. The best approach was not clear, and SVM were discarded."

Line 234: what is API?

Acronym added.

Line 280: should be “at the same flight level”

Correction added.

Section 3.1.1. I don’t think the trends that you describe in the text are obvious from Fig 1, all sectors appear to have both entry and exit between FL0 and FL500. You could show a distribution of the change in FL, e.g. entry FL – exit FL to show the average ascending vs descending traffic (I now see you’ve done this for Fig 5.). There will be other ways too so please consider this.

We added a reference to Figure 5 and explained that those trends become clearer when we study the new variable we created. We hope this will reduce confusion. Furthermore, we cannot move the figure up because in 3.1 we are discussing the findings of the Exploratory data analysis, and this analysis was made before the new variables were created. With the info obtained in the data analysis, we thought about creating new variables. We want to maintain the order of the sections because we think it helps to show the reader the process followed during the research. 
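
A minimal sketch of the kind of derived variable discussed here, following the reviewer's "entry FL – exit FL" suggestion. The function name `flight_trend`, the `tolerance` parameter, and the three labels are assumptions made for illustration; the authors' actual variable is defined in the paper, not in this record.

```python
def flight_trend(entry_fl, exit_fl, tolerance=0):
    """Label a flight by its change in flight level across a sector."""
    delta = entry_fl - exit_fl  # positive: the flight left lower than it entered
    if delta > tolerance:
        return "descending"
    if delta < -tolerance:
        return "ascending"
    return "cruise"             # entry and exit levels match within the tolerance
```

A non-zero `tolerance` would treat small level changes as cruise; whether that is appropriate depends on the data and is left open here.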

Section 3.1.2. What do you consider to be an ‘action’? Is this a single instruction (ATC clearance) given to an aircraft, e.g. climb to FL300? Sometimes ATCOs will instruct an aircraft to follow two actions at the same time, e.g. climb to FL300 and follow heading 270. Would that be one or two actions? Ultimately, some actions are more complex than others.

Clarification added: "There are different possible actions with variable complexity, but events are just the number of total actions taken. Each action contributes the same to the sum, despite having different complexity."

Figure 2: the colours on the bar chart seem wrong. On the 3 and 4 event bars I cannot see the blue outline. Is cruise the top or bottom portion of the bar? Shouldn’t your Y axis be ‘number of aircraft’ rather than probability, so the reader can infer the total number of aircraft given 1, or 2, or 3+ clearances?

Yes, in bars 3 and 4 the cruise flights are represented by the blue line at the top. This is because there are two different distributions, one on top of the other (blue and orange).

In this figure, we want to show each distribution, so we show percentages. We thought it was more interesting to put the two distributions on the same figure rather than using two separate figures.

Section 3.1.3. Do you need this section? I think if you add more information to Section 3.1.1. then you don’t need this short paragraph.

We agree. Section 3.1.3 was too short, and it makes perfect sense to add the information to Section 3.1.1.

Figure 3: again, I don’t think this is the most appropriate type of chart to show the trends. I believe a density function or histogram would represent the data better. It is unlikely that you would see aircraft cruising below FL300, but I see a number of outliers in the diagram – are you able to explain these?

We included this figure because we think it gives a nice view of the data. We chose this type of chart because we thought it was intuitive to see the altitude represented in the Y-axis.

The outliers are general aviation flights. There are a few of them, and we decided it was interesting to keep them to see how the models try to predict these flights too.

Line 318: what do your codes mean?

See new added table: Table 1. IDs of each possible ATCo action. Reference to the table added.

Line 323: why do you need new variables?

"The aim was to obtain new variables that would make it easier for artificial intelligence to interpret the data to obtain more accurate models"

Feature engineering is an important part of data science. Context added: "The objective of feature engineering is simplifying and speeding up data transformations while also enhancing model accuracy. It achieves that by creating new variables that are not in the training set."

Line 328: Accuracy in what sense?

Clarification added.

Figure 5: really hard to interpret. Why not split into 4 graphs like Figure 1?

We wanted to be able to compare the different sectors more directly. We tried splitting the graphs, but they all looked very similar, and the differences were harder to notice.

Section 3.2.2.: How did you decide flight times > 1500 s are outliers? Is this based only on visual observation? You could look at the size of the sector also. Why would some flights be up to 3,500? Are these training flights that circle within the sector for an extended period of time or are they erroneous data points?

All the information was extracted from a boxplot of the variable. This boxplot has been included in the paper. Some flights, lasting around 80000 seconds, were considered errors. We deleted all flights > 3600 s. Those < 3600 s were considered feasible.
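
The filtering described in this response can be sketched as follows. The fixed 3600 s cut-off comes from the response; the function names, the treatment of a flight lasting exactly 3600 s, and the 1.5 × IQR whisker rule are assumptions (the latter is a common boxplot convention and may differ from the one behind the authors' plot).

```python
from statistics import quantiles


def upper_whisker_limit(values, k=1.5):
    """Upper boxplot fence under the common convention Q3 + k * IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def drop_long_flights(times_s, max_time_s=3600):
    """Keep flights whose time in the sector does not exceed max_time_s.

    Keeping a flight of exactly max_time_s is an assumption; the
    response only says that flights > 3600 s were deleted.
    """
    return [t for t in times_s if t <= max_time_s]
```

Comparing the fence with the fixed cut-off shows whether the chosen threshold is stricter or looser than the visual outlier rule.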

Line 350/351: I’m not sure from Section 3.1.3. what you mean by ‘trends’. Also, how have you defined ‘flows’?

Context added for more clarity.

Line 362: What is the error relative to? Did the model correctly predict the number of actions for 58% of flights? Or was the average error 42%? Average error is a useful indicator, but so is the standard deviation / variance of the error (assuming a normal distribution) which I think is important to report.

By relative error, we mean MAPE (mean absolute percentage error). See Table 2 (Regression model results) as an example.
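
For readers unfamiliar with the metric, here is a minimal sketch of MAPE together with the spread of the per-flight errors that the reviewer asks about. The function names are hypothetical, and skipping flights with zero recorded actions (where the percentage error is undefined) is an assumption, not something stated by the authors.

```python
from statistics import mean, pstdev


def absolute_percentage_errors(actual, predicted):
    """Per-flight |actual - predicted| / |actual|, expressed in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Flights with zero actual actions are skipped: the ratio is undefined.
    return [abs(a - p) / abs(a) * 100.0
            for a, p in zip(actual, predicted) if a != 0]


def mape(actual, predicted):
    """Mean absolute percentage error (raises if no flight has actual != 0)."""
    return mean(absolute_percentage_errors(actual, predicted))


def ape_spread(actual, predicted):
    """Population standard deviation of the per-flight percentage errors."""
    return pstdev(absolute_percentage_errors(actual, predicted))
```

Reporting `ape_spread` next to `mape` distinguishes a model that is moderately wrong on every flight from one that is accurate on most flights and badly wrong on a few.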

Model 3: I don’t understand why removing some flights would increase the error, unless your model is better at predicting low complexity flights than high complexity ones? All controller actions require workload, but some actions require more than others. Can you weight your events so that different levels of workload are accounted for?

You are correct. As you can see in Figure 13 (Confusion matrices for each sector), our models are better at predicting low complexity flights than high complexity ones. To address the complexity problem, we designed the classification approach. We considered two options: regression (events) with complexity weighting versus classification (low workload vs high workload flights). We finally decided to take the classification approach, and we think the model obtained is an improvement.

Line 381: what impact does replacing the RandomForest algorithms with their XGBoost equivalent have on the relative error?

Clarification added: "The performance (relative error and accuracy) of the models improved, but not in any significant way." We changed the algorithm because of the ease of training and tuning the models and the saving of computational time.

Line 404: I understand that an individual flow is not important, but surely the number of flows, and especially the interactions between those flows would drive complexity and consequently workload.

Here, importance refers to the total impact that a single variable (one flow) has on all the outputs of the dataset. As there are many flows, their importance is diluted. One flow has a few entries where its value is 1 (the flight is in that flow), and for the rest of the entries it is 0. When it is 0, it usually does not affect the output, so the overall impact is small. But when the value is 1, the output is greatly affected, so the flows are very influential and "important" for the model.
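
The 0/1 flow variables described in this response behave like a one-hot encoding, which the following minimal sketch illustrates. The function names and the toy flow labels are assumptions; the paper's actual encoding may differ in detail.

```python
def one_hot_flows(flow_ids):
    """Return (columns, rows): one 0/1 column per distinct flow."""
    columns = sorted(set(flow_ids))
    rows = [[1 if flow == c else 0 for c in columns] for flow in flow_ids]
    return columns, rows


def share_of_ones(rows, column_index):
    """Fraction of flights for which the given flow column equals 1."""
    return sum(row[column_index] for row in rows) / len(rows)


# Hypothetical flow labels for five flights.
columns, rows = one_hot_flows(["A", "B", "A", "C", "D"])
```

With many flows, each column is 1 for only a small share of flights, which is why a dataset-wide importance score for any single flow stays low even when that flow strongly shifts the prediction for its own flights.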

Line 407: Absolutely, the longer a flight stays in a sector, the more actions it may receive. However, I would argue the number of actions per unit time is more important on workload. If one flight requires 4 actions in 10 minutes, and another 4 actions in 1 minute, the second flight has a higher impact on workload.

The metric you are proposing is interesting and will be taken into account in future work. What we found interesting about that figure is how the slope changes depending on the flight trend. It is an interesting result that ascending or descending flights are less critical than cruise flights when they spend less time in the sector, and more critical when they spend a lot of time.
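
The reviewer's proposed metric, actions per unit time, can be sketched as below using the reviewer's own example of 4 actions in 10 minutes versus 4 actions in 1 minute. The function name and the choice of minutes as the unit are assumptions.

```python
def actions_per_minute(n_actions, time_in_sector_s):
    """Number of controller actions per minute spent in the sector."""
    if time_in_sector_s <= 0:
        raise ValueError("time_in_sector_s must be positive")
    return n_actions / (time_in_sector_s / 60.0)


slow = actions_per_minute(4, 600)  # 4 actions over 10 minutes
fast = actions_per_minute(4, 60)   # 4 actions over 1 minute
```

Both flights have the same action count, but the second has ten times the rate, which is the distinction the reviewer argues matters for workload.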

Section 3.4.2.: You refer to ‘allocating’ resources, but sectors are assigned a controller team. Specifically, if you know a sector will be high workload you would either, split the sector into more sections (i.e. sector splitting), or you might choose to rotate through more controllers so that they can have longer rest periods.

Now we see that the general reader needs additional information. An explanation has been added. 

Figure 12: Should be in English for the reader to understand.

We are sorry, we missed updating this figure. The graphs have now been updated.

Figure 12: Your model predicts low load much better than it predicts high load, this is expected, but note that the high load flights are those that are more important to predict as these are the flights that increase complexity and workload.

We agree. That is why it is important that the model seems to improve in sector 4, where there are more critical flights (high load).

Round 2

Reviewer 1 Report

NO

Reviewer 2 Report

Dear authors,

Many thanks for your thorough consideration of my comments. I'm very happy with your responses and believe your paper is a valuable contribution to the field of ATM research.

Best of luck for your future research!
