Article
Peer-Review Record

A Methodology for Predicting Ground Delay Program Incidence through Machine Learning

by Xiangning Dong 1,2,*, Xuhao Zhu 1,2, Minghua Hu 1,2 and Jie Bao 1,2
Sustainability 2023, 15(8), 6883; https://doi.org/10.3390/su15086883
Submission received: 27 March 2023 / Revised: 17 April 2023 / Accepted: 18 April 2023 / Published: 19 April 2023

Round 1

Reviewer 1 Report

REVIEW COMMENTS

sustainability-2338266

A Methodology for Predicting Ground Delay Program Incidence through Machine Learning

The authors have done meticulous research on GDP and machine learning; however, the following suggestions are made to refine the work.

· In the Abstract, the motivation for the present work is not stated clearly.

· Page 2, lines 74-75: “Wang et al. [9] studied the impact of dynamic airport ground weather on GDP and used T-WITI-FA and Air Traffic Data to model GDP prediction.” What does T-WITI-FA stand for?

· Page 2, line 97: “We comprehensively compare the predictive abilities of…” Pronouns could be avoided.

· In Figure 2, check the font style and size.

· Figure 3, “Nanjing Lukou International Airport layout”: the letters are not legible.

· In Table 1, “weather and traffic variables description”: please check the font style of the first letter.

· In Table 2, the classification of “the type of precipitation” has not been expanded properly.

· Line 247, “Gini coefficient”: please include an appropriate reference.

· Page 6, line 211, “AUC value”: the authors could expand this abbreviation.

· Page 14, line 463, “contributing factors”: could you name a few factors?

 

 

Comments for author File: Comments.pdf

Author Response

Dear reviewer,

 

The authors would like to thank you for your feedback on the paper. In this document, we explain how we dealt with the comments. For clarity, we have separated individual comments and put them in boxes (and in italics). Our responses to these comments can be found under each box. All modifications in the manuscript have been highlighted in yellow.


Kind regards,


The authors

Author Response File: Author Response.pdf

Reviewer 2 Report

This study investigates the application of machine learning techniques in predicting the incidence of effective ground delay programs (GDPs) during adverse weather conditions or airport capacity issues. The methodology combines local weather and flight operation data with the ATM Airport Performance algorithm to estimate the probability of GDPs and improve the accuracy of departure flight delay predictions during GDPs.

Generally, the paper addresses a topic of relevance and general interest to the journal's readers and holds practical significance. However, it has several critical flaws in both methodology and results. For instance, the inclusion of departure delay time prediction in the study appears to be an afterthought, lacking proper attention to detail. The authors used a very small database for their machine learning regression algorithms and did not provide any information about the hyperparameters of the decision tree regressor used for predicting departure delay time. Additionally, it is unclear whether the authors evaluated the models on an unseen or test dataset, making it difficult to assess the accuracy of the results. The results presented in Table 7 clearly demonstrate that the developed models fall significantly short of acceptable accuracy levels. Without following a proper approach and achieving acceptable accuracy, presenting the model serves little purpose.

A summary of other critical flaws in the methodology and the discussion of results is given below.

Methodology: The methodology used in the study requires further discussion to improve its accuracy and credibility. Specifically, the study lacks background discussion on the regression models and discussion on the hyperparameters of the model. The assumption of some hyperparameters rather than optimizing their values indicates a lack of proper optimization of the models. It is crucial to follow a proper methodology and optimize all hyperparameters of the models to ensure the study's results have strong credibility. Additionally, the study lacks details on the database, including the number of databases used, and does not analyze the distribution of each input feature and the correlation between each input feature. Moreover, it is recommended to use K-fold cross-validation during hyperparameter optimization to overcome the problem of overfitting, which can be clearly observed in Table 6.

Results and discussion: The discussion of the results is inadequate, and Table 6 shows significant overfitting in almost all models. This overfitting may be due to the suboptimal optimization of the models' hyperparameters. To address this issue, the authors should consider using a cross-validation technique during hyperparameter optimization to increase the model's generalization ability and prevent overfitting. Incorporating cross-validation into the optimization process would improve the accuracy and reliability of the model, making the study's results more robust. Moreover, the details of the database, including the number of databases used in the study, are missing. Also, the distribution of each input feature in the database and the correlation between each input feature are not analyzed.

Other major comments are listed below.

1. Lines 63-65: Please, cite the reference by “Grabbe et al.”.

2. Please, proofread the entire paper and ensure that all abbreviations are defined before they are used (e.g., SVM in Line 70, T-WITI-FA in Line 75, WITI in Line 84, R2 in Line 22, AUC in Line 211).

3. Correct “LSAAO” to “LASSO” in Line 137.

4. The study employs SVM and two ensemble models (random forest and xgBoost) to predict GDP events using ATMAP scores, airport flight data, and local weather, while simple regression models (ridge regression, LASSO, and decision tree) are used to predict GDP flight delay. The authors' model selection is unclear, and they should perform a preliminary study, investigating models from the simplest to the most complex for both cases, to ensure they choose the most appropriate and effective models for the study.

5. Figure 3 is difficult to comprehend, and it seems to have been copied from another source. If that is the case, to avoid any potential copyright issues, I recommend regenerating the figure.

6. Section 3.3 lacks clarity on the reasoning behind the selection of the three classification algorithms. To provide a more thorough evaluation, it is advisable for the authors to compare the ensemble methods (random forest and xgBoost) with the performance of the base learner (decision tree classifier).

7. The statement in Line 219 is misleading, as there is only one type of SVM classifier but multiple types of kernel functions. Please make the necessary correction.

8. Please, enhance the comprehensiveness of the literature review on the application of the machine learning models used in this study, particularly the ensemble models (random forest and xgBoost) and regression models (ridge regression, LASSO, decision tree), by referring to relevant recent studies such as doi.org/10.3390/su15021718 and doi.org/10.3390/su13020926.

9. Line 230: The selection of the optimal type of kernel function should be determined through hyperparameter optimization rather than being assumed without proper consideration. This will help to produce a more accurate and effective model.

10. The paper lacks a discussion on the regression models (ridge regression, LASSO, and decision tree regressor), which hinders a full understanding and evaluation of the models. Please, provide a brief background on these models.

11. In Tables 4 and 5, the authors should provide a comprehensive discussion of the hyperparameters of the models to improve the transparency and reproducibility of the study. It is unclear why the authors optimized only the hyperparameters listed in Table 4 and did not consider other important hyperparameters. For instance, xgBoost has several hyperparameters to be optimized, such as the maximum depth of the tree, which was not addressed in the study. Additionally, the kernel type of the SVM should be optimized rather than assumed to ensure the best possible performance of the model.

12. Table 4: Why does the number of estimators (n_estimators) have two ranges ([20,200] and [10,200])? Please, correct.

13. Please, correct the typo in Line 364.

14. To enhance the reliability of the study's results, the authors should address the significant overfitting observed in Table 6, which may have resulted from suboptimal hyperparameter optimization. Please, address this issue by referring to Section 3.6, Figure 4, and the related discussion in the following study: doi.org/10.1016/j.jclepro.2022.134203.

15. In Lines 383-401 and Figure 7, it is unclear how the feature importance was determined. To provide a more accurate and comprehensive understanding of the importance of each input feature and enhance the quality of the discussion, I suggest using the SHapley Additive exPlanations (SHAP) technique, which can interpret black-box ML models like xgBoost and rank input features globally and locally.

16. The discussion of the results and model evaluation in Section 4.2.2 is inadequate. The authors must thoroughly evaluate and compare the performance of the models based on all performance measures and provide a comprehensive discussion and evaluation of the models on both the train and test datasets. In Table 6, the authors should include other performance measures, such as Cohen's Kappa and AUC, to enhance the study's credibility. Furthermore, the authors should provide a confusion matrix, a standard tool for classification models, to improve the interpretation of the models' performance. Please, refer to Figure 9 and the related discussion in doi.org/10.1016/j.jclepro.2022.134203 to gain insights into how to effectively evaluate and present the models' results, providing a more robust and compelling discussion.

17. Line 414: Please, define R, RMSE, and MAE, including their mathematical expressions.

18. In Section 4.3, it is unclear what values the authors used for the hyperparameters of the decision tree regressor and how they optimized these values. Additionally, it is unclear why the authors chose to use a decision tree on such a small database for regression (Figure 8 and Figure 9).

19. The inclusion of departure delay time prediction in the study appears to be an afterthought, lacking proper attention to detail. The authors used a very small database for their machine learning regression algorithms and did not provide any information about the hyperparameters of the decision tree regressor used for predicting departure delay time. Additionally, it is unclear whether the authors evaluated the models on an unseen or test dataset, making it difficult to assess the accuracy of the results. The results presented in Table 7 clearly demonstrate that the developed models fall significantly short of acceptable accuracy levels. Without following a proper approach and achieving acceptable accuracy, presenting the model serves little purpose. This section requires significant revision, and the authors should provide a more detailed and rigorous analysis to improve the study's credibility.

20. The conclusions should be revised considering the above comments.

21. Please, discuss the practical implementation of the results of this study.

Author Response

Dear reviewer,

 

The authors would like to thank you for your feedback on the paper. In this document, we explain how we dealt with the comments. For clarity, we have separated individual comments and put them in boxes (and in italics). Our responses to these comments can be found under each box. All modifications in the manuscript have been highlighted in yellow.


Kind regards,


The authors

 

Reviewer 2

The paper addresses a topic of relevance and general interest to the journal's readers and holds practical significance. However, it has several critical flaws in both methodology and results. For instance, the inclusion of departure delay time prediction in the study appears to be an afterthought, lacking proper attention to detail. The authors used a very small database for their machine learning regression algorithms and did not provide any information about the hyperparameters of the decision tree regressor used for predicting departure delay time. Additionally, it is unclear whether the authors evaluated the models on an unseen or test dataset, making it difficult to assess the accuracy of the results. The results presented in Table 7 clearly demonstrate that the developed models fall significantly short of acceptable accuracy levels. Without following a proper approach and achieving acceptable accuracy, presenting the model serves little purpose.

Response: Thank you for taking the time to review our paper and providing valuable feedback. We appreciate your positive feedback regarding the relevance and practical significance of our topic.

We agree with your comments on the critical flaws in our methodology and results, and we apologize for any confusion or lack of detail in our presentation of the study. We plan to revise the paper to provide more comprehensive explanations and details on this aspect.

With regard to the size of the database, we acknowledge that it is relatively small. This is because this part of the GDP data is derived from ATC logs, which are hand-copied records by controllers. However, this six-month data is still representative, as it includes both spring/winter and summer/fall seasons, and the weather characteristics are more pronounced.

We also agree that we should provide more information on the hyperparameters of the decision tree regressor used to predict departure delay times. Because of length constraints, and because this part of the study focuses on the effect of ATMAP scores on the performance of regression models, a few representative regression models were chosen. We have open-sourced the code for the models on GitHub [https://github.com/helloAristole/paper-].

We apologize for not clearly stating whether we evaluated the models on an unseen or test dataset, and we confirm that we performed model evaluation on a test dataset. We understand the importance of this step in assessing the accuracy of the results, and we will make sure to clarify this in the revised version of our paper.

We agree with your comment that the developed models do not achieve an acceptable level of accuracy, and we understand the implications of presenting models that fall short of this standard. Predicting delays generated by ground delay programs is a relatively complex problem: such delays are more random and not as cyclical as typical flight delays. For the regression models in the paper, prediction errors of up to 15 minutes are considered acceptable. In our recent research, we are also experimenting with neural networks for prediction, and that work is still ongoing.

Once again, thank you for your valuable feedback, and we will work hard to address the issues raised in our revised manuscript.

The methodology used in the study requires further discussion to improve its accuracy and credibility. Specifically, the study lacks background discussion on the regression models and discussion on the hyperparameters of the model. The assumption of some hyperparameters rather than optimizing their values indicates a lack of proper optimization of the models. It is crucial to follow a proper methodology and optimize all hyperparameters of the models to ensure the study's results have strong credibility. Additionally, the study lacks details on the database, including the number of databases used, and does not analyze the distribution of each input feature and the correlation between each input feature. Moreover, it is recommended to use K-fold cross-validation during hyperparameter optimization to overcome the problem of overfitting, which can be clearly observed in Table 6.

Response: Thank you for your insightful comments on the methods used in our study. We recognize the importance of properly optimizing the hyperparameters to ensure the accuracy and credibility of our results. For reasons of space, the revised article does not add a discussion of the regression models and their hyperparameters; however, we did carry out this work, and it can be viewed in the code we have provided.

Regarding the database, we apologize for not providing enough information. The study used a single database of flight data, which is a limitation of the study. We have added a description of the weather data in Figure 4 in the methodology section to make this clear. Regarding the correlation between the input features, we have added Figure 5 to show it.

Finally, we agree that K-fold cross-validation is a useful technique to overcome overfitting; in fact, we used 5-fold cross-validation in the paper. Thank you for your valuable feedback, which has helped us to improve the methodology and credibility of the study.
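A minimal sketch of how 5-fold cross-validation can sit inside the hyperparameter search, assuming scikit-learn; the estimator, grid, and data names are illustrative placeholders, not the paper's actual configuration:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Each candidate configuration is scored as the mean over 5 validation
# folds, which guards against selecting hyperparameters that overfit a
# single train/test split.
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, 10]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                # 5-fold cross-validation, as used in the paper
    scoring="roc_auc",
)
# search.fit(X_train, y_train)
# print(search.best_params_, search.best_score_)
```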

 

The discussion of the results is inadequate, and Table 6 shows significant overfitting in almost all models. This overfitting may be due to the suboptimal optimization of the models' hyperparameters. To address this issue, the authors should consider using a cross-validation technique during hyperparameter optimization to increase the model's generalization ability and prevent overfitting. Incorporating cross-validation into the optimization process would improve the accuracy and reliability of the model, making the study's results more robust. Moreover, the details of the database including the number of database used in the study is missing. Also, the distribution of each input feature in the database and the correlation between each input feature are not analyzed.

Response: Thank you for your valuable feedback. We recognize your concerns regarding the optimization of hyperparameters and the overfitting observed in Table 6. We agree that cross-validation techniques can help improve the generalization of the model and prevent overfitting. We have incorporated cross-validation into the hyperparameter optimization process to improve the accuracy and reliability of the models.

Regarding the lack of database details, we apologize for this oversight and will provide a more detailed description of the databases used in the study, including the number of databases used, in the revised manuscript. We also agree that an analysis of the distribution of each input feature and the correlation between each input feature could provide valuable insights and will include this analysis in our revised manuscript.

We thank you again for your constructive comments and we will thoroughly address these issues in our revised manuscript to improve the credibility and validity of the study.

 

Lines 63-65: Please, cite the reference by “Grabbe et al.”.

Response: Thanks for your suggestion. We have moved the reference to Grabbe et al. to lines 65-66.

 

Please, proofread the entire paper and ensure that all abbreviations are defined before using them (e.g., SVM in Line 70, T-WITI-FA in Line 75, WITI in Line 84, R2 in Line 22, AUC in Line 211).

Response: Thanks for your suggestion. We apologize for any confusion caused by the lack of abbreviation definitions in our manuscript. We will ensure that all abbreviations used in our paper are defined before their first use to make it easier for readers to understand the content. Additionally, we will carefully proofread the entire manuscript to correct any errors or typos and improve its overall clarity.

 

Correct “LSAAO” to “LASSO” in Line 137.

Response: Thanks for your suggestion. We appreciate you bringing the error to our attention. We apologize for the typo in line 137 and acknowledge that the correct term is "LASSO," not "LSAAO."

 

The study employs SVM and two ensemble models (random forest and xgBoost) to predict GDP events using ATMAP scores, airport flight data, and local weather, while simple regression models (ridge regression, LASSO, and decision tree) are used to predict GDP flight delay. The authors' model selection is unclear, and they should perform a preliminary study, investigating models from the simplest to the most complex for both cases, to ensure they choose the most appropriate and effective models for the study.

Response: Thanks for your suggestion. As the contents of Table 2 are taken from reference [22], “Algorithm to Describe Weather Conditions at European Airports”, it is not possible to expand on them. The precipitation types in Table 2 are derived from METAR messages and represent different levels of precipitation.

 

Figure 3 is difficult to comprehend, and it seems to have been copied from another source. If that is the case, to avoid any potential copyright issues, I recommend regenerating the figure.

Response: Thanks for the suggestion. Figure 3 is a plan of Nanjing Lukou International Airport, showing the runway data we chose for this study. We have replaced the image with a much clearer one. This is publicly available information that can be accessed at [https://aip.chinaflier.com/#/].

 

Section 3.3 lacks clarity on the reasoning behind the selection of the three classification algorithms. To provide a more thorough evaluation, it is advisable for the authors to compare the ensemble methods (random forest and xgBoost) with the performance of the base learner (decision tree classifier).

Response: Thanks for your suggestion. We will provide a detailed explanation in Section 3.3 of the reasoning behind the selection of the classification algorithms. This will enhance the clarity of our work and give readers a better understanding of the models used in our study. Thank you again for your valuable feedback. The added text reads: “SVM denotes the conventional classification model, whereas Random Forest and XGBoost are ensemble learning models. Ensemble learning is a general term for combining multiple learning methods in machine learning; its benefit is that integrating multiple weakly supervised models can construct a strongly supervised model, thereby improving the predictive precision of the model.”
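For illustration, a hedged sketch of the suggested comparison between the base learner and the two ensemble classifiers, assuming scikit-learn and the xgboost package; `X` and `y` stand in for the study's features and GDP labels:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Base learner vs. the two ensembles built from (many) such trees.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5),
    "random_forest": RandomForestClassifier(n_estimators=100),
    "xgboost": XGBClassifier(n_estimators=100, eval_metric="logloss"),
}
# for name, model in models.items():
#     scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
#     print(name, scores.mean())
```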

 

The statement in Line 219 is misleading as there is only one type of SVM classifier, but multiple types of kernel functions. Please make the necessary correction.

Response: Thanks for your suggestion. We appreciate your comments on our manuscript and agree that the statement in Line 219 was misleading. We have made the necessary changes and removed the misleading text. The revised passage on line 263 now reads: “The four prevalent kernel varieties encompass the linear kernel, polynomial (poly) kernel, hyperbolic tangent (sigmoid) kernel, and radial basis function (RBF) kernel.”

 

Please, enhance the comprehensiveness of the literature review on the application of machine learning models used in this study, particularly ensemble models (random forest and xgBoost) and regression models (ridge regression, LASSO, decision tree) by referring to relevant recent studies such as doi.org/10.3390/su15021718 and doi.org/10.3390/su13020926.

Response: Thanks for your suggestion. We have carefully reviewed the suggested papers (doi.org/10.3390/su15021718 and doi.org/10.3390/su13020926) and have included them in our revised manuscript as references [30] and [32], respectively. We will also review the relevant literature to identify other recent studies that could improve the comprehensiveness of our literature review.

 

Line 230: The selection of the optimal type of kernel function should be determined through hyperparameter optimization rather than being assumed without proper consideration. This will help to produce a more accurate and effective model.

Response: Thanks for your suggestion. We agree with your comment regarding the selection of the optimal type of kernel function for the SVM classifier. As you have rightly pointed out, we should not assume the optimal type of kernel function without proper consideration, and instead, we should determine it through hyperparameter optimization. We will revise our manuscript to clarify this point and explain that we will perform hyperparameter optimization to select the optimal type of kernel function for our SVM classifier. By doing so, we aim to produce a more accurate and effective model that can better predict GDP events.
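A minimal sketch of treating the kernel type as one more hyperparameter in the search rather than fixing it in advance, assuming scikit-learn; the grid values are illustrative assumptions:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "kernel": ["linear", "poly", "sigmoid", "rbf"],  # the four kernels discussed
    "C": [0.1, 1, 10],
    "gamma": ["scale", "auto"],   # ignored by the linear kernel
}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="roc_auc")
# search.fit(X_train, y_train)
# print(search.best_params_["kernel"])
```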

 

The paper lacks a discussion on the regression models (ridge regression, LASSO, and decision tree regressor), which hinders a full understanding and evaluation of the models. Please, provide a brief background on these models.

Response: Thanks for your suggestion. We appreciate your feedback and agree that a brief background on the regression models used in the study would be helpful for readers; we have added this background in Section 3.4.

 

In Tables 4 and 5, the authors should provide a comprehensive discussion on the hyperparameters of the models to improve the transparency and reproducibility of the study. It is unclear why the authors optimized only the hyperparameters listed in Table 4 and did not consider other important hyperparameters. For instance, xgBoost has several hyperparameters to be optimized, such as the maximum depth of the tree, which was not addressed in the study. Additionally, the kernel type of SVM should be optimized rather than being assumed to ensure the best possible performance of the model.

Response: Thanks for your suggestion. We agree that providing a comprehensive discussion of the models' hyperparameters would improve the transparency and reproducibility of the study. In our study, we optimized only the hyperparameters listed in Table 4 due to limitations in computational resources and time. However, we acknowledge that XGBoost has several hyperparameters to be optimized, including the maximum depth of the tree, which was not addressed in our study. We have re-run the experiments with your proposed parameters using Bayesian tuning and obtained the best choice of parameters without overfitting, as shown in Tables 4 and 5.
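For reference, a hedged sketch of Bayesian tuning over an extended XGBoost search space, assuming scikit-optimize's BayesSearchCV; the ranges are illustrative, not the values finally reported in Tables 4 and 5:

```python
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from xgboost import XGBClassifier

search_space = {
    "n_estimators": Integer(20, 200),
    "max_depth": Integer(3, 10),                            # previously unaddressed
    "learning_rate": Real(0.01, 0.3, prior="log-uniform"),
    "subsample": Real(0.5, 1.0),
}
opt = BayesSearchCV(
    XGBClassifier(eval_metric="logloss"),
    search_space,
    n_iter=50,        # number of Bayesian optimization steps
    cv=5,
    scoring="roc_auc",
)
# opt.fit(X_train, y_train)
```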

 

Table 4: Why does the number of estimators (n_estimators) have two ranges ([20,200] and [10,200])? Please, correct.

Response: Thank you for bringing this to our attention. The number of estimators (n_estimators) was mistakenly listed with two ranges in Table 4. We apologize for any confusion this may have caused and will correct this error in the revised version of the paper.

 

Please, correct the typo in Line 364.

Response: Thanks for your suggestion. We have corrected the error in line 364. Thank you for drawing our attention to this issue.

 

To enhance the reliability of the study's results, the authors should address the issue of significant overfitting observed in Table 6, which may have resulted from suboptimal hyperparameter optimization. Please, address this issue by referring to Section 3.6, Figure 4 and related discussion in the following study: doi.org/10.1016/j.jclepro.2022.134203

Response: Thanks for your suggestion. We agree that overfitting is a critical issue that can affect the reliability and generalizability of the study's results. We have carefully examined the suboptimal hyperparameter optimization in our models as suggested in your comments, taking into account the information provided in the study you recommended. We address this issue by discussing the potential impact of overfitting on our results and providing additional information on how we optimized our models to avoid overfitting in the revised manuscript; the results are shown in Table 6. Your recommendation of this paper is valuable, and we have adopted it as one of our references.

 

In Lines 383-401 and Figure 7, it is unclear how the feature importance was determined. To provide a more accurate and comprehensive understanding of the importance of each input feature and enhance the quality of the discussion, I suggest using the SHapley Additive exPlanations (SHAP) technique, which can interpret black-box ML models like xgBoost and rank input features globally and locally.

Response: Thanks for your suggestion. The feature importance metric is derived from the interface provided by the model itself, based on the features used for splitting. We agree that using SHAP would provide a more comprehensive understanding of the importance of each input feature and improve the quality of the discussion. However, given the time constraints, we will apply this technique in a subsequent study as part of the first author's master's thesis.
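For completeness, a minimal sketch of the suggested SHAP analysis on a fitted XGBoost model, assuming the shap package; `model` and `X` are placeholders, not objects from the paper's code:

```python
import shap

# explainer = shap.TreeExplainer(model)    # model: fitted XGBoost classifier
# shap_values = explainer.shap_values(X)   # X: the feature matrix
# shap.summary_plot(shap_values, X)        # global ranking of feature importance
```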

 

The discussion of the results and model evaluation in Section 4.2.2. is inadequate. The authors must thoroughly evaluate and compare the performance of the models based on all performance measures and provide a comprehensive discussion and evaluation of the models on both the train and test datasets. In Table 6, the authors should include other performance measures such as Cohen’s Kappa and AUC to enhance the study's credibility. Furthermore, the authors should provide a confusion matrix, a standard tool for classification models, to improve the interpretation of the models' performance. Please, refer to Figure 9 and the related discussion in doi.org/10.1016/j.jclepro.2022.134203 to gain insights into how to effectively evaluate and present the models' results, providing a more robust and compelling discussion.

Response: Thanks for your suggestion. We appreciate your comments and will make the necessary revisions to improve the quality of our paper. We agree that a more comprehensive evaluation and comparison of model performance is needed, including all relevant performance metrics, on both the training and test datasets. We have added Cohen's Kappa and AUC to Table 6 and provided a confusion matrix for each model to improve the interpretation of model performance. We will also refer to Figure 9 and the related discussion in doi.org/10.1016/j.jclepro.2022.134203 for insights on how to effectively evaluate and present model results, providing a more robust and cogent discussion. Thank you for your suggestions, which will greatly enhance the credibility of the study. The confusion matrix is shown in Figure 10.
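A minimal sketch of computing the added measures on the test set, assuming scikit-learn; `y_test`, `y_pred`, and `y_score` are placeholders for the test labels, class predictions, and predicted probabilities:

```python
from sklearn.metrics import cohen_kappa_score, confusion_matrix, roc_auc_score

# kappa = cohen_kappa_score(y_test, y_pred)
# auc = roc_auc_score(y_test, y_score)    # y_score: probability of the GDP class
# cm = confusion_matrix(y_test, y_pred)   # rows: true class, columns: predicted
```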

 

Line 414: Please, define R, RMSE, and MAE including their mathematical expressions.

Response: Thanks for your suggestion. We have provided an additional explanation of the formula in section 4.3.1.
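For reference, the standard definitions of these measures, where $y_i$ is the observed value, $\hat{y}_i$ the predicted value, $\bar{y}$ and $\bar{\hat{y}}$ the respective means, and $n$ the number of samples (Section 4.3.1 may state equivalent forms):

```latex
R = \frac{\sum_{i=1}^{n}(y_i-\bar{y})(\hat{y}_i-\bar{\hat{y}})}
         {\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}\sqrt{\sum_{i=1}^{n}(\hat{y}_i-\bar{\hat{y}})^2}},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2},
\qquad
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|.
```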

 

In Section 4.3, it is unclear what values the authors used for the hyperparameters of the decision tree regressor and how they optimized these values. Additionally, it is unclear why the authors chose to use a decision tree on such a small database for regression (Figure 8 and Figure 9).

Response: Thanks for your suggestion.

In answer to the first part of your question, the hyperparameters of the decision tree regressor were optimized by a cross-validated grid search. Specifically, we varied the maximum depth of the tree, the minimum number of samples needed to split an internal node, the minimum number of samples required at a leaf node, and the maximum number of features considered when finding the best split. The optimal parameters are not shown for reasons of space.
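A minimal sketch of such a cross-validated grid search over the four named hyperparameters, assuming scikit-learn; the grid values are illustrative, since the optimal parameters were not reported:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

param_grid = {
    "max_depth": [3, 5, 8, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
    "max_features": [None, "sqrt", "log2"],
}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    cv=5,
    scoring="neg_root_mean_squared_error",
)
# search.fit(X_train, y_train)
# print(search.best_params_)
```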

Regarding the second part of your question, decision trees are a simple and interpretable choice for small datasets; in our study, the dataset was relatively small, which made decision tree regression a reasonable choice. However, we agree that other models, such as random forest, XGBoost, or neural network models, can also be used, and we are incorporating them into our future research using larger datasets.

 

The inclusion of departure delay time prediction in the study appears to be an afterthought, lacking proper attention to detail. The authors used a very small database for their machine learning regression algorithms and did not provide any information about the hyperparameters of the decision tree regressor used for predicting departure delay time. Additionally, it is unclear whether the authors evaluated the models on an unseen or test dataset, making it difficult to assess the accuracy of the results. The results presented in Table 7 clearly demonstrate that the developed models fall significantly short of acceptable accuracy levels. Without following a proper approach and achieving acceptable accuracy, presenting the model serves little purpose. This section requires significant revision, and the authors should provide a more detailed and rigorous analysis to improve the study's credibility.

Response: Thank you for your feedback. We apologize for the lack of attention given to the task of departure delay time prediction in our study. Flight delay forecasting is a very complex problem, as many relevant factors are involved. Most delay forecasting predicts on-time performance or delay levels, which we feel is not very meaningful, as it does not give passengers accurate information. In practice, forecast errors of up to 15 minutes are considered acceptable.

Regarding the small database used for regression, we acknowledge that a larger data set would provide more robust results. Unfortunately, due to limitations in data availability, we were only able to obtain a limited amount of data for this task. However, we agree that this limitation should be more clearly highlighted in the study.

In addition, we acknowledge that we have not provided sufficient information on how we have evaluated the model for departure delay prediction. We have given an example and will describe the evaluation process in more detail in our revised manuscript.

 

The conclusions should be revised considering the above comments.

Response: Thanks for your suggestion. We will carefully revise the conclusions of our study to address the concerns you have raised. We will ensure that our conclusions accurately reflect the limitations of our study and the implications of our findings. We appreciate your input and will take your comments into account in our revisions.

 

Please, discuss the practical implementation of the results of this study.

Response: Thanks for your suggestion. The practical implementation of the results of this study could have important implications for airport operations and airline management. By accurately predicting flight arrival delays, airlines and airports can adjust their schedules and operations to minimize the impact of delays on passengers and reduce costs associated with delays, such as increased fuel consumption and crew overtime. Additionally, predicting departure delay times can help airlines and airports optimize their resources, such as ground handling equipment and staff, to reduce the turnaround time of aircraft and improve overall efficiency.

Furthermore, the study's results could be used to inform decision-making processes in other areas, such as airport capacity planning and air traffic management. For example, by predicting delays in advance, air traffic controllers can proactively adjust traffic flow to prevent congestion and reduce delays. Additionally, airport capacity planners can use the study's results to optimize the allocation of resources, such as gates and runways, to minimize the impact of delays on overall airport operations.

Author Response File: Author Response.pdf

Reviewer 3 Report

The topic of the article is certainly relevant, since errors in the definition of GDPs can lead to significant losses.

The general level of the article is average, since traditional machine learning tools are used, so we cannot speak of its novelty. However, the proposed approach to the use of features is new, so I recommend the article for publication.

For the future, authors are advised to:

1) check the features for multicollinearity; this will justify their use or exclusion;

2) evaluate the capabilities of neural networks;

3) the cost of an error in underestimating the GDP time is significantly higher than the cost of overestimating it; it is advisable to take this into account by introducing a user-defined loss function;

4) try to use the stacking of the proposed models.

Author Response

Dear reviewer,

 

The authors would like to thank you for your feedback on the paper. In this document, we explain how we dealt with the comments. For clarity, we have separated individual comments and put them in boxes (and in italics). Our responses to these comments can be found under each box. All modifications in the manuscript have been highlighted in yellow.


Kind regards,


The authors

 

Reviewer 3

The general level of the article is average, since traditional machine learning tools are used and we cannot talk about its novelty. However, the proposed approach to the use of features is new, so I recommend the article for publication.

Response: Thank you for taking time out of your busy schedule to review our work. We are thrilled by your positive comments and will carefully consider your comments to revise the manuscript.

 

check the features for multicollinearity; this will justify their use or exclusion;

Response: Thanks for your suggestion. Checking for multicollinearity is an important step in analyzing data with multiple independent variables, as it helps to identify whether the variables are highly correlated with each other. In future research, we will use PCA or SAE models for dimensionality reduction to mitigate the problem of feature multicollinearity.
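One common multicollinearity check, sketched below under the assumption that statsmodels is available, is the variance inflation factor (VIF); `X` is a placeholder feature DataFrame, not the study's data:

```python
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif_table(X: pd.DataFrame) -> pd.DataFrame:
    """Compute the VIF of each feature; VIF > 10 is a common rule of
    thumb for problematic collinearity."""
    return pd.DataFrame({
        "feature": X.columns,
        "VIF": [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    })
```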

 

evaluate the capabilities of neural networks;

Response: Thanks for your suggestion. Neural networks are powerful tools for analyzing complex data and have shown great success in various fields, including image and speech recognition, natural language processing, and predictive modeling. In our recent research, we are using the extreme learning machine (ELM) neural network model, adding heuristics to it for network optimization, and evaluating its capabilities.
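For illustration only, a minimal NumPy sketch of a basic extreme learning machine (ELM) regressor, with random hidden weights and a least-squares readout; this is a generic textbook form, not the authors' heuristic-optimized variant:

```python
import numpy as np

def elm_fit(X, y, n_hidden=64, seed=0):
    """Fit an ELM: hidden weights are random, only the output weights are learned."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (fixed)
    b = rng.normal(size=n_hidden)                # random biases (fixed)
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                 # least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```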

 

the cost of an error in underestimating the GDP time is significantly higher than the cost of overestimating it; it is advisable to take this into account by introducing a user-defined loss function;

Response: Thanks for your suggestion. We completely agree that underestimating the GDP time can have significant consequences, and it is important to account for the cost of such errors when modeling GDP. By introducing a user-defined loss function, we can explicitly incorporate the costs of false positives and false negatives into the model and optimize the trade-off between them. This can help to improve the accuracy and robustness of our GDP estimates and reduce the potential for negative impacts. We appreciate the suggestion and will consider incorporating a user-defined loss function into our analysis.
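A hedged sketch of one way to encode this asymmetry as a custom XGBoost objective, penalizing underestimation more heavily than overestimation; the penalty weights are illustrative assumptions:

```python
import numpy as np

def asymmetric_objective(preds, dtrain, under_weight=3.0, over_weight=1.0):
    """Weighted squared error: loss = w * (pred - label)^2, with w larger
    when the prediction underestimates the label."""
    residual = preds - dtrain.get_label()
    w = np.where(residual < 0, under_weight, over_weight)  # residual < 0: underestimate
    grad = 2.0 * w * residual   # first derivative of the loss w.r.t. preds
    hess = 2.0 * w              # second derivative
    return grad, hess

# With the native API: xgboost.train(params, dtrain, obj=asymmetric_objective)
```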

 

try to use the stacking of the proposed models.

Response: Thanks for your suggestion. Stacking is a powerful ensemble learning technique that combines the predictions of multiple models to improve predictive performance. By using stacking, we can leverage the strengths of different models and mitigate their weaknesses, resulting in more accurate and robust predictions. We will explore the possibility of using stacking in our analysis, compare its performance with other modeling approaches, and take this suggestion into account in our future research.
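A minimal sketch of stacking the three proposed classifiers with a logistic-regression meta-learner, assuming scikit-learn; the meta-learner choice is an illustrative assumption:

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier

stack = StackingClassifier(
    estimators=[
        ("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier(n_estimators=100)),
        ("xgb", XGBClassifier(eval_metric="logloss")),
    ],
    final_estimator=LogisticRegression(),  # meta-learner combining base predictions
    cv=5,
)
# stack.fit(X_train, y_train)
```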

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The main concerns raised by this referee have been addressed in the revised manuscript.
