Article
Peer-Review Record

Prediction of Traffic Incident Locations with a Geohash-Based Model Using Machine Learning Algorithms

Appl. Sci. 2024, 14(2), 725; https://doi.org/10.3390/app14020725
by Mesut Ulu 1,*, Erdal Kilic 2 and Yusuf Sait Türkan 3
Reviewer 1: Anonymous
Reviewer 3: Anonymous
Submission received: 22 November 2023 / Revised: 20 December 2023 / Accepted: 9 January 2024 / Published: 15 January 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

 

The authors have undertaken a commendable effort in addressing the challenging task of predicting traffic incident locations. The utilization of the Random Forest algorithm in this study demonstrates a thoughtful choice, considering its efficacy in handling complex spatial data and producing accurate predictions. The article is well-structured, providing a clear and concise overview of the problem statement, methodology, and results.

Furthermore, the results section effectively communicates the performance metrics of the proposed model, offering insights into its accuracy, precision, and recall. In conclusion, the article "Prediction of Traffic Incident Locations with a Geohash-Based Model Using Machine Learning Algorithms" successfully combines theoretical knowledge with practical application. The strategic use of the Random Forest algorithm, coupled with the innovative incorporation of geohash-based features, makes this research a valuable contribution to the field of traffic incident prediction.

Please improve the quality of the figures. Try to find more relevant literature. The literature review should be extended. Attempt to identify more papers that utilize the presented algorithms. References from section 2 should also be described in section 1. Please use the same citation structure throughout the entire document.

Author Response

Thank you for reviewing our study and making suggestions. We are grateful for your help in addressing the gaps in our study. Below are our responses to each of your comments.

  • The authors have undertaken a commendable effort in addressing the challenging task of predicting traffic incident locations. The utilization of the Random Forest algorithm in this study demonstrates a thoughtful choice, considering its efficacy in handling complex spatial data and producing accurate predictions. The article is well-structured, providing a clear and concise overview of the problem statement, methodology, and results.
  • Thank you for your valuable opinions on our work.
  • Furthermore, the results section effectively communicates the performance metrics of the proposed model, offering insights into its accuracy, precision, and recall. In conclusion, the article "Prediction of Traffic Incident Locations with a Geohash-Based Model Using Machine Learning Algorithms" successfully combines theoretical knowledge with practical application. The strategic use of the Random Forest algorithm, coupled with the innovative incorporation of geohash-based features, makes this research a valuable contribution to the field of traffic incident prediction.
  • Thank you very much for your positive evaluation and opinions.

 

  • Please improve the quality of the figures.
  • The quality of Figures 1, 3, and 5 and of all the figures in Appendix B has been improved. Figures 2 and 4 were included in the study after the peer review process.
  • Try to find more relevant literature. The literature review should be extended. Attempt to identify more papers that utilize the presented algorithms. References from section 2 should also be described in section 1. Please use the same citation structure throughout the entire document.
  • In the Introduction, we have added four paragraphs discussing GIS-based studies and recent research that uses the proposed algorithms for traffic incident forecasting (the four paragraphs starting at line 58 are highlighted; a total of 24 new references have been used).
  • The references for the topics discussed in Section 2 are described in the highlighted parts of Section 1. Moreover, we have included our experimental work and additional information about the methods in the subsections of Section 2 (the highlighted passages have been added under the methods in Section 2).
  • We have also updated the references to ensure a consistent citation structure throughout the document.

We believe that the quality of our work has improved further with the additions and revisions we implemented based on your feedback. We appreciate your valuable contributions.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper proposes a novel approach to predict the location of traffic incidents based on geohash and machine learning algorithms such as decision trees, k-NN, random forest, and SVM. Even if the approach is interesting, the paper needs many improvements. Please see below my open points and suggestions: 

1. Describe better the scientific contribution within this paper and the need for this research. 

2. Use the citation style corresponding to the chosen journal (lines 46-47, 138). 

3. Improve the quality of Figures 1 and 3 and the figures from the appendix. Additionally, each figure in the appendix should have a caption. 

4. Write the equations from lines (265-276) and refer to them according to the journal guidelines. 

5. Table 5 is not according to the journal guidelines. 

6. I suggest creating a separate section for the literature review and providing a more comprehensive discussion on the research gap identified within the current literature. 

7. The methodology is too brief. It should be a more comprehensive and detailed discussion of the concepts (geohash, decision trees, random forest, k-NN, etc.) currently presented in Section 2 based on their application on the experiment presented in Section 3. 

8. The confusion matrix in Figure 3 is well explained at lines 321-339. 

9. Are there limitations to the proposed approach? Please discuss them. 

10. How can this research be further extended? Please discuss the recommendations for future research in a separate section. 

11. I appreciate the table from Appendix A with the description of the variables used. Please reference it correctly at line 213 and use the table style according to the guidelines. 

Comments for author File: Comments.pdf

Author Response

Thank you for reviewing our study and making suggestions. We are grateful for your help in addressing the gaps in our study. Below are our responses to each of your comments.

  1. Describe better the scientific contribution within this paper and the need for this research. 

Our study contributes to the literature by being the first to predict the locations of traffic events from a broader perspective, including traffic accidents, vehicle breakdowns, and emergencies, rather than focusing only on the locations of traffic accidents or congestion. Another contribution of this research is the use of geohash zones with suitable dimensions within the three-stage model to identify the locations of traffic incidents. Traffic data were collected and pre-processed, specifically for these zones, for various variables that affect traffic events, including time, region, vehicles, traffic index, road structure, and weather conditions.

The gap filled by our study and its contributions to the literature are included in the introduction on lines 65-70 and 77-84. Incorporating geographic data and geohash coding as input variables to machine learning algorithms to improve prediction accuracy will contribute to traffic management and safety. This contribution is explained in lines 147-155.

  2. Use the citation style corresponding to the chosen journal (lines 46-47, 138).

Citations have been formatted according to the journal's style. The changes are highlighted in yellow at lines 46-47 and 209.

  3. Improve the quality of Figures 1 and 3 and the figures from the appendix. Additionally, each figure in the appendix should have a caption.

The quality of Figures 1, 3, and 5 and of the figures in Appendix B has been improved. All figures in Appendix B have been given a title. Figures 2 and 4 were included in the study after the peer review process.

  4. Write the equations from lines (265-276) and refer to them according to the journal guidelines.

The equations are numbered, and the corrections are in lines 388-402.

  5. Table 5 is not according to the journal guidelines.

The table has been corrected. You can find the corrected table at line 368.

  6. I suggest creating a separate section for the literature review and providing a more comprehensive discussion on the research gap identified within the current literature.

We think that the revision you suggested is extremely necessary and we thank you for your contribution. We have included a comprehensive literature study in the introduction. In addition to expanding the literature, we aimed to provide a more comprehensive discussion of the research gap identified within the current literature. We added a new passage from line 58 to line 124. In the first of the four paragraphs we added, we included prediction studies regarding traffic incidents. Since these studies determined the places of traffic accidents or traffic congestion, we emphasized that our study takes a broader perspective by predicting the locations of the traffic incidents that cause traffic congestion (traffic accidents, vehicle breakdowns, road maintenance and emergencies), and that it is the first study in this field. In this paragraph, we also mentioned that, in research that tries to detect hotspots (traffic congestion or traffic accidents) using geographic information system (GIS)-based methods and spatial analysis, dividing the zones where traffic incidents occur into reasonable dimensions, and collecting and processing data for these zones, is an important problem. At this point, we mentioned the use of geohash zones with suitable dimensions, which is another contribution of our study. In the following paragraphs, we have given examples of recent studies in the literature on GIS-based systems and data-driven models for investigating traffic incidents.

  7. The methodology is too brief. It should be a more comprehensive and detailed discussion of the concepts (geohash, decision trees, random forest, k-NN, etc.) currently presented in Section 2 based on their application on the experiment presented in Section 3.

We added information to the methods described in sections 2.2.1, 2.2.2, 2.2.3, and 2.2.4. We presented additional details about the methods, including their strengths and weaknesses. We provide explanations based on their application in the experiment presented in Section 3, in addition to supplementary information on the methods in Section 2. Moreover, we briefly explained the reasons behind the success of some methods. This information can be found between lines 516-523 under the conclusion section.

You will find these edits and additions on these line numbers: 195-200, 210-2019, 227-237, 248-265, and 272-281.

  8. The confusion matrix in Figure 3 is well explained at lines 321-339.

Thank you for your comment. Following the other revisions, we added the sentences below to the discussion of the confusion matrix to make it easier to understand.

The confusion matrix shows a high number of correctly predicted instances, indicating the model's success in predicting traffic event locations. Each class has a relatively high number of correctly predicted instances, demonstrating successful identification.

In clusters 1, 2 and 5, incorrect predictions were confused with only one cluster. Cluster 9 was confused with the highest number of different clusters, with a total of 3. This indicates that the model's incorrect predictions were primarily due to confusion between regions that are close to each other. Conversely, predictions for regions that are relatively distant from each other were more accurate.

These explanations are highlighted in yellow at lines 453-456 and 473-478.
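The reading of the confusion matrix described above (correct predictions on the diagonal, and the number of different clusters each cluster is confused with) can be sketched in a few lines of Python. This is only an illustrative sketch: the 4-class matrix and the `summarize` helper are hypothetical, and the counts are invented rather than taken from the paper's Figure 3.

```python
# Hypothetical 4-class confusion matrix: rows = true cluster, columns = predicted cluster.
# The counts are invented for illustration and are NOT taken from the paper.
confusion = [
    [50, 2, 0, 0],
    [0, 45, 3, 0],
    [1, 4, 40, 2],
    [0, 0, 0, 30],
]


def summarize(matrix):
    """Return overall accuracy and, per class, how many other classes it was confused with."""
    total = sum(sum(row) for row in matrix)
    # Correct predictions lie on the main diagonal.
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    # For each true class, count the distinct wrong classes that received predictions.
    confused_with = {
        i: sum(1 for j, count in enumerate(row) if j != i and count > 0)
        for i, row in enumerate(matrix)
    }
    return correct / total, confused_with


accuracy, confused_with = summarize(confusion)
print(f"accuracy = {accuracy:.3f}")
print(confused_with)
```

In this toy matrix, class 2 is confused with three other classes while classes 0 and 1 are each confused with only one, which is the kind of per-cluster statement made in the response.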

  9. Are there limitations to the proposed approach? Please discuss them.

Taking your suggestion into account, we have added a limitations section as Section 5.1 and discussed the limitations of the approach there. You can find the additions at lines 548-559.

  10. How can this research be further extended? Please discuss the recommendations for future research in a separate section.

The applicability of our model in different geographical contexts, as well as future research suggestions such as integrating real-time data for dynamic predictions and what needs to be done to achieve these, are discussed in Section 5.2, Future Research (lines 560-575).

  11. I appreciate the table from Appendix A with the description of the variables used. Please reference it correctly at line 213 and use the table style according to the guidelines.

Thank you for your appreciation. You can find the correction in line 336.

 

We believe that the quality of our work has improved further with the additions and revisions we implemented based on your feedback and comments. We appreciate your valuable contributions.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors
  1. The study introduces an approach to predicting traffic incident locations using a geohash-based model and machine learning algorithms. This approach has potential and could benefit traffic management and safety research.

  2. Methodology: The methodology, involving the use of ArcGIS for geohash conversion and the application of machine learning algorithms like Decision Tree, k-Nearest Neighbor, Random Forest, and Support Vector Machine, is robust and well-explained. However, more details on the data pre-processing steps would enhance the clarity of the research.

  3. Data and Analysis: The use of traffic incident data from Istanbul is appropriate and provides a relevant context for the study. The analysis seems thorough, but the manuscript could benefit from a deeper discussion on the selection of the specific geohash accuracy level used in the study. Why have the authors opted for grid search? Its pitfalls are very obvious; would Bayesian optimization not be more beneficial?

  4. Results: The presentation of results, especially the performance metrics of the machine learning models, is clear and concise. However, it would be beneficial to include more discussion on why certain models performed better than others in this specific context. Moreover, try to include more statistical metrics, such as the Matthews correlation coefficient. An ROC curve would also be a good choice.

  5. Literature Review: The manuscript provides a comprehensive review of related studies but could further highlight how this research fills gaps in existing knowledge, particularly in the application of geohash-based models for traffic incident prediction. Also strengthen the literature by including some recent studies related to ML applications, such as https://www.sciencedirect.com/science/article/pii/S1674775523002226

  6. Practical Implications: The study's implications for traffic management and emergency response are well-articulated. Additional insights into how this model can be integrated into existing traffic management systems would be valuable.

  7. Figures and Tables: The inclusion of figures and tables aids in understanding the methodology and results. However, ensuring that all figures are of high quality and clearly legible would further improve the manuscript.

  8. Recommendations for Future Research: The manuscript could benefit from a section discussing potential avenues for future research, such as exploring the model's applicability in different geographical contexts or integrating real-time data for dynamic predictions.

Overall, the manuscript is good. With some enhancements in data processing details, further discussion on model performance, and additional insights into practical applications, it can be a valuable resource for researchers and practitioners in this field.

Comments on the Quality of English Language

Minor editing

Author Response

  • The study introduces an approach to predicting traffic incident locations using a geohash-based model and machine learning algorithms. This approach has potential and could benefit traffic management and safety research.
  • Thank you for reviewing our study and making suggestions. We are grateful for your help in addressing the gaps in our study. Below are our responses to each of your comments.
  • Methodology: The methodology, involving the use of ArcGIS for geohash conversion and the application of machine learning algorithms like Decision Tree, k-Nearest Neighbor, Random Forest, and Support Vector Machine, is robust and well-explained. However, more details on the data pre-processing steps would enhance the clarity of the research.
  • Thank you for your positive evaluation and opinions. We agree that showing the data preprocessing steps more clearly makes the study easier to understand. To clarify the preprocessing performed in the second stage of our three-stage model, we have added Figure 4 after line 329, under the heading "3.2. Data Collection". Rather than explaining the steps at length, we aimed to summarise them visually and make them easy to follow. Thank you for your valuable contribution.
  • Data and Analysis: The use of traffic incident data from Istanbul is appropriate and provides a relevant context for the study. The analysis seems thorough, but the manuscript could benefit from a deeper discussion on the selection of the specific geohash accuracy level used in the study. Why have the authors opted for grid search? Its pitfalls are very obvious; would Bayesian optimization not be more beneficial?
  • Thank you for your evaluation and suggestions. We agree that determining the size of the traffic incident regions to be studied, that is, the choice of geohash level, is very important. We therefore first explain the geohash geographic data coding technique in section 2.1, starting from line 158, and provide information about the 18 accuracy levels of Geohash after line 174. After line 179, we state that traffic incidents in Istanbul have an impact area ranging from 50 to 1000 square meters. For this reason, we chose geohash level 6, which represents a rectangular region of 0.74 km² with a cell width of 1.22 km and a cell height of 0.61 km. We also expanded the literature review with sample applications from various GIS-based studies in which the areas of the regions were determined using different methods. These sample applications are described between lines 58 and 124, particularly after line 85.
  • Previous studies have primarily employed grid search to explore the parameter space. This approach can be computationally intensive, particularly for high-dimensional parameter spaces, and Bayesian optimisation is often less computationally demanding. Despite the high dimensionality of the parameter space, we used grid search because the GridSearchCV method is well suited to our problem and scales well: testing each unique hyperparameter combination in the search space to determine the best-performing combination took a reasonable amount of time. This explanation is given between lines 272 and 281.
  • Results: The presentation of results, especially the performance metrics of the machine learning models, is clear and concise. However, it would be beneficial to include more discussion on why certain models performed better than others in this specific context. Moreover, try to include more statistical metrics, such as the Matthews correlation coefficient. An ROC curve would also be a good choice.
    • We agree that it would be useful to include more discussion on why certain models perform better than others, as you mentioned in your comment. For this purpose, we added information to the methods described in sections 2.2.1, 2.2.2, 2.2.3, and 2.2.4. (Please see the highlighted areas in 2.2.) We presented additional details about the methods, including their strengths and weaknesses. Moreover, we briefly explained the reasons behind the success of some methods. This information can be found between lines 516-523 under the conclusion section.
    • Following your suggestions, we calculated the MCC value of the RF model (see lines 442-446). We also created the ROC-AUC curve of the RF model and presented it in Appendix B.
  • Literature Review: The manuscript provides a comprehensive review of related studies but could further highlight how this research fills gaps in existing knowledge, particularly in the application of geohash-based models for traffic incident prediction. Also strengthen the literature by including some latest studies related to ML applications such as https://www.sciencedirect.com/science/article/pii/S1674775523002226
  • In the Introduction, we have added four paragraphs discussing GIS-based studies and recent research that uses the proposed algorithms for traffic incident forecasting (the four paragraphs starting at line 58 are highlighted; a total of 24 new references have been used).
  • We explain how this research fills gaps in existing knowledge and contributes to the literature in the introduction, at lines 65-70, 77-84 and 147-155.
  • Taking your suggestions into consideration, we have enriched our study by including some recent studies on machine learning applications as well as recent GIS-based studies on traffic incident locations. We have added a recent literature review at lines 58-124. We have also added references alongside the additional information about the methods described under 2.2.
  • Practical Implications: The study's implications for traffic management and emergency response are well-articulated. Additional insights into how this model can be integrated into existing traffic management systems would be valuable.
  • Thank you for your positive evaluation and opinions. When incorporated into traffic management systems, the model's predictive capabilities can facilitate automatic alerts and emergency response management. Further information on this subject is provided in the conclusion section, where explanations have been added at lines 532-538. To integrate the model into existing traffic management systems, it is crucial to gather real-time data from all potential areas where traffic incidents may occur. Furthermore, integrated systems that work with all databases should be developed. We discuss this topic in the second paragraph of the Future Research section (see lines 568-575).
  • Figures and Tables: The inclusion of figures and tables aids in understanding the methodology and results. However, ensuring that all figures are of high quality and clearly legible would further improve the manuscript.
  • The quality of Figures 1, 3, and 5 and of all the figures in Appendix B has been improved. Figures 2 and 4 were included in the study after the peer review process.
  • Recommendations for Future Research: The manuscript could benefit from a section discussing potential avenues for future research, such as exploring the model's applicability in different geographical contexts or integrating real-time data for dynamic predictions.
  • The applicability of our model in different geographical contexts, as well as future research suggestions such as integrating real-time data for dynamic predictions and what needs to be done to achieve these, are discussed in Section 5.2, Future Research (lines 560-575).
  • Overall, the manuscript is good. With some enhancements in data processing details, further discussion on model performance, and additional insights into practical applications, it can be a valuable resource for researchers and practitioners in this field.
  • We believe that the quality of our work has improved further with the additions and revisions we implemented based on your feedback. We appreciate your valuable contributions.
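The cell dimensions quoted in the response on geohash level selection (about 1.22 km by 0.61 km for a 6-character geohash) can be checked with a short calculation. The sketch below is not the authors' procedure; it assumes the standard geohash layout of 5 bits per character with bits alternating between longitude and latitude, a spherical approximation of about 111.32 km per degree, and a hypothetical helper name `geohash_cell_size_km`.

```python
import math

# Approximate length of one degree at the equator, in km (spherical assumption).
KM_PER_DEGREE = 111.32


def geohash_cell_size_km(precision, latitude_deg=0.0):
    """Approximate (width_km, height_km) of a geohash cell at the given latitude."""
    total_bits = 5 * precision        # each base-32 character encodes 5 bits
    lon_bits = (total_bits + 1) // 2  # bits alternate, starting with longitude
    lat_bits = total_bits // 2
    width_deg = 360.0 / 2 ** lon_bits
    height_deg = 180.0 / 2 ** lat_bits
    # East-west extent shrinks with the cosine of latitude; north-south does not.
    width_km = width_deg * KM_PER_DEGREE * math.cos(math.radians(latitude_deg))
    height_km = height_deg * KM_PER_DEGREE
    return width_km, height_km


equator_w, equator_h = geohash_cell_size_km(6)
istanbul_w, istanbul_h = geohash_cell_size_km(6, latitude_deg=41.0)  # Istanbul is near 41° N
print(f"equator:  {equator_w:.2f} km x {equator_h:.2f} km")
print(f"lat 41 N: {istanbul_w:.2f} km x {istanbul_h:.2f} km")
```

Under these assumptions the quoted width corresponds to the equatorial value; at higher latitudes the east-west extent of a cell is narrower, while the north-south extent stays the same.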

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,

congratulations on the high-quality revised paper. Thank you for considering all my points from the previous review round.

I consider that the paper is readable, well-explained and all the updates make this research reproducible. I appreciate that you emphasized the limitations and provided feasible recommendations for future research.

I recommend the acceptance of the paper in its present form.

Reviewer 3 Report

Comments and Suggestions for Authors

The paper can now be accepted in current form.

Comments on the Quality of English Language

Overall ok. Slightly minor editing for typos 
