Peer-Review Record

Drivers’ Behavior and Traffic Accident Analysis Using Decision Tree Method

Sustainability 2022, 14(18), 11339; https://doi.org/10.3390/su141811339

by Pires Abdullah^1,2,* and Tibor Sipos^1,3

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Jannos Toth

Submission received: 9 August 2022 / Revised: 3 September 2022 / Accepted: 6 September 2022 / Published: 9 September 2022

(This article belongs to the Section Sustainable Transportation)

Round 1

Reviewer 1 Report (Previous Reviewer 4)

excellent work.

Author Response

Thank you for your comments. The English of this article was reviewed in a two-stage process by two editors, both of whom are native English speakers.

Reviewer 2 Report (Previous Reviewer 2)

The authors provided no responses to my third review. I have several big questions about this revision.

In the Materials and Methods section, the authors now state that the total number of participants from the public was 1172, whereas they previously stated it was 586. It is very strange that the new number is exactly twice the old number. Even more strange is that all of the pie charts depicting the attributes of this dataset show exactly the same percentages as in previous versions of the paper with only 586 participants. So the number of participants exactly doubled, and the percentages of attributes for the additional participants happened to be distributed exactly the same as the original participants? How is this possible?

Next, the decision tree shows that 384 participants were used to create it. This again is exactly double the 168 participants used to create the decision tree in the previous version of the paper. However, the branches of the decision tree now show very different variables to be associated with crash severity at each level even though the characteristics of this double-size dataset are shown by the pie-charts to be the same as the half-size dataset.

Exceeding the speed limit, which was a very prominent variable in the previous analysis is now absent from the decision tree. Excessive speed has been shown by dozens of analyses to be significantly related to crash severity. So this analysis contradicts the finding of so many other studies. As stated in my previous reviews, the results do not show any measures of statistical significance, which undermines the validity of these results.

The authors note that higher entropy values (which can range from 0 to 1) indicate greater variability and less reliability. Of the 17 entropy values shown on the decision tree for which there are not simply zero severe or non-severe crashes, 16 are above 0.5 with the root having a 0.828 entropy value and the others very high as well. Statistical significance is unlikely with this much variability.

Author Response

Point 1: In the Materials and Methods section, the authors now state that the total number of participants from the public was 1172, whereas they previously stated it was 586. It is very strange that the new number is exactly twice the old number. Even more strange is that all of the pie charts depicting the attributes of this dataset show exactly the same percentages as in previous versions of the paper with only 586 participants. So the number of participants exactly doubled, and the percentages of attributes for the additional participants happened to be distributed exactly the same as the original participants? How is this possible?

Response 1: This article has gone through a major revision in which all of the code was updated to use the full dataset stated in this new version, and it has been submitted again. This update was not applied to the graph and charts from the beginning of the submission but only to the part of the decision tree, which is the main part of this article. Less data was used for the decision tree in the previous versions because I had not controlled the size of the tree and because the percentage obtained on the training set was less reliable. I confirm that there was a mistake in the decision tree part of the previous versions. In addition, my discussion described the graph content incorrectly because I had misunderstood the decision tree algorithm at the beginning. I realized this after reviewing your last comment, with which I totally agree. After studying the concept of the decision tree in depth and using more advanced Python code to control and generate an accurate decision tree map, I am now confident in the results presented in this article.

Point 2: Next, the decision tree shows that 384 participants were used to create it. This again is exactly double the 168 participants used to create the decision tree in the previous version of the paper. However, the branches of the decision tree now show very different variables to be associated with crash severity at each level even though the characteristics of this double-size dataset are shown by the pie-charts to be the same as the half-size dataset.

Response 2: As clarified above, the dataset and the code for the decision tree have been updated and revised to use the full dataset instead of half of it, and the program has been run again. As a result, other variables are now involved. Furthermore, the tree generated by the program is now larger than before, as you may have noticed.

Point 3: Exceeding the speed limit, which was a very prominent variable in the previous analysis is now absent from the decision tree. Excessive speed has been shown by dozens of analyses to be significantly related to crash severity. So this analysis contradicts the finding of so many other studies. As stated in my previous reviews, the results do not show any measures of statistical significance, which undermines the validity of these results.

Response 3: Exceeding the speed limit has a significant effect on accidents; this is clearly identified throughout the article. In the decision tree graph, the variable “Support_SpeedLimit_Radars” has been taken instead, and the Exceeding_speed_limit variable was eliminated as they almost have the same idea. Nevertheless, the variable Exceeding_Speed_limit has now been added to the algorithm and appears in Figure 6. As for statistical significance, it was clarified before, and it follows from the concept of the Decision Tree in this article, that the method is a non-parametric procedure for analyzing traffic accidents that does not rely on mathematical assumptions about the relationships between the variables involved. The main purpose of these algorithms is to partition the data into groups that are homogeneous with respect to the dependent variable. The “score()” method of the decision tree object is used to display the percentage accuracy of the assignments made by the classifier. It takes the input and target variables as arguments. The test score obtained from scikit-learn (“sklearn”) in the Python program for this study indicates that the classifications made by this model are correct almost 80% of the time, which is higher than in the previous submission.
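For a scikit-learn classifier, score() returns the mean accuracy, that is, the fraction of samples whose predicted label matches the true label. The sketch below reproduces that calculation in plain Python. The helper name accuracy and the label lists are illustrative assumptions, not the authors' code or data.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels (not study data): 1 = severe crash, 0 = non-severe crash.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 1, 0, 0]

# 8 of the 10 predicted labels match the true labels.
print(accuracy(y_true, y_pred))
```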

Point 4: The authors note that higher entropy values (which can range from 0 to 1) indicate greater variability and less reliability. Of the 17 entropy values shown on the decision tree for which there are not simply zero severe or non-severe crashes, 16 are above 0.5 with the root having a 0.828 entropy value and the others very high as well. Statistical significance is unlikely with this much variability.

Response 4: In none of the Decision Tree studies that were reviewed did all nodes have an entropy less than or equal to 0.5. The requirement mentioned in point 4 above is almost impossible to meet and does not reflect reality. I repeat here the definition of the DT, which represents a classification problem as a set of decisions based on the values of the features. Each node of the tree represents a threshold over the value of a feature and splits the training samples into two smaller sets. The decision process is repeated over all the features, growing the tree until an optimal way of splitting the samples is computed. It is normal that at the root of the tree, the entropy is high. However, it still indicates that the root is the main influential factor in this classification. Unlike traditional statistical modeling techniques, the Decision Tree does not need a functional form to be specified; instead, the test score function identifies the accuracy level of the classification, which proves the significance of the result obtained.

Finally, I would like to thank you and mention that I invested a lot of time and effort in understanding the concept behind the DT algorithm. This is a new approach, the Python implementation of this method is professional, and the idea of the analysis is unique. I have followed your comments and revised the whole article multiple times based on your review as much as possible. This is the final version. I am confident about the content of this version, and I have done everything possible from my side. If you wish, I am ready to send you the Python code and the dataset file used in this study so that you can check them yourself. If you find a single error or any mismatch in any graph in the article, I am ready to be held accountable. Thanks again for considering my request.

Reviewer 3 Report (Previous Reviewer 1)

The improvement is well done.

Author Response

Thank you for your comments. Further amendments were made to the article.

Round 2

Reviewer 2 Report (Previous Reviewer 2)

The authors did not answer my previous question: “How is this possible?” The authors agree that their previous versions of the paper were incorrect, and that they are now reporting results based on all 1172 participants rather than 586 participants. However, Figure 1 showing some characteristics of the 1172 participants is still identical to Figure 1 in all previous versions of the paper. So again I ask “How can the full group of 1172 participants used in versions 3 and 4 of the paper have identical characteristics to the first group of 586 participants used in versions 1 and 2 of the paper?”

The authors state “In the decision tree graph, the variable “Support Speed Limit Radars” has been taken instead, and the “Exceeding Speed Limit” variable was eliminated as they almost have the same idea.” However, the authors do not show any correlation between these variables in the dataset to support this claim.
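One way such a correlation could be reported is the Pearson coefficient between the two coded survey answers, which for two 0/1 variables equals the phi coefficient. The sketch below is only an illustration: the helper name pearson and the coded answers are hypothetical and are not taken from the study's dataset.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical coded answers (1 = yes, 0 = no), one entry per respondent.
exceeds_speed_limit = [1, 1, 0, 0, 1, 0, 1, 0]
supports_speed_radars = [0, 0, 1, 1, 0, 1, 1, 0]

print(pearson(exceeds_speed_limit, supports_speed_radars))
```

A coefficient close to +1 or -1 would indicate that the two variables carry nearly the same information; a value near 0 would not support treating one as a substitute for the other.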

Author Response

Point 1: The authors did not answer my previous question: “How is this possible?” The authors agree that their previous versions of the paper were incorrect, and that they are now reporting results based on all 1172 participants rather than 586 participants. However, Figure 1 showing some characteristics of the 1172 participants is still identical to Figure 1 in all previous versions of the paper. So again I ask “How can the full group of 1172 participants used in versions 3 and 4 of the paper have identical characteristics to the first group of 586 participants used in versions 1 and 2 of the paper?”

Response 1: I think that I answered this question in the previous review round, but let me explain it again. In answering this question in the last round, I stated that, if you recheck it, "This update was not applied to the graph and charts from the beginning of the submission but only to the part of the decision tree." It means that, of course, the graphs must remain identical across all versions, as they used the 1172 participants from the beginning, whereas only half of the dataset (586) was applied to the DT algorithm to generate its graph. I explained why this had happened, I admitted that I realized this was an incorrect process, and I said, openly and frankly, that the mistake was on my side. In the current version, however, the mistake is corrected and the entire dataset of 1172 participants is used for all graphs, including the DT graph. As you may have seen in Figure 6, the DT graph is now different from the previous versions: it is larger and has different nodes.

Point 2: The authors state “In the decision tree graph, the variable “Support Speed Limit Radars” has been taken instead, and the “Exceeding Speed Limit” variable was eliminated as they almost have the same idea.” However, the authors do not show any correlation between these variables in the dataset to support this claim.

Response 2: Figure 6 in the current version contains all the details; both "Support Speed Limit Radars" and "Exceeding Speed Limit" are now included in the DT graph, as you requested. According to the concept of the Decision Tree algorithm, variables are managed and processed based on classification.

Note: The English review was conducted for this article using a two-stage process in which two editors reviewed the file. Both editors are native English speakers.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

Round 1

Reviewer 1 Report

I added some comments in the article.

You should explain the results of the Python analysis in detail.

Comments for author File: Comments.pdf

Author Response

Point 1: summarise

Response 1: It was a typo. The word has been corrected. Thanks!

Point 2: allways?

Response 2: The previous word has been replaced with the word “Always” instead of “Yes almost”.

Point 3: This is not the main consequence. It is more important that the supporters (generally and totally) have a higher share than the non-supporters.

Response 3: The text has been edited according to the comment provided

Point 4: Allways

Response 4: The graph has been replaced with a new one marked with the word “Always” instead of “Yes”

Point 5: What are the meaning of the boxes in the last row?

Response 5: It is the entropy, a measure of the noise in the decision. Noise can be viewed as uncertainty. For example, in nodes where the severity value array contains equal numbers of severe and non-severe crashes, the entropy is at its highest value, which is 1.0. This means that the model was unable to make a definitive classification decision based on the input variables. For very low entropy values, the decision was much more clear-cut, and the difference between the numbers of severe and non-severe crashes is much larger. The graph has been clarified. Thanks!
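The entropy described here is the Shannon entropy of the two class counts in a node, H = -Σ p_i log2(p_i). A minimal sketch follows, assuming a node is summarized by its counts of severe and non-severe crashes. The function name binary_entropy and the counts are illustrative, not values from the article's tree.

```python
import math


def binary_entropy(n_severe, n_non_severe):
    """Shannon entropy (in bits) of a node holding two classes of samples."""
    total = n_severe + n_non_severe
    if total == 0:
        raise ValueError("node must contain at least one sample")
    entropy = 0.0
    for count in (n_severe, n_non_severe):
        if count > 0:  # an empty class contributes nothing (0 * log 0 := 0)
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


# Illustrative counts only.
print(binary_entropy(50, 50))  # evenly mixed node: maximum uncertainty
print(binary_entropy(90, 10))  # mostly one class: lower entropy
print(binary_entropy(10, 0))   # pure node: no uncertainty
```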

Point 6: Older respondents?

Response 6: The highest age group in the dataset's classification corresponds to the older respondents. This has been clarified, and the wording has been changed to “older respondents”.

Point 7: The reader cannot understand the meaning of this branch. Please write the ages here, not the group numbers. Nor does it follow from the next sentence.

Response 7: The whole paragraph has been revised with detailed information about every part and object of the graph to make it more understandable. Thanks!

Author Response File: Author Response.docx

Reviewer 2 Report

The second sentence of page 4 and Table 1 do not agree. The sentence says that only property damage crashes are 0 and fatal crashes are 1, but the table lists all injury and property damage crashes together. Then, in the paragraph below Table 1, all fatal and injury crashes are grouped together as “severe” and only property damage crashes are grouped as “non-severe”. This agrees with Figure 3 on page 5, which shows all fatal and injury crashes grouped together.

The authors never say how they controlled the sample of who was surveyed or even the sample size of the survey. They also do not compare the characteristics of those sampled (such as gender and age) to the general population. Thus, the percentages of drivers involved in crashes by age may not represent these percentages for the overall population. The survey results and decision tree analysis are not very informative or reliable without knowing the quality of the sample. As a result, the relationships between the attributes indicated by the decision tree may be misleading. The authors state “it can be revealed from the graph above that the accidents that was related to the higher age group has been reported as severe crashes more frequently than other groups.” Statistics from throughout the world often indicate that younger, less experienced drivers have the most frequent and severe crashes per number of drivers.

Author Response

Point 1: The second sentence of page 4 and Table 1 do not agree. The sentence says that only property damage crashes are 0 and fatal crashes are 1, but the table lists all injury and property damage crashes together. Then, in the paragraph below Table 1, all fatal and injury crashes are grouped together as “severe” and only property damage crashes are grouped as “non-severe”. This agrees with Figure 3 on page 5, which shows all fatal and injury crashes grouped together.

Response 1: Table 1 was taken and cited from a previous study and the citation with reference number was written in the article. This table was for showing the situation of the city in terms of the number of traffic accidents per year. The table has been clarified. Thanks!

Point 2: The authors never say how they controlled the sample of who was surveyed or even the sample size of the survey. They also do not compare the characteristics of those sampled (such as gender and age) to the general population. Thus, the percentages of drivers involved in crashes by age may not represent these percentages for the overall population. The survey results and decision tree analysis are not very informative or reliable without knowing the quality of the sample. As a result, the relationships between the attributes indicated by the decision tree may be misleading. The authors state “it can be revealed from the graph above that the accidents that was related to the higher age group has been reported as severe crashes more frequently than other groups.” Statistics from throughout the world often indicate that younger, less experienced drivers have the most frequent and severe crashes per number of drivers.

Response 2: The article now states that the Simple Random Sample technique was used to collect the data in the city by interviewing local people. The overall number of participants was roughly 300 people; the population of the city is around 450,000. The questionnaire form was distributed among different groups of people in all areas of the city, taking into account age, gender, education level, and other factors. New figures have been added to the article to show the percentages of participants by age and gender.

In Python, the model can be evaluated with the score() method. For the decision tree, this method reports the percentage accuracy of the assignments made by the classifier. It takes the input and target variables as arguments. The score value for this study indicates that classifications made by the model should be correct approximately 77% of the time.

The higher age group in the graph refers to those who are over 65 years old, who mostly suffer from health issues and vision problems. The text has been revised and explained in a more understandable way. Thanks!

Author Response File: Author Response.docx

Reviewer 3 Report

This study has no theoretical foundation. The authors need to provide a literature review covering the relevant literature.

The authors also need to elaborate on the method. The elaboration should include the data collection (source) and the analytic method. Why is the decision tree method essential in this research?

Moreover, the authors need to present the theoretical contribution of this research. However, this could be done after presenting an organized review of the literature.

Author Response

Point 1: This study has no theoretical foundation. The authors need to provide a literature review covering the relevant literature.

Response 1: Additional literature has been added to the article. Thanks!

Point 2: The authors also need to elaborate on the method. The elaboration should include the data collection (source) and the analytic method. Why is the decision tree method essential in this research?

Response 2: The method has been elaborated on further and explained in a more comprehensible way. The article now also states that the Decision Tree is regarded as an effective method for this research. The main motivation is that the target variable under investigation, the level of severity, is binary, which makes it compatible with the Decision Tree method. Thanks!

Point 3: The authors need to present the theoretical contribution of this research. However, this could be done after presenting an organized review of the literature.

Response 3: The theoretical contribution has been presented, and the gap has been addressed at the end of the literature review. Thanks!

Author Response File: Author Response.docx

Reviewer 4 Report

The research represents important work that has the potential to save human lives, but it needs further improvement based on these comments:

1) figure 1 should be deleted from manuscript and its information incorporated inside the text because it is not common practice to copy/paste software raw output to an academic article.

2) figure 6 should be redrawn by excel or word document tools because it is not common practice to copy/paste software raw output to an academic article. Please draw the tree output by yourself.

3) in regards to road safety assessment, please read this article and add relevant information from it to your text and of course cite it in the reference list:

Gitelman, V., Doveh, E., & Hakkert, S. (2010). Designing a composite indicator for road safety. Safety science, 48(9), 1212-1224.

4) in regards to machine learning decision trees in transportation please read this article and add relevant information from it to your text and of course cite it in the reference list:

Clarke, D. D., Forsyth, R., & Wright, R. (1998). Machine learning in road accident research: decision trees describing road accidents during cross-flow turns. Ergonomics, 41(7), 1060-1079.

5) please read this article about lidar technology and add relevant information from it to your text and of course cite it in the reference list:

Kim, J., Park, B. J., Roh, C. G., & Kim, Y. (2021). Performance of mobile LiDAR in real road driving conditions. Sensors, 21(22), 7461.

6) please add as an appendix to article the questionnaire used for interview.

7) the introduction section of the article should be shorter and focus on postulating the research questions and gap being addressed in literature. Please move a lot of information which is currently in the introduction section into a new section that you will create titled: literature review.

8) please add a chi-square test of independence to investigate whether the severity of an accident depends on the levels of the various social factors described in the questionnaire.
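For reference, the chi-square test of independence requested in comment 8 compares the observed counts in a contingency table with the counts expected if the two variables were independent. The sketch below computes the Pearson statistic without continuity correction; the function names and the 2x2 table are hypothetical, not data from the study. The p-value helper is valid only for one degree of freedom, where the chi-square survival function reduces to erfc(sqrt(x / 2)).

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if grand_total == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


def p_value_one_dof(statistic):
    """Upper-tail p-value of a chi-square statistic with exactly 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical counts: rows = two levels of a social factor,
# columns = (severe crashes, non-severe crashes).
observed = [[10, 20],
            [20, 10]]

stat, dof = chi_square_independence(observed)
print(stat, dof, p_value_one_dof(stat))
```

A small p-value would suggest that severity and the factor are not independent. Library routines such as scipy.stats.chi2_contingency apply a continuity correction to 2x2 tables by default, so their result for the same table can differ from this uncorrected statistic.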

Author Response

Point 1: figure 1 should be deleted from manuscript and its information incorporated inside the text because it is not common practice to copy/paste software raw output to an academic article.

Response 1: Figure 1 has been deleted. Thanks!

Point 2: figure 6 should be redrawn by excel or word document tools because it is not common practice to copy/paste software raw output to an academic article. Please draw the tree output by yourself.

Response 2: A new figure in JPEG format has been generated with Python and inserted into the article as a graph in place of the copied and pasted output. Thanks!

Point 3: in regards to road safety assessment, please read this article and add relevant information from it to your text and of course cite it in the reference list:

Response 3: It’s done. Thanks!

Point 4: in regards to machine learning decision trees in transportation please read this article and add relevant information from it to your text and of course cite it in the reference list:

Response 4: It’s done. Thanks!

Point 5: please read this article about lidar technology and add relevant information from it to your text and of course cite it in the reference list:

Response 5: It’s done. Thanks!

Point 6: please add as an appendix to article the questionnaire used for interview.

Response 6: The questionnaire is available, but it was not possible to add it to the article because adding it corrupts the article's layout. I will send the questionnaire separately to you and to the journal so that it can be added as an appendix. Thanks!

Point 7: the introduction section of the article should be shorter and focus on postulating the research questions and gap being addressed in literature. Please move a lot of information which is currently in the introduction section into a new section that you will create titled: literature review.

Response 7: The introduction includes the literature review, at the end of which the gap is addressed. The sections were already divided according to the template provided on the journal website, which I followed.

Point 8: please add a chi-square test of independence to investigate whether the severity of an accident depends on the levels of the various social factors described in the questionnaire.

Response 8: It has been added in Python, where the model can be evaluated with the score() method. For the decision tree, this method reports the percentage accuracy of the assignments made by the classifier. It takes the input and target variables as arguments. The score value for this study indicates that classifications made by the model should be correct approximately 77% of the time. Thanks!

Author Response File: Author Response.doc

Round 2

Reviewer 2 Report

The paper analyzes a very small sample. The authors state that the number of participants was “roughly 300 people”. A sample size is not a rough estimate. The paper must say exactly how many observations were collected. However, as stated by the authors, the root of the decision tree indicates that only 184 observations were processed. The authors do not state what happened to the other 116 observations.

The authors never say how they controlled the sample of who was surveyed and do not compare the characteristics of those sampled (such as gender and age) to the general population of Duhuk. Thus, the percentage attributes of drivers involved in these crashes may not represent the overall population. The survey results and decision tree analysis are not very informative or reliable without knowing the quality of the sample. In addition, this is self-reported data, which may not represent true driving behavior. As a result, the relationships between the attributes indicated by the decision tree may be misleading.

The results provide no indication of statistical significance. The authors mention that higher entropy values (which can range from 0 to 1) indicate greater variability and less reliability. Of the 15 entropy values shown on the decision tree, 11 are above 0.5 with the root having a 0.828 entropy value. The lower entropy values are in the bottom branches of the tree with very few observations. The entropy values do not indicate statistical significance.

Author Response

Point 1: The paper analyzes a very small sample. The authors state that the number of participants was “roughly 300 people”. A sample size is not a rough estimate. The paper must say exactly how many observations were collected. However, as stated by the authors, the root of the decision tree indicates that only 184 observations were processed. The authors do not state what happened to the other 116 observations.

Response 1: Thank you so much for the note! The exact number of respondents is 586. These are the individuals who agreed to respond truthfully to the questionnaire, which was composed of questions about traffic accidents and driving habits and attitudes. The program selects only participants who have been in traffic accidents, because the study investigates traffic accidents and the severity of the crashes. Since 45.4 percent of the respondents said they had been in traffic accidents, according to figure 2a in the article, the number drops to 266. Furthermore, from this group only those who checked the "position == driver" option in the questionnaire form were chosen, which means excluding "passengers" and considering only "drivers". In figure 2a, it was also reported that 69.2% of those involved in accidents were the drivers, with the remaining people being passengers. This leaves 184 as the final number.
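The two filtering steps described in this response can be checked with a short calculation. The sketch below applies the stated percentages in sequence and rounds to whole respondents after each step; the variable names are illustrative.

```python
# Figures quoted in the response above.
total_respondents = 586
share_in_accident = 0.454   # respondents who reported a traffic accident
share_drivers = 0.692       # of those, respondents who were the driver

# Round to whole respondents after each filtering step.
in_accident = round(total_respondents * share_in_accident)
drivers = round(in_accident * share_drivers)

print(in_accident, drivers)  # 266 184
```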

Point 2: The authors never say how they controlled the sample of who was surveyed and do not compare the characteristics of those sampled (such as gender and age) to the general population of Duhuk. Thus, the percentage attributes of drivers involved in these crashes may not represent the overall population. The survey results and decision tree analysis are not very informative or reliable without knowing the quality of the sample. In addition, this is self-reported data, which may not represent true driving behavior. As a result, the relationships between the attributes indicated by the decision tree may be misleading.

Response 2: The questionnaire form was distributed among different groups of people all over the city of Duhok. People of both genders and all age groups from 18 years old upward were targeted. Figure 1a shows that 64.5% of the people interviewed were male and 35.5% were female. This reflects the fact that, in this city, there are more male drivers than female drivers. In addition, all age groups were included in the survey; the exact percentage of each group is given in figure 1b. Duhok is a small city with 450,000 residents. 586 samples were collected even though it was hard to investigate drivers’ habits and attitudes. The Decision Tree was used to investigate the severity of the crashes. This target variable is binary, “Severe Crash” or “Non-Severe”, which makes it suitable for testing with the DT method. Another study referenced in this article [24] used the decision tree method with only two hundred police case files in a city in the UK to distinguish the characteristics of accidents that resulted in injury from those that resulted in damage only.

Point 3: The results provide no indication of statistical significance. The authors mention that higher entropy values (which can range from 0 to 1) indicate greater variability and less reliability. Of the 15 entropy values shown on the decision tree, 11 are above 0.5 with the root having a 0.828 entropy value. The lower entropy values are in the bottom branches of the tree with very few observations. The entropy values do not indicate statistical significance.

Response 3: Decision trees represent a classification problem as a set of decisions based on the values of the features. Each node of the tree represents a threshold over the value of a feature and splits the training samples into two smaller sets. The decision process is repeated over all the features, growing the tree until an optimal way of splitting the samples is computed. It is normal that at the root of the tree, the entropy is high. However, it still indicates that the root is the main influential factor in this classification. Unlike other statistical modeling techniques, the Decision Tree does not need a functional form to be specified. One of the advantages of the DT is that the outcomes of the analysis are easy to understand and interpret because of the graphical nature of its results. It can easily identify the important variables of the model. The test score obtained from scikit-learn (“sklearn”) in the Python program for this study indicates that the classifications made by this model are correct 77% of the time. Thanks!
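The threshold split described above is usually chosen by how much it reduces entropy, a quantity known as information gain. The sketch below scores one candidate threshold under that criterion. The function names, the coded age groups, and the labels are illustrative assumptions, not the study's data or code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting the samples at value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # the threshold does not separate the samples
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


# Hypothetical data: coded age group per driver and crash severity (1 = severe).
age_group = [1, 1, 2, 2, 3, 3, 4, 4]
severe = [0, 0, 0, 0, 1, 1, 1, 1]

# Compare candidate thresholds; a tree builder keeps the one with the highest gain.
for t in (1, 2, 3):
    print(t, round(information_gain(age_group, severe, t), 4))
```

In this toy data the threshold 2 separates the classes perfectly, so it yields the largest gain and would become the split at that node.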

Author Response File: Author Response.docx

Reviewer 3 Report

The revisions are performed well.

Please check English and grammar.

Author Response

Point 1: Please check English and grammar.

Response 1: The English writing has been modified. Thanks!

Reviewer 4 Report

The article needs professional English proofreading before it can be published. As a simple example, line 275 should read "driving", not "deriving". Also, many sentences are poorly phrased throughout the article. As another example, in lines 235 and 236 the figure numbers mentioned in the text do not match the numbers listed below the figures.

The methodology needs to be stronger; please read this article about the chi-square test of road safety factors and see if you can implement some of its methods:

Afandizadeh, S., & Hassanpour, S. (2020). Evaluating the effect of roadway and development factors on the rural road safety risk index. Advances in Civil Engineering, 2020.

I would like to see the authors mention autonomous vehicles as a future avenue of research that has the potential to significantly reduce road accidents.

Please read these three articles about Mobileye, Tesla and Google, which are leaders in the field of driverless cars, and mention them in your text.

Yoffie, D. B. (2014). Mobileye: The future of driverless cars. Harvard Business School Case, 715-421.

Naor, M., Coman, A., & Wiznizer, A. (2021). Vertically Integrated Supply Chain of Batteries, Electric Vehicles, and Charging Infrastructure: A Review of Three Milestone Projects from Theory of Constraints Perspective. Sustainability, 13(7), 3632.

Poczter, S. L., & Jankovic, L. M. (2014). The google car: driving toward a better future?. Journal of Business Case Studies (JBCS), 10(1), 7-14.

Author Response

Point 1: The article needs professional English proofreading before it can be published. As a simple example, line 275 should read "driving", not "deriving". Also, many sentences are poorly phrased throughout the article. As another example, in lines 235 and 236 the figure numbers mentioned in the text do not match the numbers listed below the figures.

Response 1: The words have been corrected. The English writing has been modified. Thanks!

Point 2: The methodology needs to be stronger; please read this article about the chi-square test of road safety factors and see if you can implement some of its methods:

Response 2: Many thanks for the suggested article; it has many significant findings that have been added to my research. However, the methods used in the referenced article were based on clustering, while the decision tree belongs to classification. A comparison between them would be a great idea, but it would take a long time to carry out. I am taking it into consideration for my future research.

Classification problems – These arise when the target variable is discrete. Typically, the problem consists of estimating to which of a set of pre-defined classes a specific sample belongs. A classification problem can be visualized in two dimensions, where points belonging to different classes are marked with different symbols. The main motivation for using the decision tree method in this article is that the target variable under investigation is binary, namely the level of severity (“Severe Crash” and “Non-Severe”), which makes the method compatible with this study.

Unlike other statistical modeling techniques, a decision tree does not require a functional form to be specified. One advantage of the DT is that the analysis is easy to perform and its outcomes are easy to understand, owing to the graphical nature of its results. It can also easily identify the important variables of the model. The test score obtained from the scikit-learn (“sklearn”) library in Python for this study indicates that the classification made by the model is correct approximately 77% of the time. Thanks!

Point 3: I would like to see the authors mention autonomous vehicles as a future avenue of research that has the potential to significantly reduce road accidents.

Please read these three articles about Mobileye, Tesla and Google, which are leaders in the field of driverless cars, and mention them in your text.

Response 3: The idea has been mentioned and referenced in the article. Thanks!

Author Response File: Author Response.docx

Round 3

Reviewer 2 Report

The results of this analysis are not well explained and are therefore very difficult to interpret. First, the analysis includes only drivers who had crashes. I assume the severity level is that of the most severe crash, since each driver is associated with only one severity level. The top level shows that there were 184 drivers in the sample, of whom 136 had a less severe crash as their worst crash and 48 had a more severe crash as their worst crash. Drivers who were more supportive of speed detection (the branch labeled false) were mostly younger than the oldest age group and had a higher proportion of severe crashes, which makes no sense.

The right branch then splits into two overlapping (and thus conflicting) ranges of the same variable, with one branch for exceeding the speed limit <= 0.5 and the other for <= 1.5. The branch for drivers who least often exceed the speed limit (<= 0.5) had a higher proportion of severe crashes, which again makes no sense. However, the authors completely removed their discussion of age from the first version of the paper after I commented that it was misleading and contrary to world statistics. Moreover, the authors have reduced the discussion and conclusions of the results to very little. I presume this is because of the many inconsistencies.

The difficulty with this paper is that the variables analyzed by the decision tree may not be variables describing the crash victim on whom the severity of the crash is based. For example, the right branch of the decision tree’s second level for age <= 2.5 indicates that crashes are less severe for younger drivers, but the reported crash severity is often not the severity of the driver’s injuries. Crash frequency, on the other hand, is higher for younger drivers than for older drivers. Research often shows that crash severity is unrelated to a driver’s age, but is related to the age of the most severely injured person.

The decision tree is poorly labeled. The first level shows true and false for the two branches, but none of the lower branches indicates which direction corresponds to more or less severity. Finally, DT analyses often lack good measures of statistical significance, which undermines the validity of these results.
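For reference, the entropy shown at a tree node follows directly from the class counts in that node. The sketch below applies the standard Shannon entropy formula to the root-node counts quoted in this report (136 less severe and 48 more severe crashes among 184 drivers). The function name shannon_entropy is illustrative and is not taken from the paper.

```python
import math


def shannon_entropy(counts):
    """Return the Shannon entropy, in bits, of a list of class counts."""
    total = sum(counts)
    # Empty classes contribute nothing, so they are skipped to avoid log2(0).
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Root-node counts quoted in the report: 136 less severe, 48 more severe.
root_entropy = shannon_entropy([136, 48])
print(round(root_entropy, 3))  # 0.828
```

Entropy measures how mixed the classes are within a node: it is 0 for a pure node and 1 for an even two-class split. It is an impurity measure, not a test of statistical significance.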

Reviewer 4 Report

The authors addressed all of my comments.
