Article
Peer-Review Record

XGBoost-DNN Mixed Model for Predicting Driver’s Estimation on the Relative Motion States during Lane-Changing Decisions: A Real Driving Study on the Highway

Sustainability 2022, 14(11), 6829; https://doi.org/10.3390/su14116829
by Chen Zhao 1, Xia Zhao 1,*, Zhao Li 1 and Qiong Zhang 2
Submission received: 11 May 2022 / Revised: 30 May 2022 / Accepted: 31 May 2022 / Published: 2 June 2022

Round 1

Reviewer 1 Report

The topic presented in the paper is interesting. The paper presents a real driving study on the highway together with a mixed model for predicting the driver's estimation of the relative motion state during lane-changing decisions. A mixed algorithm of extreme gradient boosting (XGBoost) and deep neural network (DNN) is proposed for establishing the driver's speed estimation and distance prediction models. Compared with other machine learning models, the XGBoost-DNN prediction model achieves more accurate predictions in both classification scenarios. It is worth mentioning that the XGBoost-DNN mixed model exhibits a prediction accuracy approximately two percentage points higher than that of the XGBoost model. In the two-classification scenarios, the accuracies of the XGBoost-DNN speed and distance prediction models are 91.03% and 92.46%, respectively. In the three-classification scenarios, the accuracies of the XGBoost-DNN speed and distance prediction models are 87.18% and 87.59%, respectively. In my opinion, the paper can be published after taking into account the following remarks:

  • lines 31-32: the Authors wrote: "Statistics show that lane change misjudgment is a major factor in traffic accidents". In which country? In the majority of countries this statement is not true,
  • equation (1): the meaning of "fk(xi)" is not explained. The Authors should check throughout the paper that all acronyms and variables are defined,
  • the review of the scientific literature is poorly presented and should be extended,
  • section "3. Study implementation" should not be divided into further subsections, because their content is too thin, e.g. subsection "3.1. Participants" consists of only 3 sentences. The Authors can either develop the content of each subsection or not divide section 3 into subsections. This remark applies to all similar cases in the paper,
  • section 3: what do the data gathered for analysis look like? A statistical description of the gathered data should be provided,
  • "Figure 7. Distribution of target vehicle motion and position parameters during relative distance estimation task": what are the name and the unit of axis "x"? They should be added,
  • "Figure 8. Distance Estimation Task Ridge Trace Plot": what are the names and units of axis "x" and axis "y"? They should be added too. The same remark applies to Figure 9 and all similar cases in this paper,
  • the section "6. Discussion and Conclusion" reads as a discussion. More detailed conclusions should be provided.

Author Response

Response to Reviewer 1 Comments

We sincerely thank you for the time spent making constructive remarks and useful suggestions, which have significantly raised the quality of the manuscript and enabled us to improve it. Each revision and comment suggested by Reviewer 1 was carefully considered and incorporated. Below, the reviewer's comments are answered point by point and the revisions are indicated.

Specific comments:

The topic presented in the paper is interesting. The paper presents a real driving study on the highway together with a mixed model for predicting the driver's estimation of the relative motion state during lane-changing decisions. A mixed algorithm of extreme gradient boosting (XGBoost) and deep neural network (DNN) is proposed for establishing the driver's speed estimation and distance prediction models. Compared with other machine learning models, the XGBoost-DNN prediction model achieves more accurate predictions in both classification scenarios. It is worth mentioning that the XGBoost-DNN mixed model exhibits a prediction accuracy approximately two percentage points higher than that of the XGBoost model. In the two-classification scenarios, the accuracies of the XGBoost-DNN speed and distance prediction models are 91.03% and 92.46%, respectively. In the three-classification scenarios, the accuracies of the XGBoost-DNN speed and distance prediction models are 87.18% and 87.59%, respectively. In my opinion, the paper can be published after taking into account the following remarks:

1. lines 31-32: the Authors wrote: "Statistics show that lane change misjudgment is a major factor in traffic accidents". In which country? In the majority of countries this statement is not true.

Answer: I am sorry that this part was not rigorous in the original draft and I have changed the relevant content in the revised manuscript. Thank you very much for your rigorous scientific attitude, which I think will have a lasting impact on my future writing.

2. equation (1): the meaning of "fk(xi)" is not explained. The Authors should check throughout the paper that all acronyms and variables are defined.

Answer: Thank you very much for your professional opinion. We have added the interpretation of fk(xi) in the revised version of the manuscript.
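
For orientation, and assuming that equation (1) of the manuscript follows the standard XGBoost formulation (the manuscript itself is not reproduced in this record), the prediction for sample x_i is the sum of the outputs of K regression trees, so fk(xi) would denote the score that the k-th tree assigns to sample x_i. The symbols K and \mathcal{F} below are the conventional ones and may differ from those used in the manuscript:

```latex
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F}
```

Here K is the number of trees and \mathcal{F} is the space of regression trees.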

3. the review of the scientific literature is poorly presented and should be extended.

Answer: Thank you for your careful review of our manuscript. We think your comments are very meaningful. A driver's lane change process is generally divided into three stages: information perception, lane change decision, and action implementation. Most existing studies have focused on the lane change decision and ignored the driver's information perception. To the best of our knowledge, there are not many studies on how drivers perceive lane change information, so our literature review covers relatively few articles. Given that your comments are so meaningful, we have done our best to expand the literature review in the revised manuscript, but we know that our efforts have not been enough. We will certainly follow your comments and expand the literature review when we carry out further research in the future.

4. section "3. Study implementation" should not be divided into further subsections, because their content is too thin, e.g. subsection "3.1. Participants" consists of only 3 sentences. The Authors can either develop the content of each subsection or not divide section 3 into subsections. This remark applies to all similar cases in the paper.

Answer: Thank you very much for your valuable suggestion. We believe that dividing "3. Study implementation" into subsections improves readability. Most articles divide the description of the experiment into separate subsections, such as the articles cited below. However, we think that your suggestion is very helpful, so after taking it into consideration we have made the following changes in the revised manuscript.

Firstly, the description of the section "3.1. Participants" has been expanded according to your suggestion. Secondly, we have merged sections 3.2 and 3.3 of the original manuscript into a new "3.2. Apparatus and Experimental Route".

[1] Wang, C.; Sun, Q.; Fu, R.; Li, Z.; Zhang, Q. Lane change warning threshold based on driver perception characteristics. Accident Analysis & Prevention 2018, 117, 164-174.

[2] Li, Z.; Wang, C.; Fu, R.; Sun, Q.; Zhang, H. What is the difference between perceived and actual risk of distracted driving? A field study on a real highway. PLoS One 2020, 15, e0231151.

[3] Yun, M.; Zhao, J.; Zhao, J.; Weng, X.; Yang, X. Impact of in-vehicle navigation information on lane-change behavior in urban expressway diverge segments. Accident Analysis & Prevention 2017, 106, 53-66.

5. section 3: what do the data gathered for analysis look like? A statistical description of the gathered data should be provided.

Answer: Thank you for posing such an important question. I am sorry that this part was not clear in the original manuscript. Millimeter wave radar was used to capture the relative position and relative speed between the subject vehicle and the target vehicle. The video surveillance system was used to record the motion of the target vehicle; the GPS was used to record the geographic location of the subject vehicle; the CAN capture card obtained the speed information of the subject vehicle; and the wireless button recorded the moment when the participant performed the task. Therefore, the original data collected in this study were in multiple formats. After all the data were collected, the useful data were manually selected for further analysis. Section 4 shows a statistical description of the data used in this study.

6. "Figure 7. Distribution of target vehicle motion and position parameters during relative distance estimation task": what are the name and the unit of axis "x"? They should be added.

Answer: Thank you very much for your expert comment. Figure 7 depicts the distribution of the sample data: the more data there are at a given position on the vertical coordinate, the wider the horizontal width at that position.

7. "Figure 8. Distance Estimation Task Ridge Trace Plot": what are the names and units of axis "x" and axis "y"? They should be added too. The same remark applies to Figure 9 and all similar cases in this paper.

Answer: Thank you for your careful comments, which will help us a lot in our future writing.

(1) The names of axis "x" and axis "y" have been added in the revised manuscript. There are no units since the axes "x" and "y" are expressed as numbers.

(2) The axis "x" in Figure 9 represents the weight, which has no unit; the axis "y" has no practical meaning. We have added the meaning of axis "x" in Figure 9 of the revised manuscript.

8. the section "6. Discussion and Conclusion" reads as a discussion. More detailed conclusions should be provided.

Answer: This was a very helpful suggestion for us. Based on your suggestion, we have separated the Conclusion section into Section 7 and summarized the findings of this study in “7. Conclusion and Future work”.

 

Author Response File: Author Response.docx

Reviewer 2 Report

1. The abstract is overloaded with narrative about the data, and a brief description is sufficient. The importance and significance of the research should be highlighted.

 

2. The loss function could be explained more clearly in Eq.(2).

 

3. The authors could have increased the number of participants in the experiment, and the gender ratio of participants was uneven. The authors may also use the frequency of daily driving or actual driving experience as a criterion for selecting participants, because skilled and unskilled drivers differ vastly in their judgement of road conditions. A table could be shown in Section 3.1 to describe the details of the participants.

 

4. The description of “two and three classifications” in Section 5.1.1 should be revised in a more understandable manner.

 

5. How was the value range of each parameter in Table 1 determined? Parameters such as batch and epochs are absent.

 

6. Inevitably, overfitting occurs during the training process. How did the authors avoid it? The authors may expand on this in the manuscript.

 

7. The statement like “In the underestimation case the driver's speed estimation of the target vehicle is smaller than the actual speed value of the target vehicle or the driver's estimation of the relative distance is smaller than the actual relative distance, while as in the overestimation case, the driver’s speed estimation of the target vehicle is higher that its actual speed or the driver’s estimation of the relative distance is larger than the actual distance.” is redundant.

Author Response

Response to Reviewer 2 Comments

Thank you very much for reviewing our manuscript carefully and making excellent comments, which are very helpful to improve the quality of our paper. The following is our response to your comments:

1. The abstract is overloaded with narrative about the data, and a brief description is sufficient. The importance and significance of the research should be highlighted.

Answer: Thank you for your careful review of our manuscript. We think your comments are very meaningful. We have revised the abstract section based on your suggestions. In the revised abstract we have reduced the description of the data and added the significance and importance of the study. We have attached a screenshot of the revised Abstract below.

2. The loss function could be explained more clearly in Eq.(2).

Answer: Thank you for your professional comments. We have modified the formula in "2.1. Description of XGBoost algorithm" in the revised manuscript to make it easier to understand.
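
As a point of reference, and assuming that Eq. (2) is the standard regularized objective of XGBoost (the revised formula is not reproduced in this record), the loss term and the regularization term are usually written as follows. The symbols are the conventional ones and may not match the manuscript:

```latex
\mathcal{L} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k),
\qquad
\Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^{2}
```

Here l is a differentiable loss comparing the label y_i with the prediction \hat{y}_i, T is the number of leaves of a tree, w is its vector of leaf weights, and \gamma and \lambda are regularization coefficients.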

3. The authors could have increased the number of participants in the experiment, and the gender ratio of participants was uneven. The authors may also use the frequency of daily driving or actual driving experience as a criterion for selecting participants, because skilled and unskilled drivers differ vastly in their judgement of road conditions. A table could be shown in Section 3.1 to describe the details of the participants.

Answer: (1) This was a very helpful suggestion for us. In the present study we collected 2116 valid speed estimation samples and 1542 valid speed estimation samples from 14 participants. Given the limitations of the trial conditions and the inconvenience of conducting trials on the highway, the sample size was not large and the proportion of male and female participants was uneven. We will work to overcome this problem and strive to conduct trials with larger samples in future studies. In future studies, we will also consider more specific factors to investigate the causes of driver misjudgment during lane changes and to develop more acceptable warning rules. We have presented the relevant content in Section 7, "7. Conclusion and Future work", of the revised manuscript.

(2) I am sorry that the description of driving experience was not clear in the original manuscript. In fact, in this study we classified participants into two types based on actual driving mileage and driving age and considered the effect of driving experience on speed estimation and distance estimation. We also used driving experience as an input variable in the modeling. We have provided a screenshot of the description of the input variables of the model below. We apologize again for the confusion caused.

(3) We have added a table in Section 3.1 to describe the participants' information. Thank you very much for your professional advice.

4. The description of “two and three classifications” in Section 5.1.1 should be revised in a more understandable manner.

Answer: We greatly appreciate your valuable suggestion. We have added detailed descriptions in the revised version (as shown in the screenshot below) to make it easier to understand the meaning of “two and three classifications”.

5. How was the value range of each parameter in Table 1 determined? Parameters such as batch and epochs are absent.

Answer: (1) Thank you for posing such an important question. In this paper we use a search traversal method to derive the best combination of other parameters, as shown in Table 2 of the revised manuscript.
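
The "search traversal method" mentioned above is not described further in this record. A minimal sketch, assuming it means an exhaustive grid search over candidate values, might look like the following. The parameter names, value ranges, and the scoring function `toy_score` are hypothetical stand-ins and are not taken from Table 2:

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Return (best_params, best_score) over every combination in param_grid."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid; the ranges are illustrative, not those of the manuscript.
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1, 0.3],
}


def toy_score(params):
    # Stand-in for a validation accuracy; peaks at max_depth=5, learning_rate=0.1.
    return -abs(params["max_depth"] - 5) - abs(params["learning_rate"] - 0.1)


best_params, best_score = grid_search(param_grid, toy_score)
print(best_params)
```

In practice `score_fn` would train the model with the given parameters and return a validation metric, which makes the cost grow with the product of the grid sizes.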

(2) The epoch was not set separately in this study. In the DNN module, the parameter tol is the condition for the model to stop training, and we set tol to 0.001, which means the model stops training when the loss value is less than or equal to 0.001. The parameter Batchsize affects the speed of model convergence, and the smaller the Batchsize, the less likely the model converges. We set the Batchsize to Auto and the Number of iterations to 200. We have added the relevant information to the revised manuscript.
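
The stopping rule described in (2) can be sketched as below, assuming a plain gradient descent loop on a toy quadratic loss. The function `train_until_tol` and the loss are hypothetical and only illustrate how tol and the iteration limit interact; they are not the authors' DNN:

```python
def train_until_tol(w0, lr=0.1, tol=0.001, max_iter=200):
    """Gradient descent on the toy loss (w - 3)^2.

    Stops when the loss is <= tol (the rule described in the response)
    or after max_iter iterations, whichever comes first.
    Returns (weight, loss, iterations used).
    """
    w = w0
    for iteration in range(1, max_iter + 1):
        loss = (w - 3.0) ** 2
        if loss <= tol:
            return w, loss, iteration
        grad = 2.0 * (w - 3.0)  # derivative of the toy loss
        w -= lr * grad
    return w, (w - 3.0) ** 2, max_iter


w, loss, n_iter = train_until_tol(w0=0.0)
print(f"stopped after {n_iter} iterations with loss {loss:.6f}")
```

Note that some libraries define tol differently. In scikit-learn's MLPClassifier, for instance, tol is a threshold on the improvement of the loss between iterations, not on the loss value itself, so the exact behaviour depends on the software used, which this record does not name.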

6. Inevitably, overfitting occurs during the training process. How did the authors avoid it? The authors may expand on this in the manuscript.

Answer: Thank you very much for your expert comment. To avoid overfitting, we adjust the parameters subsample and colsample_bytree in the XGBoost module, which represent, respectively, the fraction of the training samples and the fraction of the features used in training each tree, with typical values of 0.5-1. By adjusting these two parameters, overfitting of the model can be prevented. We have added the relevant information to the revised manuscript.
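
To illustrate what the two ratios mean, the sketch below draws the rows and columns that a single tree would see. It is a simplified illustration under the assumption of uniform sampling without replacement; the helper `sample_rows_and_columns` is hypothetical and does not reproduce XGBoost's internal sampling:

```python
import random


def sample_rows_and_columns(n_rows, n_cols, subsample, colsample_bytree, rng):
    """Pick the row and column indices one tree would be trained on."""
    n_row_pick = max(1, int(subsample * n_rows))
    n_col_pick = max(1, int(colsample_bytree * n_cols))
    rows = sorted(rng.sample(range(n_rows), n_row_pick))
    cols = sorted(rng.sample(range(n_cols), n_col_pick))
    return rows, cols


rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows, cols = sample_rows_and_columns(
    n_rows=10, n_cols=4, subsample=0.8, colsample_bytree=0.5, rng=rng
)
print(len(rows), "of 10 rows and", len(cols), "of 4 features used for this tree")
```

Because every tree sees a different random subset, individual trees are less able to memorize the full training set, which is the intuition behind using these ratios against overfitting.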

7. The statement like “In the underestimation case the driver's speed estimation of the target vehicle is smaller than the actual speed value of the target vehicle or the driver's estimation of the relative distance is smaller than the actual relative distance, while as in the overestimation case, the driver’s speed estimation of the target vehicle is higher that its actual speed or the driver’s estimation of the relative distance is larger than the actual distance.” is redundant.

Answer: Thank you for your wonderful comments, which will help us to improve the conciseness of the paper. We have revised the redundant statements in the revised manuscript.
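
The underestimation and overestimation cases quoted in comment 7 reduce to the sign of the estimation error. A minimal sketch follows; the function name, the optional tolerance, and the "accurate" label are hypothetical, since this record does not state how the third class of the three-classification scenario is defined:

```python
def label_estimate(estimated, actual, tolerance=None):
    """Label a driver's estimate relative to the measured value.

    With tolerance=None, return "under" or "over" (two classes).
    With a tolerance, estimates within +/- tolerance of the actual value
    are labelled "accurate" (a hypothetical third class).
    Exact ties fall into "over" in the two-class case, an arbitrary choice.
    """
    error = estimated - actual
    if tolerance is not None and abs(error) <= tolerance:
        return "accurate"
    return "under" if error < 0 else "over"


print(label_estimate(80, 90))               # estimate below the actual value
print(label_estimate(100, 90))              # estimate above the actual value
print(label_estimate(88, 90, tolerance=5))  # within the hypothetical band
```

The same rule applies to both speed and relative distance, so one function covers the two estimation tasks.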

For more response details, please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

The authors carry out a study of a model that estimates the speed and distance of one driver from others, using the XGBoost-DNN algorithm.

 

a)

2.1. Description of XGBoost algorithm

2.3. XGBoost-DNN mixed algorithm model

 

On another page

 

b)

6. Discussion and Conclusion

should be in two different sections.

 

Author Response

Thank you for taking the time to review our paper and make rigorous comments. With your help, the quality of our paper has been greatly improved. The following is our point-by-point response to your comments.

Comments and Suggestions for Authors

The authors carry out a study of a model that estimates the speed and distance of one driver from others, using the XGBoost-DNN algorithm.

a)

2.1. Description of XGBoost algorithm

2.3. XGBoost-DNN mixed algorithm model

On another page

Answer: Thank you for your valuable suggestions. We have moved the titles of the 2.1 and 2.3 sections to the next page in the revised version of the manuscript. We will pay extra attention to this point in future writing. Thank you again for your suggestions, which will have a profound impact on our future writing.

b)

6. Discussion and Conclusion should be in two different sections.

Answer: Thank you very much for your expert comment. In the revised manuscript, we have split "6. Discussion and Conclusion" in the original manuscript into two sections.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The authors explore the factors that affect driver's speed estimation and distance estimation and build a corresponding model based on the XGBoost-DNN hybrid algorithm. This is a very interesting topic that can provide a theoretical basis for more user-friendly lane change warning rule making. The logical framework of the paper is relatively clear and informative, however, the following issues need to be considered.

 

1. I suggest unifying "test platform" and "experimental platform" in "3. Study implementation" to improve the readability of the paper.

2. I suggest adding a citation for “There are many methods to set or validate the ridge regression parameter k, the most used is the ridge trace plot which shows the coefficients as a function of k.”

3. In addition to the parameters listed in Table 2, there are other parameters in the XGBoost and DNN modules. How did you set the other parameters?

4. Why was the ROC curve used in the two-classification model evaluation, but not in the three-classification model evaluation?

5. There are minor spelling errors or space problems in the text, please check carefully.

Author Response

We sincerely thank you for the time spent making constructive remarks and useful suggestions, which have significantly raised the quality of the manuscript and enabled us to improve it. Below, the reviewer's comments are answered point by point and the revisions are indicated.

 

Comments and Suggestions for Authors

The authors explore the factors that affect driver's speed estimation and distance estimation and build a corresponding model based on the XGBoost-DNN hybrid algorithm. This is a very interesting topic that can provide a theoretical basis for more user-friendly lane change warning rule making. The logical framework of the paper is relatively clear and informative, however, the following issues need to be considered.

1. I suggest unifying "test platform" and "experimental platform" in "3. Study implementation" to improve the readability of the paper.

Answer: Thank you for your careful review of our manuscript. In the revised version we have unified them to "test platform".

2. I suggest adding a citation for “There are many methods to set or validate the ridge regression parameter k, the most used is the ridge trace plot which shows the coefficients as a function of k.”

Answer: Thank you very much for your professional opinion. We have added a citation for “There are many methods to set or validate the ridge regression parameter k, the most used is the ridge trace plot which shows the coefficients as a function of k.”
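
What a ridge trace shows can be sketched numerically: the ridge coefficients are recomputed for a range of k and tabulated (a plot would draw each coefficient against k). The example below assumes two features, no intercept, and the closed-form solution of (X^T X + kI) beta = X^T y. The data and the helper `ridge_coefficients` are hypothetical and unrelated to the study's dataset:

```python
def ridge_coefficients(X, y, k):
    """Solve (X^T X + k I) beta = X^T y for two features (no intercept)."""
    a = sum(r[0] * r[0] for r in X) + k
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + k
    g0 = sum(r[0] * t for r, t in zip(X, y))
    g1 = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b  # determinant of the 2x2 system
    return ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)


# Toy data with two strongly correlated features (illustrative only).
X = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9)]
y = [2.0, 4.1, 6.0, 8.1]

# The "trace": coefficients as a function of the ridge parameter k.
trace = [(k, ridge_coefficients(X, y, k)) for k in (0.0, 0.1, 1.0, 10.0, 100.0)]
for k, (b0, b1) in trace:
    print(f"k={k:<6} beta0={b0:.3f} beta1={b1:.3f}")
```

A common reading of such a trace is to choose the smallest k beyond which the coefficients stop changing rapidly, although the criterion used in the manuscript is not stated in this record.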

3. In addition to the parameters listed in Table 2, there are other parameters in the XGBoost and DNN modules, I wonder how you set the other parameters?

Answer: Thank you for your careful comments. The main parameters affecting the model performance are given in Table 2. The other parameters have little effect on the model performance, so their default values were used.

4. Why was the ROC curve used in the two-classification model evaluation, but not in the three-classification model evaluation?

Answer: Thank you for posing such an important question. The ROC curve shows the relationship between TP and FP, and is more applicable to two-classification problems. For multi-classification problems, ROC curves can be drawn if the problem is converted to a two-classification problem, but the practical significance is limited. Therefore, in this study we used ROC curves to evaluate model performance in the two-classification problem but not in the three-classification problem.
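
The conversion mentioned in this answer can be sketched as a one-vs-rest scheme: each class in turn is treated as positive and the others as negative, giving one ROC curve per class. The code below is a minimal illustration in plain Python. It plots rates (true positive rate against false positive rate), the labels, scores, and class names "A", "B", "C" are hypothetical, and it is not the authors' evaluation code:

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points, sweeping the threshold from high to low.

    labels: 1 for the positive class, 0 otherwise. Assumes distinct scores.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


def one_vs_rest_auc(true_classes, class_scores, classes):
    """One AUC per class, treating that class as positive and the rest as negative."""
    result = {}
    for c in classes:
        labels = [1 if t == c else 0 for t in true_classes]
        result[c] = auc(roc_points(labels, class_scores[c]))
    return result


# Two-class example (hypothetical scores).
binary_auc = auc(roc_points([1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.2]))

# Three-class example: one score list per class (hypothetical).
true_classes = ["A", "B", "C", "A", "B", "C"]
class_scores = {
    "A": [0.7, 0.1, 0.2, 0.6, 0.3, 0.25],
    "B": [0.1, 0.25, 0.3, 0.15, 0.6, 0.2],
    "C": [0.2, 0.35, 0.5, 0.3, 0.1, 0.6],
}
per_class_auc = one_vs_rest_auc(true_classes, class_scores, ["A", "B", "C"])
print(binary_auc, per_class_auc)
```

With three classes this yields three separate curves, which have to be read or averaged together. That is one reason a single accuracy figure is often reported for multi-class models instead.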

5. There are minor spelling errors or space problems in the text, please check carefully.

Answer: Thank you very much for your expert comment. We have rechecked the full text and removed the extra spaces.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

