Article
Peer-Review Record

Deep Reinforcement Learning Car-Following Model Considering Longitudinal and Lateral Control

Sustainability 2022, 14(24), 16705; https://doi.org/10.3390/su142416705
by Pinpin Qin 1,*, Hongyun Tan 1, Hao Li 1 and Xuguang Wen 2
Submission received: 9 November 2022 / Revised: 4 December 2022 / Accepted: 8 December 2022 / Published: 13 December 2022

Round 1

Reviewer 1 Report

In this article, a car-following model with combined longitudinal and lateral control is constructed based on a three-degrees-of-freedom vehicle dynamics model and a reinforcement learning method. The proposed method helps improve the consideration of lateral control, but there are still some suggestions to address before publication.

1. The contribution should be clarified in the abstract and introduction so that readers can grasp the innovation of the article.

2. The application of a neural network is confusing; is it necessary or irreplaceable? A neural network usually lacks physical explanatory ability.

3. Equation (7) and its associated parameters should be supported by a reference or by experiment.

4. In Fig. 10, the speed of the following vehicle during deceleration is always larger than that of the lead vehicle. Will the closing distance become less than the safety distance?

Author Response

In this article, a car-following model with combined longitudinal and lateral control is constructed based on a three-degrees-of-freedom vehicle dynamics model and a reinforcement learning method. The proposed method helps improve the consideration of lateral control, but there are still some suggestions to address before publication.

Point 1: The contribution should be clarified in the abstract and introduction so that readers can grasp the innovation of the article.

Response 1: We are grateful for your kind suggestion. To help readers understand the innovation of this paper, we have stated the main contributions in lines 105-121 of the article. The contents are as follows:

Research on existing CF models considers only longitudinal control and is limited by defects of the control algorithms used. This paper considers the joint control of longitudinal acceleration and the vehicle's lateral steering angle, establishes the deep deterministic policy gradient (DDPG) car-following model and the multi-agent deep deterministic policy gradient (MADDPG) car-following model, and uses experimental data to train and validate the proposed models. The main contributions of this paper are as follows: (1) the reward function is designed based on vehicle dynamics theory, with multiple constraints such as CF safety, comfort, traffic efficiency, and lateral stability imposed on it; (2) based on deep reinforcement learning theory and the designed reward function, a multi-objective optimization CF model is established; (3) the validity of the CF model established in this paper is verified on an open CF data set, which can provide a reference for the subsequent development of adaptive cruise control systems considering lateral stability.
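For illustration, a minimal sketch of how such a multi-constraint reward might be composed. The weights, variable names, and functional forms below are illustrative assumptions, not the paper's Equation (7):

```python
# Illustrative weights -- assumptions, not the values used in the paper.
W_SAFETY, W_COMFORT, W_EFFICIENCY, W_LATERAL = 1.0, 0.2, 0.5, 0.3

def cf_reward(gap, safe_gap, jerk, thw, desired_thw, yaw_rate):
    """Hypothetical multi-objective car-following reward.

    gap         : actual spacing to the lead vehicle [m]
    safe_gap    : minimum safe spacing [m]
    jerk        : longitudinal jerk of the ego vehicle [m/s^3]
    thw         : time headway [s]
    desired_thw : target time headway [s]
    yaw_rate    : yaw rate of the ego vehicle [rad/s]
    """
    r_safety = -10.0 if gap < safe_gap else 0.0   # penalise unsafe spacing
    r_comfort = -abs(jerk)                        # penalise harsh jerk
    r_efficiency = -abs(thw - desired_thw)        # track the desired headway
    r_lateral = -abs(yaw_rate)                    # discourage lateral instability
    return (W_SAFETY * r_safety + W_COMFORT * r_comfort
            + W_EFFICIENCY * r_efficiency + W_LATERAL * r_lateral)
```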

Point 2: The application of a neural network is confusing; is it necessary or irreplaceable? A neural network usually lacks physical explanatory ability.

Response 2: The neural network is necessary. The car-following model in this paper is based on deep reinforcement learning, using two algorithms: the deep deterministic policy gradient (DDPG) and the multi-agent deep deterministic policy gradient (MADDPG). Since the response of the following vehicle is continuous rather than discrete, a Q table (the table representing the agent's action-state values) would face the curse of dimensionality, so a Q-value function (a value function over the agent's state-action space) must be used instead. Because this continuous Q function is difficult to solve analytically, a neural network is used to approximate it (a minimal illustrative sketch follows the references below). Although neural networks are not as interpretable as physical formulas, their underlying logic is still mathematical. Some relevant references [1-2] are as follows:

[1] Lillicrap, T. P.; Hunt, J. J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; ...; Wierstra, D. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015. https://doi.org/10.48550/arXiv.1509.02971
[2] Molnar, C. Interpretable Machine Learning. Lulu.com, 2020.
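A minimal sketch of the idea described above, assuming a PyTorch-style critic that approximates the continuous Q(s, a) function; the layer sizes and state/action dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Approximates the continuous Q(s, a) function, replacing the tabular
    Q representation that would suffer from the curse of dimensionality."""

    def __init__(self, state_dim=4, action_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar Q-value
        )

    def forward(self, state, action):
        # State and action are concatenated and mapped to a single Q-value.
        return self.net(torch.cat([state, action], dim=-1))

# Example: one (state, action) pair, e.g. (gap, relative speed, ego speed, yaw rate)
# and (acceleration, steering angle) -- the dimensions are assumptions.
q_value = Critic()(torch.zeros(1, 4), torch.zeros(1, 2))
```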

Point 3: Equation (7) and its associated parameters should be supported by a reference or by experiment.

Response 3: Equation (7) is the reward function; it was determined by following the design principles of reference [3] and through our repeated training and experiments. In the original text, we have added the reference for the design criteria; see lines 222-223.

[3] Sutton, R. S.; Barto, A. G. Reinforcement Learning: An Introduction. MIT Press, 2018.

Point 4: In Fig. 10, the speed of the following vehicle during deceleration is always larger than that of the lead vehicle. Will the closing distance become less than the safety distance?

Response 4: In the car-following process, the following car must receive the acceleration or deceleration signal of the leading car before responding, so in the deceleration phase the speed of the following car is greater than that of the leading car. To show intuitively whether the distance between the following car and the leading car in the deceleration phase is greater than the safety distance, we have added the safety-distance curve (Equation (2)) to Figure 5(e) and Figure 6(e). It can be seen that the distance between the leading and following cars in the deceleration phase is greater than the safety distance for both the DDPG and MADDPG car-following models, which meets the safety requirements.
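For illustration only, a sketch of such a safety-distance check; the formula below is a generic braking-based safe distance, stated as an assumption rather than the paper's Equation (2):

```python
def safe_distance(v_follow, v_lead, reaction_time=1.0, max_decel=6.0, standstill_gap=2.0):
    """Generic safe-following-distance estimate (assumed form, not the paper's Eq. (2)).

    v_follow, v_lead : speeds of the following / leading vehicle [m/s]
    reaction_time    : assumed controller delay [s]
    max_decel        : assumed maximum braking deceleration [m/s^2]
    standstill_gap   : assumed minimum gap at standstill [m]
    """
    braking_term = max(v_follow**2 - v_lead**2, 0.0) / (2.0 * max_decel)
    return standstill_gap + v_follow * reaction_time + braking_term

def is_safe(gap, v_follow, v_lead):
    """True if the current gap exceeds the estimated safe distance."""
    return gap > safe_distance(v_follow, v_lead)
```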

Author Response File: Author Response.docx

Reviewer 2 Report

This is a very well done paper. An English grammar edit would help the reader significantly. The use of machine learning was described very well; however, its use for a problem like this might be questioned for the following reason. Our experience with machine learning models tends to indicate that they may respond in unusual ways to situations that they have not experienced during the learning process. The development of car-following models that include lateral or transverse movement is indeed a great step forward.

Author Response

#Reviewer2:

Point: This is a very well done paper. An English grammar edit would help the reader significantly. The use of machine learning was described very well; however, its use for a problem like this might be questioned for the following reason. Our experience with machine learning models tends to indicate that they may respond in unusual ways to situations that they have not experienced during the learning process. The development of car-following models that include lateral or transverse movement is indeed a great step forward.

Response: Thank you for your affirmation of the research results of this paper. The training of reinforcement learning requires a large number of samples. The driving scenes covered in this paper are car-following scenes on the expressway: 100 CF segments were selected from the OpenACC database, including 50 straight-road and 50 curved-road trajectories. The average duration of each CF segment is 60 s, and the cumulative duration is 6000 s.
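A minimal sketch of summarizing such a segment set; the file name and column names ("road_type", "duration_s") are hypothetical, not the actual OpenACC schema:

```python
import pandas as pd

# Hypothetical table of car-following segments extracted from the OpenACC database.
segments = pd.read_csv("openacc_cf_segments.csv")

straight = segments[segments["road_type"] == "straight"]
curved = segments[segments["road_type"] == "curved"]

print(len(straight), len(curved))      # e.g. 50 straight and 50 curved segments
print(segments["duration_s"].mean())   # average segment duration, ~60 s
print(segments["duration_s"].sum())    # cumulative duration, ~6000 s
```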

Author Response File: Author Response.docx

Reviewer 3 Report

1. P2, Line 51 (“ation”) and Line 119 (“in the third section 3”): please correct these errors.

2. It is recommended to redraw Figure 4(a). The DDPG model has too many episodes, and the convergence process of the reward function cannot be visualized.

3. Judging by the THW results presented in Lines 331-338, it seems that the MPC has better control. In addition, the empirical reference value of the THW safety threshold is 1.2 s, which is used to evaluate the THW results of the MPC. However, it can be seen from Figure 5(d) on P11 that the control effect of the MPC model is closer to the distribution of the human-driver data: the MPC can capture the distribution of THW between 1.0 s and 1.2 s.

4. The results of the experiments in Lines 338-341 and Lines 382-385 are the same. However, these two paragraphs describe different scenarios, and it is almost impossible to obtain exactly the same results; the authors need to address this.

Author Response

#Reviewer 3:

Point 1: P2, Line 51 (“ation”) and Line 119 (“in the third section 3”): please correct these errors.

Response 1: Thank you very much for pointing out the errors in the article; they have been corrected in the original text. To avoid such mistakes, we have rechecked the full text.

Point 2: It is recommended to redraw Figure 4(a). The DDPG model has too many episodes, and the convergence process of the reward function cannot be visualized.

Response 2: Thank you for your comments. Figure 4(a) has been redrawn. The number of episodes is determined according to the training results: when the episode count reaches 2090, the model meets the convergence requirements and training is terminated.
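A minimal sketch of terminating training once the episode reward has converged; the window size, tolerance, and the `run_episode` placeholder are illustrative assumptions, not the paper's actual training code:

```python
from collections import deque

def train_until_converged(run_episode, max_episodes=5000, window=100, tol=1.0):
    """Stop training when the moving-average episode reward stabilises.

    run_episode : callable returning the cumulative reward of one episode
                  (hypothetical placeholder for one DDPG training episode).
    window      : number of recent episodes used for the moving average.
    tol         : maximum change of the moving average allowed to declare convergence.
    """
    recent = deque(maxlen=window)
    prev_avg = None
    for episode in range(1, max_episodes + 1):
        recent.append(run_episode())
        if len(recent) == window:
            avg = sum(recent) / window
            if prev_avg is not None and abs(avg - prev_avg) < tol:
                return episode  # training terminates once the reward has converged
            prev_avg = avg
    return max_episodes
```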

Point 3: Judging by the THW results presented in Lines 331-338, it seems that the MPC has better control. In addition, the empirical reference value of the THW safety threshold is 1.2 s, which is used to evaluate the THW results of the MPC. However, it can be seen from Figure 5(d) on P11 that the control effect of the MPC model is closer to the distribution of the human-driver data: the MPC can capture the distribution of THW between 1.0 s and 1.2 s.

Response 3: We have redrawn the car-following distance results (Fig. 5(e) and Fig. 6(e)), adding the safe-distance curve. Judging from the lateral displacement and yaw angle changes, the MPC model has the best lateral control. Still, in terms of the TTC distribution and car-following distance, its longitudinal control performance is worse than that of the DDPG and MADDPG models, mainly in safety and comfort. Judging from THW, although the traffic efficiency of the MPC model is high, it sacrifices car-following safety, which does not conform to the essential principle that safety comes first in the car-following process. Our goal should be to ensure ride comfort and traffic efficiency on the premise of safety. In addition, we have described some comparisons between the MPC, DDPG, and MADDPG models in the original text, as shown in lines 431-443.
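A minimal sketch of how THW and TTC might be computed and checked against thresholds; the 1.2 s THW value comes from the review comment, while the TTC threshold is an illustrative assumption:

```python
def time_headway(gap, v_follow):
    """THW = spacing divided by the following vehicle's speed [s]."""
    return float("inf") if v_follow <= 0 else gap / v_follow

def time_to_collision(gap, v_follow, v_lead):
    """TTC is finite only when the follower is closing in on the leader [s]."""
    closing_speed = v_follow - v_lead
    return float("inf") if closing_speed <= 0 else gap / closing_speed

THW_SAFE = 1.2  # s, empirical safety threshold cited in the review
TTC_SAFE = 3.0  # s, illustrative assumption

def headway_is_safe(gap, v_follow, v_lead):
    """True if both THW and TTC exceed their safety thresholds."""
    return (time_headway(gap, v_follow) >= THW_SAFE
            and time_to_collision(gap, v_follow, v_lead) >= TTC_SAFE)
```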

Point 4: The results of the experiments in Lines 338-341 and Lines 382-385 are the same. However, these two paragraphs describe different scenarios, and it is almost impossible to obtain exactly the same results; the authors need to address this.

Response 4: Thank you very much for pointing out this error. The results are indeed different under different scenarios; the mistake in the article was caused by carelessness during writing. The data here should be consistent with the data in the abstract, and the original text has been revised.

Author Response File: Author Response.docx

Round 2

Reviewer 3 Report

It has been improved and can be accepted in this form.
