Deep Reinforcement Learning Car-Following Model Considering Longitudinal and Lateral Control
Abstract
1. Introduction
2. Data and Methodology
2.1. Data Preparation
2.2. Vehicle Dynamics
2.3. Error Model of Car-Following
2.4. Car-Following Model Based on DDPG
Algorithm 1 DDPG algorithm pseudo-code.

Car-Following Model Based on DDPG
Initialize the Critic network Q(s, a | ω) and the Actor network μ(s | θ) with random weights ω and θ
Initialize the target networks: ω′ ← ω, θ′ ← θ
Initialize the replay buffer R
For episode = 1, M do
    Initialize a random process N for action exploration
    Receive the initial state s_1
    For t = 1, T do
        Choose action based on the current policy and exploration noise: a_t = μ(s_t | θ) + N_t
        Perform action a_t, get the feedback reward r_t, and move to the next state s_{t+1}
        Store the state transition sequence (s_t, a_t, r_t, s_{t+1}) in the replay buffer R
        Randomly take a batch of N samples (s_i, a_i, r_i, s_{i+1}) from the replay buffer
        Calculate the target via the temporal-difference algorithm: y_i = r_i + γ Q′(s_{i+1}, μ′(s_{i+1} | θ′) | ω′)
        Update the Critic network by minimizing the loss: L = (1/N) Σ_i (y_i − Q(s_i, a_i | ω))²
        Update the Actor network via gradient ascent on the sampled policy gradient: ∇_θ J ≈ (1/N) Σ_i ∇_a Q(s_i, a | ω)|_{a=μ(s_i)} ∇_θ μ(s_i | θ)
        Update the target networks: ω′ ← τω + (1 − τ)ω′, θ′ ← τθ + (1 − τ)θ′
    End for
End for
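To make the update rules in Algorithm 1 concrete, here is a minimal PyTorch sketch of one DDPG agent. This is not the authors' implementation: the network sizes, Gaussian exploration noise (the original DDPG paper uses an Ornstein–Uhlenbeck process), and Adam optimizers are our assumptions; the learning rates, τ, γ, buffer capacity, and batch size follow the hyperparameter table below.

```python
import random
from collections import deque

import torch
import torch.nn as nn


class MLP(nn.Module):
    """Small fully connected network used for both Actor and Critic."""

    def __init__(self, in_dim, out_dim, out_act=None):
        super().__init__()
        layers = [nn.Linear(in_dim, 64), nn.ReLU(),
                  nn.Linear(64, 64), nn.ReLU(),
                  nn.Linear(64, out_dim)]
        if out_act is not None:
            layers.append(out_act)
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


class DDPG:
    def __init__(self, state_dim, action_dim, gamma=0.99, tau=1e-3):
        self.actor = MLP(state_dim, action_dim, nn.Tanh())       # mu(s | theta)
        self.critic = MLP(state_dim + action_dim, 1)             # Q(s, a | omega)
        self.actor_t = MLP(state_dim, action_dim, nn.Tanh())     # target mu'
        self.critic_t = MLP(state_dim + action_dim, 1)           # target Q'
        self.actor_t.load_state_dict(self.actor.state_dict())    # theta' <- theta
        self.critic_t.load_state_dict(self.critic.state_dict())  # omega' <- omega
        self.opt_a = torch.optim.Adam(self.actor.parameters(), lr=1e-4)
        self.opt_c = torch.optim.Adam(self.critic.parameters(), lr=1e-3)
        self.buffer = deque(maxlen=1_000_000)                    # replay buffer R
        self.gamma, self.tau = gamma, tau

    def act(self, state, noise_std=0.1):
        # a_t = mu(s_t | theta) + N_t, with Gaussian exploration noise
        with torch.no_grad():
            a = self.actor(torch.as_tensor(state, dtype=torch.float32))
        return (a + noise_std * torch.randn_like(a)).clamp(-1.0, 1.0)

    def store(self, s, a, r, s2):
        # keep transitions as float tensors; reward r is stored with shape (1,)
        self.buffer.append(tuple(torch.as_tensor(x, dtype=torch.float32).reshape(-1)
                                 for x in (s, a, [r], s2)))

    def update(self, batch_size=64):
        if len(self.buffer) < batch_size:
            return
        s, a, r, s2 = map(torch.stack, zip(*random.sample(self.buffer, batch_size)))
        # TD target: y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1} | theta') | omega')
        with torch.no_grad():
            y = r + self.gamma * self.critic_t(torch.cat([s2, self.actor_t(s2)], dim=-1))
        # Critic update: minimize the mean squared TD error
        critic_loss = nn.functional.mse_loss(self.critic(torch.cat([s, a], dim=-1)), y)
        self.opt_c.zero_grad()
        critic_loss.backward()
        self.opt_c.step()
        # Actor update: gradient ascent on Q(s, mu(s)), i.e., descent on its negative
        actor_loss = -self.critic(torch.cat([s, self.actor(s)], dim=-1)).mean()
        self.opt_a.zero_grad()
        actor_loss.backward()
        self.opt_a.step()
        # Soft target update: w' <- tau * w + (1 - tau) * w'
        for net, tgt in ((self.actor, self.actor_t), (self.critic, self.critic_t)):
            for p, p_t in zip(net.parameters(), tgt.parameters()):
                p_t.data.mul_(1.0 - self.tau).add_(self.tau * p.data)
```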
2.5. Car-Following Model Based on MADDPG
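The MADDPG pseudo-code is not reproduced here, but its key departure from DDPG (Lowe et al., 2017) is a centralized critic that scores the joint observations and actions of all agents during training, while each actor remains decentralized. A minimal illustrative sketch, with all dimensions hypothetical:

```python
import torch
import torch.nn as nn

# Illustrative sizes only (hypothetical): two vehicles, 5-D observation,
# 1-D action each.
n_agents, obs_dim, act_dim = 2, 5, 1

# Centralized critic Q_i(o_1..o_N, a_1..a_N): its input concatenates the
# observations and actions of ALL agents, unlike the single-agent Q above.
central_critic = nn.Sequential(
    nn.Linear(n_agents * (obs_dim + act_dim), 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# Each agent still has its own decentralized actor mu_i(o_i).
actors = [nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                        nn.Linear(64, act_dim), nn.Tanh())
          for _ in range(n_agents)]

obs = torch.randn(n_agents, obs_dim)                        # all observations
acts = torch.stack([pi(o) for pi, o in zip(actors, obs)])   # all actions
q_value = central_critic(torch.cat([obs.flatten(), acts.flatten()]))
```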
2.6. Evaluation Metrics for Car-Following Behavior
3. Results
3.1. Training Result
3.2. Car-Following Effect on Straight Roads
4. Discussion and Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
| Parameter | Symbol | Value |
| --- | --- | --- |
| Vehicle mass | m (kg) | 1600 |
| Moment of inertia of the vehicle around the z-axis | (kg·m²) | 2875 |
| Distance from the center of mass to the front axle | (m) | 1.4 |
| Distance from the center of mass to the rear axle | (m) | 1.6 |
| Front tire cornering stiffness | (N/rad) | −19,000 |
| Rear tire cornering stiffness | (N/rad) | −33,000 |
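These are exactly the inputs of the standard linear two-degree-of-freedom (bicycle) model of lateral vehicle dynamics. As a hedged reconstruction of the Section 2.2 dynamics (the authors' notation and sign conventions may differ), with sideslip angle β, yaw rate ψ̇, and lateral tire force F_y = Cα with C < 0, which matches the negative stiffness values in the table:

```latex
% Standard linear 2-DOF (bicycle) model; states: sideslip angle beta and
% yaw rate \dot{\psi}. Cornering stiffnesses C_f, C_r are negative, as in
% the parameter table above.
\begin{aligned}
  m v_x \left( \dot{\beta} + \dot{\psi} \right)
    &= \left( C_f + C_r \right) \beta
     + \frac{l_f C_f - l_r C_r}{v_x} \dot{\psi}
     - C_f \delta , \\
  I_z \ddot{\psi}
    &= \left( l_f C_f - l_r C_r \right) \beta
     + \frac{l_f^2 C_f + l_r^2 C_r}{v_x} \dot{\psi}
     - l_f C_f \delta ,
\end{aligned}
```

where m, I_z, l_f, l_r, C_f, and C_r are the table entries (symbols are ours), v_x is the longitudinal speed, and δ is the front-wheel steering angle.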
| Parameter | Symbol | DDPG | MADDPG |
| --- | --- | --- | --- |
| Sampling step | (s) | 0.1 | 0.1 |
| Training time per episode | (s) | 60 | 60 |
| Actor learning rate | | 1 × 10⁻⁴ | 1 × 10⁻⁴ |
| Critic learning rate | | 1 × 10⁻³ | 1 × 10⁻³ |
| Soft update rate | τ | 1 × 10⁻³ | 1 × 10⁻³ |
| Discount factor | γ | 0.99 | 0.99 |
| Replay buffer capacity | / | 1 × 10⁶ | 1 × 10⁶ |
| Batch size | / | 64 | 64 |
| Maximum training episodes | / | 3000 | 2000 |
| Termination episode return value | / | 1670 | 1670 |
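For reference, the same settings gathered into a single configuration object. This is a sketch only: the field names are our choosing, and the use of the 1670 return threshold for early stopping is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    """Hyperparameters from the table above (field names are hypothetical)."""
    dt: float = 0.1                  # sampling step (s)
    episode_time: float = 60.0       # training time per episode (s)
    actor_lr: float = 1e-4
    critic_lr: float = 1e-3
    tau: float = 1e-3                # soft update rate
    gamma: float = 0.99              # discount factor
    buffer_capacity: int = 1_000_000
    batch_size: int = 64
    max_episodes: int = 3000
    stop_return: float = 1670.0      # assumed early-stopping return threshold


ddpg_cfg = TrainConfig()
maddpg_cfg = TrainConfig(max_episodes=2000)  # MADDPG column differs only here
```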
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Qin, P.; Tan, H.; Li, H.; Wen, X. Deep Reinforcement Learning Car-Following Model Considering Longitudinal and Lateral Control. Sustainability 2022, 14, 16705. https://doi.org/10.3390/su142416705