Dynamic Path Planning for Vehicles Based on Causal State-Masking Deep Reinforcement Learning
Abstract
1. Introduction
- Enhanced model robustness and transferability: By combining the causal consistency loss term and dynamic state masks, the policy network is compelled to rely solely on state features with genuine causal relationships for decision making. This addresses the issue of DRL models capturing spurious correlations during training.
- Improved convergence and generalization of DRL in dynamic path planning: introducing mask encodings optimized through backpropagation enables the model to adaptively adjust the causal weight of each state dimension (a minimal sketch of such a learnable mask follows this list).
- Algorithm validation: the effectiveness of CSM-TD3 was validated through multiple simulation experiments. The results show that the method converges faster and generalizes better than baseline DRL algorithms in dynamic path planning.
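As a minimal sketch of the second contribution, a dynamic state mask can be parameterized as one learnable logit per state dimension and trained by backpropagation alongside the policy. The class name `StateMask` and the sigmoid gating below are our illustrative assumptions, not the paper's exact encoding.

```python
import torch
import torch.nn as nn

class StateMask(nn.Module):
    """Elementwise learnable gate over state features (illustrative sketch)."""

    def __init__(self, state_dim: int):
        super().__init__()
        # One learnable logit per state dimension, optimized by backprop
        # together with the actor loss, so that causally relevant features
        # stay open and spurious features are driven toward zero.
        self.logits = nn.Parameter(torch.zeros(state_dim))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        gate = torch.sigmoid(self.logits)  # gate values in (0, 1)
        return state * gate                # masked state
```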
2. Related Work
2.1. Traditional Path-Planning Methods
2.2. RL-Based Path-Planning Methods
3. Problem Formulation
3.1. Dynamic Path-Planning Model
3.2. Formalization of the Markov Decision Process
3.3. Causal Relationship Model
- (1) Relationships between state features: $s_i \to s_j$, indicating that feature $s_i$ has a direct causal influence on feature $s_j$.
- (2) Direct influence of state features on control inputs: $s_i \to a_k$, indicating that feature $s_i$ has a direct causal influence on control input $a_k$.
- (3) Influence of control inputs on the next-state features: $a_k \to s_j^{(t+1)}$, indicating that control input $a_k$ influences the state feature $s_j$ at the next time step.
- (4) States and control inputs jointly determine rewards: $r_t = R(s_t, a_t)$.
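Taken together, relations (1)-(4) define a directed causal graph over state features, control inputs, and the reward. The snippet below is one possible encoding of such a graph; the concrete node names (`s_1`, `a_1`, etc.) are assumptions for illustration only.

```python
# Illustrative encoding of the causal graph from relations (1)-(4).
causal_edges = {
    "s_1": ["s_2", "a_1", "r"],   # (1) s_i -> s_j, (2) s_i -> a_k, (4) s_t -> r_t
    "a_1": ["s_2_next", "r"],     # (3) a_k -> s_j at t+1, (4) a_t -> r_t
}

def has_direct_cause(graph: dict, src: str, dst: str) -> bool:
    """True if src has a direct causal edge to dst in the graph."""
    return dst in graph.get(src, [])
```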
4. Causal State-Masking TD3 Algorithm
4.1. Critic Networks
4.2. Actor Networks
- (1) Intervention feature selection. For each training sample, randomly select a causal feature $s_i$. This simulates possible causal disturbances in real environments, forcing the model to learn robust policies under different causal contexts.
- (2) Intervention value generation. Apply a perturbation to the selected causal feature $s_i$ to generate an intervention value $\tilde{s}_i$.
- (3) Intervention state construction. Generate the intervened state $\tilde{s}$ by replacing the selected causal feature with the intervention value $\tilde{s}_i$; the new state reflects the environment when that feature is disturbed.
- (4) Intervention state masking. Apply mask processing to the intervened state to obtain $\tilde{s}_m$. Through the state mask, the intervened state features are adjusted so that the model focuses on key causal features under intervention conditions.
- (5) Policy network output calculation. Employ the policy network to generate actions $a$ and $\tilde{a}$ on the masked original state $s_m$ and the masked intervened state $\tilde{s}_m$, respectively. Comparing the actions generated under the original and intervened states lets the model learn the true impact of causal feature changes on decisions.
- (6) Causal consistency loss calculation. Compute the causal consistency loss $L_{\mathrm{causal}}$ between $a$ and $\tilde{a}$. This loss term encourages the policy network to keep its action outputs consistent under causal feature interventions, reducing dependence on non-causal features and enhancing the model's causal inference capabilities. (A minimal sketch of steps (1)-(6) follows this list.)
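The Python sketch below walks through steps (1)-(6) under stated assumptions: the perturbation in step (2) is taken to be Gaussian with scale `sigma`, and the loss in step (6) is taken as the mean squared difference between the paired actions. Both choices are ours, since the extracted text does not preserve the exact formulas.

```python
import torch

def causal_consistency_loss(actor, mask, states, sigma=0.1):
    """Steps (1)-(6) for a batch of states (illustrative sketch).

    actor:  policy network mapping masked states to actions
    mask:   state-masking module (e.g., the StateMask sketch shown earlier)
    states: original states, shape (batch_size, state_dim)
    sigma:  assumed scale of the Gaussian perturbation in step (2)
    """
    batch_size, state_dim = states.shape
    rows = torch.arange(batch_size)

    # (1) Intervention feature selection: one random feature per sample.
    idx = torch.randint(state_dim, (batch_size,))

    # (2)-(3) Intervention value generation and state construction:
    # perturb the selected feature and splice it into a copy of the state.
    intervened = states.clone()
    intervened[rows, idx] = states[rows, idx] + sigma * torch.randn(batch_size)

    # (4) Intervention state masking.
    masked_orig = mask(states)
    masked_int = mask(intervened)

    # (5) Policy outputs on the original and intervened masked states.
    a_orig = actor(masked_orig)
    a_int = actor(masked_int)

    # (6) Causal consistency loss: penalize divergence between the actions.
    return torch.mean((a_orig - a_int) ** 2)
```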
4.3. CSM-TD3 Architecture
Algorithm 1: CSM-TD3
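The pseudocode box for Algorithm 1 did not survive extraction, so the following is a hedged reconstruction of one CSM-TD3 update step from Sections 4.1-4.3, not the authors' code. It combines standard TD3 machinery (clipped double-Q targets, target policy smoothing, delayed actor updates, Polyak averaging) with the state mask and the causal consistency loss sketched above. The containers `nets` and `opts` and the name `causal_weight` are assumptions, as is where the mask enters the critics.

```python
import torch

def csm_td3_update(step, batch, nets, opts, gamma=0.99, tau=0.05,
                   policy_noise=0.2, noise_clip=0.5, policy_delay=2,
                   causal_weight=0.1):
    s, a, r, s2, done = batch  # tensors sampled from the replay buffer

    with torch.no_grad():
        # Target policy smoothing: clipped noise on the target action.
        noise = (policy_noise * torch.randn_like(a)).clamp(-noise_clip, noise_clip)
        a2 = (nets.actor_target(nets.mask(s2)) + noise).clamp(-1.0, 1.0)
        # Clipped double-Q target reduces overestimation bias.
        q_next = torch.min(nets.critic1_target(s2, a2), nets.critic2_target(s2, a2))
        y = r + gamma * (1.0 - done) * q_next

    # Critic regression toward the shared target y.
    critic_loss = ((nets.critic1(s, a) - y) ** 2).mean() \
                + ((nets.critic2(s, a) - y) ** 2).mean()
    opts.critic.zero_grad()
    critic_loss.backward()
    opts.critic.step()

    # Delayed policy update, augmented with the causal consistency term.
    if step % policy_delay == 0:
        actor_loss = -nets.critic1(s, nets.actor(nets.mask(s))).mean() \
                   + causal_weight * causal_consistency_loss(nets.actor, nets.mask, s)
        opts.actor.zero_grad()
        actor_loss.backward()
        opts.actor.step()

        # Polyak (soft) update of all target networks.
        pairs = [(nets.actor, nets.actor_target),
                 (nets.critic1, nets.critic1_target),
                 (nets.critic2, nets.critic2_target)]
        for net, target in pairs:
            for p, pt in zip(net.parameters(), target.parameters()):
                pt.data.mul_(1.0 - tau).add_(tau * p.data)
```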
5. Comparison of Simulation Results
5.1. Convergence Analyses
5.2. Generalization Analyses
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
| DRL Algorithm | CSM-TD3 | TD3 | PPO | DQN |
|---|---|---|---|---|
| Hidden layer dimension | 256 | 256 | 256 | 256 |
| Batch size | 256 | 256 | 256 | 256 |
| Discount factor | 0.99 | 0.99 | 0.99 | 0.99 |
| Soft update coefficient | 0.05 | 0.05 | × | × |
| Policy noise | 0.2 | 0.2 | × | × |
| Noise clipping range | 0.5 | 0.5 | × | × |
| Policy update frequency | 2 | 2 | 2 | × |
| Priority exponent | 0.6 | 0.6 | 0.6 | 0.6 |
| Learning rate | 1 × 10⁻⁴ | 1 × 10⁻⁴ | 1 × 10⁻⁴ | 1 × 10⁻⁴ |
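For reproducibility, the shared hyperparameters above map onto a configuration object such as the following sketch; the field names are ours.

```python
from dataclasses import dataclass

@dataclass
class CSMTD3Config:
    """Shared hyperparameters from the table above; field names are ours."""
    hidden_dim: int = 256      # hidden layer dimension
    batch_size: int = 256
    gamma: float = 0.99        # discount factor
    tau: float = 0.05          # soft update coefficient
    policy_noise: float = 0.2
    noise_clip: float = 0.5
    policy_delay: int = 2      # policy update frequency
    per_alpha: float = 0.6     # priority exponent (prioritized replay)
    lr: float = 1e-4           # learning rate
```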
| Path-Planning Parameter | Value |
|---|---|
| Vehicle mass (kg) | 1477 |
| Vehicle yaw inertia (kg·m²) | 1536.7 |
| Boundary length (m) | 8 |
| Angular velocity range of obstacles (rad/s) | |
| Radius of obstacles (m) | 0.5 |
| Radius of obstacle motion path (m) | 2.83 |
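The simulation parameters can be grouped the same way. The obstacle angular velocity range is not given in the table above, so it is deliberately left unset in this sketch; field names are again ours.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class EnvConfig:
    """Simulation parameters from the table above; field names are ours."""
    vehicle_mass_kg: float = 1477.0
    yaw_inertia_kg_m2: float = 1536.7
    boundary_length_m: float = 8.0
    obstacle_radius_m: float = 0.5
    obstacle_path_radius_m: float = 2.83
    # Angular velocity range of obstacles (rad/s): value not given above.
    obstacle_omega_range_rad_s: Optional[Tuple[float, float]] = None
```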
| DRL Algorithm | DQN | TD3 | CSM-TD3 |
|---|---|---|---|
| Mean average return | −781.6090 | −508.2607 | −318.9694 |
| Location of Goal Point | DQN | TD3 | CSM-TD3 |
|---|---|---|---|
| | 26 | 24 | 23 |
| | 40 | 35 | 33 |
| | 67 | 50 | 43 |
| Location of Goal Point | DQN | TD3 | CSM-TD3 |
|---|---|---|---|
| | 49 | 42 | 23 |
| | 44 | 34 | 32 |
| | 58 | 40 | 36 |