Optimizing Autonomous Vehicle Performance Using Improved Proximal Policy Optimization
Highlights
- The integration of Lévy flight into the proximal policy optimization (PPO) algorithm, yielding LFPPO, significantly improves exploration, allowing the policy to escape local minima and achieve better optimization.
- The experimental results in the CARLA simulator show that the LFPPO algorithm achieves a 99% success rate, compared to the 81% achieved by the standard PPO algorithm, demonstrating enhanced stability and higher rewards in autonomous vehicle decision-making.
- The LFPPO algorithm enables autonomous vehicles to make more reliable and safer decisions in complex and dynamic traffic conditions, enhancing overall driving performance.
- The integration of real-time data streaming using Apache Kafka allows autonomous systems to process and react to dynamic environments more efficiently, improving real-time decision-making.
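The sketch below illustrates the kind of pipeline this refers to: CARLA-side telemetry is published to a Kafka topic and the agent consumes it as a live state stream. It is a minimal sketch only; the broker address, topic name, JSON payload layout, and the choice of the kafka-python client are assumptions, not details taken from the paper.

```python
# Minimal sketch of a CARLA-to-agent telemetry stream over Apache Kafka.
# Assumptions: local broker, a "carla-telemetry" topic, JSON payloads, kafka-python client.
import json
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_telemetry(vehicle_state: dict) -> None:
    # Called from the simulation loop with speed, acceleration, locations, etc.
    producer.send("carla-telemetry", vehicle_state)

consumer = KafkaConsumer(
    "carla-telemetry",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    auto_offset_reset="latest",   # the agent only cares about the most recent state
)

for record in consumer:
    state = record.value          # dict with the fields listed in Section 3.1
    # ... build the observation vector and query the policy here ...
```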
Abstract
1. Introduction
2. Related Works
2.1. Literature Review
2.2. Research Gap and Novel Contributions of This Study
- Robust Validation: Evaluated in CARLA’s Town 10 and Town 5 over 100 episodes and 10 runs, LFPPO achieves a 99% success rate versus the PPO algorithm’s 81%, with safety metrics (1% collision rate) confirming its efficacy.
3. Materials and Methods
3.1. PPO Algorithm
- Vehicle speed: speed
- Vehicle acceleration: acceleration
- Coordinates of traffic elements: traffic_location_x, traffic_location_y, traffic_location_z
- Vehicle coordinates: vehicle_location_x, vehicle_location_y, vehicle_location_z
- Status of traffic lights: traffic_light_state
- Distances to other actors: distances_to_actors
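As an illustration only, the quantities listed above could be flattened into a fixed-length observation vector along the following lines; the ordering, the traffic-light encoding, and the padding of distances_to_actors to a fixed count are assumptions rather than the authors' exact pre-processing.

```python
# Hypothetical flattening of the state variables above into a fixed-length vector.
import numpy as np

def build_state(msg: dict, max_actors: int = 4) -> np.ndarray:
    """msg: one decoded telemetry record (e.g., a JSON payload consumed from Kafka)."""
    light = {"Red": 0.0, "Yellow": 0.5, "Green": 1.0}.get(msg["traffic_light_state"], -1.0)
    distances = list(msg["distances_to_actors"])[:max_actors]
    distances += [0.0] * (max_actors - len(distances))   # pad/truncate to a fixed length
    return np.array([
        msg["speed"],
        msg["acceleration"],
        msg["traffic_location_x"], msg["traffic_location_y"], msg["traffic_location_z"],
        msg["vehicle_location_x"], msg["vehicle_location_y"], msg["vehicle_location_z"],
        light,
        *distances,
    ], dtype=np.float32)
```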
3.2. Lévy Flight Fundamentals and Use with PPO
Algorithm 1: Working principle of the LFPPO algorithm

1: Initialize the policy network πθ with random weights
2: for each iteration do
3:   Collect experience (s, a, r, s′) using the current policy πθ
4:   Compute advantage estimates Â = R − V(s)
5:   Compute the PPO loss L_PPO = L_CLIP + α · S[πθ](s) (clipped surrogate plus entropy bonus)
6:   Apply the Lévy flight step L(s; λ), a heavy-tailed perturbation drawn from a Lévy distribution with index λ
7:   Update the policy with the combined loss L_total = L_PPO + β · L(s; λ)
8: end for
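A PyTorch sketch of one plausible reading of Algorithm 1 follows: standard clipped-surrogate PPO updates, followed by a Lévy-flight perturbation of the policy parameters drawn with the Mantegna approximation. Because the exact coupling of the Lévy term to the loss cannot be recovered from the listing above, the perturbation step, the coefficient name levy_scale, and the network layout are assumptions rather than the authors' implementation.

```python
import math
import torch
import torch.nn as nn

def levy_step(shape, lam=1.5):
    """Draw Lévy-distributed steps via the Mantegna algorithm (a common approximation)."""
    sigma_u = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2) /
               (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = torch.randn(shape) * sigma_u
    v = torch.randn(shape)
    return u / v.abs().clamp(min=1e-8) ** (1 / lam)

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                  nn.Linear(hidden, hidden), nn.Tanh())
        self.pi_head = nn.Linear(hidden, n_actions)   # policy logits
        self.v_head = nn.Linear(hidden, 1)            # state value V(s)

    def forward(self, obs):
        h = self.body(obs)
        return torch.distributions.Categorical(logits=self.pi_head(h)), self.v_head(h).squeeze(-1)

def lfppo_update(net, optimizer, obs, act, logp_old, returns,
                 eps_clip=0.15, k_epochs=10, entropy_coef=0.01,
                 levy_lambda=1.5, levy_scale=1e-3, max_norm=1.0):
    """One LFPPO iteration: K epochs of clipped-surrogate updates, then a small
    Lévy-flight perturbation of the policy parameters to encourage exploration."""
    for _ in range(k_epochs):
        dist, value = net(obs)
        logp = dist.log_prob(act)
        adv = returns - value.detach()                       # Â = R − V(s)
        adv = (adv - adv.mean()) / (adv.std() + 1e-8)
        ratio = torch.exp(logp - logp_old)
        surrogate = torch.min(ratio * adv,
                              torch.clamp(ratio, 1 - eps_clip, 1 + eps_clip) * adv)
        loss = (-surrogate.mean()
                + 0.5 * (returns - value).pow(2).mean()      # value-function loss
                - entropy_coef * dist.entropy().mean())      # entropy bonus
        optimizer.zero_grad()
        loss.backward()
        nn.utils.clip_grad_norm_(net.parameters(), max_norm)
        optimizer.step()
    # Lévy flight step: rare, heavy-tailed jumps in parameter space (assumed coupling).
    with torch.no_grad():
        for p in net.parameters():
            p.add_(levy_scale * levy_step(p.shape, levy_lambda))
```

With the values from Section 6, the network would use hidden_size = 128 and an Adam optimizer with lr = 3 × 10⁻⁵ (weight_decay = 1 × 10⁻⁴ for LFPPO), and levy_lambda would decay by a factor of 0.95 per iteration.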
4. Evaluation Criteria and Mathematical Representation
4.1. Reward Function
4.2. Entropy Function
4.3. Success Rate Function
5. Apache Kafka’s Core Components and Working Principle
6. Hyperparameter Tuning and Values
7. Experiments and Results
7.1. Development and Operation of CARLA, Apache Kafka, and the PPO and LFPPO Algorithms
7.2. Performance Evaluation Metrics for the Algorithms
7.3. Safety and Comfort Metrics Analysis
8. Conclusions and Recommendations
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Sharma, R.; Garg, P. Optimizing Autonomous Driving with Advanced Reinforcement Learning: Evaluating DQN and PPO. In Proceedings of the 2024 5th International Conference on Smart Electronics and Communication (ICOSEC), Trichy, India, 18–20 September 2024. [Google Scholar]
- Xiao, R. Economic benefit, challenges, and perspectives for the application of Autonomous technology in self-driving vehicles. Highlights Sci. Eng. Technol. 2023, 38, 456–460. [Google Scholar] [CrossRef]
- Rievaj, V.; Mokričková, L.; Synák, F. Benefits of Autonomously Driven Vehicles. Transp. Commun. 2016, 4, 15–17. [Google Scholar] [CrossRef]
- Nastjuk, I.; Herrenkind, B.; Marrone, M.; Brendel, A.B.; Kolbe, L.M. What drives the acceptance of autonomous driving? An investigation of acceptance factors from an end-user’s perspective. Technol. Forecast. Soc. Chang. 2020, 161, 120319. [Google Scholar] [CrossRef]
- Martínez-Díaz, M.; Soriguera, F. Autonomous vehicles: Theoretical and practical challenges. Transp. Res. Procedia 2018, 33, 275–282. [Google Scholar] [CrossRef]
- Sana, F.; Azad, N.L.; Raahemifar, K. Autonomous Vehicle Decision-Making and Control in Complex and Unconventional Scenarios—A Review. Machines 2023, 11, 676. [Google Scholar] [CrossRef]
- Yu, M.-Y.; Vasudevan, R.; Johnson-Roberson, M. Occlusion-Aware Risk Assessment for Autonomous Driving in Urban Environments. IEEE Robot. Autom. Lett. 2019, 4, 2235–2241. [Google Scholar] [CrossRef]
- Rashid, P.Q.; Turker, I. Lung Disease Detection Using U-Net Feature Extractor Cascaded by Graph Convolutional Network. Diagnostics 2024, 14, 1313. [Google Scholar] [CrossRef]
- Kazangirler, B.Y.; Özkaynak, E. Conventional Machine Learning and Ensemble Learning Techniques in Cardiovascular Disease Prediction and Analysis. J. Intell. Syst. Theory Appl. 2024, 7, 81–94. [Google Scholar] [CrossRef]
- Saihood, Q.; Sonuç, E. A practical framework for early detection of diabetes using ensemble machine learning models. Turk. J. Electr. Eng. Comput. Sci. 2023, 31, 722–738. [Google Scholar] [CrossRef]
- Baydilli, Y.Y.; Atila, U.; Elen, A. Learn from one data set to classify all—A multi-target domain adaptation approach for white blood cell classification. Comput. Methods Programs Biomed. 2020, 196, 105645. [Google Scholar] [CrossRef]
- Cizmeci, H.; Ozcan, C. Enhanced deep capsule network for EEG-based emotion recognition. Signal Image Video Process. 2022, 17, 463–469. [Google Scholar] [CrossRef]
- Priyadarshi, R.; Ranjan, R.; Vishwakarma, A.K.; Yang, T.; Rathore, R.S. Exploring the Frontiers of Unsupervised Learning Techniques for Diagnosis of Cardiovascular Disorder: A Systematic Review. IEEE Access 2024, 12, 139253–139272. [Google Scholar] [CrossRef]
- Gautam, R.; Sharma, M. Computational Approaches for Anxiety and Depression: A Meta-Analytical Perspective. ICST Trans. Scalable Inf. Syst. 2024, 11, 1. [Google Scholar] [CrossRef]
- Karaoğlan, K.M.; Fındık, O. Extended rule-based opinion target extraction with a novel text pre-processing method and ensemble learning. Appl. Soft Comput. 2022, 118, 108524. [Google Scholar] [CrossRef]
- Habbal, A.; Ali, M.K.; Abuzaraida, M.A. Artificial Intelligence Trust, Risk and Security Management (AI TRiSM): Frameworks, applications, challenges and future research directions. Expert Syst. Appl. 2024, 240, 122442. [Google Scholar] [CrossRef]
- Muhammad, K.; Ullah, A.; Lloret, J.; Ser, J.D.; de Albuquerque, V.H.C. Deep Learning for Safe Autonomous Driving: Current Challenges and Future Directions. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4316–4336. [Google Scholar] [CrossRef]
- Galvao, L.G.; Abbod, M.; Kalganova, T.; Palade, V.; Huda, M.N. Pedestrian and Vehicle Detection in Autonomous Vehicle Perception Systems—A Review. Sensors 2021, 21, 7267. [Google Scholar] [CrossRef]
- Neamah, O.N.; Almohamad, T.A.; Bayir, R. Enhancing Road Safety: Real-Time Distracted Driver Detection Using Nvidia Jetson Nano and YOLOv8. In Proceedings of the 2024 Zooming Innovation in Consumer Technologies Conference (ZINC), Novi Sad, Serbia, 22–23 May 2024. [Google Scholar]
- Hung, G.L.; Sahimi, M.S.B.; Samma, H.; Almohamad, T.A.; Lahasan, B. Faster R-CNN Deep Learning Model for Pedestrian Detection from Drone Images. SN Comput. Sci. 2020, 1, 116. [Google Scholar] [CrossRef]
- Durgut, R.; Aydin, M.E.; Rakib, A. Transfer Learning for Operator Selection: A Reinforcement Learning Approach. Algorithms 2022, 15, 24. [Google Scholar] [CrossRef]
- Alharbi, A.; Poujade, A.; Malandrakis, K.; Petrunin, I.; Panagiotakopoulos, D.; Tsourdos, A. Rule-Based Conflict Management for Unmanned Traffic Management Scenarios. In Proceedings of the 2020 AIAA/IEEE 39th Digital Avionics Systems Conference (DASC), San Antonio, TX, USA, 11–15 October 2020. [Google Scholar]
- Mousa, A. Extended-deep Q-network: A functional reinforcement learning-based energy management strategy for plug-in hybrid electric vehicles. Eng. Sci. Technol. Int. J. 2023, 43, 101434. [Google Scholar] [CrossRef]
- Ahmed, M.; Raza, S.; Ahmad, H.; Khan, W.U.; Xu, F.; Rabie, K. Deep reinforcement learning approach for multi-hop task offloading in vehicular edge computing. Eng. Sci. Technol. Int. J. 2024, 59, 101854. [Google Scholar] [CrossRef]
- Xu, X.; Zuo, L.; Li, X.; Qian, L.; Ren, J.; Sun, Z. A Reinforcement Learning Approach to Autonomous Decision Making of Intelligent Vehicles on Highways. IEEE Trans. Syst. Man Cybern. Syst. 2020, 50, 3884–3897. [Google Scholar] [CrossRef]
- Yuan, M.; Shan, J.; Mi, K. Deep Reinforcement Learning Based Game-Theoretic Decision-Making for Autonomous Vehicles. IEEE Robot. Autom. Lett. 2022, 7, 818–825. [Google Scholar] [CrossRef]
- Aydin, M.E.; Durgut, R.; Rakib, A. Why Reinforcement Learning? Algorithms 2024, 17, 269. [Google Scholar] [CrossRef]
- Yau, H.T.; Kuo, P.H.; Luan, P.C.; Tseng, Y.R. Proximal policy optimization-based controller for chaotic systems. Int. J. Robust Nonlinear Control. 2023, 34, 586–601. [Google Scholar] [CrossRef]
- Vakili, E.; Amirkhani, A.; Mashadi, B. DQN-based ethical decision-making for self-driving cars in unavoidable crashes: An applied ethical knob. Expert Syst. Appl. 2024, 255, 124569. [Google Scholar] [CrossRef]
- Agarwal, T.; Arora, H.; Parhar, T.; Deshpande, S.; Schneider, J. Learning to Drive Using Waypoints. In Proceedings of the NeurIPS 2019 Machine Learning for Autonomous Driving Workshop, 2019. Available online: https://api.semanticscholar.org/CorpusID:209442419 (accessed on 12 December 2024).
- Song, Q.; Liu, Y.; Lu, M.; Zhang, J.; Qi, H.; Wang, Z.; Liu, Z. Autonomous Driving Decision Control Based on Improved Proximal Policy Optimization Algorithm. Appl. Sci. 2023, 13, 6400. [Google Scholar] [CrossRef]
- Huang, Y.; Xu, X.; Li, Y.; Zhang, X.; Liu, Y.; Zhang, X. Vehicle-Following Control Based on Deep Reinforcement Learning. Appl. Sci. 2022, 12, 10648. [Google Scholar] [CrossRef]
- Guan, Y.; Ren, Y.; Li, S.E.; Sun, Q.; Luo, L.; Li, K. Centralized Cooperation for Connected and Automated Vehicles at Intersections by Proximal Policy Optimization. IEEE Trans. Veh. Technol. 2020, 69, 12597–12608. [Google Scholar] [CrossRef]
- Ferrarotti, L.; Luca, M.; Santin, G.; Previati, G.; Mastinu, G.; Gobbi, M.; Campi, E.; Uccello, L.; Albanese, A.; Zalaya, P.; et al. Autonomous and Human-Driven Vehicles Interacting in a Roundabout: A Quantitative and Qualitative Evaluation. IEEE Access 2016, 4, 32693–32705. [Google Scholar] [CrossRef]
- Peng, Z.; Zhou, X.; Wang, Y.; Zheng, L.; Liu, M.; Ma, J. Curriculum Proximal Policy Optimization with Stage-Decaying Clipping for Self-Driving at Unsignalized Intersections. In Proceedings of the 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 24–28 September 2023. [Google Scholar]
- Chen, H.; Chen, K.-L.; Hsu, H.-Y.; Hsieh, J.-Y. An Adaptive Federated Reinforcement Learning Framework with Proximal Policy Optimization for Autonomous Driving. In Proceedings of the 2023 IEEE 5th Eurasia Conference on IOT, Communication and Engineering (ECICE), Yunlin, Taiwan, 27–29 October 2023. [Google Scholar]
- Grandesso, G.; Alboni, E.; Papini, G.P.R.; Wensing, P.M.; Prete, A.D. CACTO: Continuous Actor-Critic with Trajectory Optimization—Towards Global Optimality. IEEE Robot. Autom. Lett. 2023, 8, 3318–3325. [Google Scholar] [CrossRef]
- Ashraf, N.M.; Mostafa, R.R.; Sakr, R.H.; Rashad, M.Z. Optimizing hyperparameters of deep reinforcement learning for autonomous driving based on whale optimization algorithm. PLoS ONE 2021, 16, e0252754. [Google Scholar] [CrossRef] [PubMed]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
- Chen, D.; Liu, J.; Li, T.; He, J.; Chen, Y.; Zhu, W. Research on Mobile Robot Path Planning Based on MSIAR-GWO Algorithm. Sensors 2025, 25, 892. [Google Scholar] [CrossRef]
- Zheng, J.; Yuan, T.; Xie, W.; Yang, Z.; Yu, D. An Enhanced Flower Pollination Algorithm with Gaussian Perturbation for Node Location of a WSN. Sensors 2023, 23, 6463. [Google Scholar] [CrossRef]
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018. [Google Scholar]
- Zheng, J.; Mu, P.K.; Man, Z.; Luan, T.H.; Cai, L.X.; Shan, H. Device Placement for Autonomous Vehicles using Reinforcement Learning. In Proceedings of the 2021 IEEE International Conferences on Internet of Things (iThings) and IEEE Green Computing & Communications (GreenCom) and IEEE Cyber, Physical & Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics), Melbourne, Australia, 6–8 December 2021. [Google Scholar]
- Wong, C.C.; Feng, H.M.; Kuo, K.L. Multi-Sensor Fusion Simultaneous Localization Mapping Based on Deep Reinforcement Learning and Multi-Model Adaptive Estimation. Sensors 2023, 24, 48. [Google Scholar] [CrossRef]
- Yang, J.; Zhang, J.; Wang, H. Urban Traffic Control in Software Defined Internet of Things via a Multi-Agent Deep Reinforcement Learning Approach. IEEE Trans. Intell. Transp. Syst. 2021, 22, 3742–3754. [Google Scholar] [CrossRef]
- Wu, S.; Xue, W.; Ye, H.; Li, S. A novel proximal policy optimization control strategy for unmanned surface vehicle. In Proceedings of the 2023 35th Chinese Control and Decision Conference (CCDC), Yichang, China, 20–22 May 2023. [Google Scholar]
- Sun, P.; Yang, C.; Zhou, X.; Wang, W. Path Planning for Unmanned Surface Vehicles with Strong Generalization Ability Based on Improved Proximal Policy Optimization. Sensors 2023, 23, 8864. [Google Scholar] [CrossRef]
- Ahmed, M.; Ouda, A.; Abusharkh, M. An Analysis of the Effects of Hyperparameters on the Performance of Simulated Autonomous Vehicles. In Proceedings of the 2022 International Telecommunications Conference (ITC-Egypt), Alexandria, Egypt, 26–28 July 2022. [Google Scholar]
- CARLA Simulator Documentation: Non-Layered Maps (Town 10 and Town 5). Available online: https://carla.readthedocs.io/en/latest/core_map/#non-layered-maps (accessed on 22 January 2025).
Hyperparameter | LFPPO Value | PPO Value | Description |
---|---|---|---|
hidden_size | 128 | 128 | Number of neurons in the hidden layers |
lr (learning rate) | 3 × 10⁻⁵ | 3 × 10⁻⁵ | Learning rate for the optimizer
gamma | 0.99 | 0.99 | Discount factor for future rewards |
eps_clip | 0.15 | 0.15 | Clipping range for PPO’s surrogate function |
k_epochs | 10 | 10 | Number of epochs for policy optimization |
levy_lambda | 1.5 | - | Lambda value for Lévy flight optimization |
decay_rate | 0.95 | - | Decay rate for levy_lambda |
weight_decay | 1 × 10⁻⁴ | - | Weight decay parameter in the optimizer
max_norm | 1.0 | 1.0 | Maximum norm for gradient clipping |
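Read as code, the table above corresponds to a configuration along the following lines; the dataclass and field names are illustrative only, with the LFPPO-specific fields left unset for plain PPO.

```python
# Hypothetical configuration object mirroring the hyperparameter table.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrainConfig:
    hidden_size: int = 128          # neurons per hidden layer
    lr: float = 3e-5                # optimizer learning rate
    gamma: float = 0.99             # discount factor for future rewards
    eps_clip: float = 0.15          # clipping range for the surrogate objective
    k_epochs: int = 10              # optimization epochs per iteration
    max_norm: float = 1.0           # gradient-clipping norm
    # LFPPO-only fields (None for plain PPO)
    levy_lambda: Optional[float] = None
    decay_rate: Optional[float] = None
    weight_decay: float = 0.0

PPO_CONFIG = TrainConfig()
LFPPO_CONFIG = TrainConfig(levy_lambda=1.5, decay_rate=0.95, weight_decay=1e-4)
```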
Performance Comparison Metrics

Metric | PPO Mean | PPO Max | PPO Std | LFPPO Mean | LFPPO Max | LFPPO Std
---|---|---|---|---|---|---
Success Rate | 0.674511719 | 0.8154296 | 0.1041606 | 0.978944 | 0.99999 | 0.0191408 |
Entropy | 0.349050014 | 0.6315500 | 0.1932317 | 0.1736518 | 0.40245183 | 0.1401299 |
Policy Loss | −1801.333398 | −1797.805 | 2.2216211 | −3197.556 | −2907.9943 | 307.64856 |
Rewards | 24,180.5 | 33,895 | 5957.7468 | 38,518 | 42,135 | 3058.3312 |
Method | Success Rate (%) | Exploration Mechanism | Real-Time Data Processing | Key Limitation |
---|---|---|---|---
PPO (Baseline) | ~81 | Clipped Surrogate Objective | Yes (Apache Kafka) | Limited exploration |
CPPO [35] | 78.5 | Curriculum-based Clipping | No | Slow adaptation to complexity |
MA-PPO [33] | - | Model-Augmented Optimization | No | Lacks robust exploration |
DDPG [38] | ~85 | Deterministic Policy Gradient | No | High-dimensional instability |
SAC [42] | ~90 | Entropy Maximization | No | Computational Complexity |
LFPPO (Proposed) | ~99 | Lévy flight + Clipped Objective | Yes (Apache Kafka) | High exploration |
Metric | LFPPO (Town 10) | PPO (Town 10) | LFPPO (Town 5) | PPO (Town 5) |
---|---|---|---|---
Collisions (TTC = 0 s) | 10 runs (1%) | 190 runs (19%) | 8 runs (0.8%) | 205 runs (20.5%) |
Near-Miss (TTC 0.5–5 s) | 180 runs (18%) | 280 runs (28%) | 140 runs (14%) | 240 runs (24%) |
Mean TTC (Near-Miss) | 1.3 s | 2.4 s | 1.6 s | 3.1 s |
Safe (TTC > 10 s) | 810 runs (81%) | 530 runs (53%) | 852 runs (85.2%) | 555 runs (55.5%) |
Mean TTC (Safe) | 22 s | 16 s | 26 s | 19 s |
Mean Jerk (m/s³) | 5.2 (0.8–6.8) | 1.9 (0.5–3.2) | 4.8 (0.7–6.5) | 1.7 (0.5–3.0)
Emergency Braking (events/episode) | 0.05 | 0.12 | 0.04 | 0.10 |
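The TTC and jerk figures above can be reproduced from logged telemetry with standard estimators such as those sketched below (constant-velocity TTC, finite-difference jerk); the classification thresholds mirror the table's bins, while the estimators themselves are assumptions about the paper's exact procedure.

```python
# Sketch of the safety/comfort metrics: constant-velocity TTC and finite-difference jerk.
import numpy as np

def time_to_collision(gap_m: float, closing_speed_mps: float) -> float:
    """TTC = gap / closing speed; infinite when the gap is not closing."""
    return gap_m / closing_speed_mps if closing_speed_mps > 0 else float("inf")

def classify_ttc(ttc: float) -> str:
    # Bins follow the table: collision (0 s), near-miss (0.5–5 s), safe (> 10 s).
    if ttc <= 0:
        return "collision"
    if 0.5 <= ttc <= 5:
        return "near-miss"
    if ttc > 10:
        return "safe"
    return "unclassified"

def mean_jerk(acceleration: np.ndarray, dt: float) -> float:
    """Mean absolute jerk (m/s³) from an acceleration series sampled every dt seconds."""
    return float(np.abs(np.diff(acceleration) / dt).mean())
```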
Cite this article: Bilban, M.; İnan, O. Optimizing Autonomous Vehicle Performance Using Improved Proximal Policy Optimization. Sensors 2025, 25, 1941. https://doi.org/10.3390/s25061941