Advances in Q-Learning: Real-Time Optimization of Multi-Distant Transportation Systems
Abstract
1. Introduction
2. Materials and Methods
2.1. Ant Colony Optimization
2.2. The Genetic Algorithm
2.3. Nearest Neighbor
2.4. Insertion Heuristics
2.5. Christofides Algorithm
2.6. Simulated Annealing
2.7. Google OR-Tools
2.8. Q-Learning
3. Results and Discussion
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
QL | Q-learning |
TSP | Traveling salesman problem |
VRP | Vehicle routing problem |
ACO | Ant colony optimization |
GA | Genetic algorithm |
NN | Nearest neighbor |
SA | Simulated annealing |
MST | Minimum spanning tree |
RL | Reinforcement learning |
Problem | Size | Optimal Answer | Q-Learning Iterations | Q-Learning Time | Q-Learning Answer | Q-Learning Accuracy (%) | Ant Colony Iterations | Ant Colony Time | Ant Colony Answer | Ant Colony Accuracy (%)
---|---|---|---|---|---|---|---|---|---|---
FIVE | 5 | 19 | 100 | 0.015615 | 19 | 100 | 100 | 0.25 | 19 | 100
P01 | 15 | 291 | 100 | 0.0625 | 291 | 100 | 100 | 1.95 | 291 | 100
GR17 | 17 | 2085 | 100 | 0.04688 | 2187 | 95.33 | 100 | 2.3 | 2153 | 96.84
FRI26 | 26 | 937 | 200 | 0.15625 | 959 | 97.7 | 200 | 4.48 | 962 | 97.4
DANTZIG42 | 42 | 699 | 300 | 0.421941 | 820 | 85.24 | 300 | 21.55 | 830 | 84.22
ATT48 | 48 | 33,523 | 500 | 0.79745 | 38,375 | 87.36 | 500 | 28.86 | 38,624 | 86.79
Problem | Size | Optimal Answer | Nearest Neighbor Iterations | Nearest Neighbor Time | Nearest Neighbor Answer | Nearest Neighbor Accuracy (%) | Christofides Iterations | Christofides Time | Christofides Answer | Christofides Accuracy (%)
---|---|---|---|---|---|---|---|---|---|---
FIVE | 5 | 19 | - | 66.4 × 10⁻⁶ | 21 | 90.48 | - | 13.3 × 10⁻⁴ | 23 | 82.6
P01 | 15 | 291 | - | 80 × 10⁻⁶ | 291 | 100 | - | 16.5 × 10⁻⁴ | 432 | 67.36
GR17 | 17 | 2085 | - | 86.5 × 10⁻⁶ | 2187 | 95.33 | - | 20 × 10⁻⁴ | 2352 | 88.65
FRI26 | 26 | 937 | - | 115 × 10⁻⁶ | 1112 | 84.26 | - | 31 × 10⁻⁴ | 1094 | 85.65
DANTZIG42 | 42 | 699 | - | 200 × 10⁻⁶ | 956 | 73.12 | - | 57 × 10⁻⁴ | 908 | 76.98
ATT48 | 48 | 33,523 | - | 238 × 10⁻⁶ | 40,551 | 82.67 | - | 70 × 10⁻⁴ | 43,088 | 77.8
Problem | Size | Optimal Answer | Simulated Annealing Iterations | Simulated Annealing Time | Simulated Annealing Answer | Simulated Annealing Accuracy (%) | Genetic Algorithm Generations | Genetic Algorithm Time | Genetic Algorithm Answer | Genetic Algorithm Accuracy (%)
---|---|---|---|---|---|---|---|---|---|---
FIVE | 5 | 19 | 100 | 59.2 × 10⁻⁴ | 19 | 100 | 100 | 0.0883 | 19 | 100
P01 | 15 | 291 | 100 | 86 × 10⁻⁴ | 291 | 100 | 100 | 0.272 | 307 | 94.79
GR17 | 17 | 2085 | 100 | 73 × 10⁻⁴ | 2090 | 99.76 | 100 | 0.2663 | 2167 | 96.22
FRI26 | 26 | 937 | 200 | 11 × 10⁻³ | 1088 | 86.12 | 200 | 1.02 | 1353 | 69.25
DANTZIG42 | 42 | 699 | 300 | 10.4 × 10⁻³ | 919 | 76.06 | 300 | 2.805 | 1066 | 65.57
ATT48 | 48 | 33,523 | 500 | 188 | 52,658 | 63.66 | 500 | 6.523 | 53,625 | 62.51
Problem | Size | Optimal Answer | Insertion Heuristics Iterations | Insertion Heuristics Time | Insertion Heuristics Answer | Insertion Heuristics Accuracy (%)
---|---|---|---|---|---|---
FIVE | 5 | 19 | - | 83 × 10⁻⁶ | 19 | 100
P01 | 15 | 291 | - | 206 × 10⁻⁶ | 371 | 78.43
GR17 | 17 | 2085 | - | 273 × 10⁻⁶ | 2382 | 87.53
FRI26 | 26 | 937 | - | 733 × 10⁻⁶ | 1201 | 78.02
DANTZIG42 | 42 | 699 | - | 285 × 10⁻⁴ | 895 | 78.1
ATT48 | 48 | 33,523 | - | 409 × 10⁻⁴ | 42,252 | 79.34
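The tables above do not state how the Accuracy column is derived. The reported values are consistent, to within rounding, with the ratio of the optimal tour length to the tour length found, expressed as a percentage. The following is a minimal sketch under that assumption; the function name `accuracy_percent` is hypothetical and the formula is inferred from the tabulated numbers, not taken from the authors' code.

```python
def accuracy_percent(optimal: float, answer: float) -> float:
    """Assumed definition: optimal tour length over found tour length, in percent."""
    if answer <= 0:
        raise ValueError("answer must be positive")
    return 100.0 * optimal / answer


# (problem, optimal answer, Q-learning answer, reported accuracy) taken from the TSP table.
rows = [
    ("FIVE", 19, 19, 100.0),
    ("GR17", 2085, 2187, 95.33),
    ("ATT48", 33523, 38375, 87.36),
]

for name, optimal, answer, reported in rows:
    computed = accuracy_percent(optimal, answer)
    # Small differences in the last digit may come from rounding versus truncation.
    print(f"{name}: computed {computed:.2f}, reported {reported}")
```

Under this reading, an accuracy of 100 means the method reached the known optimum, and lower values indicate proportionally longer tours.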
Problem | Number of Locations | Number of Drivers | Optimal Answer | Q-Learning (Multi-Agent) Iterations | Q-Learning (Multi-Agent) Answer | Q-Learning (Multi-Agent) Accuracy (%)
---|---|---|---|---|---|---
P-n20-k2 | 20 | 2 | 216 | 100 | 223 | 96.86
P-n22-k2 | 22 | 2 | 216 | 100 | 238 | 90.75
E-n22-k4 | 22 | 4 | 375 | 100 | 412 | 91
E-n33-k4 | 33 | 4 | 835 | 200 | 921 | 90.66
P-n76-k4 | 76 | 4 | 593 | 300 | 1062 | 57.8
P-n101-k4 | 101 | 4 | 681 | 500 | 1500 | 45
Problem | Size | Number of Drivers | Optimal Answer | Google OR-Tools Iterations | Google OR-Tools Answer | Google OR-Tools Accuracy (%)
---|---|---|---|---|---|---
P-n20-k2 | 20 | 2 | 216 | 100 | 216 | 100
P-n22-k2 | 22 | 2 | 216 | 100 | 216 | 100
E-n22-k4 | 22 | 4 | 375 | 100 | 375 | 100
E-n33-k4 | 33 | 4 | 835 | 200 | 857 | 97.4
P-n76-k4 | 76 | 4 | 593 | 300 | 606 | 97.8
P-n101-k4 | 101 | 4 | 681 | 500 | 942 | 72.3
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Barghash, A.; Abuznaid, A. Advances in Q-Learning: Real-Time Optimization of Multi-Distant Transportation Systems. Appl. Sci. 2025, 15, 9493. https://doi.org/10.3390/app15179493