A Complete Coverage Path Planning Algorithm for Lawn Mowing Robots Based on Deep Reinforcement Learning
Abstract
1. Introduction
2. Workspace Modeling and Robotic System Overview
2.1. Hardware and Software Architecture
2.1.1. Integrated Hardware and Software System
2.1.2. Base Station Overview
2.2. Simplified Robot Modeling
2.3. Workspace Modeling and Preprocessing
3. Deep Reinforcement Learning Models and the Improvements
3.1. The DQN Algorithm
3.2. The Re-DQN Algorithm
Algorithm 1: Re-DQN
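Re-DQN builds on the standard DQN training loop: ε-greedy action selection, an experience replay buffer, and a periodically synchronized target network, with the improvements of Sections 3.2.1, 3.2.2, and 3.2.3 layered on top. The sketch below shows a generic loop of this kind in PyTorch; the environment interface (`reset`/`step`), network sizes, and default hyperparameters are illustrative assumptions, not the authors' exact Algorithm 1.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim


class QNet(nn.Module):
    """Small MLP mapping a flat state vector to one Q-value per discrete action."""

    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.net(x)


def train_dqn(env, state_dim, n_actions, episodes=500, buffer_size=10_000,
              batch_size=64, gamma=0.95, lr=1e-3, eps=1.0, eps_end=0.05,
              eps_decay_rate=0.995, target_update=1_000):
    """Generic DQN loop: replay buffer, target network, epsilon-greedy exploration."""
    policy_net = QNet(state_dim, n_actions)
    target_net = QNet(state_dim, n_actions)
    target_net.load_state_dict(policy_net.state_dict())
    optimizer = optim.Adam(policy_net.parameters(), lr=lr)
    buffer = deque(maxlen=buffer_size)
    step_count = 0

    for _ in range(episodes):
        state, done = env.reset(), False          # hypothetical env API: reset() -> state
        while not done:
            # epsilon-greedy action selection
            if random.random() < eps:
                action = random.randrange(n_actions)
            else:
                with torch.no_grad():
                    q = policy_net(torch.tensor(state, dtype=torch.float32))
                action = int(q.argmax())
            next_state, reward, done = env.step(action)   # hypothetical env API
            buffer.append((state, action, reward, next_state, done))
            state = next_state
            step_count += 1

            if len(buffer) >= batch_size:
                s, a, r, s2, d = zip(*random.sample(buffer, batch_size))
                s = torch.tensor(s, dtype=torch.float32)
                a = torch.tensor(a, dtype=torch.int64).unsqueeze(1)
                r = torch.tensor(r, dtype=torch.float32)
                s2 = torch.tensor(s2, dtype=torch.float32)
                d = torch.tensor(d, dtype=torch.float32)
                q_sa = policy_net(s).gather(1, a).squeeze(1)
                with torch.no_grad():
                    # Bellman target computed from the frozen target network
                    target = r + gamma * (1.0 - d) * target_net(s2).max(1).values
                loss = nn.functional.smooth_l1_loss(q_sa, target)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            if step_count % target_update == 0:
                target_net.load_state_dict(policy_net.state_dict())

        eps = max(eps_end, eps * eps_decay_rate)   # decay exploration per episode
    return policy_net
```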
3.2.1. Improvements Based on Action Selection
3.2.2. Reward Function
3.2.3. Improved DQN Network Architecture
The input to the improved network includes the following state components:
- The state of the robot.
- The state of the target.
- The states of the first n obstacles.
- Proximity: This function calculates the distance between the position resulting from the current action and the obstacles. The closer the action brings the robot to an obstacle, the lower (more negative) the dynamic incentive value.
- Obstacle_Density: This function measures the density of obstacles in the environment. The higher the obstacle density, the more negative the incentive applied by the system, encouraging the agent to avoid areas with dense obstacles.
- Two hyperparameters weight the Proximity and Obstacle_Density terms, respectively, and adjust their relative importance (a sketch of this incentive follows below).
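As an illustration of how the two terms could be combined into a single dynamic incentive, the sketch below uses an inverse-distance Proximity term, a neighbourhood-count density term, and two weights `w_prox` and `w_dens`; these concrete forms and names are assumptions, and the paper's exact formulas may differ.

```python
import numpy as np

def proximity(pos, obstacles):
    """Negative incentive that grows as the next position nears an obstacle
    (illustrative inverse-distance form)."""
    if not obstacles:
        return 0.0
    d_min = min(np.linalg.norm(np.subtract(pos, o)) for o in obstacles)
    return -1.0 / (d_min + 1e-6)          # closer obstacle -> more negative

def obstacle_density(pos, obstacles, radius=3.0):
    """Negative incentive proportional to how many obstacles lie within `radius` cells."""
    near = sum(np.linalg.norm(np.subtract(pos, o)) <= radius for o in obstacles)
    return -float(near)

def dynamic_incentive(pos, obstacles, w_prox=0.05, w_dens=0.05):
    """Weighted sum of the two terms; w_prox and w_dens play the role of the
    hyperparameters described above (values here are illustrative)."""
    return w_prox * proximity(pos, obstacles) + w_dens * obstacle_density(pos, obstacles)
```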
3.2.4. Environmental Terrain Design
- (1) Noise generation: The noise generator produces noise maps with specified dimensions and frequency. These noise maps serve as the foundation for creating terrain features.
- (2) Normalization: The generated noise maps are normalized to ensure that the terrain values range from 0 (representing the lowest altitude) to 1 (representing the highest altitude). This normalization helps to evenly represent different terrain heights.
- (1) A linear interpolation function blends adjacent noise values: lerp(a, b, t) = a + t · (b − a).
- (2) A smooth interpolation function shapes the blending weight, typically the cubic Hermite polynomial s(t) = 3t² − 2t³ (a sketch combining both appears after this list).
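A minimal sketch of this pipeline, assuming a simple value-noise generator on a coarse random lattice combined with the linear and cubic Hermite interpolants above and a final min-max normalization, is shown below; the lattice-based generator, function names, and default frequency are illustrative rather than the paper's exact implementation.

```python
import numpy as np

def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + t * (b - a)

def fade(t):
    """Cubic Hermite smoothstep used as the smooth interpolation weight."""
    return t * t * (3.0 - 2.0 * t)

def value_noise_terrain(height, width, frequency=8, seed=0):
    """Generate a terrain heightmap in [0, 1] from a coarse random lattice."""
    rng = np.random.default_rng(seed)
    lattice = rng.random((frequency + 1, frequency + 1))   # coarse noise map
    terrain = np.zeros((height, width))
    for i in range(height):
        for j in range(width):
            # map the pixel to lattice coordinates
            y = i / height * frequency
            x = j / width * frequency
            y0, x0 = int(y), int(x)
            ty, tx = fade(y - y0), fade(x - x0)
            # bilinear interpolation of the four surrounding lattice values
            top = lerp(lattice[y0, x0], lattice[y0, x0 + 1], tx)
            bottom = lerp(lattice[y0 + 1, x0], lattice[y0 + 1, x0 + 1], tx)
            terrain[i, j] = lerp(top, bottom, ty)
    # normalization: rescale so 0 is the lowest altitude and 1 the highest
    terrain = (terrain - terrain.min()) / (terrain.max() - terrain.min() + 1e-12)
    return terrain
```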
4. Simulation Results and Discussion
4.1. Setting of Simulation Conditions
- (1) Validate the effectiveness of the improved DQN algorithm: verify the path coverage capability of the improved DQN in different environments through experiments, and assess its adaptability and performance in various complex scenarios.
- (2) Evaluate the impact of key hyperparameters during training: adjust hyperparameters such as the exploration rate, discount factor, and learning rate to analyze their effects on training performance and stability.
- (3) Enhance the model's adaptability in complex environments: test the model in environments with obstacles or irregular boundaries, study the performance of the DQN algorithm in such environments, and propose corresponding optimization strategies.
An episode ends when one of the following conditions is met (a minimal check over these conditions is sketched below):
- (1) The coverage map indicates that all reachable areas have been visited.
- (2) The agent collides with an obstacle.
- (3) The agent reaches the boundary of the map.
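For illustration, a check over these three termination conditions on a grid environment might look as follows; the data structures (boolean `coverage_map` and `reachable_mask` grids of the same shape, and a set of `(row, col)` obstacle cells) are assumptions.

```python
import numpy as np

def episode_done(coverage_map, reachable_mask, agent_pos, obstacles, grid_shape):
    """Illustrative check of the three termination conditions listed above."""
    all_covered = bool(coverage_map[reachable_mask].all())   # (1) every reachable cell visited
    hit_obstacle = tuple(agent_pos) in obstacles              # (2) collision with an obstacle
    r, c = agent_pos
    out_of_bounds = not (0 <= r < grid_shape[0] and 0 <= c < grid_shape[1])  # (3) left the map
    return all_covered or hit_obstacle or out_of_bounds
```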
4.2. Outdoor Map Simulation Experiment
- (1) Movement cost differences: Varying terrain heights result in different movement costs for the agent, with the algorithm favoring lower-cost paths.
- (2) Accessibility: Areas with significant elevation differences may be considered impassable, requiring the path planning algorithm to avoid these regions.
- (3) Reward mechanism: The terrain information is integrated into the reward function, where larger elevation changes incur penalties, encouraging the agent to select flatter paths (see the sketch after this list).
- (4) Environmental complexity: The terrain adds complexity to the planning process, requiring the algorithm to balance terrain difficulty with coverage efficiency.
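A minimal sketch of a terrain-aware step reward reflecting points (1)-(3) is shown below; the coefficients and the impassability threshold are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def terrain_step_reward(terrain, pos, next_pos,
                        base_reward=1.0, slope_penalty=2.0, max_climb=0.25):
    """Reward for moving from pos to next_pos on a normalized heightmap.

    terrain: 2D numpy array with values in [0, 1]; pos/next_pos: (row, col) tuples.
    """
    dz = abs(terrain[next_pos] - terrain[pos])   # elevation change of this move
    if dz > max_climb:
        return -slope_penalty                    # (2) too steep: treat as impassable
    # (1) + (3): flat moves keep most of the base reward, steep moves are penalized
    return base_reward - slope_penalty * dz
```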
4.3. Parameter Analysis
- (1) Replay buffer size N: Determines the number of samples stored in the experience replay buffer. A buffer that is too small may yield less diverse samples and hurt the model's generalization ability, while a buffer that is too large increases memory demands. It is generally set between 5000 and 100,000: 5000 may suffice for simpler tasks, and a larger size can be chosen for more complex tasks or when ample memory is available.
- (2) Discount factor: A larger discount factor makes the model focus more on long-term rewards, while a smaller one emphasizes short-term rewards. It is usually set between 0.9 and 0.99; 0.9 lets the importance of future rewards decay quickly and suits short-term decision tasks, while 0.99 is more appropriate for tasks with a longer time span.
- (3) Exploration rate: Initially, a high exploration rate helps the agent explore new strategies; as training progresses, it gradually decreases, so the agent relies more on the learned policy.
- Initial exploration rate: 0.9 to 1.0, usually set high to encourage more exploration at the beginning.
- Final exploration rate: 0.01 to 0.1; a lower value ensures that the model relies more on the learned policy in the later stages of training.
- Decay constant: 2000 to 10,000 steps; a larger value means a longer exploration period, suitable for more complex tasks.
- (4) Target network update frequency: Too low an update frequency delays target updates, while too high a frequency can destabilize training. A setting between 500 and 5000 steps is recommended: 500 steps may suffice for simpler tasks, and a higher step count stabilizes training on more complex tasks.
- (5) Learning rate: Controls the step size of each parameter update. A learning rate that is too high may destabilize training, while one that is too low may make training slow or stagnant. It is usually advisable to start small, e.g., 0.001, and adjust based on the training outcomes.
- (6) Environment and reward parameters: Additional settings related to the environment and rewards, not discussed extensively here, can be adjusted to match the map size and the designed terrain. Relevant parameters include the following (a configuration sketch follows this list):
- Two reward-shaping weights (e.g., for the Proximity and Obstacle_Density terms): 0.01–0.1 each.
- Obstacle sensitivity weight: 0.1–1.0, adjusted according to the density of obstacles and the agent's ability to avoid them; set a higher value if the agent should be very sensitive to obstacles.
- Coverage/exploration versus precision/efficiency reward weights: if the task focuses on coverage and discovering new areas, increase the coverage and exploration weights; if it emphasizes precise and efficient path planning, increase the precision and efficiency weights.
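To make these ranges concrete, the sketch below collects them in a configuration object and pairs them with a common exponential ε-decay schedule; the default values and the exact form of the schedule are illustrative choices within the ranges above, not necessarily those used in the experiments.

```python
import math
from dataclasses import dataclass

@dataclass
class ReDQNConfig:
    """Hyperparameter ranges discussed in Section 4.3 (defaults are illustrative picks)."""
    buffer_size: int = 10_000      # N: 5000-100,000
    gamma: float = 0.95            # discount factor: 0.9-0.99
    eps_start: float = 0.95        # initial exploration rate: 0.9-1.0
    eps_end: float = 0.05          # final exploration rate: 0.01-0.1
    eps_decay: int = 5_000         # decay constant: 2000-10,000 steps
    target_update: int = 1_000     # target-network update period: 500-5000 steps
    lr: float = 1e-3               # learning rate: start small, e.g. 0.001

def epsilon(cfg: ReDQNConfig, step: int) -> float:
    """Exponential epsilon-decay schedule; the paper's exact schedule may differ."""
    return cfg.eps_end + (cfg.eps_start - cfg.eps_end) * math.exp(-step / cfg.eps_decay)
```

For example, `epsilon(ReDQNConfig(), 0)` returns roughly 0.95 at the start of training and decays toward 0.05 after several multiples of `eps_decay` steps.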
4.4. Complete Coverage Path Planning Results
5. Conclusions
5.1. Main Conclusions and Findings
5.2. Main Limitation of the Research
5.3. Future Research Prospects
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
| OS | Language | CPU | GPU | RAM |
|---|---|---|---|---|
| Ubuntu 22.04 | Python 3.8 | Intel i5-13400 | RTX 4070 Ti | 12 GB |
| Metric | DQN | Re-DQN | Re-DQN w/o EM | Re-DQN w/o IR | Re-DQN w/o DIS |
|---|---|---|---|---|---|
| Steps | 120 | 100 | 108 | 110 | 115 |
| Tiles visited | 87 | 108 | 88 | 96 | 101 |
| Rewards | 65 | 85 | 72 | 79 | 81 |
| Algorithm | fill_ratio = 0.04 | fill_ratio = 0.06 | fill_ratio = 0.07 |
|---|---|---|---|
| Boustrophedon | 92% | 89% | 86% |
| A* Coverage Algorithm | 95% | 94% | 93% |
| DQN | 87% | 83% | 78% |
| DDQN | 89% | 83% | 82% |
| Dueling DQN | 87% | 84% | 81% |
| PPO | 95% | 92% | 90% |
| Re-DQN (Our algorithm) | 100% | 97% | 94% |
| Algorithm | Simple Flat | Moderate Complexity | High Complexity |
|---|---|---|---|
| Boustrophedon | 92% | 83% | 78% |
| A* Coverage Algorithm | 96% | 91% | 83% |
| DQN | 84% | 83% | 78% |
| DDQN | 86% | 83% | 80% |
| Dueling DQN | 87% | 84% | 81% |
| PPO | 95% | 92% | 86% |
| Re-DQN (Our algorithm) | 100% | 95% | 93% |
| Algorithm | Path Length | Coverage (%) | Redundancy (%) | Adaptability (%) | Complexity |
|---|---|---|---|---|---|
| Boustrophedon | 237 | 87 | 18.4 | 60 | Low |
| A* Coverage Algorithm | 212 | 95 | 32.7 | 65 | Low |
| DQN | 189 | 82 | 28.2 | 65 | Moderate |
| DDQN | 178 | 84 | 27.4 | 75 | Moderate |
| Dueling DQN | 183 | 81 | 26.3 | 65 | Moderate |
| PPO | 173 | 93 | 11.4 | 85 | Very high |
| Re-DQN (Our algorithm) | 159 | 95 | 6.2 | 90 | Moderate |