Abstract
Path conflicts between robots force them to spend time avoiding each other, so the more conflicts there are, the longer the robots take to complete all tasks. This study designs a multi-robot navigation system based on deep reinforcement learning that provides an effective method for global path planning in multi-robot navigation. It plans paths with fewer path conflicts for all robots so that the overall navigation time needed to complete all tasks is reduced. In contrast to existing global path planning methods for multi-robot navigation, the proposed approach emphasizes reducing the number of path conflicts first in order to reduce the overall navigation time. The system consists of a localization unit, an environment map unit, a path planning unit, and an environment monitoring unit, which together provide functions for calculating robot coordinates, generating preselected paths, selecting optimal path combinations, robot navigation, and environment monitoring. We use topological maps to simplify the map representation for multi-robot path planning so that the proposed method can plan paths for more robots in more complex environments. Proximal policy optimization (PPO) is used as the deep reinforcement learning algorithm. This study combines a path selection method based on deep reinforcement learning with the A* algorithm, which effectively reduces the number of path conflicts in multi-robot path planning and shortens the overall navigation time. In addition, we use the reciprocal velocity obstacles algorithm for local path planning on each robot, combined with the proposed global path planning method, to achieve complete and effective multi-robot navigation. Simulation results in NVIDIA Isaac Sim show that, for 1000 multi-robot navigation tasks, the number of path conflicts is reduced by up to 60,375 across nine simulation conditions.
1. Introduction
Artificial intelligence and robotics are developing rapidly around the world and are having a significant impact on all industries. In logistics, manufacturing, and service-related fields, the application of robotics significantly improves production efficiency and reduces costs. The multi-robot navigation system is one of the core technologies: beyond giving each robot basic mobility [1], it enables multiple robots to work in collaboration [2,3] and thereby achieve higher efficiency and capacity than a single robot. Multi-robot navigation is an active research topic that aims to enable multiple robots to work together in the same environment. In addition to the basic functions of localization [4], path planning [5], and obstacle avoidance [6], it is important that a multi-robot navigation system can exchange data through communication [7] and then coordinate the robots to complete their tasks.
Multi-robot navigation technology is important in the logistics and manufacturing industries [5]. According to Expert Market Research, global air cargo traffic continues to grow, and logistics operators need to adopt more efficient methods to improve delivery speed [8]. Amazon has deployed more than 750,000 robots in its warehousing system [9], using a multi-robot navigation system to coordinate and synchronize work, which improves logistics efficiency and changes the traditional mode of operation. As the technology develops, multi-robot cooperative work has become an important issue. For accurate robot positioning, UWB technology is widely used in robot positioning systems [10,11] to achieve high-accuracy positioning. Optimizing paths and task assignment is also crucial in the factory. Chatzisavvas et al. [12] combined the Dijkstra and Kuhn–Munkres algorithms to achieve significant results in optimizing paths and task assignment and provided a real-time visualization tool. Warita et al. [13] used a Monte Carlo tree search and an item exchange strategy to enhance the efficiency of pickup and delivery operations in the warehouse. Multi-robot navigation systems nevertheless face challenges that affect their performance in practical applications, such as increased demand for computing resources, navigation difficulties in dynamic environments, the complexity of multi-robot coordination and control, and system reliability. Therefore, multi-robot navigation is an important and evolving issue in real warehouses.
One of the core challenges of multi-robot navigation systems is to ensure that multiple robots can work together in a common environment without collision [14]. To solve this problem, multi-agent path finding (MAPF) has become an important research direction. The MAPF problem involves planning paths for multiple mobile agents (e.g., robots or drones) so that they do not collide while traveling from their starting points to their respective target locations, while also optimizing path efficiency and resource utilization. The problem is usually posed on a discrete grid or graph, where each node represents a location and each edge represents a traversable connection between two locations. The goal is to find a collision-free path for each agent from its starting point to its target location. Mainstream solution methods include extensions of A*, mixed integer linear programming (MILP), conflict-based search (CBS), and decentralized methods. For example, the BA*-MAPF algorithm proposed by Meng et al. [15] combines an improved A* algorithm with an improved artificial potential field algorithm, and it effectively solves the path planning problem of unmanned ground vehicles (UGVs) in complex dynamic environments through a bi-directional search strategy and a modified potential field function. Wu et al. [16] used a decentralized architecture in which complex Laplacian matrices describe local interactions between robots to achieve formation control; each robot adjusts its position and speed based on interactions with neighboring robots without requiring global information about the entire system. Pianpak et al. [17] proposed a decentralized multi-agent path planning solver called ros-dmapf. This solver combines several MAPF sub-solvers that not only solve their respective sub-problems but also solve the overall MAPF problem by working together. Levin et al. [18] proposed an intersection control method based on a conflict point model for optimizing the movement of self-driving vehicles at unsignalized intersections. This approach uses MILP to ensure that vehicles do not collide while crossing intersections and uses a rolling window algorithm to handle large-scale traffic flows.
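To make the notion of a path conflict on a graph concrete, the following minimal Python sketch counts conflicts between timed paths. It is an illustration only and not the conflict definition used by any of the cited solvers: it assumes that each path lists one node per time step, that an agent waits at its goal after arriving, and that both vertex conflicts and swap conflicts are counted. The function names `count_conflicts` and `total_conflicts` are hypothetical.

```python
from itertools import combinations


def count_conflicts(path_a, path_b):
    """Count vertex and swap conflicts between two non-empty timed paths.

    Each path is a list of node ids, one per time step. An agent that has
    arrived is assumed to wait at its goal (an assumption of this sketch).
    """
    horizon = max(len(path_a), len(path_b))
    # Pad the shorter path by repeating its final node.
    a = path_a + [path_a[-1]] * (horizon - len(path_a))
    b = path_b + [path_b[-1]] * (horizon - len(path_b))

    conflicts = 0
    for t in range(horizon):
        if a[t] == b[t]:
            conflicts += 1  # vertex conflict: same node at the same time step
        elif t > 0 and a[t] == b[t - 1] and a[t - 1] == b[t]:
            conflicts += 1  # swap conflict: agents exchange nodes along one edge
    return conflicts


def total_conflicts(paths):
    """Sum pairwise conflicts over all pairs of agents."""
    return sum(count_conflicts(p, q) for p, q in combinations(paths, 2))


paths = [
    ["A", "B", "C"],  # agent 1
    ["C", "B", "A"],  # agent 2 meets agent 1 at node B at t = 1
    ["D", "E", "F"],  # agent 3 never meets the others
]
print(total_conflicts(paths))
```

A planner that aims to reduce conflicts, rather than only resolve them, could use such a count as the quantity to minimize when comparing candidate path combinations.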
In addition, Bai et al. [19] used the CBS method, a well-known centralized approach, to combine centralized task allocation with decentralized path execution. Their two-layer architecture allows the upper layer to resolve the path conflicts in the plans of the underlying robots, which improves the overall efficiency and robustness of the multi-robot system. Conflict-based search [20] is currently one of the most common methods, and many studies build on it, such as K-CBS [21], CBS-MP [22], and db-CBS [23].
In short, many solutions for MAPF have been developed, including both decentralized and centralized approaches. Some of these methods have matured and are applicable in real-world environments, but there is still considerable room for improvement, especially in reducing navigation time. Both types of methods involve obstacle avoidance, which is inherently time-consuming, so their navigation time increases with the number of path conflicts. Effectively resolving path conflicts, reducing navigation time, and improving navigation efficiency are therefore issues that need to be addressed.
Based on the literature review and problem analysis, this study proposes a novel multi-robot navigation architecture, which aims to minimize path conflicts by decentralizing robot paths and ultimately reduce the overall navigation time. The proposed architecture integrates topological maps with deep reinforcement learning to achieve fast and efficient path planning. Furthermore, the proposed method focuses on reducing the number of path conflicts while ensuring computational efficiency and maintaining scalability. The comparison of conflict-based search, distributed reinforcement learning, and the proposed method is shown in Table 1. The proposed method not only combines the advantages of both but also strikes a balance between computational load and system scalability.
Table 1.
Comparison of conflict-based search, distributed reinforcement learning, and the proposed method for multi-robot navigation.
The rest of this article is organized as follows. Section 2 provides an overview of the experimental environment and equipment. Section 3 describes the proposed multi-robot navigation system, including its architecture and process, the construction of the environmental map, the preselected path generation unit, and the implementation of the deep reinforcement learning model. Section 4 presents the experimental results, detailing the performance of the model and data comparisons, followed by simulations conducted in NVIDIA Isaac Sim. Finally, Section 5 concludes this study with a summary of the findings and future work.
2. Preliminary
This section has two parts: (i) environment and (ii) equipment. They are described as follows.
2.1. Environment
To demonstrate the performance of the proposed system, the experimental environment is built in NVIDIA Isaac Sim to closely reproduce a warehouse scene, as shown in Figure 1. Isaac Sim is a highly realistic robot simulator developed by NVIDIA [24]. It uses NVIDIA’s PhysX engine, which provides realistic physical behavior, including collision detection and dynamics simulation, and it supports robot development, testing, and management in a physically accurate environment. It has three main features: (i) simulated control and navigation, (ii) modular graphical interface design, and (iii) a GPU-accelerated physics engine. In addition to letting developers conveniently test robot motions and behaviors in a virtual environment, the use of GPUs for physics simulation and rendering provides a high-performance simulation environment that accelerates training and testing. Moreover, the environment supports ROS/ROS2, allowing developers to run existing ROS programs and modules directly in the simulation environment and to use the extensive Isaac SDK API for custom development. NVIDIA Isaac Sim also integrates various libraries, such as NVIDIA Isaac Gym [25], OpenAI Gym [26], and Stable Baselines 3 [27], to form a powerful framework that is well suited to the development and training of robotic systems based on deep reinforcement learning [28,29].
Figure 1.
The warehousing environment used in this study in Isaac Sim. (a) The warehouse has a total of 24 neatly arranged shelves. (b) Interior view of the warehouse shown in (a).
Meanwhile, Isaac Sim supports a variety of file formats, including the USD format developed by Pixar, which facilitates efficient scene management and multi-user collaboration; the URDF format, which describes the robot’s structure and kinematics in detail and ensures good compatibility with ROS/ROS2; and the MJCF format, which describes the robot’s kinematics in detail and is suitable for high-precision dynamic simulation. Support for these formats enables Isaac Sim to efficiently describe and manage complex 3D scenes and robot models, facilitating interoperability between different systems. In summary, NVIDIA Isaac Sim is a powerful and flexible tool that provides users with a wealth of resources and support for developing, testing, and optimizing systems in virtual environments.
2.2. Equipment
For the robot, this study uses the autonomous mobile robot iw.hub, shown in Figure 2, developed by Idealworks GmbH, a subsidiary of the BMW Group based in Munich, Germany. Its specifications are shown in Table 2. The main features of the iw.hub include a load capacity of up to 1000 kg and a maximum speed of 2.2 m/s, which makes it one of the fastest robots of its kind. It is used extensively in BMW factories as well as in various other industries, and it meets the scenario and task requirements of this study.
Figure 2.
The mobile robot iw.hub used in NVIDIA Isaac Sim.
Table 2.
Specification of the mobile robot iw.hub.
4. Simulation Results and Discussion
This study designed two sets of experiments to evaluate the performance of the proposed system. The experiments are performed in NVIDIA Isaac Sim to obtain results that closely resemble real-world scenarios. The system evaluation experiment mainly assesses the system’s ability to reduce path conflicts and its operating efficiency under different map sizes and numbers of robots, and it demonstrates the significant impact of reducing path conflicts. The feasibility assessment experiment mainly verifies the practical feasibility of the proposed system.
4.1. System Evaluation Experiment
This study designed nine simulation conditions, as shown in Table 5. The conditions differ in map size and number of robots and are labeled E1(13) to E3(338). The map size is divided into three scenarios, small, medium, and large, corresponding to node arrangements of 5 × 5, 10 × 10, and 15 × 15. The map node diagram is shown in Figure 8. The distances between adjacent nodes along the horizontal and vertical axes are set to three and six time steps, respectively. The number of robots is chosen with realistic map sizes and fleet sizes in mind, so that the map is neither too crowded nor short of robots for the transportation demand. The number of robots is therefore set in proportion to each scenario, at 50%, 100%, and 150% of the corresponding number of nodes. These settings help simulate the dynamic characteristics of a real environment.
Table 5.
Nine simulation conditions.
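The nine conditions can be enumerated with a short Python sketch. The rounding rule (round half up) is an assumption inferred from the labels E1(13) and E3(338); the group label "E2" for the 10 × 10 map and the labels of the intermediate conditions are also assumptions, since only the first and last labels are given in the text.

```python
import math

# Map side length per scenario. The "E2" label is assumed by analogy.
MAP_SIDES = {"E1": 5, "E2": 10, "E3": 15}
ROBOT_RATIOS = (0.5, 1.0, 1.5)  # 50%, 100%, and 150% of the node count


def robot_count(nodes, ratio):
    """Round half up so that 12.5 -> 13 and 337.5 -> 338."""
    return math.floor(nodes * ratio + 0.5)


def simulation_conditions():
    """Return (label, node count, robot count) for all nine conditions."""
    conditions = []
    for label, side in MAP_SIDES.items():
        nodes = side * side
        for ratio in ROBOT_RATIOS:
            robots = robot_count(nodes, ratio)
            conditions.append((f"{label}({robots})", nodes, robots))
    return conditions


conditions = simulation_conditions()
for label, nodes, robots in conditions:
    print(f"{label:>9}: {nodes:3d} nodes, {robots:3d} robots")
```

Note that Python's built-in `round` uses round-half-to-even and would give 12 for 12.5, which is why the sketch rounds explicitly.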
Figure 8.
Map node diagram. (a) A 5 × 5 map. (b) A 10 × 10 map. (c) A 15 × 15 map.
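As an illustration of path search on such a node map, the sketch below builds an n × n grid graph with horizontal edges of three time steps and vertical edges of six time steps and runs a plain A* search on it. It is a minimal example under stated assumptions, not the preselected path generation unit of this study: it returns a single shortest path, assumes every neighboring pair of nodes is connected, and uses hypothetical names (`a_star`, `neighbors`, `heuristic`).

```python
import heapq

H_COST, V_COST = 3, 6  # time steps per horizontal / vertical edge


def neighbors(node, side):
    """Yield (neighbor, edge cost) pairs on a side x side grid of nodes."""
    col, row = node
    for dc, dr, cost in ((1, 0, H_COST), (-1, 0, H_COST),
                         (0, 1, V_COST), (0, -1, V_COST)):
        nc, nr = col + dc, row + dr
        if 0 <= nc < side and 0 <= nr < side:
            yield (nc, nr), cost


def heuristic(node, goal):
    """Weighted Manhattan distance; never overestimates on this grid."""
    return H_COST * abs(node[0] - goal[0]) + V_COST * abs(node[1] - goal[1])


def a_star(start, goal, side):
    """Return (path, cost in time steps) from start to goal."""
    open_heap = [(heuristic(start, goal), 0, start)]
    best_cost = {start: 0}
    parent = {start: None}
    while open_heap:
        _, cost, node = heapq.heappop(open_heap)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1], cost
        if cost > best_cost[node]:
            continue  # stale heap entry, a cheaper route was found already
        for nxt, step in neighbors(node, side):
            new_cost = cost + step
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                parent[nxt] = node
                heapq.heappush(
                    open_heap, (new_cost + heuristic(nxt, goal), new_cost, nxt)
                )
    return None, float("inf")


path, cost = a_star((0, 0), (4, 4), side=5)
print(cost, path)
```

On the 5 × 5 map, a corner-to-corner route needs four horizontal and four vertical moves, so its cost is 4 × 3 + 4 × 6 = 36 time steps.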
The numbers of path conflicts over 1000 simulation runs in the nine simulation conditions are shown in Figure 9. E1(13) and E3(338) have about 15,451 and 2,139,331 path conflicts, respectively, in 1000 plannings. As the complexity of the environment and the number of robots increase, the number of path conflicts increases significantly. This highlights the importance of the concept proposed in this study, namely reducing the number of path conflicts.
Figure 9.
Numbers of path conflicts in nine simulation conditions. (a) Experimental results in a 5 × 5 map. (b) Experimental results in a 10 × 10 map. (c) Experimental results in a 15 × 15 map.
Figure 10 and Table 6 describe the number of path conflicts and the improvement achieved in the nine experimental conditions. The deep reinforcement learning model reduces the number of path conflicts for E1(13) and E3(338) by about 9.84% and 2.82%, respectively. If a robot control system takes about 3 s to resolve one path conflict, the navigation time saved for E1(13) and E3(338) is about 1.27 h and 50.31 h, respectively. The reduction ratio is lower in the denser conditions mainly because the increase in robot density makes it difficult for the model to select a better combination of paths. The preselected paths used in this study are also relatively simple, and a more complete set of preselected paths may further improve the model’s capability.
Figure 10.
Ratio of reduced path conflicts in nine simulation conditions. (a) Experimental results in a 5 × 5 map. (b) Experimental results in a 10 × 10 map. (c) Experimental results in a 15 × 15 map.
Table 6.
Statistics table on the number of path conflicts and improvement effects in nine experimental conditions.
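The time savings quoted above follow from a simple conversion, sketched here in Python. The 3 s resolution time is the assumption stated in the text. The sketch further assumes that the maximum reported reduction of 60,375 conflicts belongs to E3(338) and that the E1(13) reduction is 9.84% of about 15,451 conflicts.

```python
SECONDS_PER_CONFLICT = 3.0  # assumed time to resolve one path conflict


def hours_saved(conflicts_avoided, seconds_per_conflict=SECONDS_PER_CONFLICT):
    """Convert a number of avoided conflicts into saved navigation hours."""
    return conflicts_avoided * seconds_per_conflict / 3600.0


# E1(13): about 9.84% of roughly 15,451 conflicts are avoided.
e1_avoided = 15_451 * 0.0984
# E3(338): assumed to be the maximum reported reduction of 60,375 conflicts.
e3_avoided = 60_375

print(f"E1(13):  {hours_saved(e1_avoided):.2f} h")
print(f"E3(338): {hours_saved(e3_avoided):.2f} h")
```

The saving scales linearly with the assumed resolution time, so a control system that needs 6 s per conflict would double both figures.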
Regarding resource utilization, Figure 11 shows the system response time, with the vertical axis giving the time in milliseconds. As the map size and the number of robots increase, the system response time also increases. However, the inference time of the model is very short compared to the inference time of the overall system. For E3(338), the model inference takes only 0.19 ms, whereas the inference of the whole system takes 38.50 ms. Most of the system time is therefore spent on other functions, such as preselected path generation and path management. Figure 12 shows this more clearly: as the complexity of the environment and the number of robots rise, the inference time of the system becomes much higher than that of the model. If further performance optimization is needed in the future, these other functions should be improved first. On the other hand, converting the model and system inference times under the two simulation conditions E1(13) and E3(338) gives an operation efficiency of about 1538.46 FPS and 20.58 FPS, respectively. Therefore, if paths need to be changed immediately or even planned dynamically, the current efficiency of the system is sufficient, which confirms the feasibility of the system proposed in this study.
Figure 11.
Results of the resource utilization for system response time. (a) Experimental results in a 5 × 5 map. (b) Experimental results in a 10 × 10 map. (c) Experimental results in a 15 × 15 map.
Figure 12.
Results of the resource utilization for processing time ratio comparison. (a) Experimental results in a 5 × 5 map. (b) Experimental results in a 10 × 10 map. (c) Experimental results in a 15 × 15 map.
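The processing time ratio can be illustrated with the reported E3(338) timings. This Python sketch assumes that the 38.50 ms system time includes the 0.19 ms model inference; the helper names `model_share` and `period_ms` are hypothetical.

```python
def model_share(model_ms, system_ms):
    """Fraction of the total planning time spent in model inference."""
    return model_ms / system_ms


def period_ms(fps):
    """Time per planning cycle in milliseconds for a given rate in FPS."""
    return 1000.0 / fps


share = model_share(model_ms=0.19, system_ms=38.50)
print(f"Model share of system time for E3(338): {share:.2%}")
print(f"Cycle time at 1538.46 FPS: {period_ms(1538.46):.2f} ms")
```

Under this assumption the model accounts for roughly half a percent of the planning time in the largest condition, which is consistent with the observation that the remaining functions are the better optimization target.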
Regarding model training, Figure 13 shows the results, where the horizontal axis represents the experimental conditions and the vertical axes represent training time and FPS, respectively. In a small environment, a stable model can be obtained in about 0.48 h, with training running at about 1020 FPS. In the large environment E3(338), with 225 nodes and 338 robots, it takes about 28.58 h of training to obtain a stable model, with training running at about 101 FPS.
Figure 13.
Model training time and FPS under each experimental condition. (a) Experimental results in a 5 × 5 map. (b) Experimental results in a 10 × 10 map. (c) Experimental results in a 15 × 15 map.
In summary, the proposed system performs well both in reducing the number of path conflicts and in efficiency. In particular, it can re-plan the paths of all robots in a very short time, and the planned paths are close to the globally best results. Most importantly, the system offers a new way to handle path conflicts: it reduces the number of conflicts in advance instead of relying only on obstacle avoidance to resolve them.
4.2. Feasibility Assessment Experiment
As shown in Figure 14, a simulation scenario with 24 neatly arranged shelves and 10 Idealworks iw.hub robots is designed to verify whether the proposed system can smoothly guide 10 robots in a warehouse at the same time. The local path planning algorithm reciprocal velocity obstacles (RVO) [31] is used so that the robots avoid each other at path conflict points. In this scenario, the start point and end point of each robot are randomly generated; the start point state is shown in Figure 14a and the end point state in Figure 14b. The state definition for the system evaluation and the evaluation results are shown in Table 7. A navigation completed without any collision is considered a success, a navigation with a collision that still completes the mission is considered a danger, and a navigation with a collision that does not complete the mission is considered a failure; the number of collisions is accumulated for each collision. In a total of fifty experiments, forty-nine were successful and one was stopped, giving a success rate of 98%. The single failure was likely caused by robots colliding with field equipment during path congestion, which stopped the robots. In the successful runs, all robots move effectively from their start points to their end points. One of the simulation videos can be viewed at https://youtu.be/oPmJAwGHg70, accessed on 21 August 2024.
Figure 14.
One simulation scenario of ten robots in an environment with twenty-four shelves. (a) Start points of 10 robots. (b) End points of 10 robots.
Table 7.
State definition for the system evaluation and evaluation of the simulation results of 10 robots in an environment with 24 shelves.
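The state definition can be expressed as a small classification rule, sketched below in Python. The treatment of a run that neither collides nor completes is not specified in the text; this sketch counts it as a failure, which is an assumption, as is the representation of each run as a `(collided, completed)` pair.

```python
from collections import Counter


def classify_run(collided, completed):
    """Map one navigation run to 'success', 'danger', or 'failure'."""
    if completed and not collided:
        return "success"  # mission completed without any collision
    if completed and collided:
        return "danger"   # collision occurred, mission still completed
    return "failure"      # mission not completed (assumed for all such runs)


# Reported outcome: 49 successful runs and 1 stopped run out of 50.
runs = [(False, True)] * 49 + [(True, False)]
counts = Counter(classify_run(collided, completed) for collided, completed in runs)
success_rate = counts["success"] / len(runs)

print(dict(counts))
print(f"Success rate: {success_rate:.0%}")
```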
5. Conclusions
This study designs a multi-robot navigation system and proposes a path planning and selection method based on deep reinforcement learning that plans paths with fewer path conflicts for all robots. The proposed method has two advantages. First, it uses topological maps to simplify the map representation for multi-robot path planning. Since the number of nodes in the topological map is small, the implemented system does not occupy many computing resources, which enables the method to plan paths for more robots in more complex environments. Second, it employs the A* algorithm combined with other policies for path generation and uses the PPO algorithm for path selection, which effectively reduces path conflicts in multi-robot navigation. Experiments using three map sizes and different numbers of robots show that the proposed method has good real-time performance and computational efficiency, ranging from a minimum of 20.58 FPS to a peak of approximately 1538.46 FPS. The method also significantly reduces the number of path conflicts: in 1000 multi-robot navigation tasks, the number of path conflicts is reduced by up to 60,375, and the reduction ratio reaches up to 9.84%. The fewer path conflicts there are between robots, the less navigation time the robots need to complete all tasks. For busy warehouses, a method that can save a substantial amount of overall navigation time is very important. From the perspective of multi-robot global path planning, the proposed method is a first-stage contribution. For local path planning on the robot, the existing RVO algorithm is used to demonstrate the effectiveness of the proposed method in global path planning.
In future work, we will also try to use DRL methods to design a more effective local path planning method so that the robots can more effectively avoid each other at the path conflict points and avoid obstacles dynamically on the planned path.
Author Contributions
Conceptualization, C.-C.W. and K.-D.W.; methodology, K.-D.W. and B.-Y.Y.; validation, C.-C.W.; analysis and investigation, K.-D.W. and B.-Y.Y.; writing—original draft preparation, K.-D.W. and B.-Y.Y.; writing—review and editing, C.-C.W.; visualization, K.-D.W. and B.-Y.Y.; project administration, C.-C.W.; and funding acquisition, C.-C.W. All authors have read and agreed to the published version of the manuscript.
Funding
This research was partly supported by the National Science and Technology Council (NSTC) of Taiwan, R.O.C. under grant number NSTC 112-2221-E-032-035-MY2.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
All data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Zhu, K.; Zhang, T. Deep reinforcement learning based mobile robot navigation: A review. Tsinghua Sci. Technol. 2021, 26, 674–691.
- Verma, J.K.; Ranga, V. Multi-robot coordination analysis, taxonomy, challenges and future scope. J. Intell. Robot. Syst. 2021, 102, 10.
- Seenu, N.; RM, K.C.; Ramya, M.M.; Janardhanan, M.N. Review on state-of-the-art dynamic task allocation strategies for multiple-robot systems. Ind. Robot Int. J. Robot. Res. Appl. 2020, 47, 929–942.
- Wang, S.; Wang, Y.; Li, D.; Zhao, Q. Distributed relative localization algorithms for multi-robot networks: A survey. Sensors 2023, 23, 2399.
- Madridano, Á.; Al-Kaff, A.; Martín, D.; De La Escalera, A. Trajectory planning for multi-robot systems: Methods and applications. Expert Syst. Appl. 2021, 173, 114660.
- Fan, T.; Long, P.; Liu, W.; Pan, J. Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios. Int. J. Robot. Res. 2020, 39, 856–892.
- Olcay, E.; Schuhmann, F.; Lohmann, B. Collective navigation of a multi-robot system in an unknown environment. Robot. Auton. Syst. 2020, 132, 103604.
- Expert Market Research. Logistics Market Report and Forecast 2024–2032. Available online: https://www.expertmarketresearch.com/reports/logistics-market (accessed on 30 June 2024).
- Quinlivan, J. How Amazon Deploys Robots in Its Operations Facilities. About Amazon 2023. Available online: https://www.aboutamazon.com/news/operations/how-amazon-deploys-robots-in-its-operations-facilities (accessed on 30 June 2024).
- Gao, Z.; Jiao, Y.; Yang, W.; Li, X.; Wang, Y. A Method for UWB Localization Based on CNN-SVM and Hybrid Locating Algorithm. Information 2023, 14, 46.
- Zhang, H.; Zhou, X.; Zhong, H.; Xie, H.; He, W.; Tan, X.; Wang, Y. A dynamic window-based UWB-odometer fusion approach for indoor positioning. IEEE Sens. J. 2022, 23, 2922–2931.
- Chatzisavvas, A.; Chatzitoulousis, P.; Ziouzios, D.; Dasygenis, M. A routing and task-allocation algorithm for robotic groups in warehouse environments. Information 2022, 13, 288.
- Stern, R. Multi-agent path finding – An overview. In Artificial Intelligence; Tutorial Lectures; Osipov, G., Panov, A., Yakovlev, K., Eds.; Springer: Cham, Switzerland, 2019; pp. 96–115.
- Warita, S.; Fujita, K. Online planning for autonomous mobile robots with different objectives in warehouse commissioning task. Information 2024, 15, 130.
- Meng, X.; Fang, X. A UGV Path Planning Algorithm Based on Improved A* with Improved Artificial Potential Field. Electronics 2024, 13, 972.
- Wu, X.; Wu, R.; Zhang, Y.; Peng, J. Distributed Formation Control of Multi-Robot Systems with Path Navigation via Complex Laplacian. Entropy 2023, 25, 1536.
- Pianpak, P.; Son, T.C.; Toups Dugas, P.O.; Yeoh, W. A distributed solver for multi-agent path finding problems. In Proceedings of the First International Conference on Distributed Artificial Intelligence, Beijing, China, 13–15 October 2019; pp. 1–7.
- Levin, M.W.; Rey, D. Conflict-point formulation of intersection control for autonomous vehicles. Transp. Res. Part C Emerg. Technol. 2017, 85, 528–547.
- Motes, J.; Sandström, R.; Lee, H.; Thomas, S.; Amato, N.M. Multi-robot task and motion planning with subtask dependencies. IEEE Robot. Autom. Lett. 2020, 5, 3338–3345.
- Sharon, G.; Stern, R.; Felner, A.; Sturtevant, N.R. Conflict-Based Search for Optimal Multi-Agent Pathfinding. Artif. Intell. 2015, 219, 40–66.
- Kottinger, J.; Almagor, S.; Lahijanian, M. Conflict-based search for multi-robot motion planning with kinodynamic constraints. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; pp. 13494–13499.
- Solis, I.; Motes, J.; Sandström, R.; Amato, N.M. Representation-optimal multi-robot motion planning using conflict-based search. IEEE Robot. Autom. Lett. 2021, 6, 4608–4615.
- Moldagalieva, A.; Ortiz-Haro, J.; Toussaint, M.; Hönig, W. db-CBS: Discontinuity-Bounded Conflict-Based Search for Multi-Robot Kinodynamic Motion Planning. arXiv 2023, arXiv:2309.16445.
- NVIDIA Isaac Sim. Available online: https://developer.nvidia.com/isaac-sim (accessed on 9 July 2024).
- Makoviychuk, V.; Wawrzyniak, L.; Guo, Y.; Lu, M.; Storey, K.; Macklin, M.; State, G. Isaac gym: High performance GPU-based physics simulation for robot learning. arXiv 2021, arXiv:2108.10470.
- OpenAI Gym. Available online: https://www.gymlibrary.dev/index.html (accessed on 9 July 2024).
- Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-baselines3: Reliable reinforcement learning implementations. J. Mach. Learn. Res. 2021, 22, 1–8.
- Rojas, M.; Hermosilla, G.; Yunge, D.; Farias, G. An Easy to Use Deep Reinforcement Learning Library for AI Mobile Robots in Isaac Sim. Appl. Sci. 2022, 12, 8429.
- Zhou, Z.; Song, J.; Xie, X.; Shu, Z.; Ma, L.; Liu, D.; See, S. Towards building AI-CPS with NVIDIA Isaac Sim: An industrial benchmark and case study for robotics manipulation. In Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice, Lisbon, Portugal, 14–20 April 2024; pp. 263–274.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
- Van den Berg, J.; Lin, M.; Manocha, D. Reciprocal velocity obstacles for real-time multi-agent navigation. In Proceedings of the 2008 IEEE International Conference on Robotics and Automation (ICRA), Pasadena, CA, USA, 19–23 May 2008; pp. 1928–1935.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).













