*4.4. Performance on Large-Scale Navigation Tasks*

Finally, we evaluate the performance of JPS-IA3C on large-scale navigation tasks. JPS-A3C+ replaces IA3C with A3C+ in the proposed method. As shown in Figure 9, JPS-IA3C can navigate the nonholonomic robot with continuous control by dealing with both complex static obstacles and moving obstacles. Furthermore, navigation policies learned by JPS-IA3C have the notable generalization ability and can handle unseen tested environments that are almost three orders of magnitude larger than the training environment (Figure 6). We do not show trajectories generated by JPS-A3C+ for clear representation.

(**a**) Environment 1—722 by 722

(**b**) Environment 2—624 by 624 (**c**) Environment 3—1084 by 1084

**Figure 9.** (**a**–**c**) Given the start position and the goal position, the purple line is the path planned by the Jump Point Search+ (Prune) (JPS+ (P)), and the blue line is the trajectory generated by JPS-IA3C. Black regions are unpassable, while light white regions are traversable for robots. Yellow regions close to obstacles denote warning areas. Moving obstacles appear in pink square areas, where there are 30 random moving obstacles.

In Figure 10a, JPS-IA3C can achieve success rates of more than 94% in all of the tested environments and even 99.8% in environment 3, while the success rates of JPS-A3C+ are below 42% in all of the evaluations and even 0% in environment 2. Guided by the same subgoals, JPS-A3C+ only considers current observations and ignores hidden states (i.e., moving obstacles' velocities and

changes of map topology) from history observations, leading to collisions with complex static obstacles or moving obstacles. However, JPS-IA3C can use the LSTM-based network to construct accurate belief states by storing historical information over a certain length, which can handle environmental changes. In Figure 10b,c, compared with JPS-A3C+, JPS-IA3C can navigate robots in large-scale environments with shorter path lengths and less execution time, which concludes that constructing the LSTM-based network architecture can greatly improve the performance of the proposed algorithm by endowing the robot with certain memory.

**Figure 10.** We generate 500 navigation tasks in each tested environment (in sum 1500) whose starting and goal points are randomly chosen. The distance between the initial starting and goal position is greater than half of the width of the environment. (**a**) Success rate denotes the probability of successfully finishing tasks. (**b**) Path length denotes the traveling length between the starting and goal position. (**c**) Execution time denotes the total execution time for the motion controller.
