#### *4.2. Evaluation of Global Path Planning*

In Figure 7a, the subgoals planned by A\* are not placed at the exits of local minima because A\* ignores the map topology; for example, it misses the local minimum caused by the concave canyon between subgoal 1 and subgoal 2. In Figure 7b, A\* produces more than twice as many subgoals as JPS-IA3C. Although the subgoals generated by A\* cover all the exits of local minima, they also include many unnecessary ones, which hinder the motion controller's ability to avoid moving obstacles. In conclusion, the subgoals planned by JPS-IA3C are essential waypoints that help the motion controller handle the problem of local minima while allowing its capabilities to be fully exercised.

(**a**) map 1 (50 × 50) (**b**) map 2 (50 × 50)

**Figure 7.** (**a**,**b**) Orange lines denote paths planned by A\*, and purple lines denote paths generated by the path planner in JPS-IA3C. Small red circles denote subgoals planned by JPS-IA3C. Yellow regions denote warning areas. (**a**) A\* finds subgoals (green crosses) by sampling the optimal path at fixed intervals. (**b**) A\* plans subgoals (green crosses) by taking the inflection points of the optimal path.

To reduce the effect of randomness, A\* and JPS-IA3C are each tested 100 times on large-scale maps (Figure 8), with the inflection points of the optimal paths taken as the subgoals of A\*. The first-move lag of A\* includes the time for planning the optimal path and then finding subgoals by checking inflection points, whereas JPS-IA3C finds subgoals directly, without generating an optimal path. In Table 4, the first-move lags of A\* are 271, 1309, and 860 times those of JPS-IA3C in the three test environments. Clearly, compared with A\*, JPS-IA3C generates subgoals far more efficiently thanks to the cached map-topology information and the canonical ordering of the search in JPS+ (P). Moreover, the first-move lag of JPS-IA3C stays below 1 millisecond even in large-scale environments, which shows that the proposed approach can plan subgoals at a low time cost and thus provide users with a responsive experience.
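For illustration, the following is a minimal Python sketch of the inflection-point method used for the A\* baseline and of how its first-move lag can be timed. The names `astar_plan`, `grid`, `start_cell`, and `goal_cell` are hypothetical placeholders, not functions from the paper.

```python
import time

def extract_inflection_subgoals(path):
    """Return the inflection points of a grid path as subgoals.

    `path` is a list of (x, y) cells; a cell is an inflection point
    when the movement direction changes there.
    """
    if len(path) < 3:
        return list(path)
    subgoals = []
    for prev, cur, nxt in zip(path, path[1:], path[2:]):
        dir_in = (cur[0] - prev[0], cur[1] - prev[1])
        dir_out = (nxt[0] - cur[0], nxt[1] - cur[1])
        if dir_in != dir_out:      # direction changes -> inflection point
            subgoals.append(cur)
    subgoals.append(path[-1])      # the goal itself is the final subgoal
    return subgoals

# First-move lag of the A* baseline: plan the full optimal path first,
# then scan it for inflection points.
t0 = time.perf_counter()
optimal_path = astar_plan(grid, start_cell, goal_cell)  # hypothetical planner
subgoals = extract_inflection_subgoals(optimal_path)
first_move_lag = time.perf_counter() - t0
```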



The first-move lag denotes the planning time the robot spends before deciding on its first move (i.e., the time spent finding subgoals).

**Figure 8.** Performance of IA3C and A3C+ in the training phase: (**a**) mean V-value; (**b**) success rate.

#### *4.3. Evaluation of Local Motion Controlling*

To evaluate the LSTM-based network, we compare IA3C with A3C+, which replaces the LSTM layer in IA3C's network with a fully connected (FC) layer during training. State values denote the outputs of the critic network, which reflect the quality of the learned policies [7]. Success rates are the proportion of training tasks accomplished over the last 1000 episodes.
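As a concrete illustration of the second metric, below is a minimal sketch of a success-rate tracker over the most recent 1000 episodes; the window size follows the definition above, while the class itself is an assumption.

```python
from collections import deque

class SuccessRateTracker:
    """Success rate over the most recent `window` episodes."""

    def __init__(self, window=1000):
        self.outcomes = deque(maxlen=window)  # 1 = task accomplished, 0 = not

    def record(self, success):
        self.outcomes.append(1 if success else 0)

    @property
    def rate(self):
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)
```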

In Figure 8a, during the first 20,000 episodes, A3C+ performs better than IA3C. This indicates that A3C+ can learn reactive policies sufficient for simple tasks, which only require the robot to navigate short distances while avoiding a few moving obstacles, and that LSTM-based networks learn more slowly than FC-based networks. As the learning tasks grow more complex (i.e., beyond episode 20,000), IA3C outperforms A3C+. This shows that the reactive policies learned by A3C+ consider only the current observation, whereas IA3C integrates the current observation with abstracted history information to learn more rational policies. Moreover, modified curriculum learning transfers experience from learned tasks to new, more complex tasks, which makes it especially suitable for accelerating the training of LSTM-based networks. In Figure 8b, the success rates exhibit the same trend as the mean V-value, further supporting this conclusion.
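The architectural difference between the two agents can be made concrete with a short sketch. The paper does not specify layer sizes or the observation encoding, so the PyTorch code below is an assumption-laden illustration: the same actor-critic network is built either with an LSTM core (as in IA3C) or with an FC core substituted in its place (as in A3C+).

```python
import torch
import torch.nn as nn

class ActorCriticNet(nn.Module):
    """Minimal A3C-style network; sizes and encoder are assumptions."""

    def __init__(self, obs_dim, n_actions, hidden=128, use_lstm=True):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.use_lstm = use_lstm
        if use_lstm:
            self.core = nn.LSTMCell(hidden, hidden)   # integrates history
        else:
            self.core = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # actor
        self.value_head = nn.Linear(hidden, 1)           # critic (state value)

    def forward(self, obs, state=None):
        x = self.encoder(obs)
        if self.use_lstm:
            h, c = self.core(x, state)   # recurrent state carries history
            x, state = h, (h, c)
        else:
            x = self.core(x)             # reactive: current observation only
        return self.policy_head(x), self.value_head(x), state
```

With `use_lstm=True`, the recurrent state lets the policy condition on past observations; with `use_lstm=False`, the mapping from observation to action is purely reactive, matching the behavior discussed above.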

To evaluate the effectiveness of the novel reward function framework, we remove either the reward shaping or the modified curriculum learning from it; in both cases, training fails to converge. This shows that the proposed reward function framework alleviates the sparse-reward problem and is essential for learning useful policies.
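For concreteness, the sketch below illustrates the two components that this ablation removes. The paper's exact reward terms and curriculum schedule are not reproduced here; the constants, the distance-based shaping term, and the promotion rule are all assumptions.

```python
def shaped_reward(reached_subgoal, collided, prev_dist, cur_dist,
                  shaping_scale=0.1):
    """Illustrative shaped reward: sparse terms for reaching a subgoal
    or colliding, plus a dense progress term toward the current subgoal."""
    if reached_subgoal:
        return 1.0                  # sparse success reward
    if collided:
        return -1.0                 # sparse collision penalty
    return shaping_scale * (prev_dist - cur_dist)  # reward shaping term

def advance_curriculum(level, recent_success_rate, threshold=0.9):
    """Illustrative curriculum rule: move on to harder tasks (longer
    routes, more moving obstacles) once current tasks are mostly solved,
    so experience from learned tasks transfers to new ones."""
    return level + 1 if recent_success_rate >= threshold else level
```

In this sketch, dropping the shaping term leaves only sparse terminal rewards, and dropping the curriculum starts training directly on the hardest tasks; either change is consistent with the non-convergence reported above.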
