#### *2.2. Deep Learning*

In recent years, as deep neural networks have shown great potential for solving complex estimation problems, a rapidly growing trend of applying DL techniques to robotics tasks has emerged. For navigation based on visual information, Chen et al. [17] used deep neural networks to recognize objects in the image and then determined discrete actions, such as turning left or turning right, according to the identified information. Gao et al. [18] proposed a novel deep neural network architecture, Intention-Net, which tracks a planned path by mapping monocular images to four discrete actions such as going forward. Since the actions are discrete, the navigation behaviors in the above-mentioned studies are coarse and likely to be infeasible. For navigation based on range information, Pfeiffer et al. [19] used demonstration data to train a DL model that controls the robot from laser sensor readings and the target position. However, since a large amount of labeled data is required to train these DL models, they may be impractical for real-world applications.
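To make the supervised setting of these works concrete, the following minimal sketch (in PyTorch) trains a small classifier that maps a monocular image to one of a few discrete navigation actions from labeled demonstrations; the network shape, action set, and batch are illustrative assumptions of ours and do not reproduce any cited architecture.

```python
import torch
import torch.nn as nn

# Illustrative discrete-action navigation classifier (not any cited architecture).
ACTIONS = ["forward", "turn_left", "turn_right", "stop"]  # hypothetical action set

class DiscreteActionNet(nn.Module):
    def __init__(self, num_actions=len(ACTIONS)):
        super().__init__()
        # Small CNN encoder over monocular RGB frames (assumed 3x84x84 input).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Linear(32 * 9 * 9, num_actions)

    def forward(self, image):
        return self.head(self.encoder(image))

# Supervised training step on labeled demonstrations (image, expert action index).
net = DiscreteActionNet()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(8, 3, 84, 84)                     # stand-in for a labeled batch
expert_actions = torch.randint(0, len(ACTIONS), (8,))  # stand-in expert labels
loss = loss_fn(net(images), expert_actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# At run time the robot would execute the most probable discrete action.
action = ACTIONS[net(images[:1]).argmax(dim=1).item()]
```

The need for the labeled `expert_actions` in every batch is exactly the dependence on large amounts of annotated data noted above.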

#### *2.3. Deep Reinforcement Learning*

Autonomous navigation demands two essential building blocks: perception and control. Similar to the above-mentioned DL studies, many works [20] applying DL to navigation address pure perception, in which agents passively receive observations and infer the desired information. Compared with pure perception, control goes one step further, seeking to interact actively with the environment by executing actions [21]. Navigation then becomes a sequential decision problem, for which DRL is well suited, as demonstrated in games [7] and in robotics [8]. Since DRL methods handle fully observable states better than partially observable ones, many studies [9,10,22] applying DRL to navigation focus on static environments. Regarding dynamic environments, Chen et al. [23] proposed a time-efficient DRL-based approach for socially aware navigation. However, to calculate the reward based on social norms, the motion information of pedestrians must be known in advance, which is not reliable in real-world applications, since estimating such information from sensor readings is imprecise.
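To illustrate navigation as a sequential decision problem, the minimal sketch below shows one step of the agent-environment loop with a partially observed state built from laser readings and the relative goal; the state layout, the placeholder policy, and the reward shaping (progress reward, collision penalty) are our own simplified assumptions, not the formulation of any cited work.

```python
import math
import random

# Illustrative sequential-decision loop for navigation; the state, action set,
# and reward shaping here are simplified assumptions, not a cited method.

def observe(robot, goal, laser_ranges):
    """Partial observation: raw laser returns plus the goal in the robot frame."""
    dx, dy = goal[0] - robot["x"], goal[1] - robot["y"]
    return laser_ranges + [math.hypot(dx, dy), math.atan2(dy, dx) - robot["yaw"]]

def reward(prev_dist, dist, collided):
    if collided:
        return -10.0                 # collision penalty
    return 2.0 * (prev_dist - dist)  # reward progress toward the goal

def random_policy(observation):
    """Placeholder policy; a DRL agent would map the observation to velocities."""
    return {"v": random.uniform(0.0, 0.5), "w": random.uniform(-1.0, 1.0)}

# One step of the interaction loop (environment dynamics are stubbed out).
robot = {"x": 0.0, "y": 0.0, "yaw": 0.0}
goal = (4.0, 3.0)
laser = [2.5] * 16                   # stand-in for 16 laser beams
obs = observe(robot, goal, laser)
action = random_policy(obs)
prev_dist = math.hypot(goal[0] - robot["x"], goal[1] - robot["y"])
robot["x"] += action["v"] * math.cos(robot["yaw"])   # simplified kinematics
robot["yaw"] += action["w"] * 0.1
dist = math.hypot(goal[0] - robot["x"], goal[1] - robot["y"])
r = reward(prev_dist, dist, collided=min(laser) < 0.2)
```

In a dynamic environment the laser returns change as pedestrians move, so the observation above remains only a partial, noisy view of the true state, which is the difficulty highlighted for the socially aware approach in [23].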

Given that it is difficult for a single DRL method to solve the navigation problem, hierarchical approaches have been widely studied in the literature [24–26]. Lei et al. [26] combined A\* with least-squares policy iteration for mobile robot navigation in complex environments. Aleksandra et al. [25] integrated sampling-based path planning with reinforcement learning (RL) agents for indoor navigation and aerial cargo delivery. These methods can only handle navigation in static environments. Kato et al. [24] combined value-based DRL with A\* on topological maps to address navigation in environments with pedestrians. However, the experimental results do not carry over well to dynamic environments, since the learned policy is reactive and taking raw observations as states is inaccurate, which may lead to irrational behaviors and even collisions with obstacles.

Compared with the above hierarchical methods, our hierarchical navigation algorithm generates better waypoints via JPS+ (P) and navigates in dynamic environments via IA3C.
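The hierarchical structure shared by these methods, and by ours, can be summarized in a short sketch: a global planner supplies waypoints and a learned local controller tracks them among moving obstacles. In the sketch below, `plan_waypoints` and `local_policy` are hypothetical placeholders standing in for JPS+ (P) and the IA3C policy detailed in Section 3; the interfaces and kinematics are simplified assumptions, not the actual implementation.

```python
import math

def plan_waypoints(start, goal):
    """Placeholder for the global planner (e.g., JPS+ (P) on a grid map)."""
    return [start, ((start[0] + goal[0]) / 2, (start[1] + goal[1]) / 2), goal]

def local_policy(observation, waypoint):
    """Placeholder for the learned local controller (e.g., an IA3C policy)."""
    dx, dy = waypoint[0] - observation["x"], waypoint[1] - observation["y"]
    return {"v": 0.3, "w": math.atan2(dy, dx) - observation["yaw"]}

def navigate(start, goal, step, reached=0.2, max_steps=500):
    """Track each global waypoint in turn with the local controller."""
    robot = {"x": start[0], "y": start[1], "yaw": 0.0}
    for waypoint in plan_waypoints(start, goal)[1:]:
        for _ in range(max_steps):
            if math.hypot(waypoint[0] - robot["x"], waypoint[1] - robot["y"]) < reached:
                break                # waypoint reached, move on to the next one
            robot = step(robot, local_policy(robot, waypoint))
    return robot

# `step` would be the real robot (or a simulator); here a trivial kinematic stub.
final = navigate((0.0, 0.0), (5.0, 5.0),
                 step=lambda r, a: {"x": r["x"] + a["v"] * math.cos(r["yaw"] + a["w"]),
                                    "y": r["y"] + a["v"] * math.sin(r["yaw"] + a["w"]),
                                    "yaw": r["yaw"] + a["w"]})
```

The division of labor is the point of the sketch: the global layer reasons over the static map once, while the local layer reacts to moving obstacles at every control step.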

## **3. The Methodology**

In this section, we first briefly introduce the problem and model the virtual tracked robot. Next, we present the architecture of the proposed navigation method. Finally, we describe the global path planner based on JPS+ (P) and the local motion controller based on IA3C in detail.
