Risk-Aware Deep Reinforcement Learning for Robot Crowd Navigation

Sun, Xueying; Zhang, Qiang; Wei, Yifei; Liu, Mingmin

doi:10.3390/electronics12234744

Open AccessArticle

Risk-Aware Deep Reinforcement Learning for Robot Crowd Navigation

¹

College of Automation, Jiangsu University of Science and Technology, No. 666 Changhui Road, Zhenjiang 212100, China

²

Systems Science Laboratory, Jiangsu University of Science and Technology, No. 666 Changhui Road, Zhenjiang 212100, China

³

Central Research Institute, SIASUN Robot & Automation Co., Ltd., No. 16 Jinhui Street, Shenyang 110168, China

^*

Author to whom correspondence should be addressed.

Electronics 2023, 12(23), 4744; https://doi.org/10.3390/electronics12234744

Submission received: 5 October 2023 / Revised: 10 November 2023 / Accepted: 20 November 2023 / Published: 22 November 2023

(This article belongs to the Special Issue Assistive Robotic Navigation Using Deep Reinforcement)

Download

Browse Figures

Versions Notes

Abstract

:

Ensuring safe and efficient navigation in crowded environments is a critical goal for assistive robots. Recent studies have emphasized the potential of deep reinforcement learning techniques to enhance robots’ navigation capabilities in the presence of crowds. However, current deep reinforcement learning methods often face the challenge of robots freezing as crowd density increases. To address this issue, a novel risk-aware deep reinforcement learning approach is proposed in this paper. The proposed method integrates a risk function to assess the probability of collision between the robot and pedestrians, enabling the robot to proactively prioritize pedestrians with a higher risk of collision. Furthermore, the model dynamically adjusts the fusion strategy of learning-based and risk-aware-based features, thereby improving the robustness of robot navigation. Evaluations were conducted to determine the effectiveness of the proposed method in both low- and high-crowd density settings. The results exhibited remarkable navigation success rates of 98.0% and 93.2% in environments with 10 and 20 pedestrians, respectively. These findings emphasize the robust performance of the proposed method in successfully navigating through crowded spaces. Additionally, the approach achieves navigation times comparable to those of state-of-the-art methods, confirming its efficiency in accomplishing navigation tasks. The generalization capability of the method was also rigorously assessed by subjecting it to testing in crowd environments exceeding the training density. Notably, the proposed method attains an impressive navigation success rate of 90.0% in 25-person environments, surpassing the performance of existing approaches and establishing itself as a state-of-the-art solution. This result highlights the versatility and effectiveness of the proposed method in adapting to various crowd densities and further reinforces its applicability in real-world scenarios.

Keywords:

crowd navigation; deep reinforcement learning; risk-aware

1. Introduction

Assistive robots have dramatically transformed human lives by offering a multitude of services, such as healthcare, cleaning, meal delivery, and transportation, thereby augmenting convenience and comfort [1,2,3,4]. In recent years, there has been a considerable surge in the deployment of assistive robots across various settings such as rehabilitation centers, offices, hotels, and airports [1,2,3,4]. The capability of robots to navigate and evade collisions is of paramount importance, especially in high-density environments. The implementation of effective navigation and collision avoidance strategies is crucial in ensuring agent safety, preventing encounters between robots and humans or other obstacles, and minimizing potential hazards and injuries [5].

Traditional methods face a significant challenge in ensuring collision avoidance while navigating through dense crowds. Crowded environments are characterized by their complex structures and dynamic scenarios, which further exacerbate the navigation process. It becomes imperative for robots to precisely predict and comprehend human intentions for effective navigation. The implementation of appropriate strategies is essential to proactively evading potential collisions.

Current navigating and path planning in real-world applications rely on reactive algorithms such as the social force model (SFM) [6] and the dynamic window approach (DWA) [7]. While these methods perform well in simpler environments, they fall short in dynamic, complex environments where humans and robots coexist and struggle with uncertainties [8]. Recently, researchers have been investigating the amalgamation of planning and learning to improve the real-time decision-making and path planning capabilities of robots. For instance, using imitation learning [9] and reinforcement learning algorithms [10,11,12,13,14,15], robots can learn behavioral strategies through environmental interactions, enabling them to learn from data and adapt to environmental changes. Learning-based approaches have shown promising results in crowd navigation problems, but their effectiveness in obstacle avoidance diminishes as crowd density increases. To tackle this, some studies have suggested using graph-based approaches to represent interactions between agents. Chen et al. [16] introduced a graph attention mechanism, but it requires pre-defined network parameters based on human attention in advance. Liu et al. [17] proposed a spatial–temporal graph attention mechanism, which effectively addresses the complex interaction problem between agents by establishing connections and incorporating pedestrian trajectory prediction, as shown in Figure 1a. In [17], the interaction relationship between pedestrians and the robot is learned using a self-attention mechanism. However, this method requires the robot to have a complete connection with all pedestrians within its field of view, leading to non-influential pedestrians impacting the robot’s actions. In crowded scenes, excessive attention to unimportant individuals and neglecting pedestrians with collision risks can result in the freezing robot problem [18].

However, developing a dense crowd navigation robot system based on deep reinforcement learning presents several challenges. Firstly, while current reinforcement learning-based crowd navigation methods perform well in sparse or medium-density crowds, they struggle in high-density areas such as airports or exhibition halls. In such environments, robots can sometimes halt or oscillate indefinitely, a phenomenon known as the “Freezing Robot” problem. Secondly, when confronted with many pedestrians, directing the robot’s attention towards those with a high risk of collision to enable more reasonable decision making is also a significant challenge.

In order to address these challenges, we propose a reinforcement learning strategy that incorporates a risk-aware interaction graph. This strategy aims to capture the collision risk in robot–human interactions, thereby addressing the freezing robot problem in crowded navigation and improving the generalization of learning models. As shown in Figure 1, we construct a risk-aware interaction graph using attention and risk-awareness mechanisms. This graph establishes connections between the robot and humans within its visual range who pose collision risks, as depicted in Figure 1b. Risk-aware features are derived based on the relative position and velocity between the robot and humans. To obtain spatial features of agent interactions, we combine the features from the learning-based network and the risk-aware network, which can be seen in Figure 1c. These spatial features are then fed into a temporal encoder. Finally, the policy network learns the optimal strategy through training. This approach ensures that the robot can effectively navigate through dense crowds by focusing on those humans with a high risk of collision, thus making more reasonable and safer decisions.

The primary contributions of our work are as follows:

-: We suggest a risk-aware deep reinforcement learning-based method for robot crowd navigation. This approach effectively addresses the freezing robot problem by enhancing the robot’s perception of a small number of humans who pose collision risks. Our experimental results demonstrate that this method achieves state-of-the-art performance in crowded scenario tests.
-: We introduce a feature encoding method based on a collision risk function. This method enables robots to anticipate humans with collision risks in a moving crowd. The collision risk function takes into account factors such as relative speed and relative position in robot–human interactions.
-: We propose an adaptive feature fusion model that integrates the features encoded based on the collision risk function with those encoded using learning techniques. By incorporating prior knowledge of collision risk into the learned attention, our model effectively reduces the risk of robot–human collisions in crowded scenes.

2. Related Works

The objective of crowd navigation is to successfully reach specific goals while avoiding collisions with other agents as well as static or dynamic obstacles. Existing crowd navigation methods can be broadly classified into three categories: reaction-based methods, learning-based methods, and graph-based methods.

Reaction-based robot navigation in dynamic environments often employs methods such as SMF [6], ORCA [19], and DWA [7]. The SMF method is based on the concept of social forces, which model the interactions between agents as forces. Kamezaki et al. [20] introduce a proactive social motion model that enables mobile service robots to navigate safely and socially in crowded and dynamic environments. The core idea of the ORCA model is to coordinate the velocities of agents to avoid collisions. The DWA method samples the possible velocity range for the robot, estimates the robot’s potential trajectories based on velocity, performs collision detection and evaluation on different trajectories, and selects the optimal path based on the evaluation. These aforementioned methods are reactive approaches that have demonstrated excellent performance in industrial settings. However, as assistive robots are increasingly integrated into interactive environments with humans, such as airports and supermarkets, they often encounter crowded scenarios with numerous dynamic humans.

Deep reinforcement learning (DRL) has garnered significant attention in the field of robot navigation due to its remarkable learning capability. Liu et al. [21] employed a deep reinforcement learning framework to address the collision avoidance problem in decentralized multi-agent systems. Their proposed method utilized a policy network that takes into account the relative positions and velocities of agents as inputs, effectively considering the motion features of other agents and generating collision-free velocity vectors. Chen et al. [22] suggested incorporating a pairwise interaction with a self-attention mechanism, jointly modeling human–robot and human–human interactions within the deep reinforcement learning framework. This model provides a more explicit representation of crowd–robot interaction, capturing human-to-human interactions in dense crowds and improving the robot’s navigation capability in crowded environments. However, the unpredictability of human behavior poses a challenge for real-time learning. Samsani et al. [23] introduced a model that considers all possible actions humans might take within a given time, incorporating real-time human behavior by modeling danger zones. The robot achieves safe navigation by training to avoid these danger zones. Wang et al. [24] proposed an efficient interactive reinforcement learning approach that integrates human preferences into the reward model, using them as teacher guidance to explore the underlying aspects of social interaction. The introduction of hybrid experience policy learning enhances sample efficiency and incorporates human feedback. Mun et al. [25] employed variational autoencoders to learn occlusion inference representations and applied deep reinforcement learning techniques to integrate these learned representations into occlusion-aware planning. This approach enhances obstacle avoidance performance by estimating agents in occluded spaces.

Furthermore, typical indoor environments present additional challenges due to spatial limitations. Zheng et al. [26] proposed a hierarchical approach that integrates local and global information to ensure the safety and efficiency of exploration and planning. This planning method utilizes reinforcement learning-based obstacle avoidance algorithms, enabling the robot to navigate safely through crowds by following the path generated by the exploration planner. Liu et al. [14] devised a reward function that incorporates social norms to guide the training of the policy. The policy model takes laser scan sequences and the robot’s own state as inputs to differentiate between static and dynamic obstacles and outputs steering commands. The learned policy successfully guides the robot to the target location in a manner consistent with social norms.

Previous research has demonstrated the effectiveness of DRL frameworks in training efficient navigation strategies. However, their performance declines as the crowd size increases. Recent advancements have highlighted the potential of graph neural networks to capture local interactions among surrounding objects. Liu et al. [27] proposed a relational graph learning approach that utilizes the latent features of all agents to infer their relationships. They encode higher-order interactions in the state representation of each agent using graph convolutional networks and employ these interactions for state prediction and value estimation. Implementing this method leads to enhanced navigation efficiency and reduced collisions. Chen et al. [16] introduced a novel network that utilizes graph representations to learn policies. Initially, a graph convolutional network was trained using human gaze data to predict human attention towards different agents during navigation tasks. The learned attention was then integrated into a graph-based reinforcement learning framework. Liu et al. [28] introduced a distributed structure-recurrent neural network (DS-RNN) that considers the spatial and temporal relationships of pedestrian movements to construct spatiotemporal graphs. They employ RNNs to generate the robot’s motion policy. Liu et al. [13] introduced a recurrent graph neural network with an attention mechanism that infers the intentions of dynamic agents by predicting their future trajectories across multiple time steps. These predictions are combined into a model-free reinforcement learning framework to prevent the robot from deviating from the preplanned paths of other agents. Zhang et al. [29] addressed autonomous navigation tasks in large-scale environments by incorporating relational graph learning to capture interactions among agents. They also introduced a globally guided reinforcement learning strategy to achieve fixed-size learning models in large-scale complex environments. Spatiotemporal graph transformer [30] is also employed to model the interactions between humans and the robot for crowd navigation. Wang et al. [31] proposed a social-aware navigation method named NaviSTAR, which utilizes a hybrid spatiotemporal graph transformer (STAR) to comprehend crowd interactions and incorporate latent multimodal information. The strategy is trained using off-policy reinforcement learning algorithms and preference learning, with the reward function network trained under supervision.

Current research suggests that the integration of graph neural networks and reinforcement learning can effectively capture interactions among crowds. Anticipating humans’ intentions in advance can provide improved guidance for robot navigation. However, existing graph neural networks primarily rely on distances between humans to construct graph models, neglecting positional and velocity information among humans. This oversight can lead to potential collision risks and result in the freezing robot problem.

To tackle this problem, we propose a method for robot crowd navigation based on risk-aware deep reinforcement learning. A feature encoding method based on a collision risk function is introduced, aimed at enabling robots to anticipate individuals with collision risks in a moving crowd. Additionally, we present an adaptive feature fusion model that integrates features encoded using the collision risk function with those encoded using learning techniques. By incorporating prior knowledge of collision risk into the learned attention, our model demonstrates better performance in crowded scenes compared to previous methods.

3. Methodology

3.1. Problem Formulation

The goal of crowd navigation tasks is to seek a navigation strategy that allows robots to move to the target location in crowded environments without collisions. This task is a sequential decision problem that can be abstracted as a partially observable Markov decision process (POMDP) [32] and can be modeled using the framework of reinforcement learning [33]. The primary elements of the reinforcement learning framework are composed of five components:

M = \{S, A, P, R, γ\}

, where

S

denotes the state space,

A

represents the action space,

P

signifies the state transition model,

R

denotes the reward function, and

γ

represents the discount factor. Below, we will proceed to model the problem of dense crowd navigation for robots using reinforcement learning methods, providing detailed explanations regarding the state space, action space, reward function design, and other aspects.

State space: In our proposed model, the design of the state space is similar to the approach used in [22]. Assuming a robot is navigating in a space with

n

pedestrians. The robot’s state is denoted as

r_{0}^{t}

, and the humans’ states are denoted as

h_{1}^{t}

to

h_{n}^{t}

. The states of each agent are described in the world coordinate system. For all agents, the state can be expressed as [

s_{o}

,

s_{h}

], where

s_{o}

represents the observable state, and

s_{h}

represents the hidden state. For pedestrian

i

, the visible state includes the position

(x_{i}, y_{i})

and velocity

(v_{x i}, v_{y i})

, while the hidden state includes the target position

(g_{x i}, g_{y i})

, radius of the pedestrian

ρ

, preferred speed of the pedestrian

v_{r e f}

. The robot cannot observe the hidden states of pedestrians but can observe its own complete state. Hence, the input state of the model can be represented as follows:

s^{t} = [r_{0}^{t}, h_{1}^{t}, \dots, h_{n}^{t}]

(1)

r_{0}^{t} = [x_{0}, y_{0}, v_{x 0}, v_{y 0}, g_{x 0}, g_{y 0}, ρ_{0}, v_{r e f}]

(2)

h_{i}^{t} = [x_{i}, y_{i}, v_{x i}, v_{y i}]

(3)

Action Space: The action space is composed of velocity vectors that satisfy constraints. Assuming that the robot is an omnidirectional mobile robot and the robot can move in any direction at any time, the action space can be expressed as:

a^{t} = [v_{x}, v_{y}], v_{x}, v_{y} \in (0, v_{r e f})

(4)

In Equation (4),

a^{t}

is the next action of the robot,

v_{x}

and

v_{y}

are the robot’s next velocity in the x and y directions, respectively.

Reward Function: Similar to the approach in [28], a reward is given when the robot reaches the target point, and a penalty should be applied when the robot collides with pedestrians. A small penalty should be assigned when the distance between the robot and pedestrians is less than the comfortable distance for pedestrians. The detailed design of the reward function is as follows:

r^{t} (s^{t}, a^{t}) = \{\begin{matrix} \begin{matrix} 10, & i f d_{g}^{t} < 0.3 \\ - 20, & i f d_{m i n}^{t} < 0 \\ 2.5 (d - 0.25), & i f 0 < d_{m i n}^{t} < 0.25 \\ 2 (- d_{g}^{t} + d_{g}^{t - 1}), & o t h e r w i s e \end{matrix} \end{matrix}

(5)

In Equation (5),

d_{g}^{t}

denotes the distance between the robot and the target at moment

t

, and

d_{m i n}^{t}

denotes the minimum distance between the robot and the pedestrians at moment

t

. The discount return

R^{t} = E [\sum_{i = t}^{T} γ^{i - 1} r_{i}]

and

γ

represents the discount factor.

The objective is to find the optimal policy to maximize the expected return

J (π)

:

π^{*} = a r g \max_{π} J (π),

(6)

where

J (π)

denotes the expected value of the returns for all possible trajectories under the given strategy, which can be expressed as follows:

J (π) = \int p (τ | π) R (τ) = E_{τ ~ π} [R (τ)]

(7)

Here

p (τ | π)

denotes the probability that trajectory

τ

occurs given strategy

π

.

3.2. Risk-Aware Interaction Graph Representation

The key to robot navigation in dense crowds is to express the interaction between agents. We model human–human and robot–human interactions using a risk-aware interaction graph, as shown in Figure 1c. The risk-aware interaction graph can be represented as

G_{t} = (V_{t}, E_{t})

, where

V_{t}

denotes the set of nodes at moment

t

. For a pedestrian node, the initial values are the pedestrian’s position and velocity. As for the robot node, the initial values are the robot’s position, velocity, target, radius, and reference velocity. In the interaction graph,

E_{t}

is the set of edges, which represents the degree of attention of intelligence

i

to

j

. Our proposed enhanced risk-aware interaction graph is a fusion of a learning-based graph and a risk-aware strategy-based graph. As shown in Figure 1a, the learning-based interaction graph connects all pedestrians that can be observed by the robot and constructs the adjacency matrix through the attention mechanism. In the learning-based interaction graph, any two visible agents are connected, which may result in apparently uninfluential pedestrians affecting the robot’s next action. However, over-attention to unimportant pedestrians can lead the robot to adopt an overly conservative behavioral strategy, which can lead to the robot freezing problem. Therefore, we propose a robot–human graph based on the risk-aware strategy, as shown in Figure 1b, which gives the adjacency matrix by calculating the collision risk between the robot and pedestrians. We finally fuse the learning-based graph and the risk-aware strategy-based graph. Thus, the robot can pay more attention to the pedestrians with a high collision risk.

In addition, the movement of all the agents leads to dynamic changes in the interactions. The node set

V_{t}

and the edge set

E_{t}

, as well as the parameters of the interaction function, change accordingly. To capture the temporal characteristic of nodes and edges, we use a temporal encoder to integrate the correlation of the time series graph

G_{t}

. The graphs of neighboring time steps are connected by the time encoder, thus realizing the long-term decision-making of the robot.

3.3. Network Architecture

In this section, we describe how to model the robot’s crowd navigation as a risk-aware interaction graph, and the overall structure is shown in Figure 2. The network can be divided into three parts, including spatial encoder, temporal encoder, and policy learning. First, the environment state

s^{t} = [r_{0}^{t}, h_{1}^{t}, \dots, h_{n}^{t}]

is taken as the input of the network, which includes the robot as well as the states of all the observed pedestrians in the vision of the robot. Then, learning-based features and collision risk-aware-based features are constructed to encode the interaction features of the crowd. After that, the two features are fused based on the adaptive weights. Then, the temporal correlation of the encoded features is captured by GRU. Finally, the value network and the policy network are designed as the action planner to obtain the value

V_{t}

and the policy

π (a_{t} | s_{t})

of the robot.

3.3.1. Spatial Feature Encoder

(a) Risk-aware strategy-based feature extraction.

In a scenario where there is a higher number of pedestrians, it is possible for the robot to allocate excessive attention to pedestrians who are not relevant while disregarding pedestrians who pose a collision risk. To address this issue, the robot should prioritize pedestrians who have a higher likelihood of colliding with it. Therefore, we propose the utilization of a risk-aware-based feature extraction algorithm. This algorithm is designed to generate spatial features denoted as

h_{r i s k}^{t} \in ℝ^{1 \times d}

for pedestrians who present a collision risk with the robot. The expression used to calculate these features is as follows:

h_{r i s k}^{t} = \sum_{i = 1}^{n} a_{i}^{t} \cdot h_{i_e m b}^{t}

(8)

In Equation (8),

a_{i}^{t}

represents the risk attention weight, which signifies the collision risk between the robot and pedestrian

i

.

h_{i_e m b}^{t}

is the embedded feature of pedestrian

i

. The interaction state between the robot and visible pedestrians can be categorized into three types: head-on encounter, head-on departure, and parallel walking, as illustrated in Figure 3.

In Figure 3, the robot encounters pedestrian 1 in the opposite direction, with both

α_{1}^{t}

and

β_{1}^{t}

being less than 90°. In this case, the risk of collision between the robot and pedestrian is high. Hence, the risk weight assigned to pedestrian 1 should be greater than 0.

When the robot and pedestrian 2 walk in opposite directions, with both

α_{2}^{t}

and

β_{2}^{t}

greater than 90°, there is no interaction between them, and the risk of collision is eliminated. Therefore, the risk weight of pedestrian 2 should be 0.

In the situation where the robot and pedestrian 3 move in the same direction, with

α_{3}^{t}

less than 90° and

β_{3}^{t}

greater than 90°, the robot’s speed

v_{0}^{t}

compared to the pedestrian’s speed

v_{3}^{t}

becomes a determining factor. If the robot’s speed exceeds that of the pedestrian, a risk of collision exists, and the pedestrian’s risk weight should be 0. Conversely, if the robot’s speed is lower than the pedestrian’s speed, there is no collision risk, and the pedestrian’s risk weight should be 0.

By considering the speeds and relative positions of the robot and pedestrian, the robot can make an initial assessment of the collision risk between the two entities. Additionally, the risk of collision is influenced by the distance between the robot and pedestrian. A smaller distance indicates a higher risk of collision, resulting in an inversely correlated relationship between the risk weight of the pedestrian and the distance between them.

The collision risks between robots and pedestrians vary depending on the interaction states and are influenced by the relative positions and speeds of both entities. To prioritize pedestrians with a higher collision risk, we propose the utilization of a risk-aware function, denoted as

f_{r i s k} (\cdot)

, to calculate the attention weight,

a_{i}^{t}

, between the robot and pedestrian

i

. By employing this function, the robot can allocate more attention to pedestrians who pose a higher collision risk.

{\tilde{a}}_{i}^{t} = f_{r i s k} (v_{0}^{t}, v_{i}^{t}, α_{i}^{t}, β_{i}^{t}, d_{i}^{t}) = \max (0, \frac{|v_{0}^{t}| c o s α_{i}^{t} + |v_{i}^{t}| c o s β_{i}^{t}}{‖ d_{i}^{t} ‖_{2}})

(9)

a_{i}^{t} = \frac{e^{{\tilde{a}}_{i}^{t}}}{\sum_{k = 1}^{n} e^{{\tilde{a}}_{i}^{t}}}

(10)

For pedestrians who are not observable to the robot, the weight of collision risk is assigned as zero.

In Equation (8), the feature

h_{i_e m b}^{t}

represents the embedded feature of pedestrian

i

. This feature is obtained by linearly mapping the initialized feature

h_{i_i n i t}^{t}

of the pedestrian.

h_{i_e m b}^{t} = L i n e a r (h_{i_i n i t}^{t}; ω)

(11)

In Equation (11), the term

L i n e a r (\cdot)

represents the linear mapping operation, and the weight matrix is denoted by

w

. The initialized features of pedestrians,

h_{i_i n i t}^{t}

, as shown in Equation (12), serve as the input for this mapping process.

h_{i_i n i t}^{t} = (x_{i}^{t}, y_{i}^{t}, x_{i}^{t + 1}, y_{i}^{t + 1}, x_{i}^{t + 2}, y_{i}^{t + 2}, x_{i}^{t + 3}, y_{i}^{t + 3}, x_{i}^{t + 4}, y_{i}^{t + 4}, x_{i}^{t + 5}, y_{i}^{t + 5})

(12)

In Equation (12),

\{\begin{matrix} x_{i}^{t + i} = x_{i}^{t} + i \times v x_{i}^{t} \\ y_{i}^{t + i} = y_{i}^{t} + i \times v y_{i}^{t} \end{matrix}, i \in \{1, 2, 3, 4, 5\}

(13)

That is to say, utilizing the current coordinates

p_{i} (x_{i}^{t}, y_{i}^{t})

and the current velocity

v_{i} (v x_{i}^{t}, v y_{i}^{t})

of pedestrian

i

, a 5-step prediction is generated by employing a primary function interpolation on the pedestrian’s trajectory.

(b) Attention-based feature extraction.

Similar to the approach described in reference [17], we employ an attention mechanism to represent the spatial interactions between pedestrians, as well as between robots and pedestrians, at each time step. Specifically, the interaction weights between pedestrians are determined using the human-to-human attention (HH-Attention) function. The HH-Attention function is defined as follows:

A t t n (Q_{H H}, K_{H H}, V_{H H}) = s o f t m a x (\frac{Q_{H H} K_{H H}^{T}}{\sqrt{d}}) V_{H H}

(14)

M u l t i (Q_{H H}, K_{H H}, V_{H H}) = l i n e a r (h e a d_{1}, \dots, h e a d_{m})

(15)

h e a d_{i} = A t t n_{i} (Q_{H H}, K_{H H}, V_{H H})

(16)

Q_{H H} = u_{H}^{t} w_{H H}^{Q}, K_{H H} = u_{H}^{t} w_{H H}^{K}, V_{H H} = u_{H}^{t} w_{H H}^{V}

(17)

Here,

u_{H}^{t}

represents the linear mapping of the pedestrian state

h_{i_i n i t}^{t}

, computed as

u_{H}^{t} = L i n e a r (h_{i_i n i t}^{t}; ω)

. The symbol

d

denotes the dimension of the pedestrian feature. By employing the multi-head self-attention mechanism, we obtain the spatial interaction feature

h_{H H}^{t} \in ℝ^{n \times d}

for the pedestrians. In our application, we set the number of attention heads,

m

, to 8.

Given that robots allocate different levels of attention to the pedestrians in their vicinity, we define the RH-Attention mechanism to capture the interactions between robots and humans. The RH-Attention mechanism is defined as follows:

A t t n (Q_{R H}, K_{R H}, V_{R H}) = s o f t m a x (\frac{Q_{R H} K_{R H}^{T}}{\sqrt{d}}) V_{R H}

(18)

Q_{R H} = h_{H H}^{t} w_{H H}^{Q}, K_{H H} = u_{0}^{t} w_{H H}^{K}, V_{H H} = h_{H H}^{t} w_{H H}^{V}

(19)

u_{0}^{t}

represents the linear mapping of the robot state

r_{0}^{t}

, given by

u_{0}^{t} = L i n e a r (r_{0}^{t})

. The symbol

d

denotes the dimension of the feature. By applying the HH-Attention and RH-Attention mechanisms, we obtain the agents’ interaction features

h_{a t t n}^{t} \in ℝ^{1 \times d}

.

(c) Feature fusion

Finally, we employ an adaptive weighting mechanism to combine the features obtained from the attention mechanism and the risk strategy. The procedure is expressed in Equations (20) and (21). The resulting fused feature, denoted as

h_{s p a t i a l}^{t}

, serves as the output of the spatial encoder. The operation is represented by means of Figure 4.

[λ_{1}, λ_{2}]= S o f t m a x[L i n e a r (h_{a t t n}^{t}) © L i n e a r (h_{r i s k}^{t})]

(20)

h_{s p a t i a l}^{t} = λ_{1} \cdot h_{a t t n}^{t} + λ_{2} \cdot h_{r i s k}^{t}

(21)

In Equations (20) and (21),

L i n e a r (\cdot)

operations are used to generate two adaptive weights for the learning-based features

h_{a t t n}^{t}

and risk-aware-based features

h_{r i s k}^{t}

. In addition, softmax activation is also employed to generate the normalized weights, denoted as

λ_{1}

and

λ_{2}

. The abbreviation

©

is used to represent channel-wise feature concatenation.

3.3.2. Temporal Encoder

Since the position and velocity of the robot and visible pedestrians change over time, a temporal network is introduced in the proposed framework to capture the temporal correlation of the robot and visible pedestrians. The algorithm employs a gated recurrent unit (GRU) with a memory function to capture the contextual information in the sequence data by using two gates to control the passing and forgetting of the information. The GRU module can be expressed as Equation (22):

h_{G R U}^{t} = G R U (h_{r o b o t}^{t}, h_{s p a t i a l}^{t}; w_{G R U})

(22)

In the equation,

h_{G R U}^{t}

is the output feature of the GRU module.

h_{r o b o t}^{t}

is the embedded feature obtained by linear mapping of the robot state

r_{0}^{t}

, i.e.,

h_{r o b o t}^{t} = l i n e a r (r_{0}^{t})

.

h_{s p a t i a l}^{t}

is the coded feature of all visible pedestrians.

w_{G R U}

is the parameter to be trained for the module.

3.3.3. Policy Learning

The feature

h_{G R U}^{t}

, which is generated by the temporal encoder, is fed into a fully connected layer to separately output the value

V_{t}

and the policy

π (a_{t} | s_{t})

. We employ the model-free policy gradient algorithm proximal policy optimization (PPO) to the optimal policy

π^{*}

. In the process of finding the optimal policy, PPO relies on the use of a value function. The value function is used to evaluate the expected return of taking a certain action in a specific state. By iteratively updating the value function, the learning algorithm is able to obtain more accurate estimates. The objective of the PPO algorithm is to learn an optimal policy

π^{*}

that maximizes the expected cumulative reward, enabling the robot to take appropriate actions in different states.

The robot learns through interaction with the environment, continuously trying and adjusting its strategy. Through multiple iterations and updates, the robot gradually optimizes its policy and eventually reaches the optimal policy. By learning the optimal policy, the robot is able to safely reach the target location in crowded and dynamically changing environments, avoiding collisions.

4. Experiments and Results

4.1. Experimental Environments

In this study, we employ an experimental environment akin to [17] with the experimental scene measuring

12 m \times 12 m

. The scene consisted of multiple pedestrians and a mobile robot, whose perception radius is

5 m

, restricting its perception to pedestrian coordinates within a circular area with a radius of

5 m

centered on itself.

In the experiment, each pedestrian is treated as a moving cylinder, with the radius of each pedestrian model ranging between

0.3 m

and

0.5 m

. The movement speed of pedestrians ranges from a minimum of

0.5 m / s

to a maximum of

1.5 m / s

. Pedestrian movement is modeled using the ORCA model, with pedestrians exhibiting obstacle avoidance behavior exclusively towards other pedestrians, while disregarding the mobile robots in the environment. The robots in the environment have a maximum speed of

1.57 m / s

, and their motion space is continuous. We also assume that the robot in the environment is not visible to the pedestrians. This assumption ensures that the robot can avoid pedestrians even when it is completely unnoticed by them.

In the experiment, the starting and ending points for both the robots and pedestrians are determined using a random sampling method. The experimental environment is configured to simulate real-world application scenarios, thus improving the transferability of experimental results.

4.2. Training Details for Our Proposed Method

To train a navigation architecture based on risk-aware deep reinforcement learning, we set the number of pedestrians in each simulation environment to 20.

We conducted all experiments on the Ubuntu 22.04 operating system and the PyTorch 2.0 deep learning software development platform. The hardware base contains an Intel i9−12900KF CPU, an NVIDIA GeForce RTX3090Ti display card, and a total of 64 GB DDR5 memory. The CPU has 16 cores and features a clock speed from 3.2 GHz to 5.3 GHz. To accelerate the training procedure, we used multi-thread techniques and virtualized up to 16 simulation environments.

In the training stage, we adopted the RMSprop optimizer in the training procedure and initialized the learning rate with

4 \times 10^{- 5}

. We trained the proposed method for

4 \times 10^{4}

steps.

4.3. Evaluation Metrics

We evaluated the proposed method and compared it with state-of-the-art representative methods, including ORCA [19], DS-RNN [28], CrowdNav++ [17].

In order to assess the algorithm’s performance across varying pedestrian quantities, the experimental design included two distinct scenarios of pedestrian density: low density (with a count of 10 pedestrians), and high density (with a count of 20 pedestrians).

We also conducted more rigorous testing on our proposed algorithm and the CrowdNav++ algorithm. In this test, the number of pedestrians in each environment was increased to 25. This allowed us to evaluate the scalability of our proposed algorithm and assess the performance under more challenging conditions.

In the evaluation experiments, we conducted tests on all methods using 500 randomly selected cases. We evaluated the performance of each method using navigation metrics, which include the success rate (SR) as a percentage, collision rate (CR) as a percentage, average navigation time (NT) in seconds, path length (PL) in meters, and average speed (AS) in m/s for successful episodes. To ensure a fair comparison, all intrusions were calculated based on the ground truth future positions of humans.

4.4. Experiment Results

4.4.1. Experiment Results for Scenarios with Low Pedestrian Density (10 Individuals)

We conducted experimental evaluations of our algorithm’s performance in a scenario with 10 pedestrians and compared it with several existing methods. In the low pedestrian density scenario, the robot focuses on analyzing the pedestrians’ movement trajectories without considering that the pedestrians will react to the robot’s behavior. The SR, CR, NT, PL, and AS metrics for ORCA, DS-RNN, CrowdNav++ and the proposed method are shown in Table 1.

Our proposed algorithm achieved an impressive success rate of 98.0%, outperforming all other methods in the comparison. This exceptional success rate underscores the effectiveness of our approach in enabling robots to autonomously navigate crowded environments while minimizing the risk of collisions. Additionally, our algorithm demonstrated a remarkably low collision rate of 2.0%, further validating its ability to ensure safe navigation.

In terms of navigation time, our algorithm achieved a reasonably competitive result (13.29 s) compared to the other algorithms. However, in terms of path length, our algorithm had a slightly longer path (20.15 m) compared to some of the other methods. Despite the longer path length, our algorithm achieved a higher average speed (1.52 m/s) compared to all the other algorithms.

The average navigation path length and the average navigation time matrices of the proposed method are higher than the other three methods. This phenomenon occurs because the proposed algorithm copes with predictable risks at the cost of a smaller increase in path length, thus avoiding the risk of collision. On the other hand, since the robot is able to predict risks in advance, the continuity of the robot’s motion is better and the average speed of the robot is higher than the other three algorithms.

An illustrative example of robot navigation in a low-crowd density environment is shown in Figure 5.

4.4.2. Experiment Results for Scenarios with High Pedestrian Density (20 Individuals)

We also conducted experimental evaluations of our algorithm’s performance in a scenario with 20 pedestrians and compared it with several existing methods. Table 2 lists the SR, CR, NT, PL, and AS metrics for ORCA, DS-RNN, CrowdNav++, and the proposed methodology in the high-density population setting.

Table 2 shows that our proposed algorithm achieves 93.2% success rate of autonomous navigation. This result shows that the proposed algorithm has good applicability for robot navigation in high-density crowd environments.

In contrast, our algorithm exhibits an average navigation time of 15.89 s and covers an average path length of 22.37 m. While our proposed algorithm may not demonstrate superiority in terms of path length and navigation time, its strength lies in its ability to predict risks in high-density crowd environments. As a result of this advantage, the robot displays remarkable flexibility in its motion during navigation, achieving an impressive average speed of 1.41 m/s.

An illustrative example of robot navigation in a high-crowd-density environment is shown in Figure 6. We can see that the freezing robot problem occurs with the other three methods in a high-density crowded environment. Our proposed method can sense the risk in advance, so as to choose the appropriate path and avoid collision with humans.

4.4.3. Experiment Results for 25 Pedestrian Scenarios

In training the proposed model, the number of pedestrians in the training environment is set to 20. To further test the generalizability of our proposed algorithm, we tested the navigation performance of the robot in 25-pedestrian environments. Note that in the test we still used the weights of the neural network obtained through training in Section 4.2, i.e., the weight file obtained by training the proposed model using an environment with 20 pedestrians.

Table 3 presents the SR, CR, NT, PL, and AS metrics for both CrowdNav++ and the proposed methodology. The table clearly illustrates that our proposed method outperforms the CrowdNav++ algorithm, with a notable success rate of 90.0%. This achievement represents a significant 12% improvement.

Regarding navigation time, our algorithm showed a slightly longer duration of 16.71 s compared to the CrowdNav++ algorithm. It is crucial to note that our algorithm also had a longer path length of 22.82 m.

Despite the longer path, our algorithm achieved a slightly higher average speed of 1.37 m/s compared to the CrowdNav++ algorithm. This indicates that our approach prioritizes maintaining a relatively higher speed during navigation.

Figure 7 and Figure 8 depict the navigation trajectories of both the robot and 25 pedestrians at key frames during the visible testing phase. Based on Figure 7 and Figure 8, it can be observed that the CrowdNav++ algorithm encounters difficulties in circumventing moving pedestrians when the number of pedestrians in the robot’s environment increases. As a result, the robot fails to navigate autonomously. In contrast, our proposed algorithm enhances the robot’s pedestrian sensing ability by incorporating a risk-aware mechanism. This improvement leads to a higher success rate in the robot’s autonomous navigation.

4.5. Analysis of Experimental Results

According to the three experiment results, it can be seen that the proposed method can effectively deal with the robot freezing problem due to crowd density and has better generalization performance. Since our method adaptively fuses learning-based features and collision risk-aware-based features. When the density of pedestrians increases, the weight of the collision risk-aware-based features increases. Therefore, the robot can pay more attention to the pedestrians with collision risk in the dense crowd, so as to choose an appropriate path and avoid collision with pedestrians. The proposed method consistently exhibited superior performance across different pedestrian densities, reaffirming its effectiveness in addressing the challenges of robot crowd navigation. The high success rate achieved by our algorithm signifies its ability to navigate through crowded environments successfully, ensuring the safe and efficient movement of robots. Additionally, the low collision rate observed in our experiments further establishes the robustness of our method in avoiding potential accidents and collisions.

In terms of navigation efficiency, our algorithm demonstrates comparable performance to the existing algorithms, taking into consideration its navigation speed. This noteworthy finding emphasizes the efficacy of our approach in achieving similar levels of efficiency in complex and challenging environments. Not only does our approach confirm the unique advantages of the algorithms proposed in this study in terms of success rate in crowd navigation, but it also attains navigation speed capabilities that meet industry standards.

5. Conclusions

In this paper, we present a novel reinforcement learning strategy that utilizes a risk-aware enhanced attention-based interaction graph for robotic dense crowd navigation.

We first introduce a feature encoding method based on a collision risk function. This method is designed to enable robots to anticipate and avoid potential collisions with humans in a moving crowd. In addition, we present an adaptive feature fusion model that combines features encoded based on the collision risk function with those encoded using learning techniques. By integrating the prior knowledge of collision risk into the learned attention mechanism, we are able to effectively reduce the risk of robot–human collisions in crowded scenes. This innovative approach allows for safer and more efficient navigation in densely populated environments.

Our approach was evaluated in two different environments with varying numbers of random pedestrians (10 and 20), achieving success rates of 98% and 93.2%, respectively, outperforming current state-of-the-art methods. Furthermore, we conducted tests in a higher crowd density scenario (25 random pedestrians) than the training environment and achieved a success rate of 90%, which also surpasses existing state-of-the-art methods. This demonstrates that our proposed method exhibits strong generalization capabilities and effectively addresses the frozen robot problem.

In our future research, we aim to address the following two critical aspects.

First, our current work only takes into account dynamic pedestrian scenarios and does not consider the presence of static obstacles. To overcome this limitation, we plan to incorporate the perception of static obstacles into our navigation approach. Our strategy is to represent the robot’s surrounding environment implicitly using raw data from laser sensors. This will allow us to integrate both dynamic pedestrians and static obstacles within a single representation framework, providing a more comprehensive understanding of the robot’s environment. This advancement will greatly enhance the navigation capabilities of the robot, increasing its efficiency and safety.

Furthermore, our current research model assumes that pedestrians are not aware of the mobile robot’s presence, which results in no reaction to the robot’s movements. This assumption does not fully reflect the reality of human perception and expectations in real-world interactions. Therefore, in our future work, we aim to develop a more comprehensive human–robot interaction model. This model will be based on principles derived from social psychology, allowing for a more realistic simulation of the interaction between humans and robots. This will not only enhance the robot’s navigation capabilities but also make its movements more intuitive and predictable for pedestrians, thereby improving overall safety and efficiency.

Author Contributions

Conceptualization, X.S.; methodology, X.S. and Q.Z.; software, Q.Z. and Y.W.; writing—original draft preparation, Y.W. and M.L.; writing—review and editing, X.S.; visualization, M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number 61903162) and Jiangsu Province’s “Double Innovation Plan”: Research and development of flexible cooperative robot technology for intelligent manufacturing.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

Author Mingmin Liu was employed by the company SIASUN Robot & Automation Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Brose, S.W.; Weber, D.J.; Salatin, B.A.; Grindle, G.G.; Wang, H.; Vazquez, J.J.; Cooper, R.A. The Role of Assistive Robotics in the Lives of Persons with Disability. Am. J. Phys. Med. Rehab. 2010, 89, 509–521. [Google Scholar] [CrossRef] [PubMed]
Matarić, M.J.; Scassellati, B. Socially Assistive Robotics. In Springer Handbook of Robotics; Springer: Berlin/Heidelberg, Germany, 2016; Volume 18, pp. 1973–1994. [Google Scholar] [CrossRef]
Matarić, M.J. Socially Assistive Robotics: Human Augmentation versus Automation. Sci. Rob. 2017, 2, eaam5410. [Google Scholar] [CrossRef] [PubMed]
Udupa, S.; Kamat, V.R.; Menassa, C.C. Shared Autonomy in Assistive Mobile Robots: A Review. Disabil. Rehabil. Assist. Technol. 2023, 18, 827–848. [Google Scholar] [CrossRef] [PubMed]
Xiao, X.; Liu, B.; Warnell, G.; Stone, P. Motion Planning and Control for Mobile Robot Navigation Using Machine Learning: A Survey. Auton. Robot. 2022, 46, 569–597. [Google Scholar] [CrossRef]
Helbing, D.; Molnár, P. Social Force Model for Pedestrian Dynamics. Phys. Rev. E 1995, 51, 4282–4286. [Google Scholar] [CrossRef] [PubMed]
Zhong, X.; Tian, J.; Hu, H.; Peng, X. Hybrid Path Planning Based on Safe A* Algorithm and Adaptive Window Approach for Mobile Robot in Large-Scale Dynamic Environment. J. Intell. Robot. Syst. 2020, 99, 65–77. [Google Scholar] [CrossRef]
Guillén-Ruiz, S.; Bandera, J.P.; Hidalgo-Paniagua, A.; Bandera, A. Evolution of Socially-Aware Robot Navigation. Electronics 2023, 12, 1570. [Google Scholar] [CrossRef]
Qin, L.; Huang, Z.; Zhang, C.; Guo, H.; Ang, M.; Rus, D. Deep Imitation Learning for Autonomous Navigation in Dynamic Pedestrian Environments. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 4108–4115. [Google Scholar]
Everett, M.; Chen, Y.F.; How, J.P. Collision Avoidance in Pedestrian-Rich Environments With Deep Reinforcement Learning. IEEE Access 2021, 9, 10357–10377. [Google Scholar] [CrossRef]
Cui, Y.; Zhang, H.; Wang, Y.; Xiong, R. Learning World Transition Model for Socially Aware Robot Navigation. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 9262–9268. [Google Scholar]
Jin, J.; Nguyen, N.M.; Sakib, N.; Graves, D.; Yao, H.; Jagersand, M. Mapless Navigation among Dynamics with Social-Safety-Awareness: A Reinforcement Learning Approach from 2D Laser Scans. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 6979–6985. [Google Scholar]
Patel, U.; Kumar, N.K.S.; Sathyamoorthy, A.J.; Manocha, D. DWA-RL: Dynamically Feasible Deep Reinforcement Learning Policy for Robot Navigation among Mobile Obstacles. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 6057–6063. [Google Scholar]
Perez-D’Arpino, C.; Liu, C.; Goebel, P.; Martin-Martin, R.; Savarese, S. Robot Navigation in Constrained Pedestrian Environments Using Reinforcement Learning. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1140–1146. [Google Scholar]
Samsani, S.S.; Mutahira, H.; Muhammad, M.S. Memory-Based Crowd-Aware Robot Navigation Using Deep Reinforcement Learning. Complex Intell. Syst. 2023, 9, 2147–2158. [Google Scholar] [CrossRef]
Chen, Y.; Liu, C.; Shi, B.E.; Liu, M. Robot Navigation in Crowds by Graph Convolutional Networks with Attention Learned From Human Gaze. IEEE Robot. Autom. Lett. 2020, 5, 2754–2761. [Google Scholar] [CrossRef]
Liu, S.; Chang, P.; Huang, Z.; Chakraborty, N.; Hong, K.; Liang, W.; McPherson, D.L.; Geng, J.; Driggs-Campbell, K. Intention Aware Robot Crowd Navigation with Attention-Based Interaction Graph. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 12015–12021. [Google Scholar]
Sathyamoorthy, A.J.; Patel, U.; Guan, T.; Manocha, D. Frozone: Freezing-Free, Pedestrian-Friendly Navigation in Human Crowds. IEEE Robot. Autom. Lett. 2020, 5, 4352–4359. [Google Scholar] [CrossRef]
Van Den Berg, J.; Guy, S.J.; Lin, M.; Manocha, D. Reciprocal N-Body Collision Avoidance. In Robotics Research; Pradalier, C., Siegwart, R., Hirzinger, G., Eds.; Springer Tracts in Advanced Robotics; Springer: Berlin/Heidelberg, Germany, 2011; Volume 70, pp. 3–19. ISBN 978-3-642-19456-6. [Google Scholar]
Kamezaki, M.; Tsuburaya, Y.; Kanada, T.; Hirayama, M.; Sugano, S. Reactive, Proactive, and Inducible Proximal Crowd Robot Navigation Method Based on Inducible Social Force Model. IEEE Robot. Autom. Lett. 2022, 7, 3922–3929. [Google Scholar] [CrossRef]
Han, R.; Chen, S.; Wang, S.; Zhang, Z.; Gao, R.; Hao, Q.; Pan, J. Reinforcement Learned Distributed Multi-Robot Navigation With Reciprocal Velocity Obstacle Shaped Rewards. IEEE Robot. Autom. Lett. 2022, 7, 5896–5903. [Google Scholar] [CrossRef]
Chen, C.; Liu, Y.; Kreiss, S.; Alahi, A. Crowd-Robot Interaction: Crowd-Aware Robot Navigation With Attention-Based Deep Reinforcement Learning. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 6015–6022. [Google Scholar]
Samsani, S.S.; Muhammad, M.S. Socially Compliant Robot Navigation in Crowded Environment by Human Behavior Resemblance Using Deep Reinforcement Learning. IEEE Robot. Autom. Lett. 2021, 6, 5223–5230. [Google Scholar] [CrossRef]
Wang, R.; Wang, W.; Min, B.-C. Feedback-Efficient Active Preference Learning for Socially Aware Robot Navigation. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 11336–11343. [Google Scholar]
Mun, Y.-J.; Itkina, M.; Liu, S.; Driggs-Campbell, K. Occlusion-Aware Crowd Navigation Using People as Sensors. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 12031–12037. [Google Scholar]
Zheng, Z.; Cao, C.; Pan, J. A Hierarchical Approach for Mobile Robot Exploration in Pedestrian Crowd. IEEE Robot. Autom. Lett. 2022, 7, 175–182. [Google Scholar] [CrossRef]
Liu, Z.; Zhai, Y.; Li, J.; Wang, G.; Miao, Y.; Wang, H. Graph Relational Reinforcement Learning for Mobile Robot Navigation in Large-Scale Crowded Environments. IEEE Trans. Intell. Transp. Syst. 2023, 24, 8776–8787. [Google Scholar] [CrossRef]
Liu, S.; Chang, P.; Liang, W.; Chakraborty, N.; Driggs-Campbell, K. Decentralized Structural-RNN for Robot Crowd Navigation with Deep Reinforcement Learning. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 3517–3524. [Google Scholar]
Zhang, Y.; Feng, Z. Crowd-Aware Mobile Robot Navigation Based on Improved Decentralized Structured RNN via Deep Reinforcement Learning. Sensors 2023, 23, 1810. [Google Scholar] [CrossRef] [PubMed]
He, H.; Fu, H.; Wang, Q.; Zhou, S.; Liu, W. Spatio-Temporal Transformer-Based Reinforcement Learning for Robot Crowd Navigation. arXiv 2023, arXiv:2305.16612. [Google Scholar]
Wang, W.; Wang, R.; Mao, L.; Min, B.C. NaviSTAR: Socially Aware Robot Navigation with Hybrid Spatio-Temporal Graph Transformer and Preference Learning. arXiv 2023, arXiv:2304.05979. [Google Scholar]
Shani, G.; Pineau, J.; Kaplow, R. A Survey of Point-Based POMDP Solvers. Auton. Agents Multi-Agent Syst. 2013, 27, 1–51. [Google Scholar] [CrossRef]
Chen, Y.F.; Liu, M.; Everett, M.; How, J.P. Decentralized Non-Communicating Multiagent Collision Avoidance with Deep Reinforcement Learning. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Marina Bay Sands, Singapore, 29 May–3 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 285–292. [Google Scholar]

Figure 1. Illustration of the proposed robot–human interaction method. The proposed method integrates learning-based and risk-aware strategy-based robot–human interaction features. The gray arrow represents the velocity of the agent.

Figure 2. The network architecture of our proposed approach. The size of the scene is

12 m \times 12 m

and the environment contains 20 pedestrians and one mobile robot with the visualization radius of

5 m

. The neural network consists of the spatial feature encoder, temporal feature encoder and policy-learning subnetwork. The spatial feature encoder integrates learning-based and risk-aware-based features, which leads to robust feature expression for policy learning.

Figure 2. The network architecture of our proposed approach. The size of the scene is

12 m \times 12 m

and the environment contains 20 pedestrians and one mobile robot with the visualization radius of

5 m

. The neural network consists of the spatial feature encoder, temporal feature encoder and policy-learning subnetwork. The spatial feature encoder integrates learning-based and risk-aware-based features, which leads to robust feature expression for policy learning.

Figure 3. One robot interacts with three pedestrians. In the figure,

v_{0}^{t}

indicates the speed of the robot at time

t

.

v_{i}^{t}

stands for the speed vector of the observable pedestrian

i, i \in \{1, 2, \dots, k\}

.

α_{i}^{t}

and

β_{i}^{t}

denote the angle between the line connecting pedestrian

i

and the robot and the velocity vector.

d_{i}^{t}

represents the straight-line distance between

v_{i}^{t}

and

v_{0}^{t}

.

Figure 3. One robot interacts with three pedestrians. In the figure,

v_{0}^{t}

indicates the speed of the robot at time

t

.

v_{i}^{t}

stands for the speed vector of the observable pedestrian

i, i \in \{1, 2, \dots, k\}

.

α_{i}^{t}

and

β_{i}^{t}

denote the angle between the line connecting pedestrian

i

and the robot and the velocity vector.

d_{i}^{t}

represents the straight-line distance between

v_{i}^{t}

and

v_{0}^{t}

.

Figure 4. Schematic of the feature fusion module.

Figure 5. Comparison of navigation trajectories in visible testing for scenarios with 10 pedestrians in the environment. (a) ORCA, (b) DSRNN, (c) CrowdNav++, (d) Ours. In the figure, the red star stands for the goal of the mobile robot. The circles of different colors represent the paths taken by different agents. Among them, the trajectory of the green circle is the movement path of the mobile robot.

Figure 6. Comparison of navigation trajectories in visible testing for scenarios with 20 pedestrians in the environment. (a) ORCA, (b) DSRNN, (c) CrowdNav++, (d) Ours. In the figure, the red star stands for the goal of the mobile robot. The circles of different colors represent the paths taken by different agents. Among them, the trajectory of the green circle is the movement path of the mobile robot.

Figure 7. Case 1: comparison of the navigation trajectories of the pedestrians and the robot for each frame in the visible testing under the condition of 25 pedestrians in the environment. (a) CrowdNav++, (b) the proposed method. In the figure, the red star stands for the goal of the mobile robot. Solid-lined circles with numerical identifiers indicate the presence of pedestrians within the environment. The smaller circles, rendered in yellow, denote the mobile robot. Meanwhile, the larger circles outlined with dashed lines signify the extent of the mobile robot’s sensory field of view.

Figure 8. Case 2: comparison of the navigation trajectories of the pedestrians and the robot for each frame in the visible testing under the condition of 25 pedestrians in the environment. (a) CrowdNav++, (b) the proposed method. In the figure, the red star stands for the goal of the mobile robot. Solid-lined circles with numerical identifiers indicate the presence of pedestrians within the environment. The smaller circles, rendered in yellow, denote the mobile robot. Meanwhile, the larger circles outlined with dashed lines signify the extent of the mobile robot’s sensory field of view.

Table 1. Experimental Comparison Results for Scenarios with Low Pedestrian Density. SR: successful rate, CR: collision rate, NT: navigation time, PL: path length, AS: average speed. In the table, the symbol of an upward arrow ‘↑’ signifies that a higher value is preferred, whereas the downward arrow ‘↓’ indicates that a lower value is more desirable.

Method	SR (%) ↑	CR (%) ↓	NT (s) ↓	PL (m) ↓	AS (m/s) ↑
ORCA [19]	73.0	27.0	13.56	17.09	1.26
DS-RNN [28]	86.0	14.0	15.32	19.05	1.24
CrowdNav++ [17]	95.0	5.0	12.70	19.10	1.50
Ours	98.0	2.0	13.29	20.15	1.52

Table 2. Experimental Comparison Results for Scenarios with High Pedestrian Density. SR: successful rate, CR: collision rate, NT: navigation time, PL: path length, AS: average speed. In the table, the symbol of an upward arrow ‘↑’ signifies that a higher value is preferred, whereas the downward arrow ‘↓’ indicates that a lower value is more desirable.

Method	SR (%) ↑	CR (%) ↓	NT (s) ↓	PL (m) ↓	AS (m/s) ↑
ORCA [19]	69.0	31.0	14.77	17.67	1.20
DS-RNN [28]	64.0	36.0	16.31	19.63	1.20
CrowdNav++ [17]	89.0	11.0	15.03	21.31	1.42
Ours	93.2	6.8	15.89	22.37	1.41

Table 3. Comparison of experimental results between CrowdNav++ and the proposed method for the case of 25 pedestrians in the environment. SR: successful rate, CR: collision rate, NT: navigation time, PL: path length, AS: average speed. In the table, the symbol of an upward arrow ‘↑’ signifies that a higher value is preferred, whereas the downward arrow ‘↓’ indicates that a lower value is more desirable.

Method	SR (%) ↑	CR (%) ↓	NT (s) ↓	PL (m) ↓	AS (m/s) ↑
CrowdNav++ [17]	78.0	22.0	15.06	20.35	1.35
Ours	90.0	10.0	16.71	22.82	1.37

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sun, X.; Zhang, Q.; Wei, Y.; Liu, M. Risk-Aware Deep Reinforcement Learning for Robot Crowd Navigation. Electronics 2023, 12, 4744. https://doi.org/10.3390/electronics12234744

AMA Style

Sun X, Zhang Q, Wei Y, Liu M. Risk-Aware Deep Reinforcement Learning for Robot Crowd Navigation. Electronics. 2023; 12(23):4744. https://doi.org/10.3390/electronics12234744

Chicago/Turabian Style

Sun, Xueying, Qiang Zhang, Yifei Wei, and Mingmin Liu. 2023. "Risk-Aware Deep Reinforcement Learning for Robot Crowd Navigation" Electronics 12, no. 23: 4744. https://doi.org/10.3390/electronics12234744

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Risk-Aware Deep Reinforcement Learning for Robot Crowd Navigation

Abstract

1. Introduction

2. Related Works

3. Methodology

3.1. Problem Formulation

3.2. Risk-Aware Interaction Graph Representation

3.3. Network Architecture

3.3.1. Spatial Feature Encoder

3.3.2. Temporal Encoder

3.3.3. Policy Learning

4. Experiments and Results

4.1. Experimental Environments

4.2. Training Details for Our Proposed Method

4.3. Evaluation Metrics

4.4. Experiment Results

4.4.1. Experiment Results for Scenarios with Low Pedestrian Density (10 Individuals)

4.4.2. Experiment Results for Scenarios with High Pedestrian Density (20 Individuals)

4.4.3. Experiment Results for 25 Pedestrian Scenarios

4.5. Analysis of Experimental Results

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI