Article

Hierarchical Motion Planning of AUVs in Three Typical Marine Environments

Science and Technology on Underwater Vehicle Laboratory, Harbin Engineering University, Harbin 150001, China
* Author to whom correspondence should be addressed.
Electronics 2021, 10(3), 292; https://doi.org/10.3390/electronics10030292
Submission received: 29 December 2020 / Revised: 19 January 2021 / Accepted: 20 January 2021 / Published: 26 January 2021
(This article belongs to the Section Systems & Control Engineering)

Abstract

Broad waters, harbor waters, and waterway waters make up more than 90% of the navigation area of autonomous underwater vehicles (AUVs), and each has its own typical environmental characteristics. In this paper, a three-layer AUV motion planning architecture is designed to improve the planning logic of an AUV performing complex underwater tasks. The AUV's motion planning ability is trained by an improved deep deterministic policy gradient (DDPG) algorithm combined with a classified experience pool. Compared with the traditional DDPG algorithm, the proposed algorithm is more efficient. Using the strategy obtained from training and the motion planning architecture proposed in this paper, the tasks of an AUV searching in broad waters, crossing in waterway waters, and patrolling in harbor waters were realized in simulation experiments. The reliability of the planning system was verified in field tests.

1. Introduction

The autonomous underwater vehicle (AUV) is an important tool in the field of ocean development [1]. It can be used for environmental monitoring and topographic detection in broad waters, patrolling in harbor waters, and crossing in waterway waters. Broad waters, harbor waters, and waterway waters make up more than 90% of AUV navigation areas, and each has its own typical environmental characteristics. Broad waters are the ideal area for AUV navigation, with large space for avoidance and few restrictions on behavior and action. Harbor waters contain irregular dynamic and static obstacles, and an AUV can only move within a narrow range. The complex waterway environment and obstacles such as fast-moving ships require real-time, high-performance AUV motion planning. When an AUV performs underwater tasks, it faces complex environments and situations such as sudden obstacles and unknown waters. Therefore, improving the AUV's motion planning capability in complex dynamic environments is an urgent need [2].
Eichhorn [3] proposed several important requirements for optimal motion planning of the "SLOCUM Glider" AUV, as well as a solution based on fast graph algorithms. According to the marine environment of the continental shelf of Newfoundland and Labrador, that work planned the optimal navigation path for the waterway area.
Sun et al. [4] proposed an optimized fuzzy control algorithm for AUV motion planning. The model can be used for motion planning in complex underwater environments. The virtual acceleration and speed of an AUV were obtained from the environmental information collected by sonar and a fuzzy system with an acceleration and braking module, so that the AUV could avoid dynamic obstacles automatically. However, because the choice of fuzzy boundaries is subjective, the generated path cannot be guaranteed to be optimal. The following year, the same group [5] proposed a new discrete, centralized planning strategy based on a Glasius bio-inspired neural network for full-coverage AUV motion planning. The algorithm has low computational cost and high efficiency, and it addresses the difficulty of a single AUV carrying out a long-range full-coverage task. Ramos et al. [6] planned a path to optimize scientific impact and navigation efficiency according to the complex spatio-temporal structure of the ocean flow field and its dynamical system. They pre-planned the Silbo glider's mission to cross the North Atlantic from April 2016 to March 2017, demonstrating the planning capability of the system in broad waters and waterway waters. MahmoudZadeh et al. [7] designed a hierarchical dynamic task planning framework for an AUV to complete task assignment within a limited time in an uncertain underwater environment. An advanced reactive task planner was developed to guide the AUV towards the target and finish the task on time, and a low-level motion planner was designed to handle unexpected changes in dynamic terrain by regenerating the optimal trajectory. Based on the biogeography-based optimization (BBO) algorithm, the task was rearranged as the terrain was updated, and motion planning in a broad water area was verified.
The studies above each focus on a single type of water area and lack detailed obstacle avoidance strategies. Different underwater environments have different characteristics, and AUV motion planning needs a logical, detailed, and accurate hierarchical structure.
In order to solve the problem of AUV planning in three typical underwater environments, a three-layer AUV motion planning architecture is proposed, and the planning logic of an AUV in complex tasks is given in this paper. At the same time, in order to reduce the complexity of programming, an improved reinforcement learning method is used to train the planning strategy of the AUV. The DDPG algorithm is improved in its critic network, and the improved algorithm, combined with a classified experience pool, is used to train the AUV's motion planning ability. Compared with the traditional DDPG algorithm, the proposed algorithm is more efficient. In addition, other traditional methods, such as A* and the ant colony algorithm, can only perform global optimization and easily fall into local minima. By contrast, the method proposed in this paper can obtain a local obstacle avoidance strategy for AUVs, improving their level of intelligence.
The structure of the paper is as follows: Section 2 establishes the three-layer motion planning architecture of the AUV. In Section 3, an improved reinforcement learning algorithm is proposed, and an algorithm model is established by combining it with an artificial experience pool. In Section 4, a comparative experiment on the algorithm and motion planning simulations of an AUV in the three typical environments are carried out. Section 5 presents field experiments of AUV motion planning. Section 6 concludes the study.

2. The Hierarchy of Motion Planning

AUV motion planning is divided into three levels: "task-behavior-action". The work which an AUV needs to complete is defined as a task, such as motion planning, target following, terrain detection, or underwater search. The target commands generated during AUV underwater navigation are defined as actions, such as turning left, turning right, or moving forward. The set of actions generated by an AUV in order to complete a task is defined as a behavior, such as obstacle avoidance, target search, or path following. The three levels can be understood as the task decomposition layer, the behavior planning layer, and the action execution layer. The task decomposition layer breaks down the task commands received by the AUV into behaviors. The behavior planning layer plans the behavior based on the environmental information. The action execution layer uses a reinforcement learning method to train the AUV's movements to complete the motion planning, and the AUV actuators are controlled to generate the actions that achieve the target commands.
The motion planning task of an AUV is divided into three layers, as shown in Figure 1. The top layer is the task layer, which is the motion planning task itself. The second layer is the behavior layer, including obstacle avoidance and navigation to the target. The third layer is the action layer, including changing velocity, changing direction, moving up and down, emergency braking, and reversing.
The task layer is the highest level of the AUV learning system, as well as the last and slowest layer to learn. Therefore, in practical design, the task design at the root task level is kept relatively simple, making it easy for the AUV to learn experience and make decisions. Decision design is the most important part of the root task level, so a good decision strategy should be designed for it.
The behavior layer analyzes environmental information, updates knowledge, and chooses behaviors. The AUV analyzes the environmental information collected by its sensors, sends the analysis result to the decision-making strategy module, and then chooses a behavior according to the corresponding policy. At the same time, the learning system updates its knowledge database with the analysis results and feeds the database back to the decision-making strategy module of this layer.
The action layer selects actions according to its own decision-making strategy. It is the bottom layer of the hierarchy and the execution layer of the motion planning task. Multiple actions are available, and the selected action is executed immediately.
The decision-making process of the system is a top-down process. The task layer does not directly select policies according to the detected environment state, but makes decisions from the top to the bottom layer, until the basic action instructions to be executed are finally decided.
Learning, by contrast, is a bottom-up process. First, the action layer learns its actions. Then, the behavior layer learns to select strategies for each behavior. Finally, the task layer learns how to make the AUV's final task decision. This process can be extended to other decision problems.
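To make the three-layer structure concrete, the following Python sketch (illustrative only; the class names TaskLayer, BehaviorLayer, and ActionLayer are hypothetical, not the authors' implementation) shows how a top-down decision could be passed from the task layer to the behavior layer and then to the action layer.

from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ActionLayer:
    # Maps a behavior name to a policy that turns the current state into a low-level action command.
    policies: Dict[str, Callable[[dict], dict]]
    def act(self, behavior: str, state: dict) -> dict:
        return self.policies[behavior](state)

@dataclass
class BehaviorLayer:
    # Chooses between obstacle avoidance and navigation to the target from environment information.
    def select(self, state: dict) -> str:
        return "avoid_obstacle" if state.get("obstacle_within_5m") else "navigate_to_target"

@dataclass
class TaskLayer:
    behavior_layer: BehaviorLayer
    action_layer: ActionLayer
    def step(self, state: dict) -> dict:
        behavior = self.behavior_layer.select(state)   # task layer delegates to the behavior layer
        return self.action_layer.act(behavior, state)  # behavior layer delegates to the action layer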
The planning logic of AUV navigation is as follows (a simplified sketch of this loop is given after the list):
  1. Load the existing map environment and initialize the parameters.
  2. Confirm the planning mission; the AUV begins to sail.
  3. According to the multi-constraint model, plan the optimal navigation behavior toward the target point.
  4. Receive real-time sensor data to determine the AUV's location and supplement the environmental map.
  5. Has the AUV reached the target point? Yes: end the task. No: proceed to the next step.
  6. Is an unknown obstacle detected? No: go to step 7. Yes: go to step 8.
  7. Does the AUV deviate from the globally planned path? No: maintain the current behavior. Yes: replan navigation to the target. Go to step 9.
  8. According to the multi-constraint model, plan the optimal obstacle avoidance behavior. Go to step 9.
  9. Plan the actions.
  10. The AUV performs one step of the planned action; go to step 4.
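The sketch below restates the ten steps as a Python-style loop; every helper name on the ops object (plan_navigation_behavior, detect_unknown_obstacle, and so on) is a hypothetical placeholder used only to show the control flow, not part of the authors' code.

def planning_loop(env_map, mission, auv, ops):
    # 'ops' is any object providing the planning primitives used below (a hypothetical interface).
    ops.initialize_parameters(env_map)                       # step 1: load the map, initialize parameters
    ops.confirm_mission(mission, auv)                        # step 2: confirm the mission, start sailing
    behavior = ops.plan_navigation_behavior(auv, mission)    # step 3: multi-constraint navigation behavior
    while True:
        state = ops.receive_sensor_data(auv)                 # step 4: localize and supplement the map
        ops.update_map(env_map, state)
        if ops.reached_target(state, mission):               # step 5: target reached, end the task
            break
        if ops.detect_unknown_obstacle(state):               # step 6: obstacle detected, go to step 8
            behavior = ops.plan_obstacle_avoidance(auv, state)    # step 8: optimal obstacle avoidance
        elif ops.deviates_from_global_path(state, mission):  # step 7: off the global path, replan
            behavior = ops.plan_navigation_behavior(auv, mission)
        action = ops.plan_action(behavior, state)            # step 9: plan the action
        auv.execute(action)                                  # step 10: one step, then back to step 4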

3. Motion Planning Algorithm Modeling

The AUV's ability to reach the target and avoid obstacles is trained by an improved deep deterministic policy gradient (DDPG) [8], combining artificial experience [9] with the reinforcement learning algorithm. Artificial experience is used to improve the training efficiency of reinforcement learning, while reinforcement learning is used to optimize the obstacle avoidance strategy derived from artificial experience.

3.1. The Improved DDPG

DDPG combines Actor-Critic [10] and the Deep Q Network (DQN) [11]; that is, it applies the replay memory of DQN and the idea of dual (main and target) network updates to the Actor-Critic structure. The critic network in standard DDPG only evaluates the overall performance of an action. However, the AUV motion system has six mutually coupled degree-of-freedom (DOF) outputs: longitudinal velocity $u$, lateral velocity $v$, angular velocity $\omega$, heeling angle $\varphi$, trim angle $\theta$, and heading angle $\psi$. Because the critic network does not evaluate the action in each dimension, the difference between the state values of optimal and non-optimal actions is small, resulting in low learning efficiency. This study therefore proposes an averaged motion critic network to solve this problem.
DDPG is divided into two parts, the Main Net and the Target Net, as shown in Figure 2. Each part includes one ActorNet and six CriticNets. The ActorNet outputs the 6-DOF action, and each CriticNet evaluates one action dimension. Considering a standard reinforcement learning (RL) problem, a finite Markov decision process (MDP) is established, comprising the current state $s_t$, an action $a_t$, a reward function $r$, and the next state $s_{t+1}$.
Data are extracted from the experience pool for learning. The 6-DOF action $a_t$ from the tuple $(s_t, a_t, r, s_{t+1})$ is input into the six critic networks of the Main Net, and each critic network calculates the value function of its action dimension by the Bellman equation:
$$Q^{\pi}(s_t[i], a_t[i]) = \mathbb{E}_{r_t, s_{t+1} \sim E}\big[ r(s_t[i], a_t[i]) + \gamma\, \mathbb{E}_{a_{t+1} \sim \pi}\big[ Q^{\pi}(s_{t+1}[i], a_{t+1}[i]) \big] \big]$$
where $i$ ranges from 1 to 6. If the target policy is deterministic, it can be described as a function $\mu: S \rightarrow A$, and the inner expectation becomes:
$$Q^{\mu}(s_t[i], a_t[i]) = \mathbb{E}_{r_t, s_{t+1} \sim E}\big[ r(s_t[i], a_t[i]) + \gamma\, Q^{\mu}\big(s_{t+1}[i], \mu(s_{t+1}[i])\big) \big]$$
The average Q value $\bar{Q}(s_t, a_t)$ is then obtained:
$$\bar{Q}^{\mu}(s_t, a_t) = \frac{1}{6} \sum_{i=1}^{6} Q^{\mu}(s_t[i], a_t[i])$$
According to Q-learning [12], considering function approximators parameterized by $\theta^{Q}$, the loss function is:
$$L(Q) = \bar{R} + \gamma \max_{a} \bar{Q}\big(s_{t+1}, \mu(s_{t+1}) \mid \theta^{Q}\big) - \bar{Q}\big(s_t, a_t \mid \theta^{Q}\big)$$
The neural network is trained to minimize this loss so that the actual $\bar{Q}$ value tends to the target $\bar{Q}$ value.
On the basis of the policy gradient, a parameterized actor function $\mu(s \mid \theta^{\mu})$, which specifies the current policy by deterministically mapping states to a specific action, is maintained. The parameters of the motion estimation network are updated by the following:
$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N} \sum_{t} \nabla_{\theta^{\mu}} \bar{Q}\big(s, a \mid \theta^{Q}\big)\big|_{s = s_t,\, a = \mu(s_t \mid \theta^{\mu})} = \frac{1}{N} \sum_{t} \nabla_{a} \bar{Q}\big(s, a \mid \theta^{Q}\big)\big|_{s = s_t,\, a = \mu(s_t)}\, \nabla_{\theta^{\mu}} \mu\big(s \mid \theta^{\mu}\big)\big|_{s = s_t}$$
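As an illustration of the averaged critic, the PyTorch sketch below builds six per-dimension critics and averages their outputs as in the averaging equation above; layer sizes and class names are hypothetical, and the sketch is a reading of the idea rather than the authors' code.

import torch
import torch.nn as nn

class DimCritic(nn.Module):
    # Critic for a single action dimension: estimates Q(s, a_i).
    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, state, a_i):                 # a_i has shape (batch, 1)
        return self.net(torch.cat([state, a_i], dim=-1))

class AveragedCritic(nn.Module):
    # One critic per degree of freedom; the overall value is the mean of the six per-dimension values.
    def __init__(self, state_dim: int, action_dim: int = 6):
        super().__init__()
        self.critics = nn.ModuleList(DimCritic(state_dim) for _ in range(action_dim))
    def forward(self, state, action):              # action has shape (batch, 6)
        q_each = [c(state, action[:, i:i + 1]) for i, c in enumerate(self.critics)]
        return torch.stack(q_each, dim=0).mean(dim=0)    # averaged Q_bar(s, a)

# Usage sketch for the critic loss (target networks and tensor shapes are assumptions):
# y = reward + gamma * target_critic(next_state, target_actor(next_state))
# loss = nn.functional.mse_loss(critic(state, action), y.detach())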

3.2. Obstacle Avoidance Strategy Based on Artificial Experience

The AUV adopts the method of artificial experience to avoid obstacles during underwater navigation.
(1) The calculation formula of the heading angle is as follows:
$$\beta = \begin{cases} \beta' + \arctan\dfrac{y_2 - y_1}{x_2 - x_1}, & x_1 < x_2 \\ \beta' + \arctan\dfrac{y_2 - y_1}{x_2 - x_1} + \pi, & x_1 > x_2 \\ \pi/2, & x_1 = x_2 \ \text{and}\ y_1 < y_2 \\ -\pi/2, & x_1 = x_2 \ \text{and}\ y_1 > y_2 \\ 0, & x_1 = x_2 \ \text{and}\ y_1 = y_2 \end{cases}$$
where $\beta$ is the target heading, $\beta'$ is the current heading, $(x_1, y_1)$ is the current position of the AUV, and $(x_2, y_2)$ is the target position of the AUV. (A code sketch of this formula is given at the end of this subsection.)
(2) When the AUV detects a static obstacle within 5 m ahead, it avoids it. The obstacle is bounded by a rectangle whose edges are perpendicular and parallel to the AUV's direction of navigation, with a 2 m safety margin reserved around the obstacle. The rectangle corner nearest to the AUV is taken as the target point, and the heading angle is calculated by Formula (6), as shown in Figure 3.
(3) When the AUV detects a dynamic obstacle within 5 m ahead, it avoids it. The node at the tail end of the dynamic obstacle is taken as the target point, and the heading angle is calculated by Formula (6), as shown in Figure 4.
The artificial experience algorithm uses simple logic to avoid obstacles. However, it is not optimal, so the strategy derived from artificial experience still needs to be refined by training.
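A minimal Python sketch of the heading-angle rule in Formula (6) is given below; it assumes angles in radians and follows the piecewise reconstruction above, with beta_prime standing for the current heading β'.

import math

def target_heading(beta_prime, x1, y1, x2, y2):
    # Piecewise heading-angle rule of Formula (6); beta_prime is the current heading in radians.
    if x1 < x2:
        return beta_prime + math.atan((y2 - y1) / (x2 - x1))
    if x1 > x2:
        return beta_prime + math.atan((y2 - y1) / (x2 - x1)) + math.pi
    if y1 < y2:
        return math.pi / 2
    if y1 > y2:
        return -math.pi / 2
    return 0.0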

3.3. Sample Space Classification

The sample space of the DDPG algorithm is divided into four categories: the original sample space D0, the high-error sample space D1, the diversity sample space D2, and the artificial sample space D3.
In the DDPG algorithm, the network samples randomly from a buffer for training. The buffer has a finite size, in which $(s_t, a_t, r, s_{t+1})$ is stored and sampled randomly according to the exploration strategy. However, due to the limited size of the sampling space and the modest learning effect of the samples collected in the early stage, the learning rate slows down in the later stage and the behavior cannot be significantly improved, making the control range of the controller unstable. Therefore, in the second stage of learning, the sample space should be enlarged to obtain better samples and thus more control experience. Before that, the bisimulation relation is introduced as follows.
Definition 1.
The bisimulation [13]: if $E \subseteq S \times S$ is a bisimulation relation, then for $s_1, s_2 \in S$:
$$(1)\ \forall a \in A,\ r(s_1, a) = r(s_2, a);$$
$$(2)\ \forall a \in A,\ \forall C \in S_E,\ \sum_{t \in C} P(t \mid s_1, a) = \sum_{t \in C} P(t \mid s_2, a);$$
where $S_E$ denotes the set of equivalence classes of the state set $S$ with respect to $E$, and $P(t \mid s_1, a)$ represents the probability that the system takes action $a$ and moves to state $t$ in the MDP. If two states satisfy the bisimulation relation, this is denoted $s_1 \sim s_2$.
Lemma 1.
Bisimulation metric [14]: let $D$ be the set of metrics on the state set $S$, and let $d \in D$. For $s_1, s_2 \in S$, define:
$$G(d)(s_1, s_2) = \max_{a \in A} \Big( d_a(s_1, s_2) + \gamma\, T_K(d)\big(P(s_1, a), P(s_2, a)\big) \Big)$$
where $0 \le \gamma \le 1$, $d_a(s_1, s_2) = |r(s_1, a) - r(s_2, a)|$, and $T_K(d)$ is the Kantorovich metric [15], defined by the following linear program:
$$\max \sum_{i=1}^{|S|} \big(P(s_i) - Q(s_i)\big)\, u_i$$
subject to: $\forall i, j:\ u_i - u_j \le d(s_i, s_j)$ and $\forall i:\ 0 \le u_i \le 1$.
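To make the Kantorovich linear program concrete, the sketch below solves it with scipy.optimize.linprog over a small discrete state set; the distributions P and Q and the distance matrix d are hypothetical inputs, and the code is only an illustration of the formulation above.

import numpy as np
from scipy.optimize import linprog

def kantorovich_metric(P, Q, d):
    # Solves: max sum_i (P_i - Q_i) * u_i  subject to  u_i - u_j <= d[i][j]  and  0 <= u_i <= 1.
    n = len(P)
    c = -(np.asarray(P, dtype=float) - np.asarray(Q, dtype=float))   # linprog minimizes, so negate
    rows, rhs = [], []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            row = np.zeros(n)
            row[i], row[j] = 1.0, -1.0                               # encodes u_i - u_j <= d[i][j]
            rows.append(row)
            rhs.append(d[i][j])
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=[(0.0, 1.0)] * n)
    return -res.fun

# Example with hypothetical three-state distributions:
# t_k = kantorovich_metric([0.7, 0.2, 0.1], [0.3, 0.4, 0.3], [[0, 1, 2], [1, 0, 1], [2, 1, 0]])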
On the basis of the above definitions, this study constructs two sample spaces besides the original sample space D0, namely the high-error sample space D1 and the diversity sample space D2. In D1, the temporal-difference (TD) error is used as heuristic information to sort the samples, so as to increase the probability of selecting samples with a large error. The bisimulation metric is used to measure the distance between samples in D2. During learning, training samples are selected in proportion from D1 and D2, respectively, so that both the diversity and the value of the samples are taken into account.
An AUV position within 5 m of an obstacle is defined as a dangerous state. In this state, the system selects an action based on artificial experience. When the AUV encounters an obstacle and correctly outputs the heading angle according to the artificial experience, the reward value is 1, and the sample is stored in the artificial experience pool D3.
During training, each sample $(s_t, a_t, r, s_{t+1})$ is put into D0, and the TD error of the action value function is calculated as in Formula (10). When the TD error is larger than a threshold, the sample has a large influence on the change of the action value function and can be considered a high-value sample; it is therefore put into D1. When the samples in D1 are sufficient, the training samples are no longer selected only from D0, but from D0 and D1 in a certain proportion. Meanwhile, Algorithm 1 is used to calculate the distance between samples in D0; the larger the distance, the lower the similarity. Low-similarity samples are put into D2 to ensure the diversity of the selected samples. After that, samples from D1 and D2 are selected in a certain proportion for learning. When the AUV is in a dangerous state, the sample is put into D3. (A sketch of this classified sampling scheme is given below.)
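The classified sampling can be sketched as a small replay-buffer wrapper in Python; the capacity, TD-error threshold, and mixing ratio are hypothetical parameters, and the sketch is an illustrative, simplified reading of the scheme rather than the authors' implementation.

import random
from collections import deque

class ClassifiedReplay:
    # D0: all samples; D1: high-TD-error samples; D2: low-similarity (diverse) samples; D3: artificial experience.
    def __init__(self, capacity=100000, td_threshold=0.5, ratio_d1=0.5):
        self.D0, self.D1, self.D2, self.D3 = (deque(maxlen=capacity) for _ in range(4))
        self.td_threshold = td_threshold
        self.ratio_d1 = ratio_d1                      # fraction of each batch drawn from D1

    def store(self, transition, td_error=0.0, low_similarity=False, dangerous=False):
        self.D0.append(transition)
        if abs(td_error) > self.td_threshold:
            self.D1.append(transition)                # high-value sample (large TD error)
        if low_similarity:
            self.D2.append(transition)                # diverse sample (large bisimulation distance)
        if dangerous:
            self.D3.append(transition)                # artificial-experience sample near an obstacle

    def sample(self, batch_size, dangerous=False):
        if dangerous and self.D3:
            return random.sample(list(self.D3), min(batch_size, len(self.D3)))
        if self.D1 and self.D2:
            k1 = int(batch_size * self.ratio_d1)
            return (random.sample(list(self.D1), min(k1, len(self.D1)))
                    + random.sample(list(self.D2), min(batch_size - k1, len(self.D2))))
        return random.sample(list(self.D0), min(batch_size, len(self.D0)))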
Algorithm 1 Distance measurement algorithm between states.
Input: states $s_1$ and $s_2$;
Initialize: $d(s_1, s_2) = 0$; distance parameters $\gamma$ and $\zeta$.
for $k = 1$ while $k \le \ln\zeta / \ln\gamma$ do
  for $i = 1$ to $|A|$ do
    compute $T_K(d)\big(P(s_1, a_i), P(s_2, a_i)\big)$;
  end for
  $d(s_1, s_2) = \max_{a \in A} \big\{ d_a(s_1, s_2) + \gamma\, T_K(d)\big(P(s_1, a), P(s_2, a)\big) \big\}$;
end for

3.4. Algorithm Model Establishment

The algorithm mainly includes three models [11]: input model, output model, and reward value model.
The input model of the algorithm is the state of the AUV, including its position coordinates, velocity, and heading. The output of the algorithm is the action of the AUV. The reward value model is designed based on multiple constraints; the constraints of motion planning are the target-following constraint, the obstacle constraint, and the current constraint.
A multi-constraint model is established, which includes the position coordinates and attitude of the AUV, the ocean current, and the collision distance to obstacles. By adding the weight coefficients of economy, safety, and concealment, the navigation cost is obtained:
$$C = \varsigma\big( \sin\alpha - l_{ob}/L_{\max} \big) + \zeta\big( \cos\alpha + l_{ta}/L_{\max} \big)$$
$C$ is the constraint cost: the larger $C$ is, the more costly the AUV navigation, so $C$ should be as small as possible. $\alpha$ is the angle between the current and the AUV attitude. $l_{ta}$ is the distance between the current position of the AUV and the target position. $L_{\max}$ is the distance from the initial position of the AUV to the target position. $l_{ob}$ is the distance between the current position of the AUV and the obstacle. $\varsigma$ is the security coefficient, with a value in (0, 1); the larger $\varsigma$ is, the more safety is considered in the planning. The AUV tends to yaw at an angle of 90° to the current, so its attitude is planned to avoid a 90° angle to the current. $\zeta$ is the economic coefficient, with a value in (0, 1); the larger $\zeta$ is, the more economy is considered in the planning. In short, the AUV should try to navigate downstream and avoid obstacles so as to reach the target as soon as possible.
The multi-constraint model of the AUV is used as the reward during the training of the DDPG algorithm, $R = -C$. The higher the value of $C$, the higher the navigation cost and the worse the reward; the smaller the value of $C$, the smaller the navigation cost and the better the reward. The heading of the AUV is treated as the action, which is a continuous output value. The planned combination of actions is the behavior and, in this article, the path.
In addition, if the AUV hits an obstacle, the episode ends and the reward is $R = -1$; if the AUV reaches the target, the episode ends and the reward is $R = 1$. During path following, the system is rewarded in real time:
$$R = -l_d / L_{\max}$$
where $l_d$ represents the perpendicular distance between the AUV position and the target path. (A short sketch of this reward design is given below.)
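A short Python sketch of this reward design, following the sign conventions reconstructed above, is given next; the function names and argument layout are hypothetical.

import math

def navigation_cost(alpha, l_ob, l_ta, l_max, sigma=0.5, zeta=0.5):
    # Multi-constraint cost C: sigma weights safety, zeta weights economy (both assumed in (0, 1)).
    return sigma * (math.sin(alpha) - l_ob / l_max) + zeta * (math.cos(alpha) + l_ta / l_max)

def reward(event, alpha, l_ob, l_ta, l_d, l_max, path_following=False):
    if event == "collision":
        return -1.0                          # hit an obstacle: the episode ends
    if event == "reached_target":
        return 1.0                           # reached the target: the episode ends
    if path_following:
        return -l_d / l_max                  # penalize perpendicular deviation from the target path
    return -navigation_cost(alpha, l_ob, l_ta, l_max)   # otherwise R = -C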

4. Simulation Experiment

4.1. Contrast Experiment

AUV motion planning is simulated using the algorithm designed in this paper. The AUV's position coordinates, current heading angle, speed, distance to the target, and distance to the obstacle are taken as inputs, and the AUV's actions are the outputs, training the AUV's ability to follow the target and avoid obstacles. The training process is given in Algorithm 2:
Algorithm 2 The training process.
Initialize parameters.
for episode in MAX_EPISODES do
  Put low-similarity samples into D1;
  Targets and obstacles appear randomly within the environment;
  for step in MAX_EP_STEPS do
    Learn and output the action $\beta = a = \mu(s_t \mid \theta^{\mu})$;
    According to the AUV's heading and velocity, use the AUV kinematic model to calculate its position at the next moment;
    Calculate the constraint $C$, $R = -C$;
    if the AUV reaches the target:
      $R = 1$; done = true; step = 0; break;
    else if the AUV collides with an obstacle:
      $R = -1$; done = true; step = 0; break;
    else:
      on_goal = 0;
    Store $(s, a, r, s')$ in D0;
    if the AUV is in a critical condition:
      randomly select a certain number of samples from D3;
    else if $D1 \ne \varnothing$:
      randomly select a certain number of samples from D1 and D2;
    else if $D2 \ne \varnothing$:
      randomly select a certain number of samples from D0 and D2;
    else:
      randomly select a certain number of samples from D0;
    Update the critic network:
      $L(Q) = R + \gamma \max_{a} \bar{Q}\big(s_{t+1}, \mu(s_{t+1}) \mid \theta^{Q}\big) - \bar{Q}\big(s_t, a_t \mid \theta^{Q}\big)$;
    if $L(Q) - \bar{Q}(s_t, a_t \mid \theta^{Q}) > p$:
      put $(s, a, r, s')$ in D2;
    if the AUV is in a critical condition:
      put $(s, a, r, s')$ in D3;
    Update the actor network:
      $\nabla_{\theta^{\mu}} J = \frac{1}{N} \sum_{t} \nabla_{a} \bar{Q}\big(s, a \mid \theta^{Q}\big)\big|_{s = s_t,\, a = \mu(s_t)}\, \nabla_{\theta^{\mu}} \mu\big(s \mid \theta^{\mu}\big)\big|_{s = s_t}$;
    Softly update the target network parameters:
      $\theta^{Q'} \leftarrow \rho\, \theta^{Q} + (1 - \rho)\, \theta^{Q'}$, $\theta^{\mu'} \leftarrow \rho\, \theta^{\mu} + (1 - \rho)\, \theta^{\mu'}$;
  end for step;
end for episode;
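The parameter update at the end of Algorithm 2 is read here as the standard DDPG soft (Polyak) target-network update; the PyTorch sketch below shows that update, with rho as the mixing coefficient.

import torch

@torch.no_grad()
def soft_update(target_net, main_net, rho=0.01):
    # theta_target <- rho * theta_main + (1 - rho) * theta_target, applied to both actor and critic.
    for tgt, src in zip(target_net.parameters(), main_net.parameters()):
        tgt.mul_(1.0 - rho).add_(rho * src)

# Usage sketch: soft_update(target_critic, critic); soft_update(target_actor, actor)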
After the training, complete obstacle avoidance and following strategies were obtained. The results are shown in Figure 5.
In Figure 5, the abscissa represents the training episode, and the ordinate represents the total reward of each episode. A total of 10,000 episodes were run during simulation training, and each episode is updated for up to 500 steps. Although the learning curves of both the artificial experience-DDPG and the conventional DDPG algorithm converge to a satisfactory value, the learning process of artificial experience-DDPG is more stable. It can be seen from Figure 5 that artificial experience-DDPG is more effective than conventional DDPG. Figure 5 shows that when the reward value converges to 230–250, the training is successful. In addition, in order for the AUV to learn all the motions, actions are selected randomly with a certain probability during training, so the curves in Figure 5 do not converge completely smoothly.

4.2. Simulation Experiment Platform

The architecture of the simulation experiment platform is shown in Figure 6. The system consists of several computers: visual simulation computers running the environment modeling system, motion planning computers running the planning system, and a sensor data processor running the perception processing system. They are integrated into one system via network switches. Motion planning control, environment perception, and the transmission of the typical environment models are accomplished through network communication.
The three typical environment models, the AUV dynamics and kinematics models, the obstacle models, and the target models were established in the environment modeling system. The perception processing system detects targets in the environment, filters the perception data, and sends the processed data to the planning system. The planning system plans behaviors and actions according to the perception data, the environmental map, and the AUV's real-time location, and sends the planning instructions to the environment system. The AUV is thus controlled to move in the environment to realize target following and obstacle avoidance. When the sensing system detects a sudden or moving obstacle, the planning system re-plans to realize real-time obstacle avoidance.

4.3. Motion Planning Simulation Experiment

After the simulation platform was built, models of broad waters, waterway waters and harbor waters were established. Training experience was introduced, and simulation experiments were carried out respectively.
In broad waters, the AUV was assigned an area search task. The AUV was greatly influenced by topography. The AUV path following and obstacle avoidance were simulated. The result is shown in Figure 7.
In Figure 7, the yellow polygons represent the known environmental terrain. The green polygons represent sudden obstacles. The blue lines are the path of the global plan. The red lines indicate the AUV’s path. In this simulation experiment, the AUV path following and obstacle avoidance are realized. Figure 7 shows that in broad waters, the AUV actions can be correctly planned based on the algorithm designed in this paper, enabling the AUV to follow the path and avoid obstacles.
In the waterway, the task of transversal crossing was assigned to the AUV. The AUV was greatly influenced by passing ships. The AUV behavior of avoiding moving obstacles was realized. The result is shown in Figure 8.
In Figure 8, the yellow polygons represent the known environmental terrain. The blue polygons represent moving obstacles. The blue lines are the path of the obstacles. The red lines indicate the AUV’s path. The motion planning of AUV waterway crossing and obstacle avoidance is realized in the simulation experiment. Figure 8 shows that in the waterway, the algorithm designed in this paper can correctly plan AUV’s actions, enabling the AUV to navigate and avoid moving obstacles.
In the harbor waters, the task of patrolling was assigned to the AUV. The AUV was influenced by both terrain and stationary and moving obstacles. However, different from the obstacles with regular movement in waterway, the obstacles in harbor waters area were complex and irregular. AUV patrol and obstacle avoidance were realized. The result is shown in Figure 9.
In Figure 9, the yellow polygons represent the known environmental terrain. The blue polygons represent moving obstacles. The blue lines are the path of the obstacles. The red lines indicate the AUV’s path. In this simulation experiment, the AUV patrol and obstacle avoidance are realized. Figure 9 shows that in harbor waters, the algorithm designed in this paper can correctly plan AUV’s actions, enabling the AUV to realize patrol and obstacle avoidance.
To sum up, compared with the traditional algorithm, the algorithm designed in this paper greatly improves training efficiency. With the experience gained from training, AUV motion planning in the three typical environments can be well realized, so that the scheduled tasks are completed. In addition, other traditional methods, such as A* and the ant colony algorithm, can only perform global optimization and easily fall into a local minimum; by contrast, the method proposed in this paper can obtain a local obstacle avoidance strategy for AUVs and improve their level of intelligence.

5. Field Tests

The applicability of the algorithm was verified by field experiments. The AUV system used in the experiments was composed of sensing, planning, and control systems [16]. The AUV carried out motion planning under the guidance and supervision of global knowledge and completed the assigned tasks. As the decision maker of the AUV, the planning system received information from the sensor and navigation systems in real time, analyzed it, generated the corresponding planning instructions, and sent them to the control system to control the movement of the AUV. Within each step, the AUV's current state information (position, attitude, planning instructions, etc.) was recorded. The parameters of the AUV are shown in Table 1, and the system hardware architecture of the AUV is shown in Figure 10. At the end of each test, the recorded data were exported to verify the technical indicators.
The area search experiment in broad waters, shown in Figure 11, was completed in an inland river area in Qinghai Province, China. The experimental area is 1000 m × 1000 m, and the maximum water depth is 30 m. The AUV navigated along the predetermined route to complete the search task in the test area. The test results are shown in Figure 12.
The test results show that, based on the AUV motion planning algorithm designed in this paper, the system outputs planning instructions to the control system in real time, and the control system controls the AUV navigation, which can realize the task of the AUV path following in broad waters.
The test of the AUV following along a wall in the waterway, shown in Figure 13, was completed in Zhejiang Province, China. The experimental area is 1500 m × 50 m, and the maximum water depth is 8 m. The ranging sonar mounted on the AUV, shown in Figure 14, was used to measure the distance between the AUV and one side of the waterway in real time, and this distance was to be kept at 3 m. The sonar parameters are shown in Table 2, and the test results are shown in Figure 15.
As shown in Figure 15, based on the motion planning system designed in this paper, the AUV maintained a distance of about 3 m from one side of the waterway, verifying the reliability of the proposed planning system.

6. Conclusions

In order to improve the AUV's motion planning ability in complex underwater tasks, this paper proposes a three-layer AUV motion planning architecture and points out the logical order of AUV underwater motion planning. The DDPG algorithm is improved with an averaged motion critic network, and the AUV's obstacle avoidance and target following abilities are trained by combining the improved algorithm with a classified experience pool. Compared with the traditional DDPG algorithm, the training efficiency of this method is higher. A motion planning simulation test platform for the AUV in three typical environments was built. Using the strategy obtained from training and the motion planning architecture designed in this paper, the tasks of the AUV searching in broad waters, crossing in waterway waters, and patrolling in harbor waters were realized in the simulation experiments. The area search test and the waterway wall-following test were carried out in the field, and the experimental results verify the reliability of the planning system.

Author Contributions

Conceptualization, X.R.; methodology, X.R.; software, X.R.; validation, Y.S., H.B., and G.Z.; formal analysis, X.R.; investigation, H.B.; resources, Y.S.; data curation, X.R.; writing—original draft preparation, X.R.; writing—review and editing, X.R.; visualization, X.R.; supervision, Y.S.; project administration, Y.S.; funding acquisition, Y.S., G.Z., and H.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Heilongjiang Province, grant number ZD2020E005, Financial support for Shaanxi Provincial Water Conservancy Science and technology program, grant number 2020slkj-5, and the China National Natural Science Foundation, grant number 51779057 and 51709061.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sun, Y.; Ran, X.; Zhang, G.; Wang, X.; Xu, H. AUV path following controlled by modified Deep Deterministic Policy Gradient. Ocean Eng. 2020, 210, 107360.
  2. Sun, Y.; Ran, X.; Zhang, G.; Xu, H.; Wang, X. AUV 3D Path Planning Based on the Improved Hierarchical Deep Q Network. J. Mar. Sci. Eng. 2020, 8, 145.
  3. Eichhorn, M. Solutions for practice-oriented requirements for optimal path planning for the AUV "SLOCUM Glider". In Proceedings of the OCEANS 2010 MTS/IEEE, Seattle, WA, USA, 20–23 September 2010; pp. 1–10.
  4. Sun, B.; Zhu, D.; Yang, S.X. An Optimized Fuzzy Control Algorithm for Three-Dimensional AUV Path Planning. Int. J. Fuzzy Syst. 2018, 20, 597–610.
  5. Sun, B.; Zhu, D.; Tian, C.; Luo, C. Complete Coverage Autonomous Underwater Vehicles Path Planning Based on Glasius Bio-inspired Neural Network Algorithm for Discrete and Centralized Programming. IEEE Trans. Cogn. Develop. Syst. 2019, 11, 73–84.
  6. Ramos, A.G.; García-Garrido, V.J.; Mancho, A.M.; Wiggins, S.; Coca, J.; Glenn, S.; Schofield, O.; Kohut, J.; Aragon, D.; Kerfoot, J.; et al. Lagrangian coherent structure assisted path planning for transoceanic autonomous underwater vehicle missions. Sci. Rep. 2018, 8, 1–9.
  7. Mahmoudzadeh, S.; Powers, D.M.; Sammut, K.; Atyabi, A.; Yazdani, A. A hierarchal planning framework for AUV mission management in a spatiotemporal varying ocean. Comput. Electr. Eng. 2018, 67, 741–760.
  8. Xu, Y.-H.; Yang, C.-C.; Hua, M.; Zhou, W. Deep Deterministic Policy Gradient (DDPG)-Based Resource Allocation Scheme for NOMA Vehicular Communications. IEEE Access 2020, 8, 18797–18807.
  9. Simpson, S.A.; Cook, T.S. Artificial Intelligence and the Trainee Experience in Radiology. J. Am. Coll. Radiol. 2020, 17, 1388–1393.
  10. Peters, J.; Schaal, S. Natural Actor-Critic. Neurocomputing 2008, 71, 1180–1190.
  11. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
  12. Watkins, C.J.C.H.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
  13. Givan, R.; Dean, T.; Greig, M. Equivalence notions and model minimization in Markov decision processes. Artif. Intell. 2003, 147, 163–223.
  14. Ferns, N.; Panangaden, P.; Precup, D. Metrics for Finite Markov Decision Processes. In Proceedings of the 20th Conference in Uncertainty in Artificial Intelligence, Banff, AB, Canada, 7–11 July 2004; pp. 162–169.
  15. Gibbs, A.L.; Su, F.E. On Choosing and Bounding Probability Metrics. Int. Stat. Rev. 2002, 70, 419–435.
  16. Sun, Y.; Ran, X.; Zhang, G.-C.; Wu, F.; Du, C. Distributed control system architecture for deep submergence rescue vehicles. Int. J. Nav. Arch. Ocean Eng. 2018, 11, 274–284.
Figure 1. The hierarchy of motion planning.
Figure 2. The improved deep deterministic policy gradient (DDPG) algorithm.
Figure 3. Avoid static obstacles.
Figure 4. Avoiding dynamic obstacles.
Figure 5. Accumulation of training reward value.
Figure 6. Hardware architecture of the autonomous underwater vehicle (AUV) simulation platform.
Figure 7. Motion planning of patrol inspection in broad waters. (a) marine environment, (b) global planning, (c) path following and obstacle avoidance, (d) end of the test.
Figure 8. Motion planning of transversal crossing in waterway. (a) marine environment, (b) moving obstacle, (c) obstacle avoidance, (d) end of the test.
Figure 9. Motion planning of patrolling in harbor waters. (a) marine environment, (b) global planning, (c) path following and obstacle avoidance, (d) end of the test.
Figure 10. The system hardware architecture of the AUV.
Figure 11. Area search test.
Figure 12. The results of area search test.
Figure 13. The test of AUV following along the wall.
Figure 14. The ranging sonar.
Figure 15. The AUV test results for wall tracking.
Table 1. Parameters of the AUV.
  Length: 2060 mm
  Maximum Diameter: 400 mm
  Weight: 174.6 kg
  Maximum Depth: 200 m
  Cruising Speed: ≤2 m/s
  Endurance Time: 8 h
  Video Storage Time (1080p/30 fps): ≥10 h
  Sensors: Fiber Optic Gyroscope (FOG), TCM5 magnetic compass, water leak sensor, power supply monitoring, ranging sonar, depth sensor, video camera
  Operating System: VxWorks 5.5 (bottom layer), Windows 7 (surface layer)
Table 2. The sonar parameters.
  Name: 200 kHz–50 kHz underwater ranging transducer
  Model: DYW-50/200-NA
  Frequency: 200 kHz ± 5 kHz / 50 kHz ± 3 kHz
  Range: 0.6–120 m at 200 kHz; 3–500 m at 50 kHz