Article

Risk-Sensitive Markov Decision Processes of USV Trajectory Planning with Time-Limited Budget

1 Maritime College, Guangdong Ocean University, Zhanjiang 524091, China
2 College of Mathematics and Computer, Guangdong Ocean University, Zhanjiang 524091, China
* Author to whom correspondence should be addressed.
Sensors 2023, 23(18), 7846; https://doi.org/10.3390/s23187846
Submission received: 5 August 2023 / Revised: 6 September 2023 / Accepted: 7 September 2023 / Published: 13 September 2023

Abstract
Trajectory planning plays a crucial role in ensuring the safe navigation of ships, as it involves complex decision making influenced by various factors. This paper presents a heuristic algorithm, named the Markov decision process Heuristic Algorithm (MHA), for time-optimized collision avoidance of Unmanned Surface Vehicles (USVs) based on a Risk-Sensitive Markov decision process model. The proposed method uses the Risk-Sensitive Markov decision process model to generate a set of states within the USV collision avoidance search space. These states are determined from the reachable positions and headings, taking into account the time cost associated with the set of actions. By incorporating an enhanced reward function and a constrained time-dependent cost function, the USV can plan practical motion paths that respect its actual time constraints. Experimental results demonstrate that the MHA algorithm enables decision makers to evaluate the trade-off between the budget and the probability of achieving the goal within that budget. Moreover, the local stochastic optimization criterion helps the agent select collision avoidance paths without significantly increasing the risk of collision.

1. Introduction

Ship trajectory planning is an essential task for navigation safety at sea. The goal of trajectory planning is to compute a safe and comfortable trajectory that allows the unmanned agent to accomplish the intended driving task [1]. An Unmanned Surface Vehicle (USV) is a light, intelligent, unmanned surface craft characterized by small size, low cost, high speed, and strong maneuverability [2]. Collision-free path generation is one of the core technologies underlying a USV’s autonomous navigation capability [3], and a USV’s collision avoidance algorithm can be regarded as a local path-planning algorithm. Some scholars have tried to design a ship navigation safety domain to solve the ship collision problem [4,5,6]. However, most of them consider only static obstacles or semi-dynamic obstacles that do not change course [7,8]. A highly idealized motion model is often used in collision avoidance [9], and some collision avoidance studies ignored the balance between efficiency and effectiveness [10]. Other researchers have evaluated the generated collision avoidance trajectories from the perspective of risk assessment [11], using multiple metrics such as navigation risk [12] and navigation smoothness [13]. In [14,15], the authors discussed ASV (Autonomous Surface Vehicle) developments in depth, while conflict detection and obstacle avoidance received less attention. Only a few studies related to reactive collision avoidance for unmanned ships were included in the paper [16]. Traditional obstacle avoidance algorithms mainly address the smoothness and safety of the trajectory without considering time constraints [17], and the usual obstacle avoidance algorithms require iterating over all possible states [18], which greatly increases their computational complexity [19].
Recently, planning-based methods have been widely applied in various fields [20], including artificial intelligence [21], robotics [22], and operations research [23,24]. These methods are especially suitable for complex and dynamic environments where multiple actions must be coordinated to achieve the goal [25]. In the trajectory planning of Unmanned Surface Vehicles (USVs), improving image quality is crucial for effective navigation. Particularly in challenging environments such as sea fog, the visual system of a USV may be compromised. To address this issue, several studies have explored image-defogging techniques for nighttime conditions, aiming to enhance the image clarity of USV visual systems [26,27]. By integrating these techniques into USV trajectory planning, navigation performance in complex environments can be improved, ensuring safe and efficient operation. Various path-planning methods are available, including approaches based on genetic algorithms, fuzzy logic, and optimization algorithms. For instance, path-planning methods that use genetic algorithms can locate optimal routes in intricate environments by simulating biological evolution processes; a related example is an enhanced ant colony optimization algorithm for designing search and rescue routes [28,29]. However, such methods may require extensive computational resources and time for optimization. In contrast, fuzzy logic-based path-planning techniques employ fuzzy reasoning and online learning, such as a WIP vehicle control approach based on an enhanced artificial potential field for multi-obstacle environments [30]. This kind of method can adapt to changes and uncertainties in the environment, but it can be more demanding when modeling the environment’s geometric features and dynamic constraints [31]. Furthermore, there exist other path-planning methods, such as the optimization-based RRT-A* method [32], which may require more effort to obtain and model obstacle information and may have limited adaptability to intricate and dynamic obstacle environments. Moreover, the objective function of the USV trajectory planning problem is very complex and involves processing a large amount of heterogeneous information [33]. Owing to the large number of constraints and the difficulty of mathematical modeling, a number of problems remain unsolved in existing trajectory planning systems [34], including the handling of trajectory constraints [35], real-time performance [36], and the lack of alternative trajectories [37]. Inspired by the design of reliable systems, Yu et al. [38] proposed a class of planning problems called Risk-Sensitive Markov Decision Processes (RS-MDP), where the goal is to find a policy that maximizes the probability that the cumulative cost does not exceed a user-defined budget [39].
The USV travel process has distinct states, and each state shares the same action space [40]. Thus, optimizing a USV’s trajectory planning depends on the current state, the prediction of possible future collisions, the optimization judgment, and the subsequent state selection based on that prediction. The idea is that the USV reaches its destination without collision and without exceeding the formulated time budget. The traveling process of the USV can be divided into different states, and the cost consumed by each state transition depends on the current motion state of the USV and the time budget. The decision-making process stops irreversibly when the locally budgeted time is exhausted, when the path trajectory exceeds expectations, or when collision avoidance fails before the destination is reached [41]. In such a state, no matter what action is taken, the local path search fails, which is called a “dead end” for the USV. Therefore, this paper aims to determine a set of measures that prevent the USV from reaching a “dead end” between the start and goal states. The objective is to maximize the probability that the total cost (time consumption) of local path planning for the USV remains within a specified time constraint [42].
Based on the aforementioned concepts, this paper's principal contributions are as follows.
  • This paper proposes a standard probabilistic planning method to optimize the trajectory planning of a USV under uncertain time costs. The approach uses a goal-oriented Markov decision process to find strategies that minimize the expected cumulative cost from the initial state to the goal.
  • In this study, it is assumed that the decision maker has a predetermined upper budget limit. We investigate whether a lower budget can still yield an acceptable optimal cost-threshold probability for the policy, given the decision maker’s risk attitude. The objective is therefore to find the best balance between the danger of a situation leading to a dead end and the initial budget.
  • Trajectory optimization planning for USVs is accomplished by transferring states according to the actions and transition probabilities. The problem under investigation consists of two phases. In the design phase, the number of time intervals and the potential states that can be produced within those intervals are determined. In the second phase, the USV is programmed to move from the start state to the goal state in a manner that maximizes the probability that the cumulative cost of local path planning remains within a specific duration. To achieve this, a series of velocities and yaw angles must be determined.
This paper is organized as follows: Section 2 presents the dynamic model of a USV based on the Risk-Sensitive Markov decision process (RS-MDP). Section 3 describes in detail the generation and evaluation of the optimal trajectory strategy under time constraints. Section 4 presents the results of computational experiments conducted to evaluate the proposed algorithm; based on simulation experiments, the effects of three different algorithms on the selection of a collision avoidance path for a USV are compared, and the influence of various collision avoidance factors on path selection is analyzed. Section 5 presents the conclusions.

2. Problem Statement and Risk-Sensitive Markov Decision Process (RS-MDP)

2.1. Problem Description

This section describes a USV trajectory optimization planning problem with uncertain time costs. The USV passes through a series of position states $s = \{s_0, s_1, \ldots, s_t\}$ by performing a set of actions $Action = \{A_0, A_1, \ldots, A_t\}$. Each action has a time cost, and at the end of each action time the USV moves to the next state. Each action produces a set of different next states, depending on the current state of the USV and the action chosen. In other words, the task processing time changes as the USV selects different actions.
The aim of this paper is to produce an execution strategy for USVs that maximizes the probability of successfully completing the given task before the deadline. This implies generating trajectories whose traversal times are shorter than the current time limit.

2.2. RS Markov Decision Process (RS-MDP) for USV

This section presents how the MDP can be applied to solve this problem. We consider the problem of planning a time-constrained USV trajectory. Given a 2D static map of the environment representing occupied and free space, and a goal state, we seek a trajectory S. The trajectory S is a continuous function, and a goal-oriented Risk-Sensitive Markov Decision Process (RS-MDP) is defined as a tuple $\langle S, S_0, A, P, C, G, T_\delta \rangle$, where $S$ is a finite state set and $S_0 \in S$ is the initial state; $A$ is a finite action set; $P$ is a transition function that returns the probability $P(\tilde{S}_i, a_i, \tilde{S}_{i+1})$ of reaching state $\tilde{S}_{i+1}$ from state $\tilde{S}_i$ when the USV applies action $a_i \in A$; $C$ is a cost function that associates a cost $C(\tilde{S}_i, a_i)$ with executing action $a_i \in A$ in state $\tilde{S}_i \in \tilde{S}$; $G$ is a set of goal states; and $T_\delta$ is the user-defined limited resource (e.g., time) associated with the trajectory, which in this paper refers to the time spent traveling the trajectory. We assume that the goal state is an absorbing state with zero cost and that non-goal absorbing states have positive costs.
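To make the tuple concrete, the following Python sketch shows one possible in-code representation of the RS-MDP components. It is illustrative only: the class name RSMDP, the field names, and the choice of representing a state as an (x, y, ψ) triple are assumptions of this sketch, not constructs from the paper.

from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

State = Tuple[float, float, float]   # (x, y, heading angle psi)
Action = Tuple[float, float]         # (change in velocity, change in heading)

@dataclass
class RSMDP:
    """Container for the goal-oriented RS-MDP tuple <S, S0, A, P, C, G, T_delta>."""
    states: FrozenSet[State]                              # S: finite state set
    initial_state: State                                  # S0: initial state
    actions: FrozenSet[Action]                            # A: finite action set
    transition: Callable[[State, Action, State], float]   # P(s, a, s'): transition probability
    cost: Callable[[State, Action], float]                # C(s, a): time cost of action a in state s
    goals: FrozenSet[State]                               # G: absorbing goal states with zero cost
    time_budget: float                                    # T_delta: user-defined travel-time budget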

2.2.1. A Set of Possible States of RS-MDP

In this section, we begin by formalizing the class of MDP problems in which we are interested. With the circle group $\mathbb{T}$, the trajectory S is a continuous function that maps time to the USV position and the heading (yaw) angle $\psi$. For certain applications, such as mapping and surveillance, one may opt to define a sequence with $m-1$ intermediate waypoints. We denote $\tilde{S} = [\tilde{S}_0, \tilde{S}_1, \ldots, \tilde{S}_i, \ldots, \tilde{S}_{m-1}, \tilde{S}_{end}]$ as the states to be attained in order, where $\tilde{S}_0 \in \tilde{S}$ is the initial state and $\tilde{S}_{end} \in \tilde{S}$ is the final state. We denote $s_r(i)$ as the position of the USV, $v(i)$ as the instantaneous velocity of the USV at its current position $s_r(i)$, and $s_\psi(i)$ as the heading angle $\psi$ of the USV at state $S(i)$:
$$S_i = \left[\, s_r(i)^T,\ s_\psi(i),\ v(i) \,\right]^T, \qquad s_r(i) = \left[\, S_x(i)\ \ S_y(i) \,\right]^T \tag{1}$$

2.2.2. Action Function of RS-MDP

$A$ is the set of all possible actions, and $A(s)$ defines the set of actions that can be taken in state $S$. An action $a_i = \langle v_i, \psi_i \rangle$ defines the deterministic transition between states $s_r(i) \xrightarrow{a_i} s_r(i+1)$, with $a_i \in A$. The conversion of the USV angle between two adjacent states is shown in detail in Figure 1, where $S_\psi^i$ is the USV heading angle at waypoint $S_i$. Figure 1 shows that the process from state $s_r(i)$ to state $s_r(i+1)$ is the trajectory of the USV from waypoint $S_i$ to waypoint $S_{i+1}$; $\mu$ represents the minimum angle by which the USV must turn to avoid a collision, which is considered a collision avoidance threshold. $\mu$ can be obtained from the Velocity Obstacle model [43].
$\psi_i$ is the USV heading angle change from state $s_r(i)$ to state $s_r(i+1)$. We define the following constraint on the angle of a state along a feasible trajectory:
$$\psi_i = S_\psi^{i+1} - S_\psi^{i}, \qquad \mu \le \psi_i \tag{2}$$
The trajectory generation is described by three degrees of freedom, where $(x, y, \psi)$ is the state of waypoint $S_i$. An action $a_i = \langle v_i, \psi_i \rangle$ defines the deterministic transition between states $s_r(i) \xrightarrow{a_i} s_r(i+1)$, where $v_i \in [0, v_{max}^i]$ and $v_{max}^i$ is the maximum velocity the USV can reach from state $S_i$. The action $a_i$ represents the change of motion of the USV, calculated as in Equation (3), where $\nu_i \in a_i$ is the velocity taken by the USV at state $s_r(i)$:
$$S_x^{i+1} = S_x^{i} + \nu_i \cos\psi, \qquad S_y^{i+1} = S_y^{i} + \nu_i \sin\psi \tag{3}$$
Typically, there are many actions available in state $S_i$. In order to characterize what the USV achieves by performing a certain action, we summarize five types of actions according to the rules of collision avoidance: $A_1$ ($v_i > 0$, $\psi_i = 0$) describes acceleration of the USV in the current direction; $A_2$ ($v_i < 0$, $\psi_i = 0$) characterizes deceleration of the USV in the current direction; $A_3$ ($v_i < 0$, $\psi_i > 0$) depicts the USV decelerating and steering to avoid a collision; $A_4$ ($v_i = 0$, $\psi_i > 0$) describes the USV maintaining constant speed while steering to avoid a collision; and $A_5$ ($v_i = 0$, $\psi_i = 0$) describes the USV traveling in the current direction at constant speed. The USV starts to move from the initial state and stops gently at the end of the time or path, while its acceleration/deceleration is limited for safe navigation. According to the International Regulations for Preventing Collisions at Sea (COLREGs), ships should pass on the right side of an oncoming ship (i.e., on their own left side) [44]. Therefore, all heading angle changes defined in this paper are greater than or equal to zero.
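As a concrete illustration of the five action classes and the position update of Equation (3), the short Python sketch below applies one action to a state. The numeric step sizes, the maximum-velocity clamp, and the unit time step are assumptions made for the example rather than values prescribed by the paper.

import math

# The five qualitative action classes of Section 2.2.2 (illustrative magnitudes only):
# each action is (delta_v, delta_psi); per COLREGs only non-negative heading changes are used.
ACTIONS = {
    "A1_accelerate":       ( 0.5, 0.0),                   # speed up, keep heading
    "A2_decelerate":       (-0.5, 0.0),                   # slow down, keep heading
    "A3_slow_and_turn":    (-0.5, math.radians(15)),      # decelerate and steer to avoid collision
    "A4_turn_const_speed": ( 0.0, math.radians(15)),      # keep speed, steer to avoid collision
    "A5_keep_course":      ( 0.0, 0.0),                   # keep speed and heading
}

def apply_action(x, y, psi, v, action, v_max=4.5, dt=1.0):
    """One deterministic state transition following Equation (3) (dt = 1 reproduces it exactly)."""
    dv, dpsi = action
    v_next = min(max(v + dv, 0.0), v_max)        # velocity stays within [0, v_max]
    psi_next = psi + dpsi                         # heading change is >= 0 by construction
    x_next = x + v_next * math.cos(psi_next) * dt
    y_next = y + v_next * math.sin(psi_next) * dt
    return x_next, y_next, psi_next, v_next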

2.2.3. The Cost Function of RS-MDP Search Strategies

Typically, there are many actions available in state $S_i$. In order to measure the benefit obtained by the USV after performing a certain action, we use a utility function to evaluate it. The core idea of USV trajectory planning is that the USV always moves toward positions with a high safety factor that are close to the destination, which is consistent with the stochastic Boltzmann policy [45] and with the thermodynamic principle that molecules tend to move to positions with low energy. Therefore, in order to predict alternative paths of the USV to the destination and to allow deviations from the optimal policy, the algorithm combines a stochastic Boltzmann policy to relax the constraint and obtain the best matching of actions, i.e., the optimal policy, denoted as $\psi_g$, which assigns to each action $a$ a probability of being executed in state $s$, as shown in Equations (4) and (5). The USV in state $i$ selects the successor state node $S_{i+1}$ with the probability $p$ of Equation (4). $C(a_i)$ is constructed as a weighted sum of the Euclidean distance covered by $a$ and the unitary cost of the target state $C(s)$, provided by the optional input semantic map.
$$p(S_{i+1} \mid S_i, a) \propto e^{-C(s_i, a)/K} \tag{4}$$
$$C\left(s_r(i), a_i, s_r(i+1)\right) = C\left(S_r(i+1)\right) + C\left(S_r(i+1) \mid a_i, S_r(i)\right) \tag{5}$$
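The Boltzmann successor selection of Equation (4) can be sketched as follows. The temperature K, the candidate-list representation, and the helper name boltzmann_pick are assumptions of the example; the normalization step simply turns the proportionality of Equation (4) into a probability distribution.

import math
import random

def boltzmann_pick(candidates, cost, K=1.0):
    """Sample a successor with probability proportional to exp(-C(s, a)/K), as in Equation (4).

    candidates: list of (action, next_state) pairs reachable from the current state.
    cost:       function mapping (action, next_state) to the time cost C.
    """
    weights = [math.exp(-cost(a, s_next) / K) for a, s_next in candidates]
    total = sum(weights)
    probs = [w / total for w in weights]          # normalize the Boltzmann weights
    r, acc = random.random(), 0.0
    for (a, s_next), p in zip(candidates, probs):
        acc += p
        if r <= acc:
            return a, s_next, p
    a, s_next = candidates[-1]                    # numerical safety fallback
    return a, s_next, probs[-1]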
The state transfer probability is given by Equation (6), and the probability of a history $h^\pi$ is given by Equation (7):
$$P^\pi(s' \mid s) = \sum_{a \in A} \pi(a \mid s)\, P(s' \mid s, a) \tag{6}$$
$$P(h^\pi) = \prod_{t=0}^{\infty} P^\pi(s_{t+1} \mid s_t) \tag{7}$$
A policy for an RS-MDP is a function $\pi$, and $\pi(s_t, t): S \times A \rightarrow [0, 1]$ is the rule for selecting actions at time $t$; it specifies which action $a_i$ to take for each state $s_r(i)$ and time step. A history $h^\pi = (\tilde{S}_{start}, \tilde{S}_1, \ldots, \tilde{S}_i, \ldots)$ for the policy $\pi$ is a valid sequence of states obtained by executing policy $\pi$ starting from state $\tilde{S}_{start}$. The cumulative historical cost along trajectory $\tilde{S}_i$ is denoted as $C(h^\pi)$; note that policy $\pi$ is allowed to pick an action $a$ in state $s$ that exceeds the currently available time. In Equation (8), we denote $t = [t_1, \ldots, t_m]$ as the time allocation over trajectory segments; e.g., $t_j$ is the time allocated for the segment between states or waypoints (note that waypoint $i$ and state $S_i$ are regarded as the same) $S_{j-1}$ and $S_j$. When planning trajectories through obstacle-rich environments, it may be preferable to incorporate collision avoidance constraints instead of states. For each $t$ in the definition domain $[0, t_{max}]$, there are states $s$ that satisfy the following conditions: the kinematic constraints of the USV, the collision constraints, and the physical limitations of the USV.
$$\sum_{j=1}^{i} t_j = T_i, \qquad i = 1, 2, \ldots, m-1 \tag{8}$$
The time constraint for the transition between successor states is given by Equation (8). The cumulative cost of history $h^\pi$ is given by:
$$C(h^\pi) = \sum_{t=0}^{\infty} C\left(s_t, \pi(s_t, t)\right), \qquad C(s_t, a_t) > 0,\ \ a_t \sim \pi(s_t, t) \tag{9}$$
The deterministic transitions between states $s$ in the Markov reward process are governed by the strategy $\pi$, and the probability of the resulting history (i.e., of the actions taken along it) satisfies the following equation:
$$P(h^\pi) = \prod_{t=0}^{\infty} P\left(s_t, \pi(s_t, t), s_{t+1}\right) \tag{10}$$
The set of all histories of policy $\pi$ that end in a goal state is denoted as $H^\pi$. From Equations (7)–(10), we obtain the probability that policy $\pi$ generates a history that starts at state $s$ and reaches a goal state without exceeding a user-defined voyage time $T_\delta$:
$$P^\pi(s, T_\delta) = \sum_{h^\pi \in H^\pi,\ s_0 = s,\ C(h^\pi) \le T_\delta} P(h^\pi) \tag{11}$$
The above probability is called the cost-threshold probability for policy $\pi$ from state $s$ under the restricted time $T_\delta$. When no histories satisfy the time constraint, the probability is zero. The optimal cost-threshold probability is then defined as follows for a given state $s$ and constraint time $T_\delta$:
$$P^*(s, T_\delta) = \max_\pi P^\pi(s, T_\delta) \tag{12}$$
Figure 2 shows an example of an RS-MDP. Figure 2a illustrates the state transition diagram for an example RS-MDP with four states. It displays the potential transitions between states as well as the corresponding actions that can be executed. Nodes in the diagram represent states, labeled $s_0$, $s_1$, $s_2$, and $s_3$. The arrows linking the nodes represent the state transitions, displaying the different possible routes the system can take. Each transition has an action and a cost linked to it. To illustrate, consider the transition between states $s_0$ and $s_1$, represented by the arrow labeled ‘action $a_1$’ in Figure 2b. This transition indicates that the system can progress from state $s_0$ to state $s_1$ by taking action $a_1$, which incurs a specific cost.
In Figure 2b, $S = (s_0, s_1, s_2, s_3)$, $G = (s_3)$, $s_0$ is the initial state, $A = (a_1, a_2)$, the transition function is depicted as labeled arcs, the cost function $C(s, a)$ annotates outgoing arcs, and $T_\delta = 2$. Different policies arise from the uncertainty of the cost function and the time prediction. For instance, we can obtain policies such as $\{s_0, a_1, s_1, a_2, s_3\}$, $\{s_0, a_2, s_2, a_2, s_3\}$, and $\{s_0, a_1, s_1, a_1, s_2, a_2, s_3\}$. Actions $a_i$ are represented by squares and transitions $P(s, a, s')$ are represented by arcs. As the constraint time grows, the number of arcs for an augmented state increases, since more budget is available and more actions can be applied.
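To show how such an example can be scored against Equations (7), (9), and (11), the Python sketch below encodes a small four-state RS-MDP and evaluates one policy history. The transition probabilities and costs are placeholders chosen for illustration, since the figure's exact arc labels are not reproduced in the text.

# Hypothetical stand-in for the Figure 2 example; probabilities and costs are illustrative only.
TRANSITIONS = {                      # (state, action) -> list of (next_state, probability)
    ("s0", "a1"): [("s1", 1.0)],
    ("s0", "a2"): [("s2", 1.0)],
    ("s1", "a1"): [("s2", 1.0)],
    ("s1", "a2"): [("s3", 1.0)],
    ("s2", "a2"): [("s3", 1.0)],
}
COST = {("s0", "a1"): 1, ("s0", "a2"): 1, ("s1", "a1"): 1, ("s1", "a2"): 1, ("s2", "a2"): 1}
GOAL, T_DELTA = "s3", 2

def history_stats(history):
    """Return (probability, cumulative cost) of a history, following Equations (7) and (9)."""
    prob, cum_cost = 1.0, 0
    for state, action, next_state in history:
        prob *= dict(TRANSITIONS[(state, action)]).get(next_state, 0.0)
        cum_cost += COST[(state, action)]
    return prob, cum_cost

# The policy {s0 -> a1, s1 -> a2} produces a single history that reaches the goal with cost 2,
# so under T_DELTA = 2 it contributes probability 1.0 to the sum in Equation (11).
p, c = history_stats([("s0", "a1", "s1"), ("s1", "a2", "s3")])
print("P(h) =", p, " C(h) =", c, " within budget:", c <= T_DELTA)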
In the RS-MDP, transitions in the action space correspond to adjusting the control inputs of the unmanned vessel. Adjusting the acceleration changes the motion behavior of the USV: it can increase or decrease its speed, or even come to a complete stop. The heading angle is the angle of the USV relative to the reference direction, and adjusting it changes the direction of the USV’s sailing path. By adjusting these control inputs, the USV can perform a variety of actions, transition between different states, and achieve its goals. For example, in a given state, increasing the acceleration and modifying the steering angle can drive the USV toward a goal state; upon reaching the goal state, the USV can be stabilized by appropriately decreasing the acceleration and adjusting the steering angle.

3. Trajectory Planning Based on MHA Algorithm

Trajectory planning involves moving from point A to point B while avoiding collisions over time. This can be calculated using both discrete and continuous methods. In essence, trajectory planning includes path planning as well as planning how to move based on velocity, time, and kinematics. For the path requirements, both velocity and acceleration are continuous. Here, S ˜ s t a r t and S ˜ e n d represent the start and end points, respectively, and each consists of a prescribed position and heading angle. The motion trajectory points of the USV can be determined from the state collection by considering the heading of the current node and the position and heading of the selected waypoint (next node). Figure 3 illustrates the motion planning flow of the USV operating in free space.
This distribution implies that states with low time costs always have a higher chance of being occupied. The cost of taking action $a$ in state $S_i$ is defined as:
$$C(S_i, a) = (\lambda + 1)(\alpha - 1) \tag{13}$$
where
$$\alpha = \begin{cases} 1 & \text{if } \operatorname{dist}(S_{x,y}^{i+1}, goal) = 0 \\ \infty & \text{if } \operatorname{dist}(S_{x,y}^{i+1}, obstacle) \le R \\ \dfrac{\operatorname{dist}(S_{x,y}^{i+1}, goal)}{\operatorname{dist}(S_{x,y}^{i+1}, obstacle)} & \text{otherwise} \end{cases}$$
In Equation (13), $\operatorname{dist}(\psi_0, \psi_t) = \sqrt{(x_t - x_0)^2 + (y_t - y_0)^2}$ is the Euclidean distance between two points. When $\operatorname{dist}(S_{x,y}^{i+1}, goal) = 0$, the USV has already reached the destination, and no cost is consumed to arrive there with probability 1. $\lambda \in [0, 1]$ is the obstacle density of the $S_{i+1}$ state, and $R$ is the anti-collision threshold of the USV obtained by integrating the velocity obstacle algorithm with the ship safety domain [46]. If $\operatorname{dist}(S_{x,y}^{i+1}, obstacle) \le R$, the USV has collided with an obstacle in that state; the cost is then infinite and the state cannot be chosen.
$$C(S_{i+1} \mid a, S_i) = C(a) + C(S_i) \tag{14}$$
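A minimal Python sketch of the cost defined by Equations (13) and (14) is given below, assuming that the Euclidean distances to the goal and to the nearest obstacle are already available; the function and argument names are our own.

import math

def euclid(p, q):
    """Euclidean distance between two (x, y) points, as used in Equation (13)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def action_cost(next_pos, goal, nearest_obstacle, lam, R):
    """Cost C(S_i, a) = (lambda + 1) * (alpha - 1), with alpha defined piecewise as in the text.

    lam: obstacle density of the successor state, in [0, 1].
    R:   anti-collision threshold; being closer than R to an obstacle counts as a collision.
    """
    d_goal = euclid(next_pos, goal)
    d_obs = euclid(next_pos, nearest_obstacle)
    if d_goal == 0.0:
        alpha = 1.0                # destination reached: (alpha - 1) = 0, so no cost is consumed
    elif d_obs <= R:
        return math.inf            # collision: infinite cost, the state cannot be chosen
    else:
        alpha = d_goal / d_obs     # trade-off between goal progress and obstacle clearance
    return (lam + 1.0) * (alpha - 1.0)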

3.1. RS-MDP for Initialized Cost Threshold Probabilities

Unlike a standard goal-oriented MDP, the optimal policy for the RS-MDP is not stationary, which prevents the direct application of standard MDP algorithms. Seeking to improve the convergence of dynamic programming for the RS-MDP, this study develops more efficient solutions for the RS-MDP under the local stochastic optimization criterion. Equations (15) and (16) define the optimal cost-threshold probability in terms of $P(s, \theta, a, s')$, the probability of transitioning to state $s'$ after executing action $a$ in state $s$ under a remaining budget $\theta$, where $\theta$ represents the time budget set by the user. In path-planning problems, $\theta$ restricts the amount of time expended on path planning, thereby limiting the path-planning algorithm to identifying the most effective route within a predetermined time frame. This probability can be estimated from historical data or from the system model; in a path-planning problem, it may be interpreted as the probability of moving to state $s'$ after selecting action $a$ in state $s$.
$$P^*(s, \theta) = \max_{a \in A} \sum_{s' \in S} P(s, \theta, a, s') \tag{15}$$
where:
$$P(s, \theta, a, s') = \begin{cases} 0 & \text{if } C(s, a) > \theta, \\ 1 & \text{if } C(s, a) \le \theta \text{ and } s' \in G, \\ P(s, a, s')\, P^*\!\left(s', \theta - C(s, a)\right) & \text{if } C(s, a) \le \theta \text{ and } s' \notin G. \end{cases} \tag{16}$$
Local stochastic optimization is able to plan increasingly fast trajectories as the model improves by classifying candidate trajectories as either feasible or infeasible. The optimal policy for every budget level that is equal to or smaller than the constrained time $T_\delta$ is defined by
$$\pi_T^*(s) \in \arg\max_\pi P^\pi(s_0, T), \quad \text{for each } T \in [0, T_\delta], \qquad \text{s.t.}\ \ S(0) = \tilde{S}_{start},\ \ S(T) = \tilde{S}_{end},\ \ S \in \xi_{path} \subseteq \xi_{all},\ \ T \le T_\delta,\ \ \dot{S}(T) = 0$$
$$P^*(s, T) = P^*(s, T_j), \qquad \text{for } T_{j-1} \le T < T_j \tag{17}$$
Equations (15)–(17) illustrate the relationship between the probabilistic models in trajectory planning. Equation (15) evaluates historical probability given a policy, Equation (16) computes the probability of the optimal path given a time restriction, and Equation (17) estimates the overall optimal probability of the trajectory by computing the probability of the optimal path across distinct time intervals. Together, these equations form the computational process of probabilistic modeling in trajectory planning.
$$r(s, a, s') = \begin{cases} -1 & \text{if } C(s, a) > \theta, \\ \dfrac{P(s, \theta, a, s')}{C(s' \mid a, s)} & \text{if } C(s, a) \le \theta \text{ and } s' \in G, \\ -\dfrac{P(s, \theta, a, s')}{C(s' \mid a, s)} & \text{if } C(s, a) \le \theta \text{ and } s' \notin G. \end{cases} \tag{18}$$
In Equation (18), $r(s, a, s')$ represents the reward value under the given conditions. $P(s, \theta, a, s') / C(s' \mid a, s)$ is the ratio of the accessibility of state $s'$ from state $s$ to its cost. It can be used to measure the efficiency of traveling from state $s$ to state $s'$, i.e., the probability of reaching state $s'$ per unit cost: the higher the ratio, the more efficient it is to arrive at state $s'$ from state $s$. The function $r(s, a, s')$ returns a reward value based on this ratio as well as the other conditions. Specifically, if the cost of action $a$ from state $s$ is greater than the threshold $\theta$, the reward value is set to $-1$. If the cost is less than or equal to $\theta$ and the resulting state $s'$ is in the goal set $G$, the reward value is set to $P(s, \theta, a, s') / C(s' \mid a, s)$. On the other hand, if the resulting state $s'$ is not in the goal set $G$, the reward value is set to the negative of this ratio.
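Under this reading of the sign convention (a penalty when the budget is exceeded, a positive ratio for goal successors, and a negative ratio otherwise), the reward of Equation (18) can be written as the small helper sketched below; the function and parameter names are assumptions of the example.

def reward(cost_sa, p_threshold, cost_next_given, theta, is_goal):
    """Reward r(s, a, s') following Equation (18) under the sign convention described above.

    cost_sa:         C(s, a), the time cost of taking action a in state s.
    p_threshold:     P(s, theta, a, s'), the cost-threshold transition probability.
    cost_next_given: C(s' | a, s), the cost of reaching s' from s under a.
    theta:           remaining time budget.
    is_goal:         True if s' belongs to the goal set G.
    """
    if cost_sa > theta:
        return -1.0                              # budget exceeded: penalty
    ratio = p_threshold / cost_next_given        # probability of reaching s' per unit cost
    return ratio if is_goal else -ratio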
Algorithm 1 describes how the choice of the probability distribution of the initial cost threshold affects the search space and the convergence performance of the algorithm. The effectiveness and performance of the optimization algorithm can be improved by choosing the probability distribution of the initial cost threshold appropriately.
Algorithm 1 Initialized cost threshold probabilities
Input: Initial state s, goal state g, time limit T
Output: Optimal path p* to reach g
1:  Initialization
2:  Set s as the current location of the USV
3:  Set g as the destination of the USV
4:  for t = 1 to T do
5:      for a in possible actions do
6:          Observe next state set s' and reward r(s, a, s')
7:          Add experience (s, a, r, s') to the experience replay buffer
8:      end for
9:      Update cumulative cost C(s_t, π(s_t, t)) and policy function π(s) using the experience replay buffer and the RS-MDP algorithm
10:     Select the action a with the lowest C(s, a) value from the policy function π(s) for the current state s
11:     Move the USV to the next state s' and receive reward r(s, a, s')
12:     if s = g then
13:         Break out of the loop
14:     end if
15: end for
16: Return optimal path p* to reach g

3.2. Adaptive Threshold Search Algorithm for RS-MDP

We propose a version of the classical dynamic programming solution for the MDP, called the Adaptive Threshold Search algorithm, which is based on the concept of LOS. This algorithm is an improved version of LOS for the RS-MDP in the augmented state space, which addresses the finite-policy MDP problem with reinforcement learning. By generating only augmented states that belong to histories that end in a goal state and by verifying probability convergence at a given budget at each iteration, we demonstrate how LOS efficiency can be improved.

3.2.1. Value Iteration for RS-MDP

Since the optimal cost-threshold probability is the fixed point of the Bellman equation for the RS-MDP (Equation (16)), we can compute it by iteratively applying the following Bellman update rule (the estimated cost-threshold probability at iteration i) until (near) convergence, starting from an admissible probability. An overview of the procedure is given in Algorithm 2. The main parameters are as follows: ϵ is the error tolerance, representing the allowed error in the final result; the algorithm stops iterating when the difference between successive estimates $P_i(s, \theta)$ and $P_{i+1}(s, \theta)$ is less than ϵ. The algorithm searches over the state transition probabilities, continuously updating the value of the best path for each state and budget combination. It decomposes the problem into a series of state transitions and action choices, selecting the action with $val_{max}$ (the highest “val” value) as the best path value for the current state. The algorithm returns whether there exists a policy whose cumulative cost does not exceed the upper-bound threshold.
Algorithm 2 Adaptive Threshold Search algorithm (ϵ, θ)
1:  Input:
2:      ϵ: a small positive value representing the desired accuracy
3:      θ: a parameter used for threshold calculation
4:  Output:
5:      True if a condition is met, False otherwise
6:  for all (s, θ) ∈ T do
7:      P*(s, θ) := 0
8:  end for
9:  c_max := max_{s, a ∈ A} C(s, a)
10: if CHECK_P(θ − 1, c_max) > 0 then
11:     return true
12: end if
13: for all (s, θ) ∈ T do
14:     val_max := 0
15: end for
16: for all a ∈ A do
17:     val := 0
18: end for
19: for each T ∈ T_θ do
20:     repeat
21:         i := 0
22:         δ := 0
23:         for all (s, θ) ∈ T do
24:             P_{i+1}(s, θ) := max_{a ∈ A} Σ_{s' ∈ S} { 0 if C(s, a) > θ;  P(s, a, s')·SEARCH_P(s', θ − C(s, a)) if (s', θ − C(s, a)) ∉ T and C(s, a) ≤ θ;  P(s, a, s')·P_i(s', θ − C(s, a)) if (s', θ − C(s, a)) ∈ T and C(s, a) ≤ θ }
25:             if P_{i+1}(s, θ) − P_i(s, θ) > δ then
26:                 val := val + T(s, a, s')·P(s', θ − C(s, a))
27:                 δ := P_{i+1}(s, θ) − P_i(s, θ)
28:             end if
29:             i := i + 1
30:         end for
31:     until δ < ϵ
32:     for all (s, θ) ∈ T do
33:         P*(s, θ) := P_i(s, θ)
34:     end for
35: end for
36: return false
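As a runnable illustration of the value iteration at the core of Algorithm 2 (the Bellman update of Equations (15) and (16)), the Python sketch below computes the optimal cost-threshold probability on a grid of integer budgets. It omits the CHECK_P/SEARCH_P bookkeeping of the full algorithm and assumes integer action costs, so it is a simplified sketch rather than the paper's exact procedure.

def rs_mdp_value_iteration(states, actions, goals, P, C, T_delta, eps=0.01):
    """Iterate the Bellman update of Equation (16) until the largest change is below eps.

    P(s, a, s2): transition probability; C(s, a): integer time cost of action a in state s;
    goals: set of absorbing goal states; T_delta: user-defined time budget.
    Returns a dict mapping (state, budget) to the estimated P*(state, budget).
    """
    budgets = range(T_delta + 1)
    V = {(s, th): 0.0 for s in states for th in budgets}
    while True:
        delta = 0.0
        for s in states:
            for th in budgets:
                best = 0.0
                for a in actions:
                    c = C(s, a)
                    if c > th:
                        continue                          # Equation (16), first case: unaffordable action
                    total = 0.0
                    for s2 in states:
                        if s2 in goals:
                            total += P(s, a, s2)          # second case: goal reached within budget
                        else:
                            total += P(s, a, s2) * V[(s2, th - c)]   # third case: recurse on remaining budget
                    best = max(best, total)
                delta = max(delta, abs(best - V[(s, th)]))
                V[(s, th)] = best
        if delta < eps:
            return V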

3.2.2. Cost-Threshold Probability Convergence

Algorithm 3 verifies that the probability model in the USV path-planning problem satisfies the collision avoidance risk and cost consumption preferences. It ensures the consistency of the probabilistic model by comparing the values of the probability transfer function that are computed at two different thresholds. Algorithm 4 sets thresholds by searching for distinct combinations of thresholds and states to locate the probability transfer function that meets specific requirements. It determines the threshold value at which the probability transfer function of all states complies with the specified condition. This process assists in selecting the appropriate threshold value to ensure the optimal performance of the path-planning model in diverse states.
Algorithm 3 CHECK_P(s, θ)
1: if θ < c_max then
2:     return false
3: end if
4: for all s ∈ S do
5:     if P(s, θ) ≠ P(s, θ − c_max) then
6:         return false
7:     end if
8: end for
Algorithm 4 SEARCH_P(s, θ)
1: for all (s, θ) in decreasing order of θ do
2:     if P(s, θ) ≠ P(s, θ − c_max) then
3:         return false
4:     end if
5: end for

4. Simulations and Discussion

Two baseline algorithms, the dynamic window approach with virtual manipulators (DWV) [17] and the State Lattice Planner (SLP) [45], are considered in this study; the DWV and SLP are treated as conventional methods, and the MHA is the proposed improved method. The simulations illustrate the quality of operation of each algorithm. The collision avoidance simulation uses a minimum safety domain $d_{safe}$ for vessels. Moreover, the influence of navigational behaviors and environmental effects (wind and currents) is ignored in the modeling process. We set ϵ = 0.01 as the residual error, and all the algorithms in this study run in MATLAB 2019a.
Table 1 shows the initial parameters of the USV. The optimal offset of the USV heading angle is calculated under the condition of constant speed, and the obstacle inflation (expansion of the barrier) is set to 0.8 m.

4.1. Optimization of Trajectory Planning with Limited Budgets

As shown in Figure 4a, the random map size is 210 × 210 (grid size), the blue point is the initial position of the USV, and the red point is the goal. The start and goal positions are (USVxstart, USVystart) = (30.0, 15.0) and (GBxgoal, GBygoal) = (180.0, 200.0).
Figure 4b uses gradient colors to represent the velocity magnitude of the USV at different positions, providing an intuitive illustration of how the USV velocity changes along the given path. Faster speeds are indicated by lighter colors, whereas darker colors indicate slower speeds. This coding makes it easy to read off the USV’s velocity changes throughout the designated path, and the correlation between color and speed can be exploited to identify likely speed bottlenecks or congested areas, enabling the assessment and optimization of path-planning models.
Using the initial parameter settings in Table 1, Table 2 displays the simulation results under different time budgets, including the success rate, trajectory length (TL), and average travel time. The algorithm was tested over 50 iterations, where successful planning had to be completed within 100 ms. We set ϵ = 0.01 as the residual error.
Table 2 provides the correspondence between path length and time budget, showing how the path changes with different time budgets and offering guidance for path-planning decisions in practical applications. It presents the metrics connected with generating trajectories, such as trajectory length, time, and cost. State node utilization is a measure of the planning algorithm’s efficiency, calculated by dividing the number of planned nodes by the total number of sampled nodes. The performance of the algorithm is evaluated using the algorithm duration and the success rate. In the test results, reducing the time budget lowered the node utilization to 90.84%. In addition, frequent computational timeouts reduced the success rate to 90%, and the algorithm processing time was longer (89.06 ms).
Figure 5 illustrates two different paths based on two different time budgets. Figure 5a displays the path map under various time budgets. Color-coded path lines clearly reveal the optimal path selection of the USV under various time constraints. The path length does not differ significantly under a time budget of 45 s. This implies that the USV produces optimum paths at high speeds by selecting the state node with the smallest steering angle as often as possible, guaranteeing safe navigation and collision avoidance while reaching the target location. The path becomes flatter with a longer time budget of 65 s. This enables the USV to explore the area for a more extended period and choose flatter routes to gather additional information or complete intricate tasks.
The gradient colors of the displacement trajectory curves in the displacement transfer representation graph in Figure 5b represent the USV’s velocity magnitude at different points, illustrating how the USV’s speed varies along the path. Slower speeds are represented by lighter colors, and faster speeds are indicated by darker colors in the displacement transfer representation graph. As the time budget decreases, there is a higher likelihood of the USV selecting the maximum speed to traverse the path. However, the path length does not reduce correspondingly, given the low probability of the USV changing its heading angle, which is set for collision avoidance risk preference.

4.2. Comparative Algorithm Simulation Experiments

Figure 6 shows the evolution of four motion features during simulation with a time budget of 65 s. Figure 6a demonstrates that a change in the Course Angle parameter of the DWV has a minimal effect on path planning, suggesting that the DWV prioritizes the stability of the heading over the curvature of the path. The Course Angle parameter has a greater impact on path planning in the SLP, suggesting that the SLP prioritizes path curvature to increase its flexibility in adapting to changes in the environment. Compared with the first two algorithms, in the MHA the variation of the Course Angle parameter has a moderate effect on path planning, indicating that the MHA achieves a balance between heading stability and path curvature.
Figure 6b shows that the DWV significantly affects path planning when the Heading Angle parameter varies. This indicates that the DWV is highly sensitive to the Heading Angle and fluctuates more when the Heading Angle varies. The SLP, in contrast to the DWV, has a lower impact on path planning when the Heading Angle parameter changes. This suggests that the SLP prioritizes maintaining heading stability in the path trajectories over fine-tuning the heading angle, which may result in less success within the predetermined time budget. The MHA demonstrates a moderate effect on path planning when the Heading Angle parameter changes. This highlights that the MHA achieves a balance between adjusting the heading angle and maintaining heading stability.
By combining Figure 6c,d, we can observe that in the DWV, higher translational acceleration and velocity parameters lead to a faster acceleration of the USV, resulting in faster arrival at the target position. This is useful in situations that require rapid arrival. The SLP prioritizes smooth acceleration and deceleration processes with less emphasis on the changes to translational acceleration and velocity parameters during path planning. This is useful in situations where smoothness is a priority. The MHA considers both speed and stability, balancing the impact of changes in translational acceleration and velocity parameters during path planning. This results in moderate effects on path planning, ensuring both speed and smoothness. This approach is useful in situations where both speed and stability are important.
It can be observed in Table 3 that the MHA algorithm outperforms the other algorithms, with a high success rate and a low average angular velocity integral under different time budgets. The SLP algorithm performs better in terms of the average angular velocity integral but is relatively poor in terms of success rate and average time, while the DWV algorithm performs poorly in terms of average path length. The MHA algorithm has the highest success rate under all time budgets. This can be attributed to its use of state-space transfer probabilities in path planning, which optimizes the sampling of state nodes and thereby enhances the effectiveness of the search space. Trajectory evaluation simultaneously considers multiple objectives, such as shortest path, obstacle avoidance, and minimum angular velocity integral; by considering these objectives together, the MHA algorithm can find better paths, which in turn increases the success rate. The SLP algorithm’s relatively low success rate can be attributed to its simplified single-objective optimization in path planning: since it considers only a single objective (e.g., shortest path or obstacle avoidance) without comprehensively analyzing other factors, it is unable to find optimal paths in complex scenarios, which reduces its success rate.
The average time of the MHA algorithm under the three time budget requirements is 34.00 s, 59.772 s, and 76.278 s, respectively, which all satisfy the requirement that the trajectory sailing time does not exceed the time budget. This is because the MHA algorithm can dynamically adjust the weight allocation of the state node selection according to the different scenarios and requirements. This can balance the trade-offs between different goals and find the appropriate path efficiently.
In all available time budgets, the DWV algorithm has the longest average path length. On the other hand, the MHA algorithm more accurately predicts future states and, as a result of its ability to estimate the transfer probability of USVs taking different actions, plans paths more efficiently. Achieving the above requires precise modeling of state transfer and elimination of unnecessary path nodes. The SLP algorithm has a high average angular velocity integral of 2.194 deg/s in all available time budgets. Conversely, the MHA algorithm combines its objectives in path planning: it determines the shortest path and sets a minimum steering angle threshold. This implementation results in a lower average angular velocity integral. Consequently, the MHA algorithm identifies paths that are both quick and comfortable, thereby lowering the angular velocity integral.

5. Conclusions

This study uses a Risk-Sensitive Markov decision process model to propose a method for generating a collection of states within the collision avoidance search space of a USV, and plans motion routes using an improved reward and cost function. The results of the simulation experiments show that the MHA algorithm allows decision makers to make trade-offs between budget and objectives when planning trajectories within a budget. The local stochastic optimization criterion assists USVs in selecting collision avoidance paths without significantly increasing risk. The MHA algorithm performs particularly well in complex environments: where many obstacles and constraints are present, such as narrow passages and different types of obstacles, it takes these factors into account and finds an optimal path that meets the time budget while satisfying the collision avoidance requirements. However, the methodology is subject to limitations. Firstly, additional research is required to determine the optimal cost-threshold probabilities under a low budget in order to achieve the best balance between budget and objectives. Secondly, our approach may have limitations in addressing uncertainty, particularly in complicated and dynamic environments. Additionally, we have not considered the influence of navigation behaviors and environmental factors on the model, which necessitates further investigation. Lastly, our study requires validation through field experiments to evaluate its performance and feasibility in practical settings. Future research could overcome these limitations and enhance the applicability and precision of the approach.

Author Contributions

Conceptualization, Y.D. and H.Z.; methodology, Y.D.; software, H.Z.; validation, Y.D. and H.Z.; formal analysis, Y.D.; resources, Y.D.; data curation, Y.D.; writing—original draft preparation, Y.D.; writing—review and editing, H.Z.; supervision, H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Program for the Project of Science and Technology of Zhanjiang City, grant number: 2022B01103, Guangdong Ocean University Teaching Quality Project, grant number: YSZL-201922058-JXD, and the National College Students Innovation and Entrepreneurship Training Program of Guangdong Province, grant number: S202210566074, the National Natural Science Foundation of China, China (grant number: 52171346), the Natural Science Foundation of Guangdong Province, China (grant number: 2021A1515012618), the special projects of key fields (Artificial Intelligence) of Universities in Guangdong Province (grant number: 2019KZDZX1035), China, and program for scientific research start-up funds of Guangdong Ocean University, China.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RS-MDP   Risk-Sensitive Markov Decision Process
DWV      Dynamic Window approach with Virtual manipulators
SLP      State Lattice Planner
USV      Unmanned Surface Vehicle

References

  1. Qu, C.; Gai, W.; Zhang, J.; Zhong, M. A novel hybrid grey wolf optimizer algorithm for unmanned aerial vehicle (UAV) path planning. Knowl.-Based Syst. 2020, 194, 15–35. [Google Scholar] [CrossRef]
  2. Stateczny, A.; Gierlowski, K.; Hoeft, M. Wireless Local Area Network Technologies as Communication Solutions for Unmanned Surface Vehicles. Sensors 2022, 22, 655. [Google Scholar] [CrossRef] [PubMed]
  3. Wang, C.; Chen, D.; Liao, W.; Liang, Z. Autonomous Obstacle Avoidance Strategies in the Mission of Large Space Debris Removal using Potential Function. Adv. Space Res. 2022, 72, 2860–2873. [Google Scholar] [CrossRef]
  4. Goodwin, E.M. A Statistical Study of Ship Domains. J. Navig. 1975, 28, 328–344. [Google Scholar] [CrossRef]
  5. Zhou, J.; Ding, F.; Yang, J.; Pei, Z.; Wang, C.; Zhang, A. Navigation safety domain and collision risk index for decision support of collision avoidance of USVs. Int. J. Nav. Archit. Ocean Eng. 2021, 13, 340–350. [Google Scholar] [CrossRef]
  6. Ibadurrahman; Hamada, K.; Wada, Y.; Nanao, J.; Watanabe, D.; Majima, T. Long-Term Ship Position Prediction Using Automatic Identification System (AIS) Data and End-to-End Deep Learning. Sensors 2021, 21, 7169. [Google Scholar] [CrossRef]
  7. Abebe, M.; Noh, Y.; Seo, C.; Kim, D.; Lee, I. Developing a Ship Collision Risk Index estimation model based on Dempster-Shafer theory. Appl. Ocean Res. 2021, 113, 102735. [Google Scholar] [CrossRef]
  8. Abebe, M.; Noh, Y.; Kang, Y.J.; Seo, C.; Kim, D.; Seo, J. Ship trajectory planning for collision avoidance using hybrid ARIMA-LSTM models. Ocean Eng. 2022, 256, 11–27. [Google Scholar] [CrossRef]
  9. Polvara, R.; Sharma, S.; Wan, J.; Manning, A.; Sutton, R. Obstacle Avoidance Approaches for Autonomous Navigation of Unmanned Surface Vehicles. J. Navig. 2018, 71, 241–256. [Google Scholar] [CrossRef]
  10. Gonzalez-Garcia, A.; Castañeda, H.; Garrido, L. USV Path-Following Control Based On Deep Reinforcement Learning and Adaptive Control. In Proceedings of the Global Oceans 2020: Singapore—U.S. Gulf Coast, Biloxi, MS, USA, 5–30 October 2020; pp. 1–7. [Google Scholar] [CrossRef]
  11. Yu, Y.; Chen, L.; Shu, Y.; Zhu, W. Evaluation model and management strategy for reducing pollution caused by ship collision in coastal waters. Ocean Coast. Manag. 2021, 203, 105446. [Google Scholar] [CrossRef]
  12. Heiberg, A.; Larsen, T.N.; Meyer, E.; Rasheed, A.; San, O.; Varagnolo, D. Risk-based implementation of COLREGs for autonomous surface vehicles using deep reinforcement learning. Neural Netw. 2022, 152, 17–33. [Google Scholar] [CrossRef] [PubMed]
  13. Marin-Plaza, P.; Hussein, A.; Martin, D.; Arturo, D.L.E. Global and Local Path Planning Study in a ROS-Based Research Platform for Autonomous Vehicles. J. Adv. Transp. 2018, 2018, 6392697. [Google Scholar] [CrossRef]
  14. Wu, B.; Yip, T.L.; Yan, X.; Guedes Soares, C. Fuzzy logic based approach for ship-bridge collision alert system. Ocean Eng. 2019, 187, 106152. [Google Scholar] [CrossRef]
  15. Ivanov, Y.S.; Zhiganov, S.V.; Ivanova, T.I. Intelligent Deep Neuro-Fuzzy System of Abnormal Situation Recognition for Transport Systems. In Current Problems and Ways of Industry Development: Equipment and Technologies; Shakirova, O.G., Bashkov, O.V., Khusainov, A.A., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 224–233. [Google Scholar]
  16. Hörteborn, A.; Ringsberg, J.W. A method for risk analysis of ship collisions with stationary infrastructure using AIS data and a ship manoeuvring simulator. Ocean Eng. 2021, 235, 109396. [Google Scholar] [CrossRef]
  17. Kobayashi, M.; Motoi, N. Local Path Planning: Dynamic Window Approach With Virtual Manipulators Considering Dynamic Obstacles. IEEE Access 2022, 10, 17018–17029. [Google Scholar] [CrossRef]
  18. Kabir, A.M.; Thakar, S.; Malhan, R.K.; Shembekar, A.V.; Gupta, S.K. Generation of synchronized configuration space trajectories with workspace path constraints for an ensemble of robots. Int. J. Robot. Res. 2021, 40, 027836492098808. [Google Scholar] [CrossRef]
  19. Zhu, M.; Hahn, A.; Wen, Y.Q.; Sun, W.Q. Optimized support vector regression algorithm-based modeling of ship dynamics. Appl. Ocean Res. 2019, 90, 101842. [Google Scholar] [CrossRef]
  20. Zhu, H.; Ding, Y. Optimized Dynamic Collision Avoidance Algorithm for USV Path Planning. Sensors 2023, 23, 4567. [Google Scholar] [CrossRef]
  21. Gracia, S.; Olivito, J.; Resano, J.; Martin-Del-Brio, B.; Lvarez, E. Improving accuracy on wave height estimation through machine learning techniques. Ocean Eng. 2021, 236, 86–99. [Google Scholar] [CrossRef]
  22. Chaal, M.; Ren, X.; BahooToroody, A.; Basnet, S.; Bolbot, V.; Banda, O.A.V.; Gelder, P.V. Research on risk, safety, and reliability of autonomous ships: A bibliometric review. Saf. Sci. 2023, 167, 106256. [Google Scholar] [CrossRef]
  23. Kim, D.H. Human factors influencing the ship operator’s perceived risk in the last moment of collision encounter. Reliab. Eng. Syst. Saf. 2020, 203, 107078. [Google Scholar] [CrossRef]
  24. Kumar, U.M.; Bhat, S.P.; Kavitha, V.; Hemachandra, N. Approximate solutions to constrained Risk-Sensitive Markov decision processes. Eur. J. Oper. Res. 2023, 310, 249–267. [Google Scholar] [CrossRef]
  25. Ding, Y.; Zhu, H.; Chen, R.; Li, R. An Efficient AdaBoost Algorithm with the Multiple Thresholds Classification. Appl. Sci. 2022, 12, 5872. [Google Scholar] [CrossRef]
  26. Liu, Y.; Yan, Z.; Tan, J.; Li, Y. Multi-Purpose Oriented Single Nighttime Image Haze Removal Based on Unified Variational Retinex Model. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 1643–1657. [Google Scholar] [CrossRef]
  27. Yan, W.; Tan, R.T.; Dai, D. Nighttime Defogging Using High-Low Frequency Decomposition and Grayscale-Color Networks. In Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 473–488. [Google Scholar]
  28. Wang, J.; Xu, Z.; Zheng, X.; Liu, Z. A Fuzzy Logic Path Planning Algorithm Based on Geometric Landmarks and Kinetic Constraints. Inf. Technol. Control. 2022, 51, 499–514. [Google Scholar] [CrossRef]
  29. Zhang, H.; Sun, J.; Yang, B.; Shi, Y.; Li, Z. Optimal search and rescue route design using an improved ant colony optimization. Inf. Technol. Control. 2020, 49, 438–447. [Google Scholar] [CrossRef]
  30. Ning, Y.; Yue, M.; Guo, L.; Zhao, J. A WIP Vehicle Control Method Based on Improved Artificial Potential Field Subject to Multi-Obstacle Environment. Inf. Technol. Control. 2020, 49, 320–344. [Google Scholar] [CrossRef]
  31. Zagradjanin, N.; Pamucar, D.; Jovanovic, K. Cloud-Based Multi-Robot Path Planning in Complex and Crowded Environment with Multi-Criteria Decision Making Using Full Consistency Method. Symmetry 2019, 11, 1241. [Google Scholar] [CrossRef]
  32. Ayawli, B.; Mei, X.; Shen, M.; Appiah, A.; Kyeremeh, E.F. Optimized RRT-A* Path Planning Method for Mobile Robots in Partially Known Environment. Inf. Technol. Control 2019, 48, 179–194. [Google Scholar] [CrossRef]
  33. Khalaji, A.K.; Tourajizadeh, H. Nonlinear Lyapounov based control of an underwater vehicle in presence of uncertainties and obstacles. Ocean Eng. 2020, 198, 98–116. [Google Scholar] [CrossRef]
  34. Sang, H.; You, Y.; Sun, X.; Zhou, Y.; Liu, F. The hybrid path planning algorithm based on improved A* and artificial potential field for unmanned surface vehicle formations. Ocean Eng. 2021, 223, 87–109. [Google Scholar] [CrossRef]
  35. Haque, M.S.; Chowdhury, M.U. A New Cyber Security Framework Towards Secure Data Communication for Unmanned Aerial Vehicle (UAV). In Security and Privacy in Communication Networks; Lin, X., Ghorbani, A., Ren, K., Zhu, S., Zhang, A., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 113–122. [Google Scholar]
  36. Al Maawali, W.; Mesbah, M.; Al Maashri, A.; Saleem, A. Design of intelligent thruster decision-making system for USVs. Ocean Eng. 2023, 285, 115431. [Google Scholar] [CrossRef]
  37. Abdolkarimi, E.; Abaei, G.; Selamat, A.; Mosavi, M. A hybrid Type-2 Fuzzy Logic System and Extreme Learning Machine for low-cost INS/GPS in high-speed vehicular navigation system. Appl. Soft Comput. 2020, 94, 106447. [Google Scholar] [CrossRef]
  38. Melo Moreira, D.A.; Valdivia Delgado, K.; Nunes de Barros, L.; Deratani Mauá, D. Efficient algorithms for Risk-Sensitive Markov Decision Processes with limited budget. Int. J. Approx. Reason. 2021, 139, 143–165. [Google Scholar] [CrossRef]
  39. Schlosser, R. Risk-sensitive control of Markov decision processes: A moment-based approach with target distributions. Comput. Oper. Res. 2020, 123, 104997. [Google Scholar] [CrossRef]
  40. Huang, Y.; van Gelder, P. Collision risk measure for triggering evasive actions of maritime autonomous surface ships. Saf. Sci. 2020, 127, 104708. [Google Scholar] [CrossRef]
  41. Shah, B.C.; Gupta, S.K. Long-Distance Path Planning for Unmanned Surface Vehicles in Complex Marine Environment. IEEE J. Ocean. Eng. 2020, 45, 813–830. [Google Scholar] [CrossRef]
  42. Bhabak, A.; Saha, S. Risk-sensitive semi-Markov decision problems with discounted cost and general utilities. Stat. Probab. Lett. 2022, 184, 109408. [Google Scholar] [CrossRef]
  43. Fiorini, P.; Shiller, Z. Motion Planning in Dynamic Environments Using Velocity Obstacles. Int. J. Robot. Res. 1998, 17, 760–772. [Google Scholar] [CrossRef]
  44. Komen, D.; Neilsen, T.B.; Mortenson, D.B.; Acree, M.C.; Hodgkiss, W.S. Seabed type and source parameters predictions using ship spectrograms in convolutional neural networks. J. Acoust. Soc. Am. 2021, 149, 1198–1210. [Google Scholar] [CrossRef]
  45. Wang, S.; Pu, Y.; Yang, S.; Yao, X.; Li, B. Boltzmann Exploration for Deterministic Policy Optimization. In Neural Information Processing; Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 214–222. [Google Scholar]
  46. Szlapczynski, R.; Szlapczynska, J. A ship domain-based model of collision risk for near-miss detection and Collision Alert Systems. Reliab. Eng. Syst. Saf. 2021, 214, 107766. [Google Scholar] [CrossRef]
Figure 1. States under heading angle ψ change.
Figure 2. Example of RS-MDP with four states and integer costs: (a) state transition diagrams; (b) action variation diagrams associated with time costs.
Figure 3. Motion planning flow of the USV operating.
Figure 4. Analysis of USV path planning: random map, initial goal positions, and velocity distribution. (a) Random map, initial goal positions. (b) Visualization of velocity distribution.
Figure 5. Comparison of a path generated in different time budgets: (a) path map under various time budgets. (b1) Time budget for 65 s. (b2) Time budget for 45 s.
Figure 6. Three algorithms simulation parameters compared in time budget of 65 s: (a) Course angle. (b) Heading Angle. (c) Translational Acceleration (m/s²). (d) USV speed.
Table 1. Constraint initial parameter values.

Parameters | USV | Target Vessel
Maximum Translational Velocity | 4.5 [m/s] | 6 [m/s]
The average obstacle gap | −0.3 [m/s] | 0.0 [m/s]
Maximum Angular Range of Heading Angle | 45 [deg] | 0 [deg]
Minimum Angular Range of Heading Angle | −45 [deg] | 0 [deg]
Maximum Angular Velocity | 5 [m/s] | –
Minimum Angular Velocity | −5 [m/s] | –
Time Step | 0.1 [s] | 0.1 [s]
Maximum Predicted Time | 5 [s] | –
Maximum Translational Acceleration | 2 [m/s²] | 0 [m/s²]
Table 2. Simulation results in different time budgets.

Time Budget | State Node Utilization | Success Rate [%] | Time [s] | Trajectory Length [m] | Algo. Time [ms]
65 | 97.84 | 96 | 62.031 | 99.521 | 80.21
45 | 90.84 | 90 | 43.791 | 95.804 | 89.06
Table 3. Compare algorithms at different time budgets.

Time Budget | Method | Succ. [%] | T.vel [s] | TL [m] | Mean.vel [m/s²] | Ang.vel. Integral [deg/s]
35 | DWV | 82.340 | 34.571 | 54.093 | 4.881 | 2.340
35 | SLP | 12.36 | 49.76 | 92.077 | 3.094 | 1.823
35 | MHA | 80.231 | 34.00 | 105.060 | 3.090 | 1.340
60 | DWV | 12.340 | 81.653 | 270.843 | 3.317 | 1.800
60 | SLP | 34.554 | 88.910 | 358.574 | 4.033 | 2.560
60 | MHA | 88.231 | 59.772 | 182.902 | 3.060 | 3.230
80 | DWV | 91.630 | 76.157 | 249.871 | 3.281 | 2.101
80 | SLP | 93.800 | 82.976 | 173.752 | 2.094 | 3.471
80 | MHA | 97.781 | 76.278 | 238.140 | 3.122 | 0.822