Constrained Parameterized Differential Dynamic Programming for Waypoint-Trajectory Optimization

Zheng, Xiaobo; Xia, Feiran; Lin, Defu; Jin, Tianyu; Su, Wenshan; He, Shaoming

doi:10.3390/aerospace11060420

by Xiaobo Zheng ¹, Feiran Xia ¹, Defu Lin ¹, Tianyu Jin ¹, Wenshan Su ² and Shaoming He ¹,*

¹ School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China

² National Innovation Institute of Defense Technology, Beijing 100071, China

* Author to whom correspondence should be addressed.

Aerospace 2024, 11(6), 420; https://doi.org/10.3390/aerospace11060420

Submission received: 22 April 2024 / Revised: 18 May 2024 / Accepted: 21 May 2024 / Published: 22 May 2024

(This article belongs to the Topic Target Tracking, Guidance, and Navigation for Autonomous Systems, 2nd Edition)

Abstract:

Unmanned aerial vehicles (UAVs) are required to pass through multiple important waypoints as quickly as possible in courier delivery, enemy reconnaissance and other tasks to eventually reach the target position. There are two important problems to be solved in such tasks: constraining the trajectory to pass through intermediate waypoints and optimizing the flight time between these waypoints. A constrained parameterized differential dynamic programming (C-PDDP) algorithm is proposed for meeting multiple waypoint constraints and free-time constraints between waypoints to deal with these two issues. By considering the intermediate waypoint constraints as a kind of path state constraint, the penalty function method is adopted to constrain the trajectory to pass through the waypoints. For the free-time constraints, the flight times between waypoints are converted into time-invariant parameters and updated at the trajectory instants corresponding to the waypoints. The effectiveness of the proposed C-PDDP algorithm under waypoint constraints and free-time constraints is verified through numerical simulations of the UAV multi-point reconnaissance problem with five different waypoints. After comparing the proposed algorithm with fixed-time constrained DDP (C-DDP), it is found that C-PDDP can optimize the flight time of the trajectory with three segments to 7.35 s, 9.50 s and 6.71 s, respectively. In addition, the maximum error of the optimized trajectory waypoints of the C-PDDP algorithm is 1.06 m, which is much smaller than that (7 m) of the C-DDP algorithm used for comparison. A total of 500 Monte Carlo tests were simulated to demonstrate how the proposed algorithm remains robust to random initial guesses.

Keywords:

trajectory optimization; differential dynamic programming; waypoints

1. Introduction

Unmanned aerial vehicles (UAVs) have become the preferred platform for tasks such as reconnaissance, transportation and delivery due to their low cost, flexibility and rapid deployment characteristics. With a variety of integrated sensors, good concealment and long loiter capabilities, UAVs can be used for close-range low-altitude reconnaissance of important targets or hidden targets and greatly improve the reconnaissance efficiency and success rate [1,2,3,4,5]. However, UAVs have a short endurance due to the limitation of their energy resources, and hence, it is necessary to plan a reasonable flight trajectory to reduce the flight time and energy consumption of UAVs in missions. In addition, UAVs are required to pass through multiple necessary mission waypoints in reconnaissance, delivery and other tasks. Planning a reasonable flight trajectory to quickly pass through these points is of great economic benefit to UAV missions and can improve the performance of mission execution. Therefore, planning the flight trajectory of UAVs to reach the necessary intermediate waypoints while optimizing the flight time between waypoints is the foundation for improving the effectiveness of UAVs for reconnaissance, courier, and other missions [6,7,8].

Trajectory optimization is a key technology for increasing the autonomy of UAVs [9,10,11]. It aims to find the control commands that optimize a performance index, such as flight endurance or energy utilization, while enabling the UAV to satisfy nonlinear constraints such as control limits and no-fly zones. Real-world trajectory optimization problems are inherently nonlinear, and in most cases an analytical solution cannot be derived [12]. In the last few years, a wide range of numerical algorithms, including pseudo-spectral methods [13,14,15], sequential quadratic programming [16,17,18], sequential convex programming [19,20,21], and differential dynamic programming (DDP) [22,23,24], have been shown to be capable of addressing nonlinear trajectory optimization problems. Chai et al. [9] provide a well-structured review of several numerical optimization techniques, highlighting their benefits and mathematical nature.

In recent years, DDP has attracted a lot of attention because of its rapid convergence properties [22,24,25,26,27,28,29]. DDP builds on classical dynamic programming (DP): it employs a second-order approximation of the value function to break the original problem down into a number of lower-dimensional subproblems, thereby overcoming the main drawback of DP, known as the curse of dimensionality [30,31]. The DDP algorithm approximates the optimal cost function around the current nominal trajectory with a second-order Taylor series and recursively updates the optimal increment of the control sequence using the first-order optimality condition until it converges to a locally optimal solution [32]. Real-time applications are possible with DDP and its variants since they can guarantee a theoretical second-order convergence rate [33] and exhibit linear complexity in the length of the prediction horizon.

However, the major problems with the original DDP are its inability to handle control, state, and other constraints besides the dynamics constraints, and its need for a preset flight time. These two aspects limit the application of DDP in tasks such as UAV reconnaissance and courier delivery. First, the UAV must pass through multiple intermediate waypoints in scenarios such as reconnaissance and courier delivery; thus, the trajectory is required to satisfy the intermediate point state constraints. Second, presetting the flight time between multiple waypoints relies on heuristic methods, and manually determining the optimal terminal time is difficult. Therefore, addressing these two drawbacks of the original DDP technique is essential.

One possible solution for the problem of the flight trajectory passing through necessary intermediate waypoints is the multiple shooting method. Giftthaler et al. [34] generalized the DDP algorithm to the multiple shooting framework by introducing intermediate shooting states as initial conditions for each segment and imposing continuity constraints with the previous segment’s trajectory; the resulting method achieves faster convergence, better local shrinkage, and a shorter runtime. To increase the computational speed of the algorithm and decrease its sensitivity to the initial guess, Etienne et al. [35] embedded the multiple shooting method into a DDP method for the design of spacecraft trajectories; the algorithm derived the update formula of the multiple shooting DDP method for multiple trajectory segments and used the augmented Lagrangian method to deal with the continuity constraints between trajectory segments and the waypoint constraints. Li et al. [36] improved the convergence performance of the multiple shooting DDP algorithm by introducing a penalty method and an adaptive evaluation function to control the algorithm step size. Expanding on previous work, Mastalli et al. [37] proposed a feasibility-driven DDP based on Goldstein’s condition and a line search procedure for expected cost reductions. Although the multiple shooting method is able to improve the performance of the algorithm, this comes at the cost of potential discontinuities between multiple trajectory segments. In addition to the multiple shooting idea, Foehn et al. [38] established the concept of waypoint proximity, which utilizes a quadratic penalty function to constrain the trajectory to be close to the waypoint and solves the problem using a solver. The idea of waypoint proximity avoids possible discontinuities between multiple trajectory segments and ensures the dynamical feasibility of the whole trajectory.

In addition, a number of researchers have dealt with the free time using other numerical optimization algorithms. Maity et al. [39] incorporated changes in the final time into the variation in state variables, developed a composite cost function for the Model Predictive Static Programming (MPSP) method that included both the control and terminal time increments, and used the first-order optimality condition to obtain an analytical expression for the final time update and the terminal time increment. Hong et al. [16] solved the free-time, obstacle-avoidance-constrained guidance problem by using sequential quadratic programming based on this concept. In our previous work [40], we revisited the original DDP by dynamically optimizing the final time in conjunction with the control input. Based on our work, Sun et al. [41] proposed a free terminal time stochastic DDP method that considers random state perturbations. The work published in Science Robotics by Foehn et al. [38] developed time-optimal quadrotor flight trajectory planning with multiple waypoints, in which the proposed algorithm optimizes both the time allocation of the trajectory and the trajectory itself, but the work was still performed on a PC rather than on an airborne computer. Oshin et al. [42] proposed a parameterized differential dynamic programming (PDDP) method for simultaneous optimization of parameters and control inputs, which converges quickly and stably to the minimum cost and is supported by a rigorous mathematical analysis. By setting the flight time as an unknown parameter, PDDP can solve the free terminal time problem. However, this algorithm does not consider the path state constraints, and it cannot deal with time optimization of multi-segment trajectories. Martinez et al. [43] developed a multiple shooting solver that handles unknown times and parameter structures via a parameterized Riccati recursion for complex motion estimation problems such as those arising with humanoid robots.

Inspired by the observations mentioned above, this article develops a novel algorithm, constrained parameterized differential dynamic programming (C-PDDP), which builds on parameterized DDP. The proposed algorithm utilizes the penalty function method to deal with the nonlinear constraints of the intermediate waypoints, and a detailed mathematical derivation of the modified state–action value function is provided. Based on the concept of parameterization, the flight times between waypoints are transformed into unknown parameters to be solved, and the optimality condition of the value function at the time instants corresponding to the waypoints is adopted to obtain the optimal time deviation and dynamically optimize the flight time. By introducing the penalty function method and the idea of time parameterization, the proposed C-PDDP algorithm overcomes the inability of the original PDDP to deal with intermediate waypoints and flexible times for multi-segment trajectories. Therefore, the proposed C-PDDP algorithm has strong potential for UAV reconnaissance, courier delivery, and other missions.

In this paper, numerical simulations are conducted in a UAV reconnaissance mission with multiple waypoints, and the results of the simulations demonstrate that this proposed algorithm can optimize the trajectory of the UAV to reach the necessary reconnaissance mission points as well as the optimal time for each trajectory segment. Compared with the fixed-time algorithm, the proposed algorithm has a smaller margin of error and a smoother trajectory, which reduces the flight time and the cost of the trajectory.

The remainder of this paper is organized as follows. Section 2 describes the typical form of the trajectory optimization problem and discusses the fundamental PDDP algorithm. In Section 3, the proposed C-PDDP algorithm is derived in detail, and it is then numerically evaluated in Section 4. Finally, Section 5 concludes this paper. In this paper, 0_{n} refers to the n-dimensional zero vector; vectors are denoted by a and matrices are denoted by A. I_{n} is the n-th order identity matrix. The operators min (a, b) and max (a, b) return the minimum and maximum of a and b, respectively.

2. Preliminaries and Problem Formulation

Firstly, we model the spatial–temporal trajectory optimization problem with waypoint constraints in this section, and then introduce the PDDP algorithm for solving unknown parameters.

2.1. Modeling of the Spatial–Temporal Trajectory Optimization Problem with Multiple Waypoints

In a spatial–temporal trajectory optimization problem with multiple waypoint constraints, the goal of trajectory optimization is to find the optimal trajectory that passes through each waypoint while optimizing the flight time of each trajectory segment. Given W mission waypoints

p_{w_{j}}

, where

j \in [1, \dots, W]

, the objective of the multiple-waypoint-constrained spatial–temporal trajectory optimization problem can be expressed as

\min_{u, t} Φ (x (t_{f}), t_{f}) + \sum_{j = 1}^{W} [\int_{t_{w_{j - 1}}}^{t_{w_{j}}} L_{j} (x (t), u (t)) d t]

(1)

where

t_{w_{j}}

denotes the terminal flight time of the flight trajectory segment

X_{w_{j - 1} w_{j}}

that starts at the waypoint

w_{j - 1}

. In discrete time, the objective function (1) can be equivalently expressed as

min_{u_{k}, t} ϕ (x_{N}, t_{N}) + \sum_{j = 1}^{W} [\sum_{k = t_{w_{j - 1}}}^{t_{w_{j}}} l_{j} (x_{k}, u_{k})]

(2)

where

l_{j} (x_{k}, u_{k}) = L_{j} (x (t), u (t)) δ t

,

δ t = t_{k} - t_{k - 1}

. It can be seen from the equation that since the trajectory needs to pass through the waypoints, the original trajectory is also partitioned into multiple segments based on the number of waypoints. If the initial point of the trajectory is also considered as a waypoint, the trajectory will be divided into W trajectory segments in the case of W waypoints.

For multiple-waypoint constraints, in order to generate a trajectory that passes through the waypoint

p_{w_{j}}

, it is necessary to define the cost of the distance between a specific point

x_{k}

on the trajectory and the waypoint, and we consider the quadratic distance cost to be

D_{w_{j}} = {∥ p_{k} - p_{w_{j}} ∥}_{2}^{2}

(3)

In order to require the trajectory to pass through the necessary waypoints, the above distance cost can be expressed as a state constraint on the trajectory point

x_{k}

, written as

{∥ x_{k} - p_{w_{j}} ∥}_{2}^{2} = 0

(4)

This constraint is a special case of the general state constraint

g (x_{k}, u_{k}) \leq 0

, which constrains the point

x_{k}

on the trajectory at time

t_{k}

to pass through the j-th waypoint at

p_{w_{j}}

.

In addition to constraining the trajectory to pass through the waypoints, it is also necessary that the flight time of each trajectory segment can be dynamically optimized; i.e., the flight time within each trajectory segment is free and the optimal flight time is obtained through optimization. The free flight time within each trajectory segment can be expressed as

t_{w_{j} w_{j + 1}} \in T

(5)

Based on the above description of multiple waypoint constraints and the free flight time of the trajectory segment and considering the control limits of the UAV, the spatial–temporal trajectory optimization problem with multiple waypoint constraints for a UAV can be formulated as a constrained trajectory optimization problem 1, denoted as CTO₁, and has the following form

\begin{matrix} u_{k}^{*} = \arg\min_{u_{k}, t} & ϕ (x_{N}, t_{N}) + \sum_{j = 1}^{W} [\sum_{k = t_{w_{j - 1}}}^{t_{w_{j}}} l_{j} (x_{k}, u_{k})] \\ s.t. & x_{k + 1} = f (x_{k}, u_{k}) \\ & {∥ x_{k} - p_{w_{j}} ∥}_{2}^{2} = 0 \\ & u_{min} \leq u_{k} \leq u_{max} \\ & t_{w_{j} w_{j + 1}} \in T \end{matrix}

(6)

which is assumed to have a solution.
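As a rough illustration of how the discrete objective (2) and the waypoint constraint (4) could be evaluated on a sampled trajectory, the following Python sketch sums the discretized running cost segment by segment and reports the squared waypoint distance at each waypoint index. The function names, the `waypoint_idx` index list, and the per-segment step sizes `dts` are illustrative assumptions and do not appear in the paper. Each segment is summed over a half-open index range so that no step is counted twice, which is also an assumption.

```python
import numpy as np

def segment_costs(L, xs, us, waypoint_idx, dts):
    """Discretized running cost of each segment, using l = L * dt.

    waypoint_idx[j] is the (assumed) grid index at which waypoint j is reached,
    with waypoint_idx[0] = 0 the initial point; dts[j-1] is the step of segment j.
    """
    costs = []
    for j in range(1, len(waypoint_idx)):
        k0, k1 = waypoint_idx[j - 1], waypoint_idx[j]
        # Half-open range [k0, k1): no step is shared between two segments.
        costs.append(sum(L(xs[k], us[k]) * dts[j - 1] for k in range(k0, k1)))
    return costs

def waypoint_residuals(xs, waypoint_idx, waypoints):
    """Squared distance between the trajectory and each waypoint at its index."""
    return [float(np.sum((xs[k] - np.asarray(p)) ** 2))
            for k, p in zip(waypoint_idx[1:], waypoints)]
```

A trajectory satisfies the waypoint constraints exactly only if every residual is zero; in practice, a small tolerance would be used.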

2.2. Parameterized Differential Dynamic Programming (PDDP)

The PDDP algorithm is a variant of the DDP algorithm that optimizes both control sequences and unknown parameters [42]. By abstracting the terminal time as an unknown parameter, PDDP is able to address the spatial–temporal trajectory optimization issue without presetting a terminal time of the UAV. In this subsection, the main ideas of PDDP are given. PDDP is designed to solve the following optimization problem:

\begin{matrix} \min_{U; θ} J (U; θ) & = \sum_{k = 1}^{N - 1} l (x_{k}, u_{k}; θ) + ϕ (x_{N}; θ) \\ s . t . x_{k + 1} & = f (x_{k}, u_{k}; θ) \end{matrix}

(7)

where a semicolon separates the parameter

θ

from the state

x

and the control

u

, indicating that

θ

is a time-invariant parameter, independent of the time instant k.

The basic idea of the original PDDP is to use the Bellman optimality principle to transform the trajectory optimization problem containing unknown parameters into a backward sequential optimization problem at each time instant and a parameter optimization problem carried out once per iteration. The parameterized optimal cost-to-go, also called the parameterized value function, can be expressed as

V (x_{k}; θ) = \min_{u_{k}} [\underbrace{l (x_{k}, u_{k}; θ) + V (x_{k + 1}; θ)}_{Q (x_{k}, u_{k}; θ)}]

(8)

where

Q (x_{k}, u_{k}; θ)

denotes the action value function.

In contrast to standard DDP, which only considers the quadratic approximation of the state

x_{k}

and the control

u_{k}

along the nominal trajectory

(x_{k}, u_{k})

at time

t_{k}

[40], PDDP also needs to consider the quadratic approximation of the value function with respect to the parameter

θ

, which yields

\begin{matrix} V (x_{k} + δ x_{k}; θ + δ θ) \approx & V (x_{k}; θ) + V_{x, k}^{T} δ x_{k} + V_{θ, k}^{T} δ θ \\ & + \frac{1}{2} {[\begin{matrix} δ x_{k} \\ δ θ \end{matrix}]}^{T} [\begin{matrix} V_{x x, k} & V_{x θ, k} \\ V_{θ x, k} & V_{θ θ, k} \end{matrix}] [\begin{matrix} δ x_{k} \\ δ θ \end{matrix}] \end{matrix}

(9)

where the deviations of the state, control and parameters at

t_{k}

from the nominal trajectory are denoted by

δ x_{k}

,

δ u_{k}

and

δ θ

.

V_{x, k}

and

V_{θ, k}

are the gradient vectors, and

V_{x x, k}

,

V_{x θ, k}

,

V_{θ x, k}

and

V_{θ θ, k}

are the second-order derivatives of

V (x_{k}; θ)

at time instant k.

Similarly, the quadratic approximation of

Q (x_{k}, u_{k}; θ)

, defined in Equation (8), is given by

\begin{matrix} Q (x_{k} + δ x_{k}, u_{k} + δ u_{k}; θ + δ θ) \approx Q (x_{k}, u_{k}; θ) + Q_{x, k}^{T} δ x_{k} + Q_{u, k}^{T} δ u_{k} + Q_{θ, k}^{T} δ θ \\ + \frac{1}{2} {[\begin{matrix} δ x_{k} \\ δ u_{k} \\ δ θ \end{matrix}]}^{T} [\begin{matrix} Q_{x x, k} & Q_{x u, k} & Q_{x θ, k} \\ Q_{u x, k} & Q_{u u, k} & Q_{u θ, k} \\ Q_{θ x, k} & Q_{θ u, k} & Q_{θ θ, k} \end{matrix}] [\begin{matrix} δ x_{k} \\ δ u_{k} \\ δ θ \end{matrix}] \end{matrix}

(10)

Combining the terms of the same order of

δ x

,

δ u

, and

δ θ

yields the derivatives of

Q (x_{k}, u_{k}; θ)

as

\begin{matrix} Q_{x, k} = l_{x, k} + f_{x, k}^{T} V_{x, k + 1} \\ Q_{u, k} = l_{u, k} + f_{u, k}^{T} V_{x, k + 1} \\ Q_{θ, k} = l_{θ, k} + V_{θ, k + 1} + f_{θ, k}^{T} V_{x, k + 1} \\ Q_{x x, k} = l_{x x, k} + f_{x, k}^{T} V_{x x, k + 1} f_{x, k} \\ Q_{u u, k} = l_{u u, k} + f_{u, k}^{T} V_{x x, k + 1} f_{u, k} \\ Q_{θ θ, k} = l_{θ θ, k} + V_{θ θ, k + 1} + f_{θ, k}^{T} V_{x x, k + 1} f_{θ, k} + f_{θ, k}^{T} V_{x θ, k + 1} + V_{θ x, k + 1} f_{θ, k} \\ Q_{x u, k} = l_{x u, k} + f_{x, k}^{T} V_{x x, k + 1} f_{u, k} = Q_{u x, k}^{T} \\ Q_{x θ, k} = l_{x θ, k} + f_{x, k}^{T} V_{x x, k + 1} f_{θ, k} + f_{x, k}^{T} V_{x θ, k + 1} = Q_{θ x, k}^{T} \\ Q_{u θ, k} = l_{u θ, k} + f_{u, k}^{T} V_{x x, k + 1} f_{θ, k} + f_{u, k}^{T} V_{x θ, k + 1} = Q_{θ u, k}^{T} \end{matrix}

(11)
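The derivatives of the action value function can be assembled mechanically from the derivatives of the running cost, the dynamics, and the next-step value function. The sketch below is one possible NumPy arrangement, not the authors' implementation; the dictionary layout and key names (`"x"`, `"u"`, `"th"`, `"xth"`, and so on) are assumptions made for illustration. Because the value function depends only on the state and the parameter, the cross term of Q_{uθ} is written with V_{xθ}, and second-order derivatives of the dynamics are neglected, as in the expressions above.

```python
import numpy as np

def q_derivatives(l, f, V):
    """Gradient and Hessian blocks of Q(x, u; theta) at one time step.

    l: derivatives of the running cost (keys x, u, th, xx, uu, thth, xu, xth, uth)
    f: Jacobians of the dynamics (keys x, u, th)
    V: derivatives of the value function at step k+1 (keys x, th, xx, xth, thth)
    """
    fx, fu, ft = f["x"], f["u"], f["th"]
    Vx, Vt = V["x"], V["th"]
    Vxx, Vxt, Vtt = V["xx"], V["xth"], V["thth"]
    Q = {}
    Q["x"] = l["x"] + fx.T @ Vx
    Q["u"] = l["u"] + fu.T @ Vx
    Q["th"] = l["th"] + Vt + ft.T @ Vx
    Q["xx"] = l["xx"] + fx.T @ Vxx @ fx
    Q["uu"] = l["uu"] + fu.T @ Vxx @ fu
    # V_{theta x} is the transpose of V_{x theta}.
    Q["thth"] = l["thth"] + Vtt + ft.T @ Vxx @ ft + ft.T @ Vxt + Vxt.T @ ft
    Q["xu"] = l["xu"] + fx.T @ Vxx @ fu
    Q["xth"] = l["xth"] + fx.T @ Vxx @ ft + fx.T @ Vxt
    Q["uth"] = l["uth"] + fu.T @ Vxx @ ft + fu.T @ Vxt
    return Q
```

For linear dynamics and quadratic costs these expressions are exact, which makes a finite-difference comparison a convenient sanity check.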

With regard to the control deviation

δ u_{k}

, the second-order expansion of

Q (x_{k}, u_{k}; θ)

is a quadratic function, and then one can utilize the optimality condition to find

δ u_{k}^{*}

, i.e.,

\begin{matrix} δ u_{k}^{*} = k_{k} + K_{k} δ x_{k} + M_{k} δ θ_{k} \end{matrix}

(12)

where

\begin{matrix} \begin{matrix} k_{k} & = - Q_{u u, k}^{- 1} Q_{u, k} \\ K_{k} & = - Q_{u u, k}^{- 1} Q_{u x, k} \\ M_{k} & = - Q_{u u, k}^{- 1} Q_{u θ, k} \end{matrix} \end{matrix}

(13)

and the feedforward term

k_{k}

and the feedback term

K_{k}

are the same as in the original DDP method. However, an additional feedback term

M_{k}

is added to optimize

θ

in PDDP.
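A minimal NumPy sketch of the gain computation follows, under the assumption that Q_{uu} is invertible (in practice it would usually be regularized to be positive definite, which the text above does not discuss). The function names are hypothetical. Linear solves are used instead of forming the explicit inverse, which is a common numerical choice rather than something prescribed by the paper.

```python
import numpy as np

def control_gains(Q_u, Q_uu, Q_ux, Q_uth):
    """Feedforward term k and feedback terms K (state) and M (parameter)."""
    k = -np.linalg.solve(Q_uu, Q_u)
    K = -np.linalg.solve(Q_uu, Q_ux)
    M = -np.linalg.solve(Q_uu, Q_uth)
    return k, K, M

def control_correction(k, K, M, dx, dth):
    """Optimal control deviation for given state and parameter deviations."""
    return k + K @ dx + M @ dth
```

By construction, the returned correction makes the gradient of the quadratic model with respect to the control deviation vanish, i.e., Q_u + Q_uu δu + Q_ux δx + Q_uθ δθ = 0.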

Since the parameter

θ

is time-independent, i.e., it does not vary with time, it only needs to be updated at the time instant

k = 1

. In other words, the expression for the optimal parameter increment

δ θ^{*}

can be obtained under the deviation condition of

δ x_{1} = 0

in Equation (9) as follows

\begin{matrix} δ θ^{*} = - V_{θ θ, 1}^{- 1} V_{θ, 1} \end{matrix}

(14)
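For completeness, the intermediate step behind Equation (14) can be written out as follows. This is a short worked derivation under the additional assumption, not stated explicitly above, that V_{θθ,1} is positive definite so that the stationary point is a minimum.

```latex
% Setting \delta x_1 = 0 in the expansion (9) leaves a quadratic in \delta\theta:
V(x_1; \theta + \delta\theta) \approx V(x_1; \theta)
  + V_{\theta,1}^{T}\,\delta\theta
  + \tfrac{1}{2}\,\delta\theta^{T} V_{\theta\theta,1}\,\delta\theta .
% First-order optimality with respect to \delta\theta:
\frac{\partial}{\partial\,\delta\theta}:\quad
  V_{\theta,1} + V_{\theta\theta,1}\,\delta\theta^{*} = 0
\;\;\Longrightarrow\;\;
  \delta\theta^{*} = -\,V_{\theta\theta,1}^{-1} V_{\theta,1}.
```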

To ensure the convergence of the PDDP algorithm, the feedforward terms

k_{k}

of

δ u_{k}^{*}

and

- V_{θ θ, 1}^{- 1} V_{θ, 1}

of

δ θ^{*}

need to be scaled by the damping coefficient

α_{l}

, where

α_{l}

can be determined by a line search [42], and

\begin{matrix} δ u_{k}^{*} & = α_{l} k_{k} + K_{k} δ x_{k} + M_{k} δ θ_{k} \end{matrix}

(15)

\begin{matrix} δ θ^{*} & = - α_{l} V_{θ θ, 1}^{- 1} V_{θ, 1} \end{matrix}

(16)
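The text does not spell out how the damping coefficient is searched for, so the following is only a generic backtracking sketch and should not be read as the procedure of [42]. It assumes a hypothetical callable `rollout_cost(alpha)` that runs the forward process with the feedforward terms scaled by `alpha` and returns the resulting cost, and it accepts the first candidate that lowers the cost of the nominal trajectory. The halving schedule is likewise an assumption.

```python
def backtracking_alpha(rollout_cost, J_nominal,
                       alphas=(1.0, 0.5, 0.25, 0.125, 0.0625)):
    """Return the first damping coefficient that reduces the cost, else None.

    rollout_cost: hypothetical callable mapping alpha to the cost of the
    trajectory obtained with damped updates.
    """
    for alpha in alphas:
        if rollout_cost(alpha) < J_nominal:
            return alpha
    return None  # no candidate improved the cost; caller may regularize or stop
```

Stricter acceptance tests, for example comparing the actual reduction with the reduction predicted by the quadratic model, are common in DDP implementations and could replace the simple decrease test used here.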

Substituting the optimal control variables (12) into the Taylor expansion of

Q (x_{k}, u_{k}; θ)

and grouping the terms of

δ x_{k}

and

δ θ

with the same order, we can get

\begin{matrix} V (x_{k}; θ) & = Q (x_{k}, u_{k}; θ) + (\frac{1}{2} α_{l}^{2} - α_{l}) Q_{u, k}^{T} Q_{u u, k}^{- 1} Q_{u, k} \\ V_{x, k} & = Q_{x, k} - Q_{x u, k} Q_{u u, k}^{- 1} Q_{u, k} \\ V_{θ, k} & = Q_{θ, k} - Q_{θ u, k} Q_{u u, k}^{- 1} Q_{u, k} \\ V_{x x, k} & = Q_{x x, k} - Q_{x u, k} Q_{u u, k}^{- 1} Q_{u x, k} \\ V_{x θ, k} & = Q_{x θ, k} - Q_{x u, k} Q_{u u, k}^{- 1} Q_{u θ, k} = V_{θ x, k}^{T} \\ V_{θ θ, k} & = Q_{θ θ, k} - Q_{θ u, k} Q_{u u, k}^{- 1} Q_{u θ, k} \end{matrix}

(17)
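The derivative updates of the value function amount to eliminating the control deviation from the quadratic model of Q, i.e., taking a Schur complement with respect to Q_{uu}. The sketch below shows this step in NumPy for the gradient and Hessian blocks only; the dictionary keys are illustrative assumptions, and Q_{uu} is assumed to be invertible.

```python
import numpy as np

def value_backup(Q):
    """Gradient and Hessian blocks of V from those of Q (control eliminated).

    Q: dict with keys x, u, th, xx, uu, thth, xu, xth, uth
    (Q["xu"] has shape (n, m); Q["uth"] has shape (m, p)).
    """
    Qux, Quth = Q["xu"].T, Q["uth"]
    Quu_inv_Qu = np.linalg.solve(Q["uu"], Q["u"])
    Quu_inv_Qux = np.linalg.solve(Q["uu"], Qux)
    Quu_inv_Quth = np.linalg.solve(Q["uu"], Quth)
    return {
        "x": Q["x"] - Q["xu"] @ Quu_inv_Qu,
        "th": Q["th"] - Quth.T @ Quu_inv_Qu,
        "xx": Q["xx"] - Q["xu"] @ Quu_inv_Qux,
        "xth": Q["xth"] - Q["xu"] @ Quu_inv_Quth,
        "thth": Q["thth"] - Quth.T @ Quu_inv_Quth,
    }
```

Running this function backward in time, with the terminal derivatives of ϕ as the starting point, gives the backward process described in the text.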

Before running the PDDP algorithm, in order to generate the nominal trajectory, an initial guess of the control inputs should be given. Then, the terminal value function is initialized using the terminal objective function

V (x_{N}; θ) = ϕ (x_{N}; θ)

,

V_{x, N} = ϕ_{x, N}

,

V_{x x, N} = ϕ_{x x, N}

,

V_{θ, N} = ϕ_{θ, N}

,

V_{θ θ, N} = ϕ_{θ θ, N}

,

V_{x θ, N} = ϕ_{x θ, N} = V_{θ x, N}^{T}

. Then, the optimal control correction

δ u_{k}

is computed in reverse time using Equation (12), while the optimal parameter increment

δ θ

is computed at time

k = 1

using Equation (14). Briefly, the one-step control update and one-iteration parameter update used to generate a new trajectory are, respectively,

\begin{matrix} u_{k} & = u_{k} + δ u_{k}^{*} = u_{k} + α_{l} k_{k} + K_{k} δ x_{k} + M_{k} δ θ_{k} \\ θ & = θ + δ θ^{*} = θ - α_{l} V_{θ θ, 1}^{- 1} V_{θ, 1} \end{matrix}

(18)

Once the backward process has computed the optimal control update and the optimal parameter update, the forward process is triggered to generate a new nominal trajectory. The PDDP method iteratively executes the backward and forward processes until the change in the cost function between two consecutive iterations falls below a preset threshold

|J^{(n)} - J^{(n - 1)}| < ϵ

(19)

in which the constant

ϵ > 0

has a relatively small value and

J^{(n)}

denotes the value of the objective function at the n-th iteration. Once the termination condition is fulfilled, the iterative process of the algorithm is terminated and the optimal state and control sequences are returned.
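The outer iteration and the stopping rule (19) can be sketched in a few lines. Here `sweep` is a hypothetical callable that performs one backward process followed by one forward process and returns the new cost; the iteration cap `max_iter` is a safeguard added for this sketch and is not part of the description above.

```python
def run_until_converged(sweep, J_init, eps=1e-6, max_iter=200):
    """Repeat backward/forward sweeps until |J(n) - J(n-1)| < eps.

    Returns (final cost, number of sweeps, converged flag).
    """
    J_prev = J_init
    for n in range(1, max_iter + 1):
        J = sweep()  # one backward pass plus one forward pass (hypothetical)
        if abs(J - J_prev) < eps:
            return J, n, True
        J_prev = J
    return J_prev, max_iter, False
```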

3. Constrained Parameterized Differential Dynamic Programming for Spatial–Temporal Optimization

The purpose of this section is to determine the trajectory that passes through multiple waypoints and achieves the optimal flight time between waypoints. For easy understanding, this section defines the flight time in each trajectory segment between waypoints as

θ_{j} = t_{w_{j}} - t_{w_{j - 1}}

, which is used to denote the unknown parameter in the PDDP algorithm. In this section, the C-PDDP algorithm that considers the spatial–temporal trajectory optimization problem with waypoint constraints in the free-time framework is derived in detail. We will first use the penalty function method based on the original PDDP algorithm to deal with the nonlinear waypoint path constraints, and then introduce how to optimize the parameterized time dynamically in an iterative approach to obtain the optimal flight time for each trajectory segment.

To solve problem CTO₁ (6) with the PDDP algorithm, problem (6) is rewritten in a parameterized form as constrained trajectory optimization problem 2, denoted as CTO₂, which has the following form

\begin{matrix} (u_{k}^{*}, θ_{j}^{*}) = \arg\min_{U; Θ} & ϕ (x_{N}; t_{N}) + \sum_{j = 1}^{W} [\sum_{k = t_{w_{j - 1}}}^{t_{w_{j}}} l_{j} (x_{k}, u_{k}; θ_{j})] \\ s.t. & x_{k + 1} = f (x_{k}, u_{k}; θ_{j}) \\ & {∥ x_{k} - p_{w_{j}} ∥}_{2}^{2} = 0 \\ & u_{min} \leq u_{k} \leq u_{max} \\ & θ_{j} \in T \end{matrix}

(20)

where

U = {[u_{1}, \dots, u_{i}, \dots, u_{N}]}^{T}

,

Θ = {[θ_{1}, \dots, θ_{j}, \dots, θ_{W}]}^{T}

.

The multiple waypoint constraint

{∥ x_{k} - p_{w_{j}} ∥}_{2}^{2} = 0

in problem (20) only requires this path constraint to be considered at the corresponding time instant

t_{w_{j}}

of the waypoint. This is an equality constraint that forces the trajectory to pass through the waypoint

p_{w_{j}}

with zero error tolerance, which is usually impractical in numerical optimization algorithms. Therefore, in this section, the penalty function method is used to impose path constraints at each waypoint, and the waypoint constraints are incorporated into the cost function so that the trajectory is able to pass through the waypoints. The PDDP algorithm with path constraints is derived according to Bellman’s principle of optimality, which gives the updated expression of

V (x_{k}; θ)

and

Q (x_{k}, u_{k}; θ)

, thus constraining the trajectory to pass through the waypoints. On this basis, to address the non-optimality of a preset flight time, the update formula for the unknown parameters in PDDP is applied at the time instants corresponding to the waypoints, using the value function and its derivatives. This yields the optimal deviation of the flight time for each trajectory segment, and the optimal flight time of each segment is then obtained iteratively.

The idea of solving this problem can be illustrated by Figure 1.

3.1. Constrained Parameterized Differential Dynamic Programming

In this subsection, the PDDP is extended to nonlinear path constraints

{(x_{k} - p_{w_{j}})}^{T} (x_{k} - p_{w_{j}}) = 0

by the penalty function method [44,45,46], after which control clamping

u_{min} \leq u_{k} \leq u_{max}

is utilized to deal with the control constraints. The flight time of each trajectory segment is fixed in this subsection.

By adding the quadratic penalty function of the equality constraints

{(x_{k} - p_{w_{j}})}^{T} (x_{k} - p_{w_{j}}) = 0

at each waypoint to the original cost function (7), the augmented parameterized cost function can be obtained as

J = \sum_{k = 1}^{N - 1} l (x_{k}, u_{k}; θ) + ϕ (x_{N}; θ) + \sum_{j = 0}^{W} \frac{1}{2} {(x_{w_{j}} - p_{w_{j}})}^{T} μ_{k}^{j} (x_{w_{j}} - p_{w_{j}})

(21)

where

μ_{k}^{j}

, which satisfies the condition

μ_{k}^{j} > 0

, denotes the penalty coefficient for the j-th waypoint constraint.

By generalizing the cost function (21), the value function is modified as

\begin{matrix} \hat{V} (x_{k}; θ) = & ϕ (x_{N}; θ) + \sum_{i = k}^{N - 1} l (x_{i}, u_{i}^{*}; θ) \\ & + \sum_{j = 0}^{W} \frac{1}{2} {(x_{w_{j}} - p_{w_{j}})}^{T} μ_{k}^{j} (x_{w_{j}} - p_{w_{j}}) \end{matrix}

(22)

According to Bellman’s optimality principle, the action value function at the corresponding time instant

t_{w_{j}}

for each waypoint is modified to be

\begin{matrix} \hat{Q} (x_{w_{j}}, u_{w_{j}}; θ_{j}) = & \hat{V} (x_{w_{j} + 1}; θ_{j}) + l (x_{w_{j}}, u_{w_{j}}; θ_{j}) \\ & + \frac{1}{2} {(x_{w_{j}} - p_{w_{j}})}^{T} μ_{w_{j}}^{j} (x_{w_{j}} - p_{w_{j}}) \end{matrix}

(23)

Similar to original PDDP, we consider small perturbations around the nominal trajectory and then we approximate the value function and action value function with a second-order Taylor series. The second-order Taylor expansion of the value function and action value function at the corresponding time instant

t_{w_{j}}

at the waypoint is given by

\begin{matrix} \hat{V} (x_{w_{j}} + δ x_{w_{j}}; θ_{j} + δ θ_{j}) \approx & \hat{V} (x_{w_{j}}; θ_{j}) + [\begin{matrix} {\hat{V}}_{x, w_{j}}^{T} & {\hat{V}}_{θ, w_{j}}^{T} \end{matrix}] [\begin{matrix} δ x_{w_{j}} \\ δ θ_{j} \end{matrix}] \\ & + \frac{1}{2} {[\begin{matrix} δ x_{w_{j}} \\ δ θ_{j} \end{matrix}]}^{T} [\begin{matrix} {\hat{V}}_{x x, w_{j}} & {\hat{V}}_{x θ, w_{j}} \\ {\hat{V}}_{θ x, w_{j}} & {\hat{V}}_{θ θ, w_{j}} \end{matrix}] [\begin{matrix} δ x_{w_{j}} \\ δ θ_{j} \end{matrix}] \end{matrix}

(24)

\begin{matrix} \hat{Q} (x_{w_{j}} + δ x_{w_{j}}, u_{w_{j}} + δ u_{w_{j}}; θ_{j} + δ θ_{j}) \approx & \hat{Q} (x_{w_{j}}, u_{w_{j}}; θ_{j}) + [\begin{matrix} {\hat{Q}}_{x, w_{j}}^{T} & {\hat{Q}}_{u, w_{j}}^{T} & {\hat{Q}}_{θ, w_{j}}^{T} \end{matrix}] [\begin{matrix} δ x_{w_{j}} \\ δ u_{w_{j}} \\ δ θ_{j} \end{matrix}] \\ & + \frac{1}{2} {[\begin{matrix} δ x_{w_{j}} \\ δ u_{w_{j}} \\ δ θ_{j} \end{matrix}]}^{T} [\begin{matrix} {\hat{Q}}_{x x, w_{j}} & {\hat{Q}}_{x u, w_{j}} & {\hat{Q}}_{x θ, w_{j}} \\ {\hat{Q}}_{u x, w_{j}} & {\hat{Q}}_{u u, w_{j}} & {\hat{Q}}_{u θ, w_{j}} \\ {\hat{Q}}_{θ x, w_{j}} & {\hat{Q}}_{θ u, w_{j}} & {\hat{Q}}_{θ θ, w_{j}} \end{matrix}] [\begin{matrix} δ x_{w_{j}} \\ δ u_{w_{j}} \\ δ θ_{j} \end{matrix}] \end{matrix}

(25)

Then, we can calculate derivatives of the modified action value function, i.e., the gradient and Hessian matrix as

\begin{matrix} {\hat{Q}}_{x, w_{j}} & = l_{x, w_{j}} + μ_{w_{j}}^{j} (x_{w_{j}} - p_{w_{j}}) + f_{x, w_{j}}^{T} V_{x, w_{j} + 1} \\ {\hat{Q}}_{u, w_{j}} & = l_{u, w_{j}} + f_{u, w_{j}}^{T} V_{x, w_{j} + 1} \\ {\hat{Q}}_{θ, w_{j}} & = l_{θ, w_{j}} + V_{θ, w_{j} + 1} + f_{θ, w_{j}}^{T} V_{x, w_{j} + 1} \\ {\hat{Q}}_{x x, w_{j}} & = l_{x x, w_{j}} + μ_{w_{j}}^{j} I_{n} + f_{x, w_{j}}^{T} V_{x x, w_{j} + 1} f_{x, w_{j}} \\ {\hat{Q}}_{u u, w_{j}} & = l_{u u, w_{j}} + f_{u, w_{j}}^{T} V_{x x, w_{j} + 1} f_{u, w_{j}} \\ {\hat{Q}}_{θ θ, w_{j}} & = l_{θ θ, w_{j}} + V_{θ θ, w_{j} + 1} + f_{θ, w_{j}}^{T} V_{x x, w_{j} + 1} f_{θ, w_{j}} \\ & + f_{θ, w_{j}}^{T} V_{x θ, w_{j} + 1} + V_{θ x, w_{j} + 1} f_{θ, w_{j}} \\ {\hat{Q}}_{x u, w_{j}} & = l_{x u, w_{j}} + f_{x, w_{j}}^{T} V_{x x, w_{j} + 1} f_{u, w_{j}} = {\hat{Q}}_{u x, w_{j}}^{T} \\ {\hat{Q}}_{x θ, w_{j}} & = l_{x θ, w_{j}} + f_{x, w_{j}}^{T} V_{x x, w_{j} + 1} f_{θ, w_{j}} + f_{x, w_{j}}^{T} V_{x θ, w_{j} + 1} = {\hat{Q}}_{θ x, w_{j}}^{T} \\ {\hat{Q}}_{u θ, w_{j}} & = l_{u θ, w_{j}} + f_{u, w_{j}}^{T} V_{x x, w_{j} + 1} f_{θ, w_{j}} + f_{u, w_{j}}^{T} V_{x θ, w_{j} + 1} = {\hat{Q}}_{θ u, w_{j}}^{T} \end{matrix}

(26)

From Equation (26), it can be seen that, since the multiple waypoint state constraints depend only on the state

x_{w_{j}}

, only the derivatives

{\hat{Q}}_{x, w_{j}}

and

{\hat{Q}}_{x x, w_{j}}

of the action value function are modified relative to Equation (11).
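In code, the waypoint penalty therefore only touches two blocks of the backward pass at a waypoint instant. The sketch below adds the gradient and Hessian of the quadratic penalty to already computed Q_x and Q_xx; the function name is hypothetical, the penalty coefficient is taken as a scalar, and the waypoint is assumed to be expressed with the same dimension as the state, as in the penalty term of the cost function.

```python
import numpy as np

def add_waypoint_penalty(Q_x, Q_xx, x_w, p_w, mu):
    """Add the derivatives of 0.5 * mu * ||x_w - p_w||^2 to Q_x and Q_xx.

    Gradient of the penalty: mu * (x_w - p_w); Hessian: mu * I.
    """
    d = np.asarray(x_w, dtype=float) - np.asarray(p_w, dtype=float)
    return Q_x + mu * d, Q_xx + mu * np.eye(d.size)
```

At all other time instants the unmodified derivatives are used, so the extra cost of handling the waypoints is negligible.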

After obtaining the updated derivatives of the action value function, the control correction

δ u_{w_{j}}

is updated to

\begin{matrix} δ {\hat{u}}_{w_{j}}^{*} = {\hat{k}}_{w_{j}} + {\hat{K}}_{w_{j}} δ x_{w_{j}} + {\hat{M}}_{w_{j}} δ θ_{w_{j}} \end{matrix}

(27)

where

\begin{matrix} \begin{matrix} {\hat{k}}_{w_{j}} & = - {\hat{Q}}_{u u, w_{j}}^{- 1} {\hat{Q}}_{u, w_{j}} \\ {\hat{k}}_{w_{j}} & = - {\hat{Q}}_{u u, w_{j}}^{- 1} {\hat{Q}}_{u x, w_{j}} \\ {\hat{M}}_{w_{j}} & = - {\hat{Q}}_{u u, w_{j}}^{- 1} {\hat{Q}}_{u θ, w_{j}} \end{matrix} \end{matrix}

(28)

The derivatives of the original value function (17) ought to be substituted with the modified values obtained in Equation (26); then, the value function and its derivatives should be corrected to

\begin{matrix} \hat{V} (x_{w_{j}}; θ_{j}) & = \hat{Q} (x_{w_{j}}, u_{w_{j}}; θ_{j}) - (\frac{1}{2} α_{l}^{2} - α_{l}) {\hat{Q}}_{u, w_{j}}^{T} {\hat{Q}}_{u u, w_{j}}^{- 1} {\hat{Q}}_{u, w_{j}} \\ {\hat{V}}_{x, w_{j}} & = {\hat{Q}}_{x, w_{j}} - {\hat{Q}}_{x u, w_{j}} {\hat{Q}}_{u u, w_{j}}^{- 1} {\hat{Q}}_{u, w_{j}} \\ {\hat{V}}_{θ, w_{j}} & = {\hat{Q}}_{θ, w_{j}} - {\hat{Q}}_{θ u, w_{j}} {\hat{Q}}_{u u, w_{j}}^{- 1} {\hat{Q}}_{u, w_{j}} \\ {\hat{V}}_{x x, w_{j}} & = {\hat{Q}}_{x x, w_{j}} - {\hat{Q}}_{x u, w_{j}} {\hat{Q}}_{u u, w_{j}}^{- 1} {\hat{Q}}_{u x, w_{j}} \\ {\hat{V}}_{x θ, w_{j}} & = {\hat{Q}}_{x θ, w_{j}} - {\hat{Q}}_{x u, w_{j}} {\hat{Q}}_{u u, w_{j}}^{- 1} {\hat{Q}}_{u θ, w_{j}} = {\hat{V}}_{θ x, w_{j}}^{T} \\ {\hat{V}}_{θ θ, w_{j}} & = {\hat{Q}}_{θ θ, w_{j}} - {\hat{Q}}_{θ u, w_{j}} {\hat{Q}}_{u u, w_{j}}^{- 1} {\hat{Q}}_{u θ, w_{j}} \end{matrix}

(29)
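The gain and value-derivative updates of Equations (28) and (29) can be sketched as a single backward step. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name and the array shapes are hypothetical, the line-search update of the value itself is omitted, and the code solves linear systems with the control Hessian instead of forming its inverse.

```python
import numpy as np

def backward_step(Q_x, Q_u, Q_th, Q_xx, Q_uu, Q_thth, Q_ux, Q_uth, Q_xth):
    """Gains of Eq. (28) and value-function derivatives of Eq. (29).

    Assumed shapes (n states, m controls, p parameters):
    Q_ux is (m, n), Q_uth is (m, p), Q_xth is (n, p).
    """
    k = -np.linalg.solve(Q_uu, Q_u)      # feedforward term
    K = -np.linalg.solve(Q_uu, Q_ux)     # state feedback gain
    M = -np.linalg.solve(Q_uu, Q_uth)    # parameter feedback gain

    # Q_xu = Q_ux^T and Q_thu = Q_uth^T, so each line below equals the
    # corresponding "Q - Q Q_uu^{-1} Q" expression in Eq. (29).
    V_x = Q_x + Q_ux.T @ k
    V_th = Q_th + Q_uth.T @ k
    V_xx = Q_xx + Q_ux.T @ K
    V_xth = Q_xth + Q_ux.T @ M
    V_thth = Q_thth + Q_uth.T @ M
    return (k, K, M), (V_x, V_th, V_xx, V_xth, V_thth)
```

At a waypoint instant the inputs would be the modified derivatives of Equation (26); at other instants the unmodified ones are passed in unchanged.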

It is important to point out again that the above update to the value function and the action value function occurs only at the time instants corresponding to the waypoints, and at other time instants, the gradient and Hessian matrix of the action value function are the same as in Equation (17).

To enforce the control limit

u_{min} \leq u_{k} \leq u_{max}

at each time instant, the updated control is clamped to the admissible region as

\begin{matrix} u_{k} & = max (min (u_{k} + {\hat{k}}_{k} + {\hat{K}}_{k} δ x_{k} + {\hat{M}}_{k} δ θ_{k}, u_{max}), u_{min}) \end{matrix}

(30)
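The clamped update of Equation (30) can be illustrated with a few lines of NumPy. The sketch assumes the feedforward term, the state feedback gain and the parameter feedback gain of Equation (28) are already available as arrays; the function name is hypothetical.

```python
import numpy as np

def clamped_control(u, k, K, M, dx, dtheta, u_min, u_max):
    """Apply the control correction of Eq. (27) and clamp it as in Eq. (30)."""
    u_new = u + k + K @ dx + M @ dtheta
    # np.clip(v, lo, hi) equals max(min(v, hi), lo) whenever lo <= hi.
    return np.clip(u_new, u_min, u_max)
```

Clamping after the update keeps the control admissible but does not account for the limits inside the backward pass, so it is the simplest of several possible ways to handle control bounds.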

3.2. Constrained Parameterized Differential Dynamic Programming for Spatial–Temporal Optimization

In the presence of multiple waypoint constraints, the flight trajectory can be constrained to the necessary waypoints through the penalty function method. However, the use of predefined fixed flight times between multiple trajectory segments usually results in a non-optimal trajectory. According to Equation (14), the original PDDP algorithm obtains the optimal parameter variations by applying the first-order optimality condition to the value function

V_{1}

at the initial time instant. In the case of multiple waypoints, the flight time needs to be optimized separately for each trajectory segment to acquire the time-optimal flight trajectory. Therefore, C-PDDP uses the optimality condition of

V (x_{k}; θ)

at each corresponding time instant of waypoints. Then, we can acquire the optimal flight time deviation of the trajectory segment, thus achieving spatial–temporal trajectory optimization between multiple waypoints through iterative optimization.

In this subsection, the constrained PDDP algorithm from the previous subsection is incorporated into the free-time framework, and an algorithm that solves the complete problem CTO₁ is derived. Because the trajectory consists of multiple segments, the flight time parameter

θ_{j}

of each segment is updated at the intersection of adjacent trajectory segments, i.e., at the discrete time instant of the trajectory corresponding to the waypoint. The core idea of solving the free-time problem is to update the flight time by determining the time increment at each iteration.

The second-order Taylor expansion of the value function at the waypoint

w_{j}

is given in Equation (24); thus, the deviation in the value function at waypoint

w_{j}

can be defined as

Δ V (x_{w_{j}}; θ_{j}) = \hat{V} (x_{w_{j}} + δ x_{w_{j}}; θ_{j} + δ θ_{j}) - \hat{V} (x_{w_{j}}; θ_{j})

, i.e.,

\begin{matrix} Δ V (x_{w_{j}}; θ_{j}) = & {\hat{V}}_{x, w_{j}}^{T} δ x_{w_{j}} + {\hat{V}}_{θ, w_{j}}^{T} δ θ_{j} \\ & + \frac{1}{2} {[\begin{matrix} δ x_{w_{j}} \\ δ θ_{j} \end{matrix}]}^{T} [\begin{matrix} {\hat{V}}_{x x, w_{j}} & {\hat{V}}_{x θ, w_{j}} \\ {\hat{V}}_{θ x, w_{j}} & {\hat{V}}_{θ θ, w_{j}} \end{matrix}] [\begin{matrix} δ x_{w_{j}} \\ δ θ_{j} \end{matrix}] \end{matrix}

(31)

This function is quadratic with respect to the flight time increment

δ θ_{j}

. By definition, the optimal flight time increment

δ θ_{j}^{*}

minimizes this deviation:

δ θ_{j}^{*} = \arg\min_{δ θ_{j}} Δ V (x_{w_{j}}; θ_{j})

(32)

Therefore, the optimal flight time variation

δ θ_{j}^{*}

can be obtained from the first-order optimality condition

\nabla_{δ θ_{j}} Δ V (x_{w_{j}}; θ_{j}) = V_{θ, w_{j}} + δ {x_{w_{j}}}^{T} V_{x θ, w_{j}} + δ θ_{j} V_{θ θ, w_{j}} = 0

(33)

Solving Equation (33) yields the following expression for the optimal flight time increment

δ {θ_{j}}^{*}

of the trajectory segment:

\begin{matrix} δ θ_{j}^{*} = - V_{θ θ, w_{j}}^{- 1} (V_{θ, w_{j}} + V_{θ x, w_{j}}^{T} δ x_{w_{j}}^{*}) \end{matrix}

(34)

The above equation shows that the optimal increment

δ θ_{j}^{*}

includes a feedforward term

- V_{θ θ, w_{j}}^{- 1} V_{θ, w_{j}}

and a feedback term

- V_{θ θ, w_{j}}^{- 1} V_{θ x, w_{j}}^{T} δ x_{w_{j}}

, where

V_{θ, w_{j}}

,

V_{θ x, w_{j}}

and

V_{θ θ, w_{j}}

are given by Equation (29).

From Equation (34), it can be seen that unlike the condition

δ x_{1} = 0

in the unknown parameter solution for a single-segment trajectory,

δ x_{w_{j}}

is not 0 at the intermediate waypoint in the multiple trajectory segments problem, and thus the update of

δ θ_{j}

is related to

δ x_{w_{j}}

. However, at the time instant

k = w_{j}

, the optimal updates of the three variables

δ θ_{j}

,

δ u_{w_{j}}

, and

δ x_{w_{j}}

are coupled to each other. Moreover,

δ x_{w_{j}}

is determined by

δ u_{w_{j}}

through the strongly nonlinear dynamics equations, so a closed-form solution for these three variables cannot be found. Consequently,

δ x_{w_{j}}

from the previous iteration is used in the actual computation, denoted as

δ x_{w_{j}}^{old}

, to approximate the solution of

δ θ_{j}

, i.e.,

\begin{matrix} δ θ_{j}^{*} = - V_{θ θ, w_{j}}^{- 1} (V_{θ, w_{j}} + V_{θ x, w_{j}}^{T} δ x_{w_{j}}^{old}) \end{matrix}

(35)

Finally, the optimal time variation

δ θ_{j}^{*}

is calculated at the time instants of each waypoint

k = w_{j}

by Equation (35). That is, the time update used for one iteration at each trajectory segment intersection point is

\begin{matrix} θ_{j} & = θ_{j} + δ θ_{j}^{*} \\ = θ_{j} - V_{θ θ, w_{j}}^{- 1} V_{θ, w_{j}} - V_{θ θ, w_{j}}^{- 1} V_{θ x, w_{j}}^{T} δ x_{w_{j}}^{old} \end{matrix}

(36)

A poor initial guess for the flight time of a trajectory segment can cause C-PDDP to diverge rather than converge to a local optimum. To prevent this, and to make sure that the time parameter is non-negative, we use the box constraint [47] to restrict the updated flight time of each segment to a bounded region, which ensures numerical stability:

θ_{j} = min (max (θ_{j} + δ θ_{j}^{*}, t_{min}), t_{max})

(37)

where

t_{min}

and

t_{max}

denote the lower and upper bounds of each flight time.
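The time update of Equations (35) to (37) can be sketched as follows for the case where each segment has a single scalar flight time. This is an illustrative sketch only: the function name is hypothetical, and the division assumes the second derivative of the value function with respect to the flight time is positive.

```python
def update_flight_time(theta, V_th, V_thth, V_thx, dx_old, t_min, t_max):
    """Return (theta_new, d_theta) for one segment.

    theta, V_th, V_thth are scalars; V_thx and dx_old are sequences of
    length n (the state dimension). dx_old is the state deviation at the
    waypoint taken from the previous iteration, as in Eq. (35).
    """
    feedback = sum(a * b for a, b in zip(V_thx, dx_old))
    d_theta = -(V_th + feedback) / V_thth              # Eq. (35)
    theta_new = min(max(theta + d_theta, t_min), t_max)  # Eq. (37)
    return theta_new, d_theta
```

For example, with a current flight time of 9, a gradient of 2, a curvature of 4 and a feedback contribution of 2, the raw increment is −1; if the lower bound were 8.5, the clamped flight time would be 8.5 rather than 8.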

The C-PDDP algorithm uses the same initialization method as the PDDP algorithm and will not be repeated here. The updates of the value function and the state–action value function at other time instants in the C-PDDP algorithm are the same as in Equations (26) and (29), except that the time parameter needs to be updated at the waypoint

k = w_{j}

.

Remark 1.

Although Equation (35) provides a feasible solution scheme for

δ θ_{j}^{*}

, its calculation only uses an approximation of

δ x_{w_{j}}

, i.e.,

δ θ_{j}^{*}

is not an exact solution to the original problem, and thus the equation does not guarantee that the updated time parameter is an optimal solution.

3.3. Scaling the Flight Time of Trajectory Segments

By constraining the distance cost between waypoint

p_{w_{j}}

and the specific point

x_{w_{j}}

of the trajectory through the penalty function method and updating the flight time

θ_{j}

as an unknown parameter at the time instant of the waypoint, the C-PDDP algorithm is able to find the spatial–temporal optimal trajectory under the constraint of multiple waypoints. However, it should be noted that the C-PDDP algorithm obtains the optimal control update and the optimal time update after the calculation of the value function and its derivatives at each algorithm iteration, and it is also necessary to obtain a new nominal trajectory through the control update and the time update by utilizing the integration of the dynamic equations in the forward process of the algorithm. After obtaining the updated flight time

θ_{j}

for each segment of the trajectory, an update of the objective function and the time parameter

θ_{j}

contained in the dynamic equation in problem (20) is required.

The objective function and dynamics in problem (20) need to be scaled and reformulated on the time horizon to ensure that the dynamics equations and objective function in each trajectory segment are described in terms of the most recent flight time parameters obtained from the calculation. The variable

τ_{j} = \frac{t_{w_{j}} - t_{w_{j - 1}}}{θ_{j}} + j - 1

is adopted as the scaled time variable; then, the objective function and dynamics in problem (20) can be reformulated over a fixed time interval of unit length as

\begin{matrix} \min_{U; Θ} J (U; Θ) & = \sum_{k = 1}^{N - 1} θ_{j} \cdot l (x_{k} (τ_{k}), u_{k} (τ_{k})) + ϕ (x_{N} (τ_{N})) \\ s . t . x_{k + 1} & = θ_{j} f (x_{k} (τ_{k}), u_{k} (τ_{k})) \end{matrix}

(38)

During the solution of the problem using the C-PDDP algorithm, each iteration of the algorithm will generate a new time parameter

θ_{j}

for each trajectory segment.

Remark 2.

From the definition of the scaled time variable

τ_{j}

, it can be seen that after extracting the time parameter

θ_{j}

of each trajectory segment, the scaled flight time length is in the range of unit length

[0, 1]

. To ensure the numerical stability of the C-PDDP algorithm, a fixed number of time instants is set in each trajectory segment, i.e., a fixed, scaled time step h is used within the scaled time interval

[0, 1]

during the actual operation of the algorithm, and thus the discrete-time dynamics can be expressed as

\begin{matrix} x_{k + 1} & = x_{k} + θ_{j} f (x_{k} (τ_{k}), u_{k} (τ_{k})) h \end{matrix}

(39)

Further, the actual time step within each trajectory segment is adjusted automatically as the time parameter

θ_{j}

changes during the optimization. This avoids the interpolation that a change in the time parameter would otherwise require, which would make it difficult for the algorithm to converge.

Remark 3.

The Hessian matrices

Q_{u u, k}

and

V_{θ θ, w_{j}}

are assumed to remain positive definite in the C-PDDP algorithm throughout the optimization process. Unlike the direct Levenberg–Marquardt (LM) approach on

Q_{u u, k}

in the original DDP algorithm [48], the authors of [49] present a more robust approach that places the regularization on the state variables instead of the control variables:

\begin{matrix} {\hat{V}}_{x x, k} & = {\hat{V}}_{x x, k} + ρ_{μ} I_{n} \\ {\hat{V}}_{θ θ, k} & = {\hat{V}}_{θ θ, k} + ρ_{ν} I \end{matrix}

(40)

in which

ρ_{μ}

and

ρ_{ν}

are small constant regularization parameters.
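The effect of the regularization in Equation (40) can be illustrated with a short sketch: adding a multiple of the identity shifts every eigenvalue of a symmetric matrix by that amount, which can restore positive definiteness. The function name and the arguments `rho_x` and `rho_th` are hypothetical stand-ins for the two regularization parameters.

```python
import numpy as np

def regularize_value_hessians(V_xx, V_thth, rho_x, rho_th):
    """Add rho * I to the state and parameter Hessians, as in Eq. (40)."""
    V_xx_reg = V_xx + rho_x * np.eye(V_xx.shape[0])
    V_thth_reg = V_thth + rho_th * np.eye(V_thth.shape[0])
    return V_xx_reg, V_thth_reg
```

The values used in any given problem are a tuning choice; larger values make the backward pass more conservative.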

Remark 4.

The penalty factor is incrementally increased in the classical penalty function method to make sure that the solution converges to the optimal solution in the feasible state domain. Therefore, in every iteration, we update

μ_{k}^{j} > 0

to

μ_{k, i}^{j} = α μ_{k, i - 1}^{j}, μ_{k, 0}^{j} = μ

(41)

in which

μ_{k}^{j}

at the i-th iteration of the C-PDDP method is denoted by

μ_{k, i}^{j}

, and

α > 1

denotes the update rate.
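The geometric growth implied by Equation (41) can be sketched as below; unrolling the recursion gives the penalty factor at iteration i as the initial value multiplied by the update rate raised to the power i. The function name and the numbers in the usage note are illustrative and are not taken from Table 1.

```python
def penalty_schedule(mu0, alpha, n_iter):
    """Return [mu_0, ..., mu_{n_iter}] with mu_i = alpha * mu_{i-1}, Eq. (41)."""
    mus = [mu0]
    for _ in range(n_iter):
        mus.append(alpha * mus[-1])
    return mus
```

For instance, an initial value of 1 and an update rate of 2 give 1, 2, 4, 8 over the first three iterations, so the penalty grows quickly and the update rate should be chosen with the expected iteration count in mind.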

Remark 5.

It should be noted in particular that the proposed algorithm still cannot handle cases where there are obstacles in the environment, and further development of the algorithm for obstacle constraints is needed, which will be a part of our future research.

We provide the pseudo-code of C-PDDP in Algorithm 1 by considering the results of the preceding subsections. The algorithm builds on the PDDP algorithm by penalizing the distance to each waypoint at the corresponding time instant and updating the time parameter there. The algorithm terminates when the difference in the cost function between two successive iterations falls below a given constant.

Algorithm 1 Constrained parameterized differential dynamic programming

1: Initialize $U_{0}, μ, θ_{j}$
2: for $k = 0, \ldots, N - 1$ do
3:     $x_{k + 1} = f (x_{k}, u_{k}; θ_{j})$
4: end for
5: while $|J_{i} - J_{i - 1}| \geq ϵ$ do
6:     $J$ ← (2)
7:     $V (x_{N}; θ) = ϕ (x_{N}; θ)$ , $V_{x, N} = ϕ_{x, N}$ , $V_{x x, N} = ϕ_{x x, N}$ , $V_{θ, N} = ϕ_{θ, N}$ , $V_{θ θ, N} = ϕ_{θ θ, N}$ , $V_{x θ, N} = ϕ_{x θ, N} = V_{θ x, N}^{T}$
8:     for $k = N - 1, \ldots, 0$ do
9:         Compute derivatives of $Q (x_{k}, u_{k}; θ_{j})$ by (11)
10:         if $k = w_{j}$ then
11:             Compute derivatives of modified $\hat{Q} (x_{w_{j}}, u_{w_{j}}; θ_{j})$ by (26)
12:         end if
13:         ${\hat{k}}_{k}, {\hat{K}}_{k}, {\hat{M}}_{k}$ ← (28)
14:         Compute derivatives of $\hat{V} (x_{k}, u_{k}; θ_{j})$ by (29)
15:         if $k = w_{j}$ or $k = 1$ then
16:             $δ θ_{j}^{*} = - V_{θ θ, w_{j}}^{- 1} (V_{θ, w_{j}} + V_{θ x, w_{j}}^{T} δ x_{w_{j}}^{old})$
17:         end if
18:     end for
19:     $θ_{j}$ ← (37)
20:     $μ_{k, i}^{j} = α μ_{k, i - 1}^{j}$
21:     for $k = 0, \ldots, N - 1$ do
22:         ${δ x}_{k}^{old} = x_{k}^{new} - x_{k}$
23:         $u_{k}$ ← (30)
24:         $u_{k} = max (min (u_{k} + {\hat{k}}_{k} + {\hat{K}}_{k} δ x_{k} + {\hat{M}}_{k} δ θ_{k}, u_{max}), u_{min})$
25:         $x_{k + 1}^{new} = f (x_{k}, u_{k}; θ_{j})$
26:     end for
27:     $X = X^{new}$
28: end while

4. Application Example

In order to verify the effectiveness of the proposed C-PDDP algorithm, this section investigates a spatial–temporal optimization problem with multiple waypoint constraints and control limits in a UAV reconnaissance scenario, and compares the proposed algorithm with the fixed-time constrained DDP algorithm (C-DDP) to demonstrate the superiority of the spatial–temporal optimization of the C-PDDP algorithm. All simulations were conducted on a personal computer with i7-8570H and 8 GB RAM.

4.1. Mission Scenario and Simulation Setup

Consider a multi-point reconnaissance mission of a UAV in the horizontal plane, where the velocity of the UAV is assumed to be constant. The dynamics equations can be simplified to

\{\begin{matrix} \dot{x} & = V cos ψ \\ \dot{y} & = V sin ψ \\ \dot{ψ} & = \frac{a_{y}}{V} \end{matrix}

(42)
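The kinematics in Equation (42) and their continuous-time Jacobians can be sketched as follows. The interpretation of the symbols is an assumption based on the surrounding text: V is the constant speed (not the value function), ψ is the heading angle, and a_y is the lateral acceleration used as the control. The function names are hypothetical.

```python
import math

def uav_dynamics(state, a_y, V):
    """State derivative [x_dot, y_dot, psi_dot] of Eq. (42)."""
    _, _, psi = state
    return [V * math.cos(psi), V * math.sin(psi), a_y / V]

def uav_jacobians(state, a_y, V):
    """Analytical Jacobians of Eq. (42) w.r.t. the state and the control."""
    _, _, psi = state
    f_x = [[0.0, 0.0, -V * math.sin(psi)],
           [0.0, 0.0, V * math.cos(psi)],
           [0.0, 0.0, 0.0]]
    f_u = [0.0, 0.0, 1.0 / V]
    return f_x, f_u
```

Only the heading column of the state Jacobian is non-zero, because the position does not appear on the right-hand side of the kinematics.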

The multi-point reconnaissance mission requires the UAV to pass through multiple necessary intermediate mission points

p_{w_{j}}

and arrive at the end point

x_{d}

while optimizing the flight time

θ_{j}

between the multiple waypoints that is limited to

t_{min} \leq θ_{j} \leq t_{max}

. We require the UAV to consume minimum flight energy while accomplishing the mission objectives; therefore, the quadratic cost function is defined as follows:

\begin{matrix} J & = \frac{1}{2} {(x_{N} - x_{d})}^{T} W_{N} (x_{N} - x_{d}) \\ + \sum_{k = 0}^{N - 1} [\frac{1}{2} u_{k}^{T} R u_{k} + \frac{1}{2} {(x_{k} - x_{d})}^{T} W_{s} (x_{k} - x_{d})] \end{matrix}

(43)

in which

W_{N}

,

R

and

W_{s}

are diagonal weighting matrices and

x_{d} = {[x_{T}, y_{T}, ψ_{d}]}^{T}

refers to the intended terminal state vector of the UAV.
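The cost in Equation (43) can be evaluated with a short sketch like the one below. Because the weighting matrices are diagonal, each is represented here only by its diagonal entries; the function name and this representation are assumptions made for illustration, and the waypoint penalty term is not included.

```python
def quadratic_cost(xs, us, x_d, W_N, R, W_s):
    """Evaluate Eq. (43).

    xs: list of N + 1 states, us: list of N controls (each a list),
    x_d: desired terminal state; W_N, R, W_s: diagonals of the weights.
    """
    def quad(v, w):
        return 0.5 * sum(wi * vi * vi for wi, vi in zip(w, v))

    def err(x):
        return [xi - di for xi, di in zip(x, x_d)]

    running = sum(quad(u, R) + quad(err(x), W_s)
                  for x, u in zip(xs[:-1], us))
    return quad(err(xs[-1]), W_N) + running
```

The running sum covers the first N states and controls, and the terminal term uses only the last state, matching the index ranges in Equation (43).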

To accelerate the convergence rate of the algorithm, we employ a penalty factor that increases with the number of iterations, i.e.,

μ_{k, i}^{j} = α μ_{k, i - 1}^{j}

,

μ_{k, 0}^{j} = μ

. Table 1 summarizes the parameters required for the implementation of the C-PDDP algorithm.

4.2. Performance of the C-PDDP Algorithm

In this subsection, the effectiveness of the proposed C-PDDP algorithm in handling spatial–temporal optimization with multiple waypoints is illustrated by a typical UAV reconnaissance scenario. We consider that the UAV passes through five different intermediate waypoints, and the scenario setup conditions, such as the specific state values of the UAV, the locations of the five waypoints, and the initial values and boundaries of the flight time of the trajectory segments, are listed in Table 2. The initial approximate trajectories in all five scenarios are uncontrolled trajectories, i.e., the initial guess is

a_{y} = 0

. The UAV-related parameters and settings used in the simulation were taken from a U.S. SwitchBlade 300 UAV [50].

The results of the spatial–temporal trajectory optimization of the UAV passing through the intermediate waypoints in the five waypoint scenarios are shown in Figure 2, which includes the optimized trajectories, the lateral commanded acceleration, the convergence of the flight time for each trajectory segment and the time history of the flight path angles. As shown in Figure 2a, C-PDDP is able to optimize the UAV flight trajectory from an uncontrolled trajectory to one that passes through five different intermediate waypoints and arrives at the terminal point, satisfying the waypoint constraints and the terminal constraints while respecting the dynamic constraints of the UAV. The UAV control command, i.e., the lateral acceleration, is shown in Figure 2b, which shows that, in order to reach the waypoint, the lateral acceleration command optimized by the algorithm undergoes one large deflection in all five cases. Figure 2d shows that the C-PDDP algorithm optimizes the flight times of the trajectory segments to different values in all five cases, thereby realizing flight time optimization over multiple trajectory segments. The simulation results of the five cases show that the C-PDDP algorithm is capable of multi-segment free-time trajectory optimization under waypoint constraints.

4.3. Comparison with C-DDP

In this subsection, we consider a scenario where the UAV passes through two necessary intermediate waypoints of the trajectory. The flight trajectory is divided into three trajectory segments if the initial and terminal position points of the trajectory are also accounted for, and the flight time of the UAV is free within the three trajectory segments. The initial guess is

a_{y} = 0

, meaning that the initial nominal trajectory is uncontrolled. To illustrate the superiority of the proposed C-PDDP algorithm for spatial–temporal optimization, we perform a simulation comparison with the fixed time C-DDP algorithm in the same simulation scenario. The simulation conditions such as the initial guess and the number of discrete points are set the same for both algorithms, so that the comparison is fair. Scenario setting conditions such as the location of the waypoints and the initial value of the flight time of the trajectory segment are listed in Table 3.

The optimization results of the spatial–temporal trajectory of the UAV passing through two intermediate waypoints under the two algorithms are shown in Figure 3, which includes the optimized trajectory, the lateral commanded acceleration, the convergence of the flight time for each trajectory segment, the time history of the flight path angles, and the convergence of the cost function. As shown in Figure 3a, both algorithms are able to optimize the UAV flight trajectory from an uncontrolled trajectory to one that passes through two intermediate waypoints in two different directions and arrives at the terminal point, satisfying the waypoint constraints and the terminal point constraints while respecting the dynamics constraints of the UAV. However, the figure shows that the trajectory optimized by the C-PDDP algorithm is smoother, while the trajectory optimized by the C-DDP algorithm is more tortuous due to its inability to optimize the flight time between the waypoints. The UAV control command, i.e., the lateral acceleration, is shown in Figure 3c, which shows that both algorithms produce a lateral acceleration command with two large deflections in order to reach the waypoints. The trajectory optimized by the C-DDP algorithm has a larger lateral acceleration than the one optimized by the C-PDDP algorithm due to the long flight time, and this leads to a larger trajectory cost. From Figure 3d, it can be seen that the flight times of the three trajectory segments are, respectively, optimized from the initial value of

9 s

to

7.35 s

,

9.50 s

and

6.71 s

by the C-PDDP algorithm, which completes the task of free-time optimization.

In conclusion, the trajectory optimized by the C-PDDP algorithm has a smaller error distance between the waypoints, a smaller total flight time and a lower trajectory cost; differences in the optimization results of the two algorithms are listed in Table 4. The results in the table indicate that the proposed C-PDDP is able to optimize the UAV trajectory passing through the necessary intermediate points in reconnaissance scenarios with multiple waypoints and optimize the flight time of each trajectory segment at the same time to ensure the spatial–temporal optimality of the trajectory.

4.4. Monte Carlo Simulation Study

In order to demonstrate the robustness of the C-PDDP algorithm to initial control sequence guesses, we generated lateral acceleration commands for each time instant randomly from a uniform distribution. A total of 500 Monte Carlo simulations were conducted. The lateral acceleration command guesses for each time instant are generated within the upper and lower limits of the lateral acceleration limit, i.e.,

a_{y} \sim U (a_{y_{min}}, a_{y_{max}})

. The simulation setup in this subsection is the same as that of the C-PDDP algorithm in the previous subsection.

The Monte Carlo simulation results of the C-PDDP algorithm are shown in Figure 4. Figure 4a gives the initial guessed trajectories and optimal flight trajectories for all converged solutions. It can be seen that the C-PDDP algorithm is able to converge to the optimal trajectory even under such complicated initial guess conditions, while completing the task of passing the waypoints and optimizing the flight time of each trajectory segment. The distribution of the optimal flight time for each trajectory segment is shown in Figure 4b, which indicates that the calculated flight time averages converge to

7.35 s

,

9.50 s

and

6.71 s

, which is the same as the results obtained in the previous subsection. Figure 4c shows the number of iterations of the C-PDDP algorithm; in most cases, C-PDDP converges to the optimal result within 400 iterations even under strongly randomized initial guesses. Figure 4d shows the distribution of the cost function values obtained from the 500 Monte Carlo simulations: most of the simulations converge to the optimal cost value, but some fail to converge. According to the statistics, the algorithm fails to converge to the optimal value in three simulations, so the convergence rate of the algorithm over the 500 Monte Carlo simulations is

99.4 %

. The Monte Carlo simulations demonstrate that the C-PDDP algorithm is stable with random initial control guesses under the given conditions. Meanwhile, the calculation time for the given examples to converge to the optimal solution is about 1 s, which also indicates the computational efficiency of the algorithm.
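The Monte Carlo setup described above can be sketched as follows: each initial control sequence is drawn from a uniform distribution over the lateral acceleration limits, and the convergence rate is the fraction of converged runs. The function names are hypothetical, and any bounds passed in are placeholders rather than the values used in the simulations.

```python
import random

def sample_initial_controls(n_steps, a_min, a_max, rng):
    """Draw one random lateral-acceleration guess per time instant."""
    return [rng.uniform(a_min, a_max) for _ in range(n_steps)]

def convergence_rate(n_converged, n_total):
    """Percentage of runs that converged."""
    return 100.0 * n_converged / n_total
```

With 3 failures out of 500 runs, `convergence_rate(497, 500)` reproduces the reported 99.4 %.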

5. Conclusions

This paper proposes a C-PDDP algorithm capable of solving spatial–temporal trajectory optimization problems with multiple intermediate waypoints. This method handles the waypoint path constraints through a penalty function method and parameterizes the flight times of multiple trajectory segments to dynamically optimize multiple segment times. The detailed procedure of the algorithm is derived in this paper, and the optimal control variations and optimal time variations at the waypoints are given.

In UAV reconnaissance mission scenarios, numerical simulations are conducted for five different waypoint scenarios to verify the effectiveness of the spatial–temporal trajectory optimization achieved by the C-PDDP algorithm. In a typical UAV reconnaissance scenario with two intermediate waypoints, the proposed algorithm is compared with the fixed-time constrained DDP algorithm. The proposed algorithm optimizes the trajectory with a maximum waypoint distance error of

1.06 m

and segment flight times of

7.35 s

,

9.50 s

and

6.71 s

. For the C-DDP algorithm, which cannot optimize the time between trajectory segments, the trajectories are optimized with errors above

7 m

and the trajectories are more tortuous. The above simulation results demonstrate the advantages of the proposed C-PDDP algorithm for spatial–temporal optimization, and the Monte Carlo simulations show the robustness of the algorithm to random initial guesses.

The limitations of the proposed algorithm in this paper include the following. The C-PDDP algorithm proposed in this paper uses the penalty function method to deal with waypoint constraints, and when the number of waypoints is large, poor initial guesses may affect the convergence of the algorithm. In addition, the implementation of the penalty function method requires manual adjustment of some algorithm-related parameters, and the algorithm does not consider the presence of obstacles in the environment. Future work includes providing better initial guesses, eliminating the parameter tuning problem of the penalty function method using the primal-dual interior point method, and exploring the proposed method using a 3D model of a UAV in a 3D environment with obstacles.

Author Contributions

Conceptualization, X.Z., S.H. and D.L.; methodology, X.Z. and S.H.; software, X.Z., T.J. and F.X.; validation, X.Z.; formal analysis, X.Z.; investigation, X.Z. and T.J.; resources, D.L., S.H. and W.S.; data curation, X.Z. and S.H.; writing—original draft preparation, X.Z. and F.X.; writing—review and editing, X.Z., F.X. and S.H.; visualization, X.Z.; supervision, S.H.; project administration, S.H. and D.L.; funding acquisition, W.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant Nos. 52302449 and 62103435, the Beijing Nova Program under Grant No. Z211100002121071, and Civilian Aircraft Research under Grant No. MJG5-1N21.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Mulumba, T.; Diabat, A. Optimization of the drone-assisted pickup and delivery problem. Transp. Res. Part E Logist. Transp. Rev. 2024, 181, 103377.
Gao, J.; Pan, Y.; Zhang, X.; Han, Q.; Hu, Y. Sharing instant delivery UAVs for crowdsensing: A data-driven performance study. Comput. Ind. Eng. 2024, 191, 110100.
Yan, S.; Sun, C.S.; Chen, Y.H. Optimal routing and scheduling of unmanned aerial vehicles for delivery services. Transp. Lett. 2023, 1–12.
Li, C.; Li, H.; Su, H. Analysis of the Development of Foreign Small Loitering Munitions and Their Early Warning System Response Strategy. Air Space Def. 2023, 6, 58–65.
Li, F.; Kunze, O. A Comparative Review of Air Drones (UAVs) and Delivery Bots (SUGVs) for Automated Last Mile Home Delivery. Logistics 2023, 7, 21.
He, S.; Lee, C.H.; Shin, H.S.; Tsourdos, A. Minimum-effort waypoint-following guidance. J. Guid. Control. Dyn. 2019, 42, 1551–1561.
He, S.; Shin, H.S.; Tsourdos, A.; Lee, C.H. Energy-optimal waypoint-following guidance considering autopilot dynamics. IEEE Trans. Aerosp. Electron. Syst. 2019, 56, 2701–2717.
Khalil, H.; Rahman, S.U.; Ullah, I.; Khan, I.; Alghadhban, A.J.; Al-Adhaileh, M.H.; Ali, G.; ElAffendi, M. A UAV-Swarm-Communication Model Using a Machine-Learning Approach for Search-and-Rescue Applications. Drones 2022, 6, 372.
Chai, R.; Savvaris, A.; Tsourdos, A.; Chai, S.; Xia, Y. A review of optimization techniques in spacecraft flight trajectory design. Prog. Aerosp. Sci. 2019, 109, 100543.
Wang, Z.; Grant, M.J. Constrained trajectory optimization for planetary entry via sequential convex programming. J. Guid. Control. Dyn. 2017, 40, 2603–2615.
Hong, H.; Maity, A.; Holzapfel, F.; Tang, S. Adaptive trajectory generation based on real-time estimated parameters for impaired aircraft landing. Int. J. Syst. Sci. 2019, 50, 2733–2751.
He, S.; Shin, H.S.; Tsourdos, A. Optimal Guidance for Integrated Waypoint Following and Obstacle Avoidance. In Proceedings of the 2019 Workshop on Research, Education and Development of Unmanned Aerial Systems (RED UAS), Cranfield, UK, 25–27 November 2019; pp. 325–334.
Chai, R.; Savvaris, A.; Tsourdos, A. Violation Learning Differential Evolution-Based hp-Adaptive Pseudospectral Method for Trajectory Optimization of Space Maneuver Vehicle. IEEE Trans. Aerosp. Electron. Syst. 2017, 53, 2031–2044.
Tang, G.; Jiang, F.; Li, J. Fuel-optimal low-thrust trajectory optimization using indirect method and successive convex programming. IEEE Trans. Aerosp. Electron. Syst. 2018, 54, 2053–2066.
Laad, D.; Elango, P.; Mohan, R. Fourier Pseudospectral Method for Trajectory Optimization with Stability Requirements. J. Guid. Control. Dyn. 2020, 43, 2073–2090.
Hong, H.; Maity, A.; Holzapfel, F. Free Final-Time Constrained Sequential Quadratic Programming Based Flight Vehicle Guidance. J. Guid. Control. Dyn. 2021, 44, 181–189.
Zhang, J.; Ma, K.; Meng, G.; Tian, S. Spacecraft maneuvers via singularity-avoidance of control moment gyros based on dual-mode model predictive control. IEEE Trans. Aerosp. Electron. Syst. 2015, 51, 2546–2559.
Li, J.; Li, C.; Zhang, Y. Entry Trajectory Optimization With Virtual Motion Camouflage Principle. IEEE Trans. Aerosp. Electron. Syst. 2019, 56, 2527–2536.
Morgan, D.; Chung, S.J.; Hadaegh, F.Y. Model predictive control of swarms of spacecraft using sequential convex programming. J. Guid. Control. Dyn. 2014, 37, 1725–1740.
Augugliaro, F.; Schoellig, A.P.; D’Andrea, R. Generation of collision-free trajectories for a quadrocopter fleet: A sequential convex programming approach. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 1917–1922.
Deligiannis, A.; Amin, M.; Lambotharan, S.; Fabrizio, G. Optimum Sparse Subarray Design for Multitask Receivers. IEEE Trans. Aerosp. Electron. Syst. 2019, 55, 939–950.
Aziz, J.D.; Scheeres, D.J.; Lantoine, G. Hybrid Differential Dynamic Programming in the Circular Restricted Three-Body Problem. J. Guid. Control. Dyn. 2019, 42, 963–975.
Sun, W.; Pan, Y.; Lim, J.; Theodorou, E.A.; Tsiotras, P. Min-Max Differential Dynamic Programming: Continuous and Discrete Time Formulations. J. Guid. Control. Dyn. 2018, 41, 2568–2580.
Ozaki, N.; Campagnola, S.; Funase, R.; Yam, C.H. Stochastic Differential Dynamic Programming with Unscented Transform for Low-Thrust Trajectory Design. J. Guid. Control. Dyn. 2018, 41, 377–387.
Morimoto, J.; Atkeson, C.G. Minimax differential dynamic programming: An application to robust biped walking. Adv. Neural Inf. Process. Syst. 2002, 12, 1–8.
Li, W.; Todorov, E. Iterative Linear Quadratic Regulator Design for Nonlinear Biological Movement Systems. In Informatics in Control, Automation and Robotics; SciTePress: Setubal, Portugal, 2004; Volume 1, pp. 222–229.
Tassa, Y.; Erez, T.; Smart, W.D. Receding Horizon Differential Dynamic Programming. In Proceedings of the NIPS, Vancouver, BC, Canada, 3–6 December 2007; pp. 1465–1472.
He, S.; Shin, H.S.; Tsourdos, A. Computational guidance using sparse Gauss–Hermite quadrature differential dynamic programming. IFAC-PapersOnLine 2019, 52, 13–18.
Zhang, G.; Wen, C.; Han, H.; Qiao, D. Aerocapture Trajectory Planning Using Hierarchical Differential Dynamic Programming. J. Spacecr. Rocket. 2022, 59, 1647–1659.
Mayne, D. A Second-order Gradient Method for Determining Optimal Trajectories of Non-linear Discrete-time Systems. Int. J. Control 1966, 3, 85–95.
Han, X.; Zhao, X.; Xu, X.; Mei, C.; Xing, W.; Wang, X. Trajectory tracking control for underactuated autonomous vehicles via adaptive dynamic programming. J. Frankl. Inst. 2024, 361, 474–488.
Bertsekas, D.P. Reinforcement Learning and Optimal Control; Athena Scientific: Belmont, MA, USA, 2019.
Manchester, Z.; Kuindersma, S. Derivative-free trajectory optimization with unscented dynamic programming. In Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC), Las Vegas, NV, USA, 12–14 December 2016; pp. 3642–3647.
Giftthaler, M.; Neunert, M.; Stäuble, M.; Buchli, J.; Diehl, M. A Family of Iterative Gauss–Newton Shooting Methods for Nonlinear Optimal Control. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1–9.
Pellegrini, E.; Russell, R.P. A multiple-shooting differential dynamic programming algorithm. Part 1: Theory. Acta Astronaut. 2020, 170, 686–700.
Li, H.; Yu, W.; Zhang, T.; Wensing, P.M. A Unified Perspective on Multiple Shooting In Differential Dynamic Programming. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; pp. 9978–9985. [Google Scholar]
Mastalli, C.; Budhiraja, R.; Merkt, W.; Saurel, G.; Hammoud, B.; Naveau, M.; Carpentier, J.; Righetti, L.; Vijayakumar, S.; Mansard, N. Crocoddyl: An Efficient and Versatile Framework for Multi-Contact Optimal Control. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 2536–2542. [Google Scholar]
Foehn, P.; Romero, A.; Scaramuzza, D. Time-optimal planning for quadrotor waypoint flight. Sci. Robot. 2021, 6, eabh1221. [Google Scholar] [CrossRef]
Maity, A.; Padhi, R.; Mallaram, S.; Rao, G.M.; Manickavasagam, M. A robust and high precision optimal explicit guidance scheme for solid motor propelled launch vehicles with thrust and drag uncertainty. Int. J. Syst. Sci. 2016, 47, 3078–3097. [Google Scholar] [CrossRef]
Zheng, X.; He, S.; Lin, D. Constrained Trajectory Optimization With Flexible Final Time for Autonomous Vehicles. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 1818–1829. [Google Scholar] [CrossRef]
Sun, X.; Chai, R.; Chai, S.; Zhang, B.; Tsourdos, A. Flexible Final-Time Stochastic Differential Dynamic Programming for Autonomous Vehicle Trajectory Optimization. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 6658–6669. [Google Scholar] [CrossRef]
Oshin, A.; Houghton, M.D.; Acheson, M.J.; Gregory, I.M.; Theodorou, E.A. Parameterized Differential Dynamic Programming. arXiv 2022, arXiv:2204.03727. [Google Scholar]
Martinez, S.; Griffin, R.; Mastalli, C. Multi-Contact Inertial Estimation and Localization in Legged Robots. arXiv 2024, arXiv:2403.17161. [Google Scholar]
Nocedal, J.; Wright, S. Numerical Optimization; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
Plancher, B.; Manchester, Z.; Kuindersma, S. Constrained unscented dynamic programming. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 5674–5680. [Google Scholar]
Farshidian, F.; Neunert, M.; Winkler, A.W.; Rey, G.; Buchli, J. An efficient optimal planning and control framework for quadrupedal locomotion. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 93–100. [Google Scholar]
Tassa, Y.; Mansard, N.; Todorov, E. Control-limited differential dynamic programming. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 1168–1175. [Google Scholar]
Murray, D.M.; Yakowitz, S.J. Differential dynamic programming and Newton’s method for discrete optimal control problems. J. Optim. Theory Appl. 1984, 43, 395–414. [Google Scholar] [CrossRef]
Tassa, Y.; Erez, T.; Todorov, E. Synthesis and stabilization of complex behaviors through online trajectory optimization. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura-Algarve, Portugal, 7–12 October 2012; pp. 4906–4913. [Google Scholar]
AeroVironment, Inc. Available online: https://www.avinc.com/lms/switchblade (accessed on 17 May 2024).

Figure 1. The idea for solving the multiple waypoint problem.

Figure 2. Simulation results for different cases (TS refers to the trajectory segment): (a) Flight trajectory. (b) Flight angle change. (c) Command acceleration change. (d) Terminal time convergence.

Figure 3. Simulation results for different cases (TS refers to the trajectory segment and TP refers to the trajectory points corresponding to the preset waypoints in C-PDDP and C-DDP): (a) Flight trajectory. (b) Flight angle change. (c) Command acceleration change. (d) Terminal time convergence. (e) Cost function convergence.

Figure 4. Monte Carlo simulations. (a) Initial guesses and optimized trajectories for convergent solutions. (b) Distribution of optimal times for trajectory segments. (c) Distribution of C-PDDP iterations. (d) Distribution of cost function values for the algorithm.

Table 1. C-PDDP algorithm parameter settings.

Parameter	Value
Terminal weighting matrix $W_{i}^{N}$	$\mathrm{diag}(15 I_{3})$
Control weighting matrix $R_{i}$	$1$
State weighting matrix $W_{i}^{s}$	$0$
LM parameter $\rho_{v}$	$10^{-5}$
LM parameter $\rho_{q}$	$10^{-5}$
Damping coefficient $\alpha_{l}$	$0.4$
Penalty factor $\mu$	$50$
Number of discrete nodes $N$	$270$
Penalty factor growth rate $\alpha$	$1.2$
Stopping threshold $\epsilon$	$10^{-4}$

Table 2. Scenario 1 parameter settings.

Parameter	Value
UAV initial parameters $(x_{0}, y_{0}, \psi_{0})$	$(0\ \mathrm{m}, 0\ \mathrm{m}, 0^{\circ})$
Target point parameters $(x_{N}, y_{N}, \psi_{N})$	$(450\ \mathrm{m}, 450\ \mathrm{m}, 45^{\circ})$
UAV flight speed $V$	$30\ \mathrm{m/s}$
Lateral acceleration limit $a_{y_{\max}}$	$30\ \mathrm{m/s^{2}}$
Waypoint 1 position $p_{w_{1}}$	$(80\ \mathrm{m}, 250\ \mathrm{m})$
Waypoint 2 position $p_{w_{2}}$	$(150\ \mathrm{m}, 250\ \mathrm{m})$
Waypoint 3 position $p_{w_{3}}$	$(250\ \mathrm{m}, 250\ \mathrm{m})$
Waypoint 4 position $p_{w_{4}}$	$(350\ \mathrm{m}, 250\ \mathrm{m})$
Waypoint 5 position $p_{w_{5}}$	$(420\ \mathrm{m}, 250\ \mathrm{m})$
Initial flight time guess for each segment $\theta_{j}$	$13.5\ \mathrm{s}$
Segment flight time limits $[t_{\min}, t_{\max}]$	$[5\ \mathrm{s}, 20\ \mathrm{s}]$

Table 3. Scenario 2 parameter settings.

Parameter	Value
Waypoint 1 position $p_{w_{1}}$	$(90\ \mathrm{m}, 180\ \mathrm{m})$
Waypoint 2 position $p_{w_{2}}$	$(360\ \mathrm{m}, 270\ \mathrm{m})$
Initial flight time guess for each segment $\theta_{j}$	$9\ \mathrm{s}$

Table 4. Comparison results of C-PDDP and C-DDP.

Algorithm	Maximum Error at the Waypoints	Total Flight Time	Energy Consumption
C-PDDP	1.06 m	23.67 s	2099.0
C-DDP	7.34 m	27.00 s	4901.9
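
The relative differences implied by Table 4 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the numbers are copied from Table 4, while the dictionary layout and the helper name `relative_reduction` are assumptions made for this example and are not part of the authors' implementation. It also checks that the C-DDP total flight time equals three segments at the 9 s initial guess from Table 3, which is what one would expect if the fixed-time baseline keeps the initial segment times (an inference, not something the table states).

```python
# Values copied from Table 4; the layout and names are illustrative assumptions.
TABLE_4 = {
    "C-PDDP": {"waypoint_error_m": 1.06, "flight_time_s": 23.67, "energy": 2099.0},
    "C-DDP": {"waypoint_error_m": 7.34, "flight_time_s": 27.00, "energy": 4901.9},
}


def relative_reduction(baseline: float, value: float) -> float:
    """Return the percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


# Percentage reduction of each metric when moving from C-DDP to C-PDDP.
reductions = {
    key: relative_reduction(TABLE_4["C-DDP"][key], TABLE_4["C-PDDP"][key])
    for key in TABLE_4["C-DDP"]
}

# Three trajectory segments (per the abstract) at the 9 s guess from Table 3.
fixed_total_time_s = 3 * 9.0

for key, pct in reductions.items():
    print(f"{key}: {pct:.1f}% lower with C-PDDP")
print(f"fixed-time total: {fixed_total_time_s:.2f} s")
```

Under these numbers, C-PDDP lowers the maximum waypoint error by about 85.6%, the total flight time by about 12.3%, and the energy consumption by about 57.2% relative to C-DDP.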

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
