Article

Approximate Optimal Curve Path Tracking Control for Nonlinear Systems with Asymmetric Input Constraints

College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
*
Author to whom correspondence should be addressed.
Drones 2022, 6(11), 319; https://doi.org/10.3390/drones6110319
Submission received: 5 September 2022 / Revised: 20 October 2022 / Accepted: 21 October 2022 / Published: 26 October 2022
(This article belongs to the Section Drone Design and Development)

Abstract

This paper proposes an approximate optimal curve-path-tracking control algorithm for partially unknown nonlinear systems subject to asymmetric control input constraints. Firstly, the problem is simplified by introducing a feedforward control law, and a dedicated design for optimal control with asymmetric input constraints is provided by redesigning the control cost function in a non-quadratic form. Then, the optimality and stability of the derived optimal control policy are demonstrated. To solve the underlying tracking Hamilton–Jacobi–Bellman (HJB) equation for partially unknown systems, an integral reinforcement learning (IRL) algorithm is utilized with neural network (NN)-based value function approximation. Finally, the effectiveness and generalization of the proposed method are verified by experiments carried out on a high-fidelity hardware-in-the-loop (HIL) simulation system for fixed-wing unmanned aerial vehicles (UAVs), in comparison with three other typical path-tracking control algorithms.

1. Introduction

The optimal tracking control problem (OTCP) is of major importance in a variety of applications for robotic systems such as wheeled vehicles, unmanned ground vehicles (UGVs), unmanned aerial vehicles (UAVs), etc. The aim is to find a control policy that drives the specified system to follow a given reference path in an optimal manner [1,2,3,4,5,6]. The reference paths are generally generated by a separate mission planner according to specific tasks, and optimality is usually achieved by minimizing an objective function involving the energy cost, the tracking error cost, and/or the traveling time cost.
With the rapid development of unmanned systems, algorithms to solve OTCPs have been widely studied in the literature. Addressing an OTCP involves solving the underlying Hamilton–Jacobi–Bellman (HJB) equation. For linear systems, the HJB equation reduces to the Riccati equation, and a numerical solution is generally available. However, for nonlinear robotic systems subject to asymmetric input constraints, such as fixed-wing UAVs and autonomous underwater vehicles (AUVs) [7,8,9], it remains a challenging issue. To deal with this difficulty while guaranteeing tracking performance for nonlinear systems, various methods have been developed to find approximate optimal control policies efficiently. One idea is to simplify or transform the objective function to be optimized so as to obtain a solution to an approximate or equivalent optimal control problem. For instance, nonlinear model predictive control (MPC) is used to obtain a near-optimal path-following control law for UAVs by truncating the time horizon and minimizing a finite-horizon tracking objective function in [7,8]. Another idea aims to compute the approximate solution directly. An offline policy iteration (PI) strategy is utilized to obtain the near-optimal solution by iteratively solving a sequence of Bellman equations [10]. However, the abovementioned methods generally require the complete dynamics of the system, and the curse of dimensionality might occur. To deal with this issue, an approximate dynamic programming (ADP) scheme was developed and has received increasing interest in the optimal control area [11,12,13].
ADP, which combines the concept of reinforcement learning (RL) with Bellman's principle of optimality, was first introduced in [11] to handle the curse of dimensionality that might occur in the classical dynamic programming (DP) scheme for solving optimal control problems. The main idea is to approximate the solution to the HJB equation using parametric function approximation techniques, for which a neural network (NN) is the most commonly used scheme, such as a single-NN-based value function approximation or the actor–critic dual-NN structure [14]. For continuous-time nonlinear systems, Ref. [15] proposed a data-based ADP algorithm, also called integral reinforcement learning (IRL), which relaxes the dependence on the internal dynamics of the controlled system and learns the solution to the HJB equation using only partial knowledge of the system dynamics. Since then, the IRL scheme has become widely used in various nonlinear optimal control problems, including optimal tracking control, control with input constraints, and control of unknown or partially unknown systems [7,14,15,16].
IRL-based methods are powerful tools for solving nonlinear optimal control problems. However, the OTCP for nonlinear systems with partially unknown dynamics and asymmetric input constraints, especially for curve path tracking, is still open to study. Firstly, the stability of IRL-based methods for nonlinear constrained systems is generally hard to prove. Moreover, the changing curvature in the curve-path-tracking control problem makes it more difficult to stabilize the tracking error than in the widely studied regulation control or circular path-tracking control problems. Finally, asymmetric input constraints are more difficult to deal with than the commonly discussed symmetric constraints.
Motivated by the desire to solve the OTCP with a curve path for partially unknown nonlinear systems with asymmetric input constraints, this paper introduces a feedforward control law to simplify the problem, redesigns the control input cost function in a non-quadratic form, and utilizes an NN-based IRL scheme to solve for an approximate optimal control policy. The three main contributions are:
  • An approximate optimal curve-path-tracking control policy is developed for nonlinear systems with a feedforward control law, which handles the time-varying dynamics of the reference states caused by the curvature variation, and a data-driven IRL algorithm is developed to solve the approximate optimal control policy, in which a single-NN structure for value function approximation is utilized, reducing the computation burden and simplifying the algorithm structure.
  • The non-quadratic control cost function is redesigned via a constraint transformation with the introduced feedforward control law, which solves the challenge of asymmetric control input constraints that traditional methods cannot handle directly, and satisfactory input constraints are guaranteed with proof.
  • The proposed approximate optimal path-tracking control algorithm is validated via hardware-in-the-loop (HIL) simulations for fixed-wing UAVs in comparison with three other typical path-tracking algorithms. The results show that the proposed algorithm not only has much less fluctuation and a smaller root mean squared error (RMSE) of the tracking error but also naturally satisfies the control input constraints.

2. Problem Formulation

This section briefly formulates the OTCP of nonlinear systems subject to asymmetric control input constraints.
Consider the following affine nonlinear kinematic systems:
$$\dot{x}_k(t) = f_k(x_k(t)) + g_k(x_k(t))\,u(t),$$
where $x_k \in \mathbb{R}^{n_1}$ is the vector of system motion states that we focus on, $f_k(\cdot): \mathbb{R}^{n_1} \to \mathbb{R}^{n_1}$ is the internal kinematic dynamics, $g_k(\cdot): \mathbb{R}^{n_1} \to \mathbb{R}^{n_1 \times m}$ is the control input dynamics of the system, and $u \in \mathbb{R}^m$ is the control input, which is constrained by
$$\lambda_j^{min} \le u_j \le \lambda_j^{max}, \quad j = 1, \ldots, m,$$
where $\lambda_j^{min}$ and $\lambda_j^{max}$ are the minimum and maximum thresholds of the control input $u_j$, determined by the characteristics of the actuator and not necessarily satisfying $\lambda_j^{min} = -\lambda_j^{max}$.
Remark 1.
The asymmetric control input constraint (2) is widespread in practical systems, such as fixed-wing UAVs and autonomous underwater vehicles (AUVs) [1,7,8,9,17]. For these systems, existing control algorithms that consider only symmetric input constraints cannot be utilized directly.
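As a concrete illustration (with hypothetical numbers loosely based on the fixed-wing settings used later in Section 5), the following minimal Python sketch encodes such an asymmetric bound, where the lower airspeed limit is a positive stall speed rather than the negative of the upper limit:

```python
import numpy as np

# Hypothetical asymmetric bounds for u = [airspeed (m/s), heading rate (rad/s)]:
# the airspeed must stay between a positive stall speed and a maximum speed,
# so lam_min != -lam_max for that component.
lam_min = np.array([14.0, -0.6])
lam_max = np.array([24.0, 0.6])

def satisfies_constraint(u):
    """Component-wise check of the asymmetric input constraint (2)."""
    return bool(np.all((lam_min <= u) & (u <= lam_max)))

print(satisfies_constraint(np.array([19.0, 0.2])))   # True
print(satisfies_constraint(np.array([10.0, 0.2])))   # False: below the stall speed
```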
This paper studies the OTCP with curve paths for system (1) under input constraint (2). Thus, we focus on the tracking performance of the above motion states $x_k$ with respect to the reference motion states $x_k^d$ specified by the corresponding virtual target point (VTP) $p_d$ on the reference path. Then, the considered tracking control system is described as
$$\begin{aligned} \dot{x}_e &= f_e(x_e, x_d) + g_e(x_e, x_d)\,u, \\ \dot{x}_d &= f_d(x_d), \end{aligned}$$
where $x_e = x_k - x_k^d$ is the tracking error state, $x_d = [x_k^d, x_c^d] \in \mathbb{R}^{n_2}$ is the bounded state vector related to the reference motion states and is not subject to control, $x_k^d \in \mathbb{R}^{n_1}$ is the reference motion state, and $x_c^d \in \mathbb{R}^{n_2 - n_1}$ describes some other related system variables, with $n_2 - n_1 \ge 0$. The continuous-time functions $f_e(\cdot)$ and $g_e(\cdot)$ are the internal dynamics and control input dynamics of the tracking error system, and $f_d(x_d)$ is the dynamics of the reference states, determined by the task setting. Clearly, the specific forms of $f_e(\cdot)$ and $g_e(\cdot)$ are closely related to the specific $f_d(\cdot)$. For the tracking control problem of system (3), the complete system state is denoted as $x = [x_e, x_d]$, so $x \in \mathbb{R}^n$ with $n = n_1 + n_2$.
Remark 2.
Suppose that the reference path is generated by a separate mission planner, and that $x_c^d$ describes system dynamic parameters determined by the task setting, such as the moving speed of the VTP along the reference path. Then, it is reasonable to suppose that $f_d(\cdot)$, which describes the shape of the reference path as well as the motion dynamics of the reference point along the path, is known.
Then, in the curve-path-tracking control problem, given the reference motion state $x_k^d$ corresponding to $p_d$, denote the curvature of the reference path at this point as $\kappa_d$ and the speed of the point moving along the path as $v_d$. The dynamics of the reference states can then be described more specifically as
$$\dot{x}_d = f_d(x_d) = \begin{bmatrix} \dot{x}_k^d \\ \dot{v}_d \\ \dot{\kappa}_d \end{bmatrix} = \begin{bmatrix} f_k^d(x_k^d, \kappa_d, v_d) \\ f_v^d(x_k^d) \\ f_\kappa^d(x_k^d) \end{bmatrix}.$$
Then, the control objective is to find an optimal control policy $u^*$ that drives the tracking error $x_e$ to converge to $0$ at the least cost. To this end, take the objective function as
$$J(x(0), u) = \int_0^{\infty} \big[ E(x) + U(u) \big]\, d\tau, \quad x(0) \in \mathcal{X},$$
where $\mathcal{X} \subset \mathbb{R}^n$ is a compact set containing the origin of the tracking error, $E(x) = x_e^{\top}(t)\, Q\, x_e(t)$ is the quadratic tracking error cost with a positive definite diagonal matrix $Q$, and $U(u)$ is the positive semi-definite control cost to be designed.
Now, referring to the concept in optimal control theory in [18], we define the admissible control for OTCP as follows.
Definition 1.
A control policy $u(t) = \mu(x(t))$ is said to be admissible, denoted as $u(t) \in \mathcal{U}$, with respect to the objective function (5) for the tracking control system (3), if $\mu(x(t))$ is continuous on $\mathcal{X}$ and satisfies constraint (2), and the corresponding state trajectory $x(t)$ yields $J(x(0)) < \infty$, $\forall x(0) \in \mathcal{X}$.
Then, the main objective of this paper is to find the optimal control policy $u^* \in \mathcal{U}$ that minimizes the objective function (5). Before we illustrate the design for solving $u^*$, the following assumption is made.
Assumption 1.
For any initial state $x(0) \in \mathcal{X}$, given the dynamic function $f_d(\cdot)$ of the reference state, there exists an admissible control $u^{(0)} \in \mathcal{U}$, i.e., a $u^{(0)}$ that satisfies constraint (2), is continuous in $x$ on the set $\mathcal{X}$, and stabilizes the tracking error in (3).

3. Optimal Control Design for Curve Path Tracking with Asymmetric Control Input Constraints

To find the optimal curve-path-tracking control policy $u^*$ for system (3), this section first introduces a feedforward control law that helps deal with the variation of the reference state dynamics. Then, a dedicated design for the control cost function, which enables natural satisfaction of the asymmetric input constraint (2), is proposed.
Note that the main difficulty of curve-path-tracking control, compared with regulation or straight/circular path tracking, is that the dynamics of the reference motion states $x_k^d$ are time-varying because of the varying curvature of the reference path. To drive the tracking error to converge to $0$, we need $\dot{x}_e = 0$ whenever $x_e = 0$. The point is that, unlike the regulation control problem, a non-zero steady-state control law (denoted as $\bar{u}$) is needed because of the varying dynamics of $x_k^d$, such that
$$\dot{x}_e(t)\big|_{x_e = 0} = f_e(0, x_d(t)) + g_e(0, x_d(t))\,\bar{u}(0, x_d(t)) = 0.$$
It is easy to see that this non-zero steady-state control input $\bar{u}$ mainly depends on the dynamics of the reference states. Therefore, we rewrite the dynamic function of the reference motion state in (4) in the following form:
$$\dot{x}_k^d = f_k^d(x_k^d, \kappa_d, v_d) = f_k(x_k^d) + g_k(x_k^d)\,u_d(x_k^d, \kappa_d, v_d).$$
Substituting (7) and (1) into (6), we obtain
$$\bar{u}(0, x_d) = u_d(x_k^d, \kappa_d, v_d).$$
Then, for $x_e \ne 0$, we extend the above result to define the feedforward control $\bar{u}(x)$ as
$$\bar{u}(x \,|\, x_d) = u_d(x_k^d, \kappa_d, v_d).$$
Remark 3.
  • The rewriting of (7) is reasonable for practical robotic systems, since the reference state and the associated constraint conditions are properly accounted for by the separate mission planner; this will be illustrated by examples in the later experiments.
  • The feedforward control law $\bar{u}$ alone is not an admissible control policy, since it cannot drive a non-zero tracking error to $0$; rather, it is taken as a part of the complete control policy for the tracking control system.
Now, this paper explains how to solve for the desired optimal tracking control strategy $u^*$ that satisfies the asymmetric control input constraint (2) in a simplified way by using $\bar{u}$.
Given the dynamic function $f_d(\cdot)$ of the reference states, $\bar{u}$ can be obtained in real time according to (8). Then, the complete tracking control policy can be described as
$$u \triangleq \bar{u} + \tilde{u},$$
where $\tilde{u}$ is the feedback control to be solved. Substituting this control into the tracking error state equation in (3) gives
$$\dot{x}_e = f_e(x_e, x_d) + g_e(x_e, x_d)(\bar{u} + \tilde{u}) = \bar{f}_e(x_e, x_d) + g_e(x_e, x_d)\,\tilde{u},$$
where
$$\bar{f}_e(x_e, x_d) = f_e(x_e, x_d) + g_e(x_e, x_d)\,\bar{u}.$$
Thus, it holds that $\bar{f}_e(0, x_d) = 0$. Then, solving for the optimal control policy $u^*$ is equivalent to solving for the optimal feedback control $\tilde{u}^* \triangleq u^* - \bar{u}$.
Therefore, in consideration of the control input constraint (2) and referring to [10,16,19], the control cost in (5) is designed as
$$U(u) = \tilde{U}(\tilde{u}) = 2 \sum_{j=1}^{m} \int_0^{\tilde{u}_j} \tilde{\lambda}_j\, r_j \tanh^{-1}(s / \tilde{\lambda}_j)\, ds,$$
which is a positive semi-definite function whose value grows with the absolute value of each control input component $\tilde{u}_j$. As part of the objective function, it therefore helps find an energy-optimal solution; $r_j > 0$ is the weight coefficient for component $j$. The main difference between (10) and the cost used in [10,16,19] is that the threshold parameter in the integrand, i.e., $\tilde{\lambda}_j \ge 0$, is not a constant obtained directly from a symmetric control constraint but is redefined for the asymmetric constraint (2) using the introduced feedforward control law as
$$\tilde{\lambda}_j = \begin{cases} -(\lambda_j^{min} - \bar{u}_j), & \text{if } \tilde{u}_j < 0, \\ \lambda_j^{max} - \bar{u}_j, & \text{if } \tilde{u}_j \ge 0. \end{cases}$$
This design allows for the natural satisfaction of the asymmetric control input constraint (2), as will be shown in Lemma 1.
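To make (10) and (11) concrete, the following sketch (illustrative values only; the helper names are ours, not the authors') builds $\tilde{\lambda}$ from the asymmetric bounds and the feedforward control and evaluates the non-quadratic cost by simple numerical quadrature:

```python
import numpy as np

lam_min = np.array([14.0, -0.6])   # illustrative asymmetric bounds (cf. Section 5)
lam_max = np.array([24.0, 0.6])
r = np.array([1.0, 1.0])           # control weights r_j

def lam_tilde(u_tilde, u_bar):
    """Thresholds (11): chosen per component according to the sign of u_tilde_j."""
    return np.where(u_tilde < 0.0, -(lam_min - u_bar), lam_max - u_bar)

def control_cost(u_tilde, u_bar, n_quad=401):
    """Non-quadratic cost (10), evaluated with the trapezoid rule per component."""
    lt = lam_tilde(u_tilde, u_bar)
    total = 0.0
    for j in range(len(u_tilde)):
        s = np.linspace(0.0, u_tilde[j], n_quad)
        y = lt[j] * r[j] * np.arctanh(s / lt[j])
        total += 2.0 * float(np.sum((y[1:] + y[:-1]) * np.diff(s)) / 2.0)
    return total

u_bar = np.array([19.0, 0.1])      # feedforward control, e.g. [v_d, v_d * kappa_d]
print(control_cost(np.array([2.0, -0.3]), u_bar))   # small positive cost
```

Note that the integrand grows without bound as $|\tilde{u}_j|$ approaches $\tilde{\lambda}_j$, which is exactly what discourages the feedback term from pushing the total input toward its bounds.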
Then, for the tracking control system (3) subject to the asymmetric control input constraint (2), given an initial state $x(0) \in \mathcal{X}$ and the objective function (5) with (10), we define the optimal value function $V^*(x) \in C^1$ as
$$V^*(x(t)) = \min_{u \in \mathcal{U}} J(x(t), u) = \min_{\tilde{u} \,|\, \bar{u} + \tilde{u} \in \mathcal{U}} J(x(t), \tilde{u}).$$
Correspondingly, the Hamiltonian is constructed as
$$H(x, \tilde{u}, \nabla V^*) = E(x) + \tilde{U}(\tilde{u}) + (\nabla V^*)^{\top} \dot{x} = E(x) + \tilde{U}(\tilde{u}) + (\nabla_{x_e} V^*)^{\top} \big[ \bar{f}_e(x_e, x_d) + g_e(x_e, x_d)\,\tilde{u} \big],$$
where $\nabla V^* = \partial V^* / \partial x$ and $\nabla_{x_e} V^* = \partial V^* / \partial x_e$. Then, according to the principle of optimality, $\tilde{u}^*$ satisfies
$$H(x, \tilde{u}^*, \nabla V^*) = E(x) + \tilde{U}(\tilde{u}^*) + (\nabla_{x_e} V^*)^{\top} \big[ \bar{f}_e(x_e, x_d) + g_e(x_e, x_d)\,\tilde{u}^* \big] = 0.$$
Then, using the stationarity condition, the optimal feedback control $\tilde{u}^*$ can be obtained as
$$\tilde{u}^* = -\Lambda \tanh\!\left( \tfrac{1}{2} (\Lambda R)^{-1} g_e^{\top}(x)\, \nabla_{x_e} V^* \right),$$
where $\Lambda, R \in \mathbb{R}^{m \times m}$ are diagonal matrices constructed from $\tilde{\lambda}_j$ and $r_j$ ($j \in \{1, \ldots, m\}$), respectively, i.e.,
$$\Lambda = \begin{bmatrix} \tilde{\lambda}_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \tilde{\lambda}_m \end{bmatrix}, \qquad R = \begin{bmatrix} r_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & r_m \end{bmatrix}.$$
Then, the optimal tracking control policy $u^* \triangleq \bar{u} + \tilde{u}^*$ is
$$u^* = \bar{u} - \Lambda \tanh\!\left( \tfrac{1}{2} (\Lambda R)^{-1} g_e^{\top}(x)\, \nabla_{x_e} V^* \right).$$
Substituting $\tilde{u}^*$ into (10), we obtain the optimal control cost
$$\tilde{U}(\tilde{u}^*) = 2 \sum_{j=1}^{m} \int_0^{\tilde{u}_j^*} \tilde{\lambda}_j\, r_j \tanh^{-1}(s / \tilde{\lambda}_j)\, ds = (\nabla_{x_e} V^*)^{\top} g_e(x)\, \Lambda \tanh(D^*) + \operatorname{diag}^{\top}(\Lambda R \Lambda) \ln\!\big( \mathbf{1}_2 - \tanh^2(D^*) \big),$$
where $D^* = \tfrac{1}{2} (\Lambda R)^{-1} g_e^{\top}(x)\, \nabla_{x_e} V^*$, $\operatorname{diag}(\cdot)$ denotes the vector formed by the main diagonal elements of a matrix, and $\mathbf{1}_2 = (1, 1)^{\top}$.
Further, substituting (16) into (13), the tracking HJB equation becomes
$$E(x) + (\nabla_{x_e} V^*)^{\top} \bar{f}_e(x_e, x_d) + \operatorname{diag}^{\top}(\Lambda R \Lambda) \ln\!\big( \mathbf{1}_2 - \tanh^2(D^*) \big) = 0.$$
Then, if one can obtain the solution $V^*$ by solving (17), (15) provides the desired optimal tracking control policy.
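The resulting policy (15) is straightforward to evaluate once $\nabla_{x_e} V^*$ is available. The following sketch (our own shapes and names, not the authors' implementation) shows the computation; the thresholds $\tilde{\lambda}_j$ passed in are assumed to have been chosen according to (11):

```python
import numpy as np

def optimal_control(u_bar, lam_tilde_vec, r, g_e, grad_V_xe):
    """Constrained optimal policy (15): u* = u_bar - Lambda tanh(0.5 (Lambda R)^-1 g_e^T grad_V).

    u_bar         : (m,)    feedforward control
    lam_tilde_vec : (m,)    thresholds from (11), assumed sign-consistent
    r             : (m,)    control weights r_j
    g_e           : (n1, m) input dynamics of the tracking error system
    grad_V_xe     : (n1,)   gradient of the value function w.r.t. x_e
    """
    Lam = np.diag(lam_tilde_vec)
    R = np.diag(r)
    D = 0.5 * np.linalg.solve(Lam @ R, g_e.T @ grad_V_xe)
    return u_bar - Lam @ np.tanh(D)
```

Because $\tanh$ is bounded by one, each component of the feedback term stays within $\pm\tilde{\lambda}_j$, which is exactly what Lemma 1 below exploits.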
Now we propose the following lemma.
Lemma 1.
With the non-quadratic control cost function (10), the optimal control policy $u^*$ in (15) satisfies the asymmetric constraint (2) naturally.
Proof. 
Under Assumption 1, there exists an admissible control $u^{(0)} \in \mathcal{U}$ such that
$$\lambda_j^{min} \le u_j^{(0)} \le \lambda_j^{max}.$$
Denote $u^{(0)}$ as
$$u^{(0)} = \tilde{u}^{(0)} + \bar{u}.$$
Since $u^{(0)}$ is an admissible control law, according to Definition 1, there must be
$$\dot{x}_e(t)\big|_{x_e = 0} = f_e(0, x_d(t)) + g_e(0, x_d(t))\,u^{(0)}(t) = f_e(0, x_d(t)) + g_e(0, x_d(t)) \big[ \tilde{u}^{(0)} + \bar{u} \big] = 0 + g_e(0, x_d(t))\,\tilde{u}^{(0)} = 0.$$
Thus, we have
$$\tilde{u}^{(0)}(0, x_d) = 0.$$
Substituting $\tilde{u}^{(0)}$ into (18) gives
$$\lambda_j^{min} \le \bar{u}_j(0, x_d) \le \lambda_j^{max}.$$
Then, according to the definition of $\tilde{\lambda}_j$ in (11) and the extended feedforward control defined in (8), we have
$$\tilde{\lambda}_j \ge 0.$$
Since $-1 \le \tanh(\cdot) \le 1$, according to (14) and (11), the feedback control $\tilde{u}^*$ satisfies
$$\begin{cases} \tilde{u}_j^* \ge -\tilde{\lambda}_j = \lambda_j^{min} - \bar{u}_j, & \text{when } \tilde{u}_j^* < 0, \\ \tilde{u}_j^* \le \tilde{\lambda}_j = \lambda_j^{max} - \bar{u}_j, & \text{when } \tilde{u}_j^* \ge 0. \end{cases}$$
Then, combining (20) with (15), we have
$$\lambda_j^{min} \le u_j^* \le \lambda_j^{max}.$$
This completes the proof.    □
Next, the following theorem provides the optimality and stability analysis of u * .
Theorem 1.
For the tracking control system (3), given the dynamics function $f_d(\cdot)$ of the reference state, an initial state $x(0) \in \mathcal{X}$, and the objective function (5) with (10), assume $V^*$ is a smooth positive definite solution to (17). Then, the optimal control policy given by (15) has the following properties:
  • $\forall u \in \mathcal{U}$, $u^*$ minimizes the objective function $J(x(0), u)$;
  • $u^*$ asymptotically stabilizes the tracking error $x_e$.
Proof. 
First, we prove that u * minimizes the objective function J.
Given the initial state $x(0)$ and the solution of the HJB equation (17) as $V^*$, it holds that
$$\int_0^{\infty} \dot{V}^*(x(t))\, dt = -V^*(x(0)).$$
Thus, for any admissible control $u = \bar{u} + \tilde{u}$, the corresponding objective function (5) can be represented as
$$J(x(0), u) = \int_0^{\infty} \big[ E(x) + \tilde{U}(\tilde{u}) \big]\, dt + \int_0^{\infty} \dot{V}^*(x(t))\, dt + V^*(x(0)).$$
Differentiating $V^*$ along the state trajectory corresponding to $u$, we have
$$\dot{V}^*(x(t)) = (\nabla V^*)^{\top} \dot{x} = (\nabla_{x_e} V^*)^{\top} \big[ \bar{f}_e(x_e, x_d) + g_e(x_e, x_d)\,\tilde{u} \big],$$
and
$$J(x(0), u) = \int_0^{\infty} \big[ E(x) + \tilde{U}(\tilde{u}) \big]\, d\tau + \int_0^{\infty} (\nabla_{x_e} V^*)^{\top} \big[ \bar{f}_e(x_e, x_d) + g_e(x_e, x_d)\,\tilde{u} \big]\, d\tau + V^*(x(0)).$$
Adding and subtracting $\int_0^{\infty} (\nabla_{x_e} V^*)^{\top} g_e(x)\, \tilde{u}^*\, d\tau$ and $\int_0^{\infty} \tilde{U}(\tilde{u}^*)\, d\tau$ on the right-hand side of the equation gives
$$\begin{aligned} J(x(0), u) = {}& \int_0^{\infty} \big[ E(x) + \tilde{U}(\tilde{u}^*) \big]\, d\tau + \int_0^{\infty} (\nabla_{x_e} V^*)^{\top} \big[ \bar{f}_e(x_e, x_d) + g_e(x_e, x_d)\,\tilde{u}^* \big]\, d\tau + V^*(x(0)) \\ &+ \int_0^{\infty} (\nabla_{x_e} V^*)^{\top} g_e(x_e, x_d)\, (\tilde{u} - \tilde{u}^*)\, d\tau + \int_0^{\infty} \tilde{U}(\tilde{u})\, d\tau - \int_0^{\infty} \tilde{U}(\tilde{u}^*)\, d\tau. \end{aligned}$$
Then, combining (24) with the HJB equation (13), we further obtain
$$\begin{aligned} J(x(0), u) = {}& \int_0^{\infty} H(x, \tilde{u}^*, \nabla V^*)\, d\tau + V^*(x(0)) + \int_0^{\infty} (\nabla_{x_e} V^*)^{\top} g_e(x_e, x_d)\, (\tilde{u} - \tilde{u}^*)\, d\tau + \int_0^{\infty} \tilde{U}(\tilde{u})\, d\tau - \int_0^{\infty} \tilde{U}(\tilde{u}^*)\, d\tau \\ = {}& V^*(x(0)) + \int_0^{\infty} \Big[ (\nabla_{x_e} V^*)^{\top} g_e(x)\, (\tilde{u} - \tilde{u}^*) + 2 \sum_{j=1}^{m} \int_{\tilde{u}_j^*}^{\tilde{u}_j} \tilde{\lambda}_j\, r_j \tanh^{-1}(s / \tilde{\lambda}_j)\, ds \Big]\, d\tau. \end{aligned}$$
Denote
$$M = (\nabla_{x_e} V^*)^{\top} g_e(x)\, (\tilde{u} - \tilde{u}^*) + 2 \sum_{j=1}^{m} \int_{\tilde{u}_j^*}^{\tilde{u}_j} \tilde{\lambda}_j\, r_j \tanh^{-1}(s / \tilde{\lambda}_j)\, ds.$$
Then, to prove that $u^*$ minimizes $J$, one needs to prove that $M > 0$ for all admissible controls $u \ne u^*$, and that $M = 0$ if and only if $u = u^*$.
Based on (14), there is
$$(\nabla_{x_e} V^*)^{\top} g_e(x) = -2 \big( \Lambda R \tanh^{-1}(\Lambda^{-1} \tilde{u}^*) \big)^{\top}.$$
Then, substituting (27) into $M$, we obtain
$$\begin{aligned} M & = 2 \big( \Lambda R \tanh^{-1}(\Lambda^{-1} \tilde{u}^*) \big)^{\top} (\tilde{u}^* - \tilde{u}) + 2 \sum_{j=1}^{m} \int_{\tilde{u}_j^*}^{\tilde{u}_j} \tilde{\lambda}_j\, r_j \tanh^{-1}(s / \tilde{\lambda}_j)\, ds \\ & = 2 \sum_{j=1}^{m} r_j \left[ \tilde{\lambda}_j \tanh^{-1}(\tilde{u}_j^* / \tilde{\lambda}_j)\, (\tilde{u}_j^* - \tilde{u}_j) + \int_{\tilde{u}_j^*}^{\tilde{u}_j} \tilde{\lambda}_j \tanh^{-1}(s / \tilde{\lambda}_j)\, ds \right]. \end{aligned}$$
To aid the analysis, define a function $\varsigma_a(x_1, x_2)$ as
$$\varsigma_a(x_1, x_2) = a \tanh^{-1}(x_1 / a)\, (x_1 - x_2) + \int_{x_1}^{x_2} a \tanh^{-1}(s / a)\, ds,$$
where $a > 0$ and $-a \le x_1, x_2 \le a$. Since $\tanh^{-1}(\cdot)$ is monotonically increasing, when $x_1 < x_2$, there must exist $\hat{x} \in (x_1, x_2)$ such that
$$a \tanh^{-1}(\hat{x} / a)\, (x_2 - x_1) = \int_{x_1}^{x_2} a \tanh^{-1}(s / a)\, ds,$$
with $\tanh^{-1}(\hat{x} / a) > \tanh^{-1}(x_1 / a)$. Then, substituting $\tanh^{-1}(\hat{x} / a)$ into $\varsigma_a(x_1, x_2)$ gives
$$\varsigma_a(x_1, x_2) = a \tanh^{-1}(x_1 / a)\, (x_1 - x_2) + a \tanh^{-1}(\hat{x} / a)\, (x_2 - x_1) = a \big[ \tanh^{-1}(\hat{x} / a) - \tanh^{-1}(x_1 / a) \big] (x_2 - x_1) > 0.$$
Likewise, when $x_1 > x_2$, there must also exist $\hat{x} \in (x_2, x_1)$ such that
$$a \tanh^{-1}(\hat{x} / a)\, (x_1 - x_2) = -\int_{x_1}^{x_2} a \tanh^{-1}(s / a)\, ds,$$
with $\tanh^{-1}(x_1 / a) > \tanh^{-1}(\hat{x} / a)$. Then, substituting $\tanh^{-1}(\hat{x} / a)$ into $\varsigma_a(x_1, x_2)$, we have
$$\varsigma_a(x_1, x_2) = a \tanh^{-1}(x_1 / a)\, (x_1 - x_2) - a \tanh^{-1}(\hat{x} / a)\, (x_1 - x_2) = a \big[ \tanh^{-1}(x_1 / a) - \tanh^{-1}(\hat{x} / a) \big] (x_1 - x_2) > 0.$$
Further, when $x_1 = x_2$, it holds that $\varsigma_a(x_1, x_2) = 0$. That is, $\varsigma_a(x_1, x_2) = 0$ only when $x_1 = x_2$, and $\varsigma_a(x_1, x_2) > 0$ when $x_1 \ne x_2$.
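The proof only needs $\varsigma_a(x_1, x_2) \ge 0$ with equality exactly at $x_1 = x_2$; a quick numerical sanity check of this property (not part of the proof, with arbitrarily chosen $a$) can be run as follows:

```python
import numpy as np

def varsigma(a, x1, x2, n=401):
    """varsigma_a(x1, x2) = a*atanh(x1/a)*(x1 - x2) + integral_{x1}^{x2} a*atanh(s/a) ds."""
    s = np.linspace(x1, x2, n)
    y = a * np.arctanh(s / a)
    integral = float(np.sum((y[1:] + y[:-1]) * np.diff(s)) / 2.0)   # trapezoid rule
    return a * np.arctanh(x1 / a) * (x1 - x2) + integral

a = 0.7
rng = np.random.default_rng(0)
pairs = rng.uniform(-0.95 * a, 0.95 * a, size=(1000, 2))
values = [varsigma(a, x1, x2) for x1, x2 in pairs]
print(min(values) >= -1e-6)          # non-negative up to quadrature error
print(abs(varsigma(a, 0.3, 0.3)))    # exactly zero when x1 == x2
```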
Combining the above conclusion with (28), $M$ can be represented as
$$M = 2 \sum_{j=1}^{m} r_j\, \varsigma_{\tilde{\lambda}_j}(\tilde{u}_j^*, \tilde{u}_j).$$
Then, there is
$$\begin{cases} M > 0, & \text{if } \exists\, j \in \{1, \ldots, m\} \text{ s.t. } \tilde{u}_j \ne \tilde{u}_j^*, \\ M = 0, & \text{if } \forall\, j \in \{1, \ldots, m\},\ \tilde{u}_j = \tilde{u}_j^*. \end{cases}$$
Therefore, $J(x(0), u) \ge V^*(x(0))$ holds for all $u \in \mathcal{U}$, in which the equality holds only when $u = u^* \triangleq \bar{u} + \tilde{u}^*$.
Next, we prove that the tracking error $x_e$ is asymptotically stabilized under $u^*$.
Note that $V^*(x)$ is a positive semi-definite function. Taking $V^*(x)$ as the Lyapunov function of the tracking control system (3), there is
$$\dot{V}^*(x(t)) = -x_e^{\top} Q\, x_e - \tilde{U}(\tilde{u}^*) \le 0.$$
It is known from the proof of Lemma 1 that $\tilde{u}^*(0, x_d) = 0$; thus, the equality in (33) holds only if $x_e = 0$. Therefore, $u^*$ asymptotically stabilizes $x_e$.
This completes the proof.    □

4. IRL-Based Approximate Optimal Solution

The previous section provided the design of the optimal tracking control policy $u^*$. However, solving for $u^*$ involves solving the HJB equation (17), which is highly nonlinear in $V^*$. In consideration of the difficulty of solving (17), this section provides an NN-based IRL algorithm to obtain an approximate optimal solution.
With the optimal value function denoted as $V^*$, the following integral form of the value function is adopted according to the idea of IRL:
$$V^*(x(t)) = \int_t^{t+T} \big[ x_e^{\top} Q\, x_e + \tilde{U}(\tilde{u}^*) \big]\, d\tau + V^*(x(t + T)),$$
where $T > 0$ is the integral reinforcement interval.
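In practice, the integral on the right-hand side is computed from logged samples over each interval $[t, t+T]$. A minimal sketch (sample-based, with helper names of our own choosing) is:

```python
import numpy as np

def integral_reinforcement(xe_traj, u_tilde_traj, dt, Q, cost_fn):
    """Sample-based estimate of int_t^{t+T} (x_e^T Q x_e + U_tilde(u_tilde)) d tau.

    xe_traj      : (N, n1) tracking-error samples over one interval of length T = N*dt
    u_tilde_traj : (N, m)  feedback-control samples over the same interval
    cost_fn      : callable returning the non-quadratic control cost U_tilde(u_tilde)
    """
    stage = np.array([xe @ Q @ xe + cost_fn(ut) for xe, ut in zip(xe_traj, u_tilde_traj)])
    return float(np.sum(stage) * dt)   # rectangle rule; a trapezoid rule also works
```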
Then the IRL-based PI Algorithm 1 is presented as follows.
Algorithm 1 IRL-based optimal path-tracking algorithm
1: Policy evaluation (NN weights update):
$$V^{(k)}(x(t)) = \int_t^{t+T} \big[ x_e^{\top} Q\, x_e + \tilde{U}(\tilde{u}^{(k)}) \big]\, d\tau + V^{(k)}(x(t + T)).$$
2: Policy improvement:
$$u^{(k+1)} = -\Lambda \tanh\!\big( D^{(k)} \big) + \bar{u},$$
where $D^{(k)} = \tfrac{1}{2} (\Lambda R)^{-1} g_e^{\top}(x)\, \nabla_{x_e} V^{(k)}$.
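Stripped of the NN details introduced below, Algorithm 1 is an alternation of the two steps above. The following skeleton is purely schematic; `evaluate_value` and `improve_policy` are placeholders for the policy evaluation (35) and policy improvement (36) routines:

```python
import numpy as np

def irl_policy_iteration(evaluate_value, improve_policy, u0, n_iters=30, tol=1e-3):
    """Schematic of Algorithm 1: alternate policy evaluation and improvement.

    evaluate_value(policy) -> value-function parameters (e.g. critic NN weights W)
    improve_policy(W)      -> improved policy via (36), including the feedforward term
    """
    policy, W_prev = u0, None
    for k in range(n_iters):
        W = evaluate_value(policy)        # policy evaluation, Equation (35)
        policy = improve_policy(W)        # policy improvement, Equation (36)
        if W_prev is not None and np.max(np.abs(W - W_prev)) < tol:
            break                         # parameters have converged
        W_prev = W
    return policy, W
```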
Remark 4.
Equation (34) is equivalent to the HJB equation (17) in the sense that (34) and (17) have the same positive definite solution $V^*$. Moreover, according to the results for the traditional PI algorithm, given an initial admissible control $u^{(0)}$, iteratively solving (35) for $V^{(k)}$ and improving the policy with (36) yields an admissible control $u^{(k+1)}$ for all $k \ge 0$, and as $k \to \infty$, $u^{(k)}$ and $V^{(k)}$ converge uniformly to $u^*$ and $V^*$ [10,20].
To implement Algorithm 1, this paper introduces a single-layer NN with p neurons to approximate the value function:
$$V^{(k)}(x) = \big( W_c^{(k)} \big)^{\top} \sigma(x) + \varepsilon(x),$$
and
$$\nabla V^{(k)}(x) = \nabla \sigma^{\top}(x)\, W_c^{(k)} + \nabla \varepsilon(x),$$
where $W_c^{(k)} \in \mathbb{R}^p$ is the optimal weight vector approximating $V^{(k)}$, $\sigma(\cdot): \mathbb{R}^n \to \mathbb{R}^p$ is the vector of continuously differentiable bounded basis functions, and $\varepsilon$ is the approximation error. According to [10], as the number of neurons $p \to \infty$, the fitting error $\varepsilon$ approaches $0$, and [21] points out that, even when the number of neurons is limited, the fitting error remains bounded. Therefore, $\varepsilon$ and $\nabla \varepsilon$ are bounded over the compact set $\mathcal{X}$, i.e., there exist constants $b_{\varepsilon} > 0$ and $b_{\nabla \varepsilon} > 0$ such that $|\varepsilon(x)| \le b_{\varepsilon}$ and $\|\nabla \varepsilon(x)\| \le b_{\nabla \varepsilon}$.
Substituting (37) into (35), we obtain the tracking Bellman error as
$$\varepsilon_c^{(k)}(t) = \int_t^{t+T} \big[ x_e^{\top} Q\, x_e + \tilde{U}(\tilde{u}^{(k)}) \big]\, d\tau + \big( W_c^{(k)} \big)^{\top} \Delta \sigma(x(t)),$$
where $\Delta \sigma(x(t)) = \sigma(x(t + T)) - \sigma(x(t))$. There exists a positive constant $\varepsilon_{max}$ such that $|\varepsilon_c^{(k)}(t)| \le \varepsilon_{max}$, $\forall t \ge 0$.
Since the optimal weight vector $W_c^{(k)}$ in (37) is unknown, the value function is approximated in the iteration as
$$\hat{V}^{(k)}(x) = \big( \hat{W}_c^{(k)} \big)^{\top} \sigma(x),$$
where $\hat{W}_c^{(k)}$ is the estimate of $W_c^{(k)}$. Then, the estimated Bellman error is
$$\hat{e}_c^{(k)}(t) = \int_t^{t+T} \big[ x_e^{\top} Q\, x_e + \tilde{U}(\tilde{u}^{(k)}) \big]\, d\tau + \big( \hat{W}_c^{(k)} \big)^{\top} \Delta \sigma(x(t)).$$
To find the best weight vector $W_c^{(k)}$ of $V^{(k)}$, the tuning law for the weight estimate $\hat{W}_c^{(k)}$ should minimize the estimated Bellman error $\hat{e}_c^{(k)}$. Utilizing the gradient descent scheme and considering the objective function $E_c = \tfrac{1}{2} \big( \hat{e}_c^{(k)} \big)^2$, we take the tuning law for the weight vector as
$$\dot{\hat{W}}_c^{(k)} = -\alpha_c\, \delta\, \frac{\partial E_c}{\partial \hat{W}_c^{(k)}} = -\alpha_c \frac{\Delta \sigma(x)}{\big( \Delta \sigma^{\top}(x)\, \Delta \sigma(x) + 1 \big)^2}\, \hat{e}_c^{(k)},$$
where $\alpha_c > 0$ is the learning rate and $\delta = 1 / \big( \Delta \sigma^{\top}(x)\, \Delta \sigma(x) + 1 \big)^2$ is used for normalization [16]. Then, taking the sampling period equal to the integral reinforcement interval $T$, after every $N$ sampling periods the NN weights of the online IRL-based PI for the approximate tracking control policy are updated after the $k$th iteration by
$$\hat{W}_c^{(k+1)} = \hat{W}_c^{(k)} - \alpha_c \frac{1}{N} \sum_{j=0}^{N-1} \frac{\Delta \sigma_j(x)}{\big( \Delta \sigma_j^{\top}(x)\, \Delta \sigma_j(x) + 1 \big)^2}\, \hat{e}_{c,j}^{(k)}.$$
Substituting $\hat{W}_c^{(k+1)}$ into (36), we obtain the improved control policy
$$\hat{u}^{(k+1)} = -\Lambda \tanh\!\big( \hat{D}^{(k+1)} \big) + \bar{u},$$
where $\hat{D}^{(k+1)} = \tfrac{1}{2} (\Lambda R)^{-1} g_e^{\top}(x)\, \nabla \sigma^{\top}(x)\, \hat{W}_c^{(k+1)}$. Then, given an initial approximate weight vector $\hat{W}_c^{(0)}$ corresponding to an admissible initial control $u^{(0)}$, the online IRL-based PI can be performed as in Figure 1.
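A minimal sketch of the batch critic update (43) is given below (array shapes and function names are our own; the Bellman errors are assumed to have been computed as above over the $N$ collected samples):

```python
import numpy as np

def update_critic_weights(W_hat, alpha_c, delta_sigma_batch, bellman_err_batch):
    """Batch critic-weight update (43) using normalized gradient descent.

    W_hat             : (p,)   current weight estimate
    alpha_c           : float  learning rate
    delta_sigma_batch : (N, p) rows Delta_sigma_j = sigma(x(t+T)) - sigma(x(t))
    bellman_err_batch : (N,)   estimated Bellman errors over the same samples
    """
    grad = np.zeros_like(W_hat)
    for dsig, err in zip(delta_sigma_batch, bellman_err_batch):
        grad += dsig / (dsig @ dsig + 1.0) ** 2 * err   # normalized per-sample term
    return W_hat - alpha_c * grad / len(bellman_err_batch)
```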
Remark 5.
Let $u^{(0)}$ be any admissible bounded control policy in the algorithm in Figure 1, and take (42) as the tuning law of the critic NN weights. If $\overline{\Delta \sigma} = \Delta \sigma(x) / \big( \Delta \sigma^{\top}(x)\, \Delta \sigma(x) + 1 \big)$ is persistently exciting (PE), i.e., if there exist $\gamma_1 > 0$ and $\gamma_2 > 0$ such that, $\forall t > 0$,
$$\gamma_1 I \le \int_t^{t+T} \overline{\Delta \sigma}\, \overline{\Delta \sigma}^{\top}\, d\tau \le \gamma_2 I,$$
where $I$ is the identity matrix, then for the bounded reconstruction error $\varepsilon_c^{(k)}$ in (41), the critic weight estimation error $\tilde{W}_c^{(k)} = W_c^{(k)} - \hat{W}_c^{(k)}$ converges exponentially fast to a residual set [13,14,15].

5. Application to Fixed-Wing UAVs

This section verifies the proposed method on the OTCP of curve path tracking for fixed-wing UAVs in HIL simulations, in comparison with three other typical path-tracking algorithms.

5.1. Problem Formulation

The system state of the fixed-wing UAV, denoted by $x_k = (x, y, \psi)$, includes the position of the UAV in the inertial frame, $p = (x, y)$, and the heading angle $\psi$. The control input $u$ comprises the airspeed $u_v$ and the heading rate $u_\omega$, which are constrained by
$$v_{stall} \le u_v \le v_{max}, \qquad -\omega_{max} \le u_\omega \le \omega_{max},$$
where $v_{stall} > 0$ is the minimum stall speed, and $v_{max}$ and $\omega_{max}$ are the maximum speed and heading rate, respectively, determined by the actuator characteristics.
Given the VTP $p_d(t)$ at time $t$, the corresponding reference motion state $x_k^d(t) = (x_d, y_d, \psi_d)$ is designated. Let the VTP move at a constant speed $v_d$ along the reference path, and denote the arc length from the start point along the path as $l$. Given the parameterized function $Q(l)$ of the reference path, the curvature $\kappa_d$ at $p_d(t)$ can be calculated. Then, the reference state dynamics are obtained:
$$\dot{x}_d = \begin{bmatrix} \dot{x}_d \\ \dot{y}_d \\ \dot{\psi}_d \\ \dot{v}_d \\ \dot{\kappa}_d \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ f_{\kappa}^d(x_k^d \,|\, Q(l)) \end{bmatrix} + \begin{bmatrix} \cos \psi_d & 0 \\ \sin \psi_d & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} v_d \\ v_d \kappa_d \end{bmatrix}.$$
Then, the feedforward control law is
$$\bar{u} = \begin{bmatrix} v_d \\ v_d \kappa_d \end{bmatrix}.$$
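For a given parameterized path, $\kappa_d$ and hence the feedforward term can be computed numerically. The sketch below is illustrative only; the finite-difference curvature estimate is our own choice, not the authors' implementation:

```python
import numpy as np

def path_curvature(path, l, h=1e-3):
    """Signed curvature of a planar path p(l) = (x(l), y(l)), estimated by central
    finite differences at parameter l."""
    p_minus, p0, p_plus = path(l - h), path(l), path(l + h)
    d1 = (p_plus - p_minus) / (2.0 * h)            # (x'(l), y'(l))
    d2 = (p_plus - 2.0 * p0 + p_minus) / h ** 2    # (x''(l), y''(l))
    return float((d1[0] * d2[1] - d1[1] * d2[0]) / np.linalg.norm(d1) ** 3)

def feedforward_control(path, l, v_d):
    """Feedforward term u_bar = [v_d, v_d * kappa_d] for the VTP at parameter l."""
    return np.array([v_d, v_d * path_curvature(path, l)])

# Check on a circle of radius 100 m (arc-length parameterization): kappa = 1/100.
circle = lambda l: np.array([100.0 * np.cos(l / 100.0), 100.0 * np.sin(l / 100.0)])
print(feedforward_control(circle, 50.0, v_d=19.0))   # approximately [19.0, 0.19]
```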
Define the tracking error in a local Frenet–Serret coordinate frame $\{F\}$ as $x_e = (x_e, y_e, \psi_e)$ [22,23], and let $Q = I_{3 \times 3}$ and $R = I_{2 \times 2}$. The goal is to solve for the optimal control $u^*$ that minimizes the objective function (5) with (10).

5.2. Approximate Optimal Control Policy Learning

This subsection utilizes the proposed method to find an approximate optimal policy for OTCP of fixed-wing UAVs formulated in the last subsection.
The learning process is carried out in MATLAB 2018. Table 1 presents the parameter settings, and the nonlinear kinematics of the fixed-wing UAV is modeled by
$$\dot{x}_k = \begin{bmatrix} \dot{x} \\ \dot{y} \\ \dot{\psi} \end{bmatrix} = \begin{bmatrix} \cos \psi & 0 \\ \sin \psi & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} u_v \\ u_\omega \end{bmatrix}.$$
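For reference, the kinematic model (48) can be stepped forward in simulation as in the following sketch (a simple Euler integrator; the step size and inputs are illustrative):

```python
import numpy as np

def step_uav_kinematics(x_k, u, dt):
    """One Euler step of the fixed-wing kinematic model (48).

    x_k : (3,) state [x, y, psi]
    u   : (2,) control [airspeed u_v, heading rate u_omega]
    """
    x, y, psi = x_k
    u_v, u_om = u
    return np.array([x + dt * u_v * np.cos(psi),
                     y + dt * u_v * np.sin(psi),
                     psi + dt * u_om])

state = np.array([0.0, 0.0, 0.0])
for _ in range(100):                       # fly a gentle left turn for 10 s
    state = step_uav_kinematics(state, np.array([19.0, 0.05]), dt=0.1)
print(state)
```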
Given the coordinates of five waypoints, the reference curve path is generated using the third-order B-spline curve algorithm (see Figure 2a). Given the reference state of the start point on the reference path, the initial state of the UAV is randomly chosen within $x_e(0), y_e(0) \in [-50, 50]$ and $\psi_e(0) \in [-\pi, \pi]$. The basis for the value function approximation is selected as
$$\sigma = \big[\, x_e,\; y_e,\; \psi_e,\; x_e y_e,\; x_e \psi_e,\; y_e \psi_e,\; x_e^2,\; y_e^2,\; \psi_e^2 \,\big]^{\top}.$$
The value function NN weights are initialized as
$$\hat{W}_c^{(0)} = \big[\, 0.1,\; 0.1,\; 0.5,\; 0.1,\; 0.1,\; 0.1,\; 0.1,\; 0.1,\; 0.5 \,\big]^{\top},$$
which corresponds to an admissible but non-optimal control policy $u^{(0)}$. Given the initial NN weights and the corresponding admissible initial control policy, the tracking data are collected online, and the NN weights are updated each time a batch of data of a specified size has been collected, according to the flow in Figure 1.
The iterative process of the critic NN weight estimates is shown in Figure 2b; the estimates converge to steady values in 23 iterations, and the final NN weights are
$$\hat{W}_c^{(23)} = \big[\, 0.329,\; 0.002,\; 0.574,\; 0.102,\; 0.074,\; 0.120,\; 0.100,\; 0.103,\; 3.204 \,\big]^{\top},$$
which provides an approximate optimal path-tracking control policy for fixed-wing UAVs. During policy training, we found that $w_3$ and $w_9$ oscillate more strongly than the other NN weights, which is also visible in Figure 2b. This is because both of the corresponding activation functions are one-variable functions of the heading angle error $\psi_e$, which is set to be within $[-\pi, \pi]$ during training, whereas the value ranges of $x_e$ and $y_e$ are set to $[-300, 300]$. Thus, the three components are not on a common scale, and the weights of these activation functions are much more sensitive to variations in the approximated function value.
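Once trained, the policy only needs the basis vector and its Jacobian to evaluate the approximate value function and its gradient. The sketch below (our own code, using the basis listed above and the reported final weights) illustrates this:

```python
import numpy as np

def sigma(xe):
    """Basis vector listed above, evaluated at the tracking error x_e = (x_e, y_e, psi_e)."""
    x, y, p = xe
    return np.array([x, y, p, x * y, x * p, y * p, x ** 2, y ** 2, p ** 2])

def sigma_jacobian(xe):
    """Jacobian d sigma / d x_e (shape 9 x 3); grad V_hat = sigma_jacobian(xe).T @ W_hat."""
    x, y, p = xe
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0],
                     [y,   x,   0.0],
                     [p,   0.0, x  ],
                     [0.0, p,   y  ],
                     [2*x, 0.0, 0.0],
                     [0.0, 2*y, 0.0],
                     [0.0, 0.0, 2*p]])

W_hat = np.array([0.329, 0.002, 0.574, 0.102, 0.074, 0.120, 0.100, 0.103, 3.204])
xe = np.array([5.0, -3.0, 0.2])
V_hat = sigma(xe) @ W_hat                     # approximate value at this tracking error
grad_V = sigma_jacobian(xe).T @ W_hat         # gradient used by the control policy
print(V_hat, grad_V)
```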

5.3. HIL Simulation Test and Result Analysis

To fully validate the effectiveness of the proposed method on the OTCP of fixed-wing UAVs, the learned control policy was tested on a high-fidelity HIL simulation system in comparison with three other typical path-tracking algorithms [5]: the pure pursuit and line-of-sight algorithm (PLOS), the nonlinear Lyapunov guidance method (NLGL), and the backstepping control method (BS). The HIL simulation system consists of a swarm control station, a host computer, a Pixhawk autopilot, QGroundControl, and the X-Plane aircraft simulator. Specifically, the swarm control station, which is used to issue task instructions and display the current status of the system, was developed by the authors' team. The host computer simulates the onboard computer of the physical aircraft: it receives and processes task instructions from the control station and state information from onboard sensors, and it generates and sends control commands to the Pixhawk. The Pixhawk is a widely used open-source autopilot; it processes and generates control commands for the underlying actuators and collects and sends back the sensor data. The X-Plane aircraft simulator, a high-fidelity simulator, provides the physics engine and dynamics simulation of the UAV, and QGroundControl acts as an information relay between X-Plane and the Pixhawk (see Figure 3 for the flow of control commands and state information).
Note that:
  • The reference path in the HIL simulations, shown in Figure 4a, is generated by QGroundControl with eight waypoints (provided in Table 2) at an experimental airfield; it is different from the path used for policy learning and has larger curvature changes.
  • The speed constraint in the aircraft simulator during the test was $10 \le u_v \le 18$, different from the setting used in policy learning (which matches a practical UAV platform).
In spite of the abovementioned differences between the settings of the policy learning process and the HIL simulation, the learned control policy provided satisfactory tracking performance in the comparative HIL simulation. The path-tracking trajectories are presented in Figure 4b, which shows that all four algorithms can stably track the reference curve path. Figure 5 and Figure 6 further show the heading and cross-tracking errors of the four algorithms. From these two figures, we can see that the learned control policy obtained with the proposed method leads to a smooth curve-path-tracking trajectory with a small lateral steady-state tracking error and near-zero heading and forward steady-state tracking errors. Moreover, the heading tracking errors of BS, PLOS, and NLGL, the forward tracking error of BS, and the lateral tracking errors of PLOS and NLGL show significant fluctuations compared with the proposed method, especially when the UAV approaches the corners of the reference path. This is because the heading tracking error and the curvature variation of the reference path are not considered in these three algorithms. Therefore, the three algorithms cannot achieve the satisfactory curve-path-tracking control performance they exhibit in straight-line and circular path-tracking problems, whereas the proposed method provides more stable and smooth tracking performance. Figure 6 also shows that both the PLOS and NLGL algorithms have a significant steady-state forward error. The main reason is that the tracking performance of these two algorithms is very dependent on the update rule of the VTP, which must be updated a certain distance ahead of the UAV's arrival; the algorithms fail to track the path if this distance is not large enough (e.g., smaller than about 20 m). Finally, Figure 7 shows the control input using the proposed method, which verifies that the input constraints are naturally satisfied, rather than being forcibly saturated, during the whole path-tracking period.

6. Conclusions

This paper developed an approximate optimal control scheme for the OTCP of nonlinear systems with asymmetric input constraints. In particular, the difficulty introduced by the varying curvature of the curved reference path is handled by introducing a feedforward control law. The effectiveness was verified on a high-fidelity HIL system for fixed-wing UAVs. The results confirm the effectiveness and generalization of the learned control policy and indicate the capability of ADP theory for complicated nonlinear systems. Future work will study the robust control of such systems under external disturbances.

Author Contributions

Conceptualization, Y.W. and X.W.; methodology, Y.W.; software, Y.W.; validation, Y.W. and X.W.; formal analysis, Y.W.; investigation, Y.W.; resources, X.W. and L.S.; data curation, Y.W.; writing—original draft preparation, Y.W.; writing—review and editing, Y.W. and X.W.; visualization, Y.W.; supervision, L.S. and X.W.; project administration, X.W. and L.S.; funding acquisition, X.W. and Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China grant number 61973309; Natural Science Foundation of Hunan Province grant number 2021JJ10053 and Hunan Provincial Innovation Foundation for Postgraduate grant number CX20210009.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yang, J.; Liu, C.; Coombes, M.; Yan, Y.; Chen, W.H. Optimal Path Following for Small Fixed-Wing UAVs under Wind Disturbances. IEEE Trans. Control Syst. Technol. 2021, 29, 996–1008. [Google Scholar] [CrossRef]
  2. Kang, J.G.; Kim, T.; Kwon, L.; Kim, H.D.; Park, J.S. Design and Implementation of a UUV Tracking Algorithm for a USV. Drones 2022, 6, 66. [Google Scholar] [CrossRef]
  3. Ratnoo, A.; Sujit, P.B.; Kothari, M. Adaptive Optimal Path Following for High Wind Flights. IFAC Proc. Vol. 2011, 44, 12985–12990. [Google Scholar] [CrossRef] [Green Version]
  4. Lin, F.; Chen, Y.; Zhao, Y.; Wang, S. Path Tracking of Autonomous Vehicle Based on Adaptive Model Predictive Control. Int. J. Adv. Robot. Syst. 2019, 16, 1–12. [Google Scholar] [CrossRef] [Green Version]
  5. Sujit, P.B.; Saripalli, S.; Sousa, J.B. Unmanned Aerial Vehicle Path Following: A Survey and Analysis of Algorithms for Fixed-Wing Unmanned Aerial Vehicles. IEEE Control Syst. Mag. 2014, 34, 42–59. [Google Scholar]
  6. Chen, S.; Chen, H.; Negrut, D. Implementation of MPC-Based Path Tracking for Autonomous Vehicles Considering Three Vehicle Dynamics Models with Different Fidelities. Automot. Innov. 2020, 3, 386–399. [Google Scholar] [CrossRef]
  7. Rucco, A.; Aguiar, A.P.; Pereira, F.L.; de Sousa, J.B. A Predictive Path-Following Approach for Fixed-Wing Unmanned Aerial Vehicles in Presence of Wind Disturbances. Adv. Intell. Syst. Comput. 2016, 417, 623–634. [Google Scholar] [CrossRef]
  8. Alessandretti, A.; Aguiar, A.P. A Planar Path-Following Model Predictive Controller for Fixed-Wing Unmanned Aerial Vehicles. In Proceedings of the 11th International Workshop on Robot Motion and Control (RoMoCo), Wasowo, Poland, 3–5 July 2017; pp. 59–64. [Google Scholar] [CrossRef]
  9. Chen, H.; Cong, Y.; Wang, X.; Xu, X.; Shen, L. Coordinated Path-Following Control of Fixed-Wing Unmanned Aerial Vehicles. IEEE Trans. Syst. Man Cybern. Syst. 2021, 52, 2540–2554. [Google Scholar] [CrossRef]
  10. Abu-Khalaf, M.; Lewis, F.L. Nearly Optimal Control Laws for Nonlinear Systems with Saturating Actuators Using a Neural Network HJB Approach. Automatica 2005, 41, 779–791. [Google Scholar] [CrossRef]
  11. Powell, W.B. Approximate Dynamic Programming; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2007. [Google Scholar] [CrossRef]
  12. Yang, X.; He, H.; Liu, D.; Zhu, Y. Adaptive Dynamic Programming for Robust Neural Control of Unknown Continuous-Time Non-Linear Systems. IET Control Theory Appl. 2017, 11, 2307–2316. [Google Scholar] [CrossRef] [Green Version]
  13. Jiang, H.; Zhang, H.; Luo, Y.; Han, J. Neural-Network-Based Robust Control Schemes for Nonlinear Multiplayer Systems with Uncertainties via Adaptive Dynamic Programming. IEEE Trans. Syst. Man Cybern. Syst. 2019, 49, 579–588. [Google Scholar] [CrossRef]
  14. Vamvoudakis, K.G.; Lewis, F.L. Online Actor-Critic Algorithm to Solve the Continuous-Time Infinite Horizon Optimal Control Problem. Automatica 2010, 46, 878–888. [Google Scholar] [CrossRef]
  15. Vrabie, D.; Lewis, F. Neural Network Approach to Continuous-Time Direct Adaptive Optimal Control for Partially Unknown Nonlinear Systems. Neural Netw. 2009, 22, 237–246. [Google Scholar] [CrossRef] [PubMed]
  16. Modares, H.; Lewis, F.L. Optimal Tracking Control of Nonlinear Partially-Unknown Constrained-Input Systems Using Integral Reinforcement Learning. Automatica 2014, 50, 1780–1792. [Google Scholar] [CrossRef]
  17. Yan, J.; Yu, Y.; Wang, X. Distance-Based Formation Control for Fixed-Wing UAVs with Input Constraints: A Low Gain Method. Drones 2022, 6, 159. [Google Scholar] [CrossRef]
  18. Lewis, F.L.; Vrabie, D.L.; Syrmos, V.L. Optimal Control, 3rd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2012. [Google Scholar] [CrossRef]
  19. Adhyaru, D.M.; Kar, I.N.; Gopal, M. Bounded Robust Control of Nonlinear Systems Using Neural Network–Based HJB Solution. Neural Comput. Appl. 2010, 20, 91–103. [Google Scholar] [CrossRef]
  20. Liu, D.; Yang, X.; Li, H. Adaptive Optimal Control for a Class of Continuous-Time Affine Nonlinear Systems with Unknown Internal Dynamics. Neural Comput. Appl. 2013, 23, 1843–1850. [Google Scholar] [CrossRef]
  21. Hornik, K.; Stinchcombe, M.; White, H. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw. 1990, 3, 551–560. [Google Scholar] [CrossRef]
  22. Aguiar, A.P.; Hespanha, J.P.; Kokotović, P.V. Performance Limitations in Reference Tracking and Path Following for Nonlinear Systems. Automatica 2008, 44, 598–610. [Google Scholar] [CrossRef]
  23. Wang, Y.; Wang, X.; Zhao, S.; Shen, L. Vector Field Based Sliding Mode Control of Curved Path Following for Miniature Unmanned Aerial Vehicles in Winds. J. Syst. Sci. Complex. 2018, 31, 302–324. [Google Scholar] [CrossRef]
Figure 1. The flowchart of the online integral reinforcement learning (IRL)-based policy iteration algorithm for approximate optimal tracking control policy.
Figure 2. The reference path for policy learning and the neural network (NN) weights iteration.
Figure 3. The high-fidelity hardware-in-the-loop (HIL) simulation system.
Figure 4. The reference path and tracking trajectories in HIL simulation tests: (a) reference path; (b) tracking trajectory.
Figure 5. The heading error comparison.
Figure 6. The cross-tracking error and the root mean squared error comparison.
Figure 7. The control input using the proposed method.
Table 1. Parameter settings for optimal policy learning.
Symbol | Value | Meaning
$v_d$ (m/s) | 19 | the cruising speed of the UAV
$v_{stall}$ (m/s) | 14 | the minimum stall speed of the UAV
$v_{max}$ (m/s) | 24 | the maximum speed of the UAV
$\omega_{max}$ (rad/s) | 0.6 | the maximum heading rate
Table 2. Waypoints of the reference path in HIL simulation tests.
Waypoint | WP 1 | WP 2 | WP 3 | WP 4 | WP 5 | WP 6 | WP 7 | WP 8
Latitude | 34.0245 | 34.0288 | 34.0240 | 34.0198 | 34.0282 | 34.0247 | 34.0212 | 34.0245
Longitude | 113.7068 | 113.7032 | 113.7012 | 113.7040 | 113.7096 | 113.7114 | 113.7106 | 113.7068
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
