Near-Optimal Tracking Control of Partially Unknown Discrete-Time Nonlinear Systems Based on Radial Basis Function Neural Network

Huang, Jiashun; Xu, Dengguo; Li, Yahui; Ma, Yan

doi:10.3390/math12081146


¹ School of Automation, Guangxi University of Science and Technology, Liuzhou 545000, China

² School of Physics and Electrical Engineering, Liupanshui Normal University, Liupanshui 553000, China

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(8), 1146; https://doi.org/10.3390/math12081146

Submission received: 28 February 2024 / Revised: 5 April 2024 / Accepted: 8 April 2024 / Published: 10 April 2024

(This article belongs to the Special Issue Advances in Nonlinear Analysis and Control)


Abstract

This paper proposes an optimal tracking control scheme based on adaptive dynamic programming (ADP) for a class of partially unknown discrete-time (DT) nonlinear systems using radial basis function neural networks (RBF-NNs). To handle the unknown system dynamics, two RBF-NNs are used: the first constructs an identifier, and the second directly approximates the steady-state control input. A novel adaptive law is proposed to update the neural network weights. The optimal feedback control and the cost function are derived via feedforward neural network approximation, and a means of regulating the tracking error is proposed. The critic network and the actor network are trained online to obtain an approximate solution of the associated Hamilton–Jacobi–Bellman (HJB) equation within the ADP framework. Simulations were carried out to verify the effectiveness of the proposed neural-network-based optimal tracking control technique.

Keywords:

adaptive dynamic programming (ADP); optimal tracking control; RBF neural network (RBF-NN); nonlinear systems

MSC:

93C10; 49L20; 49L12

1. Introduction

Control of nonlinear systems is an important topic in the control field. Discrete-time nonlinear systems are a particular case, and they are difficult to handle with traditional control methods. In recent decades, many approaches to discrete-time system control have been proposed, such as adaptive control [1], fuzzy control [2], and PID control [3]. Optimal tracking control, one of the effective methods for nonlinear systems, has many practical engineering applications [4,5,6]. Its purpose is to design a control law that not only drives the system to track the desired trajectory but also minimizes a specific performance index. Exploring the optimal tracking control of nonlinear systems is therefore of great theoretical significance. Although dynamic programming is an effective method for solving optimal control problems, it suffers from the “curse of dimensionality” when dealing with relatively complex systems [7,8]. Moreover, the HJB equation arising from the optimal control of nonlinear systems is difficult to solve, since it generally has no analytical solution.

On the other hand, neural network control is a common method for uncertain nonlinear systems. In 1990, Narendra et al. first proposed an artificial neural network (ANN)-based adaptive control method for nonlinear dynamical systems [9]. Through neural network approximation, an uncertain system can be reconstructed from input and output data. Since then, multilayer neural networks (MNNs) have been successfully applied in pattern recognition and control systems [10]. This has also led to the development of many types of neural networks, including the RBF-NN. In [11], Poggio et al. first proved that the RBF-NN is superior in approximating functions. Studies of RBF-NNs have also shown that these networks can approximate any continuous nonlinear function on a compact set to arbitrary accuracy [12,13]. Compared with other ANNs, the RBF-NN lacks the complex structure of networks such as back propagation (BP) networks or recurrent neural networks (RNNs), and its parameters are easier to select [11,14,15]. Its good generalization ability, simple network structure, and avoidance of unnecessarily lengthy computations have made the RBF-NN attractive [15,16]. Many research results have been published on neural network control of nonlinear systems [17,18,19].

Neural networks and reinforcement learning (RL) make it possible to tackle the difficult problem of solving nonlinear HJB equations approximately. The ADP algorithm was proposed by Powell to approximate the solution of the HJB equation [20]. It combines the theory and methods of RL, neural networks, adaptive control, and optimal control. ADP has since become one of the core methods for solving a variety of optimal control problems. It has been successfully applied to both continuous-time systems [21,22,23] and discrete-time systems [24,25,26,27,28,29,30,31] to find solutions of the HJB equations online. In particular, several works have addressed the discrete-time nonlinear optimal regulation problem with ADP-based algorithms. Examples include robust ADP [32,33,34,35], iterative/invariant ADP [36,37,38,39], off-policy RL [40,41,42], and Q-learning [40,43].

In the past decades, many studies have addressed the optimal tracking control of discrete-time nonlinear systems using ADP. However, to the best of our knowledge, no RBF-NN-based ADP algorithm has been reported for the optimal tracking of nonlinear discrete-time systems. In this paper, an optimal tracking control method based on RBF-NNs is proposed for partially unknown discrete-time nonlinear systems. Two RBF-NNs are used to approximate the unknown system dynamics and the steady-state control, respectively. The tracking problem is first transformed into a regulation problem. A critic network and an actor network are then used to obtain the near-optimal feedback control, so that the online learning process requires only current and past system data.

The contributions of this article are as follows:

(1) Unlike the classical technique of NN approximation, we propose a near-optimal tracking control scheme for a class of partially unknown discrete-time nonlinear systems based on RBF-NNs, and we prove the stability of the resulting closed-loop system.

(2) Compared with [35,39], we additionally use an RBF-NN to directly approximate the steady-state controller of the unknown system. This removes the requirement for a priori knowledge of the controlled system dynamics and the reference system dynamics. Moreover, we propose a novel adaptive law to update the weights of the steady-state controller.

The paper is organized as follows. The problem statement is given in Section 2. Section 3 presents the design of the optimal tracking controller for the system with partially unknown nonlinear dynamics. It covers the RBF-NN identifier, the RBF-NN steady-state controller, the near-optimal feedback controller, and the stability analysis. Section 4 provides simulation results that validate the proposed control method and compares it with other methods. Section 5 draws conclusions.

2. Problem Statement

Consider the following affine nonlinear discrete-time system [31]:

x (k + 1) = f [x (k)] + g [x (k)] u (k)

(1)

where $x(k) \in \mathbb{R}^{n}$ is the measurable system state and $u(k) \in \mathbb{R}^{m}$ is the control input. Assume the following:

- the nonlinear smooth function $f[x(k)] \in \mathbb{R}^{n}$ is an unknown drift function;
- $g[x(k)] \in \mathbb{R}^{n \times m}$ is a known function with $\|g[x(k)]\|_{F} \leq g_{1}$, where $\|\cdot\|_{F}$ denotes the Frobenius norm and $g_{1} > 0$ is a constant;
- $g[x(k)]$ has a generalized inverse matrix $g[x(k)]^{+} \in \mathbb{R}^{m \times n}$ such that $g[x(k)]\, g[x(k)]^{+} = I \in \mathbb{R}^{n \times n}$, where $I$ is the identity matrix.

Let $x(0)$ be the initial state.

The reference trajectory is generated by the following bounded command:

x_{d} (k + 1) = φ (x_{d} (k))

(2)

where $x_{d}(k) \in \mathbb{R}^{n}$ and $φ(x_{d}(k)) \in \mathbb{R}^{n}$. Here $x_{d}(k)$ is the reference trajectory, which need only be a stable or asymptotically stable state trajectory.

The goal of this paper is to design a controller $u(k)$ with two properties: the state of system (1) tracks the reference trajectory, and a cost function is minimized. For optimal tracking control, the cost function is usually taken in quadratic form [4], that is,

J(e(k), u(k)) = \sum_{j = k}^{\infty} \left[ e^{T}(j) Q e(j) + u^{T}(j) R u(j) \right]

(3)

where $Q \in \mathbb{R}^{n \times n}$ and $R \in \mathbb{R}^{m \times m}$ are symmetric positive definite matrices and $e(k) = x(k) - x_{d}(k)$ is the tracking error. In common solutions of tracking problems, the control input consists of two parts: a steady-state input $u_{d}$ and a feedback input $u_{e}$ [24]. Next, we discuss how to obtain each part.

The steady-state controller is used to ensure perfect tracking, i.e., $x(k) = x_{d}(k)$. For this condition to be fulfilled, there must exist a steady-state control $u_{d}(k)$ that keeps $x(k)$ equal to $x_{d}(k)$. Substituting $x_{d}(k)$ and $u_{d}(k)$ into system (1), the reference state satisfies

x_{d} (k + 1) = f [x_{d} (k)] + g [x_{d} (k)] u_{d} (k)

(4)

where $x_{d}(k)$ and $x_{d}(k+1)$ are the bounded reference states to be tracked. If the system dynamics (1) are known, $u_{d}(k)$ is obtained as

u_{d}(k) = g[x_{d}(k)]^{+} \left( x_{d}(k+1) - f[x_{d}(k)] \right)

(5)

where $g[x_{d}(k)]^{+} = \left( g[x_{d}(k)]^{T} g[x_{d}(k)] \right)^{-1} g[x_{d}(k)]^{T}$ is the generalized inverse of $g[x_{d}(k)]$, with $g[x_{d}(k)]^{+} g[x_{d}(k)] = I$.
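When $f$ and $g$ are known, (5) is a direct computation. The following minimal sketch evaluates it for a hypothetical system with two states and two inputs. The functions `f`, `g`, and `phi` are illustrative assumptions, not the plant studied in this paper. Because $g$ is square and invertible here, $(g^{T} g)^{-1} g^{T}$ reduces to $g^{-1}$. Applying $u_{d}(k)$ therefore reproduces $x_{d}(k+1)$ up to rounding.

```python
import math

def f(x):
    # Hypothetical drift f[x] for n = 2 (illustration only).
    return [0.9 * x[0] + 0.1 * math.sin(x[1]), 0.8 * x[1] - 0.2 * x[0] ** 2]

def g(x):
    # Hypothetical 2 x 2 input matrix g[x], invertible for every x.
    return [[1.0, 0.0], [0.1 * math.cos(x[0]), 1.0 + 0.5 * x[1] ** 2]]

def phi(xd):
    # Hypothetical reference generator x_d(k+1) = phi(x_d(k)): a small rotation.
    c, s = math.cos(0.1), math.sin(0.1)
    return [c * xd[0] - s * xd[1], s * xd[0] + c * xd[1]]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def inv2(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def generalized_inverse(gm):
    # g^+ = (g^T g)^{-1} g^T, specialised to the 2 x 2 case.
    gt = transpose(gm)
    return matmul(inv2(matmul(gt, gm)), gt)

def steady_state_control(xd):
    # Eq. (5): u_d(k) = g[x_d(k)]^+ (x_d(k+1) - f[x_d(k)]).
    target = [a - b for a, b in zip(phi(xd), f(xd))]
    gp = generalized_inverse(g(xd))
    return [sum(gp[i][j] * target[j] for j in range(2)) for i in range(2)]

xd = [1.0, 0.5]
ud = steady_state_control(xd)
gx = g(xd)
# Substitute u_d into (4): x_next should coincide with phi(xd).
x_next = [f(xd)[i] + sum(gx[i][j] * ud[j] for j in range(2)) for i in range(2)]
```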

Remark 1.

In the subsequent discussion, an RBF network is used to identify the unknown dynamics of system (1), so that (5) can be computed.

Using (1) and (4), the tracking error dynamics $e(k)$ are given by

\begin{aligned} e(k+1) &= f[x(k)] + g[x(k)] u(k) - x_{d}(k+1) \\ &= f_{e}(k) + g_{e}(k) u_{e}(k) \end{aligned}

(6)

where

- $f_{e}(k) = g(e(k) + x_{d}(k))\, g(x_{d}(k))^{+} \left( φ(x_{d}(k)) - f(x_{d}(k)) \right) + f(e(k) + x_{d}(k)) - φ(x_{d}(k))$,
- $u_{e}(k) = u(k) - u_{d}(k)$, and
- $g_{e}(k) = g[x_{d}(k) + e(k)]$.
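A quick numerical check that the two forms of (6) agree can be useful when implementing $f_{e}$. The sketch below uses hypothetical scalar functions `f`, `g`, and `phi`, which are assumptions for illustration. In the scalar case the generalized inverse is $g^{+} = 1/g$. The check also confirms that $f_{e} = 0$ when $e(k) = 0$, which is the perfect-tracking case.

```python
import math

def f(x):
    return 0.5 * math.sin(x) + 0.2 * x          # hypothetical drift

def g(x):
    return 1.0 + 0.3 * math.cos(x) ** 2         # hypothetical input gain, never zero

def phi(xd):
    return 0.95 * math.cos(xd)                  # hypothetical reference map

def u_d(xd):
    # Eq. (5) in the scalar case, where g^+ = 1 / g
    return (phi(xd) - f(xd)) / g(xd)

def f_e(e, xd):
    # Internal dynamics of the tracking-error system, as defined after (6)
    x = e + xd
    return g(x) * (phi(xd) - f(xd)) / g(xd) + f(x) - phi(xd)

def g_e(e, xd):
    return g(xd + e)

xd, e, ue = 0.4, 0.3, -0.2
x = xd + e
u = u_d(xd) + ue                                   # u(k) = u_d(k) + u_e(k)
e_next_direct = f(x) + g(x) * u - phi(xd)          # first line of (6)
e_next_error_form = f_e(e, xd) + g_e(e, xd) * ue   # second line of (6)
```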

Here $u_{e}(k) \in \mathbb{R}^{m}$ is the feedback control input, which is designed to stabilize the tracking error dynamics by minimizing the cost function. For the tracking error $e(k)$ under the control sequence, the cost function can be expressed as the following discrete-time tracking HJB equation

\begin{aligned} J_{e}(e(k), u_{e}(k)) &= \sum_{j = k}^{\infty} \left[ e^{T}(j) Q e(j) + u_{e}^{T}(j) R u_{e}(j) \right] \\ &= e^{T}(k) Q e(k) + u_{e}^{T}(k) R u_{e}(k) + J_{e}(e(k+1), u_{e}(k+1)) \\ &= r(k) + J_{e}(e(k+1), u_{e}(k+1)) \end{aligned}

(7)

where

- $r(k) = e^{T}(k) Q e(k) + u_{e}^{T}(k) R u_{e}(k)$,
- $J_{e}(e(k), u_{e}(k)) > 0$ for all $e(k), u_{e}(k) \neq 0$, and
- $J_{e}(e(k+1), u_{e}(k+1))$ denotes the cost function at the next tracking error $e(k+1)$.

The tracking error $e(k)$ is thus used to formulate the cost function of the optimal tracking control problem.
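The recursion in (7) can be checked on a truncated trajectory. The sketch assumes a hypothetical scalar error system $e(k+1) = 1.1 e(k) + u_{e}(k)$ under the feedback $u_{e}(k) = -0.5 e(k)$, with $Q = R = 1$. The infinite sum is truncated at a horizon where the remaining tail is negligible. The backward pass implements $J_{e}(k) = r(k) + J_{e}(k+1)$ directly. Its result can be compared with the closed-form geometric series $\sum_{j} 1.25 \cdot 0.36^{j} = 1.953125$.

```python
def stage_cost(e, ue, q, r):
    # r(k) = e^T Q e + u_e^T R u_e in the scalar case
    return q * e * e + r * ue * ue

def cost_to_go(errors, controls, q, r):
    # Backward pass J[k] = r(k) + J[k+1], i.e. the recursion in (7);
    # J[N] = 0 stands in for the (negligible) tail beyond the horizon.
    J = [0.0] * (len(errors) + 1)
    for k in range(len(errors) - 1, -1, -1):
        J[k] = stage_cost(errors[k], controls[k], q, r) + J[k + 1]
    return J

a, b, gain = 1.1, 1.0, 0.5      # hypothetical error dynamics and feedback gain
errors, controls = [], []
e = 1.0
for _ in range(60):
    ue = -gain * e
    errors.append(e)
    controls.append(ue)
    e = a * e + b * ue          # closed loop: e(k+1) = 0.6 e(k)

J = cost_to_go(errors, controls, q=1.0, r=1.0)
```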

In general, the feedback control $u_{e}(k)$ is found by minimizing (7), i.e., by solving the stationarity condition $\partial J_{e} / \partial u_{e}(k) = 0$ in the optimal control framework [4]. Because $e(k+1)$ depends on $u_{e}(k)$ through $g_{e}(k)$, this yields

u_{e}^{*}(k) = -\frac{1}{2} R^{-1} g_{e}^{T}(k) \frac{\partial J_{e}(e(k+1))}{\partial e(k+1)}

(8)

The complete control input is then

u^{*}(k) = u_{d}(k) + u_{e}^{*}(k)

(9)

where $u_{d}(k)$ is obtained from (5) and $u_{e}^{*}(k)$ from (8).
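To see how (8) arises, take a hypothetical scalar case with a quadratic cost-to-go $J_{e}(e) = p e^{2}$ and $e(k+1) = f_{e}(k) + g_{e}(k) u_{e}(k)$. Then (8) reads $u_{e} = -(g_{e} p / r)(f_{e} + g_{e} u_{e})$. This is an implicit equation in $u_{e}$, and its solution is $u_{e}^{*} = -g_{e} p f_{e} / (r + g_{e}^{2} p)$. The sketch below checks two things, using assumed numbers:

- this value satisfies (8);
- small perturbations of it increase the one-step objective.

```python
def optimal_feedback(fe, ge, p, r):
    # Closed-form solution of (8) for J_e(e) = p * e**2 (scalar case)
    return -ge * p * fe / (r + ge * ge * p)

def one_step_objective(u, e, fe, ge, p, q, r):
    # e^T Q e + u^T R u + J_e(e(k+1)) with e(k+1) = f_e + g_e u
    e_next = fe + ge * u
    return q * e * e + r * u * u + p * e_next * e_next

e, fe, ge, p, q, r = 0.5, 0.7, 1.2, 2.0, 1.0, 0.5   # hypothetical values
u_star = optimal_feedback(fe, ge, p, r)
# Right-hand side of (8): -(1/2) r^{-1} g_e dJ_e/de(k+1), with dJ_e/de = 2 p e
rhs = -0.5 / r * ge * (2.0 * p * (fe + ge * u_star))
```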

As detailed in the subsequent discussion, RBF neural networks are used to reconstruct the unknown dynamics of system (1). Moreover, (7) has no analytical solution in general and suffers from the curse of dimensionality. The ADP algorithm is therefore used to solve the HJB Equation (7) approximately.

The main results of this paper are based on the following definitions and assumptions [30].

Definition 1.

A control law $u_{e}$ is admissible with respect to (7) on the set Ω if all of the following hold:

- $u_{e}$ is continuous on a compact set $Ω_{u} \subseteq \mathbb{R}^{m}$ for all $e(k) \in Ω$;
- $u_{e}(0) = 0$;
- $J(e(0), u_{e}(\cdot))$ is finite.

Assumption 1.

System (1) is controllable, and the state $x(k) = 0$ is an equilibrium under the control $u(k) = 0$. The control input $u(k) = u(x(k))$ satisfies $u(x(k)) = 0$ for $x(k) = 0$, and the cost function is positive definite for any $x(k)$ and $u(k)$.

Lemma 1.

For the tracking error system (6), assume that $u_{e}(k)$ is an admissible control and that the internal dynamics $f_{e}(k)$ are bounded as

\|f_{e}(k)\|^{2} \leq \frac{Γ λ_{\min}(Q)}{2} \|e(k)\|^{2} + \frac{Γ λ_{\min}(R) - 2 g_{1}^{2}}{2} \|u_{e}(k)\|^{2}

(10)

where $λ_{\min}(R)$ and $λ_{\min}(Q)$ are the minimum eigenvalues of $R$ and $Q$, respectively, and $Γ > 2 g_{1}^{2} / λ_{\min}(R)$ is a known positive constant. Then, the tracking error system (6) is asymptotically stable.

Proof.

We consider the following Lyapunov function,

V (k) = e^{T} (k) e (k) + Γ J_{e} (k)

(11)

where $J_{e}(k) = J_{e}(e(k), u_{e}(k))$ is defined in (7). Taking the difference of the Lyapunov function yields

ΔV(k) = e^{T}(k+1) e(k+1) - e^{T}(k) e(k) + Γ \left( J_{e}(k+1) - J_{e}(k) \right)

(12)

Using (6) and (7), i.e., $J_{e}(k+1) - J_{e}(k) = -r(k)$, we obtain

\begin{aligned} ΔV(k) = {} & \left( f_{e}(k) + g_{e}(k) u_{e}(k) \right)^{T} \left( f_{e}(k) + g_{e}(k) u_{e}(k) \right) \\ & - e^{T}(k) e(k) - Γ \left( e^{T}(k) Q e(k) + u_{e}^{T}(k) R u_{e}(k) \right) \end{aligned}

(13)

Applying the Cauchy–Schwarz inequality in the form $\|a + b\|^{2} \leq 2\|a\|^{2} + 2\|b\|^{2}$, together with $\|g_{e}(k)\|_{F} \leq g_{1}$, yields

ΔV(k) \leq 2 \|f_{e}(k)\|^{2} - \left( Γ λ_{\min}(R) - 2 g_{1}^{2} \right) \|u_{e}(k)\|^{2} - Γ λ_{\min}(Q) \|e(k)\|^{2} - \|e(k)\|^{2}

(14)

To guarantee $ΔV(k) < 0$, i.e., asymptotic stability of the tracking error system (6), it suffices that

2 \|f_{e}(k)\|^{2} \leq Γ λ_{\min}(Q) \|e(k)\|^{2} + \left( Γ λ_{\min}(R) - 2 g_{1}^{2} \right) \|u_{e}(k)\|^{2}

(15)

which is exactly the bound (10). In that case, (14) reduces to $ΔV(k) \leq -\|e(k)\|^{2} < 0$ for $e(k) \neq 0$. This proves the asymptotic stability of the tracking error system (6). □
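Lemma 1 can be sanity-checked numerically in a scalar setting. The sketch assumes the following hypothetical values:

- internal dynamics $f_{e}(k) = 0.5 e(k)$;
- a constant $g_{e} = 0.8 \leq g_{1} = 1$;
- $Q = R = 1$ and $Γ = 3 > 2 g_{1}^{2}/λ_{\min}(R)$.

It evaluates (13) on a grid and confirms that $ΔV(k) < 0$ wherever (10) holds away from the origin.

```python
def delta_v(e, ue, fe, ge, q, r, gamma):
    # Eq. (13), scalar case: e(k+1) from (6) and J_e(k+1) - J_e(k) = -r(k) from (7)
    e_next = fe + ge * ue
    return e_next ** 2 - e ** 2 - gamma * (q * e * e + r * ue * ue)

def satisfies_bound_10(e, ue, fe, g1, q, r, gamma):
    # Bound (10) with lambda_min(Q) = q and lambda_min(R) = r
    return fe ** 2 <= gamma * q * e ** 2 / 2 + (gamma * r - 2 * g1 ** 2) * ue ** 2 / 2

g1, ge, q, r, gamma = 1.0, 0.8, 1.0, 1.0, 3.0   # hypothetical values
grid = [i / 4 for i in range(-4, 5)]
checked = []
for e in grid:
    for ue in grid:
        if e == 0 and ue == 0:
            continue
        fe = 0.5 * e                            # hypothetical internal dynamics
        if satisfies_bound_10(e, ue, fe, g1, q, r, gamma):
            checked.append(delta_v(e, ue, fe, ge, q, r, gamma))
```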

Remark 2.

Lemma 1 shows that when the internal dynamics $f_{e}(k)$ are bounded so as to satisfy (10), there exists an admissible control $u_{e}(k)$ that does two things: it stabilizes the tracking error system (6) on Ω, and it guarantees that the cost function $J_{e}(k)$ is finite.

3. Optimal Tracking Controller Design with Partially Unknown Dynamics

In this section, we first use an RBF-NN to approximate the unknown system dynamics $f[x(k)]$ and another RBF-NN to approximate the steady-state controller $u_{d}(k)$. Second, two feedforward neural networks are introduced to approximate the cost function and the optimal feedback control $u_{e}(k)$. Finally, system stability is proved by selecting an appropriate Lyapunov function.

3.1. RBF-NN Identifier Design

In this subsection, an RBF-NN-based identifier is proposed to capture the unknown dynamics of system (1). Without loss of generality, the unknown dynamics are assumed to be a smooth function on a compact set. Using an RBF-NN, the unknown dynamics in (1) are identified as

\hat{f} (x (k)) = \hat{w_{f}} {(k)}^{T} h [x (k)] + Δ_{f} (x)

(16)

where $\hat{w}_{f}(k)$ is the estimated output weight matrix of the neural network and $h[x(k)]$ is the vector of radial basis functions. $Δ_{f}(x)$ is the bounded approximation error, with $\|Δ_{f}(x)\| < ε_{f}$, where $ε_{f}$ is a positive constant.

For the bounded approximation error $Δ_{f}(x)$, there exists an optimal weight matrix $w_{f}^{*}$ such that

f (x (k)) = \hat{f} (x, w_{f}^{*}) - Δ_{f} (x)

(17)

where $w_{f}^{*}$ is the optimal weight of the identifier and $\hat{f}(x, w_{f}^{*}) = w_{f}^{*T} h[x(k)]$. During training, only the output weights are updated, while the hidden-layer parameters remain unchanged. The identification error of the neural network model is therefore

\begin{aligned} \tilde{f}(x(k)) &= f[x(k)] - \hat{f}[x(k)] \\ &= \hat{f}(x, w_{f}^{*}) - Δ_{f}[x(k)] - \hat{w}_{f}(k)^{T} h[x(k)] \\ &= -\tilde{w}_{f}(k)^{T} h[x(k)] - Δ_{f}[x(k)] \end{aligned}

(18)

where $\tilde{w}_{f}(k) = \hat{w}_{f}(k) - w_{f}^{*}$.

The error function is defined as the following

E (k + 1) = \frac{1}{2} {[\tilde{f} (x (k))]}^{T} [\tilde{f} (x (k))]

(19)

Using the gradient descent method, the weights are updated by

\begin{aligned} Δw_{f_{j}}(k+1) &= -η \frac{\partial E}{\partial w_{f_{j}}} \\ &= η \left( f(x(k)) - \hat{f}(x(k)) \right) h[x(k)] \\ &= η \tilde{f}(x(k)) h[x(k)] \end{aligned}

(20)

and

w_{f_{j}} (k) = w_{f_{j}} (k - 1) + Δ w_{f_{j}} (k)

(21)

where $η > 0$ is the learning rate of the identifier.
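A minimal sketch of the identifier update (20)–(21) for a scalar state follows. The Gaussian basis has the form given in Assumption 4, and its centers and width are held fixed as described above. The target `f_true`, the grid of centers, the width, and the number of epochs are hypothetical choices. The learning rate is set to $η = 1/l$ for $l$ neurons. This respects $η \leq 1/l_{f}$ from Theorem 1, because $\|h\|^{2} \leq l$.

```python
import math

def rbf(x, centers, b):
    # Gaussian basis h_j(x) = exp(-(x - c_j)^2 / (2 b^2)); centers and width stay fixed
    return [math.exp(-(x - c) ** 2 / (2.0 * b * b)) for c in centers]

def f_true(x):
    # Hypothetical "unknown" drift, used only to generate training data
    return 0.5 * math.sin(x)

def predict(w, x, centers, b):
    # Identifier output \hat{f}(x) = \hat{w}_f^T h[x]
    return sum(wj * hj for wj, hj in zip(w, rbf(x, centers, b)))

def train_identifier(samples, centers, b, eta, epochs):
    w = [0.0] * len(centers)
    for _ in range(epochs):
        for x in samples:
            h = rbf(x, centers, b)
            f_tilde = f_true(x) - sum(wj * hj for wj, hj in zip(w, h))  # cf. (18)
            # (20)-(21): w <- w + eta * f_tilde * h
            w = [wj + eta * f_tilde * hj for wj, hj in zip(w, h)]
    return w

def mse(w, samples, centers, b):
    return sum((f_true(x) - predict(w, x, centers, b)) ** 2 for x in samples) / len(samples)

centers = [-2.0 + 0.5 * j for j in range(9)]   # l = 9 neurons
samples = [-2.0 + 0.1 * i for i in range(41)]
b = 0.5
eta = 1.0 / len(centers)                       # eta <= 1 / l_f since ||h||^2 <= l
w = train_identifier(samples, centers, b, eta, epochs=200)
mse_before = mse([0.0] * len(centers), samples, centers, b)
mse_after = mse(w, samples, centers, b)
```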

Inspired by the work in [36], we state the following assumption before proceeding.

Assumption 2.

The identification error of the neural network is assumed to be upper bounded as

Δ_{f} {(x)}^{T} Δ_{f} (x) \leq \tilde{w_{f}} {(k)}^{T} \tilde{w_{f}} (k) h {[x (k)]}^{T} h [x (k)]

(22)

3.2. RBF-NN Steady-State Controller Design

We use an RBF-NN to approximate the steady-state control $u_{d}(k)$ directly; that is, an inverse-dynamics NN is established for this approximation [16,19].

The steady-state control $u_{d}(k)$ is designed through the RBF-NN approximation

u_{d} (k) = {\hat{w_{d}}}^{T} (k) h [x_{d} (k)]

(23)

where $\hat{w}_{d}(k)$ is the estimated (actual) neural network weight, $h[x_{d}(k)]$ is the output of the hidden layer, and $u_{d}(k)$ is the output of the RBF-NN.

Let the ideal steady-state control $u_{d}^{*}(k)$ be

u_{d}^{*} (k) = w_{d}^{* T} h [x_{d} (k)] + ε_{u}

(24)

where $w_{d}^{*}$ is the optimal neural network weight and $ε_{u}$ is the approximation error vector. Assume that $x_{d}(k+1)$ is the reference output of the system at time $k+1$, and neglect external disturbances. Then the control input $u_{d}^{*}(k)$ satisfies

L [x_{d} (k), u_{d}^{*} (k)] - x_{d} (k + 1) = 0

(25)

where $L[x_{d}(k), u_{d}^{*}(k)] = f[x_{d}(k)] + g[x_{d}(k)] u_{d}^{*}(k)$. Thus, we can define the approximation-state error $e_{m}(k)$ as

e_{m} (k + 1) = L [x_{d} (k), u_{d} (k)] - x_{d} (k + 1)

(26)

where $L[x_{d}(k), u_{d}(k)] = f[x_{d}(k)] + g[x_{d}(k)] u_{d}(k)$.

Subtracting (24) from (23) yields

\begin{aligned} u_{d}(k) - u_{d}^{*}(k) &= \hat{w}_{d}^{T}(k) h[x_{d}(k)] - w_{d}^{*T} h[x_{d}(k)] - ε_{u} \\ &= \tilde{w}_{d}^{T}(k) h[x_{d}(k)] - ε_{u} \end{aligned}

(27)

where $\tilde{w}_{d}(k) = \hat{w}_{d}(k) - w_{d}^{*}$ is the weight approximation error.

The weights are updated by the following adaptive law:

\hat{w}_{d}(k+1) = \hat{w}_{d}(k) - γ \left[ h(x_{d}(k)) e_{m}(k+1) + σ \hat{w}_{d}(k) \right]

(28)

where $γ > 0$ and $σ > 0$ are constants.
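The adaptive law (28) can be sketched as follows. The plant `f`, `g` and the bounded reference map `phi` are hypothetical and scalar. Setting $g = 1$ gives $g_{1} = 1$. The basis follows Assumption 4. With $l_{d} = 9$ and an assumed $k_{0} = 2$, the choice $γ = 0.05$, $σ = 0.01$ satisfies conditions (c) and (d) of Theorem 1. The $σ \hat{w}_{d}$ term acts as a leakage that is intended to keep the weights bounded.

```python
import math

def rbf(x, centers, b):
    # Gaussian basis h_j(x) = exp(-(x - c_j)^2 / (2 b^2)), cf. Assumption 4
    return [math.exp(-(x - c) ** 2 / (2.0 * b * b)) for c in centers]

def f(x):
    return 0.5 * math.sin(x)          # hypothetical drift (unknown to the controller)

def g(x):
    return 1.0                        # hypothetical input gain, so g_1 = 1

def phi(xd):
    return 1.0 - 1.6 * xd * xd        # hypothetical reference map; keeps x_d in [-1, 1]

def run(gamma, sigma, steps, centers, b, xd0=0.1):
    w = [0.0] * len(centers)          # \hat{w}_d(0)
    xd, abs_errors = xd0, []
    for _ in range(steps):
        h = rbf(xd, centers, b)
        ud = sum(wj * hj for wj, hj in zip(w, h))       # steady-state control (23)
        xd_next = phi(xd)
        e_m = f(xd) + g(xd) * ud - xd_next              # approximation-state error (26)
        # Adaptive law (28): w <- w - gamma * (h * e_m + sigma * w)
        w = [wj - gamma * (hj * e_m + sigma * wj) for wj, hj in zip(w, h)]
        abs_errors.append(abs(e_m))
        xd = xd_next
    return w, abs_errors

centers = [-1.0 + 0.25 * j for j in range(9)]   # l_d = 9 neurons
w_d, errs = run(gamma=0.05, sigma=0.01, steps=3000, centers=centers, b=0.3)
early = sum(errs[:20]) / 20
late = sum(errs[-200:]) / 200
```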

Assumption 3.

Within the set $Ω_{ε}$, the optimal neural network weights $w_{d}^{*}$ and the approximation error are bounded:

\|w_{d}^{*}\| \leq w_{m}, \quad \|ε_{u}\| \leq ε_{l}

(29)

3.3. Near-Optimal Feedback Controller Design

In this subsection, we present an ADP algorithm based on Bellman's principle of optimality. The goal is to find the approximately optimal feedback control law that minimizes the approximate cost function.

First, considering the HJB Equation (7) and the optimal feedback control (8), the cost function $J_{e}(e(k), u_{e}(k))$ at iteration $i$ is written as $V^{i}(e(k))$. The iteration starts from the initial cost function $V^{0}(e(k)) = 0$, which need not represent the optimal value function. The first iterate of the cost function is then obtained by solving

\begin{aligned} V^{1}(e(k)) &= \min_{u_{e}(k)} \left\{ e^{T}(k) Q e(k) + u_{e}^{T}(k) R u_{e}(k) + V^{0}(e(k+1)) \right\} \\ &= e^{T}(k) Q e(k) + (u_{e}^{1}(k))^{T} R u_{e}^{1}(k) \end{aligned}

(30)

where the minimizing control law is

\begin{aligned} u_{e}^{1}(k) &= \arg\min_{u_{e}(k)} \left\{ e^{T}(k) Q e(k) + u_{e}^{T}(k) R u_{e}(k) + V^{0}(e(k+1)) \right\} \\ &= -\frac{1}{2} R^{-1} g_{e}^{T}(k) \frac{\partial V^{0}(e(k+1))}{\partial e(k+1)} \end{aligned}

(31)

Hence, for $i = 1, 2, \dots$, the ADP algorithm is realized by iterating between

\begin{aligned} V^{i}(e(k)) &= \min_{u_{e}(k)} \left\{ e^{T}(k) Q e(k) + u_{e}^{T}(k) R u_{e}(k) + V^{i-1}(e(k+1)) \right\} \\ &= e^{T}(k) Q e(k) + (u_{e}^{i}(k))^{T} R u_{e}^{i}(k) + V^{i-1}(e(k+1)) \end{aligned}

(32)

and

\begin{aligned} u_{e}^{i+1}(k) &= \arg\min_{u_{e}(k)} \left\{ e^{T}(k) Q e(k) + u_{e}^{T}(k) R u_{e}(k) + V^{i}(e(k+1)) \right\} \\ &= -\frac{1}{2} R^{-1} g_{e}^{T}(k) \frac{\partial V^{i}(e(k+1))}{\partial e(k+1)} \end{aligned}

(33)

where the index $i$ counts the iterations of the control law and the cost function, i.e., the number of weight-parameter updates of the networks, while $k$ is the time index of the state. Note that, in the iterative ADP algorithm, the iteration index of the cost function and the control law increases from zero to infinity.
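The iteration (32)–(33) can be illustrated in a hypothetical scalar linear-quadratic case, $e(k+1) = a e(k) + b u_{e}(k)$. Here every iterate stays quadratic, $V^{i}(e) = p_{i} e^{2}$, and the iteration proceeds as follows:

- It starts from $V^{0} = 0$, so $p_{0} = 0$.
- The minimization in (33) gives $u_{e}^{i+1} = -K_{i} e$ with $K_{i} = p_{i} a b / (r + p_{i} b^{2})$.
- Substituting that control into (32) yields the next cost coefficient.

This closed form replaces the neural networks purely for illustration. Under these assumptions the iterates converge to the solution of the scalar discrete-time algebraic Riccati equation.

```python
import math

def value_iteration(a, b, q, r, iters=200):
    """Scalar value iteration V^i(e) = p_i * e**2 for e(k+1) = a e(k) + b u_e(k)."""
    p, gain = 0.0, 0.0                                   # V^0(e) = 0
    for _ in range(iters):
        gain = p * a * b / (r + p * b * b)               # u_e^{i+1} = -gain * e, from (33)
        p = q + r * gain ** 2 + p * (a - b * gain) ** 2  # next cost iterate, from (32)
    return p, gain

a, b, q, r = 1.2, 1.0, 1.0, 1.0      # hypothetical, open-loop unstable (|a| > 1)
p, gain = value_iteration(a, b, q, r)
# For b = q = r = 1 the Riccati equation reduces to p**2 - a**2 * p - 1 = 0
p_riccati = (a * a + math.sqrt(a ** 4 + 4.0)) / 2.0
```

The resulting feedback $u_{e} = -K e$ makes the closed-loop error dynamics $e(k+1) = (a - bK) e(k)$ stable.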

To develop the feedback control policy, we use neural networks to construct the critic network and the actor network.

The critic network approximates the cost function $V^{i}(e(k))$, and its output is denoted as

{\hat{V}}^{i} (e (k)) = w_{c i}^{T} z (ν_{c i}^{T} e (k)) + ε_{c} (k)

(34)

where

- $z(ν_{ci}^{T} e(k))$ is the hidden-layer function;
- $w_{ci}$ is the hidden-to-output weight of the critic network;
- $ν_{ci}$ is the input-layer weight of the critic network;
- $ε_{c}(k)$ is the approximation error.

We then define the prediction error of the critic network as

e_{c i} (k) = {\hat{V}}^{i} (e (k)) - V^{i} (e (k))

(35)

The error function of the critic network is defined as

E_{c i} (k) = \frac{1}{2} e_{c i}^{T} (k) e_{c i} (k) .

(36)

Using the gradient descent method, the weights of the critic network are updated as

w_{c i} (k + 1) = w_{c i} (k) - α_{c} [\frac{\partial E_{c i} (k)}{\partial w_{c i} (k)}]

(37)

where $α_{c} > 0$ is the learning rate of the critic network.
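A minimal sketch of one critic update (35)–(37) for a scalar tracking error follows. The tanh activation, the fixed input weights `v_c`, and the numerical target are assumptions. In an HDP-style implementation the target $V^{i}(e(k))$ would typically be formed as $r(k) + \hat{V}^{i-1}(e(k+1))$, which is not shown here. Because $\partial E_{ci}/\partial w_{ci} = e_{ci}(k) z$, one step scales the prediction error by $1 - α_{c}\|z\|^{2}$. The error therefore shrinks exactly when $0 < α_{c} < 2/\|z\|^{2}$, in line with condition (e) of Theorem 1.

```python
import math

def hidden(v, e):
    # z(nu^T e) for a scalar error e; the tanh activation is an assumption
    return [math.tanh(vj * e) for vj in v]

def critic_step(w_c, v_c, e, v_target, alpha_c):
    z = hidden(v_c, e)
    v_hat = sum(wj * zj for wj, zj in zip(w_c, z))          # critic output (34), eps_c omitted
    e_c = v_hat - v_target                                   # prediction error (35)
    # (37): dE_c/dw_c = e_c * z for E_c = e_c**2 / 2; input weights v_c stay fixed
    w_new = [wj - alpha_c * e_c * zj for wj, zj in zip(w_c, z)]
    return w_new, e_c

v_c, w_c = [0.5, -1.0, 1.5], [0.1, 0.2, -0.3]   # hypothetical weights
e, v_target = 0.8, 0.9                           # hypothetical sample and target
z_sq = sum(zj * zj for zj in hidden(v_c, e))
alpha_c = 1.5 / z_sq                             # inside the bound alpha_c <= 2 / ||z||^2
w_1, err_0 = critic_step(w_c, v_c, e, v_target, alpha_c)
_, err_1 = critic_step(w_1, v_c, e, v_target, alpha_c)   # err_1 = (1 - 1.5) * err_0
```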

The input of the actor network is the tracking error $e(k)$, and its output is the optimal feedback control $u_{e}(k)$. The output can be formulated as

{\hat{u}}_{e}^{i} (k) = w_{a i}^{T} z (v_{a i}^{T} e (k)) + ε_{a} (k),

(38)

where

- $z(ν_{ai}^{T} e(k))$ is the hidden-layer function;
- $w_{ai}$ is the hidden-to-output weight of the actor network;
- $ν_{ai}$ is the input-layer weight of the actor network;
- $ε_{a}(k)$ is the approximation error.

Similar to the critic network, the prediction error of the actor network is defined as

e_{a i} (k) = {\hat{u}}_{e}^{i} (k) - u_{e}^{i} (k)

(39)

where $\hat{u}_{e}^{i}(k)$ is the approximate optimal feedback control and $u_{e}^{i}(k)$ is the optimal feedback control at iteration $i$.

The error function of the actor network is defined as

E_{a i} (k) = \frac{1}{2} e_{a i}^{T} (k) e_{a i} (k)

(40)

The weights of the actor network are updated by gradient descent in the same way as those of the critic network:

w_{a i} (k + 1) = w_{a i} (k) - β_{a} [\frac{\partial E_{a i} (k)}{\partial w_{a i} (k)}],

(41)

where $β_{a} > 0$ is the learning rate of the actor network and $i$ is the iteration index of the weight-parameter updates.
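The actor update (39)–(41) can be sketched in the same style. The minimizer in (33) depends on $e(k+1)$, which in turn depends on the control. This sketch therefore uses one common simplification, which is an assumption: it predicts $e(k+1)$ from (6) using the current actor output, then forms the target from the critic gradient. The functions `f_e` and `g_e`, the tanh activation, and all weights are hypothetical. With $β_{a} = 1/\|z\|^{2}$, a single step moves the actor output exactly onto the target.

```python
import math

def hidden(v, e):
    # z(nu^T e) for a scalar error e (tanh activation assumed)
    return [math.tanh(vj * e) for vj in v]

def critic_grad(w_c, v_c, e):
    # d V_hat / d e for V_hat(e) = sum_j w_c[j] * tanh(v_c[j] * e)
    return sum(wj * vj * (1.0 - math.tanh(vj * e) ** 2) for wj, vj in zip(w_c, v_c))

def actor_step(w_a, v_a, w_c, v_c, e, f_e, g_e, r, beta_a):
    z = hidden(v_a, e)
    u_hat = sum(wj * zj for wj, zj in zip(w_a, z))                   # actor output (38)
    e_next = f_e(e) + g_e(e) * u_hat                                  # predicted e(k+1) via (6)
    u_target = -0.5 / r * g_e(e) * critic_grad(w_c, v_c, e_next)      # target from (33)
    e_a = u_hat - u_target                                            # prediction error (39)
    w_new = [wj - beta_a * e_a * zj for wj, zj in zip(w_a, z)]        # gradient step (41)
    return w_new, u_target

def f_e(e):
    return 0.9 * e               # hypothetical internal dynamics

def g_e(e):
    return 1.0                   # hypothetical input gain

w_c, v_c = [0.4, 0.3, 0.2], [1.0, -0.5, 2.0]
w_a, v_a = [0.0, 0.1, -0.1], [0.8, -1.2, 0.4]
e, r = 0.5, 1.0
z = hidden(v_a, e)
beta_a = 1.0 / sum(zj * zj for zj in z)
w_a_new, u_target = actor_step(w_a, v_a, w_c, v_c, e, f_e, g_e, r, beta_a)
u_new = sum(wj * zj for wj, zj in zip(w_a_new, z))
```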

3.4. Stability Analysis

In this subsection, we prove stability using Lyapunov stability theory.

Assumption 4.

The radial basis function $h(t) = \exp\left( -\frac{\|x(t) - c(t)\|^{2}}{2 b^{2}} \right)$ attains the maximum value $h_{\max} = 1$, where $c(t)$ is the center and $b$ is the width of the radial basis function. Let the number of neurons be $l \in \{l_{f}, l_{d}\}$ for the radial basis vectors $h \in \{h[x(k)], h[x_{d}(k)]\}$, respectively. Then

\begin{aligned} |h_{i}| \leq 1, \quad \|h\| \leq \sqrt{l} \leq l, \\ h^{T} h = \|h\|^{2} \leq l \end{aligned}

(42)

Hence, $\|h\|^{2}$ is at most $l$ for a hidden layer with $l$ neurons. Accordingly, we take $l_{f}$ as the maximum value of $\|h[x(k)]\|^{2}$ for the hidden layer of the identifier $\hat{f}(x(k))$. Likewise, we take $l_{d}$ as the maximum value of $\|h[x_{d}(k)]\|^{2}$ for the hidden layer of the steady-state controller $u_{d}(k)$.
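The bounds in (42) are easy to confirm numerically. The sketch below assumes a hypothetical 3 × 3 grid of centers ($l = 9$) for a two-dimensional input and a width of $b = 0.5$. It evaluates $h$ on a grid of inputs and checks $|h_{i}| \leq 1$ and $\|h\|^{2} \leq l$. It then takes $l_{f} = l$ as a conservative bound when choosing $η$ via condition (a) of Theorem 1.

```python
import math

def rbf_vector(x, centers, b):
    # h_j(x) = exp(-||x - c_j||^2 / (2 b^2)), cf. Assumption 4
    return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * b * b))
            for c in centers]

centers = [(cx, cy) for cx in (-1.0, 0.0, 1.0) for cy in (-1.0, 0.0, 1.0)]  # l = 9
b = 0.5
l = len(centers)
inputs = [(0.25 * i, 0.25 * j) for i in range(-8, 9) for j in range(-8, 9)]
max_sq_norm = max(sum(hj * hj for hj in rbf_vector(x, centers, b)) for x in inputs)
max_component = max(max(rbf_vector(x, centers, b)) for x in inputs)
eta = 1.0 / l      # conservative choice satisfying 0 < eta <= 1 / l_f with l_f = l
```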

Lemma 2.

Based on (25), the approximation-state error and the weight approximation error (27) satisfy

\tilde{w}_{d}^{T}(k) h[x_{d}(k)] = \frac{e_{m}(k+1)}{L_{u}} + ε_{u}

(43)

where

- $e_{m}(k)$ is the error of the approximation state $x_{d}(k)$;
- $L_{u} = \left. \frac{\partial L}{\partial u} \right|_{u = ξ}$ with $ξ \in [u_{d}^{*}(k), u_{d}(k)]$;
- $g_{1} \geq \left| \frac{\partial L}{\partial u} \right| > ϵ > 0$, where $g_{1}$ and $ϵ$ are positive constants.

Proof.

Subtracting $w_{d}^{*}$ from both sides of (28), we obtain

\tilde{w_{d}} (k + 1) = \tilde{w_{d}} (k) - γ [h [x_{d} (k)] e_{m} (k + 1) + σ \hat{w_{d}} (k)]

(44)

Combining (25) and (27) with the mean value theorem, we can obtain

\begin{aligned} L[x_{d}(k), u_{d}(k)] &= L\left[ x_{d}(k), u_{d}^{*}(k) + \tilde{w}_{d}^{T}(k) h[x_{d}(k)] - ε_{u} \right] \\ &= L[x_{d}(k), u_{d}^{*}(k)] + \left[ \tilde{w}_{d}^{T}(k) h[x_{d}(k)] - ε_{u} \right] L_{u} \\ &= x_{d}(k+1) + \left[ \tilde{w}_{d}^{T}(k) h[x_{d}(k)] - ε_{u} \right] L_{u} \end{aligned}

(45)

Further combining (45) with (26), we can obtain

\begin{aligned} e_{m}(k+1) &= L[x_{d}(k), u_{d}(k)] - x_{d}(k+1) \\ &= \left[ \tilde{w}_{d}^{T}(k) h[x_{d}(k)] - ε_{u} \right] L_{u} \end{aligned}

(46)

After rearranging, we can obtain

{\tilde{w_{d}}}^{T} (k) h [x_{d} (k)] = \frac{e_{m} (k + 1)}{L_{u}} + ε_{u}

(47)

The proof is completed. □

Lemma 3.

For simplicity of analysis, Assumption 3 and Young’s inequality give the following inequality between $ε_{u}$ and $e_{m}(k+1)$:

- 2 ε_{u}^{T} e_{m} (k + 1) ⩽ k_{0} ∥ ε_{u} ∥^{2} + \frac{1}{k_{0}} {∥ e_{m} (k + 1) ∥}^{2}

(48)

where $k_{0}$ is a positive constant.
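For completeness, (48) follows from expanding a nonnegative square, which is the form of Young’s inequality used here. The last step below applies the bound $\|ε_{u}\| \leq ε_{l}$ from Assumption 3.

```latex
0 \leq \left\| \sqrt{k_{0}}\, ε_{u} + \frac{1}{\sqrt{k_{0}}}\, e_{m}(k+1) \right\|^{2}
  = k_{0} \|ε_{u}\|^{2} + 2 ε_{u}^{T} e_{m}(k+1) + \frac{1}{k_{0}} \|e_{m}(k+1)\|^{2}
\;\Longrightarrow\;
-2 ε_{u}^{T} e_{m}(k+1) \leq k_{0} \|ε_{u}\|^{2} + \frac{1}{k_{0}} \|e_{m}(k+1)\|^{2}
  \leq k_{0} ε_{l}^{2} + \frac{1}{k_{0}} \|e_{m}(k+1)\|^{2}
```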

Theorem 1.

For the optimal tracking problem (1)–(3), let the networks be used as follows:

- the RBF-NN identifier (16) approximates $f(x(k))$;
- the RBF-NN (23) approximates the steady-state controller $u_{d}(k)$;
- the feedforward networks (34) and (38) approximate the cost function $J(e(k), u(k))$ and the feedback controller $u_{e}(k)$, respectively.

Assume that the parameters satisfy the following inequalities:

\begin{aligned} &(a)\; 0 < η \leq \frac{1}{l_{f}} \\ &(b)\; 0 < g_{1} \leq k_{0} \\ &(c)\; 0 < (1 + σ) l_{d} γ \leq \frac{1}{g_{1}} - \frac{1}{k_{0}} \\ &(d)\; 0 < (l_{d} + σ) γ \leq 1 \\ &(e)\; α_{c} \leq 2 / \|z(ν_{ci}^{T} e(k))\|^{2} \\ &(f)\; β_{a} \leq 2 / \|z(ν_{ai}^{T} e(k))\|^{2} \end{aligned}

(49)

where

- $η$ is the learning rate of the RBF-NN identifier;
- $σ$ and $γ$ are the update parameters of the steady-state controller network weights;
- $α_{c}$ is the learning rate of the critic network;
- $β_{a}$ is the learning rate of the actor network;
- $z(ν_{ci}^{T} e(k))$ and $z(ν_{ai}^{T} e(k))$ are the hidden-layer functions of the critic network and the actor network, respectively.

Then, the closed-loop approximation-error system (6) is asymptotically stable when the parameter estimation errors are bounded.

Proof.

Considering the following positive definite Lyapunov function candidate

\begin{aligned} J(k) &= J_{1}(k) + J_{2}(k) + J_{3}(k) + J_{4}(k) \\ &= \frac{1}{η} \tilde{w}_{f}(k)^{T} \tilde{w}_{f}(k) + \frac{1}{g_{1}} e_{m}(k)^{T} e_{m}(k) + \frac{1}{γ} \tilde{w}_{d}(k)^{T} \tilde{w}_{d}(k) + w_{ci}(k)^{T} w_{ci}(k) + w_{ai}(k)^{T} w_{ai}(k) \end{aligned}

(50)

where

- $J_{1}(k) = \frac{1}{η} \tilde{w}_{f}(k)^{T} \tilde{w}_{f}(k)$,
- $J_{2}(k) = \frac{1}{g_{1}} e_{m}(k)^{T} e_{m}(k) + \frac{1}{γ} \tilde{w}_{d}(k)^{T} \tilde{w}_{d}(k)$,
- $J_{3}(k) = w_{ci}(k)^{T} w_{ci}(k)$, and
- $J_{4}(k) = w_{ai}(k)^{T} w_{ai}(k)$.

First, consider $J_{1}(k) = \frac{1}{η} \tilde{w}_{f}(k)^{T} \tilde{w}_{f}(k)$. Along the update (20)–(21), the weight error evolves as $\tilde{w}_{f}(k+1) = \tilde{w}_{f}(k) + η h[x(k)] \tilde{f}(x(k))^{T}$. Using this together with (18), the difference of $J_{1}(k)$ is

\begin{aligned} ΔJ_{1}(k) &= J_{1}(k+1) - J_{1}(k) \\ &= \frac{1}{η} \tilde{w}_{f}(k+1)^{T} \tilde{w}_{f}(k+1) - \frac{1}{η} \tilde{w}_{f}(k)^{T} \tilde{w}_{f}(k) \\ &= 2 \tilde{f}(x(k))^{T} \tilde{w}_{f}(k)^{T} h[x(k)] + η \tilde{f}(x(k))^{T} \tilde{f}(x(k)) h[x(k)]^{T} h[x(k)] \\ &= η \left[ \tilde{w}_{f}(k)^{T} h[x(k)] + Δ_{f}[x] \right]^{T} \left[ \tilde{w}_{f}(k)^{T} h[x(k)] + Δ_{f}[x] \right] h[x(k)]^{T} h[x(k)] \\ &\quad - 2 \left[ \tilde{w}_{f}(k)^{T} h[x(k)] + Δ_{f}[x] \right]^{T} \tilde{w}_{f}(k)^{T} h[x(k)] \\ &= η \left[ \|\tilde{w}_{f}(k)^{T} h[x(k)]\|^{2} + 2 Δ_{f}[x]^{T} \tilde{w}_{f}(k)^{T} h[x(k)] + Δ_{f}[x]^{T} Δ_{f}[x] \right] h[x(k)]^{T} h[x(k)] \\ &\quad - 2 \|\tilde{w}_{f}(k)^{T} h[x(k)]\|^{2} - 2 Δ_{f}[x]^{T} \tilde{w}_{f}(k)^{T} h[x(k)] \end{aligned}

(51)

Using Assumptions 2 and 4 together with (42), (51) can be bounded as

\begin{aligned} ΔJ_{1}(k) \leq {} & η l_{f}^{2} \|\tilde{w}_{f}(k)\|^{2} - 2 l_{f} \|\tilde{w}_{f}(k)\|^{2} + η l_{f}^{2} \|\tilde{w}_{f}(k)\|^{2} \\ & + 2 η l_{f} \tilde{w}_{f}(k)^{T} h[x(k)] Δ_{f}[x] - 2 \tilde{w}_{f}(k)^{T} h[x(k)] Δ_{f}[x] \\ \leq {} & \|\tilde{w}_{f}(k)\|^{2} (2 η l_{f}^{2} - 2 l_{f}) + (2 l_{f} η - 2) \tilde{w}_{f}(k)^{T} h[x(k)] Δ_{f}[x] \\ \leq {} & \|\tilde{w}_{f}(k)\|^{2} (4 l_{f}^{2} η - 4 l_{f}) \end{aligned}

(52)

Next, consider $J_{2}(k) = \frac{1}{g_{1}} e_{m}(k)^{T} e_{m}(k) + \frac{1}{γ} \tilde{w}_{d}(k)^{T} \tilde{w}_{d}(k)$. Using (44), its difference is

\begin{aligned} ΔJ_{2}(k) &= J_{2}(k+1) - J_{2}(k) \\ &= \frac{1}{g_{1}} \left[ e_{m}(k+1)^{T} e_{m}(k+1) - e_{m}(k)^{T} e_{m}(k) \right] + \frac{1}{γ} \tilde{w}_{d}(k+1)^{T} \tilde{w}_{d}(k+1) - \frac{1}{γ} \tilde{w}_{d}(k)^{T} \tilde{w}_{d}(k) \\ &= \frac{1}{γ} \left\{ \tilde{w}_{d}(k) - γ \left[ h[x_{d}(k)] e_{m}(k+1) + σ \hat{w}_{d}(k) \right] \right\}^{T} \left\{ \tilde{w}_{d}(k) - γ \left[ h[x_{d}(k)] e_{m}(k+1) + σ \hat{w}_{d}(k) \right] \right\} \\ &\quad - \frac{1}{γ} \tilde{w}_{d}(k)^{T} \tilde{w}_{d}(k) + \frac{1}{g_{1}} \left[ e_{m}(k+1)^{T} e_{m}(k+1) - e_{m}(k)^{T} e_{m}(k) \right] \\ &= \frac{1}{g_{1}} \left[ e_{m}(k+1)^{T} e_{m}(k+1) - e_{m}(k)^{T} e_{m}(k) \right] - 2 \tilde{w}_{d}(k)^{T} h[x_{d}(k)] e_{m}(k+1) \\ &\quad - 2 σ \tilde{w}_{d}(k)^{T} \hat{w}_{d}(k) + γ h^{T}[x_{d}(k)] h[x_{d}(k)] e_{m}(k+1)^{T} e_{m}(k+1) + γ σ^{2} \hat{w}_{d}(k)^{T} \hat{w}_{d}(k) \\ &\quad + 2 γ σ \hat{w}_{d}(k)^{T} h[x_{d}(k)] e_{m}(k+1) \end{aligned}

(53)

where

\begin{aligned} 2 σ \tilde{w}_{d}(k)^{T} \hat{w}_{d}(k) &= σ \tilde{w}_{d}(k)^{T} \left[ \tilde{w}_{d}(k) + w_{d}^{*} \right] + σ \left[ \hat{w}_{d}(k) - w_{d}^{*} \right]^{T} \hat{w}_{d}(k) \\ &= σ \left[ \|\tilde{w}_{d}(k)\|^{2} + \|\hat{w}_{d}(k)\|^{2} + \tilde{w}_{d}(k)^{T} w_{d}^{*} - w_{d}^{*T} \hat{w}_{d}(k) \right] \\ &= σ \left[ \|\tilde{w}_{d}(k)\|^{2} + \|\hat{w}_{d}(k)\|^{2} - \|w_{d}^{*}\|^{2} \right], \\ γ h^{T}[x_{d}(k)] h[x_{d}(k)] e_{m}(k+1)^{T} e_{m}(k+1) &\leq γ l_{d} \|e_{m}(k+1)\|^{2}, \\ 2 γ σ \hat{w}_{d}^{T}(k) h[x_{d}(k)] e_{m}(k+1) &\leq γ σ l_{d} \left[ \|\hat{w}_{d}(k)\|^{2} + \|e_{m}(k+1)\|^{2} \right], \\ γ σ^{2} \hat{w}_{d}^{T}(k) \hat{w}_{d}(k) &= γ σ^{2} \|\hat{w}_{d}(k)\|^{2} \end{aligned}

(54)

Considering (26) and $g_{1} \geq |\partial L / \partial u| > ϵ > 0$, we can deduce

\begin{matrix} \frac{1}{g_{1}} - \frac{2}{L_{u}} ⩽ \frac{1}{g_{1}} - \frac{2}{g_{1}} = - \frac{1}{g_{1}} < 0 \end{matrix}

(55)

Recall Lemmas 2 and 3; substituting (54) into (53) yields

\begin{aligned} ΔJ_{2}(k) \leq {} & \left[ -\frac{1}{g_{1}} + γ (1 + σ) l_{d} + \frac{1}{k_{0}} \right] \|e_{m}(k+1)\|^{2} + σ (γ l_{d} + γ σ - 1) \|\hat{w}_{d}(k)\|^{2} \\ & - \frac{1}{g_{1}} \|e_{m}(k)\|^{2} - σ \|\tilde{w}_{d}(k)\|^{2} + σ w_{m}^{2} + k_{0} ε_{l}^{2} \\ = {} & -\left[ \frac{1}{g_{1}} - (1 + σ) l_{d} γ - \frac{1}{k_{0}} \right] \|e_{m}(k+1)\|^{2} + σ \left[ (l_{d} + σ) γ - 1 \right] \|\hat{w}_{d}(k)\|^{2} \\ & - \frac{1}{g_{1}} \left[ \|e_{m}(k)\|^{2} - β \right] - σ \|\tilde{w}_{d}(k)\|^{2} \end{aligned}

(56)

where $β = g_{1}(σ w_{m}^{2} + k_{0} ε_{l}^{2})$ is a positive constant.

Next, consider the Lyapunov function candidate

J_{3} (k) + J_{4} (k) = w_{c i} {(k)}^{T} w_{c i} (k) + w_{a i} {(k)}^{T} w_{a i} (k) .

(57)

Taking the first difference of (57) yields

\begin{aligned} Δ J_{3}(k) + Δ J_{4}(k) &= \left[w_{ci}(k + 1)^{T} w_{ci}(k + 1) + w_{ai}(k + 1)^{T} w_{ai}(k + 1)\right] - \left[w_{ci}(k)^{T} w_{ci}(k) + w_{ai}(k)^{T} w_{ai}(k)\right] \\ &= a_{c} ‖e_{ci}(k)‖^{2}\left(-2 + a_{c} ‖z(v_{ci}^{T} e(k))‖^{2}\right) \\ &\quad + β_{a} ‖e_{ai}(k)‖^{2}\left(-2 + β_{a} ‖z(v_{ai}^{T} e(k))‖^{2}\right). \end{aligned}

(58)

Finally, combining (52), (56), and (58) gives $Δ J(k)$:

\begin{aligned} Δ J(k) &= Δ J_{1}(k) + Δ J_{2}(k) + Δ J_{3}(k) + Δ J_{4}(k) \\ &⩽ 4 ‖\tilde{w_{f}}(k)‖^{2}(l_{f}^{2} η - l_{f}) - σ ‖\tilde{w_{d}}(k)‖^{2} - \left[\frac{1}{g_{1}} - (1 + σ) l_{d} γ - \frac{1}{k_{0}}\right] ‖e_{m}(k + 1)‖^{2} \\ &\quad + σ\left[(l_{d} + σ)γ - 1\right] ‖\hat{w_{d}}(k)‖^{2} - \frac{1}{g_{1}}\left[‖e_{m}(k)‖^{2} - β\right] + a_{c} ‖e_{ci}(k)‖^{2}\left(-2 + a_{c} ‖z(v_{ci}^{T} e(k))‖^{2}\right) \\ &\quad + β_{a} ‖e_{ai}(k)‖^{2}\left(-2 + β_{a} ‖z(v_{ai}^{T} e(k))‖^{2}\right). \end{aligned}

(59)

Based on the above analysis, if $‖e_{m}(k)‖^{2} ⩾ β$ and the parameters are selected to satisfy

\begin{matrix} 0 < η ⩽ \frac{1}{l_{f}} \\ 0 < g_{1} ⩽ k_{0} \\ 0 < (1 + σ) l_{d} γ ⩽ \frac{1}{g_{1}} - \frac{1}{k_{0}} \\ 0 < (l_{d} + σ) γ ⩽ 1 \\ a_{c} ⩽ 2 / ‖z(v_{ci}^{T} e(k))‖^{2} \\ β_{a} ⩽ 2 / ‖z(v_{ai}^{T} e(k))‖^{2} \end{matrix}

(60)

then $Δ J(k) ⩽ 0$. □
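The inequalities in (60) are easy to check mechanically for a candidate parameter set. The helper below is a hypothetical sketch, not part of the original method; `z_c_sq` and `z_a_sq` are illustrative names for $‖z(v_{ci}^{T} e(k))‖^{2}$ and $‖z(v_{ai}^{T} e(k))‖^{2}$ at one particular step $k$, and the learning-rate checks additionally assume the rates are positive, since (60) as printed states only the upper bounds. Recall that $Δ J(k) ⩽ 0$ is only guaranteed outside the ball $‖e_{m}(k)‖^{2} < β$.

```python
def check_conditions_60(eta, g1, k0, gamma, sigma, l_f, l_d,
                        a_c=None, z_c_sq=None, beta_a=None, z_a_sq=None):
    """Report which inequalities of (60) hold for a given parameter set.

    z_c_sq / z_a_sq stand for ||z(v_ci^T e(k))||^2 and ||z(v_ai^T e(k))||^2
    evaluated at a single step k; the learning-rate checks also require a
    positive rate (an assumption, since (60) gives only the upper bound).
    """
    conds = {
        "eta": 0 < eta <= 1 / l_f,
        "g1_k0": 0 < g1 <= k0,
        "gamma_sigma_1": 0 < (1 + sigma) * l_d * gamma <= 1 / g1 - 1 / k0,
        "gamma_sigma_2": 0 < (l_d + sigma) * gamma <= 1,
    }
    if a_c is not None:
        conds["a_c"] = 0 < a_c <= 2 / z_c_sq
    if beta_a is not None:
        conds["beta_a"] = 0 < beta_a <= 2 / z_a_sq
    return conds


# Parameter values used in Section 4, taking l_f = l_d = 9 as that section does.
print(check_conditions_60(eta=0.1, g1=5, k0=10, gamma=0.01, sigma=0.001, l_f=9, l_d=9))
```

With these values, $(1 + σ) l_{d} γ = 0.09009 ⩽ 0.1$ and $(l_{d} + σ)γ = 0.09001 ⩽ 1$, so every entry of the returned dictionary should be `True`.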

Figure 1 shows the working process of the proposed control technique. Given $x(k)$, $u_{d}(k)$, and $u_{e}^{i}(k)$, the RBF-NN identifier and the steady-state controller yield the estimated error $e(k + 1)$. The steady-state controller $u_{d}(k)$ corresponds to the reference trajectory $x_{d}(k)$. The ADP algorithm yields the near-optimal feedback controller ${\hat{u}}_{e}^{i}(k)$, from which the actual control input $u(k) = {\hat{u}}_{e}^{i}(k) + u_{d}(k)$ and the next state $x(k + 1)$ are obtained. In addition, the tracking error $e(k)$ is computed from $x_{d}(k)$ and $x(k)$, and $e(k + 1)$ then follows. In this way, the system state is driven to track the reference trajectory.
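The data flow of Figure 1 can be summarized as a single loop iteration. The sketch below is a simplified illustration, not the authors' implementation: the steady-state controller, the actor, and the plant are passed in as hypothetical callables, the RBF-NN identifier is omitted, and the tracking error is assumed to be $e(k) = x(k) - x_{d}(k)$.

```python
from typing import Callable, List, Sequence

Vec = List[float]


def closed_loop_step(
    x: Sequence[float],
    x_d: Sequence[float],
    steady_state_ctrl: Callable[[Sequence[float]], Vec],
    actor: Callable[[Sequence[float]], Vec],
    plant: Callable[[Sequence[float], Sequence[float]], Vec],
):
    """One pass through the loop of Figure 1 (hypothetical interfaces)."""
    e = [xi - xdi for xi, xdi in zip(x, x_d)]   # tracking error, assumed e(k) = x(k) - x_d(k)
    u_d = steady_state_ctrl(x_d)                # steady-state control u_d(k)
    u_e = actor(e)                              # near-optimal feedback control \hat{u}_e^i(k)
    u = [a + b for a, b in zip(u_e, u_d)]       # u(k) = \hat{u}_e^i(k) + u_d(k)
    x_next = plant(x, u)                        # next state x(k+1)
    return x_next, u, e


# Toy usage with placeholder components (not the networks from the paper).
x_next, u, e = closed_loop_step(
    x=[1.0, 1.0], x_d=[0.0, 0.0],
    steady_state_ctrl=lambda xd: [0.0, 0.0],
    actor=lambda err: [-0.5 * v for v in err],
    plant=lambda x, u: [xi + ui for xi, ui in zip(x, u)],
)
print(x_next)  # [0.5, 0.5]
```

Passing the components as callables keeps the loop structure separate from how each block (RBF-NN, actor network, true or identified plant) is realized.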

4. Simulation

In this section, we present simulation results for the proposed method and compare it with the method of [36]. A discrete-time nonlinear system, taken from [24], is used to demonstrate the effectiveness of the proposed tracking control method. We assume that the smooth nonlinear drift function $f \in R^{n}$ is unknown and that the input function $g \in R^{n \times m}$ is known. The functions $f[x(k)]$ and $g[x(k)]$ are given as

\begin{aligned} f[x(k)] &= \begin{bmatrix} f_{1}[x(k)] \\ f_{2}[x(k)] \end{bmatrix} = \begin{bmatrix} -\sin(0.5 x_{2}(k)) x_{1}^{2}(k) \\ -\cos(1.4 x_{2}(k)) \sin(0.9 x_{1}(k)) \end{bmatrix} \\ g[x(k)] &= \begin{bmatrix} (x_{1}(k))^{2} + 1.5 & 0.1 \\ 0 & 0.2\left((x_{1}(k) + x_{2}(k))^{2} + 1\right) \end{bmatrix} \end{aligned}

(61)
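For reproduction, the benchmark (61) can be coded directly. This sketch assumes that system (1) has the input-affine form $x(k+1) = f[x(k)] + g[x(k)]u(k)$, consistent with $f$ being a drift term and $g$ an input matrix; only the formulas for $f$ and $g$ are taken from (61), and the function names are illustrative.

```python
import math


def f(x):
    """Drift term f[x(k)] from (61); treated as unknown by the controller."""
    x1, x2 = x
    return [-math.sin(0.5 * x2) * x1 ** 2,
            -math.cos(1.4 * x2) * math.sin(0.9 * x1)]


def g(x):
    """Input matrix g[x(k)] from (61); assumed known."""
    x1, x2 = x
    return [[x1 ** 2 + 1.5, 0.1],
            [0.0, 0.2 * ((x1 + x2) ** 2 + 1)]]


def step(x, u):
    """x(k+1) = f[x(k)] + g[x(k)] u(k), assuming the affine form of system (1)."""
    fx, gx = f(x), g(x)
    return [fx[i] + sum(gx[i][j] * u[j] for j in range(2)) for i in range(2)]


print(step([0.0, 0.0], [1.0, 1.0]))
```

At the initial state $x(0) = [0, 0]^{T}$ the drift vanishes, so the first step is driven entirely by $g[x(0)]u(0)$.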

The reference trajectory $x_{d}(k)$ for the above system is defined as

x_{d}(k) = \begin{bmatrix} 0.25 \sin(10^{-3} k) \\ 0.25 \cos(10^{-3} k) \end{bmatrix}

(62)

Here, $u(k) = [u_{1}(k), u_{2}(k)]^{T} \in R^{2}$ is the control input. In the simulation, the time axis (in seconds) is $t = k t_{s}$ with $k = 1, \dots, 10{,}000$ and sampling period $t_{s} = 0.001$ s.
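The reference (62) and the time axis can be generated as below; the constant names are illustrative. Because the per-step frequency $10^{-3}$ equals $t_{s}$, $x_{d}(k) = 0.25[\sin t, \cos t]^{T}$ with $t = k t_{s}$ in seconds, so the 10 s horizon covers roughly 1.6 periods of a circle of radius 0.25.

```python
import math

TS = 0.001        # sampling period t_s in seconds
N_STEPS = 10_000  # k = 1, ..., 10,000


def x_ref(k):
    """Reference trajectory x_d(k) from (62)."""
    return [0.25 * math.sin(1e-3 * k), 0.25 * math.cos(1e-3 * k)]


time = [k * TS for k in range(1, N_STEPS + 1)]
ref = [x_ref(k) for k in range(1, N_STEPS + 1)]
```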

4.1. Simulation Result of the Proposed Method

In this subsection, we present the simulation results of the proposed method.

First, to handle the unknown dynamics, two RBF networks are used to build the RBF identifier and the RBF steady-state controller. Each RBF network has a three-layer structure with two input neurons, nine hidden neurons, and two output neurons. The centers $c_{i}$ and widths $b_{j}$ of the radial basis functions are chosen as

c_{i} = [\begin{matrix} - 2 & - 1.5 & - 1.0 & - 0.5 & 0 & 0.5 & 1.0 & 1.5 & 2 \\ - 2 & - 1.5 & - 1.0 & - 0.5 & 0 & 0.5 & 1.0 & 1.5 & 2 \end{matrix}]

and $b_{j} = [b_{1}, b_{2}] = [2, 2]$, and the initial weights $w_{0}$ are random numbers in $(0, 1)$. The RBF identifier uses the weight update law (21) to update $\hat{w_{f}}$, so that the unknown function $f$ is identified from the input/output data $x(k)$. The RBF steady-state controller uses the reference trajectory data $x_{d}$ and the weight update law (28) to update $\hat{w_{d}}$, thereby identifying the steady-state controller $u_{d}$. Because $g_{1} ⩾ \partial L / \partial u = 1$, we select $g_{1} = 5$. According to the condition $0 < g_{1} ⩽ k_{0}$ of Theorem 1, we select $k_{0} = 10$. For the parameter $η$, because the hidden layer has nine neurons, $l = 9$ and $0 < η ⩽ 1/l = 1/9$; we select $η = 0.1$. For the parameters $γ$ and $σ$, Theorem 1 requires

0 < (1 + σ) 9 γ ⩽ \frac{1}{5} - \frac{1}{10} = \frac{1}{10} = 0.10

and $0 < (9 + σ) γ ⩽ 1$, so we select $γ = 0.01$ and $σ = 0.001$. The initial state is set as $x(0) = [0, 0]^{T}$. The RBF networks were trained with 10,000 steps of acquired data. Figures 2 and 3 show the RBF-NN identifier approximating the unknown dynamics $\tilde{f} = [{\tilde{f}}_{1}[x(k)], {\tilde{f}}_{2}[x(k)]]^{T}$.
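The hidden layer of the RBF networks can be sketched as follows. The Gaussian form $h_{j}(x) = \exp(-‖x - c_{j}‖^{2} / (2b^{2}))$ and the reading of $b_{j}$ as a common width $b = 2$ for all nine nodes are assumptions; the text gives only the centers and widths. Each column of $c_{i}$ is taken as one center. Under this form every basis output lies in $(0, 1]$, so $‖h(x)‖^{2} ⩽ 9$, which is consistent with taking $l = 9$.

```python
import math
import random

# Centers from the text: each column of the 2x9 matrix c_i is one center.
CENTERS = [(v, v) for v in (-2, -1.5, -1.0, -0.5, 0, 0.5, 1.0, 1.5, 2)]
WIDTH = 2.0  # assumption: a common width b = 2 for every hidden node


def rbf_features(x):
    """Gaussian hidden-layer outputs h(x); each lies in (0, 1]."""
    return [math.exp(-((x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2) / (2 * WIDTH ** 2))
            for c in CENTERS]


def rbf_output(weights, x):
    """Network output w^T h(x); weights is a 9x2 nested list (hidden x output)."""
    h = rbf_features(x)
    return [sum(weights[j][i] * h[j] for j in range(len(h))) for i in range(2)]


rng = random.Random(0)
w0 = [[rng.random() for _ in range(2)] for _ in range(9)]  # initial weights in [0, 1)
y0 = rbf_output(w0, [0.0, 0.0])
```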

Then, based on the ADP algorithm derived from Bellman's principle of optimality, Equation (6) was used to obtain the tracking error $e$ and the optimal feedback control $u_{e}$, which were used to train the critic network and the actor network, respectively. Meanwhile, the resulting control input $u = {\hat{u}}_{e}^{i} + u_{d}$ was applied to system (1), and this loop was repeated until the value function $V^{i}(e(k))$ converged and the tracking error $e(k)$ approached zero. The weighting matrices of the performance index are selected as $Q = I$ and $R = I$, where $I$ is the identity matrix of appropriate dimension. The actor network and the critic network use the same parameter settings: initial weights chosen randomly in $(-10, 10)$, an input layer with 2 neurons, a hidden layer with 15 neurons, an output layer with 2 neurons, and a learning rate of 0.1. The hidden layer uses the tansig activation function, the output layer uses the purelin activation function, and the networks are trained with the trainlm (Levenberg–Marquardt) algorithm. With these settings, the actor network and the critic network were trained for 5000 steps to reach the specified accuracy of $1 \times 10^{-9}$. Figure 4 shows the system control inputs $u = [u_{1}, u_{2}]^{T}$, and Figures 5 and 6 show the state trajectory $x$ and the reference trajectory $x_{d}$.
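The actor and critic settings above can be illustrated with a plain forward pass of a 2–15–2 network. This is a sketch under stated assumptions: the layers include biases (as MATLAB feedforward networks do by default), the stage cost is assumed to be the quadratic form $e^{T}Qe + u_{e}^{T}Ru_{e}$ with $Q = R = I$, and Levenberg–Marquardt training (trainlm) is not reproduced.

```python
import math
import random


def tansig(n):
    # MATLAB's tansig(n) = 2/(1 + exp(-2n)) - 1, which equals tanh(n).
    return math.tanh(n)


def init_layer(n_in, n_out, rng, lo=-10.0, hi=10.0):
    """Weights and biases drawn uniformly from (lo, hi), as in the text."""
    W = [[rng.uniform(lo, hi) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.uniform(lo, hi) for _ in range(n_out)]
    return W, b


def forward(net, x):
    """2-15-2 network: tansig hidden layer, purelin (linear) output layer."""
    (W1, b1), (W2, b2) = net
    hidden = [tansig(sum(w * xi for w, xi in zip(row, x)) + bi) for row, bi in zip(W1, b1)]
    return [sum(w * hj for w, hj in zip(row, hidden)) + bi for row, bi in zip(W2, b2)]


def stage_cost(e, u_e):
    """Utility e^T Q e + u_e^T R u_e with Q = R = I (assumed quadratic form)."""
    return sum(v * v for v in e) + sum(v * v for v in u_e)


rng = random.Random(0)
actor = (init_layer(2, 15, rng), init_layer(15, 2, rng))
critic = (init_layer(2, 15, rng), init_layer(15, 2, rng))
u_e = forward(actor, [0.1, -0.2])  # feedback control for a sample tracking error
```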

Based on the above results, the proposed tracking technique achieves relatively satisfactory tracking performance for partially unknown discrete-time nonlinear systems.

4.2. Comparison with Other Methods

In this subsection, we compare our method with that of [36], which uses a BP neural network to approximate the unknown system dynamics. For the comparison, we use the same system dynamics (61) and desired tracking trajectory (62), the initial state $x(0) = [0, 0]^{T}$, and the weighting matrices $R = Q = I$.

To begin with, an NN identifier is established by a three-layer BP neural network, which is chosen to have a 4–10–2 structure with four input neurons, eight hidden neurons, and two output neurons. The feedforward-neuro-controller is also established by a three-layer BP NN, which is chosen to have a 2–10–2 structure with two input neurons, eight hidden neurons, and two output neurons. For the NN identifier and the feedforward-neuro-controller, the parameter settings of the neural networks are identical: the hidden layers use the sigmoidal function tansig, the output layers use the linear function purelin, the learning rate is 0.1, and the initial weights are random numbers in $(0, 1)$.

For the actor network and the critic network, we also use identical parameter settings: a 2–15–2 structure, initial weights chosen randomly in $(-10, 10)$, and a learning rate of 0.1. The hidden layer uses the tansig activation function, the output layer uses the purelin activation function, and the networks are trained with trainlm. The training accuracy goal is $1 \times 10^{-9}$. Figures 7 and 8 show the state trajectory $x$ and the reference trajectory $x_{d}$ obtained with the tracking control method of [36].

Comparing Figures 5 and 6 (our method) with Figures 7 and 8 (the method of [36]) shows that our method tracks the reference trajectory more closely.

5. Conclusions

This paper proposes an effective scheme for finding a near-optimal tracking controller for a class of partially unknown discrete-time nonlinear systems based on RBF-NNs. To handle the unknown dynamics, two RBF-NNs are used to approximate the unknown function and the steady-state controller. Moreover, the ADP algorithm is introduced to obtain the optimal feedback control for the tracking error dynamics, and two feedforward neural networks are used to approximate the cost function and the feedback controller. Finally, simulation results show relatively satisfactory tracking performance, which demonstrates the effectiveness of the optimal tracking control technique. In future work, we may consider completely unknown dynamics and event-triggering conditions.

Author Contributions

Methodology, D.X.; software, Y.L.; investigation, J.H.; resources, Y.M. All the authors contributed equally to the development of the research. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant No. 61463002, the Guizhou Province Natural Science Foundation of China under Grant No. Qiankehe Fundamentals-ZK[2021] General 322, and the Doctoral Foundation of Guangxi University of Science and Technology Grant No. Xiaokebo 22z04.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhai, D.; Lu, A.-Y.; Dong, J.; Zhang, Q. Adaptive Tracking Control for a Class of Switched Nonlinear Systems under Asynchronous Switching. IEEE Trans. Fuzzy Syst. 2018, 6, 1245–1256.
2. Zhang, Y.; Wang, X.; Wang, Z. Discrete-Time Adaptive Fuzzy Finite-Time Tracking Control for Uncertain Nonlinear Systems. IEEE Trans. Fuzzy Syst. 2024, 2, 649–659.
3. Zhao, D.; Wang, Z.; Ho, D.W.C.; Wei, G. Observer-Based PID Security Control for Discrete Time-Delay Systems under Cyber-Attacks. IEEE Trans. Syst. Man Cybern. Syst. 2021, 6, 3926–3938.
4. Lewis, F.L.; Vrabie, D.; Syrmos, V.L. Optimal Control, 3rd ed.; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2012.
5. Mannava, A.; Balakrishnan, S.N.; Tang, L.; Landers, R.G. Optimal tracking control of motion systems. IEEE Trans. Control Syst. Technol. 2012, 20, 1548–1558.
6. Sharma, R.; Tewari, A. Optimal nonlinear tracking of spacecraft attitude maneuvers. IEEE Trans. Control Syst. Technol. 2013, 12, 677–682.
7. Bellman, R.E. Dynamic Programming; Princeton University Press: Princeton, NJ, USA, 1957.
8. Lewis, F.L.; Syrmos, V.L. Optimal Control; Wiley: New York, NY, USA, 1995.
9. Narendra, K.S.; Parthasarathy, K. Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Netw. 1990, 1, 4–27.
10. Narendra, K.S.; Mukhopadhyay, S. Adaptive control of nonlinear multivariable systems using neural networks. In Proceedings of the 32nd IEEE Conference on Decision and Control, San Antonio, TX, USA, 15–17 December 1993; pp. 737–752.
11. Poggio, T.; Girosi, F. Networks for approximation and learning. Proc. IEEE 1990, 78, 1481–1497.
12. Hartman, E.J.; Keeler, J.D.; Kowalski, J.M. Layered Neural Networks with Gaussian Hidden Units as Universal Approximations. Neural Comput. 1990, 2, 210–215.
13. Park, J. Universal approximation using radial basis function networks. Neural Comput. 1993, 3, 246–257.
14. Powell, M.J.D. Radial Basis Functions for Multivariable Interpolation: A Review. In Algorithms for Approximation; Mason, J.C., Cox, M.G., Eds.; Clarendon Press: Oxford, UK, 1987; pp. 143–167.
15. Nelles, O.; Isermann, R. A Comparison between RBF Networks and Classical Methods for Identification of Nonlinear Dynamic Systems. In Adaptive Systems in Control and Signal Processing; Pergamon: Oxford, UK, 1995.
16. Ge, S.S.; Zhang, J.; Lee, T.H. Adaptive MNN control for a class of non-affine NARMAX systems with disturbances. Syst. Control Lett. 2004, 53, 1–12.
17. Kobayashi, H.; Ozawa, R. Adaptive neural network control of tendon-driven mechanisms with elastic tendons. Automatica 2003, 1509–1519.
18. Zhang, H.; Luo, Y.; Liu, D. Neural-Network-Based Near-Optimal Control for a Class of Discrete-Time Affine Nonlinear Systems with Control Constraints. IEEE Trans. Neural Netw. 2009, 20, 1490–1503.
19. Liu, J.K. Radial Basis Function (RBF) Neural Network Control for Mechanical Systems: Design, Analysis and Matlab Simulation; Springer: Berlin/Heidelberg, Germany, 2013.
20. Powell, W. Approximate Dynamic Programming: Solving the Curses of Dimensionality; Wiley Series in Probability and Statistics; Wiley: Hoboken, NJ, USA, 2007.
21. Vrabie, D.; Lewis, F. Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems. Neural Netw. 2009, 4, 237–246.
22. Liu, D.; Yang, X.; Li, H. Adaptive optimal control for a class of continuous-time affine nonlinear systems with unknown internal dynamics. Neural Comput. Appl. 2013, 11, 1843–1850.
23. Bhasin, S.; Kamalapurkar, R.; Johnson, M.; Vamvoudakis, K.; Lewis, F.L.; Dixon, W.E. A novel actor–critic–identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 2013, 1, 82–92.
24. Dierks, T.; Jagannathan, S. Optimal tracking control of affine nonlinear discrete-time systems with unknown internal dynamics. In Proceedings of the 48th IEEE Conference on Decision and Control (CDC) Held Jointly with 2009 28th Chinese Control Conference, Shanghai, China, 15–18 December 2009.
25. Al-Tamimi, A.; Lewis, F.L.; Abu-Khalaf, M. Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof. IEEE Trans. Syst. Man Cybern. B Cybern. 2008, 8, 943–949.
26. Prokhorov, D.V.; Wunsch, D.C. Adaptive critic designs. IEEE Trans. Neural Netw. 1997, 9, 997–1007.
27. Luo, Y.; Zhang, H. Approximate optimal control for a class of nonlinear discrete-time systems with saturating actuators. Prog. Natural Sci. 2008, 1023–1029.
28. Dierks, T.; Jagannathan, S. Online optimal control of nonlinear discrete-time systems using approximate dynamic programming. Control Theory Appl. 2011, 361–369.
29. Si, J.; Wang, Y.-T. Online learning control by association and reinforcement. IEEE Trans. Neural Netw. 2001, 5, 264–276.
30. Liu, D.; Wei, Q. Policy Iteration Adaptive Dynamic Programming Algorithm for Discrete-Time Nonlinear Systems. IEEE Trans. Neural Netw. Learn. Syst. 2014, 621–634.
31. Kiumarsi, B.; Lewis, F.L. Actor-critic-based optimal tracking for partially unknown nonlinear discrete-time systems. IEEE Trans. Neural Netw. Learn. Syst. 2017, 140–151.
32. Ren, L.; Zhang, G.; Mu, C. Data-based H_∞ control for the constrained-input nonlinear systems and its applications in chaotic circuit systems. IEEE Trans. Circuits Syst. 2020, 8, 2791–2802.
33. Zhao, F.; Gao, W.; Liu, T.; Jiang, Z.-P. Event-triggered robust adaptive dynamic programming with output-feedback for large-scale systems. IEEE Trans. Control Netw. Syst. 2023, 8, 63–74.
34. Zhang, H.; Cui, L.; Zhang, X.; Luo, Y. Data-Driven Robust Approximate Optimal Tracking Control for Unknown General Nonlinear Systems Using Adaptive Dynamic Programming Method. IEEE Trans. Neural Netw. Learn. Syst. 2011, 11, 2226–2236.
35. Lin, Q.; Wei, Q.; Liu, D. A novel optimal tracking control scheme for a class of discrete-time nonlinear systems using generalised policy iteration adaptive dynamic programming algorithm. Int. J. Syst. Sci. 2017, 48, 525–534.
36. Huang, Y.; Liu, D. Neural-network-based optimal tracking control scheme for a class of unknown discrete-time nonlinear systems using iterative ADP algorithm. Neurocomputing 2014, 125, 46–56.
37. Zhang, Y.; Zhao, B.; Liu, D.; Zhang, S. Event-triggered control of discrete-time zero-sum games via deterministic policy gradient adaptive dynamic programming. IEEE Trans. Syst. Man Cybern. Syst. 2022, 8, 4823–4835.
38. Zhu, Y.; Zhao, D.; He, H. Invariant adaptive dynamic programming for discrete-time optimal control. IEEE Trans. Syst. Man Cybern. Syst. 2020, 11, 3959–3971.
39. Zhang, H.; Wei, Q.; Luo, Y. A Novel Infinite-Time Optimal Tracking Control Scheme for a Class of Discrete-Time Nonlinear Systems via the Greedy HDP Iteration Algorithm. IEEE Trans. Syst. Man Cybern. Syst. 2008, 38, 937–942.
40. Li, J.; Chai, T.; Lewis, F.L.; Ding, Z.; Jiang, Y. Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems. IEEE Trans. Neural Netw. Learn. Syst. 2019, 5, 1308–1320.
41. Sun, C.; Li, X.; Sun, Y. A parallel framework of adaptive dynamic programming algorithm with off-policy learning. IEEE Trans. Neural Netw. Learn. Syst. 2021, 8, 3578–3587.
42. Duan, J.; Guan, Y.; Li, S.E.; Ren, Y.; Sun, Q.; Cheng, B. Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors. IEEE Trans. Neural Netw. Learn. Syst. 2022, 11, 6584–6598.
43. Song, S.; Zhu, M.; Dai, X.; Gong, D. Model-Free Optimal Tracking Control of Nonlinear Input-Affine Discrete-Time Systems via an Iterative Deterministic Q-Learning Algorithm. IEEE Trans. Neural Netw. Learn. Syst. 2024, 1, 999–1012.

Figure 1. The structure schematic of the proposed technique.

Figure 2. The unknown function $f_{1}(x)$ and its approximation ${\tilde{f}}_{1}(x)$.


Figure 3. The unknown function $f_{2}(x)$ and its approximation $\tilde{f_{2}}(x)$.


Figure 4. The system control inputs $u_{1}$ and $u_{2}$.


Figure 5. The state trajectory $x_{1}$ and the reference trajectory $x_{1d}$ using our tracking control method.


Figure 6. The state trajectory $x_{2}$ and the reference trajectory $x_{2d}$ using our tracking control method.


Figure 7. The state trajectory $x_{1}$ and the reference trajectory $x_{1d}$ using the tracking control method of [36].


Figure 8. The state trajectory $x_{2}$ and the reference trajectory $x_{2d}$ using the tracking control method of [36].


Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Huang, J.; Xu, D.; Li, Y.; Ma, Y. Near-Optimal Tracking Control of Partially Unknown Discrete-Time Nonlinear Systems Based on Radial Basis Function Neural Network. Mathematics 2024, 12, 1146. https://doi.org/10.3390/math12081146
