Optimal Robust Control of Nonlinear Systems with Unknown Dynamics via NN Learning with Relaxed Excitation

Rui Luo; Zhinan Peng; Jiangping Hu

doi:10.3390/e26010072

Abstract

This paper presents an adaptive learning structure based on neural networks (NNs) to solve the optimal robust control problem for nonlinear continuous-time systems with unknown dynamics and disturbances. First, a system identifier is introduced to approximate the unknown system matrices and disturbances with the help of NNs and parameter estimation techniques. To obtain the optimal solution of the optimal robust control problem, a critic learning control structure is proposed to compute the approximate controller. Unlike existing identifier-critic NNs learning control methods, novel adaptive tuning laws based on Kreisselmeier’s regressor extension and mixing technique are designed to estimate the unknown parameters of the two NNs under relaxed persistence of excitation conditions. Furthermore, theoretical analysis is also given to prove the significant relaxation of the proposed convergence conditions. Finally, effectiveness of the proposed learning approach is demonstrated via a simulation study.

Keywords:

optimal robust control; nonlinear systems; parameter estimation; neural networks learning; relaxed PE conditions

1. Introduction

In the past several decades, much attention has been given to

H_{\infty}

control problems, wherein the aim is to eliminate the influence of disturbance on the system.

H_{\infty}

control mainly focuses on designing a robust controller to regulate and stabilize the system. In practice, we should not only focus on the control performance, but also consider the optimization of the system [1,2]. Therefore, optimal

H_{\infty}

control problems will always be a hot research topic.

Adaptive dynamic programming (ADP), as one of the optimal control methods, has emerged as a powerful tool through which to deal with the optimal control problems of all kinds of dynamic systems [3]. The ADP framework combines dynamic programming and neural network approximation, and it has strong learning and adaptive ability. In this sense, ADP has rapidly developed in the control community in recent years. Generally speaking, the core of controller designs mainly concentrates on solving a Hamilton–Jacobi–Bellman (HJB) equation for nonlinear systems or an algebraic Riccati equation for linear systems [4]. Unfortunately, the HJB equation contains nonlinear, partial differential parts, which are difficult to solve directly [5]. Therefore, many efforts have been made for finding approximate solutions to the HJB equation using iterative or learning methods. Regarding the case of iterative methods, the ADP can be classed into two categories: value iteration (VI) [6,7] and policy iteration (PI) [8,9]. Regarding the case of learning-based methods, neural network (NN) approximation is generally utilized to learn the optimal or suboptimal solutions to the HJB equation. The standard learning frameworks include the following: actor–critic NNs and only-critic NNs. However, the abovementioned pieces of literature require partial or full model information in the controller design loop. To avoid relying on system models, many data-driven or model-free methods have been developed for improving the existing ADP frameworks, that is, data-driven RL [7], integral RL (IRL) [10,11], and system identification-based ADP methods [12,13,14].

More recently, excellent development has been realized with the use of ADP for the robust controller designs of optimal

H_{\infty}

control problems [15,16,17]. The main way through which to solve optimal

H_{\infty}

control problems is to model such problems as a two-player zero-sum game (min–max optimization problem), where the controller and the disturbance are viewed as players that try to find a controller to minimize the performance index function in worst-case disturbance conditions [18,19]. However, the disadvantage of zero-sum games is in judging the existence of the saddle point, which is generally difficult to judged. In order to overcome this issue, an indirect method motivated by [20] was developed by formulating an optimal regulation for a nominal system with new designs of the cost/value function [21]. For instance, Yang et al. proposed an event-triggered robust control strategy for nonlinear systems [22] using the indirect method. Xue et al. studied a tracking control problem for partial continuous-time systems with uncertainties and constraints [23] by transforming the robust control problem into an optimal regulation of nominal systems.

However, the existing results on

H_{\infty}

optimal control designs have two main characteristics: (1) their controller designs are based on the assumption that the complete or partial knowledge of the system dynamics are known in advance; however, (2) to address this issue, some system identification methods have been proposed, such as the identifier–critic- or identifier–actor–critic-based designs of

H_{\infty}

optimal control. However, it is generally required that the persistence of excitation (PE) condition must be satisfied to ensure the learning performance of the weight updating of neural networks, which is difficult to check online in practice [18,19,23]. Therefore, how to weaken the PE condition is also the research motivation of this paper.

From the abovementioned observations and considerations, in this paper, we propose a novel online parameter estimation method based on an identifier–critic learning control framework for the

H_{\infty}

optimal control of nonlinear systems that have unknown dynamics with relaxed PE conditions. The contributions of our work can be summarized as follows:

A new online identifier–critic learning control framework with a relaxed PE condition is proposed to address robust control for unknown continuous-time systems subject to unknown disturbances. To reconstruct the information of the system dynamics, neural networks combined with the linear regressor method are established to approximate the unknown system dynamics and disturbances.
The approach in this paper is different from the existing weight adaption laws [18,19,23], where the PE condition is needed to ensure the learning performance of the NN’s weight parameters. However, such a condition is difficult to check online, and a general way through which to satisfy this condition is to add external noise to the controller, which may lead to the instability of the system. To overcome this issue, a Kreisselmeier regressor extension and mixing (KREM)-based weight adaption law is designed for identifier–critic NNs with new convergence conditions.
Weak PE properties of new convergence conditions are analyzed rigorously compared to traditional PE conditions. Moreover, the theoretical results indicate that the closed-loop system’s stability and the convergence of identifier–critic learning are guaranteed.

The remainder of this article is organized as follows. In Section 2, some preliminaries are introduced and the optimal robust control problem of nonlinear continuous-time systems is given. Then, a system identifier design with a relaxed PE condition is constructed in Section 3. Section 4 gives the critic NN design for robust control under a relaxed PE condition. Theoretical analyses of the weak PE properties under new convergence conditions and the stability of the closed-loop systems are given in Section 5. The simulation results are provided in Section 6. Some conclusions are summarized in Section 7.

2. Preliminaries and Problem Formulation

In this section, some notation and definitions are first introduced. Then, the optimal robust control problem of the nonlinear continuous-time systems is described.

2.1. Preliminaries

To facilitate readability, some notations are listed.

$λ (\cdot)$	Eigenvalue of a matrix
${\cdot}^{*}$	Adjoint matrix
$I_{n}$	Identity matrix
$t r (\cdot)$	Trace of a matrix
$λ_{M} (\cdot)$	Maximum eigenvalues
$λ_{m} (\cdot)$	Minimum eigenvalues

The following definitions will be used in the sequel.

Definition 1

(

P e r s i s t e n c e o f E x c i t a t i o n

[24]). A bounded signal

ψ (t)

is said to be PE, if there exist positive constants T and

δ_{1}

such that

\int_{t}^{t + T} ψ (r) ψ^{T} (r) d r \geq δ_{1} I .

For clarity, we indicate that

ψ (t)

satisfies the PE condition using the notation

ψ (t) \in P E

; otherwise,

ψ (t) \notin P E

.

Definition 2

(

U n i f o r m l y U l t i m a t e l y B o u n d e d

[24]). The time function

x (t)

is said to be uniformly ultimately bounded (UUB) on a compact set

Ω_{x}

, if, for all

x (t_{0}) = x_{0} \in Ω_{x}

, there exists a

δ_{2} > 0

and a number

T (δ_{2}, x_{0})

such that

∥ x (t) ∥ < δ_{2}

for all

t \geq t_{0} + T

.

2.2. Problem Formulation

Consider the nonlinear continuous-time (NCT) systems with disturbances described by the following dynamics:

\begin{matrix} \dot{x} (t) = f (x) + g (x) u (t) + G (x) d (t), \end{matrix}

(1)

where

x (t) \in R^{n}

and

u (t) \in R^{m}

denote the system state and control input, respectively.

d (t) \in R^{q}

represents the external disturbance. The terms

f (x) \in R^{n}

,

g (x) \in R^{n \times m}

, and

G (x) \in R^{n \times q}

are the drift dynamics, input dynamics, and disturbance injection dynamics, respectively. In this study,

f (x)

,

g (x)

, and

G (x)

are assumed to be unknown. Furthermore, it is assumed that

f (x)

,

g (x)

, and

G (x)

are Lipschitz continuous with

f (0) = 0

, and that the system (1) is stabilizing and controllable.

The goal of this study is to solve an

H_{\infty}

control problem for the system (1). This problem can be equivalently transformed into a two-player zero-sum game, where the control input

u (t)

acts as the minimizing player and the disturbance

d (t)

acts as the maximizing player. The solution to the

H_{\infty}

control problem corresponds to a saddle point in the game, which stabilizes the equilibrium of the two-player zero-sum game.

Define the infinite-horizon performance index function as

\begin{matrix} V (x, u, d) = \int_{t}^{\infty} (x^{T} Q x + u^{T} R u - κ^{2} d^{T} d) d τ, \end{matrix}

(2)

where

κ > 0

,

V (0) = 0

, and Q and R are symmetric positive-definite matrices with appropriate dimensions. Let

u^{★}

be the optimal control input and

d^{★}

be the worst disturbance. Our objective is to find the saddle point

(u^{★}, d^{★})

that optimizes the performance index (2), which can be more precisely clarified by the following inequality:

\begin{matrix} V (u^{★}, d) \leq V (u^{★}, d^{★}) \leq V (u, d^{★}) . \end{matrix}

(3)

We then define the optimal performance index function

V^{★}

as follows:

\begin{matrix} V^{★} (x, u, d) = min_{u} max_{d} \int_{t}^{\infty} (x^{T} Q x + u^{T} R u - κ^{2} d^{T} d) d τ . \end{matrix}

(4)

The Hamiltonian of system (1) can be written as

\begin{matrix} H (V_{x}, x, u, d) = & V_{x}^{T} [f (x) + g (x) u + G (x) d] + x^{T} Q x + u^{T} R u - κ^{2} d^{T} d, \end{matrix}

(5)

where

V_{x} = \partial V / \partial x \in R^{n}

. The Hamilton–Jacobi–Isaacs (HJI) equation related to this game has the form

\begin{matrix} min_{u} max_{d} H (V_{x}^{★}, x, u, d) = 0, \end{matrix}

(6)

where

V_{x}^{★} = \partial V^{★} / \partial x \in R^{n}

. Based on the stationarity condition, the

H_{\infty}

control pair

(u^{★}, d^{★})

for (1) has the following form:

\begin{matrix} u^{★} = - \frac{1}{2} R^{- 1} g^{T} (x) V_{x}^{★} (x), \end{matrix}

(7)

\begin{matrix} d^{★} = \frac{1}{2 κ^{2}} G^{T} (x) V_{x}^{★} (x) . \end{matrix}

(8)

Thus, according to (7) and (8), the HJI Equation (6) can be rewritten as

\begin{matrix} x^{T} Q x & + V_{x}^{★ T} f (x) - \frac{1}{4} V_{x}^{★ T} g (x) R^{- 1} g^{T} (x) V_{x}^{★} + \frac{1}{4 κ^{2}} V_{x}^{★ T} G (x) G^{T} (x) V_{x}^{★} = 0 . \end{matrix}

(9)

Indeed, the HJI Equation (9) represents a highly nonlinear partial differential equation (PDE) and requires complete system information for its resolution. To address these challenges, a new IC framework with relaxed PE conditions will be proposed in the following sections. Furthermore, new adaptive update laws for the identifier and critic NNs are provided with the help of the KREM technique. The block diagram of the proposed control system is shown in Figure 1, and detailed theoretical analysis will be presented in subsequent sections.

Figure 1. Schematic of the proposed control system.

3. System Identifier Design with Relaxed PE Condition

In this section, an NN-based identifier is utilized to reconstruct the unknown system dynamics in (1). The KREM technique is introduced to adjust the identifier weights under relaxed PE conditions. We assume that the unknown system dynamics

f (x)

,

g (x)

, and

G (x)

in (1) are continuous functions defined on compact sets. The NN-based identifier is designed as follows:

f (x) = W_{f} θ_{f} (x) + ϵ_{f},

(10)

g (x) = W_{g} θ_{g} (x) + ϵ_{g},

(11)

G (x) = W_{G} θ_{G} (x) + ϵ_{G},

(12)

where

W_{f} \in R^{n \times d_{f}}

,

W_{g} \in R^{n \times d_{g}}

and

W_{G} \in R^{n \times d_{G}}

are the ideal NN weights;

θ_{f} (x) \in R^{d_{f}}

,

θ_{g} (x) \in R^{d_{g} \times m}

and

θ_{G} (x) \in R^{d_{G} \times q}

are the basis functions; and

ϵ_{f} \in R^{n}

,

ϵ_{g} \in R^{n \times m}

and

ϵ_{G} \in R^{n \times q}

are the reconstruction errors. Then, according to the Weierstrass theorem and the statements in [10], the approximation errors

ϵ_{f}

,

ϵ_{g}

, and

ϵ_{G}

can be shown to approach zero as the number of NN neurons

d_{f}

,

d_{g}

, and

d_{G}

increases to infinity.

Before proceeding, it is essential to establish the following underlying assumption.

Assumption 1.

(1): The basis functions $θ_{f} (x)$ , $θ_{g} (x)$ and $θ_{G} (x)$ are bounded, that is, $∥θ_{f} (x)∥ \leq b_{θ_{f}}$ , $∥θ_{g} (x)∥ \leq b_{θ_{g}}$ , $∥θ_{G} (x)∥ \leq b_{θ_{G}}$ , respectively.
(2): The reconstruction errors $ε_{f}$ , $ε_{g}$ and $ε_{G}$ are bounded, that is, $∥ε_{f}∥ \leq b_{ε_{f}}$ , $∥ε_{g}∥ \leq b_{ε_{g}}$ , $∥ε_{G}∥ \leq b_{ε_{G}}$ , respectively.

Using (10)–(12), the system (1) can be rewritten as

\dot{x} = W_{I}^{T} θ_{I} (x, u) + ϵ_{T},

(13)

where

W_{I} = {[W_{f}, W_{g}, W_{G}]}^{T} \in R^{d \times n}

is the augmented weight matrix with

d = d_{f} + d_{g} + d_{G}

, and

θ_{I} (x, u) = {[θ_{f}^{T} (x), u^{T} θ_{g}^{T} (x), d^{T} θ_{G}^{T} (x)]}^{T} \in R^{d}

is the augmented regressor vector.

ϵ_{T} = ϵ_{f} + ϵ_{g} u + ϵ_{G} d \in R^{n}

is the model approximation error.

Note that

\dot{x}

and

W_{I}

are unknown. Therefore, we define the filtered variables

x_{f}

and

θ_{I f}

as

\{\begin{matrix} ρ {\dot{x}}_{f} + x_{f} = x, x_{f} (0) = 0 \\ ρ {\dot{θ}}_{I f} + θ_{I f} = θ_{I}, θ_{I f} (0) = 0 \end{matrix}

(14)

where

ρ \in R > 0

is the filter coefficient. From Equations (13) and (14), we can deduce that

{\dot{x}}_{f} = \frac{x - x_{f}}{ρ} = W_{I}^{T} θ_{I f} + ϵ_{T f},

(15)

where

ϵ_{T f}

denotes the filtered version of

ϵ_{T}

as

ρ {\dot{ϵ}}_{T f} + ϵ_{T f} = ϵ_{T}

. Clearly, (15) is a linear regression equation (LRE), where

{\dot{x}}_{f}

and

θ_{I f}

can be calculated from (14). In the following, we describe how the KREM technique is applied to estimate

W_{I}

by using the measured information

{\dot{x}}_{f}

and

θ_{I f}

.

To approximate the unknown weights

W_{I}

in (15) such that the estimated weights

{\hat{W}}_{I}

converge to their true values under a relaxed PE condition, we aim to construct an extended LRE (E-LRE) based on (15). We define the matrices

P_{I} \in R^{d \times d}

and

Q_{I} \in R^{d \times n}

as follows:

\begin{matrix} {\begin{matrix} P_{I} = H_{I} [θ_{I f} θ_{I f}^{T}], P_{I} (0) = 0 \\ Q_{I} = H_{I} [θ_{I f} {(\frac{x - x_{f}}{ρ})}^{T}], Q_{I} (0) = 0 \end{matrix} \end{matrix}

(16)

where

\begin{matrix} H_{I} = \frac{1}{p + l_{I}} [s] (t) \end{matrix}

with

p = d / d t

,

l_{I} > 0

is a forgetting factor. From (16), we can derive its solution as

{\begin{matrix} P_{I} = \int_{0}^{t} e^{- l_{I} (t - τ)} θ_{I f} (τ) θ_{I f}^{T} (τ) d (τ), \\ Q_{I} = \int_{0}^{t} e^{- l_{I} (t - τ)} θ_{I f} (τ) {(\frac{x (τ) - x_{f} (τ)}{ρ})}^{T} d (τ) . \end{matrix}

(17)

Note that it can be verified that

P_{I}

and

Q_{I}

are bounded for any given bounded

θ_{I}

and x due to the appropriate choice of

l_{I}

. Thus, an E-LRE is obtained

Q_{I} (t) = P_{I} (t) W_{I} + v_{I},

(18)

where

v_{I} = \int_{0}^{t} e^{- l_{I} (t - τ)} θ_{I f} (τ) ϵ_{T f}^{T} (τ) d (τ)

.

To construct an identifier weight error dynamics that achieves better convergence properties, we define the variables

Q_{I} (t) \in R^{d \times n}

,

P_{I} \in R^{d \times d}

, and

V_{I} \in R^{d \times n}

as follows:

\begin{matrix} {\begin{matrix} Q_{I} = P_{I}^{*} Q_{I}, \\ P_{I} = P_{I}^{*} P_{I}, \\ V_{I} = P_{I}^{*} v_{I} . \end{matrix} \end{matrix}

(19)

Then Equation (18) becomes

Q_{I} (t) = P_{I} (t) W_{I} + V_{I} .

(20)

Note that for any square matrix

M \in R^{q \times q}

, we have

M^{*} M = | M | I_{q}

, even if M is not full rank. Thus,

P_{I} = | P_{I} | I_{d} \in R^{d \times d}

. Moreover,

P_{I}

is a scalar diagonal matrix, where (20) can be decoupled into a series of scalar LREs:

{Q_{I}}_{(i, j)} (t) = | P_{I} | (t) {W_{I}}_{(i, j)} + V_{I (i, j)}, i = 1, \dots, d, j = 1, \dots, n,

(21)

where

{Q_{I}}_{(i, j)}

and

{W_{I}}_{(i, j)}

indicate the ith row and jth column of

Q_{I}

and

W_{I}

, respectively.

Then, the estimation algorithm for the unknown identifier NN weights can be designed based on (21) as follows:

{\dot{\hat{W}}}_{I (i, j)} = - γ_{1} | P_{I} | [| P_{I} | {\hat{W}}_{I (i, j)} - Q_{I (i, j)}],

(22)

where

γ_{1} \in R > 0

presents the adaptive learning gain.

The convergence of identifier (22) can be given as follows.

Theorem 1.

Consider the system (13) with the online update law (22); if

| P_{I} | \in P E

, then

(i): for $ϵ_{T} = 0$ , the estimator error ${\tilde{W}}_{I (i, j)}$ converges to zero exponentially;
(ii): for $ϵ_{T} \neq 0$ , the estimator error ${\tilde{W}}_{I (i, j)}$ converges to a compact set around zero.

Proof.

If

| P_{I} | \in P E

, according to Definition 1 we have

\int_{t}^{t + T} | P_{I} |^{2} d r \geq δ_{I} > 0

. Defining the estimation error

{\tilde{W}}_{I (i, j)} = {\hat{W}}_{I (i, j)} - W_{I (i, j)}

,

i = 1, \dots, d

,

j = 1, \dots, n

. Due to (21) and (22), the identifier weight error dynamics can be obtained

{\dot{\tilde{W}}}_{I (i, j)} = - γ_{1} | P_{I} |^{2} {\tilde{W}}_{I (i, j)} + γ_{1} | P_{I} | V_{I (i, j)} .

(23)

Considering the Lyapunov function

V_{I} = 0.5 γ_{1}^{- 1} {\tilde{W}}_{I (i, j)}^{2}

, the derivation of

V_{I}

can be calculated as

\begin{matrix} {\dot{V}}_{I} & = \frac{1}{γ_{1}} {\tilde{W}}_{I (i, j)} {\dot{\tilde{W}}}_{I (i, j)} \\ = - | P_{I} |^{2} {\tilde{W}}_{I (i, j)}^{2} + | P_{I} | {\tilde{W}}_{I (i, j)} V_{I (i, j)} . \end{matrix}

(24)

In fact, when

ϵ_{T} = 0

, (24) can be rewritten as

\begin{matrix} {\dot{V}}_{I} = - {| P_{I} |}^{2} {\tilde{W}}_{I (i, j)}^{2} < - μ_{I} V_{I}, \end{matrix}

(25)

where

μ_{I} = 2 γ_{1} δ_{I} > 0

. According to the Lyapunov theorem, the weight estimation error

{\tilde{W}}_{I (i, j)}

exponentially converges to zero.

When

ϵ_{T} \neq 0

, (24) can be further presented as

\begin{matrix} {\dot{V}}_{I} & = - | P_{I} |^{2} {\tilde{W}}_{I (i, j)}^{2} + | P_{I} | {\tilde{W}}_{I (i, j)} V_{I (i, j)} \\ = - [| P_{I} |^{2} {\tilde{W}}_{I (i, j)} - | P_{I} | V_{I (i, j)}] {\tilde{W}}_{I (i, j)} . \end{matrix}

(26)

According to Assumption 1,

| P_{I} | V_{I (i, j)}

is bounded, denoted as

∥ | P_{I} | V_{I (i, j)} ∥ < b_{P_{I} V_{I}}

. Then,

\begin{matrix} {\dot{V}}_{I} & \leq - [| P_{I} |^{2} | | {\tilde{W}}_{I (i, j)} | | - b_{P_{I} V_{I}}] | | {\tilde{W}}_{I (i, j)} | | . \end{matrix}

(27)

According to the extended Lyapunov theorem, the estimation error

{\tilde{W}}_{I (i, j)}

uniformly ultimately converges to a compact set

{{\tilde{W}}_{I (i, j)} | | | {\tilde{W}}_{I (i, j)} | | \leq b_{P_{I} V_{I}} / p_{I}^{2}}

. □

Remark 1.

In [12], the update law for the unknown weight

W_{I}

was designed based on (18), while the PE condition (i.e.,

θ_{I} \in P E

) was required to ensure convergence. However, satisfying the PE condition is generally challenging. In Theorem 1, we provide a new convergence condition

| P_{I} | \in P E

. Notably, this new condition is significantly superior to the conventional PE condition for two reasons. (1) We theoretically prove that

| P_{I} | \in P E

is much weaker than

θ_{I} \in P E

, as detailed in Section 5. (2)

| P_{I} |

is directly related to the determinant of the matrix

P_{I} (t)

. Therefore, checking

| P_{I} | \in P E

online becomes feasible by calculating the determinant of

P_{I} (t)

. In contrast, assessing the standard PE condition directly online is not possible [18,19,23].

Based on the above analysis, the unknown information

f (x)

,

g (x)

, and

G (x)

can be estimated using (13) and (22). This allows for the reconstruction of the completely unknown system dynamics. In order to obtain the optimal

H_{\infty}

control pair, the critic NN will be introduced to learn the solution of the HJB equation in the subsequent section.

4. Critic NN Design for $H_{\infty}$ Control under Relaxed PE Condition

In this section, the performance index will be approximated via a critic NN to obtain the optimal

H_{\infty}

control pair. The KREM algorithm will be continually utilized to design the update law of critic NN under the relaxed PE condition. Firstly, based on the above identifier, the system (1) can be represented as

\dot{x} = {\hat{W}}_{f} θ_{f} (x) + {\hat{W}}_{g} θ_{g} (x) u + {\hat{W}}_{G} θ_{G} (x) d (t) + ϵ_{I} + ϵ_{T},

(28)

where

{\hat{W}}_{f}

,

{\hat{W}}_{g}

and

{\hat{W}}_{G}

are the estimated values of

W_{f}

,

W_{g}

and

W_{G}

, respectively.

ϵ_{I} = {\tilde{W}}_{I} θ_{I}

denotes the identifier error. And, the Hamiltonian (5) can be further written as

\begin{matrix} H = & V_{x}^{T} [{\hat{W}}_{f} θ_{f} (x) + {\hat{W}}_{g} θ_{g} (x) u + {\hat{W}}_{G} θ_{G} (x) d (t) + ϵ_{I} + ϵ_{T}] + x^{T} Q x + u^{T} R u - κ^{2} d^{T} d . \end{matrix}

(29)

Then, the HJI Equation (6) becomes

\begin{matrix} 0 & = min_{u} max_{d} [H (V_{x}^{★}, x, u^{★}, d^{★})] \\ = V_{x}^{★ T} [{\hat{W}}_{f} θ_{f} (x) + {\hat{W}}_{g} θ_{g} (x) u^{★} + {\hat{W}}_{G} θ_{G} (x) d^{★} (t) + ϵ_{I} + ϵ_{T}] + x^{T} Q x + u^{★ T} R u^{★} - κ^{2} d^{★ T} d^{★} . \end{matrix}

(30)

Therefore, based on (30), the

H_{\infty}

control pair

(u^{★}, d^{★})

for the estimated system (28) can be expressed as follows:

\begin{matrix} u^{★} = - \frac{1}{2} R^{- 1} {[{\hat{W}}_{g} θ_{g}]}^{T} V_{x}^{★}, \end{matrix}

(31)

\begin{matrix} d^{★} = \frac{1}{2 κ^{2}} {[{\hat{W}}_{G} θ_{G}]}^{T} V_{x}^{★} (x) . \end{matrix}

(32)

Since the HJI Equation (30) is a nonlinear PDE, similar to (6), we utilize a critic NN to estimate

V^{★} (x)

and its gradient

V_{x}^{★} (x)

as follows:

V^{★} (x) = W_{c}^{T} θ_{c} (x) + ϵ_{v},

(33)

V_{x}^{★} (x) = \nabla θ_{c}^{T} (x) W_{c} + \nabla ϵ_{v},

(34)

where

W_{c} \in R^{l}

is the unknown constant weight.

θ_{c} (x) \in R^{l}

represents the independent basis function with

\nabla θ_{c} (x) = \partial θ_{c} / \partial x

. l is the number of neurons. The approximation error is presented as

ϵ_{v}

with

\nabla ϵ_{v} = \partial ϵ_{v} / \partial x

. Note that as the number of independent basis functions increases, both the approximation errors and their gradients can approach zero.

Before proceeding, the following assumption is needed.

Assumption 2.

(1): The ideal critic NN’s weight $W_{c}$ is bounded, that is, $∥W_{c}∥ < b_{W_{c}}$ .
(2): The basis functions $θ_{c} (x)$ and its gradients $\nabla θ_{c} (x)$ are bounded, that is, $∥θ_{c}∥ \leq b_{θ_{c}}$ , $∥\nabla θ_{c}∥ \leq b_{\nabla θ_{c}}$ .
(3): The approximator reconstruction error $ϵ_{v}$ and its gradients $\nabla ϵ_{v}$ are bounded, that is, $∥ϵ_{v}∥ \leq b_{ϵ_{v}}$ , $∥\nabla ϵ_{v}∥ \leq b_{\nabla ϵ_{v}}$ .

Since the ideal critic NN weights

W_{c}

are unknown, take

{\hat{W}}_{c}

as the estimated value of

W_{c}

and

\hat{V}

as the estimated value of V, where the practical critic NN is given by

\begin{matrix} \hat{V} (x) = {\hat{W}}_{c}^{T} θ_{c} (x), \\ {\hat{V}}_{x} (x) = \nabla θ_{c}^{T} (x) {\hat{W}}_{c} . \end{matrix}

(35)

The estimated

H_{\infty}

control pair

\hat{u}

and

\hat{d}

can be obtained as

\begin{matrix} \hat{u} = - \frac{1}{2} R^{- 1} {[{\hat{W}}_{g} θ_{g}]}^{T} {\hat{V}}_{x} = - \frac{1}{2} R^{- 1} {[{\hat{W}}_{g} θ_{g}]}^{T} \nabla θ_{c}^{T} {\hat{W}}_{c}, \end{matrix}

(36)

\begin{matrix} \hat{d} = \frac{1}{2 κ^{2}} {[{\hat{W}}_{G} θ_{G}]}^{T} {\hat{V}}_{x} = \frac{1}{2 κ^{2}} {[{\hat{W}}_{G} θ_{G}]}^{T} \nabla θ_{c}^{T} {\hat{W}}_{c} . \end{matrix}

(37)

To online estimate the unknown weights of the critic NN using KREM technology, we aim to construct a linear equation according to (30) and (34) as

\begin{matrix} ϵ_{H J I} + x^{T} Q x + {\hat{u}}^{T} R \hat{u} - κ^{2} {\hat{d}}^{T} \hat{d} + W_{c}^{T} \nabla θ_{c} {\hat{W}}_{f} θ_{f} (x) \\ + W_{c}^{T} \nabla θ_{c} {\hat{W}}_{g} θ_{g} (x) \hat{u} + W_{c}^{T} \nabla θ_{c} {\hat{W}}_{G} θ_{G} (x) \hat{d} = 0, \end{matrix}

(38)

where

ϵ_{H J I} = W_{c}^{T} \nabla θ_{c} (ϵ_{I} + ϵ_{T}) + \nabla ϵ_{v}^{T} ({\hat{W}}_{f} θ_{f} + {\hat{W}}_{g} θ_{g} \hat{u} + {\hat{W}}_{G} θ_{G} \hat{d} + ϵ_{I} + ϵ_{T})

is a bounded residual HJI equation error. Let

Θ = \nabla θ_{c} [{\hat{W}}_{f} θ_{f} + {\hat{W}}_{g} θ_{g} \hat{u} + {\hat{W}}_{G} θ_{G} \hat{d}]

and

Σ = x^{T} Q x + {\hat{u}}^{T} R \hat{u} - κ^{2} {\hat{d}}^{T} \hat{d}

, where a linear equation is obtained as follows:

\begin{matrix} Σ = - W_{c}^{T} Θ - ϵ_{H J I} . \end{matrix}

(39)

Similar to the previous section, we define the filtered regressor matrix

P_{c} \in R^{l \times l}

and the vector

Q_{c} \in R^{l}

as follows:

\begin{matrix} \{\begin{matrix} P_{c} = H_{c} [Θ Θ^{T}], P_{c} (0) = 0 \\ Q_{c} = H_{c} [Θ Σ], Q_{c} (0) = 0 \end{matrix} \end{matrix}

(40)

where

H_{c} = \frac{1}{p + l_{c}} [s] (t),

and

l_{c} > 0

is the forgetting factor. Then, the solution of (40) can be deduced as

\begin{matrix} \{\begin{matrix} P_{c} = \int_{0}^{t} e^{- l_{c} (t - τ)} Θ Θ^{T} d τ, \\ Q_{c} = \int_{0}^{t} e^{- l_{c} (t - τ)} Θ Σ d τ . \end{matrix} \end{matrix}

(41)

From (39) and (41), an E-LRE related to

P_{c}

and

Q_{c}

is obtained

Q_{c} (t) = - P_{c} (t) W_{c} - v_{c},

(42)

where

v_{c} = \int_{0}^{t} e^{- l_{c} (t - τ)} Θ (τ) ϵ_{H J I}^{T} (τ) d τ

is bounded. To estimate the unknown parameter

W_{c}

in (42) under a relaxed PE condition, define the variables

Q_{c} (t) \in R^{l}

,

P_{c} \in R^{l \times l}

, and

V_{c} \in R^{l}

as

\begin{matrix} \{\begin{matrix} Q_{c} = P_{c}^{*} Q_{c}, \\ P_{c} = P_{c}^{*} P_{c}, \\ V_{c} = P_{c}^{*} v_{c} . \end{matrix} \end{matrix}

(43)

Then Equation (42) becomes

Q_{c} (t) = - P_{c} (t) W_{c} - V_{c} .

(44)

Note that

P_{c} = | P_{c} | I_{l}

. Since

P_{c}

is a scalar matrix, a series of scalar LREs is obtained as

{Q_{c}}_{(i)} (t) = - | P_{c} | (t) W_{c (i)} - V_{c (i)}, i = 1, \dots, l,

(45)

where

{Q_{c}}_{(i)}

,

W_{c (i)}

and

V_{c (i)}

indicate the

i t h

rows of

Q_{c}

,

W_{c}

, and

V_{c}

, respectively.

Driven by the parameter error based on (45), the critic unknown weight

W_{c (i)}

is designed as

{\dot{\hat{W}}}_{c (i)} = - γ_{2} | P_{c} | [| P_{c} | {\hat{W}}_{c (i)} + Q_{c (i)}],

(46)

where

γ_{2} \in R > 0

presents the adaptive learning gain.

The convergence condition for the proposed critic NN adaptive law is provided in Theorem 2.

Theorem 2.

For adaptive law (46) of critic NN with the regressor matrix

P_{c}

in (44); if

| P_{c} | \in P E

, then

(i): for $ϵ_{H J I} = 0$ , the estimator error ${\tilde{W}}_{c (i)}$ converges to zero exponentially;
(ii): for $ϵ_{H J I} \neq 0$ , the estimator error ${\tilde{W}}_{c (i)}$ converges to a compact set around zero;

Proof.

Defining the estimation error

{\tilde{W}}_{c (i)} = {\hat{W}}_{c (i)} - W_{c (i)}

,

i = 1, \dots, l

. The proofs presented in Theorem 1 can be extended to establish similar results in the current context. Note that the Lyapunov function

V_{c}

here is chosen as

0.5 γ_{2}^{- 1} {\tilde{W}}_{c (i)}^{2}

. □

Remark 2.

According to Theorem 2, a new convergence condition for the estimation error of the critic neural network weights, denoted as

{\tilde{W}}_{c}

, is provided. This condition does not rely on the conventional parameter estimation (PE) condition, i.e.,

Θ \in P E

. In this paper, the additional exploration signal is not required to guarantee

Θ \in P E

. Instead, the satisfaction of

| P_{c} | \in P E

can be achieved by adjusting the forgetting factor

l_{c}

. It is worth noting that the new convergence condition is associated with the matrix

P_{c}

, and it can be verified online by calculating the determinant of

P_{c}

. The proof of the weak PE property for the new convergence condition will be presented in the following section.

Remark 3.

The convergence analysis of

{\tilde{W}}_{I (i, j)}

and

{\tilde{W}}_{c (i)}

are provided in Theorem 1 and Theorem 2, respectively. In fact, we can derive the convergence of

{\tilde{W}}_{I}

and

{\tilde{W}}_{c}

using simple matrix operations, which will be omitted in this paper.

Till now, the identifier–critic learning-based framework for

H_{\infty}

optimal control under the relaxed PE condition is given. For clarity, the design details of the proposed method are shown in Algorithm 1, which can be considered the pseudocode for the simulation part.

Algorithm 1 Identifier–critic learning-based

H_{\infty}

optimal control algorithm

1:: Initialization
2:: Initialize system parameters: $x (0)$ , Q, R and running time T;
3:: Set the identifier and critic filter operators: $H_{I}$ and $H_{c}$ ;
4:: Set the basis functions of identifier and critic NNs: $θ_{I} (x, u)$ and $θ_{c} (x)$ ;
5:: Initialize and set the filter operator parameters: $ρ$ , $l_{I}$ , $l_{c}$ , $x_{f} (0) = 0$ , $θ_{I f} (0) = 0$ and $ϵ_{I f} (0) = 0$ ;
6:: Initialize identifier NNs parameters: $γ_{1} > 0$ , ${\hat{W}}_{I}^{i n i t i a l} \in (0, 1]$ ;
7:: Initialize critic NNs parameters: $γ_{2} > 0$ , ${\hat{W}}_{c}^{i n i t i a l} \in (0, 1]$ ;
8:: Initialize the control pair by $(36)$ and $(37)$ ;
9:: while $t \leq T$ do
10:: Calculate the filter processing of the identifier NNs by $(14)$ ;
11:: Calculate the dynamic regressor extension (DRE) of the identifier NNs by $(15)$ ;
12:: Calculate the regressor “mixing” of the identifier NNs by $(18)$ ;
13:: Update the weight parameters of the identifier NNs ${\hat{W}}_{I (i, j)}$ by $(20)$ ;

${\dot{\hat{W}}}_{I (i, j)} = - γ_{1} | P_{I} | [| P_{I} | {\hat{W}}_{I (i, j)} - Q_{I (i, j)}];$
14:: Compute the approximated HJB equation by $(39)$ ;
15:: Calculate the dynamic regressor extension (DRE) of the critic NNs by $(40)$ ;
16:: Calculate the regressor “mixing” of the critic NNs by $(42)$ ;
17:: Update the weight parameters of the critic NNs ${\hat{W}}_{c (i)}$ by $(46)$ ;

${\dot{\hat{W}}}_{c (i)} = - γ_{2} | P_{c} | [| P_{c} | {\hat{W}}_{c (i)} + Q_{c (i)}];$
18:: Update the control pair by $(36)$ and $(37)$ ;
19:: Update the system states x by $(28)$ ;
20:: end while

5. Stability and Convergence Analysis

In this section, we present the main results, which include the theoretical analysis of weak PE properties under new convergence conditions proposed in Theorem 1 and Theorem 2. Furthermore, we provide a stability result for the closed-loop system under the proposed online learning optimal control method.

To facilitate the analysis, the following assumption is made.

Assumption 3.

The system dynamics in (1) satisfy

∥f (x)∥ \leq b_{f} ∥x∥

,

∥g (x)∥ \leq b_{g}

and

∥G (x)∥ \leq b_{G}

, where

b_{f} > 0

,

b_{g} > 0

and

b_{G} > 0

.

5.1. Weak PE Properties of New Convergence Conditions

As shown in Theorem 1, Theorem 2 and Remark 3, the convergence of

{\tilde{W}}_{I}

and

{\tilde{W}}_{c}

is established without the restrictive PE condition, i.e.,

θ_{I} \in P E

and

Θ \in P E

. These new convergence conditions can be easily checked online, as mentioned in Remark 1 and Remark 2. Furthermore, we will analyze the superiority of the new convergence conditions compared to the conventional PE condition from a theoretical standpoint.

Theorem 3.

Consider the system (13) with the online identifier NN adaptive law (22) and critic NN adaptive law (46),

(i): The convergence condition of estimation error ${\tilde{W}}_{I}$ in Theorem 1, that is, $| P_{c} | \in P E$ , is weaker than $θ_{I} \in P E$ in the following precise sense

$θ_{I} (t) \in P E \Rightarrow | P_{c} | \in P E,$

(47)

$| P_{c} | \in P E ⇏ θ_{I} (t) \in P E;$

(48)
(ii): The convergence condition of estimation error ${\tilde{W}}_{c}$ in Theorem 2, that is, $| P_{c} | \in P E$ , is weaker than $Θ \in P E$ in the following precise sense

$Θ \in P E \Rightarrow | P_{c} | \in P E,$

(49)

$| P_{c} | \in P E ⇏ Θ \in P E .$

(50)

Proof.

For

(i)

, suppose that

θ_{I} (t)

in (13) is PE, indicating that

θ_{I f} (t) \in P E

[25]. From Definition 1, we have

\begin{matrix} \int_{t}^{t + τ} θ_{I f} (r) θ_{I f}^{T} (r) d r \geq δ I \Leftrightarrow \int_{t - τ}^{t} θ_{I f} (r) θ_{I f}^{T} (r) d r \geq δ I for t > τ > 0 . \end{matrix}

(51)

Moreover, since

e^{- β_{I} (t - r)} \geq e^{- β_{I} τ} > 0

with

r \in [t - τ, t]

, the following inequality holds

\begin{matrix} \int_{t - τ}^{t} e^{- β_{I} (t - r)} θ_{I f}^{T} (r) θ_{I f} (r) d r \geq \int_{t - τ}^{t} e^{- β_{I} τ} θ_{I f}^{T} (r) θ_{I f} (r) d r \geq e^{- β_{I} τ} δ I . \end{matrix}

(52)

Furthermore, for

t > τ > 0

, we also have

\begin{matrix} \int_{0}^{t} e^{- β_{I} (t - r)} θ_{I f}^{T} (r) θ_{I f} (r) d r > \int_{t - τ}^{t} e^{- β_{I} (t - r)} θ_{I f}^{T} (r) θ_{I f} (r) d r . \end{matrix}

(53)

From (17), (52) and (53), we conclude that

\begin{matrix} P_{I} & = \int_{0}^{t} e^{- β_{I} (t - r)} θ_{I f}^{T} (r) θ_{I f} (r) d r > e^{- β_{I} τ} \int_{t - τ}^{t} θ_{I f}^{T} (r) θ_{I f} (r) d r \geq e^{- β_{I} τ} δ I . \end{matrix}

(54)

Hence, the matrix

P_{I}

in (16) is positive definite, that is,

λ_{i} (P_{I}) > 0

,

i = 1, \dots, d

. Considering that the determinant of a matrix is equal to the product of all its eigenvalues, that is,

| P_{I} | = λ_{1} (P_{I}) λ_{2} (P_{I}) \dots λ_{d} (P_{I})

, we obtain

λ_{i} (P_{I}) > 0 \Rightarrow \prod_{i = 1}^{d} λ_{i} (P_{I}) > 0 \Rightarrow | P_{I} | > 0

. Thus, (47) is true.

The proof of (48) is established by the following:

\begin{matrix} | P_{I} | \in P E & \Leftrightarrow \int_{0}^{t} | P_{I} |^{2} (τ) d τ > 0 \\ \Leftrightarrow \int_{0}^{t} \prod_{i = 1}^{d} λ_{i}^{2} (P_{I}) > 0 \\ ⇏ λ_{i} (P_{I}) > 0, i = 1, \dots, d \\ \Leftrightarrow P_{I} > 0 \\ \Leftrightarrow P_{I} \in P E . \end{matrix}

(55)

For (ii), the proof process can be referred to in (i). This finishes the proof. □

5.2. Stability and Convergence Analysis

The stability result for the closed-loop system under the proposed online learning optimal control method will be presented in the following theorem.

Theorem 4.

Let Assumptions 1 and 2 hold. Considering system (1) with the identifier weight tuning law given by (22), the

H_{\infty}

control pair are computed by (36) and (37), respectively. The critic NN weight tuning laws are updated by (46). If

| P_{I} | \in P E

and

| P_{c} | \in P E

, then the closed-loop system, system identifier estimation error

{\tilde{W}}_{I}

, and critic estimation error

{\tilde{W}}_{c}

are uniformly ultimately bounded (UUB). Moreover, the approximated

H_{\infty}

control pair given by (36) and (37) are close to the optimal control pair

u^{★}

and

d^{★}

within a small region

b_{u}

and

b_{d}

, that is,

∥ \hat{u} - u^{★} ∥ \leq b_{u}

and

∥ \hat{d} - d^{★} ∥ \leq b_{d}

, where

b_{u}

and

b_{d}

are positive constants.

Proof.

We consider the Lyapunov function as follows:

\begin{matrix} J (t) & = \frac{1}{2} t r {{\tilde{W}}_{I}^{T} (t) γ_{1}^{- 1} {\tilde{W}}_{I} (t)} + \frac{1}{2} {\tilde{W}}_{c}^{T} (t) γ_{2}^{- 1} {\tilde{W}}_{c} (t) + γ_{3} x^{T} x + γ_{4} V^{★} (x) \\ + γ_{5} t r {V_{I}^{T} V_{I}} + γ_{6} V_{c}^{T} V_{c} \\ = J_{1} + J_{2} + J_{3} + J_{4} + J_{5} + J_{6}, \end{matrix}

(56)

where

γ_{3}

,

γ_{4}

,

γ_{5}

and

γ_{6}

are positive constants.

By applying matrix operations, we can obtain the following:

\begin{matrix} {\dot{\tilde{W}}}_{I} = - γ_{1} | P_{I} | [| P_{I} | {\tilde{W}}_{I} - V_{I}], \\ {\dot{\tilde{W}}}_{c} = - γ_{2} | P_{c} | [| P_{c} | {\tilde{W}}_{c} - V_{c}] . \end{matrix}

(57)

According to Definition 1,

| P_{I} | \in P E

and

| P_{c} | \in P E

imply that

\int_{t}^{t + T} | P_{I} |^{2} d r \geq δ_{I} > 0

and

\int_{t}^{t + T} | P_{c} |^{2} d r \geq δ_{c} > 0

. Substituting (19), (43), and using Young’s inequality

a b \leq a^{2} η / 2 + b^{2} / 2 η

with

η > 0

, we have

\begin{matrix} {\dot{J}}_{1} & = t r \{- {\tilde{W}}_{I}^{T} | P_{I} |^{2} {\tilde{W}}_{I} + | P_{I} | {\tilde{W}}_{I}^{T} V_{I}\} \leq - (δ_{I} - \frac{1}{2 η}) {∥{\tilde{W}}_{I}∥}^{2} + \frac{η}{2} b_{P_{I}^{*}}^{2} {∥| P_{I} | v_{I}∥}^{2}, \end{matrix}

(58)

\begin{matrix} {\dot{J}}_{2} & = - {\tilde{W}}_{c}^{T} | P_{c} |^{2} {\tilde{W}}_{c} + | P_{c} | {\tilde{W}}_{c}^{T} V_{c} \leq - (δ_{c} - \frac{1}{2 η}) {∥{\tilde{W}}_{c}∥}^{2} + \frac{η}{2} b_{P_{c}^{*}}^{2} {∥| P_{c} | v_{c}∥}^{2} . \end{matrix}

(59)

where

∥P_{I}^{*}∥ \leq b_{P_{I}^{*}}

,

∥P_{c}^{*}∥ \leq b_{P_{c}^{*}}

.

For

J_{3}

and

J_{4}

,

\begin{matrix} {\dot{J}}_{3} + {\dot{J}}_{4} & = 2 γ_{3} x^{T} \dot{x} + γ_{4} {\dot{V}}^{★} (x) \\ = 2 γ_{3} x^{T} [f (x) + g (x) \hat{u} + G (x) \hat{d} - g (x) u^{★} + g (x) u^{★} - G (x) d^{★} + G (x) d^{★}] \\ + γ_{4} (- x^{T} Q x - u^{★ T} R u^{★} + κ^{2} d^{★ T} d^{★}) \\ = 2 γ_{3} x^{T} [f (x) + g (x) (- \frac{1}{2} R^{- 1} {\hat{g}}^{T} (x) \nabla θ_{c}^{T} (x) {\hat{W}}_{c} + \frac{1}{2} R^{- 1} g^{T} (x) (\nabla θ_{c}^{T} (x) W_{c} \\ + \nabla ϵ_{v})) + g (x) u^{★} + G (x) (\frac{1}{2 κ^{2}} {\hat{G}}^{T} (x) \nabla θ_{c}^{T} (x) {\hat{W}}_{c} - \frac{1}{2 κ^{2}} G^{T} (x) (\nabla θ_{c}^{T} (x) W_{c} \\ + \nabla ϵ_{v})) + G (x) d^{★}] + γ_{4} (- x^{T} Q x - u^{★ T} R u^{★} + κ^{2} d^{★ T} d^{★}) . \end{matrix}

(60)

Since

g^{T} \nabla θ_{c}^{T} W_{c} - {\hat{g}}^{T} \nabla θ_{c}^{T} {\hat{W}}_{c} = g^{T} \nabla θ_{c}^{T} {\tilde{W}}_{c} + {\tilde{g}}^{T} \nabla θ_{c}^{T} {\hat{W}}_{c}

, and

- G^{T} \nabla θ_{c}^{T} W_{c} + {\hat{G}}^{T} \nabla θ_{c}^{T} {\hat{W}}_{c} =

- G^{T} \nabla θ_{c}^{T} {\tilde{W}}_{c} - {\tilde{G}}^{T} \nabla θ_{c}^{T} {\hat{W}}_{c}

, (60) can be rewritten as

\begin{matrix} {\dot{J}}_{3} + {\dot{J}}_{4} = & 2 γ_{3} x^{T} [f (x) + g (x) (\frac{1}{2} R^{- 1} g^{T} (x) \nabla θ_{c}^{T} (x) {\tilde{W}}_{c} + \frac{1}{2} R^{- 1} {\tilde{g}}^{T} (x) \nabla θ_{c}^{T} (x) {\hat{W}}_{c} \\ + \frac{1}{2} R^{- 1} g^{T} (x) \nabla ε_{v}) + g (x) u^{★} + G (x) (- \frac{1}{2 κ^{2}} G^{T} (x) \nabla θ_{c}^{T} (x) {\tilde{W}}_{c} \\ - \frac{1}{2 κ^{2}} {\tilde{G}}^{T} (x) \nabla θ_{c}^{T} (x) {\hat{W}}_{c} - \frac{1}{2 κ^{2}} G^{T} (x) \nabla ϵ_{v}) + G (x) d^{★}] \\ + γ_{4} (- x^{T} Q x - u^{★ T} R u^{★} + κ^{2} d^{★ T} d^{★}) \\ \leq & - [γ_{4} λ_{m} (Q) - 2 γ_{3} b_{f} - 4 η] {∥x∥}^{2} + [\frac{1}{2 η} γ_{3}^{2} b_{g}^{2} b_{ω}^{2} b_{\nabla θ_{c}}^{2} λ_{M}^{2} (R^{- 1}) \\ + \frac{1}{2 η κ^{4}} γ_{3}^{2} b_{G}^{2} b_{ω}^{2} b_{\nabla θ_{c}}^{2}] {∥{\tilde{W}}_{I}∥}^{2} + [\frac{1}{2 η} γ_{3}^{2} b_{g}^{4} b_{\nabla θ_{c}}^{2} λ_{M}^{2} (R^{- 1}) + \frac{1}{2 η κ^{4}} γ_{3}^{2} b_{G}^{4} b_{\nabla θ_{c}}^{2}] {∥{\tilde{W}}_{c}∥}^{2} \\ - [γ_{4} λ_{m} (R) - 2 γ_{3}^{2} b_{g}^{2} / η] {∥u^{★}∥}^{2} + [\frac{1}{2 η} γ_{3}^{2} b_{g}^{4} λ_{M}^{2} (R^{- 1}) + \frac{γ_{3}^{2}}{κ^{4}} b_{G}^{4}] {∥\nabla ϵ_{v}∥}^{2} \\ + [γ_{4} κ^{2} + 2 γ_{3}^{2} b_{G}^{2} / η] {∥d^{★}∥}^{2}, \end{matrix}

(61)

where

b_{ω} = ∥{\hat{W}}_{c}∥

is a bounded variable.

Recall that

V_{I} = P_{I}^{*} v_{I}

and

{\dot{v}}_{I} = - l_{I} v_{I} + θ_{I f} ϵ_{T f}^{T}

, thus

\begin{matrix} {\dot{J}}_{5} \leq & 2 γ_{5} b_{P_{I}^{*}}^{2} ∥v_{I}^{T} {\dot{v}}_{I}∥ = 2 γ_{5} b_{P_{I}^{*}}^{2} ∥v_{I}^{T} [- l_{I} v_{I} + θ_{I f} ϵ_{T f}^{T}]∥ \\ \leq & - (2 γ_{5} b_{P_{I}^{*}}^{2} l_{I} - b_{P_{I}^{*}}^{2} η) {∥v_{I}∥}^{2} + \frac{1}{η} γ_{5}^{2} b_{P_{I}^{*}}^{2} {∥θ_{I f} ϵ_{T f}^{T}∥}^{2} . \end{matrix}

(62)

Since

{\dot{v}}_{c} = - l_{c} v_{c} + Θ ϵ_{H J I}^{T}

. Hence, the last term of (56) can be given as

\begin{matrix} {\dot{J}}_{6} \leq & 2 γ_{6} b_{P_{c}^{*}}^{2} ∥v_{c}^{T} {\dot{v}}_{c}∥ \\ = & 2 γ_{6} b_{P_{c}^{*}}^{2} ∥v_{c}^{T} [- l_{c} v_{c} + Θ ϵ_{H J I}^{T}]∥ \\ \leq & - (2 γ_{6} l_{c} b_{P_{c}^{*}}^{2} - 5 b_{P_{c}^{*}}^{2} η) {∥v_{c}∥}^{2} + \frac{1}{η} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{W_{c}}^{2} b_{\nabla θ_{c}}^{2} {∥Θ∥}^{2} {∥ϵ_{T}∥}^{2} \\ + \frac{1}{η} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{\nabla θ_{c}}^{2} b_{W_{c}}^{2} {∥θ_{I}∥}^{2} {∥Θ∥}^{2} {∥{\tilde{W}}_{I}∥}^{2} + \frac{1}{η} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{f}^{2} b_{\nabla ϵ_{v}}^{2} {∥Θ∥}^{2} {∥x∥}^{2} \\ + \frac{1}{4 η} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{g}^{2} b_{ϖ 1}^{2} b_{\nabla θ_{c}}^{2} b_{ω}^{2} λ_{M}^{2} (R^{- 1}) {∥Θ∥}^{2} {∥\nabla ϵ_{v}∥}^{2} \\ + \frac{1}{4 η κ^{4}} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{G}^{2} b_{ϖ 2}^{2} b_{\nabla θ_{c}}^{2} b_{ω}^{2} {∥Θ∥}^{2} {∥\nabla ϵ_{v}∥}^{2} . \end{matrix}

(63)

where

b_{ϖ 1} = ∥{\hat{W}}_{g} θ_{g}∥

and

b_{ϖ 2} = ∥{\hat{W}}_{G} θ_{G}∥

are bounded variables. Consequently, we substitute (58), (59), and (61)–(63) into (56); thus, we have

\begin{matrix} \dot{J} (t) & = {\dot{J}}_{1} + {\dot{J}}_{2} + {\dot{J}}_{3} + {\dot{J}}_{4} + {\dot{J}}_{5} + {\dot{J}}_{6} \\ \leq - (δ_{I} - \frac{1}{2 η} - \frac{1}{2 η} γ_{3}^{2} b_{g}^{2} b_{ω}^{2} b_{\nabla θ_{c}}^{2} λ_{M}^{2} (R^{- 1}) - \frac{1}{2 η κ^{4}} γ_{3}^{2} b_{G}^{2} b_{ω}^{2} b_{\nabla θ_{c}}^{2} \\ - \frac{1}{η} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{\nabla θ_{c}}^{2} b_{W_{c}}^{2} {∥θ_{I}∥}^{2} {∥Θ∥}^{2}) {∥{\tilde{W}}_{I}∥}^{2} - (γ_{4} λ_{m} (R) - \frac{2}{η} γ_{3}^{2} b_{g}^{2}) {∥u^{★}∥}^{2} \\ - (δ_{c} - \frac{1}{2 η} - \frac{1}{2 η} γ_{3}^{2} b_{g}^{4} b_{\nabla θ_{c}}^{2} λ_{M}^{2} (R^{- 1}) - \frac{1}{2 η κ^{4}} γ_{3}^{2} b_{G}^{4} b_{\nabla θ_{c}}^{2}) {∥{\tilde{W}}_{c}∥}^{2} \\ - (γ_{4} λ_{m} (Q) - 2 γ_{3} b_{f} - 4 η - \frac{1}{η} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{f}^{2} b_{\nabla ϵ_{v}}^{2} {∥Θ∥}^{2}) {∥x∥}^{2} + \frac{1}{η} γ_{5}^{2} b_{P_{I}^{*}}^{2} {∥ϕ_{I f} ϵ_{T f}^{T}∥}^{2} \\ - (2 γ_{5} b_{P_{I}^{*}}^{2} l_{I} - b_{P_{I}^{*}}^{2} η - \frac{η}{2} b_{P_{I}^{*}}^{2} {| P_{I} |}^{2}) {∥v_{I}∥}^{2} - (2 γ_{6} l_{c} b_{P_{c}^{*}}^{2} - 5 b_{P_{c}^{*}}^{2} η - \frac{η}{2} b_{P_{I}^{*}}^{2} {| P_{I} |}^{2}) {∥v_{c}∥}^{2} \\ + (\frac{1}{2 η} γ_{3}^{2} b_{g}^{4} λ_{M}^{2} (R^{- 1}) + \frac{γ_{3}^{2}}{κ^{4}} b_{G}^{4} + \frac{1}{4 η} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{g}^{2} b_{ϖ 1}^{2} b_{\nabla θ_{c}}^{2} b_{ω}^{2} λ_{M}^{2} (R^{- 1}) {∥Θ∥}^{2} \\ + \frac{1}{4 η κ^{4}} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{G}^{2} b_{ϖ 2}^{2} b_{\nabla θ_{c}}^{2} b_{ω}^{2} {∥Θ∥}^{2}) {∥\nabla ϵ_{v}∥}^{2} + \frac{1}{η} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{W_{c}}^{2} b_{\nabla θ_{c}}^{2} {∥Θ∥}^{2} {∥ϵ_{T}∥}^{2} . \end{matrix}

(64)

We choose the parameters

γ_{3}

,

γ_{4}

,

γ_{5}

,

γ_{6}

and

η

, fulfilling the following conditions

\begin{matrix} η > max {(κ^{4} + κ^{4} γ_{3}^{2} b_{g}^{2} b_{ω}^{2} b_{\nabla θ_{c}}^{2} λ_{M}^{2} (R^{- 1}) + γ_{3}^{2} b_{G}^{2} b_{ω}^{2} b_{\nabla θ_{c}}^{2} + 2 κ^{4} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{\nabla θ_{c}}^{2} b_{W_{c}}^{2} \\ {∥θ_{I}∥}^{2} {∥Θ∥}^{2}) / 2 κ^{4} δ_{I}, (κ^{4} + κ^{4} γ_{3}^{2} b_{g}^{4} b_{\nabla θ_{c}}^{2} λ_{M}^{2} (R^{- 1}) + κ^{4} γ_{3}^{2} b_{G}^{4} b_{\nabla θ_{c}}^{2}) / 2 κ^{4} δ_{c}}, \\ γ_{3} < \sqrt{γ_{4} η λ_{m} (R) / 2 b_{g}^{2}}, \\ γ_{4} > \{2 γ_{3} b_{f} - 4 η - \frac{1}{η} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{f}^{2} b_{\nabla ϵ_{v}}^{2} {∥Θ∥}^{2}\} / λ_{m} (Q), \\ γ_{5} > (η + \frac{η}{2} | P_{I} |^{2}) / 2 l_{I}, \\ γ_{6} > (5 η - \frac{η}{2} | P_{c} |^{2}) / 2 l_{c} . \end{matrix}

Then, (64) can be further presented as

\begin{matrix} \dot{J} (t) & \leq - k_{1} {∥{\tilde{W}}_{I}∥}^{2} - k_{2} {∥{\tilde{W}}_{c}∥}^{2} - k_{3} {∥x∥}^{2} - k_{4} {∥v_{I}∥}^{2} - k_{5} {∥v_{c}∥}^{2} + b_{γ}, \end{matrix}

(65)

where

k_{1}

,

k_{2}

,

k_{3}

,

k_{4}

,

k_{5}

and

b_{γ}

are positive constants

\begin{matrix} k_{1} = & δ_{I} - \frac{1}{2 η} - \frac{1}{2 η} γ_{3}^{2} b_{g}^{2} b_{ω}^{2} b_{\nabla θ_{c}}^{2} λ_{M}^{2} (R^{- 1}) - \frac{1}{2 η κ^{4}} γ_{3}^{2} b_{G}^{2} b_{ω}^{2} b_{\nabla θ_{c}}^{2} - \frac{1}{η} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{\nabla θ_{c}}^{2} b_{W_{c}}^{2} {∥θ_{I}∥}^{2} {∥Θ∥}^{2}, \\ k_{2} = & δ_{c} - \frac{1}{2 η} - \frac{1}{2 η} γ_{3}^{2} b_{g}^{4} b_{\nabla θ_{c}}^{2} λ_{M}^{2} (R^{- 1}) - \frac{1}{2 η κ^{4}} γ_{3}^{2} b_{G}^{4} b_{\nabla θ_{c}}^{2}, \\ k_{3} = & γ_{4} λ_{m} (Q) - 2 γ_{3} b_{f} - 4 η - \frac{1}{η} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{f}^{2} b_{\nabla ϵ_{v}}^{2} {∥Θ∥}^{2}, \\ k_{4} = & 2 γ_{5} b_{P_{I}^{*}}^{2} l_{I} - b_{P_{I}^{*}}^{2} η - \frac{η}{2} b_{P_{I}^{*}}^{2} | P_{I} |^{2}, k_{5} = 2 γ_{6} l_{c} b_{P_{c}^{*}}^{2} - 5 b_{P_{c}^{*}}^{2} η - \frac{η}{2} b_{P_{c}^{*}}^{2} {| P_{c} |}^{2}, \\ b_{γ} = & (\frac{1}{2 η} γ_{3}^{2} b_{g}^{4} λ_{M}^{2} (R^{- 1}) + \frac{γ_{3}^{2}}{κ^{4}} b_{G}^{4} + \frac{1}{4 η} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{g}^{2} b_{ϖ 1}^{2} b_{\nabla θ_{c}}^{2} b_{ω}^{2} λ_{M}^{2} (R^{- 1}) {∥Θ∥}^{2} + \frac{1}{η} γ_{5}^{2} b_{P_{I}^{*}}^{2} {∥ϕ_{I f} ϵ_{T f}^{T}∥}^{2} \\ + & \frac{1}{4 η κ^{4}} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{G}^{2} b_{ϖ 2}^{2} b_{\nabla θ_{c}}^{2} b_{ω}^{2} {∥Θ∥}^{2}) {∥\nabla ϵ_{v}∥}^{2} + \frac{1}{η} γ_{6}^{2} b_{P_{c}^{*}}^{2} b_{W_{c}}^{2} b_{\nabla θ_{c}}^{2} {∥Θ∥}^{2} {∥ϵ_{T}∥}^{2} . \end{matrix}

Thus,

\dot{J} (t)

is negative if

\begin{matrix} ∥{\tilde{W}}_{I}∥ > \sqrt{b_{γ} / k_{1}}, ∥{\tilde{W}}_{c}∥ > \sqrt{b_{γ} / k_{2}}, ∥x∥ > \sqrt{b_{γ} / k_{3}}, ∥Ξ_{I}∥ > \sqrt{b_{γ} / k_{4}}, ∥Ξ_{c}∥ > \sqrt{b_{γ} / k_{5}}, \end{matrix}

which implies that the NN weight estimation errors

{\tilde{W}}_{I}

,

{\tilde{W}}_{c}

and the system state x are all UUB.

Lastly, the error between the proposed

H_{\infty}

control pair and the ideal one are written as

\begin{matrix} \hat{u} - u^{★} = & - \frac{1}{2} R^{- 1} {[{\hat{W}}_{g} θ_{g} (x)]}^{T} \nabla θ_{c}^{T} (x) {\hat{W}}_{c} + \frac{1}{2} R^{- 1} g^{T} (\nabla θ_{c}^{T} (x) W_{c} + \nabla ϵ_{v}) \\ = & \frac{1}{2} R^{- 1} g^{T} \nabla θ_{c}^{T} (x) {\tilde{W}}_{c} + \frac{1}{2} R^{- 1} {[g - {\hat{W}}_{g} θ_{g} (x)]}^{T} \nabla θ_{c}^{T} (x) W_{c} \\ - \frac{1}{2} R^{- 1} {[g - {\hat{W}}_{g} θ_{g} (x)]}^{T} \nabla θ_{c}^{T} (x) {\tilde{W}}_{c} + \frac{1}{2} R^{- 1} g^{T} \nabla ϵ_{v}, \end{matrix}

\begin{matrix} \hat{d} - d^{★} = & \frac{1}{2 κ^{2}} {[{\hat{W}}_{G} θ_{G} (x)]}^{T} \nabla θ_{c}^{T} (x) {\hat{W}}_{c} - \frac{1}{2 κ^{2}} G^{T} (\nabla θ_{c}^{T} (x) W_{c} + \nabla ϵ_{v}) \\ = & - \frac{1}{2 κ^{2}} G^{T} \nabla θ_{c}^{T} (x) {\tilde{W}}_{c} + \frac{1}{2 κ^{2}} {[G - {\hat{W}}_{G} θ_{G} (x)]}^{T} \nabla θ_{c}^{T} (x) W_{c} \\ + \frac{1}{2 κ^{2}} {[G - {\hat{W}}_{G} θ_{G} (x)]}^{T} \nabla θ_{c}^{T} (x) {\tilde{W}}_{c} - \frac{1}{2 κ^{2}} {[{\hat{W}}_{G} θ_{G} (x)]}^{T} \nabla ϵ_{v}, \end{matrix}

which further implies the following fact

\begin{matrix} lim_{t \to + \infty} ∥\hat{u} - u^{★}∥ \leq & \frac{1}{2} λ_{M} (R^{- 1}) {b_{g} (b_{\nabla θ_{c}} ∥{\tilde{W}}_{c}∥ + b_{\nabla ϵ_{v}}) + b_{\nabla θ_{c}} b_{W_{c}} (∥{\tilde{W}}_{I}∥ + ∥b_{g}∥) \\ + b_{\nabla θ_{c}} ∥{\tilde{W}}_{c}∥ (∥{\tilde{W}}_{I}∥ + ∥b_{g}∥)} \leq b_{u}, \end{matrix}

\begin{matrix} lim_{t \to + \infty} ∥\hat{d} - d^{★}∥ \leq & \frac{1}{2 κ^{2}} {b_{G} (b_{\nabla θ_{c}} ∥{\tilde{W}}_{c}∥ + b_{\nabla ϵ_{v}}) + b_{\nabla θ_{c}} b_{W_{c}} (∥{\tilde{W}}_{I}∥ + ∥b_{G}∥) \\ + b_{\nabla θ_{c}} ∥{\tilde{W}}_{c}∥ (∥{\tilde{W}}_{I}∥ + ∥b_{G}∥)} \leq b_{d}, \end{matrix}

where

b_{u} > 0

and

b_{d} > 0

are constants determined by the identifier NN estimation error

{\tilde{W}}_{I}

and the critic NN estimation error

{\tilde{W}}_{c}

. It proves that the approximate

H_{\infty}

control pair can converge to a set around the optimal solution.

This completes the proof. □

6. Numerical Simulation

This section aims to verify the effectiveness of the proposed KREM-based IC learning approach for optimal robust control. We consider the following NCT system [12]

\begin{matrix} \dot{x} = f (x) + g (x) u + G (x) d, \end{matrix}

(66)

where

f (x) = [\begin{matrix} - x_{1} + x_{2} \\ - 0.5 x_{1} - 0.5 x_{2} (1 - {(cos (2 x_{1}) + 2)}^{2}) \end{matrix}]

,

g (x) = [\begin{matrix} 0 \\ cos (2 x_{1}) + 2 \end{matrix}]

,

G (x) = [\begin{matrix} 0 \\ sin (4 x_{1}) + 2 \end{matrix}]

.

We choose the regressor of identifier NN as

θ_{I} (x, u) = {[x_{1}, x_{2}, x_{2} (1 - {(cos (2 x_{1}) + 2)}^{2}), u cos (2 x_{1}), u, d sin (4 x_{1}), d]}^{T},

with the unknown identifier weight matrix given by

W_{I} = [\begin{matrix} - 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ - 0.5 & 0 & - 0.5 & 1 & 2 & 1 & 2 \end{matrix}] .

The activation function in (33) for the critic NN is selected as

θ_{c} (x, u) = {[x_{1}^{2}, x_{1} x_{2}, x_{2}^{2}]}^{T} .

The ideal critic NN weights were

W_{c} = {[0.5, 0, 1]}^{T}

.

In this numerical example, several other parameters are set as follows: the initial values of the system states are

x_{1} (0) = 3

and

x_{2} (0) = - 1

.

Q = I_{2}

and

R = 1

. The filter coefficients are

ρ = 0.001

,

l_{I} = 0.1

,

l_{c} = 20

,

γ_{1} = 800

,

γ_{2} = 200 d i a g {0.3, 1, 1}

. It is important to note that in this simulation, there is no need to add noise to the control input

u (t)

to ensure the PE condition. This condition is often necessary for many existing ADP-based control methods to ensure that

θ_{I} (t) \in P E

and

Θ (t) \in P E

.

For comparison, we consider the Kreisselmeier’s Regressor Extension (KRE) based identifier-critic network framework [12] for the system (66). Figure 2 and Figure 3 display the convergence of the identifier NN weights and the critic NN weights, respectively, under our KREM-based optimal robust control method and the KRE-based control method [12]. As illustrated in Figure 2, the KREM-based ADP method proposed in this paper exhibits faster convergence compared to the KRE-based ADP method. Furthermore, it demonstrates element-wise monotonicity, thus preventing oscillations and peaking in the learning curve. The trajectories of the approximate control input

\hat{u}

and the estimated disturbance

\hat{d}

are presented in Figure 4 and Figure 5, respectively. By applying the optimal

H_{\infty}

control pair, the system states are stabilized, as depicted in Figure 6.

Figure 2. Comparison of the convergence of identifier NN’s weights

{\hat{W}}_{I}

: (a) KREM-based method; (b) KRE-based method in [12].

Figure 3. Comparison of the convergence of critic NN’s weights

{\hat{W}}_{c}

: (a) KREM-based method; (b) KRE-based method in [12].

Figure 4. Evolution of the approximate control input

\hat{u}

.

Figure 5. Disturbance action d.

Figure 6. Trajectories of the system states

x = {[x_{1}, x_{2}]}^{T}

.

7. Conclusions

This paper presents a novel adaptive learning approach using neural networks (NNs) to address the problem of optimal robust control for nonlinear continuous-time systems with unknown dynamics. The approach involves employing a system identifier that utilizes NNs and parameter estimation techniques to approximate the unknown system matrices and disturbances. Additionally, a critic NNs learning structure is introduced to obtain an approximate controller that corresponds to the optimal control problem. Unlike existing identifier-critic NNs learning control methods, this approach incorporates adaptive tuning laws based on a regressor extension and mixing technique. These laws facilitate the learning of unknown parameters in the two NNs under relaxed persistence of excitation conditions. The convergence conditions of the proposed approach have been theoretically demonstrated. Finally, the effectiveness of the proposed learning control approach is validated via a simulation study.

Author Contributions

Methodology, R.L.; Validation, R.L.; Formal analysis, R.L.; Investigation, R.L. and Z.P.; Writing—original draft, R.L.; Writing—review & editing, Z.P. and J.H.; Supervision, Z.P. and J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62203089 and Grant 62103084, in part by the Project funded by China Postdoctoral Science Foundation under Grant 2021M700695, in part by the Sichuan Science and Technology Program, China under Grant 2022NSFSC0890, Grant 2022NSFSC0865, and Grant 2021YFS0016, and in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2022A1515110135.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare that they have no conflict of interest. All authors have approved the manuscript and agreed with submission to this journal.

References

Luo, R.; Peng, Z.; Hu, J. On model identification based optimal control and it’s applications to multi-agent learning and control. Mathematics 2023, 11, 906. [Google Scholar] [CrossRef]
Luo, B.; Wu, H.N.; Huang, T. Off-policy reinforcement learning for H_∞ control design. IEEE Trans. Cybern. 2014, 45, 65–76. [Google Scholar] [CrossRef] [PubMed]
Werbos, P. Approximate Dynamic Programming for Realtime Control and Neural Modelling; White, D.A., Sofge, D.A., Eds.; Van Nostrand: New York, NY, USA, 1992. [Google Scholar]
Lewis, F.L.; Vrabie, D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst. Mag. 2009, 9, 32–50. [Google Scholar] [CrossRef]
Vamvoudakis, K.G.; Lewis, F.L. Online actor–critic algorithm to solve the continuous time infinite horizon optimal control problem. Automatica 2010, 46, 878–888. [Google Scholar] [CrossRef]
Wei, Q.; Liu, D.; Lin, H. Value iteration adaptive dynamic programming for optimal control of discrete-time nonlinear systems. IEEE Trans. Cybern. 2015, 46, 840–853. [Google Scholar] [CrossRef]
Peng, Z.; Zhao, Y.; Hu, J.; Luo, R.; Ghosh, B.K.; Nguang, S.K. Input-output data-based output antisynchronization control of multi-agent systems using reinforcement learning approach. IEEE Trans. Ind. Inform. 2021, 17, 7359–7367. [Google Scholar] [CrossRef]
Peng, Z.; Zhao, Y.; Hu, J.; Ghosh, B.K. Data-driven optimal tracking control of discrete-time multi-agent systems with two-stage policy iteration algorithm. Inf. Sci. 2019, 481, 189–202. [Google Scholar] [CrossRef]
Zhang, H.; Jiang, H.; Luo, C.; Xiao, G. Discrete-time nonzero-sum games for multiplayer using policy-iteration-based adaptive dynamic programming algorithms. IEEE transactions on cybernetics. IEEE Trans. Cybern. 2016, 47, 3331–3340. [Google Scholar] [CrossRef]
Modares, H.; Lewis, F.L. Optimal tracking control of nonlinear partially unknown constrained input systems using integral reinforcement learning. Automatica 2014, 50, 1780–1792. [Google Scholar] [CrossRef]
Yang, X.; Liu, D.; Luo, B.; Li, C. Data-based robust adaptive control for a class of unknown nonlinear constrained-input systems via integral reinforcement learning. Inf. Sci. 2016, 369, 731–747. [Google Scholar] [CrossRef]
Lv, Y.; Na, J.; Ren, X. Online H_∞ control for completely unknown nonlinear systems via an identifier-critic-based ADP structure. Int. J. Control Autom. 2019, 92, 100–111. [Google Scholar] [CrossRef]
Luo, R.; Peng, Z.; Hu, J.; Ghosh, B.K. Adaptive optimal control of affine nonlinear systems via identifier-critic neural network approximation with relaxed PE conditions. Neural Netw. 2023, 167, 588–600. [Google Scholar] [CrossRef] [PubMed]
Luo, R.; Tan, W.; Peng, Z.; Zhang, J.; Hu, J.; Ghosh, B.K. Optimal consensus control for multi-agent systems with unknown dynamics and states of leader: A distributed KREM learning method. IEEE Trans. Circuits Syst. II Express Briefs 2023. [Google Scholar] [CrossRef]
Wei, Q.; Song, R.; Yan, P. Data-driven zero-sum neuro-optimal control for a class of continuous-time unknown nonlinear systems with disturbance using ADP. IEEE Trans. Neural Netw. Learn. Syst. 2015, 27, 444–458. [Google Scholar] [CrossRef] [PubMed]
Wang, D.; Zhou, Z.; Liu, A.; Qiao, J. Event-triggered robust adaptive critic control for nonlinear disturbed systems. Nonlinear Dyn. 2023, 111, 19963–19977. [Google Scholar] [CrossRef]
Zhao, J.; Na, J.; Gao, G. Adaptive dynamic programming based robust control of nonlinear systems with unmatched uncertainties. Neurocomputing 2020, 395, 56–65. [Google Scholar] [CrossRef]
Vamvoudakis, K.G.; Lewis, F.L. Online solution of nonlinear two-player zero-sum games using synchronous policy iteration. Int. J. Robust Nonlinear Control 2012, 22, 1460–1483. [Google Scholar] [CrossRef]
Peng, Z.; Ji, H.; Zou, C.; Kuang, Y.; Cheng, H.; Shi, K.; Ghosh, B.K. Optimal H∞ tracking control of nonlinear systems with zero-equilibrium-free via novel adaptive critic designs. Neural Netw. 2023, 164, 105–114. [Google Scholar] [CrossRef]
Lin, F.; Brandt, R.D. An optimal control approach to robust control of robot manipulators. IEEE Trans. Robot. Automat. 1998, 14, 69–77. [Google Scholar]
Yang, X.; He, H.; Zhong, X. Adaptive dynamic programming for robust regulation and its application to power systems. IEEE Trans. Ind. Electron. 2017, 65, 5722–5732. [Google Scholar] [CrossRef]
Yang, X.; He, H. Adaptive critic designs for event-triggered robust control of nonlinear systems with unknown dynamics. IEEE Trans. Cybern. 2018, 49, 2255–2267. [Google Scholar] [CrossRef] [PubMed]
Xue, S.; Luo, B.; Liu, D.; Gao, Y. Event-triggered ADP for tracking control of partially unknown constrained uncertain systems. IEEE Trans. Cybern. 2021, 52, 9001–9012. [Google Scholar] [CrossRef] [PubMed]
Lewis, F.W.; Jagannathan, S.; Yesildirak, A. Neural Network Control of Robot Manipulators and Non-Linear Systems; Taylor & Francis: London, UK, 1999. [Google Scholar]
Boyd, S.; Sastry, S. Adaptive Control: Stability, Convergence and Robustness; Prentice-Hall: Englewood Cliffs, NJ, USA, 1989. [Google Scholar]

Figure 1. Schematic of the proposed control system.

Figure 2. Comparison of the convergence of identifier NN’s weights

{\hat{W}}_{I}

: (a) KREM-based method; (b) KRE-based method in [12].

Figure 2. Comparison of the convergence of identifier NN’s weights

{\hat{W}}_{I}

: (a) KREM-based method; (b) KRE-based method in [12].

Figure 3. Comparison of the convergence of critic NN’s weights

{\hat{W}}_{c}

: (a) KREM-based method; (b) KRE-based method in [12].

Figure 3. Comparison of the convergence of critic NN’s weights

{\hat{W}}_{c}

: (a) KREM-based method; (b) KRE-based method in [12].

Figure 4. Evolution of the approximate control input

\hat{u}

.

Figure 4. Evolution of the approximate control input

\hat{u}

.

Figure 5. Disturbance action d.

Figure 6. Trajectories of the system states

x = {[x_{1}, x_{2}]}^{T}

.

Figure 6. Trajectories of the system states

x = {[x_{1}, x_{2}]}^{T}

.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Article Metrics

Citations

Article Access Statistics

Journal Statistics

Multiple requests from the same IP address are counted as one view.

Optimal Robust Control of Nonlinear Systems with Unknown Dynamics via NN Learning with Relaxed Excitation

Abstract

1. Introduction

2. Preliminaries and Problem Formulation

2.1. Preliminaries

2.2. Problem Formulation

3. System Identifier Design with Relaxed PE Condition

4. Critic NN Design for H ∞ Control under Relaxed PE Condition

5. Stability and Convergence Analysis

5.1. Weak PE Properties of New Convergence Conditions

5.2. Stability and Convergence Analysis

6. Numerical Simulation

7. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Data Availability Statement

Conflicts of Interest

References

Article Metrics

Article Access Statistics

4. Critic NN Design for $H_{\infty}$ Control under Relaxed PE Condition