Abstract
In this paper, we examine a sampled-data Nash equilibrium strategy for a stochastic linear quadratic (LQ) differential game, in which the admissible strategies are assumed to be constant on the interval between consecutive measurements. Our solution first involves transforming the problem into an equivalent one for a linear stochastic system with finite jumps. This allows us to obtain necessary and sufficient conditions ensuring the existence of a sampled-data Nash equilibrium strategy, extending earlier results to a general context with more than two players. Furthermore, we provide a numerical algorithm for computing the feedback matrices of the Nash equilibrium strategies. Finally, we illustrate the effectiveness of the proposed algorithm by two numerical examples. Since both examples exhibit a stabilization effect, they confirm the efficiency of the proposed approach.
Keywords:
Nash equilibria; stochastic LQ differential games; sampled-data controls; equilibrium strategies; optimal trajectories
MSC:
91A23; 93E20; 49N10; 49N70
1. Introduction
Stochastic control problems governed by Itô differential equations have been the subject of intensive research over the last decades. This has generated a rich literature and fundamental results, such as the LQ robust sampled-data control problems studied under a unified framework in [1,2], the classes of uncertain sampled-data systems with random jumping parameters characterized by a finite-state semi-Markov process analyzed in [3], or the stochastic differential games investigated in [4,5,6,7].
Dynamical games have been used to solve many real-life problems (see, e.g., [8]). The concept of Nash equilibrium is very important for dynamical games, where, for controlled systems, the closed-loop and open-loop equilibrium strategies are of special interest. Various aspects of open-loop Nash equilibria are studied for an LQ differential game in [9], with other results reported in [10,11,12]. In addition, in [13], applications to gas network optimisation are studied via an open-loop sampled-data Nash equilibrium strategy. The framework in which state vector measurements for a class of differential games are available only at discrete times was first studied in [14]. There, a two-player differential game was considered, and necessary conditions for the sampled-data controls were obtained using a backward translation method starting at the last time interval and following the previous state measurements. This case was extended to a stochastic framework in [15], where the players have access to sampled-data state information with a given sampling interval. For other results dealing with closed-loop systems, see, e.g., [16]. Stochastic dynamical games are an important, but more challenging, framework. First introduced in [17], stochastic LQ problems have been studied extensively (see [18,19]).
In the present paper, we consider stochastic differential games governed by Itô differential equations with state-multiplicative and control-multiplicative white noise perturbations. The original contributions of this work are the following. First, we analyze the design of a Nash equilibrium strategy in state feedback form within the class of piecewise constant admissible strategies. It is assumed that the state measurements are available only at some discrete times. The original problem is transformed into an equivalent one, which asks for existence conditions for a Nash equilibrium strategy in state feedback form for an LQ stochastic differential game described by a system of Itô differential equations controlled by impulses. Necessary and sufficient conditions for the existence of a Nash equilibrium strategy for the new LQ differential game are obtained based on methods from [20,21]. The feedback matrices of the equilibrium strategies for the original dynamical game are obtained from the general result using the structure of the matrix coefficients of the system controlled by impulses. Another major contribution of this paper consists of the numerical methods for computing the feedback matrices of the Nash equilibrium strategy.
To our knowledge, in the stochastic framework, there are few papers dealing with the problem of a sampled-data Nash equilibrium strategy in both open-loop and closed-loop forms ([22,23]); the papers [13,14] mentioned before consider only the deterministic framework. In that case, the problem of a sampled-data Nash equilibrium strategy can be transformed in a natural way into a problem stated in a discrete-time framework. Such a transformation is not possible when the dynamical system contains state-multiplicative and control-multiplicative white noise perturbations. In [15], the stochastic character is due only to the presence of additive white noise perturbations; in that case, the approach is not essentially different from the one used in the deterministic case.
The paper is organized as follows. In Section 2, we formulate the problem, introducing the Nash equilibrium concept for L players. In Section 2.2, we state an equivalent form of the original problem and introduce a system of matrix linear differential equations with jumps and algebraic constraints, which is involved in the derivation of the feedback matrices of the equilibrium strategies. Then, in Section 2.3, we provide necessary and sufficient conditions which guarantee the existence of a piecewise constant Nash equilibrium strategy. An algorithm implementing these developments is given in Section 3. The efficiency of the proposed algorithm is demonstrated by two numerical examples illustrating the behavior of the optimal trajectories generated by the equilibrium strategy. Section 4 is dedicated to conclusions.
2. Problem Formulation
2.1. Model Description and Problem Setting
Consider the controlled system having the state space representation described by
where is the state vector, L is a positive integer, are control parameters, and is a 1-dimensional standard Wiener process defined on a probability space .
In the controlled system there are L players () who influence its evolution through their control functions . The matrices of the system and the matrices of the players are known. In the field of game theory, the controls are called admissible strategies (or policies) of the players. The different classes of admissible strategies can be defined in various ways, depending on the available information.
Each player aims to minimize its own cost function (performance criterion), and for we have
We make the following assumption regarding the weights matrices in (2):
H. and with and
Here we generalize Definition 2.1 given in [23].
Definition 1.
In this paper, we consider a special class of closed-loop admissible strategies in which the states of the dynamical system are available for measurement at the discrete times , and the set of admissible strategies consists of piecewise constant stochastic processes of the form
where are arbitrary matrices.
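To make the sampled-data structure of (4) concrete, the following sketch simulates a two-player system of type (1) under a piecewise constant (zero-order-hold) strategy, using an Euler–Maruyama discretization. All matrices, the sampling grid, and the feedback gains below are illustrative placeholders rather than the coefficients of the paper; only the sample-and-hold mechanism is the point here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data (placeholders, not the paper's coefficients):
# dx = (A x + sum_k B_k u_k) dt + (C x + sum_k D_k u_k) dw
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
C = 0.1 * np.eye(2)
B = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.0]])]   # two players
D = [0.05 * b for b in B]
K = [np.array([[-0.5, -0.3]]), np.array([[-0.2, -0.4]])]   # assumed constant gains

T, h = 5.0, 1e-3                   # horizon and Euler-Maruyama step
t_grid = np.arange(0.0, T, h)
t_s = np.arange(0.0, T, 0.25)      # sampling instants t_0 < t_1 < ...

x = np.array([1.0, -1.0])
u_held = [np.zeros(1) for _ in B]  # zero-order-hold control values
next_sample = 0

for t in t_grid:
    # the held controls are updated only at the sampling instants (type (4) strategy)
    if next_sample < len(t_s) and t >= t_s[next_sample] - 1e-12:
        u_held = [K[k] @ x for k in range(len(B))]
        next_sample += 1
    drift = A @ x + sum(B[k] @ u_held[k] for k in range(len(B)))
    diff = C @ x + sum(D[k] @ u_held[k] for k in range(len(B)))
    x = x + h * drift + np.sqrt(h) * rng.standard_normal() * diff
```

Between consecutive sampling instants the control values remain frozen at their last computed values, which is exactly the zero-order-hold behaviour required by (4).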
Our aim is to investigate the problem of designing a Nash equilibrium strategy in the class of piecewise constant admissible strategies of type (4) (the closed-loop admissible strategies) for an LQ differential game described by a dynamical system of type (1), under the performance criteria (2). Moreover, we also present a method for the numerical computation of the feedback gains of the equilibrium strategy.
We denote by the set of piecewise constant admissible strategies of type (4).
2.2. The Equivalent Problem
Define by , where are arbitrary -dimensional random vectors with finite second moments. If is the solution of system (1) determined by the piecewise constant inputs , we set .
Direct calculations show that is the solution of the initial value problem (IVP) associated with a linear stochastic system with finite jumps, often called a system controlled by impulses:
under the notations:
where denotes the zero matrix of size .
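The precise block structure in (6)–(8) is fixed by the dimensions of the state and of the players' controls. As an illustration only, the helper below assembles augmented coefficient matrices under the assumption that the augmented state stacks the state on top of the held control values and that the held controls are constant between sampling instants; the exact blocks used in the paper are those specified in (6)–(8).

```python
import numpy as np

def augmented_matrices(A, B_list, C, D_list):
    """Assemble drift/diffusion matrices of an augmented state
    xi = (x, u_1, ..., u_L) between two consecutive sampling instants,
    assuming the held controls are constant there (illustrative layout only)."""
    n = A.shape[0]
    m = sum(B.shape[1] for B in B_list)
    # drift: d xi = [[A, B_1 ... B_L], [0, 0]] xi dt + (diffusion block) dw
    A_aug = np.block([[A, np.hstack(B_list)],
                      [np.zeros((m, n + m))]])
    C_aug = np.block([[C, np.hstack(D_list)],
                      [np.zeros((m, n + m))]])
    return A_aug, C_aug

# At a sampling instant t_i, the control block of xi is reset (a finite jump)
# to the newly computed values, while the state block is left unchanged.
```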
Throughout the paper denotes the -algebra generated by the random variables . The matrices in (7) can be written as
Let be the set of inputs in the form of sampled-data linear state feedback, i.e., if and only if with
where are arbitrary matrices and are the values at the time instants of the solution of the following IVP:
Let be a matrix valued sequence of the form
where are arbitrary matrices. We consider the set
Remark 1.
By (9) and (10), there is a one-to-one correspondence between the sets and . Each from can be identified with the sequence of its feedback matrices.
Based on this remark we can rewrite the performance criterion (7) as:
Similarly to Definition 1, one can define a Nash equilibrium strategy for the LQ differential game described by the controlled system (5), the performance criteria (13) and the class of admissible strategies described by (12).
Definition 2.
The L-tuple of admissible strategies is said to achieve a Nash equilibrium for the differential game described by the controlled system (5), the cost function (13), and the class of the admissible strategies , if for all , we have
Remark 2.
- (a)
- (b)
- Among the feedback matrices from (9), some have the form
where . Hence, some admissible strategies (9) are of type (4). Consequently, if the feedback matrices of the Nash equilibrium strategy have the structure given in (15), then the strategy of type (9) with these feedback matrices provides a Nash equilibrium strategy for the LQ differential game described by (1), (2) and (4).
To obtain explicit formulae for the feedback matrices of a Nash equilibrium strategy of type (9) (or, equivalently (11), (12)), we use the following system of matrix linear differential equations (MLDEs) with jumps and algebraic constraints:
Remark 3.
A solution of the terminal value problem (TVP) with algebraic constraints (16) is a 2L-tuple of the form where, for each , is a solution of the TVP (16a), (16b), (16d) and , . On the interval , is the solution of the TVP described by the perturbed Lyapunov-type equation from (16a) and the terminal value given in (16d). On each interval , , the terminal value of is computed via (16b) together with (17) and (18), provided that is obtained as a solution of (16c). Thus, the TVPs solved by are interconnected via (16c).
To facilitate the statement of the main result of this section, we rewrite (16c) in a compact form as:
where and the matrices and are obtained using the block components of (16c).
2.3. Sampled Data Nash Equilibrium Strategy
First we derive a necessary and sufficient condition for the existence of an equilibrium strategy of type (9) for the LQ differential game given by the controlled system (5), the performance criteria (7) and the set of the admissible strategies . To this end we adapt the argument used in the proof of ([22], Theorem 4).
We prove:
Theorem 1.
Under the assumption the following are equivalent:
- (i)
- (ii)
- the TVP with constraints (16) has a solution defined on the whole interval and satisfying the conditions below for :
If condition (21) holds, then the feedback matrices of a Nash equilibrium strategy of type (9) are the matrix components of the solution of the TVP (16) and are given by
The minimal value of the cost of the k-th player is .
Proof.
From (14) and Remarks 1 and 2(a), one can see that a strategy of type (9) defines a Nash equilibrium strategy for the linear differential game described by the controlled system (5), the performance criteria (7) (or equivalently (13)) if and only if for each the optimal control problem described by the controlled system
and the quadratic functional
has an optimal control in a state feedback form. The controlled system (23) and the performance criterion (24) are obtained substituting , , in (5) and (7), respectively. and are computed as in (17) and (18), respectively, but with replaced by .
To obtain necessary and sufficient conditions for the existence of the optimal control in a linear state feedback form we employ the results proved in [20]. First, notice that in the case of the optimal control problem (23)–(24), the TVP (16a), (16b), (16d) plays the role of the TVP (19)–(23) from [20].
Using Theorem 3 in [20] in the case of the optimal control problem described by (23) and (24) we deduce that the existence of the Nash equilibrium strategy of the form (9) for the differential game described by the controlled system (5), the performance criteria (7) (or its equivalent form (13)), is equivalent to the solvability of the TVP described by (16). The feedback matrix of the optimal control solves the equation:
Substituting the formulae of in (25) we deduce that the feedback matrices of the Nash equilibrium strategy solve an equation of the form (16c) written for instead of . This equation may be written in the compact form:
where .
By Lemma 2.7 in [21], we deduce that Equation (26) has a solution if and only if condition (21) holds. A solution of Equation (26) is given by (22). The minimal value of the cost for the k-th player is obtained from Theorem 1 in [20], applied in the case of the optimal control problem described by (23) and (24). Thus, the proof is complete.□
Remark 4.
When the matrices are invertible, the conditions (21) are satisfied automatically. In this case, the feedback matrices of a Nash equilibrium strategy of type (20) are obtained as the unique solution of Equation (22), because the generalized inverse of each matrix is then the usual inverse.
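Numerically, the solvability condition (21) and formula (22) can be handled with the Moore–Penrose generalized inverse. The sketch below works on a generic linear equation standing in for (26); the matrix names and the tolerance are illustrative, and when the coefficient matrix is invertible the computation reduces to an ordinary linear solve, as noted in Remark 4.

```python
import numpy as np

def solve_gain_equation(Lam, Gam, tol=1e-10):
    """Solve Lam @ F = Gam for the stacked feedback matrices.

    The solvability test mirrors condition (21): the equation has a
    solution iff Lam @ pinv(Lam) @ Gam equals Gam.  When Lam is
    invertible, pinv(Lam) is the usual inverse and F is unique (Remark 4)."""
    Lam_pinv = np.linalg.pinv(Lam)
    residual = np.linalg.norm(Lam @ Lam_pinv @ Gam - Gam)
    if residual > tol * (1.0 + np.linalg.norm(Gam)):
        raise ValueError("gain equation not solvable: condition of type (21) fails")
    return Lam_pinv @ Gam  # a particular solution, as in (22)
```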
Combining (6) and (16c), we deduce that the matrices provided by (22) have the structure . Hence, the Nash equilibrium strategy of the differential game described by the dynamical system (5), the performance criteria (7) and the admissible strategies of type (9) has the form
We now obtain the following Nash equilibrium strategy for the original differential game.
Theorem 2.
Assume that the conditions and (ii) in Theorem 1 are satisfied. Then, a Nash equilibrium strategy in state feedback form with sampled measurements of type (4) for the differential game described by the dynamical system (1) and the performance criteria (2) is given by:
The feedback matrices from (27) are given by the first n columns of the matrices , which are obtained as solutions of Equation (26). In (27), are the values measured at the times , of the solution of the closed-loop system obtained when (27) is plugged into (1). The minimal value of the cost (2) associated with the k-th player is given by
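Since the sampled-data gains of the original game are obtained by keeping only the state block of the gains computed for the impulse-controlled game, the extraction step is a simple column selection; the helper name below is ours, introduced for illustration.

```python
import numpy as np

def sampled_data_gain(F_k: np.ndarray, n: int) -> np.ndarray:
    """Keep the first n columns of F_k, i.e. the block acting on the
    measured state x(t_i), as stated in Theorem 2."""
    return F_k[:, :n]
```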
In the next section, we present an algorithm which allows the numerical computation of the matrices arising in (27) for an LQ differential game with two players.
3. Numerical Computations and the Algorithm
In what follows we assume that and
We propose a numerical approach to compute the optimal strategies
The algorithm consists of two steps:
- We first compute the feedback matrices of the Nash equilibrium strategy, based on the solution :
STEP 1.A. We take and compute
with and sufficiently large.
For the operator we have
for all .
The iterations are computed from:
for with , where or , respectively.
We compute the feedback matrices as solutions of the linear equation
STEP 1.B. We set
Next, we compute :
and
STEP 2.A. Fix j such that . Assuming that have already been computed for a , , we compute
where is computed as in (31).
We compute the feedback gains as solutions of the linear equation
STEP 2.B. Setting , we compute as in the formulae below
and
- In the second step, the computation of the optimal trajectory involves the initial vector and the equilibrium strategy values .
Then, we illustrate the mean squares of the optimal trajectory and of the equilibrium strategy . We set and define .
We have that solves the forward linear differential equation with finite jumps:
For we write:
, where
Then, we use the values to make the plots
where
such that and .
(A hedged Python sketch of both steps is given after this list.)
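The explicit formulas of STEP 1 and STEP 2 depend on the block coefficients of the impulse-controlled system, so the following sketch reproduces only the structure of the two steps: a backward sweep that integrates a Lyapunov-type matrix equation on each sampling interval and solves a linear gain equation at each sampling instant (via the generalized inverse, as in (22)), followed by a forward Euler–Maruyama Monte Carlo simulation used to estimate the mean square of the optimal trajectory. The right-hand sides, the jump update, and all numerical data are placeholder assumptions, not the exact expressions of STEP 1 and STEP 2.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative two-player data (placeholders, not the paper's coefficients).
n = 2
A = np.array([[0.0, 1.0], [-1.0, -0.5]]); C = 0.1 * np.eye(n)
B = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.0]])]
D = [0.05 * b for b in B]
Q = [np.eye(n), 2.0 * np.eye(n)]        # state weights of the two players
R = [np.eye(1), np.eye(1)]              # control weights
G = [np.eye(n), np.eye(n)]              # terminal weights
t_s = np.linspace(0.0, 2.0, 9)          # sampling instants t_0 < ... < t_N
h = 5e-3                                # integration step

def backward_gains():
    """First step: backward sweep over the sampling intervals.
    A Lyapunov-type matrix equation (placeholder for (16a)) is integrated
    backwards, and at each sampling instant a linear equation (placeholder
    for (16c)/(26)) is solved for the stacked feedback gains."""
    P = [G[k].copy() for k in range(2)]
    gains = {}
    for j in range(len(t_s) - 1, 0, -1):
        for _ in range(int(round((t_s[j] - t_s[j - 1]) / h))):
            for k in range(2):
                dP = A.T @ P[k] + P[k] @ A + C.T @ P[k] @ C + Q[k]
                P[k] = P[k] + h * dP     # explicit Euler, integrating backwards in time
        Lam = np.block([[R[k] + B[k].T @ P[k] @ B[k] if k == l
                         else B[k].T @ P[k] @ B[l] for l in range(2)]
                        for k in range(2)])
        Gam = np.vstack([-B[k].T @ P[k] @ A for k in range(2)])
        F = np.linalg.pinv(Lam) @ Gam    # generalized inverse, cf. (22)
        gains[j - 1] = [F[:1, :], F[1:, :]]   # scalar control per player here
        for k in range(2):               # jump update (placeholder for (16b))
            P[k] = P[k] + gains[j - 1][k].T @ R[k] @ gains[j - 1][k]
    return gains

def mean_square(gains, x0, paths=200):
    """Second step: Euler-Maruyama Monte Carlo estimate of the mean square
    of the trajectory under the piecewise constant equilibrium strategy."""
    t_grid = np.arange(t_s[0], t_s[-1], h)
    acc = np.zeros(len(t_grid))
    for _ in range(paths):
        x, j = x0.copy(), 0
        u = [np.zeros(1), np.zeros(1)]
        for i, t in enumerate(t_grid):
            if j < len(t_s) - 1 and t >= t_s[j] - 1e-12:
                u = [gains[j][k] @ x for k in range(2)]
                j += 1
            drift = A @ x + sum(B[k] @ u[k] for k in range(2))
            diff = C @ x + sum(D[k] @ u[k] for k in range(2))
            x = x + h * drift + np.sqrt(h) * rng.standard_normal() * diff
            acc[i] += x @ x
    return t_grid, acc / paths

gains = backward_gains()
t_grid, ms = mean_square(gains, np.array([1.0, -1.0]))
```

For the two-player examples below, the same structure applies with the paper's coefficient matrices and weights in place of the placeholders.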
This algorithm enables us to compute the equilibrium strategy values of the players. The experiments illustrate that the optimal strategies are piecewise constant and that the mean square values of the optimal trajectories decay toward zero, which indicates a stabilization effect.
Further, we consider two examples for the LQ differential game described by the dynamical system (1), the performance criteria (2) and the class of piecewise constant admissible strategies of type (28).
Example 1.
The evolution of the mean square values and of the optimal trajectory (with the initial point ) and of the equilibrium strategies and is depicted in Figure 1 on the interval and in Figure 2 on the interval , respectively. The values of the optimal trajectory and of the equilibrium strategies of both players are very close to zero in both the short-term and the long-term periods.
Figure 1.
(left) ; Interval ; (right) and ; Interval .
Figure 2.
(left) ; Interval ; (right) and ; Interval .
Example 2.
We consider the controlled system (1) in the special form and . We define the matrix coefficients as follows:
The evolution of the mean square values and of the optimal trajectory (with the initial point ) and of the equilibrium strategies and is depicted on the intervals (Figure 3) and (Figure 4), respectively. The values of the optimal trajectory and of the equilibrium strategies of both players are very close to zero in both the short-term and the long-term periods.
Figure 3.
(left) ; Interval ; (right) and ; Interval .
Figure 4.
(left) ; Interval ; (right) and ; Interval .
4. Concluding Remarks
In this paper, we have investigated the formulation of existence conditions for Nash equilibrium strategies in state feedback form, in the case of piecewise constant admissible strategies. These conditions are expressed through the solvability of the algebraic Equation (26). The solutions of these equations provide the feedback matrices of the desired Nash equilibrium strategy. To obtain such conditions for the existence of a sampled-data Nash equilibrium strategy, we have transformed the original problem into an equivalent one, which requires finding a Nash equilibrium strategy in state feedback form for a stochastic differential game whose dynamics are described by Itô-type differential equations controlled by impulses. Unlike in the deterministic case, where the problem of finding a sampled-data Nash equilibrium strategy can be transformed into an equivalent problem in discrete time, in the stochastic framework, when the controlled system is described by Itô-type differential equations, such a transformation to the discrete-time case is not possible. The developments in the present work clarify and extend the results from Section 5 of [23], where only the particular case was considered. The key step in obtaining the feedback matrices of the Nash equilibrium strategy via Equation (26) is the solution of the TVP (16). On each interval, (16a) consists of L uncoupled backward linear differential equations. The boundary values are computed via (16d) for and via (16b) for . Finally, we have given an algorithm for calculating the equilibrium strategies of the players, and the numerical experiments suggest a stabilization effect.
Author Contributions
Conceptualization, V.D., I.G.I., I.-L.P. and O.B.; methodology, V.D., I.G.I., I.-L.P. and O.B.; software, V.D., I.G.I., I.-L.P. and O.B.; validation, V.D., I.G.I., I.-L.P. and O.B.; investigation, V.D., I.G.I., I.-L.P. and O.B.; resources, V.D., I.G.I., I.-L.P. and O.B.; writing—original draft preparation, V.D., I.G.I., I.-L.P. and O.B.; writing—review and editing, V.D., I.G.I., I.-L.P. and O.B. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by “1 Decembrie 1918” University of Alba Iulia through scientific research funds.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Hu, L.S.; Cao, Y.Y.; Shao, H.H. Constrained robust sampled-data control for nonlinear uncertain systems. Int. J. Robust Nonlinear Control 2002, 12, 447–464. [Google Scholar] [CrossRef]
- Hu, L.-S.; Lam, J.; Cao, Y.-Y.; Shao, H.-H. A linear matrix inequality (LMI) approach to robust H/sub 2/sampled-data control for linear uncertain systems. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2003, 33, 149–155. [Google Scholar] [CrossRef]
- Hu, L.; Shi, P.; Huang, B. Stochastic stability and robust control for sampled-data systems with Markovian jump parameters. J. Math. Anal. Appl. 2006, 504–517. [Google Scholar] [CrossRef]
- Ramachandran, K.; Tsokos, C. Stochastic Differential Games Theory and Applications; Atlantis Studies in Probability and Statistics; Atlantis Press: Dordrecht, The Netherlands, 2012. [Google Scholar]
- Yeung, D.K.; Petrosyan, L.A. Cooperative Stochastic Differential Games; Springer Series in Operations Research and Financial Engineering; Springer: New York, NY, USA, 2006. [Google Scholar]
- Zhang, J. Backward Stochastic Differential Equations: From Linear to Fully Nonlinear Theory; Probability Theory and Stochastic Modelling; Springer: New York, NY, USA, 2017; Volume 86. [Google Scholar]
- Dockner, E.; Jorgensen, S.; Long, N.; Sorger, G. Differential Games in Economics and Management Science; Cambridge University Press: Cambridge, UK, 2000. [Google Scholar] [CrossRef]
- Başar, T.; Olsder, G.J. Dynamic Noncooperative Game Theory; Classics in Applied Mathematics; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1999; Volume 23. [Google Scholar]
- Engwerda, J. On the open-loop Nash equilibrium in LQ-games. J. Econ. Dyn. Control 1998, 22, 729–762. [Google Scholar] [CrossRef]
- Engwerda, J. Computational aspects of the open-loop Nash equilibrium in linear quadratic games. J. Econ. Dyn. Control 1998, 22, 1487–1506. [Google Scholar] [CrossRef]
- Engwerda, J. Open-loop Nash equilibria in the non-cooperative infinite-planning horizon LQ game. J. Frankl. Inst. 2014, 351, 2657–2674. [Google Scholar] [CrossRef]
- Nian, X.; Duan, Z.; Tang, W. Analytical solution for a class of linear quadratic open-loop Nash game with multiple players. J. Control Theory Appl. 2006, 4, 239–244. [Google Scholar] [CrossRef]
- Azevedo-Perdicoúlis, T.P.; Jank, G. Disturbance Attenuation of Linear Quadratic OL-Nash Games on Repetitive Processes with Smoothing on the Gas Dynamics. Multidimens. Syst. Signal Process. 2012, 23, 131–153. [Google Scholar] [CrossRef]
- Imaan, M.; Cruz, J. Sampled-data Nash controls in non-zero-sum differential games. Int. J. Control 1973, 17, 1201–1209. [Google Scholar] [CrossRef]
- Başar, T. On the existence and uniqueness of closed-loop sampled-data nash controls in linear-quadratic stochastic differential games. In Optimization Techniques; Iracki, K., Malanowski, K., Walukiewicz, S., Eds.; Lecture Notes in Control and Information Sciences; Springer: Berlin/Heidelberg, Germany, 1980; Volume 22, pp. 193–203. [Google Scholar]
- Engwerda, J. A numerical algorithm to find soft-constrained Nash equilibria in scalar LQ-games. Int. J. Control 2006, 79, 592–603. [Google Scholar] [CrossRef]
- Wonham, W.M. On a Matrix Riccati Equation of Stochastic Control. SIAM J. Control 1968, 6, 681–697. [Google Scholar] [CrossRef]
- Sun, J.; Yong, J. Linear–quadratic stochastic two-person nonzero-sum differential games: Open-loop and closed-loop Nash equilibria. Stoch. Process. Appl. 2019, 381–418. [Google Scholar] [CrossRef]
- Sun, J.; Li, X.; Yong, J. Open-Loop and Closed-Loop Solvabilities for Stochastic Linear Quadratic Optimal Control Problems. SIAM J. Control Optim. 2016, 54, 2274–2308. [Google Scholar] [CrossRef]
- Drăgan, V.; Ivanov, I.G. On the stochastic linear quadratic control problem with piecewise constant admissible controls. J. Frankl. Inst. 2020, 357, 1532–1559. [Google Scholar] [CrossRef]
- Rami, M.; Moore, J.; Zhou, X. Indefinite stochastic linear quadratic control and generalized differential Riccati equation. Siam J. Control Optim. 2001, 40, 1296–1311. [Google Scholar] [CrossRef]
- Drăgan, V.; Ivanov, I.G.; Popa, I.L. Stochastic linear quadratic differential games in a state feedback setting with sampled measurements. Syst. Control Lett. 2019, 104563. [Google Scholar] [CrossRef]
- Drăgan, V.; Ivanov, I.G.; Popa, I.L. On the closed loop Nash equilibrium strategy for a class of sampled data stochastic linear quadratic differential games. Chaos Solitons Fractals 2020, 109877. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).