Abstract
In this paper, we examine a sampled-data Nash equilibrium strategy for a stochastic linear quadratic (LQ) differential game, in which the admissible strategies are assumed to be constant on the interval between consecutive measurements. Our solution first involves transforming the problem into an equivalent one for a linear stochastic system with finite jumps. This allows us to obtain necessary and sufficient conditions ensuring the existence of a sampled-data Nash equilibrium strategy, extending earlier results to a general context with more than two players. Furthermore, we provide a numerical algorithm for computing the feedback matrices of the Nash equilibrium strategies. Finally, we illustrate the effectiveness of the proposed algorithm by two numerical examples. Since both examples exhibit a stabilization effect, they confirm the efficiency of the proposed approach.
Keywords:
Nash equilibria; stochastic LQ differential games; sampled-data controls; equilibrium strategies; optimal trajectories
MSC:
91A23; 93E20; 49N10; 49N70
1. Introduction
Stochastic control problems governed by Itô differential equations have been the subject of intensive research over the last decades. This has generated a rich literature and fundamental results, such as the LQ robust sampled-data control problems studied under a unified framework in [1,2], the classes of uncertain sampled-data systems with random jumping parameters characterized by a finite-state semi-Markov process analyzed in [3], or the stochastic differential games investigated in [4,5,6,7].
Dynamical games have been used to solve many real-life problems (see, e.g., [8]). The concept of Nash equilibrium is very important for dynamical games, where, for controlled systems, the closed-loop and open-loop equilibrium strategies are of special interest. Various aspects of open-loop Nash equilibria are studied for an LQ differential game in [9], with other results reported in [10,11,12]. In addition, in [13], applications to gas network optimisation are studied via an open-loop sampled-data Nash equilibrium strategy. The framework in which state vector measurements for a class of differential games are available only at discrete times was first studied in [14]. There, a two-player differential game was considered, and necessary conditions for the sampled-data controls were obtained using a backward translation method starting at the last time interval and following the previous state measurements. This case was extended to a stochastic framework in [15], where the players have access to sampled-data state information with a given sampling interval. For other results dealing with closed-loop systems, see, e.g., [16]. Stochastic dynamical games are an important, but more challenging, framework. First introduced in [17], stochastic LQ problems have been studied extensively (see [18,19]).
In the present paper, we consider stochastic differential games governed by Itô differential equations with state-multiplicative and control-multiplicative white noise perturbations. The original contributions of this work are the following. First, we analyze the design of a Nash equilibrium strategy in state feedback form within the class of piecewise constant admissible strategies. It is assumed that the state measurements are available only at some discrete times. The original problem is transformed into an equivalent one, which asks for existence conditions for a Nash equilibrium strategy in state feedback form for an LQ stochastic differential game described by a system of Itô differential equations controlled by impulses. Necessary and sufficient conditions for the existence of a Nash equilibrium strategy for the new LQ differential game are obtained based on methods from [20,21]. The feedback matrices of the equilibrium strategies for the original dynamical game are obtained from the general result using the structure of the matrix coefficients of the system controlled by impulses. Another major contribution of this paper consists of the numerical methods for computing the feedback matrices of the Nash equilibrium strategy.
To our knowledge, in the stochastic framework, there are few papers dealing with the problem of a sampled-data Nash equilibrium strategy in both open-loop and closed-loop forms ([22,23]); the papers [13,14] mentioned before consider only the deterministic framework. In that case, the problem of a sampled-data Nash equilibrium strategy can be transformed in a natural way into a problem stated in a discrete-time framework. Such a transformation is not possible when the dynamical system contains state-multiplicative and control-multiplicative white noise perturbations. In [15], the stochastic character is due only to the presence of additive white noise perturbations; in that case, the approach is not essentially different from the one used in the deterministic case.
The paper is organized as follows. In Section 2, we formulate the problem, introducing the Nash equilibrium concept for L players. In Section 2.2, we state an equivalent form of the original problem and introduce a system of matrix linear differential equations with jumps and algebraic constraints, which is involved in the derivation of the feedback matrices of the equilibrium strategies. Then, in Section 2.3, we provide necessary and sufficient conditions which guarantee the existence of a piecewise constant Nash equilibrium strategy. An algorithm implementing these developments is given in Section 3. The efficiency of the proposed algorithm is demonstrated by two numerical examples illustrating the behavior of the optimal trajectories generated by the equilibrium strategy. Section 4 is dedicated to conclusions.
2. Problem Formulation
2.1. Model Description and Problem Setting
Consider the controlled system having the state space representation described by
where is the state vector, L is a positive integer, are control parameters, and is a 1-dimensional standard Wiener process defined on a probability space .
In the controlled system there are L players () who influence its evolution through their control functions . The matrices of the system and the matrices of the players are known. In the field of game theory, the controls are called admissible strategies (or policies) of the players. The different classes of admissible strategies can be defined in various ways, depending on the available information.
Each player aims to minimize its own cost function (performance criterion), and for we have
We make the following assumption regarding the weights matrices in (2):
H. and with and
Here we generalize Definition 2.1 given in [23].
Definition 1.
In this paper, we consider a special class of closed-loop admissible strategies in which the states of the dynamical system are available for measurement at the discrete times , and the set of admissible strategies consists of piecewise constant stochastic processes of the form
where are arbitrary matrices.
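To make the sampled-data structure of (4) concrete, the following sketch simulates a two-player system of type (1) under a piecewise constant (zero-order-hold) strategy, using an Euler–Maruyama discretization. All matrices, the sampling grid, and the feedback gains below are illustrative placeholders rather than the coefficients of the paper; only the sample-and-hold mechanism is the point here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data (placeholders, not the paper's coefficients):
# dx = (A x + sum_k B_k u_k) dt + (C x + sum_k D_k u_k) dw
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
C = 0.1 * np.eye(2)
B = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.0]])]   # two players
D = [0.05 * b for b in B]
K = [np.array([[-0.5, -0.3]]), np.array([[-0.2, -0.4]])]   # assumed constant gains

T, h = 5.0, 1e-3                   # horizon and Euler-Maruyama step
t_grid = np.arange(0.0, T, h)
t_s = np.arange(0.0, T, 0.25)      # sampling instants t_0 < t_1 < ...

x = np.array([1.0, -1.0])
u_held = [np.zeros(1) for _ in B]  # zero-order-hold control values
next_sample = 0

for t in t_grid:
    # the held controls are updated only at the sampling instants (type (4) strategy)
    if next_sample < len(t_s) and t >= t_s[next_sample] - 1e-12:
        u_held = [K[k] @ x for k in range(len(B))]
        next_sample += 1
    drift = A @ x + sum(B[k] @ u_held[k] for k in range(len(B)))
    diff = C @ x + sum(D[k] @ u_held[k] for k in range(len(B)))
    x = x + h * drift + np.sqrt(h) * rng.standard_normal() * diff
```

Between consecutive sampling instants the control values remain frozen at their last computed values, which is exactly the zero-order-hold behaviour required by (4).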
Our aim is to investigate the problem of designing a Nash equilibrium strategy in the class of piecewise constant admissible strategies of type (4) (the closed-loop admissible strategies) for an LQ differential game described by a dynamical system of type (1), under the performance criteria (2). Moreover, we also present a method for the numerical computation of the feedback gains of the equilibrium strategy.
We denote by the set of piecewise constant admissible strategies of type (4).
2.2. The Equivalent Problem
Define by , where are arbitrary -dimensional random vectors with finite second moments. If is the solution of system (1) determined by the piecewise constant inputs , we set .
Direct calculations show that is the solution of the initial value problem (IVP) associated with a linear stochastic system with finite jumps, often called a system controlled by impulses:
under the notations:
where denotes the zero matrix of size .
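The precise block structure in (6)–(8) is fixed by the dimensions of the state and of the players' controls. As an illustration only, the helper below assembles augmented coefficient matrices under the assumption that the augmented state stacks the state on top of the held control values and that the held controls are constant between sampling instants; the exact blocks used in the paper are those specified in (6)–(8).

```python
import numpy as np

def augmented_matrices(A, B_list, C, D_list):
    """Assemble drift/diffusion matrices of an augmented state
    xi = (x, u_1, ..., u_L) between two consecutive sampling instants,
    assuming the held controls are constant there (illustrative layout only)."""
    n = A.shape[0]
    m = sum(B.shape[1] for B in B_list)
    # drift: d xi = [[A, B_1 ... B_L], [0, 0]] xi dt + (diffusion block) dw
    A_aug = np.block([[A, np.hstack(B_list)],
                      [np.zeros((m, n + m))]])
    C_aug = np.block([[C, np.hstack(D_list)],
                      [np.zeros((m, n + m))]])
    return A_aug, C_aug

# At a sampling instant t_i, the control block of xi is reset (a finite jump)
# to the newly computed values, while the state block is left unchanged.
```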
Throughout the paper denotes the -algebra generated by the random variables . The matrices in (7) can be written as
Let be the set of inputs in the form of sampled-data linear state feedback, i.e., if and only if with
where are arbitrary matrices and are the values at the time instants of the solution of the following IVP:
Let be a matrix valued sequence of the form
where are arbitrary matrices. We consider the set
Remark 1.
By (9) and (10), there is a one-to-one correspondence between the sets and . Each from can be identified with the sequence of its feedback matrices.
Based on this remark we can rewrite the performance criterion (7) as:
Similarly to Definition 1, one can define a Nash equilibrium strategy for the LQ differential game described by the controlled system (5), the performance criteria (13) and the class of admissible strategies described by (12).
Definition 2.
The L-tuple of admissible strategies is said to achieve a Nash equilibrium for the differential game described by the controlled system (5), the cost function (13), and the class of the admissible strategies , if for all , we have
Remark 2.
- (a)
- (b)
- Among the feedback matrices from (9), some have the form
where . Hence, some admissible strategies (9) are of type (4). Consequently, if the feedback matrices of the Nash equilibrium strategy have the structure given in (15), then the strategy of type (9) with these feedback matrices provides a Nash equilibrium strategy for the LQ differential game described by (1), (2) and (4).
To obtain explicit formulae for the feedback matrices of a Nash equilibrium strategy of type (9) (or, equivalently (11), (12)), we use the following system of matrix linear differential equations (MLDEs) with jumps and algebraic constraints:
Remark 3.
A solution of the terminal value problem (TVP) with algebraic constraints (16) is a 2L-tuple of the form where, for each , is a solution of the TVP (16a), (16b), (16d) and , . On the interval , is the solution of the TVP described by the perturbed Lyapunov-type equation from (16a) and the terminal value given in (16d). On each interval , , the terminal value of is computed via (16b) together with (17) and (18), provided that is obtained as a solution of (16c). Thus, the TVPs solved by are interconnected via (16c).
To facilitate the statement of the main result of this section, we rewrite (16c) in a compact form as:
where and the matrices and are obtained using the block components of (16c).
2.3. Sampled Data Nash Equilibrium Strategy
First we derive a necessary and sufficient condition for the existence of an equilibrium strategy of type (9) for the LQ differential game given by the controlled system (5), the performance criteria (7) and the set of the admissible strategies . To this end we adapt the argument used in the proof of ([22], Theorem 4).
We prove:
Theorem 1.
Under the assumption the following are equivalent:
- (i)
- (ii)
- the TVP with constraints (16) has a solution defined on the whole interval and satisfying the conditions below for :
If condition (21) holds, then the feedback matrices of a Nash equilibrium strategy of type (9) are the matrix components of the solution of the TVP (16) and are given by
The minimal value of the cost of the k-th player is .
Proof.
From (14) and Remarks 1 and 2(a), one can see that a strategy of type (9) defines a Nash equilibrium strategy for the linear differential game described by the controlled system (5), the performance criteria (7) (or equivalently (13)) if and only if for each the optimal control problem described by the controlled system
and the quadratic functional
has an optimal control in a state feedback form. The controlled system (23) and the performance criterion (24) are obtained substituting , , in (5) and (7), respectively. and are computed as in (17) and (18), respectively, but with replaced by .
To obtain necessary and sufficient conditions for the existence of the optimal control in a linear state feedback form we employ the results proved in [20]. First, notice that in the case of the optimal control problem (23)–(24), the TVP (16a), (16b), (16d) plays the role of the TVP (19)–(23) from [20].
Using Theorem 3 in [20] in the case of the optimal control problem described by (23) and (24) we deduce that the existence of the Nash equilibrium strategy of the form (9) for the differential game described by the controlled system (5), the performance criteria (7) (or its equivalent form (13)), is equivalent to the solvability of the TVP described by (16). The feedback matrix of the optimal control solves the equation:
Substituting the formulae of in (25) we deduce that the feedback matrices of the Nash equilibrium strategy solve an equation of the form (16c) written for instead of . This equation may be written in the compact form:
where .
By Lemma 2.7 in [21], we deduce that Equation (26) has a solution if and only if condition (21) holds. A solution of Equation (26) is given by (22). The minimal value of the cost for the k-th player is obtained from Theorem 1 in [20], applied in the case of the optimal control problem described by (23) and (24). Thus, the proof is complete.□
Remark 4.
When the matrices are invertible, the conditions (21) are satisfied automatically. In this case, the feedback matrices of a Nash equilibrium strategy of type (20) are obtained as the unique solution of Equation (22), because the generalized inverse of each matrix is then the usual inverse.
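Numerically, the solvability condition (21) and formula (22) can be handled with the Moore–Penrose generalized inverse. The sketch below works on a generic linear equation standing in for (26); the matrix names and the tolerance are illustrative, and when the coefficient matrix is invertible the computation reduces to an ordinary linear solve, as noted in Remark 4.

```python
import numpy as np

def solve_gain_equation(Lam, Gam, tol=1e-10):
    """Solve Lam @ F = Gam for the stacked feedback matrices.

    The solvability test mirrors condition (21): the equation has a
    solution iff Lam @ pinv(Lam) @ Gam equals Gam.  When Lam is
    invertible, pinv(Lam) is the usual inverse and F is unique (Remark 4)."""
    Lam_pinv = np.linalg.pinv(Lam)
    residual = np.linalg.norm(Lam @ Lam_pinv @ Gam - Gam)
    if residual > tol * (1.0 + np.linalg.norm(Gam)):
        raise ValueError("gain equation not solvable: condition of type (21) fails")
    return Lam_pinv @ Gam  # a particular solution, as in (22)
```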
Combining (6) and (16c), we deduce that the matrices provided by (22) have the structure . Hence, the Nash equilibrium strategy of the differential game described by the dynamical system (5), the performance criteria (7) and the admissible strategies of type (9) has the form
We now obtain the following Nash equilibrium strategy for the original differential game.
Theorem 2.
Assume that the conditions and (ii) in Theorem 1 are satisfied. Then, a Nash equilibrium strategy in state feedback form with sampled measurements of type (4) for the differential game described by the dynamical system (1) and the performance criteria (2) is given by:
The feedback matrices from (27) are given by the first n columns of the matrices , which are obtained as solutions of Equation (26). In (27), are the values measured at the times , of the solution of the closed-loop system obtained when (27) is plugged into (1). The minimal value of the cost (2) associated with the k-th player is given by
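Since the sampled-data gains of the original game are obtained by keeping only the state block of the gains computed for the impulse-controlled game, the extraction step is a simple column selection; the helper name below is ours, introduced for illustration.

```python
import numpy as np

def sampled_data_gain(F_k: np.ndarray, n: int) -> np.ndarray:
    """Keep the first n columns of F_k, i.e. the block acting on the
    measured state x(t_i), as stated in Theorem 2."""
    return F_k[:, :n]
```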
In the next section, we present an algorithm which allows the numerical computation of the matrices arising in (27) for an LQ differential game with two players.
3. Numerical Computations and the Algorithm
In what follows we assume that and
We propose a numerical approach to compute the optimal strategies
The algorithm consists of two steps:
- We first compute the feedback matrices of the Nash equilibrium strategy, based on the solution :
STEP 1.A. We take and compute
with and sufficiently large.
For the operator we have
for all .
The iterations are computed from:
for with , where or , respectively.
We compute the feedback matrices as solutions of the linear equation
STEP 1.B. We set
Next, we compute :
and
STEP 2.A. Fix j such that . Assuming that have already been computed for a , , we compute
where is computed as in (31).
We compute the feedback gains as solutions of the linear equation
STEP 2.B. Setting , we compute as in the formulae below
and
- In the second step, the computation of the optimal trajectory involves the initial vector and the equilibrium strategy values .
Then, we illustrate the mean squares of the optimal trajectory and of the equilibrium strategy . We set and define .
We have that solves the forward linear differential equation with finite jumps:
For we write:
, where
Then, we use the values to make the plots
where
such that and .
(A hedged Python sketch of both steps is given after this list.)
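The explicit formulas of STEP 1 and STEP 2 depend on the block coefficients of the impulse-controlled system, so the following sketch reproduces only the structure of the two steps: a backward sweep that integrates a Lyapunov-type matrix equation on each sampling interval and solves a linear gain equation at each sampling instant (via the generalized inverse, as in (22)), followed by a forward Euler–Maruyama Monte Carlo simulation used to estimate the mean square of the optimal trajectory. The right-hand sides, the jump update, and all numerical data are placeholder assumptions, not the exact expressions of STEP 1 and STEP 2.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative two-player data (placeholders, not the paper's coefficients).
n = 2
A = np.array([[0.0, 1.0], [-1.0, -0.5]]); C = 0.1 * np.eye(n)
B = [np.array([[0.0], [1.0]]), np.array([[1.0], [0.0]])]
D = [0.05 * b for b in B]
Q = [np.eye(n), 2.0 * np.eye(n)]        # state weights of the two players
R = [np.eye(1), np.eye(1)]              # control weights
G = [np.eye(n), np.eye(n)]              # terminal weights
t_s = np.linspace(0.0, 2.0, 9)          # sampling instants t_0 < ... < t_N
h = 5e-3                                # integration step

def backward_gains():
    """First step: backward sweep over the sampling intervals.
    A Lyapunov-type matrix equation (placeholder for (16a)) is integrated
    backwards, and at each sampling instant a linear equation (placeholder
    for (16c)/(26)) is solved for the stacked feedback gains."""
    P = [G[k].copy() for k in range(2)]
    gains = {}
    for j in range(len(t_s) - 1, 0, -1):
        for _ in range(int(round((t_s[j] - t_s[j - 1]) / h))):
            for k in range(2):
                dP = A.T @ P[k] + P[k] @ A + C.T @ P[k] @ C + Q[k]
                P[k] = P[k] + h * dP     # explicit Euler, integrating backwards in time
        Lam = np.block([[R[k] + B[k].T @ P[k] @ B[k] if k == l
                         else B[k].T @ P[k] @ B[l] for l in range(2)]
                        for k in range(2)])
        Gam = np.vstack([-B[k].T @ P[k] @ A for k in range(2)])
        F = np.linalg.pinv(Lam) @ Gam    # generalized inverse, cf. (22)
        gains[j - 1] = [F[:1, :], F[1:, :]]   # scalar control per player here
        for k in range(2):               # jump update (placeholder for (16b))
            P[k] = P[k] + gains[j - 1][k].T @ R[k] @ gains[j - 1][k]
    return gains

def mean_square(gains, x0, paths=200):
    """Second step: Euler-Maruyama Monte Carlo estimate of the mean square
    of the trajectory under the piecewise constant equilibrium strategy."""
    t_grid = np.arange(t_s[0], t_s[-1], h)
    acc = np.zeros(len(t_grid))
    for _ in range(paths):
        x, j = x0.copy(), 0
        u = [np.zeros(1), np.zeros(1)]
        for i, t in enumerate(t_grid):
            if j < len(t_s) - 1 and t >= t_s[j] - 1e-12:
                u = [gains[j][k] @ x for k in range(2)]
                j += 1
            drift = A @ x + sum(B[k] @ u[k] for k in range(2))
            diff = C @ x + sum(D[k] @ u[k] for k in range(2))
            x = x + h * drift + np.sqrt(h) * rng.standard_normal() * diff
            acc[i] += x @ x
    return t_grid, acc / paths

gains = backward_gains()
t_grid, ms = mean_square(gains, np.array([1.0, -1.0]))
```

For the two-player examples below, the same structure applies with the paper's coefficient matrices and weights in place of the placeholders.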
This algorithm enables us to compute the equilibrium strategy values of the players. The experiments illustrate that the optimal strategies are piecewise constant and that the mean square values of the optimal trajectories decay toward zero, which indicates a stabilization effect.
Further, we consider two examples for the LQ differential game described by the dynamical system (1), the performance criteria (2) and the class of piecewise constant admissible strategies of type (28).
Example 1.
The evolution of the mean square values and of the optimal trajectory (with the initial point ) and of the equilibrium strategies and is depicted in Figure 1 on the interval and in Figure 2 on the interval , respectively. The values of the optimal trajectory and of the equilibrium strategies of both players are very close to zero in both the short-term and the long-term periods.
Figure 1.
(left) ; Interval ; (right) and ; Interval .
Figure 2.
(left) ; Interval ; (right) and ; Interval .
Example 2.
We consider the controlled system (1) in the special form and . We define the matrix coefficients as follows:
The evolution of the mean square values and of the optimal trajectory (with the initial point ) and of the equilibrium strategies and is depicted on the intervals (Figure 3) and (Figure 4), respectively. The values of the optimal trajectory and of the equilibrium strategies of both players are very close to zero in both the short-term and the long-term periods.
Figure 3.
(left) ; Interval ; (right) and ; Interval .
Figure 4.
(left) ; Interval ; (right) and ; Interval .
4. Concluding Remarks
In this paper, we have investigated the formulation of existence conditions for Nash equilibrium strategies in state feedback form, in the case of piecewise constant admissible strategies. These conditions are expressed through the solvability of the algebraic Equation (26). The solutions of these equations provide the feedback matrices of the desired Nash equilibrium strategy. To obtain such conditions for the existence of a sampled-data Nash equilibrium strategy, we have transformed the original problem into an equivalent one, which requires finding a Nash equilibrium strategy in state feedback form for a stochastic differential game whose dynamics are described by Itô-type differential equations controlled by impulses. Unlike in the deterministic case, where the problem of finding a sampled-data Nash equilibrium strategy can be transformed into an equivalent problem in discrete time, in the stochastic framework, when the controlled system is described by Itô-type differential equations, such a transformation to the discrete-time case is not possible. The developments in the present work clarify and extend the results from Section 5 of [23], where only the particular case was considered. The key step in obtaining the feedback matrices of the Nash equilibrium strategy via Equation (26) is the solution of the TVP (16). On each interval, (16a) consists of L uncoupled backward linear differential equations. The boundary values are computed via (16d) for and via (16b) for . Finally, we have given an algorithm for calculating the equilibrium strategies of the players, and the numerical experiments suggest a stabilization effect.
Author Contributions
Conceptualization, V.D., I.G.I., I.-L.P. and O.B.; methodology, V.D., I.G.I., I.-L.P. and O.B.; software, V.D., I.G.I., I.-L.P. and O.B.; validation, V.D., I.G.I., I.-L.P. and O.B.; investigation, V.D., I.G.I., I.-L.P. and O.B.; resources, V.D., I.G.I., I.-L.P. and O.B.; writing—original draft preparation, V.D., I.G.I., I.-L.P. and O.B.; writing—review and editing, V.D., I.G.I., I.-L.P. and O.B. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by “1 Decembrie 1918” University of Alba Iulia through scientific research funds.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Hu, L.S.; Cao, Y.Y.; Shao, H.H. Constrained robust sampled-data control for nonlinear uncertain systems. Int. J. Robust Nonlinear Control 2002, 12, 447–464. [Google Scholar] [CrossRef]
- Hu, L.-S.; Lam, J.; Cao, Y.-Y.; Shao, H.-H. A linear matrix inequality (LMI) approach to robust H/sub 2/sampled-data control for linear uncertain systems. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2003, 33, 149–155. [Google Scholar] [CrossRef]
- Hu, L.; Shi, P.; Huang, B. Stochastic stability and robust control for sampled-data systems with Markovian jump parameters. J. Math. Anal. Appl. 2006, 504–517. [Google Scholar] [CrossRef]
- Ramachandran, K.; Tsokos, C. Stochastic Differential Games Theory and Applications; Atlantis Studies in Probability and Statistics; Atlantis Press: Dordrecht, The Netherlands, 2012. [Google Scholar]
- Yeung, D.K.; Petrosyan, L.A. Cooperative Stochastic Differential Games; Springer Series in Operations Research and Financial Engineering; Springer: New York, NY, USA, 2006. [Google Scholar]
- Zhang, J. Backward Stochastic Differential Equations: From Linear to Fully Nonlinear Theory; Probability Theory and Stochastic Modelling; Springer: New York, NY, USA, 2017; Volume 86. [Google Scholar]
- Dockner, E.; Jorgensen, S.; Long, N.; Sorger, G. Differential Games in Economics and Management Science; Cambridge University Press: Cambridge, UK, 2000. [Google Scholar] [CrossRef]
- Başar, T.; Olsder, G.J. Dynamic Noncooperative Game Theory; Classics in Applied Mathematics; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1999; Volume 23. [Google Scholar]
- Engwerda, J. On the open-loop Nash equilibrium in LQ-games. J. Econ. Dyn. Control 1998, 22, 729–762. [Google Scholar] [CrossRef]
- Engwerda, J. Computational aspects of the open-loop Nash equilibrium in linear quadratic games. J. Econ. Dyn. Control 1998, 22, 1487–1506. [Google Scholar] [CrossRef]
- Engwerda, J. Open-loop Nash equilibria in the non-cooperative infinite-planning horizon LQ game. J. Frankl. Inst. 2014, 351, 2657–2674. [Google Scholar] [CrossRef]
- Nian, X.; Duan, Z.; Tang, W. Analytical solution for a class of linear quadratic open-loop Nash game with multiple players. J. Control Theory Appl. 2006, 4, 239–244. [Google Scholar] [CrossRef]
- Azevedo-Perdicoúlis, T.P.; Jank, G. Disturbance Attenuation of Linear Quadratic OL-Nash Games on Repetitive Processes with Smoothing on the Gas Dynamics. Multidimens. Syst. Signal Process. 2012, 23, 131–153. [Google Scholar] [CrossRef]
- Imaan, M.; Cruz, J. Sampled-data Nash controls in non-zero-sum differential games. Int. J. Control 1973, 17, 1201–1209. [Google Scholar] [CrossRef]
- Başar, T. On the existence and uniqueness of closed-loop sampled-data nash controls in linear-quadratic stochastic differential games. In Optimization Techniques; Iracki, K., Malanowski, K., Walukiewicz, S., Eds.; Lecture Notes in Control and Information Sciences; Springer: Berlin/Heidelberg, Germany, 1980; Volume 22, pp. 193–203. [Google Scholar]
- Engwerda, J. A numerical algorithm to find soft-constrained Nash equilibria in scalar LQ-games. Int. J. Control 2006, 79, 592–603. [Google Scholar] [CrossRef]
- Wonham, W.M. On a Matrix Riccati Equation of Stochastic Control. SIAM J. Control 1968, 6, 681–697. [Google Scholar] [CrossRef]
- Sun, J.; Yong, J. Linear–quadratic stochastic two-person nonzero-sum differential games: Open-loop and closed-loop Nash equilibria. Stoch. Process. Appl. 2019, 381–418. [Google Scholar] [CrossRef]
- Sun, J.; Li, X.; Yong, J. Open-Loop and Closed-Loop Solvabilities for Stochastic Linear Quadratic Optimal Control Problems. SIAM J. Control Optim. 2016, 54, 2274–2308. [Google Scholar] [CrossRef]
- Drăgan, V.; Ivanov, I.G. On the stochastic linear quadratic control problem with piecewise constant admissible controls. J. Frankl. Inst. 2020, 357, 1532–1559. [Google Scholar] [CrossRef]
- Rami, M.; Moore, J.; Zhou, X. Indefinite stochastic linear quadratic control and generalized differential Riccati equation. Siam J. Control Optim. 2001, 40, 1296–1311. [Google Scholar] [CrossRef]
- Drăgan, V.; Ivanov, I.G.; Popa, I.L. Stochastic linear quadratic differential games in a state feedback setting with sampled measurements. Syst. Control Lett. 2019, 104563. [Google Scholar] [CrossRef]
- Drăgan, V.; Ivanov, I.G.; Popa, I.L. On the closed loop Nash equilibrium strategy for a class of sampled data stochastic linear quadratic differential games. Chaos Solitons Fractals 2020, 109877. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).