
Linear–Quadratic Mean-Field-Type Games: A Direct Method

by Tyrone E. Duncan 1 and Hamidou Tembine 2,*
1 Department of Mathematics, University of Kansas, Lawrence, KS 66044, USA
2 Learning and Game Theory Laboratory, New York University Abu Dhabi, P.O. Box 129188, Abu Dhabi, UAE
* Author to whom correspondence should be addressed.
Games 2018, 9(1), 7; https://doi.org/10.3390/g9010007
Submission received: 4 January 2018 / Revised: 29 January 2018 / Accepted: 31 January 2018 / Published: 12 February 2018
(This article belongs to the Special Issue Mean-Field-Type Game Theory)

Abstract

In this work, a multi-person mean-field-type game is formulated and solved that is described by a linear jump-diffusion system of mean-field type and a quadratic cost functional involving the second moments, the square of the expected value of the state, and the control actions of all decision-makers. We propose a direct method to solve the game, team, and bargaining problems. This solution approach does not require solving the Bellman–Kolmogorov equations or backward–forward stochastic differential equations of Pontryagin’s type. The proposed method can be easily implemented by beginners and engineers who are new to the emerging field of mean-field-type game theory. The optimal strategies for decision-makers are shown to be in a state-and-mean-field feedback form. The optimal strategies are given explicitly as a sum of the well-known linear state-feedback strategy for the associated deterministic linear–quadratic game problem and a mean-field feedback term. The equilibrium cost of each decision-maker is explicitly derived using a simple direct method. Moreover, the equilibrium cost is a weighted sum of the initial variance and an integral of a weighted variance of the diffusion and the jump process. Finally, the method is used to compute global optimum strategies as well as saddle-point strategies and the Nash bargaining solution in state-and-mean-field feedback form.

1. Introduction

In 1952, Markowitz proposed a paradigm for dealing with risk issues concerning choices which involve many possible financial instruments [1]. Formally, it deals with only two discrete time periods (e.g., “now” and “3 months from now”), or equivalently, one accounting period (e.g., “3 months”). In this scheme, the goal of an investor is to select the portfolio of securities that will provide the best distribution of future consumption, given their investment budget. Two measures of the prospects provided by such a portfolio are assumed to be sufficient for evaluating its desirability: the expected value at the end of the accounting period and the standard deviation, or its square, the variance, of that value. If the initial investment budget is positive, there will be a one-to-one relationship between these end-of-period measures and comparable measures relating to the percentage change in value, or return, over the period. Thus, Markowitz’s approach is often framed in terms of the expected return of a portfolio and its standard deviation of return, with the latter serving as a measure of risk. A typical example of risk in the current market is the evolution of the prices [2,3] of cryptocurrencies (bitcoin, litecoin, ethereum, dash, etc.). The Markowitz paradigm (also termed the mean-variance paradigm) is often characterized as dealing with portfolio risk and (expected) return [4,5]. We address this problem when several entities are involved. Game problems in which the state dynamics is given by a linear stochastic system with a Brownian motion and a cost functional that is quadratic in the state and the control are often called linear–quadratic–Gaussian (LQG) games. For the continuous-time LQG game problem with positive coefficients, the optimal strategy is a linear state-feedback strategy which is identical to an optimal control for the corresponding deterministic linear–quadratic game problem, where the Brownian motion is replaced by the zero process. Moreover, the equilibrium cost only differs from the deterministic game problem’s equilibrium cost by the integral of a function of time. For LQG control and LQG zero-sum games, it can be shown that a simple square completion method provides an explicit solution to the problem. This method was successfully developed and applied by Duncan et al. [6,7,8,9,10,11] in the mean-field-free case. Interestingly, the method can be used beyond the LQG framework. Moreover, Duncan et al. extended the direct method to more general noises, including fractional Brownian noises, and to some non-quadratic cost functionals on spheres, torus, and more general spaces.
The main goal of this work is to investigate whether these techniques can be used to solve mean-field-type game problems, which are non-standard problems [12]. To do so, we modify the state dynamics to include mean-field terms in the drift function, namely (i) the expected value of the state and (ii) the expected values of the control actions. We also modify the instantaneous cost and the terminal cost function to include (iii) the square of the expected value of the state and (iv) the square of the expected values of the control actions. When the state dynamics and/or the cost functional involve a mean-field term (such as the expected value of the state and/or expected values of the control actions), the game is said to be an LQG game of mean-field type, or MFT-LQG. We aim to study the behavior of such MFT-LQG game problems when mean-field terms are involved. If, in addition, the state dynamics is driven by a jump-diffusion process, then the problem is termed an MFT-LQJD game problem.
For such game problems, various solution methods have been proposed, such as the stochastic maximum principle (SMP) [12] and the dynamic programming principle (DPP) with the Hamilton–Jacobi–Bellman–Isaacs equation and the Fokker–Planck–Kolmogorov equation [12,13,14]. Most studies illustrated these solution methods in the linear–quadratic game with an infinite number of decision-makers [15,16,17,18,19,20,21]. These works assume indistinguishability within classes, and the cost functions are assumed to be identical or invariant under permutation of decision-makers’ indexes. Note that the indistinguishability assumption is not fulfilled for many interesting problems, such as variance reduction or risk quantification problems, in which decision-makers have different sensitivities towards risk. One typical and practical example is an energy-efficient multi-level building in which every resident has their own comfort temperature and aims to use the heating, ventilation, and air conditioning (HVAC) system to get closer to that temperature and to maintain it within their own comfort zone. This problem clearly does not satisfy the indistinguishability assumption used in previous works on mean-field games. Therefore, it is reasonable to look at the problem beyond the indistinguishability assumption. Here we drop these assumptions and solve the problem directly with an arbitrary finite number of decision-makers. In LQ mean-field-type game problems, the state process can be modeled by a set of linear stochastic differential equations of McKean–Vlasov type, and the preferences are formalized by quadratic cost functions with mean-field terms. These game problems are of practical interest, and a detailed exposition of this theory can be found in [7,12,22,23,24,25]. The popularity of these game problems is due to practical considerations in signal processing, pattern recognition, filtering, prediction, economics, and management science [26,27,28,29].
To some extent, most of the risk-neutral versions of these optimal controls are analytically and numerically solvable [6,7,9,11,24]. On the other hand, the linear quadratic robust setting naturally appears if the decision makers’ objective is to minimize the effect of a small perturbation and related variance of the optimally controlled nonlinear process. By solving a linear–quadratic game problem of mean-field type, and using the implied optimal control actions, decision-makers can significantly reduce the variance (and the cost) incurred by this perturbation. The variance reduction and minimax problems have very interesting applications in risk quantification problems under adversarial attacks and in security issues in interdependent infrastructures and networks [27,30,31,32,33]. Table 1 summarizes some recent developments in MF-LQ-related games.
In this work, we propose a simple argument that gives the best-response strategy and the Nash equilibrium cost for a class of MFT-LQJD games without using the well-known solution methods (SMP and DPP). We apply the square completion method to risk-neutral mean-field-type game problems. It is shown that this method is well suited to MFT-LQJD games, as well as to variance-reduction performance functionals. Applying the solution methodology related to the DPP or the SMP requires an involved (stochastic) analysis and convexity arguments to generate necessary and sufficient optimality criteria. The direct method avoids all of this.

1.1. Contribution of This Article

Our contribution can be summarized as follows. We formulate and solve a mean-field-type game described by linear jump-diffusion dynamics and a mean-field-dependent quadratic or robust-quadratic cost functional for each generic decision-maker. The optimal strategies for the decision-makers are given semi-explicitly using a simple and direct method based on square completion, suggested by Duncan et al. (e.g., [7,8,9]) for the mean-field-free case. This approach does not use the well-known solution methods such as the stochastic maximum principle and the dynamic programming principle with the Hamilton–Jacobi–Bellman–Isaacs equation and the Fokker–Planck–Kolmogorov equation. It does not require solving extended backward–forward integro-partial differential equations (IPDEs). In the risk-neutral linear–quadratic mean-field-type game, we show that there is generally a best-response strategy to the mean of the state, and we provide a sufficient condition for the existence of a mean-field Nash equilibrium. We also provide a global optimum solution to the problem in the case of full cooperation between the decision-makers. This approach gives a basic insight into the solution by providing a simple explanation for the additional term in the robust Riccati equation compared with the risk-neutral Riccati equation. Sufficient conditions for the existence and uniqueness of mean-field equilibria are obtained when the horizon length is small enough and the Riccati coefficient parameters are positive. The method (see Figure 1) is then extended to linear–quadratic robust mean-field-type games under disturbance, formulated as a minimax mean-field-type game.
Only a very limited amount of prior work seems to have been done on MFT-LQJD game problems. As indicated in Table 1, the jump term brings a new feature to the existing literature, and to the best of our knowledge, this is the first work that introduces and provides a bargaining solution [34] in mean-field-type games using a direct method.
The last section of this article is devoted to the validation of the novel equations derived in this article using other approaches. We confirm the validity of the optimal feedback strategies. In the Appendix we provide a basic example illustrating the sub-optimality of the mean-field game approach (which consists of freezing the mean-field term) compared with the mean-field-type game approach (in which an individual decision-maker can significantly influence the mean-field term).

1.2. Structure

A brief outline of the article follows. The next section introduces the non-cooperative mean-field-type game problem and provides its solution. Then, the fully-cooperative game and the bargaining problems and their solutions are presented. The last part of the article is devoted to adversarial problems of mean-field type.

Notation and Preliminaries

Let T > 0 be a fixed time horizon and (Ω, F, F^{B,N}, ℙ) be a given filtered probability space on which a one-dimensional standard Brownian motion B = {B(t)}_{t≥0} is defined, and Ñ(dt, dθ) = N(dt, dθ) − ν(dθ)dt is a centered jump process with Lévy measure ν defined over Θ. The filtration F^{B,N} = {F_t^{B,N}, 0 ≤ t ≤ T} is the natural filtration generated by the union {B, N}, augmented by the ℙ-null sets of F. The processes B and N are mutually independent. In practice, B is used to capture smaller disturbances and N is used for larger jumps of the system.
We introduce the following notation:
  • Let k ≥ 1. L^k(0, T; ℝ) is the set of functions f : [0, T] → ℝ such that ∫_0^T |f(t)|^k dt < ∞.
  • L_F^k(0, T; ℝ) is the set of F-adapted ℝ-valued processes X(·) such that E ∫_0^T |X(t)|^k dt < ∞.
  • X̄(t) = E[X(t)] denotes the expected value of the random variable X(t).
An admissible control strategy u_i of decision-maker i is an F-adapted and square-integrable process with values in a non-empty subset U_i of ℝ. We denote the set of all admissible control strategies by 𝒰_i:
𝒰_i = { u_i(·) ∈ L_F^2(0, T; ℝ) ; u_i(t) ∈ U_i a.e. t ∈ [0, T], ℙ-a.s. }.

2. Non-Cooperative Problem

Consider n risk-neutral decision-makers (n ≥ 2) and let L_i(u_1, …, u_n) be the objective functional of decision-maker i, given by

L_i(u_1,\ldots,u_n) = \tfrac{1}{2} q_i(T)\, x^2(T) + \tfrac{1}{2}\bar q_i(T)\,[E x(T)]^2 + \tfrac{1}{2}\int_0^T \Big( q_i(t)\,x^2(t) + \bar q_i(t)\,(E[x(t)])^2 + r_i(t)\,u_i^2(t) + \bar r_i(t)\,[E u_i(t)]^2 \Big)\, dt.

Then, the best response of decision-maker i to the process (u_{-i}, E[x]) := (u_1, …, u_{i−1}, u_{i+1}, …, u_n, E[x]) solves the following risk-neutral linear–quadratic mean-field-type control problem:

\inf_{u_i(\cdot)\in\mathcal{U}_i} E\, L_i(u_1,\ldots,u_n), \quad \text{subject to} \quad dx(t) = \Big[ a(t)x(t) + \bar a(t)E[x(t)] + \sum_{i=1}^n b_i(t)u_i(t) + \sum_{i=1}^n \bar b_i \bar u_i(t) \Big] dt + \sigma(t)\,dB(t) + \int_\Theta \mu(t,\theta)\,\tilde N(dt,d\theta), \quad x(0) := x_0, \qquad (2)

where E x²(0) < +∞, q_i(t) ≥ 0, q_i(t) + q̄_i(t) ≥ 0, r_i(t) > 0, r_i(t) + r̄_i(t) > 0, and a(t), ā(t), b_i(t), σ(t) are real-valued functions, and where E[x(t)] is the expected value of the state created by all decision-makers under the control action profile (u_1, …, u_n) ∈ ∏_{j=1}^n 𝒰_j. The method below can handle time-varying coefficients. For simplicity, we impose an integrability condition on these coefficient functions over [0, T]:

\int_0^T \Big[ |a(t)| + |\bar a(t)| + |b(t)| + |\bar b(t)| + \sigma^2(t) + \int_\Theta \big( |\mu(t,\theta)| + \mu^2(t,\theta) \big)\,\nu(d\theta) \Big] dt < +\infty. \qquad (3)

Under condition (3), the state dynamics in (2) has a solution for each u = (u_1, …, u_n) ∈ ∏_{j=1}^n 𝒰_j. Note that we do not impose boundedness or Lipschitz conditions (because quadratic functionals are not necessarily Lipschitz).
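For readers who prefer to experiment numerically, the following is a minimal Monte Carlo sketch (not part of the original analysis) of the state dynamics (2) under fixed state-and-mean-field feedback strategies. The coefficient values, the two placeholder feedback gains, and the compound Poisson jump with zero-mean Gaussian marks standing in for the Lévy measure ν are all illustrative assumptions.

import numpy as np

# Monte Carlo simulation of the mean-field jump-diffusion state dynamics (2).
# Constant coefficients, n = 2 decision-makers with fixed (placeholder) feedback
# gains, and a compound Poisson jump with zero-mean Gaussian marks standing in
# for the Levy measure nu.
rng = np.random.default_rng(0)
T, steps, paths = 1.0, 200, 5000
dt = T / steps
a, abar, sigma = 0.5, -0.2, 0.3
b, bbar = np.array([1.0, 0.8]), np.array([0.1, 0.2])
k, kbar = np.array([0.6, 0.4]), np.array([0.5, 0.3])   # placeholder feedback gains
lam, jump_std = 2.0, 0.1                               # jump intensity and mark scale

x = np.full(paths, 1.0)                                # x(0) = 1
for _ in range(steps):
    xbar = x.mean()                                    # Monte Carlo proxy for E[x(t)]
    u = -np.outer(x - xbar, k) - xbar * kbar           # u_i = -k_i (x - xbar) - kbar_i xbar
    ubar = u.mean(axis=0)
    drift = a * x + abar * xbar + u @ b + ubar @ bbar
    dB = rng.normal(0.0, np.sqrt(dt), paths)
    dJ = rng.normal(0.0, jump_std, paths) * rng.poisson(lam * dt, paths)  # zero-mean marks
    x = x + drift * dt + sigma * dB + dJ

print("E[x(T)] ~", x.mean(), "   var(x(T)) ~", x.var())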
Definition 1 (BRi: Best Response of decision-maker i).
Any strategy u_i^*(·) ∈ 𝒰_i attaining the infimum in (2) is called a risk-neutral best-response strategy of decision-maker i to the other decision-makers’ strategies u_{-i} ∈ ∏_{j≠i} 𝒰_j. The set of best-response strategies of i is denoted by BR_i : ∏_{j≠i} 𝒰_j → 2^{𝒰_i}, where 2^{𝒰_i} denotes the set of subsets of 𝒰_i.
Note that if b i = 0 = r i , there are multiple optimizers of the best-response problem.
Definition 2 (Mean-Field Nash Equilibrium).
Any strategy profile (u_1^*, …, u_n^*) ∈ ∏_i 𝒰_i such that u_i^* ∈ BR_i(u_{-i}^*) for every i and x̄^* = E[x^*] is called a Nash equilibrium of the MFT-LQJD game above.
The risk-neutral mean-field-type Nash equilibrium problem we are concerned with is to find and characterize the processes (x^*, u^*, E[x^*], E[u^*]) such that, for every decision-maker i, u_i^* is an optimizer of the best-response problem (2), and the expected value of the resulting common state E[x^*] created by all the decision-makers coincides with x̄^*. This means that an equilibrium is a fixed point of the best-response correspondence BR = (BR_1, …, BR_n), where BR_i : ∏_{j≠i} 𝒰_j → 2^{𝒰_i} is the best-response correspondence of decision-maker i.
We rewrite the expected objective functional and the state dynamics in terms of x − x̄ and x̄:

E L_i = \tfrac{1}{2}\Big[ q_i(T)\, E(x(T)-\bar x(T))^2 + [q_i(T)+\bar q_i(T)]\,\bar x^2(T) + E\int_0^T \big( q_i (x-\bar x)^2 + [q_i+\bar q_i]\bar x^2 + r_i (u_i-\bar u_i)^2 + (r_i+\bar r_i)[\bar u_i]^2 \big)\, dt \Big],
dx = \Big[ a(x-\bar x) + (a+\bar a)\bar x + \sum_{i=1}^n b_i (u_i-\bar u_i) + \sum_{i=1}^n (b_i+\bar b_i)\bar u_i \Big] dt + \sigma\, dB(t) + \int_\Theta \mu(t,\theta)\,\tilde N(dt,d\theta).

Note that the expected value of the first term in the integral in L_i can be seen as a weighted variance of the state, since q̄_i(t) E[(x(t) − E[x(t)])²] = q̄_i(t) var(x(t)). Taking the expectation of the state dynamics, one arrives at the deterministic linear dynamics
\frac{d}{dt} E x = \dot{\bar x} = (a+\bar a)\bar x + \sum_{i=1}^n (b_i+\bar b_i)\bar u_i, \qquad \bar x(0) = E x(0).

The direct method consists of writing a generic structure of the cost functional, with unknown deterministic functions to be identified. Inspired by the structure of the terminal cost function, we try a generic solution in quadratic form. Let f_i(t, x) = ½α_i(t)(x − x̄)² + ½β_i(t)x̄² + γ_i(t)x̄ + δ_i(t), where α_i, β_i, γ_i, δ_i are deterministic functions of time, such that

f_i(T, x(T)) = \tfrac{1}{2}\Big\{ q_i(T)\, E(x(T)-\bar x(T))^2 + [q_i(T)+\bar q_i(T)]\,\bar x^2(T) \Big\}.

At the final time T, one can identify α_i(T) = q_i(T), β_i(T) = q_i(T) + q̄_i(T), γ_i(T) = δ_i(T) = 0.
Recall that Itô’s formula for the jump-diffusion process is
f_i(T,x(T)) = f_i(0,x(0)) + \int_0^T \Big[ f_{i,t} + f_{i,x} D + \tfrac{\sigma^2}{2} f_{i,xx} \Big] dt + \int_0^T \sigma f_{i,x}\, dB + \int_0^T\!\!\int_\Theta \big[ f_i(t, x+\mu(t,\theta)) - f_i(t,x) - \mu(t,\theta) f_{i,x} \big]\,\nu(d\theta)\, dt + \int_0^T\!\!\int_\Theta \big[ f_i(t, x+\mu(t,\theta)) - f_i(t,x) \big]\,\tilde N(dt,d\theta), \qquad (6)

where D is the drift term D := a(x − x̄) + (a + ā)x̄ + Σ_{i=1}^n b_i(u_i − ū_i) + Σ_{i=1}^n (b_i + b̄_i)ū_i. We compute the derivative terms:

\dot{\bar x} = (a+\bar a)\bar x + \sum_{i=1}^n (b_i+\bar b_i)\bar u_i,
f_{i,t} = \tfrac{1}{2}\dot\alpha_i (x-\bar x)^2 + \tfrac{1}{2}\dot\beta_i \bar x^2 + \dot\gamma_i \bar x + \dot\delta_i - \alpha_i (x-\bar x)\dot{\bar x} + \beta_i \bar x \dot{\bar x} + \gamma_i \dot{\bar x},
f_{i,x} = \alpha_i (x-\bar x), \qquad f_{i,xx} = \alpha_i,
f_i(t,x+\mu) - f_i(t,x) - \mu f_{i,x} = \tfrac{1}{2}\alpha_i (x+\mu-\bar x)^2 - \tfrac{1}{2}\alpha_i (x-\bar x)^2 - \alpha_i (x-\bar x)\mu = \tfrac{1}{2}\alpha_i \mu^2. \qquad (7)
Using (7) in (6) and taking the expectation yields
E[f_i(T,x(T)) - f_i(0,x(0))] = \tfrac{1}{2} E\int_0^T \big[ \dot\alpha_i (x-\bar x)^2 + \dot\beta_i \bar x^2 \big] dt + \tfrac{1}{2} E\int_0^T 2\beta_i \Big[ (a+\bar a)\bar x^2 + \sum_{j=1}^n (b_j+\bar b_j)\bar u_j \bar x \Big] dt + \tfrac{1}{2} E\int_0^T \Big[ 2a\alpha_i (x-\bar x)^2 + 2\alpha_i \sum_{j=1}^n b_j (u_j-\bar u_j)(x-\bar x) \Big] dt + \tfrac{1}{2}\int_0^T \Big[ \sigma^2 + \int_\Theta \mu^2(t,\theta)\,\nu(d\theta) \Big] \alpha_i\, dt + E\int_0^T \Big\{ \dot\gamma_i \bar x + \gamma_i \Big[ (a+\bar a)\bar x + \sum_{j=1}^n (b_j+\bar b_j)\bar u_j \Big] \Big\} dt + \int_0^T \dot\delta_i\, dt,
where we have used the following equalities:
E\big[ \alpha_i (x-\bar x)\dot{\bar x} \big] = 0, \qquad E\int_0^T \sigma f_{i,x}\, dB = 0, \qquad E\int_0^T\!\!\int_\Theta \big[ f_i(t,x+\mu(t,\theta)) - f_i(t,x) \big]\,\tilde N(dt,d\theta) = 0.
We compute the gap between E [ L i ] and E [ f i ( 0 , x ( 0 ) ) ] as
E[L_i - f_i(0,x(0))] = \tfrac{1}{2}\big( q_i(T)-\alpha_i(T) \big) E(x(T)-\bar x(T))^2 + \tfrac{1}{2}\big[ q_i(T)+\bar q_i(T)-\beta_i(T) \big]\bar x^2(T)
 + \tfrac{1}{2} E\int_0^T \big\{ q_i (x-\bar x)^2 + [q_i+\bar q_i]\bar x^2(t) \big\}\, dt + \tfrac{1}{2} E\int_0^T \big\{ r_i (u_i-\bar u_i(t))^2 + (r_i+\bar r_i)[\bar u_i]^2 \big\}\, dt
 + \tfrac{1}{2} E\int_0^T \big\{ \dot\alpha_i (x-\bar x)^2 + \dot\beta_i \bar x^2 + 2\beta_i (a+\bar a)\bar x^2 \big\}\, dt + \tfrac{1}{2} E\int_0^T 2\beta_i (b_i+\bar b_i)\bar u_i \bar x\, dt + \tfrac{1}{2} E\int_0^T 2\beta_i \sum_{j\neq i} (b_j+\bar b_j)\bar u_j \bar x\, dt
 + \tfrac{1}{2} E\int_0^T \Big\{ 2a\alpha_i (x-\bar x)^2 + 2\alpha_i b_i (u_i-\bar u_i)(x-\bar x) + 2\alpha_i \sum_{j\neq i} b_j (u_j-\bar u_j)(x-\bar x) \Big\}\, dt
 + \tfrac{1}{2}\int_0^T \Big[ \sigma^2 + \int_\Theta \mu^2(t,\theta)\,\nu(d\theta) \Big]\alpha_i\, dt + \tfrac{1}{2} E\int_0^T \Big\{ 2\dot\gamma_i \bar x + 2\gamma_i \Big[ (a+\bar a)\bar x + \sum_{j=1}^n (b_j+\bar b_j)\bar u_j \Big] \Big\}\, dt + \tfrac{1}{2}\int_0^T 2\dot\delta_i\, dt.

2.1. Best Response to Open-Loop Strategies

In this subsection, we compute the best response of decision-maker i to open-loop strategies (u_j)_{j≠i}. The information structure for the other players is limited to time and the initial point; i.e., the mappings (u_j)_{j≠i} are measurable functions of time (and do not depend on x) and of the initial point x_0.
E[L_i - f_i(0,x(0))] = \tfrac{1}{2}\big( q_i(T)-\alpha_i(T) \big) E(x(T)-\bar x(T))^2 + \tfrac{1}{2}\big[ q_i(T)+\bar q_i(T)-\beta_i(T) \big]\bar x^2(T)
 + \tfrac{1}{2} E\int_0^T \Big\{ \dot\alpha_i + 2a\alpha_i - \frac{b_i^2}{r_i}\alpha_i^2 + q_i \Big\}(x(t)-\bar x(t))^2\, dt
 + \tfrac{1}{2} E\int_0^T \Big\{ \dot\beta_i + 2\beta_i (a+\bar a) - \frac{(b_i+\bar b_i)^2}{r_i+\bar r_i}\beta_i^2 + q_i+\bar q_i \Big\}\bar x^2\, dt
 + \tfrac{1}{2} E\int_0^T r_i \Big[ u_i - \bar u_i + \frac{b_i}{r_i}\alpha_i (x-\bar x) \Big]^2 dt + \tfrac{1}{2} E\int_0^T (r_i+\bar r_i)\Big[ \bar u_i + \frac{b_i+\bar b_i}{r_i+\bar r_i}(\beta_i \bar x + \gamma_i) \Big]^2 dt
 + \tfrac{1}{2} E\int_0^T \Big\{ 2\dot\gamma_i + 2\gamma_i (a+\bar a) - 2\frac{(b_i+\bar b_i)^2}{r_i+\bar r_i}\beta_i\gamma_i + 2\beta_i \sum_{j\neq i} (b_j+\bar b_j)\bar u_j \Big\}\bar x\, dt
 + \tfrac{1}{2} E\int_0^T \Big\{ -\frac{(b_i+\bar b_i)^2}{r_i+\bar r_i}\gamma_i^2 + 2\gamma_i \sum_{j\neq i} (b_j+\bar b_j)\bar u_j \Big\}\, dt
 + \tfrac{1}{2}\int_0^T \Big[ \sigma^2 + \int_\Theta \mu^2(t,\theta)\,\nu(d\theta) \Big]\alpha_i\, dt + \tfrac{1}{2}\int_0^T 2\dot\delta_i\, dt,
where
(r_i+\bar r_i)[\bar u_i]^2 + 2\beta_i \sum_{j=1}^n (b_j+\bar b_j)\bar u_j \bar x + 2\gamma_i \sum_{j=1}^n (b_j+\bar b_j)\bar u_j
 = (r_i+\bar r_i)[\bar u_i]^2 + 2\bar u_i (b_i+\bar b_i)(\beta_i \bar x + \gamma_i) + 2\beta_i \sum_{j\neq i} (b_j+\bar b_j)\bar u_j \bar x + 2\gamma_i \sum_{j\neq i} (b_j+\bar b_j)\bar u_j
 = (r_i+\bar r_i)\Big[ \bar u_i + \frac{b_i+\bar b_i}{r_i+\bar r_i}(\beta_i\bar x+\gamma_i) \Big]^2 - \frac{(b_i+\bar b_i)^2}{r_i+\bar r_i}\big\{ \beta_i^2\bar x^2 + 2\beta_i\gamma_i\bar x + \gamma_i^2 \big\} + 2\beta_i \sum_{j\neq i} (b_j+\bar b_j)\bar u_j \bar x + 2\gamma_i \sum_{j\neq i} (b_j+\bar b_j)\bar u_j.
The best response of decision-maker i to the open-loop strategies (u_j)_{j≠i} is u_i = ū_i − (b_i/r_i)α_i(x − x̄), and its expected value is ū_i = −((b_i + b̄_i)/(r_i + r̄_i))(β_i x̄ + γ_i), where α_i, β_i, γ_i are deterministic functions of time t. Clearly, the best response to open-loop strategies is in state-and-mean-field feedback form. Here the mean-field feedback terms are the expected value of the state E[x(t)] and the expected value of the control action E[u_i(t)].
Therefore, we examine optimal strategies in state-and-mean-field feedback form in the next section.

2.2. Feedback Strategies

The information structure for the feedback solution is as follows. The model and the objective functions are assumed to be common knowledge. We assume that the state is perfectly observed. We will show below that the mean-field term is computable (via the initial mean state and the model). If the other decision-makers play their optimal state-and-mean-field feedback strategies, then the functions γ_1, …, γ_n are identically zero at any given time. We compute again E[L_i − f_i(0, x(0))] and complete the squares using the elements of {x − x̄, x̄}.
E[L_i - f_i(0,x(0))] = \tfrac{1}{2}\big( q_i(T)-\alpha_i(T) \big) E(x(T)-\bar x(T))^2 + \tfrac{1}{2}\big[ q_i(T)+\bar q_i(T)-\beta_i(T) \big]\bar x^2(T)
 + \tfrac{1}{2} E\int_0^T \Big[ \dot\alpha_i + 2a\alpha_i - \frac{b_i^2}{r_i}\alpha_i^2 - 2\alpha_i \sum_{j\neq i} \frac{b_j^2}{r_j}\alpha_j + q_i \Big](x-\bar x)^2\, dt
 + \tfrac{1}{2} E\int_0^T \Big[ \dot\beta_i + 2\beta_i (a+\bar a) - \frac{(b_i+\bar b_i)^2}{r_i+\bar r_i}\beta_i^2 - 2\beta_i \sum_{j\neq i} \frac{(b_j+\bar b_j)^2}{r_j+\bar r_j}\beta_j + q_i+\bar q_i \Big]\bar x^2\, dt
 + \tfrac{1}{2} E\int_0^T r_i\Big[ u_i-\bar u_i + \frac{b_i}{r_i}\alpha_i(x-\bar x) \Big]^2 dt + \tfrac{1}{2} E\int_0^T (r_i+\bar r_i)\Big[ \bar u_i + \frac{\beta_i(b_i+\bar b_i)}{r_i+\bar r_i}\bar x \Big]^2 dt
 + \tfrac{1}{2}\int_0^T \Big[ \sigma^2 + \int_\Theta \mu^2(t,\theta)\,\nu(d\theta) \Big]\alpha_i\, dt,
where we have used the following square completions:
r_i(u_i-\bar u_i)^2 + 2\alpha_i \sum_{j=1}^n b_j (u_j-\bar u_j)(x-\bar x) = r_i\Big[ u_i-\bar u_i + \frac{b_i}{r_i}\alpha_i(x-\bar x) \Big]^2 - \frac{b_i^2}{r_i}\alpha_i^2(x-\bar x)^2 + 2\alpha_i \sum_{j\neq i} b_j (u_j-\bar u_j)(x-\bar x), \quad \text{and}
(r_i+\bar r_i)[\bar u_i]^2 + 2\beta_i \sum_{j=1}^n (b_j+\bar b_j)\bar u_j \bar x = (r_i+\bar r_i)\Big[ \bar u_i + \frac{\beta_i(b_i+\bar b_i)}{r_i+\bar r_i}\bar x \Big]^2 - \frac{\beta_i^2(b_i+\bar b_i)^2}{r_i+\bar r_i}\bar x^2 + 2\beta_i \sum_{j\neq i} (b_j+\bar b_j)\bar u_j \bar x.
It follows that
\inf_{u_i\in\mathcal{U}_i} E[L_i] = \tfrac{1}{2}\alpha_i(0)\,var(x(0)) + \tfrac{1}{2}\beta_i(0)\,[E x(0)]^2 + \tfrac{1}{2}\int_0^T \Big[ \sigma^2 + \int_\Theta \mu^2(t,\theta)\,\nu(d\theta) \Big]\alpha_i\, dt,
u_i^* = -\frac{b_i}{r_i}\alpha_i (x-\bar x) - \frac{\beta_i(b_i+\bar b_i)}{r_i+\bar r_i}\bar x,
\dot\alpha_i + 2a\alpha_i - \frac{b_i^2}{r_i}\alpha_i^2 - 2\alpha_i \sum_{j\neq i} \frac{b_j^2}{r_j}\alpha_j + q_i = 0, \qquad \alpha_i(T) = q_i(T),
\dot\beta_i + 2\beta_i(a+\bar a) - \frac{(b_i+\bar b_i)^2}{r_i+\bar r_i}\beta_i^2 - 2\beta_i \sum_{j\neq i} \frac{(b_j+\bar b_j)^2}{r_j+\bar r_j}\beta_j + q_i+\bar q_i = 0, \qquad \beta_i(T) = q_i(T)+\bar q_i(T),
\bar x(t) = \bar x(0)\, \exp\Big( \int_0^t \Big[ (a+\bar a) - \sum_{i=1}^n \frac{(b_i+\bar b_i)^2}{r_i+\bar r_i}\beta_i(s) \Big] ds \Big) \qquad (15)
provides a mean-field Nash equilibrium in feedback strategies.
These Riccati equations are different from those arising under open-loop control strategies. The coefficients of the coupling terms β_iβ_j and α_iα_j are different, reflecting the coupling through the state and the mean state. Notice that the optimal strategy is in state-and-mean-field feedback form, which is different from the standard LQG game solution. As ā, b̄, r̄, q̄ vanish in (15), one obtains the Nash equilibrium of the corresponding stochastic differential game in closed-loop strategies with α_i = β_i, and u_i becomes mean-field-free. When the diffusion coefficient σ and the jump rate μ vanish, one obtains the noiseless deterministic game problem, and the optimal strategy is given by the equation in β_i, because x − x̄ = 0 in the deterministic case.
How can the mean-field term E[x(t)] be fed back? Here the mean-field term can be explicitly computed if the initial mean state x̄(0) is given and the model is known:

E[x(t)] = \bar x(t) = \bar x(0)\, \exp\Big( \int_0^t \Big[ (a+\bar a) - \sum_{i=1}^n \frac{(b_i+\bar b_i)^2}{r_i+\bar r_i}\beta_i(s) \Big] ds \Big).
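As an illustration of how the system (15) can be evaluated in practice, here is a small numerical sketch that integrates the coupled Riccati equations backward in time and then computes the mean state from the formula above. The coefficient values are arbitrary placeholders and scipy.integrate.solve_ivp is used as a generic ODE solver; none of this is prescribed by the paper.

import numpy as np
from scipy.integrate import solve_ivp

# Backward integration of the coupled Riccati equations in (15) for n decision-makers,
# followed by the mean state from the formula above. Constant placeholder coefficients.
n = 2
a, abar = 0.4, -0.1
b, bbar = np.array([1.0, 0.7]), np.array([0.2, 0.1])
q, qbar = np.array([1.0, 0.5]), np.array([0.3, 0.2])
r, rbar = np.array([1.0, 2.0]), np.array([0.5, 0.5])
T = 1.0
c_a, c_b = b**2 / r, (b + bbar)**2 / (r + rbar)        # coupling coefficients

def rhs(t, y):
    alpha, beta = y[:n], y[n:]
    sum_others_a = np.sum(c_a * alpha) - c_a * alpha   # sum over j != i
    sum_others_b = np.sum(c_b * beta) - c_b * beta
    dalpha = -(2 * a * alpha - c_a * alpha**2 - 2 * alpha * sum_others_a + q)
    dbeta = -(2 * (a + abar) * beta - c_b * beta**2 - 2 * beta * sum_others_b + q + qbar)
    return np.concatenate([dalpha, dbeta])

# terminal conditions alpha_i(T) = q_i(T), beta_i(T) = q_i(T) + qbar_i(T); integrate T -> 0
sol = solve_ivp(rhs, [T, 0.0], np.concatenate([q, q + qbar]), dense_output=True, rtol=1e-8)
alpha0, beta0 = sol.y[:n, -1], sol.y[n:, -1]
print("alpha_i(0) =", alpha0, "  beta_i(0) =", beta0)

# mean state: xbar(t) = xbar(0) exp( int_0^t [(a + abar) - sum_i beta_i(s) c_b_i] ds )
ts = np.linspace(0.0, T, 201)
rate = np.array([(a + abar) - np.sum(sol.sol(t)[n:] * c_b) for t in ts])
xbar = 1.0 * np.exp(np.cumsum(rate) * (ts[1] - ts[0]))  # crude quadrature, assuming xbar(0) = 1
print("xbar(T) ~", xbar[-1])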

3. Fully-Cooperative Solutions

In this section, we examine the global optimum and Nash bargaining solution [34] of the game.

3.1. Global Optimum

We now consider the fully cooperative scenario where all the decision-makers decide jointly to optimize a single global objective L_0 := Σ_i L_i, given by

\inf_{(u_1,\ldots,u_n)} E\sum_i L_i, \qquad dx = \Big[ a(x-\bar x) + (a+\bar a)\bar x + \sum_{i=1}^n b_i(u_i-\bar u_i) + \sum_{i=1}^n (b_i+\bar b_i)\bar u_i \Big] dt + \sigma\, dB(t) + \int_\Theta \mu(t,\theta)\,\tilde N(dt,d\theta), \quad x(0)=x_0.

Following the same methodology as above with q_0 = Σ_i q_i, q̄_0 = Σ_i q̄_i and f_0(t, x) = ½α_0(t)(x − x̄)² + ½β_0(t)x̄², we obtain:

\inf_{(u_1,\ldots,u_n)} E\sum_i[L_i] = \tfrac{1}{2}\alpha_0(0)\,var(x(0)) + \tfrac{1}{2}\beta_0(0)\,[E x(0)]^2 + \tfrac{1}{2}\int_0^T \Big[ \sigma^2(t) + \int_\Theta \mu^2(t,\theta)\,\nu(d\theta) \Big]\alpha_0(t)\, dt,
u_i^* = -\frac{b_i}{r_i}\alpha_0(x-\bar x) - \frac{\beta_0(b_i+\bar b_i)}{r_i+\bar r_i}\bar x,
\dot\alpha_0 + 2a\alpha_0 - \alpha_0^2 \sum_i \frac{b_i^2}{r_i} + q_0 = 0, \qquad \alpha_0(T) = q_0(T) = \sum_{i=1}^n q_i(T),
\dot\beta_0 + 2\beta_0(a+\bar a) - \beta_0^2 \sum_{i=1}^n \frac{(b_i+\bar b_i)^2}{r_i+\bar r_i} + q_0+\bar q_0 = 0, \qquad \beta_0(T) = \sum_{i=1}^n \big[ q_i(T)+\bar q_i(T) \big],
\bar x(t) = \bar x(0)\, \exp\Big( \int_0^t \Big[ (a+\bar a) - \beta_0(s) \sum_{i=1}^n \frac{(b_i+\bar b_i)^2}{r_i+\bar r_i} \Big] ds \Big).
When the coefficients are constant (in time), α 0 , β 0 are explicitly given by
S := \sum_i \frac{b_i^2}{r_i}, \qquad \alpha_0(t) = \frac{a}{S} + \sqrt{\frac{q_0}{S}+\frac{a^2}{S^2}}\left[ -1 + \frac{q_0(T) - \frac{a}{S} + \sqrt{\frac{q_0}{S}+\frac{a^2}{S^2}}}{\Gamma} \right],
\Gamma := \frac{1}{2}\Big( q_0(T) - \frac{a}{S} + \sqrt{\frac{q_0}{S}+\frac{a^2}{S^2}} \Big) - \frac{1}{2}\Big( q_0(T) - \frac{a}{S} - \sqrt{\frac{q_0}{S}+\frac{a^2}{S^2}} \Big)\, e^{-2(T-t)\sqrt{q_0 S + a^2}},
\tilde S := \sum_{i=1}^n \frac{(b_i+\bar b_i)^2}{r_i+\bar r_i}, \qquad \beta_0(t) = \frac{a+\bar a}{\tilde S} + \sqrt{\frac{q_0+\bar q_0}{\tilde S}+\frac{(a+\bar a)^2}{\tilde S^2}}\left[ -1 + \frac{q_0(T)+\bar q_0(T) - \frac{a+\bar a}{\tilde S} + \sqrt{\frac{q_0+\bar q_0}{\tilde S}+\frac{(a+\bar a)^2}{\tilde S^2}}}{\tilde\Gamma} \right],
\tilde\Gamma := \frac{1}{2}\Big( q_0(T)+\bar q_0(T) - \frac{a+\bar a}{\tilde S} + \sqrt{\frac{q_0+\bar q_0}{\tilde S}+\frac{(a+\bar a)^2}{\tilde S^2}} \Big) - \frac{1}{2}\Big( q_0(T)+\bar q_0(T) - \frac{a+\bar a}{\tilde S} - \sqrt{\frac{q_0+\bar q_0}{\tilde S}+\frac{(a+\bar a)^2}{\tilde S^2}} \Big)\, e^{-2(T-t)\sqrt{(q_0+\bar q_0)\tilde S + (a+\bar a)^2}}.
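The constant-coefficient closed form above (as reconstructed here) can be sanity-checked against a direct backward integration of the Riccati equation for α_0. The following sketch does this for one arbitrary set of constant coefficients; the parameter values are illustrative assumptions, not taken from the paper.

import numpy as np
from scipy.integrate import solve_ivp

# Compare the closed form for alpha_0 above with a backward integration of
#   alpha_0' + 2 a alpha_0 - S alpha_0^2 + q_0 = 0,  alpha_0(T) = q_0(T).
a, S, q0, q0T, T = 0.3, 1.5, 0.8, 1.2, 1.0

def alpha_closed(t):
    root = np.sqrt(q0 / S + a**2 / S**2)
    gamma = (0.5 * (q0T - a / S + root)
             - 0.5 * (q0T - a / S - root) * np.exp(-2.0 * (T - t) * np.sqrt(q0 * S + a**2)))
    return a / S + root * (-1.0 + (q0T - a / S + root) / gamma)

sol = solve_ivp(lambda t, y: -(2 * a * y - S * y**2 + q0), [T, 0.0], [q0T],
                dense_output=True, rtol=1e-10, atol=1e-12)
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"t = {t:4.2f}   closed form: {alpha_closed(t):.6f}   ODE: {float(sol.sol(t)[0]):.6f}")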
The global optimum cost in the fully-cooperative case is
L_0 = \tfrac{1}{2}\alpha_0(0)\,var(x(0)) + \tfrac{1}{2}\beta_0(0)\,[E x(0)]^2 + \tfrac{1}{2}\int_0^T \Big[ \sigma^2(t) + \int_\Theta \mu^2(t,\theta)\,\nu(d\theta) \Big]\alpha_0(t)\, dt,
and is less than the total cost at the Nash equilibrium, which is
\tfrac{1}{2}\sum_{i=1}^n \alpha_i(0)\,var(x(0)) + \tfrac{1}{2}\sum_{i=1}^n \beta_i(0)\,(\bar x(0))^2 + \tfrac{1}{2}\int_0^T \Big[ \sigma^2(t) + \int_\Theta \mu^2(t,\theta)\,\nu(d\theta) \Big] \sum_{i=1}^n \alpha_i(t)\, dt.
This loss of efficiency of Nash equilibria was analyzed in [35], and is often termed as the price of anarchy [36,37].

3.2. Nash Bargaining Solution

Mean-field-type bargaining theory deals with the situation in which decision-makers can, through cooperation, realize outcomes that are better than the one which becomes effective when they do not cooperate. This non-cooperative outcome is called the threat point L^NE = (L_1^NE, …, L_n^NE). The question is which outcome the decision-makers might possibly agree to. Let V be the set of feasible outcomes of the benefit of bargaining [34]. We assume that if the agents unanimously agree on a point v = (v_1, …, v_n) ∈ V, they obtain v. Otherwise, they obtain L^NE = (L_1^NE, …, L_n^NE). This presupposes that each decision-maker can enforce the threat point when he does not agree with a proposal. Generically, what decision-maker i can guarantee is inf_{u_i} sup_{u_{-i}} L_i, which is non-admissible in the quadratic setting. The outcome v that the decision-makers finally agree on is called the solution of the bargaining problem. Therefore, we have chosen the non-cooperative solution when there is a disagreement.
The Nash bargaining solution selects for a given set V the point at which the product of gains from L N E is maximal.
NBS(V, L^{NE}) = \arg\max_{v\in V} \Big\{ \prod_{i\in N} \big[ L_i^{NE} - v_i \big] \Big\}. \qquad (21)
Since the function v ↦ ∏_{k∈N} v_k is non-convex, Problem (21) is non-convex. Here we exploit the convexity of the functional u ↦ ⟨w, L(u)⟩ for any given w = (w_1, …, w_n) ∈ ℝ_{++}^n with Σ_{i=1}^n w_i = 1 to reach any point on the Pareto frontier of the game. The maximization (in w) of the product P := ∏_{i=1}^n [L_i^NE − L_i(û(w))] yields
\sum_{i=1}^n \partial_{w_j} L_i(\hat u(w)) \cdot \Big\{ \prod_{k\neq i} \big[ L_k^{NE} - L_k(\hat u(w)) \big] \Big\} = c.
This is equivalent to
\sum_{i=1}^n \frac{P}{L_i^{NE} - L_i(\hat u(w))}\, \partial_{w_j} L_i(\hat u(w)) = c.

We set y_i := \frac{P / (L_i^{NE} - L_i(\hat u(w)))}{\sum_{k=1}^n P / (L_k^{NE} - L_k(\hat u(w)))}. Then, it follows that

\sum_{i=1}^n y_i\, \partial_{w_j} L_i(\hat u(w)) = \frac{c}{\sum_{i=1}^n \frac{P}{L_i^{NE} - L_i(\hat u(w))}}.

Moreover, y_i ≥ 0 and Σ_{i=1}^n y_i = 1.
Assume the matrix (∂_{w_j} L_i(û(w)))_{(i,j)∈N²} has rank at least n − 1. Then, the Nash bargaining solution is explicitly given by v = (L_1(û(w*)), …, L_n(û(w*))), with the weights

w_i^* = \frac{\prod_{j\neq i} \big[ L_j^{NE} - L_j(\hat u(w^*)) \big]}{\sum_{k=1}^n \prod_{j\neq k} \big[ L_j^{NE} - L_j(\hat u(w^*)) \big]} = y_i(w^*),

where the optimal bargaining strategy profile is û(w) ∈ arg min_u ⟨w, L(u)⟩. It remains to compute the functional û(w) = (û_1(w), …, û_n(w)) from

\inf_{(u_1,\ldots,u_n)} E\sum_i w_i L_i, \qquad dx = \Big[ a(x-\bar x) + (a+\bar a)\bar x + \sum_{i=1}^n b_i(u_i-\bar u_i) + \sum_{i=1}^n (b_i+\bar b_i)\bar u_i \Big] dt + \sigma\, dB + \int_\Theta \mu(t,\theta)\,\tilde N(dt,d\theta), \quad x(0) = x_0.
Following the same methodology as above, we obtain:
\inf_{(u_1,\ldots,u_n)} E\sum_i w_i L_i = \tfrac{1}{2}\alpha_0(0)\,var(x(0)) + \tfrac{1}{2}\beta_0(0)\,[\bar x(0)]^2 + \tfrac{1}{2}\int_0^T \Big[ \sigma^2 + \int_\Theta \mu^2(t,\theta)\,\nu(d\theta) \Big]\alpha_0\, dt,
\hat u_i(w) = -\frac{b_i}{w_i r_i}\alpha_0(x-\bar x) - \frac{b_i+\bar b_i}{w_i(r_i+\bar r_i)}\beta_0\bar x,
\dot\alpha_0 + 2a\alpha_0 - \alpha_0^2 \sum_i \frac{b_i^2}{w_i r_i} + \sum_i w_i q_i = 0, \qquad \alpha_0(T) = \sum_i w_i q_i(T),
\dot\beta_0 + 2\beta_0(a+\bar a) - \beta_0^2 \sum_i \frac{(b_i+\bar b_i)^2}{w_i(r_i+\bar r_i)} + \sum_i w_i (q_i+\bar q_i) = 0, \qquad \beta_0(T) = \sum_i w_i \big[ q_i(T)+\bar q_i(T) \big],
\bar x(t) = \bar x(0)\, \exp\Big( \int_0^t \Big[ (a+\bar a) - \beta_0(s) \sum_{i=1}^n \frac{(b_i+\bar b_i)^2}{w_i(r_i+\bar r_i)} \Big] ds \Big), \qquad \bar x(0)\in\mathbb{R}.
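The bargaining solution therefore amounts to a fixed point in the weights w. The following sketch illustrates one possible fixed-point iteration on w using the weight formula above; the threat-point costs and the map w ↦ L(û(w)) are simple convex placeholders standing in for the costs produced by the weighted problem, not the game's actual cost functionals.

import numpy as np

# Fixed-point iteration on the bargaining weights: w_i is updated to
# prod_{j != i} [L_j^NE - L_j(u_hat(w))], normalized to sum to one.
L_NE = np.array([4.0, 6.0])                       # placeholder non-cooperative costs

def L_of_w(w):
    # placeholder Pareto-frontier parametrization: each cost decreases in its own weight
    return np.array([2.0 + 1.0 / (1.0 + 3.0 * w[0]),
                     3.0 + 2.0 / (1.0 + 3.0 * w[1])])

w = np.full(2, 0.5)
for _ in range(500):
    gains = L_NE - L_of_w(w)                      # must remain positive for a valid bargain
    prod_others = np.array([np.prod(np.delete(gains, i)) for i in range(len(w))])
    w_new = prod_others / prod_others.sum()
    if np.max(np.abs(w_new - w)) < 1e-12:
        break
    w = w_new

print("bargaining weights w* =", w, "   costs L(u_hat(w*)) =", L_of_w(w))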

4. LQ Robust Mean-Field-Type Games

We now consider a robust mean-field-type game with two decision-makers. Decision-maker 1 minimizes with respect to u 1 and Decision-maker 2 maximizes with respect to u 2 . The minimax problem of mean-field type is given by
\inf_{u_1}\sup_{u_2} E[L(u_1,u_2)], \qquad dx = \Big[ a(t)(x(t)-\bar x(t)) + (a(t)+\bar a(t))\bar x(t) + \sum_{i=1}^2 b_i(t)(u_i(t)-\bar u_i(t)) + \sum_{i=1}^2 (b_i(t)+\bar b_i(t))\bar u_i(t) \Big] dt + \sigma(t)\, dB(t) + \int_\Theta \mu(t,\theta)\,\tilde N(dt,d\theta), \quad x(0) = x_0,\ E[x_0^2] < \infty, \qquad (24)
where the objective functional is
L(u_1,u_2) = \tfrac{1}{2}\Big[ q(T)(x(T)-\bar x(T))^2 + [q(T)+\bar q(T)]\bar x^2(T) + \int_0^T \big( q(t)(x(t)-\bar x(t))^2 + [q(t)+\bar q(t)]\bar x^2(t) \big)\, dt + \int_0^T \big( r_1(t)(u_1(t)-\bar u_1(t))^2 + (r_1(t)+\bar r_1(t))[\bar u_1(t)]^2 \big)\, dt + \int_0^T \big( r_2(t)(u_2(t)-\bar u_2(t))^2 + (r_2(t)+\bar r_2(t))[\bar u_2(t)]^2 \big)\, dt \Big].
The risk-neutral robust mean-field-type equilibrium problem we are concerned with is to characterize the processes (x^*, u_1^*, u_2^*, E[x^*]) such that u_1^* is the minimizer and u_2^* is the maximizer of the problem (24), and the expected value of the resulting common state created by the two decision-makers is x̄^*(t) = E[x^*(t)].
Below, we solve Problem (24) for r 1 ( t ) > 0 , r ¯ 1 ( t ) > 0 , r 2 ( t ) < 0 , r ¯ 2 ( t ) < 0 .
E[f(T,x(T)) - f(0,x(0))] = \tfrac{1}{2} E\int_0^T \{\dot\alpha + 2a\alpha\}(x-\bar x)^2\, dt + \tfrac{1}{2} E\int_0^T \{\dot\beta + 2(a+\bar a)\beta\}\bar x^2\, dt + \tfrac{1}{2} E\int_0^T 2\alpha \sum_{i=1}^2 b_i (u_i-\bar u_i)(x-\bar x)\, dt + \tfrac{1}{2} E\int_0^T 2\beta \sum_{i=1}^2 (b_i+\bar b_i)\bar u_i \bar x\, dt + \tfrac{1}{2}\int_0^T \Big[ \sigma^2 + \int_\Theta \mu^2(t,\theta)\,\nu(d\theta) \Big]\alpha\, dt,

E[L - f(0,x(0))] = \tfrac{1}{2}(q(T)-\alpha(T))\, E(x(T)-\bar x(T))^2 + \tfrac{1}{2}[q(T)+\bar q(T)-\beta(T)]\bar x^2(T)
 + \tfrac{1}{2} E\int_0^T \Big\{ \dot\alpha + 2a\alpha - \Big( \frac{b_1^2}{r_1} + \frac{b_2^2}{r_2} \Big)\alpha^2 + q \Big\}(x-\bar x)^2\, dt
 + \tfrac{1}{2} E\int_0^T \Big\{ \dot\beta + 2(a+\bar a)\beta - \Big( \frac{(b_1+\bar b_1)^2}{r_1+\bar r_1} + \frac{(b_2+\bar b_2)^2}{r_2+\bar r_2} \Big)\beta^2 + q+\bar q \Big\}\bar x^2\, dt
 + \tfrac{1}{2} E\int_0^T r_1\Big[ u_1-\bar u_1 + \frac{b_1}{r_1}\alpha(x-\bar x) \Big]^2 dt + \tfrac{1}{2} E\int_0^T (r_1+\bar r_1)\Big[ \bar u_1 + \frac{b_1+\bar b_1}{r_1+\bar r_1}\beta\bar x \Big]^2 dt
 + \tfrac{1}{2} E\int_0^T r_2\Big[ u_2-\bar u_2 + \frac{b_2}{r_2}\alpha(x-\bar x) \Big]^2 dt + \tfrac{1}{2} E\int_0^T (r_2+\bar r_2)\Big[ \bar u_2 + \frac{b_2+\bar b_2}{r_2+\bar r_2}\beta\bar x \Big]^2 dt
 + \tfrac{1}{2}\int_0^T \Big[ \sigma^2 + \int_\Theta \mu^2(t,\theta)\,\nu(d\theta) \Big]\alpha\, dt,
where we have used the following square completions:
r_1(u_1-\bar u_1)^2 + (r_1+\bar r_1)[\bar u_1]^2 + 2\alpha b_1(u_1-\bar u_1)(x-\bar x) + 2\beta(b_1+\bar b_1)\bar u_1\bar x
 = r_1\Big[ u_1-\bar u_1 + \frac{b_1}{r_1}\alpha(x-\bar x) \Big]^2 - \frac{b_1^2}{r_1}\alpha^2(x-\bar x)^2 + (r_1+\bar r_1)\Big[ \bar u_1 + \frac{b_1+\bar b_1}{r_1+\bar r_1}\beta\bar x \Big]^2 - \frac{(b_1+\bar b_1)^2}{r_1+\bar r_1}\beta^2\bar x^2,
r_2(u_2-\bar u_2)^2 + (r_2+\bar r_2)[\bar u_2]^2 + 2\alpha b_2(u_2-\bar u_2)(x-\bar x) + 2\beta(b_2+\bar b_2)\bar u_2\bar x
 = r_2\Big[ u_2-\bar u_2 + \frac{b_2}{r_2}\alpha(x-\bar x) \Big]^2 - \frac{b_2^2}{r_2}\alpha^2(x-\bar x)^2 + (r_2+\bar r_2)\Big[ \bar u_2 + \frac{b_2+\bar b_2}{r_2+\bar r_2}\beta\bar x \Big]^2 - \frac{(b_2+\bar b_2)^2}{r_2+\bar r_2}\beta^2\bar x^2.
It follows that the equilibrium solution is
\inf_{u_1}\sup_{u_2} E[L(u_1,u_2)] = \tfrac{1}{2}\alpha(0)\,var(x(0)) + \tfrac{1}{2}\beta(0)\,[E x(0)]^2 + \tfrac{1}{2}\int_0^T \Big[ \sigma^2(t) + \int_\Theta \mu^2(t,\theta)\,\nu(d\theta) \Big]\alpha(t)\, dt,
u_1^* = -\frac{b_1}{r_1}\alpha(x-\bar x) - \frac{b_1+\bar b_1}{r_1+\bar r_1}\beta\bar x, \qquad u_2^* = -\frac{b_2}{r_2}\alpha(x-\bar x) - \frac{b_2+\bar b_2}{r_2+\bar r_2}\beta\bar x,
\dot\alpha + 2a\alpha - \Big( \frac{b_1^2}{r_1} + \frac{b_2^2}{r_2} \Big)\alpha^2 + q = 0, \qquad \alpha(T) = q(T),
\dot\beta + 2(a+\bar a)\beta - \Big( \frac{(b_1+\bar b_1)^2}{r_1+\bar r_1} + \frac{(b_2+\bar b_2)^2}{r_2+\bar r_2} \Big)\beta^2 + q+\bar q = 0, \qquad \beta(T) = q(T)+\bar q(T),
\bar x(t) = \bar x(0)\, \exp\Big( \int_0^t \Big[ (a+\bar a) - \beta(s) \sum_{i=1}^2 \frac{(b_i+\bar b_i)^2}{r_i+\bar r_i} \Big] ds \Big).
When r_1 > 0, r̄_1 ≥ 0, r_2 < 0, r̄_2 ≤ 0, S_2 := Σ_{i=1}^2 b_i²/r_i > 0 and S̃_2 := Σ_{i=1}^2 (b_i + b̄_i)²/(r_i + r̄_i) > 0, the functions α, β are explicitly given by
\alpha(t) = \frac{a}{S_2} + \sqrt{\frac{q}{S_2}+\frac{a^2}{S_2^2}}\left[ -1 + \frac{q(T) - \frac{a}{S_2} + \sqrt{\frac{q}{S_2}+\frac{a^2}{S_2^2}}}{\Gamma} \right],
\Gamma := \frac{1}{2}\Big( q(T) - \frac{a}{S_2} + \sqrt{\frac{q}{S_2}+\frac{a^2}{S_2^2}} \Big) - \frac{1}{2}\Big( q(T) - \frac{a}{S_2} - \sqrt{\frac{q}{S_2}+\frac{a^2}{S_2^2}} \Big)\, e^{-2(T-t)\sqrt{q S_2 + a^2}},
\beta(t) = \frac{a+\bar a}{\tilde S_2} + \sqrt{\frac{q+\bar q}{\tilde S_2}+\frac{(a+\bar a)^2}{\tilde S_2^2}}\left[ -1 + \frac{q(T)+\bar q(T) - \frac{a+\bar a}{\tilde S_2} + \sqrt{\frac{q+\bar q}{\tilde S_2}+\frac{(a+\bar a)^2}{\tilde S_2^2}}}{\tilde\Gamma} \right],
\tilde\Gamma := \frac{1}{2}\Big( q(T)+\bar q(T) - \frac{a+\bar a}{\tilde S_2} + \sqrt{\frac{q+\bar q}{\tilde S_2}+\frac{(a+\bar a)^2}{\tilde S_2^2}} \Big) - \frac{1}{2}\Big( q(T)+\bar q(T) - \frac{a+\bar a}{\tilde S_2} - \sqrt{\frac{q+\bar q}{\tilde S_2}+\frac{(a+\bar a)^2}{\tilde S_2^2}} \Big)\, e^{-2(T-t)\sqrt{(q+\bar q)\tilde S_2 + (a+\bar a)^2}}.
Notice that under the conditions r 1 ( t ) > 0 , r ¯ 1 ( t ) > 0 , r 2 ( t ) < 0 , r ¯ 2 ( t ) < 0 , S 2 ( t ) > 0 , S ˜ 2 ( t ) > 0 , the minimax solution is also a maximin solution: there is a saddle point, and the saddle point is ( u 1 * , u 2 * ) . It solves
E[L(u_1^*, u_2)] \leq E[L(u_1^*, u_2^*)] \leq E[L(u_1, u_2^*)], \qquad \forall (u_1,u_2)\in\mathcal{U}_1\times\mathcal{U}_2.
The value of the game is
E[L(u_1^*,u_2^*)] = \tfrac{1}{2}\alpha(0)\,var(x(0)) + \tfrac{1}{2}\beta(0)\,[E x(0)]^2 + \tfrac{1}{2}\int_0^T \Big[ \sigma^2 + \int_\Theta \mu^2(t,\theta)\,\nu(d\theta) \Big]\alpha(t)\, dt.
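A small numerical sketch of the saddle-point system: the two Riccati equations are integrated backward with placeholder constant coefficients satisfying r_1 > 0, r_2 < 0, S_2 > 0, S̃_2 > 0, and the value of the game is then assembled from α(0), β(0). Jumps are omitted for brevity; all parameter values are assumptions for illustration only.

import numpy as np
from scipy.integrate import solve_ivp

# Saddle-point Riccati equations, integrated backward with placeholder constant
# coefficients (r1 > 0, r2 < 0), then the value of the game without jumps.
a, abar, q, qbar, T = 0.2, 0.1, 1.0, 0.4, 1.0
b1, b2, bbar1, bbar2 = 1.0, 0.8, 0.1, 0.0
r1, r2, rbar1, rbar2 = 1.0, -2.0, 0.3, -0.5
sigma, var0, mean0 = 0.3, 0.2, 1.0

S2 = b1**2 / r1 + b2**2 / r2                          # must be > 0 for a saddle point
S2t = (b1 + bbar1)**2 / (r1 + rbar1) + (b2 + bbar2)**2 / (r2 + rbar2)
assert S2 > 0 and S2t > 0

def rhs(t, y):
    alpha, beta = y
    return [-(2 * a * alpha - S2 * alpha**2 + q),
            -(2 * (a + abar) * beta - S2t * beta**2 + q + qbar)]

sol = solve_ivp(rhs, [T, 0.0], [q, q + qbar], dense_output=True, rtol=1e-9)
alpha0, beta0 = sol.y[0, -1], sol.y[1, -1]
ts = np.linspace(0.0, T, 401)
int_alpha = np.sum(sol.sol(ts)[0]) * (ts[1] - ts[0])  # crude quadrature of int_0^T alpha dt
value = 0.5 * alpha0 * var0 + 0.5 * beta0 * mean0**2 + 0.5 * sigma**2 * int_alpha
print("alpha(0) =", alpha0, "  beta(0) =", beta0, "  value of the game ~", value)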

5. Checking Our Results

In this section, we verify the validity of our results above using a Bellman system. Due to the non-Markovian nature of x , one needs to build an augmented state. A candidate augmented state is the measure m, since one can write the objective functionals in terms of the measure m ( d x ) . This leads to a dynamic programming principle in infinite dimensions. Below, we use functional derivatives with respect to m ( d x ) . The Bellman equilibrium system (in infinite dimension) is
\hat V_{i,t}(t,m) + \int_x H_i\big(x, m, \hat V_{i,m}, \hat V_{i,xm}, \hat V_{i,xxm}\big)\, m(dx) = 0, \qquad (33)
where the terminal equilibrium payoff functional at time T is
\hat V_i(T,m) = \tfrac{1}{2} q_i(T) \int_y (y-\bar m)^2\, m(dy) + \tfrac{1}{2}\big[ q_i(T)+\bar q_i(T) \big]\Big( \int_y y\, m(dy) \Big)^2,
and the integrand Hamiltonian is
H_i\big(x, m, \hat V_{i,m}, \hat V_{i,xm}, \hat V_{i,xxm}\big) = \inf_{u_i} \Big\{ \tfrac{1}{2} q_i(x-\bar x)^2 + \tfrac{1}{2}[q_i+\bar q_i]\bar x^2 + \tfrac{1}{2} r_i(u_i-\bar u_i)^2 + \tfrac{1}{2}(r_i+\bar r_i)[\bar u_i]^2 + \Big[ a(x-\bar x) + (a+\bar a)\bar x + \sum_{i=1}^n b_i(u_i-\bar u_i) + \sum_{i=1}^n (b_i+\bar b_i)\bar u_i \Big]\hat V_{i,xm} + \frac{\sigma^2}{2}\hat V_{i,xxm} + \int_\Theta \big[ \hat V_{i,m}(t,x+\mu) - \hat V_{i,m} - \mu \hat V_{i,xm} \big]\,\nu(d\theta) \Big\}. \qquad (35)
It is important to notice that the last term in the integrand Hamiltonian, ∫_Θ [V̂_{i,m}(t, x+μ) − V̂_{i,m} − μV̂_{i,xm}] ν(dθ), comes from the jump process involved in the state dynamics. From this Hamiltonian, we deduce that, generically, the optimal strategy is in state-and-mean-field feedback form, as the right-hand side of (35) shows. We now solve the McKean–Vlasov integro-partial differential equation above explicitly. Inspired by the structure of the final payoff V̂_i(T, m), we choose a guess functional of the following form:
\hat V_i(t,m) = \frac{\alpha_i}{2}\int_y (y-\bar m)^2\, m(dy) + \frac{\beta_i}{2}\Big( \int_y y\, m(dy) \Big)^2 + \delta_i.
The reader may ask why a term of the form γ̃_i (x − m̄) ∫_y y m(dy) is missing from the guess functional. This is because we are considering expected-value optimization (the risk-neutral case), and the expected value of such a term is zero. A term γ_i ∫_y y m(dy) does not appear because there is no constant shift in the drift and no cross-terms in the loss function.
We now use the functional directional derivative. Consider another measure m̃ ∈ L² and compute V̂_i(t, m + εm̃):

\hat V_i(t, m+\epsilon\tilde m) = \frac{\alpha_i}{2}\int_y (y-\bar m-\epsilon\bar{\tilde m})^2\, m(dy) + \epsilon\,\frac{\alpha_i}{2}\int_y (y-\bar m-\epsilon\bar{\tilde m})^2\, \tilde m(dy) + \frac{\beta_i}{2}\Big( \int_y y\, m(dy) + \epsilon\int_y y\,\tilde m(dy) \Big)^2 + \delta_i.
Differentiating the latter term with respect to ϵ yields
\frac{d}{d\epsilon}\hat V_i(t,m+\epsilon\tilde m) = -\alpha_i\bar{\tilde m}\int_y (y-\bar m-\epsilon\bar{\tilde m})\, m(t,dy) - \epsilon\,\alpha_i\bar{\tilde m}\int_y (y-\bar m-\epsilon\bar{\tilde m})\,\tilde m(t,dy) + \frac{\alpha_i}{2}\int_y (y-\bar m-\epsilon\bar{\tilde m})^2\,\tilde m(dy) + \beta_i\bar{\tilde m}\Big[\int_y y\, m(dy)+\epsilon\int_y y\,\tilde m(dy)\Big],
where \bar{\tilde m} := \int_y y\,\tilde m(dy). Evaluating at \epsilon = 0 and using \int_y (y-\bar m)\, m(dy) = 0, this reduces to
\frac{d}{d\epsilon}\hat V_i(t,m+\epsilon\tilde m)\Big|_{\epsilon=0} = \frac{\alpha_i}{2}\int_y (y-\bar m)^2\,\tilde m(dy) + \beta_i\bar{\tilde m}\int_y y\, m(dy).
We deduce the following equalities:
\hat V_{i,m}[t,m](x) = \frac{\alpha_i}{2}\Big( x - \int_y y\, m(dy) \Big)^2 + \beta_i\, x \int_y y\, m(dy),
\hat V_{i,xm}[t,m](x) = \alpha_i\Big( x - \int_y y\, m(dy) \Big) + \beta_i \int_y y\, m(dy),
\hat V_{i,xxm}[t,m](x) = \alpha_i,
\hat V_{i,m}(t,x+\mu) - \hat V_{i,m} - \mu \hat V_{i,xm} = \tfrac{1}{2}\alpha_i\mu^2;

\tfrac{1}{2} r_i(u_i-\bar u_i)^2 + \tfrac{1}{2}(r_i+\bar r_i)[\bar u_i]^2 + \Big[ \sum_{j=1}^n b_j(u_j-\bar u_j) + \sum_{j=1}^n (b_j+\bar b_j)\bar u_j \Big]\hat V_{i,xm}
 = \tfrac{1}{2} r_i(u_i-\bar u_i)^2 + \tfrac{1}{2}(r_i+\bar r_i)[\bar u_i]^2 + \Big[ \sum_{j=1}^n b_j(u_j-\bar u_j) \Big]\big( \hat V_{i,xm} - E\hat V_{i,xm} + E\hat V_{i,xm} \big) + \Big[ \sum_{j=1}^n (b_j+\bar b_j)\bar u_j \Big]\big( \hat V_{i,xm} - E\hat V_{i,xm} + E\hat V_{i,xm} \big)
 = \tfrac{1}{2} r_i(u_i-\bar u_i)^2 + \tfrac{1}{2}(r_i+\bar r_i)[\bar u_i]^2 + \big( \hat V_{i,xm} - E\hat V_{i,xm} \big)\sum_{j=1}^n b_j(u_j-\bar u_j) + \big[ E\hat V_{i,xm} \big]\sum_{j=1}^n b_j(u_j-\bar u_j) + \big( \hat V_{i,xm} - E\hat V_{i,xm} \big)\sum_{j=1}^n (b_j+\bar b_j)\bar u_j + \big[ E\hat V_{i,xm} \big]\sum_{j=1}^n (b_j+\bar b_j)\bar u_j,
where we have used the following orthogonal decomposition:
\hat V_{i,xm} = \big( \hat V_{i,xm} - E\hat V_{i,xm} \big) + E\hat V_{i,xm}.
Noting that the expected value of the following term
\big[ E\hat V_{i,xm} \big]\sum_{j=1}^n b_j(u_j-\bar u_j) + \big( \hat V_{i,xm} - E\hat V_{i,xm} \big)\sum_{j=1}^n (b_j+\bar b_j)\bar u_j
is zero, the minimization over u_i reduces to the minimization of
\tfrac{1}{2} r_i(u_i-\bar u_i)^2 + \tfrac{1}{2}(r_i+\bar r_i)[\bar u_i]^2 + \big( \hat V_{i,xm} - E\hat V_{i,xm} \big)\sum_{j=1}^n b_j(u_j-\bar u_j) + \big[ E\hat V_{i,xm} \big]\sum_{j=1}^n (b_j+\bar b_j)\bar u_j.
Thus, the equilibrium strategy of decision-maker i is
u_i^* = \bar u_i - \frac{b_i}{r_i}\big( \hat V_{i,xm} - E\hat V_{i,xm} \big) = \bar u_i - \frac{b_i}{r_i}\alpha_i\Big( x - \int_y y\, m(t,dy) \Big) = -\frac{b_i+\bar b_i}{r_i+\bar r_i}\beta_i \int_y y\, m(t,dy) - \frac{b_i}{r_i}\alpha_i\Big( x - \int_y y\, m(t,dy) \Big),
\bar u_i^* = -\frac{b_i+\bar b_i}{r_i+\bar r_i}\, E\big[ \hat V_{i,xm} \big] = -\frac{b_i+\bar b_i}{r_i+\bar r_i}\beta_i \int_y y\, m(t,dy),

which are exactly the expressions of the optimal strategies obtained in (15). Based on the latter expressions, we refine our statement: the optimal strategy is in state-and-(mean of the) mean-field feedback form.
We now solve explicitly the McKean–Vlasov integro-partial differential equation above.
\tilde H_i\big(x, m, \hat V_{i,m}, \hat V_{i,xm}, \hat V_{i,xxm}\big) = \tfrac{1}{2} q_i(x-\bar x)^2 + \tfrac{1}{2}[q_i+\bar q_i]\bar x^2 + a(x-\bar x)\big( \hat V_{i,xm} - E\hat V_{i,xm} \big) + (a+\bar a)\bar x\, E\hat V_{i,xm} + \frac{\sigma^2}{2}\hat V_{i,xxm} + \int_\Theta \big[ \hat V_{i,m}(t,x+\mu) - \hat V_{i,m} - \mu\hat V_{i,xm} \big]\,\nu(d\theta)
 + \inf_{u_i}\Big\{ \tfrac{1}{2} r_i(u_i-\bar u_i)^2 + \tfrac{1}{2}(r_i+\bar r_i)[\bar u_i]^2 + \big[ \hat V_{i,xm} - E\hat V_{i,xm} \big]\sum_{j=1}^n b_j(u_j-\bar u_j) + \big[ E\hat V_{i,xm} \big]\sum_{j=1}^n (b_j+\bar b_j)\bar u_j \Big\}
 = \tfrac{1}{2} q_i(x-\bar x)^2 + \tfrac{1}{2}[q_i+\bar q_i]\bar x^2 + a\alpha_i(x-\bar x)^2 + (a+\bar a)\beta_i\bar x^2 + \frac{\sigma^2}{2}\alpha_i + \int_\Theta \tfrac{1}{2}\mu^2(t,\theta)\alpha_i\,\nu(d\theta)
 + \tfrac{1}{2}\frac{b_i^2}{r_i}\alpha_i^2(x-\bar x)^2 + \tfrac{1}{2}\frac{(b_i+\bar b_i)^2}{r_i+\bar r_i}\beta_i^2\bar x^2 - \alpha_i(x-\bar x)^2 \sum_{j=1}^n \frac{b_j^2}{r_j}\alpha_j - \beta_i\bar x^2 \sum_{j=1}^n \frac{(b_j+\bar b_j)^2}{r_j+\bar r_j}\beta_j
 = \tfrac{1}{2}\Big\{ 2a\alpha_i - 2\alpha_i \sum_{j=1}^n \frac{b_j^2}{r_j}\alpha_j + \frac{b_i^2}{r_i}\alpha_i^2 + q_i \Big\}(x-\bar x)^2 + \tfrac{1}{2}\Big\{ 2(a+\bar a)\beta_i - 2\beta_i \sum_{j=1}^n \frac{(b_j+\bar b_j)^2}{r_j+\bar r_j}\beta_j + \frac{(b_i+\bar b_i)^2}{r_i+\bar r_i}\beta_i^2 + q_i+\bar q_i \Big\}\bar x^2 + \tfrac{1}{2}\alpha_i\Big[ \sigma^2 + \int_\Theta \mu^2(t,\theta)\,\nu(d\theta) \Big].
Using the time derivative of V ^ i ( t , m ) ,
\hat V_{i,t} = \frac{\dot\alpha_i}{2}\int_y (y-\bar m)^2\, m(dy) + \frac{\dot\beta_i}{2}\Big( \int_y y\, m(dy) \Big)^2 + \dot\delta_i,
and identifying the coefficients, we arrive at
\dot\alpha_i + 2a\alpha_i - 2\alpha_i \sum_{j=1}^n \frac{b_j^2}{r_j}\alpha_j + \frac{b_i^2}{r_i}\alpha_i^2 + q_i = 0, \qquad \alpha_i(T) = q_i(T),
\dot\beta_i + 2(a+\bar a)\beta_i - 2\beta_i \sum_{j=1}^n \frac{(b_j+\bar b_j)^2}{r_j+\bar r_j}\beta_j + \frac{(b_i+\bar b_i)^2}{r_i+\bar r_i}\beta_i^2 + q_i+\bar q_i = 0, \qquad \beta_i(T) = q_i(T)+\bar q_i(T),
\dot\delta_i + \tfrac{1}{2}\alpha_i\Big[ \sigma^2 + \int_\Theta \mu^2(t,\theta)\,\nu(d\theta) \Big] = 0, \qquad \delta_i(T) = 0.
We retrieve the expressions in (15), confirming the validity of our approach.
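As a quick consistency check of this last step, the short sketch below verifies numerically that the Riccati right-hand side derived from the Bellman system (full sum over j plus the +b_i²α_i²/r_i term) coincides with the form appearing in (15) (sum over j ≠ i with −b_i²α_i²/r_i). The coefficient values are random placeholders.

import numpy as np

# The Riccati right-hand side obtained from the Bellman system (full sum over j,
# plus (b_i^2/r_i) alpha_i^2) equals the one in (15) (sum over j != i, minus
# (b_i^2/r_i) alpha_i^2). Check on random placeholder values.
rng = np.random.default_rng(1)
n = 3
c = rng.uniform(0.5, 2.0, n)          # stands for b_i^2 / r_i
alpha = rng.uniform(0.1, 1.5, n)

bellman_form = -2 * alpha * (c @ alpha) + c * alpha**2
form_15 = -(c * alpha**2) - 2 * alpha * ((c @ alpha) - c * alpha)
print(np.allclose(bellman_form, form_15))   # True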

6. Conclusions

In this article, we have shown that a mean-field equilibrium can be determined in a semi-explicit way for the linear–quadratic game problem where the Brownian motion is replaced by a jump-diffusion process and the drift is of mean-field type. The method does not require the sophisticated, non-elementary extension to backward–forward systems, nor the infinite-dimensional system (33), nor integro-partial differential equations, nor stochastic maximum principles. It is basic and applies the expectation of Itô’s formula. The use of this simple method may make the tool accessible to a broader audience, including beginners and engineers who are new to the emerging field of mean-field-type game theory. In our future work, we would like to investigate the extension of the method to include common noise, action-and-state-dependent jump-diffusion coefficients, matrix forms, operator forms, jump-fractional noise, and risk-sensitivity, among other interesting aspects [38,39]. Additionally, it would be interesting to investigate the non-quadratic mean-field-dependent setting, as Duncan et al. [24] have extended the direct method to some non-quadratic cost functionals on spheres, torus, and more general spaces. We also would like to investigate how the explicit solution provided in this article can be used to improve numerical methods in mean-field-type game theory.

Acknowledgments

This research work was supported by the U.S. Air Force Office of Scientific Research under grant number FA9550-17-1-0259, by NSF grant DMS-1411412, and by AFOSR grant FA9550-17-1-0073.

Author Contributions

T.E.D. and H.T. conceived and designed the model; performed the analysis and wrote the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Difference with the Mean-Field Game Approach

We would like to highlight that the variance reduction problem is fundamentally different from the classical risk-neutral mean-field game approach. In the classical mean-field game literature, the mean-field term is frozen, as it is assumed to result from an infinite number of decision-makers. However, when one wants to reduce the variance of one's own state, one cannot freeze the mean state, since it results from one's own control. A change in the individual action of a deviant decision-maker changes its own state and therefore its own mean state. To illustrate the difference between the mean-field game and mean-field-type game approaches, we consider a basic example below. Through this simple example, we show that for a wide range of parameters, the mean-field game approach is sub-optimal and leads to a much higher cost than the mean-field-type game approach.
Let q > 0 , q ¯ > 0 and consider the following mean-field game problem:
(mfg): \quad \inf_{u_i(\cdot)\in\mathcal{U}_i} E\Big[ q\, x_i^2(T) + \bar q\, m^2(T) + \int_0^T u_i^2\, dt \Big], \quad \text{subject to} \quad dx_i(t) = u_i(t)\, dt + x_i(t)\, dB_i(t), \quad x_i(0)\in\mathbb{R}, \quad m(t) = \liminf_{n\to+\infty} \frac{1}{n}\sum_{k=1}^n x_k(t). \qquad (A1)
In the problem (mfg), the pair ( u i , x i ) of an individual decision-maker alone does not affect the limiting mean state m in (A1). The solution of the mean-field game problem is
(mfg): \quad u_i^{mfg}(t) = -\alpha(t)\, x_i(t), \qquad m(t) = m(0)\, e^{-\int_0^t \alpha(s)\, ds}, \qquad L^{mfg} = E\big[ \alpha(0)\, x_i(0)^2 \big] + \gamma(0), \qquad \dot\alpha + \alpha - \alpha^2 = 0, \quad \alpha(T) = q > 0.

As we can see, by freezing the mean-field term, the achieved cost is L^{mfg} = α(0) var(x(0)) + α(0)[E x(0)]² + γ(0).
Now consider the following one-decision-maker (say decision-maker i) problem:
(mftg): \quad \inf_{u_i\in\mathcal{U}_i} E\Big[ q\, x_i^2(T) + \bar q\, (\bar x_i)^2(T) + \int_0^T u_i^2\, dt \Big], \quad \text{subject to} \quad dx_i(t) = u_i(t)\, dt + x_i(t)\, dB_i(t), \quad x_i(0)\in\mathbb{R}, \quad \bar x_i(t) = E[x_i(t)]. \qquad (A3)
In the problem (mftg), the pair ( u i , x i ) of an individual decision-maker alone does significantly affect the mean state x ¯ i in (A3):
(mftg): \quad u_i^{mftg}(t) = -\alpha(t)\big( x_i(t) - \bar x_i(t) \big) - \beta(t)\bar x_i(t), \qquad E u_i^{mftg}(t) = -\beta(t)\bar x_i(t), \qquad \bar x_i(t) = \bar x_i(0)\, e^{-\int_0^t \beta(s)\, ds},
L^{mftg} = \alpha(0)\, var(x_i(0)) + \beta(0)\, [\bar x_i(0)]^2, \qquad \dot\alpha + \alpha - \alpha^2 = 0, \quad \alpha(T) = q > 0, \qquad \dot\beta + \alpha - \beta^2 = 0, \quad \beta(T) = q+\bar q > 0. \qquad (A4)
When we do not freeze the mean-field term, the achieved cost is L m f t g . Taking the difference between the ordinary differential equations, it is not difficult to see that α β satisfies
\frac{d}{dt}[\alpha - \beta] = (\alpha-\beta)(\alpha+\beta), \quad t < T, \qquad (\alpha-\beta)(T) = -\bar q < 0.

Since q < q + q̄ for q̄ > 0, it follows that α(t) < β(t). We would like to compare α(0)[E x(0)]² + γ(0) and β(0)[x̄(0)]²:

L^{mfg} - L^{mftg} = \alpha(0)[E x(0)]^2 + \gamma(0) - \beta(0)[\bar x(0)]^2 = [\alpha(0)-\beta(0)][E x(0)]^2 + \gamma(0) = [\alpha(0)-\beta(0)][E x(0)]^2 + \bar q\,[E x(0)]^2\, e^{-2\int_0^T \alpha(s)\, ds} = \Big[ \alpha(0)-\beta(0) + \bar q\, e^{-2\int_0^T \alpha(s)\, ds} \Big][E x(0)]^2.
The latter expression being positive, we deduce that for any T > 0 : L m f g > L m f t g . Thus, in this example, the mean-field game approach—which consists of freezing the mean-field term—is sub-optimal. On the other hand, the mean-field-type game approach coincides with the global optimization problem in the one-decision-maker case. Hence, (A4) is the global optimum.
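A short numerical illustration of this comparison, assuming placeholder values for q, q̄, the horizon, and the initial moments: α and β are integrated backward from their terminal conditions and the two achieved costs are evaluated; the printed gap is positive, consistent with the conclusion above.

import numpy as np
from scipy.integrate import solve_ivp

# Backward integration of alpha and beta from the (mfg) and (mftg) solutions above,
# then evaluation of L_mfg and L_mftg for placeholder parameter values.
q, qbar, T = 1.0, 0.5, 1.0
mean0, var0 = 1.0, 0.2

# alpha' = alpha^2 - alpha, alpha(T) = q;   beta' = beta^2 - alpha, beta(T) = q + qbar
sol = solve_ivp(lambda t, y: [y[0]**2 - y[0], y[1]**2 - y[0]],
                [T, 0.0], [q, q + qbar], dense_output=True, rtol=1e-9)
ts = np.linspace(0.0, T, 2001)
alpha, beta = sol.sol(ts)
alpha0, beta0 = alpha[0], beta[0]                      # values at t = 0
int_alpha = np.sum(alpha) * (ts[1] - ts[0])            # crude quadrature of int_0^T alpha dt

L_mfg = alpha0 * var0 + alpha0 * mean0**2 + qbar * mean0**2 * np.exp(-2.0 * int_alpha)
L_mftg = alpha0 * var0 + beta0 * mean0**2
print("L_mfg =", L_mfg, "  L_mftg =", L_mftg, "  gap =", L_mfg - L_mftg)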

Difference with Multi-Population Mean-Field Games

The model studied here differs from (non-cooperative) multi-population (multi-class or multi-type) mean-field games. In multi-population mean-field games, it is usually assumed that there is an infinite number of decision-makers, each of them having their own control action. In those models, a single decision-maker does not influence the population mean state within its class, since the class size is assumed infinite. On the other hand, in the mean-field-type game model presented here, there is a finite number of “true” decision-makers, and each decision-maker does have a non-negligible effect on the mean-field terms.

References

  1. Markowitz, H.M. Portfolio Selection. J. Financ. 1952, 7, 77–91. [Google Scholar]
  2. Roos, C.F. A Mathematical Theory of Competition. Am. J. Math. 1925, 47, 163–175. [Google Scholar] [CrossRef]
  3. Roos, C.F. A Dynamic Theory of Economics. J. Polit. Econ. 1927, 35, 632–656. [Google Scholar] [CrossRef]
  4. Markowitz, H.M. The Utility of Wealth. J. Polit. Econ. 1952, 60, 151–158. [Google Scholar] [CrossRef]
  5. Markowitz, H.M. Portfolio Selection: Efficient Diversification of Investments; John Wiley & Sons: New York, NY, USA, 1959. [Google Scholar]
  6. Duncan, T.E.; Pasik-Duncan, B. Solvable stochastic differential games in rank one compact symmetric spaces. Int. J. Control 2017. [Google Scholar] [CrossRef]
  7. Duncan, T.E. Linear Exponential Quadratic Stochastic Differential Games. IEEE Trans. Autom. Control 2016, 61, 2550–2552. [Google Scholar] [CrossRef]
  8. Duncan, T.E.; Pasik-Duncan, B. Linear-quadratic fractional Gaussian control. SIAM J. Control Optim. 2013, 51, 4604–4619. [Google Scholar] [CrossRef]
  9. Duncan, T.E.; Pasik-Duncan, B. Linear-exponential-quadratic control for stochastic equations in a Hilbert space. Dyn. Syst. Appl. 2012, 21, 407–416. [Google Scholar]
  10. Duncan, T.E. Linear-exponential-quadratic Gaussian control. IEEE Trans. Autom. Control 2013, 58, 2910–2911. [Google Scholar] [CrossRef]
  11. Duncan, T.E. Linear-quadratic stochastic differential games with general noise processes. In Models and Methods in Economics and Management Science; El Ouardighi, F., Kogan, K., Eds.; Operations Research and Management Series; Springer International Publishing: Cham, Switzerland, 2014; Volume 198, pp. 17–26. [Google Scholar]
  12. Bensoussan, A.; Frehse, J.; Yam, S.C.P. Mean Field Games and Mean Field Type Control Theory; Springer: Berlin, Germany, 2013. [Google Scholar]
  13. Lasry, J.M.; Lions, P.L. Mean field games. Jpn. J. Math. 2007, 2, 229–260. [Google Scholar] [CrossRef]
  14. Bensoussan, A.; Djehiche, B.; Tembine, H.; Yam, P. Risk-sensitive mean-field-type control. arXiv, 2017; arXiv:1702.01369. [Google Scholar]
  15. Bardi, M. Explicit solutions of some linear-quadratic mean field games. Netw. Heterog. Media 2012, 7, 243–261. [Google Scholar] [CrossRef]
  16. Bardi, M.; Priuli, F.S. Linear-Quadratic N-person and Mean-Field Games with Ergodic Cost. SIAM J. Control Optim. 2014, 52, 3022–3052. [Google Scholar] [CrossRef]
  17. Tembine, H. Mean-field-type games. AIMS Math. 2017, 2, 706–735. [Google Scholar] [CrossRef]
  18. Djehiche, B.; Tembine, H. On the Solvability of Risk-Sensitive Linear-Quadratic Mean-Field Games. arXiv, 2014; arXiv:1412.0037v1. [Google Scholar]
  19. Tembine, H.; Zhu, Q.; Basar, T. Risk-sensitive mean-field games. IEEE Trans. Autom. Control 2014, 59, 835–850. [Google Scholar] [CrossRef]
  20. Tembine, H.; Bauso, D.; Basar, T. Robust linear quadratic mean-field games in crowd-seeking social networks. In Proceedings of the 52nd Annual Conference on Decision and Control, Florence, Italy, 10–13 December 2013; pp. 3134–3139. [Google Scholar]
  21. Kolokoltsov, V.N. Nonlinear Markov Games on a Finite State Space (Mean-field and Binary Interactions). Int. J. Stat. Probab. 2012, 1, 77–91. [Google Scholar] [CrossRef]
  22. Engwerda, J.C. On the open-loop Nash equilibrium in LQ-games. J. Econ. Dyn. Control 1998, 22, 729–762. [Google Scholar] [CrossRef]
  23. Bensoussan, A. Explicit solutions of linear quadratic differential games. In Stochastic Processes, Optimization, and Control Theory: Applications in Financial Engineering, Queueing Networks, and Manufacturing Systems; International Series in Operations Research and Management Science; Springer: Berlin, Germany, 2006; Volume 94, pp. 19–34. [Google Scholar]
  24. Duncan, T.E.; Pasik-Duncan, B. A direct method for solving stochastic control problems. Commun. Inf. Syst. 2012, 12, 1–14. [Google Scholar] [CrossRef]
  25. Lukes, D.L.; Russell, D.L. A Global Theory of Linear-Quadratic Differential Games. J. Math. Anal. Appl. 1971, 33, 96–123. [Google Scholar] [CrossRef]
  26. Tembine, H. Distributed Strategic Learning for Wireless Engineers; CRC Press: Boca Raton, FL, USA, 2012; 496 p. [Google Scholar]
  27. Tembine, H. Risk-sensitive mean-field-type games with p-norm drifts. Automatica 2015, 59, 224–237. [Google Scholar] [CrossRef]
  28. Djehiche, B.; Tembine, H.; Tempone, R. A Stochastic Maximum Principle for Risk-Sensitive Mean-Field-Type Control. IEEE Trans. Autom. Control 2014, 60, 2640–2649. [Google Scholar] [CrossRef]
  29. Dockner, E.J.; Jorgensen, S.; Van Long, N.; Sorger, G. Differential Games in Economics and Management Science; Cambridge University Press: Cambridge, UK, 2000. [Google Scholar]
  30. Tembine, H. Nonasymptotic mean-field games. IEEE Trans. Syst. Man Cybern. B Cybern. 2014, 44, 2744–2756. [Google Scholar]
  31. Djehiche, B.; Tcheukam, A.; Tembine, H. Mean-field-type games in engineering. AIMS Electron. Electr. Eng. 2017, 1, 18–73. [Google Scholar]
  32. Başar, T.; Djehiche, B.; Tembine, H. Mean-Field-Type Game Theory; Springer: New York, NY, USA, 2019; under preparation. [Google Scholar]
  33. Tembine, H. Energy-constrained mean-field games in wireless networks. Strateg. Behav. Environ. 2014, 4, 187–211. [Google Scholar] [CrossRef]
  34. Nash, J. The Bargaining Problem. Econometrica 1950, 18, 155–162. [Google Scholar] [CrossRef]
  35. Dubey, P. Inefficiency of Nash equilibria. Math. Operat. Res. 1986, 11, 1–8. [Google Scholar] [CrossRef]
  36. Koutsoupias, E.; Papadimitriou, C. Worst-case Equilibria. In Proceedings of the 16th Annual Conference on Theoretical Aspects of Computer Science, Trier, Germany, 4–6 March 1999; pp. 404–413. [Google Scholar]
  37. Koutsoupias, E.; Papadimitriou, C. Worst-case Equilibria. Comput. Sci. Rev. 2009, 3, 65–69. [Google Scholar] [CrossRef]
  38. Tsiropoulou, E.E.; Vamvakas, P.; Papavassiliou, S. Energy Efficient Uplink Joint Resource Allocation Non-cooperative Game with Pricing. In Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC 2012), Shanghai, China, 1–4 April 2012. [Google Scholar]
  39. Musku, M.R.; Chronopoulos, A.T.; Popescu, D.C. Joint rate and power control with pricing. In Proceedings of the IEEE Global Telecommunications Conference, St. Louis, MO, USA, 28 November–2 December 2005. [Google Scholar]
Figure 1. Methods developed in this work. Cooperative vs. noncooperative. Adversarial/robust vs. nonzero-sum mean-field-type games.
Table 1. Some recent developments on mean-field-type linear–quadratic–Gaussian (MF-LQG)-related games.
Feature | State of the Art | This Work
Jump | - | yes
Diffusion | [15,16] | yes
Mean-Field Type | [12,31,32] | yes
One decision-maker | [12] | yes
Two or more decision-makers | [31,32] | yes
State-MF | [12] | yes
Control-Action-MF | - | yes
Bargaining | - | yes
Anonymity | [15,16] | relaxed
Indistinguishability | [15,16] | relaxed
