1. Introduction
In [1], two recently developed branches of game theory, quantum games and mean field games (MFGs), were merged, creating quantum MFGs. MFGs represent a very popular recent development in game theory. They were initiated in [2,3]. For recent developments one can consult the monographs [4,5,6,7] and the numerous references therein. Quantum games were initiated by Meyer [8], Eisert, Wilkens and Lewenstein [9], and Marinatto and Weber [10], and were dealt with afterwards in numerous publications; see, e.g., the surveys [11,12] and Chapter 13 of the textbook [13].
Using the approaches from [9,10], one can transform any game into a new quantum version. This transformation modifies in a systematic way all properties of the game: equilibria, their stability, etc. For instance, the stability of the equilibria of the transformed replicator dynamics for two-player two-action games was analyzed in [14]. ESS (evolutionary stable strategies) for the transformed Rock-Paper-Scissors game were analyzed in [15], and for three-player games in [16]. The transformations of the simplest cooperative games were analyzed in [17]. The EWL (Eisert, Wilkens and Lewenstein) protocol was applied to the Battle of the Sexes in [18], to the general Prisoner's Dilemma in [19], and to the three-player quantum Prisoner's Dilemma in [20]. Peculiar behavior and remarkable phase transitions were found. An extension of the EWL protocol to games with a continuous strategy space was suggested in [21].
For applications of related quantum concepts (including quantum probability) to the cognitive sciences we refer to [22,23] and references therein.
The main emphasis in all these developments was on stationary or repeated games; see, e.g., [24,25] for the latter and [26] for their interpretation in economics. Not only for games but for quantum control in general, the mainstream of research is based on open-loop controls, with feedback control appearing only rarely; see, e.g., [27] and [28].
The present paper initiates the study of a truly dynamic theory with observations of counting type and with strategies chosen by the players in real time. Since direct continuous observations are known to destroy quantum evolutions (the so-called quantum Zeno paradox), the necessary new ingredient for quantum dynamic games is the theory of non-direct observations and the corresponding quantum filtering. This theory is usually developed in two forms: diffusive (or homodyne) type and counting type. In [1] the author developed quantum MFGs based on diffusive type filtering. In the present paper quantum MFGs are built for counting type quantum observations and filtering.
As a part of the construction we show that the limiting behavior of mean field interacting controlled quantum particles (or of an N-player quantum game) can be described by a certain classical MFG forward-backward system of jump-type equations on manifolds, the forward part being given by a new kind of nonlinear jump-type stochastic Schrödinger equation. One of the objectives of the paper is to draw the attention of game theorists to this type of game and this type of forward-backward system, which have not been studied before and for which no results, even on the existence of solutions, are available. These objects are fully classical, but represent the limit of quantum games.
The main result states that any solution to this forward-backward system represents an approximate ϵ-Nash equilibrium for the initial N-player dynamic quantum game.
The content of the paper is as follows. In the next section we recall the basic theory of quantum continuous measurement and filtering. In Section 3, as a warm-up, we briefly discuss an example of a two-player quantum dynamic game on a qubit with observation and feedback control of counting type. In Section 4 the new nonlinear equations are introduced for the case of controlled counting detection, and the convergence of N-particle observed quantum evolutions to the decoupled system of these equations is obtained, together with explicit rates of convergence. In Section 5 the MFG limits for quantum N-player games are introduced, and it is proven that solutions of the limiting MFG equations specify ϵ-Nash equilibria for the N-player quantum game, with ϵ of an explicit order in N. The limiting MFGs can also be looked at as classical MFGs, though complex-valued and evolving in infinite-dimensional manifolds. In the final section we state the problem of existence of solutions, even in the simplest case of the control problem on a qubit.
2. Quantum Filtering of Counting Type
The general theory of quantum non-demolition observation, filtering and the resulting feedback control was built essentially in the papers [29,30,31]. For alternative simplified derivations of the main filtering equations given below (bypassing the heavy theory of quantum filtering) we refer to [32,33,34,35,36] and references therein. For the technical side of organising feedback quantum control in real time, see, e.g., [37,38,39].
We shall briefly describe the main result of this theory.
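The non-demolition measurement of quantum systems can be organised in two versions: photon counting and homodyne detection. As was stressed above, here we shall deal only with counting measurements. In this case the main equation of quantum filtering takes the form
$$d\gamma_t = -i[H,\gamma_t]\,dt + \sum_j \Big(-\tfrac{1}{2}\{L_j^*L_j,\gamma_t\} + \mathrm{tr}(L_j^*L_j\gamma_t)\,\gamma_t\Big)\,dt + \sum_j \Big(\frac{L_j\gamma_t L_j^*}{\mathrm{tr}(L_j^*L_j\gamma_t)} - \gamma_t\Big)\,dN_t^j \tag{1}$$
in terms of the density matrices $\gamma_t$, where H is the Hamiltonian of the free (not observed) motion of a quantum system, the operators $L_j$ define the coupling of the system with the measurement devices, and the (counting) observed Poisson processes $N_t^j$ are independent and have the position dependent intensities $\mathrm{tr}(L_j^*L_j\gamma_t)$, so that the compensated processes $M_t^j = N_t^j - \int_0^t \mathrm{tr}(L_j^*L_j\gamma_s)\,ds$ are martingales. By $\{A,B\}$ we denote the anticommutator of two operators: $\{A,B\} = AB + BA$. In terms of the compensated processes $M_t^j$, Equation (1) rewrites as
$$d\gamma_t = -i[H,\gamma_t]\,dt + \sum_j \Big(L_j\gamma_t L_j^* - \tfrac{1}{2}\{L_j^*L_j,\gamma_t\}\Big)\,dt + \sum_j \Big(\frac{L_j\gamma_t L_j^*}{\mathrm{tr}(L_j^*L_j\gamma_t)} - \gamma_t\Big)\,dM_t^j. \tag{2}$$
In this paper we shall deal only with the simplest case when the operators $L_j$ are unitary. In this case $L_j^*L_j = \mathbf{1}$, and Equations (1) and (2) become linear and take the form
$$d\gamma_t = -i[H,\gamma_t]\,dt + \sum_j \big(L_j\gamma_t L_j^* - \gamma_t\big)\,dN_t^j. \tag{3}$$
This dynamics preserves the set of pure states. Namely, if $\psi_t$ satisfies the equation
$$d\psi_t = -iH\psi_t\,dt + \sum_j (L_j - \mathbf{1})\psi_t\,dN_t^j, \tag{4}$$
then $\gamma_t = \psi_t \otimes \bar\psi_t$ satisfies Equation (3).
The theory of quantum filtering reduces the analysis of quantum dynamic control and games to the controlled version of evolutions (1). Two types of control can be naturally considered (see [40]). The players can control the Hamiltonian H, say, by applying appropriate electric or magnetic fields to the atom, or the coupling operators $L_j$. Thus (3) extends to the equation
$$d\gamma_t = -i[H + u\hat H,\gamma_t]\,dt + \sum_j \big(L_j(v)\gamma_t L_j^*(v) - \gamma_t\big)\,dN_t^j \tag{5}$$
with some self-adjoint $\hat H$, control u and a family of unitary operators $L_j(v)$ depending on a control parameter v.
It is seen from Equation (5) that its evolution preserves traces of matrices. One can also show that these evolutions preserve positivity of the matrices $\gamma_t$ (see, e.g., [36]).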
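The preservation of trace and of pure states by the jump-type evolution can be checked numerically. The following is a minimal sketch under stated assumptions, not the paper's construction: a single unitary coupling operator L, jumps γ → LγL* driven by a unit-rate Poisson process, and a controlled Hamiltonian u·σ_x on a qubit with constant control u. The function names (`simulate`, `conjugate_by`) and all parameter values are hypothetical.

```python
import math
import random

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def conjugate_by(op, gamma):
    """Return op * gamma * op^*."""
    return matmul(matmul(op, gamma), dagger(op))

def simulate(gamma0, u, L, T=1.0, steps=2000, seed=0):
    """Simulate d(gamma) = -i[u*sigma_x, gamma] dt + (L gamma L^* - gamma) dN."""
    rng = random.Random(seed)
    dt = T / steps
    theta = u * dt
    # exp(-i * theta * sigma_x) = cos(theta) * I - i * sin(theta) * sigma_x
    U = [[math.cos(theta), -1j * math.sin(theta)],
         [-1j * math.sin(theta), math.cos(theta)]]
    gamma, jumps = gamma0, 0
    for _ in range(steps):
        gamma = conjugate_by(U, gamma)      # Hamiltonian part over one step
        if rng.random() < dt:               # unit-rate Poisson jump, prob ~ dt
            gamma = conjugate_by(L, gamma)  # gamma -> L gamma L^*
            jumps += 1
    return gamma, jumps

SIGMA_Z = [[1, 0], [0, -1]]
gamma_T, n_jumps = simulate([[1, 0], [0, 0]], u=0.7, L=SIGMA_Z)
purity = trace(matmul(gamma_T, gamma_T)).real
print(abs(trace(gamma_T) - 1) < 1e-9, abs(purity - 1) < 1e-9)
```

Both the Hamiltonian step and the jump are conjugations by unitary matrices, which is why trace, positivity and purity survive each step up to rounding error.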
3. Example of a Quantum Dynamic Two-Player Game
Let us stress again that the whole physics of quantum dynamic games with feedback control of a finite number of players is incorporated into the stochastic filtering Equation (1), so that quantum dynamic games are reduced to stochastic games with jumps governed by this equation, with operators H and L that may depend on controls. As a warm-up before the mean-field setting, let us consider a simple example of a zero-sum quantum dynamic two-player game on a qubit, where a complete analytic solution can be found.
Working with a qubit means that the Hilbert space of the quantum system is two dimensional. Let
L be fixed and the Hamiltonian be the sum of two parts, controlled by the first and the second player respectively. Stochastic filtering Equation (
1) simplifies to the equation (omitting index
t)
being control parameters of players I and II. Assume
,
with some positive
. Moreover,
has only two coordinates:
. Using Ito’s rule
we find the equation for
:
Consequently, again by Ito’s rule, we find the equation for
:
where
.
Let us choose the simplest possible L: $L = \sigma_3$, the third Pauli matrix (diagonal with diagonal elements 1 and $-1$). Then Equation (7) simplifies to
The payoffs in quantum setting are given by certain operators, that is, they have the form
where
J and
F are some self-adjoint operators. They may depend on the control parameters, but we shall look for the case when they do not. In terms of
w this payoff rewrites as
Thus the zero-sum quantum dynamic two-player game (with a feedback control) with a fixed horizon
T in this setting is the stochastic dynamic game with the state space
, with the evolution described by the jump-type stochastic Equation (
8) and payoff (
10). The aim of the first player is to maximise the expectation of (
10) using appropriate feedback strategies
. The second player tries to minimise it using appropriate feedback strategies
.
The remarkable feature of this game is that the possible jumps are only of type . Consequently, in the coordinates and (where ), the dynamics is deterministic. Therefore, if the operators J and F of current and terminal payoffs are invariant under the transformation , the game can be reduced to a deterministic differential game. This game is still very complicated.
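The sign-flip structure of the jumps can be verified directly. The sketch below assumes the coordinate w = ψ₁/ψ₀, the ratio of the two components of the qubit state (an assumption about the notation), and the jump ψ → σ₃ψ produced by the third Pauli matrix; the helper names are hypothetical.

```python
def sigma3_jump(psi):
    """Apply the third Pauli matrix diag(1, -1) to psi = (psi0, psi1)."""
    psi0, psi1 = psi
    return (psi0, -psi1)

def ratio(psi):
    """Projective coordinate w = psi1 / psi0 (assumed notation)."""
    psi0, psi1 = psi
    return psi1 / psi0

psi = (0.6 + 0.0j, 0.48 - 0.64j)   # a normalised qubit state
w_before = ratio(psi)
w_after = ratio(sigma3_jump(psi))

# A jump flips the sign of w, so quantities even in w are unaffected by jumps.
print(abs(w_after + w_before) < 1e-12)
print(abs(abs(w_after) - abs(w_before)) < 1e-12)
print(abs(w_after ** 2 - w_before ** 2) < 1e-12)
```

Any function of the state that is even in w is therefore blind to the random jumps, which is the mechanism behind the reduction to a deterministic differential game.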
Let us consider now the most trivial example of commuting operators
and
controlled by two players. To be concrete, let us choose
diagonal with diagonal elements 1 and 0, and
diagonal with elements 0 and 1. Then Equation (
8) becomes linear in
w:
and then the modulus
becomes the integral of motion:
. Choosing
for definiteness we get the equation for the angle
on the circle
:
If
J and
F are invariant under the transformation
, we can identify points when
differ only by a sign (so that possible jumps
become irrelevant), and the evolution on a circle, given by the set
with identified endpoints, becomes deterministic:
that is a simple rotation. Choosing
and the simplest nontrivial
J with zero diagonal elements and real numbers
j as non-diagonal terms, the payoff (
10) for
simplifies to
The HJB-Isaacs equation takes the form
Assuming for definiteness that
, so that the first player has an edge in this game, the equation rewrites as
This is the HJB equation of a pure maximisation problem. It can be solved via the method of viscosity solutions. For instance, let us find a stationary solution describing the average winnings of the first player per unit of time in a long-lasting game. To this end, one searches for a solution to (
15) in the form
with a constant
. Then
(obviously defined up to a constant multiplier, so that we can set
) satisfies the equation
To guess the right solution one can derive from the meaning of this equation that
must be an even function of
with maximum at
, decreasing on
. Hence
and thus
and Equation (
16) on
becomes
so that
This function (considered as periodically continued with period
to the whole line) is smooth outside points
, where it has convex kinks. Hence this is indeed the viscosity solution to (
16) confirming that our educated guess above was correct and that
is the income per unit of time to the first player for a long lasting game.
Another example for the case of quantum control (without games) was given in [
28].
4. Controlled Limiting Stochastic Equation
Let X be a Borel space with a fixed Borel measure that we denote $dx$.
For a linear operator O in $L^2(X)$ we shall denote by $O_j$ the operator in $L^2(X^N)$ that acts on functions $f(x_1,\dots,x_N)$ as O acting on the variable $x_j$. For a linear operator A in $L^2(X^2)$ we shall denote by $A_{ij}$ the operator in $L^2(X^N)$ that acts as A on the variables $x_i, x_j$.
Let H and $\hat H$ be two self-adjoint operators in $L^2(X)$ and A a self-adjoint integral operator in $L^2(X^2)$ with the kernel $A(x,y;x',y')$ that acts on functions of two variables as
$$Af(x,y) = \int_{X^2} A(x,y;x',y')\,f(x',y')\,dx'\,dy'.$$
It is assumed that A is symmetric in the sense that it takes symmetric functions (symmetric with respect to permutation of x and y) to symmetric functions.
Let us consider the quantum evolution of
N particles driven by the interaction Hamiltonian
Here continuous functions describe the controls of jth agent, who is supposed to have access to the jth subsystem, namely to the partial trace (with respect to all other variables but j) of the state . All are taken from a bounded interval .
In order to be able to carry out a feedback control we assume further that this quantum system is observed via coupling with the collection of (possibly controlled) identical one-particle unitary families
. That is, we consider the filtering Equation (
3) of the type
The corresponding density matrix
satisfies the equation of type (
5):
The main ingredient in the construction of quantum MFG theory is the quantum law of large numbers that states that as
, the limiting evolution of each particle (precise conditions are given in the theorem below) is described by the nonlinear stochastic equation
where
is the integral operator in
with the integral kernel
and
The equation for the corresponding density matrix
writes down as
For the analysis of the limiting behavior we use an approach from [
41,
42], where the main measures of the deviation of the solutions
to
N-particle systems from the product of the solutions
to the Hartree equations are the following positive numbers from the interval
:
In the present stochastic case, these quantities depend not just on the number of particles in the product, but on the concrete choice of these particles. The proper stochastic analog of the quantity
is the collection of random variables
where the latter equation holds by the definition of the partial trace. Here
is identified with the operator in
acting on the
jth variable and
denotes the partial trace of
with respect to all variables except for the
jth.
Since evolutions (
20) preserve the set of operators with the unit trace, (
23) rewrites as
Assuming that all controls
and
are given by identical feedback functions
,
and that the initial conditions for Equation (
19) is the tensor product of i.i.d. random vectors, the expectations
are well defined (they do not depend on a particular choice of particles).
Expressions
can be linked with the traces by the following inequalities, due to Knowles and Pickl:
see Lemma 2.3 from [
42].
Theorem 1. Let be self-adjoint operators in , with bounded, and be a family of unitary operators depending Lipschitz continuously on v: Let A be a symmetric self-adjoint integral operator in with a Hilbert-Schmidt kernel, that is a kernel such that Let the functions and with values in bounded intervals and respectively be Lipschitz in the sense that Let be solutions to Equation (21) with i.i.d. initial conditions , . Let be the solution to the N-particle Equation (19) with given by (18) and with the initial condition
Proof. By Ito’s product rule for counting processes,
with the Ito product rule being
.
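For reference, the product rule for counting processes can be written out as follows. This is the standard multiplication table for independent counting processes, given as a sketch in assumed notation: $X_t$ and $Y_t$ stand for generic processes driven by the counting processes $N_t^j$.

```latex
d(X_t Y_t) = dX_t \, Y_t + X_t \, dY_t + dX_t \, dY_t ,
\qquad
dN_t^j \, dN_t^k = \delta_{jk} \, dN_t^j ,
\qquad
dN_t^j \, dt = 0 , \qquad (dt)^2 = 0 .
```

Unlike the diffusive case, the second-order term $dX_t\,dY_t$ is itself proportional to $dN_t^j$ rather than to $dt$.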
Let us denote by I and II the parts of the differential that contain and, respectively, not.
Starting with II we obtain, denoting
the operator
acting on the
jth variable, that
with
and
with
where for the last inequality we used (
25).
The term
was dealt with in [1] (proof of Theorem 3.1), yielding the estimate
Let us turn to I. We have
Since
and
with
commute, it follows that all terms with
cancel. Taking into account the other cancellations (arising from the unitarity of
) we obtain
If
were constant, this expression would vanish. In the present controlled version, some work is required. First of all, writing
we obtain
with
To make the calculations more transparent, let us omit indices at
. Thus
where
We can now estimate
as
above yielding
With
yet another add-and-subtract manipulation is required. Namely,
with
The first term is estimated as above yielding
And the second one is estimated as
Therefore, since
is a martingale and its differential does not contribute to the expectation, it follows that
Applying Gronwall’s inequality yields (
30). □
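The last step relies on Gronwall's inequality in its differential form. As a reminder, in generic notation (the function $f$ and the constants $a > 0$, $b \ge 0$ are placeholders, not the constants of the proof):

```latex
f'(t) \le a\, f(t) + b \quad \text{for } t \in [0,T]
\qquad \Longrightarrow \qquad
f(t) \le e^{a t} f(0) + \frac{b}{a}\left(e^{a t} - 1\right), \quad t \in [0,T] .
```

A bound of this shape has two parts: the first term propagates the initial deviation, while the second collects the error accumulated along the evolution.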
5. Quantum MFG
Let us consider the quantum dynamic game of
N players, where the dynamics of the density matrix
is given by the controlled dynamics of type (
20):
Assume as above that controls
and
of each
jth player can be chosen from some bounded closed intervals
and
respectively, that the initial matrix is the product of i.i.d. states,
and that the payoff of each player on the interval
is given by the expression
where
J and
F are some operators in
expressing the current and the terminal costs of the agent,
and
denote their actions on the
jth variable, constants
measure the cost of applying control
u.
Remark 1. (i) We choose the simplest payoff function. Of course, a more general dependence on u, v is possible. As long as the payoff is convex in u and v, the results below remain valid. (ii) Everything also remains in force if only H or only L is controlled, that is, if either u or v is absent from all formulas.
Notice that by the property of the partial trace, the payoff (
34) rewrites as
so that it really depends explicitly only on the individual partial traces
, which can be considered as quantum analogs of the positions of classical particles.
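The partial-trace property used here, namely that the expectation of a one-particle operator in the full state equals its expectation in the corresponding partial trace, can be checked on a small example. The sketch below uses two qubits, a hypothetical one-particle payoff operator `J` and an arbitrarily chosen entangled state; all names are illustrative.

```python
def kron(a, b):
    """Kronecker (tensor) product of two square matrices."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def partial_trace_keep_first(G, d=2):
    """Trace out the second factor of a (d*d) x (d*d) matrix."""
    return [[sum(G[i * d + j][k * d + j] for j in range(d)) for k in range(d)]
            for i in range(d)]

# An entangled two-qubit pure state psi and its density matrix Gamma.
psi = [0.5, 0.5j, -0.5, 0.5]
Gamma = [[x * complex(y).conjugate() for y in psi] for x in psi]

J = [[0, 2.0], [2.0, 0]]   # hypothetical one-particle payoff operator
ONE = [[1, 0], [0, 1]]

lhs = trace(matmul(kron(J, ONE), Gamma))                  # tr(J_1 Gamma)
rhs = trace(matmul(J, partial_trace_keep_first(Gamma)))   # tr(J Gamma_1)
print(abs(lhs - rhs) < 1e-12)
```

The identity holds for entangled states as well as product states, which is why each player's payoff depends on the N-particle state only through that player's own partial trace.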
Let us stress again that, after all equations arising from physics are written, our quantum dynamic
N-player game can be formulated in fully classical terms. Namely, the goal of each
jth player is to maximise the expectation of payoff (
35) under the evolution (
33) depending on all controls
. The information available to the jth player is the ‘position’ of the jth player, which is the partial trace
, and thus the actions of the jth player are chosen among the feedback strategies
that are measurable functions
. An additional technical assumption that we are using in the analysis below is that the class of feedback strategies is reduced to Lipschitz continuous functions of partial traces. Therefore both the information setting and technical assumptions are slightly different from the simpler setting of two-player game of
Section 3, where players were assumed to define their strategies on the basis of the whole state (not a partial trace). The restriction to partial traces is necessary to uncouple the dynamics in the limit of
.
The limiting evolution of each player can be expected to be described by the equations
with
and with payoffs given by
For pure states
this payoff turns into
Let us say that the pair of functions
with
and
from the set of density matrices in
, and
with
,
, solve the limiting MFG problem if (i)
is an optimal feedback strategy for the stochastic control problem (
36), (
37) under the fixed function
and (ii)
arises from the solution of (
36) under fixed
.
Theorem 2. Let the conditions on from Theorem 1 hold. Assume that the pair and solves the limiting MFG problem and moreover is Lipschitz in the sense of inequality (29). Then the strategies form a symmetric ϵ-Nash equilibrium for the N-agent quantum game described by (33) and (34), where strategies of all players are sought among measurable controls that depend Lipschitz continuously on γ in the sense of inequality (29), with , depending on , , ϰ, .
Proof. Assume that all players, except for one of them, say the first one, are playing according to the MFG strategy , , and the first player is following some other strategy . By the law of large numbers (which is not affected by a single deviation), all are equal and are given by the formula for all . Moreover, are the same for all .
Following the proof of Theorem 1 we obtain
with the same
, as in the proof of Theorem 1, though
being
,
, and
for
. Looking first at
we note that up to an additive correction of magnitude not exceeding
expression
can be substituted by the expression
which is then dealt with exactly as in the proof of Theorem 1 (with
instead of
N) yielding the same estimate (
30) (with a corrected multiplier) for
,
, that is
The same estimate is obtained for
(even without the correcting term
) yielding
for all
j and a constant
depending on
.
We can now compare the expected payoffs (
35) received by the players in the
N-player quantum game with the expected payoff (
37) received in the limiting game. For each
jth player the difference is bounded by
Since
and by (
25),
it follows that the expectation of the difference of the payoffs is bounded by
with a constant
depending on
.
But by the assumption of the Theorem, is the optimal choice for the limiting optimization problem. Hence the claim of the theorem follows. □
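In generic notation, the ϵ-Nash property asserted by the theorem can be written as follows. The symbols are placeholders rather than the paper's own: $P_j$ stands for the expected payoff of the jth player, $s^*$ for the common MFG strategy used by the other players, and $s$ for an arbitrary admissible deviation of the jth player.

```latex
P_j\big(s, s^*_{-j}\big) \;\le\; P_j\big(s^*, s^*_{-j}\big) + \epsilon
\qquad \text{for every player } j \text{ and every admissible strategy } s .
```

Since payoffs are maximised in this game, the inequality says that no unilateral deviation can improve a player's expected payoff by more than ϵ.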
6. Discussion
The problem of proving existence or uniqueness of the solution of the limiting MFG on a manifold seems to be nontrivial. We suggest it as an interesting open problem.
Let us give a bit more detail for the simplest case of a two-dimensional Hilbert space (a qubit), as in Section 3.
When there is no control v (that is, the operator L is constant) and there is no free (uncontrolled) part of the Hamiltonian, the limiting Equation (21) simplifies to the equation (omitting indices j and t for simplicity)
Moreover,
has only two coordinates:
. Using Ito’s rule as in
Section 3, we find the equation for
:
where
.
The most common interaction operator between qubits is the operator describing the possible exchange of photons,
, with the annihilation operators
and
of the two atoms. This interaction is given by the tensor
such that
with other elements vanishing. Hence
,
, with other elements vanishing. Let us also take the simplest possible L: $L = \sigma_3$, the third Pauli matrix. Then Equation (42) simplifies to
If
is diagonal with diagonal elements
, this turns into
In this simplest case, choosing
and
in payoff (
38), we obtain the HJB equation for the individual control in the form
Already this equation on the complex plane
, describing optimal control for the individual quantum feedback control in a qubit, is quite nonstandard. Moreover, to deal with the corresponding forward-backward system one needs not only its well-posedness in a certain generalized sense, but also some continuous dependence on parameters. Perhaps some methods from [43] or [28] can be used to gain insight into this problem.
As a future research direction it is worth mentioning the general development of the theory of the limiting classical mean-field games, which are mean-field games on infinite-dimensional curvilinear manifolds based on Markov processes with jumps, highly fascinating and nontrivial objects. Of course, the usual questions of classical mean-field games on the connection between stationary and time-dependent solutions are fully open here, as is the theory of the corresponding master equation. On the other hand, quantum dynamic games of a finite number of players (touched upon in Section 3) lead to new nonlinear functional-differential equations on manifolds of Hamilton-Jacobi or Isaacs type, which are also worthy of proper analysis.