Discrete-Time Constrained Average Stochastic Games with Independent State Processes

Zhang, Wenzhao

doi:10.3390/math7111089

by

Wenzhao Zhang

College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, China

Mathematics 2019, 7(11), 1089; https://doi.org/10.3390/math7111089

Submission received: 6 August 2019 / Revised: 30 October 2019 / Accepted: 6 November 2019 / Published: 11 November 2019

(This article belongs to the Section Mathematics and Computer Science)

Abstract

In this paper, we consider discrete-time constrained average stochastic games with independent state processes. The state space of each player is denumerable and the one-stage cost functions may be unbounded. In these game models, each player chooses an action at each stage, which influences the transition probability of a Markov chain controlled only by this player. Moreover, each player pays costs that depend on the actions of all the players. First, we give an existence condition for stationary constrained Nash equilibria based on the technique of average occupation measures and the best-response linear program. Then, combining the best-response linear program and the duality program, we present a non-convex mathematical program and prove that each stationary Nash equilibrium is a global minimizer of this mathematical program. Finally, a controlled wireless network is presented to illustrate our main results.

Keywords:

constrained Nash equilibria; expected average criteria; average occupation measures

MSC:

90C40; 91A25

1. Introduction

Stochastic games, introduced by Shapley in Reference [1], have been actively pursued over the last few decades because of their applications, mainly in economics and queueing systems; see, for instance, References [2,3,4,5]. In this paper, we study a special class of constrained stochastic games with independent state processes. More precisely, at each stage, each player chooses an action and pays some costs which depend on the actions of all players. However, each player's action only influences the transition probability of the Markov chain controlled by herself. In such a case, a player wishes to minimize the expected average cost she is most concerned about, while keeping her other expected average costs within given bounds. Game models under this formulation have been considered in References [6,7]. In Reference [6], the authors present an existence condition for stationary Nash equilibria of constrained average games. Reference [7] derives a necessary and sufficient condition for a stationary Nash equilibrium to be a global minimizer of a mathematical program.

In contrast to the game model considered in this paper, References [2,8,9] study the so-called centralized stochastic games, in which all players jointly control a single Markov chain and the one-stage costs of each player are influenced by the actions of all players. Reference [8] considers the game model with expected discounted cost criteria and expected average cost criteria and establishes the existence of stationary Nash equilibria in the context of a finite state space and compact action spaces. Using the vanishing discount approach, Reference [9] generalizes the existence result in Reference [8] to average games with denumerable states. Reference [2] extends the existence result in Reference [8] to discounted games with denumerable states, but under some special hypotheses on the transition laws and a boundedness condition imposed on the one-stage costs. Using a special finite-state approximation approach, Reference [10] weakens the above existence condition in Reference [8]. For applications of constrained games, Reference [11] studies wireless powered networks including a user and an attacker. Based on the constrained game theory in Reference [6], Reference [11] gives the energy request and data transmission strategy of the user and the attack strategy of the attacker in the sense of Nash equilibria. Reference [12] considers wireless networks in which each mobile wants to maximize its expected capacity subject to some power and buffer length constraints. Reference [13] introduces a neighbor-aware energy-efficient monitoring system for the energy harvesting Internet of Things. In Reference [13], a constrained stochastic game model is established to minimize the number of transmissions while keeping a desired monitoring probability, and a best response dynamics-based algorithm is developed.

In this paper, we consider constrained average games with denumerable states and unbounded costs. The main contributions of the present paper are as follows. First, by introducing the average occupation measures, we establish the so-called best-response linear program to characterize the occupation measures of stationary constrained Nash equilibria as fixed points of a certain multifunction from the product space of occupation measures into itself. The standard weak convergence technique used in References [6,7] cannot be applied directly to the case in which the costs are unbounded. Moreover, the arguments in References [6,7] rely on the finiteness of the state spaces. However, the costs are unbounded and the state spaces are not finite in some controlled stochastic models, such as stochastic inventory models and queueing systems; see, for instance, References [14,15]. Therefore, we introduce the so-called w-weak convergence topology and impose the standard drift condition on the transition kernel, the growth condition and additive condition on the costs, the w-uniform geometrical ergodicity condition and continuity-compactness conditions. Using the properties of w-weak convergence, we study the asymptotic properties of average occupation measures and expected average costs, which are used to establish the upper semicontinuity of the multifunction. Then, we show the existence of stationary constrained Nash equilibria by a fixed-point theorem. It should be mentioned that the vanishing discount approach in Reference [9] is based on the existence of discounted Nash equilibria in Reference [10]; that is, it shows that the limit of discounted Nash equilibria is an average Nash equilibrium when the discount factors tend to one. However, in this paper, we use the fixed-point method directly because the existence of discounted Nash equilibria for stochastic games with independent state processes has not been established. Finally, we characterize each stationary constrained Nash equilibrium as a global minimizer of a mathematical program, which can be viewed as a combination of the best-response linear program and the duality program.

The rest of the paper is organized as follows. In Section 2, we introduce the constrained average game model under consideration. In Section 3, we introduce the best-response linear program and study its convergence properties based on the average occupation measures. In Section 4, we establish the main statements, including the existence result and the characterization of stationary constrained Nash equilibria. In Section 5, a controlled wireless network is presented to illustrate our main results.

2. The Game Model

If S is a Borel space, we denote by $\mathcal{B}(S)$ its Borel σ-algebra, by $P(S)$ the set of all probability measures on $\mathcal{B}(S)$ endowed with the topology of weak convergence, and by $\mathbb{B}(S)$ the set of all Borel measurable functions on S. Let $\mathbb{I}$ stand for the indicator function and let $N \geq 1$ be a fixed integer. Let us define $I := \{1, \dots, N\}$ and use the indices i or j to denote a player. Now, we introduce the N-person constrained game model

$$G := \{(X^{i}, A^{i}, \{A^{i}(x^{i}) \mid x^{i} \in X^{i}\}, \{c_{k}^{i}\}_{k=0}^{p}, \{d_{k}^{i}\}_{k=1}^{p}, Q^{i}(\cdot \mid x^{i}, a^{i}), ν^{i})_{i \in I}\}. \tag{1}$$

For each $i \in I$, the state space $X^{i}$ is assumed to be a denumerable set and $A^{i}$ is the action space, which is assumed to be a Polish space endowed with the Borel σ-algebra $\mathcal{B}(A^{i})$. Without loss of generality, we assume that $X^{i} := \{0, 1, \dots\}$ is enumerated in the natural order. Let $X := \prod_{i=1}^{N} X^{i}$ be the product of the state spaces and $A := \prod_{i=1}^{N} A^{i}$ be the product of the action spaces. For each $x^{i} \in X^{i}$, the nonempty measurable subset $A^{i}(x^{i})$ of $A^{i}$ denotes the set of actions available when the state of player i is $x^{i}$. Let $K^{i} := \{(x^{i}, a^{i}) \mid x^{i} \in X^{i}, a^{i} \in A^{i}(x^{i})\}$ and $K := \prod_{i=1}^{N} K^{i}$. $Q^{i}$ is the stochastic kernel on $X^{i}$ given $K^{i}$ controlled by player i; that is, $Q^{i}(y^{i} \mid x^{i}, a^{i})$ is the probability of moving from state $x^{i}$ to $y^{i}$ if player i chooses action $a^{i}$. For each $0 \leq k \leq p$, the one-stage cost $c_{k}^{i}$ is a real-valued measurable function on $K$, since it depends on the states and actions of all players. The constants $d_{k}^{i}$ ($1 \leq k \leq p$) denote the constraint bounds and $ν^{i}$ denotes the initial distribution of player i. Let $ν := (ν^{1}, \dots, ν^{N})$.

For each $i \in I$, let $H_{0}^{i} := X^{i}$ and $H_{t}^{i} := (K^{i})^{t} \times X^{i}$ be the set of all histories up to time t, where $(K^{i})^{t}$ denotes the t-fold product of $K^{i}$, and let $Φ^{i}$ be the set of all stochastic kernels on $A^{i}$ given $X^{i}$.

Definition 1.

(1): A randomized history-dependent strategy for player i is a sequence $π^{i} := \{π_{t}^{i}, t = 0, 1, 2, \dots\}$ of stochastic kernels $π_{t}^{i}$ on the action space $A^{i}$ given $H_{t}^{i}$ satisfying

$π_{t}^{i} (A^{i} (x_{t}^{i}) \mid h_{t}^{i}) = 1 \quad \forall h_{t}^{i} \in H_{t}^{i}, \; t = 0, 1, 2, \dots .$
(2): A randomized history-dependent strategy $π^{i} = \{π_{t}^{i}\} \in Π_{h}^{i}$ for player i is said to be randomized stationary if there is a stochastic kernel $φ^{i} \in Φ^{i}$ such that $π_{t}^{i} (\cdot \mid h_{t}^{i}) = φ^{i} (\cdot \mid x_{t}^{i})$ for each $h_{t}^{i} \in H_{t}^{i}$. We will write such a stationary strategy as $φ^{i}$.

Assumption 1.

The players do not observe their costs, that is, the strategy chosen by any player does not depend on the realization of the cost.

Remark 1.

Assumption 1 is also imposed in References [6,7]. It ensures that a player cannot use the one-stage costs to estimate the states and actions of the other players.

The sets of all randomized history-dependent strategies and randomized stationary strategies for player i are denoted by $Π_{h}^{i}$ and $Π_{s}^{i}$, respectively. A multi-strategy is a vector $π := (π^{1}, \dots, π^{N}) \in Π_{h}$, where $Π_{h} := \times_{i=1}^{N} Π_{h}^{i}$. Let $Π_{s} := \times_{i=1}^{N} Π_{s}^{i}$ denote the set of all randomized stationary multi-strategies.

Let $Ω^{i} := (X^{i} \times A^{i})^{\infty}$ and $F^{i}$ be the corresponding product σ-algebra. Then, for each $π^{i} \in Π_{h}^{i}$ and each initial distribution $ν^{i} \in P(X^{i})$, the well-known Tulcea's Theorem ([16], p. 178) ensures the existence of a unique probability measure $P_{ν^{i}}^{π^{i}}$ on $(Ω^{i}, F^{i})$ such that, for each $B^{i} \in \mathcal{B}(X^{i})$ and $h_{t}^{i} \in H_{t}^{i}$,

$$P_{ν^{i}}^{π^{i}} (x_{t+1}^{i} \in B^{i} \mid h_{t}^{i}, a_{t}^{i}) = Q^{i} (B^{i} \mid x_{t}^{i}, a_{t}^{i}), \quad t = 0, 1, 2, \dots, \tag{2}$$

where $x_{t}^{i}$ and $a_{t}^{i}$ denote the state and the action at decision epoch t, respectively. The expectation operator with respect to $P_{ν^{i}}^{π^{i}}$ is denoted by $E_{ν^{i}}^{π^{i}}$. If $ν^{i}$ is concentrated at some state $x^{i} \in X^{i}$, we will write $P_{ν^{i}}^{π^{i}}$ and $E_{ν^{i}}^{π^{i}}$ as $P_{x^{i}}^{π^{i}}$ and $E_{x^{i}}^{π^{i}}$, respectively. For each $π := (π^{1}, \dots, π^{N}) \in Π_{h}$, the product of the measures $P_{ν^{i}}^{π^{i}}$ is denoted by $\tilde{P}_{ν}^{π} := \times_{i=1}^{N} P_{ν^{i}}^{π^{i}}$ and the corresponding expectation operator is denoted by $\tilde{E}_{ν}^{π}$. For each $\hat{π}^{i} \in Π_{h}^{i}$, we denote by $[π^{-i}, \hat{π}^{i}]$ the N-vector multi-strategy obtained from $π$ by replacing $π^{i}$ with $\hat{π}^{i}$. Similarly, for each $a^{i} \in A^{i}$, we denote by $[π^{-i}, a^{i}]$ the N-vector in which the jth component is $π^{j}$ for $j \neq i$, while the ith component is $a^{i}$.

For each $ν \in \times_{i=1}^{N} P(X^{i})$ and $π \in Π_{h}$, the expected average criteria are defined, for each player $i \in I$ and $0 \leq k \leq p$, as

$$V_{k}^{i} (ν, π) := \limsup_{n \to \infty} \frac{1}{n} \tilde{E}_{ν}^{π} \Big[\sum_{t=0}^{n-1} c_{k}^{i} (x_{t}, a_{t})\Big],$$

where $x_{t}$ and $a_{t}$ denote the state vector and action vector at time t, respectively.

Definition 2.

(1): For a fixed $π = (π^{1}, \dots, π^{N}) \in Π_{h}$, a strategy $π^{\prime i} \in Π_{h}^{i}$ is said to be feasible for player i against π if $V_{k}^{i} (ν, [π^{-i}, π^{\prime i}]) \leq d_{k}^{i}$ for each $1 \leq k \leq p$. Let

$Δ^{i} (π) := \{π^{\prime i} \in Π_{h}^{i} \mid V_{k}^{i} (ν, [π^{-i}, π^{\prime i}]) \leq d_{k}^{i} \text{ for each } 1 \leq k \leq p\}$

be the set of all feasible strategies for player i against π.
(2): $π := (π^{1}, \dots, π^{N}) \in Π_{h}$ is called a feasible multi-strategy for $G$ if $π^{i}$ is in $Δ^{i} (π)$ for each $i \in I$. We denote by Δ the set of all feasible multi-strategies.
(3): (Nash equilibrium) $π^{*}$ is called a constrained Nash equilibrium if $π^{*} \in Δ$ and, for each $i \in I$,

$V_{0}^{i} (ν, π^{*}) = \inf_{π^{\prime i} \in Δ^{i} (π^{*})} V_{0}^{i} (ν, [π^{* -i}, π^{\prime i}]) .$

A constrained Nash equilibrium $π^{*}$ is said to be stationary if $π^{*}$ belongs to $Π_{s}$.

3. Technical Preliminaries

A monotone nondecreasing function $f : X^{i} \to [1, \infty)$ such that $\lim_{x^{i} \to \infty} f (x^{i}) = \infty$ will be referred to as a Lyapunov function on $X^{i}$ for each $i \in I$. For each function g and constant τ, we denote by $g^{τ}$ the τ-th power of the function g.

To guarantee the finiteness of the expected average costs, we need the following drift condition, which is widely used in References [14,16,17,18] for discrete-time Markov decision processes and in References [5,10] for stochastic games.

Assumption 2.

For each $i \in I$, there exist a Lyapunov function $w_{i} \geq 1$ on $X^{i}$ and constants $b_{i} \geq 0$, $0 < β_{i} < 1$ and $τ > 1$ such that

(1): $\sum_{y^{i} \in X^{i}} w_{i}^{τ} (y^{i}) Q^{i} (y^{i} \mid x^{i}, a^{i}) \leq β_{i} w_{i}^{τ} (x^{i}) + b_{i}$ for each $(x^{i}, a^{i}) \in K^{i}$;
(2): $\sum_{x^{i} \in X^{i}} w_{i}^{τ} (x^{i}) ν^{i} (x^{i}) < \infty$.

By Jensen's inequality, it follows that, for each $0 < τ^{\prime} < τ$, there exist constants $β_{i} (τ^{\prime}) := β_{i}^{τ^{\prime} / τ} > 0$ and $b_{i} (τ^{\prime}) := b_{i}^{τ^{\prime} / τ} \geq 0$ (depending on $τ^{\prime}$) such that

$$\sum_{y^{i} \in X^{i}} Q^{i} (y^{i} \mid x^{i}, a^{i}) w_{i}^{τ^{\prime}} (y^{i}) \leq β_{i} (τ^{\prime}) w_{i}^{τ^{\prime}} (x^{i}) + b_{i} (τ^{\prime}) \quad \text{for each } (x^{i}, a^{i}) \in K^{i}. \tag{3}$$
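The passage from Assumption 2(1) to inequality (3) can be spelled out as follows. This is only a sketch: the exponent $r := τ^{\prime}/τ \in (0, 1)$ is shorthand introduced here for illustration, and the drift condition is read with $w_{i}^{τ} (y^{i})$ inside the sum.

```latex
\begin{aligned}
\sum_{y^{i} \in X^{i}} Q^{i}(y^{i} \mid x^{i}, a^{i})\, w_{i}^{\tau'}(y^{i})
  &= \sum_{y^{i} \in X^{i}} Q^{i}(y^{i} \mid x^{i}, a^{i})\, \big(w_{i}^{\tau}(y^{i})\big)^{r} \\
  &\le \Big(\sum_{y^{i} \in X^{i}} Q^{i}(y^{i} \mid x^{i}, a^{i})\, w_{i}^{\tau}(y^{i})\Big)^{r}
     && \text{(Jensen: } s \mapsto s^{r} \text{ is concave)} \\
  &\le \big(\beta_{i}\, w_{i}^{\tau}(x^{i}) + b_{i}\big)^{r}
     && \text{(Assumption 2(1))} \\
  &\le \beta_{i}^{r}\, w_{i}^{\tau'}(x^{i}) + b_{i}^{r}
     && \big((s+t)^{r} \le s^{r} + t^{r} \text{ for } s, t \ge 0\big).
\end{aligned}
```

Since $0 < β_{i} < 1$ and $0 < r < 1$, the new contraction constant $β_{i}^{r}$ still lies in $(0, 1)$, so the drift structure is preserved for the smaller power.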

Now, we give two different kinds of hypotheses imposed on the cost functions to ensure the existence of Nash equilibria.

Assumption 3.

(1): (Growth condition) There exists a constant $M_{1} > 0$ such that $| c_{k}^{i} (x^{1}, \dots, x^{N}, a^{1}, \dots, a^{N}) | \leq M_{1} \min_{j \in I} \{w_{j} (x^{j})\}$ for each $(x^{1}, \dots, x^{N}, a^{1}, \dots, a^{N}) \in K$, $i \in I$ and $0 \leq k \leq p$.
(2): The function $c_{k}^{i} (x^{1}, \dots, x^{N}, a^{1}, \dots, a^{N})$ is continuous in $(a^{1}, \dots, a^{N}) \in \times_{i=1}^{N} A^{i} (x^{i})$ for each fixed $(x^{1}, \dots, x^{N}) \in X$.

Assumption 4.

Suppose that there exist a constant $M_{2} > 0$ and functions $f_{k}^{i, j}$ on $K^{j}$ with $j \in I$, for each $i \in I$ and $0 \leq k \leq p$, such that

(1): $c_{k}^{i} (x^{1}, \dots, x^{N}, a^{1}, \dots, a^{N}) = \sum_{j=1}^{N} f_{k}^{i, j} (x^{j}, a^{j})$ and $| f_{k}^{i, j} (x^{j}, a^{j}) | \leq M_{2} w_{j} (x^{j})$;
(2): $f_{k}^{i, j} (x^{j}, \cdot)$ is continuous in $a^{j} \in A^{j} (x^{j})$ for each $x^{j} \in X^{j}$.

Remark 2.

(1): It should be mentioned that the cost functions in References [6,7,9] are assumed to be bounded. However, the cost functions are unbounded in some controlled stochastic models, such as stochastic inventory models and queueing systems; see, for instance, References [14,15].
(2): As in Assumption 4, the case in which the immediate costs are additive is also considered in References [7,19] for discrete-time unconstrained stochastic games.

Lemma 1.

Suppose that Assumptions 2 and 3(1) (resp. 4(1)) hold. For each $i \in I$, $π^{i} \in Π_{h}^{i}$ and $x^{i} \in X^{i}$,

$$\limsup_{n \to \infty} \frac{1}{n} E_{ν^{i}}^{π^{i}} \Big[\sum_{t=0}^{n-1} w_{i}^{τ} (x_{t}^{i})\Big] \leq \frac{b_{i}}{1 - β_{i}}, \tag{4}$$

$$V_{k}^{i} (ν, π) \leq M_{1} \frac{b_{i}}{1 - β_{i}} \quad \Big(\text{resp. } V_{k}^{i} (ν, π) \leq M_{2} \sum_{j=1}^{N} \frac{b_{j}}{1 - β_{j}}\Big). \tag{5}$$

Proof.

(4) and (5) follow directly from Lemma 10.4.1 in Reference [17]. □
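For orientation, inequality (4) can also be obtained by iterating the drift condition. The following is a sketch and not the argument of Reference [17]; the shorthand $m_{t}$ is introduced here only for illustration, and Assumption 2(1) is read with $w_{i}^{τ} (y^{i})$ inside the sum.

```latex
\begin{aligned}
m_{t} &:= E_{\nu^{i}}^{\pi^{i}}\big[w_{i}^{\tau}(x_{t}^{i})\big], \qquad
m_{0} = \sum_{x^{i} \in X^{i}} w_{i}^{\tau}(x^{i})\, \nu^{i}(x^{i}) < \infty
  \quad \text{(Assumption 2(2))}, \\
m_{t+1} &\le \beta_{i}\, m_{t} + b_{i}
  \;\Longrightarrow\;
m_{t} \le \beta_{i}^{t}\, m_{0} + b_{i} \sum_{s=0}^{t-1} \beta_{i}^{s}
      \le \beta_{i}^{t}\, m_{0} + \frac{b_{i}}{1-\beta_{i}}, \\
\frac{1}{n} \sum_{t=0}^{n-1} m_{t}
  &\le \frac{m_{0}}{n\,(1-\beta_{i})} + \frac{b_{i}}{1-\beta_{i}}
  \;\longrightarrow\; \frac{b_{i}}{1-\beta_{i}} \qquad (n \to \infty).
\end{aligned}
```

Under Assumption 3(1), the bound $| c_{k}^{i} | \leq M_{1} w_{i} \leq M_{1} w_{i}^{τ}$ (recall $w_{i} \geq 1$ and $τ > 1$) then turns (4) into the first bound in (5).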

For each function $u \in \mathbb{B} (X^{i})$ (resp. $u \in \mathbb{B} (K^{i})$) and $1 \leq f \in \mathbb{B} (X^{i})$, let us define the f-norm $\| u \|_{f}$ by $\| u \|_{f} := \sup_{x^{i} \in X^{i}} \frac{| u (x^{i}) |}{f (x^{i})}$ (resp. $\| u \|_{f} := \sup_{(x^{i}, a^{i}) \in K^{i}} \frac{| u (x^{i}, a^{i}) |}{f (x^{i})}$) and the Banach space $\mathbb{B}_{f} (X^{i}) := \{u \in \mathbb{B} (X^{i}) \mid \| u \|_{f} < \infty\}$ (resp. $\mathbb{B}_{f} (K^{i}) := \{u \in \mathbb{B} (K^{i}) \mid \| u \|_{f} < \infty\}$). The set of all bounded measurable functions on $X^{i}$ is denoted by $\mathbb{B}_{1} (X^{i})$.

For each $i \in I$ and $φ^{i} \in Π_{s}^{i}$, let us define

$$Q^{i} (y^{i} \mid x^{i}, φ^{i}) := \int_{A^{i}} Q^{i} (y^{i} \mid x^{i}, a^{i}) φ^{i} (d a^{i} \mid x^{i}).$$

For each $t \geq 1$, let $Q^{i} (x^{i} \mid y^{i}, φ^{i}, t)$ denote the t-step transition probability from $y^{i}$ to $x^{i}$ corresponding to $φ^{i}$. Obviously, $Q^{i} (x^{i} \mid y^{i}, φ^{i}, 1) = Q^{i} (x^{i} \mid y^{i}, φ^{i})$.

Definition 3.

For each $i \in I$ and $φ^{i} \in Π_{s}^{i}$, the transition kernel $Q^{i} (\cdot \mid \cdot, φ^{i})$ is said to be irreducible if $P_{x^{i}}^{φ^{i}} (x_{t}^{i} = y^{i} \text{ for some } t \geq 1) > 0$ for all $x^{i}, y^{i} \in X^{i}$.

Assumption 5.

For each $i \in I$,

(1): for each $φ^{i} \in Π_{s}^{i}$, there exist a probability measure $μ_{φ^{i}}^{i}$ on $X^{i}$ and constants $R_{i} > 0$ and $0 < ρ_{i} < 1$ such that

$\Big| \sum_{y^{i} \in X^{i}} u (y^{i}) Q^{i} (y^{i} \mid x^{i}, φ^{i}, t) - μ_{φ^{i}}^{i} (u) \Big| \leq \| u \|_{w_{i}^{τ}} R_{i} ρ_{i}^{t} w_{i}^{τ} (x^{i})$

for each $u \in \mathbb{B}_{w_{i}^{τ}} (X^{i})$, $x^{i} \in X^{i}$ and $t \geq 1$;
(2): $Q^{i} (\cdot \mid \cdot, φ^{i})$ is irreducible for all $φ^{i} \in Π_{s}^{i}$.

Remark 3.

(1): Under Assumptions 2 and 5, it is easy to see that $μ_{φ^{i}}^{i}$ is the unique invariant probability measure of the transition kernel $Q^{i} (y^{i} \mid x^{i}, φ^{i})$; see Reference ([17], p. 12). Moreover, we have $μ_{φ^{i}}^{i} (w_{i}^{τ^{\prime}}) := \sum_{y^{i} \in X^{i}} w_{i}^{τ^{\prime}} (y^{i}) μ_{φ^{i}}^{i} (y^{i}) \leq \frac{b_{i}}{1 - β_{i}} < \infty$ for each $0 < τ^{\prime} \leq τ$ and $μ_{φ^{i}}^{i} (x^{i}) > 0$ for each $x^{i} \in X^{i}$.
(2): The w-uniform geometrical ergodicity condition in Assumption 5(1) has been widely used in discrete-time Markov decision processes; see References [17,18] and the references therein. Since the state space in References [6,7] is finite, only the standard ergodicity condition is required there. Moreover, as the cost functions in Reference [9] are assumed to be bounded, the weaker uniform geometrical ergodicity condition is imposed in Reference [9].

For each $i \in I$, we define the average occupation measure $\tilde{μ}_{φ^{i}}^{i} (x^{i}, d a^{i}) := μ_{φ^{i}}^{i} (x^{i}) φ^{i} (d a^{i} \mid x^{i})$ for each $φ^{i} \in Π_{s}^{i}$ and the sets

$$N^{i} := \{\tilde{μ}_{φ^{i}}^{i} \mid φ^{i} \in Π_{s}^{i}\} \quad \text{and} \quad N := \times_{i=1}^{N} N^{i}.$$
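As a small numerical illustration of the average occupation measure, the sketch below computes $μ_{φ^{i}}^{i} (x^{i}) φ^{i} (a^{i} \mid x^{i})$ for a single player. It makes several assumptions that are not part of the paper: the state space is truncated to two states, each state has two actions, the invariant distribution is approximated by power iteration, and the arrays `Q` and `phi` as well as all function names are hypothetical.

```python
def induced_kernel(Q, phi):
    """Return P[x][y] = sum_a phi[x][a] * Q[x][a][y], the chain induced by phi."""
    n = len(Q)
    return [
        [sum(phi[x][a] * Q[x][a][y] for a in range(len(phi[x]))) for y in range(n)]
        for x in range(n)
    ]


def stationary_distribution(P, iters=500):
    """Approximate the invariant distribution of a finite ergodic chain
    by power iteration, starting from the uniform distribution."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]
    return mu


def occupation_measure(Q, phi):
    """Return occ[x][a] = mu(x) * phi(a | x) for the stationary strategy phi."""
    mu = stationary_distribution(induced_kernel(Q, phi))
    return [[mu[x] * phi[x][a] for a in range(len(phi[x]))] for x in range(len(Q))]


# Hypothetical data: Q[x][a][y] is the probability of moving from x to y under action a.
Q = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
# phi[x][a] is the probability of choosing action a in state x.
phi = [[0.5, 0.5], [1.0, 0.0]]

occ = occupation_measure(Q, phi)
for x, row in enumerate(occ):
    print(x, row)
```

The state marginal of `occ` is the invariant distribution of the induced chain, and `occ` satisfies the balance equation that the mass flowing into each state equals the mass sitting on it.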

For each $\tilde{μ} = (\tilde{μ}_{φ^{1}}^{1}, \dots, \tilde{μ}_{φ^{N}}^{N}) \in \times_{i=1}^{N} N^{i}$ and $i \in I$, we introduce the following Markov decision process $M_{\tilde{μ}}^{i}$:

$$M_{\tilde{μ}}^{i} := \{X^{i}, (A^{i}, \{A^{i} (x^{i}) \mid x^{i} \in X^{i}\}), Q^{i} (y^{i} \mid x^{i}, a^{i}), c_{0, \tilde{μ}}^{i} (x^{i}, a^{i}), (c_{k, \tilde{μ}}^{i} (x^{i}, a^{i}), d_{k}^{i}, 1 \leq k \leq p)\},$$

where

$$c_{k, \tilde{μ}}^{i} (x^{i}, a^{i}) := \int_{K^{N}} \tilde{μ}_{φ^{N}}^{N} (x^{N}, d a^{N}) \cdots \int_{K^{i+1}} \tilde{μ}_{φ^{i+1}}^{i+1} (x^{i+1}, d a^{i+1}) \int_{K^{i-1}} \tilde{μ}_{φ^{i-1}}^{i-1} (x^{i-1}, d a^{i-1}) \cdots \int_{K^{1}} \tilde{μ}_{φ^{1}}^{1} (x^{1}, d a^{1}) \, c_{k}^{i} (x^{1}, \dots, x^{N}, a^{1}, \dots, a^{N})$$

for each $0 \leq k \leq p$ and $(x^{i}, a^{i}) \in K^{i}$, and the other components are the same as in (1).

For each $i \in I$ and $π^{i} \in Π_{h}^{i}$, the expected average cost is defined by

$$V_{k, \tilde{μ}}^{i} (ν^{i}, π^{i}) := \limsup_{n \to \infty} \frac{1}{n} E_{ν^{i}}^{π^{i}} \Big[\sum_{t=0}^{n-1} c_{k, \tilde{μ}}^{i} (x_{t}^{i}, a_{t}^{i})\Big] \quad \text{for each } 0 \leq k \leq p.$$

Definition 4.

For a fixed $i \in I$, a strategy $π^{i} \in Π_{h}^{i}$ is said to be feasible for $M_{\tilde{μ}}^{i}$ if

$$V_{k, \tilde{μ}}^{i} (ν^{i}, π^{i}) \leq d_{k}^{i} \quad \text{for each } 1 \leq k \leq p.$$

Let $U^{i} (\tilde{μ})$ be the set of all feasible strategies of $M_{\tilde{μ}}^{i}$. A strategy $π^{i} \in Π_{h}^{i}$ is said to be optimal for $M_{\tilde{μ}}^{i}$ if $π^{i} \in U^{i} (\tilde{μ})$ and

$$V_{0, \tilde{μ}}^{i} (ν^{i}, π^{i}) = \inf_{π^{\prime i} \in U^{i} (\tilde{μ})} V_{0, \tilde{μ}}^{i} (ν^{i}, π^{\prime i}).$$

We denote by $V_{\tilde{μ}}^{* i}$ the optimal value of $M_{\tilde{μ}}^{i}$.

For each $i \in I$ and $\tilde{μ} \in N$, under Assumptions 2, 3 (resp. 4) and 5, we have

$$V_{k, \tilde{μ}}^{i} (ν^{i}, φ^{i}) = \int_{K^{i}} c_{k, \tilde{μ}}^{i} (x^{i}, a^{i}) \tilde{μ}_{φ^{i}}^{i} (x^{i}, d a^{i}) \quad \text{for each } φ^{i} \in Π_{s}^{i}.$$

Now, we introduce the following constrained optimization problem:

$$\begin{aligned} \text{minimize} \quad & V_{0, \tilde{μ}}^{i} (ν^{i}, π^{i}) \\ \text{subject to} \quad & π^{i} \in Π_{h}^{i}, \; V_{k, \tilde{μ}}^{i} (ν^{i}, π^{i}) \leq d_{k}^{i}, \; 1 \leq k \leq p. \end{aligned} \tag{6}$$

Lemma 2.

Suppose that Assumptions 1, 2 and 5 are satisfied and, in addition, either Assumption 3 or 4 holds. Let $\tilde{μ} := (\tilde{μ}^{1}, \dots, \tilde{μ}^{N}) \in N$ and $φ := (φ^{1}, \dots, φ^{N})$ with $\tilde{μ}^{i} (x^{i}, d a^{i}) = \hat{\tilde{μ}}^{i} (x^{i}) φ^{i} (d a^{i} \mid x^{i})$ for each $i \in I$. Then

$$V_{k, \tilde{μ}}^{i} (ν^{i}, π^{i}) = V_{k}^{i} (ν, [φ^{-i}, π^{i}]) \quad \text{for each } π^{i} \in Π_{h}^{i} \text{ and } 0 \leq k \leq p.$$

Proof.

We only prove the case $i = 1$. Let us define

$$c_{k}^{1} (x^{1}, \dots, x^{N}, [φ^{-1}, a^{1}]) := \int_{A^{2}} φ^{2} (d a^{2} \mid x^{2}) \cdots \int_{A^{N}} c_{k}^{1} (x^{1}, \dots, x^{N}, a) φ^{N} (d a^{N} \mid x^{N}),$$

$$p_{ν^{j}} (t, x^{j}, φ^{j}) := \sum_{y^{j} \in X^{j}} ν^{j} (y^{j}) Q^{j} (x^{j} \mid y^{j}, φ^{j}, t) \quad \text{for each } φ^{j} \in Π_{s}^{j}.$$

First, we assume that Assumption 3 holds. It follows from Assumption 5 that, for each player $j \neq 1$, we have

\begin{matrix} | \sum_{x^{j} \in X^{j}} {\tilde{μ}}_{φ^{j}}^{j} (x^{j}) c_{k}^{1} (x^{1}, \dots, x^{N}, [φ^{- 1}, a^{1}]) - \sum_{x^{j} \in X^{j}} p_{ν^{j}} (t, x^{j}, φ^{j}) c_{k}^{1} (x^{1}, \dots, x^{N}, [φ^{- 1}, a^{1}]) | \\ \leq & ρ_{j}^{t} R_{j} | | c_{k}^{1} (x^{1}, \dots, x^{j - 1}, x^{j + 1}, \dots, x^{N}, [φ^{- 1}, a^{1}]) {| |}_{w_{j}^{τ}} \sum_{x^{j} \in X^{j}} ν^{j} (x^{j}) w_{j}^{τ} (x^{j}) \\ \leq & ρ_{j}^{t} R_{j} M_{1} \sum_{x^{j} \in X^{j}} ν^{j} (x^{j}) w_{j}^{τ} (x^{j}), \end{matrix}

where

$$\| c_{k}^{1} (x^{1}, \dots, x^{j-1}, x^{j+1}, \dots, x^{N}, [φ^{-1}, a^{1}]) \|_{w_{j}^{τ}} := \sup_{x^{j} \in X^{j}} \frac{| c_{k}^{1} (x^{1}, \dots, x^{N}, [φ^{-1}, a^{1}]) |}{w_{j}^{τ} (x^{j})}$$

for each $0 \leq k \leq p$. For notational ease, we define $\prod_{n=2}^{j-1} \hat{\tilde{μ}}^{n} (x^{n}) := 1$ for $j = 2$, $\prod_{n=j+1}^{N} p_{ν^{n}} (t, x^{n}, φ^{n}) := 1$ for $j = N$, and $X^{-1} := \prod_{i=2}^{N} X^{i}$. Hence,

\begin{matrix} | E_{ν^{1}}^{π^{1}} [c_{k, \tilde{μ}}^{1} (x_{t}^{1}, a_{t}^{1})] - {\tilde{E}}_{ν}^{[φ^{- 1}, π^{1}]} [c_{k}^{1} (x_{t}, [φ^{- 1}, a_{t}^{1}])] | \\ = & | \int_{K^{1}} P_{ν^{1}}^{π^{1}} (x_{t}^{1}, d a_{t}^{1}) \sum_{x^{2} \in X^{2}} {\hat{\tilde{μ}}}^{2} (x^{2}) \dots \sum_{x^{N} \in X^{N}} {\hat{\tilde{μ}}}^{N} (x^{N}) c_{k}^{1} (x_{t}^{1}, \dots, x^{N}, [φ^{- 1}, a_{t}^{1}]) \\ - \int_{K^{1}} P_{ν^{1}}^{π^{1}} (x_{t}^{1}, d a_{t}^{1}) \sum_{x^{2} \in X^{2}} p_{ν^{2}} (t, x^{2}, φ^{2}) \dots \sum_{x^{N} \in X^{N}} p_{ν^{N}} (t, x^{N}, φ^{N}) c_{k}^{1} (x_{t}^{1}, \dots, x^{N}, [φ^{- 1}, a_{t}^{1}]) | \\ \leq & | \sum_{j = 2}^{N} \int_{K^{1}} P_{ν^{1}}^{π^{1}} (x_{t}^{1}, d a_{t}^{1}) \sum_{(x^{2}, \dots, x^{N}) \in X^{- 1}} \prod_{n = 2}^{j - 1} {\hat{\tilde{μ}}}^{n} (x^{n}) \cdot \prod_{n = j + 1}^{N} p_{ν^{n}} (t, x^{n}, φ^{n}) \\ \cdot ({\hat{\tilde{μ}}}^{j} (x^{j}) - p_{ν^{j}} (t, x^{j}, φ^{j})) c_{k}^{1} (x_{t}^{1}, \dots, x^{N}, [φ^{- 1}, a_{t}^{1}]) | \\ \leq & \sum_{j = 2}^{N} [R_{j} ρ_{j}^{t} M_{1} \sum_{x^{j} \in X^{j}} ν^{j} (x^{j}) w_{j}^{τ} (x^{j})] . \end{matrix}

Thus, we have that

\begin{matrix} \frac{1}{n} \sum_{t = 0}^{n - 1} | E_{ν^{1}}^{π^{1}} [c_{k, \tilde{μ}}^{1} (x_{t}^{1}, a_{t}^{1})] - {\tilde{E}}_{ν}^{[φ^{- 1}, π^{1}]} [c_{k}^{1} (x_{t}, [φ^{- 1}, a_{t}^{1}])] | \\ \leq & \frac{1}{n} \sum_{j = 2}^{N} (R_{j} M_{1} \sum_{x^{j} \in X^{j}} ν^{j} (x^{j}) w_{j}^{τ} (x^{j})) \cdot \sum_{t = 0}^{n - 1} ρ_{j}^{t} \to 0, as n \to \infty, \end{matrix}

which implies the desired result. On the other hand, if Assumption 4 holds,

\begin{matrix} V_{k}^{1} (ν, [φ^{- 1}, π^{1}]) \\ = & \underset{n \to \infty}{lim sup} \frac{1}{n} \sum_{t = 0}^{n - 1} [\sum_{j = 2}^{N} \sum_{x^{j} \in X^{j}} f_{k}^{1, j} (x^{j}, φ^{j}) p_{ν^{j}} (t, x^{j}, φ^{j}) + \int_{K^{1}} f_{k}^{1, 1} (x_{t}^{1}, a_{t}^{1}) P_{ν^{1}}^{π^{1}} (x_{t}^{1}, d a_{t}^{1})] \\ = & \sum_{j = 2}^{N} \sum_{x^{j} \in X^{j}} f_{k}^{1, j} (x^{j}, φ^{j}) {\hat{\tilde{μ}}}^{j} (x^{j}) + \underset{n \to \infty}{lim sup} \frac{1}{n} \sum_{t = 0}^{n - 1} [\int_{K^{1}} f_{k}^{1, 1} (x_{t}^{1}, a_{t}^{1}) P_{ν^{1}}^{π^{1}} (x_{t}^{1}, d a_{t}^{1})] \\ = & \underset{n \to \infty}{lim sup} \frac{1}{n} \sum_{t = 0}^{n - 1} [\sum_{j = 2}^{N} \sum_{x^{j} \in X^{j}} f_{k}^{1, j} (x^{j}, φ^{j}) {\hat{\tilde{μ}}}^{j} (x^{j}) + \int_{K^{1}} f_{k}^{1, 1} (x_{t}^{1}, a_{t}^{1}) P_{ν^{1}}^{π^{1}} (x_{t}^{1}, d a_{t}^{1})] \\ = & \underset{n \to \infty}{lim sup} \frac{1}{n} \sum_{t = 0}^{n - 1} [\int_{K^{1}} P_{ν^{1}}^{π^{1}} (x_{t}^{1}, d a_{t}^{1}) c_{k, \tilde{μ}}^{1} (x_{t}^{1}, a_{t}^{1})] = V_{k, \tilde{μ}}^{1} (ν^{1}, π^{1}) . \end{matrix}

□
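The content of Lemma 2 can be checked numerically on a toy example. The sketch below is an illustration under assumptions that go beyond the paper: there are two players with two states each, both players already use stationary strategies (so only the induced transition matrices `P1` and `P2` appear), the cost `c` depends on the two states only, and all names and numbers are hypothetical. It compares the Cesàro average of player 1's cost under the product of the time-t state laws with the average of the reduced cost obtained by integrating player 2's state against its invariant distribution.

```python
def step(dist, P):
    """One step of the chain: returns dist * P for a row-stochastic matrix P."""
    n = len(P)
    return [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]


def stationary(P, iters=500):
    """Approximate the invariant distribution by power iteration."""
    dist = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        dist = step(dist, P)
    return dist


def average_costs(P1, P2, nu1, nu2, c, n):
    """Return (joint, reduced) Cesaro averages of player 1's cost over n stages."""
    mu2 = stationary(P2)
    # Reduced cost: player 2's state is integrated out against its invariant law.
    c_mu = [
        sum(mu2[x2] * c[x1][x2] for x2 in range(len(P2)))
        for x1 in range(len(P1))
    ]
    p1, p2 = list(nu1), list(nu2)
    joint = 0.0
    reduced = 0.0
    for _ in range(n):
        # Independent state processes: the joint law at time t is a product.
        joint += sum(
            p1[x1] * p2[x2] * c[x1][x2]
            for x1 in range(len(P1))
            for x2 in range(len(P2))
        )
        reduced += sum(p1[x1] * c_mu[x1] for x1 in range(len(P1)))
        p1, p2 = step(p1, P1), step(p2, P2)
    return joint / n, reduced / n


# Hypothetical induced chains, cost table and initial laws (both start in state 0).
P1 = [[0.7, 0.3], [0.4, 0.6]]
P2 = [[0.2, 0.8], [0.5, 0.5]]
c = [[1.0, 4.0], [2.0, 0.0]]

joint, reduced = average_costs(P1, P2, [1.0, 0.0], [1.0, 0.0], c, n=2000)
gap = abs(joint - reduced)
print(joint, reduced, gap)
```

Because player 2's time-t law approaches its invariant law geometrically, the accumulated discrepancy stays bounded and the gap between the two averages decays like 1/n, which mirrors the last estimate in the first part of the proof.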

Let $w_{i}$ be the function as in Assumption 2 for each $i \in I$. Let us define $C_{w_{i}} (K^{i}) := \{u \in \mathbb{B}_{w_{i}} (K^{i}) \mid u \text{ is continuous on } K^{i}\}$ and $P_{w_{i}} (K^{i}) := \{μ \in P (K^{i}) \mid \int_{K^{i}} w_{i} (x^{i}) μ (x^{i}, d a^{i}) < \infty\}$.

Definition 5.

For each $i \in I$, let $\{μ_{n}\} \subseteq P_{w_{i}} (K^{i})$. The sequence $\{μ_{n}\}$ is said to converge to $μ \in P (K^{i})$ with respect to the $w_{i}$-weak topology if and only if

$$\lim_{n \to \infty} \int_{K^{i}} g (x^{i}, a^{i}) μ_{n} (x^{i}, d a^{i}) = \int_{K^{i}} g (x^{i}, a^{i}) μ (x^{i}, d a^{i})$$

for each $g \in C_{w_{i}} (K^{i})$. In this case, we write $μ_{n} \overset{w_{i}}{\to} μ$.
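A simple example shows why $w_{i}$-weak convergence is strictly stronger than ordinary weak convergence. The construction below is an illustration only and does not appear in the paper: it assumes a single available action $a_{0}$ in every state and the hypothetical choice $w_{i} (x^{i}) := x^{i} + 1$, which is a Lyapunov function in the sense defined earlier.

```latex
\begin{aligned}
&X^{i} = \{0, 1, \dots\}, \qquad A^{i}(x^{i}) = \{a_{0}\}, \qquad w_{i}(x^{i}) := x^{i} + 1, \\
&\mu_{n} := \Big(1 - \tfrac{1}{n}\Big)\, \delta_{(0, a_{0})} + \tfrac{1}{n}\, \delta_{(n, a_{0})},
  \qquad \mu := \delta_{(0, a_{0})}, \\
&\int_{K^{i}} g \, d\mu_{n} = \Big(1 - \tfrac{1}{n}\Big)\, g(0, a_{0}) + \tfrac{1}{n}\, g(n, a_{0})
  \;\longrightarrow\; g(0, a_{0}) = \int_{K^{i}} g \, d\mu
  \quad \text{for every bounded } g, \\
&\int_{K^{i}} w_{i} \, d\mu_{n} = \Big(1 - \tfrac{1}{n}\Big) \cdot 1 + \tfrac{1}{n}\,(n + 1) = 2
  \;\neq\; 1 = \int_{K^{i}} w_{i} \, d\mu .
\end{aligned}
```

Each $μ_{n}$ belongs to $P_{w_{i}} (K^{i})$ and $μ_{n} \to μ$ weakly, but the test function $g = w_{i} \in C_{w_{i}} (K^{i})$ shows that $μ_{n}$ does not converge to $μ$ in the $w_{i}$-weak topology: a vanishing amount of mass escaping to large states can still carry a non-vanishing share of an unbounded cost.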

Let $w := (w_{1}, \dots, w_{N})$ and $\{μ_{n}\} \subseteq \times_{i=1}^{N} P_{w_{i}} (K^{i})$ with $μ_{n} = (μ_{n}^{1}, \dots, μ_{n}^{N})$. We say that $μ_{n} \overset{w}{\to} μ =: (μ^{1}, \dots, μ^{N})$ if $μ_{n}^{i} \overset{w_{i}}{\to} μ^{i}$ for each $i \in I$. For each $i \in I$ and $μ^{i} \in P_{w_{i}} (K^{i})$, we define $\hat{μ}^{i} (x^{i}) := μ^{i} (x^{i}, A^{i} (x^{i}))$ for each $x^{i} \in X^{i}$.

To ensure the existence of constrained Nash equilibria, we also need the following standard continuity-compactness conditions, which are widely used; see, for instance, References [4,14,16,17,19] and the references therein.

Assumption 6.

(1): For each $i \in I$ and $x^{i} \in X^{i}$, $A^{i} (x^{i})$ is a compact set.
(2): The function $Q^{i} (y^{i} \mid x^{i}, a^{i})$ is continuous in $a^{i} \in A^{i} (x^{i})$ for each fixed $i \in I$ and $x^{i}, y^{i} \in X^{i}$.
(3): The function $\sum_{y^{i} \in X^{i}} w_{i} (y^{i}) Q^{i} (y^{i} \mid x^{i}, a^{i})$ is continuous in $a^{i} \in A^{i} (x^{i})$ for each fixed $i \in I$ and $x^{i} \in X^{i}$.

Lemma 3.

Suppose that Assumptions 2, 5 and 6 hold. For each $i \in I$, the set $N^{i}$ is convex and compact with respect to the $w_{i}$-weak topology.

Proof.

Fix $1 \leq τ^{\prime} < τ$. It follows from Remark 3 that

$$\sup_{\tilde{μ}^{i} \in N^{i}} \int_{K^{i}} w_{i}^{τ} (x^{i}) \tilde{μ}^{i} (x^{i}, d a^{i}) \leq \frac{b_{i}}{1 - β_{i}},$$

and the set $\{(x^{i}, a^{i}) \in K^{i} \mid w_{i}^{τ} (x^{i}) \leq n w_{i}^{τ^{\prime}} (x^{i})\}$ is compact in $K^{i}$ for each $n \geq 1$. Hence, by Corollary A.46 in Reference [20], the set $N^{i}$ is compact with respect to the $w_{i}$-weak topology. The convexity of $N^{i}$ can be obtained from Lemma 5.2.2 in Reference [18]. □

Remark 4.

The standard weak convergence technique used in References [6,7] for bounded costs does not apply directly to the case wherein costs are unbounded in this paper.

Lemma 4.

Suppose that Assumptions 1, 2, 5 and 6 are satisfied and, in addition, either Assumption 3 or 4 holds. Let $i \in I$ be fixed, and let $\{\tilde{μ}_{n}\} \subseteq N$ and $\{η_{n}\} \subseteq N^{i}$ be such that $\tilde{μ}_{n} \overset{w}{\to} \tilde{μ}_{\infty}$ and $η_{n} \overset{w_{i}}{\to} η_{\infty}$ weakly in $P (K^{i})$. Then,

$$\lim_{n \to \infty} \int_{K^{i}} c_{k, \tilde{μ}_{n}}^{i} (x^{i}, a^{i}) η_{n} (x^{i}, d a^{i}) = \int_{K^{i}} c_{k, \tilde{μ}_{\infty}}^{i} (x^{i}, a^{i}) η_{\infty} (x^{i}, d a^{i})$$

for each $0 \leq k \leq p$.

Proof.

For each $n \in \bar{N}$, we write $\tilde{μ}_{n} := (\tilde{μ}_{n}^{1}, \dots, \tilde{μ}_{n}^{N})$. According to Proposition D.8 in Reference [16], there exist $φ_{n}^{i}$ and $\tilde{φ}_{n}^{i} \in Π_{s}^{i}$ such that $\tilde{μ}_{n}^{i} (x^{i}, d a^{i}) = \hat{\tilde{μ}}_{n}^{i} (x^{i}) φ_{n}^{i} (d a^{i} \mid x^{i})$ and $η_{n} (x^{i}, d a^{i}) = \hat{η}_{n} (x^{i}) \tilde{φ}_{n}^{i} (d a^{i} \mid x^{i})$ for each $i \in I$. For an arbitrary $u \in \mathbb{B}_{1} (X^{i})$, it follows from Lemma 5.1.2 in Reference [18] that

$$\int_{K^{i}} \Big(\sum_{y^{i} \in X^{i}} u (y^{i}) Q^{i} (y^{i} \mid x^{i}, a^{i}) - u (x^{i})\Big) \tilde{μ}_{n}^{i} (x^{i}, d a^{i}) = 0.$$

Since $\tilde{μ}_{n}^{i} \overset{w_{i}}{\to} \tilde{μ}_{\infty}^{i}$, it follows from Assumption 6(2) that

$$\lim_{n \to \infty} \int_{K^{i}} \Big(\sum_{y^{i} \in X^{i}} u (y^{i}) Q^{i} (y^{i} \mid x^{i}, a^{i}) - u (x^{i})\Big) \tilde{μ}_{n}^{i} (x^{i}, d a^{i}) = \int_{K^{i}} \Big(\sum_{y^{i} \in X^{i}} u (y^{i}) Q^{i} (y^{i} \mid x^{i}, a^{i}) - u (x^{i})\Big) \tilde{μ}_{\infty}^{i} (x^{i}, d a^{i}) = 0,$$

which together with Lemma 5.1.2 in Reference [18] implies that $\tilde{μ}_{\infty}^{i} \in N^{i}$ for each $i \in I$. Similarly, we also have $η_{\infty} \in N^{i}$. Thus, we can see that

$$\lim_{n \to \infty} \hat{\tilde{μ}}_{n}^{i} (x^{i}) = \hat{\tilde{μ}}_{\infty}^{i} (x^{i}) > 0 \quad \text{and} \quad \lim_{n \to \infty} \hat{η}_{n} (x^{i}) = \hat{η}_{\infty} (x^{i}) > 0 \quad \text{for each } x^{i} \in X^{i}, \tag{7}$$

and that $\tilde{φ}_{n}^{i} (\cdot \mid x^{i}) \to \tilde{φ}_{\infty}^{i} (\cdot \mid x^{i})$ and $φ_{n}^{i} (\cdot \mid x^{i}) \to φ_{\infty}^{i} (\cdot \mid x^{i})$ weakly in $P (A^{i} (x^{i}))$ for each $x^{i} \in X^{i}$. It follows from Assumption 3(2) or Assumption 4 that

$$\lim_{n \to \infty} c_{k}^{i} (x^{1}, \dots, x^{N}, [φ_{n}^{-i}, \tilde{φ}_{n}^{i}]) = c_{k}^{i} (x^{1}, \dots, x^{N}, [φ_{\infty}^{-i}, \tilde{φ}_{\infty}^{i}]) \tag{8}$$

for each $(x^{1}, \dots, x^{N}) \in X$, where $φ_{n} := (φ_{n}^{1}, \dots, φ_{n}^{N})$ for each $n \in \bar{N}$.

Under Assumptions 2(2) and 5, we have that

\sum_{x^{1} \in X^{1}} {\hat{\tilde{μ}}}_{\infty}^{1} (x^{1}) w_{1} (x^{1}) < \infty and lim_{n \to \infty} \sum_{x^{i} \in X^{i}} w_{i} (x^{i}) {\hat{η}}_{n} (x^{i}) = \sum_{x^{i} \in X^{i}} w_{i} (x^{i}) {\hat{η}}_{\infty} (x^{i}) \forall i \in I .

(9)

Then, under Assumption 3(2), by (7)–(9) and Proposition A.2.6 in Reference [15], it follows that

\begin{matrix} lim_{n \to \infty} \int_{K^{i}} c_{k, {\tilde{μ}}_{n}}^{i} (x^{i}, a^{i}) η_{n} (x^{i}, d a^{i}) \\ = & lim_{n \to \infty} \sum_{x^{i} \in X^{i}} {\hat{η}}_{n} (x^{i}) \int_{A^{i} (x^{i})} {\tilde{φ}}_{n}^{i} (d a^{i} | x^{i}) \sum_{x^{N} \in X^{N}} {\hat{\tilde{μ}}}_{n}^{N} (x^{N}) \int_{A^{N} (x^{N})} φ_{n}^{N} (d a^{N} | x^{N}) \dots \sum_{x^{i + 1} \in X^{i + 1}} {\hat{\tilde{μ}}}_{n}^{i + 1} (x^{i + 1}) \\ \int_{A^{i + 1} (x^{i + 1})} φ_{n}^{i + 1} (d a^{i + 1} | x^{i + 1}) \sum_{x^{i - 1} \in X^{i - 1}} {\hat{\tilde{μ}}}_{n}^{i - 1} (x^{i - 1}) \int_{A^{i - 1} (x^{i - 1})} φ_{n}^{i - 1} (d a^{i - 1} | x^{i - 1}) \\ \dots \int_{A^{1} (x^{1})} φ_{n}^{1} (d a^{1} | x^{1}) \sum_{x^{1} \in X^{1}} {\hat{\tilde{μ}}}_{n}^{1} (x^{1}) c_{k}^{i} (x^{1}, \dots, x^{N}, a^{1}, \dots, a^{N}) \\ = & \int_{K^{i}} c_{k, {\tilde{μ}}_{\infty}}^{i} (x^{i}, a^{i}) η_{\infty} (x^{i}, d a^{i}) . \end{matrix}

If Assumption 3 is replaced by Assumption 4, then we have

\begin{matrix} lim_{n \to \infty} \int_{K^{i}} c_{k, {\tilde{μ}}_{n}}^{i} (x^{i}, a^{i}) η_{n} (x^{i}, d a^{i}) & = & lim_{n \to \infty} [\int_{K^{i}} f_{k}^{i, i} (x^{i}, a^{i}) η_{n} (x^{i}, d a^{i}) + \sum_{j = 1, j \neq i}^{N} \int_{K^{j}} f_{k}^{i, j} (x^{j}, a^{j}) {\tilde{μ}}_{n}^{j} (x^{j}, d a^{j})] \\ = & \int_{K^{i}} f_{k}^{i, i} (x^{i}, a^{i}) η_{\infty} (x^{i}, d a^{i}) + \sum_{j = 1, j \neq i}^{N} \int_{K^{j}} f_{k}^{i, j} (x^{j}, a^{j}) {\tilde{μ}}_{\infty}^{j} (x^{j}, d a^{j}) \\ = & \int_{K^{i}} c_{k, {\tilde{μ}}_{\infty}}^{i} (x^{i}, a^{i}) η_{\infty} (x^{i}, d a^{i}) . \end{matrix}

□

The following Slater condition is common for constrained games; see References [2,6,7,8,9,10].

Assumption 7.

(Slater condition) For each stationary multi-strategy

φ \in Π_{s}

and each player i, there exists

{\tilde{π}}^{i} \in Π_{h}^{i}

such that

V_{k}^{i} (ν, [φ^{- i}, {\tilde{π}}^{i}]) < d_{k}^{i}, for each 1 \leq k \leq p .

Lemma 5.

Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let

i \in I

and

\tilde{μ} : = ({\tilde{μ}}^{1}, \dots, {\tilde{μ}}^{N}) \in N

be fixed. Then,

(1): for each $π^{i} \in Π_{h}^{i}$ , there exists a stationary strategy ${\bar{φ}}^{i} \in Π_{s}^{i}$ (depending on $\tilde{μ}$ , $ν^{i}$ and $π^{i}$ ), such that $V_{k, \tilde{μ}}^{i} (ν^{i}, {\bar{φ}}^{i}) \leq V_{k, \tilde{μ}}^{i} (ν^{i}, π^{i})$ for each $0 \leq k \leq p$ ;
(2): there exists a stationary strategy ${\tilde{φ}}^{i} \in Π_{s}^{i}$ such that $V_{k, \tilde{μ}}^{i} (ν^{i}, {\tilde{φ}}^{i}) < d_{k}^{i}$ for each $1 \leq k \leq p$ .

Proof.

(1): See Lemma 5.7.10 in Reference [16].
(2): Let $φ : = (φ^{1}, \dots, φ^{N}) \in Π_{s}$ such that ${\tilde{μ}}^{i} (x^{i}, d a^{i}) = μ_{φ^{i}}^{i} (x^{i}) φ^{i} (d a^{i} | x^{i})$ for each $i \in I$ . For the fixed $φ$ , let ${\tilde{π}}^{i} \in Π_{h}^{i}$ be the corresponding strategy as in Assumption 7. It follows from Lemma 2 that $V_{k, \tilde{μ}}^{i} (ν^{i}, {\tilde{π}}^{i}) = V_{k}^{i} (ν, [φ^{- i}, {\tilde{π}}^{i}]) < d_{k}^{i}$ for each $1 \leq k \leq p$ . By part (1), there exists a stationary strategy ${\tilde{φ}}^{i} \in Π_{s}^{i}$ (depending on $φ$ and ${\tilde{π}}^{i}$ ) such that $V_{k, \tilde{μ}}^{i} (ν^{i}, {\tilde{φ}}^{i}) \leq V_{k, \tilde{μ}}^{i} (ν^{i}, {\tilde{π}}^{i}) = V_{k}^{i} (ν, [φ^{- i}, {\tilde{π}}^{i}]) < d_{k}^{i}$ for each $1 \leq k \leq p$ .

□

It follows from Lemma 5.1.2 in Reference [18] that

η \in N^{i}

if and only if

\sum_{x^{i} \in X^{i}} \int_{A^{i} (x^{i})} (Q^{i} (y^{i} | x^{i}, a^{i}) - I_{{y^{i}}} (x^{i})) η (x^{i}, d a^{i}) = 0, for each y^{i} \in X^{i} .

According to Lemma 5, the constrained optimality problem (6) can be restricted to the set of all stationary strategies. Hence, for each

\tilde{μ} \in N

and

i \in I

, the constrained optimality problem (6) is equivalent to the following best response linear program

({LP}_{\tilde{μ}}^{i})

:

\begin{matrix} {LP}_{\tilde{μ}}^{i} : & inf_{η \in P_{w_{i}} (K^{i})} \int_{K^{i}} c_{0, \tilde{μ}}^{i} (x^{i}, a^{i}) η (x^{i}, d a^{i}) \\ subject to \{\begin{matrix} \int_{K^{i}} c_{k, \tilde{μ}}^{i} (x^{i}, a^{i}) η (x^{i}, d a^{i}) \leq d_{k}^{i}, 1 \leq k \leq p, \\ \int_{K^{i}} (Q^{i} (y^{i} | x^{i}, a^{i}) - I_{{y^{i}}} (x^{i})) η (x^{i}, d a^{i}) = 0 for each y^{i} \in X^{i} . \end{matrix} \end{matrix}

(10)

For a fixed

i \in I

and

\tilde{μ} \in N

,

η

is said to be feasible for LP

_{\tilde{μ}}^{i}

if it satisfies (10). The sets of all feasible solutions and optimal solutions are denoted by

F^{i} (\tilde{μ})

and

O^{i} (\tilde{μ})

, respectively. The optimal value of

{LP}_{\tilde{μ}}^{i}

is denoted by

inf {LP}_{\tilde{μ}}^{i}

.
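For intuition, feasibility in (10) can be checked mechanically when the state and action sets are finite. The following is a minimal Python sketch under that assumption, not part of the paper's argument. The function name `is_feasible`, the two-state transition law `Q`, the cost table `power` and the bounds are hypothetical illustration values, and the costs are assumed to be already averaged over the other players' occupation measures, so that they play the role of the functions c_{k, \tilde{μ}}^{i}.

```python
def is_feasible(eta, Q, costs, bounds, tol=1e-9):
    """Check the constraints of (10) on a finite model.

    eta[x][a]   : candidate occupation measure on state-action pairs
    Q[x][a][y]  : transition probability to y from state x under action a
    costs[k][x][a], bounds[k] : constraint costs and their bounds d_k
    """
    states = range(len(eta))
    pairs = [(x, a) for x in states for a in range(len(eta[x]))]
    # eta must be a probability measure.
    if any(eta[x][a] < -tol for x, a in pairs):
        return False
    if abs(sum(eta[x][a] for x, a in pairs) - 1.0) > tol:
        return False
    # Invariance: sum_{x,a} (Q(y|x,a) - 1{x=y}) eta(x,a) = 0 for every y.
    for y in states:
        flow = sum((Q[x][a][y] - (1.0 if x == y else 0.0)) * eta[x][a]
                   for x, a in pairs)
        if abs(flow) > tol:
            return False
    # Cost constraints: sum_{x,a} c_k(x,a) eta(x,a) <= d_k.
    for c, d in zip(costs, bounds):
        if sum(c[x][a] * eta[x][a] for x, a in pairs) > d + tol:
            return False
    return True


# Hypothetical two-state, two-action model (illustration only).
Q = [[[0.5, 0.5], [0.5, 0.5]],
     [[0.2, 0.8], [0.8, 0.2]]]
power = [[0.2, 1.0], [0.2, 1.0]]
# Occupation measure of the policy "action 0 in state 0, action 1 in state 1".
eta = [[8 / 13, 0.0], [0.0, 5 / 13]]

print(is_feasible(eta, Q, [power], [0.6]))  # average power 6.6/13 is below 0.6
print(is_feasible(eta, Q, [power], [0.5]))  # but above 0.5
```

In the denumerable setting of the paper the same three checks apply, with sums replaced by the integrals in (10).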

Proposition 1.

Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let

\tilde{μ} : = ({\tilde{μ}}^{1}, \dots, {\tilde{μ}}^{N}) \in N

and

φ : = (φ^{1}, \dots, φ^{N}) \in Π_{s}

such that

{\tilde{μ}}^{i} (x^{i}, d a^{i}) = {\hat{\tilde{μ}}}^{i} (x^{i}) φ^{i} (d a^{i} | x^{i})

for each

i \in I

. Then,

(1): $φ^{i}$ is an optimal strategy of $M_{\tilde{μ}}^{i}$ for each $i \in I$ if and only if φ is a stationary constrained Nash equilibrium;
(2): ${\tilde{μ}}^{i}$ is an optimal solution of LP $_{\tilde{μ}}^{i}$ for each $i \in I$ if and only if φ is a stationary constrained Nash equilibrium.

Proof.

(1): ⟹. Let $i \in I$ be fixed and $π^{i} \in Δ^{i} (φ)$ . It follows from Lemma 2 that

$V_{k, \tilde{μ}}^{i} (ν^{i}, π^{i}) = V_{k}^{i} (ν, [φ^{- i}, π^{i}]) \leq d_{k}^{i} for each 1 \leq k \leq p .$

Thus, we have $π^{i} \in U^{i} (\tilde{μ})$ . Then, Lemma 2 yields that

$V_{0}^{i} (ν, [φ^{- i}, π^{i}]) = V_{0, \tilde{μ}}^{i} (ν^{i}, π^{i}) \geq V_{0, \tilde{μ}}^{i} (ν^{i}, φ^{i}) = V_{0}^{i} (ν, φ),$

which implies that $φ$ is a constrained Nash equilibrium.
⟸. For each $i \in I$ , we take an arbitrary strategy $π^{i} \in Π_{h}^{i}$ such that $V_{k, \tilde{μ}}^{i} (ν^{i}, π^{i}) \leq d_{k}^{i}$ for each $1 \leq k \leq p$ . Lemma 2 yields that $V_{k}^{i} (ν, [φ^{- i}, π^{i}]) = V_{k, \tilde{μ}}^{i} (ν^{i}, π^{i}) \leq d_{k}^{i}$ for each $1 \leq k \leq p$ , which implies that $π^{i} \in Δ^{i} (φ)$ . As $φ$ is a constrained Nash equilibrium, it follows from Lemma 2 that $V_{0, \tilde{μ}}^{i} (ν^{i}, φ^{i}) = V_{0}^{i} (ν, φ) \leq V_{0}^{i} (ν, [φ^{- i}, π^{i}]) = V_{0, \tilde{μ}}^{i} (ν^{i}, π^{i})$ , which implies that $φ^{i}$ is an optimal strategy of $M_{\tilde{μ}}^{i}$ .
(2): ${\tilde{μ}}^{i}$ is optimal for LP $_{\tilde{μ}}^{i}$ if and only if $φ^{i}$ is optimal for $M_{\tilde{μ}}^{i}$ . Hence, we get the desired result by part (1). □

Lemma 6.

Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let

i \in I

be fixed and

{{\tilde{μ}}_{n}} \subseteq N

such that

{\tilde{μ}}_{n} \overset{w}{\to} {\tilde{μ}}_{\infty}

, and

η_{n} \in F^{i} ({\tilde{μ}}_{n})

for each

n \in N

. Then,

{η_{n}}

is relatively compact in

N^{i}

with respect to the

w_{i}

-weak topology and each accumulation point is a feasible solution of LP

_{{\tilde{μ}}_{\infty}}^{i}

.

Proof.

It follows from Lemma 3 that there exists a subsequence

{η_{n_{m}}}

such that

η_{n_{m}} \overset{w_{i}}{\to} η_{\infty}

with respect to the

w_{i}

-weak topology. Then, Lemma 4 yields that

\int_{K^{i}} c_{k, {\tilde{μ}}_{\infty}}^{i} (x^{i}, a^{i}) η_{\infty} (x^{i}, d a^{i}) = lim_{m \to \infty} \int_{K^{i}} c_{k, {\tilde{μ}}_{n_{m}}}^{i} (x^{i}, a^{i}) η_{n_{m}} (x^{i}, d a^{i}) \leq d_{k}^{i} .

(11)

As

Q^{i} (\cdot | x^{i}, a^{i}) \in C_{w_{i}} (K^{i})

, it follows that

\int_{K^{i}} (Q^{i} (y^{i} | x^{i}, a^{i}) - I_{{y^{i}}} (x^{i})) η_{\infty} (x^{i}, d a^{i}) = lim_{m \to \infty} \int_{K^{i}} (Q^{i} (y^{i} | x^{i}, a^{i}) - I_{{y^{i}}} (x^{i})) η_{n_{m}} (x^{i}, d a^{i}) = 0,

which together with (11) implies that

η_{\infty} \in F^{i} ({\tilde{μ}}_{\infty})

. □

Lemma 7.

Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let

i \in I

be fixed and take an arbitrary

\tilde{μ} \in N

, then LP

_{\tilde{μ}}^{i}

has an optimal solution.

Proof.

First, it follows from Lemma 5(2) that

F^{i} (\tilde{μ}) \neq \emptyset

. Let

{η_{m}} \subseteq F^{i} (\tilde{μ})

be the minimizing sequence such that

lim_{m \to \infty} \sum_{x^{i} \in X^{i}} \int_{A^{i} (x^{i})} c_{0, \tilde{μ}}^{i} (x^{i}, a^{i}) η_{m} (x^{i}, d a^{i}) = inf {LP}_{\tilde{μ}}^{i} .

(12)

By Lemma 6, there exists a subsequence

{η_{m_{s}}}

of

{η_{m}}

such that

η_{m_{s}} \overset{w_{i}}{\to} η_{\infty} \in F^{i} (\tilde{μ})

. Then, it follows from Lemma 4 and (12) that

\sum_{x^{i} \in X^{i}} \int_{A^{i} (x^{i})} c_{0, \tilde{μ}}^{i} (x^{i}, a^{i}) η_{\infty} (x^{i}, d a^{i}) = lim_{s \to \infty} \sum_{x^{i} \in X^{i}} \int_{A^{i} (x^{i})} c_{0, \tilde{μ}}^{i} (x^{i}, a^{i}) η_{m_{s}} (x^{i}, d a^{i}) = inf {LP}_{\tilde{μ}}^{i} .

That is,

η_{\infty} \in O^{i} (\tilde{μ})

. □

The idea of the proof of the following lemma is from Lemma 5.1 and Theorem 3.9 in Reference [21] for constrained discrete-time Markov decision processes with discounted cost criteria.

Lemma 8.

Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let

i \in I

be fixed and

{{\tilde{μ}}_{n}} \subseteq N

such that

{\tilde{μ}}_{n} \overset{w}{\to} {\tilde{μ}}_{\infty}

. Then,

(1): for each $η \in F^{i} ({\tilde{μ}}_{\infty})$ , there exist an integer N and $η_{n} \in F^{i} ({\tilde{μ}}_{n})$ for all $n \geq N$ , such that $η_{n} \overset{w_{i}}{\to} η$ weakly in $P (K^{i})$ ;
(2): if $η_{\infty}^{'}$ is an accumulation point of sequence ${η_{n}^{'}, n \geq 1}$ in which $η_{n}^{'}$ is an optimal solution of LP $_{{\tilde{μ}}_{n}}^{i}$ for each $n \in N$ , then $η_{\infty}^{'} \in O^{i} ({\tilde{μ}}_{\infty}) .$

Proof.

(1): Lemma 5(2) gives the existence of $φ^{i}, {\tilde{φ}}^{i} \in Π_{s}^{i}$ and constant $D > 0$ , such that $η = {\tilde{μ}}_{φ^{i}}^{i}$ and $V_{k, {\tilde{μ}}_{\infty}}^{i} (ν^{i}, {\tilde{φ}}^{i}) \leq d_{k}^{i} - D$ for all $1 \leq k \leq p$ . By Lemma 4, it follows that

$\begin{matrix} lim_{n \to \infty} \int_{K^{i}} c_{k, {\tilde{μ}}_{n}}^{i} (x^{i}, a^{i}) η (x^{i}, d a^{i}) = \int_{K^{i}} c_{k, {\tilde{μ}}_{\infty}}^{i} (x^{i}, a^{i}) η (x^{i}, d a^{i}) \leq d_{k}^{i}, \end{matrix}$

(13)

$\begin{matrix} lim_{n \to \infty} \int_{K^{i}} c_{k, {\tilde{μ}}_{n}}^{i} (x^{i}, a^{i}) {\tilde{μ}}_{{\tilde{φ}}^{i}}^{i} (x^{i}, d a^{i}) = \int_{K^{i}} c_{k, {\tilde{μ}}_{\infty}}^{i} (x^{i}, a^{i}) {\tilde{μ}}_{{\tilde{φ}}^{i}}^{i} (x^{i}, d a^{i}) \leq d_{k}^{i} - D, \end{matrix}$

(14)

for all $1 \leq k \leq p .$
Let $ε$ be such that $0 < ε < \frac{D}{2}$ . By (13)-(14), there exists an integer $N_{ε}$ (depending on $ε$ ) such that, for each $n \geq N_{ε}$ and $1 \leq k \leq p$ ,

$\int_{K^{i}} c_{k, {\tilde{μ}}_{n}}^{i} (x^{i}, a^{i}) η (x^{i}, d a^{i}) \leq d_{k}^{i} + ε, \int_{K^{i}} c_{k, {\tilde{μ}}_{n}}^{i} (x^{i}, a^{i}) {\tilde{μ}}_{{\tilde{φ}}^{i}}^{i} (x^{i}, d a^{i}) \leq d_{k}^{i} - D + ε .$

Let $ν_{n}^{ε} : = (1 - λ_{ε}) η + λ_{ε} {\tilde{μ}}_{{\tilde{φ}}^{i}}^{i},$ where $λ_{ε} : = \frac{ε}{D} < \frac{1}{2}$ . Then, we derive from Lemma 3 that $ν_{n}^{ε} \in N^{i}$ , and

$\begin{matrix} \int_{K^{i}} c_{k, {\tilde{μ}}_{n}}^{i} (x^{i}, a^{i}) ν_{n}^{ε} (x^{i}, d a^{i}) & = & (1 - λ_{ε}) \int_{K^{i}} c_{k, {\tilde{μ}}_{n}}^{i} (x^{i}, a^{i}) η (x^{i}, d a^{i}) + λ_{ε} \int_{K^{i}} c_{k, {\tilde{μ}}_{n}}^{i} (x^{i}, a^{i}) {\tilde{μ}}_{{\tilde{φ}}^{i}}^{i} (x^{i}, d a^{i}) \\ \leq & (1 - λ_{ε}) (d_{k}^{i} + ε) + λ_{ε} [(d_{k}^{i} - D) + ε] \leq d_{k}^{i}, \end{matrix}$

(15)

for each $1 \leq k \leq p$ and $n \geq N_{ε}$ , which implies that $ν_{n}^{ε} \in F^{i} ({\tilde{μ}}_{n})$ for all $n \geq N_{ε}$ .
Then, let ${ε_{s}} \subseteq R$ , such that $ε_{s} ↓ 0$ and $0 < ε_{s} < \frac{D}{2}$ . For each fixed $s \geq 1$ (corresponding to a given $ε_{s}$ ), as in the previous argument, there exists an integer $N_{s}$ (depending on s) which is assumed to be increasing in $s \geq 1$ , such that

$ν_{n}^{s} : = (1 - λ_{s}) η + λ_{s} {\tilde{μ}}_{{\tilde{φ}}^{i}}^{i} \in F^{i} ({\tilde{μ}}_{n}), \forall n \geq N_{s}, s \geq 1, where λ_{s} : = \frac{ε_{s}}{D} .$

(16)

Let

$η_{n} : = ν_{n}^{s} and λ_{n} : = λ_{s}, for each N_{s} \leq n < N_{s + 1} .$

(17)

Since $ε_{s} ↓ 0$ , by (15)–(17), we have $η_{n} \overset{w_{i}}{\to} η$ weakly in $P (K^{i})$ as $n \to \infty$ and $η_{n} \in F^{i} ({\tilde{μ}}_{n})$ for each $n \geq N_{1}$ , which completes the proof of part (1).
(2): By Lemma 6, without loss of generality, we assume that $η_{n}^{'} \overset{w_{i}}{\to} η_{\infty}^{'} \in F^{i} ({\tilde{μ}}_{\infty})$ weakly in $P (K^{i})$ . On the other hand, for any $η_{\infty} \in F^{i} ({\tilde{μ}}_{\infty})$ , it follows from part (1) that there exist an integer N and $η_{n} \in F^{i} ({\tilde{μ}}_{n})$ for all $n \geq N$ , such that $η_{n} \overset{w_{i}}{\to} η_{\infty}$ weakly in $P (K^{i})$ , which together with $η_{n}^{'} \in O^{i} ({\tilde{μ}}_{n})$ , gives

$\int_{K^{i}} c_{0, {\tilde{μ}}_{n}}^{i} (x^{i}, a^{i}) η_{n}^{'} (x^{i}, d a^{i}) \leq \int_{K^{i}} c_{0, {\tilde{μ}}_{n}}^{i} (x^{i}, a^{i}) η_{n} (x^{i}, d a^{i}) \forall n \geq N .$

(18)

Then, by Lemma 4 and (18), we have

$\int_{K^{i}} c_{0, {\tilde{μ}}_{\infty}}^{i} (x^{i}, a^{i}) η_{\infty}^{'} (x^{i}, d a^{i}) \leq \int_{K^{i}} c_{0, {\tilde{μ}}_{\infty}}^{i} (x^{i}, a^{i}) η_{\infty} (x^{i}, d a^{i}),$

which implies $η_{\infty}^{'} \in O^{i} ({\tilde{μ}}_{\infty})$ .
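The last inequality in (15), in the proof of part (1), holds with equality, and the computation is short enough to spell out. Using only the definition $λ_{ε} = ε / D$ from the proof:

```latex
\begin{aligned}
(1-\lambda_{\varepsilon})\,(d_{k}^{i}+\varepsilon)
  +\lambda_{\varepsilon}\,\big[(d_{k}^{i}-D)+\varepsilon\big]
  &= d_{k}^{i}+\varepsilon-\lambda_{\varepsilon}D \\
  &= d_{k}^{i}+\varepsilon-\frac{\varepsilon}{D}\,D
   = d_{k}^{i}.
\end{aligned}
```

In other words, the weight $λ_{ε}$ placed on the Slater measure is exactly what is needed to absorb the slack ε. Since $η_{n} - η = λ_{n} ({\tilde{μ}}_{{\tilde{φ}}^{i}}^{i} - η)$ and $λ_{n} \to 0$, the mixtures converge to η.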

4. The Main Results

In this section, we give the existence and characterization of stationary constrained Nash equilibria. For each

\tilde{μ} = ({\tilde{μ}}^{1}, \dots, {\tilde{μ}}^{N}) \in N

, we introduce the following two multi-functions:

Λ^{i} : N \to 2^{N^{i}} and Ψ : N \to 2^{N} for each i \in I

Λ^{i} (\tilde{μ}) : = {η \in N^{i} | η is an optimal solution of {LP}_{\tilde{μ}}^{i}} and Ψ (\tilde{μ}) : = \times_{i = 1}^{N} Λ^{i} (\tilde{μ}) .

(19)

Theorem 1.

Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Then, there exists a stationary constrained Nash equilibrium.

Proof.

Let

i \in I

be fixed,

{{\tilde{μ}}_{n}} \subseteq N

and

η_{n} \in Λ^{i} ({\tilde{μ}}_{n})

such that

{\tilde{μ}}_{n} \overset{w}{\to} {\tilde{μ}}_{\infty}

and

η_{n} \overset{w_{i}}{\to} η_{\infty}

. Then, by Lemma 8(2), we know that

η_{\infty} \in O^{i} ({\tilde{μ}}_{\infty})

which implies that

Λ^{i}

is upper semi-continuous. By the arbitrariness of i, we can derive that

Ψ

is upper semi-continuous. Moreover, it follows from Lemmas 3 and 7 and (10) that the set

Ψ (\tilde{μ})

is nonempty and convex for each

\tilde{μ} \in N

. Moreover, we can see that

Ψ (\tilde{μ})

is compact by the upper semi-continuity of

Ψ

. Hence, it follows from Fan’s fixed point theorem in Reference [22] that

Ψ

has a fixed point

{\tilde{μ}}^{*}

. By Proposition D.8 in Reference [16], there exists

φ^{*} : = (φ^{* 1}, \dots, φ^{* N}) \in Π_{s}

such that

{\tilde{μ}}^{* i} (x^{i}, d a^{i}) = {\hat{\tilde{μ}}}^{* i} (x^{i}) φ^{* i} (d a^{i} | x^{i})

, which together with Proposition 1, implies that

φ^{*}

is a stationary constrained Nash equilibrium. □
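The last step of the proof recovers a stationary strategy from an occupation measure by disintegration. On a finite model this is a row normalization, as in the following minimal Python sketch. The function name `policy_from_measure` and the sample measure are hypothetical, and the uniform choice on states of zero marginal mass is an assumption made only so that the code is total (under (7) the marginals in the paper are strictly positive).

```python
def policy_from_measure(eta):
    """Disintegrate eta(x, a) = eta_hat(x) * phi(a | x) on a finite model."""
    policy = []
    for row in eta:
        mass = sum(row)  # marginal eta_hat(x)
        if mass > 0:
            policy.append([v / mass for v in row])
        else:
            # phi(. | x) is arbitrary where the marginal vanishes; use uniform.
            policy.append([1.0 / len(row)] * len(row))
    return policy


eta = [[0.2, 0.2], [0.6, 0.0]]
phi = policy_from_measure(eta)
print(phi)
```

Each row of `phi` is a probability vector over actions, which is the finite analogue of the kernel $φ^{* i} (d a^{i} | x^{i})$.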

By the same arguments as in the proofs of Theorems 5.3.1 and 5.4.1 in Reference [18] for discrete-time Markov decision processes, we obtain the following statements.

Lemma 9.

Suppose that Assumptions 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Then, for each given

\tilde{μ}

,

(1): there exists a function $h^{* i} \in B_{w_{i}} (X^{i})$ and a vector $λ^{* i} : = (λ_{1}^{* i}, \dots, λ_{p}^{* i}) \in {[0, \infty)}^{p}$ for each $i \in I$ such that

$\begin{matrix} V_{\tilde{μ}}^{* i} + h^{* i} (x^{i}) & = & inf_{a^{i} \in A^{i} (x^{i})} {c_{0, \tilde{μ}}^{i} (x^{i}, a^{i}) + \sum_{k = 1}^{p} λ_{k}^{* i} (c_{k, \tilde{μ}}^{i} (x^{i}, a^{i}) - d_{k}^{i}) + \sum_{y^{i} \in X^{i}} h^{* i} (y^{i}) Q^{i} (y^{i} | x^{i}, a^{i})}; \end{matrix}$
(2): for each $λ^{i} = (λ_{1}^{i}, \dots, λ_{p}^{i}) \in {[0, \infty)}^{p}$ , there exists $(V_{λ^{i}}^{i}, h^{i}) \in R \times B_{w_{i}} (X^{i})$ such that

$V_{λ^{i}}^{i} + h^{i} (x^{i}) = inf_{a^{i} \in A^{i} (x^{i})} {c_{0, \tilde{μ}}^{i} (x^{i}, a^{i}) + \sum_{k = 1}^{p} λ_{k}^{i} (c_{k, \tilde{μ}}^{i} (x^{i}, a^{i}) - d_{k}^{i}) + \sum_{y^{i} \in X^{i}} h^{i} (y^{i}) Q^{i} (y^{i} | x^{i}, a^{i})},$

and $V_{\tilde{μ}}^{* i} = {sup}_{λ^{i} \in {[0, \infty)}^{p}} V_{λ^{i}}^{i}$ .

Now we introduce the following duality program

{DP}_{\tilde{μ}}^{i}

for each

\tilde{μ} \in N

and

i \in I

:

\begin{matrix} {DP}_{\tilde{μ}}^{i} : & sup_{(h^{i}, λ^{i}) \in B_{w_{i}} (X^{i}) \times {[0, \infty)}^{p}} V_{λ^{i}}^{i} \\ subject to \{\begin{matrix} V_{λ^{i}}^{i} + h^{i} (x^{i}) \leq c_{0, \tilde{μ}}^{i} (x^{i}, a^{i}) + \sum_{k = 1}^{p} λ_{k}^{i} (c_{k, \tilde{μ}}^{i} (x^{i}, a^{i}) - d_{k}^{i}) \\ + \sum_{y^{i} \in X^{i}} h^{i} (y^{i}) Q^{i} (y^{i} | x^{i}, a^{i}) for each (x^{i}, a^{i}) \in (X^{i} \times A^{i}), \\ λ_{k}^{i} \geq 0 for each 1 \leq k \leq p . \end{matrix} \end{matrix}

(20)

Combining

{LP}_{\tilde{μ}}^{i}

and

{DP}_{\tilde{μ}}^{i}

together, we introduce the following mathematical program

(MP)

:

\begin{matrix} MP : & min_{ζ \in R^{N} \times \prod_{i = 1}^{N} (B_{w_{i}} (X^{i}) \times P_{w_{i}} (K^{i})) \times {[0, \infty)}^{N p}} \sum_{i = 1}^{N} [\int_{X \times A} c_{0}^{i} (x, a) \prod_{j = 1}^{N} {\tilde{μ}}^{j} (x^{j}, d a^{j}) - V^{i}] \\ subject to \{\begin{matrix} V^{i} + h^{i} (x^{i}) \leq c_{0, \tilde{μ}}^{i} (x^{i}, a^{i}) + \sum_{k = 1}^{p} λ_{k}^{i} (c_{k, \tilde{μ}}^{i} (x^{i}, a^{i}) - d_{k}^{i}) \\ + \sum_{y^{i} \in X^{i}} h^{i} (y^{i}) Q^{i} (y^{i} | x^{i}, a^{i}) for each (x^{i}, a^{i}) \in (X^{i} \times A^{i}), \\ λ_{k}^{i} \geq 0 for each 1 \leq k \leq p, \\ \sum_{x^{i} \in X^{i}} \int_{A^{i} (x^{i})} c_{k, \tilde{μ}}^{i} (x^{i}, a^{i}) {\tilde{μ}}^{i} (x^{i}, d a^{i}) \leq d_{k}^{i}, 1 \leq k \leq p, \\ \sum_{x^{i} \in X^{i}} \int_{A^{i} (x^{i})} (Q^{i} (y^{i} | x^{i}, a^{i}) - I_{{y^{i}}} (x^{i})) {\tilde{μ}}^{i} (x^{i}, d a^{i}) = 0 for each y^{i} \in X^{i}, \end{matrix} \end{matrix}

(21)

where

ζ : = {(V^{i}, h^{i}, {\tilde{μ}}^{i}, λ^{i})}_{i = 1}^{N} \in R^{N} \times \prod_{i = 1}^{N} (B_{w_{i}} (X^{i}) \times P_{w_{i}} (K^{i})) \times {[0, \infty)}^{N p}

. Let us define the function

Φ (ζ) : = \sum_{i = 1}^{N} [\int_{X \times A} c_{0}^{i} (x, a) \prod_{j = 1}^{N} {\tilde{μ}}^{j} (x^{j}, d a^{j}) - V^{i}]

, for each

ζ = {(V^{i}, h^{i}, {\tilde{μ}}^{i}, λ^{i})}_{i = 1}^{N} \in R^{N} \times \prod_{i = 1}^{N} (B_{w_{i}} (X^{i}) \times P_{w_{i}} (K^{i})) \times {[0, \infty)}^{N p}

.

In Reference [7], the authors characterize each stationary constrained Nash equilibrium in finite-state games as a global minimum of a certain mathematical program. The following theorem generalizes this result to denumerable-state games.

Theorem 2.

Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Then,

(1): there exists $ζ^{*} = {(V^{* i}, h^{* i}, {\tilde{μ}}^{* i}, λ^{* i})}_{i = 1}^{N} \in R^{N} \times \prod_{i = 1}^{N} (B_{w_{i}} (X^{i}) \times P_{w_{i}} (K^{i})) \times {[0, \infty)}^{N p}$ such that it is a global minimum of the mathematical program MP;
(2): suppose that $ζ = {(V^{i}, h^{i}, {\tilde{μ}}^{i}, λ^{i})}_{i = 1}^{N} \in R^{N} \times \prod_{i = 1}^{N} (B_{w_{i}} (X^{i}) \times P_{w_{i}} (K^{i})) \times {[0, \infty)}^{N p}$ is a global minimum of the mathematical program MP with $Φ (ζ) = 0$ . Then, there exists $φ^{* i} \in Π_{s}^{i}$ such that ${\tilde{μ}}^{i} (x^{i}, d a^{i}) = {\hat{\tilde{μ}}}^{i} (x^{i}) φ^{* i} (d a^{i} | x^{i})$ and $φ^{*} : = (φ^{* 1}, \dots, φ^{* N})$ is a constrained Nash equilibrium of the model $G$ .

Proof.

(1): Let $φ^{*} : = (φ^{* 1}, \dots, φ^{* N})$ be a constrained Nash equilibrium and let ${\tilde{μ}}^{*} = ({\tilde{μ}}_{φ^{* 1}}^{1}, \dots, {\tilde{μ}}_{φ^{* N}}^{N}) \in \times_{i = 1}^{N} N^{i}$ be the average occupation measure corresponding to $φ^{*}$ . By Proposition 1, ${\tilde{μ}}_{φ^{* i}}^{i}$ satisfies (10) if $\tilde{μ}$ is replaced by ${\tilde{μ}}^{*}$ . Theorem 5.3.1 in Reference [18] and Theorem 10.3.6 in Reference [17] yield that there exists $(V^{* i}, h^{* i}, λ^{* i}) \in R \times B_{w_{i}} (X^{i}) \times {[0, \infty)}^{p}$ which satisfies (20) by replacing $\tilde{μ}$ with ${\tilde{μ}}^{*}$ and

$\sum_{x^{i} \in X^{i}} \int_{A^{i} (x^{i})} c_{0, {\tilde{μ}}^{*}}^{i} (x^{i}, a^{i}) {\tilde{μ}}^{* i} (x^{i}, d a^{i}) = V^{* i} = V_{{\tilde{μ}}^{*}}^{* i}$

for each $i \in I$ . Letting $ζ^{*} : = {(V^{* i}, h^{* i}, {\tilde{μ}}^{* i}, λ^{* i})}_{i = 1}^{N}$ , we have $Φ (ζ^{*}) = 0$ . In turn, for any feasible point $ζ = {(V_{λ^{i}}^{i}, h^{i}, {\tilde{μ}}^{i}, λ^{i})}_{i = 1}^{N} \in R^{N} \times \prod_{i = 1}^{N} (B_{w_{i}} (X^{i}) \times P_{w_{i}} (K^{i})) \times {[0, \infty)}^{N p}$ of $MP$ , integrating the first constraint in (21) with respect to the product of the measures ${\tilde{μ}}^{j}$ and then using the remaining constraints, we get that

$\begin{matrix} V_{λ^{i}}^{i} & \leq & \int_{X \times A} c_{0}^{i} (x, a) \prod_{j = 1}^{N} {\tilde{μ}}^{j} (x^{j}, d a^{j}) + \sum_{k = 1}^{p} λ_{k}^{i} \int_{X \times A} (c_{k}^{i} (x, a) - d_{k}^{i}) \prod_{j = 1}^{N} {\tilde{μ}}^{j} (x^{j}, d a^{j}) \\ + \int_{K^{i}} \sum_{y^{i} \in X^{i}} h^{i} (y^{i}) Q^{i} (y^{i} | x^{i}, a^{i}) {\tilde{μ}}^{i} (x^{i}, d a^{i}) - \int_{K^{i}} h^{i} (x^{i}) {\tilde{μ}}^{i} (x^{i}, d a^{i}) \\ \leq & \int_{X \times A} c_{0}^{i} (x, a) \prod_{j = 1}^{N} {\tilde{μ}}^{j} (x^{j}, d a^{j}), \end{matrix}$

(22)

which implies $Φ (ζ) \geq 0$ .
(2): Let us take an arbitrary global minimum $ζ = {(V_{λ^{i}}^{i}, h^{i}, {\tilde{μ}}^{i}, λ^{i})}_{i = 1}^{N} \in R^{N} \times \prod_{i = 1}^{N} (B_{w_{i}} (X^{i}) \times P_{w_{i}} (K^{i})) \times {[0, \infty)}^{N p}$ of MP with $Φ (ζ) = 0$ . Since $ζ$ is a feasible solution of MP, by (22), we have

$V_{λ^{i}}^{i} \leq \int_{X \times A} c_{0}^{i} (x, a) \prod_{j = 1}^{N} {\tilde{μ}}^{j} (x^{j}, d a^{j})$

and ${\tilde{μ}}^{i}$ satisfies (10). Hence, by $Φ (ζ) = 0$ , it follows that $V_{λ^{i}}^{i} = \int_{X \times A} c_{0}^{i} (x, a) \prod_{j = 1}^{N} {\tilde{μ}}^{j} (x^{j}, d a^{j})$ for each $i \in I$ .

Then, let

\tilde{μ} : = ({\tilde{μ}}^{1}, \dots, {\tilde{μ}}^{N})

and take an arbitrary feasible solution

{\tilde{μ}}^{' i}

of

{LP}_{\tilde{μ}}^{i}

. Then, we have

\int_{X \times A} (c_{k}^{i} (x, a) - d_{k}^{i}) {\tilde{μ}}^{' i} (x^{i}, d a^{i}) \prod_{j = 1, j \neq i}^{N} {\tilde{μ}}^{j} (x^{j}, d a^{j}) \leq 0, for each 1 \leq k \leq p .

Then, proceeding as in the proof of (22), it follows that

\begin{matrix} \int_{X \times A} c_{0}^{i} (x, a) \prod_{j = 1}^{N} {\tilde{μ}}^{j} (x^{j}, d a^{j}) & = & V_{λ^{i}}^{i} \leq \int_{X \times A} c_{0}^{i} (x, a) {\tilde{μ}}^{' i} (x^{i}, d a^{i}) \prod_{j = 1, j \neq i}^{N} {\tilde{μ}}^{j} (x^{j}, d a^{j}), \end{matrix}

which implies

{\tilde{μ}}^{i}

is

{LP}_{\tilde{μ}}^{i}

optimal for each

i \in I

. Then, Proposition D.8 in Reference [16] yields that there exists a stationary strategy

φ^{* i} \in Π_{s}^{i}

such that

{\tilde{μ}}^{i} (x^{i}, d a^{i}) = {\hat{\tilde{μ}}}^{i} (x^{i}) φ^{* i} (d a^{i} | x^{i})

for each

i \in I

. By Proposition 1,

φ^{*} : = (φ^{* 1}, \dots, φ^{* N})

is a stationary constrained Nash equilibrium of the model

G

.

□
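As a sanity check on Theorem 2, consider the degenerate special case of two players, each with a single state and no constraints (p = 0). This case is an assumption made here for illustration and is not treated in the paper. The functions $h^{i}$ and the invariance constraint then drop out of (21), the first constraint reduces to $V^{i} \leq c_{0, \tilde{μ}}^{i} (a^{i})$ for every own action, and the best feasible $V^{i}$ is the minimum of the right-hand side. The Python sketch below evaluates the resulting smallest value of Φ for given mixed actions; the function name `best_gap` and the matching-pennies cost matrices are hypothetical.

```python
def best_gap(C1, C2, x, y):
    """Smallest value of Phi over feasible (V^1, V^2) for mixed actions x, y.

    C1[a][b], C2[a][b] are the costs of players 1 and 2 when player 1
    plays a and player 2 plays b (single state, no constraints).
    """
    rows, cols = range(len(x)), range(len(y))
    cost1 = sum(x[a] * y[b] * C1[a][b] for a in rows for b in cols)
    cost2 = sum(x[a] * y[b] * C2[a][b] for a in rows for b in cols)
    # Largest V^i allowed by V^i <= c^i(a^i) for every own action a^i.
    v1 = min(sum(y[b] * C1[a][b] for b in cols) for a in rows)
    v2 = min(sum(x[a] * C2[a][b] for a in rows) for b in cols)
    return (cost1 - v1) + (cost2 - v2)


# Matching pennies written as a cost game (illustration only).
C1 = [[1.0, -1.0], [-1.0, 1.0]]
C2 = [[-1.0, 1.0], [1.0, -1.0]]
print(best_gap(C1, C2, [0.5, 0.5], [0.5, 0.5]))  # equilibrium: 0.0
print(best_gap(C1, C2, [1.0, 0.0], [1.0, 0.0]))  # not an equilibrium: 2.0
```

The gap is nonnegative by construction and vanishes exactly when neither player can lower her own cost by deviating, which mirrors the roles of (22) and of the condition $Φ (ζ) = 0$ in the theorem.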

5. An Example

In this section, we use a wireless network to illustrate our conditions and main results.

Example 1.

Consider a wireless network in which there are N nodes and each node contains a mobile, a buffer and a channel. Let

x_{t}^{i} \in X^{i} : = {0, 1, \dots}

denote the number of packets in the buffer of node i and we assume that the new packets are not admitted if the buffer is not empty. At time t, if

x_{t}^{i} > 0

, each mobile transmits a packet with power

a_{t}^{i} \in A^{i} : = [δ, 1]

, where

0 < δ < 1

. We assume each mobile will retransmit the packet at time

t + 1

if the packet has not been transmitted successfully. When

x_{t}^{i} = 0

, the number of the new arrivals at time t is denoted by

z_{t}

, which is assumed to be identically distributed. The number of packets in the buffer i and the action

a_{t}^{i}

are only available to mobile i itself. Hence, the transition probabilities are defined as follows: if

x^{i} \geq 1

\begin{matrix} Q^{i} (x^{i} - 1 | x^{i}, a^{i}) & = & a^{i} \\ Q^{i} (x^{i} | x^{i}, a^{i}) & = & 1 - a^{i}, \end{matrix}

and

Q^{i} (x^{i} | 0, a^{i}) = q (x^{i})

, where

\sum_{k = 0}^{\infty} q (k) = 1

with

q (k) > 0

for each

k \in X^{i}

. Let

ν^{i}

denote the initial distribution of the buffer i,

p_{0}

be some base value of the power,

c_{1}^{i} (x^{i}, a^{i}) : = p_{0} a^{i}

be the cost of power for node i and

c_{2}^{i} (x^{i}, a^{i}) : = x^{i}

denote the delay cost for node i. As in Reference [12], we assume that the signal to interference ratio of mobile i denoted by

S I R_{i}

is given by

\begin{matrix} {SIR}_{i} ((x^{1}, x^{2}, \dots, x^{N}), (a^{1}, a^{2}, \dots, a^{N})) & : = & \frac{g_{i} a^{i}}{N_{0} + \sum_{j \neq i, x^{j} \geq 0} r_{i j} g_{j} a^{j}} if x^{i} > 0 \\ = & 0, otherwise, \end{matrix}

where

r_{i j}

are the coding orthogonality coefficients and

N_{0}

is the thermal noise in the medium,

g_{i} > 0

are some constants. Let

c_{0}^{i} (x^{1}, \dots, x^{N}, a^{1}, \dots, a^{N}) : = - {log}_{2} (1 + {SIR}_{i} ((x^{1}, x^{2}, \dots, x^{N}), (a^{1}, a^{2}, \dots, a^{N})))

denote the cost that each mobile wants to minimize, and

d_{1}^{i}, d_{2}^{i}

denote the constraints of node i corresponding to

c_{1}^{i}

and

c_{2}^{i}

.
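The cost structure of Example 1 can be made concrete with a short Python sketch. The function names and the numerical values of $g_{i}$, $r_{i j}$ and $N_{0}$ below are hypothetical. Two modelling readings are assumptions of this sketch rather than statements of the paper: the ratio is taken to be zero when the buffer of mobile i is empty, and only mobiles with a non-empty buffer are counted as interferers.

```python
import math


def sir(i, x, a, g, r, n0):
    """Signal to interference ratio of mobile i (0-based index)."""
    if x[i] <= 0:
        return 0.0  # assumption: an empty buffer means no transmission
    interference = sum(r[i][j] * g[j] * a[j]
                       for j in range(len(x))
                       if j != i and x[j] > 0)  # assumption: only busy nodes interfere
    return g[i] * a[i] / (n0 + interference)


def throughput_cost(i, x, a, g, r, n0):
    """c_0^i = -log2(1 + SIR_i), the cost each mobile minimizes."""
    return -math.log2(1.0 + sir(i, x, a, g, r, n0))


# Hypothetical two-node network with unit gains and unit noise.
g = [1.0, 1.0]
r = [[0.0, 1.0], [1.0, 0.0]]
print(throughput_cost(0, [1, 1], [1.0, 1.0], g, r, n0=1.0))  # both nodes busy
print(throughput_cost(0, [1, 0], [1.0, 1.0], g, r, n0=1.0))  # node 1 idle
```

The cost of mobile 0 depends on the action of mobile 1 only through the interference term, while each buffer evolves independently, which is the coupling structure studied in this paper.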

Condition 1.

(1): $\sum_{k = 1}^{\infty} k^{2} q (k) < \infty .$
(2): There exists an interval $[0, \bar{s}]$ such that $m (s) : = \sum_{k = 0}^{\infty} e^{2 s k} q (k) < \infty$ for each $s \in [0, \bar{s}]$ .
(3): There exists $\hat{s} \in (0, \bar{s})$ such that $\sum_{x^{i} \in X^{i}} e^{2 \hat{s} x^{i}} ν^{i} (x^{i}) < \infty$ for each i.
(4): There exists an action $a^{'} \in [δ, 1]$ such that $p_{0} a^{'} < d_{1}^{i}$ and $\frac{1}{2} \frac{\sum_{k = 1}^{\infty} k (1 + k) q (k)}{a^{'} + \sum_{k = 1}^{\infty} k q (k)} < d_{2}^{i}$ for each i.

Proposition 2.

Under Condition 1, Example 1 satisfies Assumptions 1, 2, 3, 5, 6 and 7. Hence, by Theorem 1, there exists a stationary constrained Nash equilibrium.

Proof.

For each i and

φ^{i} \in Π_{s}^{i}

, let

μ_{φ^{i}}^{i}

denote the invariant measure and

φ^{i} (x^{i}) : = \int_{δ}^{1} a^{i} φ^{i} (d a^{i} | x^{i})

, which satisfies

\begin{matrix} μ_{φ^{i}}^{i} (y^{i}) & = & \sum_{x^{i} = 0}^{\infty} Q^{i} (y^{i} | x^{i}, φ^{i}) μ_{φ^{i}}^{i} (x^{i}) \\ = & [1 - φ^{i} (y^{i})] μ_{φ^{i}}^{i} (y^{i}) + φ^{i} (y^{i} + 1) μ_{φ^{i}}^{i} (y^{i} + 1) + q (y^{i}) μ_{φ^{i}}^{i} (0) . \end{matrix}

Hence, it follows from Condition 1(1) that

\begin{matrix} φ^{i} (y^{i}) μ_{φ^{i}}^{i} (y^{i}) & = & μ_{φ^{i}}^{i} (0) q (y^{i}) + μ_{φ^{i}}^{i} (0) q (y^{i} + 1) + φ^{i} (y^{i} + 2) μ_{φ^{i}}^{i} (y^{i} + 2) \\ = & μ_{φ^{i}}^{i} (0) \sum_{k = 0}^{\infty} q (y^{i} + k), \end{matrix}

which implies that

μ_{φ^{i}}^{i} (y^{i}) = μ_{φ^{i}}^{i} (0) \sum_{k = 0}^{\infty} \frac{q (y^{i} + k)}{φ^{i} (y^{i})}, and μ_{φ^{i}}^{i} (0) = \frac{1}{1 + \sum_{x^{i} = 1}^{\infty} \sum_{y^{i} = 1}^{x^{i}} \frac{q (x^{i})}{φ^{i} (y^{i})}} .

(23)
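Formula (23) can be checked numerically. The Python sketch below evaluates it for an arrival law with finite support and verifies the balance equations of the buffer chain. The finite support is an assumption made purely for computation (the example itself requires $q (k) > 0$ for every k), and the function names, the law `q` and the mean powers `phi` are hypothetical.

```python
def invariant_measure(q, phi):
    """Invariant measure from (23) for an arrival law q supported on {0..K}.

    q[k]   : probability of k arrivals when the buffer is empty
    phi[y] : mean transmission power in state y >= 1 (phi[0] is unused)
    """
    K = len(q) - 1
    tail = [sum(q[y:]) for y in range(K + 1)]  # sum_{k>=0} q(y+k)
    norm = 1.0 + sum(q[x] * sum(1.0 / phi[y] for y in range(1, x + 1))
                     for x in range(1, K + 1))
    mu0 = 1.0 / norm
    return [mu0] + [mu0 * tail[y] / phi[y] for y in range(1, K + 1)]


def balance_residual(q, phi, mu):
    """Largest violation of mu = mu Q over all states."""
    K = len(q) - 1
    worst = 0.0
    for y in range(K + 1):
        inflow = q[y] * mu[0]                  # arrivals from the empty buffer
        if y >= 1:
            inflow += (1.0 - phi[y]) * mu[y]   # no departure, stay at y
        if y + 1 <= K:
            inflow += phi[y + 1] * mu[y + 1]   # departure from y + 1
        worst = max(worst, abs(inflow - mu[y]))
    return worst


q = [0.4, 0.3, 0.2, 0.1]
phi = [None, 0.5, 0.8, 1.0]
mu = invariant_measure(q, phi)
print(sum(mu), balance_residual(q, phi, mu))
```

Up to rounding, the total mass is one and the residual is zero, as (23) predicts.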

Let us define

β_{1} : = 1 - δ (1 - \frac{1}{e^{\hat{s}}})

, the functions

l (x^{i}) : = I_{0} (x^{i})

and

w_{i} (x^{i}) : = e^{\hat{s} x^{i}}

and

μ (x^{i}) : = q (x^{i})

for each

x^{i} \in X^{i}

. It is obvious that

\begin{matrix} Q^{i} (x^{i} - 1 | x^{i}, φ^{i}) & = & φ^{i} (x^{i}) \geq I_{0} (x^{i}) μ (x^{i} - 1), \\ Q^{i} (x^{i} | x^{i}, φ^{i}) & = & 1 - φ^{i} (x^{i}) \geq I_{0} (x^{i}) μ (x^{i}), \\ Q^{i} (y^{i} | 0, φ^{i}) & = & q (y^{i}) \geq I_{0} (0) q (y^{i}) . \end{matrix}

If

x^{i} \geq 1

,

\begin{matrix} \sum_{y^{i} \in X^{i}} w_{i} (y^{i}) Q^{i} (y^{i} | x^{i}, a^{i}) & = & e^{\hat{s} (x^{i} - 1)} a^{i} + e^{\hat{s} x^{i}} (1 - a^{i}) \\ = & e^{\hat{s} x^{i}} (1 - a^{i} + \frac{a^{i}}{e^{\hat{s}}}) \\ \leq & e^{\hat{s} x^{i}} β_{1}, \end{matrix}

and

\begin{matrix} \sum_{y^{i} \in X^{i}} w_{i} (y^{i}) Q^{i} (y^{i} | 0, a^{i}) & = & \sum_{y^{i} \in X^{i}} e^{\hat{s} y^{i}} q (y^{i}) . \end{matrix}

Thus, by Condition 1(2), it follows that

\sum_{y^{i} \in X^{i}} w_{i} (y^{i}) Q^{i} (y^{i} | x^{i}, a^{i}) \leq β_{1} w_{i} (x^{i}) + l (0) \sum_{y^{i} \in X^{i}} e^{\hat{s} y^{i}} q (y^{i}) .

Hence, the model satisfies Assumption 6(3). Let

β_{2} : = 1 - δ (1 - \frac{1}{e^{2 \hat{s}}})

and

b_{2} = \sum_{y^{i} \in X^{i}} e^{2 \hat{s} y^{i}} q (y^{i})

. Similarly, if

x^{i} > 0

,

\begin{matrix} \sum_{y^{i} \in X^{i}} w_{i}^{2} (y^{i}) Q^{i} (y^{i} | x^{i}, a^{i}) & = & e^{2 \hat{s} (x^{i} - 1)} a^{i} + e^{2 \hat{s} x^{i}} (1 - a^{i}) \\ = & e^{2 \hat{s} x^{i}} (1 - a^{i} + \frac{a^{i}}{e^{2 \hat{s}}}) \\ \leq & e^{2 \hat{s} x^{i}} β_{2}, \end{matrix}

if

x^{i} = 0

,

\begin{matrix} \sum_{y^{i} \in X^{i}} w_{i}^{2} (y^{i}) Q^{i} (y^{i} | 0, a^{i}) & = & \sum_{y^{i} \in X^{i}} e^{2 \hat{s} y^{i}} q (y^{i}) . \end{matrix}

Therefore, we have

\begin{matrix} \sum_{y^{i} \in X^{i}} w_{i}^{2} (y^{i}) Q (y^{i} | x^{i}, a^{i}) & \leq & β_{2} w_{i}^{2} (x^{i}) + b_{2} l (0), \end{matrix}

for each

x^{i} \geq 0

, which together with Condition 1(3) and Proposition 10.2.5 in Reference [17], implies Assumptions 2 and 5 with

τ = 2

. Let

φ^{' i} (a^{'} | x^{i}) : = 1

for each i. Then, for each

φ \in Π_{s}

, it follows from (23) and Condition 1(1) that

\begin{matrix} V_{2}^{i} (ν, [φ^{- i}, φ^{' i}]) & : = & \underset{n \to \infty}{lim sup} \frac{1}{n} {\tilde{E}}_{ν}^{[φ^{- i}, φ^{' i}]} [\sum_{t = 0}^{n - 1} c_{2}^{i} (x_{t}^{i}, a_{t}^{i})] \\ = & \sum_{y^{i} = 1}^{\infty} y^{i} μ_{φ^{' i}}^{i} (y^{i}) \\ = & μ_{φ^{' i}}^{i} (0) \sum_{y^{i} = 1}^{\infty} \sum_{k = 0}^{\infty} y^{i} \frac{q (y^{i} + k)}{φ^{' i} (y^{i})} \\ = & μ_{φ^{' i}}^{i} (0) \sum_{x^{i} = 1}^{\infty} q (x^{i}) \sum_{y^{i} = 1}^{x^{i}} \frac{y^{i}}{φ^{' i} (y^{i})} \\ = & μ_{φ^{' i}}^{i} (0) \sum_{x^{i} = 1}^{\infty} q (x^{i}) \frac{x^{i} (1 + x^{i})}{2 a^{'}} \\ = & \frac{1}{2} \frac{\sum_{k = 1}^{\infty} k (1 + k) q (k)}{a^{'} + \sum_{k = 1}^{\infty} k q (k)} < d_{2}^{i}, \end{matrix}

which together with Condition 1(4) implies Assumption 7. It is obvious that Assumptions 3 and 6(1)–(2) hold for this model. Thus, by Theorem 1, there exists a stationary constrained Nash equilibrium. □
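The closed form for the average delay under a constant power $a^{'}$, which appears in Condition 1(4), can be compared with a direct evaluation of $\sum_{y^{i}} y^{i} μ_{φ^{' i}}^{i} (y^{i})$ based on (23). The Python sketch below does this for an arrival law with finite support. The truncation is an assumption made only for computation, and the function names and numerical values are hypothetical.

```python
def mean_delay_direct(q, a):
    """Stationary mean buffer length under constant power a, via (23)."""
    K = len(q) - 1
    mean_arrivals = sum(k * q[k] for k in range(K + 1))
    mu0 = 1.0 / (1.0 + mean_arrivals / a)
    # mu(y) = mu0 * sum_{k>=0} q(y+k) / a for y >= 1.
    return sum(y * mu0 * sum(q[y:]) / a for y in range(1, K + 1))


def mean_delay_closed_form(q, a):
    """Left-hand side of the delay inequality in Condition 1(4)."""
    K = len(q) - 1
    num = sum(k * (1 + k) * q[k] for k in range(1, K + 1))
    den = a + sum(k * q[k] for k in range(1, K + 1))
    return 0.5 * num / den


q = [0.4, 0.3, 0.2, 0.1]
a_prime = 0.5
print(mean_delay_direct(q, a_prime), mean_delay_closed_form(q, a_prime))
```

For these values both expressions equal 1 up to rounding, so the Slater condition for the delay constraint would hold for any bound $d_{2}^{i} > 1$.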

6. Conclusions

In the present paper, we have studied discrete-time constrained stochastic games with denumerable state spaces under the average cost criterion. By introducing the average occupation measures, we have established the best-response linear program and characterized the average occupation measures of stationary Nash equilibria as fixed points of a certain multifunction. By introducing the so-called w-weak convergence topology, we have considered the asymptotic properties of average occupation measures and given a new existence condition for stationary constrained Nash equilibria. Furthermore, we have established a mathematical program and shown that each stationary Nash equilibrium is a global minimizer of this mathematical program. It should be mentioned that the arguments in References [6,7] employ the compactness of the space of occupation measures with respect to the standard weak convergence topology and the solvability of the best response linear program, which cannot be applied directly to the case of unbounded costs and a denumerable state space. However, in some controlled stochastic models the costs are unbounded and the state space is not finite. Moreover, the fixed point method in the present paper is also different from the vanishing discount method in Reference [9], which is based on the existence of constrained discounted Nash equilibria. Finally, a wireless network has been presented to illustrate the applications of our main results.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 11801080).

Conflicts of Interest

The author declares no conflict of interest.

References

1. Shapley, L.S. Stochastic games. Proc. Natl. Acad. Sci. USA 1953, 39, 1095–1100.
2. Alvarez-Mena, J.; Hernández-Lerma, O. Existence of Nash equilibria for constrained stochastic games. Math. Methods Oper. Res. 2006, 63, 261–285.
3. Filar, J.; Vrieze, K. Competitive Markov Decision Processes; Springer: New York, NY, USA, 1997.
4. Nowak, A.S. On stochastic games in economics. Math. Methods Oper. Res. 2007, 66, 513–530.
5. Yang, J.; Guo, X.P. Zero-sum stochastic games with average payoffs: New optimality conditions. Acta Math. Sin. Engl. Ser. 2009, 25, 1201–1216.
6. Altman, E.; Avrachenkov, K.; Bonneau, N.; Debbah, M.; El-Azouzi, R.; Sadoc Menasche, D. Constrained cost-coupled stochastic games with independent state processes. Oper. Res. Lett. 2008, 36, 160–164.
7. Singh, V.V.; Hemachandra, N. A characterization of stationary Nash equilibria of constrained stochastic games with independent state processes. Oper. Res. Lett. 2014, 42, 48–52.
8. Altman, E.; Shwartz, A. Constrained Markov Games: Nash Equilibria. In Annals of the International Society of Dynamic Games: Advances in Dynamic Games and Applications; Filar, J.A., Gaitsgory, V., Mizukami, K., Eds.; Birkhäuser: Boston, MA, USA, 2000; Volume 5, pp. 213–221.
9. Wei, Q.D.; Chen, X. Constrained stochastic games with the average payoff criteria. Oper. Res. Lett. 2015, 43, 83–88.
10. Zhang, W.Z.; Huang, Y.H.; Guo, X.P. Nonzero-sum constrained discrete-time Markov games: The case of unbounded costs. TOP 2014, 22, 1074–1102.
11. Niyato, D.; Wang, P.; Kim, D.I.; Han, Z.; Xiao, L. Game theoretic modeling of jamming attack in wireless powered communication networks. In Proceedings of the 2015 IEEE International Conference on Communications (ICC), London, UK, 8–12 June 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 6018–6023.
12. Altman, E.; Avratchenkov, K.; Bonneau, N.; Debbah, M.; El-Azouzi, R.; Menasché, D.S. Constrained stochastic games in wireless networks. In Proceedings of the IEEE GLOBECOM 2007—IEEE Global Telecommunications Conference, Washington, DC, USA, 26–30 November 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 315–320.
13. Ko, H.; Pack, S. Neighbor-aware energy-efficient monitoring system for energy harvesting internet of things. IEEE Internet Things J. 2019, 6, 5745–5752.
14. Puterman, M.L. Markov Decision Processes: Discrete Stochastic Dynamic Programming; Wiley: New York, NY, USA, 1994.
15. Sennott, L.I. Stochastic Dynamic Programming and the Control of Queueing Systems; John Wiley & Sons: Hoboken, NJ, USA, 2009.
16. Hernández-Lerma, O.; Lasserre, J.B. Discrete-Time Markov Control Processes: Basic Optimality Criteria; Springer: New York, NY, USA, 1996.
17. Hernández-Lerma, O.; Lasserre, J.B. Further Topics on Discrete-Time Markov Control Processes; Springer: New York, NY, USA, 1999.
18. Mendoza-Pérez, A. Pathwise Average Reward Markov Control Processes. Ph.D. Thesis, CINVESTAV-IPN, Mexico City, Mexico, 2008. Available online: http://www.math.cinvestav.mx/ohernand_students (accessed on 22 October 2019).
19. Nowak, A.S. Remarks on sensitive equilibria in stochastic games with additive reward and transition structure. Math. Methods Oper. Res. 2006, 64, 481–494.
20. Föllmer, H.; Schied, A. Stochastic Finance: An Introduction in Discrete Time; Walter de Gruyter: Berlin, Germany, 2004.
21. Alvarez-Mena, J.; Hernández-Lerma, O. Convergence of the optimal values of constrained Markov control processes. Math. Methods Oper. Res. 2002, 55, 461–484.
22. Fan, K. Fixed-point and minimax theorems in locally convex topological linear spaces. Proc. Natl. Acad. Sci. USA 1952, 38, 121–126.

© 2019 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

