Next Article in Journal
Energy-Based Control and LMI-Based Control for a Quadrotor Transporting a Payload
Previous Article in Journal
Approximations of Fixed Points in the Hadamard Metric Space CATp(0)
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Discrete-Time Constrained Average Stochastic Games with Independent State Processes

College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350108, China
Mathematics 2019, 7(11), 1089; https://doi.org/10.3390/math7111089
Submission received: 6 August 2019 / Revised: 30 October 2019 / Accepted: 6 November 2019 / Published: 11 November 2019
(This article belongs to the Section Mathematics and Computer Science)

Abstract

:
In this paper, we consider the discrete-time constrained average stochastic games with independent state processes. The state space of each player is denumerable and one-stage cost functions can be unbounded. In these game models, each player chooses an action each time which influences the transition probability of a Markov chain controlled only by this player. Moreover, each player needs to pay some costs which depend on the actions of all the players. First, we give an existence condition of stationary constrained Nash equilibria based on the technique of average occupation measures and the best response linear program. Then, combining the best response linear program and duality program, we present a non-convex mathematic program and prove that each stationary Nash equilibrium is a global minimizer of this mathematic program. Finally, a controlled wireless network is presented to illustrate our main results.

1. Introduction

Stochastic games introduced by Shapley in Reference [1] which have been actively pursued over the last few decades because of several applications mainly in economics and queueing system; see, for instance, References [2,3,4,5]. In this paper, we study the special constrained stochastic games with independent state processes. More precisely, each player chooses an action and pays some costs which depend on the actions of all players in each stage. However, each player’s action only influence the transition probability of the Markov chain controlled by herself. In such case, the player would wish to minimize certain expected average cost which she is most concerned about, while wants to keep other kinds of expected average costs within bounds. The game models under these formulation have been considered in References [6,7]. In Reference [6], the authors present an existence condition of stationary Nash equilibria for the constrained average games. Reference [7] derives a necessary and sufficient condition for a stationary Nash equilibria to be a global minimizer of a mathematic program.
Different from the framework of game model considered in this paper, References [2,8,9] study the so-called centralized stochastic games in which all players jointly control a single Markov chain and the one-stage costs of each player are influenced by the actions of all players. Reference [8] considers the game model with expected discounted cost criteria and expected average cost criteria and yields the existence of stationary Nash equilibria in the context of finite state space and compact action spaces. Using vanishing discount approach, Reference [9] generalizes the existence result in Reference [8] to the average games with denumerable states. Reference [2] extends the existence result in Reference [8] to the discounted game with denumerable states but under some special hypothesis on the transition laws and boundness condition imposed on the one-stage costs. By special finite-state approximation approach, Reference [10] weakens the above existence condition in Reference [8]. For applications of constrained games, Reference [11] studies the wireless powered networks including a user and an attacker. Based on the constrained game theory in Reference [6], Reference [11] gives the energy request and data transmission strategy of the user and the attack strategy of the attacker in the sense of Nash equilibria. Reference [12] considers the wireless networks in which each mobil wants to maximize its expected capacity subject to some power and buffer length constraints. Reference [13] introduces a neighbor-aware energy-efficient monitoring system for energy harvesting Internet of Things. A constrained stochastic game model is established to minimize the number of transmissions while keeping a desired monitoring probability and a best response dynamics-based algorithm is developed in Reference [13].
In this paper, we consider the constrained average games with denumerable states and unbounded costs. The main contributions of the present paper are as follows. First, by introducing the average occupation measures, we establish the so-called best-response linear program to characterize the occupation measures of stationary constrained Nash equilibria as fixed points of certain multifunction from the product space of occupation measures into itself. The standard weak convergence technique used in References [6,7] can not apply directly to the case wherein the costs are unbounded. Meanwhile, the arguments in References [6,7] employ the finiteness of the state spaces. However, the costs are always unbounded and the state spaces are not finite in some controlled stochastic models such as the stochastic inventory model, queueing system; see, for instance References [14,15]. Therefore, we introduce the so-called w-weak convergence topology and impose the standard drift condition on the transition kernel, the growth condition and additive condition on the costs, w-uniform geometrical ergodicity condition and continuity-compactness conditions. By the properties of w-weak convergence, we study the asymptotic properties of average occupation measures and expected average costs, which are used to establish the upper-continuity of the multifunction. Then, we show the existence of stationary constrained Nash equilibria by the fixed-point theorem. It should be mentioned that the vanishing discount approach in Reference [9] is based on the existence of discounted Nash equilibria in Reference [10], that is to show the limit of discounted Nash equilibria is the average Nash equilibrium when the discount factors tend to one. However, in this paper, we use fixed-point method directly because the existence of discounted Nash equilibria for stochastic games with independent state process has not been established. Finally, we characterize each stationary constrained Nash equilibrium as a global minimizer of this mathematical program which can be viewed as the combination of the best response linear program and duality program.
The rest of the paper is organized as follows. In Section 2, we introduce the constrained average game model we are concerned with. In Section 3, we introduce the best response linear program and study the corresponding convergent property of the best response linear program based on the average occupation measures. In Section 4, we establish the main statements, including the existence result and the characterization of stationary constrained Nash equilibria. In Section 5, a controlled wireless network is presented to illustrate our main results.

2. The Game Model

If S is a Borel space, we denote by B ( S ) its Borel σ -algebra, by P ( S ) the set of all probability measures on B ( S ) endowed with the topology of weak convergence, and by B ( S ) the set of all Borel measurable functions on S. Let I stand for the indicator function, N 1 be a fixed integer. Let us define I : = { 1 , , N } , and use indexs i or j to denote a player. Now, we introduce the N-person constrained game model
G : = X i , A i , { A i ( x i ) | x i X i } , { c k i } k = 0 p , { d k i } k = 1 p , Q i ( · | x i , a i ) , ν i i I .
For each i I , the state space X i is assumed to be a denumerable set and A i is the action space which is assumed to be a Polish space endowed with the Borel σ -algebra B ( A i ) . Without loss of generality, we assume X i : = { 0 , 1 , } and is enumerated by the natural order. Let X : = i = 1 N X i be the product space of the state spaces and A : = i = 1 N A i be the product space of the action spaces. For each x i X i , the nonempty measurable subset A i ( x i ) of A i denotes the set of actions available when the state of player i is x i X i . Let K i : = ( x i , a i ) | x i X i , a i A i ( x i ) and K : = i = 1 N K i . Q i is the stochastic kernel on X i given K i controlled by player i, that is Q i ( y i | x i , a i ) is the probability of moving from state x i to y i if player i chooses action a i . For each 0 k p , the one-stage cost c k i is a real-valued measurable function on K i . The constants d k i ( 1 k p ) denote the constraints and ν i denotes the initial distribution of player i. Let ν : = ( ν 1 , , ν N ) .
For each i I , let H 0 i : = X i , H t i : = ( K i ) t × X i be the set of all histories up to time t and Φ i be the set of all stochastic kernels on A i given X i for each i, where ( K i ) t denotes the t-power product space of K i .
Definition 1.
(1) 
A randomized history-dependent strategy for player i is a sequence π i : = { π t i , t = 0 , 1 , 2 , } of stochastic kernels π t i on the action space A i given H t i satisfying
π t i ( A i ( x t i ) | h t i ) = 1 h t i t = 0 , 1 , 2 , .
(2) 
A randomized history-dependent strategy π i = { π t i } Π h i for player i is said to be randomized stationary if there is a stochastic kernel φ i Φ i such that π t i ( · | h t i ) = φ i ( · | x t i ) for each h t i H t . We will write such a stationary strategy as φ i .
Assumption 1.
The players do not observe their costs, that is, the strategy chosen by any player does not depend on the realization of the cost.
Remark 1.
Assumption 1 is imposed on [6,7], which is used to ensure that a player could not use the one-stage costs to estimate the state and the action of the other players.
The sets of all randomized history-dependent strategies and randomized stationary strategies for player i are denoted by Π h i and Π s i , respectively. A multi-strategy is a vector π : = ( π 1 , , π N ) Π h , where Π h : = × i = 1 N Π h i . Let Π s : = × i = 1 N Π s i denote the set of all randomized stationary multi-strategies.
Let Ω i : = ( X i × A i ) and F i be the corresponding product σ -algebra. Then, for each π i Π h i and each initial distribution ν i P ( X i ) , the well-known Tulcea’s Theorem ([16], p. 178) ensures the existence of a unique probability measure P ν i π i on ( Ω i , F i ) such that, for each B i B ( X i ) and h t i H t i ,
P ν i π i ( x t + 1 i B i | h t i , a t i ) = Q i ( B i | x t i , a t i ) , t = 0 , 1 , 2 , ,
where x t i and a t i denote the state and the action at the decision epoch t, respectively. The expectation operator with respect to P ν i π i is denoted by E ν i π i . If ν i is concentrated at some state x i X i , we will write P ν i π i and E ν i π i as P x i π i and E x i π i , respectively. For each π : = ( π 1 , , π N ) Π h , the product measure of P ν i π i is denoted by P ˜ ν π : = × i = 1 N P ν i π i and the corresponding expectation operator is denoted by E ˜ ν π . For each π ^ i Π h i , we denote by [ π i , π ^ i ] the N-vector multi-strategy obtained from π by replacing π i with π ^ i . Similarly, for each a i A i , we denote by [ π i , a i ] the N-vector in which the jth component is π j for j i , while the ith component is a i .
For each ν × i = 1 N P ( X i ) and π Π h , the expected average criteria are defined, for each player i I and 0 k p , as
V k i ( ν , π ) : = lim sup n 1 n E ˜ ν π t = 0 n 1 c k i ( x t , a t ) ,
where x t and a t denote the state vector and action vector at time t, respectively.
Definition 2.
(1) 
For a fixed π = ( π 1 , π N ) Π h , a strategy π i Π h i is said to be feasible for player i against π if V k i ( ν , [ π i , π i ] ) d k i for each 1 k p . Let
Δ i ( π ) : = π i Π h i | V k i ( ν , [ π i , π i ] ) d k i , for each 1 k p ,
be the set of all feasible strategies for player i against π.
(2) 
π : = ( π 1 , , π N ) Π h is called a feasible multi-strategy for G if π i is in Δ i ( π ) for each i I . We denote by Δ the set of all feasible multi-strategies.
(3) 
(Nash equilibrium) π * is called a constrained Nash equilibrium if π * Δ and
V 0 i ( ν , π * ) = inf π i Δ i ( π * ) V 0 i ( ν , [ π * i , π i ] ) .
A constrained Nash equilibrium π * is said to be stationary if π * belongs to Π s .

3. The Technique Preliminary

A monotone nondecreasing function f : X i [ 1 , ) such that lim x i f ( x i ) = will be refereed to as a Lyapunov function on X i for each i I . For each function g and constant τ , we denote by g τ the τ power of the function g.
To guarantee the finiteness of the expected average costs, we need the following drift conditions, which is widely used in References [14,16,17,18] for discrete-time Markov decision processes and in References [5,10] for stochastic games.
Assumption 2.
For each i I , there exist a Lyapunov function w i 1 on X i , constants b i 0 , 1 > β i > 0 and τ > 1 such that
(1) 
y i X i w i τ ( x i ) Q i ( y i | x i , a i ) β i w i τ ( x i ) + b i ;
(2)
x i X i w i τ ( x i ) ν i ( x i ) < .
By Jensen’s inequality, it follows that there exist constants β i ( τ ) : = β i τ τ > 0 and b i ( τ ) : = b i τ τ > 0 (depending on τ ) for each 0 < τ < τ such that
y i X i Q i ( y i | x i , a i ) w i τ ( y i ) β i ( τ ) w i τ ( x i ) + b i ( τ ) , for each ( x i , a i ) K i .
Now, we give two different kinds of hypotheses imposed on the cost functions to ensure the existence of Nash equilibria.
Assumption 3.
(1) 
(Growth condition) There exists a constant M 1 > 0 such that | c k i ( x 1 , , x N , a 1 , , a N ) | M 1 min i I { w i ( x i ) } for each ( x 1 , , x N , a 1 , , a N ) K and 0 k p .
(2) 
The function c k i ( x 1 , , x N , a 1 , , a N ) is continuous in ( a 1 , , a N ) × i = 1 N A i ( x i ) for each fixed ( x 1 , , x N ) X .
Assumption 4.
Suppose that there exist a constant M 2 > 0 and functions f k i , j on K j with j I for each i I and 0 k p , such that
(1) 
c k i ( x 1 , , x N , a 1 , , a N ) = j = 1 N f k i , j ( x j , a j ) and | f k i , j ( x j , a j ) | M 2 w j ( x j ) ;
(2) 
f k i , j ( x j , · ) is continuous in a j A j ( x j ) for each x j S j .
Remark 2.
(1) 
It should be mentioned that the cost functions in References [6,7,9] are assumed to be bounded. However, the cost functions are always unbounded in some controlled stochastic models such as the stochastic inventory model, queueing system; see, for instance, References [14,15].
(2) 
As in Assumption 4, the case that the immediate costs are additive is also considered in References [7,19] for discrete-time unconstrained stochastic games.
Lemma 1.
Suppose that Assumptions 2 and 3(1) (resp. 4(1)) hold. For each i I , π i Π h i and x i X i ,
lim sup n 1 n E ν i π i [ t = 0 n 1 w i τ ( x t i ) ] b i 1 β i ,
V k i ( ν , π ) M 1 b i 1 β i ( resp . V k i ( ν , π ) M 2 j = 1 N b j 1 β j ) .
Proof. 
(4) and (5) follow directly from Lemma 10.4.1 in Reference [17]. □
For each function u B ( X i ) (resp. u B ( K i ) ) and 1 f B ( X i ) , let us define the f-norm | | u | | f by | | u | | f : = sup x i X i | u ( x i ) | f ( x i ) (resp. | | u | | f : = sup x i X i | u ( x i , a i ) | f ( x i ) ) and the Banach space B f ( X i ) : = u B ( X i ) | | | u | | f < (resp. B f ( K i ) : = u B ( K i ) | | | u | | f < ). The set of all bounded measurable functions on X i is denoted by B 1 ( X i ) .
For each i I and φ i Π s i , let us define
Q i ( y i | x i , φ i ) : = A i Q i ( y i | x i , a i ) φ i ( d a i | x i ) .
For each t 1 , let Q i ( x i | y i , φ i , t ) denote the t-step transition probability from y i to x i corresponding to φ i . Obviously, Q i ( x i | y i , φ i , 1 ) = Q i ( y i | x i , φ i ) .
Definition 3.
For each i I and φ i Π s i , the transition kernel Q i ( · | · , φ ) is said to be irreducible, if P x i φ i ( x t i = y i for some t 1 ) > 0 for all x i , y i X i .
Assumption 5.
For each i I ,
(1) 
let φ i Π s i , there exist a probability measure μ φ i i on X i and constants R i > 0 and ρ i > 0 such that
| y i X i u ( y i ) Q i ( y i | x i , φ i , t ) μ φ i i ( u ) | | | u | | w i τ R i ρ i t w i τ ( x i )
for each u B w i τ ( X i ) ;
(2) 
Q i ( · | · , φ ) is irreducible for all φ i Π s i .
Remark 3.
(1) 
Under Assumptions 2 and 5, it is easy to see that μ φ i i is the unique invariant probability measure of the transition kernel Q i ( y i | x i , φ i ) , see Reference ([17], p.12). Moreover, we have μ φ i i ( w i τ ) : = y i X i w i τ ( y i ) μ φ i i ( y i ) b i 1 β i < for each 0 < τ τ and μ φ i i ( x i ) > 0 for each x i X i .
(2) 
The w-uniform geometrical ergodicity condition in Assumption 5(1) has been widely used in discrete-time Markov decision processes, see References [17,18] and the reference therein. Since the state space in References [6,7] is finite, the standard ergodicity condition is only required in References [6,7]. Moreover, as the cost functions in Reference [9] are assumed to be bounded, the weaker uniform geometrical ergodicity condition is imposed on [9].
For each i I , we define the average occupation measure μ ˜ φ i i ( x i , d a i ) : = μ φ i i ( x i ) φ i ( d a i | x i ) for each φ i Π s i and the set
N i : = { μ ˜ φ i i | φ i Π s i } and N : = × i = 1 N N i .
For each μ ˜ = ( μ ˜ φ 1 1 , , μ ˜ φ N N ) × i = 1 N N i and i I , we introduce the following Markov decision processes M μ ˜ i :
M μ ˜ i : = X i , ( A i , { A i ( x i ) | x i X i } ) , Q i ( y i | x i , a i ) , c 0 , μ ˜ i ( x i , a i ) , ( c k , μ ˜ i ( x i , a i ) , d k i , 1 k p )
where
c k , μ ˜ i ( x i , a i ) : = K N μ ˜ φ N N ( x N , d a N ) K i + 1 μ ˜ φ i + 1 i + 1 ( x i + 1 , d a i + 1 ) K i 1 μ ˜ φ i 1 i 1 ( x i 1 , d a i 1 ) K 1 μ ˜ φ 1 1 ( x 1 , d a 1 ) c k i ( x 1 , , x N , a 1 , , a N ) ,
for each 0 k p and ( x i , a i ) K i , the other components are the same as in (1).
For each i I and π i Π h i , the expected average cost is defined by
V k , μ ˜ i ( ν i , π i ) : = lim sup n 1 n E ν i π i t = 0 n 1 c k , μ ˜ i ( x t i , a t i ) for each 0 k p .
Definition 4.
For a fixed i I , strategy π i Π h i is said to be feasible for M μ ˜ i if
V k , μ ˜ i ( ν i , π i ) d k i for each 1 k p .
Let U i ( μ ˜ ) be the set of all feasible strategies of M μ ˜ i . A strategy π i Π h i is said to be optimal for M μ ˜ i if π i U i ( μ ˜ ) and
V 0 , μ ˜ i ( ν i , π i ) = inf π i U i ( μ ˜ ) V 0 , μ ˜ i ( ν i , π i ) .
We denote by V μ ˜ i the optimal value of M μ ˜ i .
For each i I and μ ˜ N , under Assumptions 2, 3 (resp. 4) and 5, we have
V k , μ ˜ i ( ν i , φ i ) = K i c k , μ ˜ i ( x i , a i ) μ ˜ φ i i ( x i , d a i ) for each φ i Π s i .
Now, we introduce the following constrained optimality problem
minimize V 0 , μ ˜ i ( ν i , π i ) subject to π i Π h i , V k , μ ˜ i ( ν i , π i ) d k i , 1 k p .
Lemma 2.
Suppose that Assumptions 1, 2 and 5 are satisfied and in addition either Assumption 3 or 4 holds. Let μ ˜ : = { μ ˜ 1 , , μ ˜ N } N and φ : = ( φ 1 , , φ N ) with μ ˜ i ( x i , d a i ) = μ ˜ ^ i ( x i ) φ i ( d a i | x i ) for each i I . Then
V k , μ ˜ i ( ν i , π i ) = V k i ( ν , [ φ i , π i ] ) for each π i Π h i and 0 k p .
Proof. 
We only prove for i = 1 . Let us define
c k 1 ( x 1 , , x N , [ φ 1 , a 1 ] ) : = A 2 φ 2 ( d a 2 | x 2 ) A N c k 1 ( x 1 , , x N , a ) φ N ( d a N | x N ) ,
p ν j ( t , x j , φ j ) : = y j X j ν j ( y j ) Q j ( x j | y j , φ j , t ) for each φ j Π s j .
First, we assume Assumption 3 holds. It follows from Assumption 5 that, for each player j 1 , we have
| x j X j μ ˜ φ j j ( x j ) c k 1 ( x 1 , , x N , [ φ 1 , a 1 ] ) x j X j p ν j ( t , x j , φ j ) c k 1 ( x 1 , , x N , [ φ 1 , a 1 ] ) | ρ j t R j | | c k 1 ( x 1 , , x j 1 , x j + 1 , , x N , [ φ 1 , a 1 ] ) | | w j τ x j X j ν j ( x j ) w j τ ( x j ) ρ j t R j M 1 x j X j ν j ( x j ) w j τ ( x j ) ,
where | | c k 1 ( x 1 , , x j 1 , x j + 1 , , x N , [ φ 1 , a 1 ] ) | | w j τ : = sup x j X j | c k 1 ( x 1 , , x N , [ φ 1 , a 1 ] ) | w j τ ( x j ) for each 0 k p . For notational ease, we define n = 2 j 1 μ ˜ ^ n ( x n ) : = 1 for j = 2 and n = j + 1 N p ν n ( t , x n , φ n ) : = 1 for j = N and X 1 : = i = 2 N X i . Hence,
| E ν 1 π 1 c k , μ ˜ 1 ( x t 1 , a t 1 ) E ˜ ν [ φ 1 , π 1 ] c k 1 ( x t , [ φ 1 , a t 1 ] ) | = | K 1 P ν 1 π 1 ( x t 1 , d a t 1 ) x 2 X 2 μ ˜ ^ 2 ( x 2 ) x N X N μ ˜ ^ N ( x N ) c k 1 ( x t 1 , , x N , [ φ 1 , a t 1 ] ) K 1 P ν 1 π 1 ( x t 1 , d a t 1 ) x 2 X 2 p ν 2 ( t , x 2 , φ 2 ) x N X N p ν N ( t , x N , φ N ) c k 1 ( x t 1 , , x N , [ φ 1 , a t 1 ] ) | | j = 2 N K 1 P ν 1 π 1 ( x t 1 , d a t 1 ) ( x 2 , , x N ) X 1 n = 2 j 1 μ ˜ ^ n ( x n ) · n = j + 1 N p ν n ( t , x n , φ n ) · ( μ ˜ ^ j ( x j ) p ν j ( t , x j , φ j ) ) c k 1 ( x t 1 , , x N , [ φ 1 , a t 1 ] ) | j = 2 N R j ρ j t M 1 x j X j ν j ( x j ) w j τ ( x j ) .
Thus, we have that
1 n t = 0 n 1 | E ν 1 π 1 [ c k , μ ˜ 1 ( x t 1 , a t 1 ) ] E ˜ ν [ φ 1 , π 1 ] [ c k 1 ( x t , [ φ 1 , a t 1 ] ) ] | 1 n j = 2 N R j M 1 x j X j ν j ( x j ) w j τ ( x j ) · t = 0 n 1 ρ j t 0 , as n ,
which implies the desired result. On the other hand, if Assumption 4 holds,
V k 1 ( ν , [ φ 1 , π 1 ] ) = lim sup n 1 n t = 0 n 1 j = 2 N x j X j f k 1 , j ( x j , φ j ) p ν j ( t , x j , φ j ) + K 1 f k 1 , 1 ( x t 1 , a t 1 ) P ν 1 π 1 ( x t 1 , d a t 1 ) = j = 2 N x j X j f k 1 , j ( x j , φ j ) μ ˜ ^ j ( x j ) + lim sup n 1 n t = 0 n 1 K 1 f k 1 , 1 ( x t 1 , a t 1 ) P ν 1 π 1 ( x t 1 , d a t 1 ) = lim sup n 1 n t = 0 n 1 j = 2 N x j X j f k 1 , j ( x j , φ j ) μ ˜ ^ j ( x j ) + K 1 f k 1 , 1 ( x t 1 , a t 1 ) P ν 1 π 1 ( x t 1 , d a t 1 ) = lim sup n 1 n t = 0 n 1 K 1 P ν 1 π 1 ( x t 1 , d a t 1 ) c k , μ ˜ 1 ( x t 1 , a t 1 ) = V k , μ ˜ 1 ( ν 1 , π 1 ) .
 □
Let w i be the function as in Assumption 2 for each i I . Let us define C w i ( K i ) : = { u B w i ( K i ) | u is continuous on K i } and P w i ( K i ) : = { μ P ( K i ) | K i w i ( x i ) μ ( x i , d a i ) < } .
Definition 5.
For each i I , let { μ n } P w i ( K i ) . μ n is said to converge to μ P ( K i ) with respect to the w i -weak topology if and only if lim n K i g ( x i , a i ) μ n ( x i , d a i ) = K i g ( x i , a i ) μ ( x i , d a i ) for each g C w i ( K i ) . In this case, we denote it by μ n w i μ .
Let w : = ( w 1 , , w N ) and { μ n } × i = 1 N P w i ( K i ) with μ n = ( μ n 1 , , μ n N ) , we say μ n w μ = : ( μ 1 , , μ N ) if μ n i w i μ i for each i I . For each i I and μ i P w i ( K i ) , we define μ ^ i ( x i ) : = μ i ( x i , A i ( x i ) ) for each x i X i .
To ensure the existence of constrained Nash equilibria, we also need the following standard continuity-compactness conditions which are widely used; see, for instance, References [4,14,16,17,19] and the reference therein.
Assumption 6.
(1) 
For each i I and x i X i , A i ( x i ) is a compact set.
(2) 
The functions Q i ( y i | x i , a i ) is continuous in a i A i ( x i ) for each fixed i I and x i , y i X i .
(3) 
The functions y i X i w i ( y i ) Q i ( y i | x i , a i ) is continuous in a i A i ( x i ) for each fixed i I and x i X i .
Lemma 3.
Suppose that Assumptions 2, 5 and 6 hold. For each i I , the set N i is convex and compact with respect to the w i -weak topology.
Proof. 
Fix 1 τ < τ , it follows from Remark 3 that sup μ ˜ i N i { K i w i τ ( x i ) μ ˜ i ( x i , d a i ) } b i 1 β i and the set { ( x i , a i ) K i | w i τ ( x i ) n w i τ ( x i ) } is compact in K i for each n 1 . Hence, it follows from Corollary A.46 in Reference [20], the set N i is compact with respect to the w i -weak topology. The other statement can be obtained by Lemma 5.2.2 in Reference [18]. □
Remark 4.
The standard weak convergence technique used in References [6,7] for bounded costs does not apply directly to the case wherein costs are unbounded in this paper.
Lemma 4.
Suppose that Assumptions 1, 2, 5 and 6 are satisfied and in addition either Assumption 3 or 4 holds. Let i I be fixed, { μ ˜ n } N and { η n } N i , such that μ ˜ n w μ ˜ and η n w i η weakly in P ( K i ) . Then,
lim n K i c k , μ ˜ n i ( x i , a i ) η n ( x i , d a i ) = K i c k , μ ˜ i ( x i , a i ) η ( x i , d a i )
for each 0 k p .
Proof. 
For each n N ¯ , we assume μ ˜ n : = ( μ ˜ n 1 , , μ ˜ n N ) . According to proposition D.8 in Reference [16], there exist φ n i and φ ˜ n i Π s i such that μ ˜ n i ( x i , d a i ) = μ ˜ ^ n i ( x i ) φ n i ( d a i | x i ) and η n ( x i , d a i ) = η ^ n ( x i ) φ ˜ n i ( d a i | x i ) for each i I . For an arbitrary u B 1 ( X i ) , it follows from Lemma 5.1.2 in Reference [18] that
K i y i X i u ( y i ) Q i ( y i | x i , a i ) u ( x i ) μ ˜ n i ( x i , d a i ) = 0 .
Since μ ˜ n i w i μ ˜ i , it follows from Assumption 6(2) that
lim n K i y i X i u ( y i ) Q i ( y i | x i , a i ) u ( x i ) μ ˜ n i ( x i , d a i ) = K i y i X i u ( y i ) Q i ( y i | x i , a i ) u ( x i ) μ ˜ i ( x i , d a i ) = 0 ,
which together with Lemma 5.1.2 in Reference [18] implies that μ ˜ i N i for each i I . Similarly, we also have η N i . Thus, we can see that
lim n μ ˜ ^ n i ( x i ) = μ ˜ ^ i ( x i ) > 0 and lim n η ^ n ( x i ) = η ^ ( x i ) > 0 , for each x i X i ,
φ ˜ n i ( · | x i ) φ ˜ i ( · | x i ) and φ n i ( · | x i ) φ i ( · | x i ) weakly in P ( A i ( x i ) ) for each x i X i . It follows from Assumption 3(2) or Assumption 4 that
lim n c k i ( x 1 , , x N , [ φ n i , φ ˜ n i ] ) = c k i ( x 1 , , x N , [ φ i , φ ˜ i ] ) ,
for each ( x 1 , , x N ) X , where φ n : = ( φ n 1 , , φ n N ) for each n N ¯ .
Under Assumptions 2(2) and 5, we have that
x 1 X 1 μ ˜ ^ 1 ( x 1 ) w 1 ( x 1 ) < and lim n x i X i w i ( x i ) η ^ n ( x i ) = x i X i w i ( x i ) η ^ ( x i ) i I .
Then, under Assumption 3(2), by (7)–(9) and Proposition A.2.6 in Reference [15], it follows that
lim n K i c k , μ ˜ n i ( x i , a i ) η n ( x i , d a i ) = lim n x i X i η ^ n ( x i ) A i ( x i ) φ ˜ n i ( d a i | x i ) x N X N μ ˜ ^ n N ( x N ) A N ( x N ) φ n N ( d a N | x N ) x i + 1 X i + 1 μ ˜ ^ n i + 1 ( x i + 1 ) A i + 1 ( x i + 1 ) φ n i + 1 ( d a i + 1 | x i + 1 ) x i 1 X i 1 μ ˜ ^ n i 1 ( x i 1 ) A i 1 ( x i 1 ) φ n i 1 ( d a i 1 | x i 1 ) A 1 ( x 1 ) φ n 1 ( d a 1 | x 1 ) x 1 X 1 μ ˜ ^ n 1 ( x 1 ) c k i ( x 1 , , x N , a 1 , , a N ) = K i c k , μ ˜ i ( x i , a i ) η ( x i , d a i ) .
If Assumption 3 is replaced by Assumption 4, then we have
lim n K i c k , μ ˜ n i ( x i , a i ) η n ( x i , d a i ) = lim n [ K i f k i , i ( x i , a i ) η n ( x i , d a i ) + j = 1 , j i N K j f k i , j ( x j , a j ) μ ˜ n j ( x j , d a j ) ] = K i f k i , i ( x i , a i ) η ( x i , d a i ) + j = 1 , j i N K j f k i , j ( x j , a j ) μ ˜ j ( x j , d a j ) = K i c k , μ ˜ i ( x i , a i ) η ( x i , d a i ) .
 □
The following slater condition is common for constrained games, see References [2,6,7,8,9,10].
Assumption 7.
(Slater condition) For each stationary multi-strategy φ Π s and each player i, there exists π ˜ i Π h i such that
V k i ( ν , [ φ i , π ˜ i ] ) < d k i , for each 1 k p .
Lemma 5.
Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let i I and μ ˜ : = ( μ ˜ 1 , , μ ˜ N ) N be fixed. Then,
(1) 
for each π i Π h i , there exists a stationary strategy φ ¯ i Π s i (depending on μ ˜ , ν i and π i ), such that V k , μ ˜ i ( ν i , φ ¯ i ) V k , μ ˜ i ( ν i , π i ) for each 0 k p ;
(2) 
there exists a stationary strategy φ ˜ i Π s i such that V k , μ ˜ i ( ν i , φ ˜ i ) < d k i for each 1 k p .
Proof. 
(1)
See Lemma 5.7.10 in Reference [16].
(2)
Let φ : = ( φ 1 , , φ N ) Π s such that μ ˜ i ( x i , d a i ) = μ φ i i ( x i ) φ i ( d a i | x i ) for each i I . For the fixed φ , let π ˜ i Π h i be the corresponding strategy as in Assumption 7. It follows from Lemma 2 that V k , μ ˜ i ( ν i , π ˜ i ) = V k i ( ν , [ φ i , π ˜ i ] ) < d k i for each 1 k p . By part (1), there exists a stationary strategy φ ˜ i Π s i (depending on φ and π ˜ i ) such that V k , μ ˜ i ( ν i , φ ˜ i ) V k , μ ˜ i ( ν i , π ˜ i ) = V k i ( ν , [ φ i , π ˜ i ] ) < d k i for each 1 k p .
 □
It follows from Lemma 5.1.2 in Reference [18] that η N i if and only if
x i X i A i ( x i ) ( Q i ( y i | x i , a i ) I { y i } ( x i ) ) η ( x i , d a i ) = 0 , for each i I .
According to Lemma 5, the constrained optimality problem (6) can be restricted to the set of all stationary strategies. Hence, for each μ ˜ N and i I , the constrained optimality problem (6) is equivalent to the following best response linear program ( LP μ ˜ i ) :
LP μ ˜ i : inf η P w i ( K i ) K i c 0 , μ ˜ i ( x i , a i ) η ( x i , d a i ) subject to K i c k , μ ˜ i ( x i , a i ) η ( x i , d a i ) d k i , 1 k p , K i ( Q i ( y i | x i , a i ) I { y i } ( x i ) ) η ( x i , d a i ) = 0 for each y i X i .
For a fixed i I and μ ˜ N , η is said to be feasible for LP μ ˜ i if it satisfies (10). The sets of all feasible solutions and optimal solutions are denoted by F i ( μ ˜ ) and O i ( μ ˜ ) , respectively. The optimal value of LP μ ˜ i is denoted by inf LP μ ˜ i .
Proposition 1.
Suppose that Assumptions 1, 2, 5, 6 and 7 and are satisfied and in addition either Assumption 3 or 4 holds. Let μ ˜ : = ( μ ˜ 1 , , μ ˜ N ) N and φ : = ( φ 1 , , φ N ) Π s such that μ ˜ i ( x i , d a i ) = μ ˜ ^ i ( x i ) φ i ( d a i | x i ) for each i I . Then,
(1) 
φ i is an optimal strategy of M μ ˜ i for each i I if and only if φ is a stationary constrained Nash equilibrium;
(2) 
μ ˜ i is an optimal strategy of LP μ ˜ i for each i I if and only if φ is a stationary constrained Nash equilibrium.
Proof. 
(1)
⟹. Let i I be fixed and π i Δ i ( φ ) . It follows from Lemma 2 that
V k , μ ˜ i ( ν i , π i ) = V k i ( ν , [ φ i , π i ] ) d k i for each 1 k p .
Thus, we have π i U i ( μ ˜ ) . Then, Lemma 2 yields that
V 0 i ( ν , [ φ i , π i ] ) = V 0 , μ ˜ i ( ν i , π i ) V 0 , μ ˜ i ( ν i , φ i ) = V 0 i ( ν , φ ) ,
which implies that φ is a constrained Nash equilibrium.
⟸. For each i I , we take an arbitrary strategy π i Π h i such that V k , μ ˜ i ( ν i , π i ) d k i for each 1 k p . Lemma 2 yields that V k i ( ν , [ φ i , π i ] ) = V k , μ ˜ i ( ν i , π i ) d k i for each 1 k p , which implies that π i Δ i ( φ ) . As φ is a constrained Nash equilibrium, it follows that V 0 , μ ˜ i ( ν i , φ i ) = V 0 i ( ν , φ ) V 0 i ( ν , [ φ i , π i ] ) V 0 , μ ˜ i ( ν i , π i ) , which implies that φ i is an optimal strategy of M μ ˜ i .
(2)
μ ˜ i is optimal for LP μ ˜ i if and only if φ i is optimal for M μ ˜ i . Hence, we get the desired result by part (1). □
Lemma 6.
Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let i I be fixed and { μ ˜ n } N such that μ ˜ n w μ ˜ , and η n F i ( μ ˜ n ) for each n N . Then, { η n } is relatively compact in N i with respect to the w i -weak topology and the accumulation point is a feasible solution of LP μ ˜ i .
Proof. 
It follows from Lemma 3 that there exists a subsequence { η n m } such that η n m w i η with respect to the w i -weak topology. Then, Lemma 4 yields that
K i c k , μ ˜ i ( x i , a i ) η ( x i , d a i ) = lim m K i c k , μ ˜ n m i ( x i , a i ) η n m ( x i , d a i ) d k i .
As Q i ( · | x i , a i ) C w i ( K i ) , it follows that
K i ( Q i ( y i | x i , a i ) I { y i } ( x i ) ) η ( x i , d a i ) = lim m K i ( Q i ( y i | x i , a i ) I { y i } ( x i ) ) η n m ( x i , d a i ) = 0 ,
which together with (11) implies that η F i ( μ ˜ ) . □
Lemma 7.
Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let i I be fixed and take an arbitrary μ ˜ N , then LP μ ˜ i has an optimal solution.
Proof. 
First, it follows from Lemma 5(2) that F i ( μ ˜ ) . Let { η m } F i ( μ ˜ ) be the minimizing sequence such that
lim m x i X i A i ( x i ) c 0 , μ ˜ i ( x i , a i ) η m ( x i , d a i ) = inf LP μ ˜ i .
By Lemma 6, there exists a subsequence { η m s } of { η m } such that η m s w i η F i ( μ ˜ ) . Then, it follows from Lemma 4 and (12) that
x i X i A i ( x i ) c 0 , μ ˜ i ( x i , a i ) η ( x i , d a i ) = lim s x i X i A i ( x i ) c 0 , μ ˜ i ( x i , a i ) η m s ( x i , d a i ) = inf LP μ ˜ i .
This is η O i ( μ ˜ ) . □
The idea of the proof of the following lemma is from Lemma 5.1 and Theorem 3.9 in Reference [21] for constrained discrete-time Markov decision processes with discounted cost criteria.
Lemma 8.
Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Let i I be fixed and { μ ˜ n } N such that μ ˜ n w μ ˜ . Then,
(1) 
for each η F i ( μ ˜ ) , there exist an integer N and η n F i ( μ ˜ n ) for all n N , such that η n w i η weakly in P ( K i ) ;
(2) 
if η is an accumulation point of sequence { η n , n 1 } in which η n is an optimal solution of LP μ ˜ n i for each n N , then η O i ( μ ˜ ) .
Proof. 
(1)
Lemma 5(2) gives the existence of φ i , φ ˜ i Π s i and constant D > 0 , such that η = μ ˜ φ i i and V k , μ ˜ i ( ν i , φ ˜ i ) d k i D for all 1 k p . By Lemma 4, it follows that
lim n K i c k , μ ˜ n i ( x i , a i ) η ( x i , d a i ) = K i c k , μ ˜ i ( x i , a i ) η ( x i , d a i ) d k i ,
lim n K i c k , μ ˜ n i ( x i , a i ) μ ˜ φ ˜ i i ( x i , d a i ) = K i c k , μ ˜ i ( x i , a i ) μ ˜ φ ˜ i i ( x i , d a i ) d k i D ,
for all 1 k p .
Let ε be such that 0 < ε < D 2 . By (13)-(14), there exists an integer N ε (depending on ε ) such that, for each n N ε and 1 k p ,
K i c k , μ ˜ n i ( x i , a i ) η ( x i , d a i ) d k i + ε , K i c k , μ ˜ n i ( x i , a i ) μ ˜ φ ˜ i i ( x i , d a i ) d k i D + ε .
Let ν n ε : = ( 1 λ ε ) η + λ ε μ ˜ φ ˜ i i , where λ ε : = ε D < 1 2 . Then, we derive from Lemma 3 that ν n ε N i , and
K i c k , μ ˜ n i ( x i , a i ) ν n ε ( x i , d a i ) = ( 1 λ ε ) K i c k , μ ˜ n i ( x i , a i ) η ( x i , d a i ) + λ ε K i c k , μ ˜ n i ( x i , a i ) μ ˜ φ ˜ i i ( x i , d a i ) ( 1 λ ε ) ( d k i + ε ) + λ ε [ ( d k i D ) + ε ] d k i ,
for each 1 k p and n N ε , which implies that ν n ε F i ( μ ˜ n ) for all n N ε .
Then, let { ε s } R , such that ε s 0 and 0 < ε s < D 2 . For each fixed s 1 (corresponding to a given ε s ), as in the previous argument, there exists an integer N s (depending on s) which is assumed to be increasing in s 1 , such that
ν n s : = ( 1 λ s ) η + λ s μ ˜ φ ˜ i i F i ( μ ˜ n ) , n N s , s 1 , where λ s : = ε s D .
Let
η n : = ν n s and λ n : = λ s , for each N s n < N s + 1 .
Since ε s 0 , by (15)–(17), we have η n w i η weakly in P ( K i ) as n and η n F i ( μ ˜ n ) for each n N 1 , which completes the proof of part (i).
(2)
By Lemma 6, without loss of generality, we assume that η n w i η F i ( μ ˜ ) weakly in P ( K i ) . On the other hand, for any η F i ( μ ˜ ) , it follows from part (1) that there exist an integer N and η n F i ( μ ˜ n ) for all n N , such that η n w i η weakly in P ( K i ) , which together with η n O i ( μ ˜ n ) , gives
K i c 0 , μ ˜ n i ( x i , a i ) η n ( x i , d a i ) K i c 0 , μ ˜ n i ( x i , a i ) η n ( x i , d a i ) n N .
Then, by Lemma 4 and (18), we have
K i c 0 , μ ˜ i ( x i , a i ) η ( x i , d a i ) K i c 0 , μ ˜ i ( x i , a i ) η ( x i , d a i ) ,
which implies η O i ( μ ˜ ) .

4. The Main Results

In this section, we give the existence and characterization of stationary constrained Nash equilibria. For each μ ˜ = ( μ ˜ 1 , , μ ˜ N ) N , we introduce the following two multi-functions:
Λ i : N 2 N i and Ψ ( μ ˜ ) : N 2 N for each i I
Λ i ( μ ˜ ) : = { η N i | η is an optimal solution of LP μ ˜ i } and Ψ ( μ ˜ ) : = × i = 1 N Λ i ( μ ˜ ) .
Theorem 1.
Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Then, there exists a stationary constrained Nash equilibrium.
Proof. 
Let i I be fixed, { μ ˜ n } N and η n Λ i ( μ ˜ n ) such that μ ˜ n w μ ˜ and η n w i η . Then, by Lemma 8(2), we know that η O i ( μ ˜ ) which implies that Λ i is upper semi-continuous. By the arbitrariness of i, we can derive that Ψ is upper semi-continuous. Moreover, Lemmas 3, 7 and (10) deduce that the set Ψ ( μ ˜ ) is nonempty and convex for each μ ˜ N . Moreover, we can see that Ψ ( μ ˜ ) is compact by the upper semi-continuity of Ψ . Hence, it follows from Fan’s fixed point Theorem in Reference [22] that that Ψ has a fixed point μ ˜ * . By Proposition D.8 in Reference [16], there exists φ * : = ( φ * 1 , , φ * N ) Π s such that μ * i ( x i , d a i ) = μ ^ * i ( x i ) φ * i ( d a i | x i ) , which together with Proposition 1, implies that φ * is a stationary constrained Nash equilibrium. □
By the same proof of Theorems 5.3.1 and 5.4.1 in Reference [18] for discrete-time Markov decision processes, we can yield the following statements.
Lemma 9.
Suppose that Assumptions 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Then, for each given μ ˜ ,
(1) 
there exists a function h * i B ω i ( X i ) and a vector λ * i : = ( λ 1 * i , , λ p * i ) [ 0 , ) p for each i I such that
V μ ˜ * i + h * i ( x i ) = inf a i A i ( x i ) { c 0 , μ ˜ i ( x i , a i ) + k = 1 p λ k * i ( c k , μ ˜ i ( x i , a i ) d k i ) + y i X i h * i ( y i ) Q i ( y i | x i , a i ) } ;
(2) 
for each λ i = ( λ 1 i , , λ p i ) [ 0 , ) p , there exists ( V λ i i , h i ) R × B ω i ( X i ) such that
V λ i i + h i ( x i ) = inf a i A i ( x i ) { c 0 , μ ˜ i ( x i , a i ) + k = 1 p λ k i ( c k , μ ˜ i ( x i , a i ) d k i ) + y i X i h i ( y i ) Q i ( y i | x i , a i ) } ,
and V μ ˜ * i = inf λ [ 0 , ) p V λ i i .
Now we introduce the following duality program DP μ ˜ i for each μ ˜ N and i I :
DP μ ˜ i : sup ( h i , λ i ) B w i ( X i ) × [ 0 , ) p V λ i i subject to V λ i i + h i ( x i ) c 0 , μ ˜ i ( x i , a i ) + k = 1 p λ k i ( c k , μ ˜ i ( x i , a i ) d k i ) + y i X i h i ( y i ) Q i ( y i | x i , a i ) for each ( x i , a i ) ( X i × A i ) , λ k i 0 for each 1 k p .
Combining LP μ ˜ i and DP μ ˜ i together, we introduce the following mathematical program ( MP ) :
MP : min ζ R N × i = 1 N ( B w i ( X i ) × P w i ( K i ) ) × [ 0 , ) N p i = 1 N [ X × A c 0 i ( x , a ) i = 1 N μ ˜ i ( x i , d a i ) V i ] subject to V i + h i ( x i ) c 0 , μ ˜ i ( x i , a i ) + k = 1 p λ k i ( c k , μ ˜ i ( x i , a i ) d k i ) + y i X i h i ( y i ) Q i ( y i | x i , a i ) for each ( x i , a i ) ( X i × A i ) , λ k i 0 for each 1 k p , x i X i A i ( x i ) c k , μ ˜ i ( x i , a i ) μ ˜ i ( x i , d a i ) d k i , 1 k p , x i X i A i ( x i ) ( Q i ( y i | x i , a i ) I { y i } ( x i ) ) μ ˜ i ( x i , d a i ) = 0 for each y i X i ,
where ζ : = ( V i , h i , μ ˜ i , λ i ) i = 1 N R N × i = 1 N ( B w i ( X i ) × P w i ( K i ) ) × [ 0 , ) N p . Let us define the function Φ ( ζ ) : = i = 1 N [ X × A c 0 i ( x , a ) i = 1 N μ ˜ i ( x i , d a i ) V i ] , for each ζ = ( V i , h i , μ ˜ i , λ i ) i = 1 N R N × i = 1 N ( B w i ( X i ) × P w i ( K i ) ) × [ 0 , ) N p .
In Reference [7], the authors characterize each stationary constrained Nash equilibrium in the finite-state games as a global minimum of certain mathematical program. The following theorem generalizes the above result to the denumerable-state games.
Theorem 2.
Suppose that Assumptions 1, 2, 5, 6 and 7 are satisfied and in addition either Assumption 3 or 4 holds. Then,
(1) 
there exists ζ * = ( V * i , h * i , μ ˜ * i , λ * i ) i = 1 N R N × i = 1 N ( B w i ( X i ) × P w i ( K i ) ) × [ 0 , ) N p such that it is a global minimum of the mathematical program MP;
(2) 
suppose that ζ = ( V i , h i , μ ˜ i , λ i ) i = 1 N R N × i = 1 N ( B w i ( X i ) × P w i ( K i ) ) × [ 0 , ) N p is a global minimum of the mathematical program MP with Φ ( ζ ) = 0 . Then, there exits φ * i Π s i such that μ ˜ i ( x i , d a i ) = μ ˜ ^ i ( x i ) φ * i ( d a i | x i ) and φ * : = ( φ * 1 , , φ * N ) is a constrained Nash equilibrium of the model G .
Proof. 
(1)
Let φ * : = ( φ * 1 , , φ * N ) be a constrained Nash equilibrium and let μ ˜ * = ( μ ˜ φ * 1 1 , , μ ˜ φ * N N ) × i = 1 N N i be the average occupation measure corresponding to φ * . By Proposition 1, μ ˜ φ * i i satisfies (10) if μ ˜ is replaced by μ ˜ * . Theorem 5.3.1 in Reference [18] and Theorem 10.3.6 in Reference [17] yield that there exists ( V * i , h * i , λ * i ) R × B w i ( X i ) × [ 0 , ) p which satisfies (20) by replacing μ ˜ with μ ˜ * and
x i X i A i ( x i ) c 0 , μ ˜ * i ( x i , a i ) μ ˜ * i ( x i , d a i ) = V * i = V μ ˜ * * i
for each i I . Let ζ * : = ( V * i , h * i , μ ˜ * i , λ * i ) i = 1 N , we have Φ ( ζ * ) = 0 . In turn, let ζ = ( V λ i i , h i , μ ˜ i , λ i ) i = 1 N R N × i = 1 N ( B w i ( X i ) × P w i ( K i ) ) × [ 0 , ) N p be a feasible point of MP , we get that
V λ i i X × A c 0 i ( x , a ) j = 1 N μ ˜ j ( x j , d a j ) + k = 1 p λ k i X × A ( c k i ( x , a ) d k i ) j = 1 N μ ˜ j ( x j , d a j ) + K i y i X i h i ( y i ) Q i ( y i | x i , a i ) μ ˜ i ( x i , d a i ) K i h i ( x i ) μ ˜ i ( x i , d a i ) X × A c 0 i ( x , a ) j = 1 N μ ˜ j ( x j , d a j ) ,
which implies Φ ( ζ ) 0 .
(2)
Let us take an arbitrary global minimum ζ = ( V λ i i , h i , μ ˜ i , λ i ) i = 1 N R N × i = 1 N ( B w i ( X i ) × P w i ( K i ) ) × [ 0 , ) N p of MP with Φ ( ζ ) = 0 . Since ζ is a feasible solution of MP, by (22), we have
V λ i i X × A c 0 i ( x , a ) j = 1 N μ ˜ j ( x j , d a j )
and μ ˜ i satisfies (10). Hence, by Φ ( ζ ) = 0 , it follows that V λ i i = X × A c 0 i ( x , a ) j = 1 N μ ˜ j ( x j , d a j ) for each i I .
Then, let μ ˜ : = ( μ 1 , , μ N ) and take an arbitrary feasible solution μ ˜ i of LP μ ˜ i , we have
X × A ( c k i ( x , a ) d k i ) μ ˜ i ( x i , d a i ) j = 1 , j i N μ ˜ j ( x j , d a j ) 0 , for each 1 k p .
Then, proceeding as in the proof of (22), it follows that
X × A c 0 i ( x , a ) j = 1 N μ ˜ j ( x j , d a j ) = V λ i i X × A c 0 i ( x , a ) μ ˜ i ( x i , d a i ) j = 1 , j i N μ ˜ j ( x j , d a j ) ,
which implies μ ˜ i is LP μ ˜ i optimal for each i I . Then, Proposition D.8 in Reference [16] yields that there exists a stationary strategy φ * i Π s i such that μ ˜ i ( x i , d a i ) = μ ˜ ^ i ( x i ) φ * i ( d a i | x i ) for each i I . By Proposition 1, φ * : = ( φ * 1 , , φ * N ) is a stationary constrained Nash equilibrium of the model G .
 □

5. An Example

In this section, we use a wireless network to illustrate our conditions and main results.
Example 1.
Consider a wireless network in which there are N nodes and each node contains a mobile, a buffer and a channel. Let x t i X i : = { 0 , 1 , } denote the number of packets in the buffer of node i and we assume that the new packets are not admitted if the buffer is not empty. At time t, if x t i > 0 , each mobile transmits a packet with power a t i A i : = [ δ , 1 ] , where 0 < δ < 1 . We assume each mobile will retransmit packet at time t + 1 if the packet has not been transmitted successfully. When x t i = 0 , the number of the new arrivals at time t is denoted by z t , which is assume to be identically distributed. The number of packets in the buffer i and the action a t i are only available for mobile i itself. Hence, the transition probability are defined as follows: if x i 1
Q i ( x i 1 | x i , a i ) = a i Q i ( x i | x i , a i ) = 1 a i ,
and Q i ( x i | 0 , a i ) = q ( x i ) , where k = 0 q ( k ) = 1 with q ( k ) > 0 for each k X i . Let ν i denote the initial distribution of the buffer i, p 0 be some base value of the power, c 1 i ( x i , a i ) : = p 0 a i be the cost of power for node i and c 2 i ( x i , a i ) : = x i denote the delay cost for node i. As in Reference [12], we assume that the signal to interference ratio of mobile i denoted by S I R i is given by
SIR i ( ( x 1 , x 2 , , x N ) , ( a 1 , a 2 , , a N ) ) : = g i a i N 0 + j i , x j 0 r i j g j a j if x j > 0 = 0 , otherwise ,
where r i j are the coding orthogonality coefficients and N 0 is the thermal noise in the medium, g i > 0 are some constants. Let c 0 i ( x 1 , , x N , a 1 , , a N ) : = log 2 ( 1 + SIR i ( ( x 1 , x 2 , , x N ) , ( a 1 , a 2 , , a N ) ) ) denotes the cost that each mobile wants to minimize, and d 1 i , d 2 i denote the constraints of node i corresponding to c 1 i and c 2 i .
Condition 1.
(1) 
k = 1 k 2 q ( k ) < .
(2) 
There exists an interval [ 0 , s ¯ ] such that m ( s ) : = k = 0 e 2 s k q ( k ) < for each s [ 0 , s ¯ ] .
(3) 
There exists s ^ ( 0 , s ¯ ) such that x i X i e 2 s ^ x i ν i ( x i ) < for each i.
(4) 
There exists an action a [ δ , 1 ] such that p 0 a < d 1 i and 1 2 k = 1 k ( 1 + k ) q ( k ) a + k = 1 k q ( k ) < d 2 i for each i.
Proposition 2.
Under condition 1, Example 1 satisfies Assumptions 1, 2, 3, 5, 6 and 7. Hence, (by Theorem 1), there exists a stationary constrained Nash equilibrium.
Proof. 
For each i and φ i Π s i , let μ φ i i denote the invariant measure and φ i ( x i ) : = δ 1 a i φ i ( d a i | x i ) , which satisfies
μ φ i i ( y i ) = x i = 0 Q i ( y i | x i , φ i ) μ φ i i ( x i ) = [ 1 φ i ( y i ) ] μ φ i i ( y i ) + φ i ( y i + 1 ) μ φ i i ( y i + 1 ) + q ( y i ) μ φ i i ( 0 ) .
Hence, it follows from Condition 1(1) that
φ i ( y i ) μ φ i i ( y i ) = μ φ i i ( 0 ) q ( y i ) + μ φ i i ( 0 ) q ( y i + 1 ) + μ φ i i ( y i + 2 ) q ( y i + 2 ) = μ φ i i ( 0 ) k = 0 q ( y i + k ) ,
which implies that
μ φ i i ( y i ) = μ φ i i ( 0 ) k = 0 q ( y i + k ) φ i ( y i ) , and μ φ i i ( 0 ) = 1 1 + x i = 1 y i = 1 x i q ( x i ) φ ( y i ) .
Let us define β 1 : = 1 δ ( 1 1 e ) , the functions l ( x i ) : = I 0 ( x i ) and w i ( x i ) : = e s ^ x i and μ ( x i ) : = q ( x i ) for each x i X i . It is obvious that
Q i ( x i 1 | x i , φ i ) = φ i ( x i ) I 0 ( x i ) μ ( x i 1 ) , Q i ( x i | x i , φ i ) = 1 φ i ( x i ) I 0 ( x i ) μ ( x i ) , Q i ( y i | 0 , φ i ) = q ( y i ) I 0 ( 0 ) q ( y i ) .
If x i 1 ,
y i X i w i ( y i ) Q i ( y i | x i , a i ) = e s ^ ( x i 1 ) a i + e s ^ x i ( 1 a i ) = e s ^ x i ( 1 a i + 1 e a i )
and
y i X i w i ( y i ) Q i ( y i | 0 , a i ) = y i X i e s ^ y i q ( y i ) .
Thus, by Condition 1(2), it follows that
y i X i w i ( y i ) Q i ( y i | x i , a i ) β 1 w i ( x i ) + l ( 0 ) y i X i e s ^ y i q ( y i ) .
Hence, the model satisfies Assumption 6(3). Let β 2 : = 1 δ ( 1 1 e 2 s ^ ) and b 2 = y i X i e 2 s ^ y i q ( y i ) . Similarly, if x i > 0 ,
y i X i w i 2 ( y i ) Q i ( y i | x i , a i ) = e 2 s ^ ( x i 1 ) a i + e 2 s ^ x i ( 1 a i ) = e 2 s ^ x i ( 1 a i + a i e 2 s ^ ) e 2 s ^ x i β 2 ,
if x i = 0 ,
y i X i w i 2 ( y i ) Q i ( y i | 0 , a i ) = y i X i e 2 s ^ y i q ( y i ) .
Therefore, we have
y i X i w i 2 ( y i ) Q ( y i | x i , a i ) β 2 w i 2 ( x i ) + b 2 l ( 0 ) ,
for each x i 0 , which together with Condition 1(3) and Proposition 10.2.5 in Reference [17], implies Assumptions 2 and 5 with τ = 2 . Let φ i ( a | x i ) : = 1 for each i and φ Π s , it follows from (23) and Condition 1(1) that
V 2 i ( ν , [ φ i , φ i ] ) : = lim sup n 1 n E ˜ ν [ φ i , φ i ] t = 0 n 1 c 2 i ( x t i , a t i ) = y i = 1 y i μ φ i i ( y i ) = μ φ i i ( 0 ) y i = 1 k = 0 y i q ( y i + k ) φ i ( y i ) = μ φ i i ( 0 ) x i = 1 q ( x i ) y i = 1 x i y i φ i ( y i ) = μ φ i i ( 0 ) x i = 1 q ( x i ) x i ( 1 + x i ) 2 a = 1 2 k = 1 k ( 1 + k ) q ( k ) a + k = 1 k q ( k ) < d 2 i ,
which together with Condition 1(4) implies the Assumption 7. It is obvious that Assumptions 3 and 6(1)–(2) hold for this model. Thus, by Theorem 1, there exists a stationary constrained Nash equilibrium. □

6. Conclusions

In the present paper, we have studied the discrete-time constrained stochastic games with denumerable state space under the average cost criteria. By introducing the average occupation measures, we have established the best-response linear program and characterized the average occupation measures of stationary Nash equilibria as fixed points of ceratin multifunction. By introducing the so-called w-weak convergence topology, we have considered the asymptotic properties of average occupation measures and given the new existence condition of stationary constrained Nash equilibria. Furthermore, we have established a mathematical program and showed that each stationary Nash equilibria is a global minimizer of this mathematical program. It should be mentioned that the arguments in References [6,7] employ the compactness of the space of occupation measures with respect to the standard weak convergence topology and the solvability of best response linear program which can not apply directly for the case of unbounded costs and denumerable state space. However, the costs are usually unbounded and state space is not finite in some controlled stochastic models. Moreover, the fixed point method in the present paper is also different from the vanishing discount method in Reference [9] which is based on the existence of constrained discounted Nash equilibria. Finally, a wireless network has been presented to illustrate the applications of our main results.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 11801080).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shapley, L.S. Stochastic games. Proc. Natl. Acad. Sci. USA 1953, 39, 1095–1100. [Google Scholar] [CrossRef] [PubMed]
  2. Alvarez-Mena, J.; Hernández-Lerma, O. Existence of Nash equilibria for constrained stochastic games. Math. Methods Oper. Res. 2006, 63, 261–285. [Google Scholar] [CrossRef]
  3. Filar, J.; Vrieze, K. Competitive Markov Decision Processes; Springer: New York, NY, USA, 1997. [Google Scholar]
  4. Nowak, A.S. On stochastic games in economics. Math. Methods Oper. Res. 2007, 66, 513–530. [Google Scholar] [CrossRef]
  5. Yang, J.; Guo, X.P. Zero-sum stochastic games with average payoffs: New optimality conditions. Acta Math. Sin. Engl. Ser. 2009, 25, 1201–1216. [Google Scholar] [CrossRef]
  6. Altman, E.; Avrachenkov, K.; Bonneau, N.; Debbah, M.; EI-Azouzi, R.; Sadoc Menasche, D. Constrained cost-coupled stochastic games with independent state processes. Oper. Res. Lett. 2008, 36, 160–164. [Google Scholar] [CrossRef]
  7. Singh, V.V.; Hemachandra, N. A characterization of stationary Nash equilibria of constrained stochastic games with independent state processes. Oper. Res. Lett. 2014, 42, 48–52. [Google Scholar] [CrossRef]
  8. Altman, E.; Shwartz, A. Constrained Markov Games: Nash Equilibria. In Annals of the International Society of Dynamic Games: Advances in Dynamic Games and Applications; Filar, J.A., Gaitsgory, V., Mizukami, K., Eds.; Birkha¨user: Boston, MA, USA, 2000; Volume 5, pp. 213–221. [Google Scholar]
  9. Wei, Q.D.; Chen, X. Constrained stochastic games with the average payoff criteria. Oper. Res. Lett. 2015, 43, 83–88. [Google Scholar] [CrossRef]
  10. Zhang, W.Z.; Huang, Y.H.; Guo, X.P. Nonzero-sum constrained discrete-time Markov games: The case of unbounded costs. TOP 2014, 22, 1074–1102. [Google Scholar] [CrossRef]
  11. Niyato, D.; Wang, P.; Kim, D.I.; Han, Z.; Xiao, L. Game theoretic modeling of jamming attack in wireless powered communication networks. In Proceedings of the 2015 IEEE International Conference on Communications (ICC), London, UK, 8–12 June 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 6018–6023. [Google Scholar]
  12. Altman, E.; Avratchenkov, K.; Bonneau, N.; Debbah, M.; El-Azouzi, R.; Menasché, D.S. Constrained stochastic games in wireless networks. In Proceedings of the IEEE GLOBECOM 2007—IEEE Global Telecommunications Conference, Washington, DC, USA, 26–30 November 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 315–320. [Google Scholar]
  13. Ko, H.; Pack, S. Neighbor-aware energy-efficient monitoring system for energy harvesting internet of things. IEEE Internet Things J. 2019, 6, 5745–5752. [Google Scholar] [CrossRef]
  14. Puterman, M.L. Markov Decision Processes: Discrete Stochastic Dynamic Programming; Wiley: New York, NY, USA, 1994. [Google Scholar]
  15. Sennott, L.I. Stochastic Dynamic Programming and the Control of Queueing Systems; John Wiley & Sons: Hoboken, NJ, USA, 2009. [Google Scholar]
  16. Hernández-Lerma, O.; Lasserre, J.B. Discrete-Time Markov Control Processes: Basic Optimality Criteria; Springer: New York, NY, USA, 1996. [Google Scholar]
  17. Hernández-Lerma, O.; Lasserre, J.B. Further Topics on Discrete-Time Markov Control Processes; Springer: New York, NY, USA, 1999. [Google Scholar]
  18. Mendoza-Pérez, A. Pathwise Average Reward Markov Control Processes. Ph.D. Thesis, CINVE-STAV-IPN, Mexico City, Mexico, 2008. Available online: http://www.math.cinvestav.mx/ohernand_students (accessed on 22 October 2019).
  19. Nowak, A.S. Remarks on sensitive equilibria in stochastic games with additive reward and transition structure. Math. Methods Oper. Res. 2006, 64, 481–494. [Google Scholar] [CrossRef]
  20. Föllmer, H.; Schied, A. Stochastic Finance: An Introduction in Discrete Time; Walter de Gruyter: Berlin, Germany, 2004. [Google Scholar]
  21. Alvarez-Mena, J.; Hernández-Lerma, O. Convergence of the optimal values of constrained Markov control processes. Math. Methods Oper. Res. 2002, 55, 461–484. [Google Scholar] [CrossRef]
  22. Fan, K. Fixed-point and minimax theorems in locally convex topological linear spaces. Proc. Natl. Acad. Sci. USA 1952, 38, 121–126. [Google Scholar] [CrossRef] [PubMed] [Green Version]

Share and Cite

MDPI and ACS Style

Zhang, W. Discrete-Time Constrained Average Stochastic Games with Independent State Processes. Mathematics 2019, 7, 1089. https://doi.org/10.3390/math7111089

AMA Style

Zhang W. Discrete-Time Constrained Average Stochastic Games with Independent State Processes. Mathematics. 2019; 7(11):1089. https://doi.org/10.3390/math7111089

Chicago/Turabian Style

Zhang, Wenzhao. 2019. "Discrete-Time Constrained Average Stochastic Games with Independent State Processes" Mathematics 7, no. 11: 1089. https://doi.org/10.3390/math7111089

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop