Article

Stationary Bayesian–Markov Equilibria in Bayesian Stochastic Games with Periodic Revelation

Department of Economics, Rochester Institute of Technology, 92 Lomb Memorial Dr, Rochester, NY 14623, USA
Games 2024, 15(5), 31; https://doi.org/10.3390/g15050031
Submission received: 6 May 2024 / Revised: 26 August 2024 / Accepted: 9 September 2024 / Published: 11 September 2024

Abstract

I consider a class of dynamic Bayesian games in which types evolve stochastically according to a first-order Markov process on a continuous type space. Types are privately observed, but they become public together with actions when payoffs are obtained, resulting in delayed information revelation. In this environment, I show that there exists a stationary Bayesian–Markov equilibrium in which a player’s strategy maps a tuple of the previous type and action profiles and the player’s current type to a mixed action. The existence result can be extended to K-periodic revelation. I also offer a computational algorithm to find an equilibrium.

1. Introduction

In the parable of the blind men and the elephant1, individuals attempt to comprehend the concept of an elephant solely through the sense of touch. Each person, limited to exploring only a specific part of the elephant, such as the wriggling trunk, flapping ears, or swinging tail, reaches their own conclusions about the shape of an elephant. One perceives the elephant as resembling a snake, another as a fan, and yet another as a rope. The parable serves as an analogy to the situation where economic agents have incomplete perceptions about the true state of the world. Players choose actions based on their limited perception of the state of the economy, but, eventually, the true state of the world is revealed. Players’ actions accordingly yield rewards or penalties.2
To incorporate the above aspect of reality into a formal model, this paper considers a class of dynamic Bayesian games in which types evolve stochastically according to a first-order Markov process on a continuous type space (“Bayesian stochastic games”). It is well known that equilibria of stochastic games with a continuous state space are elusive. This paper overcomes the challenge with the Bayesian feature. Dynamic Bayesian games with serially correlated types, however, are notorious for the curse of dimensionality: the dimension of players’ beliefs grows over time, and, thus, equilibria of such games are generally not tractable. This paper introduces delayed revelation of private information, referred to as “periodic revelation”, to overcome the dimensionality issue of beliefs. Types remain private when players choose actions, but they are revealed alongside actions when payoffs are obtained. In this framework, there exists a class of stationary Markov perfect equilibria. The game structure and equilibrium concept can be applied to the analysis of dynamic oligopoly with asymmetric information or of many other economic environments with a delay in information revelation.
Suppose, at each time $t$, a player has a limited piece of information about the true state of the world. It is defined as the type of the player in the $t$-stage Bayesian game. The type profile of all the players is the true state of the economy, and it is hidden when players choose actions. But type and action profiles are eventually revealed to everyone at the end of the stage when individual payoffs of the stage Bayesian game are obtained. Toward the next period, the type profile stochastically evolves according to a first-order Markov process based on the revealed type and action profiles. In particular, the previous type and action profiles $(s_{t-1}, a_{t-1})$ are publicly known at the beginning of time $t$, and all the players know the probability distribution over the new type profile, $\Pr(s_t \mid s_{t-1}, a_{t-1})$, as common knowledge.
To see the structure in a familiar setting, consider a duopoly playing a dynamic Cournot competition. At time $t$, firm $i$ learns its cost type $c_i$, which is drawn from a compact interval in the real line, $[0, \alpha]$. However, firm $i$ does not know firm $j$’s cost type, and vice versa. Firm $i$ is aware that the cost type profile, which is the state of the economy, evolves stochastically over time according to a first-order Markov process, and that the stochastic process is based on the previous type and action profiles. Suppose the previous stage types and actions are publicly known, say, after the financial statements of firms are revealed. Then, firm $i$ can have a belief over firm $j$’s current cost type based on the previous type profile and action profile. Conditional on their private cost types and beliefs about each other, firms choose optimal quantities to produce. Although similar approaches have appeared frequently in the dynamic Bayesian games literature, to the best of my knowledge, the models have suffered from the curse of dimensionality due to the history-dependent beliefs.3 This paper, by contrast, allows players to have time-invariant beliefs as long as the previous actions and types are the same under periodic revelation. Also, the models have never been treated in a continuous type space under a first-order Markov process. This paper shows existence of a class of stationary Markov perfect equilibria in this environment, which are termed “stationary Bayesian–Markov equilibria”.
Developing a framework for analyses of dynamic oligopolies with asymmetric information has been an open question for a long time due to the difficulty of dealing with the beliefs. Fershtman and Pakes [4] propose a framework for dynamic games with asymmetric information over a discrete type space focusing on empirical tractability. Their theoretical equilibrium concept is history-dependent, but they cleverly detour the dimensionality issue with the assumption that accumulated data include all the information about the history.4 Cole and Kocherlakota [5] and Athey and Bagwell [6], respectively, suggest new equilibrium concepts in a class of dynamic Bayesian games in which players’ beliefs are a function of the full history of public information.5 Hörner, Takahashi, and Vieille [7] characterize a subset of equilibrium payoffs, focusing on the case where players report their private information truthfully in dynamic stochastic Bayesian games.6
The key distinction of this paper from existing literature in dynamic Bayesian games is to add periodic revelation to mitigate the dimensionality problem. In the literature, it is conventional to formulate that private information remains hidden and never becomes public. If the game is under the serially correlated type evolution, without periodic revelation, players formulate their beliefs about other players’ current type based on the history of available past information. Then, the dimension of each player’s beliefs exponentially grows over time.7 Periodic revelation in this paper, however, enables players to have common prior for the current stage type distribution based on the revealed information, and, thus, players have time-invariant beliefs as long as the revealed information is the same regardless of the calendar time, which are consistent with the underlying type evolution. This helps to establish the stationary equilibrium concept even when types are serially correlated. Practically, it captures the economic environments where private information is periodically disclosed, either by legal requirements or voluntarily. Therefore, the model can be applied to dynamic oligopolies under the periodic disclosure requirement on the firms’ financial performance, e.g., Form 10-K in the U.S., or to the setting of Barro and Gordon [8] to analyze the effects of release of the Federal Open Market Committee (FOMC) transcripts with a five-year delay (See Ko and Kocherlakota [9]).
Levy and McLennan [10] show that, for stochastic games with complete information in a continuous state space, existence of stationary Markov equilibria is not guaranteed. In such environments, Nowak and Raghavan [11] show that there exists a stationary correlated equilibrium. Duggan [12] shows that there exists a stationary Markov equilibrium in cases where the state variable has an additional random component. The random component, so-called noise, can be viewed as an embedded randomization device for each state. In this perspective, a stationary Markov equilibrium in noisy stochastic games can be seen isomorphic to a stationary correlated equilibrium by Nowak and Raghavan [11]. Barelli and Duggan [13] formulate a dynamic stochastic game in which players’ strategies depend on the past and current information to show that Nowak and Raghavan’s stationary correlated equilibrium can be uncorrelated. By contrast, this paper considers the case where players have private information: the state is defined by a type profile of the game of incomplete information. As the players compute their interim payoff by integrating possible scenarios over their beliefs, the convexity of the set of interim payoffs is naturally obtained. It is more realistic that the convexity is obtained from the Bayesian structure, compared with previous approaches that utilize public randomization devices (Nowak and Raghavan [11]) or random noise (Duggan [12]).
The concept of the stationary Bayesian–Markov equilibrium is closely related to a stationary correlated equilibrium in Nowak and Raghavan [11]. That is, in a Bayesian game, players have beliefs about the type profile of their game, which is the state of the economy. Players then choose actions to maximize their expected payoffs considering their individual beliefs about the state of the economy. According to Aumann [14], a resulting Bayesian equilibrium can be regarded as a correlated equilibrium distribution, utilizing the collection of beliefs as a randomization device.
To prove existence of stationary Markov equilibria in conventional stochastic games, it is common to consider an induced game of the original stochastic game. The induced game is defined by a stage game that is indexed by the state of the world and a profile of continuation value functions. Then, the next step is to find a fixed point of the expected continuation value function in the induced game. Finally, one may apply the generalized implicit function theorem and a measurable selector theorem to extract an equilibrium strategy profile. During this process, it is crucial to have a convex set of the expected continuation value functions in order to ensure existence of a fixed point. Therefore, the key to the proof is convexification of the set of continuation values in each state of the economy. Nowak and Raghavan [11] introduce a public randomization device, a so-called sunspot, as a means of convexification. In this paper, the idea of convexification is extended to the case where each player obtains private information according to a common prior that depends on the past information from periodic revelation.
The model of a Bayesian stochastic game with periodic revelation is formally described in Section 2. Section 3 contains the existence theorem and the proof. Section 4 is devoted to a computational algorithm. Section 5 sketches the extension of proof to K-periodic revelation and concludes. In the Supplementary Materials, I present an application, specifically an incomplete information version of an innovation race between two pharmaceutical companies with periodic revelation.

2. The Basic Model

2.1. The Primitives

I use the superscript “$+$” (resp. “$-$”) to denote the next period (resp. the previous period). A discounted Bayesian stochastic game with periodic revelation is a tuple,
$$(I, ((S_i, \mathcal{S}_i), X_i, A_i, u_i, \delta_i, \mu, \eta)_{i \in I}, \tau),$$
such that
  • $I = \{1, \ldots, n\}$ is a finite set of $n$ players, and for each $i \in I$,
  • $(S_i, \mathcal{S}_i)$ is a measurable space of player $i$’s types, regardless of the calendar time, i.e., $S_i^+ = S_i$,
  • $(X_i, \mathcal{X}_i)$ is a measurable space of player $i$’s actions,
  • $A_i : S_i \rightrightarrows X_i$ is the feasible action correspondence,
  • $u_i : S \times X \to \mathbb{R}$ is the payoff function, where $S = \prod_{i \in I} S_i$ and $X = \prod_{i \in I} X_i$,
  • $\delta_i \in [0, 1)$ is the discount factor,
  • $\mu : S \times X \times \mathcal{S}_i \to [0, 1]$ is a transition function8,
  • $\eta : S \times X \times S_i \times \mathcal{S}_{-i} \to [0, 1]$ is a transition function,
  • $\tau : S \times X \times \mathcal{S} \to [0, 1]$ is a transition function,
  • It is an infinite horizon game.
Let $\Delta(\cdot)$ denote the set of probability measures. I assume that
(A1) For each $i \in I$, $S_i$ is a Borel subset of a complete separable metric space, and $\mathcal{S}_i$ is its Borel $\sigma$-algebra. Endowed with the product topology, the Cartesian product $S$ is a Borel subset of a complete separable metric space. A product of $\sigma$-algebras $\mathcal{S} = \mathcal{S}_1 \times \cdots \times \mathcal{S}_n$ is its Borel $\sigma$-algebra (M10, Billingsley [16], p. 254).
(A2) For each $i \in I$, there is an atomless probability measure $\phi_i$ such that $(S_i, \mathcal{S}_i, \phi_i)$ is a complete measure space of player $i$’s types; $\phi$ is a product probability measure such that $\phi = \phi_1 \times \cdots \times \phi_n$.9
(A3) For each $i \in I$, $X_i$ is a compact metric space; $\mathcal{X}_i$ is its Borel $\sigma$-algebra. Endowed with the product topology, the finite Cartesian product $X$ is a compact metric space and $\mathcal{X}$ is its Borel $\sigma$-algebra. A typical element is denoted by $a \in X$. There is a measure $\kappa$ such that $(X, \mathcal{X}, \kappa)$ is a complete measure space.
(A4) For each $i$, define $T_i \equiv S \times X \times S_i$. A typical element is denoted by $(s^-, a^-, s_i)$ or $t_i$. Notice that $T_i$ is a complete separable metric space. Let $\mathcal{T}_i$ be its Borel $\sigma$-algebra. There is an atomless probability measure $\lambda_i$ such that $(T_i, \mathcal{T}_i, \lambda_i)$ is a complete measure space. Endowed with the product topology, the Cartesian product $T$ is also a complete separable metric space and a product of $\sigma$-algebras $\mathcal{T}$ is its Borel $\sigma$-algebra; $\lambda$ is a product probability measure such that $\lambda = \lambda_1 \times \cdots \times \lambda_n$.
(A5) For each $i \in I$, $A_i$ is nonempty, compact valued, and lower measurable.10
(A6) The payoff function $u_i(\cdot, \cdot)$ is bounded; there exists $C_i \in \mathbb{R}_+$ such that for each $(s, a)$, $|u_i(s, a)| \le C_i$. For each $a \in X$, $u_i(\cdot, a)$ is measurable; for each $s \in S$, $u_i(s, \cdot)$ is continuous.
(A7) For each $(s^-, a^-)$, there is a prior distribution for Nature, $\tau(\cdot \mid s^-, a^-) \in \Delta(S)$, about the current type profile $s$. For each $Z \in \mathcal{S}$, $\tau(Z \mid \cdot, \cdot)$ is jointly measurable; $\tau(\cdot \mid s^-, a^-)$ is absolutely continuous with respect to the atomless measure $\phi$.
(A8) For each $(s^-, a^-, s_i)$, there are beliefs $\eta(\cdot \mid s^-, a^-, s_i) \in \Delta(S_{-i})$ about the other players’ current types. This is the $s_i$-section of $\tau(\cdot \mid s^-, a^-)$. Given $(s^-, a^-)$ and for each $s_i$, the mapping $(s^-, a^-, s_i) \mapsto \eta(\cdot \mid s^-, a^-, s_i)$ is a regular conditional probability on $S_{-i}$. For each $Z_{-i}$, $\eta(Z_{-i} \mid \cdot, \cdot, \cdot)$ is jointly measurable; $\eta(\cdot \mid s^-, a^-, s_i)$ is absolutely continuous with respect to the atomless product measure $\phi_1 \times \cdots \times \phi_{i-1} \times \phi_{i+1} \times \cdots \times \phi_n$.
(A9) For each $(s, a)$, for each player $i$, there is an anticipation $\mu(\cdot \mid s, a) \in \Delta(S_i)$ about player $i$’s own future type. This is a marginal distribution derived from $\tau(\cdot \mid s, a)$ (note that $S_i^+ = S_i$).11 For each $Z_i \in \mathcal{S}_i$, $\mu(Z_i \mid s, a)$ is jointly measurable in $(s, a)$; $\mu(\cdot \mid s, a)$ is absolutely continuous with respect to the complete, atomless measure $\phi_i$. For $\phi$-almost all $s$, the mapping $a \mapsto \mu(\cdot \mid s, a)$ is norm-continuous.
(A10) For each $i$, $\tau$ is decomposed into $\mu$ and $\eta$: for each $(s, a) \in S \times X$, and all $Z^+ \in \mathcal{S}$, I have the following:
$$\tau(Z^+ \mid s, a) = \int_{s_i^+} \int_{s_{-i}^+} I_{Z^+}(s_i^+, s_{-i}^+) \, \eta(d s_{-i}^+ \mid s, a, s_i^+) \, \mu(d s_i^+ \mid s, a).$$
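The decomposition in (A10) can be checked numerically on a discretized type space. The following is a minimal sketch, assuming a hypothetical two-player game with a two-point type grid and made-up probabilities for one fixed conditioning pair; the names `marginal_mu`, `conditional_eta`, and `recompose` are illustrative and do not come from the paper. It computes the marginal (the analogue of $\mu$) and the conditional (the analogue of $\eta$) from a joint distribution (the analogue of $\tau$) and rebuilds the joint from them.

```python
from itertools import product

# Hypothetical two-point type grid; the paper itself uses continuous types.
TYPES = (0, 1)

# tau[(s1, s2)]: joint probability of the next type profile for one fixed
# conditioning pair (s, a). The numbers are made up for illustration.
tau = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}


def marginal_mu(joint, i):
    """Analogue of mu: marginal distribution of player i's own next type."""
    mu = {t: 0.0 for t in TYPES}
    for s, p in joint.items():
        mu[s[i]] += p
    return mu


def conditional_eta(joint, i, own_type):
    """Analogue of eta: belief about the other player's type given own type."""
    j = 1 - i
    mass = sum(p for s, p in joint.items() if s[i] == own_type)
    eta = {t: 0.0 for t in TYPES}
    for s, p in joint.items():
        if s[i] == own_type:
            eta[s[j]] += p / mass
    return eta


def recompose(joint, i):
    """Rebuild the joint as eta times mu, mirroring the identity in (A10)."""
    mu = marginal_mu(joint, i)
    rebuilt = {}
    for s in product(TYPES, repeat=2):
        eta = conditional_eta(joint, i, s[i])
        rebuilt[s] = eta[s[1 - i]] * mu[s[i]]
    return rebuilt
```

Calling `recompose(tau, 0)` or `recompose(tau, 1)` returns the original joint distribution up to floating-point error, whichever player's perspective is used.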
Example 1.
Consider a dynamic Bayesian Cournot competition as follows. There are two firms, and these firms face unknown production cost types each period. Other than the cost type structure, the components of Cournot competition apply as usual. The price of the good is determined by a market demand function:
$$p(q_1, q_2) = \alpha - q_1 - q_2, \qquad q_1, q_2 \in [0, \alpha] \text{ and } \alpha > 1.$$
The condition $q_1, q_2 \in [0, \alpha]$ helps the price remain above 0. The payoff functions $u_1$ and $u_2$ are as follows:
$$u_1(c_1, c_2, q_1, q_2) = (p - c_1) \cdot q_1 \quad \text{and} \quad u_2(c_1, c_2, q_1, q_2) = (p - c_2) \cdot q_2.$$
The key distinction of the model lies in the cost type structure. Assume the cost type space of player $i$ is given by $[0, \alpha]$. Consider the square with vertices $(0, 0)$, $(0, \alpha)$, $(\alpha, 0)$, and $(\alpha, \alpha)$, called Region Z. Now, the cost types of firms are drawn from a joint probability distribution that is determined by the previous cost levels and the previous production levels of the firms: the shape of the joint probability distribution over Region Z is determined by $c_1^-$, $c_2^-$, $q_1^-$, and $q_2^-$.12 For a specific example, assume the cost types follow a joint distribution over Region Z:
$$(c_1, c_2) \sim F_{c_1^-, c_2^-, q_1^-, q_2^-}(\text{Region } Z).$$
Any joint probability measure with respect to ( c 1 , c 2 ) is allowed as a cost type structure in this sample model if the following conditions are satisfied: (1) the joint probability measure has [ 0 , α ] × [ 0 , α ] (Region Z) as its support; (2) it is differentiable with respect to c i over the type space [ 0 , α ] ; and (3) it is continuous with respect to q i over the action space [ 0 , α ] . For example, for each i, consider a concave quadratic function f i ( c i c 1 , c 2 , q 1 , q 2 ) R + , which is peaked at α · ( q 1 + q 2 ) 2 / 2 ( c 1 + c 2 ) / 2 and goes through zeros at c i = 0 or c i = α , i.e., f i ( 0 c 1 , c 2 , q 1 , q 2 ) = 0 or f i ( α c 1 , c 2 , q 1 , q 2 ) = 0 , as well as of which area is one (1). Compose a joint probability distribution by the product of the aforementioned functions f ( c 1 c 1 , c 2 , q 1 , q 2 ) × f ( c 2 c 1 , c 2 , q 1 , q 2 ) .13 Then, the support of the joint distribution is the square made of four points, ( 0 , 0 ) , ( 0 , α ) , ( α , 0 ) , and ( α , α ) , and it satisfies ( A 8 ) . The joint probability distribution can be more general, not necessarily being a product of two marginal distributions. A uniform joint distribution over Region Z is also allowed.
In this example, the set of player indices is $I = \{1, 2\}$. The costs of firms $c_1, c_2$ are considered as types $s_i$ in the above model setup, and the type space for each firm $S_i$ is given by a closed interval on the real line, $[0, \alpha]$. Then, it satisfies the condition (A1) of a complete separable metric space. The produced quantities $q_1, q_2$ are actions $a_i$ in the above model setup. The action space for each firm is given by a compact metric space on the real line, $q_1, q_2 \in [0, \alpha]$. As the type and action spaces are all on the real line, (A2), (A3), and (A4) are fulfilled. Assumption (A5) is satisfied as long as, for example, the pure action strategies $q_1 : c_1 \mapsto [0, \alpha]$ and $q_2 : c_2 \mapsto [0, \alpha]$ are continuous in $c_1$ and $c_2$, respectively. As $q_1$ and $q_2$ are bounded, the market price $p = \alpha - q_1 - q_2$ is bounded. Then, it is clear that the payoff function for firm $i$, for each $(q_1, q_2)$, $u_i(\cdot, q_1, q_2)$, is bounded and continuous. Also, given $(c_1, c_2)$, $u_i(c_1, c_2, \cdot)$ is integrable.
The probability measures $\tau(\cdot)$, $\mu(\cdot)$, $\eta(\cdot)$ in this example are as follows: from player 1’s perspective,14
$$\begin{aligned} \tau(\cdot \mid s^-, a^-) \in \Delta(S) &: \quad (c_1, c_2) \sim F_{c_1^-, c_2^-, q_1^-, q_2^-}(\cdot), \\ \eta(\cdot \mid s^-, a^-, s_i) \in \Delta(S_{-i}) &: \quad c_2^e \sim F_{c_1^-, c_2^-, q_1^-, q_2^-}(\hat{c}_1), \\ \mu(\cdot \mid s^-, a^-) \in \Delta(S_i) &: \quad c_1 \sim \int_{c_2} F_{c_1^-, c_2^-, q_1^-, q_2^-} \, d c_2, \end{aligned}$$
where $c_2^e$ is what player 1 believes to be player 2’s type when player 1’s cost type is $\hat{c}_1$. Since the model is on the real line, assumptions (A7)–(A10) are satisfied as long as the probability function (1) has Region Z as its support, (2) is differentiable with respect to $c_i$ over $[0, \alpha]$, and (3) is continuous with respect to $q_i$ over $[0, \alpha]$. By contrast, if Region Z were a triangle with vertices $(0, 0)$, $(0, \alpha)$, and $(\alpha, 0)$, it would violate (A8) because the conditional distribution for $c_{-i}$ given $c_i$ has a jump on the cost type space $[0, \alpha]$. I keep using this example to describe the equilibrium concept and the sketch of the proof.

2.2. Timing

Suppose the game reaches the middle of the stage of time $(t-1)$. Players choose actions $a^-$ given their own types without knowing the others’ types. However, the payoff function for player $i$ depends on the types and actions of all players. At the end of the previous period, the type profile and action profile $(s^-, a^-)$ are revealed and the payoff $u_i(s^-, a^-)$ is given to player $i$ for all $i$. Next, the time-$t$ stage game proceeds as follows (Figure 1 depicts the timeline):
  • Nature moves to draw each player’s type based on the Markov process $\tau(\cdot \mid s^-, a^-)$ for all $i$.
  • For each $i$, player $i$ whose type is $s_i$ chooses actions based on their beliefs $\eta(\cdot \mid s^-, a^-, s_i)$, maximizing their discounted sum of expected payoffs.
  • At the end of the current period, the realized type profile and action profile $(s, a)$ are revealed, and player $i$ earns payoff $u_i(s, a)$. Nature moves to draw each player’s next period type based on the Markov process $\tau(\cdot \mid s, a)$ for all $i$.
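The timing above can be sketched as a simulation loop. This is only an illustration under stated assumptions: the demand intercept `ALPHA`, the triangular transition in `draw_types` (a stand-in for Nature's draw), and the rule in `strategy` are all hypothetical placeholders, and the strategy is not claimed to be an equilibrium of the Cournot example.

```python
import random

ALPHA = 2.0  # hypothetical demand intercept (the example only requires alpha > 1)


def draw_types(prev_types, prev_actions, rng):
    """Stand-in for Nature's draw: costs on [0, ALPHA] whose mode depends on
    the publicly revealed previous types and actions (an assumed functional form)."""
    mode = 0.5 * (sum(prev_types) / 2 + (ALPHA - sum(prev_actions) / 2))
    return tuple(rng.triangular(0.0, ALPHA, mode) for _ in prev_types)


def strategy(prev_types, prev_actions, own_type):
    """Placeholder stationary strategy: it may depend on the revealed previous
    profiles and on the player's own current type only. Not an equilibrium."""
    return max(0.0, (ALPHA - own_type) / 3.0)


def payoff(i, types, actions):
    """Cournot stage payoff (p - c_i) * q_i with p = ALPHA - q_1 - q_2."""
    price = ALPHA - sum(actions)
    return (price - types[i]) * actions[i]


def simulate(periods, seed=0):
    rng = random.Random(seed)
    # Publicly known outcome of the previous stage (arbitrary starting values).
    prev_types, prev_actions = (1.0, 1.0), (0.5, 0.5)
    history = []
    for _ in range(periods):
        # Step 1: Nature draws the current type profile.
        types = draw_types(prev_types, prev_actions, rng)
        # Step 2: each player acts knowing only their own current type.
        actions = tuple(strategy(prev_types, prev_actions, c) for c in types)
        # Step 3: types and actions are revealed and payoffs are realized.
        payoffs = tuple(payoff(i, types, actions) for i in range(2))
        history.append((types, actions, payoffs))
        prev_types, prev_actions = types, actions
    return history
```

Each entry of `simulate(T)` records one stage: the type profile, the action profile, and the payoffs that become public at the end of that stage and condition the next draw.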

2.3. Stationary Bayesian–Markov Equilibrium

A stationary Bayesian–Markov strategy for player $i$ is a measurable mapping $\sigma_i : S \times X \times S_i \to \Delta(X_i)$. For each $(s^-, a^-, s_i)$, the probability measure $\sigma_i(s^-, a^-, s_i)$ assigns probability one to $A_i(s_i)$. Let $\Sigma_i$ denote the set of stationary Bayesian–Markov strategies:
$$\Sigma_i = \{ \sigma_i \mid \sigma_i \in \mathcal{M}(S \times X \times S_i, \Delta(X_i)), \ \sigma_i(s^-, a^-, s_i)(A_i(s_i)) = 1 \}.$$
For each $s \in S$, let $\sigma(s^-, a^-, s)$ denote the product probability measure $\sigma_1(s^-, a^-, s_1) \times \cdots \times \sigma_n(s^-, a^-, s_n)$.15 In addition, $\sigma$ denotes a profile of mappings $(\sigma_1, \ldots, \sigma_n)$ and $\Sigma$ denotes the set of stationary Bayesian–Markov strategy profiles $\sigma$.
For each $\sigma$, player $i$’s interim expected continuation value function $v_i(\cdot \mid \sigma) : S \times X \times S_i \to \mathbb{R}$ is a measurable function. To define the interim expected continuation value function $v_i$, remember that player $i$ with the current type $s_{i,t}$ only has beliefs over the current realization $(s_t, a_t)$ and over the future type $s_{i,t+1}$. Then, player $i$ will again imagine that, in the next period $(t+1)$, if they learn their type is $s_{i,t+1}$, they will have new beliefs over the type and action profile realization $(s_{t+1}, a_{t+1})$ and the future type in period $(t+2)$, $s_{i,t+2}$. That is, the interim expected continuation value function relies on the expectation over the current beliefs of player $i$ and their conjecture over the future beliefs conditional on the current beliefs. Therefore, the value function demonstrates a fractal structure in which there is an expectation within an expectation ad infinitum. To see this clearly, denote $E_{s_{-i,t}, a_t}[u_i(s_t, a_t)]$ as follows when player $i$’s belief over the others’ types and the realized action profile $(s_{-i,t}, a_t)$ is given by $\eta(s_{-i,t} \mid s_{t-1}, a_{t-1}, s_i) \times \sigma(s_{t-1}, a_{t-1}, s_t)$:
$$E_{s_{-i,t}, a_t}\big[u_i(s_t, a_t); s_{i,t}, \eta, \sigma\big] = \int_{s_{-i,t}} \int_{a_t} u_i(s_t, a_t) \, \sigma(s_{t-1}, a_{t-1}, s_t)(da) \, \eta(d s_{-i,t} \mid s_{t-1}, a_{t-1}, s_{i,t}).$$
In addition, denote $E_{s_{i,t+1}, s_{-i,t+1}, a_{t+1}}\big[u_i(s_{t+1}, a_{t+1}); \mu, \eta, \sigma\big]$ as follows to express the ex ante expected payoff at the beginning of time $(t+1)$, which is before player $i$ learns their type $s_{i,t+1}$:
$$\begin{aligned} E_{s_{i,t+1}, s_{-i,t+1}, a_{t+1}}\big[u_i(s_{t+1}, a_{t+1}); \mu, \eta, \sigma\big] &= \int_{s_{i,t+1}} \int_{s_{-i,t+1}} \int_{a_{t+1}} u_i(s_{t+1}, a_{t+1}) \, \sigma(s_t, a_t, s_{t+1})(da) \times \eta(d s_{-i,t+1} \mid s_t, a_t, s_{i,t+1}) \, \mu(d s_{i,t+1} \mid s_t, a_t) \\ &= \int_{s_{i,t+1}} E_{s_{-i,t+1}, a_{t+1}}\big[u_i(s_{t+1}, a_{t+1}); \eta, \sigma\big] \, \mu(d s_{i,t+1} \mid s_t, a_t). \end{aligned}$$
Then, the interim expected continuation value function is, for each $(s_{t-1}, a_{t-1}, s_{i,t})$,16
$$\begin{aligned} v_i(s_{t-1}, a_{t-1}, s_{i,t} \mid \sigma) &= E_{s_{-i,t}, a_t}\Big[(1-\delta_i) u_i(s_t, a_t) + \delta_i E_{s_{i,t+1}, s_{-i,t+1}, a_{t+1}}\big[(1-\delta_i) u_i(s_{t+1}, a_{t+1}) + \delta_i E_{s_{i,t+2}, s_{-i,t+2}, a_{t+2}}[(1-\delta_i) u_i(s_{t+2}, a_{t+2}) + \cdots]\big]\Big] \\ &= E_{s_{-i,t}, a_t}\Big[(1-\delta_i) u_i(s_t, a_t) + \delta_i \int_{s_{i,t+1}} E_{s_{-i,t+1}, a_{t+1}}\big[(1-\delta_i) u_i(s_{t+1}, a_{t+1}) + \cdots\big] \, \mu(d s_{i,t+1} \mid s_t, a_t)\Big]. \end{aligned}$$
As Equation (11) is a fractal, it is more convenient to express it by recursion: for each $(s^-, a^-, s_i)$,
$$v_i(s^-, a^-, s_i \mid \sigma) = \int_{s_{-i}} \int_a \Big[(1-\delta_i) u_i(s, a) + \delta_i \int_{s_i^+} v_i(s, a, s_i^+ \mid \sigma) \, \mu(d s_i^+ \mid s, a)\Big] \sigma(s^-, a^-, s)(da) \, \eta(d s_{-i} \mid s^-, a^-, s_i).$$
A profile of stationary Bayesian–Markov strategies $\sigma$ is a stationary Bayesian–Markov equilibrium if, for each $(s^-, a^-, s)$, each player $i$’s strategy $\sigma_i$ maximizes $i$’s interim expected continuation values. That is, given $(s^-, a^-)$, for each $s_i$, $\sigma_i(s^-, a^-, s_i)$ puts probability one on the set of solutions to
$$\max_{a_i \in A_i(s_i)} \int_{s_{-i}} \int_{a_{-i}} \Big[(1-\delta_i) u_i(s, a) + \delta_i \int_{s_i^+} v_i(s, a, s_i^+ \mid a_i, \sigma_{-i}) \, \mu(d s_i^+ \mid s, a)\Big] \times \sigma_{-i}(s^-, a^-, s_{-i})(d a_{-i}) \, \eta(d s_{-i} \mid s^-, a^-, s_i).$$
By the one-shot deviation principle, every stationary Bayesian–Markov equilibrium is subgame perfect.
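For a fixed strategy profile, the recursion for the interim expected continuation value can be evaluated by repeated substitution on a finite grid, since the right-hand side is a contraction with modulus equal to the discount factor. The sketch below assumes a hypothetical two-player game with two-point type and action grids, types that evolve independently across players (so beliefs about the opponent do not depend on one's own type), a made-up payoff function, and a fixed pure strategy; none of these specifics come from the paper.

```python
from itertools import product

TYPES = (0, 1)    # low / high cost (hypothetical grid)
ACTIONS = (0, 1)  # low / high output (hypothetical grid)
DELTA = 0.9       # discount factor (assumed value)


def p_own(next_type, cur_type, cur_action):
    """Assumed kernel: a player's type persists with probability 0.6 or 0.8."""
    stay = 0.6 + 0.2 * cur_action
    return stay if next_type == cur_type else 1.0 - stay


def payoff(i, s, a):
    """Assumed stage payoff, a linear-demand Cournot payoff on the grid."""
    price = 3.0 - a[0] - a[1]
    return (price - s[i]) * a[i]


def sigma(i, s_prev, a_prev, s_i):
    """Fixed pure stationary strategy: high output if and only if cost is low."""
    return 1 - s_i


def bellman(i, v):
    """Apply the recursion for v_i once, holding the strategy profile fixed."""
    j = 1 - i
    new_v = {}
    for s_prev, a_prev, s_i in v:
        total = 0.0
        for s_j in TYPES:  # integrate over beliefs about the opponent's type
            s = (s_i, s_j) if i == 0 else (s_j, s_i)
            a = tuple(sigma(k, s_prev, a_prev, s[k]) for k in (0, 1))
            # Integrate the continuation value over own next-period type.
            cont = sum(p_own(t, s[i], a[i]) * v[(s, a, t)] for t in TYPES)
            weight = p_own(s_j, s_prev[j], a_prev[j])
            total += weight * ((1 - DELTA) * payoff(i, s, a) + DELTA * cont)
        new_v[(s_prev, a_prev, s_i)] = total
    return new_v


def evaluate(i, tol=1e-10, max_iter=10_000):
    """Iterate the recursion from v = 0 until successive iterates are close."""
    keys = [(s, a, t) for s in product(TYPES, repeat=2)
            for a in product(ACTIONS, repeat=2) for t in TYPES]
    v = dict.fromkeys(keys, 0.0)
    for _ in range(max_iter):
        new_v = bellman(i, v)
        gap = max(abs(new_v[k] - v[k]) for k in keys)
        v = new_v
        if gap < tol:
            break
    return v
```

Here `evaluate(0)` returns a dictionary keyed by (previous type profile, previous action profile, own current type), which is the discretized counterpart of the domain of the value function. This evaluates a given profile only; finding an equilibrium additionally requires the maximization step in the definition above.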
Example 2.
Continuing the above dynamic Bayesian Cournot competition, the strategy for each player can be any mapping from a cost type of player $i$ to a probability distribution over a subset of the action space, $q_i \in [0, \alpha]$, e.g., $c_i \mapsto \Delta(A_i)$, where $A_i \subseteq [0, \alpha]$, given the previous $c_1^-, c_2^-, q_1^-, q_2^-$. Among possible strategies, the equilibrium strategy is obtained by maximization of the interim expected continuation value function. Recall that the stage payoff function of player 1 is $u_1(c_1, c_2, q_1, q_2) = (\alpha - q_1 - q_2) \cdot q_1 - c_1 \cdot q_1$, and assume $A_i(c_i)$ is a singleton, that is, $q_i(c_i)$ is a pure action strategy. Solving the following dynamic optimization, the equilibrium strategy for player 1 is given by
$$\begin{aligned} q_1(c_1^-, c_2^-, q_1^-, q_2^-, c_1) = \arg\max_{q_1} \int_{c_2} \int_{q_2} \Big[ & (1-\delta_1)\big((\alpha - q_1 - q_2) \cdot q_1 - c_1 \cdot q_1\big) \\ & + \delta_1 \int_{c_1^+} \big((\alpha - q_1^+ - q_2^+) \cdot q_1^+ - c_1^+ \cdot q_1^+\big) \, \mu(d c_1^+ \mid c_1, c_2, q_1, q_2) \\ & + \delta_1^2 \int_{c_1^{++}} \big((\alpha - q_1^{++} - q_2^{++}) \cdot q_1^{++} - c_1^{++} \cdot q_1^{++}\big) \, \mu(d c_1^{++} \mid c_1^+, c_2^+, q_1^+, q_2^+) + \cdots \Big] \\ & \times \mathbf{1}_{q(c_2)}(d q_2) \times \eta(d c_2 \mid c_1^-, c_2^-, q_1^-, q_2^-, c_1). \end{aligned}$$
Then, by taking derivatives and solving the system of equations, for each q i
q i ( c i c 1 , c 2 , q 1 , q 2 ) = 1 3 ( 1 δ ) 2 1 δ α c 1 1 δ α E c 2 + δ E E c 2 + · q 2 + q 2 2 δ E c 1 + · q 1 + q 1 + 1 δ 2 c 1 E c 1 + 1 δ 2 E c 1 + · q 1 + q 1 E E c 1 + · q 1 + q 1
Observe that the equilibrium strategy is forward-looking: $q_i$ takes into account the variation in the future cost type distribution due to changes in $q_i$. However, it only cares about the next period because the optimization problem is recursive. Using this recursiveness, the existence theorem for the continuous space dynamic stochastic game is presented in the next section.
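As a sanity check on the structure of the first-order conditions, it may help to look at a myopic benchmark. The following is a sketch under assumptions that are not part of the example itself: $\delta_1 = 0$, an interior solution, and firm 2 following some pure strategy $q_2(\cdot)$. In that case the continuation terms drop out and firm 1 solves a static problem against its beliefs $\eta$:

```latex
\max_{q_1} \int_{c_2} \Big[ \big(\alpha - q_1 - q_2(c_2)\big) q_1 - c_1 q_1 \Big]
  \, \eta(d c_2 \mid c_1^-, c_2^-, q_1^-, q_2^-, c_1)
\;\Longrightarrow\;
\alpha - 2 q_1 - E\big[q_2 \mid c_1^-, c_2^-, q_1^-, q_2^-, c_1\big] - c_1 = 0
\;\Longrightarrow\;
q_1 = \tfrac{1}{2}\Big(\alpha - c_1 - E\big[q_2 \mid c_1^-, c_2^-, q_1^-, q_2^-, c_1\big]\Big).
```

With $\delta_1 > 0$, the first-order condition additionally contains the derivative of the discounted continuation value with respect to $q_1$, which is where the dependence of the future cost distribution on current output enters.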

3. Existence Theorem

Theorem 1 (Existence Theorem). For every Bayesian stochastic game with periodic revelation, there exists a stationary Bayesian–Markov equilibrium.
First, I construct the set $V$ of interim expected continuation value function profiles in the following two paragraphs.17 Fix $i \in I$. Let $L^\infty(T_i, \lambda_i)$ be the collection of $\lambda_i$-equivalence classes of $\lambda_i$-essentially bounded, measurable extended real-valued functions from $S \times X \times S_i$ to $\mathbb{R}$; $L^\infty(T_i, \lambda_i)$ is equipped with the usual norm $\|\cdot\|_{L^\infty}$, which gives the smallest essential upper bound; that is, $\|f\|_{L^\infty} = \inf M$ if $|f(s^-, a^-, s_i)| \le M$ for $\lambda_i$-almost all $(s^-, a^-, s_i)$; $L^1(T_i, \lambda_i)$ is the collection of $\lambda_i$-equivalence classes of integrable functions from $T_i = S \times X \times S_i$ to $\mathbb{R}$; $L^1(T_i, \lambda_i)$ is equipped with $\|\cdot\|_{L^1}$ such that $\|g\|_{L^1} = \int_{T_i} |g| \, d\lambda_i = \int_{S_i} \int_X \int_S |g| \, d\phi \, d\kappa \, d\phi_i$.18 By the Riesz representation theorem (see Royden–Fitzpatrick [19], hereafter RF, p. 400), $L^\infty(T_i, \lambda_i)$ is the dual space of $L^1(T_i, \lambda_i)$, and it is endowed with the weak-* topology. By Proposition 14.21 of RF (p. 287), $L^\infty(T_i, \lambda_i)$ is a locally convex Hausdorff topological vector space. Let $L^\infty(T, \lambda)$ denote the Cartesian product $L^\infty(T_1, \lambda_1) \times \cdots \times L^\infty(T_n, \lambda_n)$. Then, endowed with the product topology, $L^\infty(T, \lambda)$ is a locally convex Hausdorff topological vector space.
Let $V_i$ consist of functions $v_i \in L^\infty(T_i, \lambda_i)$ with $\|v_i\|_{L^\infty} \le C_i$; that is, $|v_i(s^-, a^-, s_i)| \le C_i$ for $\lambda_i$-almost all $(s^-, a^-, s_i)$. The constant $C_i \in \mathbb{R}_+$ is such that $|u_i(s, a)| \le C_i$ for all $(s, a)$. Clearly, $V_i$ is nonempty and convex. Then, the Cartesian product $V = V_1 \times \cdots \times V_n$ is also nonempty and convex. By Alaoglu’s theorem (RF, p. 299), $V_i$ is compact. The finite product $V$ is also compact (see Theorem 26.7, Munkres [20], p. 167).
Now I show that $V$ is metrizable in the weak-* topology. Since $T_i$ is a separable metric space, its Borel $\sigma$-algebra is countably generated, so $L^1(T_i, \lambda_i)$ is separable.19 By Corollary 15.11 of RF (p. 306), $V_i$ is metrizable in the weak-* topology. The Cartesian product $V$ is metrizable in the product weak-* topology, so it is compact if and only if it is sequentially compact (see Theorem 28.2, Munkres [20], p. 179).
In order to see whether the correspondence $v \mapsto M_v$ is nonempty, closed graph, and convex valued, I consider the following induced game. Eventually I want to show that the set of interim expected payoff profiles $E_v^{(s^-, a^-)}(s)$ from the induced game is in fact equivalent to its convexification $\mathrm{co}\, E_v^{(s^-, a^-)}(s)$. Let $\Gamma_v^{(s^-, a^-)}(s)$ denote a state-$s$ induced game of a Bayesian stochastic game given a profile of continuation value functions $v$ and the realized state-action profiles $(s^-, a^-)$ in the previous period:
$$\Gamma_v^{(s^-, a^-)}(s) = \big(I, ((S_i, \mathcal{S}_i), X_i, A_i(s_i), \hat{U}_i^{(s^-, a^-)}(\cdot \mid v), \mu(\cdot \mid \cdot, \cdot), \eta(\cdot \mid s^-, a^-, s_i))_{i \in I}\big).$$
Here, $\hat{U}_i^{(s^-, a^-)}(\cdot \mid v) : S_i \times \Sigma \to \mathbb{R}$ is defined as follows: for each $s_i$, each $\sigma$,
$$\hat{U}_i^{(s^-, a^-)}(s_i, \sigma \mid v) = \int_{s_{-i}} \int_a \Big[(1-\delta_i) u_i(s, a) + \delta_i \int_{s_i^+} v_i(s, a, s_i^+) \, \mu(d s_i^+ \mid s, a)\Big] \sigma(s^-, a^-, s)(da) \, \eta(d s_{-i} \mid s^-, a^-, s_i).$$
In the interim stage of a general one-shot Bayesian game, suppose players use behavioral strategies. Knowing their realized type $s_i$, player $i$ plays a mixed action $\beta_i(s_i) \in \Delta(X_i)$, induced by their behavioral strategy (a measurable mapping $\beta_i : S_i \to \Delta(X_i)$). In the induced game of the Bayesian stochastic game in this paper, players use stationary Bayesian–Markov strategies, which are essentially the same as behavioral strategies in a one-shot Bayesian game. The mixed actions induced by the stationary Bayesian–Markov strategy profile $\sigma = (\sigma_1, \ldots, \sigma_n)$ of all players determine a product probability measure $\sigma(s^-, a^-, s) = \sigma_1(s^-, a^-, s_1) \times \cdots \times \sigma_n(s^-, a^-, s_n)$. The difference between a stationary Bayesian–Markov strategy and a behavioral strategy is that, in the former, beliefs are given by the consequence of the previous stage game, and, thus, mixed actions depend on the previously realized type and action profiles as well as on player $i$’s own current type.20
The spaces $\Delta(X_1) \times \cdots \times \Delta(X_n)$ and $\Delta(A_1(s_1)) \times \cdots \times \Delta(A_n(s_n))$ are endowed with the product weak topology. By Theorem 2.8 of Billingsley ([16], p. 23), $\alpha^m \to \alpha$ in $\Delta(X_1) \times \cdots \times \Delta(X_n)$ if and only if $\alpha_i^m \to \alpha_i$ in $\Delta(X_i)$ for each $i$.
Lemma 1. Given $(s^-, a^-)$, for each $v$ and each $\sigma$, $\hat{U}_i^{(s^-, a^-)}(\cdot, \sigma \mid v)$ is measurable in $s_i$. Then, $\hat{U}_i^{(\cdot, \cdot)}(\cdot, \sigma \mid v)$ is measurable in $(s^-, a^-, s_i)$. Given $(s^-, a^-)$, for each $s_i$, $\hat{U}_i^{(s^-, a^-)}(s_i, \cdot \mid \cdot)$ is jointly continuous in $(\sigma, v)$.
Proof. Define
$$U_i^{(s^-, a^-)}(s, \sigma \mid v) = \int_a \Big[(1-\delta_i) \cdot u_i(s, a) + \delta_i \int_{s_i^+} v_i(s, a, s_i^+) \cdot \mu(d s_i^+ \mid s, a)\Big] \cdot \sigma(s^-, a^-, s)(da)$$
and
$$\hat{U}_i^{(s^-, a^-)}(s_i, \sigma \mid v) = \int_{s_{-i}} U_i^{(s^-, a^-)}(s, \sigma \mid v) \, \eta(d s_{-i} \mid s^-, a^-, s_i).$$
For each $Z_i \in \mathcal{S}_i$, $\mu(Z_i \mid s, a)$ is measurable in $s_i$; since $v_i(s, a, s_i^+)$ is bounded, Theorem 19.7 of Aliprantis and Border ([22], hereafter AB, p. 627) implies that $\int_{s_i^+} v_i(s, a, s_i^+) \, \mu(d s_i^+ \mid s, a)$ is measurable in $s_i$. Recall that, given $(s^-, a^-)$, for each $Z_{-i} \in \mathcal{S}_{-i}$, $\eta(Z_{-i} \mid s^-, a^-, \cdot)$ is measurable. Since $U_i^{(s^-, a^-)}(s, \sigma \mid v)$ is bounded, $\hat{U}_i^{(s^-, a^-)}(\cdot, \sigma \mid v)$ is measurable. Similarly, since for each $Z_{-i} \in \mathcal{S}_{-i}$, $\eta(Z_{-i} \mid \cdot, \cdot, \cdot)$ is measurable, $\hat{U}_i^{(\cdot, \cdot)}(\cdot, \sigma \mid v)$ is also measurable.
Fix $(s^-, a^-)$. Consider a sequence $\{(a^m, v^m)\} \to (a, v)$, where for each $i$, $v_i^m \to v_i$ in the weak-* topology. I have
$$\hat{U}_i^{(s^-,a^-)}(s_i, a^m \mid v^m) - \hat{U}_i^{(s^-,a^-)}(s_i, a \mid v) = \int_{s_{-i}} \Big[ (1-\delta_i)\, u_i(s,a^m) - (1-\delta_i)\, u_i(s,a) + \delta_i \int_{s_i^+} v_i^m(s,a^m,s_i^+)\, \mu(ds_i^+ \mid s,a^m) - \delta_i \int_{s_i^+} v_i(s,a,s_i^+)\, \mu(ds_i^+ \mid s,a) \Big] \times \eta(ds_{-i} \mid s^-,a^-,s_i). \tag{i}$$
Since $u_i(s, \cdot)$ is continuous, $(1-\delta_i)\, u_i(s,a^m) - (1-\delta_i)\, u_i(s,a) \to 0$, and
$$\begin{aligned}
&\int_{s_i^+} v_i^m(s,a^m,s_i^+)\, \mu(ds_i^+ \mid s,a^m) - \int_{s_i^+} v_i(s,a,s_i^+)\, \mu(ds_i^+ \mid s,a) \\
&\quad \le \int_{s_i^+} v_i^m(s,a^m,s_i^+)\, \mu(ds_i^+ \mid s,a^m) - \int_{s_i^+} v_i^m(s,a^m,s_i^+)\, \mu(ds_i^+ \mid s,a) \\
&\qquad + \int_{s_i^+} v_i^m(s,a^m,s_i^+)\, \mu(ds_i^+ \mid s,a) - \int_{s_i^+} v_i(s,a,s_i^+)\, \mu(ds_i^+ \mid s,a) \\
&\quad \le C_i \cdot \left| \mu(\cdot \mid s,a^m) - \mu(\cdot \mid s,a) \right| + \int_{s_i^+} \left[ v_i^m(s,a^m,s_i^+) - v_i(s,a,s_i^+) \right] \mu(ds_i^+ \mid s,a).
\end{aligned}$$
The first inequality is by the triangle inequality. The second inequality is by the fact that $v_i$ is essentially bounded by $C_i$ and that $\left| \int f \, d\mu \right| \le \int |f| \, d\mu$.
As $v_i^m \to v_i$ in the weak-* topology, by Proposition 3.13 of Brezis ([23], p. 63), for any given point $(s, a, s_i^+) \in T_i$, $v_i^m(s,a,s_i^+) \to v_i(s,a,s_i^+)$. This induces $\int_{S_i^+} v_i^m(\cdot, a^m, s_i^+)\, \mu(ds_i^+ \mid s,a) \to \int_{T_i} v_i(\cdot, a^m, s_i^+)\, \mu(ds_i^+ \mid s,a)$. Moreover, as $a^m \to a$ in $X$, $v_i(s,a^m,s_i^+) \to v_i(s,a,s_i^+)$, because $u_i(s, \cdot)$ for given $s$ is continuous, and $v_i$ is a continuous linear functional of $u_i$.21 This gives us $\int_{s_i^+} \left[ v_i^m(s,a^m,s_i^+) - v_i(s,a,s_i^+) \right] \mu(ds_i^+ \mid s,a) \to 0$. Then, combined with norm continuity of $\mu(\cdot \mid s,a)$ in $a$, I have the RHS of the last inequality converging to 0. Now, continuity of $\hat{U}_i^{(s^-,a^-)}(s_i, \sigma \mid v)$ follows from the fact that, given $s$, the family of real-valued functions $\mathcal{U} \equiv \{\hat{U}_i^{(s^-,a^-)}(s_i, \cdot \mid v^m)\}_{m \in \mathbb{N}}$ is equicontinuous at each $a$, and the result of Rao [24] under the absolutely continuous information structure holds. □
Fix $(s^-, a^-)$ and $v$. For each $s$, let $B_v^{(s^-,a^-)}(s)$ be the set of mixed action profiles induced by Bayesian–Nash equilibria of $\Gamma_v^{(s^-,a^-)}(s)$. Then
$$B_v^{(s^-,a^-)}(s) \equiv \left\{ \sigma(s^-,a^-,s) \in \Delta(A_1(s_1)) \times \cdots \times \Delta(A_n(s_n)) \;\middle|\; \begin{array}{l} \sigma \in \Sigma; \text{ for each } j \in I \text{ and for each } i \in I \setminus \{j\}, \\ \sigma_j \text{ satisfies payoff-indifference among } i\text{’s actions:} \\ \hat{U}_i^{(s^-,a^-)}(h_i, s_i, \sigma \mid v) = \max_{a_i \in A_i(s_i)} \hat{U}_i^{(s^-,a^-)}(h_i, s_i, a_i, \sigma_{-i} \mid v) \end{array} \right\}.$$
The condition holds if and only if player $j$’s equilibrium mixed action makes every other player $i$ indifferent, in terms of the interim expected continuation value, among the pure actions $a_i$ in the support of $\sigma_i(s_i)$. This indifference is what induces player $i$ to mix. As a result, every pure action in the support of $\sigma_i(s_i)$ gives player $i$ the same interim expected continuation value as the mixed action itself.
Now define
$$\hat{\xi}^{(s^-,a^-)}(s, \sigma; v) = \sum_{i=1}^{n} \left[ \hat{U}_i^{(s^-,a^-)}(s_i, \sigma; v) - \max_{a_i \in A_i(s_i)} \hat{U}_i^{(s^-,a^-)}(s_i, a_i, \sigma_{-i}; v) \right].$$
Then, for all $\sigma(s^-,a^-,s) \in B_v^{(s^-,a^-)}(s)$, $\hat{\xi}^{(s^-,a^-)}(s, \sigma; v) = 0$. Below I follow Duggan’s [12] approach.
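To see what this gap function measures, the following is a minimal sketch for a two-player finite game with complete information, so that types, beliefs, and continuation values are suppressed. That simplification is mine and is not the paper's setting; the function name `xi` and the matching-pennies payoffs are illustrative assumptions. Each summand is nonpositive, so the sum is zero exactly when no player has a profitable deviation.

```python
import numpy as np

def xi(payoffs, alpha):
    """Sum over players of (payoff at alpha) minus (best pure-deviation payoff).

    payoffs: pair (A, B) of m-by-n arrays for players 1 and 2 (hypothetical).
    alpha:   pair (p, q) of mixed actions for players 1 and 2.
    """
    A, B = payoffs
    p, q = alpha
    u1 = p @ A @ q            # player 1's expected payoff at (p, q)
    u2 = p @ B @ q            # player 2's expected payoff at (p, q)
    best1 = np.max(A @ q)     # player 1's best pure action against q
    best2 = np.max(p @ B)     # player 2's best pure action against p
    return (u1 - best1) + (u2 - best2)

# Matching pennies: the unique equilibrium mixes uniformly.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A
uniform = np.array([0.5, 0.5])

xi_eq = xi((A, B), (uniform, uniform))                 # zero at the equilibrium
xi_off = xi((A, B), (np.array([1.0, 0.0]), uniform))   # negative off equilibrium
print(xi_eq, xi_off)
```

In the paper's setting the payoffs would be replaced by the interim expected continuation values, and the maximum would be taken over the feasible set $A_i(s_i)$.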
Lemma 2.
For each $v$, $(s^-, a^-, s) \mapsto B_v^{(s^-,a^-)}(s)$ is nonempty, compact valued, and lower measurable.
Proof. 
A correspondence $s \mapsto \Delta(A_1(s_1)) \times \cdots \times \Delta(A_n(s_n))$ is lower measurable, nonempty, and compact valued. To see this, I want, for each open subset $G$ of $X$, the lower inverse image of $G$ to be measurable: $A^l(G) = \{ s \in S \mid A(s) \cap G \neq \emptyset \} \in \mathcal{S}$. Since $\mathcal{S} = \mathcal{S}_1 \times \cdots \times \mathcal{S}_n$ is generated by measurable rectangles, $A^l(G) \in \mathcal{S}$ if and only if $A_i^l(G_i) \in \mathcal{S}_i$, where $proj_i^{-1}(G_i)$ is a subbasis element for the product topology on $X$. Since, by (A4), $A_i$ is lower measurable for each $i$, it is clear that $s \mapsto A_1(s_1) \times \cdots \times A_n(s_n)$ is lower measurable. It is also nonempty and compact valued because so is each $A_i$. Then, $s \mapsto \Delta(A_1(s_1)) \times \cdots \times \Delta(A_n(s_n))$ is lower measurable, nonempty, and compact valued by Himmelberg and Van Vleck [25].
Notice that
ξ ^ ( s , a ) ( s , σ ; v ) = i = 1 n U ^ i ( s , a ) ( s i , σ ; v ) max a i A i ( s i ) U ^ i ( s , a ) ( s i , a i , σ i ; v ) = i = 1 n s i U i ( s , α ; v ) η ( d s i s , a , s i ) max a i A i ( s i ) s i U i ( s , a i , α i ) ; v ) η ( d s i s , a , s i ) = i = 1 n s i U i ( s , α ; v ) max a i A i ( s i ) U i ( s , a i , α i ) ; v ) η ( d s i s , a , s i ) = i S i s i = 1 n U i ( s , α ; v ) max a i A i ( s i ) U i ( s , a i , α i ; v ) τ ( d s s , a ) d δ s i ,
where, given $(s^-, a^-, s)$, $\alpha = \sigma(s^-, a^-, s)$, and $U_i(s, \alpha; v) = (1-\delta_i)\, u_i(s, \alpha) + \delta_i \int_{s_i^+} v_i(s, \alpha, s_i^+)\, \mu(ds_i^+ \mid s, \alpha)$. Similarly, $U_i(s, a_i, \alpha_{-i}; v) = (1-\delta_i)\, u_i(s, a_i, \alpha_{-i}) + \delta_i \int_{s_i^+} v_i(s, a_i, \alpha_{-i}, s_i^+)\, \mu(ds_i^+ \mid s, a_i, \alpha_{-i})$, and $\delta_{s_i}$ is the Dirac delta measure concentrated at $s_i$. Define
$$\xi(s, \alpha; v) \equiv \sum_{i=1}^{n} \left[ U_i(s, \alpha) - \max_{a_i \in A_i(s_i)} U_i(s, a_i, \alpha_{-i}) \right].$$
Then, $\xi(s, \alpha; v) = 0$ implies $\hat{\xi}^{(s^-,a^-)}(s, \sigma; v) = 0$.22 By Lemma 2 of Duggan [12] and measurability of $\tau(Z \mid \cdot, \cdot)$ for each $Z \in \mathcal{S}$, $(s^-, a^-, s) \mapsto B_v^{(s^-,a^-)}(s)$ is lower measurable. Given $(s^-, a^-)$, Balder [18] gives us nonemptiness of $B_v^{(s^-,a^-)}(s)$ for each $s$. Recall that the interim expected payoff function $\hat{U}_i^{(s^-,a^-)}(s_i, \cdot\,; v)$ is continuous, and the finite product $\times_i \Delta(A_i(s_i))$ is compact. By the theorem of maximum (Theorem 17.31, AB, p. 570), $B_v^{(s^-,a^-)}(s)$ is a compact subset of $\Delta(A_1(s_1)) \times \cdots \times \Delta(A_n(s_n))$ for each $s$. □
Given $s$ as the realized type profile, define the set of realized payoffs for player $i$ from $B_v^{(s^-,a^-)}(s)$ as $P_{v,i}^{(s^-,a^-)}(s) = U_i(s, B_v^{(s^-,a^-)}(s); v)$. Then, $(s^-, a^-, s) \mapsto P_{v,i}^{(s^-,a^-)}(s)$ is nonempty, compact valued, and lower measurable, since $U_i(s, \cdot\,; v)$ is continuous. By the Kuratowski–Ryll-Nardzewski measurable selection theorem (see Theorem 18.13, AB, p. 600), it admits a measurable selector. Then, the correspondence is integrable. The set of interim expected payoffs for player $i$ is denoted as $E_{v,i}^{(s^-,a^-)}(s_i) = \int_{s_{-i}} P_{v,i}^{(s^-,a^-)}(s)\, \eta(ds_{-i} \mid s^-, a^-, s_i)$. Let $E_v^{(s^-,a^-)}(s)$ denote the Cartesian product $E_{v,1}^{(s^-,a^-)}(s_1) \times \cdots \times E_{v,n}^{(s^-,a^-)}(s_n)$.
Lemma 3.
For each $v$ and each $(s^-, a^-, s)$, $E_v^{(s^-,a^-)}(s) = \operatorname{co} E_v^{(s^-,a^-)}(s)$.
Proof. 
By Theorem 18.10 of AB (p. 598), the correspondence for each player $i$’s realized payoffs, $(s^-, a^-, s) \mapsto P_{v,i}^{(s^-,a^-)}(s)$, is, in fact, measurable. For any $(\tilde{s}^-, \tilde{a}^-, \tilde{s}_i)$, consider the $(\tilde{s}^-, \tilde{a}^-, \tilde{s}_i)$-section of the correspondence, $(\tilde{s}^-, \tilde{a}^-, \tilde{s}_i, s_{-i}) \mapsto P_{v,i}^{(\tilde{s}^-,\tilde{a}^-)}(\tilde{s}_i, s_{-i})$; it is clearly measurable. By Theorem 4 of Hildenbrand ([26], p. 64),
$$\int_{s_{-i}} P_{v,i}^{(\tilde{s}^-,\tilde{a}^-)}(\tilde{s}_i, s_{-i})\, \eta(ds_{-i} \mid \tilde{s}^-, \tilde{a}^-, \tilde{s}_i) = \int_{s_{-i}} \operatorname{co} P_{v,i}^{(\tilde{s}^-,\tilde{a}^-)}(\tilde{s}_i, s_{-i})\, \eta(ds_{-i} \mid \tilde{s}^-, \tilde{a}^-, \tilde{s}_i).$$
Hence, for each $(s^-, a^-, s_i)$, $E_{v,i}^{(s^-,a^-)}(s_i) = \operatorname{co} E_{v,i}^{(s^-,a^-)}(s_i)$. Since the Cartesian product of convex sets of $\mathbb{R}$ is convex in $\mathbb{R}^n$, $E_v^{(s^-,a^-)}(s) = \operatorname{co} E_v^{(s^-,a^-)}(s)$. □
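As I understand the cited result, the convexity in Lemma 3 comes from integrating a correspondence against an atomless measure. The sketch below is a rough numerical illustration under assumptions of my own, none of which are taken from the paper: Lebesgue measure on $[0,1]$, the two-point correspondence $P(x) = \{0, 1\}$, and the hypothetical helper `integral_of_selector`. Selectors of the form $1\{x \le t\}$ integrate to $t$, so the integrals of selectors fill the whole interval $[0,1] = \operatorname{co}\{0,1\}$ even though no single $P(x)$ is convex.

```python
import numpy as np

def integral_of_selector(t, n_grid=100_000):
    """Midpoint-rule approximation of the integral over [0, 1] of the
    selector f(x) = 1 if x <= t else 0, which takes values in P(x) = {0, 1}."""
    x = (np.arange(n_grid) + 0.5) / n_grid
    return float(np.mean(x <= t))

targets = [0.0, 0.25, 0.5, 0.9, 1.0]
values = [integral_of_selector(t) for t in targets]
for t, val in zip(targets, values):
    print(f"target {t:.2f} is attained by a selector with integral {val:.5f}")
```

With an atom in the measure the same construction would fail, which is why the information structure matters for the lemma.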
Lemma 4.
The mapping $(s^-, a^-, s) \mapsto E_v^{(s^-,a^-)}(s)$ is lower measurable, nonempty, compact, and convex valued.
Proof. 
Notice that for each ( s , a , s i ) , E v , i ( s , a ) ( s i ) is nonempty and convex. Recall that ( s , a , s ) P v , i ( s , a ) ( s ) admits a measurable selector. Let { f i , k } k denote the set of measurable selectors from the correspondence. Note that { f i , k ( s , a , s ) } k R . Since P v , i ( s , a ) ( s ) is compact, for each f i , k , there is a family of functions that converge pointwise to f i , k at each ( s , a , s ) . Notice that U i ( s , σ ( s ) ; v ) is bounded, so a family of functions that converges pointwise to f i , k is uniformly integrable. Recall that each λ i is a probability measure. Then, S × X × S i is a set of finite measure; the finite Cartesian product S × X × S is also of finite measure. Obviously, the aforementioned family of functions is tight. Applying the Vitali convergence theorem (RF, p. 377), I obtain compactness of s i P v , i ( s , a ) ( s ¯ i , s i ) η ( d s i s , a , s ¯ i ) for each i. The finite product E v ( s , a ) ( s ) is therefore compact at each ( s , a , s ) . Applying Tonelli’s theorem (RF, p. 420), I have s i f i , k ( s , a , s ¯ i , s i ) η ( d s i s ¯ , a , s i ) measurable. Thus, I also get the lower measurability of ( s , a , s ) s i P v , i ( s , a ) ( s ¯ i , s i ) η ( d s i s ¯ , a , s i ) . Then, by Proposition 2.3 (ii) of Himmelberg ([27], p. 67), ( s , a , s ) E v ( s , a ) ( s ) is lower measurable. □
By the Kuratowski–Ryll-Nardzewski measurable selection theorem (Theorem 18.13, AB, p. 600), $(s^-, a^-, s) \mapsto E_v^{(s^-,a^-)}(s)$ has a measurable selector. Given $v$, define $M_v$ to be the set of all $\lambda$-equivalence classes of measurable selectors of $(s^-, a^-, s) \mapsto E_v^{(s^-,a^-)}(s)$.
Lemma 5.
The mapping $v \mapsto M_v$ is nonempty, closed graph, and convex valued.
Proof. 
By construction, for each $v$, $M_v$ is nonempty, closed, and convex. Recall that the interim expected payoff function $\hat{U}_i^{(s^-,a^-)}(s_i, \sigma; \cdot)$ is continuous. By the theorem of maximum, $v \mapsto B_v^{(s^-,a^-)}(s)$ is upper hemicontinuous. Suppose a sequence $(v^m, \sigma^m(s^-, a^-, s)) \to (v, \sigma(s^-, a^-, s))$, where $\sigma^m(s^-, a^-, s) \in B_{v^m}^{(s^-,a^-)}(s)$ and $\sigma(s^-, a^-, s) \in B_v^{(s^-,a^-)}(s)$. Lemma 1 tells us that for each player $i$23,
d i m ( s i ) ( s , a ) max α Δ ( A ( s ) ) U ^ i ( s , a ) ( s i , α ; v m ) U ^ i ( s , a ) ( s i , α ; v ) 0 .
Now I proceed similarly to Lemma 7 of Nowak and Raghavan [11]. Suppose g m M v m and g m g in the product weak-* topology on V. I want to show that g M v . By Mazur’s theorem and Alaoglu’s theorem (Brezis [23], p. 61 and p. 66), there is sequence { y m } made up of convex combinations of the g m s that satisfies | y m g | L 0 as m . This implies that { y m } converges to g pointwise almost everywhere on T. Let y m ( t ) g ( t ) for all t T T 0 where λ ( T 0 ) = 0 . Recall that T S × X × S . Then, for each t T T 0 , for each m, if y m ( s , a , s ) E v m ( s , a ) ( s ) , then g ( s ) E v m ( s , a ) ( s ) ; since d i m ( s i ) ( s , a ) 0 , E v m ( s , a ) ( s ) E v ( s , a ) ( s ) ; thus, g ( s , a , s ) E v ( s , a ) ( s ) for each ( s , a , s ) = t T T 0 and clearly g M v . Hence, v M v is upper-hemicontinuous. In addition, I observe that for each m, M v m is closed and M v is closed. By the closed graph theorem for correspondences (Theorem 17.11, AB, p. 561), v M v is closed graph. □
Proof of Existence Theorem.
Obviously, $M_v \subset V$. By the Kakutani–Fan–Glicksberg theorem (see Theorem 17.55, AB, p. 583), I have a fixed point of $v \mapsto M_v$. Then, I have $w : S \times X \times S \to \mathbb{R}^n$ such that, given $(s^-, a^-)$, $w(s^-, a^-, s) \in E_v^{(s^-,a^-)}(s)$ for all $s \in S \setminus S_0$, where $\phi(S_0) = 0$. Recall that $\hat{U}_i^{(s^-,a^-)}(s_i, \alpha; v)$ is measurable in $(s^-, a^-, s_i)$ and continuous in $\alpha$ (Lemma 1). Now, by Filippov’s implicit function theorem (see Theorem 18.17, AB, p. 603), there exists a measurable mapping $f : S \times X \times S \to \Delta(X_1) \times \cdots \times \Delta(X_n)$ such that, given $(s^-, a^-)$, for each $s$, $f(s^-, a^-, s) \in B_v^{(s^-,a^-)}(s)$ and for all $i$, each $s_i \in S_i \setminus proj_i(S_0)$,
$$w_i(s^-, a^-, s_i) = \int_{s_{-i}} \left[ (1-\delta_i)\, u_i(s, f(s^-, a^-, s)) + \delta_i \int_{s_i^+} w_i(s, a, s_i^+)\, \mu(ds_i^+ \mid s, f(s^-, a^-, s)) \right] \eta(ds_{-i} \mid s^-, a^-, s_i).$$
For each $i$ and $(s^-, a^-, s) \in S \times X \times (S \setminus S_0)$, $(s^-, a^-, s) \mapsto B_v^{(s^-,a^-)}(s)$ admits a measurable selector by the Kuratowski–Ryll-Nardzewski measurable selection theorem (Theorem 18.13, AB, p. 600). Let $f_0$ be any measurable selector of $(s^-, a^-, s) \mapsto B_v^{(s^-,a^-)}(s)$. Given $(s^-, a^-)$, put
$$f^*(s^-, a^-, s) = \begin{cases} f_0(s^-, a^-, s) & \text{if } s \in S_0, \\ f(s^-, a^-, s) & \text{if } s \in S \setminus S_0. \end{cases}$$
Then, $f^*(s^-, a^-, s) \in B_v^{(s^-,a^-)}(s)$ for all $s$, and
$$w_i^*(s^-, a^-, s_i) = \int_{s_{-i}} \left[ (1-\delta_i)\, u_i(s, f^*(s^-, a^-, s)) + \delta_i \int_{s_i^+} w_i^*(s, a, s_i^+)\, \mu(ds_i^+ \mid s, f^*(s^-, a^-, s)) \right] \eta(ds_{-i} \mid s^-, a^-, s_i).$$
Therefore, $f^*$ is a stationary Bayesian–Markov equilibrium strategy profile. □

4. Computational Algorithm

The flow of the proof in Section 3 can be used as a computational algorithm to find an equilibrium. The algorithm is especially useful for the environment of heterogeneous beliefs.24 First, find a fixed point in a set of interim expected continuation value functions, and second, extract an equilibrium strategy that generates the fixed point value function. In this sense, the computational algorithm for the dynamic Bayesian games in a continuous type space with periodic revelation is close to value function iteration in macroeconometrics.25 The difference between the popular macroeconometrics technique and the following algorithm is, however, that there are multiple agents who have heterogeneous beliefs about the others’ types in dynamic Bayesian games. That is, the highlight of the computational algorithm in this section is to find a fixed point for an array of interim expected continuation value functions given the heterogeneous beliefs.
Start the algorithm by approximating the interim expected continuation value function $v_i(\cdot)$ for player $i$ by a polynomial, preferably a Chebyshev orthogonal polynomial.26 Accordingly, the type space and action space are given by grids of Chebyshev nodes. In addition, build the beliefs and transition probability distribution over different sets of grids for numerical integration, for example, a type space with equidistant nodes $S_i = \{s_1, \dots, s_k, \dots, s_N\}$ and an action space with equidistant nodes $X_i = \{a_1, \dots, a_k, \dots, a_N\}$ for all $i$, where the subscripts index grid nodes. Over the grids, construct the probability distribution over the collection of types for the current stage, $\tau(\cdot \mid s^-, a^-) \in \Delta(S)$, given any previous type and action profiles $(s^-, a^-)$. From this $\tau(\cdot \mid s^-, a^-)$, the beliefs about the other players, $\eta(\cdot \mid s^-, a^-, s_i)$, and the future type distribution, $\mu(\cdot \mid s, a)$, can be obtained.
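The setup step can be sketched as follows. This is a minimal illustration under my own assumptions rather than the author's implementation: a one-dimensional type interval `[lo, hi]`, two players, a fixed previous profile, and the hypothetical helper names `chebyshev_nodes`, `fit_value_function`, `evaluate`, and `beliefs_from_joint`. The exponential function merely stands in for a smooth value function, and the $2 \times 2$ matrix `tau` is made-up data.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb

def chebyshev_nodes(n, lo, hi):
    """Roots of the degree-n Chebyshev polynomial, rescaled to [lo, hi]."""
    k = np.arange(n)
    z = np.cos((2 * k + 1) * np.pi / (2 * n))
    return 0.5 * (lo + hi) + 0.5 * (hi - lo) * z

def to_unit(x, lo, hi):
    """Map [lo, hi] to [-1, 1], where Chebyshev polynomials are defined."""
    return (2.0 * np.asarray(x) - (lo + hi)) / (hi - lo)

def fit_value_function(values, nodes, lo, hi, degree):
    """Least-squares Chebyshev coefficients for values observed at nodes."""
    return cheb.chebfit(to_unit(nodes, lo, hi), values, degree)

def evaluate(coef, x, lo, hi):
    """Evaluate the fitted Chebyshev series at points x in [lo, hi]."""
    return cheb.chebval(to_unit(x, lo, hi), coef)

def beliefs_from_joint(tau):
    """tau[k1, k2] = Pr(s_1 = node k1, s_2 = node k2 | previous profile).

    Returns player 1's marginal type distribution and player 1's beliefs
    about player 2. Assumes every node of player 1 has positive probability.
    """
    mu_1 = tau.sum(axis=1)          # marginal over player 1's own type
    eta_1 = tau / mu_1[:, None]     # row k1: distribution of s_2 given s_1 = k1
    return mu_1, eta_1

lo, hi = 0.0, 2.0
nodes = chebyshev_nodes(12, lo, hi)
coef = fit_value_function(np.exp(nodes), nodes, lo, hi, degree=8)
grid = np.linspace(lo, hi, 201)
max_err = np.max(np.abs(evaluate(coef, grid, lo, hi) - np.exp(grid)))

tau = np.array([[0.10, 0.20], [0.30, 0.40]])
mu_1, eta_1 = beliefs_from_joint(tau)
print(max_err, mu_1, eta_1)
```

In an actual application, `tau` would have to be rebuilt for every previous profile on the grid, and the value function would be fitted in all of its arguments rather than in a single type dimension.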
Then, the main algorithm is given by the following backward iteration process: start with the $j$th guess $v_i^j(\cdot)$ and plug it into the right-hand side of the equation below. After numerical integration using $\mu(\cdot \mid s, a)$ and $\eta(\cdot \mid s^-, a^-, s_i)$, the result from the right-hand side is fed to the loop as the $(j+1)$th guess $v_i^{j+1}(\cdot)$. Repeat the iteration until $v_i(\cdot)$ converges. By the existence theorem, a fixed point for $v_i(\cdot)$ exists, and, thus, for large enough $j$, $v_i^j(\cdot)$ is guaranteed to be sufficiently close to $v_i^{j+1}(\cdot)$.
$$v_i^{j+1}(s^-, a^-, s_i) = \max_{f \in \Delta(X)} \int_{s_{-i}} \left[ (1-\delta_i)\, u_i(s, f(s^-, a^-, s)) + \delta_i \int_{s_i^+} v_i^j(s, a, s_i^+)\, \mu(ds_i^+ \mid s, f(s^-, a^-, s)) \right] \eta(ds_{-i} \mid s^-, a^-, s_i). \tag{29}$$
Define an operator $T(\cdot)$ that implements Equation (29). That is, $v_i^{j+1} = T v_i^j$ and, more precisely, it is an operation on the coefficients of the Chebyshev polynomials. As there are multiple players $i \in \{1, \dots, n\}$, by stacking the vectors $v_i^j$ for all $i$, in the $j$th iteration,
$$\begin{pmatrix} v_1^{j+1} \\ v_2^{j+1} \\ \vdots \\ v_n^{j+1} \end{pmatrix} = T \begin{pmatrix} v_1^{j} \\ v_2^{j} \\ \vdots \\ v_n^{j} \end{pmatrix}.$$
The sequence of the vectors $\{v_i^j\}_j$ for all $i$ is guaranteed to converge as $T(\cdot)$ is a contraction mapping. Denote the solution as $v_i^*$. Then, the associated equilibrium stationary Bayesian strategy profile $f(\cdot)$ in Equation (29) can be obtained by plugging the solution $v_i^*$ into both sides.
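The iteration on the stacked coefficient vectors can be sketched as a generic fixed-point loop with a sup-norm stopping rule. The operator `toy_T` below is a hypothetical affine stand-in with modulus `delta`, chosen only because its fixed point is known in closed form; it is not the operator defined by Equation (29), and the contraction property claimed in the text is taken as given rather than checked here. The names `iterate_to_fixed_point`, `reward`, and the array shapes are likewise assumptions.

```python
import numpy as np

def iterate_to_fixed_point(T, v0, tol=1e-10, max_iter=10_000):
    """Apply v <- T(v) until successive iterates differ by less than tol.

    Returns the last iterate, the number of iterations used, and a flag
    indicating whether the stopping rule was met before max_iter.
    """
    v = np.asarray(v0, dtype=float)
    for j in range(max_iter):
        v_next = T(v)
        gap = np.max(np.abs(v_next - v))   # sup-norm distance between guesses
        v = v_next
        if gap < tol:
            return v, j + 1, True
    return v, max_iter, False

# Stacked array: one row of (hypothetical) Chebyshev coefficients per player.
n_players, n_coef = 2, 4
delta = 0.9
rng = np.random.default_rng(0)
reward = rng.uniform(size=(n_players, n_coef))

def toy_T(v):
    """Affine map with modulus delta; its unique fixed point is `reward`."""
    return (1.0 - delta) * reward + delta * v

v_star, n_iter, converged = iterate_to_fixed_point(
    toy_T, np.zeros((n_players, n_coef))
)
print(n_iter, converged, np.max(np.abs(v_star - reward)))
```

The flag matters in practice: if the operator built from Equation (29) fails to be a contraction for a given specification, the loop can stop at `max_iter` without meeting the tolerance, and the returned guess should not be treated as a solution.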

5. Conclusions and Extension

In reality, economic agents rarely have a true understanding of the state of the economy. Often they have their own biased perceptions, which reflect only part of the true state of the economy, and their perceptions may remain private. Moreover, such biased perceptions tend to be serially correlated. To reflect such aspects of reality more closely, dynamic games with asymmetric information can be a better modeling choice than ones with symmetric information. This paper constructs a class of dynamic Bayesian games where types evolve stochastically according to a first-order Markov process depending on the type and action profiles from the previous period. In this class of dynamic Bayesian games, however, the dimension of players’ beliefs grows exponentially over time. To mitigate the curse of dimensionality in the dynamic Bayesian game structure, this paper considers the environment where the asymmetric information in the past becomes symmetric with a delay (periodic revelation). Specifically, this paper considers the case where the previous type profile is revealed as public information with a one-period delay. That is, the type profile remains hidden when players choose actions, but type and action profiles are revealed when players obtain their payoffs. As the common prior for the next stage Bayesian game is pinned down by the type and action profiles, players’ beliefs do not explode over time.
Theoretically, in the dynamic Bayesian game of this paper, the type space is a complete separable metric space (Polish space) and the action space is a compact metric space. A stationary Bayesian–Markov strategy is constructed as a measurable mapping that maps a tuple of the previous type profile, action profile, and player $i$’s current type, i.e., $(s^-, a^-, s_i)$, to a probability distribution over actions. Then, there exists a stationary Bayesian–Markov equilibrium. This stationary Bayesian–Markov equilibrium concept is related to the stationary correlated equilibrium concept in Nowak and Raghavan [11] through Aumann [14].
Similarly, the proof can be extended to a game with $K$-period lagged revelation, i.e., in each period $t$, players observe type and action profiles up to period $(t-K)$, whereas for periods from $(t-K+1)$ to $t$, they only observe the history of their own types, their own actions, and their own payoffs. Then, existence of a stationary Bayesian–Markov equilibrium with $K$-period lagged revelation can be proved once players’ heterogeneous beliefs $\eta(s_{-i} \mid \cdot)$ are adjusted judiciously: the unrevealed history from period $(t-K+1)$ to period $t$ plays the role of the individual player’s new “type”, and beliefs are assumed to be induced by a common prior conditional on the lagged information $(s^{-K}, a^{-K})$. Figure 2 depicts the timeline. See the Supplementary Materials for the formal proof.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/g15050031/s1. References [15,31,32] are cited in the Supplementary Materials.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Material; further inquiries can be directed to the corresponding author.

Acknowledgments

Eunmi Ko—Conceptualization; Data curation; Funding acquisition; Investigation; Methodology; Writing—original draft, review & editing. I am grateful to Narayana Kocherlakota and Paulo Barelli for invaluable support and guidance for this project. I thank three anonymous reviewers and the journal editor for helpful remarks that significantly improved this paper. I also thank (including but not limited to) Yu Awaya, David Besanko, Anmol Bhandari, Doruk Cetemen, Hari Govindan, Iru Hwang, Asen Kochov, Ichiro Obara, Ron Siegel, Takuo Sugaya, Yuichi Yamamoto, Haomiao Yu, and participants in the Midwest Theory Conference (Rochester, NY, Spring 2016), Stony Brook Festival (Summer 2016), Asian Meeting of the Econometric Society (Kyoto, Japan, Summer 2016) for helpful comments and suggestions. An earlier version of this paper was a chapter of my Ph.D. dissertation. I have read and agreed to the published version of the manuscript. All errors are my own.

Conflicts of Interest

The author declares no conflicts of interest.

Notes

1
See Saxe [1], but it is also alleged that the parable originated in ancient India, as part of a Buddhist text. See https://en.wikipedia.org/wiki/Blind_men_and_an_elephant (accessed on 1 May 2024).
2
The limited perceptions of the state of the economy can be due to others’ non-disclosure, optimal choice of individuals themselves, i.e., rational inattention (Mackowiak et al. [2]), or due to their own biases.
3
This causes a severe computational burden in empirical analyses in dynamic structural estimation as well. See Aguirregabiria and Mira [3] for more detail.
4
They allow the private information of each firm to be serially correlated, and, thus, the history of past payoff-relevant information and past actions comprises the current payoff-relevant information. Then, the dimension of firms’ beliefs regarding other firms’ payoff-relevant information may explode over time. To obtain computational tractability, they propose a new equilibrium concept that makes use of the empirical distribution from data. If data are accumulated over time, however, the framework would lose its strength.
5
In Cole and Kocherlakota [5], the beliefs also depend on player’s current type in addition to the history of public information. The equilibrium concept is referred to as Markov private perfect equilibrium. In Athey and Bagwell [6], the beliefs are a function of the history of public information only, and they refer to the equilibrium concept as perfect public Bayesian equilibrium.
6
Hörner, Takahashi, and Vieille [7] characterize the set of feasible payoffs under certain conditions: Any of these payoffs can be achieved as a result of playing the dynamic Bayesian game via any (either stationary or non-stationary) equilibria, where they do not specify existence of equilibria. In contrast, this paper focuses on stationary equilibria and proves existence. While the result of Hörner, Takahashi, and Vieille [7] is highly regarded in the mechanism design literature, showing existence of stationary equilibria in Bayesian stochastic games is particularly valued in the dynamic structural estimation literature because existence of stationary equilibria is crucial for model tractability and estimation.
7
Handling the belief system using the history of available past information is complicated. Previous literature has circumvented the issue by introducing a belief operator $T(\cdot)$. For example, in Athey and Bagwell [6], the type evolution process of Player $i$ is given by a specific process, denoted by $F(\cdot \mid \theta_{i,t-1})$, relying on just the previous type of oneself. However, the beliefs of other players about the type of Player $i$, denoted by $\mu_{i,t}$, are not related to $F(\cdot \mid \theta_{i,t-1})$. Rather, the belief can be any probability distribution based on the history of public information up to time $t$ as long as its support includes the currently observed actions of Player $i$ with nonzero probability. To be more concrete, Athey and Bagwell introduce an abstract belief updating operator $T(\cdot)$ such that $\mu_{i,t+1} = T(\mu_{i,t}, s_{i,t}, z_t)$, allowing for any belief function to be chosen as $\mu_{i,t+1}$ as long as it is compatible with the current observations of $z_t$ (where $z_t$ denotes the vector of actions taken by all the players and $s_{i,t}$ stands for the possibly non-truthful actions taken by Player $i$). As Player $i$ can act non-truthfully and other players presume Player $i$’s mimicry, the dimension of the belief $\mu_{i,t}$ grows over time: given the time-$t$ belief, $\mu_{i,t}$, the belief in time $(t+1)$, $\mu_{i,t+1}$, is based on the public information of actions in time $t$. The belief in time $(t+2)$, $\mu_{i,t+2}$, then, is based on the public information from actions in time $t$ and $(t+1)$, and so forth. This is why their equilibrium strategy depends on the entire history of public information and the players’ beliefs have a time subscript in their model.
8
For the formal definition of a transition function, see Stokey and Lucas with Prescott ([15], p. 212).
9
Precisely, I mean the Carathéodory extension of the premeasure $\phi_1 \times \cdots \times \phi_n$ over the $\sigma$-algebra of $(\phi_1 \times \cdots \times \phi_n)^*$-measurable subsets of $S$; $(\cdot)^*$ means the outer measure induced by a set function. The product measure $\phi$ is absolutely continuous with respect to $\phi_1 \times \cdots \times \phi_n$. Thus, the information diffusion condition (Milgrom and Weber [17]) is satisfied.
10
Intuitively, a set-valued function is lower measurable if the inverse image of a closed set in the codomain, here $A_i^{-1}(\bar{B})$, where $\bar{B} \subset X_i$, is a measurable set in the domain with respect to the $\sigma$-algebra of the domain, here $\mathcal{S}_i$.
11
The expression $\tau$ works for Nature while $\mu$ and $\eta$ work for the individual players; $\tau = \mu \times \eta$ means the individuals have beliefs consistent with Nature. It is possible that $\tau \neq \mu \times \eta$ if individual players exhibit bounded rationality.
12
Here, $c_1^0$, $c_2^0$, $q_1^0$, and $q_2^0$ are required as initial conditions to draw the cost distributions in the first stage game. Assume that $c_1^0 = c_2^0 = \alpha/2$ and $q_1^0 = q_2^0 = \alpha/3$.
13
Assume α = 7 / 2 , c 1 = 2 , c 2 = 1 , q 1 = 7 / 6 , q 2 = 1 / 6 . Then, with f i = 0.1130 · c i α · ( q 1 + q 2 ) 2 / 2 ( c 1 + c 2 ) / 2 2 + 0.4033 for all i, F = f 1 × f 2 can be the joint probability distribution. In this case, E c i c 1 , c 1 , c 2 , q 1 , q 2 = 1.6378 .
14
The expression $\eta(\cdot \mid s^-, a^-, s_i) \in \Delta(S_{-i})$ is a conditional distribution derived from $\tau(\cdot \mid s^-, a^-) \in \Delta(S)$; $\mu(\cdot \mid s^-, a^-) \in \Delta(S_i)$ is a marginal distribution of $\tau(\cdot \mid s^-, a^-) \in \Delta(S)$.
15
Precisely, I mean the Carathéodory extension of the premeasure $\sigma_1(s^-, a^-, s_1) \times \cdots \times \sigma_n(s^-, a^-, s_n)$ over the $\sigma$-algebra of $(\sigma_1(s^-, a^-, s_1) \times \cdots \times \sigma_n(s^-, a^-, s_n))^*$-measurable subsets of $A_1(s_1) \times \cdots \times A_n(s_n)$; $(\cdot)^*$ means the outer measure induced by a set function.
16
Fix $(s^-, a^-)$. Then, the common prior $\tau(\cdot \mid s^-, a^-)$ is specified and so are each player’s beliefs $\eta(\cdot \mid s^-, a^-, s_i)$. The absolutely continuous information structure, $\eta_i \ll \phi_1 \times \cdots \times \phi_{i-1} \times \phi_{i+1} \times \cdots \times \phi_n$, allows us to express player $i$’s interim expected payoffs in this manner. See Milgrom and Weber [17] or Balder [18], Theorem 2.5.
17
This construction closely follows Duggan’s [12] approach.
18
Observe that I choose $L^\infty(T_i, \lambda_i)$ and the weak-* topology for the space of continuation value functions of player $i$, similarly to the standard approach in general stochastic games. For the $L^1$ norm, $\lVert \cdot \rVert_{L^1}$, Tonelli’s theorem is applied; see Royden–Fitzpatrick ([19], p. 420).
19
See Rudin [21]. Otherwise, consider that the subspace of rational-valued simple functions with finite support is countable and dense in $L^1(T_i, \lambda_i)$ by Theorem 19.5 of RF (p. 398) and the fact that the set of rational numbers is countable and dense in $\mathbb{R}$.
20
With regard to terms, “Bayesian” is about beliefs and “Markov” is about the current state (type) irrespective of calendar time. If the common prior on the current type profile is given by a fixed probability measure (i.i.d.), then $T_i = S_i$ and $\eta(Z_{-i} \mid s^-, a^-, s_i) = \eta(Z_{-i} \mid s_i)$, and, thus, the stationary Bayesian–Markov strategy is simply $\sigma_i : S_i \to \Delta(X_i)$.
21
Recall that $u_i(s, \cdot) \in L^1(T_i, \lambda_i)$ is continuous for given $s$. In addition, $v_i \in L^\infty(T_i, \lambda_i)$, which is the dual space (the set of continuous linear functionals) of $L^1(T_i, \lambda_i)$. For given $s$ and $s_i^+$, define a convergent sequence $\check{u}_i^m \to \check{u}_i$ for a fixed value of $a$ in $L^1(T_i, \lambda_i)$, which assigns the same values as $u_i(s, a^m) \to u_i(s, a)$. Since $v_i$ is continuous and linear in $u_i$, $v_i(\check{u}_i^m) \to v_i(\check{u}_i)$. Then, $v_i(s, \cdot, s_i^+)$ can be seen as continuous and linear in $a$.
22
A necessary and sufficient condition is that $\xi(s, \alpha; v) = 0$ for $\tau$-almost all $s$.
23
The expression $E_{v^m}^{(s^-,a^-)}(s)$ is the $\frac{1}{m}$-neighborhood of $E_v^{(s^-,a^-)}(s)$ as $v^m \to v$.
24
For the literature about solving dynamic programming under heterogeneous expectations, see the lecture notes by Wouter den Haan, https://www.wouterdenhaan.com/numerical/heteroexpecslides.pdf (accessed on 1 May 2024).
25
See Judd [28], pp. 434–436, or DeJong and Dave [29], pp. 89–94.
26
See Chapter 4 of Ljungqvist and Sargent [30] for the definition of a Chebyshev polynomial (Section 4.7.2) and how the computation of dynamic programming works in a representative agent setting.

References

  1. Saxe, J.G. The Blind Men and the Elephant; James R. Osgood and Company: Boston, MA, USA, 1873.
  2. Maćkowiak, B.; Matějka, F.; Wiederholt, M. Rational Inattention: A Review. J. Econ. Lit. 2023, 61, 226–273.
  3. Aguirregabiria, V.; Mira, P. Dynamic discrete choice structural models: A survey. J. Econom. 2010, 156, 38–67.
  4. Fershtman, C.; Pakes, A. Dynamic Games with Asymmetric Information: A Framework for Empirical Work. Q. J. Econ. 2012, 127, 1611–1661.
  5. Cole, H.L.; Kocherlakota, N. Dynamic Games with Hidden Actions and Hidden States. J. Econ. Theory 2001, 98, 114–126.
  6. Athey, S.; Bagwell, K. Collusion with Persistent Cost Shocks. Econometrica 2008, 76, 493–540.
  7. Hörner, J.; Takahashi, S.; Vieille, N. Truthful Equilibria in Dynamic Bayesian Games. Econometrica 2015, 83, 1795–1848.
  8. Barro, R.J.; Gordon, D.B. Rules, discretion and reputation in a model of monetary policy. J. Monet. Econ. 1983, 12, 101–121.
  9. Ko, E.; Kocherlakota, N. The Athey, Atkeson, and Kehoe Model with Periodic Revelation of Private Information. Available online: http://hdl.handle.net/1802/35442 (accessed on 28 July 2023).
  10. Levy, Y.J.; McLennan, A. Corrigendum to ‘Discounted Stochastic Games with No Stationary Nash Equilibrium: Two Examples’. Econometrica 2015, 83, 1237–1252.
  11. Nowak, A.; Raghavan, T. Existence of Stationary Correlated Equilibria with Symmetric Information for Discounted Stochastic Games. Math. Oper. Res. 1992, 17, 519–526.
  12. Duggan, J. Noisy Stochastic Games. Econometrica 2012, 80, 2017–2045.
  13. Barelli, P.; Duggan, J. A note on semi-Markov perfect equilibria in discounted stochastic games. J. Econ. Theory 2014, 151, 596–604.
  14. Aumann, R.J. Correlated Equilibrium as an Expression of Bayesian Rationality. Econometrica 1987, 55, 1–18.
  15. Stokey, N.L.; Lucas, R.E.; Prescott, E.C. Recursive Methods in Economic Dynamics; Harvard University Press: Cambridge, MA, USA, 1989.
  16. Billingsley, P. Convergence of Probability Measures, 2nd ed.; Wiley: New York, NY, USA, 1999.
  17. Milgrom, P.R.; Weber, R.J. Distributional Strategies for Games with Incomplete Information. Math. Oper. Res. 1985, 10, 619–632.
  18. Balder, E.J. Generalized Equilibrium Results for Games with Incomplete Information. Math. Oper. Res. 1988, 13, 265–276.
  19. Royden, H.L.; Fitzpatrick, P.M. Real Analysis, 4th ed.; Pearson: London, UK, 2010.
  20. Munkres, J.R. Topology, 2nd ed.; Pearson: London, UK, 2000.
  21. Rudin, W. Functional Analysis; McGraw-Hill Book Company: New York, NY, USA, 1973.
  22. Aliprantis, C.D.; Border, K. Infinite Dimensional Analysis: A Hitchhiker’s Guide, 3rd ed.; Springer: Berlin/Heidelberg, Germany, 2006.
  23. Brezis, H. Functional Analysis, Sobolev Spaces and Partial Differential Equations; Springer: Berlin/Heidelberg, Germany, 2010.
  24. Rao, R.R. Relations between Weak and Uniform Convergence of Measures with Applications. Ann. Math. Stat. 1962, 33, 659–680.
  25. Himmelberg, C.J.; Van Vleck, F.S. Multifunctions with Values in a Space of Probability Measures. J. Math. Anal. Appl. 1975, 50, 108–112.
  26. Hildenbrand, W. Core and Equilibria of a Large Economy; Princeton University Press: Princeton, NJ, USA, 1974.
  27. Himmelberg, C.J. Measurable Relations. Fund. Math. 1975, 87, 53–72.
  28. Judd, K.L. Numerical Methods in Economics; MIT Press: Cambridge, MA, USA, 1998.
  29. DeJong, D.N.; Dave, C. Structural Macroeconometrics, 2nd ed.; Princeton University Press: Princeton, NJ, USA, 2011.
  30. Ljungqvist, L.; Sargent, T.J. Recursive Macroeconomic Theory, 3rd ed.; MIT Press: Cambridge, MA, USA, 2018.
  31. Judd, K.L.; Schmedders, K.; Yeltekin, S. Optimal Rules for Patent Races. Int. Econ. Rev. 2012, 53, 23–52.
  32. Zamir, S. Bayesian Games: Games with Incomplete Information. In Encyclopedia of Complexity and Systems Science; Springer: Berlin/Heidelberg, Germany, 2009; pp. 426–441.
Figure 1. Timeline.
Figure 2. The K-periodic revelation.

Share and Cite

MDPI and ACS Style

Ko, E. Stationary Bayesian–Markov Equilibria in Bayesian Stochastic Games with Periodic Revelation. Games 2024, 15, 31. https://doi.org/10.3390/g15050031
