Rational Play in Extensive-Form Games

Bonanno, Giacomo

doi:10.3390/g13060072

by

Giacomo Bonanno

Department of Economics, University of California, Davis, CA 95616-8578, USA

Games 2022, 13(6), 72; https://doi.org/10.3390/g13060072

Submission received: 24 September 2022 / Revised: 23 October 2022 / Accepted: 26 October 2022 / Published: 30 October 2022

(This article belongs to the Topic Game Theory and Applications)

Abstract

We argue in favor of a departure from the equilibrium approach in game theory towards the less ambitious goal of describing only the actual behavior of rational players. The notions of Nash equilibrium and its refinements require a specification of the players’ choices and beliefs not only along the equilibrium play but also at counterfactual histories. We discuss an alternative—counterfactual-free—approach that focuses on choices and beliefs along the actual play, while being silent on choices and beliefs at unreached histories. Such an approach was introduced in an earlier paper that considered only perfect-information games. Here we extend the analysis to general extensive-form games (allowing for imperfect information) and put forward a behavioral notion of self-confirming play, which is close in spirit to the literature on self-confirming equilibrium. We also extend, to general extensive-form games, the characterization of rational play that is compatible with pure-strategy Nash equilibrium.

Keywords: material rationality; behavioral model; self-confirming play; Nash equilibrium

1. Introduction

We address the issue of what kind of object qualifies as a “rational solution” of an extensive-form game. Whereas the dominant approach focuses on strategy profiles that, besides being Nash equilibria, satisfy additional—often increasingly complex—criteria, we suggest moving in the opposite direction by doing without the notion of strategy and focusing only on the actions and beliefs of the active players, that is, on choices made and beliefs held at reached histories.

The notions of Nash equilibrium and its refinements require a specification of the players’ choices and beliefs not only along the equilibrium play but also at counterfactual (that is, unreached) histories. In Section 2, we argue that pinning down counterfactual choices at unreached information histories is not a straightforward matter and that it may be futile to search for a general theory of rationality that can achieve this result. Instead, we argue in favor of a more basic approach that was put forward in [1]; while [1] was exclusively focused on perfect-information games, in this paper we extend the approach to general extensive-form games, by allowing for imperfect information. We put forward a behavior-based notion of self-confirming play—which is close in spirit to the literature on self-confirming equilibrium—and extend the characterization of rational play that is compatible with pure-strategy Nash equilibrium to general extensive-form games.

The paper is organized as follows. In Section 2, we illustrate, by means of examples, the difficulties that arise in the pursuit of specifying rational choices and beliefs at unreached information sets. In Section 3, we review the behavioral models introduced in [1], extend them to general extensive-form games and put forward a notion of self-confirming play, where each action taken is justified by the beliefs held at the time of choice and, furthermore, those beliefs turn out to be exactly correct, so that no player receives information that contradicts those beliefs (and thus experiences no regret). We also extend the characterization of rational play that is consistent with pure-strategy Nash equilibrium to general extensive-form games. Section 4 provides further discussion and a conclusion.

2. What Is a Rational Solution?

What constitutes a rational solution of an extensive-form game? Two different approaches can be found in the literature.

1. The Nash-equilibrium approach. Within this approach the notion of rationality is captured through the concept of Nash equilibrium or one of its refinements. Consider, for example, the game of Figure 1 and the strategy profile $(a, a, d)$. Although $(a, a, d)$ is a Nash equilibrium, popular refinements of Nash equilibrium would deny it the status of a “rational solution” [for example, $(a, a, d)$ is not a sequential equilibrium [2] because strategy d can be a rational choice for Player 3 only if she assigns positive probability to history $bb$, but the notion of consistency, which is part of the definition of sequential equilibrium, requires Player 3’s beliefs to assign zero probability to $bb$]. Regardless of one’s views on whether $(a, a, d)$ can be considered a rational solution of the game of Figure 1, there is a more fundamental issue to be considered, namely how Player 3’s strategy d should be interpreted. The common interpretation seems to be in terms of an objective counterfactual: Player 3 would play d if her information set were to be reached. It is typically the case in extensive-form games that, given a strategy profile s, there will be information sets that are not reached by the play generated by s. Thus, under this interpretation of strategies, a rational solution of the game would determine not only what actions are actually taken by the players (that is, what the actual play of the game is), but also, counterfactually, what actions would be taken at every unreached information set.

The standard theory of counterfactuals, due to Robert Stalnaker and David Lewis [3,4,5], postulates a family of similarity relations on the set of possible worlds (one for each possible world) and the sentence “if $ϕ$ were the case then $ψ$ would be the case” is declared to be true at a possible world $ω$ if $ψ$ is true at the most similar world(s) to $ω$ where $ϕ$ is true. Referring to the game of Figure 1, at a world where Players 1 and 2 play $aa$, we can take $ϕ$ to be the sentence “Player 3’s information set is reached” and $ψ$ the sentence “Player 3 plays d”. Then the sentence “if $ϕ$ were the case then $ψ$ would be the case” would be true at the actual world (where Player 3’s information set is not reached, because Players 1 and 2 play $aa$) if and only if the most similar world to the actual world at which Player 3’s information set is reached is one where Player 3 plays d. However, how are we to determine if the most similar world to the actual world is one where Player 3 plays d or one where Player 3 plays c?

In general, pinning down counterfactual choices at unreached information sets is not a straightforward matter. Consider, for example, the game illustrated in Figure 2 due to Perea ([6], p. 169), where Player 1 can either play b and end the game, or play a, in which case Players 1 and 2 play a “simultaneous” game.

If one appeals to backward-induction reasoning, one is led to conclude that the “rational solution” of this game is the strategy profile $(b, e)$, which incorporates the counterfactual claim that Player 2 would play e if her information set were to be reached [first apply the procedure of iterative deletion of strictly dominated strategies to the subgame that starts at history a to obtain $(c, e)$: in the subgame, for Player 2 g is strictly dominated by both e and f; after deleting g, for Player 1 d becomes strictly dominated by c; after deleting d, for Player 2 f becomes strictly dominated by e; then infer that Player 1 will play b; backward-induction reasoning is captured by such notions as “common belief in present and future rationality” [7], or forward belief in rationality [8,9]].

On the other hand, if one appeals to forward-induction reasoning (as captured by the notion of extensive-form rationalizability [10,11]) one is led to conclude that the “rational solution” of this game is the strategy profile $(b, f)$, incorporating the counterfactual claim that Player 2 would play f if her information set were to be reached [first eliminate Player 1’s strategy $ac$, since it is strictly dominated by b, and Player 2’s strategy g; then eliminate Player 1’s strategy $ad$ and Player 2’s strategy e, with the conclusion that Player 1 will play b and Player 2 would play f].

Note, however, that the prediction in terms of play is the same, namely that Player 1 would end the game by playing b; the two solutions differ only in terms of the answer to the question “what would Player 2 do if her information set were to be reached?”

The above example shows that it may be futile to search for a general theory of rationality that would pin down counterfactual choices and beliefs at unreached information sets (as shown recently by [12], besides backward-induction and forward-induction reasoning, there are other types of rationality-based reasoning that lead to the same outcome but different counterfactual “predictions” about choices at unreached information sets). It is natural, therefore, to ask: Is it essential to provide an answer to such counterfactual questions? The thesis put forward in this paper is that the answer to this question is negative.

2. The self-confirming equilibrium approach. Returning to the strategy profile $(a, a, d)$ in the game of Figure 1, an alternative approach is to interpret Player 3’s strategy d not as a claim about what Player 3 would actually do in a counterfactual world where her information set is reached, but as a belief, shared by Players 1 and 2, about Player 3’s hypothetical behavior. Such a shared belief would support the rationality of playing a for both Players 1 and 2.

This approach is in line with the literature that identifies rational play in extensive-form games with the notion of self-confirming equilibrium, introduced in [13] (similar notions are put forward in [14,15,16]; Refs. [17,18] provide a refinement of self-confirming equilibrium that imposes constraints on the players’ beliefs about what actions an opponent could take at an off-path information set and [19] provide a generalization of self-confirming equilibrium; the related expression ‘conjectural equilibrium’ is mostly used in the context of strategic-form games: it was introduced in this context by [20,21] and is defined as a situation where each player’s strategy is a best response to a conjecture about the other players’ strategies and any information acquired after the play of the game does not induce the player to change her conjecture). A self-confirming equilibrium is a strategy profile satisfying the property that each player’s strategy is a best response to her beliefs about the strategies of her opponents, and each player’s beliefs are correct along the equilibrium play, even though beliefs about play at unreached information sets may be incorrect. The essential feature of a self-confirming equilibrium is that no player receives information that contradicts her beliefs.

If one follows the interpretation suggested above, then two issues arise. First of all, the strategy profile $(a, a, d)$ (for the game of Figure 1) is now a hybrid object, incorporating, on the one hand, a prediction about actual behavior (namely, the $(a, a)$ part) and, on the other hand, an encoding of the beliefs of Players 1 and 2 (namely, the d part). This leaves something to be desired, since one should clearly distinguish between actions and beliefs and model the latter explicitly. The second issue is that there seems to be no reason to require different players to agree on the hypothetical choice of a third player at an unreached information set. Consider, for example, the game of Figure 3, taken from ([13], p. 533).

In this game it is rational for Player 1 to play a, if she believes that Player 2 will play A and Player 3 would play L, and it is rational for Player 2 to play A, if he believes that Player 3 would play R. Thus, the play $aA$ is supported by beliefs of Players 1 and 2 that are not in agreement with each other.

Note, however, that the notion of self-confirming equilibrium is still defined in terms of a strategy profile and thus one cannot claim that $(a, A)$ is a self-confirming equilibrium. One would have to state that both $(a, A, L)$ and $(a, A, R)$ are self-confirming equilibria sustained by Player 1’s belief (correct in the former, erroneous in the latter) that Player 3 would play L and Player 2’s belief (erroneous in the former, correct in the latter) that Player 3 would play R. In other words, the notion of self-confirming equilibrium also requires an answer to the counterfactual “what would Player 3 do if her information set were to be reached?” Note also that in the game of Figure 3 there is no Nash equilibrium that yields the play $aA$; thus, a self-confirming equilibrium need not be a Nash equilibrium.

Both the notion of Nash equilibrium and the notion of self-confirming equilibrium require specifying choices at all information sets, whether they are reached or not. From a conceptual point of view, however, it is not clear what role choices at unreached information sets play beyond expressing the beliefs of the active players along the equilibrium path. For example, consider again the game of Figure 3 and a situation where Player 1 plays a (believing that Player 2 will play A and Player 3 would play L) and Player 2 plays A, believing that Player 3 would play R. Why is this not enough as a “solution”? Why the need to settle the counterfactual concerning what Player 3 would truly do if her information set were to be reached and thus which of Players 1 and 2 is holding incorrect beliefs? [Note that, as [15] points out, in this game Player 3 gains from the uncertainty in the minds of Players 1 and 2 and, if asked what she would do, she would refuse to answer, since her payoff is largest when Players 1 and 2 play $aA$.] Furthermore, it is not clear how the counterfactual could be settled: both L and R can be justified as hypothetical rational choices for Player 3.

In this paper, we turn to an alternative approach, put forward in [1] in the context of perfect-information games, and extend it to general extensive-form games. The proposed framework restricts attention to the actual choices of the players and the beliefs that justify those choices.

3. Behavioral Models of Games

There are two types of epistemic/doxastic models used in the game-theoretic literature: the so-called “state-space” models and the “type-space” models. We will adopt the former (note that there is a straightforward way of translating one type of model into the other). In the standard state-space model of a given game, one takes as starting point a set of states (or possible worlds) and associates with every state a strategy for every player, thus providing an interpretation of a state in terms of players’ choices. If $ω$ is a state and $s_{i}$ is the strategy of player i at $ω$ then the interpretation is that, at that state, player i plays $s_{i}$. If the game is simultaneous (so that there cannot be any unreached information sets), then there is no ambiguity in the expression “player i plays $s_{i}$”, but if the game is an extensive-form game then the expression is ambiguous. Consider, for example, the game of Figure 1 and a state $ω$ where Players 1 and 2 play $aa$, so that Player 3’s information set is not reached; suppose also that the strategy of Player 3 associated with state $ω$ is d. In what sense does Player 3 “play” d? Does it mean that, before the game starts, Player 3 has made a plan to play d if her information set happens to be reached? Or does it mean (in a Stalnaker-Lewis interpretation of the counterfactual) that in the state most similar to $ω$ where her information set is actually reached, Player 3 plays d? [This interpretation is adopted in [22] where it is pointed out that in this type of model “one possible culprit for the confusion in the literature regarding what is required to force the backward induction solution in games of perfect information is the notion of a strategy”.] Or is Player 3’s strategy d to be interpreted not as a statement about what Player 3 would do but as an expression of what the opponents think that Player 3 would do?

While most of the literature on the epistemic foundations of game theory makes use of strategy-based models, a few papers follow a behavioral approach by associating with each state a play (or outcome) of the game (the seminal contribution is [23], followed by [1,8,24,25]; the focus of this literature has been on games with perfect information). The challenge in this class of models is to capture the reasoning of a player who takes a particular action while considering what would happen if she took a different action. The most common approach is to postulate, for each player, a set of conditional beliefs, where the conditioning events are represented by possible histories in the game, including off-path histories ([23] uses extended information structures to model hypothetical knowledge, [8] use plausibility relations and [24] use conditional probability systems). Here we follow the simpler approach put forward in [1], which models the “pre-choice” beliefs of a player, while the previous literature considered the “after-choice” beliefs. The previous literature was based on the assumption that, if at a state a player takes action a, then she knows that she takes action a, that is, in all the states that she considers possible she takes action a. The pre-choice or deliberation stage approach, on the other hand, models the beliefs of the player at the time when she is contemplating the actions available to her and treats each of those actions as an “open possibility”. 
Thus, her beliefs take the following form: “if I take action a then the outcome will be x and if I take action b then the outcome will be y”, where the conditional “if p then q” is interpreted as a material conditional, that is, as equivalent to “either not p or q” (in [26] it is argued that, contrary to a common view, the material conditional is indeed sufficient to model deliberation; it is also shown how to convert pre-choice beliefs into after-choice beliefs, reflecting a later stage at which the agent has made up her mind on what to do). This analysis does not rely in any way on counterfactuals; furthermore, only the beliefs of the active players at the time of choice are modeled, so that no initial beliefs nor belief revision policies are postulated. The approach is described below and it makes use of the history-based definition of extensive-form game, which is reviewed in Appendix A.

As in [1] we take a non-quantitative approach based on qualitative beliefs and ordinal utility.

3.1. Qualitative Beliefs

Let $Ω$ be a set, whose elements are called states (or possible worlds). We represent the beliefs of an agent by means of a binary relation $B \subseteq Ω \times Ω$. The interpretation of $(ω, ω') \in B$, also denoted by $ω B ω'$, is that at state $ω$ the agent considers state $ω'$ possible; we also say that $ω'$ is reachable from $ω$ by $B$. For every $ω \in Ω$ we denote by $B(ω)$ the set of states that are reachable from $ω$, that is, $B(ω) = \{ω' \in Ω : ω B ω'\}$.

$B$ is transitive if $ω' \in B(ω)$ implies $B(ω') \subseteq B(ω)$ and it is euclidean if $ω' \in B(ω)$ implies $B(ω) \subseteq B(ω')$ (it is well known that transitivity of $B$ corresponds to positive introspection of beliefs: if the agent believes an event $E$ then she believes that she believes $E$, and euclideanness corresponds to negative introspection: if the agent does not believe $E$ then she believes that she does not believe $E$). We will assume throughout that the belief relations are transitive and euclidean so that $ω' \in B(ω)$ implies that $B(ω') = B(ω)$. Note that we do not assume reflexivity of $B$ (that is, we do not assume that, for every state $ω$, $ω \in B(ω)$; reflexivity corresponds to the assumption that a player cannot have incorrect beliefs: an assumption that, as [27] points out, is conceptually problematic, especially in a multi-agent context). Hence, in general, the relation $B$ does not induce a partition of the set of states.
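The two properties can be checked mechanically on a finite relation. The following is a minimal sketch, assuming a relation stored as a Python dictionary from each state to the set $B(ω)$; the three states `w1`, `w2`, `w3` and the function names are hypothetical and serve only as an illustration.

```python
# A belief relation stored as a map: state -> B(state), the states considered possible.
# The three states below are hypothetical and chosen only for illustration.
B = {
    "w1": {"w2", "w3"},   # at w1 the agent considers w2 and w3 possible, but not w1
    "w2": {"w2", "w3"},
    "w3": {"w2", "w3"},
}

def is_transitive(rel):
    """w2 in B(w) implies that B(w2) is a subset of B(w)."""
    return all(rel[w2] <= rel[w] for w in rel for w2 in rel[w])

def is_euclidean(rel):
    """w2 in B(w) implies that B(w) is a subset of B(w2)."""
    return all(rel[w] <= rel[w2] for w in rel for w2 in rel[w])

def is_reflexive(rel):
    """Every state considers itself possible."""
    return all(w in rel[w] for w in rel)

def belief_sets_agree(rel):
    """Transitivity plus euclideanness: B(w2) == B(w) whenever w2 in B(w)."""
    return all(rel[w2] == rel[w] for w in rel for w2 in rel[w])

print(is_transitive(B), is_euclidean(B), is_reflexive(B), belief_sets_agree(B))
```

In this example the relation is transitive and euclidean but not reflexive, because `w1` is not in its own belief set, so the belief sets do not partition the state space.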

Graphically, we represent a transitive and euclidean belief relation as shown in Figure 4, where $ω' \in B(ω)$ if and only if either there is an arrow from $ω$ to the rounded rectangle containing $ω'$, or $ω$ and $ω'$ are enclosed in the same rounded rectangle (that is, if there is an arrow from state $ω$ to a rounded rectangle, then, for every $ω'$ in that rectangle, $(ω, ω') \in B$ and, for any two states $ω$ and $ω'$ that are enclosed in a rounded rectangle, $\{(ω, ω), (ω, ω'), (ω', ω), (ω', ω')\} \subseteq B$).

The objects of belief are propositions or events (i.e., sets of states; events are denoted by bold-type capital letters). We say that at state $ω$ the agent believes event $E \subseteq Ω$ if and only if $B(ω) \subseteq E$. For example, in the case illustrated in Figure 4, at state $α$ the agent believes event $\{β, γ\}$. We say that, at state $ω$, event $E$ is true if $ω \in E$. In the case illustrated in Figure 4, at state $α$ the agent erroneously believes event $\{β, γ\}$, since event $\{β, γ\}$ is not true at $α$ ($α \notin B(α) = \{β, γ\}$). We say that at state $ω$ the agent has correct beliefs if $ω \in B(ω)$ (note that it is a consequence of euclideanness of the relation $B$ that, even if the agent’s beliefs are objectively incorrect, she always believes that what she believes is true: if $ω' \in B(ω)$ then $ω' \in B(ω')$).
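The notions of believing an event, truth of an event and correct beliefs translate directly into set operations. The sketch below is only illustrative: it encodes the part of Figure 4 that the text describes, $B(α) = \{β, γ\}$, and takes $B(β) = B(γ) = \{β, γ\}$, which follows from transitivity and euclideanness. Any further states in the figure are not modelled, and the function names are hypothetical.

```python
# Hypothetical encoding of the relation described for Figure 4.
# B(alpha) = {beta, gamma} is stated in the text; B(beta) and B(gamma)
# follow from transitivity and euclideanness.
B = {
    "alpha": {"beta", "gamma"},
    "beta": {"beta", "gamma"},
    "gamma": {"beta", "gamma"},
}

def believes(rel, state, event):
    """The agent believes `event` at `state` iff B(state) is a subset of the event."""
    return rel[state] <= set(event)

def is_true(state, event):
    """An event is true at a state iff the state belongs to it."""
    return state in event

def has_correct_beliefs(rel, state):
    """Local reflexivity: the state is among those the agent considers possible."""
    return state in rel[state]

E = {"beta", "gamma"}
print(believes(B, "alpha", E))           # alpha's beliefs are contained in E
print(is_true("alpha", E))               # but E does not hold at alpha
print(has_correct_beliefs(B, "alpha"))   # so the belief held at alpha is erroneous
print(has_correct_beliefs(B, "beta"))
```

The last check mirrors the remark about euclideanness: every state the agent considers possible is one at which her beliefs would be correct.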

3.2. Models of Games

As a starting point in the definition of a model of a game, we take a set of states $Ω$ and provide an interpretation of each state in terms of a particular play of the game, by means of a function $ζ : Ω \to Z$ that associates, with every state $ω$, a play or terminal history $ζ(ω) \in Z$. Each state $ω$ also provides a description of the beliefs of the active players by means of a binary relation $B_{h}$ on $Ω$ representing the beliefs of $ι(h)$, the player who moves at decision history h. It would be more precise to write $B_{ι(h)}$ instead of $B_{h}$, but we have chosen the lighter notation since there is no ambiguity, because we assume (see Appendix A) that at every decision history there is a unique player who is active there. Note that beliefs are specified only at histories that are reached at a given state, in the sense that $B_{h}(ω) \neq ∅$ if and only if $h ≺ ζ(ω)$.

Definition 1. A model of an extensive-form game is a tuple $\langle Ω, ζ, \{B_{h}\}_{h \in D} \rangle$ where

$Ω$ is a set of states.
$ζ : Ω \to Z$ is an assignment of a terminal history to each state.
For every $h \in D$, $B_{h} \subseteq Ω \times Ω$ is a belief relation that satisfies the following properties:
1. $B_{h}(ω) \neq ∅$ if and only if $h ≺ ζ(ω)$ [beliefs are specified only at reached decision histories and are consistent: consistency means that there is no event $E$ such that both $E$ and its complement $\neg E$ are believed; it is well known that, at state $ω$, beliefs are consistent if and only if $B(ω) \neq ∅$].
2. If $ω' \in B_{h}(ω)$ then $h' ≺ ζ(ω')$ for some $h'$ such that $h' \approx_{ι(h)} h$ [the active player at history h correctly believes that her information set that contains h has been reached; recall (see Appendix A) that $h' \approx_{ι(h)} h$ (also written as $h' \in [h]$) if and only if h and $h'$ belong to the same information set of player $ι(h)$ (thus $ι(h) = ι(h')$)].
3. If $ω' \in B_{h}(ω)$ then (1) $B_{h}(ω') = B_{h}(ω)$ and (2) if $h' ≺ ζ(ω')$ with $h' \approx_{ι(h)} h$ then $B_{h'}(ω') = B_{h}(ω)$ [by (1), beliefs satisfy positive and negative introspection and, by (2), beliefs are the same at any two histories in the same information set; thus one can unambiguously refer to a player’s beliefs at an information set, which is what we do in Figures 5–9].
4. If $ω' \in B_{h}(ω)$ and $h' ≺ ζ(ω')$ with $h' \approx_{ι(h)} h$, then, for every action $a \in A(h)$ (note that $A(h') = A(h)$), there is an $ω'' \in B_{h}(ω)$ such that $h' a ≾ ζ(ω'')$.

The last condition states that if, at state $ω$ and history h reached at $ω$ ($h ≺ ζ(ω)$), player $ι(h)$ considers it possible that the play of the game has reached history $h'$, which belongs to her information set that contains h, then, for every action a available at that information set, there is a state $ω''$ that she considers possible at h and $ω$ ($ω'' \in B_{h}(ω)$) where she takes action a at history $h'$ ($h' a ≾ ζ(ω'')$). This means that, for every available action, the active player at h considers it possible that she takes that action and thus has a belief about what will happen conditional on taking it. A further “natural” restriction on beliefs will be discussed later (Definition 6).
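The four conditions of Definition 1 can be verified by brute force on a small model. The sketch below makes simplifying assumptions that are not in the paper: the game has perfect information (every information set is a singleton, so $h' = h$ throughout), histories are strings with one character per action, and the game, the states `w1`, `w2`, `w3` and all names are hypothetical.

```python
# Hypothetical perfect-information game: at the root "" the mover picks "a" or "b";
# "b" ends the game, and after "a" the next mover picks "c" or "d".
ZETA = {"w1": "b", "w2": "ac", "w3": "ad"}     # state -> terminal history
ACTIONS = {"": "ab", "a": "cd"}                 # decision history -> available actions
BELIEFS = {                                     # h -> (state -> B_h(state))
    "": {w: {"w1", "w2"} for w in ZETA},
    "a": {"w1": set(), "w2": {"w2", "w3"}, "w3": {"w2", "w3"}},
}

def reached(h, play):
    """h is a proper prefix of the terminal history `play`."""
    return play.startswith(h) and h != play

def is_model(zeta, actions, beliefs):
    """Check conditions 1-4 of Definition 1 for singleton information sets."""
    for h, available in actions.items():
        for w in zeta:
            bel = beliefs[h][w]
            # Condition 1: beliefs are non-empty exactly at reached histories.
            if bool(bel) != reached(h, zeta[w]):
                return False
            for w2 in bel:
                # Condition 2: h is reached at every state considered possible.
                # Condition 3: beliefs are the same at every such state.
                if not reached(h, zeta[w2]) or beliefs[h][w2] != bel:
                    return False
            # Condition 4: every available action is taken at some possible state.
            if bel and not all(
                any(zeta[x].startswith(h + a) for x in bel) for a in available
            ):
                return False
    return True

# A variant that violates condition 4: the root mover never considers playing "b".
BAD_BELIEFS = dict(BELIEFS)
BAD_BELIEFS[""] = {w: {"w2"} for w in ZETA}

print(is_model(ZETA, ACTIONS, BELIEFS), is_model(ZETA, ACTIONS, BAD_BELIEFS))
```

Handling genuine information sets would require comparing beliefs across all histories in the same set, which this sketch deliberately leaves out.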

Figure 5 reproduces the game of Figure 1 and shows a model of it. For every reached decision history, under every state that the corresponding player considers possible we have shown the action actually taken by that player and the player’s payoff (at the terminal history associated with that state).

Suppose, for example, that the actual state is $γ$. State $γ$ encodes the following facts and beliefs.

1. As a matter of fact, Player 1 plays a, Player 2 plays b and Player 3 plays d.
2. Player 1 (who chooses at the null history $∅$) believes that if she plays a then Player 2 will also play a (this belief is erroneous since at state $γ$ Player 2 actually plays b, after Player 1 plays a) and thus her utility will be 2, and she believes that if she plays b then Player 2 will play a and Player 3 will play d and thus her utility will be 1.
3. Player 2 (who chooses at information set $\{a, b\}$) correctly believes that Player 1 played a and, furthermore, correctly believes that if he plays b then Player 3 will play d and thus his utility will be 1, and believes that if he plays a his utility will be 2.
4. Player 3 (who chooses at information set $\{ab, ba, bb\}$) erroneously believes that both Player 1 and Player 2 played b; thus, she believes that if she plays c her utility will be 0 and if she plays d her utility will be 1.

On the other hand, if the actual state is $β$, then the actual play is $aa$ and the beliefs of Players 1 and 2 are as detailed above (Points 2 and 3, respectively), while no beliefs are specified for Player 3, because Player 3 does not get to play (that is, Player 3 is not active at state $β$ since her information set is not reached).

3.3. Rationality

Consider again the model of Figure 5 and state $γ$. There Player 1 believes that if she takes action a, her utility will be 2, and if she takes action b, her utility will be 1. Thus, if she is rational, she must take action a. Indeed, at state $γ$ she does take action a and thus she is rational (although she will later discover that her belief was erroneous and the outcome turns out to be $abd$, not $aa$, so that her utility will be 1, not 2). Since Player 1 has the same beliefs at every state, we declare Player 1 to be rational at precisely those states where she takes action a, namely $β$ and $γ$. Similar reasoning leads us to conclude that Player 2 is rational at those states where she takes action a, namely states $α$ and $β$. Similarly, Player 3 is rational at those states where she takes action d, namely states $α$, $γ$ and $δ$. If we denote by $R$ the event that all the active players are rational, then in the model of Figure 5 we have that $R = \{β\}$ (note that at state $β$ Player 3 is not active).

We need to define the notion of rationality more precisely. Various definitions of rationality have been suggested in the context of extensive-form games, most notably material rationality and substantive rationality [28,29]. The notion of material rationality is the weaker of the two in that a player can be found to be irrational only at decision histories of hers that are actually reached (substantive rationality, on the other hand, is more demanding since a player can be labeled as irrational at a decision history h of hers even if h is not reached). Given that we have adopted a purely behavioral approach, the natural notion for us is the weaker one, namely material rationality. We will adopt a very weak version of it, according to which at a state $ω$ and reached history h (that is, $h ≺ ζ(ω)$), the active player at h is rational if the following is the case: if a is the action that the player takes at h at state $ω$ (that is, $h a ≾ ζ(ω)$) then there is no other action at h that, according to her beliefs, guarantees a higher utility.

Definition 2. Let $ω$ be a state, h a decision history that is reached at $ω$ ($h ≺ ζ(ω)$) and $a, b \in A(h)$ two actions available at h.

(A) We say that, at $ω$ and h, the active player $ι(h)$ believes that b is better than a if, for all $ω_{1}, ω_{2} \in B_{h}(ω)$ and for all $h'$ such that $h' \approx_{ι(h)} h$ (that is, history $h'$ belongs to the same information set as h), if a is the action taken at history $h'$ at state $ω_{1}$, that is, $h' a ≾ ζ(ω_{1})$, and b is the action taken at $h'$ at state $ω_{2}$, that is, $h' b ≾ ζ(ω_{2})$, then $u_{ι(h)}(ζ(ω_{1})) < u_{ι(h)}(ζ(ω_{2}))$. Thus, the active player at history h believes that action b is better than action a if, restricting attention to the states that she considers possible, the largest utility that she obtains if she plays a is less than the lowest utility that she obtains if she plays b.
(B) We say that player $ι(h)$ is rational at history h at state $ω$ if and only if the following is true: if $h a ≾ ζ(ω)$ (that is, $a \in A(h)$ is the action played at h at state $ω$) then, for every $b \in A(h)$, it is not the case that, at state $ω$ and history h, player $ι(h)$ believes that b is better than a.

Finally, we define the event that all the active players are rational, denoted by $R$, as follows:

$$ω \in R \text{ if and only if, for every } h ≺ ζ(ω), \text{ player } ι(h) \text{ is rational at } h \text{ (at state } ω\text{)}. \qquad (1)$$

For example, as noted above, in the model of Figure 5 we have that $R = \{β\}$.
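Definition 2 and the event $R$ of (1) can be computed directly on a finite model. The following is a rough sketch under stated assumptions: the game has perfect information (so $h' = h$), histories are strings with one character per action, and the game, its payoffs and the states `w1`, `w2`, `w3` are invented for illustration and are not the model of Figure 5.

```python
# Hypothetical game: Player 1 picks "a" or "b" at the root ""; "b" ends the game,
# and after "a" Player 2 picks "c" or "d". All payoffs are made up.
ZETA = {"w1": "b", "w2": "ac", "w3": "ad"}      # state -> terminal history
ACTIONS = {"": "ab", "a": "cd"}                  # decision history -> available actions
PLAYER = {"": 1, "a": 2}                         # decision history -> active player
UTILITY = {1: {"b": 1, "ac": 2, "ad": 0},
           2: {"b": 0, "ac": 2, "ad": 1}}
BELIEFS = {                                      # h -> (state -> B_h(state))
    "": {w: {"w1", "w2"} for w in ZETA},
    "a": {"w1": set(), "w2": {"w2", "w3"}, "w3": {"w2", "w3"}},
}

def reached(h, w):
    """h is a proper prefix of the play associated with state w."""
    return ZETA[w].startswith(h) and h != ZETA[w]

def believes_better(h, w, b, a):
    """Definition 2(A): every believed payoff after a is below every one after b."""
    u = UTILITY[PLAYER[h]]
    after_a = [u[ZETA[x]] for x in BELIEFS[h][w] if ZETA[x].startswith(h + a)]
    after_b = [u[ZETA[x]] for x in BELIEFS[h][w] if ZETA[x].startswith(h + b)]
    return all(ua < ub for ua in after_a for ub in after_b)

def rational(h, w):
    """Definition 2(B): no available action is believed better than the one taken."""
    taken = ZETA[w][len(h)]
    return not any(believes_better(h, w, b, taken) for b in ACTIONS[h])

# Event R of (1): states where every active player is rational.
R = {w for w in ZETA if all(rational(h, w) for h in ACTIONS if reached(h, w))}
print(R)
```

In this toy model Player 1 expects 2 from a and 1 from b, so she is rational only where she plays a, while Player 2 expects 2 from c and 1 from d; the only state where both active players are rational is `w2`.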

3.4. Correct Beliefs

The notion of correct belief was first mentioned in Section 3.1 and was identified with local reflexivity (that is, reflexivity at a state, rather than global reflexivity). Since, at any state, only the beliefs of the active players are specified, we define the event that players have correct beliefs by restricting attention to those players who actually move. Thus, the event that the active players have correct beliefs, denoted by $T$ (‘T’ stands for ‘true’), is defined as follows:

$$ω \in T \text{ if and only if } ω \in B_{h}(ω) \text{ for every } h \text{ such that } h ≺ ζ(ω). \qquad (2)$$

For example, in the model of Figure 5, $T = \{β\}$.
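Condition (2) is a local reflexivity test at every reached history, which is easy to compute. The sketch below assumes a hypothetical perfect-information model with histories written as strings, one character per action; the states `w1`, `w2`, `w3` and the function names are illustrative and do not reproduce Figure 5.

```python
# Hypothetical model: the root mover picks "a" or "b"; after "a" a second mover
# picks "c" or "d". Each state is mapped to the play that occurs there.
ZETA = {"w1": "b", "w2": "ac", "w3": "ad"}
BELIEFS = {                                      # h -> (state -> B_h(state))
    "": {w: {"w1", "w2"} for w in ZETA},
    "a": {"w1": set(), "w2": {"w2", "w3"}, "w3": {"w2", "w3"}},
}

def reached(h, w):
    """h is a proper prefix of the play associated with state w."""
    return ZETA[w].startswith(h) and h != ZETA[w]

def correct_beliefs_event(zeta, beliefs):
    """States w with w in B_h(w) for every decision history h reached at w."""
    return {
        w for w in zeta
        if all(w in beliefs[h][w] for h in beliefs if reached(h, w))
    }

T = correct_beliefs_event(ZETA, BELIEFS)
print(sorted(T))
```

Here `w3` is excluded because the root mover, who plays a there, does not consider `w3` possible: she expects c to follow, whereas d is actually played.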

What does the expression “correct beliefs” mean? Consider state

β

in the model of Figure 5 where the active players (Players 1 and 2) have correct beliefs in the sense of (2) (

β \in B_{⌀} (β)

and

β \in B_{{a, b}} (β)

). Consider Player 1. There are two components to Player 1’s beliefs: (i) she believes that if she plays a then Player 2 will also play a, and (

i i

) she believes that if she plays b then Players 2 and 3 will play a and d, respectively. The first belief is correct at state

β

, where Player 1 plays a and Player 2 indeed follows with a. As for the second belief, whether it is correct or not depends on how we interpret it. If we interpret it as the material conditional “if b then

b a d

” (which is equivalent to “either not b or

b a d

”) then it is indeed true at state

β

, but trivially so, because the antecedent is false there (Player 1 does not play b). If we interpret it as a counterfactual conditional “if Player 1 were to play b then Players 2 and 3 would play a and d, respectively” then in order to decide whether the conditional is true or not one would need to enrich the model by adding a “similarity” or “closeness” relation on the set of states (in the spirit of [3,4]); one would then check if at the closest state(s) to

β

at which Player 1 plays b it is indeed the case that Players 2 and 3 play a and d, respectively. Note that there is no a priori reason to think that the closest state to

β

where Player 1 plays b is state

α

. This is because, as pointed out by Stalnaker ([30], p. 48), there is no necessary connection between counterfactuals, which capture causal relations, and beliefs: for example, I may believe that, if I drop the vase that I am holding in my hands, it will break (because I believe it is made of glass), but my belief is wrong because, as a matter of fact, if I were to drop it, it would not break, since it is made of plastic.

Our models do not have the resources to answer the question: “at state

β

, is it true, as Player 1 believes, that if Player 1 were to play b then Players 2 and 3 would play a and d, respectively?” One could, of course, enrich the models in order to answer the question, but is there a compelling reason to do so? In other words, is it important to be able to answer such questions? If we are merely interested in determining what rational players do, then what matters is what actions they actually take and what they believe when they act, whether or not those beliefs are correct in a stronger sense than is captured by the material conditional.

Is the material conditional interpretation of “if I play a then the outcome will be x” sufficient, though? Since the crucial assumption in the proposed framework is that the agent considers all of her available actions as possible (that is, for every available action there is a doxastically accessible state where she takes that action), material conditionals are indeed sufficient: the material conditional “if I take action a the outcome will be x” zooms in, through the lens of the agent’s beliefs, on those states where action a is indeed taken and verifies that at those states the outcome is indeed x, while the states where action a is not taken are not relevant for the truth of the conditional.
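A minimal Python sketch of this reading of the material conditional, assuming perfect information (so that the information set containing h is just {h}). The function name `believes_conditional`, the encoding of histories as tuples of actions and the toy model are all hypothetical.

```python
from typing import Dict, Set, Tuple

History = Tuple[str, ...]
State = str


def believes_conditional(
    zeta: Dict[State, History],
    B: Dict[Tuple[History, State], Set[State]],
    h: History,
    state: State,
    action: str,
    outcome: History,
) -> bool:
    """True iff every doxastically accessible state at which `action` is taken
    after `h` has play equal to `outcome`. Accessible states where `action` is
    not taken are ignored, as in the material conditional."""
    return all(
        zeta[v] == outcome
        for v in B[(h, state)]
        if zeta[v][: len(h) + 1] == h + (action,)
    )


# Hypothetical model: Player 1 picks a or b; after a, Player 2 picks c or d.
zeta = {"alpha": ("b",), "beta": ("a", "c"), "gamma": ("a", "d")}
B = {((), "beta"): {"alpha", "beta"}}

if_a_then_ac = believes_conditional(zeta, B, (), "beta", "a", ("a", "c"))
if_a_then_ad = believes_conditional(zeta, B, (), "beta", "a", ("a", "d"))
if_b_then_b = believes_conditional(zeta, B, (), "beta", "b", ("b",))
```

Because every available action is taken at some accessible state, the generator never ranges over an empty set for an available action, so the conditional is never true merely by default.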

3.5. Self-Confirming Play

We have defined two events: the event

R

that all the active players are rational and the event

T

that all the active players have correct beliefs. In the model of Figure 5 we have that

R \cap T = {β}

and it so happens that

ζ (β) = a a

is a Nash equilibrium play, that is, there is a pure-strategy Nash equilibrium (namely,

(a, a, d)

) whose associated play is

a a

. However, as shown below, this is not always the case.

At a play associated with a state

ω \in R \cap T

, each active player’s chosen action is rationally justified by her beliefs at the time of choice (since

ω \in R

) and the beliefs concerning what will happen after that action turn out to be correct (since

ω \in T

), so that no player is faced with evidence that her beliefs were wrong. Does that mean that, once the final outcome

ζ (ω)

is revealed, no player regrets her actual choice? The answer is negative, because it is possible that a player, while not having any false beliefs, might not anticipate with precision the actions of the players who move after her. In the model shown in Figure 6 we have that

R \cap T = {α, β, γ}

, that is, at every state the active players are rational and have correct beliefs. Consider state

β

, where the play is

a d

. At state

β

Player 1 is rational because she believes that if she plays b her utility will be 1 and if she plays a her utility might be 0 but might also be 2 (she is uncertain about what Player 2 will do). Thus, she does not believe that action b is better than a and hence it is rational for her to play a (Definition 2). Player 2 is rational because she is indifferent between her two actions. However, ex post, when Player 1 learns that the actual outcome is

a d

, she regrets not taking action b instead of a. This example shows that, even though

β \in R \cap T

,

ζ (β) = a d

is not a Nash equilibrium play, that is, there is no Nash equilibrium whose associated play is

a d

.

Next we introduce another event which, in conjunction with

T

, guarantees that the active players’ beliefs about the opponents’ actual moves are exactly correct (note that a requirement built into the definition of a self-confirming equilibrium ([13], p. 523) is that “each player’s beliefs about the opponents’ play are exactly correct”). Event

C

(‘C’ stands for ‘certainty’), defined below, rules out uncertainty about the opponents’ past choices (Point 1) as well as uncertainty about the opponents’ future choices (Point 2). Note that Point 1 is automatically satisfied in games with perfect information and thus imposes restrictions on beliefs only in imperfect-information games.

Definition 3.

A state ω belongs to event

C

if and only if, for every reached history h at ω (that is, for every

h ≺ ζ (ω))

, and

\forall ω^{'}, ω^{″} \in B_{h} (ω)

,

\forall h^{'}, h^{″} \in [h]

(recall that

[h]

is the information set that contains h),

1.: if $h^{'} ≺ ζ (ω^{'})$ and $h^{″} ≺ ζ (ω^{″})$ then $h^{'} = h^{″}$ ,
2.: $\forall a \in A (h)$ , if $h^{'} a ≺ ζ (ω^{'})$ and $h^{″} a ≺ ζ (ω^{″})$ then $ζ (ω^{'}) = ζ (ω^{″})$ .
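The two points of Definition 3 can be checked by brute force on a finite model. The sketch below is only illustrative: the encoding (tuples for histories, `zeta` for ζ, `B` for the belief sets, and a function `info_set` returning the information set [h]) is an assumption, and both toy models are hypothetical rather than the models of Figures 5 and 6.

```python
from itertools import product


def is_proper_prefix(h, z):
    """h ≺ z for histories encoded as tuples of actions."""
    return len(h) < len(z) and z[: len(h)] == h


def certainty_event(zeta, B, info_set):
    """States at which Points 1 and 2 of Definition 3 hold at every reached history."""
    event = set()
    for w, play in zeta.items():
        ok = True
        for k in range(len(play)):                  # reached histories h ≺ ζ(ω)
            h = play[:k]
            believed = B.get((h, w), set())
            for w1, w2 in product(believed, repeat=2):
                z1, z2 = zeta[w1], zeta[w2]
                for h1, h2 in product(info_set(h), repeat=2):
                    if not (is_proper_prefix(h1, z1) and is_proper_prefix(h2, z2)):
                        continue
                    if h1 != h2:                    # Point 1 fails
                        ok = False
                    a1, a2 = z1[len(h1)], z2[len(h2)]
                    if (a1 == a2
                            and is_proper_prefix(h1 + (a1,), z1)
                            and is_proper_prefix(h2 + (a2,), z2)
                            and z1 != z2):          # Point 2 fails
                        ok = False
        if ok:
            event.add(w)
    return event


# Hypothetical perfect-information game: Player 1 picks a or b; after a, Player 2 picks c or d.
zeta = {"alpha": ("b",), "beta": ("a", "c"), "gamma": ("a", "d")}
singleton = lambda h: {h}                           # perfect information: [h] = {h}
after_a = {(("a",), "beta"): {"beta", "gamma"}, (("a",), "gamma"): {"beta", "gamma"}}

# Player 1 is certain that c follows a.
B_certain = {((), w): {"alpha", "beta"} for w in zeta} | after_a
# Player 1 is uncertain whether c or d follows a.
B_uncertain = {((), w): {"alpha", "beta", "gamma"} for w in zeta} | after_a

C_certain = certainty_event(zeta, B_certain, singleton)
C_uncertain = certainty_event(zeta, B_uncertain, singleton)
```

In the second model the event is empty for the same kind of reason given in the text for Figure 6: at the null history Player 1 is uncertain about what will happen after action a.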

Note that—concerning Point 1—a player may be erroneous in her certainty about the opponents’ past choices, that is, it may be that

ω \in C

, the actual reached history is

h ≺ ζ (ω)

and yet player

ι (h)

is certain that she is moving at history

h^{'} \in [h]

with

h^{'} \neq h

(for example, in the model of Figure 5, at state

γ

, which belongs to event

C

, and at reached history

a b

, Player 3 is certain that she is moving at history

b b

while, as a matter of fact, she is moving at history

a b

), and, concerning Point 2, a player may also be erroneous in her certainty about what will happen after her choice (for example, in the model of Figure 5, at state

γ

and history ⌀, Player 1 is certain that if she takes action a then Player 2 will also play a, but she is wrong about this, because, as a matter of fact, at state

γ

Player 2 follows with b rather than a).

In the model of Figure 5

C = Ω

, while in the model of Figure 6

C = ⌀

, because at the null history ⌀ Player 1 is uncertain about what will happen if she takes action a.

If state

ω

belongs to the intersection of events

C

and

T

then, at state

ω

, each active player’s beliefs about the opponents’ actual play are exactly correct. Note, however, that—as noted in Section 3.4—there is no way of telling whether or not a player is also correct about what would happen after her counterfactual choices, because the models that we are considering are not rich enough to address the issue of counterfactuals.

Definition 4.

Let G be a game and z a play (or terminal history) in G. We say that z is a self-confirming play if there exists a model of G and a state ω in that model such that (1)

ω \in R \cap T \cap C

and (2)

z = ζ (ω)

.

Definition 5.

Given a game G and a play z in G, call z a Nash play if there is a pure-strategy Nash equilibrium whose induced play is z.
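Definition 5 can be illustrated by enumerating pure-strategy profiles in a small game. The sketch below uses a hypothetical perfect-information game that loosely follows the verbal description of Figure 6 (Player 1 gets 1 from b, and 2 or 0 after a; Player 2 is indifferent); the action name c, Player 2's payoffs and the payoff profile after b are assumptions. Deviations are checked one decision history at a time, which coincides with unilateral deviation here only because each player owns exactly one decision history.

```python
from itertools import product

# history -> (player who moves there, available actions)
DECISION = {(): (1, ("a", "b")), ("a",): (2, ("c", "d"))}
# terminal history -> {player: ordinal utility}
UTILITY = {
    ("b",): {1: 1, 2: 0},
    ("a", "c"): {1: 2, 2: 1},
    ("a", "d"): {1: 0, 2: 1},
}


def play_from(strategy, h=()):
    """Terminal history reached from h by following the pure-strategy profile."""
    while h in DECISION:
        h = h + (strategy[h],)
    return h


def nash_plays():
    """Plays induced by some pure-strategy Nash equilibrium (Definition 5)."""
    histories = list(DECISION)
    plays = set()
    for choices in product(*(DECISION[h][1] for h in histories)):
        s = dict(zip(histories, choices))
        z = play_from(s)
        no_profitable_deviation = all(
            UTILITY[play_from({**s, h: dev})][player] <= UTILITY[z][player]
            for h, (player, acts) in DECISION.items()
            for dev in acts
        )
        if no_profitable_deviation:
            plays.add(z)
    return plays


NASH_PLAYS = nash_plays()
```

In this hypothetical game the play ad is not a Nash play, since Player 1 would gain by switching to b, which matches the regret argument given for Figure 6.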

It turns out that, in perfect-information games in which no player moves more than once along any play, the two notions of self-confirming play and Nash play are equivalent ([1], Proposition 1, p. 1012). For games with imperfect information, while it is still true that a Nash play is a self-confirming play, there may be self-confirming plays that are not Nash plays. The reason for this is that two players might have different beliefs about the potential choice of a third player. Figure 7 reproduces the game of Figure 3 together with a model of it.

In the model of Figure 7,

R = {γ}

,

T = {β, γ}

and

C = Ω

, so that

R \cap T \cap C = {γ}

. Thus, at state

γ

the active players (Players 1 and 2) are rational, have correct beliefs and have no uncertainty and yet

ζ (γ) = a A

which is not a Nash play (there is no Nash equilibrium that yields the play

a A

). Players 1 and 2 have different beliefs about what Player 3 would do at her information set: at state

γ

Player 1 believes that if she plays d then Player 3 will play L, while Player 2 believes that if he plays D then Player 3 will play R.

Next we introduce a new event, denoted by

A

(‘A’ stands for ‘agreement’), that rules out such disagreement and use it to provide a doxastic characterization of Nash play in general games (with possibly imperfect information). First we need to add one more condition to the definition of a model of a game that is relevant only if the game has imperfect information.

The definition of a model given in Section 3 (Definition 1) allows for “unreasonable” beliefs that express a causal link between a player’s action and her opponent’s reaction to it, when the latter does not observe the former’s choice. As an illustration of such beliefs, consider a game where Player 1 moves first, choosing between actions a and b, and Player 2 moves second, choosing between actions c and d without being informed of Player 1’s choice, that is, histories a and b belong to the same information set of Player 2. Definition 1 allows Player 1 to have the following beliefs: “if I play a, then Player 2 will play c, while if I play b then Player 2 will play d”. Such beliefs ought to be rejected as “irrational” on the grounds that there cannot be a causal link between Player 1’s move and Player 2’s choice, since Player 2 does not get to observe Player 1’s move and thus cannot react differently to Player 1’s choice of a and Player 1’s choice of b. [It should be noted, however, that several authors have argued that such beliefs are not necessarily irrational: see, for example, [31,32,33,34,35,36].] A “causally correct” belief for Player 1 would require that the predicted choice(s) of Player 2 be the same, no matter what action Player 1 herself chooses.

Definition 6.

A causally restricted model of a game is a model (Definition 1) that satisfies the following additional restriction (a verbal interpretation follows; note that, for games with perfect information, there is no difference between a model and a causally restricted model, since (3) is vacuously satisfied).

5.: Let ω be a state, h a decision history reached at ω ( $h ≺ ζ (ω)$ ) and a and b two actions available at h ( $a, b \in A (h)$ ). Let $h_{1}$ and $h_{2}$ be two decision histories that belong to the same information set of player $j = ι (h_{1})$ ( $h_{1} \approx_{j} h_{2}$ ) and $c_{1}, c_{2}$ be two actions available at $h_{1}$ ( $c_{1}, c_{2} \in A (h_{1}) = A (h_{2})$ ). Then the following holds (recall that $[h]$ denotes the information set that contains decision history h, that is, $h^{'} \in [h]$ if and only if $h^{'} \approx_{ι (h)} h$ ):

\begin{array}{l} \text{if } h^{'}, h^{″} \in [h],\ ω_{1}, ω_{2} \in B_{h} (ω),\ h^{'} a ≺ h_{1} c_{1} ≾ ζ (ω_{1}) \text{ and} \\ h^{″} b ≺ h_{2} c_{2} ≾ ζ (ω_{2}), \text{ then either } c_{1} = c_{2} \text{ or there exist} \\ ω_{1}^{'}, ω_{2}^{'} \in B_{h} (ω) \text{ such that } h^{'} a ≺ h_{1} c_{2} ≾ ζ (ω_{1}^{'}) \text{ and} \\ h^{″} b ≺ h_{2} c_{1} ≾ ζ (ω_{2}^{'}) . \end{array}

(3)

In words: if, at state

ω

and reached history h, player

i = ι (h)

considers it possible that, if she takes action a, history

h_{1}

is reached and player

j = ι (h_{1})

takes action

c_{1}

at

h_{1}

and player i also considers it possible that, if she takes action b, then history

h_{2}

is reached, which belongs to the same information set as

h_{1}

, and player j takes action

c_{2}

at

h_{2}

, then either

c_{1} = c_{2}

or at state

ω

and history h player i must also consider it possible that (1) after taking action a,

h_{1}

is reached and player j takes action

c_{2}

at

h_{1}

and (2) after taking action b,

h_{2}

is reached and player j takes action

c_{1}

at

h_{2}

.

Figure 8 shows a game and four partial models of it, giving only the beliefs of Player 1 (at history ⌀): two of them violate Condition 5 of Definition 6 (the ones on the left that are labeled “not allowed”), while the other two satisfy it. Note that the models shown in Figure 5, Figure 6 and Figure 7 are all causally restricted models.
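Condition (3) can also be tested mechanically for a single belief set. The sketch below is a hypothetical encoding, not the paper's: since (3) depends on the states in B_h(ω) only through their plays, the belief set is passed as a set of plays. The example is the two-player game described before Definition 6 (Player 1 chooses a or b, Player 2 chooses c or d without observing Player 1's choice), not the game of Figure 8.

```python
from itertools import product


def is_prefix(h, z):
    """h ≾ z for histories encoded as tuples of actions."""
    return z[: len(h)] == h


def is_proper_prefix(h, z):
    """h ≺ z."""
    return len(h) < len(z) and is_prefix(h, z)


def satisfies_condition_3(believed_plays, cell_h, actions_h, cell_j, actions_j):
    """Check (3) for one belief set, given as the plays of the states in B_h(ω).

    cell_h / actions_h: information set [h] of the believer and A(h);
    cell_j / actions_j: an information set of player j and the actions available there.
    """
    def possible(hp, a, hj, c):
        # Is there a believed play z with  hp·a ≺ hj·c ≾ z ?
        return any(
            is_proper_prefix(hp + (a,), hj + (c,)) and is_prefix(hj + (c,), z)
            for z in believed_plays
        )

    for a, b in product(actions_h, repeat=2):
        for hp, hpp in product(cell_h, repeat=2):
            for h1, h2 in product(cell_j, repeat=2):
                for c1, c2 in product(actions_j, repeat=2):
                    if c1 == c2:
                        continue
                    if possible(hp, a, h1, c1) and possible(hpp, b, h2, c2):
                        if not (possible(hp, a, h1, c2) and possible(hpp, b, h2, c1)):
                            return False
    return True


# Player 1 moves at the null history; Player 2's information set is {a, b}.
game = dict(cell_h={()}, actions_h=("a", "b"),
            cell_j={("a",), ("b",)}, actions_j=("c", "d"))

causal_link = satisfies_condition_3({("a", "c"), ("b", "d")}, **game)
same_reply = satisfies_condition_3({("a", "c"), ("b", "c")}, **game)
full_uncertainty = satisfies_condition_3(
    {("a", "c"), ("a", "d"), ("b", "c"), ("b", "d")}, **game)
```

The first belief set is the “if I play a then c, if I play b then d” belief discussed above and fails the check; the other two pass.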

Now we turn to the notion of agreement, which is intended to rule out situations like the one shown in Figure 7 where Players 1 and 2 disagree about what action Player 3 would take at her information set

{d, a D}

.

Definition 7.

We say that at state ω active players i and j consider future information set

[h]

of player

k = ι (h)

if there exist

1.: two decision histories $h_{1}$ and $h_{2}$ that are reached at ω (that is, $h_{1} ≺ h_{2} ≺ ζ (ω)$ ) and belong to i and j, respectively, (that is, $i = ι (h_{1})$ and $j = ι (h_{2})$ ),
2.: states $ω_{1} \in B_{h_{1}} (ω)$ and $ω_{2} \in B_{h_{2}} (ω)$ ,
3.: decision histories $h^{'}, h^{″} \in [h]$ ,

such that, for some

h_{1}^{'} \approx_{i} h_{1}

,

h_{1}^{'} ≺ h^{'} ≺ ζ (ω_{1})

and, for some

h_{2}^{'} \approx_{j} h_{2}

,

h_{2}^{'} ≺ h^{″} ≺ ζ (ω_{2})

.

That is, player i at

h_{1}

considers it possible that the play has reached history

h_{1}^{'} \in [h_{1}]

and, after taking an action at

h_{1}^{'}

, information set

[h]

of player k is reached, and player j at

h_{2}

considers it possible that the play has reached history

h_{2}^{'} \in [h_{2}]

and, after taking an action at

h_{2}^{'}

, that same information set

[h]

of player k is reached.

Definition 8.

We say that at state ω active players i and j are in agreement if, for every future information set

[h]

that they consider (Definition 7), they predict the same choice(s) of player

k = ι (h)

at h, that is, if player i is active at reached history

h_{1}

and player j is active at reached history

h_{2}

, with

h_{1} ≺ h_{2} ≺ ζ (ω)

, then

1.: if $ω_{1} \in B_{h_{1}} (ω)$ and $h_{1}^{'} ≺ h^{'} a ≾ ζ (ω_{1})$ with $h_{1}^{'} \in [h_{1}]$ , $h^{'} \in [h]$ and $a \in A (h)$ , then there exists an $ω_{2} \in B_{h_{2}} (ω)$ such that, for some $h^{″} \in [h]$ and $h_{2}^{'} \in [h_{2}]$ , $h_{2}^{'} ≺ h^{″} a ≾ ζ (ω_{2})$ , and
2.: if $ω_{2} \in B_{h_{2}} (ω)$ and $h_{2}^{'} ≺ h^{″} b ≾ ζ (ω_{2})$ with $h_{2}^{'} \in [h_{2}]$ , $h^{″} \in [h]$ and $b \in A (h)$ , then there exists an $ω_{1} \in B_{h_{1}} (ω)$ such that, for some $h^{'} \in [h]$ and $h_{1}^{'} \in [h_{1}]$ , $h_{1}^{'} ≺ h^{'} b ≾ ζ (ω_{1})$ .

Finally we define the event, denoted by

A

, that any two active players are in agreement:

ω \in A if and only if any two players active at ω are in agreement at ω .

(4)

We can now state our characterization result, according to which a self-confirming play is a Nash play if and only if the beliefs of any two players are in agreement about the hypothetical choice(s) of a third player at a future information set that they both consider. As in [1] we restrict attention to games that satisfy the property that each player moves at most once along any play. Equivalently, one could consider the agent form of the game, where the same player at different information sets is regarded as different players, but with the same payoff function.

Proposition 1.

Consider a finite extensive-form game G where no player moves more than once along any play. Then,

(A): If z is a Nash play of G then there is a causally restricted model of G and a state $ω$ in that model such that (1) $ζ (ω) = z$ and (2) $ω \in R \cap T \cap C \cap A$ .
(B): For any causally restricted model of G and for every state $ω$ in that model, if $ω \in R \cap T \cap C \cap A$ then $ζ (ω)$ is a Nash play.

The proof of Proposition 1 is given in Appendix B.

Note that, in a perfect-information game,

T \cap C \subseteq A

. Hence Proposition 1 in [1] is a corollary of the above Proposition.

4. Further Discussion and Conclusions

A reviewer suggested a discussion of the similarities and differences between the approach put forward in this paper and Steven Brams’ Theory of Moves (TOM) [37] (see also the very recent [38]). TOM deals mostly with two-person strategic-form games in which each player has a strict ordinal ranking of the outcomes. TOM assumes that, instead of choosing strategies simultaneously and independently, players start from an outcome (that is, a strategy profile)—called the “initial state”—and from that outcome they consider the consequences of a series of moves and countermoves that lead from state to state. The sequence of moves and countermoves is strictly alternating and the process continues until the game terminates in a “final state” which is called the “final outcome” or simply “outcome” of the game. It is assumed that no payoffs accrue to players from being in a state unless it is the final state (which could be the initial state if the players choose not to move from it). Players make farsighted calculations of where play will terminate after a finite sequence of moves and countermoves. The result of such farsighted calculations is called a Non-Myopic Equilibrium (NME). Thus, an NME can be understood as the backward-induction solution of a finite perfect-information “moves game” (with no ties).

There are two points in common between TOM and our approach. First of all, only ordinal preferences are considered in both approaches. Secondly, both approaches are based on a departure from standard solution concepts in game theory. However, the differences between the two approaches are substantial. Brams’ theory is not based at all on epistemic considerations: no beliefs are attributed to the players and the conceptual nature of a “state” is very different; in TOM a state is merely an outcome, while in our doxastic approach a state is described not only in terms of an outcome but also in terms of doxastic relations that describe what the active players believe about the possible outcomes when it is their turn to move. Furthermore, while TOM starts from a strategic-form game and builds on it a finite perfect-information game by specifying an initial outcome and the rules for moves and countermoves from it, we analyze a given extensive-form game (with possibly imperfect information) without modifying it in any way. Our approach falls within the “epistemic foundations approach” in which beliefs play an essential role. TOM, on the other hand, is entirely “belief-free”.

In Section 2, we raised the question “what constitutes a rational solution of an extensive-form game?” Most of the epistemic game theory literature has gone in the direction of imposing more and more subtle and complex conditions on counterfactual beliefs and choices of the players at unreached information sets. We suggested going in the opposite direction, by focusing only on the actions and beliefs of the active players. Within this framework, a natural notion of rational play is captured by the definition of self-confirming play, where each action taken is justified by the beliefs held at the time of choice and those beliefs turn out to be exactly correct, so that no player receives information that contradicts those beliefs or experiences regret. This approach is flexible enough to allow one to explore the epistemic foundations of standard solution concepts such as Nash equilibrium and backward induction (within the behavioral approach described in this paper, the epistemic conditions needed to obtain a characterization of backward induction in perfect-information games, or a generalized version of it for a class of games with imperfect information, are investigated in [1] and [9], respectively).

The characterization of Nash play given in Proposition 1—unlike characterizations of Nash equilibrium provided for strategic-form games (for a discussion of the relevant literature the reader is referred to ([1], Section 6))—does not require players to believe in each other’s rationality. This can be seen in the game and model shown in Figure 9, where

R = {γ}

,

T = {β, γ}

and

C = A = {α, β, γ}

, so that

R \cap T \cap C \cap A = {γ}

but at

γ

Player 1 does not believe that Player 2 is rational, because

β \in B_{⌀} (γ)

and at

β

Player 2 is not rational (she plays d believing that c gives her higher utility).

In Definition 4 we put forward the notion of self-confirming play, which is in the spirit of self-confirming equilibrium ([13]), but framed in behavioral terms and without making use of the notion of strategy. Whereas in perfect-information games the notion of self-confirming play is equivalent to the notion of Nash play, the equivalence does not extend to imperfect-information games. Proposition 1 identified the additional restriction that is needed to characterize the set of Nash plays in games with imperfect information.

The main purpose of this paper was to show that one can go a long way in the analysis of rational play in extensive-form games without using the notion of strategy, that is, without the need to specify choices at all histories—even those that are not reached—and without the need to model players’ beliefs at unreached histories. We argued that the standard approach based on Nash equilibrium and its refinements is too ambitious in its goal to tackle the counterfactual behavior and beliefs of players at unreached histories and that there is no need to pursue this goal in order to have a theory of rational behavior in dynamic games.

In what directions can the approach discussed in this paper be further developed? One natural extension is to move from ordinal preferences to von Neumann-Morgenstern preferences and from qualitative beliefs to probabilistic beliefs; one would then, correspondingly, move from the very weak definition of rationality given in Definition 2 to the stronger definition of rationality as expected utility maximization. Another possible line of inquiry would be to identify the circumstances (if any) that would make the structures used in this paper inadequate and would require a full analysis in terms of counterfactuals (which in turn would require extending those structures by adding similarity relations among states, as explained in Section 3).

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Acknowledgments

I am grateful to three anonymous reviewers and to participants in the Workshop on Epistemic Game Theory (EPICENTER, Maastricht University, July 2022) and the LOFT conference (Groningen University, July 2022) for useful comments.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. The History-Based Definition of Extensive-Form Game

For simplicity we will restrict attention to games with ordinal payoffs and without chance moves. We will not, however, make the common assumption of “no relevant ties” or genericity of payoffs; furthermore we allow for imperfect information.

If A is a set, we denote by

A^{*}

the set of finite sequences in A. If

h = 〈a_{1}, \dots, a_{k}〉 \in A^{*}

and

1 \leq i \leq k

, the sequence

h^{'} = 〈a_{1}, \dots, a_{i}〉

is called a prefix of h and we denote this by

h^{'} ≾ h

; furthermore, if

h^{'} ≾ h

and

h^{'} \neq h

then we write

h^{'} ≺ h

and say that

h^{'}

is a proper prefix of h. If

h = 〈a_{1}, \dots, a_{k}〉 \in A^{*}

and

a \in A

, we denote the sequence

〈a_{1}, \dots, a_{k}, a〉 \in A^{*}

by

h a

.

A finite extensive form without chance moves is given by the following elements, where all the sets are finite:

1.: A set of players denoted by N.
2.: A set of actions, denoted by A.
3.: A set of histories, denoted by $H \subseteq A^{*}$ , which satisfies the property that, if $h \in H$ and $h^{'} \in A^{*}$ is a prefix of h, then $h^{'} \in H$ . The null history, denoted by ⌀, belongs to H and is a prefix of every history. A history $h \in H$ such that, for every $a \in A$ , $h a \notin H$ , is called a terminal history or play. Z denotes the set of terminal histories and $D = H ∖ Z$ the set of decision histories.
4.: To every decision history is assigned a player, by means of a function $ι : D \to N$ . Thus, $ι (h) \in N$ is the player who moves, or is active, at $h \in D$ . For notational simplicity we assume that there is exactly one player who is active at any decision history; thus, a simultaneous move by, say, Players 1 and 2 is represented in the traditional way by having Player 1 move first followed by Player 2, who is not informed of Player 1’s move. Let $D_{i} = \{h \in D : i = ι (h)\}$ denote the set of histories at which player i is active. For every $h \in D$ , $A (h)$ denotes the set of actions available at h (to player $ι (h)$ ), that is, $a \in A (h)$ if and only if $a \in A$ and $h a \in H$ .
5.: For every player $i \in N$ , we postulate an equivalence relation $\approx_{i}$ on $D_{i}$ : $h \approx_{i} h^{'}$ if and only if, when choosing an action at history $h \in D_{i}$ , player i does not know whether she is moving at h or at $h^{'}$ . The equivalence class of $h \in D$ is denoted by $[h]$ and is called an information set of player $ι (h)$ ; thus $[h] = \{h^{'} \in D_{ι (h)} : h \approx_{ι (h)} h^{'}\}$ . The actions available at an information set are not allowed to differ across histories in that information set, that is, if $h \approx_{i} h^{'}$ then $A (h^{'}) = A (h)$ . We also assume the property of perfect recall, according to which a player always remembers her own past moves: if $h_{1}, h_{2} \in D_{i}$ , $a \in A (h_{1})$ and $h_{1} a$ is a prefix of $h_{2}$ then, for every $h^{'}$ such that $h^{'} \approx_{i} h_{2}$ , there exists an $h \approx_{i} h_{1}$ such that $h a$ is a prefix of $h^{'}$ .
When every information set consists of a single history, the game is said to have perfect information; otherwise it is said to have imperfect information.
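The history-based objects above are straightforward to compute for a small example. The following Python sketch is an illustration under assumed conventions (histories as tuples of actions, the null history as the empty tuple); the three-play form and the helper names are hypothetical.

```python
def prefix_closure(plays):
    """Smallest prefix-closed set of histories containing the given plays."""
    return {z[:k] for z in plays for k in range(len(z) + 1)}


def is_prefix_closed(H):
    """Check the closure property required of the set of histories H."""
    return all(h[:k] in H for h in H for k in range(len(h) + 1))


def available(h, H, actions):
    """A(h): the actions a such that ha is again a history."""
    return {a for a in actions if h + (a,) in H}


# Hypothetical form: Player 1 picks a or b; b is terminal; after a, Player 2 picks c or d.
plays = {("b",), ("a", "c"), ("a", "d")}
H = prefix_closure(plays)
actions = {a for h in H for a in h}

Z = {h for h in H if not available(h, H, actions)}   # terminal histories
D = H - Z                                            # decision histories
```

The null history `()` belongs to `H` automatically, since it is a prefix of every play.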

In order to lighten the notation, histories will be denoted succinctly by listing the corresponding actions, without brackets, without commas and omitting the empty history: thus instead of writing

〈⌀, a_{1}, a_{2}, a_{3}, a_{4}〉

we will simply write

a_{1} a_{2} a_{3} a_{4}

.

An extensive game with ordinal payoffs is obtained from a given extensive form, by adding, for every player

i \in N

, a complete and transitive preference relation

R_{i}

over the set Z of terminal histories; the interpretation of

z R_{i} z^{'}

is that player i considers z to be at least as good as

z^{'}

. It is often convenient to replace the relation

R_{i}

with a real-valued utility (or payoff) function

u_{i} : Z \to \mathbb{R}

satisfying the property that

u_{i} (z) \geq u_{i} (z^{'})

if and only if

z R_{i} z^{'}

.

Appendix B. Proof of Proposition 1

Given a finite extensive-form game and a pure-strategy profile s, define the function

f_{s} : H \to Z

as follows: if

z \in Z

(that is, if z is a terminal history) then

f_{s} (z) = z

and if

h \in D

(that is, if h is a decision history) then

f_{s} (h)

is the terminal history reached from h by following the choices prescribed by s. We denote by

z_{s}^{*}

the play generated by s, that is, the terminal history reached by s from the null history:

z_{s}^{*} = f_{s} (⌀)

. We say that

z_{s}^{*}

avoids information set

[h]

if, for all

h^{'} \in [h]

,

h^{'} ⊀ z_{s}^{*}

. If

z_{s}^{*}

does not avoid information set

[h]

then we denote the unique history in

[h]

that is a prefix of

z_{s}^{*}

by

h_{s}^{*} ([h])

(thus

h_{s}^{*} ([h]) \in [h]

and

h_{s}^{*} ([h]) ≺ z_{s}^{*}

).
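The function f_s and the notion of an avoided information set can be sketched as follows. This is an illustrative Python encoding, not part of the paper: a pure-strategy profile is represented as a dictionary from decision histories to actions (in a game with imperfect information it would have to assign the same action to all histories in an information set), and the toy game and helper names are hypothetical.

```python
def follow(s, h, decision_histories):
    """f_s(h): the terminal history reached from h by following the choices in s."""
    while h in decision_histories:
        h = h + (s[h],)
    return h


def is_proper_prefix(h, z):
    """h ≺ z for histories encoded as tuples of actions."""
    return len(h) < len(z) and z[: len(h)] == h


def avoids(z_star, cell):
    """True iff no history in the information set `cell` is a proper prefix of z_star."""
    return not any(is_proper_prefix(h, z_star) for h in cell)


def reached_member(z_star, cell):
    """h_s^*([h]): the history in `cell` on the path to z_star, or None if avoided."""
    on_path = [h for h in cell if is_proper_prefix(h, z_star)]
    return on_path[0] if on_path else None


# Hypothetical game: Player 1 picks a or b at (); after a, Player 2 picks c or d.
D = {(), ("a",)}
s = {(): "b", ("a",): "d"}

z_star = follow(s, (), D)            # play generated by s
off_path = follow(s, ("a",), D)      # outcome s would produce after a deviation to a
```

Here the play generated by s avoids Player 2's information set, so a selection function in the sense of Definition A1 would be free to pick any history in it.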

Definition A1.

Given an extensive-form game G, denote by I the set of information sets. Let s be a pure-strategy profile of G. A selection function based on s is a function

g_{s} : I \to D

that selects for every information set

[h] \in I

a unique decision history in

[h]

subject to the constraint that if

z_{s}^{*}

does not avoid information set

[h]

then

g_{s} ([h]) = h_{s}^{*} ([h])

.

Definition A2.

Let G be an extensive-form game, s a pure strategy profile and

g_{s}

a selection function based on s. The model of G generated by s and

g_{s}

is the following model.

$Ω = Z$ .
$ζ : Z \to Z$ is the identity function: $ζ (z) = z, \forall z \in Z$ .
For every $h \in D$ and $z \in Z$ define $B_{h} (z)$ as follows:
1.
If $h ⊀ z$ , then $B_{h} (z) = ⌀$ .
2.
If $h ≺ z_{s}^{*}$ then $B_{h} (z_{s}^{*}) = \{z^{'} \in Z : z^{'} = f_{s} (h a) for some a \in A (h)\}$ . [That is, if h is on the play generated by s, then at h the active player believes that, for every available action a, if she takes action a then the outcome will be the terminal history reached from $h a$ by s.]
3.
If $h ⊀ z_{s}^{*}$ , but $[h]$ is not avoided by $z_{s}^{*}$ , then, for all $z \in Z$ such that $h ≺ z$ , $B_{h} (z) = \{z^{'} \in Z : z^{'} = f_{s} (h_{s}^{*} ([h]) a) for some a \in A (h)\}$ . [That is, at every decision history in an information set crossed by the play generated by s, the player believes that the play has reached history $h_{s}^{*} ([h])$ (the history in $[h]$ that is on the play to $z_{s}^{*}$ ) and her beliefs are as given in Point 2.]
4.
If $[h]$ is avoided by $z_{s}^{*}$ , let $\hat{h} = g_{s} ([h])$ . Then, for every $h^{'} \in [h]$ and every $z \in Z$ such that $h^{'} ≺ z$ , $B_{h^{'}} (z) = \{z^{'} \in Z : z^{'} = f_{s} (\hat{h} a) for some a \in A (h)\}$ . [That is, at every decision history in an information set that is not crossed by the play generated by s, the player believes that she is at the history selected by $g_{s}$ , denoted by $\hat{h}$ , and that, for every available action a, if she takes action a then the outcome will be the terminal history reached from $\hat{h} a$ by s.]

Remark A1.

Note that the model generated by a pure-strategy profile s and a selection function

g_{s}

is a causally restricted model (Definition 6).

Remark A2.

Let G be a finite extensive-form game and consider the model generated by a pure-strategy profile s of G and a selection function

g_{s}

(Definition A2). Then the no-uncertainty conditions 1 and 2 of Definition 3 and the agreement condition (4) are satisfied at every state, that is,

C = A = Z

. Furthermore, by Point 2 in Definition A2,

z_{s}^{*} \in B_{h} (z_{s}^{*})

for all h such that

h ≺ z_{s}^{*}

; that is,

z_{s}^{*} \in T

.

We can now prove Proposition 1.

Proof.

(A) [Note that, for this part of the proof, the restriction that no player moves more than once along any play of the game is not needed.] Fix a finite extensive-form game G and let s be a pure-strategy Nash equilibrium of G. Fix a selection function

g_{s}

based on s (Definition A1) and consider the model generated by s and

g_{s}

(Definition A2). By Remark A2,

z_{s}^{*} \in C \cap T \cap A

(recall that

z_{s}^{*}

is the play generated by s, that is,

z_{s}^{*} = f_{s} (⌀)

). Thus, it only remains to show that

z_{s}^{*} \in R

. If h is a decision history, denote by

s (h)

the choice selected by s at h. Fix an arbitrary decision history h that is reached at state

z_{s}^{*}

(that is,

h ≺ z_{s}^{*}

) and let a be the action at h such that

h a ≾ z_{s}^{*}

, that is,

s (h) = a

; then

f_{s} (h a) = f_{s} (⌀) = z_{s}^{*}

. Suppose that player

ι (h)

is not rational at h. Then there must be a

b \in A (h) ∖ {a}

that guarantees a higher utility to player

ι (h)

: if

z^{'} \in B_{h} (z_{s}^{*})

is such that

h b ≾ z^{'}

, then

u_{ι (h)} (z^{'}) > u_{ι (h)} (z_{s}^{*})

. By Definition A2,

z^{'} = f_{s} (h b)

so that

u_{ι (h)} (f_{s} (h b)) > u_{ι (h)} (f_{s} (h a))

; hence, by unilaterally changing her strategy at h from a to b (while leaving the rest of her strategy unchanged), player

ι (h)

can increase her payoff, contradicting the assumption that s is a Nash equilibrium.

(B) Fix a finite extensive-form game G where no player moves more than once along any play and consider an arbitrary model of it where there is a state

α

such that

α \in R \cap T \cap C \cap A

. We want to show that we can construct a pure-strategy Nash equilibrium s of G such that

f_{s} (⌀) = ζ (α)

.

STEP 1. If h is a decision history on the play

ζ (α)

, that is,

h ≺ ζ (α)

, let

s (h) = a

where

a \in A (h)

is the action at h such that

h a ≾ ζ (α)

.

STEP 2. Fix an arbitrary decision history $h$ that is reached at state $\alpha$ (that is, $h \prec \zeta(\alpha)$) and an arbitrary $b \in A(h)$ such that $hb$ is not a prefix of $\zeta(\alpha)$ (that is, $b \neq s(h)$, where $s(h)$ was defined in Step 1). By the definition of a model (Definition 1) there exists an $\hat{\omega} \in B_{h}(\alpha)$ such that $\hat{h}b \precsim \zeta(\hat{\omega})$ for some $\hat{h} \in [h]$. Since $\alpha \in C$, by Point 1 of Definition 3, for every $\omega' \in B_{h}(\alpha)$ and for every $h' \in [h]$, if $h'b \precsim \zeta(\omega')$ then $h' = \hat{h}$. Since $\alpha \in C$, by Point 2 of Definition 3, for any other $\omega \in B_{h}(\alpha)$ such that $\hat{h}b \precsim \zeta(\omega)$, $\zeta(\omega) = \zeta(\hat{\omega})$. Define, for every $h'$ such that $\hat{h}b \precsim h' \prec \zeta(\hat{\omega})$, $s(h') = c$ where $c \in A(h')$ is the action at $h'$ such that $h'c \precsim \zeta(\hat{\omega})$. Note that, since $\alpha \in A$, if any other active player at any reached history at state $\alpha$ considers the information set that contains history $h'$, then that player will also predict choice $c$ at $h'$. Thus, $s(h')$ is well defined.

Steps 1 and 2 define the choices prescribed by $s$ along the play $\zeta(\alpha)$ as well as for paths to terminal histories following one-step deviations from this play.

STEP 3. Complete s in an arbitrary way.
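The three steps can be sketched in code under simplifying assumptions: the game below is a hypothetical perfect-information game (so every information set is a singleton and $\hat{h} = h$), and the dictionary `believed` is an assumed encoding of the play $\zeta(\hat{\omega})$ that the active player expects after each one-step deviation. The function name `build_strategy` and the "first listed action" rule for Step 3 are illustrative choices, not the paper's.

```python
from typing import Dict, Tuple

History = Tuple[str, ...]

# Hypothetical game tree (an assumption): decision history -> (player, actions).
# No player moves more than once along any play.
DECISIONS: Dict[History, Tuple[int, Tuple[str, ...]]] = {
    (): (1, ("a", "b")),
    ("a",): (2, ("c", "d")),
    ("b",): (2, ("e", "f")),
    ("b", "e"): (3, ("g", "k")),
    ("b", "f"): (3, ("g", "k")),
}


def build_strategy(
    decisions: Dict[History, Tuple[int, Tuple[str, ...]]],
    play: History,
    believed: Dict[Tuple[History, str], History],
) -> Dict[History, str]:
    s: Dict[History, str] = {}
    # Step 1: on the play, prescribe the action actually taken.
    for k in range(len(play)):
        s[play[:k]] = play[k]
    # Step 2: after a one-step deviation b at a reached history h,
    # follow the terminal history z that the deviating player expects.
    for (h, b), z in believed.items():
        assert z[: len(h) + 1] == h + (b,), "z must extend the deviation hb"
        for k in range(len(h) + 1, len(z)):
            s[z[:k]] = z[k]
    # Step 3: complete s arbitrarily (here: the first listed action).
    for h, (_, actions) in decisions.items():
        s.setdefault(h, actions[0])
    return s


play = ("a", "c")
believed = {
    ((), "b"): ("b", "e", "g"),  # Player 1's belief about what follows b
    (("a",), "d"): ("a", "d"),   # d leads directly to a terminal history
}
s = build_strategy(DECISIONS, play, believed)
print(s)
```

With imperfect information the construction would assign choices to information sets rather than to individual histories, and the agreement condition $\alpha \in A$ is what rules out conflicting assignments; the sketch sidesteps both points by assuming perfect information.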

Because of Step 1, $\zeta(\alpha) = f_{s}(h)$ for every $h \precsim \zeta(\alpha)$ (in particular, $f_{s}(\emptyset) = \zeta(\alpha)$). We want to show that $s$ is a Nash equilibrium. Suppose not. Then there is a decision history $h$ such that $h \prec \zeta(\alpha)$ (that is, $h$ is reached at state $\alpha$) and, by switching her choice at $h$ from $s(h)$ to a different choice, player $\iota(h)$ can increase her payoff (by hypothesis there are no successors of $h$ that belong to player $\iota(h)$). Let $s(h) = a$ (that is, $ha \precsim \zeta(\alpha)$) and let $b$ be the choice at $h$ that yields a higher payoff to player $\iota(h)$; that is,

$$u_{\iota(h)}(f_{s}(hb)) > u_{\iota(h)}(\zeta(\alpha)). \tag{A1}$$

By Item 4 of Definition 1 there exists a $\beta \in B_{h}(\alpha)$ such that $hb \precsim \zeta(\beta)$. Since $\alpha \in C$, for every $\omega \in B_{h}(\alpha)$ such that $hb \precsim \zeta(\omega)$, $\zeta(\omega) = \zeta(\beta)$. By Step 2 above,

$$\zeta(\beta) = f_{s}(hb). \tag{A2}$$

Hence, by (A2), at decision history $h$ and state $\alpha$, player $\iota(h)$ believes that if she plays $b$ her payoff will be $u_{\iota(h)}(f_{s}(hb))$. Since $\alpha \in T$, $\alpha \in B_{h}(\alpha)$, and since $\alpha \in C$, for every $\omega \in B_{h}(\alpha)$ such that $ha \precsim \zeta(\omega)$, $\zeta(\omega) = \zeta(\alpha)$. Hence, at state $\alpha$ and history $h$, player $\iota(h)$ believes that if she plays $a$ her payoff will be $u_{\iota(h)}(\zeta(\alpha))$. It follows from this and (A1) that at $\alpha$ and $h$ player $\iota(h)$ believes that action $b$ is better than action $a$ (Definition 2), which implies that at $\alpha$ player $\iota(h)$ is not rational, contradicting the assumption that $\alpha \in R$. □


Figure 1. An extensive game with imperfect information.

Figure 2. The conflict between the backward-induction-based counterfactual and the forward-induction-based counterfactual encoded in Player 2’s strategy.

Figure 3. The play $aA$ is consistent with the notion of self-confirming equilibrium, even though there is no Nash equilibrium that yields $aA$.

Figure 4. The relation $B = \{(\alpha, \beta), (\alpha, \gamma), (\beta, \beta), (\beta, \gamma), (\gamma, \beta), (\gamma, \gamma)\}$.

Figure 5. The top part reproduces the game of Figure 1 and the bottom part shows a model of it.

Figure 6. A perfect-information game and a model of it.

Figure 7. The game of Figure 3 and a model of it.

Figure 8. A game and four partial models of it (showing only the beliefs of Player 1 at history ⌀), two of which violate Condition 5 of Definition 6 and the remaining two do not.

Figure 9. $\gamma \in R \cap T \cap C \cap A$ but at $\gamma$ Player 1 does not believe that Player 2 is rational.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Bonanno, G. Rational Play in Extensive-Form Games. Games 2022, 13, 72. https://doi.org/10.3390/g13060072

