*Article* **Subgame Consistent Cooperative Behavior in an Extensive-Form Game with Chance Moves**

**Denis Kuzyutin 1,2 and Nadezhda Smirnova 1,2,\***


Received: 28 May 2020; Accepted: 23 June 2020; Published: 1 July 2020

**Abstract:** We design a mechanism of the players' sustainable cooperation in a multistage *n*-person extensive-form game with chance moves. When the players agree to cooperate in a dynamic game, they have to ensure the time consistency of the long-term cooperative agreement. We provide the players' rank based (PRB) algorithm for choosing a unique cooperative strategy profile and prove that the corresponding optimal bundle of cooperative trajectories satisfies time consistency, that is, at every subgame along the optimal game evolution a part of each original cooperative trajectory belongs to the subgame optimal bundle. We propose a refinement of the backwards induction procedure based on the players' attitude vectors to find a unique subgame perfect equilibrium and use this algorithm to calculate a characteristic function. Finally, to ensure the sustainability of the cooperative agreement in a multistage game, we employ the imputation distribution procedure (IDP) based approach, that is, we design an appropriate payment schedule to redistribute each player's optimal payoff along the optimal bundle of cooperative trajectories. We extend the subgame consistency notion to extensive-form games with chance moves and prove that the incremental IDP satisfies subgame consistency, subgame efficiency and the balance condition. An example of a 3-person multistage game is provided to illustrate the proposed cooperation mechanism.

**Keywords:** time consistency; multistage game; chance moves; subgame perfect equilibria; cooperative trajectory; imputation distribution procedure

### **1. Introduction**

In a dynamic *n*-person game the players first choose their "optimal" strategies at the initial position *x*<sub>0</sub> (which form the optimal strategy profile for the whole game), and then have an option to change their strategies at any intermediate position *x<sub>t</sub>* and switch to other strategies if these strategies constitute the locally optimal strategy profile for the subgame starting at *x<sub>t</sub>*. The time consistency property (first introduced in References [1–3] for differential games) ensures that the players have no incentive to change their strategies at any subgame along the optimal game evolution, and hence plays an important role in designing the optimal players' behavior in non-cooperative and cooperative dynamic games (see, e.g., References [2–21] for details).

We consider *n*-person finite multistage games in extensive form (see, e.g., References [5,17,22,23]) with perfect information and with chance moves. Note that much research has already been done on time consistent solutions (or close concepts) in extensive-form games (see, e.g., References [4,6,13,17,21]). The time consistency concept was extended to dynamic games played over event trees in References [14,16,20] as well as to multicriteria extensive-form cooperative games (without chance moves) in References [7,8,10,11,15]. The property of "time consistency in the whole game" was extended to multicriteria extensive-form cooperative games with chance moves in Reference [9] (note that in these games an optimal pure strategy profile does not generate a unique optimal trajectory in the game tree but rather a whole optimal bundle of trajectories).

In this paper, we mainly focus on the dynamic aspects of cooperation in a dynamic extensive-form game with chance moves and design a mechanism of the players' sustainable cooperation which satisfies three properties. First, a fragment of each cooperative trajectory from the optimal bundle for the original game Γ<sub>*x*<sub>0</sub></sub> should "remain optimal" at each subgame Γ<sub>*x*<sub>*t*</sub></sub> along the cooperative game evolution, that is, it should belong to the subgame optimal bundle of cooperative trajectories. Second, the cooperative payoff-to-go at the subgame Γ<sub>*x*<sub>*t*</sub></sub> should be no less than the non-cooperative payoff-to-go for every player. Third, when the players re-evaluate their expected cooperative payoffs after each passed chance move, they should have no incentive to change the original cooperative agreement.

To this end, we first need to provide a rule for choosing a unique cooperative strategy profile as well as the unique optimal bundle of cooperative trajectories. We introduce the *Players' Rank Based (PRB) algorithm* and prove that this algorithm generates the unique optimal bundle of cooperative trajectories which satisfies time consistency. Note that a rather close approach—the so-called Refined Leximin (RL) algorithm—was recently introduced in Reference [8]. Let us note the main differences between these two algorithms. The RL algorithm applies to multicriteria games without chance moves and is based on the ranking of the criteria, while the PRB algorithm is designed for single-criterion extensive-form games with chance moves and employs the players' ranks. Further, the RL algorithm chooses a unique cooperative trajectory, while the PRB algorithm generates the unique optimal bundle of cooperative trajectories in the game tree. To the best of the authors' knowledge, other approaches to choosing an optimal bundle of cooperative trajectories in an extensive-form game with chance moves have not been considered yet.

Then, to construct a characteristic function (which describes the worth of each coalition in the cooperative game) we use an equilibrium-based approach, namely the *γ*-characteristic function introduced in Reference [24]. Hence, the players have to accept a specific method for choosing a unique subgame perfect equilibrium (SPE) [25] in an extensive-form game with chance moves. To solve this problem we provide a novel refinement of the backwards induction procedure (see, e.g., References [5,17,23])—the so-called *Attitude SPE algorithm*. A similar approach to constructing a unique SPE in an extensive-form game with perfect information was explored in References [17,26,27] and was called the Type Equilibrium (TE) algorithm. Both algorithms are refinements of the general backwards induction procedure that take into account the attitudes of each player towards the other players. Let us point out the main differences between these algorithms. The TE algorithm is applicable to games without chance moves and to the case when the payoffs are determined only at terminal nodes. In addition, the TE algorithm constructs an SPE that is "unique" in the sense of payoffs (i.e., there may exist several optimal trajectories which generate the same equilibrium payoffs), while the Attitude SPE algorithm chooses a unique SPE strategy profile as well as a unique bundle of trajectories. Another rather close approach to finding a unique SPE—the so-called Indifferent Equilibrium (IE) algorithm—was introduced in Reference [28]. Again, the IE algorithm is applicable only to games without chance moves and to the particular case when the payoffs are determined at terminal nodes. Moreover, the IE algorithm in general constructs an SPE in behavior strategies, while the proposed Attitude SPE algorithm always generates an SPE in pure strategies.

It is worth noting that other approaches to analyzing an extensive-form game, apart from the backwards induction procedure and its refinements mentioned above, require that the researcher first obtain a strategic representation of the original extensive game and then analyze this strategic (or normal-form) game (see, e.g., References [29–31]). For instance, the software tool "Game Theory Explorer" [29] is based on the strategic-form representation and then applies the modified Lemke-Howson algorithm [32] to find all Nash equilibria. The majority of existing algorithms are developed to find Nash equilibria in mixed strategies for 2-person games and do not allow one to construct an SPE in pure strategies. Moreover, as noted in Reference [31], in general the strategic-form representation is exponential in the size of the original game tree. In contrast, the proposed Attitude SPE algorithm is a rather simple recursive algorithm which deals with the *n*-person extensive-form game (with perfect information) itself and computes a unique SPE in pure strategies.

After computing the *γ*-characteristic function, we suppose that the players adopt some single-valued cooperative solution *ϕ* (for instance, the Shapley value [33], the nucleolus [34], etc.) which satisfies the individual and collective rationality properties. Finally, to guarantee the sustainability of the achieved long-term cooperative agreement, we employ the *Imputation Distribution Procedure (IDP)* based approach (see, e.g., References [3,12,14,16–18,20,35]), that is, a payment schedule to redistribute the *i*th player's expected cooperative payoff along the optimal bundle of cooperative trajectories. In this paper, we mainly focus on the following desirable properties an IDP may satisfy: subgame efficiency, the strict balance condition [10,15,17] and an appropriate refinement of the time consistency property, called *subgame consistency*. The point is that the "time consistency in the whole game" property [9,14,16,20] is based on an a priori assessment of the *i*th player's expected optimal payoff (before the game Γ<sub>*x*<sub>0</sub></sub> starts). However, when the players make a decision in the subgame Γ<sub>*x*<sub>*t*</sub></sub> after a chance move occurs, they need to re-estimate their expected optimal payoffs-to-go since the original optimal bundle of cooperative trajectories shrinks after each chance node. To deal with this interesting feature of games with chance moves, we adopt the notion of subgame consistency that was first proposed in Reference [36] for cooperative stochastic differential games and then extended to stochastic dynamic games in References [37,38].

Since we derive a suitable definition of subgame consistency for another class of games, the proposed Definition 6 differs from the ones provided in References [37,38] but captures the same idea. Let us point out the main differences with References [37,38]. First, while D. Yeung and L. Petrosyan do not consider the issue of multiple equilibria and study stochastic games in which there exists a unique Nash equilibrium in each subgame, we focus on the problem of how to select a unique (subgame perfect) Nash equilibrium in an extensive-form game with chance moves and derive the corresponding algorithm. Second, the characteristic function has not been constructed in References [37,38] and, hence, the players are restricted to using the simplest cooperative solutions (for instance, they may share equally the excess of the total expected cooperative payoff over the expected sum of individual non-cooperative payoffs), whereas we provide a method for calculating the *γ*-characteristic function. Hence, the players may use different solution concepts based on the characteristic function approach. Finally, it turns out that the incremental IDP specified for extensive-form games with chance moves in Reference [9] satisfies not only subgame consistency but also subgame efficiency and the strict balance condition.

Therefore, the suggested PRB algorithm, the Attitude SPE algorithm combined with the *γ*-characteristic function, and the incremental payment schedule for any single-valued cooperative solution (meeting individual and collective rationality) together constitute the required mechanism of the players' sustainable cooperation that satisfies the three properties mentioned above for any extensive-form game with chance moves.

It is worth noting that extensive-form games, as well as dynamic games played over event trees, differential games and multistage games with discrete dynamics, are used to model various real-world situations where several decision makers (or players) with different objectives may cooperate (see, e.g., References [5,12,14,16,17,20,39–44]). Hence, the proposed approach to implementing a long-term cooperative agreement may have a number of possible applications.

The rest of the paper is organized as follows: Section 2 recalls the main ingredients of the class of games of interest. In Section 3, we specify the Attitude SPE algorithm that allows constructing a unique SPE in an extensive-form game with chance moves. In Section 4, we provide the PRB algorithm and prove that the optimal bundle of cooperative trajectories generated by this algorithm satisfies time consistency. Section 5 reveals a drawback of the IDP "time consistency in the whole game" property and presents a subgame consistency definition that is applicable to extensive-form games with chance moves. We prove that the incremental IDP satisfies a number of desirable properties and consider an example of a 3-person multistage game with chance moves to illustrate the incremental IDP implementation. Section 6 provides a brief review of the results and a discussion.

### **2. Extensive-Form Game with Chance Moves**

We consider a finite multistage game in extensive form following References [6,13,17,22,23]. First, we need to define the basic notation and briefly recall some properties of extensive-form games that will be used in the sequel:


In the following, we will use *G<sup>cm</sup>*(*n*) to denote the class of all finite multistage *n*-person games with chance moves in extensive form defined above, where Γ<sub>*x*<sub>0</sub></sub> ∈ *G<sup>cm</sup>*(*n*) denotes a game with root *x*<sub>0</sub>. Note that Γ<sub>*x*<sub>0</sub></sub> is an extensive-form game with perfect information (see, e.g., References [17,22,23] for details).

Since all the solutions we are interested in throughout the paper are attainable when the players restrict themselves to the class of pure strategies, we will focus on this class of strategies. The pure strategy *u<sub>i</sub>*(·) of the *i*th player is a function with domain *P<sub>i</sub>* that specifies for each node *x* ∈ *P<sub>i</sub>* the next node *u<sub>i</sub>*(*x*) ∈ *S*(*x*) which player *i* has to choose at *x*. Let *U<sub>i</sub>* denote the (finite) set of all *i*th player's pure strategies, *U* = ∏<sub>*i*∈*N*</sub> *U<sub>i</sub>*.

Denote by *p*(*y*|*x*, *u*) the conditional probability that node *y* ∈ *S*(*x*) is reached if node *x* has already been reached (the probability of transition from *x* to *y*) while the players use the strategies *u<sub>i</sub>*, *i* ∈ *N*. Note that for all *x* ∈ *P<sub>i</sub>*, *i* = 1, ..., *n*, and for all *y* ∈ *S*(*x*), *p*(*y*|*x*, *u*) = 1 if *u<sub>i</sub>*(*x*) = *y*, and *p*(*y*|*x*, *u*) = 0 if *u<sub>i</sub>*(*x*) ≠ *y*. For chance moves, that is, if *x* ∈ *P*<sub>0</sub>, *p*(*y*|*x*, *u*) = *π*(*y*|*x*) for all *y* ∈ *S*(*x*) and each *u* ∈ *U*.

Then one can calculate the probability *p*(*ω*, *u*) of realization of the trajectory *ω* = (*x*<sub>0</sub>, ..., *x<sub>τ</sub>*, *x*<sub>*τ*+1</sub>, ..., *x<sub>T</sub>*), *x<sub>T</sub>* ∈ *P*<sub>*n*+1</sub>, *x*<sub>*τ*+1</sub> ∈ *S*(*x<sub>τ</sub>*), *τ* = 0, ..., *T* − 1, when the players use the strategies *u<sub>i</sub>* from the strategy profile *u* = (*u*<sub>1</sub>, ..., *u<sub>n</sub>*):

$$p(\omega, u) = p(x_1|x_0, u) \cdot p(x_2|x_1, u) \cdot \ldots \cdot p(x_T|x_{T-1}, u) = \prod_{\tau=0}^{T-1} p(x_{\tau+1}|x_\tau, u). \tag{1}$$

Denote by Ω(*u*) = {*ω<sub>k</sub>*(*u*) | *p*(*ω<sub>k</sub>*, *u*) > 0} the finite set (or the bundle) of the trajectories *ω<sub>k</sub>* which are generated by the strategy profile *u* ∈ *U*. Note that for all *ω<sub>k</sub>*(*u*) ∈ Ω(*u*), *u<sub>j</sub>*(*x<sub>τ</sub>*) = *x*<sub>*τ*+1</sub> for all *x<sub>τ</sub>* ∈ *ω<sub>k</sub>*(*u*) ∩ *P<sub>j</sub>*, *j* ∈ *N*, 0 ≤ *τ* ≤ *T* − 1.
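To make the bookkeeping concrete, the transition probabilities and the bundle Ω(*u*) can be sketched in a few lines of Python. The tree, the strategy profile and all names below (`succ`, `owner`, `pi`, `bundle`) are hypothetical illustration data, not part of the model:

```python
# A toy game tree with one chance node (hypothetical example data).
# succ[x] lists the children S(x); owner[x] is the player moving at x
# (0 denotes a chance node); pi[x][y] gives the chance probabilities pi(y|x).
succ = {"x0": ["a", "b"], "a": ["z1", "z2"], "b": ["z3"]}
owner = {"x0": 0, "a": 1, "b": 1}          # x0 is a chance node; player 1 moves at a, b
pi = {"x0": {"a": 0.4, "b": 0.6}}

def transition_prob(x, y, u):
    """p(y|x,u): chance probability at chance nodes, 0/1 at decision nodes."""
    if owner[x] == 0:
        return pi[x].get(y, 0.0)
    return 1.0 if u[owner[x]][x] == y else 0.0

def bundle(x, u):
    """Enumerate the trajectories omega with p(omega,u) > 0 rooted at x, as in (1)."""
    if x not in succ:                       # terminal node
        return [([x], 1.0)]
    result = []
    for y in succ[x]:
        p = transition_prob(x, y, u)
        if p > 0:
            for tail, q in bundle(y, u):
                result.append(([x] + tail, p * q))
    return result

u = {1: {"a": "z1", "b": "z3"}}             # player 1's pure strategy
print(bundle("x0", u))                      # two trajectories, probabilities 0.4 and 0.6
```

Here the single chance node splits the play, so one pure strategy profile generates a bundle of two trajectories whose probabilities sum to one.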

Let *h̃<sub>i</sub>*(*ω*) = ∑<sup>*T*</sup><sub>*τ*=0</sub> *h<sub>i</sub>*(*x<sub>τ</sub>*) denote the *i*th player's payoff corresponding to the trajectory *ω* = (*x*<sub>0</sub>, ..., *x<sub>t</sub>*, *x*<sub>*t*+1</sub>, ..., *x<sub>T</sub>*).

Denote by

$$H\_i(u) = \sum\_{\omega\_k \in \Omega(u)} p(\omega\_k, u) \cdot \tilde{h}\_i(\omega\_k) = \sum\_{\omega\_k \in \Omega(u)} p(\omega\_k, u) \cdot \sum\_{\tau=0}^{T(k)} h\_i(x\_\tau) \tag{2}$$

the (expected) value of the *i*th player's payoff function which corresponds to the strategy profile *u* = (*u*<sub>1</sub>, ..., *u<sub>n</sub>*). Let Ω<sub>*n*+1</sub>(*u*) = Ω(*u*) ∩ *P*<sub>*n*+1</sub> denote the set of all terminal nodes of the trajectories *ω<sub>k</sub>*(*u*) ∈ Ω(*u*).
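Expression (2) is then a probability-weighted sum over the bundle of trajectories. A minimal self-contained sketch (the toy tree, stage payoffs and function names are hypothetical illustration data):

```python
# Sketch of Eq. (2): expected payoff as a probability-weighted sum over the
# bundle of trajectories generated by a pure strategy profile.
succ = {"x0": ["a", "b"], "a": ["z1", "z2"], "b": ["z3"]}
owner = {"x0": 0, "a": 1, "b": 1}                # 0 = chance node
pi = {"x0": {"a": 0.4, "b": 0.6}}
h = {1: {"x0": 0, "a": 1, "b": 0, "z1": 5, "z2": 2, "z3": 3}}  # stage payoffs h_i(x)

def trajectories(x, u):
    """All trajectories with positive realization probability rooted at x."""
    if x not in succ:
        return [([x], 1.0)]
    out = []
    for y in succ[x]:
        p = pi[x][y] if owner[x] == 0 else float(u[owner[x]][x] == y)
        if p > 0:
            out += [([x] + t, p * q) for t, q in trajectories(y, u)]
    return out

def expected_payoff(i, u, root="x0"):
    """H_i(u): sum over omega_k of p(omega_k, u) * htilde_i(omega_k)."""
    return sum(q * sum(h[i][x] for x in traj) for traj, q in trajectories(root, u))

u = {1: {"a": "z1", "b": "z3"}}
print(expected_payoff(1, u))  # 0.4*(0+1+5) + 0.6*(0+0+3) = 4.2
```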

**Remark 1** ([9])**.** *If the pure strategy profiles u and v generate different bundles* Ω(*u*) *and* Ω(*v*) *of the trajectories, that is,* Ω(*u*) ≠ Ω(*v*)*, then* Ω<sub>*n*+1</sub>(*u*) ∩ Ω<sub>*n*+1</sub>(*v*) = ∅*.*

According to References [17,22,23], each intermediate node *x<sub>t</sub>* ∈ *P* \ *P*<sub>*n*+1</sub> generates a subgame Γ<sub>*x*<sub>*t*</sub></sub> with the subgame tree *K*<sup>*x<sub>t</sub>*</sup> and the subgame root *x<sub>t</sub>*, as well as a factor-game Γ<sup>*D*</sup> with the factor-game tree *K<sup>D</sup>* = (*K* \ *K*<sup>*x<sub>t</sub>*</sup>) ∪ {*x<sub>t</sub>*}. Decomposition of the original extensive game Γ<sub>*x*<sub>0</sub></sub> at node *x<sub>t</sub>* into the subgame Γ<sub>*x*<sub>*t*</sub></sub> and the factor-game Γ<sup>*D*</sup> generates the corresponding decomposition of the pure (and mixed) strategies (see References [17,22] for details).

Let *P<sub>i</sub>*<sup>*x<sub>t</sub>*</sup> (*P<sub>i</sub><sup>D</sup>*), *i* ∈ *N*, denote the restriction of *P<sub>i</sub>* on the subgame tree *K*<sup>*x<sub>t</sub>*</sup> (*K<sup>D</sup>*), and let *u<sub>i</sub>*<sup>*x<sub>t</sub>*</sup> (*u<sub>i</sub><sup>D</sup>*), *i* ∈ *N*, denote the restriction of the *i*th player's pure strategy *u<sub>i</sub>*(·) in Γ<sub>*x*<sub>0</sub></sub> on *P<sub>i</sub>*<sup>*x<sub>t</sub>*</sup> (*P<sub>i</sub><sup>D</sup>*). The pure strategy profile *u*<sup>*x<sub>t</sub>*</sup> = (*u*<sub>1</sub><sup>*x<sub>t</sub>*</sup>, ..., *u<sub>n</sub>*<sup>*x<sub>t</sub>*</sup>) generates the bundle of the subgame trajectories Ω<sup>*x<sub>t</sub>*</sup>(*u*<sup>*x<sub>t</sub>*</sup>) = {*ω<sub>k</sub>*<sup>*x<sub>t</sub>*</sup>(*u*<sup>*x<sub>t</sub>*</sup>) | *p*(*ω<sub>k</sub>*<sup>*x<sub>t</sub>*</sup>, *u*<sup>*x<sub>t</sub>*</sup>) > 0}. Similarly to (2), let us denote by

$$H_i^{x_t}(u^{x_t}) = \sum_{\omega_k^{x_t} \in \Omega^{x_t}(u^{x_t})} p(\omega_k^{x_t}, u^{x_t}) \cdot \sum_{\tau=t}^{T(k)} h_i(x_\tau) = \sum_{\omega_k^{x_t} \in \Omega^{x_t}(u^{x_t})} p(\omega_k^{x_t}, u^{x_t}) \cdot \tilde{h}_i(\omega_k^{x_t}) \tag{3}$$

the expected value of the *i*th player's payoff in Γ<sub>*x*<sub>*t*</sub></sub>, and by *U<sub>i</sub>*<sup>*x<sub>t</sub>*</sup> the set of all possible *i*th player's pure strategies in the subgame Γ<sub>*x*<sub>*t*</sub></sub>, *U*<sup>*x<sub>t</sub>*</sup> = ∏<sub>*i*∈*N*</sub> *U<sub>i</sub>*<sup>*x<sub>t</sub>*</sup>. Note that for each trajectory *ω* = (*x*<sub>0</sub>, ..., *x<sub>t</sub>*, *x*<sub>*t*+1</sub>, ..., *x<sub>T</sub>*), 1 ≤ *t* ≤ *T* − 1, *x<sub>T</sub>* ∈ *P*<sub>*n*+1</sub>,

$$\begin{split} p(\omega, u) &= \prod_{\tau=0}^{t-1} p(x_{\tau+1}|x_\tau, u) \cdot \prod_{\tau=t}^{T-1} p(x_{\tau+1}|x_\tau, u) \\ &= p(\underline{\omega}^{x_t}, u) \cdot p(\omega^{x_t}, u) = p(\underline{\omega}^{x_t}, u^D) \cdot p(\omega^{x_t}, u^{x_t}), \end{split} \tag{4}$$

where <u>*ω*</u><sup>*x<sub>t</sub>*</sup> = (*x*<sub>0</sub>, *x*<sub>1</sub>, ..., *x*<sub>*t*−1</sub>, *x<sub>t</sub>*) denotes the fragment of the trajectory *ω* implemented before the subgame Γ<sub>*x*<sub>*t*</sub></sub> starts, and *p*(<u>*ω*</u><sup>*x<sub>t</sub>*</sup>, *u*) = *p*(*x<sub>t</sub>*, *u*) denotes the probability that node *x<sub>t</sub>* is reached when the players employ the strategies *u<sub>i</sub>*, *i* ∈ *N*. It is worth noting that the factor-game Γ<sup>*D*</sup> = Γ<sup>*D*</sup>(*u*<sup>*x<sub>t</sub>*</sup>) is usually defined for a given strategy profile *u*<sup>*x<sub>t</sub>*</sup> in the subgame Γ<sub>*x*<sub>*t*</sub></sub> since we assume that

$$h_i^D(x_0, x_1, \ldots, x_{t-1}, x_t) = \sum_{\tau=0}^{t-1} h_i(x_\tau) + H_i^{x_t}(u^{x_t}) = \tilde{h}_i(\underline{\omega}^{x_t} \setminus \{x_t\}) + H_i^{x_t}(u^{x_t}) \tag{5}$$

(see, e.g., References [17,22] for details). Moreover, given an intermediate node *x<sub>t</sub>*, the bundle Ω(*u*) = {*ω<sub>k</sub>*(*u*) | *p*(*ω<sub>k</sub>*, *u*) > 0} can be divided into two subsets, that is, Ω(*u*) = {Ψ*<sub>m</sub>*} ∪ {*χ<sub>l</sub>*}, where *x<sub>t</sub>* ∈ Ψ*<sub>m</sub>*, *x<sub>t</sub>* ∉ *χ<sub>l</sub>*, and {Ψ*<sub>m</sub>*} ∩ {*χ<sub>l</sub>*} = ∅. Then, taking (1), (3), (4) and (5) into account, we get

$$\begin{split} H_i(u) &= \sum_m p(\Psi_m, u) \cdot \tilde{h}_i(\Psi_m) + \sum_l p(\chi_l, u) \cdot \tilde{h}_i(\chi_l) \\ &= \sum_m p(x_t, u) \cdot p(\Psi_m^{x_t}, u^{x_t}) \cdot \left[ \tilde{h}_i(\underline{\Psi}_m^{x_t} \setminus \{x_t\}) + \tilde{h}_i(\Psi_m^{x_t}) \right] + \sum_l p(\chi_l, u) \cdot \tilde{h}_i(\chi_l) \\ &= p(x_t, u^D) \cdot \tilde{h}_i(x_0, \ldots, x_{t-1}) \cdot \sum_m p(\Psi_m^{x_t}, u^{x_t}) + p(x_t, u^D) \cdot \sum_m p(\Psi_m^{x_t}, u^{x_t}) \cdot \tilde{h}_i(\Psi_m^{x_t}) + \sum_l p(\chi_l, u) \cdot \tilde{h}_i(\chi_l) \\ &= p(x_t, u^D) \cdot \tilde{h}_i(x_0, \ldots, x_{t-1}) + p(x_t, u^D) \cdot H_i^{x_t}(u^{x_t}) + \sum_l p(\chi_l, u) \cdot \tilde{h}_i(\chi_l) \\ &= p(x_t, u^D) \cdot h_i^D(x_0, \ldots, x_t) + \sum_l p(\chi_l, u) \cdot \tilde{h}_i(\chi_l). \end{split} \tag{6}$$

Note that, since *P<sub>i</sub>* = *P<sub>i</sub>*<sup>*x<sub>t</sub>*</sup> ∪ *P<sub>i</sub><sup>D</sup>*, one can compose the *i*th player's pure strategy *w<sub>i</sub>* = (*u<sub>i</sub><sup>D</sup>*, *v<sub>i</sub>*<sup>*x<sub>t</sub>*</sup>) ∈ *U<sub>i</sub>* in the original game Γ<sub>*x*<sub>0</sub></sub> from her strategies *v<sub>i</sub>*<sup>*x<sub>t</sub>*</sup> ∈ *U<sub>i</sub>*<sup>*x<sub>t</sub>*</sup> in the subgame Γ<sub>*x*<sub>*t*</sub></sub> and *u<sub>i</sub><sup>D</sup>* ∈ *U<sub>i</sub><sup>D</sup>* in the factor-game Γ<sup>*D*</sup> [17,22].

### **3. Refined Backwards Induction Procedure to Construct a Unique SPE**

**Definition 1** ([45])**.** *A strategy profile u* = (*u*<sub>1</sub>, *u*<sub>2</sub>, ..., *u<sub>n</sub>*) *is a Nash Equilibrium (NE) in* Γ<sub>*x*<sub>0</sub></sub> ∈ *G<sup>cm</sup>*(*n*)*, if*

$$H_i(v_i, u_{-i}) \leqslant H_i(u_i, u_{-i}), \ \forall v_i \in U_i, \ \forall i \in N.$$

*Let NE*(Γ<sub>*x*<sub>0</sub></sub>) *denote the set of all pure strategy Nash equilibria in* Γ<sub>*x*<sub>0</sub></sub>*.*

**Definition 2** ([25])**.** *A strategy profile u is a subgame perfect (Nash) equilibrium (SPE) in* Γ<sub>*x*<sub>0</sub></sub> ∈ *G<sup>cm</sup>*(*n*)*, if* ∀*x* ∈ *P* \ *P*<sub>*n*+1</sub> *it holds that u<sup>x</sup>* ∈ *NE*(Γ<sub>*x*</sub>)*, that is, the restriction of u on each subgame* Γ<sub>*x*</sub> *forms a NE in this subgame.*

To construct an SPE in an extensive-form game with perfect information, one may employ the so-called backwards induction procedure (see, e.g., References [12,17,22,23,46,47]).

However, the backwards induction procedure may generate multiple subgame perfect equilibria with different payoffs to the players in an extensive-form game (see, e.g., References [5,12,17,23]). To choose a unique SPE and the unique corresponding bundle of trajectories, we use an approach based on the players' attitude vectors. Namely, let the *i*th player's attitude vector *F<sub>i</sub>* = {*f<sub>i</sub>*(1), ..., *f<sub>i</sub>*(*n*)} be a permutation of the numbers {1, ..., *n*} meeting the condition *f<sub>i</sub>*(*i*) = 1. If *f<sub>i</sub>*(*j*) = *k*, one may interpret player *j* as an "*i*th player's associate of level *k*".

In the paper we will use these attitude vectors when constructing an SPE via the backwards induction procedure in the following way. Let *x* ∈ *P<sub>i</sub>*, and let *H<sub>i</sub><sup>y</sup>*(*u<sup>y</sup>*) denote the *i*th player's expected payoff in the subgame Γ<sub>*y*</sub>, *y* ∈ *S*(*x*), where *u<sup>y</sup>* is an SPE in this subgame. Assume that there exist multiple nodes *y*<sub>1</sub>, ..., *y<sub>q</sub>* such that *h<sub>i</sub>*(*y*<sub>1</sub>) + *H<sub>i</sub>*<sup>*y*<sub>1</sub></sup>(*u*<sup>*y*<sub>1</sub></sup>) = ... = *h<sub>i</sub>*(*y<sub>q</sub>*) + *H<sub>i</sub>*<sup>*y<sub>q</sub>*</sup>(*u*<sup>*y<sub>q</sub>*</sup>), that is, player *i* is indifferent to the choice of a particular node *y* from {*y*<sub>1</sub>, ..., *y<sub>q</sub>*}, while the *i*th player's choice may affect the other players' payoffs. If *f<sub>i</sub>*(*j*) = 2, we suppose that the *i*th player aims first to maximize the *j*th player's expected payoff *H<sub>j</sub><sup>y</sup>*(*u<sup>y</sup>*) when choosing a unique node *y* from *y*<sub>1</sub>, ..., *y<sub>q</sub>*. If again there are several nodes *y* with the same value *H<sub>j</sub><sup>y</sup>*(*u<sup>y</sup>*), the *i*th player aims secondly to maximize the expected payoff *H<sub>l</sub><sup>y</sup>*(*u<sup>y</sup>*) of the player *l* such that *f<sub>i</sub>*(*l*) = 3, and so on. Note that a similar approach to constructing a unique SPE in an extensive-form game with perfect information but without chance moves was explored in References [17,26,27] for the case when the payoffs are determined only at terminal nodes.
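The tie-breaking rule just described is a lexicographic refinement: among the payoff-maximizing successors for player *i*, compare the associates' payoffs in the order given by *F<sub>i</sub>*. A sketch under assumed data (the function `choose_successor`, the `value` callback and all numbers are illustrative, not from the paper):

```python
# Attitude-based tie-breaking among the successors that already maximize
# player i's own continuation payoff (hypothetical illustration data).
def choose_successor(i, candidates, value, F):
    """
    candidates: successor nodes y1..yq that all maximize player i's own payoff;
    value(j, y): player j's continuation payoff if node y is chosen;
    F[i]: player i's attitude vector, F[i][j] = rank of player j (F[i][i] = 1).
    """
    # order the other players by increasing rank: associates of level 2, 3, ...
    order = sorted((j for j in F[i] if j != i), key=lambda j: F[i][j])
    best = list(candidates)
    for j in order:
        top = max(value(j, y) for y in best)
        best = [y for y in best if value(j, y) == top]
        if len(best) == 1:
            break
    return best[0]   # if still tied, take the node with minimal ordinal number

F = {1: {1: 1, 2: 2, 3: 3}}                 # player 1 favors player 2, then player 3
payoffs = {"y1": {2: 4, 3: 0}, "y2": {2: 4, 3: 7}, "y3": {2: 1, 3: 9}}
pick = choose_successor(1, ["y1", "y2", "y3"], lambda j, y: payoffs[y][j], F)
print(pick)  # y1 and y2 tie on player 2's payoff; player 3's payoff selects y2
```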

Now let us provide a rigorous specification of this refinement of the backwards induction procedure, which we will refer to as the *Attitude SPE (A-SPE) algorithm*.

**Attitude SPE algorithm**. Suppose that the players' attitude vectors *F*<sub>1</sub>, *F*<sub>2</sub>, ..., *F<sub>n</sub>* are common knowledge, that is, each player knows these vectors, and all the players are aware of it. Let the length of the trajectory *ω* = (*x*<sub>0</sub>, ..., *x<sub>t</sub>*, *x*<sub>*t*+1</sub>, ..., *x<sub>T</sub>*) equal *T*, and let the length of the multistage game Γ<sub>*x*<sub>0</sub></sub> equal the maximal length of a trajectory *ω* in Γ<sub>*x*<sub>0</sub></sub>. We construct the unique subgame perfect equilibrium *u* = (*u*<sub>1</sub>, ..., *u<sub>n</sub>*) in Γ<sub>*x*<sub>0</sub></sub> by induction on the length *L* of the subgame Γ<sub>*x*</sub>.

**Step** *L* **= 1:** Consider a subgame Γ<sub>*x*</sub> of length *L* = 1. If *x* ∈ *P<sub>i</sub>*, *i* = 1, ..., *n*, we have two cases.

*Case 1:* there exists a unique *z<sup>k</sup>* ∈ *S*(*x*) = *P*<sup>*x*</sup><sub>*n*+1</sub> such that *h<sub>i</sub>*(*z<sup>k</sup>*) = max<sub>*z*∈*S*(*x*)</sub> *h<sub>i</sub>*(*z*). Then suppose that *u<sub>i</sub>*(*x*) = *z<sup>k</sup>*, *p*(*z<sup>k</sup>*|*x*, *u*) = 1, *p*(*z*|*x*, *u*) = 0 ∀*z* ∈ *S*(*x*) \ {*z<sup>k</sup>*}.

*Case 2:* there exist *q* > 1 nodes *z*<sup>*k*<sub>1</sub></sup>, ..., *z*<sup>*k<sub>q</sub>*</sup> ∈ *S*(*x*) = *P*<sup>*x*</sup><sub>*n*+1</sub> such that *h<sub>i</sub>*(*z*<sup>*k*<sub>1</sub></sup>) = *h<sub>i</sub>*(*z*<sup>*k*<sub>2</sub></sup>) = ... = *h<sub>i</sub>*(*z*<sup>*k<sub>q</sub>*</sup>) = max<sub>*z*∈*S*(*x*)</sub> *h<sub>i</sub>*(*z*). Then suppose that the *i*th player chooses the terminal position *z<sup>k</sup>* ∈ {*z*<sup>*k*<sub>1</sub></sup>, ..., *z*<sup>*k<sub>q</sub>*</sup>} = *S*<sup>*i*,1</sup>(*x*) such that

$$h_j(z^k) = \max_{z \in S^{i,1}(x)} h_j(z), \text{ where } f_i(j) = 2. \tag{7}$$

Let *S*<sup>*i*,2</sup>(*x*) denote the set of all nodes *z<sup>k</sup>* ∈ *S*<sup>*i*,1</sup>(*x*) meeting (7). If *S*<sup>*i*,2</sup>(*x*) consists of a unique node *z<sup>k</sup>*, then *u<sub>i</sub>*(*x*) = *z<sup>k</sup>*, *p*(*z<sup>k</sup>*|*x*, *u*) = 1, *p*(*z*|*x*, *u*) = 0 ∀*z* ∈ *S*(*x*) \ {*z<sup>k</sup>*}. Otherwise, suppose that the *i*th player chooses the terminal node *z<sup>k</sup>* ∈ *S*<sup>*i*,2</sup>(*x*) such that

$$h_l(z^k) = \max_{z \in S^{i,2}(x)} h_l(z), \text{ where } f_i(l) = 3. \tag{8}$$

Let *S*<sup>*i*,3</sup>(*x*) denote the set of all terminal nodes *z<sup>k</sup>* ∈ *S*<sup>*i*,2</sup>(*x*) satisfying (8), and so on.

Finally, if *S*<sup>*i*,*n*</sup>(*x*) contains a unique node *z<sup>k</sup>*, then *u<sub>i</sub>*(*x*) = *z<sup>k</sup>*, *p*(*z<sup>k</sup>*|*x*, *u*) = 1, *p*(*z*|*x*, *u*) = 0 ∀*z* ∈ *S*(*x*) \ {*z<sup>k</sup>*}. Otherwise, suppose that player *i* chooses the terminal node *z<sup>k</sup>* from *S*<sup>*i*,*n*</sup>(*x*) with the minimal ordinal number *k*.

Note that in all cases *H<sub>j</sub>*(*u*) = *h<sub>j</sub>*(*z<sup>k</sup>*), *j* ∈ *N*.

If *x* ∈ *P*<sub>0</sub>, then *S*(*x*) = *P*<sup>*x*</sup><sub>*n*+1</sub> and we do not need to define a strategy of any player at *x*, while *H<sub>j</sub>*(*u*) = ∑<sub>*z<sup>k</sup>*∈*S*(*x*)</sub> *π*(*z<sup>k</sup>*|*x*) · *h<sub>j</sub>*(*z<sup>k</sup>*). Hence, the players' behavior *u<sup>x</sup>* = (*u*<sub>1</sub><sup>*x*</sup>, ..., *u<sub>n</sub><sup>x</sup>*) ∈ *NE*(Γ<sub>*x*</sub>) and the expected payoffs *H<sub>j</sub><sup>x</sup>*(*u<sup>x</sup>*), *j* ∈ *N*, are defined for all subgames Γ<sub>*x*</sub> of length 1. In addition, for the games Γ<sub>*y*</sub>, *y* ∈ *P*<sub>*n*+1</sub>, of length *L* = 0, we assume that *H<sub>i</sub><sup>y</sup>*(*u<sup>y</sup>*) = *h<sub>i</sub>*(*y*), *i* ∈ *N*.

**Step** *L* **+ 1:** Suppose that a unique SPE *u<sup>y</sup>* and the expected payoffs *H<sub>j</sub><sup>y</sup>*(*u<sup>y</sup>*), *j* ∈ *N*, have already been constructed for every subgame Γ<sub>*y*</sub> of length at most *L*, and consider a game Γ<sub>*x*<sub>0</sub></sub> of length *L* + 1. If *x*<sub>0</sub> ∈ *P*<sub>0</sub>, the players' strategies in Γ<sub>*x*<sub>0</sub></sub> are composed of their strategies in the subgames Γ<sub>*y*</sub>, *y* ∈ *S*(*x*<sub>0</sub>), and


$$H_j(u) = \sum_{y \in S(x_0)} \pi(y|x_0) \cdot \left( h_j(y) + H_j^y(u^y) \right) \geqslant \sum_{y \in S(x_0)} \pi(y|x_0) \cdot \left( h_j(y) + H_j^y(v_j^y, u_{-j}^y) \right) = H_j(v_j, u_{-j}) \tag{9}$$

for all *v<sub>j</sub>* ∈ *U<sub>j</sub>*, *j* ∈ *N*, since *u<sup>y</sup>* ∈ *NE*(Γ<sub>*y*</sub>) due to the induction assumption, and each player *j* ∈ *N* can deviate from *u<sub>j</sub>* only in the subgames Γ<sub>*y*</sub>, *y* ∈ *S*(*x*<sub>0</sub>).

If *x*<sub>0</sub> ∈ *P<sub>i</sub>* for some *i* ∈ *N*, we have two cases.

*Case 1:* there exists a unique *ȳ* ∈ *S*(*x*<sub>0</sub>) such that

$$h_i(\bar{y}) + H_i^{\bar{y}}(u^{\bar{y}}) = \max_{y \in S(x_0)} \left( h_i(y) + H_i^y(u^y) \right). \tag{10}$$

Then we suppose that *u<sub>i</sub>*(*x*<sub>0</sub>) = *ȳ*; *u<sub>j</sub>*(*x*) = *u<sub>j</sub><sup>y</sup>*(*x*) if *x* ∈ *P<sub>j</sub>* ∩ *K<sup>y</sup>*, *y* ∈ *S*(*x*<sub>0</sub>), *j* = 1, ..., *n*.

*Case 2:* there exist *q* > 1 nodes *ȳ*<sub>1</sub>, ..., *ȳ<sub>q</sub>* ∈ *S*(*x*<sub>0</sub>) such that

$$h_i(\bar{y}_1) + H_i^{\bar{y}_1}(u^{\bar{y}_1}) = \ldots = h_i(\bar{y}_q) + H_i^{\bar{y}_q}(u^{\bar{y}_q}) = \max_{y \in S(x_0)} \left( h_i(y) + H_i^y(u^y) \right). \tag{11}$$

Then we suppose that the $i$th player chooses $\overline{y} \in \{\overline{y}_1, \dots, \overline{y}_q\} = S^{i,1}(x_0)$ such that

$$h_j(\overline{y}) + H_j^{\overline{y}}(\overline{u}^{\overline{y}}) = \max_{y \in S^{i,1}(x_0)} \left( h_j(y) + H_j^{y}(\overline{u}^{y}) \right), \text{ where } j = f_i(2). \tag{12}$$

Let $S^{i,2}(x_0)$ denote the set of all nodes $\overline{y} \in S^{i,1}(x_0)$ satisfying (12). If $S^{i,2}(x_0)$ consists of a unique node $\overline{y}$, then we suppose that $\overline{u}_i(x_0) = \overline{y}$; $\overline{u}_j(x) = \overline{u}_j^{y}(x)$ if $x \in P_j \cap K^{y}$, $y \in S(x_0)$, $j = 1, \dots, n$. Otherwise, suppose that the $i$th player chooses a node $\overline{y} \in S^{i,2}(x_0)$ such that

$$h_l(\overline{y}) + H_l^{\overline{y}}(\overline{u}^{\overline{y}}) = \max_{y \in S^{i,2}(x_0)} \left( h_l(y) + H_l^{y}(\overline{u}^{y}) \right), \text{ where } l = f_i(3). \tag{13}$$

Let $S^{i,3}(x_0)$ denote the set of all nodes $\overline{y} \in S^{i,2}(x_0)$ meeting (13), and so on.

Finally, if $S^{i,n}(x_0)$ contains several nodes $y_m$, denote by $\overline{l} = \min_{y_m \in S^{i,n}(x_0)} \{ l \mid z_l \in P_{n+1}^{y_m} \cap \Omega(\overline{u}^{y_m}) \}$ the minimal number of the terminal nodes of the trajectories generated by the subgame perfect equilibria $\overline{u}^{y_m}$ in the subgames $\Gamma^{y_m}$, $y_m \in S^{i,n}(x_0)$ (see Remark 1). Note that there exists a unique trajectory $\omega = (x_0, \dots, z_{\overline{l}})$ from $x_0$ to $z_{\overline{l}}$ in the game $\Gamma^{x_0}$; let $\overline{y} = \omega \cap S^{i,n}(x_0)$. Again, we suppose that $\overline{u}_i(x_0) = \overline{y}$; $\overline{u}_j(x) = \overline{u}_j^{y}(x)$ if $x \in P_j \cap K^{y}$, $y \in S(x_0)$, $j = 1, \dots, n$.

Now we prove that in both cases no player has a profitable deviation in $\Gamma^{x_0}$ from the strategy profile $\overline{u} = (\overline{u}_1, \dots, \overline{u}_n)$ constructed above.

$$H_i(\overline{u}) = h_i(\overline{y}) + H_i^{\overline{y}}(\overline{u}^{\overline{y}}) \geqslant h_i(y) + H_i^{y}(\overline{u}^{y}) \geqslant h_i(y) + H_i^{y}(u_i^{y}, \overline{u}^{y}_{-i}) \tag{14}$$

for all $y \in S(x_0)$, $u_i^{y} \in U_i^{y}$, due to (10), (11) and the induction assumption that $\overline{u}^{y} \in NE(\Gamma^{y})$, $y \in S(x_0)$. For the other players $j \in N$, $j \neq i$, we have

$$H_j(\overline{u}) = h_j(\overline{y}) + H_j^{\overline{y}}(\overline{u}^{\overline{y}}) \geqslant h_j(\overline{y}) + H_j^{\overline{y}}(u_j^{\overline{y}}, \overline{u}^{\overline{y}}_{-j}) = H_j(u_j, \overline{u}_{-j}) \tag{15}$$

for all $u_j \in U_j$, since $\overline{u}^{\overline{y}} \in NE(\Gamma^{\overline{y}})$ and only a deviation of player $j \in N$, $j \neq i$, from $\overline{u}_j$ within the subgame $\Gamma^{\overline{y}}$ may affect the players' payoffs.

Hence, taking (9), (14) and (15) into account, we obtain by induction that the strategy profile $\overline{u} = (\overline{u}_1, \dots, \overline{u}_n)$ constructed above forms a unique subgame perfect equilibrium in $\Gamma^{x_0}$.

**Proposition 1.** *If the players' attitude vectors $F_1, F_2, \dots, F_n$ are common knowledge, the Attitude SPE algorithm allows one to construct a unique subgame perfect equilibrium $\overline{u} = (\overline{u}_1, \dots, \overline{u}_n)$ in pure strategies for any extensive-form game $\Gamma^{x_0} \in G^{cm}(n)$ with chance moves, as well as a unique bundle of trajectories $\Omega(\overline{u})$.*

It is worth noting that the existence of a (subgame perfect) pure strategy equilibrium in an extensive-form game with perfect information and chance moves was first proved in References [46,47] for the particular case when the payoffs are defined only at the terminal nodes. Hence, Proposition 1 could be considered a corollary of these results. However, we provide a rigorous algorithm for constructing a unique *SPE* in an extensive-form game with chance moves, as well as the (unique) corresponding bundle of trajectories. We will use this algorithm, in particular, to calculate the characteristic function of the cooperative extensive-form game in Section 4.
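The backward induction refinement described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: the game-tree layout (nested dicts with `payoff`, `player`, `children` and `probs` keys) is our own assumption, and any tie that survives the whole attitude vector is broken here by simply taking the first remaining branch, instead of the minimal terminal-node number of Remark 1.

```python
from fractions import Fraction

def aspe(node, attitudes):
    """Attitude-based backward induction (a sketch).

    node: dict with 'payoff' (stage payoffs collected at the node),
    'player' (acting player, or 'chance' for chance nodes; terminals
    have no 'children'), 'children' and, for chance nodes, 'probs'.
    attitudes[i-1] is player i's priority list of players, self first.
    Returns (expected payoff vector, index of the chosen child or None).
    """
    pay = [Fraction(p) for p in node['payoff']]
    if 'children' not in node:                       # terminal node z_l
        return pay, None
    values = [aspe(c, attitudes)[0] for c in node['children']]
    if node['player'] == 'chance':                   # expectation over S(x)
        for pr, v in zip(node['probs'], values):
            pay = [e + Fraction(pr) * vi for e, vi in zip(pay, v)]
        return pay, None
    best = list(range(len(values)))                  # candidate successors
    for j in attitudes[node['player'] - 1]:          # lexicographic refinement
        top = max(values[k][j - 1] for k in best)
        best = [k for k in best if values[k][j - 1] == top]
    k = best[0]       # simplified final tie-break: first remaining branch
    return [p + v for p, v in zip(pay, values[k])], k
```

For instance, at a node of player 2 with priority list `[2, 3, 1]`, a tie in player 2's own continuation payoff is resolved by player 3's payoffs, then by player 1's.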

Let us use the following example to demonstrate how the Attitude SPE algorithm works.

**Example 1.** *(A 3-player multistage game with chance moves).*

*Let $P_0 = \{x_1, x_3\}$, $P_1 = \{x_0, x_4^2\}$, $P_2 = \{x_2^1, x_5\}$, $P_3 = \{x_2^2, x_4^3\}$, $P_{n+1} = \{z_1, \dots, z_{10}\}$. The players' payoffs and the probabilities $\pi(y|x)$, $x \in P_0$, are written in the game tree.*

*Suppose that the players' attitude vectors are $F_1 = (f_1(1), f_1(2), f_1(3)) = (1, 3, 2)$, $F_2 = (2, 3, 1)$ and $F_3 = (3, 1, 2)$.*

*When using the Attitude SPE algorithm, at each node $x \in P_i$, $i = 1, 2, 3$, the $i$th player has to choose the alternative marked in bold violet in Figure 1. Note that*

$$H^{x_3}(\overline{u}^{x_3}) = \begin{pmatrix} 12 \\ 0 \\ 0 \end{pmatrix} + \frac{1}{6} \begin{pmatrix} 0 \\ 24 \\ 0 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} 24 \\ 0 \\ 24 \end{pmatrix} + \frac{1}{3} \begin{pmatrix} 0 \\ 18 \\ 12 \end{pmatrix} = \begin{pmatrix} 24 \\ 10 \\ 16 \end{pmatrix} \quad \text{and} \quad H^{z_2} = h(z_2) = \begin{pmatrix} 0 \\ 10 \\ 20 \end{pmatrix}.$$

*Hence, $S^{2,1}(x_2^1) = \{z_2, x_3\}$, and $\overline{u}_2(x_2^1) = z_2$ due to player 2's attitude vector $F_2$.*

*The A-SPE algorithm generates the unique SPE $\overline{u} = (\overline{u}_1, \overline{u}_2, \overline{u}_3)$, where $\overline{u}_1(x_0) = x_1$, $\overline{u}_1(x_4^2) = z_8$; $\overline{u}_2(x_2^1) = z_2$, $\overline{u}_2(x_5) = z_9$; $\overline{u}_3(x_2^2) = z_4$, $\overline{u}_3(x_4^3) = z_6$, while $H(\overline{u}) = (11, 22, 18)$. We will use this SPE later in Section 4 when calculating the $\gamma$-characteristic function.*
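The expectation at the chance node $x_3$ displayed above can be verified directly. A quick arithmetic sketch, with the stage payoffs and transition probabilities taken from the equation (exact fractions avoid rounding):

```python
from fractions import Fraction

h_x3 = [12, 0, 0]                            # payoffs collected at x3 itself
branches = [(Fraction(1, 6), [0, 24, 0]),    # (pi(y|x3), continuation payoffs)
            (Fraction(1, 2), [24, 0, 24]),
            (Fraction(1, 3), [0, 18, 12])]
H_x3 = [h + sum(p * v[i] for p, v in branches) for i, h in enumerate(h_x3)]
print(H_x3 == [24, 10, 16])   # True
```

Player 2 is thus indifferent between $z_2$ and $x_3$ in her own payoff ($10 = 10$), and the tie is resolved in favor of $z_2$ by player 3's payoff ($20 > 16$), as prescribed by $F_2$.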

**Figure 1.** 3-person extensive-form game: A-Subgame Perfect Equilibria (SPE) algorithm implementation.

### **4. Cooperative Strategies and Trajectories**

If the players agree to cooperate in the multistage game $\Gamma^{x_0}$, first they are expected to maximize the total payoff $\sum_{i=1}^{n} H_i(u)$ of the grand coalition. Let $\overline{U}(\Gamma^{x_0})$ denote the set of all pure strategy profiles $u$ such that

$$\sum_{i \in N} H_i(u) = \max_{v \in U} \sum_{i \in N} H_i(v) = \overline{H}. \tag{16}$$

The set $\overline{U}(\Gamma^{x_0})$ is known to be nonempty, and it may contain multiple strategy profiles (see, e.g., Reference [17]). Hence, the players need to agree on a specific approach they are going to use to choose a unique optimal cooperative strategy profile $\overline{u} \in \overline{U}(\Gamma^{x_0})$, as well as the corresponding optimal bundle of cooperative trajectories in the game tree. To this aim we introduce the so-called *Players' Rank Based (PRB)* algorithm. Note that a closely related approach—using the ranking of the criteria to choose a unique cooperative trajectory—was proposed recently in Reference [8] for multicriteria extensive-form games without chance moves. Namely, suppose that the players have agreed on the so-called "rank" of each player within the grand coalition $N$, where $r(k) = i$ means that the rank of player $i$ equals $k$, $k = 1, \dots, n$.

**Players' rank based (PRB) algorithm**.


**Step 1.** Find

$$\max_{v \in \overline{U}(\Gamma^{x_0})} H_{r(1)}(v) = \overline{H}_{r(1)}.$$

Let $\overline{U}_{r(1)}(\Gamma^{x_0})$ denote the set of all strategy profiles $u$ such that $H_{r(1)}(u) = \overline{H}_{r(1)}$. If all strategy profiles $u \in \overline{U}_{r(1)}(\Gamma^{x_0})$ generate the same bundle of trajectories $\Omega(u)$, the players may choose any strategy profile $u \in \overline{U}_{r(1)}(\Gamma^{x_0})$ as the cooperative strategy profile. Otherwise, proceed to the next step.


. . .

**Step n + 1.** Finally, if the strategy profiles from $\overline{U}_{r(n)}(\Gamma^{x_0})$ generate different bundles of trajectories, we suppose that the players choose $\overline{u} \in \overline{U}_{r(n)}(\Gamma^{x_0})$ such that $\Omega(\overline{u}) = \{\omega_m(\overline{u}) = (x_0, \dots, x_{T(m)} = z_l) \mid p(\omega_m, \overline{u}) > 0\}$ contains the trajectory $\omega(\overline{u})$ with the minimal number $l$ of the terminal node $z_l$ (see Remark 1).

Henceforth, we will refer to the strategy profile $\overline{u} \in \overline{U}(\Gamma^{x_0})$ and the bundle of trajectories $\Omega(\overline{u})$ as the *optimal cooperative strategy profile* and the *optimal bundle of cooperative trajectories*, respectively.
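A compact sketch of the PRB selection may be helpful. The data layout below is hypothetical (each candidate profile is reduced to its payoff vector and the minimal terminal-node number of its bundle), and the refinement stops as soon as a single profile remains, rather than comparing whole bundles of trajectories as in the full algorithm:

```python
def prb_select(profiles, ranks):
    """Players' Rank Based selection among total-payoff maximizers (a sketch).

    profiles: list of dicts with 'H' (the payoff vector H(u)) and
    'min_term' (minimal terminal-node number over the bundle Omega(u)).
    ranks: players listed in rank order, ranks[0] = r(1), etc.
    Returns the index of the chosen cooperative strategy profile.
    """
    best = list(range(len(profiles)))
    top = max(sum(profiles[k]['H']) for k in best)        # condition (16)
    best = [k for k in best if sum(profiles[k]['H']) == top]
    for i in ranks:                                       # Steps 1..n
        top = max(profiles[k]['H'][i - 1] for k in best)
        best = [k for k in best if profiles[k]['H'][i - 1] == top]
        if len(best) == 1:
            return best[0]
    # Step n+1: minimal terminal-node number (cf. Remark 1)
    return min(best, key=lambda k: profiles[k]['min_term'])
```

With ranks $r(1) = 1$, $r(2) = 2$, $r(3) = 3$, the procedure first keeps the total-payoff maximizers with the largest $H_1$, then refines by $H_2$, and so on.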

In the dynamic setting it is significant that the specific method the players have agreed to use to choose a unique optimal cooperative strategy profile $\overline{u} \in \overline{U}(\Gamma^{x_0})$, as well as the corresponding optimal bundle of cooperative trajectories, satisfies time consistency (see, e.g., References [1,2,6,13,17]); that is, a fragment of the optimal bundle of cooperative trajectories in a subgame should remain optimal in this subgame. Suppose that at each subgame $\Gamma^{\overline{x}_t}$ along the cooperative trajectories, that is, $\overline{x}_t \in \omega(\overline{u})$, $\omega(\overline{u}) \in \Omega(\overline{u})$, the players choose a strategy profile $\overline{u}^{\overline{x}_t} \in U^{\overline{x}_t}$ such that

$$\overline{u}^{\overline{x}_t} \in \operatorname*{arg\,max}_{v^{\overline{x}_t} \in U^{\overline{x}_t}} \sum_{i \in N} H_i^{\overline{x}_t}(v^{\overline{x}_t}). \tag{17}$$

Let $\overline{U}(\Gamma^{\overline{x}_t})$ denote the set of all pure strategy profiles $u^{\overline{x}_t} \in U^{\overline{x}_t}$ which satisfy (17), and suppose that the players use the same approach (namely, the PRB algorithm) to choose a unique optimal cooperative strategy profile $\overline{u}^{\overline{x}_t} \in \overline{U}(\Gamma^{\overline{x}_t})$ in the subgame as in the original game $\Gamma^{x_0}$.

**Proposition 2.** *A cooperative strategy profile for $\Gamma^{x_0} \in G^{cm}(n)$ based on the PRB algorithm satisfies time consistency. Namely, let $\overline{u} \in \overline{U}$ satisfy (16), and let $\Omega(\overline{u})$ be the optimal bundle of cooperative trajectories. Then for each subgame $\Gamma^{\overline{x}_t}$, $\overline{x}_t \in \omega(\overline{u}) = (\overline{x}_0, \dots, \overline{x}_t, \overline{x}_{t+1}, \dots, \overline{x}_T)$, $1 \leqslant t < T$, with $\overline{x}_0 = x_0$, $\omega(\overline{u}) \in \Omega(\overline{u})$, it holds that*

$$\sum_{i \in N} H_i^{\overline{x}_t}(\overline{u}^{\overline{x}_t}) = \max_{v^{\overline{x}_t} \in U^{\overline{x}_t}} \sum_{i \in N} H_i^{\overline{x}_t}(v^{\overline{x}_t}), \tag{18}$$

*while $\omega^{\overline{x}_t} = (\overline{x}_t, \overline{x}_{t+1}, \dots, \overline{x}_T) \in \Omega(\overline{u}^{\overline{x}_t})$; that is, $\omega^{\overline{x}_t}$ belongs to the optimal bundle of cooperative trajectories in the subgame $\Gamma^{\overline{x}_t}$.*

**Proof.** The optimal bundle of cooperative trajectories $\Omega(\overline{u})$ generated by $\overline{u} \in \overline{U}(\Gamma^{x_0})$ can be divided into two subsets $\{\Psi_m\} = \{\omega \in \Omega(\overline{u}) \mid \overline{x}_t \in \omega\}$ and $\{\chi_l\} = \{\omega \in \Omega(\overline{u}) \mid \overline{x}_t \notin \omega\}$, where $\{\Psi_m\} \cap \{\chi_l\} = \emptyset$ and $\{\Psi_m\} \cup \{\chi_l\} = \Omega(\overline{u})$. Then, taking (5) and (6) into account, we get

$$\begin{split} H_i(\overline{u}) &= \sum_m p(\Psi_m, \overline{u}) \cdot \tilde{h}_i(\Psi_m) + \sum_l p(\chi_l, \overline{u}) \cdot \tilde{h}_i(\chi_l) = \\ &= p(\overline{x}_t, \overline{u}) \cdot \left[ \tilde{h}_i(\overline{x}_0, \overline{x}_1, \dots, \overline{x}_{t-1}) + H_i^{\overline{x}_t}(\overline{u}^{\overline{x}_t}) \right] + \sum_l p(\chi_l, \overline{u}^D) \cdot \tilde{h}_i(\chi_l), \end{split} \tag{19}$$

and (16) for $\overline{u}$ takes the form

$$\begin{split} &\sum_{i \in N} p(\overline{x}_t, \overline{u}) \cdot \left( \tilde{h}_i(\overline{x}_0, \overline{x}_1, \dots, \overline{x}_{t-1}) + H_i^{\overline{x}_t}(\overline{u}^{\overline{x}_t}) \right) + \\ &+ \sum_{i \in N} \sum_l p(\chi_l, \overline{u}^D) \cdot \tilde{h}_i(\chi_l) = \max_{v \in U} \sum_{i \in N} H_i(v). \end{split} \tag{20}$$

Suppose that $\overline{u}^{\overline{x}_t}$ does not satisfy (18); that is, there exists $v^{\overline{x}_t} \in U^{\overline{x}_t}$ such that

$$\sum_{i \in N} H_i^{\overline{x}_t}(\overline{u}^{\overline{x}_t}) < \sum_{i \in N} H_i^{\overline{x}_t}(v^{\overline{x}_t}). \tag{21}$$

Denote by $\Omega(v^{\overline{x}_t}) = \{\lambda_m^{\overline{x}_t} = (\overline{x}_t, \dots, x_{T(m)}) \mid p(\lambda_m^{\overline{x}_t}, v^{\overline{x}_t}) > 0\}$ the bundle of all trajectories in the subgame $\Gamma^{\overline{x}_t}$ generated by $v^{\overline{x}_t}$. Then (21) takes the form

$$\sum_{i \in N} \sum_m p(\Psi_m^{\overline{x}_t}, \overline{u}^{\overline{x}_t}) \cdot \tilde{h}_i^{\overline{x}_t}(\Psi_m^{\overline{x}_t}) < \sum_{i \in N} \sum_m p(\lambda_m^{\overline{x}_t}, v^{\overline{x}_t}) \cdot \tilde{h}_i^{\overline{x}_t}(\lambda_m^{\overline{x}_t}). \tag{22}$$

Denote by $W_i = (\overline{u}_i^D, v_i^{\overline{x}_t})$, $i \in N$, the $i$th player's compound pure strategy in $\Gamma^{x_0}$. The strategy profile $W = (W_1, \dots, W_n)$ generates the bundle of trajectories $\Omega(W)$ that can be divided into two disjoint subsets $\{\lambda_m\} = \{\omega \in \Omega(W) \mid \overline{x}_t \in \omega\}$ and $\{\chi_l\} = \{\omega \in \Omega(W) \mid \overline{x}_t \notin \omega\}$, where the second subset for $\Omega(W)$ coincides with the second subset for $\Omega(\overline{u})$ since $W^D = \overline{u}^D$, and $\lambda_m = (\overline{x}_0, \dots, \overline{x}_t) \cup (\overline{x}_t, \dots, x_{T(m)}) = (\overline{x}_0, \dots, \overline{x}_t) \cup \lambda_m^{\overline{x}_t}$.

Adding $\sum_{i \in N} \tilde{h}_i(\overline{x}_0, \overline{x}_1, \dots, \overline{x}_{t-1})$ to both sides of (22), we get

$$\sum_{i \in N} \left( \tilde{h}_i(\overline{x}_0, \dots, \overline{x}_{t-1}) + H_i^{\overline{x}_t}(\overline{u}^{\overline{x}_t}) \right) < \sum_{i \in N} \left( \tilde{h}_i(\overline{x}_0, \dots, \overline{x}_{t-1}) + H_i^{\overline{x}_t}(v^{\overline{x}_t}) \right). \tag{23}$$

Then we can multiply both sides of (23) by $p(\overline{x}_t, \overline{u}) = p(\overline{x}_t, \overline{u}^D) = p(\overline{x}_t, W^D) = p(\overline{x}_t, W) > 0$ and then add $\sum_{i \in N} \sum_l p(\chi_l, \overline{u}^D) \cdot \tilde{h}_i(\chi_l)$ to both sides of the resulting inequality. Taking (4)–(6) and (20) into account, we obtain

$$\sum_{i \in N} H_i(\overline{u}) < \sum_{i \in N} H_i(W)$$

for the constructed strategy profile $W \in U$. The last inequality contradicts the fact that $\overline{u} \in \overline{U}(\Gamma^{x_0})$; hence, (18) is valid.

Arguing in a similar way (for the case when different strategy profiles from $\overline{U}(\Gamma^{\overline{x}_t})$ generate different bundles of trajectories) we can verify that $\omega^{\overline{x}_t} = (\overline{x}_t, \dots, \overline{x}_T)$ — the fragment of the cooperative trajectory $\omega \in \Omega(\overline{u})$ starting at $\overline{x}_t$ — belongs to the optimal bundle of cooperative trajectories in the subgame $\Gamma^{\overline{x}_t}$; that is, $\omega^{\overline{x}_t} \in \Omega(\overline{u}^{\overline{x}_t})$.

We will assume in this paper that all the players have agreed to apply the PRB algorithm in order to choose the cooperative strategy profile $\overline{u} = (\overline{u}_1, \dots, \overline{u}_n)$ that generates the optimal bundle $\Omega(\overline{u})$ of cooperative trajectories in $\Gamma^{x_0} \in G^{cm}(n)$. The next step of cooperation is to define a characteristic function $V^{x_0}(S)$. There are different notions of characteristic functions (see, e.g., References [23,24,48]); in this paper we adopt the so-called $\gamma$-characteristic function introduced in Reference [24]. Namely, we assume that $V^{x_0}(S)$ is given by the SPE (based on the Attitude SPE algorithm) outcome of coalition $S$ in the noncooperative game where the members of $S$ maximize their joint payoff and the non-members play individually.

The $\gamma$-characteristic function $V^{\overline{x}_t}$ for the subgame $\Gamma^{\overline{x}_t}$, $\overline{x}_t \in \omega_m(\overline{u}) = (\overline{x}_0, \dots, \overline{x}_t, \dots, \overline{x}_{T(m)})$, $\omega_m(\overline{u}) \in \Omega(\overline{u})$, along the optimal bundle of cooperative trajectories can be constructed using the same approach. Note that

$$V^{\overline{x}_t}(N) = \sum_{\omega_m^{\overline{x}_t} \in \Omega(\overline{u}^{\overline{x}_t})} p(\omega_m^{\overline{x}_t}, \overline{u}^{\overline{x}_t}) \cdot \sum_{\tau=t}^{T(m)} \sum_{i \in N} h_i(\overline{x}_\tau), \quad t = 0, 1, \dots, T(m). \tag{24}$$
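Equation (24) is just the expectation, over the optimal bundle of the subgame, of the total payoff accumulated from $\overline{x}_t$ onward. A short sketch with a hypothetical bundle layout (each trajectory reduced to its realization probability and its sequence of stage payoff vectors):

```python
from fractions import Fraction

def grand_coalition_value(bundle):
    """V(N) as in (24): expected total payoff over a bundle of trajectories.

    bundle: list of (probability, stage_payoffs) pairs, one per trajectory,
    where stage_payoffs[tau] is the vector (h_1, ..., h_n) collected at the
    tau-th node of that trajectory; the probabilities must sum to one.
    """
    return sum(Fraction(p) * sum(sum(h) for h in stages)
               for p, stages in bundle)
```

For a bundle of two equally likely trajectories whose stage totals each sum to 12, the value is 12.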

Let $\Gamma^{x_0}(N, V^{x_0})$ denote the extensive-form cooperative game $\Gamma^{x_0} \in G^{cm}(n)$ with the $\gamma$-characteristic function, and let $\Gamma^{\overline{x}_t}(N, V^{\overline{x}_t})$ denote the corresponding cooperative subgame.

We assume that the players adopt a single-valued cooperative solution $\varphi^{x_0}$ (for instance, the Shapley value [33], the nucleolus [34], etc.) for the cooperative game $\Gamma^{x_0}(N, V^{x_0})$ which satisfies the collective rationality property

$$\sum_{i=1}^{n} \varphi_i^{x_0} = V^{x_0}(N) = \sum_{\omega_m \in \Omega(\overline{u})} p(\omega_m, \overline{u}) \cdot \sum_{\tau=0}^{T(m)} \sum_{i \in N} h_i(\overline{x}_\tau) \tag{25}$$

and the individual rationality property

$$\varphi_i^{x_0} \geqslant V^{x_0}(\{i\}), \quad i = 1, \dots, n. \tag{26}$$

In addition, we assume that the same properties (25) and (26) are valid for the cooperative solutions $\varphi^{\overline{x}_t}$ at each subgame $\Gamma^{\overline{x}_t}(N, V^{\overline{x}_t})$, $t = 0, \dots, T - 1$.

It is worth noting that the last assumption, as well as the choice of the $\gamma$-characteristic function, ensures that every player has an incentive to cooperate at each subgame along the optimal game evolution, since the $i$th player's cooperative payoff-to-go at $\Gamma^{\overline{x}_t}(N, V^{\overline{x}_t})$, $t = 0, \dots, T - 1$, is at least equal to her non-cooperative counterpart: $\varphi_i^{\overline{x}_t} \geqslant H_i^{\overline{x}_t}(\overline{u}^{\overline{x}_t})$.

### **5. Subgame Consistency and Incremental IDP**

Let $\beta = \{\beta_i(\overline{x}_\tau)\}$, $i = 1, \dots, n$, $\tau = 0, 1, \dots, T(l)$, $\overline{x}_\tau \in \omega_l(\overline{u})$, $\omega_l(\overline{u}) \in \Omega(\overline{u})$, denote the Imputation Distribution Procedure (IDP), or payment schedule, for the cooperative solution $(\varphi_i^{x_0})_{i \in N}$ (see, e.g., References [3,8–12,14–18,20] for details). The IDP approach means that all the players have agreed to allocate the total cooperative payoff $V^{x_0}(N)$ among the players along the optimal bundle $\Omega(\overline{u})$ of cooperative trajectories $\omega_l(\overline{u})$ according to a specific rule called the IDP. Namely, $\beta_i(\overline{x}_\tau)$ denotes the actual current payment which player $i$ receives at position $\overline{x}_\tau$ (instead of $h_i(\overline{x}_\tau)$) if the players employ the IDP $\beta$. Moreover, one can design an IDP $\beta$ such that all the players remain interested in cooperation in every subgame $\Gamma^{\overline{x}_\tau}$, $\overline{x}_\tau \in \omega_l(\overline{u})$, $\omega_l(\overline{u}) \in \Omega(\overline{u})$; that is, at any intermediate time instant.

**Definition 3.** *The IDP $\beta = \{\beta_i(\overline{x}_\tau)\}$ satisfies subgame efficiency if at any intermediate node $\overline{x}_t \in \omega(\overline{u})$, $\omega(\overline{u}) \in \Omega(\overline{u})$, $0 \leqslant t < T$, it holds that:*

$$\sum_{\omega_m^{\overline{x}_t} \in \Omega(\overline{u}^{\overline{x}_t})} p(\omega_m^{\overline{x}_t}, \overline{u}^{\overline{x}_t}) \cdot \sum_{\tau=t}^{T(m)} \beta_i(\overline{x}_\tau) = \varphi_i^{\overline{x}_t}, \quad i \in N. \tag{27}$$

Equation (27) means that the expected sum of the payments to player $i$ along the optimal subgame $\Gamma^{\overline{x}_t}$ evolution equals what she is entitled to in this subgame. The IDP can then reasonably be implemented as a rule for the step-by-step allocation of each player's current expected optimal payoff. Note that for $t = 0$ the subgame efficiency definition coincides with the efficiency condition at the initial node $x_0$, that is, the efficiency condition in the whole game $\Gamma^{x_0}$ (see References [9,14,16,20]).

**Definition 4** ([10])**.** *The IDP $\beta = \{\beta_i(\overline{x}_\tau)\}$ satisfies the strict balance condition if for each node $\overline{x}_\tau \in \omega_m(\overline{u})$, $\omega_m(\overline{u}) \in \Omega(\overline{u})$, $\forall \tau = 0, \dots, T(m)$*

$$\sum_{i \in N} \beta_i(\overline{x}_\tau) = \sum_{i \in N} h_i(\overline{x}_\tau). \tag{28}$$

Equation (28) ensures the "admissibility" of the IDP, that is, the sum of payments to the players in any node *x*¯*<sup>τ</sup>* is equal to the sum of payoffs that they can collect in this node.

The next advantageous dynamic property of an IDP—the time consistency, introduced in Reference [3]—was extended to dynamic games played over event trees in References [14,16,20] as well as to multicriteria extensive-form cooperative games (with chance moves) in Reference [9].

To write down the time consistency condition properly for some intermediate node $\overline{x}_t \in \omega(\overline{u}) = (\overline{x}_0, \overline{x}_1, \dots, \overline{x}_{t-1}, \overline{x}_t, \overline{x}_{t+1}, \dots, \overline{x}_T)$, $\omega(\overline{u}) \in \Omega(\overline{u})$, $1 \leqslant t < T$, in a multistage game $\Gamma^{x_0}$ with chance moves, we need to pay attention to all the chance nodes on the path $(\overline{x}_0, \dots, \overline{x}_{t-1}) = \omega_{\overline{x}_t} \setminus \{\overline{x}_t\}$.

Namely, let us number the chance nodes from $P_0 \cap (\omega_{\overline{x}_t} \setminus \{\overline{x}_t\})$ in order of their occurrence on the path $(\overline{x}_0, \dots, \overline{x}_{t-1})$; that is, $y_1 = \overline{x}_{t(1)}$, $y_2 = \overline{x}_{t(2)}$, ..., $y_\theta = \overline{x}_{t(\theta)}$, $0 \leqslant t(1) < t(2) < \dots < t(\theta) < t$.

**Definition 5** ([9])**.** *The IDP $\beta = \{\beta_i(\overline{x}_\tau)\}$ for the cooperative solution $\varphi^{x_0}$ is called time consistent in the whole game $\Gamma^{x_0}(N, V^{x_0}) \in G^{cm}(n)$ if at any intermediate node $\overline{x}_t \in \omega(\overline{u})$, $\omega(\overline{u}) \in \Omega(\overline{u})$, $1 \leqslant t < T$, for all $i \in N$, it holds that*

*case $\theta = 0$ (no chance nodes on the path $(\overline{x}_0, \dots, \overline{x}_{t-1})$):*

$$\sum_{\tau=0}^{t-1} \beta_i(\overline{x}_\tau) + \varphi_i^{\overline{x}_t} = \varphi_i^{x_0}, \tag{29}$$

*case $\theta = 1$ (only one chance node $y_1 = \overline{x}_{t(1)}$ before $\overline{x}_t$):*

$$\sum_{\tau=0}^{t(1)} \beta_i(\overline{x}_\tau) + p(\overline{x}_{t(1)+1}, \overline{u}) \cdot \left[ \sum_{\tau=t(1)+1}^{t-1} \beta_i(\overline{x}_\tau) + \varphi_i^{\overline{x}_t} \right] + \sum_{x^k \in S(\overline{x}_{t(1)}) \setminus \{\overline{x}_{t(1)+1}\}} p(x^k, \overline{u}) \cdot \varphi_i^{x^k} = \varphi_i^{x_0}, \tag{30}$$

*case $\theta = 2$ (two chance nodes $y_1 = \overline{x}_{t(1)}$, $y_2 = \overline{x}_{t(2)}$ before $\overline{x}_t$):*

$$\begin{split} &\sum_{\tau=0}^{t(1)} \beta_i(\overline{x}_\tau) + p(\overline{x}_{t(1)+1}, \overline{u}) \cdot \left\{ \sum_{\tau=t(1)+1}^{t(2)} \beta_i(\overline{x}_\tau) + p(\overline{x}_{t(2)+1} \mid \overline{x}_{t(2)}, \overline{u}) \cdot \left[ \sum_{\tau=t(2)+1}^{t-1} \beta_i(\overline{x}_\tau) + \varphi_i^{\overline{x}_t} \right] \right. \\ &\left. + \sum_{x^k \in S(\overline{x}_{t(2)}) \setminus \{\overline{x}_{t(2)+1}\}} p(x^k \mid \overline{x}_{t(2)}, \overline{u}) \cdot \varphi_i^{x^k} \right\} + \sum_{x^k \in S(\overline{x}_{t(1)}) \setminus \{\overline{x}_{t(1)+1}\}} p(x^k, \overline{u}) \cdot \varphi_i^{x^k} = \varphi_i^{x_0}, \end{split} \tag{31}$$

...

Note that for the particular case when $\overline{x}_t \in S(\overline{x}_{t(1)})$, that is, if $\overline{x}_t$ immediately follows the chance node $\overline{x}_{t(1)}$, Equation (30) takes the simpler form

$$\sum_{\tau=0}^{t(1)} \beta_i(\overline{x}_\tau) + \sum_{x^k \in S(\overline{x}_{t(1)})} p(x^k, \overline{u}) \cdot \varphi_i^{x^k} = \varphi_i^{x_0}.$$

A similar remark is valid for Equation (31), and so forth.

Roughly speaking, Definition 5 implies that the payments collected by the $i$th player (according to the payment schedule $\beta$) before reaching some intermediate node $\overline{x}_t$, plus the expected $i$th player's component of the Shapley value in the subgame $\Gamma^{\overline{x}_t}$ starting at $\overline{x}_t$, plus this player's expected Shapley value components in the other subgames along the cooperative trajectories which do not contain $\overline{x}_t$, correspond to what player $i$ is entitled to in the original game $\Gamma^{x_0}(N, V^{x_0})$.

It is worth noting that Definition 5 indeed provides a reasonable consistency requirement which a good payment schedule $\beta$ should satisfy when a player evaluates the IDP $\beta$ at the initial node $x_0$, that is, before the game $\Gamma^{x_0}(N, V^{x_0})$ starts (the words "in the whole game" in Definition 5 properly reflect this feature). However, when a player intends to evaluate the IDP $\beta$ in the subgame $\Gamma^{\overline{x}_t}$, that is, after some intermediate node $\overline{x}_t$ has been reached (in the case $\theta \geqslant 1$), this player is unlikely to take into account the expected future payoffs in all the subgames which are unattainable once the node $\overline{x}_t$ has been reached, that is, the last addends in the LHS of (30) and (31). To overcome this problem we suggest that the players use the notion of subgame consistency—a refinement of time consistency that was first proposed in Reference [36] for cooperative stochastic differential games and then extended to stochastic dynamic games in References [37,38]. Let us provide a rigorous definition of IDP subgame consistency for extensive-form games with chance moves that is applicable in all the subgames along the optimal bundle of cooperative trajectories.

**Definition 6.** *The IDP $\beta = \{\beta_i(\overline{x}_\tau)\}$ is called subgame consistent if at any intermediate node $\overline{x}_t \in \omega(\overline{u})$, $\omega(\overline{u}) \in \Omega(\overline{u})$, $1 \leqslant t \leqslant T$, for all $i \in N$, it holds that*

*case $1 \leqslant t \leqslant t(1)$ (no chance nodes before the subgame $\Gamma^{\overline{x}_t}$ root $\overline{x}_t$):*

$$\sum_{\tau=0}^{t-1} \beta_i(\overline{x}_\tau) + \varphi_i^{\overline{x}_t} = \varphi_i^{x_0}, \tag{32}$$

*case $t(1) + 1 < t \leqslant t(2)$ (only one chance node $y_1 = \overline{x}_{t(1)}$ before $\overline{x}_t$):*

$$\sum_{\tau=t(1)+1}^{t-1} \beta_i(\overline{x}_\tau) + \varphi_i^{\overline{x}_t} = \varphi_i^{\overline{x}_{t(1)+1}}, \tag{33}$$

*case $t(2) + 1 < t \leqslant t(3)$ (two chance nodes before $\overline{x}_t$):*

$$\sum_{\tau=t(2)+1}^{t-1} \beta_i(\overline{x}_\tau) + \varphi_i^{\overline{x}_t} = \varphi_i^{\overline{x}_{t(2)+1}}, \tag{34}$$

*. . .*

*case $t(\theta) + 1 < t \leqslant T$ (no chance nodes after $\overline{x}_{t(\theta)}$):*

$$\sum_{\tau=t(\theta)+1}^{t-1} \beta_i(\overline{x}_\tau) + \varphi_i^{\overline{x}_t} = \varphi_i^{\overline{x}_{t(\theta)+1}}. \tag{35}$$

The subgame consistency definition differs from the "time consistency in the whole game" property (see References [9,14,16,20]), which is based on an a priori assessment of the $i$th player's expected optimal payoff (before the game starts). However, when the players make a decision in a subgame after a chance move occurs, they need to recalculate the expected optimal payoff, since the original optimal bundle of cooperative trajectories shrinks after each chance node. Note that we cannot write out the subgame consistency condition for $t = t(1) + 1, t(2) + 1, \dots, t(\theta) + 1$; that is, for the nodes $\overline{x}_t$ that immediately follow the chance nodes.

One can suggest different imputation distribution procedures that may or may not satisfy the useful properties listed above. A review of different IDPs for multistage games (without chance moves), as well as an analysis of their properties, can be found in References [10,12,15,17]. Below we consider a refinement of the so-called incremental IDP (see, e.g., References [10,14,16,17,20,21]) that was recently introduced for multistage games with chance moves [9].

**Definition 7** ([9])**.** *The incremental IDP for the cooperative solution $\varphi^{x_0}$ in a multistage game with chance moves $\Gamma^{x_0}$ is defined as follows:*

$$\beta_i(\overline{x}_t) = \varphi_i^{\overline{x}_t} - \sum_{x_{t+1}^k \in S(\overline{x}_t)} p(x_{t+1}^k \mid \overline{x}_t, \overline{u}) \cdot \varphi_i^{x_{t+1}^k} \tag{36}$$

*for $\overline{x}_t \in \omega_l(\overline{u}) = (\overline{x}_0, \dots, \overline{x}_t, \dots, \overline{x}_{T(l)})$, $\omega_l(\overline{u}) \in \Omega(\overline{u})$, $t = 0, \dots, T(l) - 1$;*

$$\beta_i(\overline{x}_{T(l)}) = \varphi_i^{\overline{x}_{T(l)}} \tag{37}$$

*for $\overline{x}_{T(l)} \in \Omega(\overline{u}) \cap P_{n+1}$.*

**Remark 2.** *Formulas (36) and (37) are similar to the imputation distribution procedures suggested in References [14,16,20] for (single-criterion) stochastic discrete-time dynamic games played over event trees. If $\overline{x}_t \in P_i$, $i = 1, \dots, n$, Equation (36) takes the simpler form $\beta_i(\overline{x}_t) = \varphi_i^{\overline{x}_t} - \varphi_i^{\overline{x}_{t+1}}$, where $\overline{u}_i(\overline{x}_t) = \overline{x}_{t+1}$, which coincides with the "classical" incremental IDP.*
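As an illustration of (36), the following sketch computes the incremental payment at a node from $\varphi$ at that node and at its successors. The data layout is our own assumption: for a decision node the successor list degenerates to the single cooperative successor with probability one, while at a terminal node the payment is $\varphi$ itself, as in (37).

```python
from fractions import Fraction

def incremental_idp(phi_here, successors):
    """Incremental IDP payment (36) at a non-terminal node (a sketch).

    phi_here: the cooperative solution vector phi at the current node.
    successors: list of (transition probability, phi vector) pairs over
    the successors S(x_t); for a decision node this is a single pair
    with probability one.
    """
    n = len(phi_here)
    expected = [sum(Fraction(p) * Fraction(phi[i]) for p, phi in successors)
                for i in range(n)]
    return [Fraction(a) - e for a, e in zip(phi_here, expected)]
```

For instance, with $\varphi^{x_0} = (25\frac{1}{5}, 17\frac{1}{5}, 25\frac{2}{5})$ and the single cooperative successor $x_1$ with $\varphi^{x_1} = (19\frac{1}{5}, 17\frac{1}{5}, 25\frac{2}{5})$ (the values appearing in Example 2 below), the payment at $x_0$ is $\beta(x_0) = (6, 0, 0)$.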

Let us again use the 3-person extensive-form game from Example 1 to demonstrate the proposed scheme of cooperation.

### **Example 2.** *(Cooperative behavior in 3-player game from Ex. 1).*

*Suppose that the players have agreed on the following ranks: $r(1) = 1$, $r(2) = 2$ and $r(3) = 3$. When implementing the PRB algorithm we get the optimal bundle $\Omega(u)$, which contains four cooperative trajectories (marked in bold, deep blue in Figure 2): $\omega_1 = (x_0, x_1, x_2^1, x_3, x_4^2, x_5, x_6)$, $\omega_2 = (x_0, x_1, x_2^2, z_3)$, $\omega_3 = (x_0, x_1, x_2^1, x_3, x_4^1)$ and $\omega_4 = (x_0, x_1, x_2^1, x_3, x_4^3, z_7)$. Note that the players use the ranks when making a decision at node $x_5$.*

**Figure 2.** 3-player extensive-form game: cooperative behavior.

*To demonstrate the implementation of the incremental IDP and its properties we adopt the Shapley value as a single-valued cooperative solution. The values of the γ-characteristic function $V^{x_0}$ for the original game $\Gamma_{x_0}$ $(N, V^{x_0})$ and the Shapley value $\varphi^{x_0}$ are*


*Consider, for instance, the incremental IDP along the longest cooperative trajectory $\omega_1 = (x_0, \ldots, x_6)$ from $\Omega(u)$. If we calculate the γ-characteristic functions for the subgames using the Attitude SPE algorithm, we get the following results.*



*Note that the subgame consistency conditions at nodes $x_1$, $x_3$ and $x_5$ according to (32)–(34) respectively take the form:*

$$\beta_i(x_0) + \varphi_i^{x_1} = \varphi_i^{x_0}, \ i \in N, \text{ or } \begin{pmatrix} 6 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 19\frac{1}{5} \\ 17\frac{1}{5} \\ 25\frac{2}{5} \end{pmatrix} = \begin{pmatrix} 25\frac{1}{5} \\ 17\frac{1}{5} \\ 25\frac{2}{5} \end{pmatrix},$$

$$\beta_i(x_2^1) + \varphi_i^{x_3} = \varphi_i^{x_2^1}, \ i \in N, \text{ or } \begin{pmatrix} -10 \\ 4 \\ 18 \end{pmatrix} + \begin{pmatrix} 31 \\ 13 \\ 20 \end{pmatrix} = \begin{pmatrix} 21 \\ 17 \\ 38 \end{pmatrix},$$

$$\beta_i(x_4^2) + \varphi_i^{x_5} = \varphi_i^{x_4^2}, \ i \in N, \text{ or } \begin{pmatrix} 36 \\ -6 \\ 6 \end{pmatrix} + \begin{pmatrix} 0 \\ 12 \\ 24 \end{pmatrix} = \begin{pmatrix} 36 \\ 6 \\ 30 \end{pmatrix}.$$
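The three consistency identities above are plain vector equalities and can be verified mechanically. A minimal sketch using the vectors displayed above, with the mixed fractions kept exact via Python's `Fraction` (e.g., $19\frac{1}{5} = 96/5$):

```python
# Verify the subgame consistency identities beta_i(x_t) + phi_i^{x_{t+1}} = phi_i^{x_t}
# for the three checked nodes, using the vectors displayed above.
from fractions import Fraction as F

checks = [
    # (beta(x_t),    phi^{x_{t+1}},                       phi^{x_t})
    ([6, 0, 0],      [F(96, 5), F(86, 5), F(127, 5)],     [F(126, 5), F(86, 5), F(127, 5)]),  # at x0
    ([-10, 4, 18],   [31, 13, 20],                        [21, 17, 38]),                       # at x2^1
    ([36, -6, 6],    [0, 12, 24],                         [36, 6, 30]),                        # at x4^2
]

for beta, phi_next, phi_now in checks:
    # componentwise: payment now plus value-to-go equals current value
    assert [b + p for b, p in zip(beta, phi_next)] == list(phi_now)
```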

It is known that the classical incremental IDP for multistage (and differential) games may imply negative current payments to some players at some positions (see References [4,10,17,38] for details). As one can observe in Example 2, this drawback of the incremental IDP may appear in extensive-form games with chance moves as well. Two approaches to overcoming this possible disadvantage were suggested in References [4,10]. Unfortunately, as was first proved in Reference [10], it is in general impossible to design a time consistent IDP that satisfies both the balance condition and the non-negativity constraint.

**Proposition 3.** *The incremental IDP (36), (37) satisfies strict balance condition (28), the subgame efficiency condition (27), and the subgame consistency conditions (32)–(35).*

**Proof.** The incremental IDP *β* was proved to satisfy the strict balance condition (28) in Reference [9]. The proof of subgame consistency can be carried out by direct verification. For instance, consider the case when *t*(1) + 1 < *t* ≤ *t*(2). Then, using Remark 2, we get

$$\sum_{\tau = t(1)+1}^{t-1} \beta_i(x_\tau) = \left(\varphi_i^{x_{t(1)+1}} - \varphi_i^{x_{t(1)+2}}\right) + \dots + \left(\varphi_i^{x_{t-1}} - \varphi_i^{x_t}\right) = \varphi_i^{x_{t(1)+1}} - \varphi_i^{x_t}.$$

Obviously, (33) is satisfied.
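The telescoping argument above is easy to check numerically: between consecutive chance nodes all intermediate values of $\varphi_i$ cancel, leaving only the endpoints. A minimal sketch with hypothetical values of $\varphi_i^{x_\tau}$ for one player along such a segment:

```python
# Telescoping sum in the proof: the intermediate cooperative-solution
# values cancel, leaving only the segment's endpoint values.
phis = [40.0, 33.0, 29.0, 18.0]    # hypothetical phi_i^{x_tau} along a segment

# Remark 2 form of the payments at the decision nodes of the segment
betas = [phis[t] - phis[t + 1] for t in range(len(phis) - 1)]

# accumulated payments equal the difference of the endpoint values
assert sum(betas) == phis[0] - phis[-1]
```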

The proof that the IDP (36), (37) satisfies subgame efficiency (27) is based on direct calculations but is rather cumbersome in the general case (i.e., for an arbitrary game $\Gamma_{x_0}$). Let us demonstrate how it works for the game in Example 2. For instance, we verify that the incremental IDP meets the subgame efficiency condition at node $x_3$.

Note that $\Omega(u^{x_3}) = \{\omega_1^{x_3} = (x_3, x_4^1);\ \omega_2^{x_3} = (x_3, x_4^2, x_5, x_6);\ \omega_3^{x_3} = (x_3, x_4^3, z_7)\}$, while $p(\omega_1^{x_3}, u^{x_3}) = \pi(x_4^1 \mid x_3)$, $p(\omega_2^{x_3}, u^{x_3}) = \pi(x_4^2 \mid x_3)$ and $p(\omega_3^{x_3}, u^{x_3}) = \pi(x_4^3 \mid x_3)$. Then, using (32), (33), Remark 2, the equality $\sum_{x_4^k \in S(x_3)} \pi(x_4^k \mid x_3) = 1$ and the notation

$$\Phi_i^4 = \sum_{k=1}^{3} \pi(x_4^k \mid x_3) \cdot \varphi_i^{x_4^k},$$

we obtain

$$\begin{aligned} \sum_{\omega_k^{x_3} \in \Omega(u^{x_3})} p(\omega_k^{x_3}, u^{x_3}) \cdot \sum_{\tau=3}^{T(k)} \beta_i(x_\tau) &= \pi(x_4^1 \mid x_3) \cdot \left( \varphi_i^{x_3} - \Phi_i^4 + \varphi_i^{x_4^1} \right) \\ &\quad + \pi(x_4^2 \mid x_3) \cdot \left( \varphi_i^{x_3} - \Phi_i^4 + \varphi_i^{x_4^2} - \varphi_i^{x_5} + \varphi_i^{x_5} - \varphi_i^{x_6} + \varphi_i^{x_6} \right) \\ &\quad + \pi(x_4^3 \mid x_3) \cdot \left( \varphi_i^{x_3} - \Phi_i^4 + \varphi_i^{x_4^3} - \varphi_i^{z_7} + \varphi_i^{z_7} \right) \\ &= \varphi_i^{x_3} \cdot \sum_{k=1}^{3} \pi(x_4^k \mid x_3) + \sum_{k=1}^{3} \pi(x_4^k \mid x_3) \cdot \left( -\Phi_i^4 + \varphi_i^{x_4^k} \right) \\ &= \left( \varphi_i^{x_3} - \Phi_i^4 \right) \cdot \sum_{k=1}^{3} \pi(x_4^k \mid x_3) + \sum_{k=1}^{3} \pi(x_4^k \mid x_3) \cdot \varphi_i^{x_4^k} = \varphi_i^{x_3}. \end{aligned}$$

According to Proposition 3, the incremental payment schedule (36), (37) can be used to implement a long-term cooperative agreement in an extensive-form game with chance moves.
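The subgame efficiency calculation at the chance node $x_3$ is an algebraic identity: it holds for any transition probabilities summing to one and any values of $\varphi$. The sketch below mirrors the bundle structure $\Omega(u^{x_3})$ from Example 2, but all numeric values (probabilities and $\varphi$ values) are randomly drawn hypothetical data, so the check exercises the identity itself rather than the paper's specific numbers.

```python
# Subgame efficiency at the chance node x3: the expectation, over the bundle,
# of the accumulated IDP payments returns phi_i^{x3}.  Path suffixes mirror
# Omega(u^{x3}) from Example 2; all numeric values are randomly drawn.
import random

random.seed(0)
pi = [0.2, 0.5, 0.3]                                      # pi(x4^k | x3), sum to 1
paths = [["x4_1"], ["x4_2", "x5", "x6"], ["x4_3", "z7"]]  # trajectory suffixes after x3
phi = {node: random.uniform(-10, 10) for path in paths for node in path}
phi["x3"] = random.uniform(-10, 10)

# Phi_i^4: probability-weighted value of the successors of x3
Phi4 = sum(pi[k] * phi[paths[k][0]] for k in range(3))

total = 0.0
for k, path in enumerate(paths):
    acc = phi["x3"] - Phi4                 # chance-node payment at x3, Eq. (36)
    for j in range(len(path) - 1):         # classical increments, Remark 2
        acc += phi[path[j]] - phi[path[j + 1]]
    acc += phi[path[-1]]                   # terminal payment, Eq. (37)
    total += pi[k] * acc                   # expectation over the bundle

# the intermediate terms telescope and Phi4 cancels in expectation
assert abs(total - phi["x3"]) < 1e-9
```

Replacing `pi` or re-seeding the generator does not break the assertion, which is exactly the point of Proposition 3: subgame efficiency of the incremental IDP does not depend on the particular numeric data of the game.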

### **6. Conclusions**

In this paper we aimed to design a mechanism of the players' sustainable long-term cooperation that satisfies a number of desirable properties. To this aim we formalised the players' rank based algorithm for selecting a unique optimal bundle of cooperative trajectories, and proved that the corresponding cooperative strategy profile satisfies time consistency. To calculate the *γ*-characteristic function one needs a specific method for constructing a unique (subgame perfect) equilibrium in any extensive-form game with chance moves. Hence, we formalised a refinement of the backwards induction procedure based on the players' attitude vectors: the so-called attitude SPE algorithm.

As a result of a reexamination of the "IDP time consistency in the whole game" concept, we suggested adopting the concept of subgame consistency, introduced in Reference [36] for stochastic differential games and later extended to dynamic stochastic games in References [37,38]. The definition of subgame consistency for extensive-form games with chance moves is provided. This property reflects an interesting feature of the games under consideration: when the players make a decision in the subgame Γ*xt* after a chance move occurs, they need to recalculate their expected optimal payoffs-to-go, since the original optimal bundle of cooperative trajectories shrinks after each chance node. It is worth noting that a similar approach, based on the IDP subgame consistency notion, could be applied to dynamic games played over event trees ([14,16,20]). We proved that the incremental IDP specified for multistage games with chance moves in Reference [9] satisfies subgame consistency and subgame efficiency as well as the strict balance condition.

It follows from Propositions 1–3 that the two specified algorithms, combined with the *γ*-characteristic function and the incremental payment schedule, together constitute a mechanism of the players' sustainable cooperation that satisfies a number of desirable properties and can be used in extensive-form games with chance moves. Note that the main result of the paper, Proposition 3, does not depend on the specific method the players employ to calculate the characteristic function, nor on the specific single-valued cooperative solution satisfying (25) and (26).

Since this is the first time that subgame consistent solutions have been examined for extensive-form games with chance moves, further research along this line is expected. It is surely of interest to develop an appropriate software application implementing the proposed algorithms for an arbitrary extensive-form game with chance moves. Possibly, one can use the so-called Game Theory Explorer [30] when developing such software tools for 2-person extensive games. Further, it might be interesting to run experiments with large-scale datasets once a software application that constructs the unique SPE, the optimal bundle of cooperative trajectories, the *γ*-characteristic function, and so forth, has been developed.

Let us note some preliminary suggestions on how one could use such a software application to run simulations. First, one can vary the main parameter, the length of the game tree, and additional parameters such as the game structure, the players' payoffs, the transition probabilities, and so forth, to obtain practical estimates of the proposed algorithms' complexity and scalability. Second, one can generate external disturbances of the stage payoffs and probabilities and vary the players' attitude vectors to carry out a sensitivity analysis of the proposed non-cooperative and cooperative solutions. Further, it is of interest to obtain experimental estimates of the price of anarchy and the price of stability for the class of games under consideration. Finally, one can use such a software application to check whether additional properties (non-negativity, irrational-behavior-proof conditions, etc.) of the proposed incremental IDP and other payment schedules (see, e.g., Reference [15]) are satisfied for a given extensive-form game with chance moves.

**Author Contributions:** Conceptualization, D.K.; methodology, D.K.; formal analysis, D.K.; investigation, D.K. and N.S.; writing—original draft preparation, D.K. and N.S.; writing—review and editing, D.K. and N.S.; visualization, D.K. and N.S.; supervision, D.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** The reported study was funded by RFBR under the research project 18-00-00727 (18-00-00725).

**Acknowledgments:** We would like to thank three anonymous Reviewers and Leon Petrosyan for their valuable comments.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
