*Article* **On a Simplified Method of Defining Characteristic Function in Stochastic Games**

**Elena Parilina \*,† and Leon Petrosyan †**

Department of Mathematical Game Theory and Statistical Decisions, Saint Petersburg State University,

7/9 Universitetskaya nab., Saint Petersburg 199034, Russia; l.petrosyan@spbu.ru

**\*** Correspondence: e.parilina@spbu.ru

† These authors contributed equally to this work.

Received: 29 May 2020; Accepted: 9 July 2020; Published: 11 July 2020

**Abstract:** We propose a new method of constructing a cooperative stochastic game in characteristic function form when a non-cooperative stochastic game is initially given. The set of states and the set of actions of every player are finite. The construction of the characteristic function is based on calculating the maximin values of zero-sum games between a coalition and its anti-coalition for each state of the game. The proposed characteristic function has some advantages over previously defined characteristic functions for stochastic games. In particular, these advantages include computational simplicity and strong subgame consistency of the core calculated with the values of the new characteristic function.

**Keywords:** cooperative stochastic game; strong subgame consistency; characteristic function; core

### **1. Introduction**

When a non-cooperative game is initially defined, the problem of constructing a cooperative version of the game becomes relevant if the players start acting as a single coalition to maximize their joint payoff or minimize their joint costs. The classical approach is to define the cooperative game in the form of a characteristic function that assigns a value to any coalition of players. Based on this function, one can then calculate an imputation, that is, an allocation of the joint payoff among the players. The components of the imputations may vary if we calculate them based on different characteristic functions. Therefore, the way of defining this function is important, and it influences the players' payoffs in the cooperative game. Moreover, some approaches to defining the characteristic function are impractical in dynamic or differential games because of computational difficulties. Additionally, the way of constructing the characteristic function also influences the consistency properties of cooperative solutions that are realized in dynamics.

The choice of approach to defining the characteristic function also depends on the background of the considered problem if it arises from an applied area. The existence and uniqueness issues are also relevant when one chooses the way of constructing the characteristic function. There exist different approaches that can be applied to stochastic games. The so-called maxmin and minmax approaches define the value of the function for coalition *S* as the maxmin and minmax payoff of coalition *S* in a zero-sum game against the coalition of all left-out players [1,2]. Another approach is proposed in [3,4], where the value of coalition *S* is defined as its payoff in the Nash equilibrium of the non-cooperative game between coalition *S* and the left-out players acting individually. A two-step procedure for calculating the characteristic function is proposed in [5], in which the authors first find an *n*-player non-cooperative equilibrium and then allow coalition *S* to optimize its payoff, assuming that the left-out players use their Nash equilibrium actions found at the first step. The properties of this function are examined in [6,7]. Another two-stage approach to defining the characteristic function is proposed in [8], in which the strategies maximizing the total payoff of the players are found first. Subsequently, these strategies are used by the players from coalition *S*, while the out-coalition players use the strategies minimizing the total payoff of the players from *S*. The joint payoff of the players from the coalition equals the value of the characteristic function for this coalition.

A new simplified method of constructing the characteristic function in multistage games is introduced in [9]. The authors examine the properties of this function and prove that the corresponding core is strongly subgame-consistent in a multistage game. This property cannot be proved in the general case when the characteristic function is constructed with the classical approaches, such as maxmin or minmax.

In this paper, we adapt the method of constructing the characteristic function proposed in [9] to stochastic games. Based on the values of the characteristic function, one can determine the core. Moreover, the core satisfies the strong subgame consistency property, which is a refinement of subgame consistency for the case of set-valued cooperative solutions. The problem of subgame consistency was originally examined for differential games in [10,11]. The construction of a special payment scheme, called the imputation distribution procedure (see [11]), allows for coping with the problem of time inconsistency of cooperative solutions. This problem is described for stochastic games in [12–14] in the case of single-valued cooperative solutions. The node-consistent core is constructed for dynamic games played over event trees in [15]. The strong subgame consistency of a set-valued cooperative solution, like the core, guarantees that the players obtain, in total, a solution from the initially defined core. It means that, in any intermediate time period, the sum of the payments obtained up to the current period and any element of the core of the subgame starting from the next time period is an element of the initial core. The strong subgame consistency condition is proposed in [16]. The subcore satisfying the strong subgame consistency property is constructed for multistage games in [17]. The problem of subgame consistency is relevant for different classes of dynamic and differential games, and it is examined in [18] for stochastic games with finite duration, in [19] for differential games with a finite time horizon, and in [20] for multistage games. In this paper, we construct the characteristic function for a stochastic game in a special way and calculate the core using the values of this function. The core satisfies the strong subgame consistency property. To prove this result, we define the imputation distribution procedure, which determines the payments to the players in any state realized in the game process.

The rest of the paper is organized as follows. We describe the model of stochastic games in Section 2.1. In Section 2.2, we define the new approximated characteristic function for state games, and then extend this approach to the case of stochastic games in Section 2.3. We formulate the definition of the imputation distribution procedure for stochastic games and describe the idea of strong subgame consistency of the core in Section 3. We briefly conclude in Section 4.

### **2. Cooperative Stochastic Games**

### *2.1. Model*

Consider a non-cooperative stochastic game *G* given by

$$G = \left( N, \Omega, \{ \Gamma(\omega) \}\_{\omega \in \Omega}, \pi\_0, \left\{ p(\omega^{\prime \prime} | \omega^{\prime}, a^{\omega^{\prime}}) \right\}\_{\begin{smallmatrix} \omega^{\prime}, \omega^{\prime \prime} \in \Omega \\ a^{\omega^{\prime}} \in \prod\_{i \in N} A\_i^{\omega^{\prime}} \end{smallmatrix}}, \delta \right), \tag{1}$$

where


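• *N* = {1, . . . , *n*} is the set of players;
• Ω = {*ω*<sub>1</sub>, . . . , *ω<sub>k</sub>*} is the finite set of states;
• Γ(*ω*) is the normal-form game played in state *ω*, in which any player *i* ∈ *N* has a finite set of actions *A<sub>i</sub><sup>ω</sup>* and a payoff function *K<sub>i</sub><sup>ω</sup>*;
• *π*<sub>0</sub> is the probability distribution over the initial states;
• *p*(*ω*″|*ω*′, *a*<sup>*ω*′</sup>) is the probability of transition from state *ω*′ to state *ω*″ when action profile *a*<sup>*ω*′</sup> is realized in state *ω*′;
• *δ* ∈ (0, 1) is a common discount factor.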

Denote by *G<sup>ω</sup>* the subgame of *G* starting from state *ω*, i.e., the game defined by (1) with *π*<sub>0</sub> such that *π*<sub>0</sub><sup>*ω*</sup> = 1 and *π*<sub>0</sub><sup>*ω*′</sup> = 0 for any state *ω*′ ≠ *ω*.

We assume that, in stochastic game *G*, the set of strategies *H<sub>i</sub>* of any player *i* consists of stationary strategies. A stationary strategy *η<sub>i</sub>* of player *i* assigns an action (possibly mixed) *a<sub>i</sub>* ∈ Δ(*A<sub>i</sub><sup>ω</sup>*) to any state *ω*. The vector (*η*<sub>1</sub>, . . . , *η<sub>n</sub>*) ∈ ∏<sub>*j*∈*N*</sub> *H<sub>j</sub>* is a stationary strategy profile in stochastic game *G*. Clearly, a stationary strategy *η<sub>i</sub>* of player *i* ∈ *N* in game *G* is also a stationary strategy of this player in any subgame *G<sup>ω</sup>*.

By the payoff of player *i*, we mean the expected payoff in stochastic subgame *G<sup>ω</sup>* given by

$$E\_i^{\omega}(\eta) = K\_i^{\omega}(a^{\omega}) + \delta \sum\_{\omega' \in \Omega} p(\omega'|\omega, a^{\omega}) E\_i^{\omega'}(\eta), \tag{2}$$

where *η* ∈ *H* = ∏<sub>*j*∈*N*</sub> *H<sub>j</sub>* is a stationary strategy profile such that *η*(*ω*) = *a<sup>ω</sup>* ∈ ∏<sub>*j*∈*N*</sub> *A<sub>j</sub><sup>ω</sup>*. We rewrite Equation (2) in vector form and obtain

$$E\_i(\eta) = K\_i(a) + \delta \Pi(\eta) E\_i(\eta),\tag{3}$$

where *E<sub>i</sub>*(*η*) = (*E<sub>i</sub><sup>ω<sub>1</sub></sup>*(*η*), . . . , *E<sub>i</sub><sup>ω<sub>k</sub></sup>*(*η*)) and *K<sub>i</sub>*(*a*) = (*K<sub>i</sub><sup>ω<sub>1</sub></sup>*(*a<sup>ω<sub>1</sub></sup>*), . . . , *K<sub>i</sub><sup>ω<sub>k</sub></sup>*(*a<sup>ω<sub>k</sub></sup>*)). The matrix of transition probabilities is formed in the following way:

$$\Pi(\eta) = \begin{pmatrix} p(\omega\_1|\omega\_1, a^{\omega\_1}) & \dots & p(\omega\_k|\omega\_1, a^{\omega\_1}) \\ p(\omega\_1|\omega\_2, a^{\omega\_2}) & \dots & p(\omega\_k|\omega\_2, a^{\omega\_2}) \\ \dots & \dots & \dots \\ p(\omega\_1|\omega\_k, a^{\omega\_k}) & \dots & p(\omega\_k|\omega\_k, a^{\omega\_k}) \end{pmatrix} \tag{4}$$

in which each row contains transition probabilities from a corresponding state.

Equation (3) implies the explicit formula to calculate the expected payoff of player *i* when the stationary strategy profile *η* is realized:

$$E\_i(\eta) = \left(\mathbb{I}\_k - \delta \Pi(\eta)\right)^{-1} K\_i(a),$$

where 𝕀<sub>*k*</sub> is the identity matrix of size *k* × *k*. The inverse matrix (𝕀<sub>*k*</sub> − *δ*Π(*η*))<sup>−1</sup> always exists for discount factor *δ* ∈ (0, 1).
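As a minimal numerical sketch of the explicit formula above, the following Python function solves the linear system (𝕀<sub>*k*</sub> − *δ*Π(*η*)) *E<sub>i</sub>* = *K<sub>i</sub>* by Gaussian elimination rather than forming the inverse. The function name `expected_payoffs` and the two-state data are illustrative assumptions and are not taken from the model above.

```python
def expected_payoffs(Pi, K, delta):
    """Solve (I - delta * Pi) E = K for E by Gauss-Jordan elimination.

    Pi    : k x k row-stochastic transition matrix under a fixed stationary profile
    K     : list of k stage payoffs of one player, one entry per state
    delta : discount factor in (0, 1)
    """
    k = len(K)
    # Augmented matrix [I - delta * Pi | K].
    A = [[(1.0 if r == c else 0.0) - delta * Pi[r][c] for c in range(k)] + [K[r]]
         for r in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[r][k] / A[r][r] for r in range(k)]


# Hypothetical two-state chain: state 0 is absorbing, state 1 moves to state 0.
Pi = [[1.0, 0.0],
      [1.0, 0.0]]
K = [1.0, 3.0]
E = expected_payoffs(Pi, K, delta=0.5)
```

In this toy chain the payoff from state 0 is 1 + 0.5 + 0.25 + . . . and the payoff from state 1 is 3 plus the discounted payoff from state 0, which gives a quick sanity check of the solver.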

Taking into account the probability distribution *π*<sub>0</sub> over the initial states, we calculate the expected payoff of player *i* in game *G* as

$$\bar{E}\_i(\eta) = \pi\_0 E\_i(\eta) = \pi\_0 \left(\mathbb{I}\_k - \delta \Pi(\eta)\right)^{-1} K\_i(a). \tag{5}$$

If the players cooperate, they find the cooperative strategy profile *η*<sup>∗</sup> maximizing the total expected payoff, that is,

$$\eta^\* = \arg\max\_{\eta \in H} \sum\_{i \in N} \bar{E}\_i(\eta).$$

We should note that *η*<sup>∗</sup> is a pure stationary strategy profile. The profile *η*<sup>∗</sup> is such that *η*<sub>*i*</sub><sup>∗</sup>(*ω*) = *a*<sub>*i*</sub><sup>*ω*∗</sup> ∈ *A*<sub>*i*</sub><sup>*ω*</sup>, *ω* ∈ Ω. We also assume that the profile *η*<sup>∗</sup> is such that max<sub>*η*∈*H*</sub> ∑<sub>*i*∈*N*</sub> *E*<sub>*i*</sub><sup>*ω*</sup>(*η*) = ∑<sub>*i*∈*N*</sub> *E*<sub>*i*</sub><sup>*ω*</sup>(*η*<sup>∗</sup>) for any state *ω* ∈ Ω, which means that the cooperative strategy profile maximizes the total payoff of the players regardless of which state is initial. This assumption is satisfied for most stochastic games.

To define a cooperative game when a non-cooperative stochastic game is given, we use the classical approach and define it in the form of a characteristic function *v* : 2<sup>*N*</sup> → ℝ whose values estimate the "power" of any coalition, i.e., of any subset of players. In [21], the characteristic function value for coalition *S* in the subgame starting at any state *ω* is defined as the maxmin value, which is

$$v^{\omega}(S) = \operatorname{val} G\_S^{\omega}, \tag{6}$$

where *G<sub>S</sub><sup>ω</sup>* is a zero-sum stochastic subgame starting at state *ω*, in which coalition *S* is the maximizing player and coalition *N* \ *S* is the minimizing player. The existence of the value of game *G<sub>S</sub><sup>ω</sup>* for stochastic games is proved in [22].

### *2.2. Approximated Characteristic Function for State Games*

Before we define a characteristic function in a new form, we need to make additional calculations. First, we consider state games and propose a scheme of calculation of the approximated characteristic function values for any state. Define characteristic function for a state *ω* ∈ Ω or one-shot game Γ(*ω*) given in normal form while using the maxmin approach:

$$v(\omega, S) = \max\_{a\_S^{\omega} \in \prod\_{j \in S} A\_j^{\omega}} \ \min\_{a\_{N \setminus S}^{\omega} \in \prod\_{j \in N \setminus S} A\_j^{\omega}} \ \sum\_{i \in S} K\_i^{\omega} (a\_S^{\omega}, a\_{N \setminus S}^{\omega}), \tag{7}$$

where maxmin in (7) is found in pure strategies.

Let *C*(*ω*) be a non-empty core of the game defined in state *ω* using characteristic function (7), that is,

$$C(\omega) = \left\{ (\alpha\_1(\omega), \dots, \alpha\_n(\omega)) : \sum\_{i \in S} \alpha\_i(\omega) \geqslant v(\omega, S), \ \forall S \subset N, \ \sum\_{i \in N} \alpha\_i(\omega) = v(\omega, N) \right\}. \tag{8}$$

**Remark 1.** *We assume that the conditions under which the core C*(*ω*) *exists for any state ω are satisfied. The core C*(*ω*) *is non-empty if and only if, for any function ψ* : 2<sup>*N*</sup> \ ∅ → [0, 1] *such that* ∑<sub>*S*∈2<sup>*N*</sup>: *S*∋*i*</sub> *ψ*(*S*) = 1 *for any i* ∈ *N, the condition (see [23,24])*

$$\sum\_{S \in 2^N \setminus \varnothing} \psi(S) v(\omega, S) \leqslant v(\omega, N) \tag{9}$$

*holds. Characteristic function v*(*ω*, *S*) *is defined by* (7)*. We refer to the book [25] for further discussion of non-emptiness of the core.*
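For three players, condition (9) only needs to be checked on the five minimal balanced collections of proper coalitions, which is a standard fact for *n* = 3. A minimal Python sketch of this check follows. The function name `core_is_nonempty_3p`, the encoding of the characteristic function as a dictionary keyed by frozensets of the player labels 1, 2, 3, and the numerical values are assumptions made for illustration only.

```python
def core_is_nonempty_3p(v, tol=1e-9):
    """Balancedness check for a three-player game.

    v maps frozensets of players from {1, 2, 3} to coalition values.
    """
    def s(*players):
        return v[frozenset(players)]

    total = s(1, 2, 3)
    # Weighted sums over the minimal balanced collections for n = 3.
    balanced_sums = [
        s(1) + s(2) + s(3),                    # singletons, weights 1
        s(1, 2) + s(3),                        # a pair and the remaining player
        s(1, 3) + s(2),
        s(2, 3) + s(1),
        0.5 * (s(1, 2) + s(1, 3) + s(2, 3)),   # all pairs, weights 1/2
    ]
    return all(b <= total + tol for b in balanced_sums)


def make_game(single, pair, grand):
    """Hypothetical symmetric game: equal values for coalitions of equal size."""
    game = {frozenset({i}): single for i in (1, 2, 3)}
    game.update({frozenset(p): pair for p in ((1, 2), (1, 3), (2, 3))})
    game[frozenset({1, 2, 3})] = grand
    return game


balanced = core_is_nonempty_3p(make_game(single=0, pair=4, grand=6))
unbalanced = core_is_nonempty_3p(make_game(single=0, pair=5, grand=6))
```

In the second hypothetical game the three pairs with weights 1/2 require 7.5, which exceeds the grand coalition value 6, so the core is empty.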

Second, for any coalition *S* ⊆ *N*, define the maximal value of characteristic function (7) over the set Ω:

$$
\hat{w}(S) = \max\_{\omega \in \Omega} v(\omega, S), \tag{10}
$$

which is the maximal value that coalition *S* can obtain in state games.

The next step is to define the approximated value of the characteristic function for any state in the following way. Let for any state *ω* ∈ Ω the approximated characteristic function *w*(*ω*, *S*) be given as

$$w(\omega, S) = \begin{cases} \sum\_{i \in S} K\_i^{\omega}(a^{\omega \*}), & \text{if } S = N, \\ \hat{w}(S), & \text{if } S \neq N. \end{cases} \tag{11}$$

In Equation (11), the total payoff of the players adopting the cooperative action profile *a*<sup>*ω*∗</sup> is assigned to the grand coalition. The approximated values *ŵ*(*S*) given by (10), i.e., the maximal values over all states, are assigned to any coalition *S* different from *N*. Denote the core constructed with the values of characteristic function (11) by *D*(*ω*) and assume that it is non-empty for any state *ω*:

$$D(\omega) = \left\{ (\alpha\_1(\omega), \dots, \alpha\_n(\omega)) : \sum\_{i \in S} \alpha\_i(\omega) \geqslant w(\omega, S), \ \forall S \subset N, \ \sum\_{i \in N} \alpha\_i(\omega) = w(\omega, N) \right\}.\tag{12}$$
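The construction in Equations (10)–(12) can be sketched in a few lines of Python. The helper names `approximated_cf` and `in_core`, the dictionary layout, and the two-player, two-state numbers are hypothetical and serve only to illustrate the definitions.

```python
from itertools import combinations


def approximated_cf(v, grand_payoff, players):
    """Build w(omega, S) of Equation (11).

    v            : dict state -> dict frozenset(S) -> v(omega, S), proper coalitions only
    grand_payoff : dict state -> total stage payoff under the cooperative action profile
    players      : iterable of player labels
    """
    N = frozenset(players)
    proper = [frozenset(c) for r in range(1, len(N)) for c in combinations(sorted(N), r)]
    # Equation (10): the largest state-game value of each proper coalition.
    w_hat = {S: max(v[state][S] for state in v) for S in proper}
    w = {}
    for state in v:
        w[state] = dict(w_hat)
        w[state][N] = grand_payoff[state]
    return w


def in_core(x, cf, tol=1e-9):
    """Check the core conditions for payoff vector x (dict player -> share)."""
    N = frozenset(x)
    if abs(sum(x.values()) - cf[N]) > tol:
        return False
    return all(sum(x[i] for i in S) >= value - tol for S, value in cf.items() if S != N)


# Hypothetical data: two players, two states.
v = {
    "s1": {frozenset({1}): 1, frozenset({2}): 0},
    "s2": {frozenset({1}): 0, frozenset({2}): 2},
}
w = approximated_cf(v, grand_payoff={"s1": 5, "s2": 4}, players=[1, 2])
```

With these numbers, the vector (4, 1) satisfies the core conditions for *v* in state `s1` but violates the bound *ŵ*({2}) = 2, so it lies outside the approximated core, while (2, 3) lies inside it.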

**Lemma 1.** *Let the inequality ŵ*(*S*) < min<sub>*ω*∈Ω</sub> *v*(*ω*, *N*) *hold for any coalition S* ⊂ *N, S* ≠ *N. If the condition*

$$\sum\_{i \in N} K\_i^{\omega} (a^{\omega \*}) = \max\_{a^{\omega} \in \prod\_{j \in N} A\_j^{\omega}} \sum\_{i \in N} K\_i^{\omega} (a^{\omega}) \tag{13}$$

*is true and the core D*(*ω*) *is non-empty for any ω, then D*(*ω*) ⊂ *C*(*ω*)*.*

**Proof.** If there exists a coalition *S* ⊂ *N*, *S* ≠ *N*, such that *ŵ*(*S*) ⩾ min<sub>*ω*∈Ω</sub> *v*(*ω*, *N*), then the core *D*(*ω*) is empty. Assuming the non-emptiness of the core *D*(*ω*), we consider any imputation *α*(*ω*) ∈ *D*(*ω*). If condition (13) is true, then ∑<sub>*i*∈*N*</sub> *α<sub>i</sub>*(*ω*) = *w*(*ω*, *N*) = *v*(*ω*, *N*).

Subsequently, for any coalition *S* ⊂ *N*, we have ∑<sub>*i*∈*S*</sub> *α<sub>i</sub>*(*ω*) ⩾ *w*(*ω*, *S*) = *ŵ*(*S*) = max<sub>*ω*′∈Ω</sub> *v*(*ω*′, *S*) ⩾ *v*(*ω*, *S*), which proves that *α*(*ω*) ∈ *C*(*ω*).

**Remark 2.** *Condition* (13) *states that the maximal total payoff of the players in state ω coincides with their payoff if players adopt actions prescribed by the cooperative strategy profile. It may not be satisfied in general case in dynamic games. If condition* (13) *is not true, the main result of the paper can be proved, but it requires a modification in the method of characteristic function definition. We leave this case for future research.*

**Remark 3.** *We assume that the approximated core D*(*ω*) *is non-empty for any ω. The conditions under which it is non-empty are similar to the ones given in Remark 1, but in Equation* (9) *characteristic function w*(*ω*, *S*) *given by* (11) *is used. If the conditions of Lemma 1 are satisfied, then D*(*ω*) ⊂ *C*(*ω*)*, and non-emptiness of approximated core D*(*ω*) *implies non-emptiness of core C*(*ω*)*.*

**Example 1.** *Consider a three-player stochastic game with two states (ω*<sub>1</sub> *and ω*<sub>2</sub>*). The sets of actions of players 1, 2, and 3 in state ω*<sub>1</sub> *(ω*<sub>2</sub>*) are* {*a*<sub>1</sub>, *a*<sub>2</sub>}*,* {*b*<sub>1</sub>, *b*<sub>2</sub>} *and* {*c*<sub>1</sub>, *c*<sub>2</sub>} *(*{*α*<sub>1</sub>, *α*<sub>2</sub>}*,* {*ζ*<sub>1</sub>, *ζ*<sub>2</sub>}*,* {*γ*<sub>1</sub>, *γ*<sub>2</sub>}*), respectively. The payoff functions are given by the following matrices:*

• *in state ω*<sub>1</sub>*:*

$$c\_1: \quad \begin{array}{c|cc} & b\_1 & b\_2 \\ \hline a\_1 & (10, 10, 8) & (0, 15, 0) \\ a\_2 & (15, 0, 0) & (5, 5, 5) \end{array} \qquad c\_2: \quad \begin{array}{c|cc} & b\_1 & b\_2 \\ \hline a\_1 & (0, 0, 15) & (2, 4, 4) \\ a\_2 & (4, 4, 2) & (0, 0, 0) \end{array}$$

• *in state ω*<sub>2</sub>*:*

$$\gamma\_1: \quad \begin{array}{c|cc} & \zeta\_1 & \zeta\_2 \\ \hline \alpha\_1 & (2, 1, 1) & (4, 0, 2) \\ \alpha\_2 & (0, 4, 2) & (7, 5, 3) \end{array} \qquad \gamma\_2: \quad \begin{array}{c|cc} & \zeta\_1 & \zeta\_2 \\ \hline \alpha\_1 & (2, 3, 0) & (4, 2, 4) \\ \alpha\_2 & \ldots & (7, 5, 7) \end{array}$$

*Player 1 chooses a row, player 2 chooses a column and player 3 chooses a matrix.*

*The transition probabilities are given by the following matrices:*

• *for state ω*<sub>1</sub>*:*

$$c\_1: \quad \begin{array}{c|cc} & b\_1 & b\_2 \\ \hline a\_1 & (0.5, 0.5) & (0, 1) \\ a\_2 & (0, 1) & (0, 1) \end{array} \qquad c\_2: \quad \begin{array}{c|cc} & b\_1 & b\_2 \\ \hline a\_1 & (0, 1) & (0.5, 0.5) \\ a\_2 & (0.5, 0.5) & (1, 0) \end{array}$$

• *for state ω*<sub>2</sub>*:*

$$\gamma\_1: \quad \begin{array}{ccccc} \zeta\_1 & \zeta\_2 & & & \zeta\_1 & \zeta\_2\\ \varkappa\_1 & \binom{(0,1)}{(1,0)} & (1,0) & & & \\ & \varkappa\_2 & \binom{(0,2,0.8)}{(1,0)} & \binom{(0,1)}{(0,1)} & (1,0) \end{array}$$

*The first (second) element in any entry of the matrix is the probability of transition from the particular state and action profile to state ω*<sub>1</sub> *(state ω*<sub>2</sub>*). One can easily notice that the transitions are probabilistic in state ω*<sub>1</sub> *when the players choose action profiles* (*a*<sub>1</sub>, *b*<sub>1</sub>, *c*<sub>1</sub>)*,* (*a*<sub>2</sub>, *b*<sub>1</sub>, *c*<sub>2</sub>) *and* (*a*<sub>1</sub>, *b*<sub>2</sub>, *c*<sub>2</sub>)*, and in state ω*<sub>2</sub> *when the players choose action profile* (*α*<sub>1</sub>, *ζ*<sub>1</sub>, *γ*<sub>2</sub>)*. All other transitions are deterministic.*

*The discount factor equals 0.9. Cooperative strategy profile η*<sup>∗</sup> = (*η*<sub>1</sub><sup>∗</sup>, *η*<sub>2</sub><sup>∗</sup>, *η*<sub>3</sub><sup>∗</sup>) *is such that*

$$
\eta\_1^\* = (a\_1, \alpha\_2), \quad \eta\_2^\* = (b\_1, \zeta\_2), \quad \eta\_3^\* = (c\_1, \gamma\_2), \tag{14}
$$

*which prescribes every player to choose the first action in state ω*<sub>1</sub> *and the second action in state ω*<sub>2</sub>*. The cooperative strategy profile defines a Markov chain with the structure depicted in Figure 1.*

**Figure 1.** The transition probabilities defined by cooperative strategy profile *η*<sup>∗</sup>.

*The players' payoffs are* (10, 10, 8) *in state ω*<sub>1</sub> *and* (7, 5, 7) *in state ω*<sub>2</sub>*. The maximal total payoff of the players in each state game coincides with the total payoff that the players get in that state when implementing cooperative strategy profile η*<sup>∗</sup>*, i.e., condition* (13) *holds. However, Theorem 1 is also true for the case when this condition is not satisfied.*

*First, we calculate the characteristic function v*(*ω*, *S*) *by Equation* (7) *and its approximation w*(*ω*, *S*) *by* (11) *for state games. The values of these functions are presented in Table 1.*

**Table 1.** Values of characteristic function *v* and approximated characteristic function *w* for states *ω*<sub>1</sub> and *ω*<sub>2</sub>.

*The cores of state games C*(*ω*) *and D*(*ω*)*, calculated with the values of functions v*(*ω*, *S*) *and w*(*ω*, *S*) *by Formulae* (8) *and* (12)*, are non-empty for any ω and are shown in Figures 2 and 3 for ω*<sub>1</sub> *and ω*<sub>2</sub>*, respectively.*

**Figure 2.** The core *C*(*ω*<sub>1</sub>) (gray region) and approximated core *D*(*ω*<sub>1</sub>) (blue region inside gray region) for the *ω*<sub>1</sub> state game.

**Figure 3.** The core *C*(*ω*<sub>2</sub>) (gray region) and approximated core *D*(*ω*<sub>2</sub>) (blue region inside gray region) for the *ω*<sub>2</sub> state game.
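The pure-strategy maxmin computation of Equation (7) can be sketched for the *ω*<sub>1</sub> state game of Example 1. The Python code below is an illustration under our own encoding: players are numbered 0, 1, 2, action index 0 (1) stands for the first (second) action, and the names `PAYOFF` and `maxmin` are hypothetical. The payoff entries are copied from the matrices for state *ω*<sub>1</sub>.

```python
from itertools import product

# Stage payoffs in state omega_1, indexed by (row of player 1, column of player 2,
# matrix of player 3); index 0 is the first action, index 1 the second.
PAYOFF = {
    (0, 0, 0): (10, 10, 8), (0, 1, 0): (0, 15, 0),
    (1, 0, 0): (15, 0, 0),  (1, 1, 0): (5, 5, 5),
    (0, 0, 1): (0, 0, 15),  (0, 1, 1): (2, 4, 4),
    (1, 0, 1): (4, 4, 2),   (1, 1, 1): (0, 0, 0),
}


def maxmin(coalition, payoff=PAYOFF, n_players=3, n_actions=2):
    """Pure-strategy maxmin value of a coalition; players are numbered 0, 1, 2."""
    members = sorted(coalition)
    outsiders = [i for i in range(n_players) if i not in coalition]

    def coalition_sum(own, other):
        # Assemble the full action profile and add up the members' payoffs.
        profile = [None] * n_players
        for i, a in zip(members, own):
            profile[i] = a
        for i, a in zip(outsiders, other):
            profile[i] = a
        return sum(payoff[tuple(profile)][i] for i in members)

    return max(
        min(coalition_sum(own, other)
            for other in product(range(n_actions), repeat=len(outsiders)))
        for own in product(range(n_actions), repeat=len(members))
    )


coalitions = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
values = {S: maxmin(S) for S in coalitions}
```

For the grand coalition the set of outsiders is empty, so the minimum is taken over a single empty action tuple and the function returns the maximal total payoff, 28 = 10 + 10 + 8, attained at the first actions of all players.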

### *2.3. New Approximated Characteristic Function for Stochastic Games*

We propose a new method of determining characteristic function for stochastic games based on the values of approximated characteristic function defined in states and given by Formula (11).

We assume that coalition *S* may obtain at most *ŵ*(*S*) at any state of the game. Accordingly, this value is the maximal value that the coalition can get, regardless of the state that currently occurs. Summing this value over the infinite horizon with discount factor *δ*, we obtain an approximation, namely an upper bound, of the payoff that coalition *S* can get in the stochastic subgame starting from state *ω*:

$$\bar{w}(\omega, S) = \begin{cases} \hat{w}(S) + \delta \hat{w}(S) + \dots = \frac{1}{1 - \delta} \hat{w}(S), & \text{if } S \subset N, S \neq N, \\ \sum\_{i \in N} E\_i^{\omega}(\eta^\*), & \text{if } S = N. \end{cases} \tag{15}$$

Note that, according to Equation (15), we keep the value of the characteristic function for the grand coalition without approximation. The reason is that, when we define the allocation of the joint payoff, the players should redistribute the value that they actually obtain using the cooperative strategy profile, not an approximated one. The cooperative stochastic subgame is defined by the set of players *N* and function (15). In the following, we omit the set of players and refer to the cooperative stochastic subgame as *w̄*(*ω*, *S*) given by (15).

Let *D̄*(*ω*) be the core calculated with the values of function (15), i.e.,

$$\bar{D}(\omega) = \left\{ (\bar{\alpha}\_1(\omega), \dots, \bar{\alpha}\_n(\omega)) : \sum\_{i \in S} \bar{\alpha}\_i(\omega) \geqslant \bar{w}(\omega, S), \ \forall S \subset N, \ \sum\_{i \in N} \bar{\alpha}\_i(\omega) = \bar{w}(\omega, N) \right\}.\tag{16}$$

Let *D̄*(*ω*) be non-empty for any *ω* ∈ Ω. We can compare the core *D̄*(*ω*) constructed with the values of approximated function (15) and the core constructed with the values of the characteristic function defined by the classical approach. For any subgame *G<sup>ω</sup>*, we define the characteristic function using the maxmin approach:

$$\bar{v}(\omega, S) = \max\_{\eta\_S \in \prod\_{j \in S} H\_j} \ \min\_{\eta\_{N \setminus S} \in \prod\_{j \in N \setminus S} H\_j} \ \sum\_{i \in S} E\_i^{\omega} (\eta\_S, \eta\_{N \setminus S}). \tag{17}$$

Let *C̄*(*ω*) be a non-empty core of subgame *G<sup>ω</sup>* constructed with the values of function (17).

**Lemma 2.** *Let the inequality*

$$
\hat{w}(S) < \min\_{\omega \in \Omega} v(\omega, N) \tag{18}
$$

*hold for any coalition S* ⊂ *N, S* ≠ *N, and let D̄*(*ω*) *be non-empty for any ω. Then D̄*(*ω*) ⊂ *C̄*(*ω*)*.*

**Proof.** If *ŵ*(*S*) < min<sub>*ω*∈Ω</sub> *v*(*ω*, *N*) is not satisfied, then the core *D̄*(*ω*) is empty by construction. Consider any imputation *ᾱ*(*ω*) ∈ *D̄*(*ω*) and prove that it belongs to the set *C̄*(*ω*).

First, ∑<sub>*i*∈*N*</sub> *ᾱ<sub>i</sub>*(*ω*) = *w̄*(*ω*, *N*) = ∑<sub>*i*∈*N*</sub> *E<sub>i</sub><sup>ω</sup>*(*η*<sup>∗</sup>) = *v̄*(*ω*, *N*).

Second, we prove that ∑<sub>*i*∈*S*</sub> *ᾱ<sub>i</sub>*(*ω*) ⩾ *v̄*(*ω*, *S*), taking into account that ∑<sub>*i*∈*S*</sub> *ᾱ<sub>i</sub>*(*ω*) ⩾ *w̄*(*ω*, *S*) for any *S* ≠ *N*. To this end, it suffices to prove that *w̄*(*ω*, *S*) ⩾ *v̄*(*ω*, *S*).

By definition, we have

$$\bar{v}(\omega, S) = \max\_{\eta\_S} \min\_{\eta\_{N \setminus S}} \sum\_{i \in S} E\_i^{\omega} (\eta\_S, \eta\_{N \setminus S}),$$

and we write the functional equation for the right-hand side of this equality and obtain the following:

$$\max\_{\eta\_S} \min\_{\eta\_{N \setminus S}} \sum\_{i \in S} E\_i^{\omega} (\eta\_S, \eta\_{N \setminus S}) = \max\_{\eta\_S} \min\_{\eta\_{N \setminus S}} \left\{ \sum\_{i \in S} K\_i^{\omega} (a\_S^{\omega}, a\_{N \setminus S}^{\omega}) + \delta p(\omega, a^{\omega}) \sum\_{i \in S} E\_i (\eta\_S, \eta\_{N \setminus S}) \right\},$$

where *p*(*ω*, *a<sup>ω</sup>*) is the vector (*p*(*ω*′|*ω*, *a<sup>ω</sup>*) : *ω*′ ∈ Ω).

Let profile (*η<sub>S</sub>*, *η*<sub>*N* \ *S*</sub>) be such that the maxmin is reached at this profile. Then we can write the functional equation as follows:

$$\begin{split} \sum\_{i \in S} E\_i^{\omega} (\eta\_{\mathcal{S}}, \eta\_{N \backslash S}) &= (\mathbb{I}\_k - \delta \Pi(\eta\_{\mathcal{S}}, \eta\_{N \backslash S}))^{-1} \sum\_{i \in S} K\_i(a\_{\mathcal{S}}, a\_{N \backslash S}) \leqslant \frac{1}{1 - \delta} \max\_{\omega \in \Omega} \max\_{a\_{\mathcal{S}}} \min\_{a\_{N \backslash S}} \sum\_{i \in S} K\_i^{\omega} (a\_{\mathcal{S}}^{\omega}, a\_{N \backslash S}^{\omega}) \\ &= \frac{1}{1 - \delta} \max\_{\omega \in \Omega} v(\omega, S) = \bar{w}(\omega, S). \end{split}$$

In the last inequality, we use the following property: the sum of the elements in any row of matrix (𝕀<sub>*k*</sub> − *δ*Π(*η<sub>S</sub>*, *η*<sub>*N* \ *S*</sub>))<sup>−1</sup> equals 1/(1 − *δ*), because Π(*η<sub>S</sub>*, *η*<sub>*N* \ *S*</sub>) is a stochastic matrix. The lemma is proved.
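The row-sum property used in the last step can be verified by a short calculation, sketched here for any stochastic matrix Π of size *k* × *k*. The symbol **1**, denoting the column vector of *k* ones, is notation introduced only for this sketch:

$$\left(\mathbb{I}\_k - \delta \Pi\right)^{-1} \mathbf{1} = \sum\_{t=0}^{\infty} \delta^t \Pi^t \mathbf{1} = \sum\_{t=0}^{\infty} \delta^t \mathbf{1} = \frac{1}{1 - \delta} \mathbf{1}.$$

The series converges because *δ* ∈ (0, 1), and Π<sup>*t*</sup>**1** = **1** holds because every power of a stochastic matrix is again stochastic. Since all entries of the inverse matrix are non-negative, each of its rows is 1/(1 − *δ*) times a probability vector, so no component of the product of this matrix with a vector of stage payoffs can exceed 1/(1 − *δ*) times the largest stage payoff.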

**Remark 4.** *We assume non-emptiness of the approximated core D̄*(*ω*) *in the stochastic game with any initial state ω. If condition* (18) *in Lemma 2 is satisfied, the non-emptiness of the approximated cores D*(*ω*) *for any ω implies the non-emptiness of the approximated core D̄*(*ω*)*. This follows from the definition of characteristic function w̄*(*ω*, *S*) *and Formula* (10)*. Moreover, the non-emptiness of the approximated core D̄*(*ω*) *implies the non-emptiness of the core C̄*(*ω*)*.*

**Example 2.** *(continuation of Example 1) We continue the calculations for the stochastic game described in Example 1. Define characteristic function v̄ by* (17) *and approximated characteristic function w̄ by* (15)*. The values of these functions are given in Table 2.*

**Table 2.** Values of characteristic function *v̄* and approximated characteristic function *w̄* for the stochastic game starting from states *ω*<sub>1</sub> and *ω*<sub>2</sub>.


*The cores C̄*(*ω*) *and D̄*(*ω*) *constructed with the values of functions v̄ and w̄, respectively, are non-empty and depicted in Figures 4 and 5 for initial states ω*<sub>1</sub> *and ω*<sub>2</sub>*, respectively. One can notice that D̄*(*ω*) ⊂ *C̄*(*ω*) *for any ω.*

**Figure 4.** The core *C̄*(*ω*<sub>1</sub>) (gray region) and approximated core *D̄*(*ω*<sub>1</sub>) (blue region inside gray region) in the stochastic game with initial state *ω*<sub>1</sub>.

*The approximated core D̄*(*ω*<sub>1</sub>) *is defined as the set*

$$\begin{split} \bar{D}(\omega\_1) = \big\{ (\bar{\alpha}\_1, \bar{\alpha}\_2, \bar{\alpha}\_3) : \ & \bar{\alpha}\_1 + \bar{\alpha}\_2 + \bar{\alpha}\_3 = 252.07, \ \bar{\alpha}\_1 + \bar{\alpha}\_2 \geqslant 120.00, \ \bar{\alpha}\_1 + \bar{\alpha}\_3 \geqslant 100.00, \\ & \bar{\alpha}\_2 + \bar{\alpha}\_3 \geqslant 100.00, \ \bar{\alpha}\_1 \geqslant 20.00, \ \bar{\alpha}\_2 \geqslant 10.00, \ \bar{\alpha}\_3 \geqslant 10.00 \big\}. \end{split}$$

*The approximated core D̄*(*ω*<sub>2</sub>) *is defined as the set*

$$\begin{split} \bar{D}(\omega\_2) = \big\{ (\bar{\alpha}\_1, \bar{\alpha}\_2, \bar{\alpha}\_3) : \ & \bar{\alpha}\_1 + \bar{\alpha}\_2 + \bar{\alpha}\_3 = 245.86, \ \bar{\alpha}\_1 + \bar{\alpha}\_2 \geqslant 120.00, \ \bar{\alpha}\_1 + \bar{\alpha}\_3 \geqslant 100.00, \\ & \bar{\alpha}\_2 + \bar{\alpha}\_3 \geqslant 100.00, \ \bar{\alpha}\_1 \geqslant 20.00, \ \bar{\alpha}\_2 \geqslant 10.00, \ \bar{\alpha}\_3 \geqslant 10.00 \big\}. \end{split}$$

**Figure 5.** The core *C̄*(*ω*<sub>2</sub>) (gray region) and approximated core *D̄*(*ω*<sub>2</sub>) (blue region inside gray region) in the stochastic game with initial state *ω*<sub>2</sub>.
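The grand coalition values 252.07 and 245.86 in the sets above can be reproduced with a short fixed-point iteration of Equation (3). The Python sketch below rests on one assumption that should be stated loudly: under the cooperative profile, state *ω*<sub>2</sub> is taken to lead to *ω*<sub>1</sub> with probability 1. This is our reading of the transition data of Example 1, and it is the reading that reproduces the reported totals. The names `discounted_totals` and `in_approx_core` are hypothetical.

```python
DELTA = 0.9

# Total stage payoffs under the cooperative profile:
# (10, 10, 8) in omega_1 and (7, 5, 7) in omega_2.
STAGE_TOTAL = [10 + 10 + 8, 7 + 5 + 7]

# Transition rows under the cooperative profile.
# Row 0 is the entry for (a1, b1, c1) stated in Example 1.
# Row 1 is an ASSUMPTION: omega_2 -> omega_1 with probability 1.
PI_STAR = [[0.5, 0.5],
           [1.0, 0.0]]


def discounted_totals(stage, Pi, delta, sweeps=2000):
    """Iterate E <- stage + delta * Pi E until it has effectively converged."""
    E = [0.0] * len(stage)
    for _ in range(sweeps):
        E = [stage[s] + delta * sum(Pi[s][t] * E[t] for t in range(len(stage)))
             for s in range(len(stage))]
    return E


def in_approx_core(x, total, tol=0.005):
    """Membership test for the approximated cores listed above; x = (x1, x2, x3)."""
    x1, x2, x3 = x
    return (abs(x1 + x2 + x3 - total) <= tol
            and x1 + x2 >= 120.0 and x1 + x3 >= 100.0 and x2 + x3 >= 100.0
            and x1 >= 20.0 and x2 >= 10.0 and x3 >= 10.0)


E1, E2 = discounted_totals(STAGE_TOTAL, PI_STAR, DELTA)
equal_split = (E1 / 3, E1 / 3, E1 / 3)
split_is_in_core = in_approx_core(equal_split, E1)
```

The coalition bounds 120, 100, 100, 20, 10, 10 are the values *ŵ*(*S*) multiplied by 1/(1 − *δ*) = 10, as prescribed by Equation (15). The equal split of the total from *ω*<sub>1</sub> satisfies all of them, which gives one concrete element of the approximated core.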

### **3. Strongly Subgame-Consistent Core in Stochastic Games**

### *3.1. Imputation Distribution Procedure*

In cooperation, the players follow the cooperative strategy profile *η*<sup>∗</sup> and agree on the core as the cooperative solution of the game, i.e., as the set of possible imputations of the joint payoff. We assume that the core for any subgame *G<sup>ω</sup>* is calculated based on function (15), i.e., it is *D̄*(*ω*). Consider an imputation *ᾱ*(*ω*) ∈ *D̄*(*ω*). If the players are paid step by step according to the initially given payoff functions *K<sub>i</sub><sup>ω</sup>*, *i* ∈ *N*, we cannot guarantee that they will get the components of imputation *ᾱ*(*ω*) as expected payoffs in subgame *G<sup>ω</sup>*. Therefore, we define a scheme of state payments that, in total, allows the players to obtain the components of imputation *ᾱ*(*ω*).

**Definition 1.** *[10,11] We call the collection of vectors* (*β<sub>i</sub>* : *i* ∈ *N*)*, where β<sub>i</sub>* = (*β<sub>i</sub>*(*ω*<sub>1</sub>), . . . , *β<sub>i</sub>*(*ω<sub>k</sub>*)) *and β<sub>i</sub>*(*ω*) *is a payment to player i in state ω in the cooperative stochastic game, an imputation distribution procedure (IDP) of imputation ᾱ*(*ω*) ∈ *D̄*(*ω*) *if*


The expected sum of payments to player *i* made according to the IDP can be calculated by the formula (see [14]):

$$B\_i^{\omega} = \pi\_0 (\mathbb{I}\_k - \delta \Pi(\eta^\*))^{-1} \beta\_i,$$

where *π*<sub>0</sub> is such that *π*<sub>0</sub><sup>*ω*</sup> = 1 and *π*<sub>0</sub><sup>*ω*′</sup> = 0 for any *ω*′ ≠ *ω*.

**Remark 5.** *The IDP determined in Definition 1 for an imputation ᾱ*(*ω*) ∈ *D̄*(*ω*) *may be non-unique.*

In the following section, we describe a property of the imputations from the core and of the corresponding IDP that allows us to narrow the set of IDPs.

### *3.2. Strongly Subgame-Consistent Core*

We formulate the property of strong subgame consistency of the core and propose sufficient conditions for strong subgame consistency of the core in stochastic games with characteristic function (15). We suppose that the cores of stochastic game *G* and of any subgame *G<sup>ω</sup>*, *ω* ∈ Ω, are non-empty.

In cooperation, the players agree on the joint implementation of the cooperative strategy profile *η*<sup>∗</sup> and expect to obtain the components of an imputation belonging to the core *D̄*(*ω*) in the subgame starting from *ω*. Reaching an intermediate state *ω* ∈ Ω, player *i* chooses action *a*<sub>*i*</sub><sup>*ω*∗</sup> prescribed by the cooperative strategy profile *η*<sup>∗</sup> and gets payoff *K<sub>i</sub><sup>ω</sup>*(*a*<sup>*ω*∗</sup>). If the players recalculate the solution in the current subgame and find the solution of cooperative subgame *G<sup>ω</sup>*, we assume that the cooperative solution is chosen from the core *D̄*(*ω*). It would be reasonable to require that the payoff received by a player in state *ω*, summed with the expected value of any imputations from the cores *D̄*(*ω*′), *ω*′ ∈ Ω, of the subgames following state *ω*, be an imputation from the core *D̄*(*ω*). If this property holds for any intermediate state *ω* ∈ Ω, then the core of the cooperative stochastic game with characteristic function (15) is strongly subgame-consistent.

To determine a strongly subgame-consistent core, we need to define the so-called expected core at state *ω*, i.e., we define the set of expected imputations belonging to the cores, which are cooperative solutions of the following subgames. We determine the expected core of state *ω* ∈ Ω, as follows:

$$E\bar{D}(\omega) = \left\{ \delta \sum\_{\omega' \in \Omega} p(\omega'|\omega, a^{\omega \*}) \bar{\alpha}(\omega'), \quad \bar{\alpha}(\omega') \in \bar{D}(\omega') \right\}.$$
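To make the construction concrete, the following minimal Python sketch computes a single element of the expected core: it takes one imputation from the core of each successor subgame and forms the discounted expectation. The function name `expected_core_element`, the discount factor 0.9, and the transition row are illustrative assumptions; the imputations are those chosen in Example 3. The expected core itself is the set of all such vectors as the chosen imputations range over the cores.

```python
def expected_core_element(delta, transition_row, imputations):
    """Return delta * sum_{w'} p(w' | w, a^{w*}) * alpha(w').

    transition_row[j] is p(w_j | w, a^{w*}); imputations[j] is one
    imputation picked from the core of the subgame starting at w_j.
    """
    n_players = len(imputations[0])
    return [
        delta * sum(p * alpha[i] for p, alpha in zip(transition_row, imputations))
        for i in range(n_players)
    ]


# Illustrative data (assumptions): delta = 0.9 and transition row (0.5, 0.5);
# the two imputations are the ones chosen in Example 3.
delta = 0.9
row_from_w1 = [0.5, 0.5]
chosen = [[100.00, 100.00, 52.07], [50.00, 95.86, 100.00]]

element = expected_core_element(delta, row_from_w1, chosen)
print([round(x, 4) for x in element])
```

Choosing different imputations from the cores of the successor subgames gives other elements of the same expected core.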

**Definition 2.** *We call the core D̄(ω) a strongly subgame-consistent solution of the cooperative stochastic game with approximated characteristic function w̄(ω, S) starting from state ω if for any imputation ᾱ(ω) ∈ D̄(ω) there exists an IDP β = (β<sub>i</sub> : i ∈ N), where β<sub>i</sub> = (β<sub>i</sub>(ω) : ω ∈ Ω), satisfying the condition:*

$$
\beta \oplus E\bar{D} \subset \bar{D},
\tag{19}
$$

*where ED̄ is the vector (ED̄(ω<sub>1</sub>), ..., ED̄(ω<sub>k</sub>)) of expected cores for states ω<sub>1</sub>, ..., ω<sub>k</sub>, respectively, and D̄ is a vector whose elements are sets, i.e., D̄ = (D̄(ω<sub>1</sub>), ..., D̄(ω<sub>k</sub>)).*

**Remark 6.** *The inclusion* (19) *is written in a vector form. To explain it, we write the first row of vector inclusion* (19)*:*

$$
\beta(\omega\_1) \oplus E\bar{D}(\omega\_1) \subset \bar{D}(\omega\_1),
$$

*where β(ω<sub>1</sub>) ∈ ℝ<sup>n</sup>, ED̄(ω<sub>1</sub>) ⊂ ℝ<sup>n</sup>, and D̄(ω<sub>1</sub>) ⊂ ℝ<sup>n</sup>. The operation a ⊕ C, where a ∈ ℝ<sup>n</sup> and C is a set in ℝ<sup>n</sup>, is defined as the set {a + c : c ∈ C}.*
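The operation ⊕ translates every element of a set by the same vector. A minimal Python sketch for a finite set is given below; the function name `translate_set` and the sample data are hypothetical, and a finite set is used only for illustration, since a core is in general an infinite set.

```python
def translate_set(a, C):
    """Return a ⊕ C = {a + c : c in C} for a finite set C of vectors."""
    return {tuple(ai + ci for ai, ci in zip(a, c)) for c in C}


a = (1, 2)
C = {(0, 0), (3, -1)}
shifted = translate_set(a, C)
print(sorted(shifted))  # [(1, 2), (4, 1)]
```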

**Theorem 1.** *The core D*¯ (*ω*)*, if it exists, is strongly subgame-consistent.*

**Proof.** Following Definition 2 we need to prove that there exists an IDP of the elements from the core *D*¯ (*ω*) defined in (16) satisfying two properties from Definition 1, such that inclusion (19) is true.

For any imputation *ᾱ*(*ω*) = (*ᾱ<sub>i</sub>*(*ω*) : *i* ∈ *N*) ∈ *D̄*(*ω*), let the IDP be calculated as

$$\beta\_i = (\mathbb{I}\_k - \delta \Pi(\eta^\*)) \bar{\alpha}\_i, \tag{20}$$

where *β<sub>i</sub>* = (*β<sub>i</sub>*(*ω*<sub>1</sub>), ..., *β<sub>i</sub>*(*ω<sub>k</sub>*))′ and *ᾱ<sub>i</sub>* = (*ᾱ<sub>i</sub>*(*ω*<sub>1</sub>), ..., *ᾱ<sub>i</sub>*(*ω<sub>k</sub>*))′.

First, we prove that *β*, defined in (20), satisfies properties 1 and 2 in Definition 1.

1. Summing *β<sub>i</sub>* over the set of players, we obtain

$$\begin{split} \sum\_{i \in N} \beta\_i &= \left(\mathbb{I}\_k - \delta \Pi(\eta^\*)\right) \sum\_{i \in N} \bar{\alpha}\_i = \left(\mathbb{I}\_k - \delta \Pi(\eta^\*)\right) \left(\bar{w}(\omega\_1, N), \dots, \bar{w}(\omega\_k, N)\right)' \\ &= \left(\mathbb{I}\_k - \delta \Pi(\eta^\*)\right) \left(\mathbb{I}\_k - \delta \Pi(\eta^\*)\right)^{-1} \sum\_{i \in N} K\_i(a^\*) = \sum\_{i \in N} K\_i(a^\*), \end{split}$$

or, for any *ω* ∈ Ω, the equality ∑<sub>*i*∈*N*</sub> *β<sub>i</sub>*(*ω*) = ∑<sub>*i*∈*N*</sub> *K<sub>i</sub><sup>ω</sup>*(*a*<sup>*ω*∗</sup>) is true.

2. We prove that *ᾱ<sub>i</sub>*(*ω*) = *B<sub>i</sub><sup>ω</sup>* or, in vector form, *ᾱ<sub>i</sub>* = *B<sub>i</sub>*, where *B<sub>i</sub>* = (*B<sub>i</sub>*(*ω*<sub>1</sub>), ..., *B<sub>i</sub>*(*ω<sub>k</sub>*))′. We have

$$B\_i = (\mathbb{I}\_k - \delta \Pi(\eta^\*))^{-1} \beta\_i = (\mathbb{I}\_k - \delta \Pi(\eta^\*))^{-1} (\mathbb{I}\_k - \delta \Pi(\eta^\*)) \bar{\alpha}\_i = \bar{\alpha}\_i.$$

Therefore, the payment vector *β<sub>i</sub>*, *i* ∈ *N*, is the distribution procedure of imputation *ᾱ<sub>i</sub>*.

Now, we prove that inclusion (19) holds. Let *β<sub>i</sub>* be given by Equation (20), where *ᾱ<sub>i</sub>* = (*ᾱ<sub>i</sub>*(*ω*<sub>1</sub>), ..., *ᾱ<sub>i</sub>*(*ω<sub>k</sub>*))′ and *ᾱ*(*ω<sub>j</sub>*) ∈ *D̄*(*ω<sub>j</sub>*) for any *j* = 1, ..., *k*. Consider the sum *β*(*ω*) + *ε*(*ω*), where *ε*(*ω*) is any vector from the expected core *ED̄*(*ω*). Substituting the expression for *β<sub>i</sub>* from Equation (20) and an element of the expected core into the sum, we get

$$
\beta + \varepsilon = (\mathbb{I}\_k - \delta \Pi(\eta^\*))\bar{\alpha} + \delta \Pi(\eta^\*)\bar{\alpha} = \bar{\alpha} \in \bar{D},
$$

which proves the theorem.

Theorem 1 gives a method of constructing a payment scheme for any element of the core *D̄* defined by (16), using the values of function (15).

**Example 3.** *(continuation of Examples 1 and 2) We demonstrate how to define an IDP using the method from the proof of Theorem 1. Let the core imputations ᾱ(ω<sub>1</sub>) = (100.00, 100.00, 52.07) ∈ D̄(ω<sub>1</sub>) and ᾱ(ω<sub>2</sub>) = (50.00, 95.86, 100.00) ∈ D̄(ω<sub>2</sub>) be chosen for ω<sub>1</sub> and ω<sub>2</sub>. To calculate the IDP by Formula* (20)*, we need to define the matrix* Π(*η*<sup>∗</sup>)*, which is*

$$
\Pi(\eta^\*) = \begin{pmatrix} 0.5 & 0.5 \\ 1 & 0 \end{pmatrix}
$$

*for cooperative strategy profile η*∗ *determined by* (14)*.*

*Using formula* (20) *with ᾱ<sub>1</sub> = (100.00, 50.00), ᾱ<sub>2</sub> = (100.00, 95.86), ᾱ<sub>3</sub> = (52.07, 100.00), we obtain*

$$\begin{aligned} \beta\_1 &= (32.50, -40.00), \\ \beta\_2 &= (11.86, 5.86), \\ \beta\_3 &= (-16.36, 53.14), \end{aligned}$$

*where the first component of vector β<sub>i</sub> is the payment to player i in state ω<sub>1</sub> and the second component is the payment in state ω<sub>2</sub>. We can easily check that the collection of vectors (β<sub>i</sub> : i ∈ N) satisfies the conditions from Definition 1 of IDP.*

*The approximated cores D̄(ω<sub>1</sub>) and D̄(ω<sub>2</sub>) are strongly subgame-consistent, which is proved in Theorem 1.*
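The computation in Example 3 can be retraced with a short script. The sketch below is only an illustration under stated assumptions: it takes *δ* = 0.9 and a transition matrix Π(*η*<sup>∗</sup>) with rows (0.5, 0.5) and (1, 0), which are the values under which formula (20) reproduces the payments listed in the example (the discount factor is not restated in this example). The helper names `mat_vec`, `idp`, and `expected_sum` are hypothetical.

```python
def mat_vec(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def idp(delta, Pi, alpha_i):
    """Formula (20): beta_i = (I - delta * Pi) alpha_i."""
    k = len(Pi)
    A = [[(1.0 if r == c else 0.0) - delta * Pi[r][c] for c in range(k)]
         for r in range(k)]
    return mat_vec(A, alpha_i)


def expected_sum(delta, Pi, beta_i, iterations=2000):
    """Solve B = beta_i + delta * Pi B by fixed-point iteration, which
    converges to (I - delta * Pi)^{-1} beta_i for delta in (0, 1)."""
    B = [0.0] * len(beta_i)
    for _ in range(iterations):
        B = [b + delta * x for b, x in zip(beta_i, mat_vec(Pi, B))]
    return B


# Assumed inputs: delta = 0.9 and the second row (1, 0) of Pi are the values
# that reproduce the payments printed in Example 3.
delta = 0.9
Pi = [[0.5, 0.5], [1.0, 0.0]]
alpha = {1: [100.00, 50.00], 2: [100.00, 95.86], 3: [52.07, 100.00]}

beta = {i: idp(delta, Pi, a) for i, a in alpha.items()}
for i, b in beta.items():
    print(i, [round(x, 2) for x in b])

# Property 1 of Definition 1 compares these per-state sums of payments
# with the joint stage payoffs under the cooperative strategy profile.
stage_totals = [sum(beta[i][s] for i in beta) for s in range(2)]
print([round(x, 2) for x in stage_totals])

# Property 2: the expected discounted sum of payments returns the imputation.
recovered = {i: expected_sum(delta, Pi, b) for i, b in beta.items()}
print({i: [round(x, 2) for x in r] for i, r in recovered.items()})
```

Rounded to two decimals, the printed payments coincide with the vectors *β*<sub>1</sub>, *β*<sub>2</sub>, *β*<sub>3</sub> listed in the example, and the recovered expected sums coincide with the chosen imputation components.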

**Remark 7.** *The new method of constructing the characteristic function, the so-called approximated characteristic function proposed in the paper, not only allows one to find a strongly subgame-consistent subset of the core but also simplifies calculations. In the example, each player has two actions in any state. Therefore, each player has four pure stationary strategies in the stochastic game, and there are 64 strategy profiles in the game. Calculating the maxmin payoff of a coalition in such games is a complicated computational problem. The new approach avoids these calculations by using the values of the approximated characteristic function defined in state games to determine the function for the stochastic game.*
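The counts in Remark 7 can be checked by direct enumeration. The sketch below assumes the setting of the example (two states, three players, two actions per player in every state); the state and action labels are hypothetical.

```python
from itertools import product

states = ["w1", "w2"]   # two states, as in the example
actions = [1, 2]        # two actions per player in every state (labels hypothetical)
players = [1, 2, 3]

# A pure stationary strategy picks one action for each state.
stationary_strategies = list(product(actions, repeat=len(states)))

# A strategy profile picks one stationary strategy for each player.
profiles = list(product(stationary_strategies, repeat=len(players)))

print(len(stationary_strategies), len(profiles))  # 4 64
```

The number of profiles grows exponentially in the number of states and players, which is why evaluating a maxmin value over all of them quickly becomes costly.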

### **4. Conclusions**

We have proposed a new method of constructing the characteristic function in stochastic games. The method simplifies calculations in comparison with the previously introduced approaches. An additional advantage of the method is that the core calculated with the values of this characteristic function satisfies strong subgame consistency. This property positively characterizes the realization of the imputations from the core in a dynamic game process. The property of strong subgame consistency is applied to set-valued cooperative solutions, like the core. We can briefly characterize the possible directions for future research in this area. One can consider additional simplifications in the definition of the characteristic function that not only keep the strong subgame consistency of the core but also reduce the number of calculations needed to define the cooperative stochastic game.

**Author Contributions:** Conceptualization, E.P. and L.P.; methodology, E.P. and L.P.; software, E.P. and L.P.; validation, E.P. and L.P.; formal analysis, E.P. and L.P.; investigation, E.P. and L.P.; resources, E.P. and L.P.; data curation, E.P. and L.P.; writing—original draft preparation, E.P. and L.P.; writing—review and editing, E.P. and L.P.; visualization, E.P. and L.P.; supervision, E.P. and L.P.; project administration, E.P. and L.P.; funding acquisition, E.P. and L.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** The work was supported by Russian Science Foundation, grant no. 17-11-01079.

**Conflicts of Interest:** The authors declare no conflict of interest.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
