*Article* **Rational Behavior in Dynamic Multicriteria Games**

### **Anna Rettieva 1,2,3,4**


Received: 25 May 2020; Accepted: 27 August 2020; Published: 2 September 2020

**Abstract:** We consider a dynamic, discrete-time game model where *n* players use a common resource and have different criteria to optimize. To construct a multicriteria Nash equilibrium, the bargaining solution is adopted. To design a multicriteria cooperative equilibrium, a modified bargaining scheme that guarantees the fulfillment of rationality conditions is applied. The concept of dynamic stability is adopted for dynamic multicriteria games. To stabilize the multicriteria cooperative solution, a time-consistent payoff distribution procedure is constructed. The conditions for rational behavior, namely the irrational-behavior-proofness condition and the each step rational behavior condition, are defined for dynamic multicriteria games. To illustrate the presented approaches, a dynamic bi-criteria bioresource management problem with many players is investigated.

**Keywords:** dynamic games; multicriteria games; Nash bargaining solution; dynamic stability; rational behavior conditions

**MSC:** 22E46

### **1. Introduction**

Mathematical models involving more than one objective [1] seem closer to real problems. Players often seek to achieve several goals simultaneously, and these goals can be incomparable. Such situations are typical for game-theoretic models in economics and ecology. For example, in bioresource management problems the players wish to maximize their exploitation rates and to minimize the harm to the environment. The multicriteria approach allows determining an optimal behavior in such situations.

In this paper, we consider a dynamic, discrete-time game model where the players use a common resource and have different criteria to optimize. First, we construct a multicriteria Nash equilibrium applying the bargaining concept (via Nash products [2,3]). Then, we find a multicriteria cooperative equilibrium as a solution of a modified bargaining scheme with the multicriteria Nash equilibrium payoffs playing the role of status quo points [4,5]. The presented approach guarantees that the cooperative payoffs of the players are greater than or equal to the multicriteria Nash payoffs.

As is well known, in ecological problems, cooperative behavior leads to a more sparing harvesting rate. The special importance of cooperative behavior for "common resource" exploitation was stressed by Nobel Prize winner Ostrom E. [6]. A contract that satisfies the dynamic stability (time-consistency) condition [7,8] is concluded to maintain cooperative behavior. Haurie A. [9] raised the problem of instability of the Nash bargaining solution. The concept of time-consistency (dynamic stability) was introduced by Petrosyan L.A. [7]. Time consistency means that, as the cooperation develops, participants are guided by the same optimality principle at each time moment and hence have no incentive to deviate from cooperation. Petrosyan L.A. and Danilov N.N. [10] developed the notion of a time-consistent imputation distribution procedure.

An important problem arising in applications is to maintain cooperative behavior. Cooperative behavior and the dynamic stability of cooperative solutions were investigated in a number of papers; see [8,11–15]. Related works that study cooperation processes in the biological, economic and social sciences are [16,17]. The extensive form of a multicriteria multistage game and the realization of IDP-related concepts were studied in [18,19]. Here, we adopt the concept of dynamic stability for dynamic multicriteria games and construct the payoff distribution procedure.

Even under a cooperative agreement, some irrational players may break the cooperation. To indemnify players against the loss of profits in this case, Yeung D.W.K. [20] introduced the irrational-behavior-proofness condition. In [21,22], the each step rational behavior condition, which is stronger than Yeung's condition and easier to verify, was presented. We adopt both conditions for rational behavior for dynamic multicriteria games.

To illustrate the presented approaches, a dynamic bi-criteria bioresource management problem with many players is investigated.

The remainder of the paper has the following structure. Section 2 describes the noncooperative and cooperative solution concepts for a finite horizon multicriteria dynamic game with many players in discrete time. The time-consistent imputation distribution procedure for dynamic multicriteria game is presented in Section 2.3. The conditions for rational behavior are constructed in Section 2.4. A bi-criteria discrete-time dynamic bioresource management model (harvesting problem) with a finite planning horizon is treated in Section 3. Finally, Section 4 provides the basic results and their discussion.

### **2. Dynamic Multicriteria Game with Finite Horizon**

Consider a multicriteria dynamic game with finite horizon in discrete time. Let *N* = {1, ... , *n*} players exploit a common resource for *k* different goals. The state dynamics is in the form

$$x\_{t+1} = f(x\_t, u\_{1t}, \dots, u\_{nt}), \quad x\_0 = x, \tag{1}$$

where *xt* ≥ 0 denotes the quantity of resource at a time *t* ≥ 0, *f*(*xt*, *u*1*t*, ... , *unt*) is the natural growth function, and *uit* ∈ *Ui* = [0, ∞) specifies the strategy (resource exploitation rate) of player *i* at a time *t* ≥ 0, *i* ∈ *N*.

Denote *ut* = (*u*1*t*, ... , *unt*). Each player has *k* goals to optimize. The vector payoff functions of players on a finite planning horizon [0, *m*] have the form

$$J\_i = \begin{pmatrix} J\_i^1 = \sum\_{t=0}^m \delta^t g\_i^1(u\_t) \\ \dots \\ J\_i^k = \sum\_{t=0}^m \delta^t g\_i^k(u\_t) \end{pmatrix}, i \in N,\tag{2}$$

where $g\_i^j(u\_t) \ge 0$ are the instantaneous payoff functions, *j* = 1, ... , *k*, *i* ∈ *N*, and *δ* ∈ (0, 1) denotes the discount factor.
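As a minimal sketch of how (1) and (2) fit together, the following Python fragment rolls the state forward and accumulates the discounted vector payoffs. The function name `simulate`, the linear growth function and the sample instantaneous payoffs are illustrative assumptions and are not part of the model.

```python
def simulate(x0, controls, f, g, delta):
    """Roll dynamics (1) forward and accumulate the vector payoffs (2).

    controls[t] is the profile u_t = (u_1t, ..., u_nt) for t = 0..m,
    f(x, u) is the growth function, and g(i, u) returns the list of the
    k instantaneous payoffs of player i.
    """
    n = len(controls[0])
    k = len(g(0, controls[0]))
    payoffs = [[0.0] * k for _ in range(n)]
    states = [x0]
    for t, u in enumerate(controls):
        for i in range(n):
            for j, value in enumerate(g(i, u)):
                payoffs[i][j] += delta ** t * value
        states.append(f(states[-1], u))
    return states, payoffs


# Assumed data: linear growth, two players, two criteria, horizon m = 1.
f = lambda x, u: 1.2 * x - sum(u)
g = lambda i, u: [u[i], u[i] ** 2]
states, payoffs = simulate(10.0, [(1.0, 2.0), (1.0, 2.0)], f, g, delta=0.5)
```

Here `payoffs[i][j]` plays the role of $J\_i^j$ and `states` holds $x\_0, \dots, x\_{m+1}$.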

### *2.1. Multicriteria Nash Equilibrium*

We design the noncooperative behavior in dynamic multicriteria game applying the Nash bargaining products [2,3]. Therefore, we begin with the construction of guaranteed payoffs which play the role of status quo points.

The possible concepts to determine the guaranteed payoffs for the game with two players were presented in [2]. As it was demonstrated, the case in which the guaranteed payoffs are determined as the Nash equilibrium solutions is the best for the ecological system and also profitable for the players. Therefore, for the multicriteria game with *n* players we adopt this concept of guaranteed payoff points construction. Namely

$G\_1^1, \dots, G\_n^1$ are the Nash equilibrium payoffs in the dynamic game $\langle x, N, \{U\_i\}\_{i=1}^n, \{J\_i^1\}\_{i=1}^n \rangle$,

...

$G\_1^k, \dots, G\_n^k$ are the Nash equilibrium payoffs in the dynamic game $\langle x, N, \{U\_i\}\_{i=1}^n, \{J\_i^k\}\_{i=1}^n \rangle$,

where the state dynamics is described by (1). Please note that if the Nash equilibrium is not unique, one of the solutions is taken as guaranteed payoff points.

To construct multicriteria payoff functions we adopt the Nash products. The role of the status quo points belongs to the guaranteed payoffs of the players:

$$H\_1(u\_{1t}, \ldots, u\_{nt}) = (J\_1^1(u\_{1t}, \ldots, u\_{nt}) - G\_1^1) \cdot \ldots \cdot (J\_1^k(u\_{1t}, \ldots, u\_{nt}) - G\_1^k),$$

$$\dots$$

$$H\_n(u\_{1t}, \ldots, u\_{nt}) = (J\_n^1(u\_{1t}, \ldots, u\_{nt}) - G\_n^1) \cdot \ldots \cdot (J\_n^k(u\_{1t}, \ldots, u\_{nt}) - G\_n^k).$$

**Definition 1.** *A strategy profile* $u\_t^N = (u\_{1t}^N, \dots, u\_{nt}^N)$ *is a multicriteria Nash equilibrium [2] of problem (1), (2) if*

$$H\_i(u\_t^N) \ge H\_i(u\_{1t}^N, \dots, u\_{i-1\,t}^N, u\_{it}, u\_{i+1\,t}^N, \dots, u\_{nt}^N) \quad \forall u\_{it} \in U\_i, \ i \in N. \tag{3}$$

As it is demonstrated in Appendix A, the presented approach guarantees that the noncooperative payoffs of the players are greater than or equal to the guaranteed ones (for bi-criteria game for simplicity). Hence, the scheme for noncooperative behavior construction is meaningful since multicriteria payoff functions are nonnegative.
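The Nash products above can be sketched in a few lines of Python. The helper name `nash_products` and the numbers are assumptions made for illustration only; the payoffs would in practice come from evaluating (2) along a strategy profile.

```python
from math import prod


def nash_products(J, G):
    """Return H_i = prod_j (J_i^j - G_i^j) for every player i.

    J[i][j] is player i's payoff in criterion j under some strategy profile,
    G[i][j] is the corresponding guaranteed (status quo) payoff.
    """
    return [
        prod(J_ij - G_ij for J_ij, G_ij in zip(J_i, G_i))
        for J_i, G_i in zip(J, G)
    ]


# Assumed numbers for two players and two criteria.
J = [[5.0, 3.0], [4.0, 6.0]]
G = [[2.0, 1.0], [1.0, 2.0]]
H = nash_products(J, G)  # [(5 - 2) * (3 - 1), (4 - 1) * (6 - 2)]
```

Checking condition (3) then amounts to comparing `H[i]` at the candidate profile with its value after any unilateral change of player *i*'s strategy.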

### *2.2. Multicriteria Cooperative Equilibrium*

The cooperative equilibrium was obtained as a solution of the Nash bargaining scheme in [23,24]. For the multicriteria dynamic games, the Nash product with the sums of players' payoffs for the criteria in which the sums of their noncooperative payoffs act as the status quo points was applied in [3,4]. In [5], a new approach to determine cooperative behavior in dynamic multicriteria game with asymmetric players was presented. More specifically, the cooperative strategies and payoffs of players are determined from the modified bargaining solution for the entire game horizon. The status quo points are the noncooperative payoffs obtained by the players using the multicriteria Nash equilibrium strategies $u\_t^N$:

$$J\_1^N = \left( \begin{array}{c} J\_1^{1N} = \sum\_{t=0}^m \delta^t g\_1^1(u\_t^N) \\ \dots \\ J\_1^{kN} = \sum\_{t=0}^m \delta^t g\_1^k(u\_t^N) \end{array} \right) , \dots , J\_n^N = \left( \begin{array}{c} J\_n^{1N} = \sum\_{t=0}^m \delta^t g\_n^1(u\_t^N) \\ \dots \\ J\_n^{kN} = \sum\_{t=0}^m \delta^t g\_n^k(u\_t^N) \end{array} \right) . \tag{4}$$

The cooperative strategies and payoffs are constructed by solving the following problem:

$$\begin{aligned} & (V\_1^{1c} - J\_1^{1N}) \cdot \dots \cdot (V\_1^{kc} - J\_1^{kN}) + \dots + (V\_n^{1c} - J\_n^{1N}) \cdot \dots \cdot (V\_n^{kc} - J\_n^{kN}) = \\ & = (\sum\_{t=0}^m \delta^t g\_1^1(u\_{1t}^c, \dots, u\_{nt}^c) - J\_1^{1N}) \cdot \dots \cdot (\sum\_{t=0}^m \delta^t g\_1^k(u\_{1t}^c, \dots, u\_{nt}^c) - J\_1^{kN}) + \dots \\ & + (\sum\_{t=0}^m \delta^t g\_n^1(u\_{1t}^c, \dots, u\_{nt}^c) - J\_n^{1N}) \cdot \dots \cdot (\sum\_{t=0}^m \delta^t g\_n^k(u\_{1t}^c, \dots, u\_{nt}^c) - J\_n^{kN}) \to \max\_{u\_{1t}^c, \dots, u\_{nt}^c}, \end{aligned} \tag{5}$$

where $J\_i^{jN}$ are the noncooperative payoffs given by (4), *i* ∈ *N*, *j* = 1, . . . , *k*.

**Definition 2.** *A strategy profile* $u\_t^c = (u\_{1t}^c, \dots, u\_{nt}^c)$ *is a rational multicriteria cooperative equilibrium [5] of problem (1), (2) if it is the solution of problem (5).*

As was demonstrated in [5], with the presented approach, the cooperative payoffs of the players are greater than or equal to the multicriteria Nash payoffs. Hence, the conditions of individual rationality $V\_i^{jc} \ge J\_i^{jN}$, *i* = 1, . . . , *n*, *j* = 1, . . . , *k* are fulfilled.

### *2.3. Dynamic Stability of Cooperative Solution*

Classically, the solution optimality principle for a cooperative game includes: (1) an agreement on a set of cooperative controls, (2) a mechanism to distribute the total payoff among the players. In the cooperative setting, players seek a set of strategies that yields a Pareto optimal solution; hence they maximize the sum of their individual payoffs. To determine the share of each player in the total payoff, which is called the imputation, solution concepts such as the NM-solution, the core and the Shapley value are applied; see [25–27]. To construct the imputation of the cooperative game, the characteristic function reflecting the payoff of any coalition of players should be determined. There are several approaches to defining the characteristic function, for example the *α*-, *β*- and *γ*-characteristic functions (see [8,25,28–30] for details).

In contrast to the classical approach, the method of determining cooperative behavior presented above needs no distribution of the total cooperative payoff among the players. As is easily seen, the players jointly seek a set of strategies that optimizes their individual payoffs presented as the Nash products. Hence, neither the characteristic function nor the imputation is required. Please note that the problems of construction and stability of coalitions for multicriteria dynamic games have also been considered; see [31,32]. In the case of coalition games, naturally, the characteristic function and the imputation should be determined. However, in this paper we are not concerned with coalition formation processes, and the players' cooperative payoffs for the whole game can be calculated without any imputations as

$$J\_1^c(0) = \begin{pmatrix} J\_1^{1c}(0) = \sum\_{t=0}^m \delta^t g\_1^1(u\_t^c) \\ \dots \\ J\_1^{kc}(0) = \sum\_{t=0}^m \delta^t g\_1^k(u\_t^c) \end{pmatrix}, \dots, J\_n^c(0) = \begin{pmatrix} J\_n^{1c}(0) = \sum\_{t=0}^m \delta^t g\_n^1(u\_t^c) \\ \dots \\ J\_n^{kc}(0) = \sum\_{t=0}^m \delta^t g\_n^k(u\_t^c) \end{pmatrix}.$$

where $u\_t^c = (u\_{1t}^c, \dots, u\_{nt}^c)$ are the cooperative strategies determined in (5).

Similarly we determine the cooperative payoffs $J\_i^c(t)$, *i* = 1, ... , *n*, for every subgame started from the state $x\_t^c$ at a time *t*.

As is well known, the Nash bargaining scheme is not dynamically stable [9]. To stabilize the cooperative solution in multicriteria dynamic games we adopt the idea of imputation distribution procedure ([7,10,18,19]).

**Definition 3.** *A vector*

$$\beta(t) = (\beta\_1(t), \dots, \beta\_n(t)),$$

*where*

$$\beta\_1(t) = \begin{pmatrix} \beta\_1^1(t) \\ \dots \\ \beta\_1^k(t) \end{pmatrix}, \dots, \beta\_n(t) = \begin{pmatrix} \beta\_n^1(t) \\ \dots \\ \beta\_n^k(t) \end{pmatrix}.$$

*is a payoff distribution procedure (PDP) for the dynamic multicriteria game (1), (2), if*

$$J\_1^c(0) = \sum\_{t=0}^m \delta^t \beta\_1(t), \dots, J\_n^c(0) = \sum\_{t=0}^m \delta^t \beta\_n(t) \tag{6}$$

*or in extended form,*

$$\begin{cases} J\_{1}^{1c}(0) = \sum\_{t=0}^{m} \delta^{t} \beta\_{1}^{1}(t), \\ \dots \\ J\_{1}^{kc}(0) = \sum\_{t=0}^{m} \delta^{t} \beta\_{1}^{k}(t), \\ \dots \\ J\_{n}^{1c}(0) = \sum\_{t=0}^{m} \delta^{t} \beta\_{n}^{1}(t), \\ \dots \\ J\_{n}^{kc}(0) = \sum\_{t=0}^{m} \delta^{t} \beta\_{n}^{k}(t). \end{cases}$$

The main idea of this scheme is to distribute the cooperative gain along the game path. Then $\beta\_i(t)$ can be interpreted as the payment to player *i* in all criteria at a time *t*, *i* = 1, . . . , *n*.

**Definition 4.** *A vector β*(*t*)=(*β*1(*t*), ... , *βn*(*t*)) *is a time-consistent [7,10] PDP for dynamic multicriteria game (1), (2), if for every t* ≥ 0

$$J\_1^{c}(0) = \sum\_{\tau=0}^{t} \delta^{\tau} \beta\_1(\tau) + \delta^{t+1} J\_1^{c}(t+1),$$

$$\dots$$

$$J\_n^{c}(0) = \sum\_{\tau=0}^{t} \delta^{\tau} \beta\_n(\tau) + \delta^{t+1} J\_n^{c}(t+1),\tag{7}$$

*or in extended form,*

$$\begin{cases} \begin{array}{l} J\_{1}^{1c}(0) = \sum\_{\tau=0}^{t} \delta^{\tau} \beta\_{1}^{1}(\tau) + \delta^{t+1} J\_{1}^{1c}(t+1), \\ \dots \\ J\_{1}^{kc}(0) = \sum\_{\tau=0}^{t} \delta^{\tau} \beta\_{1}^{k}(\tau) + \delta^{t+1} J\_{1}^{kc}(t+1), \\ \dots \\ J\_{n}^{1c}(0) = \sum\_{\tau=0}^{t} \delta^{\tau} \beta\_{n}^{1}(\tau) + \delta^{t+1} J\_{n}^{1c}(t+1), \\ \dots \\ J\_{n}^{kc}(0) = \sum\_{\tau=0}^{t} \delta^{\tau} \beta\_{n}^{k}(\tau) + \delta^{t+1} J\_{n}^{kc}(t+1). \end{array} \end{cases}$$

Here the players following the cooperative trajectory are guided by the same optimal behavior determination approach (5) at each current time and hence do not have any reasonable motivation to deviate from the cooperation agreement.

**Theorem 1.** *A vector β*(*t*)=(*β*1(*t*),..., *βn*(*t*))*, where*

$$\beta\_1(t) = J\_1^{c}(t) - \delta J\_1^{c}(t+1),$$

$$\dots$$

$$\beta\_n(t) = J\_n^{c}(t) - \delta J\_n^{c}(t+1) \tag{8}$$

*is a time-consistent payoff distribution procedure for dynamic multicriteria game (1), (2).*

**Proof.** The proof is given for the first player, for others it is similar. Conditions (6) of Definition 3 are satisfied:

$$\sum\_{t=0}^{m} \delta^t \beta\_1(t) = \sum\_{t=0}^{m} \delta^t J\_1^c(t) - \sum\_{t=0}^{m} \delta^{t+1} J\_1^c(t+1) =$$

$$= \begin{pmatrix} \sum\_{t=0}^{m} \delta^t J\_1^{1c}(t) - \sum\_{t=0}^{m} \delta^{t+1} J\_1^{1c}(t+1) \\ \dots \\ \sum\_{t=0}^{m} \delta^t J\_1^{kc}(t) - \sum\_{t=0}^{m} \delta^{t+1} J\_1^{kc}(t+1) \end{pmatrix} = \begin{pmatrix} J\_1^{1c}(0) \\ \dots \\ J\_1^{kc}(0) \end{pmatrix} = J\_1^c(0)$$

as $J\_1^{jc}(m+1) = 0$, *j* = 1, ... , *k*. Similar considerations are true for $\beta\_i(t)$, *i* = 2, ... , *n*. Hence, *β*(*t*) is a PDP.

Let us prove that this vector is a time-consistent payoff distribution procedure (7). It follows from the equalities

$$\begin{aligned} &\sum\_{\tau=0}^{t} \delta^{\tau} \beta\_{1}(\tau) + \delta^{t+1} J\_{1}^{c}(t+1) = \begin{pmatrix} \displaystyle\sum\_{\tau=0}^{t} \delta^{\tau} \beta\_{1}^{1}(\tau) + \delta^{t+1} J\_{1}^{1c}(t+1) \\ \dots \\ \displaystyle\sum\_{\tau=0}^{t} \delta^{\tau} \beta\_{1}^{k}(\tau) + \delta^{t+1} J\_{1}^{kc}(t+1) \end{pmatrix} = \\ &= \begin{pmatrix} \displaystyle\sum\_{\tau=0}^{t} \delta^{\tau} J\_{1}^{1c}(\tau) - \sum\_{\tau=0}^{t} \delta^{\tau+1} J\_{1}^{1c}(\tau+1) + \delta^{t+1} J\_{1}^{1c}(t+1) \\ \dots \\ \displaystyle\sum\_{\tau=0}^{t} \delta^{\tau} J\_{1}^{kc}(\tau) - \sum\_{\tau=0}^{t} \delta^{\tau+1} J\_{1}^{kc}(\tau+1) + \delta^{t+1} J\_{1}^{kc}(t+1) \end{pmatrix} = \begin{pmatrix} J\_{1}^{1c}(0) \\ \dots \\ J\_{1}^{kc}(0) \end{pmatrix} = J\_{1}^{c}(0), \end{aligned}$$

and similarly for the other players.
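The telescoping argument of the proof can be checked numerically. The sketch below treats one player and one criterion; the helper names `pdp` and `is_time_consistent` and the payoff values are assumptions chosen for illustration, with the only requirement that the payoff after the last stage is zero.

```python
def pdp(J, delta):
    """Payments beta(t) = J(t) - delta * J(t+1), as in (8).

    J is the list [J(0), ..., J(m), J(m+1)] with J(m+1) = 0.
    """
    return [J[t] - delta * J[t + 1] for t in range(len(J) - 1)]


def is_time_consistent(J, beta, delta, tol=1e-9):
    """Check (7): payments up to t plus the discounted continuation equal J(0)."""
    paid = 0.0
    for t, b in enumerate(beta):
        paid += delta ** t * b
        if abs(paid + delta ** (t + 1) * J[t + 1] - J[0]) > tol:
            return False
    return True


delta = 0.9
J = [12.0, 9.5, 6.0, 2.5, 0.0]  # assumed values with m = 3 and J(m+1) = 0
beta = pdp(J, delta)
consistent = is_time_consistent(J, beta, delta)
```

Because the check at the final stage uses J(m+1) = 0, it also confirms condition (6).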

### *2.4. Conditions for Rational Behavior*

The conditions to maintain cooperative (rational) behavior in dynamic games are considered. Since some irrational players may break the cooperation, Yeung D.W.K. [20] introduced a condition that protects players against the loss of profits in this case.

**Definition 5.** *The imputation ξ* = (*ξ*1,..., *ξn*) *satisfies irrational-behavior-proofness condition [20] if*

$$\sum\_{\tau=0}^{t} \delta^{\tau} \beta\_i(\tau) + \delta^{t+1} V(i, t+1) \ge V(i, 0) \tag{9}$$

*for all t* ≥ 0*, where β*(*t*)=(*β*1(*t*), ... , *βn*(*t*)) *is a time-consistent imputation distribution procedure and V*(*i*, *t*) *is the noncooperative payoff of player i, i* ∈ *N.*

If this condition is satisfied, then each player is irrational-behavior-proof because irrational actions that break the cooperative agreement will not bring his payoff below the initial noncooperative payoff.

In [21,22], for discrete-time problems, a new condition that is stronger than Yeung's condition and easier to verify was introduced.

**Definition 6.** *The imputation ξ* = (*ξ*1,..., *ξn*) *satisfies each step rational behavior condition if*

$$
\beta\_i(t) + \delta V(i, t+1) \ge V(i, t) \tag{10}
$$

*for all t* ≥ 0*, where β*(*t*)=(*β*1(*t*), ... , *βn*(*t*)) *is a time-consistent imputation distribution procedure and V*(*i*, *t*) *is the noncooperative payoff of player i, i* ∈ *N.*

The proposed condition offers an incentive to each player to maintain cooperation because at every step she gains more from cooperation than from noncooperative behavior.

Here, we adopt rationality conditions for dynamic multicriteria games. Since no imputation procedure is required with the approach presented above, let us rewrite the definitions.

**Definition 7.** *The multicriteria cooperative solution* $J^c(t) = (J\_1^c(t), \dots, J\_n^c(t))$ *satisfies the irrational-behavior-proofness condition if*

$$\sum\_{\tau=0}^{t} \delta^{\tau} \beta\_i(\tau) + \delta^{t+1} J\_i^N(t+1) \ge J\_i^N(0) \tag{11}$$

*for all* $t \ge 0$*, where* $\beta(t) = (\beta\_1(t), \dots, \beta\_n(t))$ *is the time-consistent payoff distribution procedure (8) and* $J\_i^N(t)$ *is the noncooperative payoff (4) of player i, i* ∈ *N. Or in extended form,*

$$\begin{cases} \displaystyle \sum\_{\tau=0}^{t} \delta^{\tau} \beta\_{1}^{1}(\tau) + \delta^{t+1} J\_{1}^{1N}(t+1) \geq J\_{1}^{1N}(0), \\ \dots \\ \displaystyle \sum\_{\tau=0}^{t} \delta^{\tau} \beta\_{1}^{k}(\tau) + \delta^{t+1} J\_{1}^{kN}(t+1) \geq J\_{1}^{kN}(0), \\ \dots \\ \displaystyle \sum\_{\tau=0}^{t} \delta^{\tau} \beta\_{n}^{1}(\tau) + \delta^{t+1} J\_{n}^{1N}(t+1) \geq J\_{n}^{1N}(0), \\ \dots \\ \displaystyle \sum\_{\tau=0}^{t} \delta^{\tau} \beta\_{n}^{k}(\tau) + \delta^{t+1} J\_{n}^{kN}(t+1) \geq J\_{n}^{kN}(0). \end{cases}$$

**Definition 8.** *The multicriteria cooperative solution* $J^c(t) = (J\_1^c(t), \dots, J\_n^c(t))$ *satisfies the each step rational behavior condition if*

$$\beta\_i(t) + \delta J\_i^N(t+1) \ge J\_i^N(t)\tag{12}$$

*for all* $t \ge 0$*, where* $\beta(t) = (\beta\_1(t), \dots, \beta\_n(t))$ *is the time-consistent payoff distribution procedure (8) and* $J\_i^N(t)$ *is the noncooperative payoff (4) of player i, i* ∈ *N. Or in extended form,*

$$\begin{cases} \beta\_1^1(t) + \delta J\_1^{1N}(t+1) \geq J\_1^{1N}(t), \\ \dots \\ \beta\_1^k(t) + \delta J\_1^{kN}(t+1) \geq J\_1^{kN}(t), \\ \dots \\ \beta\_n^1(t) + \delta J\_n^{1N}(t+1) \geq J\_n^{1N}(t), \\ \dots \\ \beta\_n^k(t) + \delta J\_n^{kN}(t+1) \geq J\_n^{kN}(t). \end{cases}$$

For problem (1), (2) the conditions for rational behavior (11) and (12) can be rewritten as

$$(1 - \delta^{t+1})(J\_i^c(0) - J\_i^N(0)) + \delta^{t+1} \sum\_{\tau=0}^t \delta^\tau (g\_i(u\_\tau^c) - g\_i(u\_\tau^N)) \ge 0 \quad \forall t, \ i \in N,\tag{13}$$

$$(1 - \delta)(J\_i^{c}(t) - J\_i^N(t)) + \delta^{t+2}(g\_i(u\_t^{c}) - g\_i(u\_t^N)) \ge 0 \quad \forall t, \ i \in N,\tag{14}$$

where

$$g\_i(u) = \begin{pmatrix} g\_i^1(u) \\ \vdots \\ g\_i^k(u) \end{pmatrix}.$$

Since the presented approach to constructing cooperative behavior satisfies the individual rationality conditions, the first terms of both inequalities are nonnegative. Hence, the each step rational behavior condition is fulfilled if $g\_i(u\_t^c) - g\_i(u\_t^N) \ge 0$ $\forall t$, $i \in N$, and the irrational-behavior-proofness condition holds if $\sum\_{\tau=0}^{t} \delta^\tau (g\_i(u\_\tau^c) - g\_i(u\_\tau^N)) \ge 0$ $\forall t$, $i \in N$. As is easily seen, the each step rational behavior condition implies Yeung's condition.
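Conditions (11) and (12) are straightforward to test once the payments and the noncooperative payoffs are available stage by stage. The following sketch handles one player and one criterion; the function names and all numbers are assumptions for illustration, and the payments are built from assumed cooperative payoffs via (8).

```python
def each_step_condition(beta, JN, delta):
    """Condition (12): beta(t) + delta * JN(t+1) >= JN(t) for t = 0..m."""
    return all(beta[t] + delta * JN[t + 1] >= JN[t] for t in range(len(beta)))


def irrational_behavior_proof(beta, JN, delta):
    """Condition (11): payments up to t plus discounted JN(t+1) cover JN(0)."""
    paid = 0.0
    for t in range(len(beta)):
        paid += delta ** t * beta[t]
        if paid + delta ** (t + 1) * JN[t + 1] < JN[0]:
            return False
    return True


delta = 0.9
JN = [10.0, 8.0, 5.0, 2.0, 0.0]  # assumed noncooperative payoffs, JN(m+1) = 0
Jc = [12.0, 9.5, 6.0, 2.5, 0.0]  # assumed cooperative payoffs, Jc(m+1) = 0
beta = [Jc[t] - delta * Jc[t + 1] for t in range(len(Jc) - 1)]  # PDP (8)

step_ok = each_step_condition(beta, JN, delta)
yeung_ok = irrational_behavior_proof(beta, JN, delta)
```

Multiplying (12) by the discount factor raised to the stage number and summing over the stages up to *t* telescopes to (11), which is why the first check passing implies the second.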

Next, we consider a dynamic bi-criteria model related to the bioresource management problem (harvesting) to illustrate the suggested concepts.

### **3. Dynamic Bi-Criteria Resource Management Problem**

Consider a bi-criteria discrete-time dynamic bioresource management model with many players. Let *n* players (countries or firms) be exploiting a bioresource on a finite time horizon [0, *m*]. The population evolves according to the equation

$$x\_{t+1} = \varepsilon x\_t - u\_{1t} - \dots - u\_{nt}, \quad x\_0 = x, \tag{15}$$

where *xt* ≥ 0 is the population size at a time *t* ≥ 0, *ε* ≥ 1 denotes the natural birth rate, and *uit* ≥ 0 specifies the catch strategy of player *i* at a time *t* ≥ 0, *i* ∈ *N* = {1, . . . , *n*}.

Each player seeks to achieve two goals: to maximize the profit from resource sales and to minimize the catching costs. It will be assumed that the players have different market prices but the same costs that depend quadratically on the exploitation rate of each player. The vector payoffs of the players on the finite planning horizon take the form

$$J\_1 = \left( \begin{array}{c} J\_1^1 = \sum\_{t=0}^m \delta^t p\_1 u\_{1t} \\ J\_1^2 = -\sum\_{t=0}^m \delta^t c u\_{1t}^2 \end{array} \right), \dots, J\_n = \left( \begin{array}{c} J\_n^1 = \sum\_{t=0}^m \delta^t p\_n u\_{nt} \\ J\_n^2 = -\sum\_{t=0}^m \delta^t c u\_{nt}^2 \end{array} \right), \tag{16}$$

where for *i* ∈ *N*, *pi* ≥ 0 is the market price of the resource for player *i*, *c* ≥ 0 indicates the catching cost, and *δ* ∈ (0, 1) denotes the discount factor.

### *3.1. Multicriteria Nash Equilibrium*

First, we construct the guaranteed payoffs using one of the modifications from [2]. The guaranteed payoff points $G\_1^1, \dots, G\_n^1$ will be defined as the Nash equilibrium in the game $\langle N, \{U\_i\}\_{i=1}^n, \{J\_i^1\}\_{i=1}^n \rangle$. Applying the Bellman principle and assuming the linear form of the strategies and value functions, we obtain the Nash equilibrium strategies

$$u\_{1t} = \dots = u\_{nt} = \frac{\varepsilon - 1}{n - 1} x\_t,$$

and the dynamics becomes

$$\mathbf{x}\_t = \left(\frac{n-\varepsilon}{n-1}\right)^t \mathbf{x}\_0.$$

Then the guaranteed payoff points take the form

$$G\_1^1 = p\_1 A x\_0, \dots, G\_n^1 = p\_n A x\_0, \tag{17}$$

where

$$A = \frac{\varepsilon - 1}{n - 1} \frac{(\delta(n - \varepsilon))^{m + 1} - (n - 1)^{m + 1}}{(n - 1)^m (\delta(n - \varepsilon) - n + 1)}.$$
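A short simulation can confirm the state path and produce the guaranteed payoffs of the first criterion by direct summation. The function name `nash_first_criterion` and the parameter values below are assumptions for illustration; they are chosen with *n* > *ε* so that the stock stays nonnegative.

```python
def nash_first_criterion(x0, eps, n, m, delta, prices):
    """Simulate (15) under u_it = (eps - 1) / (n - 1) * x_t.

    Returns the state path x_0..x_{m+1} and the discounted first-criterion
    payoffs p_i * sum_t delta^t * u_it for each price p_i.
    """
    share = (eps - 1.0) / (n - 1.0)
    x, path = x0, [x0]
    catch_sum = 0.0  # sum_t delta^t * u_it, identical for all players
    for t in range(m + 1):
        u = share * x
        catch_sum += delta ** t * u
        x = eps * x - n * u  # dynamics (15) with n identical catches
        path.append(x)
    return path, [p * catch_sum for p in prices]


# Assumed parameters.
x0, eps, n, m, delta = 100.0, 1.2, 3, 5, 0.9
path, G1 = nash_first_criterion(x0, eps, n, m, delta, prices=[1.0, 2.0, 3.0])
closed_form = [((n - eps) / (n - 1)) ** t * x0 for t in range(m + 2)]
```

In this sketch `path` should agree with `closed_form`, and `G1[i]` corresponds to the guaranteed payoff of player *i* in (17), so that `G1[i] / (prices[i] * x0)` gives the coefficient *A*.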

Similarly, determining the Nash equilibrium in the game with the second criterion of all players, $\langle N, \{U\_i\}\_{i=1}^n, \{J\_i^2\}\_{i=1}^n \rangle$, yields *n* more guaranteed payoff points

$$G\_1^2 = \dots = G\_n^2 = -c G x\_0^2, \tag{18}$$

where

$$\begin{aligned} G &= \left(\frac{2n - \varepsilon^2 + \varepsilon\sqrt{4n^2 + \varepsilon^2 - 4n}}{n(-\varepsilon + \sqrt{4n^2 + \varepsilon^2 - 4n})}\right)^2 \\ &\times \frac{(2\delta n)^{m+1} - (\varepsilon - \sqrt{4n^2 + \varepsilon^2 - 4n})^{m+1}}{(\varepsilon - \sqrt{4n^2 + \varepsilon^2 - 4n})^m (2\delta n - \varepsilon + \sqrt{4n^2 + \varepsilon^2 - 4n})}. \end{aligned}$$

In accordance with Definition 1, for designing the multicriteria Nash equilibrium of the game (15), (16) the following problem has to be solved:

$$p\_1 c (\sum\_{t=0}^m \delta^t u\_{1t} - A\boldsymbol{x}) (-\sum\_{t=0}^m \delta^t u\_{1t}^2 + G\boldsymbol{x}^2) \to \max\_{u\_{1t}},$$

$$p\_n c (\sum\_{t=0}^m \delta^t u\_{nt} - A\boldsymbol{x}) (-\sum\_{t=0}^m \delta^t u\_{nt}^2 + G\boldsymbol{x}^2) \to \max\_{u\_{nt}}.$$

Considering the process starting from one-stage game to *m*-stage one and seeking the strategies in linear form, we obtain the multicriteria Nash equilibrium.

**Proposition 1.** *The multicriteria Nash equilibrium strategies in problem (15), (16) have the form* $u\_{it}^N = \gamma\_{i\,m-t}^N x\_t$*, i* ∈ *N,*

$$\gamma\_{1t}^{N} = \dots = \gamma\_{nt}^{N} = \gamma\_{t}^{N} = \frac{\varepsilon^{t-1} \gamma\_{1}^{N}}{1 + n \gamma\_{1}^{N} \sum\_{j=0}^{t-2} \varepsilon^{j}}, \ t = 2, \dots, m. \tag{19}$$

*The players' strategy at the last stage* $\gamma\_1^N$ *is determined from the following equation*

$$\begin{aligned} \left[3\varepsilon^{2(m-1)}\sum\_{j=0}^{m-1}\delta^j - 2\varepsilon^{m-1}n\sum\_{j=0}^{m-2}\varepsilon^j A - n^2(\sum\_{j=0}^{m-2}\varepsilon^j)^2 G\right](\gamma\_1^N)^2 \\ -2(\varepsilon^{m-1}A + \sum\_{j=0}^{m-2}\varepsilon^j G n)\gamma\_1^N - G = 0. \end{aligned}$$
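Once the last-stage coefficient is known, formula (19) gives all remaining coefficients directly. The sketch below assumes a value for that coefficient rather than solving the equation above; the function name `gamma_sequence` and the parameters are illustrative.

```python
def gamma_sequence(gamma1, eps, n, m):
    """Return {t: gamma_t} for t = 1..m using (19), with gamma_1 given."""
    gammas = {1: gamma1}
    for t in range(2, m + 1):
        geometric = sum(eps ** j for j in range(t - 1))  # sum_{j=0}^{t-2} eps^j
        gammas[t] = eps ** (t - 1) * gamma1 / (1.0 + n * gamma1 * geometric)
    return gammas


# Assumed values; in the paper gamma_1 solves the equation of Proposition 1.
eps, n, m = 1.2, 3, 4
gammas = gamma_sequence(gamma1=0.2, eps=eps, n=n, m=m)

# Catches u_it = gamma_{m-t} * x_t along the path generated by (15).
x, catches = 100.0, []
for t in range(m):
    u = gammas[m - t] * x
    catches.append(u)
    x = eps * x - n * u
```

With these numbers the coefficient used at stage *t* is `gammas[m - t]`, so the last stage uses the assumed value 0.2.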

### *3.2. Cooperative Equilibrium*

To construct the cooperative payoffs and strategies the modified bargaining scheme will be applied [5]. First, we have to determine the noncooperative payoffs as the ones gained by the players using the multicriteria Nash strategies. Then, we construct the sum of the Nash products with the noncooperative payoffs of players acting as the status quo points.

In view of Proposition 1, the noncooperative payoffs have the form

$$\begin{aligned} J\_i^{1N}(x) &= \sum\_{t=0}^{m-1} \delta^t p\_i \gamma\_{m-t}^N x, \ i \in N, \\ J\_1^{2N}(x) &= \dots = J\_n^{2N}(x) = -c \sum\_{t=0}^{m-1} \delta^t (\gamma\_{m-t}^N)^2 x^2. \end{aligned}$$

In accordance with Definition 2, for designing the multicriteria cooperative equilibrium the following problem has to be solved:

$$\begin{aligned} & p\_1(\sum\_{t=0}^m \delta^t u\_{1t}^c - P x)(-\sum\_{t=0}^m \delta^t (u\_{1t}^c)^2 + K x^2) + \dots \\ & + p\_n(\sum\_{t=0}^m \delta^t u\_{nt}^c - P x)(-\sum\_{t=0}^m \delta^t (u\_{nt}^c)^2 + K x^2) \to \max\_{u\_{1t}^c, \dots, u\_{nt}^c}, \end{aligned}$$

where $P = \sum\_{t=0}^{m-1} \delta^t \gamma\_{m-t}^N$, $K = \sum\_{t=0}^{m-1} \delta^t (\gamma\_{m-t}^N)^2$.

Considering the process starting from one-stage game to *m*-stage one and seeking the strategies in linear form, we construct cooperative behavior.

**Proposition 2.** *The multicriteria cooperative equilibrium strategies in problem (15), (16) take the form* $u\_{it}^c = \gamma\_{i\,m-t}^c x\_t$*, i* ∈ *N,*

$$\gamma\_{1t}^{c} = \dots = \gamma\_{nt}^{c} = \gamma\_t^{c} = \frac{\varepsilon^{t-1} \gamma\_1^{c}}{1 + n \gamma\_1^{c} \sum\_{j=0}^{t-2} \varepsilon^j}, \ t = 2, \dots, m. \tag{20}$$

*The players' strategy at the last stage* $\gamma\_1^c$ *is determined from the following equation*

$$\begin{aligned} \left[3\varepsilon^{2(m-1)}\sum\_{j=0}^{m-1}\delta^j - 2\varepsilon^{m-1}n\sum\_{j=0}^{m-2}\varepsilon^j P - n^2(\sum\_{j=0}^{m-2}\varepsilon^j)^2 K\right](\gamma\_1^{c})^2 \\ -2(\varepsilon^{m-1}P + \sum\_{j=0}^{m-2}\varepsilon^j K n)\gamma\_1^{c} - K = 0. \end{aligned}$$

### *3.3. Dynamic Stability and Conditions for Rational Behavior*

**Proposition 3.** *The time-consistent payoff distribution procedure in the problem (15), (16) takes the form*

$$\beta\_i(t) = \left(\begin{array}{c} \beta\_i^1(t) \\ \beta\_i^2(t) \end{array}\right), \ i = 1, \dots, n, \ t = 0, \dots, m - 1,$$

*where*

$$\beta\_i^1(t) = p\_i \delta^t \gamma\_{m-t}^c x\_t + p\_i(1-\delta) \sum\_{\tau=t+1}^{m-1} \delta^\tau \gamma\_{m-\tau}^c x\_{\tau},$$

$$\beta\_i^2(t) = -c\delta^t (\gamma\_{m-t}^c)^2 x\_t^2 - c(1-\delta) \sum\_{\tau=t+1}^{m-1} \delta^\tau (\gamma\_{m-\tau}^c)^2 x\_\tau^2, \ i = 1, \dots, n.$$

**Proof.** The statement follows from Theorem 1 and the form of cooperative strategies given in Proposition 2.

**Proposition 4.** *The conditions for rational behavior in problem (15), (16) are fulfilled if* $\gamma\_1^c \ge \gamma\_1^N$*.*

**Proof.** The irrational-behavior-proofness condition (13) in problem (15), (16) takes the form

$$(1 - \delta^{t+1})(J\_i^c(0) - J\_i^N(0)) + \delta^{t+1} \begin{pmatrix} p\_i \sum\_{\tau=0}^t \delta^\tau (u\_{i\tau}^c - u\_{i\tau}^N) \\ -c \sum\_{\tau=0}^t \delta^\tau ((u\_{i\tau}^c)^2 - (u\_{i\tau}^N)^2) \end{pmatrix} \ge 0,\tag{21}$$

and each step rational behavior condition becomes

$$(1 - \delta)(J\_i^c(t) - J\_i^N(t)) + \delta^{t+2} \begin{pmatrix} p\_i(u\_{it}^c - u\_{it}^N) \\ -c((u\_{it}^c)^2 - (u\_{it}^N)^2) \end{pmatrix} \ge 0. \tag{22}$$

Let us consider the each step rational behavior condition for the first criterion. Since the individual rationality conditions are fulfilled, the first part of the inequality is nonnegative. Hence, the sign of *u*<sup>c</sup><sub>it</sub> − *u*<sup>N</sup><sub>it</sub> needs to be checked. In accordance with Propositions 1 and 2,

$$u\_{it}^{c} - u\_{it}^{N} = \frac{(\varepsilon - 1)^2 \varepsilon^{m - t - 1} (\gamma\_1^{c} - \gamma\_1^N) x\_t}{(\varepsilon - 1 + n\gamma\_1^{c}(\varepsilon^{m - t - 1} - 1))(\varepsilon - 1 + n\gamma\_1^N(\varepsilon^{m - t - 1} - 1))}$$

which is nonnegative if *γ*<sup>c</sup><sub>1</sub> ≥ *γ*<sup>N</sup><sub>1</sub>.
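The difference formula above can be checked numerically. The following is a minimal sketch, assuming linear strategies of the form u = γ x with a coefficient that depends only on the number of remaining stages; the function `gamma` is a form inferred from the displayed difference formula (Propositions 1 and 2 are not reproduced here), and the last-stage values `g1_c` and `g1_n` are hypothetical.

```python
def gamma(j, gamma_1, eps, n):
    """Coefficient with j stages remaining, given the last-stage value gamma_1.
    Assumed form, inferred from the difference formula (not quoted from the paper)."""
    growth = eps ** (j - 1)
    return (eps - 1) * growth * gamma_1 / (eps - 1 + n * gamma_1 * (growth - 1))


def closed_form_difference(t, m, g1_c, g1_n, eps, n, x_t):
    """Right-hand side of the displayed formula for u^c_it - u^N_it."""
    growth = eps ** (m - t - 1)
    numerator = (eps - 1) ** 2 * growth * (g1_c - g1_n) * x_t
    denominator = ((eps - 1 + n * g1_c * (growth - 1))
                   * (eps - 1 + n * g1_n * (growth - 1)))
    return numerator / denominator


m, n, eps = 15, 5, 1.3      # values used in Section 3.4
g1_c, g1_n = 0.10, 0.08     # hypothetical last-stage values with g1_c >= g1_n
x_t = 1.0                   # both strategies are evaluated at the same state

# Pairs (direct difference of the two linear strategies, closed-form expression).
pairs = []
for t in range(m):
    direct = (gamma(m - t, g1_c, eps, n) - gamma(m - t, g1_n, eps, n)) * x_t
    closed = closed_form_difference(t, m, g1_c, g1_n, eps, n, x_t)
    pairs.append((direct, closed))

for t, (direct, closed) in enumerate(pairs):
    print(t, round(direct, 6), round(closed, 6))
```

Under these assumptions the two columns coincide and stay nonnegative for every stage, which is the sign property the proof relies on.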

For the second criterion, the sign of −*c*(*u*<sup>c</sup><sub>it</sub> − *u*<sup>N</sup><sub>it</sub>)(*u*<sup>c</sup><sub>it</sub> + *u*<sup>N</sup><sub>it</sub>) needs to be checked. Since the cooperative solution satisfies the individual rationality conditions at each stage *t*, the first part of Inequality (22) takes the form

$$-c \sum\_{\tau=t}^{m-1} \delta^\tau (u\_{i\tau}^c - u\_{i\tau}^N) (u\_{i\tau}^c + u\_{i\tau}^N) \ge 0$$

that yields

$$-c\delta^{t}(u\_{it}^{c}-u\_{it}^{N})(u\_{it}^{c}+u\_{it}^{N}) \geq c\sum\_{\tau=t+1}^{m-1} \delta^{\tau}(u\_{i\tau}^{c}-u\_{i\tau}^{N})(u\_{i\tau}^{c}+u\_{i\tau}^{N}). \tag{23}$$

The right-hand side of (23) is nonnegative if *u*<sup>c</sup><sub>iτ</sub> − *u*<sup>N</sup><sub>iτ</sub> ≥ 0 ∀*τ* = *t* + 1, ... , *m* − 1, which again is true if *γ*<sup>c</sup><sub>1</sub> ≥ *γ*<sup>N</sup><sub>1</sub>.

As the each step rational behavior condition is stronger than Yeung's condition, this yields the fulfillment of the irrational-behavior-proofness condition.

### *3.4. Modelling*

We have performed a numerical simulation for the symmetric case with the following parameters:

$$m = 15, \ n = 5, \ \varepsilon = 1.3, \ p\_1 = \dots = p\_5 = 100, \ c = 50, \ \delta = 0.8.$$

These parameters are typical for fish species in Karelian lakes [33]. The natural growth function of the population was estimated in [22,34,35]; this paper uses its linear approximation with the appropriate parameter *ε*. It should be stressed that the price and cost parameters do not influence the form of the players' strategies and hence can take any values.

The figures illustrate our theoretical results. Namely, Figure 1 shows the dynamics of the population size, while Figure 2 presents the players' strategies in the noncooperative and cooperative cases. As one can see, cooperative behavior improves the ecological situation, as it limits bioresource exploitation. The population size increases in both settings, but much more quickly under cooperation (from *x*<sub>0</sub> = 50,000 to 110,000).

**Figure 1.** Population size: dark—cooperation, light—Nash equilibrium.

**Figure 2.** Players' strategies: dark—cooperation, light—Nash equilibrium.

Moreover, as Figure 2 shows, cooperative behavior is beneficial for the players. To emphasize this conclusion, the instantaneous payoffs (*δ*<sup>t</sup>*g*<sup>1</sup><sub>1</sub>(*t*)) for both the noncooperative and cooperative settings are presented in Figure 3. As is easily seen, the players' cooperative strategies (the catch) are larger than the noncooperative ones, and some convergence can be noticed at the end of the planning horizon. This is related to the fact that the asymptotic values of the players' strategies in both cases (*γ*<sup>N</sup><sub>t</sub>, *γ*<sup>c</sup><sub>t</sub>) are (*ε* − 1)/*n*. The instantaneous payoffs decrease in both settings because of discounting, but much more slowly under cooperation (from 60,000 to 4000 monetary units).

**Figure 3.** Instantaneous payoffs: dark—cooperation, light—Nash equilibrium.
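The asymptotic value (*ε* − 1)/*n* mentioned above can be illustrated with a short computation. This is a sketch only: the function `gamma` is the linear-strategy coefficient form implied by the difference formula in the proof of Proposition 4 (an assumption here, since Propositions 1 and 2 are not restated), and the last-stage values are hypothetical.

```python
def gamma(j, gamma_1, eps, n):
    """Coefficient with j stages remaining, given the last-stage value gamma_1.
    Assumed form, inferred from the difference formula in Section 3.3."""
    growth = eps ** (j - 1)
    return (eps - 1) * growth * gamma_1 / (eps - 1 + n * gamma_1 * (growth - 1))


eps, n = 1.3, 5               # values used in Section 3.4
limit = (eps - 1) / n         # asymptotic value stated in the text

# Hypothetical last-stage values, chosen only to show that the limit
# does not depend on where the recursion starts.
table = {}
for gamma_1 in (0.02, 0.05, 0.10):
    table[gamma_1] = [gamma(j, gamma_1, eps, n) for j in (1, 5, 15, 60)]

for gamma_1, values in table.items():
    print(gamma_1, [round(v, 5) for v in values], "limit:", round(limit, 5))
```

In this sketch the coefficient approaches (*ε* − 1)/*n* as the number of remaining stages grows, whatever the last-stage value, so the noncooperative and cooperative coefficients share the same asymptote.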

Since the players' strategy at the last stage under cooperation is larger than the noncooperative one, the conditions for rational behavior are fulfilled. Figure 4 shows how the cooperative gain is distributed along the game path (PDP *β*<sup>1</sup><sub>1</sub>(*t*)). It is quite interesting that the PDP differs only slightly from the instantaneous payoffs. Changing the number of players, the time horizon and other parameters gives similar pictures, which are therefore not presented.

**Figure 4.** Instantaneous payoffs (light line) and PDP (*β*<sup>1</sup><sub>1</sub>(*t*)).
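For readers who wish to reproduce a comparison of this kind, the first component of the PDP can be evaluated directly from the formula stated in Proposition 3. The sketch below only transcribes that formula; the function name `pdp_first_component` and the two-stage input values are hypothetical placeholders, not data from the simulation.

```python
def pdp_first_component(p, delta, gammas, xs):
    """First PDP component beta_i^1(t), t = 0..m-1, as stated in Proposition 3.

    gammas[t] holds the cooperative coefficient gamma^c_{m-t},
    xs[t] holds the cooperative state x_t; both are supplied by the caller.
    """
    m = len(xs)
    # Discounted stage terms delta^t * gamma^c_{m-t} * x_t.
    stage = [delta ** t * gammas[t] * xs[t] for t in range(m)]
    # Instantaneous term plus the (1 - delta)-weighted tail sum over later stages.
    return [p * stage[t] + p * (1 - delta) * sum(stage[t + 1:]) for t in range(m)]


# Hypothetical two-stage illustration (placeholder numbers).
beta = pdp_first_component(p=1.0, delta=0.5, gammas=[0.2, 0.1], xs=[10.0, 8.0])
print(beta)
```

At the last stage the tail sum is empty, so the PDP coincides with the instantaneous payoff there; at earlier stages the two differ by the (1 − *δ*)-weighted tail.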

### **4. Conclusions**

The problem of dynamic stability in multicriteria dynamic games with a finite horizon has been investigated. First, we evaluated the multicriteria Nash equilibrium strategies. Second, we constructed the multicriteria cooperative strategies and payoffs via the modified bargaining scheme. We adopted the concept of dynamic stability for multicriteria dynamic games and constructed the payoff distribution procedure. The conditions for rational behavior have been modified for dynamic multicriteria games.

The approaches presented in the paper make it possible to find optimal solutions in various multicriteria dynamic games. To show one of the possible applications, we studied a bi-criteria discrete-time bioresource management problem, where the players differ in their aims. Multicriteria Nash and cooperative equilibrium strategies have been derived analytically in linear forms. Hence, they can be directly applied to concrete populations with different values of parameters. As cooperative behavior improves the ecological situation, the dynamic stability concept has been applied to stabilize the cooperative agreement. The time-consistent payoff distribution procedure has also been derived analytically. The fulfillment of the conditions for rational behavior has been proved.

The presented theoretical constructions can be applied to different management problems, where the decision maker often has several criteria to optimize, for example, to maximize the profit and to minimize the production cost or the labor involved in the manufacture. Moreover, the constructed payoff distribution procedure gives an incentive to maintain the cooperative agreement, which is extremely important for management problems with common resources. Hence, the results presented in this paper can be applied in biological, economic and social game-theoretic models with vector payoffs.

**Funding:** This research was supported by the Shandong province "Double-Hundred Talent Plan" (No. WST2017009) and Russian Science Foundation (No. 17-11-01079) on studying the dynamic stability.

**Conflicts of Interest:** The author declares no conflict of interest.

### **Appendix A. Nash Equilibrium Meaningful**

Consider problem (3) with two criteria, for simplicity, under the constraints *J*<sup>j</sup><sub>i</sub> ≥ *G*<sup>j</sup><sub>i</sub>, *i* ∈ *N*, *j* = 1, 2. Let us deal with the problem of the first player. To construct the noncooperative behavior, we should maximize the multicriteria payoff function or, equivalently, minimize

$$H\_1(u\_{1t}, u\_{2t}^N, \dots, u\_{nt}^N) = (-J\_1^1(u\_{1t}, u\_{2t}^N, \dots, u\_{nt}^N) + G\_1^1)(J\_1^2(u\_{1t}, u\_{2t}^N, \dots, u\_{nt}^N) - G\_1^2) \to \min\_{u\_{1t}}$$

subject to (here and below the argument (*u*<sub>1t</sub>, *u*<sup>N</sup><sub>2t</sub>, ..., *u*<sup>N</sup><sub>nt</sub>) is omitted)

$$\begin{aligned} G\_1^1 - J\_1^1 &\le 0, \\ G\_1^2 - J\_1^2 &\le 0, \\ u\_{1t} &\ge 0. \end{aligned}$$

The Kuhn–Tucker (KT) conditions are applicable, and the Lagrangian for each time instant *t* = 1, . . . , *m* takes the form (*t* is omitted)

$$L = (-J\_1^1 + G\_1^1)(J\_1^2 - G\_1^2) + \lambda\_1(G\_1^1 - J\_1^1) + \lambda\_2(G\_1^2 - J\_1^2).$$

The KT conditions take the forms

$$-(J\_1^1)'(J\_1^2 - G\_1^2 + \lambda\_1) + (J\_1^2)'(-J\_1^1 + G\_1^1 - \lambda\_2) \ge 0,$$

$$u\_1 \left[ -(J\_1^1)'(J\_1^2 - G\_1^2 + \lambda\_1) + (J\_1^2)'(-J\_1^1 + G\_1^1 - \lambda\_2) \right] = 0,\tag{A1}$$

$$J\_1^1 - G\_1^1 \ge 0,\tag{A2}$$

$$
\lambda\_1 (J\_1^1 - G\_1^1) = 0,\tag{A3}
$$

$$J\_1^2 - G\_1^2 \ge 0,\tag{A4}$$

$$
\lambda\_2(J\_1^2 - G\_1^2) = 0,\tag{A5}
$$

$$
u\_1 \ge 0, \ \lambda\_i \ge 0, \ i = 1, 2.
$$

1. Consider the case *λ*<sub>1</sub> > 0, *λ*<sub>2</sub> > 0. From (A3) and (A5) it follows that

$$J\_1^1 - G\_1^1 = 0 \,, \, J\_1^2 - G\_1^2 = 0 \,.$$

If *u*<sub>1</sub> is equal to zero, then conditions (A2), (A4) are not satisfied (under the assumption that *J*<sup>j</sup><sub>1</sub>(0, *u*<sup>N</sup><sub>2t</sub>, ... , *u*<sup>N</sup><sub>nt</sub>) = 0, *j* = 1, 2). Hence, *u*<sub>1</sub> > 0. In this case, the noncooperative behavior coincides with the guaranteed one, and the goal function *H*<sub>1</sub> is equal to zero.

2. Consider the case *λ*<sub>1</sub> = 0, *λ*<sub>2</sub> > 0. From (A5) it follows that

$$J\_1^2 - G\_1^2 = 0.$$

By analogy, *u*<sub>1</sub> > 0, and from (A1) it follows that

$$(J\_1^2)'(-J\_1^1 + G\_1^1 - \lambda\_2) = 0.$$

Consequently,

$$-J\_1^1 + G\_1^1 = \lambda\_2 > 0,$$

which obviously contradicts condition (A2).

Similarly, the case *λ*<sub>1</sub> > 0, *λ*<sub>2</sub> = 0 leads to a contradiction.

3. Finally, consider the case *λ*<sub>1</sub> = 0, *λ*<sub>2</sub> = 0. Similarly, it is easy to check that *u*<sub>1</sub> > 0, so the minimum is achieved at an interior point and can be found via the first-order optimality condition:

$$-(J\_1^1)'(J\_1^2 - G\_1^2) + (J\_1^2)'(-J\_1^1 + G\_1^1) = 0.$$

Here, the goal function becomes

$$H\_1 = \left(-J\_1^1 + G\_1^1\right) \left(J\_1^2 - G\_1^2\right),$$

which is less than zero.

The same reasoning applies to the other players. Thus, the scheme presented above guarantees that the solution satisfies the conditions *J*<sup>j</sup><sub>i</sub> ≥ *G*<sup>j</sup><sub>i</sub>, *i* ∈ *N*, *j* = 1, 2.
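The interior case 3 can be illustrated on a small static example. This is a minimal sketch under stated assumptions: the payoff functions `J1(u) = P*u` and `J2(u) = -C*u**2`, the guaranteed points `G1`, `G2` and all numerical values are hypothetical and chosen only so that both payoffs vanish at *u* = 0; the minimizer is located by a plain grid search rather than by solving the KT system.

```python
# Hypothetical one-player, one-stage illustration of case 3 of the appendix.
P, C = 1.0, 1.0          # assumed price and cost coefficients
G1, G2 = 0.2, -1.0       # assumed guaranteed payoffs for the two criteria


def J1(u):
    return P * u         # first criterion, J1(0) = 0


def J2(u):
    return -C * u * u    # second criterion, J2(0) = 0


def H(u):
    # Goal function (-J^1 + G^1)(J^2 - G^2) that is minimized.
    return (-J1(u) + G1) * (J2(u) - G2)


def feasible(u):
    return u >= 0 and J1(u) >= G1 and J2(u) >= G2


# Grid search over the feasible set (here the interval [0.2, 1]).
grid = [k / 100000 for k in range(150001)]
u_star = min((u for u in grid if feasible(u)), key=H)

# Left-hand side of the first-order condition from case 3, with
# (J^1)' = P and (J^2)' = -2*C*u.
stationarity = -P * (J2(u_star) - G2) + (-2 * C * u_star) * (-J1(u_star) + G1)

print(u_star, H(u_star), stationarity)
```

In this example the minimizer lies strictly inside the feasible interval, the first-order condition holds up to the grid resolution, and the goal function is negative, in line with case 3.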

### **References**


© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
