Received: 30 March 2021 Accepted: 10 May 2021 Published: 17 May 2021

**Citation:** Halkos, George E., and George J. Papageorgiou. 2021. Some Results on the Control of Polluting Firms According to Dynamic Nash and Stackelberg Patterns. *Economies* 9: 77. https://doi.org/10.3390/

based on the Nash equilibrium (Wood 2010). Game theory can also be applied to pollution control problems (Halkos 1996; Gromova et al. 2016; Wang et al. 2011).

For instance, Zhao and Jin (2021) constructed a game model regarding environmental pollution control between enterprises and local governments, while taking into consideration the constraint of the evaluation mechanism related to ecological values. The authors obtained a set of Nash equilibrium solutions as feedback and found that enterprises' pollution could be reduced by the influence of governmental ecological ethos, as well as by efforts for environmental protection. In addition, according to the authors, pollution management can be improved if the production cost of an enterprise can be reduced, alongside a reduction in pollution and severe punishment for wrongdoing.

In addition, game theory can be used to investigate the relationship between the behavior of a government and the strategies that supply chain enterprises follow, when it comes to the reduction of carbon emissions. More specifically, Chen and Hu (2018) develop an evolutionary game theory model, regarding the interactions between manufacturers and governments, investigating the effect that governments can have on the behavior of manufacturers, as well as examining whether carbon tax and subsidy mechanisms used by government will lead to low-carbon manufacturing. The authors conclude that a dynamic tax and dynamic subsidy mechanism can lead to the adoption of low-carbon manufacturing, while a static tax and subsidy mechanism is not effective and does not have the necessary positive impact on low-carbon production.

National policies on pollution can be modeled and investigated using a game theoretical approach. Schüller et al. (2017) focused on the interactions between EU countries and their choices regarding green policy investments, presenting different theoretical models: a Nash game, where the investment is made based on minimal expected costs, and two imitation approaches, in which a country imitates the investments of its neighbors, either taking into consideration, or not, its neighbors' costs or profits. The authors conclude that a reduction in pollution stock can be achieved, due to external forces increasing pollution costs. In addition, incentives can be found that will make a country friendlier to the environment.

Differential game models can be used in order to effectively design conflicting situations that exist between polluters and pollution victims. For instance, Halkos and Papageorgiou (2014) set up such a differential game model that involves a country's polluting firms and its social planning, and identified the analytical expressions regarding players' control and the state of the stock variable, i.e., the volume of the polluting firms. In addition, after the transformation of the Nash game into a Stackelberg game, the authors found that the conflict becomes more intense in the Stackelberg game, in which playing as leader is preferred.

The choice of differential game models, in order to efficiently design conflicting situations between polluters and the victims of pollution, is the rule rather than the exception. In this paper, we use differential game models to study the dynamic interactions between the polluting firms in a country and the social planning mechanisms of the same country. The strength of the polluting firms as a group changes over time and is measured by the unified volume of active polluters, the transactions made among them, how dangerous to environmental amenities the polluting firms are as a unified group, and so on. New polluting firms are initiated and encouraged by the existing firms. Regarding the polluters' attrition, their decay rate is affected by their own actions and by the counter-pollution actions of the home country. The home country derives utility from the reduction of the polluting firms' emissions, but it faces substantial costs in combating the polluters and suffers disutility stemming from the size of the group of polluters. Conversely, each polluting firm wants to maximize the size of the group of polluters, as well as the utility stemming from emissions. The argument that each polluting firm wants to maximize the size of the group may seem somewhat extraordinary. In the next paragraph, we try to shed some light on this intuition.

The model we deal with in this paper is based on the description of the evolution over time of polluting firms considered as a stock. A major distinction from other models is that there are two decision makers, whereas in former models there is only one, i.e., the social planner. Furthermore, here we consider the strategic interaction of the two decision makers with conflicting interests. Therefore, it is convenient to refer to the decision maker on the polluting firms' side as "the Polluting Organization (PO)". On the other side, the opponent of the polluting organization is assumed to be the government of the country in which the polluting activities take place. Both PO and the target country have (at least) one strategic weapon in order to alter the status quo. The PO can choose the rate of emissions and the target country can choose the rate of the measures used against polluting emissions. In other words, we speak of the strong conflict between the social planner (the government) on the one hand and the polluting firms as an organization on the other. Therefore, it is plausible to conclude that the larger the number of polluting firms, the more powerful the polluting organization (PO). Obviously, the economic benefits for each polluting firm stemming from a higher level of pollution (as a part of overall emissions with lower abatement levels) are higher.

In this study we deal with a special class of differential games called state–separable games. State-separable differential games belong to the special class of dynamic games which allow, in most cases, derivation of the Nash solutions in explicit form. The advantage in obtaining analytical solutions, according to Dockner et al. (2000), is of great importance, because the derived mathematical expressions of the solutions are crucial for the study of the qualitative properties of equilibrium.

Due to the simplicity of their structure, state-separable differential games are characterized by the linearity of the objective functional with respect to the state variable(s) and by no interaction between control and state variables (Dockner et al. 1985). An important property of state-separable games is related to the information structure employed: the open-loop Nash solution coincides with the closed-loop (Markovian) Nash solution.

Another important property hinges on the way the game is played, i.e., simultaneously (Nash) or hierarchically (Stackelberg). As is known (e.g., Başar and Olsder 1999; Dockner et al. 2000), in the Stackelberg games, the adjoint variable of the leader with respect to the adjoint variable of the follower plays a crucial role in the solution process, but due to state separability the interconnection between these variables vanishes.

In the rest of the paper, we determine the Nash and the Stackelberg solutions of the environmental differential game; the state-separability advantage allows us to state some useful propositions and to carry out a sensitivity analysis. Regarding the design of efficient counter-pollution actions against the polluting firms of a country, the model parameters of the game and the relevance of the two solutions offer useful information.

The paper is organized as follows. In Section 2, we set up the basic model. Section 3 considers the solutions of the Nash equilibrium and performs a simple sensitivity analysis. In Section 4, we compute the analytical expressions of the Stackelberg equilibrium, in which the polluting firms lead and the social planner of the home country follows. Section 5 compares the two solution strategies, while the last section concludes the paper.

#### **2. The Model**

In the real world scenario, it seems plausible that the mere existence of polluting firms (POs) is considered as an intertemporal threat to any home country's environmental quality. Translated into conflicting strategies, the polluting firms, on the one hand, have to decide about the volume of the emissions they will carry out, while the home country on the other hand has to act defensively in the "war on pollution". In the model presented here the state variable of the above clash is the volume of polluting firms (the size of PO), which is denoted by x.

Moreover, the group of polluting firms (the size of PO) does not remain at the same volume: without any government intervention, new polluters supported by the existing firms are added, because it is profitable for the firms to pollute, since this reduces their operating costs during the production process. Therefore, it is reasonable to model the growth of polluting firms (the growth of PO size), in the absence of controls, as in population models. As in biological population models, a simple equation suitable to describe the evolution of the number of polluting firms at time *t*, *x*(*t*), is the following differential equation:

$$\dot{x} = gx \quad \Leftrightarrow \quad x(t) = x(0)e^{gt}, \quad x(0) > 0 \tag{1}$$

where *g* > 0 denotes the endogenous growth rate of the polluting firms (of the PO).

From now on, we deal with the possible controls that can be introduced in Equation (1). First of all, the volume of emission realizations (henceforth denoted by *υ*) reduces the number of polluting firms due to the compliance costs, i.e., the greater (the stronger) the emissions, the higher the penalties imposed by authorities, and therefore the lower the number of polluting firms that survive these costs. We assume for simplicity that this effect is proportional to the number of emission realizations, i.e., *γυ*, and since it reduces the volume of the polluters, it is added as an outflow term to Equation (1), i.e., it is entered into (1) with a minus sign.

Moreover, we set the intensity *u* of the counter-emissions effort as the control variable of the home country. The greater the intensity of the counter-emissions effort, the more resources are devoted to investigating the implications of emission realization. Moreover, the stronger the home country's counter-pollution effort, the more effective the reduction of the polluting firms. We assume that this effect is captured by the linear term *f*(*u*) = *βu*, where the parameter *β* denotes the percentage losses per emission realization, on the part of the polluting firms, when the social planner of the country abates (or taxes) the pollutants (a policy which is often called counter-offensive). Since the above term, *f*(*u*) = *βu*, reduces the volume of the polluting firms, we add it as the second outflow term to (1), weighted by the volume of emissions *υ*, i.e., the outflow term is *βuυ*.

Regarding the control variable of the home country, i.e., the intensity of the counter-pollution effort, this control certainly reduces the number of the polluting firms and therefore a new negative term is entered into Equation (1). This term represents the losses due to the intensity of counter-measures at the initiation phase and is proportional to the control *u*, i.e., it is the term *φu*. It is worth noting that taking measures against the polluting firms' initiation is a very sensitive process, as the planner of the home country has to discriminate among the firms. Since the discrimination process carries risks (e.g., the environmental taxation must be fair for all the people), we add an inflow to Equation (1) that is a quadratic function of the intensity of pollution control measures, i.e., the term (*a*/2)*u*<sup>2</sup> (it is based on the square of abatement or taxation).

Putting these terms together, the volume of polluting firms, i.e., the size of PO, evolves according to the following equation:

$$\frac{d\mathbf{x}}{dt} = \dot{\mathbf{x}} = \mathbf{g}\mathbf{x} - \boldsymbol{\phi}\boldsymbol{u} + \frac{a}{2}\boldsymbol{u}^2 - \gamma\boldsymbol{v} - \beta\boldsymbol{u}\boldsymbol{v}$$

where:

*x* ≥ 0 the state variable (the number of polluting firms or the size of PO);

*u* ≥ 0 the control variable of the home country, i.e., the intensity of the home country's counter-pollution effort;

*υ* ≥ 0 the emissions rate (control variable of the polluting firms acting as an organization);

*g* ≥ 0 the endogenous growth rate of the group of polluters (of PO);

*φ* ≥ 0 the rate at which the counter-pollution measures reduce the size of PO;

*a*/2 ≥ 0 the cost factor which the home country faces due to unsuccessful discrimination among the firms during the abatement (or taxation);

*β* ≥ 0 the percentage losses of the polluters per emission;

*γ* ≥ 0 the average number of polluting firms which are not able to bear the compliance costs.
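To make the roles of the parameters concrete, the following minimal sketch integrates the state equation numerically for constant controls. The helper names `x_dot` and `simulate` and all parameter values are illustrative assumptions, not taken from the paper; a classical Runge-Kutta scheme is used only as a convenient integrator.

```python
import math


def x_dot(x, u, v, g, phi, a, gamma, beta):
    """State equation: dx/dt = g*x - phi*u + (a/2)*u**2 - gamma*v - beta*u*v."""
    return g * x - phi * u + 0.5 * a * u ** 2 - gamma * v - beta * u * v


def simulate(x0, u, v, params, t_end=5.0, steps=5000):
    """Integrate the state equation with classical RK4, holding u and v constant."""
    h = t_end / steps
    x = x0
    for _ in range(steps):
        k1 = x_dot(x, u, v, **params)
        k2 = x_dot(x + 0.5 * h * k1, u, v, **params)
        k3 = x_dot(x + 0.5 * h * k2, u, v, **params)
        k4 = x_dot(x + h * k3, u, v, **params)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x


# Hypothetical parameter values, chosen only for illustration.
params = dict(g=0.05, phi=1.0, a=0.5, gamma=0.4, beta=0.3)

x_free = simulate(10.0, u=0.0, v=0.0, params=params)  # no controls: pure growth
x_ctrl = simulate(10.0, u=0.8, v=0.5, params=params)  # both controls active

print("uncontrolled stock at t=5:", x_free)
print("closed form x(0)*exp(g*t):", 10.0 * math.exp(0.05 * 5.0))
print("controlled stock at t=5:  ", x_ctrl)
```

With both controls set to zero the simulated path should reproduce the exponential growth of Equation (1); with these particular values the controls pull the stock below the uncontrolled path.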

Regarding the players' payoffs, we assume in this paper that the social planner of the home country pursues the following objectives. First, the planner wants to minimize the volume of emissions *υ* and, second, to minimize the number of the polluting firms *x*, i.e., the size of PO (which is the state variable of the model). An important reason for which the social planner may wish to minimize the volume of polluting firms (or the size of PO) is that the threat of pollutant concentration is costly for the home country. These costs are associated with the uncertainty of business investments and in turn lead to market shrinkage. As the third objective, the home country has an interest in minimizing the counter-pollution effort (e.g., in lowering the environmental tax factor), by minimizing its control variable *u*. It is well known that pollution-control activities cost money, as does almost any control policy. In the decision-making literature, the social planner, in intertemporal formulations, is described as trying to minimize a weighted sum of the state *x* and the opponent's control *υ*, as well as the effort cost stemming from its own control variable *u*. Therefore, under the above simplifying assumptions and with a positive discount rate *ρ*<sub>1</sub>, the intertemporal functional minimized by the social planner is the following

$$\min\_{u(.)} \int\_{0}^{\infty} e^{-\rho\_1 t} (c\_1 x + c\_2 v + c\_3 u) dt \tag{2}$$

The polluting firms as a group, i.e., the PO, on the other hand, are interested in increasing their number *x* in order to exert more market power. The emissions' rate *υ* is their control variable, which they seek to maximize. However, the emission realizations cost money and this cost is represented in the objective functional by the quadratic cost function (*c*<sub>4</sub>/2)*υ*<sup>2</sup>. Regarding the polluting firms' benefits with respect to the counter-pollution effort, i.e., the home country's control variable *u*, high values of that control may work as an indirect way of stirring up sentiments against the home government's environmental policy. Therefore, we represent this displeasure as a benefit for the polluting firms (as a PO benefit) and we enter it in their objective functional as the weighted term *b*<sub>3</sub>*u*.

Therefore, for a positive discount rate *ρ*<sub>2</sub>, the intertemporal objective function of the representative polluting firm may be the following

$$\max\_{\boldsymbol{\nu}(\cdot)} \int\_{0}^{\infty} e^{-\rho\_2 t} \left( b\_1 \mathbf{x} + b\_2 \boldsymbol{\nu} + b\_3 \boldsymbol{u} - \frac{c\_4}{2} \boldsymbol{\nu}^2 \right) dt \tag{3}$$

$$\text{with } \rho_i > g, \quad i = 1, 2 \tag{4}$$

Hereafter and in the games that follow, the home country minimizes functional (2) and the polluting firms, i.e., the PO, maximize (3) subject to (1) and the path constraints

$$x, u, \upsilon \geq 0$$

In the next sections, we proceed with the calculation of both Nash and Stackelberg equilibrium solutions.

#### **3. Nash Equilibrium**

The Nash equilibrium computation is derived under the assumption that both players play the game at the same time. Then, every player of the game (i.e., the home country and the polluting firms) has to solve their own optimal control problem, taking the opponent's reaction as given. Finally, the two optimal control solutions determine the game optimal strategies *u*∗, *υ*∗. In the following, we denote by *λ* and *μ* the shadow prices of the state variable *x* for the country and the polluting firms, respectively. Now, the current value Hamiltonians of the game described above are given by

$$H_1 = -c_1 x - c_2 \upsilon - c_3 u + \lambda \left(g x - \phi u + \frac{a}{2}u^2 - \gamma \upsilon - \beta u \upsilon\right) \tag{5}$$

$$H_2 = b_1 x + \left(b_2 - \frac{c_4}{2}\upsilon\right)\upsilon + b_3 u + \mu\left(g x - \phi u + \frac{a}{2}u^2 - \gamma\upsilon - \beta u \upsilon\right) \tag{6}$$
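As a brief check of why this game is state-separable, the derivatives of the two Hamiltonians with respect to the state can be written out directly, with $u$ denoting the home country's control and $\upsilon$ the emissions rate:

$$\frac{\partial H_1}{\partial x} = -c_1 + \lambda g, \qquad \frac{\partial H_2}{\partial x} = b_1 + \mu g, \qquad \frac{\partial^2 H_i}{\partial x^2} = \frac{\partial^2 H_i}{\partial x\,\partial u} = \frac{\partial^2 H_i}{\partial x\,\partial \upsilon} = 0, \quad i = 1, 2$$

Neither first derivative involves $x$, $u$ or $\upsilon$, so the costate equations decouple from the state and from the controls, and the first-order conditions for $u$ and $\upsilon$ do not contain $x$.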

**Proposition 1.** *Along the optimal path, the shadow price λ of the state variable x for the country is always negative, since one additional polluting firm is always harmful for the environmental quality of the country. Conversely, the shadow price μ of the volume of the polluting firms x (of the size of PO) for the PO is positive along the optimal path, because one more polluting firm added to the PO increases the benefits of the polluting group as a whole*.

**Proof.** The result is obtained through the optimality conditions of Pontryagin's maximum principle, i.e.,

$$
\dot{\lambda} = (\rho_1 - g)\lambda + c_1 \tag{7}
$$

$$\text{with the equilibrium } \dot{\lambda} = 0 \Rightarrow \hat{\lambda} = -\frac{c_1}{\rho_1 - g} < 0 \tag{8a}$$

and the shadow price of the polluting firms evolves according to the following equation

$$\begin{aligned} \dot{\mu} &= (\rho_2 - g)\mu - b_1 \\ \text{with equilibrium } \hat{\mu} &= \frac{b_1}{\rho_2 - g} > 0 \end{aligned} \tag{8b}$$

According to (8a), the long-run damage for the country increases as one more polluting firm is added to the volume of polluting firms (PO) (i.e., as *λ̂* increases in absolute value). This is the result of an increasing cost associated with the existence of a polluting firm (i.e., the factor *c*<sub>1</sub> in the home country's objective functional). This intuitive conclusion supports the correctness of the model setup. Note that, according to basic theorems of optimal control theory, the transversality conditions hold for all admissible state trajectories (see, e.g., Grass et al. 2008). □
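The stationary shadow prices in (8a) and (8b) are simple enough to evaluate directly. The sketch below, with hypothetical function names and illustrative parameter values, computes them, confirms that they set the costate rates of change to zero, and illustrates how the polluters' shadow price responds to the discount rate, to the weight on the stock, and to the growth rate.

```python
def stationary_shadow_prices(c1, b1, rho1, rho2, g):
    """Stationary costates from (8a) and (8b); condition (4) requires rho_i > g."""
    if rho1 <= g or rho2 <= g:
        raise ValueError("condition (4) requires rho_i > g for i = 1, 2")
    lam_hat = -c1 / (rho1 - g)  # home country's shadow price of x (negative)
    mu_hat = b1 / (rho2 - g)    # polluting firms' shadow price of x (positive)
    return lam_hat, mu_hat


def costate_rates(lam, mu, c1, b1, rho1, rho2, g):
    """Right-hand sides of the costate equations (7) and (8b)."""
    return (rho1 - g) * lam + c1, (rho2 - g) * mu - b1


# Hypothetical parameter values, for illustration only.
c1, b1, rho1, rho2, g = 2.0, 1.5, 0.10, 0.12, 0.05

lam_hat, mu_hat = stationary_shadow_prices(c1, b1, rho1, rho2, g)
rates = costate_rates(lam_hat, mu_hat, c1, b1, rho1, rho2, g)

# Comparative statics of mu_hat by small parameter perturbations.
mu_higher_rho2 = stationary_shadow_prices(c1, b1, rho1, rho2 + 0.01, g)[1]
mu_higher_b1 = stationary_shadow_prices(c1, b1 + 0.1, rho1, rho2, g)[1]
mu_higher_g = stationary_shadow_prices(c1, b1, rho1, rho2, g + 0.01)[1]

print("lambda_hat =", lam_hat, " mu_hat =", mu_hat)
print("costate rates at the stationary point:", rates)
```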

For the analysis that follows, it is assumed that only interior solutions exist and that they are positive, i.e., *u*∗, *υ*∗ > 0. According to Pontryagin's maximum principle, the maximizing condition of the Hamiltonian for the intensity of the home country's pollution-control effort (the home country's control variable) is given by

$$\begin{aligned} \frac{\partial H_1}{\partial u} = 0 &\Leftrightarrow \ -c_3 - \lambda \phi - \lambda \beta \upsilon + \lambda a u = 0 \quad \Leftrightarrow\\ u^* &= \frac{1}{a} \left(\frac{c_3}{\lambda} + \phi + \beta \upsilon\right) \end{aligned} \tag{9}$$

The result (9) is recorded in Proposition 2.

**Proposition 2.** *The optimal strategy of counter-pollution effort u*∗ *increases with the volume of emissions υ, with the percentage losses of the polluters per emission β, and with the rate φ at which the counter-pollution measures reduce the size of PO.*

*The cost factor which the home country faces due to unsuccessful discrimination among the firms of the country during the exercise of the counter-pollution measures* (*a*/2) *has a decreasing influence on the home country's intensity of conducting the above effort*.

Inspecting the analytical expression (9) of the control variable, it is worth noting that if the cost of control (*c*<sub>3</sub>) is large relative to the shadow price *λ* (which is negative along the optimal path), the country's optimal strategy *u*∗ takes a low value and possibly meets the boundary at *u*∗ = 0; in this case, it is optimal for the country not to exert any counter-pollution control. Conversely, if the cost of the control is negligible with respect to the shadow price *λ*, the term *c*<sub>3</sub>/*λ* in condition (9) vanishes and the country's optimal strategy is a linear function of emissions *υ*.

Turning to the polluting firms' (PO) problem, we take the Hamiltonian maximizing condition, which is given by

$$\frac{\partial H_2}{\partial \upsilon} = 0 \iff b_2 - c_4 \upsilon - \mu(\gamma + \beta u) = 0 \iff \upsilon^* = \frac{b_2}{c_4} - \frac{\mu}{c_4}(\gamma + \beta u) \tag{10}$$

We record the result (10), as

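**Proposition 3.** *The optimal rate of the polluting firms' (of the PO) emissions υ*∗ *decreases with the shadow price μ of the polluting firms and with the intensity u of the home country's counter-pollution effort.*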


According to (10), if the shadow price of the polluting firms is raised, then it is optimal for the polluting firms (the PO) to curb their emissions' rate. Conversely, along the polluters' optimal path, the rate of emissions increases as the benefits (*b*<sub>2</sub>) accrued by the emissions increase relative to the costs *c*<sub>4</sub>.

The following is a useful corollary of the optimality conditions (9) and (10): "Along the home country's optimal path, the intensity of pollution-control measures rises as the rate of emissions increases, and the rate of emissions declines as the intensity of the counter-pollution measures increases".

As the next step, we explicitly calculate the stationary values of the crucial variables of the game.

#### **Proposition 4.**

*i. The stationary values of the strategies in Nash equilibrium are the following:*

$$\begin{array}{l} \hat{u}_{N} = \frac{\beta(b_{2} - \hat{\mu}\gamma) + c_{4}\left(\phi + c_{3}/\hat{\lambda}\right)}{c_{4}a + \hat{\mu}\beta^{2}}\\ \hat{\upsilon}_{N} = \frac{a(b_{2} - \hat{\mu}\gamma) - \hat{\mu}\beta\left(\phi + c_{3}/\hat{\lambda}\right)}{c_{4}a + \hat{\mu}\beta^{2}} \end{array} \tag{11}$$

*where λ̂ and μ̂ are given by (8a) and (8b), while the subscript N in (11) denotes the Nash solution*.

*ii. The Nash equilibrium value for the number of polluting firms is given by*

$$\hat{x}_{N} = \frac{1}{g} \left[ \left( \phi - \frac{a}{2} \hat{u}_{N} \right) \hat{u}_{N} + \left( \gamma + \beta \hat{u}_{N} \right) \hat{\upsilon}_{N} \right] \tag{12}$$

*with û<sub>N</sub> and υ̂<sub>N</sub> as in (11)*.

**Proof.** See Appendix A. □
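The stationary Nash strategies can be cross-checked numerically. The sketch below is a minimal illustration under assumed parameter values: it evaluates the closed-form expressions of Proposition 4 and verifies that they form a fixed point of the two best responses obtained by differentiating the Hamiltonians (5) and (6) with respect to each player's own control. All function names and numbers are hypothetical.

```python
def nash_closed_form(b2, c3, c4, a, beta, gamma, phi, lam, mu):
    """Stationary Nash controls: closed-form expressions of Proposition 4."""
    den = c4 * a + mu * beta ** 2
    shift = phi + c3 / lam
    u = (beta * (b2 - mu * gamma) + c4 * shift) / den
    v = (a * (b2 - mu * gamma) - mu * beta * shift) / den
    return u, v


def best_response_u(v, c3, a, beta, phi, lam):
    """From dH1/du = 0:  u = (c3/lam + phi + beta*v) / a."""
    return (c3 / lam + phi + beta * v) / a


def best_response_v(u, b2, c4, beta, gamma, mu):
    """From dH2/dv = 0:  v = (b2 - mu*(gamma + beta*u)) / c4."""
    return (b2 - mu * (gamma + beta * u)) / c4


def stationary_stock(u, v, g, phi, a, gamma, beta):
    """Stock at which dx/dt = 0 for constant controls u and v."""
    return ((phi - 0.5 * a * u) * u + (gamma + beta * u) * v) / g


# Hypothetical values; lam and mu stand for the stationary shadow prices.
p = dict(b2=3.0, c3=0.2, c4=1.0, a=2.0, beta=0.5, gamma=0.4, phi=1.0)
lam, mu, g = -40.0, 1.0, 0.05

u_n, v_n = nash_closed_form(lam=lam, mu=mu, **p)
x_n = stationary_stock(u_n, v_n, g, p["phi"], p["a"], p["gamma"], p["beta"])

# Each closed-form control should be the best response to the other one.
u_check = best_response_u(v_n, p["c3"], p["a"], p["beta"], p["phi"], lam)
v_check = best_response_v(u_n, p["b2"], p["c4"], p["beta"], p["gamma"], mu)

print("u_N =", u_n, " v_N =", v_n, " x_N =", x_n)
print("fixed-point residuals:", u_n - u_check, v_n - v_check)
```

Because both best responses are linear in the opponent's control, the fixed point is unique whenever the common denominator is non-zero, which is in line with the uniqueness remark that follows.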

Here, it is worth noting that, thanks to the structure of state-separable games, we have the advantage of finding the analytical expressions of the controls as well as the expression of the state variable. Solution (11) is a unique closed-loop Nash equilibrium. This is rather unusual, since multiple equilibrium solutions are more common in differential games. Due to the analytical expressions (11) and (12), it is easy to proceed with a sensitivity analysis with respect to the model parameters.

Table 1 presents the results of the sensitivity analysis. Taking the partial derivatives of the stationary controls (e.g., of *υ̂*<sub>N</sub>), the symbol "+" means that the partial derivative is greater than zero, the symbol "−" means the opposite case, 0 indicates that the partial derivative is zero (the parameter is not a part of the control), and "?" denotes that the sign is unknown. The results in Table 1 make economic sense. Taking into account (8b), the shadow price *μ̂* for the polluting firms decreases with the discount rate *ρ*<sub>2</sub>, but increases with the factor *b*<sub>1</sub> and with the endogenous growth rate *g*. Taking into account (11), the stationary value of the polluting firms *x̂*<sub>N</sub> decreases with increasing endogenous growth rate *g* (when the control cost factor *c*<sub>3</sub> equals zero).


#### **4. The Leader–Follower Game (Polluting Firms (the PO) as a Leader)**

In the Nash equilibrium solution, as illustrated above, it is assumed that the two-player game is played simultaneously, i.e., the moves of the rivals are made at the same time. As mentioned above, in this paper we also explore the other class of games, in which one player, the leader, moves first, and the opponent, the follower, makes his/her decision second. This hierarchical or sequential mode of playing is known as the leader–follower or Stackelberg model. In the game theory literature, e.g., Başar and Olsder (1999), at least one stepwise procedure to derive the equilibrium solution has been developed. In order to describe (for completeness) the solution procedure, we assume, without any loss of generality, that the first player (the polluting firms (the PO)) is the leader and the second (the social planner of the country) is the follower. The control and the adjoint variables of the leader are denoted by *υ* and *μ*, *ψ*, respectively, and by *u* and *λ* we denote the same variables for the follower. Moreover, for simplicity we assume in the analysis that follows that the cost of pollution control vanishes, i.e., *c*<sub>3</sub> = 0.

The three step procedure for the open-loop Stackelberg solution (e.g., Grass et al. 2008; Dockner et al. 2000; Başar and Olsder 1999) is as follows:

**Step 1:** The polluting firms, as group (i.e., the PO), announce their common strategy, *υ*.

**Step 2:** For the given strategy *υ*, the social planner of the country (the follower) solves the same optimal control problem as in the Nash case. As mentioned in the Nash case (see (9)), the home country's optimal response to the strategy *υ* of the polluting firms (the PO) will be

$$
u^* = u^*(\upsilon) = \frac{1}{a}(\phi + \beta \upsilon) \tag{13}
$$

since it is assumed that *c*<sub>3</sub> = 0, and the shadow price *λ* for the follower is given by Equation (7).

**Step 3:** Now, in the last step, the leader has to solve the same optimal control problem as in the Nash case, but for the known reaction function (13) of the follower:

$$\max\_{\boldsymbol{\nu}(.)} \int\_{0}^{\infty} e^{-\rho\_2 t} \left( b\_1 \boldsymbol{x} + \left( b\_2 - \frac{c\_4}{2} \boldsymbol{\nu} \right) \boldsymbol{\nu} + b\_3 \boldsymbol{u}^\*(\boldsymbol{\nu}) \right) d\boldsymbol{t}$$

subject to the following two equations

$$\dot{x} = g x - \left(\phi - \frac{a}{2}u^*(\upsilon)\right)u^*(\upsilon) - \gamma\upsilon - \beta u^*(\upsilon)\upsilon \tag{14}$$

$$
\dot{\lambda} = (\rho\_1 - \mathbf{g})\lambda + c\_1 \tag{15}
$$

with *u*∗(*υ*) given by (13).

The Hamiltonian of the leader (the polluting firms) becomes

$$H_2 = b_1 x + \left(b_2 - \frac{c_4}{2}\upsilon\right)\upsilon + b_3 u^*(\upsilon) + \mu\dot{x} + \psi\dot{\lambda} \tag{16}$$

The adjoint variables *μ* and *ψ* are the shadow values of the states *x* and *λ*, for which the equations of motion are given by (14) and (15), respectively. Taking the first order condition for the Hamiltonian (16), i.e., *∂H*<sub>2</sub>/*∂υ* = 0, we find the optimal strategy *υ*∗. The stationary strategies are then calculated by substituting the leader's optimal strategy into (13). The final expressions are given below as (17) and (18), together with their associated propositions.

#### **Proposition 5.**

*i. The optimal strategies for the social planner and the polluting firms (the PO) of the hierarchical game are given, respectively by the following expressions*

$$\begin{array}{l} \hat{u}_S = \frac{\beta(b_2 - \hat{\mu} \gamma) + c_4 \phi + b_3 \left(\beta^2 / a\right)}{c_4 a + \hat{\mu} \beta^2} \\ \hat{\upsilon}_S = \frac{a(b_2 - \hat{\mu} \gamma) - \beta(\hat{\mu}\phi - b_3)}{c_4 a + \hat{\mu} \beta^2} \end{array} \tag{17}$$

*ii. The number of polluting firms (the size of PO) is given by the expression*

$$
\hat{x}_S = \frac{1}{g} \left( \left( \phi - \frac{a}{2} \hat{u}_S \right) \hat{u}_S + \left( \gamma + \beta \hat{u}_S \right) \hat{\upsilon}_S \right) \tag{18}
$$

*and the optimal controls are given by (17), with the subscript S to denote the Stackelberg strategy.*

**Proof.** See Appendix A. □

Since the analytical expressions of the optimal strategies are computed for both types of the game, in the next section we compare these values. The reverse case, in which the country moves first as a leader and the polluting firms follow, is not examined here and is left for future research.
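The three-step procedure can be illustrated with a short numerical sketch. Under the stated assumption of a vanishing control cost, the follower's reaction is obtained from the first-order condition of Hamiltonian (5), and the leader's first-order condition is evaluated after substituting that reaction. Function names and parameter values are hypothetical; the stationary shadow price of the leader is passed in as `mu`.

```python
def follower_response(v, a, beta, phi):
    """Follower's reaction with c3 = 0, from dH1/du = 0:  u = (phi + beta*v) / a."""
    return (phi + beta * v) / a


def leader_foc(v, b2, b3, c4, a, beta, gamma, phi, mu):
    """dH2/dv for the leader after substituting the follower's reaction.

    The indirect effect through u in the state equation is proportional to
    (a*u - phi - beta*v), which is zero along the reaction function, and the
    psi-term drops out because the follower's costate equation contains no v.
    """
    u = follower_response(v, a, beta, phi)
    return b2 - c4 * v + b3 * beta / a - mu * (gamma + beta * u)


def stackelberg_closed_form(b2, b3, c4, a, beta, gamma, phi, mu):
    """Stationary Stackelberg controls: closed-form expressions of Proposition 5."""
    den = c4 * a + mu * beta ** 2
    u = (beta * (b2 - mu * gamma) + c4 * phi + b3 * beta ** 2 / a) / den
    v = (a * (b2 - mu * gamma) - beta * (mu * phi - b3)) / den
    return u, v


# Hypothetical parameter values, for illustration only.
q = dict(b2=3.0, b3=0.6, c4=1.0, a=2.0, beta=0.5, gamma=0.4, phi=1.0, mu=1.0)

u_s, v_s = stackelberg_closed_form(**q)
foc_residual = leader_foc(v_s, **q)
u_reaction = follower_response(v_s, q["a"], q["beta"], q["phi"])

print("u_S =", u_s, " v_S =", v_s)
print("leader FOC residual:", foc_residual)
print("follower reaction at v_S:", u_reaction)
```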

#### **5. Comparison of the Two Solutions**

Taking the Nash solutions (11) and the Stackelberg solutions (17), the optimal controls can be expressed as

$$
\hat{u}_S = \hat{u}_N + \frac{\beta}{a}\Delta, \qquad \hat{\upsilon}_S = \hat{\upsilon}_N + \Delta
$$

while

$$
\Delta = \frac{b\_3 \beta}{c\_4 a + \beta^2 \hat{\mu}} > 0 \tag{19}
$$

That is, the difference between the optimal stationary strategies is given by (19). Some remarks can be drawn about the difference between the two solutions of the same game. In particular, since Δ > 0, both stationary controls are higher in the Stackelberg game than in the Nash game:

$$
\hat{u}_S > \hat{u}_N \text{ and } \hat{\upsilon}_S > \hat{\upsilon}_N
$$

This means that the conflict will be more intense if the group of polluting firms has the first mover advantage and announces the volume of emissions to be carried out (compared to the simultaneous move game). Consequently, the next result follows.

**Proposition 6.** *The pollution control hierarchical game, in which the group of polluting firms (the PO) is the leader and the country the follower, results in a higher volume of emissions and in a more intensive counter-pollution effort, i.e., the conflict between the players is more intensive*.
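The relation behind Proposition 6 can be verified term by term. Setting $c_3 = 0$ in the Nash expressions (as assumed in Section 4) and subtracting them from the Stackelberg expressions over the common denominator $c_4 a + \hat{\mu}\beta^2$ gives

$$\hat{\upsilon}_S - \hat{\upsilon}_N = \frac{a(b_2 - \hat{\mu}\gamma) - \beta(\hat{\mu}\phi - b_3) - a(b_2 - \hat{\mu}\gamma) + \hat{\mu}\beta\phi}{c_4 a + \hat{\mu}\beta^2} = \frac{b_3\beta}{c_4 a + \hat{\mu}\beta^2} = \Delta$$

$$\hat{u}_S - \hat{u}_N = \frac{\beta(b_2 - \hat{\mu}\gamma) + c_4\phi + b_3\beta^2/a - \beta(b_2 - \hat{\mu}\gamma) - c_4\phi}{c_4 a + \hat{\mu}\beta^2} = \frac{\beta}{a}\,\Delta$$

Both differences are positive whenever $b_3$, $\beta$ and $a$ are strictly positive, since $\hat{\mu} > 0$ by (8b).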

The difference between the equilibrium values (12) and (18) is positive, that is *D* = *x̂*<sub>S</sub> − *x̂*<sub>N</sub> > 0, and therefore we can conclude that, for the polluting firms (the PO), being the leader is the better position, since the size of the PO increases by *D*.

The linear state equations associated with (12) and (18) can be explicitly solved with respect to the size of the PO, which is the state variable *x*(*t*), yielding:

$$x_{N}(t) = x_{N0}e^{gt} + \hat{x}_{N}\left(1 - e^{gt}\right)$$

$$x_{S}(t) = x_{S0}e^{gt} + \hat{x}_{S}\left(1 - e^{gt}\right)$$

Additionally, the value functions for the Nash and Stackelberg equilibria are easily computed as:

$$\begin{split} V_{2,N} &= \int_{0}^{\infty} e^{-\rho_2 t} \left( b_1 x + b_2 \hat{\upsilon}_N + b_3 \hat{u}_N - \frac{c_4}{2} \hat{\upsilon}_N^2 \right) dt = \\ &= b_1 \frac{\rho_2 x_{N0} - g \hat{x}_N}{\rho_2 (\rho_2 - g)} + \frac{2b_2 \hat{\upsilon}_N - c_4 \hat{\upsilon}_N^2 + 2b_3 \hat{u}_N}{2\rho_2} \end{split}$$

and

$$\begin{split} V_{2,S} &= \int_{0}^{\infty} e^{-\rho_2 t} \left( b_1 x + b_2 \hat{\upsilon}_S + b_3 \hat{u}_S - \frac{c_4}{2} \hat{\upsilon}_S^2 \right) dt = \\ &= b_1 \frac{\rho_2 x_{S0} - g \hat{x}_S}{\rho_2 (\rho_2 - g)} + \frac{2b_2 \hat{\upsilon}_S - c_4 \hat{\upsilon}_S^2 + 2b_3 \hat{u}_S}{2\rho_2} \end{split}$$

Moreover, the difference of the two value functions

$$V\_{2,S} - V\_{2,N} = \frac{b\_3 \beta \Delta}{2 \rho\_2 a} > 0$$

is positive, and therefore it is better for the group of the polluting firms to lead, playing the Stackelberg strategy, than to play the Nash strategy. This result is recorded as Proposition 7.

**Proposition 7.** *In the environmental pollution game between the polluting firms of a country and the social planner of the same country, the more beneficial strategy, on the polluters' side, is the strategy in which they lead (and the home country follows) in a Stackelberg setting*.
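The closed-form value of the polluters' objective for constant stationary controls can be checked against direct numerical integration of the discounted payoff along the explicit state path. The sketch below is only an illustration: the function names, the truncation horizon, and every numerical value are assumptions, and composite Simpson quadrature is used as a generic integrator.

```python
import math


def path(t, x0, x_hat, g):
    """Explicit state path for constant controls: x0*e^{gt} + x_hat*(1 - e^{gt})."""
    growth = math.exp(g * t)
    return x0 * growth + x_hat * (1.0 - growth)


def value_closed_form(x0, x_hat, u, v, b1, b2, b3, c4, rho2, g):
    """Discounted payoff of the polluting firms in closed form (requires rho2 > g)."""
    stock_part = b1 * (rho2 * x0 - g * x_hat) / (rho2 * (rho2 - g))
    control_part = (2.0 * b2 * v - c4 * v ** 2 + 2.0 * b3 * u) / (2.0 * rho2)
    return stock_part + control_part


def simpson(f, lo, hi, n):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return total * h / 3.0


def value_numeric(x0, x_hat, u, v, b1, b2, b3, c4, rho2, g, horizon=300.0, n=60000):
    """Truncated numerical approximation of the infinite-horizon integral."""
    flow = b2 * v + b3 * u - 0.5 * c4 * v ** 2

    def integrand(t):
        return math.exp(-rho2 * t) * (b1 * path(t, x0, x_hat, g) + flow)

    return simpson(integrand, 0.0, horizon, n)


# Hypothetical inputs, for illustration only.
args = dict(x0=40.0, x_hat=37.63, u=1.02, v=2.09,
            b1=0.07, b2=3.0, b3=0.6, c4=1.0, rho2=0.12, g=0.05)

v_closed = value_closed_form(**args)
v_numeric = value_numeric(**args)
print("closed form:", v_closed, " numerical:", v_numeric)
```

The same functions can be evaluated at the Nash and at the Stackelberg stationary values to compare the polluters' payoffs under any chosen parameter set.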

#### **6. Conclusions**

In this paper, we set up a differential game model between the polluting firms of a country and the social planner of the same country. The model belongs to the special tractable class of state-separable games. In this class of games, the open-loop Nash equilibrium coincides with the closed-loop (Markovian) equilibrium. In solving the simultaneous move game (the Nash case), we found analytical expressions for both players' controls as well as for the steady state of the stock variable (which is the volume of the polluting firms). The sensitivity analysis, which relates the controls to the crucial variables of the model, makes economic sense.

Some results, based on the propositions proved in the main text, are as follows. First of all, in the simultaneous move game, the first proposition, which serves as a verification of the correctness of our model, states that a marginal increment in the size of the polluting group (one more polluting firm added to the entire stock) is always harmful for the environmental quality of the country. The measure of this damage is the negative value of the shadow price of the size of polluting firms, as seen from the side of the country's social planner. It is worth noting that the shadow price (expressed by the adjoint or costate variable of the state variable) is the most common and most accurate measure in game-theoretic analysis.

Conversely, the same result, seen from the polluting firms' side, says that a marginal increment in the size of the group of polluting firms increases their benefits. Again, as game theory dictates, we work with the shadow prices, but from the side of the polluting firms. The second result considers the conditions under which any counter-pollution strategy on behalf of the social planner would be optimal. The proved results are based on the variables of the proposed model. More specifically, a counter-pollution strategy is optimal if this strategy increases with a rising volume of emissions, with an increasing percentage loss of polluting firms and with increased effectiveness of pollution control on behalf of the social planner of the country. The third result tackles the polluting firms, as a group, regarding their strategy.

As a policy implication, the group of polluting firms should bear in mind that their optimal strategy decreases with the polluting firms' abandonment rate, with increasing losses per emission, and when the shadow price of the social planner of the country (which measures the size of the polluting firms) increases. The latter case, i.e., an increasing shadow price of the social planner, indicates a beneficial situation from the planner's side (which is therefore harmful for the polluting firms as a group). The fourth result is very technical and gives precisely computed expressions of the strategies of both players. It is worth noting that the equilibrium value (for the simultaneous move game) of the number of polluting firms depends on the inverse of the intrinsic growth rate *g*, and also on the equilibrium strategies of both players (*û*<sub>*N*</sub>, *υ̂*<sub>*N*</sub>). This shows that the evolution of the size of the polluting firms, i.e., the state variable *x̂*<sub>*N*</sub>, is time-consistent with a more demanding property, which also means that the state variable is not only a function of time.

In the section on the dynamic hierarchical setting, in order to compare the simultaneous move game with the hierarchical game, we compute the exact equilibrium values of the Stackelberg game. This result is recorded in Proposition 5. Comparing the strategies under the two equilibrium patterns shows that the Stackelberg strategies are superior to the Nash strategies, showing again that the model and its parameters obey economic theory. With this model, we conclude that the difference between the strategies becomes smaller (i.e., the Nash equilibrium strategies approach the Stackelberg ones) as the losses per emission on behalf of the polluting firms decrease. A major result drawn from this hierarchical setting is that the conflict between the players of the game is more intense in this case than in the Nash case, and the first-mover advantage is still present, since it is proved that the size of the polluting firms is greater in the Stackelberg case, in which the polluting firms lead. Similarly, computing the payoffs for both players, the proposed model proves that it is more beneficial for the polluting firms to play the hierarchical game, in which they lead.

Finally, one major limitation of the proposed model is that, because the polluting firms are treated as acting as a group, there is no room for further conclusions and policy implications about the behaviour of the individual firms within the group. This drawback is under consideration for future improvement. Another limitation is that, in the Stackelberg setting, the model is restricted to one case only, in which the polluting firms as a group announce their policy first (i.e., the volume of the pollutants that they would emit) and are therefore the leader of the hierarchical game. Undoubtedly there is also the opposite case, in which the policy maker of the country announces its policy first and is the leader of the same hierarchical game. This case would be the second extension of our primary model in future research.

**Author Contributions:** G.E.H. and G.J.P. contributed equally. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** Thanks are due to the Editor and the anonymous reviewers for their helpful and constructive comments. Any remaining errors are solely the authors' responsibility.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

**Proof of Proposition 4.** The Hamiltonian of the country's social planner is given by (5) in the main text as follows

$$H_1 = -c_1 x - c_2 \upsilon - c_3 u + \lambda \left( g x - \phi u + \frac{a}{2} u^2 - \gamma \upsilon - \beta u \upsilon \right)$$

and the first order condition *∂H*<sub>1</sub>/*∂u* = 0 becomes −*c*<sub>3</sub> + *λ*(−*φ* + *au* − *βυ*) = 0.

Substituting the optimal value of the opponent's control *υ*<sup>∗</sup> given by (10) in the main text, i.e., *υ*<sup>∗</sup> = *b*<sub>2</sub>/*c*<sub>4</sub> − *μ*(*γ* + *βu*)/*c*<sub>4</sub>, the first order condition yields:

$$\partial H_1/\partial u = -c_3 + \lambda \left( -\phi + a u - \beta \left( b_2/c_4 - \mu(\gamma + \beta u)/c_4 \right) \right) = 0$$

from which the optimal Nash strategy for the social planner, *û*<sub>*N*</sub>, is as follows

$$\hat{u}_N = \frac{c_4(\phi + c_3/\hat{\lambda}) + \beta(b_2 - \hat{\mu}\gamma)}{c_4 a + \hat{\mu}\beta^2}$$

Following the same steps as for the social planner's Nash strategy, but with the polluters' Hamiltonian, we have from (6) and (9) in the main text

$$\begin{aligned} & H_2 = b_1 x + \left( b_2 - \frac{c_4}{2} \upsilon \right) \upsilon + b_3 u + \mu \left( g x - \phi u + \frac{a}{2} u^2 - \gamma \upsilon - \beta u \upsilon \right) \Rightarrow \\ & \partial H_2 / \partial \upsilon = -c_4 \upsilon + b_2 + \mu \left( -\gamma - \beta u \right) = 0 \stackrel{(9)}{\Leftrightarrow} \\ & -c_4 \upsilon + b_2 + \mu \left( -\gamma - \beta \frac{1}{a} \left( \frac{c_3}{\lambda} + \phi + \beta \upsilon \right) \right) = 0 \Leftrightarrow \end{aligned}$$

$$\hat{\upsilon}_N = \frac{a(b_2 - \hat{\mu}\gamma) - \hat{\mu}\beta \left( \frac{c_3}{\hat{\lambda}} + \phi \right)}{c_4 a + \hat{\mu}\beta^2}$$

The result

$$\hat{x}_N = \frac{1}{g}\left[\left(\phi - \frac{a}{2}\hat{u}_N\right)\hat{u}_N + (\gamma + \beta\hat{u}_N)\hat{\upsilon}_N\right]$$

is easily obtained by solving the differential equation

$$\dot{x} = g x - \phi u + \frac{a}{2}u^2 - \gamma\upsilon - \beta u\upsilon$$

and setting the integration constant to zero. □
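The Nash strategies can be cross-checked by treating the two first-order conditions as a linear system in (*u*, *υ*). The sketch below is an illustration only: it takes the planner's condition as −*c*<sub>3</sub> + *λ*(−*φ* + *au* − *βυ*) = 0 and the polluters' condition as −*c*<sub>4</sub>*υ* + *b*<sub>2</sub> − *μ*(*γ* + *βu*) = 0, treats the shadow prices *λ* and *μ* as given constants, and uses hypothetical parameter values that do not come from the paper.

```python
from fractions import Fraction as F

def nash_controls(a, beta, gamma, phi, b2, c3, c4, lam, mu):
    """Solve the two first-order conditions as a 2x2 linear system in (u, v).

    Planner:   -c3 + lam*(-phi + a*u - beta*v) = 0
    Polluters: -c4*v + b2 - mu*(gamma + beta*u) = 0
    """
    # Rearranged:  a*u - beta*v = r1   and   mu*beta*u + c4*v = r2
    r1 = phi + c3 / lam
    r2 = b2 - mu * gamma
    det = a * c4 + mu * beta**2          # determinant of the system
    u = (c4 * r1 + beta * r2) / det      # Cramer's rule, first unknown
    v = (a * r2 - mu * beta * r1) / det  # Cramer's rule, second unknown
    return u, v

def x_dot(x, g, a, beta, gamma, phi, u, v):
    """Right-hand side of the state equation."""
    return g * x - phi * u + a * u**2 / 2 - gamma * v - beta * u * v

def steady_state(g, a, beta, gamma, phi, u, v):
    """Steady state obtained by setting dx/dt = 0."""
    return ((phi - a * u / 2) * u + (gamma + beta * u) * v) / g

# Hypothetical values (exact rationals), for illustration only.
a, beta, gamma, phi = F(2), F(1), F(1), F(3)
b2, c3, c4, g = F(5), F(1), F(2), F(1, 10)
lam, mu = F(-1), F(1, 2)  # assumed signs: planner's shadow price negative

u, v = nash_controls(a, beta, gamma, phi, b2, c3, c4, lam, mu)
x_hat = steady_state(g, a, beta, gamma, phi, u, v)
print(u, v, x_hat)
```

Using exact rationals instead of floats means the first-order conditions and the steady-state condition can be checked for an exact zero residual.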

**Proof of Proposition 5.** The Hamiltonian of the polluting firms is given by (16) in the main text and is

$$H_2 = b_1 x + \left(b_2 - \frac{c_4}{2}\upsilon\right)\upsilon + b_3 u^*(\upsilon) + \mu \dot{x} + \psi \dot{\lambda}$$

while the time derivatives *ẋ*, *λ̇* are given by (14) and (15) as

$$\dot{x} = g x - \left(\phi - \frac{a}{2}u^*(\upsilon)\right)u^*(\upsilon) - \gamma\upsilon - \beta u^*(\upsilon)\upsilon$$

$$\dot{\lambda} = (\rho_1 - g)\lambda + c_1$$

Substituting the values of the adjoint variables back into the follower's Hamiltonian, this function becomes (after some algebraic manipulations)

$$\begin{aligned} H_2 = {} & b_1 x + \left(b_2 - \frac{c_4}{2}\upsilon\right)\upsilon + \frac{b_3\left(\phi + \beta\upsilon\right)}{a} \\ & + \mu \left(g x - \frac{(\phi - \beta\upsilon)}{2} \cdot \frac{(\phi + \beta\upsilon)}{a} - \gamma\upsilon - \frac{\beta(\phi + \beta\upsilon)\upsilon}{a}\right) \\ & + \psi\left((\rho_1 - g)\lambda + c_1\right) \end{aligned}$$

while the maximization condition *∂H*<sub>2</sub>/*∂υ* = 0 finally yields

$$-c_4\upsilon + b_2 + \frac{b_3\beta}{a} + \mu \left( -\frac{\beta(\phi + \beta\upsilon)}{2a} - \frac{(\phi - \beta\upsilon)\beta}{2a} - \gamma - \frac{\beta^2\upsilon}{a} \right) = 0$$

and, solving the maximization condition, the follower's optimal strategy is easily obtained as

$$\hat{\upsilon}_{S} = \frac{a(b_2 - \hat{\mu}\gamma) - \beta(\hat{\mu}\phi - b_3)}{c_4 a + \hat{\mu}\beta^2}.$$

Repeating similar steps as before, we calculate the Stackelberg leader's optimal strategy as

$$\hat{u}_S = \frac{\beta(b_2 - \hat{\mu}\gamma) + c_4\phi + b_3(\beta^2/a)}{c_4 a + \hat{\mu}\beta^2}$$

The proof of the expression for the number of polluting firms is the same as in the Nash case. □
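The two Stackelberg expressions can be verified against the maximization condition. The sketch below is illustrative only: it treats the shadow price *μ* as a given constant, reads the reaction function *u*<sup>∗</sup>(*υ*) = (*φ* + *βυ*)/*a* off the substituted Hamiltonian, and uses hypothetical parameter values that are not taken from the paper.

```python
from fractions import Fraction as F

def stackelberg_controls(a, beta, gamma, phi, b2, b3, c4, mu):
    """Closed-form Stackelberg strategies as stated in the proof."""
    det = c4 * a + mu * beta**2
    v = (a * (b2 - mu * gamma) - beta * (mu * phi - b3)) / det
    u = (beta * (b2 - mu * gamma) + c4 * phi + b3 * beta**2 / a) / det
    return u, v

def foc_residual(v, a, beta, gamma, phi, b2, b3, c4, mu):
    """Left-hand side of the maximization condition dH2/dv = 0."""
    bracket = (-beta * (phi + beta * v) / (2 * a)
               - (phi - beta * v) * beta / (2 * a)
               - gamma
               - beta**2 * v / a)
    return -c4 * v + b2 + b3 * beta / a + mu * bracket

def reaction(v, a, beta, phi):
    """Assumed reaction function u*(v) = (phi + beta*v)/a."""
    return (phi + beta * v) / a

# Hypothetical values (exact rationals), for illustration only.
a, beta, gamma, phi = F(2), F(1), F(1), F(3)
b2, b3, c4, mu = F(5), F(1), F(2), F(1, 2)

u_s, v_s = stackelberg_controls(a, beta, gamma, phi, b2, b3, c4, mu)
print(u_s, v_s, foc_residual(v_s, a, beta, gamma, phi, b2, b3, c4, mu))
```

With these values the residual of the maximization condition is exactly zero, and the second strategy coincides with the reaction function evaluated at the first, which is consistent with the two closed forms sharing the same denominator.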

#### **References**

Başar, Tamer, and Geert Jan Olsder. 1999. *Dynamic Noncooperative Game Theory*, 2nd ed. New York: Academic Press.


Halkos, George E. 1996. Incomplete information in the acid rain game. *Empirica* 23: 129–48.

