Article

Sustainable Optimal Control for Switched Pollution-Control Problem with Random Duration

1
College of Electronic Science and Engineering, Jilin University, Jilin 130012, China
2
Faculty of Applied Mathematics and Control Processes, Saint Petersburg University, Saint Petersburg 199034, Russia
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Entropy 2023, 25(10), 1426; https://doi.org/10.3390/e25101426
Submission received: 30 August 2023 / Revised: 18 September 2023 / Accepted: 24 September 2023 / Published: 8 October 2023

Abstract

Considering the uncertainty of game duration and periodic seasonal fluctuation, an n-player switched pollution-control differential game is modeled to investigate a sustainable and adaptive strategy for players. Based on the randomness of game duration, two scenarios are considered in this study. In the first case, the game duration is a random variable, $T_f$, described by the shifted exponential distribution. In the second case, we assumed that players’ equipment is heterogeneous, and the i-th player’s equipment failure time, $T_{f_i}$, is described according to the shifted exponential distribution. The game continues until a player’s equipment breaks down. Thus, the game duration is defined as $T_f = \min\{T_{f_1}, \ldots, T_{f_n}\}$. To achieve the goal of sustainable development, an environmentally sustainable strategy and its corresponding condition are defined. By using Pontryagin’s maximum principle, a unique control solution is obtained in the form of a hybrid limit cycle, the state variable converges to a stable hybrid limit cycle, and the total payoff of all players increases and then converges. The results indicate that the environmentally sustainable strategy in the n-player pollution-control cooperative differential game with switches and random duration is a unique strategy that not only ensures profit growth but also considers environmental protection.

1. Introduction

Practical ecological, economic, and engineering problems comprise switching phenomena [1,2,3]. All systems involving logical decision-making and continuous (smooth) dynamics, such as robot systems [4], chemical control processes [5], etc., can be transformed into hybrid systems with multiple regimes of dynamics.
Therefore, hybrid dynamic systems have gained considerable research interest in the environmental, economic, and engineering fields. In addition, switched systems, in the form of time-driven and state-driven switches, are increasingly common. Recent contributions to those fields include [6,7,8,9,10]. In [9,11], an optimal solution in the form of a hybrid limit cycle (HLC) was introduced as the best possible candidate for the infinite-horizon optimization problem. However, the results concerned only the optimal control of a single agent and did not explore the optimal control strategy of multiple players.
In addition, in contrast to the general pollution-control problems with deterministic terminal time [12,13] or with infinite horizon [10,14], the randomness of the game duration cannot be ignored, because the game may end abruptly. The reasons behind this can be an equipment breakdown, an economic failure, or a natural disaster, among many others [15]. In [15,16], differential pollution-control games with random duration were thoroughly analyzed. However, the impact of seasonal fluctuation on the system was not considered.
Thus, by combining the two aforementioned research directions, an n-player cooperative differential game was explored for pollution control. The game involves infinite time-driven switches and encompasses random game duration.
The contributions of this paper are summarized as follows:
  • A novel model is proposed to address challenges within the context of an n-player cooperative differential game for pollution control with time-driven switches and random duration. The time-driven switches within the system are described by a periodic piecewise-constant function. Taking into account the randomness of game duration and the players’ equipment warranty period, the finite-horizon optimal control problem is reformulated as an infinite-horizon optimal control problem, in which the game duration is modeled considering two scenarios based on shifted exponential distributions. The proposed model refines previously established methodologies, aiming to enhance its adaptability to real-world scenarios and yield more practical outcomes.
  • In addition, this study proves the sustainability and uniqueness of the environmentally sustainable solution for the proposed model.
  • Solving the optimal pollution-control problem with time-driven switches and random duration: We employed Pontryagin’s maximum principle and thoroughly analyzed the adjoint variable dynamics to derive the shifted equilibrium value, resulting in a unique environmentally sustainable solution in the form of an HLC, which is environmentally friendly and guarantees the profit of all players.
The obtained results can be applied to the optimization problems of switched systems with periodic switching signals, such as the formation control problem [17] and capital investment problem [8]. Additionally, they can provide valuable insights on how agents can effectively adapt to a rapidly changing and evolving environment, allowing them to reap the benefits.
The remainder of this paper is structured as follows. In Section 2, the model of the n-player pollution-control differential game with time-driven switches and random duration is formulated. In addition, considering different types of game durations, two case studies were conducted. Section 3 presents a discussion of the cooperative game with identical shifted exponential distribution, provides the definition of environmentally sustainable control, and determines the equilibrium value of the adjoint variable that is used to derive the unique solution. In Section 4, we discuss a cooperative game with different shifted exponential distributions and obtain the corresponding equilibrium value and unique solution. Section 5 provides an illustration of a pollution-control differential game involving two players, and optimal solutions for both scenarios are demonstrated numerically.

2. The Problem Statement

All notations used throughout the paper are summarized in Appendix A, Table A1.
In this study, we consider the optimal lake-pollution-control game model based on [15,16,18]. The game involves n players (factories). Each player i manages his/her emissions policy toward the lake, such that the dynamic of the pollution stock is governed by the linear differential equation with the following initial condition:
$$\dot{z} = \sum_{i=1}^{n} \xi_i v_i - \delta(t) z, \quad z(0) = z_0, \tag{1}$$
where $z$ is the stock of pollution within a fixed natural reservoir (e.g., a lake), $v_i \in [0, b_i]$, $i = \overline{1, n}$, denotes the emissions rate, $b_i$ is the maximal admissible emissions rate of each player, $\xi_i \in (0, 1)$ is the fraction of the emitted pollutants accumulated in the reservoir from each player (e.g., factory), and $\delta(t)$ is the self-cleaning rate of the reservoir. Furthermore, we have $z_0 \ge 0$ and the state $z(t)$ is non-negative for all $t \ge 0$.
The self-cleaning rate for lakes is widely acknowledged to vary throughout the year. This could be due to the impact of various factors, including temperature and light fluctuations, during specific periods (e.g., over the span of a year). Considering the impact of external seasonal changes on the lake, it is reasonable to postulate that the self-cleaning rate of the lake is not a fixed constant but varies as a function of time. Thus, we further assumed that the self-cleaning rate of the reservoir, $\delta(t)$, is represented by a periodic piecewise-constant function, which is defined as follows:
$$\delta(t) := \begin{cases} \delta_1 > 0, & t \in [kT, (k+\tau)T], \\ \delta_2 > 0, & t \in [(k+\tau)T, (k+1)T]. \end{cases} \tag{2}$$
The whole time duration $D = [0, T_f]$ is divided into equal periods of length $T$, each of which is subdivided into two parts: $[kT, (k+\tau)T]$ and $[(k+\tau)T, (k+1)T]$, where $\tau \in (0, 1)$ is the switching ratio and $k \in \mathbb{N}_0$. When the system is in the first subperiod, $\delta(t) = \delta_1$, whereas in the second subperiod, $\delta(t) = \delta_2$, and $\delta_1 \neq \delta_2$.
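The switching rule in (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `self_cleaning_rate` is hypothetical, and the switching instants are assigned to the later subperiod (half-open intervals), which is an assumption because (2) leaves the endpoints shared.

```python
import math


def self_cleaning_rate(t, delta1, delta2, T, tau):
    """Periodic piecewise-constant rate delta(t) for t >= 0.

    Returns delta1 on [kT, (k + tau)T) and delta2 on [(k + tau)T, (k + 1)T).
    The half-open convention at the switching instants is an assumption.
    """
    s = math.fmod(t, T)  # local time inside the current period
    return delta1 if s < tau * T else delta2
```

With the values used later in the paper ($\delta_1 = 0.9$, $\delta_2 = 0.45$, $T = 1$, $\tau = 0.5$), the call `self_cleaning_rate(1.2, 0.9, 0.45, 1.0, 0.5)` falls in a first subperiod and returns 0.9.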
Note that production is usually assumed to be linearly related to emissions. Therefore, the revenue function can be expressed in terms of emissions [18]. The revenue function of each player, $R_i(v_i)$, is strictly concave. The marginal revenue decreases with the increasing emissions rate of each player, $v_i(t) \in [0, b_i]$. However, zero emissions (production) is unprofitable. Each player incurs a damage cost, $C_i(z)$, for mitigating their emissions at moment $t$, and this cost function is increasing and convex. Thus, a quadratic revenue functional $R_i(v_i) = a_i v_i (b_i - v_i/2)$ [19] and a linear cost functional $C_i(z) = q_i z$ are commonly used to represent the instantaneous payoff of player $i$ as $L(v_i, z) = R_i(v_i) - C_i(z)$.
Then, the general form of the integral payoff for player i is as follows:
$$J_i(z_0, v_1, v_2, \ldots, v_n) = \int_0^{T_f} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt, \tag{3}$$
where $a_i > 0$ is a positive constant used to transform the emissions flow to the profit flow. Coefficient $q_i > 0$ is a positive constant, corresponding to the tax that a player must bear (e.g., an ecotax).
Considering the randomness of the game duration, we assumed that the terminal time of game T f is a random variable. After simplifying the integral payoff [20], the expected integral payoff of player i is obtained as follows:
$$J_i(z_0, v_1, v_2, \ldots, v_n) = E\left( \int_0^{T_f} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt \right) = \int_0^{\infty} \int_0^{s} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt \, dF(s) = \int_0^{\infty} (1 - F(t)) \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt, \tag{4}$$
where $F(t)$ is the cumulative distribution function of $T_f$.
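The last equality in (4) is an interchange of the order of integration. A short sketch of that step follows, writing $g(t)$ as shorthand (introduced here only for illustration) for the instantaneous payoff $a_i v_i (b_i - v_i/2) - q_i z$ and assuming $g$ is absolutely integrable against $1 - F$ so that Fubini's theorem applies:

```latex
\begin{aligned}
\int_0^{\infty} \int_0^{s} g(t)\, dt\, dF(s)
  &= \int_0^{\infty} g(t) \left( \int_t^{\infty} dF(s) \right) dt
     && \text{(swap the order over } 0 \le t \le s < \infty \text{)} \\
  &= \int_0^{\infty} \bigl( 1 - F(t) \bigr)\, g(t)\, dt
     && \text{(since } \textstyle\int_t^{\infty} dF(s) = 1 - F(t) \text{)} .
\end{aligned}
```

The factor $1 - F(t)$ is the probability that the game is still running at time $t$, so it acts as a time-dependent discount on the payoff flow.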

3. Cooperative Game with Identical Shifted Exponential Distribution

If all players share common pollution-control equipment (e.g., filters), we assume that the game duration is a realization of the random variable $T_f$ (the time of equipment failure is the same for all players and coincides with the end of the game).
Assume that at the beginning of the game, all players use new pollution-control equipment, and this equipment comes with a warranty period. The warranty period refers to a specified time frame after the sale of a product or service, during which the manufacturer or supplier assures free repairs or replacements. During this period, if the product experiences any quality issues or malfunctions, the consumer can obtain free repair or replacement services. Hence, during the warranty period, there is no risk of equipment breakdown, while after the warranty period, the equipment is subject to the risk of potential damage. The game terminates when the equipment breaks down.
Therefore, we consider a cooperative game for pollution control, wherein the randomness of game duration and consideration of the warranty period are represented by a shifted exponential distribution.
To define the random variable $T_f$ mathematically, we apply a shifted exponential distribution, which is given by
$$F(t) = \begin{cases} 1 - e^{\lambda(\theta - t)}, & t > \theta, \\ 0, & t \in [0, \theta], \end{cases} \tag{5}$$
where $\theta > 0$ is the shift parameter of the exponential distribution, measured from the start of the game at time 0, which represents the equipment warranty period. The equipment faces no risk of failure during the warranty period, and after time $\theta$ the failure rate is constant over time. In addition, $\lambda > 0$ is the rate parameter of the distribution and $E(T_f) = \theta + \frac{1}{\lambda}$.
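As a quick numerical sanity check of (5), the sketch below evaluates the shifted exponential distribution function and approximates the expected duration through the identity $E(T_f) = \int_0^\infty (1 - F(t))\,dt$. The function names, the truncation horizon, and the grid size are illustrative assumptions, not part of the paper.

```python
import math


def shifted_exp_cdf(t, lam, theta):
    """Shifted exponential CDF: zero up to theta, exponential afterwards."""
    return 0.0 if t <= theta else 1.0 - math.exp(lam * (theta - t))


def expected_duration(lam, theta, horizon=200.0, n=200_000):
    """Trapezoidal approximation of the integral of 1 - F(t) over [0, horizon].

    The horizon truncates the infinite integral; it is an assumption that the
    neglected tail exp(lam * (theta - horizon)) / lam is negligible.
    """
    h = horizon / n
    total = 0.0
    for j in range(n + 1):
        weight = 0.5 if j in (0, n) else 1.0
        total += weight * (1.0 - shifted_exp_cdf(j * h, lam, theta))
    return total * h
```

For $\lambda = 0.5$ and $\theta = 1.2$ (the values used in Section 5), the closed form gives $\theta + 1/\lambda = 3.2$, and `expected_duration(0.5, 1.2)` should agree with it up to discretization error.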
By substituting (5) into (4), the whole time horizon $[0, +\infty)$ is split as $[0, \theta] \cup (\theta, \infty)$. According to Bellman’s optimality principle, the overall payoff functional (4) can then be rewritten as a sum:
$$J_i(z_0, v_1, v_2, \ldots, v_n) = \int_0^{\theta} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt + \int_{\theta}^{\infty} e^{\lambda(\theta - t)} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt. \tag{6}$$
All players (factories) act together to maximize their joint payoff:
$$J_{co}(z_0, v_1, v_2, \ldots, v_n) = \sum_{i=1}^{n} J_i(z_0, v_1, v_2, \ldots, v_n) = J_{co1}(z_0, v_1, v_2, \ldots, v_n) + J_{co2}(z_\theta, v_1, v_2, \ldots, v_n) = \sum_{i=1}^{n} \int_0^{\theta} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt + \sum_{i=1}^{n} \int_{\theta}^{\infty} e^{\lambda(\theta - t)} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt. \tag{7}$$
By using Pontryagin’s maximum principle, the cooperative solution is obtained as a result of the joint optimization problem. We solve this optimization problem on each interval separately. First, the second interval $(\theta, \infty)$ is considered with respect to the players’ joint payoff $J_{co2}(z_\theta, v_1, v_2, \ldots, v_n) = \sum_{i=1}^{n} \int_{\theta}^{\infty} e^{\lambda(\theta - t)} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt$, and then the first interval $[0, \theta]$ is considered with respect to the players’ joint payoff $J_{co1} = \sum_{i=1}^{n} \int_0^{\theta} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt$.

3.1. Second Interval— $(\theta, \infty)$

The payoff functional of the second interval $(\theta, \infty)$ is
$$J_{co2}(z_0, v_1, v_2, \ldots, v_n) = \sum_{i=1}^{n} \int_{\theta}^{\infty} e^{\lambda(\theta - t)} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt. \tag{8}$$
Further, the Hamiltonian is given by
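$$H_{co2}(t) = e^{\lambda(\theta - t)} \sum_{i=1}^{n} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] + \psi_2 \left( \sum_{i=1}^{n} \xi_i v_i - \delta(t) z \right). \tag{9}$$
Then, we have the canonical system:
$$\dot{z} = \sum_{i=1}^{n} \xi_i v_i - \delta(t) z, \quad \dot{\psi}_2 = e^{\lambda(\theta - t)} \sum_{i=1}^{n} q_i + \delta(t) \psi_2. \tag{10}$$
Let $\tilde{\psi}_2 = e^{-\lambda(\theta - t)} \psi_2$. Then, we have
$$\dot{\tilde{\psi}}_2 = \sum_{i=1}^{n} q_i + (\delta(t) + \lambda) \tilde{\psi}_2. \tag{11}$$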
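The step from the adjoint equation in (10) to (11) is a product-rule computation. A brief sketch, using only the substitution $\tilde{\psi}_2 = e^{-\lambda(\theta - t)} \psi_2 = e^{\lambda(t - \theta)} \psi_2$ stated above:

```latex
\begin{aligned}
\dot{\tilde{\psi}}_2
  &= \lambda e^{\lambda(t-\theta)} \psi_2 + e^{\lambda(t-\theta)} \dot{\psi}_2 \\
  &= \lambda \tilde{\psi}_2
     + e^{\lambda(t-\theta)} \Bigl( e^{\lambda(\theta-t)} \sum_{i=1}^{n} q_i + \delta(t)\, \psi_2 \Bigr) \\
  &= \sum_{i=1}^{n} q_i + \bigl( \delta(t) + \lambda \bigr) \tilde{\psi}_2 .
\end{aligned}
```

The rescaling removes the explicit time dependence of the discount factor, so the hazard rate $\lambda$ simply adds to the self-cleaning rate in the adjoint dynamics.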
Based on the first order derivative of the Hamiltonian, the optimal control is obtained as follows:
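$$\tilde{v}_i^* = \begin{cases} b_i, & \tilde{\psi}_2 > 0, \\ b_i + \frac{\xi_i}{a_i} \tilde{\psi}_2, & \tilde{\psi}_2 \in \left[ -\frac{a_i b_i}{\xi_i}, 0 \right], \\ 0, & \tilde{\psi}_2 < -\frac{a_i b_i}{\xi_i}. \end{cases} \tag{12}$$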
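The three branches of (12) amount to clipping the interior maximizer of the Hamiltonian to the admissible range $[0, b_i]$. A minimal sketch, in which the function name `optimal_emission` is a hypothetical label chosen here:

```python
def optimal_emission(psi_tilde, a_i, b_i, xi_i):
    """Emission rate of player i as a function of the rescaled adjoint variable.

    The interior candidate b_i + (xi_i / a_i) * psi_tilde comes from setting the
    derivative of the Hamiltonian with respect to v_i to zero; clipping it to
    [0, b_i] reproduces the two boundary branches.
    """
    interior = b_i + (xi_i / a_i) * psi_tilde
    return min(b_i, max(0.0, interior))
```

For example, with $a_i = 1$, $b_i = 10$, $\xi_i = 0.8$, the threshold $-a_i b_i / \xi_i$ equals $-12.5$, so any adjoint value below it gives zero emissions and any positive value gives the cap $b_i$.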
For the payoff functional (8), which is defined on the infinite horizon $(\theta, \infty)$ with infinitely many time-driven switches, players should pursue sustainable development. Therefore, this study aims to find a sustainable decision-making pattern so that players can reach an optimal compromise between profit and penalty (e.g., ecotax).
Definition 1. 
The optimal control, $v_i^*$, is environmentally sustainable if it does not take on boundary values, except at isolated instances of time, i.e., $\tilde{\psi}_2 \in \left[ -\frac{a_i b_i}{\xi_i}, 0 \right]$ for all $t \ge \theta$.
This definition is based on the long-term economic interests of all players. If the control sits on the upper boundary, $v_i^* = b_i$, the rate of emissions of player $i$ remains at its maximum value. Evidently, this is not profitable because player $i$ bears a high ecotax. In the other boundary situation, $v_i^* = 0$, the revenue of player $i$ cannot exceed the cost; therefore, the player must halt production for a certain time interval to allow the stock of pollution to decrease to a lower level. Hence, neither of these situations is acceptable for sustainable production [21].
The following theorem and its proof are close to the result presented in [11]. The main difference is that the condition for the adjoint variable was not set at the initial moment but at moment θ . In addition, this value depends on the interval to which θ belongs.
Theorem 1. 
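The solution to (10) satisfying $z(0) = z_0$ and $\psi_2(\theta) = \psi_{hlc}$ with
$$\psi_{hlc} = \begin{cases} L \left( 1 - e^{p_2 T (\tau - 1)} \right) e^{p_1 m} - \frac{q}{p_1}, & m \in [0, \tau T], \\ L \left( 1 - e^{p_1 \tau T} \right) e^{p_2 (m - T)} - \frac{q}{p_2}, & m \in [\tau T, T] \end{cases} \tag{13}$$
is the unique optimal solution to (1)–(7). Here, $q = \sum_{i=1}^{n} q_i$, $p_1 = \delta_1 + \lambda$, $p_2 = \delta_2 + \lambda$, $m = \theta - \left\lfloor \frac{\theta}{T} \right\rfloor T$, $L = \frac{q (p_1 - p_2)}{p_1 p_2 \left( e^{p_2 T (\tau - 1)} - e^{p_1 \tau T} \right)}$.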
Proof. 
To obtain a periodic solution, the following equation is first solved:
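$$\tilde{\psi}_2(\theta) = \tilde{\psi}_2(\theta + T). \tag{14}$$
Let $k_1 = \left\lfloor \frac{\theta}{T} \right\rfloor$; then, $\theta \in [k_1 T, (k_1 + 1) T]$. On the interval $[k_1 T, (k_1 + 1) T]$, the solution to (11) has the form
$$\tilde{\psi}_2(t) = \begin{cases} c_1 e^{p_1 t} - \frac{q}{p_1}, & t \in [k_1 T, k_1 T + \tau T], \\ c_2 e^{p_2 t} - \frac{q}{p_2}, & t \in [k_1 T + \tau T, (k_1 + 1) T]. \end{cases} \tag{15}$$
If $\theta \in [k_1 T, k_1 T + \tau T]$ or, which is the same thing, $m \in [0, \tau T]$, then $\tilde{\psi}_2(\theta) = c_1 e^{p_1 \theta} - \frac{q}{p_1}$. Solving (14), we have
$$\tilde{\psi}_2(\theta) = L \left( 1 - e^{p_2 T (\tau - 1)} \right) e^{p_1 m} - \frac{q}{p_1}. \tag{16}$$
If $\theta \in [k_1 T + \tau T, (k_1 + 1) T]$ or, which is the same thing, $m \in [\tau T, T]$, then $\tilde{\psi}_2(\theta) = c_2 e^{p_2 \theta} - \frac{q}{p_2}$. Solving (14), we have
$$\tilde{\psi}_2(\theta) = L \left( 1 - e^{p_1 \tau T} \right) e^{p_2 (m - T)} - \frac{q}{p_2}. \tag{17}$$
Then, the solution to (10) satisfying $\psi_2(\theta) = \tilde{\psi}_2(\theta) = \psi_{hlc}$ is given by
$$\tilde{\psi}_2^*(t) = \begin{cases} \left( \psi_{hlc} + \frac{q}{p_1} \right) e^{p_1 (t - \theta)} - \frac{q}{p_1}, & t \in [\theta, (k_1 + \tau) T], \\ \left( \psi_{hlc} + \frac{q}{p_1} \right) e^{p_1 (t - kT - \theta)} - \frac{q}{p_1}, & t \in [(k_1 + k) T, (k_1 + k + \tau) T], \ k \in \mathbb{N}_+, \\ \left( \left( \psi_{hlc} + \frac{q}{p_1} \right) e^{p_1 ((k_1 + \tau) T - \theta)} - \frac{q}{p_1} + \frac{q}{p_2} \right) e^{p_2 (t - (k_1 + \tau) T - kT)} - \frac{q}{p_2}, & t \in [(k_1 + \tau + k) T, (k_1 + 1 + k) T], \ k \in \mathbb{N}_0, \end{cases} \tag{18}$$
if $\theta \in [k_1 T, k_1 T + \tau T]$ and
$$\tilde{\psi}_2^*(t) = \begin{cases} \left( \psi_{hlc} + \frac{q}{p_2} \right) e^{p_2 (t - \theta)} - \frac{q}{p_2}, & t \in [\theta, (k_1 + 1) T], \\ \left( \psi_{hlc} + \frac{q}{p_2} \right) e^{p_2 (t - kT - \theta)} - \frac{q}{p_2}, & t \in [(k_1 + k + \tau) T, (k_1 + k + 1) T], \ k \in \mathbb{N}_+, \\ \left( \left( \psi_{hlc} + \frac{q}{p_2} \right) e^{p_2 ((k_1 + 1) T - \theta)} - \frac{q}{p_2} + \frac{q}{p_1} \right) e^{p_1 (t - (k_1 + 1) T - kT)} - \frac{q}{p_1}, & t \in [(k_1 + k) T, (k_1 + \tau + k) T], \ k \in \mathbb{N}_0, \end{cases} \tag{19}$$
if $\theta \in [k_1 T + \tau T, (k_1 + 1) T]$. As observed, $\tilde{\psi}_2^*(t) = \tilde{\psi}_2^*(t + kT)$ and $\tilde{\psi}_2^*(t) \in \left[ \min\left( -\frac{q}{p_1}, -\frac{q}{p_2} \right), \max\left( -\frac{q}{p_1}, -\frac{q}{p_2} \right) \right]$; therefore, this solution is periodic and bounded.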
The boundedness of ψ ˜ 2 * ( t ) and the state z ( t ) guarantees the fulfillment of the transversality condition
$$\liminf_{t \to +\infty} e^{\lambda(\theta - t)} \tilde{\psi}_2^*(t) \left( z(t) - z^*(t) \right) \ge 0, \tag{20}$$
for any admissible solution $z(t)$.
Thus, by considering the concavity of the Hamiltonian, we can conclude that $v^*$ is the optimal control, where
$$v_i^* = \begin{cases} b_i, & \tilde{\psi}_2^*(t) > 0, \\ b_i + \frac{\xi_i}{a_i} \tilde{\psi}_2^*(t), & \tilde{\psi}_2^*(t) \in \left[ -\frac{a_i b_i}{\xi_i}, 0 \right], \\ 0, & \tilde{\psi}_2^*(t) < -\frac{a_i b_i}{\xi_i}. \end{cases} \tag{21}$$
The uniqueness of the obtained optimal solution follows from the concavity of the Hamiltonian: it is concave with respect to the state $z$ and strictly concave with respect to the control $v_i$. This completes the proof of the theorem. □
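The closed form for $\psi_{hlc}$ can be checked numerically. The sketch below evaluates the formula of Theorem 1 and then propagates the rescaled adjoint equation $\dot{\tilde{\psi}}_2 = q + (\delta(t) + \lambda)\tilde{\psi}_2$ exactly over one period, one subperiod at a time, to confirm that the value returns to where it started. The function names are hypothetical, and $\delta_1 \neq \delta_2$ is assumed, as in the model.

```python
import math


def psi_hlc(q, delta1, delta2, lam, theta, T, tau):
    """Equilibrium value of the rescaled adjoint variable at time theta."""
    p1, p2 = delta1 + lam, delta2 + lam
    m = theta - math.floor(theta / T) * T  # position of theta inside its period
    L = q * (p1 - p2) / (
        p1 * p2 * (math.exp(p2 * T * (tau - 1)) - math.exp(p1 * tau * T))
    )
    if m <= tau * T:
        return L * (1 - math.exp(p2 * T * (tau - 1))) * math.exp(p1 * m) - q / p1
    return L * (1 - math.exp(p1 * tau * T)) * math.exp(p2 * (m - T)) - q / p2


def advance_one_period(psi0, q, delta1, delta2, lam, theta, T, tau):
    """Exact flow of psi' = q + p * psi from theta to theta + T."""
    p1, p2 = delta1 + lam, delta2 + lam
    m = theta - math.floor(theta / T) * T

    def flow(psi, p, dt):
        # Solution of the linear ODE with constant coefficient p over time dt.
        return (psi + q / p) * math.exp(p * dt) - q / p

    if m <= tau * T:
        segments = [(p1, tau * T - m), (p2, (1 - tau) * T), (p1, m)]
    else:
        segments = [(p2, T - m), (p1, tau * T), (p2, m - tau * T)]
    psi = psi0
    for p, dt in segments:
        psi = flow(psi, p, dt)
    return psi
```

With the parameter values of Section 5 ($q = 6$, $\delta_1 = 0.9$, $\delta_2 = 0.45$, $\lambda = 0.5$, $\theta = 1.2$, $T = 1$, $\tau = 0.5$), `psi_hlc` evaluates to roughly $-5.015$, which lies between $-q/p_2 \approx -6.316$ and $-q/p_1 \approx -4.286$, and `advance_one_period` maps this value back to itself up to rounding.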
Lemma 1. 
If $\tilde{\psi}_2(\theta) \neq \psi_{hlc}$, then $\lim_{t \to +\infty} \tilde{\psi}_2(t) = \pm\infty$.
Proof. 
When $\tilde{\psi}_2(\theta)$ deviates from the equilibrium initial value $\psi_{hlc}$, let $\tilde{\psi}_2(\theta) = \psi_{hlc} + c$. Then, from the moment $\theta$, the difference of each period can be obtained:
$$\tilde{\psi}_2(\theta + kT) - \tilde{\psi}_2(\theta + (k - 1) T) = c \, e^{kT (p_2 (1 - \tau) + p_1 \tau)}.$$
For $k \in \mathbb{N}_+$ and $kT (p_2 (1 - \tau) + p_1 \tau) > 0$, the difference $c \, e^{kT (p_2 (1 - \tau) + p_1 \tau)}$ varies monotonically. Thus, the value of $\tilde{\psi}_2$ diverges with time. This implies that when $t \to +\infty$, if $c > 0$, $\tilde{\psi}_2(t)$ approaches $+\infty$, and if $c < 0$, $\tilde{\psi}_2(t)$ approaches $-\infty$. □
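One way to see the divergence numerically is to track the gap between a perturbed solution and the periodic one. Because the adjoint equation is linear, this gap $d(t)$ obeys $\dot{d} = (\delta(t) + \lambda) d$ and is multiplied by the same factor $e^{T(p_1 \tau + p_2 (1 - \tau))}$ over every period. The sketch below computes that gap; it is an illustration of the mechanism rather than the exact quantity used in the proof, and the function name is hypothetical.

```python
import math


def deviation_after_periods(c, k, delta1, delta2, lam, T, tau):
    """Gap to the periodic adjoint solution after k full periods.

    c is the initial offset from the equilibrium value at time theta. The gap
    solves d' = (delta(t) + lam) * d, so each period multiplies it by `growth`.
    """
    p1, p2 = delta1 + lam, delta2 + lam
    growth = math.exp(T * (p1 * tau + p2 * (1 - tau)))
    return c * growth ** k
```

With $\delta_1 = 0.9$, $\delta_2 = 0.45$, $\lambda = 0.5$, $T = 1$, $\tau = 0.5$, the per-period factor is $e^{1.175} \approx 3.24$, so an initial offset of $10^{-3}$ exceeds 100 in magnitude after ten periods, with the sign of the offset preserved.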
Lemma 2. 
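Optimal control $v_i^*$ is environmentally sustainable when the following inequalities are satisfied:
$$\begin{cases} \left( \psi_{hlc} + \frac{q}{p_1} \right) e^{p_1 ((k_1 + \tau) T - \theta)} - \frac{q}{p_1} \ge -\frac{a_i b_i}{\xi_i}, & \theta \in [k_1 T, (k_1 + \tau) T], \\ \left( \left( \psi_{hlc} + \frac{q}{p_2} \right) e^{p_2 ((k_1 + 1) T - \theta)} - \frac{q}{p_2} + \frac{q}{p_1} \right) e^{p_1 (1 - \tau) T} - \frac{q}{p_1} \ge -\frac{a_i b_i}{\xi_i}, & \theta \in [(k_1 + \tau) T, (k_1 + 1) T], \end{cases}$$
if $\delta_1 > \delta_2$, $k_1 = \left\lfloor \frac{\theta}{T} \right\rfloor$, and
$$\begin{cases} \left( \left( \psi_{hlc} + \frac{q}{p_1} \right) e^{p_1 ((k_1 + \tau) T - \theta)} - \frac{q}{p_1} + \frac{q}{p_2} \right) e^{p_2 (1 - \tau) T} - \frac{q}{p_2} \ge -\frac{a_i b_i}{\xi_i}, & \theta \in [k_1 T, (k_1 + \tau) T], \\ \left( \psi_{hlc} + \frac{q}{p_2} \right) e^{p_2 ((k_1 + 1) T - \theta)} - \frac{q}{p_2} \ge -\frac{a_i b_i}{\xi_i}, & \theta \in [(k_1 + \tau) T, (k_1 + 1) T], \end{cases}$$
if $\delta_1 < \delta_2$.
Proof. 
For $\delta_1 > \delta_2$, the adjoint variable $\tilde{\psi}_2^*(t)$ decreases in the first subperiod and increases in the second subperiod. In addition, according to Theorem 1, $\tilde{\psi}_2^*(t) \in \left[ \min\left( -\frac{q}{p_1}, -\frac{q}{p_2} \right), \max\left( -\frac{q}{p_1}, -\frac{q}{p_2} \right) \right]$. Then, we have $\tilde{\psi}_2^*(t) < \max\left( -\frac{q}{p_1}, -\frac{q}{p_2} \right) < 0$.
Hence, to ensure that optimal control $v_i^*$ is environmentally sustainable, the condition $\min_{t \ge 0} \tilde{\psi}_2^*(t) \ge -\frac{a_i b_i}{\xi_i}$ must be satisfied. As such, we have
$$\begin{cases} \min_{t \ge 0} \tilde{\psi}_2^*(t) = \tilde{\psi}_2^*((k_1 + \tau) T) \ge -\frac{a_i b_i}{\xi_i}, & \theta \in [k_1 T, (k_1 + \tau) T], \\ \min_{t \ge 0} \tilde{\psi}_2^*(t) = \tilde{\psi}_2^*((k_1 + \tau + 1) T) \ge -\frac{a_i b_i}{\xi_i}, & \theta \in [(k_1 + \tau) T, (k_1 + 1) T]. \end{cases}$$
Similarly, the condition for $\delta_1 < \delta_2$ can be obtained. □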
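A quick way to screen a parameter set is to use the bounds on the periodic adjoint variable from Theorem 1 instead of its exact minimum. The sketch below does this; it is a sufficient check only (stricter than the inequalities of Lemma 2), and the function name is hypothetical.

```python
def is_sustainable(q, delta1, delta2, lam, a, b, xi):
    """Sufficient test for environmental sustainability of every player.

    Uses the enclosure [min(-q/p1, -q/p2), max(-q/p1, -q/p2)] of the periodic
    adjoint variable. If the whole enclosure lies inside [-a_i*b_i/xi_i, 0]
    for each player, the control never reaches a boundary value.
    """
    p1, p2 = delta1 + lam, delta2 + lam
    lower = min(-q / p1, -q / p2)
    upper = max(-q / p1, -q / p2)
    thresholds_ok = all(
        lower >= -a_i * b_i / xi_i for a_i, b_i, xi_i in zip(a, b, xi)
    )
    return thresholds_ok and upper <= 0.0
```

For the two-player values of Section 5, the enclosure is approximately $[-6.316, -4.286]$, while the thresholds $-a_i b_i / \xi_i$ are $-12.5$ and about $-17.14$, so the check passes.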
Figure 1 illustrates the dynamics of $\tilde{\psi}_2$ with different initial values in two situations: when the distribution shift parameter $\theta$ is located before and after the switching time in one period. Without loss of generality, period $T$ and switching ratio $\tau$ can be chosen arbitrarily because these two parameters do not affect the overall result. Hence, we set $T = 1$ and $\tau = 0.5$. In addition, for parameter $\lambda$, which directly determines the expectation of the game-termination time, we set $\lambda = 0.5$. The other parameters are set as $\delta_1 = 0.9$, $\delta_2 = 0.45$, and $q = 6$.
The blue line, whose initial value $\tilde{\psi}_2(\theta)$ equals the equilibrium value $\psi_{hlc}$, denotes the equilibrium solution of the adjoint variable, which varies periodically with equal amplitude within the interval $[\psi_{22}^*, \psi_{21}^*]$, where $\psi_{2i}^* = -\frac{q}{p_i}$, $i = 1, 2$, are the equilibrium positions for each mode, depending on the value of $\delta$. The equilibrium positions are represented by the sky-blue dashed lines, and the red lines indicate nonequilibrium solutions, whose initial values deviate slightly from the equilibrium value. Thus, even a small deviation in the initial value causes the solution to diverge over time, toward either $+\infty$ or $-\infty$, and to leave the interval $[\psi_{22}^*, \psi_{21}^*]$ bounded by the two equilibrium points.
In this way, an equilibrium solution is uniquely determined, forming an HLC as time approaches an infinite horizon with infinite time-driven switches.

3.2. First Interval— $[0, \theta]$

The payoff functional of the first interval, $[0, \theta]$, is denoted as
$$J_{co1}(z_0, v_1, v_2, \ldots, v_n) = \sum_{i=1}^{n} \int_0^{\theta} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt.$$
The Hamiltonian is given by
$$H_{co1}(t) = \sum_{i=1}^{n} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] + \psi_1 \left( \sum_{i=1}^{n} \xi_i v_i - \delta(t) z \right).$$
Then, we have the canonical system:
$$\dot{z} = \sum_{i=1}^{n} \xi_i v_i - \delta(t) z, \quad \dot{\psi}_1 = \sum_{i=1}^{n} q_i + \delta(t) \psi_1, \quad \psi_1(\theta) = \tilde{\psi}_2(\theta) = \psi_{hlc}.$$
Moreover, the continuity condition $\psi_1(\theta) = \tilde{\psi}_2(\theta) = \psi_{hlc}$ is based on the continuity of the optimal control, which is directly driven by the adjoint variable. This holds because the switching instants depend only on time, i.e., they are autonomous or time-driven [9,22].
According to the first order derivative of the Hamiltonian, the optimal control is obtained as follows:
$$v_i^* = \begin{cases} b_i, & \psi_1 > 0, \\ b_i + \frac{\xi_i}{a_i} \psi_1, & \psi_1 \in \left[ -\frac{a_i b_i}{\xi_i}, 0 \right], \\ 0, & \psi_1 < -\frac{a_i b_i}{\xi_i}. \end{cases}$$
As the terminal value of $\psi_1(t)$ is determined, we consider the backward time dynamics of $\psi_1(t)$, which is described by the following system:
$$\dot{\psi}_1(t) = \begin{cases} -\sum_{i=1}^{n} q_i - \delta(t) \psi_1, & t \in [hT, (h + \tau) T), \\ -\sum_{i=1}^{n} q_i - \delta(t) \psi_1, & t \in [(h + \tau) T, (h + 1) T), \end{cases} \tag{26}$$
where $h \in \mathbb{N}_0$, with $\psi_1(\theta) = \psi_{hlc}$. Each subsystem of (26) has a single stable equilibrium at $\psi_{1i}^* = -\frac{q}{\delta_i}$, $i = 1, 2$.
Note that the equilibrium points of the adjoint variable in the first interval are less than those in the second interval; thus, the overall trend of the adjoint variable $\psi_1(t)$ in the first interval, $[0, \theta]$, increases in the backward time.
Accordingly, returning to the forward-time point of view, the optimal control in the first interval, $[0, \theta]$, may first remain near the maximal admissible value and then decrease to the HLC.

3.3. Numerical Overall Adjoint Variable

The initial value of the adjoint variable, $\psi_{hlc}$, in the second interval is uniquely determined, and this is used as the terminal value in the first interval. Now, we can solve for the cooperative adjoint variable, denoted by $\psi_{co}(t)$, on the whole time horizon $[0, +\infty)$. Herein, the parameter settings were similar to those in Section 3.1.
Figure 2 shows the situation when the distribution shift parameter $\theta$ is located before the switching time in the second period, $T \le \theta \le T + \tau$, where
$$\psi_{co}(t) = \begin{cases} \psi_1(t), & t \in [0, \theta], \\ \tilde{\psi}_2(t), & t \in [\theta, \infty), \end{cases} \quad \psi_1(\theta) = \tilde{\psi}_2(\theta) = \psi_{hlc}.$$
Figure 3 shows the situation when the distribution shift parameter $\theta$ is located after the switching time in the second period, $T + \tau \le \theta \le 2T$.
The blue lines in the figures represent the cooperative adjoint variable, $\psi_{co}(t)$; the sky-blue dashed lines represent the equilibrium points of the two intervals, and the green lines show the distribution shift parameter $\theta$. From Figure 2 and Figure 3, we can conclude that regardless of whether the shift instant $\theta$ is located before or after the switching time in a period, the overall trend of $\psi_{co}(t)$ does not change.
Consequently, the overall optimal control of player $i$ over the whole time duration $[0, \infty)$ is derived as:
$$v_i^* = \begin{cases} b_i, & \psi_{co} > 0, \\ b_i + \frac{\xi_i}{a_i} \psi_{co}, & \psi_{co} \in \left[ -\frac{a_i b_i}{\xi_i}, 0 \right], \\ 0, & \psi_{co} < -\frac{a_i b_i}{\xi_i}. \end{cases}$$

4. Cooperative Game with Different Shifted Exponential Distributions

From a reliability-engineering standpoint, the pollution-control equipment used by each player is different and has its own warranty period. The duration of the warranty period may vary depending on factors such as product type, brand, and contract terms, and it is usually measured in months or years. Hence, the i-th player’s equipment fails abruptly at moment $T_{f_i}$, a random variable with a known probability distribution function $F_i(t)$, $i = \overline{1, n}$, and the equipment may break down owing to the end of its lifetime or to natural disasters. The game lasts until one of the players’ pieces of equipment breaks down. Hence, this study also considers an n-player cooperative game of pollution control with different shifted exponential distributions. Furthermore, each player is assumed to possess specific equipment used in pollution control. Moreover, $\{T_{f_i}\}_{1}^{n}$ are assumed to be independent random variables. Thus, the game duration is defined as $T_f = \min\{T_{f_1}, \ldots, T_{f_n}\}$.
In this case, players’ equipment is heterogeneous, and $\{T_{f_i}\}_{1}^{n}$ follow different shifted exponential distributions with different distribution parameters, $\{\lambda_i\}_{1}^{n}$. Without loss of generality, we assume that $\theta_1 \le \theta_2 \le \cdots \le \theta_n$, where $\theta_n$ is the shift parameter of player $n$ and represents the largest shift parameter among all players. Then, we have $F(t) = P\{T_f < t\} = 1 - \prod_{i=1}^{n} (1 - F_i(t))$, see [15], where $F_i(t)$ is defined as
$$F_i(t) = \begin{cases} 1 - e^{\lambda_i (\theta_i - t)}, & t > \theta_i, \\ 0, & t \in [0, \theta_i]. \end{cases}$$
The duration of the game is a random variable with a composite distribution function [23]. Thus, the cumulative distribution function, $F(t)$, with different shifted exponential distributions is denoted as
$$F(t) = \begin{cases} 0, & t \in [0, \theta_1], \\ 1 - e^{\lambda_1 (\theta_1 - t)}, & t \in [\theta_1, \theta_2], \\ \quad \vdots \\ 1 - e^{\sum_{i=1}^{j} [\lambda_i (\theta_i - t)]}, & t \in [\theta_j, \theta_{j+1}], \\ \quad \vdots \\ 1 - e^{\sum_{i=1}^{n} [\lambda_i (\theta_i - t)]}, & t > \theta_n. \end{cases}$$
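The piecewise form of the composite distribution follows directly from the product $1 - \prod_i (1 - F_i(t))$: each player's survival factor equals 1 until that player's warranty expires and decays exponentially afterwards. A minimal sketch, with the function name `composite_cdf` chosen here for illustration:

```python
import math


def composite_cdf(t, lams, thetas):
    """CDF of the minimum of independent shifted exponential failure times.

    lams and thetas hold the rate and shift (warranty) parameter of each
    player. A player contributes a survival factor only after its shift.
    """
    survival = 1.0
    for lam, theta in zip(lams, thetas):
        if t > theta:
            survival *= math.exp(lam * (theta - t))
    return 1.0 - survival
```

With the Section 5 values $\lambda_1 = 0.3$, $\lambda_2 = 0.5$, $\theta_1 = 1$, $\theta_2 = 2.2$, the function returns 0 before $t = 1$, uses only the first player's hazard on $(1, 2.2]$, and uses the summed exponent afterwards; at $t = 3$ the exponent is $0.3(1 - 3) + 0.5(2.2 - 3) = -1$.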
The cooperative payoff functional, (4), in this case, can be rewritten as the following sum:
$$J_{co}(z_0, v_1, v_2, \ldots, v_n) = J_1(z_0, v_1, v_2, \ldots, v_n) + J_2(z_0, v_1, v_2, \ldots, v_n) + \cdots + J_n(z_0, v_1, v_2, \ldots, v_n) = \sum_{i=1}^{n} \left[ \int_0^{\theta_1} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt + \int_{\theta_1}^{\theta_2} e^{\lambda_1 (\theta_1 - t)} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt + \cdots + \int_{\theta_n}^{\infty} e^{\sum_{i=1}^{n} [\lambda_i (\theta_i - t)]} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt \right].$$

4.1. The Last Interval— $(\theta_n, \infty)$

The payoff functional in the last interval, $(\theta_n, \infty)$, is
$$J_{con}(z_0, v_1, v_2, \ldots, v_n) = \sum_{i=1}^{n} \int_{\theta_n}^{\infty} e^{\sum_{i=1}^{n} [\lambda_i (\theta_i - t)]} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt.$$
The Hamiltonian is given by
$$H_{con}(t) = e^{\sum_{i=1}^{n} [\lambda_i (\theta_i - t)]} \sum_{i=1}^{n} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] + \psi_n \left( \sum_{i=1}^{n} \xi_i v_i - \delta(t) z \right).$$
Then, we have the following canonical system:
$$\dot{z} = \sum_{i=1}^{n} \xi_i v_i - \delta(t) z, \quad \dot{\psi}_n = e^{\sum_{i=1}^{n} [\lambda_i (\theta_i - t)]} \sum_{i=1}^{n} q_i + \delta(t) \psi_n.$$
Let $\tilde{\psi}_n = e^{-\sum_{i=1}^{n} [\lambda_i (\theta_i - t)]} \psi_n$; then, we have $\dot{\tilde{\psi}}_n = \sum_{i=1}^{n} q_i + \left( \delta(t) + \sum_{i=1}^{n} \lambda_i \right) \tilde{\psi}_n$.
As the differential equation of $\tilde{\psi}_n(t)$ is similar to that of $\tilde{\psi}_2(t)$ in the identical-shift case of Section 3.1, we can still uniquely determine the equilibrium initial value, $\tilde{\psi}_n(\theta_n) = \psi_{hlc}^{N}$, that forms a unique HLC:
$$\psi_{hlc}^{N} = \begin{cases} \bar{L} \left( 1 - e^{p_2^N T (\tau - 1)} \right) e^{p_1^N m_n} - \frac{q}{p_1^N}, & m_n \in [0, \tau T], \\ \bar{L} \left( 1 - e^{p_1^N \tau T} \right) e^{p_2^N (m_n - T)} - \frac{q}{p_2^N}, & m_n \in [\tau T, T], \end{cases}$$
where $q = \sum_{i=1}^{n} q_i$, $\lambda^N = \sum_{i=1}^{n} \lambda_i$, $p_1^N = \delta_1 + \lambda^N$, $p_2^N = \delta_2 + \lambda^N$, $m_n = \theta_n - \left\lfloor \frac{\theta_n}{T} \right\rfloor T$, $\bar{L} = \frac{q (p_1^N - p_2^N)}{p_1^N p_2^N \left( e^{p_2^N T (\tau - 1)} - e^{p_1^N \tau T} \right)}$.

4.2. Former Intervals— $[0, \theta_1] \cup (\theta_1, \theta_2] \cup \cdots \cup (\theta_{n-1}, \theta_n]$

Since the equilibrium value of the adjoint variable at moment $\theta_n$ is uniquely determined, the dynamics of $\psi_i(t)$, $i = \overline{1, n-1}$, are also uniquely determined in backward time.
The joint payoff functional on the interval $(\theta_j, \theta_{j+1}]$, $j \in [1, n-1]$, $j \in \mathbb{N}_0$, is given by
$$J_{coj}(z_0, v_1, v_2, \ldots, v_n) = \sum_{i=1}^{n} \int_{\theta_j}^{\theta_{j+1}} e^{\sum_{i=1}^{j} [\lambda_i (\theta_i - t)]} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] dt.$$
Furthermore, the Hamiltonian is denoted as
$$H_{coj}(t) = e^{\sum_{i=1}^{j} [\lambda_i (\theta_i - t)]} \sum_{i=1}^{n} \left[ a_i v_i (b_i - v_i/2) - q_i z \right] + \psi_j \left( \sum_{i=1}^{n} \xi_i v_i - \delta(t) z \right).$$
The differential equation of $\psi_j(t)$ is given as
$$\dot{\psi}_j = e^{\sum_{i=1}^{j} [\lambda_i (\theta_i - t)]} \sum_{i=1}^{n} q_i + \delta(t) \psi_j.$$
Let $\tilde{\psi}_j = e^{-\sum_{i=1}^{j} [\lambda_i (\theta_i - t)]} \psi_j$; then, we have
$$\dot{\tilde{\psi}}_j = \sum_{i=1}^{n} q_i + \left( \delta(t) + \sum_{i=1}^{j} \lambda_i \right) \tilde{\psi}_j, \quad \tilde{\psi}_j(\theta_{j+1}) = \tilde{\psi}_{j+1}(\theta_{j+1}).$$
In addition, for the first interval $[0, \theta_1]$, we have $\dot{\psi}_1(t) = \sum_{i=1}^{n} q_i + \delta(t) \psi_1(t)$, $\psi_1(\theta_1) = \tilde{\psi}_2(\theta_1)$.
Consequently, the overall cooperative adjoint variable with different shifted distributions is obtained as follows:
$$\psi_{co}(t) = \begin{cases} \psi_1(t), & t \in [0, \theta_1], \\ \tilde{\psi}_2(t), & t \in [\theta_1, \theta_2], \\ \quad \vdots \\ \tilde{\psi}_j(t), & t \in [\theta_j, \theta_{j+1}], \\ \quad \vdots \\ \tilde{\psi}_n(t), & t > \theta_n. \end{cases}$$

5. Numerical Optimal Solution

For simplicity, we considered an example of a two-player cooperative game-theoretic model of pollution control. As the overall cooperative adjoint variable is uniquely obtained in each case, optimal solutions are shown below.
To meet the condition of environmentally sustainable control, the parameters were set as follows: $\delta_1 = 0.9$; $\delta_2 = 0.45$; $T = 1$; $\tau = 0.5$; $\lambda = 0.5$; $q = 6$; $\xi_1 = 0.8$; $\xi_2 = 0.7$; $a_1 = 1$; $a_2 = 1.2$; $b_1 = b_2 = 10$; $z_0 = 0$; $\theta = 1.2$; $\theta_1 = 1$; $\theta_2 = 2.2$; $\lambda_1 = 0.3$; and $\lambda_2 = 0.5$.

5.1. Solution of Identical Shifted Exponential Distribution

The optimal control, state trajectory, and corresponding cooperative payoff are demonstrated in Figure 4.
The solution shows that the optimal emissions of each player change periodically after the distribution shift instant θ = 1.2 . Further, the stock of pollution converged to a unique HLC and stabilized, and the cooperative payoff increased and then converged.
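The convergence of the pollution stock to a periodic cycle can be reproduced with a short simulation of the regime after $\theta$. The sketch below is an illustration under stated assumptions, not the authors' numerical code: time is measured from the start of a period, the starting stock `z0` is an arbitrary illustrative value rather than the paper's $z(\theta)$, the periodic adjoint variable is taken from the closed form of Theorem 1, and explicit Euler stepping is used for simplicity. All function and constant names are hypothetical.

```python
import math

# Parameter values listed in Section 5 of the paper.
T, TAU, LAM = 1.0, 0.5, 0.5
D1, D2, Q = 0.9, 0.45, 6.0
XI, A, B = (0.8, 0.7), (1.0, 1.2), (10.0, 10.0)
P1, P2 = D1 + LAM, D2 + LAM
L = Q * (P1 - P2) / (
    P1 * P2 * (math.exp(P2 * T * (TAU - 1)) - math.exp(P1 * TAU * T))
)


def psi_periodic(s):
    """Periodic rescaled adjoint at local time s in [0, T) within a period."""
    if s < TAU * T:
        return L * (1 - math.exp(P2 * T * (TAU - 1))) * math.exp(P1 * s) - Q / P1
    return L * (1 - math.exp(P1 * TAU * T)) * math.exp(P2 * (s - T)) - Q / P2


def inflow(s):
    """Total pollutant inflow, using each player's clipped optimal emission."""
    psi = psi_periodic(s)
    return sum(
        x * min(b, max(0.0, b + x / a * psi)) for x, a, b in zip(XI, A, B)
    )


def stock_at_period_ends(z0, periods=20, steps=1000):
    """Explicit Euler for z' = inflow - delta(t) * z; z at the end of each period."""
    h, z, ends = T / steps, z0, []
    for _ in range(periods):
        for j in range(steps):
            s = j * h  # local time, so the forcing repeats exactly each period
            delta = D1 if s < TAU * T else D2
            z += h * (inflow(s) - delta * z)
        ends.append(z)
    return ends
```

Running `stock_at_period_ends(0.0)` and `stock_at_period_ends(50.0)` gives end-of-period stocks that approach the same value, with the change from one period to the next shrinking geometrically. This mirrors the statement that the stock settles onto a single cycle regardless of where it starts.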
In some cases, before time instant θ , there may exist a period of radical emissions. This could be interpreted as a more intense use of equipment by players during the warranty period.

5.2. Solution of Different Shifted Exponential Distributions

The optimal control, state trajectory, and corresponding cooperative payoff in this case are shown in Figure 5.
Figure 5 shows that after the expiration of each piece of equipment’s warranty period ($\theta_i$), the control strategy of player $i$ tends to be increasingly conservative. After the maximal warranty period $\theta_2$, the control strategy of each player transforms to a periodic solution in the form of an HLC. The stock of pollution also converges to a unique HLC and stabilizes, while the cooperative payoff increases and then converges. This result indicates that the proposed control strategy can still maintain profitability when the equipment of each player is nonhomogeneous and the system is subject to seasonal fluctuation ($\delta(t)$).
For the dynamic switched system of pollution levels of a lake considered in this paper, based on the obtained results and coupled with the random duration of the process, we propose that, within the framework of a cooperative game, players may adopt a production strategy transitioning from aggressive to conservative (gradually decreasing output) before the maximum warranty period $\theta_2$. This strategy reaches its equilibrium value at $\theta_2$. Moreover, after $\theta_2$, players adopt a production strategy following an HLC pattern. This involves the gradual increase and progressive decrease in production during periods of relatively high and low lake self-cleaning rates, respectively. Consequently, this approach ensures the attainment of an optimally controlled outcome in the context of sustainable development.
Therefore, based on the considerations of real-world issues, the obtained results exhibit uniqueness, theoretical applicability, and practical relevance.
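Convergence of the pollution stock to a single periodic orbit can be illustrated with a deliberately simplified model. The sketch assumes stock dynamics of the form dz/dt = inflow - delta(t) z, a constant aggregate inflow instead of the optimal periodic controls derived in the paper, and a self-cleaning rate that takes one value on the first fraction tau of each period T and another value on the rest. All parameter values and function names are hypothetical.

```python
import math


def advance(z, inflow, delta, h):
    """Exact solution of dz/dt = inflow - delta * z over a time span h."""
    z_eq = inflow / delta
    return z_eq + (z - z_eq) * math.exp(-delta * h)


def period_map(z, inflow, delta_1, delta_2, tau, T):
    """Stock after one period: regime 1 on [0, tau*T), regime 2 on [tau*T, T)."""
    z = advance(z, inflow, delta_1, tau * T)
    return advance(z, inflow, delta_2, (1.0 - tau) * T)


def stock_at_period_starts(z0, n_periods, **params):
    """Stock sampled at the start of each period, beginning from z0."""
    z, out = z0, [z0]
    for _ in range(n_periods):
        z = period_map(z, **params)
        out.append(z)
    return out


# Hypothetical parameters, chosen only for illustration.
params = dict(inflow=1.0, delta_1=0.4, delta_2=0.8, tau=0.5, T=1.0)
low = stock_at_period_starts(0.0, 60, **params)
high = stock_at_period_starts(10.0, 60, **params)
```

Because the period map is affine with a contraction factor below one, trajectories started from different initial stocks approach the same periodic orbit, which mirrors the uniqueness of the limit cycle reported above in this simplified setting.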

6. Conclusions

In this study, we analyzed the cooperative differential game for a typical hybrid optimal pollution-control problem with two types of time-driven switches: the seasonal fluctuation (the self-cleaning rate of the lake) and the shift parameter of the exponential distributions that describe the random game duration. The problem with random terminal duration was transformed into a combination of an infinite-horizon optimal control problem and one or more finite-horizon optimal control problems.
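The reduction of the random-duration problem to finite- and infinite-horizon parts can be illustrated for a single shifted exponential distribution. The following is a sketch rather than the paper's own derivation: $h(t)$ is an assumed shorthand for the instantaneous joint payoff along a given trajectory (taken to be integrable), and $F$ is the distribution function of $T_f$ with shift $\theta$ and rate $\lambda$.

```latex
\begin{align*}
F(t) &= \begin{cases} 0, & t < \theta, \\ 1 - e^{-\lambda (t - \theta)}, & t \ge \theta, \end{cases} \\
\mathbb{E}\left[ \int_0^{T_f} h(t)\, dt \right]
  &= \int_0^{\infty} \bigl(1 - F(t)\bigr)\, h(t)\, dt \\
  &= \int_0^{\theta} h(t)\, dt + \int_{\theta}^{\infty} e^{-\lambda (t - \theta)}\, h(t)\, dt .
\end{align*}
```

The first term is a finite-horizon problem on $[0, \theta]$, and the second is an infinite-horizon problem discounted at rate $\lambda$. With player-specific shifts and independent failure times, each additional shift instant would contribute a further finite-horizon segment.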
We first considered a scenario in which the game duration follows a single distribution and then examined a second scenario in which the game duration follows a joint probability distribution arising from the heterogeneity of the players' equipment. This paper discussed both scenarios in detail and presented the results of each analytically and numerically. Furthermore, an environmentally sustainable solution in the form of an HLC was uniquely determined for each scenario, ensuring both sustainable production revenue and environmental protection.
In a subsequent study, we will delve into the optimal control problem of each player in a noncooperative game featuring infinite time-driven switches and random game duration. In addition, we will provide comparisons of the results obtained from the noncooperative game with those obtained from the cooperative game to establish a justifiable allocation rule.

Author Contributions

Conceptualization, A.T. and Y.W.; methodology, A.T. and Y.W.; validation, Y.W. and H.W.; formal analysis, A.T. and Y.W.; investigation, Y.W. and A.T.; writing—original draft preparation, Y.W. and A.T.; writing—review and editing, A.T., Y.W. and H.W.; visualization, Y.W.; project administration, H.W. and A.T.; funding acquisition, H.W. and A.T. All authors have read and agreed to the published version of the manuscript.

Funding

The work of Anna Tur was funded by RFBR and DFG, project number 21-51-12007.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Notations.

| Symbol | Description |
|---|---|
| $T_f$ | the game duration |
| $T_{f_i}$ | the i-th player's equipment failure time |
| $T$ | the period time |
| $\tau$ | the switching ratio |
| $z$ | the stock of pollution within a fixed natural reservoir (e.g., a lake) |
| $v_i$ | the emissions rate of player i |
| $b_i$ | the maximal admissible emissions rate of player i |
| $\xi_i$ | the fraction of the emitted pollutants accumulated in the reservoir from each player (e.g., factory) |
| $\delta(t)$ | the self-cleaning rate of the reservoir in time t |
| $C_i(z)$ | the damage cost of player i |
| $R_i(v_i)$ | the revenue functional of player i |
| $L(v_i, z)$ | the instantaneous payoff of player i |
| $a_i$ | the positive constant used to transform the player i's emissions flow to the profit flow |
| $q_i$ | the positive constant, corresponding to the tax that player i must bear |
| $J_i(z_0, v_1, v_2, \ldots, v_n)$ | the integral payoff of player i |
| $J_{co}(z_0, v_1, v_2, \ldots, v_n)$ | the joint payoff |
| $\theta$ | the shift parameter of the exponential distribution from the initiation of game |
| $\lambda$ | the parameter of distribution |
| $\theta_i$ | the shifted parameter of player i |
| $\lambda_i$ | the distribution parameter of player i |

References

  1. Scheffer, M.; Carpenter, S.; Foley, J.A.; Folke, C.; Walker, B. Catastrophic shifts in ecosystems. Nature 2001, 413, 591–596. [Google Scholar] [PubMed]
  2. Ang, A.; Bekaert, G. Regime switches in interest rates. J. Bus. Econ. Stat. 2002, 20, 163–182. [Google Scholar] [CrossRef]
  3. Rasmussen, L.V.; Christensen, A.E.; Danielsen, F.; Dawson, N.; Martin, A.; Mertz, O.; Sikor, T.; Thongmanivong, S.; Xaydongvanh, P. From food to pest: Conversion factors determine switches between ecosystem services and disservices. Ambio 2017, 46, 173–183. [Google Scholar] [CrossRef] [PubMed]
  4. Hiskens, I.A. Stability of hybrid system limit cycles: Application to the compass gait biped robot. In Proceedings of the 40th IEEE Conference on Decision and Control, Orlando, FL, USA, 4–7 December 2001; pp. 774–779. [Google Scholar]
  5. Lennartson, B.; Tittus, M.; Egardt, B.; Pettersson, S. Hybrid systems in process control. IEEE Control. Syst. Mag. 1996, 16, 44–56. [Google Scholar]
  6. Hoekstra, J.; Van den Bergh, J.C. Harvesting and conservation in a predator–prey system. J. Econ. Dyn. Control 2005, 29, 1097–1120. [Google Scholar] [CrossRef]
  7. Zelikin, M.I.I.; Lokutsievskiy, L.V.; Skopincev, S.V. On optimal harvesting of a resource on a circle. Math. Notes 2017, 102, 521–532. [Google Scholar] [CrossRef]
  8. Moberg, E.A.; Pinsky, M.L.; Fenichel, E.P. Capital investment for optimal exploitation of renewable resource stocks in the age of global change. Ecol. Econ. 2019, 165, 106335. [Google Scholar]
  9. Gromov, D.; Bondarev, A.; Gromova, E. On periodic solution to control problem with time-driven switching. Optim. Lett. 2022, 16, 2019–2031. [Google Scholar] [CrossRef]
  10. Xin, B.; Peng, W.; Sun, M. Optimal coordination strategy for international production planning and pollution abating under cap–and–trade regulations. Int. J. Environ. Res. Public Health 2019, 16, 3490. [Google Scholar] [CrossRef]
  11. Gromov, D.; Shigoka, T.; Bondarev, A. Optimality and sustainability of hybrid limit cycles in the pollution control problem with regime shifts. Environ. Dev. Sustain. 2023, 1–18. [Google Scholar] [CrossRef]
  12. Su, S.; Tur, A. Estimation of Initial Stock in Pollution Control Problem. Mathematics 2022, 10, 3457. [Google Scholar] [CrossRef]
  13. Huang, X.; He, P.; Hua, Z. A cooperative differential game of transboundary industrial pollution between two regions. J. Clean. Prod. 2015, 120, 43–52. [Google Scholar] [CrossRef]
  14. Jørgensen, S.; Martín-Herrán, G.; Zaccour, G. Dynamic games in the economics and management of pollution. Environ. Model. Assess. 2010, 15, 433–467. [Google Scholar]
  15. Gromova, E.V.; Tur, A.V.; Balandina, L.I. A game-theoretic model of pollution control with asymmetric time horizons. Contrib. Game Theory Manag. 2016, 9, 170–179. [Google Scholar]
  16. Shevkoplyas, E.V.; Kostyunin, S.Y. Modeling of environmental projects under condition of a random time horizon. Contrib. Game Theory Manag. 2011, 4, 447–459. [Google Scholar]
  17. Aleksandrov, A.Y.; Andriyanova, N.R. Fixed–time stability of switched systems with application to a problem of formation control. Nonlinear Anal. Hybrid Syst. 2021, 40, 101008. [Google Scholar] [CrossRef]
  18. Breton, M.; Zaccour, G.; Zahaf, M. A differential game of joint implementation of environmental projects. Automatica 2005, 41, 1737–1749. [Google Scholar] [CrossRef]
  19. De Zeeuw, A.; Zemel, A. Regime shifts and uncertainty in pollution control. J. Econ. Dyn. Control 2012, 36, 939–950. [Google Scholar] [CrossRef]
  20. Kostyunin, S.; Shevkoplyas, E. On simplification of integral payoff in the dif-ferential games with random duration. Vestn. St. Petersburg Univ. 2011, 10, 47–56. [Google Scholar]
  21. Goodland, R. The concept of environmental sustainability. Annu. Rev. Ecol. Syst. 1995, 26, 1–24. [Google Scholar] [CrossRef]
  22. Gromov, D.; Gromova, E. On a class of hybrid differential games. Dyn. Games Appl. 2017, 7, 266–288. [Google Scholar] [CrossRef]
  23. Balas, T.; Tur, A. The Hamilton–Jacobi–Bellman Equation for Differential Games with Composite Distribution of Random Time Horizon. Mathematics 2023, 11, 462. [Google Scholar] [CrossRef]
Figure 1. Dynamics of $\tilde{\psi}_2$ with different initial values.
Figure 2. Dynamics of $\psi_{co}$ when $\theta$ lies between $kT$ and $(k + \tau)T$.
Figure 3. Dynamics of $\psi_{co}$ when $\theta$ lies between $kT + \tau$ and $(k + 1)T$.
Figure 4. Optimal solution and cooperative payoff with identical shifted exponential distribution.
Figure 5. Optimal solution and cooperative payoff with different shifted exponential distributions.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wu, Y.; Tur, A.; Wang, H. Sustainable Optimal Control for Switched Pollution-Control Problem with Random Duration. Entropy 2023, 25, 1426. https://doi.org/10.3390/e25101426
