Robust Statistic Estimation in Constrained Optimal Control Problems of Pollution Accumulation (Part II: Markovian Switchings)

Escobedo-Trujillo, Beatris Adriana; López-Barrientos, José Daniel; Higuera-Chan, Carmen Geraldi; Alaffita-Hernández, Francisco Alejandro

doi:10.3390/math11041045

Open AccessArticle

Robust Statistic Estimation in Constrained Optimal Control Problems of Pollution Accumulation (Part II: Markovian Switchings)

¹

Facultad de Ingeniería, Universidad Veracruzana, Coatzacoalcos 96535, Mexico

²

Facultad de Ciencias Actuariales, Universidad Anáhuac Mexico, Naucalpan de Juárez 52786, Mexico

³

Departamento de Matemáticas, Universidad de Sonora, Hermosillo 83000, Mexico

⁴

Centro de Investigación en Recursos Energéticos y Sustentables, Universidad Veracruzana, Coatzacoalcos 96535, Mexico

^*

Author to whom correspondence should be addressed.

Mathematics 2023, 11(4), 1045; https://doi.org/10.3390/math11041045

Submission received: 18 January 2023 / Revised: 8 February 2023 / Accepted: 16 February 2023 / Published: 18 February 2023

(This article belongs to the Special Issue Application of Optimal Control and Game Theory to the Problem of Resource Management)

Download

Browse Figures

Versions Notes

Abstract

:

This piece is a follow-up of the research started by the authors on the constrained optimal control problem applied to pollution accumulation. We consider a dynamic system governed by a diffusion process with multiple modes that depends on an unknown parameter. We will study the components of the model and their restrictions and propose a scheme to solve the problem in which it is possible to determine (adaptive) policies that maximize a suitable discounted reward criterion using standard dynamic programming techniques in combination with discrete estimation methods for the unknown parameter. Finally, we develop a numerical example to illustrate our results with a particular case of the method of minimum least square error approximation.

Keywords:

consistent estimators; multiple Markovian modes; discounted cost; maximum likelihood estimators; least square errors

MSC:

93E10; 93E20; 93E24; 60J60

1. Introduction

Pollution control in large cities is a problem of great interest worldwide, and that is why various organizations are continually seeking strategies to mitigate it. As for scientists, they have begun to analyze models that describe the stock of pollution through ordinary and stochastic differential equations. In particular, optimal control theory has been applied for the optimal management of pollution in economic sciences. This theory considers an economy that consumes some good and, as a by-product of that consumption, generates pollution. The hypotheses in our model are as follows:

The contamination stock is only gradually dissolved by the environment;
The growth rate of the pollution is constant or random;
The flow of pollution is constrained so that it satisfies some mandatory global standards in order to promote sustainable development (see, for instance, ref. [1]).

Social welfare is defined by the net utility from the consumption of some good vis à vis the disutility caused by pollution. Our objective is to find an optimal consumption policy for society. That is, we seek to maximize the difference between the utility function of consumption vs. the disutility caused by the polluting stock (see [2,3]).

This paper represents the second part of a project related to the constrained optimal control problem of pollution accumulation with an unknown parameter. To be in context, we begin by briefly summarizing the results obtained in Robust statistic estimation in constrained optimal control problems of pollution accumulation (Part I) (see [4]).

The first part considers the scenery where the dynamic system is given by a diffusion process and depends on an unknown parameter, say

θ

. First, assuming the parameter

θ

as known, an approach with restrictions and one without them is proposed. The respective optimal value functions are

V_{θ}^{*}

and

V_{θ, λ}^{*}

. Then, estimation techniques for the parameter

θ

are applied and later combined with the characterizations and results previously analyzed. Roughly speaking, one of the results considers a sequence of estimated parameters (

θ_{m}

:m = 1, …) such that

θ_{m} \to θ

, then the value functions converge (in some sense), that is

V_{θ_{n}}^{*} \to V_{θ}^{*} and V_{θ_{m}, λ}^{*} \to V_{θ, λ}^{*} .

Another result states the existence of optimal policies such that

π_{θ_{m}} \to π_{θ}

. Furthermore, the relationship between the value functions

V_{θ}^{*}

and

V_{θ, λ}^{*}

is shown.

In this piece, we obtain similar results considering the case where the dynamics of the stock of pollution evolves as a diffusion process with Markovian switching whose drift function, as well as the reward function, depends on the unknown parameter

θ

. In addition, we impose some natural constraints on the performance index.

To avoid confusion, we try to preserve the notation of Part I in this work. First, a constrained control problem is proposed. Subsequently, assuming

θ

is the real parameter, we study a standard control problem under the discounted criterion, where it is possible to apply standard techniques and dynamic programming tools to determine optimal policies. Then a (discrete) procedure to estimate the unknown parameter

θ

is applied in combination with the standard results formerly mentioned to obtain the so-called adaptive policies that maximize a discounted reward criterion with constraints.

The idea is to estimate the parameter

θ

, and then solve the optimal control problem when such an estimated value is replaced in the problem. In the literature, this approach is known as the Principle of Estimation and Control. This problem has been studied in several contexts. For instance, refs. [5,6,7,8] and the references therein are about stochastic control systems evolving in discrete time. On the other hand, adaptive optimal control for continuous time is studied in [9,10,11]. The estimation for diffusion processes using discrete observations has been studied in the works [12,13,14,15,16].

Dynamic optimization has been used to study the problem of pollution accumulation in the past; for example, the papers [17,18] use a linear quadratic model to explain this phenomenon, the article [2] deals with the average payoff in a deterministic framework, while [3,19] extend the former’s approach to a stochastic context, and [20] uses a robust stochastic differential game to model the situation. The study [21] is a statistical analysis of the impact of air pollution on public health. In order to develop adaptive policies that are almost surely optimal for the restricted optimization problem under the discounted reward on an infinite horizon with Markovian switchings, we use a statistical estimation approach to determine the unknown parameter

θ

. These adaptive policies are created by replacing the estimates into optimal stationary controls (that is, the use of the PEC); for more information, see the works of Kurano and Mandl (cf. [7,8]). The statistic estimation method we use for the unknown parameter

θ

is the so-called least square estimator for stochastic differential equations based on many discrete observations. This resembles existing robust estimation techniques, such as the

H_{\infty}

method, in the fact that in the applications, the dynamic systems are linear. However, the computational complexity of these techniques is greater. Indeed, with our least square estimator, only the inverse of a matrix must be calculated to obtain the estimator, while there are way more computations to be performed in the other algorithms (see [22,23]). Most risk analysts will not be as familiar with our methods as they are with, for example, the model predictive control, MATLAB’s robust control toolbox, or the polynomial chaos expansion method, which have been used in the literature to address similar issues. Since we review a constructive method for robust and adaptive control under deep uncertainty, our findings are similar to those reported in the article [24]. Moreover, our methods also resemble the adaptive moving mesh method for optimal control problems in viscous incompressible fluid used in [25].

This piece can be also be considered an extension of [26,27,28,29], who also study adaptive constrained optimal control methods. In fact, ref. [28] studies a constrained optimal control problem, but unlike our case, there, all the parameters are known, while [26] does the same but in the context of pollution accumulation. The references [27,29] study an unconstrained adaptive optimal control problem. Finally, it is important to highlight the numerical estimation technique that illustrates the results of this article.

The rest of the paper is organized as follows. We present the elements of our model and assumptions in Section 2. Next, Section 3 introduces our optimality criterion and the main results; an interesting numerical example illustrating our results is given in Section 4. We give our conclusions in Section 5, and finally, we included the proof of the important (but rather distracting) Theorem A1 on the convergence of the HJB equation under the topology of relaxed controls in Appendix A.

Notation and Terminology

For vectors

x = (x_{1}, x_{2}, \dots, x_{n}) \in R^{n}

and matrices

A = (A_{k, p}) \in M_{n} (R)

, we denote by

| \cdot |

the Euclidean norm, that is,

{| x |}^{2} : = \sum_{k = 1}^{n} x_{k}^{2} and {| A |}^{2} : = T r (A A^{⊤}) = \sum_{k, p = 1}^{n} A_{k, p}^{2},

where

A^{⊤}

and

T r (\cdot)

denote the transpose and the trace of matrix, respectively. As an abbreviation, we write

\partial_{i}

and

\partial_{i j}^{2}

to refer to

: = \frac{\partial}{\partial x_{i}}

, and

\frac{\partial^{2}}{\partial x_{i} \partial x_{j}}

, respectively.

Given a Borel set B, we denote by

B (B)

its natural

σ

-algebra. As usual,

C (O)

, stands for the space of continuous functions whose domain is

O

and

C (O \times E) : = {ν : O \times E \to R^{n} : ν (\cdot, i) \in C (O \times E) f o r e a c h i \in E} .

Consequently we denote

C_{b} (O \times E)

as the subspace of

C (O \times E)

composed by bounded functions. The set

C^{κ} (O \times E) : = {ν : O \times E \to R^{n} : ν (\cdot, i) \in C^{κ} (O \times E) f o r e a c h i \in E}

, where

C^{κ} (O \times E)

is the space of all real-valued continuous functions f on the bounded, open and connected subset

O \subset R^{n}

with continuous derivatives up to order

κ \in N

.

Fix

p \geq 1

and a measure space

(Ω, F, μ)

, we denote

L^{p} (Ω \times E)

as the Lebesgue space of functions g on

Ω \times E

such that

\int_{Ω} {| g (x, i) |}^{p} μ (d x) < \infty

for

i \in E

.

Let X and Y be Borel spaces. A stochastic kernel

Q (\cdot | \cdot)

on X given Y is a function such that

Q (\cdot | y)

is a probability measure on X for each

y \in Y

and

Q (B | \cdot)

is a measurable function on Y for each

B \in B (X)

.

Finally, the set

P (B)

denotes the family of probability measures on B endowed with the topology of weak convergence.

2. Model Formulation and Assumptions

Taking as reference the problem analyzed in Part I, we consider the scenery where the dynamics of the pollution stock is modeled as an n-dimensional controlled stochastic differential equation (SDE) with Markovian switching. Specifically, such a dynamic takes the form

d x (t) = b (x (t), ψ (t), u (t), θ) d t + σ (x (t), ψ (t)) d W (t), (x (0), ψ (0)) = (x_{0}, ψ_{0}), t \geq 0,

(1)

where

E = {1, 2, \dots, N}

,

b : R^{n} \times E \times U \times Θ \to R^{n}

and

σ : R^{n} \times E \to R^{n \times d}

are given functions,

W (\cdot)

is an

F_{t}

-adapted d-dimensional Wiener process such that

W (t) - W (s)

and

F_{s}

are pairwise independent,

W (\cdot)

is independent of

ψ (\cdot)

, and the evolution of the Markov chain

ψ

has intensity

Q = {(q_{i j})}_{i, j \in E}

and transition rule given by

P (ψ (t + Δ t) = j | ψ (t) = i, (x (s), ψ (s)), s \leq t) = \{\begin{matrix} q_{i j} Δ t + o (Δ t), & if i \neq j, \\ 1 + q_{i i} Δ t + o (Δ t), & if i = j, \end{matrix}

(2)

for

t \geq 0

and

\sum_{j = 1}^{N} q_{i j} = 0

. The compact set

U \subset R^{n_{1}}

is called the control set. In the context of our problem,

u (t)

is a stochastic process on U such that, at time t, it represents the flow of consumption which, in turn, is considered bounded to reflect the policies and rules imposed by governments or social entities.

It is important to remark that throughout this work, we assume that

θ

is an unknown parameter taking values on a compact set

Θ \subset R^{m}

, which is called the parameter set. Note that in the context of pollution problems,

θ

can be seen as the pollution decay rate.

Now we define the so-called randomized policies, also known as relaxed controls, or just policies.

Definition 1.

A policy is a family

π : = {(π_{t} (\cdot | \cdot, \cdot))}_{t \geq 0}

of stochastic kernels on

B (U) \times R^{n} \times E

(see Section 1). We denote by Π the set of stationary policies. In particular, a randomized policy is said to be stationary if there is a probability measure

π (\cdot | x, i) \in P (U)

such that

π_{t} (\cdot | x, i) = π (\cdot | x, i)

for all

t \geq 0

and

(x, i) \in R^{n} \times E

. Let

F

be the set of measurable functions f from

R^{n} \times E

to U. We denote the set of stationary Markov policies as

F_{1} : = {f : R^{n} \times E \to U : f o r e a c h i \in E, f (\cdot, i) \in F}

.

For each randomized policy

π \in Π

and a function whose domain is contained to U, say

v : R^{n} \times E \times U \times Θ \to R

, we use the abbreviated notation

v (x, i, π, θ) : = \int_{U} v (x, i, u, θ) π (d u | x, i) .

(3)

A suitable adjustment should be made for functions with a different domain.

We endow

Π

with a topology (see [30]) determined by the convergence criterion defined below (see [31,32], Lemma 3.2 in [30,33].

Definition 2.

A sequence

{(π_{m})}_{m \in N}

in Π converges to

π \in Π

if

\int_{R^{n}} g (x, i) h (x, i, π_{m}) d x \to \int_{R^{n}} g (x, i) h (x, i, π) d x .

for all

g \in L^{1} (R^{n} \times E)

, and

h \in C_{b} (R^{n} \times E \times U)

(see (3)). Since this mode of convergence was introduced by Warga (cf. [30]), we denote it as

π_{m} \overset{W}{\to} π

.

For

ν (\cdot, \cdot, θ) \in C^{2} (R^{n} \times E)

,

u \in U

and

θ \in Θ

, the infinitesimal generator associated with the process

(x (\cdot), ψ (\cdot))

is

L^{u, θ} ν (x, i, θ) : = \sum_{k = 1}^{n} b_{k} (x, i, u, θ) \partial_{k} ν (x, θ, i) + \frac{1}{2} \sum_{k, ℓ = 1}^{n} a^{k, ℓ} (x, i) \partial_{k, ℓ}^{2} ν (x, i, θ) + \sum_{j = 1}^{N} q_{i j} ν (x, i, θ),

where

b_{k}

is the k-th component of the drift function b, and

a^{k, ℓ}

is the

(k, ℓ)

component of the matrix

a (\cdot, \cdot) : = σ (\cdot, \cdot) σ {(\cdot, \cdot)}^{⊤}

. As in (3), for each policy

π \in Π

, we write

L^{π, θ} ν (x, i, θ) : = \int_{U} L^{u, θ} ν (x, i, θ) π (d u | x, i) .

The following set of assumptions and conditions ensures the existence and uniqueness of a strong solution as well as stability of the dynamic system (1) and (2) (see [31,33,34,35]).

Assumption 1.

(a): The random process (1) belongs to a complete probability space $(Ω, F, P^{u, θ})$ . Here, ${F_{t}}_{t \geq 0}$ is a filtration on $(Ω, F)$ such that each $F_{t}$ is complete relative to $F$ , and $P^{u, θ}$ is the law of the state process $x (\cdot)$ given the parameter $θ \in Θ$ and the control $u (\cdot)$ .
(b): The drift function $b (\cdot, \cdot, \cdot, \cdot)$ in (1) is continuous and satisfies that for each $R > 0$ , there exist non-negative constants $K_{θ} (R)$ and $D (R)$ such that, for all $u \in U$ , all $| θ_{1} |, | θ_{2} | \leq R$ and $| x |, | y | \leq R$ ,

$| b (x, i, u, θ) - b (y, i, u, θ) | \leq K_{θ} (R) | x - y |,$

$| b (x, i, u, θ_{1}) - b (x, i, u, θ_{2}) | \leq D (R) | θ_{1} - θ_{2} |,$

Moreover, the function $u \mapsto b (x, i, u, θ)$ is continuous on U.
(c): The diffusion coefficient σ satisfies a local Lipschitz condition; that is, for each $R > 0$ , there exists a constant $K_{1} (R) > 0$ such that, for all $| x |, | y |$ less than R,

$| σ (x, i) - σ (y, i) | \leq K_{1} (R) | x - y | .$
(d): A global linear growth condition is satisfied

$sup_{(u, θ) \in U \times Θ} {|b (x, i, u, θ)|}^{2} + {| σ (x, i) |}^{2} \leq \tilde{K} {(1 + | x |}^{2}) f o r a l l x \in R^{n},$

where $\tilde{K}$ > 0 is a constant.
(e): The matrix $a (x, i) : = σ (x, i) σ {(x, i)}^{⊤}$ satisfies that, for some constant $K_{2} > 0$ ,

$x^{⊤} a (y, i) x \geq K_{2} {| x |}^{2} f o r a l l x, y \in R^{n} .$

Remark 1.

(i): Properties such as continuity or Lipschitz continuity given in Assumption 1 are inherited to the drift function $b (x, i, π, θ)$ .
(ii): Under Assumption 1, once a policy $π \in Π$ and a parameter $θ \in Θ$ are fixed, the references [31] and [33] guarantee the existence of a probability space $(Ω, F, P^{π, θ})$ in which there exists a unique process $x^{π, θ} (\cdot)$ with the Markov–Feller property which, in turn, is an almost surely strong solution.

The next hypothesis is known as the Lyapunov stability condition.

Assumption 2.

There exists a function

w \in C^{2} (R^{n} \times E)

,

w (\cdot, \cdot) \geq 1

and constants

d \geq β > 0

such that

(a): ${lim}_{| x | \to \infty} w (x, i) = \infty$ uniformly in $i \in E$ .
(b): $L^{π, θ} w (x, i) \leq - β w (x, i) + d$ for all $π \in Π$ , $θ \in Θ$ and $(x, i) \in R^{n} \times E$ .

Assumption 2 essentially asks for a twice-continuously differentiable function to solve the problem at hand. This hypothesis is equivalent to requiring positive-definite matrices in the context of linear matrix inequalities (see [36] and pages 113–135 in [37]). The existence of a function w with the conditions in Assumption 2 implies that the rate functions involved in our model can be unbounded (see Assumption 3). As in the first part, we define next an adequate space for these functions.

Definition 3.

Let v be a function from

R^{n} \times E

to

R

, we define its w-norm as

{∥v∥}_{w} : = sup_{(x, i) \in R^{n} \times E} \frac{| v (x, i) |}{w (x, i)} < \infty .

Even more, let

B_{w} (R^{n} \times E)

be the Banach space of real-valued measurable functions with finite w-norm.

Let r and c be measurable functions from

R^{n} \times E \times U \times Θ

to

R

identified as reward (social welfare) rate and the cost rate, respectively, and let

η

from

R^{n} \times E \times Θ

to

R

be another measurable function that models the constraint rate. In the context of pollution accumulation, in some situations, such a restriction is due to each country’s legal framework, and the cost of cleaning the environment must be bounded for some given quantity.

Assumption 3.

For each

i \in E

fixed, the payoff rate

r (\cdot, i, \cdot, \cdot)

, the cost rate

c (\cdot, i, \cdot, \cdot)

and the constraint rate

η (\cdot, i, \cdot)

are continuous on

R^{n} \times U \times Θ

. Moreover, they are locally Lipschitz on

R^{n}

, uniformly on E, U and Θ. That is, for each

R > 0

, there are positive constants

K (R)

and

K_{2} (R)

such that for all

| x |, | y | \leq R

,

\begin{matrix} sup_{(i, u, θ) \in E \times U \times Θ} | r (x, i, u, θ) - r (y, i, u, θ) | + sup_{(i, u, θ) \in E \times U \times Θ} | c (x, i, u, θ) - c (y, i, u, θ) | & \leq & K (R) | x - y |, \\ sup_{(i, θ) \in E \times Θ} | η (x, i, θ) - η (y, i, θ) | & \leq & K_{2} (R) | x - y | . \end{matrix}

Even more, the rate functions belong to

B_{w} (R^{n} \times E)

and there exists

M > 0

such that for all

(x, i) \in R^{n} \times E

,

sup_{(u, θ) \in U \times Θ} | η (x, i, θ) | + sup_{(u, θ) \in U \times Θ} | r (x, i, u, θ) | + sup_{(u, θ) \in U \times Θ} | c (x, i, u, θ) | \leq M w (x, i) .

3. Discounted Optimality Problems and Main Results

Through this section, we establish the contamination problem of our interest in terms of the terminology of optimal control. To this end, we will introduce the functions that evaluate the behavior of the system throughout the process associated with payments, costs, and restrictions.

In order to avoid confusion, we will preserve the notation and the ordering in the presentation of the results from the first part of the project.

3.1. Discounted Optimality Criterion

Definition 4.

Given the initial state

(x, i) \in R^{n} \times E

, a parameter value

θ \in Θ

and a discount rate

α > 0

, we define the total expected α-discounted reward, cost and constraint when the controller uses a policy π in Π as

\begin{matrix} V (x, i, π, r, θ) & : = & E_{x, i}^{π, θ} [\int_{0}^{\infty} e^{- α t} r (x (t), ψ (t), π, θ) d t], \\ V (x, i, π, c, θ) & : = & E_{x, i}^{π, θ} [\int_{0}^{\infty} e^{- α t} c (x (t), ψ (t), π, θ) d t] a n d \end{matrix}

\bar{η} (x, i, π, θ) : = α E_{x, i}^{π, θ} [\int_{0}^{\infty} e^{- α t} η (x (t), ψ (t), θ) d t] .

respectively, and

E_{x, i}^{π, θ} [\cdot]

is the expectation of · taken with respect to the probability measure

P^{π, θ}

when

(x (t), ψ (t))

starts at

(x, i)

.

Proposition 1.

If Assumptions 1–3 hold, the functions

V (\cdot, \cdot, π, r, θ)

and

V (\cdot, \cdot, π, c, θ)

belong to

B_{w} (R^{n} \times E)

for each π in Π; in fact, for each

(x, i) \in R^{n} \times E

and

θ \in Θ

we have

sup_{π \in Π} |V (x, i, π, r, θ)| + sup_{π \in Π} |V (x, i, π, c, θ)| \leq 2 M (α) w (x, i)

where

M (α) : = M \frac{α + d}{α β}

, the constants c and d are as in Assumption 2, and M is as in Assumption 3 (b).

Proposition 1 can be obtained directly using the following inequality, which is an application of Dynkin’s formula to the function

v (t, x, i) : = e^{β t} w (x, i)

, and Assumption 2 (b) yield that, for all

π \in Π

,

θ \in Θ

,

(x, i) \in R^{n} \times E

and

t \geq 0

,

E_{x, i}^{π, θ} [w (x (t), ψ (t))] \leq e^{- β t} w (x, i) + \frac{d}{β} (1 - e^{- β t}) .

(4)

Remark 2.

The function

\bar{η} (\cdot, \cdot, π, θ)

is in

B_{w} (R^{n} \times E)

for each

π \in Π

. Moreover, for each

(x, i) \in R^{n} \times E

, we have

sup_{π \in Π} |\bar{η} (x, i, π, θ)| \leq {∥ η ∥}_{w} \frac{α + d}{β} w (x, i) .

Let

θ \in Θ

be fixed, and again apply Dynkin’s formula to the function V (see Theorem 1.45 in p. 48 in [34] or Theorem 1 (iii) in [38]) to yield the following result.

Proposition 2.

Let Assumptions 1, 2 hold, and let v be a measurable function on

R^{n} \times E \times U \times Θ

satisfying Assumption 3. Then, for

π \in Π

, the associated expected α-discounted reward

V (\cdot, \cdot, \cdot, π, θ)

belongs to

W^{2, p} (R^{n} \times E) \cap B_{w} (R^{n} \times E)

, and satisfies

α V (x, i, π, v, θ) = V (x, i, π, v, θ) + L^{π, θ} V (x, i, π, v, θ) f o r a l l (x, i) \in R^{n} \times E a n d θ \in Θ .

(5)

Conversely, if some function

φ (\cdot, \cdot, θ)

in

W^{2, p} (R^{n} \times E) \cap B_{w} (R^{n} \times E)

satisfies Equation (5), then

φ (x, i, θ) = V (x, i, π, v, θ) f o r a l l (x, i) \in R^{n} \times E a n d θ \in Θ .

(6)

Even more, if relation (5) is an inequality, then (6) holds with the respective inequality.

Consider that

W^{ℓ, p} (R^{n} \times E)

is the Sobolev space of real-valued measurable functions on

R^{n} \times E

whose derivatives up to order

ℓ \geq 0

are in

L^{p} (R^{n} \times E)

for

p \geq 1

.

Given the initial conditions

(x, i) \in R^{n} \times E

a parameter

θ \in Θ

, and a constraint function

η

satisfying Assumption 3, we define the set of policies

F_{θ}^{x, i} : = \{π \in Π | V (x, i, π, c, θ) \leq \bar{η} (x, i, π, θ)\} .

(7)

We assume, for the moment, that the set defined in (7) is nonempty. Up to this point, we are in a position to formulate the discounted problem with constraints (DPC), which is defined below.

Definition 5.

Given the initial condition

(x, i) \in R^{n} \times E

and the parameter

θ \in Θ

we say that policy

π^{*} \in Π

is optimal for the DPC if

π^{*} \in F_{θ}^{x, i}

and

V (x, i, π^{*}, r, θ) = sup_{π \in F_{θ}^{x, i}} V (x, i, π, r, θ) .

Furthermore, the function

V^{*} (x, i, r, θ) : = V (x, i, π^{*}, r, θ)

is known as the

α —

discount optimal reward for the DPC.

3.2. Unconstrained Discounted Optimality

The objective of this part is to transform the original DPC (presented above) into an unconstrained problem, and thus, to be able to propose results and techniques known in the literature. To this end, we will apply the Lagrange multipliers technique used in [26]. Take

λ \leq 0

and consider the function

r^{λ} (x, i, u, θ) : = r (x, i, u, θ) + λ (c (x, i, u, θ) - α η (x, i, θ)) .

(8)

For our purpose,

r^{λ}

represents the new reward rate. Now recalling (3), we write (8) as

r^{λ} (x, i, π, θ) : = r (x, i, π, θ) + λ (c (x, i, π, θ) - α η (x, i, θ)), π \in Π, θ \in Θ .

Remark 3.

For each

α > 0

and

λ < 0

, by direct calculations, it is possible to show that

r^{λ} (\cdot, \cdot, π, θ) \in B_{w} (R^{n} \times E)

uniformly in

π \in Π

and

θ \in Θ

. Even more, by Assumption 3, this new reward rate is a Lipschitz function.

In the same way as in Definition (4), for all

(x, i) \in R^{n} \times E

and

θ \in Θ

, we define the function

V (x, i, π, r^{λ}, θ) : = E_{x, i}^{π, θ} [\int_{0}^{\infty} e^{- α t} r^{λ} (x (t), ψ (t), π, θ) d t] .

So, the discounted unconstrained problem is defined as follows.

Definition 6

(The adaptive

θ

-control problem with Markovian switching). A policy

π^{*} \in Π

is said to be α-discount optimal for the λ-DUP given that θ is the true parameter value, if

V^{*} (x, i, r^{λ}, θ) : = V (x, i, π^{*}, r^{λ}, θ) = sup_{π \in Π} V (x, i, π, r^{λ}, θ)

(9)

for all

(x, i) \in R^{n} \times E

. The function

V^{*}

will be called the value function of the adaptive θ-control problem with Markovian switching.

Let

v : R^{n} \times E \times U \times Θ \to R

be a measurable function satisfying the conditions given in Assumption 3. The following result (obtained from [33]) shows that the function

V^{*} (\cdot, \cdot, v, θ)

is the unique solution of (10), and also proves the existence of stationary optimal policies.

Proposition 3.

Suppose that Assumptions 1–3 hold. Then we have the following:

(i): The α-optimal discount reward $V^{*} (\cdot, \cdot, v, θ)$ belongs to $W^{2, p} (R^{n} \times E) \cap B_{w} (R^{n} \times E)$ and it verifies the discounted reward HJB equation. That is, for all $(x, i) \in R^{n} \times E$ and $θ \in Θ,$

$\begin{matrix} α V^{*} (x, i, v, θ) & = & sup_{u \in U} {r (x, i, u, θ) + L^{u, θ} V^{*} (x, i, v, θ)} . \end{matrix}$

(10)

Conversely, if a function $φ_{θ} \in W^{2, p} (R^{n} \times E) \cap B_{w} (R^{n} \times E)$ satisfies (10), then $φ_{θ} (x, i) = V^{*} (x, i, v, θ)$ for all $(x, i) \in R^{n} \times E$ .
(ii): There exists a stationary policy $f_{θ}^{*} \in F$ that maximizes the right-hand side of (10). That is,

$α V^{*} (x, i, v, θ) = r (x, i, f_{θ}^{*}, θ) + L^{f_{θ}^{*}, θ} V^{*} (x, i, v, θ) f o r a l l (x, i) \in R^{n} \times E,$

and $f_{θ}^{*}$ is α-discount optimal given that θ is the true parameter value.

Remark 4.

(a): Notice that $V (x, i, π, r^{λ}, θ) = V (x, i, π, r, θ) + λ [V (x, i, π, c, θ) - \bar{η} (x, i, π, θ)],$ and by Definition 4, $V (x, i, π, c, θ) - \bar{η} (x, i, π, θ) = V (x, i, π, c - α η, θ)$ ,
(b): Remark 3 and Proposition 1 yield that ${sup}_{π \in Π} |V (x, i, π, r^{λ}, θ)| \leq M_{α}^{λ} w (x, i),$ with $M_{α}^{λ} : = N^{λ} \frac{α + d}{α β}$ and $N^{λ}$ is a bound of $∥ r^{λ} ∥_{w}$ , implying in turn that $V (\cdot, \cdot, π, r^{λ}, θ) \in B_{w} (R^{n} \times E)$ .
(c): If Assumptions 1, 2 and 3 hold, then by Proposition 3.4 in [28], the mappings $π \to V (x, i, π, v, θ)$ , $π \to V (x, i, π, c - α η, θ)$ and $π \to V (x, i, π, r^{λ}, θ)$ are continuous on Π for each $(x, i) \in R^{n} \times E$ and $θ \in Θ .$

3.3. Convergence of Value Functions and Estimation Methods

Finally, in this part, we will present one of the main results of this work, which combines optimality and the statistical approximation scheme (in a discrete way) of our unknown parameter. To do this, we define the concept of consistent estimator and the approximation technique that will be used for it.

Definition 7.

A sequence

{(θ_{m})}_{m \in N}

of measurable functions

θ_{m} : Ω \to Θ

is said to be a sequence of uniformly strongly consistent (USC) estimators of

θ \in Θ

if, as

m \to \infty

,

θ_{m} (ω) \to θ P^{π, θ} - a . s . f o r a l l π \in Π .

For ease of notation, we write

θ_{m} : = θ_{m} (ω) \in Θ

. Let

v : R^{n} \times U \times Θ \to R

be a measurable function satisfying similar conditions as those given in Assumption 3. The following observations and estimation procedure are an adaptation to what was done in the first part and show us that our set of hypotheses and procedures are consistent.

Remark 5.

(a): Let ${(θ_{m})}_{m \in N}$ be a sequence of USC estimators of $θ \in Θ$ and let $v : R^{n} \times E \times Π \times Θ \to R$ be a function that satisfies the Assumptions 1–3. Theorem 4.5 in [29], guarantees that every sequence ${(V (x, i, π, v, θ_{m}))}_{m \in N}$ converges to $V (x, i, π, v, θ)$ , $P^{π, θ}$ almost surely.
(b): Let ${(π_{m})}_{m \in N}$ be a sequence in Π. Since Π is a compact set, there exists a subsequence ${(π_{m_{k}})}_{k \in N} \subset {(π_{m})}_{m \in N}$ such that $π_{m_{k}} \overset{W}{\to} π \in Π$ , and thus, combining Remark 4 (a) and Remark 4 (c), and applying a suitable triangular inequality, it is possible to deduce that for every measurable function v satisfying Assumption 3,

$\begin{matrix} V (x, i, π_{m_{k}}, v, θ_{m_{k}}) \to V (x, i, π, v, θ) P^{π, θ} - a . s . a s k \to \infty . \end{matrix}$
(c): By Proposition 3, and taking into account that $r^{λ}$ in (8), the function $V^{*} (\cdot, \cdot, r^{λ}, θ)$ verifies (10). In addition, the second part of Proposition 3 ensures the existence of stationary policy $f_{θ}^{λ} \in F_{1}$ .
(d): For each $λ \leq 0$ , $θ \in Θ$ and $α > 0$ , we define the set

$Π^{λ, θ} : = \{π \in Π : α V^{*} (x, i, r^{λ}, θ) = r^{λ} (x, i, π, θ) + L^{π, θ} V^{*} (x, i, r^{λ}, θ) \forall (x, i) \in R^{n} \times E\} .$

(11)

Since $F_{1}$ can be seen as an embedding of Π, Proposition 3 (ii) guarantees that $Π^{λ, θ}$ is a nonempty set.
(e): As in [4], the set of hypotheses considered in this paper and Lemma 3.15 in [28] ensures that for each $θ \in Θ$ fixed and any sequence ${(λ_{m})}_{m \in N}$ , converging to λ (with $λ, λ_{m} \leq 0$ ); if there exists a sequence of policies ${(π^{λ_{m}, θ})}_{m \in N} \in Π^{λ_{m}, θ}$ such that $π^{λ_{m}, θ} \overset{W}{\to} π$ , then $π \in Π^{λ, θ}$ .
(f): Lemma 3.16 in [28] ensures that the mapping $λ \mapsto V^{*} (x, i, r^{λ}, θ)$ is differentiable on $(- \infty, 0)$ . In fact, for each $λ < 0$ and $θ \in Θ$

$\frac{\partial V^{*} (x, i, r^{λ}, θ)}{\partial λ} = V (x, i, π^{λ}, c, θ) - \bar{η} (x, i, π^{λ}, θ) .$

(12)

The unknown parameter

θ

will be estimated as Pedersen [39] describes. That is, the functions

h_{m} : Ω \times Θ \to R, for m = 1, \dots

will measure how likely the different values of

θ

are. If for each

ω \in Ω

fixed, the function

h_{m} (ω, θ)

has a unique maximum point

θ_{m} (ω) \in Θ

, then

θ

is estimated by

θ_{m} (ω)

.

Under the assumption that, for

m \in N

and

θ \in Θ

,

h_{m} (\cdot, θ)

is a measurable function of

ω

and that it is also twice continuously differentiable in

θ

for all

P^{π, θ} -

almost all

ω \in Ω

, it is proven that the function

θ \to h_{m} (ω, θ)

is continuous and has a unique maximum point

θ_{m} (ω)

for each

ω \in Ω

fixed. The number

m \in N

is the index of a sequence of random experiments on the measurable space

(Ω, F)

. This method is known as the approximate maximum likelihood estimator.

In our scenery, given a partition of times

{0 = t_{0} < t_{1} < t_{m} : = T}

from

[0, T]

, the outcomes of the random experiments will be represented by a sequence

X_{T} : = (x_{t_{i}} : i = 0, \dots, m)

of a trajectory

x^{u, θ} (t)

up to time T on

(Ω, F) : = (C ([0, \infty)), B (C ([0, \infty)))

and the function

h_{m}

will be called the least square function (LSE), i.e.,

h_{m} (w, θ) : = L S E (w, θ) .

It is evident that

x^{u, θ} (t)

in (1) is observed up to a finite time, say T, for which we define

L S E (X_{T}, θ) : = \sum_{i = 1}^{m} {(x_{t_{i}} - x_{t_{i - 1}} - b (x_{t_{i - 1}}, ψ (t_{i - 1}), u_{t_{i - 1}}, θ) (t_{i} - t_{i - 1}))}^{2},

(13)

with the drift function b as in (1). The above function generates the least square estimator until time T with m observations:

θ_{L S E} \equiv θ_{L S E} (X_{T}) : = arg min_{θ \in Θ} L S E (X_{T}, θ) .

(14)

Remark 6.

The fact that

x^{u, θ} (t)

in (1) can only be observed in a finite horizon is one of the hypotheses of the so-called model predictive control. However, at least from a theoretical point of view, our version of the PEC makes no such assumption, but still chooses T as large as practically possible and thus defines (13) and (14). In this sense, there is a connection between these two perspectives.

In [12,16,39], the consistency and asymptotic normality of

θ_{L S E}

are studied. In particular, Shoji (see [16]) shows that the optimization based on the LSE function is equivalent to the optimization based on the discrete approximate likelihood ratio function in the one-dimensional stochastic differential equation case and with a constant diffusion coefficient considered:

\begin{matrix} M L R (X_{T}, θ) : = \sum_{i = 1}^{m} b (y_{t_{i - 1}}, u_{t_{i - 1}}, θ) {[σ (y_{t_{i - 1}}) σ {(y_{t_{i - 1}})}^{⊤}]}^{- 1} (x_{t_{i}} - x_{t_{i - 1}}) \\ - \frac{1}{2} \sum_{i = 1}^{m} \{b {(y_{t_{i - 1}}, u_{t_{i - 1}}, θ)}^{⊤} {[σ (y_{t_{i - 1}}) σ {(y_{t_{i - 1}})}^{T}]}^{- 1} \cdot b (y_{t_{i - 1}}, u_{t_{i - 1}}, θ) (t_{i} - t_{i - 1})\}, \end{matrix}

with

y_{t_{i - 1}} : = (x_{t_{i - 1}}, ψ (t_{i - 1})), a n d

b and

σ

as in (1). The MLR function generates the discrete approximate likelihood ratio estimator:

θ_{L R} \equiv θ_{L R} (X_{T}) : = arg max_{θ \in Θ} M L R (X_{T}, θ) .

Now, we will establish our main result.

Theorem 1.

Let

{(θ_{m})}_{m \in N}

be a sequence of USC estimators of

θ \in Θ

. For each m, let

π_{m}

be a α-discount optimal policy. Then there exists a subsequence

{(m_{k})}_{k}

of

{(m)}_{m}

and a policy

π^{*}

such that

π_{m_{k}} \overset{W}{\to} π^{*}

. Moreover, if Assumptions 1–3 hold, as

k \to \infty

,

V^{*} (x, i, θ_{m_{k}}) \to V^{*} (x, i, θ) P^{π^{*}, θ} - a . s . f o r e a c h x \in R^{n} a n d i \in E,

and

π^{*}

is α-discount optimal for the θ-control problem

P^{π^{*}, θ}

almost surely.

Proof.

Consider a sequence of USC estimators

{(θ_{m})}_{m \in N}

such that

θ_{m} \to θ

as

m \to \infty

. Let

R > 0

, and take the open ball

B_{R} \times E : = {(x, i) \in R^{n} \times E | | x | < R, i \in E}

. For

(x, i) \in B_{R} \times E

, let

{(π_{m})}_{m \in N} \subset Π

be a sequence of

α

-discounted optimal policies. Since

Π

is a compact set, there exists a subsequence

{(π_{m_{k}})}_{k \in N} \subset {(π_{m})}_{m \in N}

such that

π_{m_{k}}

converges to

π^{*} \in Π

in the topology of relaxed controls given in Definition 2.

Let us first fix an arbitrary

m_{k} \in N

. Then, Theorem 6.1 in [33] ensures that the value function

V^{*} (x, i, θ_{m_{k}})

in (9) is the unique solution of the HJB Equation (10), i.e., it satisfies

\begin{matrix} α V^{*} (x, i, θ_{m_{k}}) & = & r (x, i, π_{m_{k}}, θ_{m_{k}}) + L^{π_{m_{k}}, θ_{m_{k}}} V^{*} (x, i, θ_{m_{k}}), \end{matrix}

(15)

and by Theorem 9.11 in [40], there exists a constant

C_{0}

(depending on R) such that, for fixed

θ_{m_{k}}

and

p > n

, we have

\begin{matrix} | V^{*} (x, i, θ_{m_{k}}) |_{W^{2, p} (B_{2 R} \times E)} & \leq & C_{0} ({∥V^{*} (x, i, θ_{m_{k}})∥}_{L^{p} (B_{2 R} \times E)} + {∥r (x, i, π_{m_{k}}, θ_{m_{k}})∥}_{L^{p} (B_{2 R} \times E)} \\ \leq & (M + M (α)) | B_{2 R} |^{1 / p} max_{(x, i) \in B_{2 R} \times E} w (x, i) < \infty, \end{matrix}

(16)

where

| {\bar{B}}_{2 R} |

represents the volume of the closed ball with radius

2 R

, and M and

M (α)

are the constants in Assumption 3 (b) and Proposition 1, respectively.

Now, observe that conditions (a)–(e) of Theorem A1 hold. In fact, for each

π_{m_{k}}

, (15) can be written in terms of the operator (A2) as

L^{π_{m_{k}}, θ} V^{*} (x, i, θ_{m_{k}}) = 0

with the functions

v_{2}

,

λ

,

ρ

equal to zero,

v_{1} \equiv r

and

h_{m_{k}} (x, i) \equiv V^{*} (x, i, θ_{m_{k}})

. So, taking

ξ_{m_{k}} \equiv 0

,

λ \equiv 0

, conditions (a),(c) and (d) hold. In addition, by (16), condition (b) is verified as well.

Then, by Theorem A1, we claim the existence of a function

h (\cdot, \cdot, θ) \in W^{2, p} (R^{n} \times E)

, together with a subsequence

(m_{k} : k = 1, \dots)

such that

V^{*} (\cdot, \cdot, θ_{m_{k}}) \to h (\cdot, \cdot, θ)

uniformly in

B_{R} \times E

, and pointwise on

R^{n} \times E

as

k \to \infty

and

π_{m_{k}} \overset{W}{\to} π^{*}

. Furthermore,

h (\cdot, \cdot, θ)

satisfies

α h (x, i, θ) = r (x, i, π^{*}) + L^{π^{*}, θ} h (x, i, θ) P^{π^{*}, θ} - a . s .,

(17)

with

h (\cdot, \cdot, θ) \in W^{2, p} (B_{R} \times E)

. Since the radius

R > 0

is arbitrary, we can extend our analysis to all of

(x, i) \in R^{n} \times E

.

Thus, as

V^{*} (x, i, θ)

is the unique solution of the HJB equation (17), we can deduce that

h (x, i, θ)

coincides with

V^{*} (x, i, θ)

. So, by (15) and (17), as

k \to \infty

,

V^{*} (x, i, θ_{m_{k}}) \to V^{*} (x, i, θ) P^{π^{*}, θ} - a . s ., f o r e a c h x \in R^{n} and i \in E .

On the other hand, by Proposition 3, for each

i \in E

and

θ_{m_{k}} \in Θ

fixed, we have

\begin{matrix} α V^{*} (x, i, θ_{m_{k}}) & \geq & r (x, i, π, θ_{m_{k}}) + L^{π, θ_{m_{k}}} V^{*} (x, i, θ_{m_{k}}) f o r a l l π \in Π . \end{matrix}

(18)

Hence, letting

k \to \infty

and using Theorem A1 from appendix again, we obtain that (18) converges to

\begin{matrix} α V^{*} (x, i, θ) & \geq & r (x, i, π, θ) + L^{π, θ} V^{*} (x, i, θ) f o r a l l π \in Π . \end{matrix}

(19)

Thus, by (17) and (19), we obtain

\begin{matrix} α V^{*} (x, i, θ) & = & sup_{π \in Π} {r (x, i, π, θ) + L^{π, θ} V^{*} (x, i, θ)} . \end{matrix}

implying that

π^{*}

is

α

-optimal for the

θ

-control problem with Markovian switching. □

In the following section, we present a numerical example to illustrate our results. To this end, we implement Algorithm 1. In it, first we introduce the number of iterations in our process and define the variables we need to simulate the dynamic system

x (t)

and the Markov chain

ψ (t)

. Such simulations are inspired by the algorithm proposed in [41], and allow us to obtain the discrete observations

{x_{k} : k = 1, 2, \dots}

needed to feed (13) and (14) and thus approximate the real value of

θ

.

Algorithm 1: Method of LSE to find

θ

Remark 7.

Now we list some limitations of our approach.

1.: Approximation of the derivative. In our case, we use central differences, but in each application, the approximation type to be used must be analyzed.
2.: Least squares approximation. The most common restrictions are the amount of data, the regularity of the samples, and the size of the subintervals.
3.: Euler–Maruyama method. The most common restrictions in this method occur if the differential equation presents stiffness, inappropriate step size, or sudden growth. In our application, the Euler–Maruyama method converges with strong order 1/2 to the true solution. See Theorem 10.2.2 in [42].

4. Numerical Example

This application complements the one we used in [38]. We represent the stock of pollution as the controlled diffusion process with Markovian switchings of the form

d x (t) = [u (t, ψ (t)) - θ x (t)] d t + σ d W (t), x (0) = x > 0, ψ (0) = i,

(20)

where

ψ (t)

is a Markov chain with generator

Q = (\begin{matrix} - λ_{0} & λ_{0} \\ λ_{1} & - λ_{1} \end{matrix}) .

ψ (t)

stands for the perception of society toward the current level of pollution at each time. It takes values from the set

E : = {1, 2}

. So, if the Markov chain is initially in state

ψ (0) = 1

, then before its first jump from state 1 to state 2 at its first random jump time

τ_{1}

, the stock of pollution obeys the following SDE

d x (t) = [u (t, ψ (1)) - θ x (t)] d t + σ d W (t),

(21)

with initial state

x (0) = 0

. At time

τ_{1}

, the Markov chain jumps to 2, where it will stay until the next jump, at time

τ_{2}

. During the period

[τ_{1}, τ_{2}]

, the stock of pollution is driven by the SDE

d x (t) = [u (t, ψ (2)) - θ x (t)] d t + σ d W (t),

(22)

with initial value

x (τ_{1})

at time

τ_{1}

, and the stock of pollution switches to (22) from (21). The stock of pollution will continue to alternate between these two states ad infinitum.

We also consider the pollution flow to be constrained. This means that our controller variable

u (t)

will be taking values in

\begin{matrix} [0, η] & if ψ (t) = 1, or in \\ [η, γ] & if ψ (t) = 2 . \end{matrix}

for a constant

0 \leq η \leq γ .

So,

u (t) : = u (t, ψ (t))

. We introduce the reward rate function

r : [0, \infty) \times E \times U \to R

, that represents the social welfare defined by

r (x, i, u) : = \sqrt{u} - a (i) x, for all (x, i, u) \in [0, \infty) \times E \times U,

whereas the cost and constraint rates are

\begin{matrix} c (x, i, u) & = & c_{1} (i) x + c_{2} (i) u for all (x, i, u) \in [0, \infty) \times E \times U, \\ η (x, i, θ) & : = & \frac{c_{1} (i) x}{α + θ} + q, \end{matrix}

where q is a positive constant. Clearly, (20) satisfies Assumption 1. The infinitesimal generator for a function

v \in C^{2} (R \times E)

is

L^{u, θ} v (x, i) = [u - θ x] \frac{\partial v (x, i)}{\partial x} + \frac{1}{2} σ^{2} \frac{\partial^{2} v (x, i)}{\partial x^{2}} + \sum_{j = 0}^{1} q_{i j} v (x, j), for x > 0 and i \in E .

We use

w (x, i) : = x^{2} i + 1

. It is easy to verify that

L^{u, θ} w (x, i) \leq - b_{1} w (x, i) + g (x, i, u, θ)

, with

0 < b_{1} < 2 θ - q_{1 i}

, where

g (x, i, u, θ) : = b_{1} w (x, i) + (2 u x - 2 θ x^{2} + σ^{2}) i + q_{i 1} x^{2}

.

Take

b_{1}

such that

b_{1} - 2 θ + q_{i 1} < 0

, and note that for every

(x, i) \in R \times E

,

(u, θ) \to g (x, i, u, θ)

is continuous on the compact sets U and

Θ

; therefore, there exists a constant

d_{1}

such that

g (x, i, u, θ) \leq d_{1}

for all

(x, i) \in R \times E

,

u \in U

and

θ \in Θ

. So, Assumption 2 is satisfied.

In this problem, the payoff rate is

r^{λ} (x, i, u) : = r (x, i, u) - λ c (x, i, u)

where

λ

is the Lagrange multiplier, the

α —

discounted expected payoff is

V (x, i, π, r_{1}, θ) : = E_{x, i}^{π, θ} [\int_{0}^{\infty} e^{- α t} r^{λ} (x (t), ψ (t), π) d t],

and the value function is

V^{*} (x, i, θ) = sup_{π \in Π} V (x, i, π, r_{1}, θ) .

(23)

In order to find the optimal control and the value function

V^{*} (x, i, θ)

given in (23), we need to solve (10) for each

i \in E

. The HJB equations associated with this example are

\begin{matrix} α φ (x, 1) = & sup_{0 \leq u \leq η} {\sqrt{u} - a (0) x + λ [c_{1} (1) x + c_{2} (1) u - α (\frac{c_{1} (1) x}{α + θ} + q)] \\ + (u - θ x) \frac{\partial φ (x, 1)}{\partial x} + \frac{1}{2} σ^{2} \frac{\partial^{2} φ (x, 1)}{\partial x^{2}} + \sum_{j = 1}^{2} q_{1 j} φ (x, j)} for all x > 0 . \end{matrix}

(24)

\begin{matrix} α φ (x, 2) = & sup_{0 \leq u \leq γ} {\sqrt{u} - a (2) x + λ [c_{1} (2) x + c_{2} (2) u - α (\frac{c_{1} (2) x}{α + θ} + q)] \\ + (u - θ x) \frac{\partial φ (x, 2)}{\partial x} + \frac{1}{2} σ^{2} \frac{\partial^{2} φ (x, 2)}{\partial x^{2}} + \sum_{j = 1}^{2} q_{2 j} φ (x, j)} for all x > 0 . \end{matrix}

(25)

Assuming that a solution to (24) and (25) has the form

φ (x, i) = k_{1} (i) x + k_{2} (i)

with

k_{1}, k_{2} : E \to R

measurable functions, we get

\frac{\partial φ (x, i)}{\partial x} = k_{1} (i)

and

\frac{\partial^{2} φ (x, i)}{\partial x^{2}} = 0 .

Replacing the derivatives of

φ (x, i)

into (24) and (25), we obtain

\begin{matrix} k_{1} (i) & = & \frac{λ θ c_{1} (i) - (α + θ) a (i)}{{(α + θ)}^{2}} + \frac{\sum_{j = 1}^{2} q_{i j} k_{1} (j)}{α + θ}, \end{matrix}

\begin{matrix} α k_{2} (1) & = & sup_{0 \leq u \leq η} (\sqrt{u} - a_{λ, θ, 1} u) - λ α q + \sum_{j = 1}^{2} q_{1 j} k_{2} (j), \end{matrix}

(26)

\begin{matrix} α k_{2} (2) & = & sup_{η \leq u \leq γ} (\sqrt{u} - a_{λ, θ, 1} u) - λ α q + \sum_{j = 1}^{2} q_{2 j} k_{2} (j), \end{matrix}

(27)

with

a_{λ, θ, i} : = \frac{(α + θ) a (i) - λ [θ c_{1} (i) + {(α + θ)}^{2} c_{2} (i)]}{{(α + θ)}^{2}} + \frac{\sum_{j = 1}^{2} q_{i j} k_{1} (j)}{α + θ} > 0,

Notice that the suprema in (26) and (27) are attained at

f_{θ}^{λ} (1) = \{\begin{matrix} \frac{1}{4 {(a_{λ, θ, 0})}^{2}} & if \frac{1}{2 \sqrt{η}} < a_{λ, θ, 0}, \\ η & if \frac{1}{2 \sqrt{η}} \geq a_{λ, θ, 0}, \end{matrix}

(28)

f_{θ}^{λ} (2) = \{\begin{matrix} \frac{1}{4 {(a_{λ, θ, 1})}^{2}} & if \frac{1}{2 \sqrt{γ}} < a_{λ, θ, 1}, \\ γ & if \frac{1}{2 \sqrt{γ}} \geq a_{λ, θ, 1} . \end{matrix}

(29)

Thus,

k_{2} (\cdot)

can be written as

k_{2} (i) = \frac{\sqrt{f_{θ}^{λ} (i)} - a_{λ, θ, i} f_{θ}^{λ} (i)}{α} - λ q + \frac{1}{α} \sum_{j = 0}^{1} q_{i j} k_{2} (j) .

By Proposition 3, the optimal control is (28) and (29), and the value function is

φ (x, i)

, i.e.,

\begin{matrix} φ (x, i) = V^{*} (x, i, θ, r) = & [\frac{λ θ c_{1} (i) - (α + θ) a (i)}{{(α + θ)}^{2}} + \frac{\sum_{j = 0}^{1} q_{i j} k_{1} (j)}{α + θ}] x \\ + & \frac{\sqrt{f_{θ}^{λ} (i)} - a_{λ, θ, i} f_{θ}^{λ} (i)}{α} - λ q + \frac{1}{α} \sum_{j = 0}^{1} q_{i j} k_{2} (j) . \end{matrix}

(30)

For the numerical experiment, we consider the particular form of (1) given by (20) to test Algorithm 1 with

Q = (\begin{matrix} - 1 & 1 \\ 1 & - 1 \end{matrix})

as the generator of the continuous-time Markov chain

ψ (t)

embedded within (20). Also, let

x (0) = 0

,

T = 5

,

d t = 10^{- 4}

,

u (t, 1) = 0.5

,

u (t, 2) = 1.5

,

σ = \sqrt{10^{- 10}}

, and

θ = 2.5

as the true parameter value of the pollution decay rate. These last data allow us to simulate (20) in the interval

[0, 5]

, and, for the sake of comparison, it will be considered as the real model (see Figure 1). Based on this information,

m = 50, 000

discrete observations were obtained. Now, we suppose that

θ

is the unknown parameter and we estimate it by means of the least square function LSE in (13) and (14). Substituting

b (x, i, u, θ) = u (t, i) - θ x (t)

in (13), we obtain the following estimator for each state

i \in E = {1, 2}

.

θ {(i)}_{L S E_{m}} = \frac{\sum_{k = 2}^{m - 1} u x_{k} - x_{k} d x_{k}}{\sum_{i = 2}^{m - 1} x_{k}^{2}},

(31)

where

d x_{k} : = \frac{1}{2} \frac{x_{k + 1} - x_{k - 1}}{t_{k + 1} - t_{k}} .

Given that the dynamic system for

x (t)

is governed by a stochastic differential equation with Markovian switching, it is not possible to have a single value for

θ

, but rather a set of values (the number of these values strictly depends on the number of jumps that occur in the interval

[0, T]

), which we will denote as in (31). These approximations allow us to simulate the stochastic differential equation with Markovian switching again with the same jumps. The outputs of the approximate stochastic differential equation with Markovian switching

x^{θ_{m}^{j}} (t)

and the one with the real value for

θ

,

x^{θ} (t)

are displayed in Table 1.

To graph the value function

V_{θ}^{*} (x, i) : = V^{*} (x, i, θ, r)

given in (30) we take

η = 3,

γ = 3,

a (1) = 1.25

,

a (2) = 2

,

c_{1} (1) = 100

,

c_{1} (2) = 150

,

c_{2} (1) = 10

,

c_{1} (2) = 1.5

,

α = 0.2

, and

q = 60

, and

θ {(i)}_{L S E_{m}}

with

m = 10, 000, 12, 500, 16, 667, 25, 000

and 5000, see Figure 2.

The symbol

(x {(t)}^{θ_{L S E_{m}}}, f_{θ_{L S E_{m}}}^{λ} (i), V_{θ_{L S E_{m}}}^{*} (x, i))

denotes the value estimates of the dynamic system

x (t)

, the optimal control

f_{θ}^{λ}

and of the value function

V_{θ}^{*}

when we take

θ_{L S E_{m}}

instead of

θ

.

We obtained

m = 50, 000

discrete observations of (20) on

[0, 5]

. Given that the Markov chain is known, the vector of jump times

(τ_{1}, τ_{2}, τ_{3}, τ_{4}, τ_{5})

is known as well. The estimator used in each interval

[τ_{k}, τ_{k + 1}]

with

k = 0, 1, 2, 3, 4

and

τ_{0} = 0

is

θ_{L S E_{m}}

given in (31). Figure 1 and Figure 2, together with Table 1 show that, as m increases, the estimator approaches the true parameter value

θ = 2.5

, and the RMSE between the estimations

(x {(t)}^{θ_{L S E_{m}}}, f_{θ_{L S E_{m}}}^{λ} (i), V_{θ_{L S E_{m}}}^{*} (x, i))

and the actual values of

(x {(t)}^{θ}, f_{θ}^{λ} (i), V_{θ}^{*} (x, i))

decreases, thus implying a good fit.

Theorem 5.5 in [28] ensures that for a fixed point

z > 0

such that

q < \frac{η c_{1} (i) z}{{(α + θ)}^{2}} + \frac{[θ c_{1} (i) + {(α + θ)}^{2} c_{2} (i)] γ}{α {(α + θ)}^{2}} for all i = 1, 2

if the inequality

\frac{1}{2 \sqrt{(\frac{α {(α + θ)}^{2} q - α θ c_{1} (i) z}{θ c_{1} (i) + {(α + θ)}^{2} c_{2} (i)})}} > \frac{a (i)}{α + θ}

holds, then the mapping

λ ⟼ V^{*} (z, i, θ, r^{λ})

admits a critical point

λ_{z, θ}^{*} \equiv λ_{z, θ}^{*} (α, z) < 0

satisfying

a_{λ_{z, θ}^{*}, θ} (i) = \frac{(α + θ) a (i) - λ_{z, θ}^{*} [θ c_{1} (i) + {(α + θ)}^{2} c_{2} (i)]}{{(α + θ)}^{2}} = \frac{1}{2 \sqrt{(\frac{α {(α + θ)}^{2} q - α θ c_{1} (i) z}{θ c_{1} (i) + {(α + θ)}^{2} c_{2}})}} .

Therefore, every

π^{λ_{z, θ}^{*}} \in Π^{λ_{z, θ}^{*}}

is

α

-optimal for the DPC and

V (z, i, π^{λ_{z, i, θ}^{*}}, c, θ) = \bar{η} (z, i, π^{λ_{z, θ}^{*}}, θ)

; in particular, the

α

-optimal policy for the DPC is

f_{θ}^{λ_{z, θ}^{*}} \in F \cap Π^{λ_{z, θ}^{*}}

of the form

f_{θ}^{λ_{z, θ}^{*}} (i) = \frac{α {(α + θ)}^{2} q - α θ c_{1} (i) z}{θ c_{1} (i) + {(α + θ)}^{2} c_{2} (i)},

(32)

and the

α

-optimal value for the DPC is given by

\begin{matrix} V^{*} (z, i, θ, r^{λ_{z, θ}^{*}}) = V^{*} (z, π^{λ_{z, i, θ}^{*}}, θ, r) \\ = - \frac{a (i) z}{α + θ} + \frac{1}{α} \sqrt{(\frac{α {(α + θ)}^{2} q - α θ c_{1} (i) z}{θ c_{(} {i) + (α + θ)}^{2} c_{2} (i)})} - \frac{a (i)}{α + θ} [\frac{{(α + θ)}^{2} q - θ c_{1} (i) z}{θ c_{1} (i) + {(α + θ)}^{2} c_{2} (i)}] . \end{matrix}

(33)

5. Conclusions

We studied controlled stochastic differential equations with Markovian switching of the form (1), where the drift coefficient depends on an unknown parameter

θ \in Θ

.

Two problems were analyzed, each one under a corresponding reward criterion: the discounted unconstrained problems (DUP) and the discounted problem with constraints (DPC) with optimal value functions

V_{θ}^{*} (x, i, r)

and

V_{θ}^{*} (x, i, r^{λ})

, respectively. Once a suitable procedure estimation of

θ

is obtained, it generates a sequence of estimators

{(θ_{m})}_{m \in N}

such that

θ_{m} \to θ

as

m \to \infty

, and the results obtained guarantee the following:

For each initial state and parameter $θ_{m}$ , $V_{θ_{m}}^{*} \to V_{θ}^{*}$ almost surely for both problems.
For each estimation $θ_{m}$ and problem (DUP or PDC), there are optimal policies $π_{θ_{m}}$ .
There is a subsequence of policies ${(π_{θ_{m_{k}}})}_{k \in N}$ and a policy $π_{θ}^{*} \in Π$ such that $π_{θ_{m_{k}}} \overset{W}{\to} π_{θ}^{*}$ , and, moreover, $π_{θ}^{*}$ is optimal for the $θ$ -OCP.
Similar to the previous point, for the DUP, there is a subsequence of policies and a policy $π_{θ}^{*} \in Π^{λ, θ}$ such that $π^{λ_{m_{k}}, θ_{m_{k}}} \overset{W}{\to} π_{θ}^{*}$ , and $π_{θ}^{*}$ is optimal for the $θ$ -DUP. Moreover, if $λ_{m_{k}} < 0$ is a critical point of $V_{θ_{m_{k}}}^{*} (x, r^{λ})$ , then $π_{θ}^{*}$ is optimal for the $θ$ -DCP.

The numerical part is one of the strengths of this work. Indeed, it aims at solving an estimation problem and a control problem. This task requires knowledge and storage of the optimal policies

π_{θ}

for all the values of

θ

, which may take considerable offline execution time. In addition, we propose and implement an algorithm to approximate

θ

.

Finally, the idea of modeling the dynamic

x (t)

as a controlled diffusion process with Markovian switchings allows us to consider extra factors or elements that affect the pollution stock. Such factors could be seen, in particular, as multiple pollution sources. An interesting task or challenge would be to pose this scenario as a multi-objective problem, where both sources of contamination and the stock require to be minimized under certain restrictions. This could be done by adapting and defining a suitable multi-objective linear program (convex program) and guaranteeing the existence of a saddle-point—or Pareto optimal policy—as studied in [43,44]. Another technique, called the multi-objective evolutionary algorithm, combines multi-objective problems with statistical techniques to approximate the Pareto optimal as in [45]. In both cases, it is still necessary to apply extra techniques due to the unknown parameter

θ

.

Author Contributions

Conceptualization, methodology, and writing/original draft preparation of this research are due to B.A.E.-T., F.A.A.-H. and J.D.L.-B.; software, validation, visualization, and data curation are original of F.A.A.-H.; formal analysis, investigation, writing/review and editing are due to C.G.H.-C.; project administration, funding acquisition are due to J.D.L.-B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Universidad Anáhuac México.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors wish to sincerely thank Ekaterina Viktorovna Gromova for her kind invitation to publish this work.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Convergence of the HJB-Equation

Let

v_{1}

,

v_{2} : O \times U \times Θ \times E \to R

be two functions with the same properties of the rate functions established in Assumption 3. Furthermore, for every

x \in R^{n}

,

k \in E,

u \in U

,

α > 0

, functions

λ

and

ρ

in

B (O)

, and h in

W^{2, p} (\bar{O} \times E)

, let

\begin{matrix} \hat{Ψ} (x, k, u, α, λ, ρ, θ; h) : = & v_{1} (x, k, u, θ) + λ (x) [v_{2} (x, k, u, θ) - ρ (x)] \\ + \sum_{i = 1}^{n} b_{i} (x, k, u, θ) \partial_{i} h (x, k) - α h (x, k), \end{matrix}

(A1)

where

b_{i}

is the i-th component of the drift function b in (1). We also define

L^{u, θ} h (x, k) : = \hat{Ψ} (x, k, u, α, λ, ρ, θ; h) + \frac{1}{2} \sum_{i, j = 1}^{n} a^{i j} (x, k) \partial_{i j}^{2} h (x, k),

with a as in Assumption 1 (d). For each

π \in Π

, we denote

\begin{matrix} \hat{Ψ} (x, k, π, α, λ, ρ, θ; h) & : = & \int_{U} \hat{Ψ} (x, k, u, α, λ, ρ, θ; h) π (d u | x), and \\ L^{π, θ} h (x, k) & : = & \hat{Ψ} (x, k, π, α, λ, ρ, θ; h) + \frac{1}{2} \sum_{i, j = 1}^{n} a^{i j} (x, k) \partial_{i j}^{2} h (x, k) . \end{matrix}

The framework we consider requires the interchange of limits, which is an extension of the adaptive case of Theorem 6.1 in [28], Theorem A1 in [26], Theorem 3.4 in [46] and Theorem 5.2 in [47].

Theorem A1.

Let

O

be a bounded

C^{2}

domain and suppose that Assumptions 1–3 hold. In addition, assume that there exist sequences

{(λ_{m})}_{m \in N}, {(ρ_{m})}_{m \in N} \subset B (O)

,

{(π_{m})}_{m \in N}) \subset Π

,

θ_{m} \in Θ

and

{(h_{m})}_{m \in N} \equiv {(h (\cdot, \cdot, θ_{m}))}_{m \in N} \subset W^{2, p} (O \times E)

,

{(ξ_{m})}_{m \in N} \subset L^{p} (O \times E)

, with

p > n

(n is the dimension of (1)), satisfying the following:

(a): $L^{π_{m}, θ_{m}} h_{m} = ξ_{m}$ in $O \times E$ for $m = 1, 2, \dots$ .
(b): There exists a constant ${\tilde{M}}_{1}$ such that ${∥h_{m}∥}_{W^{2, p} (O \times E)} \leq {\tilde{M}}_{1}$ for $m = 1, 2, \dots$ .
(c): $ξ_{m}$ converges in $L^{p} (O \times E)$ to some function ξ.
(d): $θ_{m}$ converges to some θ, $P^{π, θ}$ -a.s.
(e): $ρ_{m}$ converges uniformly to some function ρ.
(f): $π_{m} \overset{W}{\to} π$ .

Then, there exists a function

h \in W^{2, p} (O \times E)

and a subsequence

(m_{k} : k = 1, \dots) \subset {1, 2, \dots}

such that

h_{m_{k}} \to h

in the norm of

C^{1, η} (O \times E)

for

η < 1 - \frac{n}{p}

as

k \to \infty

. Moreover,

L^{π, θ} h = ξ i n O \times E P^{π, θ} - a . s .

(A2)

Proof.

It is known that Sobolev’s space

W^{2, p} (O \times E)

is reflexive Theorem 3.5 in [48]. Then, by Theorem 1.17 in [48], for every

\bar{M} \geq 0

, the ball

H : = \{h \in W^{2, p} (O \times E) : {∥h∥}_{W^{2, p} (O \times E)} \leq \bar{M}\}

(A3)

is weakly sequentially compact. On the other hand, since

p > n

, by Theorem 6.2 (Part III) in [48], for

0 \leq η < 1 - \frac{n}{p}

, the embedding

W^{2, p} (O \times E) ↪ C^{1, η} (O \times E)

is compact; hence, it is also continuous, and thus the set H in (A3) is relatively compact in

C^{1, η} (O \times E)

. This fact ensures the existence of a function

h \in W^{2, p} (\bar{O} \times E)

and a subsequence

{(h_{m_{k}})}_{k \in N} \equiv {(h_{m})}_{m \in N} \subset H

such that

h_{m} \to h w e a k l y i n W^{2, p} (O \times E) a n d s t r o n g l y i n C^{1, η} (O \times E) .

(A4)

Now, we show that, as

m \to \infty

,

\int_{O} g (x, k) \hat{Ψ} (x, k, π_{m}, α_{m}, λ_{m}, ρ_{m}, θ_{m}; h_{m}) d x \to \int_{O} g (x, k) \hat{Ψ} (x, k, π, α, λ, ρ, θ; h) d x, P^{π, θ} - a . s .,

(A5)

for all

g \in L^{1} (O \times E)

.

To this end, recall (A1) and note that, given

(x, k) \in O \times E

, functions

h \in W^{2, p} (O \times E)

and

h_{m} \in H

,

λ_{m}

,

λ

,

ρ_{m}

,

ρ \in B (O)

, a pair of policies

π, π_{m} \in Π

, and

θ_{m}, θ \in Θ, α \geq 0

, the following holds.

\begin{matrix} \int_{O} g (x, k) |\hat{Ψ} (x, k, π_{m}, α, λ_{m}, ρ_{m}, θ_{m}; h_{m}) - \hat{Ψ} (x, k, π, α, λ, ρ, θ; h)| d x \\ \leq & \int_{O} g (x, k) |v_{1} (x, k, π_{m}, θ_{m}) - v_{1} (x, k, π, θ_{m})| d x \\ + & \int_{O} g (x, k) |v_{1} (x, k, π, θ_{m}) - v_{1} (x, k, π, θ)| d x \\ + & \int_{O} g (x, k) |λ_{m} (x) v_{2} (x, k, π_{m}, θ_{m}) - λ_{m} (x) v_{2} (x, k, π, θ_{m})| d x \\ + & \int_{O} g (x, k) |λ_{m} (x) v_{2} (x, k, π, θ_{m}) - λ_{m} (x) v_{2} (x, k, π, θ)| d x \\ + & \int_{O} g (x, k) |λ_{m} (x) v_{2} (x, k, π, θ) - λ (x) v_{2} (x, k, π, θ)| d x \\ + & \sum_{i = 1}^{n} \int_{O} g (x, k) |\partial_{i} h_{m} (x, k) [b_{i} (x, k, π_{m}, θ_{m}) - b_{i} (x, k, π, θ_{m}]| d x \\ + & \sum_{i = 1}^{n} \int_{O} g (x, k) | \partial_{i} h_{m} (x, k) [b_{i} (x, k, π, θ_{m}) - b_{i} (x, k, π, θ)] | d x \\ + & \sum_{i = 1}^{n} \int_{O} g (x, k) | b_{i} (x, k, π, θ) [\partial_{i} h_{m} (x, k) - \partial_{i} h (x, k)] | d x \\ + & \int_{O} g (x, k) | λ_{m} (x) [ρ_{m} (x) - ρ (x)] d x | \\ + & \int_{O} g (x) | ρ (x) [λ_{m} (x) - λ (x)] | d x \\ + & α \int_{O} g (x, k) | h_{m} (x, k) - h (x, k) | d x \end{matrix}

Since the embedding

W^{2, p} (O \times E) ↪ C^{1, η} (O \times E)

is continuous, hypothesis (b) together with the definition of the norm

{∥ \cdot ∥}_{C^{1, η} (O \times E)}

lead to

max \{| h_{m} |, max_{1 \leq i \leq n} | \partial_{i} h_{m} |\} \leq {∥h_{m}∥}_{C^{1, η} (O \times E)} \leq \bar{M} {∥h_{m}∥}_{W^{2, p} (O \times E)} \leq \bar{M} {\tilde{M}}_{1} .

On the other hand, Assumptions 1 and 3, yield that

sup_{π \in Π} | b (\cdot, \cdot, π, \cdot) | + sup_{π \in Π} | v_{2} (\cdot, \cdot, π, \cdot) | \leq K (\bar{O} \times E) .

Hence,

\begin{matrix} \int_{O} g (x, k) |\hat{Ψ} (x, k, π_{m}, α, λ_{m}, ρ_{m}, θ_{m}; h_{m}) - \hat{Ψ} (x, k, π, α, λ, ρ, θ; h)| d x \\ \leq & \int_{O} g (x, k) |v_{1} (x, k, π_{m}, θ_{m}) - v_{1} (x, k, π, θ_{m})| d x \\ + {∥g∥}_{L^{1} (O \times E)} |v_{1} (x, k, π, θ_{m}) - v_{1} (x, k, π, θ)| \\ + | λ_{m} | \int_{O} g (x, k) |v_{2} (x, k, π_{m}, θ_{m}) - v_{2} (x, k, π, θ_{m})| d x \\ + | λ_{m} | {∥g∥}_{L^{1} (O \times E)} |v_{2} (x, k, π, θ_{m}) - v_{2} (x, k, π, θ)| \\ + K (\bar{O} \times E) {∥λ_{m} - λ∥}_{B (O)} {∥g∥}_{L^{1} (O \times E)} \\ + \bar{M} {\tilde{M}}_{1} n max_{1 \leq i \leq n} \int_{O} g (x, k) |b_{i} (x, k, π_{m}, θ_{m}) - b_{i} (x, k, π, θ_{m})| d x \\ + \bar{M} {\tilde{M}}_{1} n max_{1 \leq i \leq n} max_{k \in E} {∥g∥}_{L^{1} (O \times E)} |b_{i} (x, k, π, θ_{m}) - b_{i} (x, k, π, θ)| \\ + {∥h_{m} - h∥}_{C^{1, η} (O \times E)} 2 n K (\bar{O} \times E) {∥g∥}_{L^{1} (O \times E)} \\ + | λ_{m} | | {∥ρ_{m} - ρ∥}_{B (O)} {∥g∥}_{L^{1} (O \times E)} \\ + {∥ρ∥}_{B (O)} {∥λ_{m} - λ∥}_{B (O)} {∥g∥}_{L^{1} (O \times E)} . \end{matrix}

(A6)

Observe that

v_{1} (\cdot, \cdot π, θ)

,

v_{2} (\cdot, \cdot, π, θ)

and

b_{i} (\cdot, \cdot, π, θ)

i = 1, \dots, n

are bounded on

\bar{O} \times E

, so the weak convergence criterion can be applied. In addition to that, Assumptions 1 (a) and 3 (a) implies that these functions are continuous on

Θ

. Then, hypotheses (d) to (f), together with (A4), lead to the right-hand side of (A6) going to zero as

m \to \infty

P^{π, θ}

almost surely, thus proving (A5).

The existence of the constant

K (\bar{O} \times E)

used for the analysis in (A6) can be also used to claim that

| σ (x, k) | \leq K (\bar{O} \times E)

, then we can assert that for each g in

L^{\frac{p}{p - 1}} (O \times E)

,

\begin{matrix} \frac{1}{2} |\int_{O} g (x, k) [\sum_{i, j = 1}^{n} a^{i j} (x, k) \partial_{i j}^{2} h_{m} (x, k) - \sum_{i, j = 1}^{n} a^{i j} (x, k) \partial_{i j}^{2} h (x, k)] d x| \leq \\ \frac{n^{2}}{2} {[K (\bar{O} \times E)]}^{2} \sum_{i, j = 1}^{n} |\int_{O} g (x, k) [\partial_{i j}^{2} h_{m} (x, k) - \partial_{i j}^{2} h (x, k)] d x| \end{matrix}

(A7)

Thus the weak convergence of

(h_{m} : m = 1, 2, \dots)

to h in

W^{2, p} (O \times E)

yields that the right-hand side of (A7) converges to zero almost surely as

m \to \infty

. Notice also that the convergence of (A5) is also valid for all

g \in L^{\frac{p}{p - 1}} (O \times E)

. The reason is because

L^{\frac{p}{p - 1}} (O) \times E \subset L^{1} (O \times E)

(recall the Lebesgue measure on

O

is bounded). This last fact together with (A7) and hypothesis (c), yield that for every g in

L^{\frac{p}{p - 1}} (O \times E)

,

\begin{matrix} \int_{O} g (x, k) [L^{π, θ} h (x, k) - ξ (x, k)] d x = lim_{n \to \infty} \int_{} g (x, k) [L^{π_{m}, θ_{m}} h_{m} (x, k) - ξ_{m} (x, k)] d x = 0 \end{matrix}

P^{π, θ}

almost surely. This fact, along with Theorem 2.10 in [49], implies (A2), i.e.,

L^{π, θ} h = ξ

P^{π, θ}

almost surely in

O \times E

. This completes the proof. □

References

Vierros, M.K. Promotion and Strengthening of Sustainable Ocean-Based Economies; United Nations: New York, NY, USA, 2021. [Google Scholar]
Kawaguchi, K. Optimal Control of Pollution Accumulation with Long-Run Average Welfare. Environ. Resour. Econ. 2003, 26, 457–468. [Google Scholar] [CrossRef]
Morimoto, H. Optimal Pollution Control with Long-Run Average Criteria. In Stochastic Control and Mathematical Modeling: Applications in Economics; Encyclopedia of Mathematics and its Applications; Cambridge University Press: Cambridge, UK, 2010; pp. 237–251. [Google Scholar] [CrossRef]
Escobedo-Trujillo, B.A.; López-Barrientos, J.D.; Higuera-Chan, C.G.; Alaffita-Hernández, F.A. Robust statistic estimation in constrained optimal control problems of pollution accumulation (Part I). Mathematics 2023, 11, 923. [Google Scholar] [CrossRef]
Hilgert, N.; Minjárez-Sosa, A. Adaptive control of stochastic systems with unknown disturbance distribution: Discounted criteria. Math. Methods Oper. Res. 2006, 63, 443–460. [Google Scholar] [CrossRef] [Green Version]
Hernández-Lerma, O.; Marcus, S. Technical note: Adaptive control of discounted Markov Decision chains. J. Optim. Theory Appl. 1985, 46, 227–235. [Google Scholar] [CrossRef]
Kurano, M. Discrete-time markovian decision processes with an unknown parameter-average return criterion. J. Oper. Res. Soc. Jpn. 1972, 15, 67–76. [Google Scholar]
Mandl, P. Estimation and control in Markov chains. Adv. Appl. Probab. 1974, 6, 40–60. [Google Scholar] [CrossRef]
Borkar, V.; Ghosh, M. Ergodic Control of Multidimensional Diffusions II: Adaptive Control. Appl. Math. Optim. 1990, 21, 191–220. [Google Scholar] [CrossRef]
Vrabie, D.; Pastravanu, O.; Abu-Khalaf, M.; Lewis, F. Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 2009, 45, 477–484. [Google Scholar] [CrossRef]
Di Masp, G.; Stettner, L. Bayesian ergodic adaptive control of diffusion processes. Stochastics Stochastics Rep. 1997, 60, 155–183. [Google Scholar] [CrossRef]
Ralchenko, K. Asymptotic normality of discretized maximum likelihood estimator for drift parameter in homogeneous diffusion model. Mod. Stochastics Theory Appl. 2015, 2, 17–28. [Google Scholar] [CrossRef] [Green Version]
Duncan, T.; Pasik-Duncan, B.; Stettner, L. Almost self-optimizing strategies for the adaptive control of diffusion processes. J. Optim. Theory Appl. 1994, 81, 479–507. [Google Scholar] [CrossRef]
Durham, G.; Gallant, A. Numerical Techniques for Maximum Likelihood Estimation of Continuous-Time Diffusion Processes. J. Bus. Econ. Stat. 2002, 20, 297–316. [Google Scholar] [CrossRef] [Green Version]
Huzak, M. Estimating a class of diffusions from discrete observations via approximate maximum likelihood method. Statistics 2018, 52, 239–272. [Google Scholar] [CrossRef]
Shoji, I. A note on asymptotic properties of the estimator derived from the Euler method for diffusion processes at discrete times. Stat. Probab. Lett. 1997, 36, 153–159. [Google Scholar] [CrossRef]
Athanassoglou, S.; Xepapadeas, A. Pollution control with uncertain stock dynamics: When, and how, to be precautious. J. Environ. Econ. Manag. 2012, 63, 304–320. [Google Scholar] [CrossRef] [Green Version]
Jiang, K.; You, D.; Li, Z.; Shi, S. A differential game approach to dynamic optimal control strategies for watershed pollution across regional boundaries under eco-compensation criterion. Ecol. Indic. 2019, 105, 229–241. [Google Scholar] [CrossRef]
Kawaguchi, K.; Morimoto, H. Long-run average welfare in a pollution accumulation model. J. Econ. Dyn. Control 2007, 31, 703–720. [Google Scholar] [CrossRef]
Jasso-Fuentes, H.; López-Barrientos, J.D. On the use of stochastic differential games against nature to ergodic control problems with unknown parameters. Int. J. Control 2015, 88, 897–909. [Google Scholar] [CrossRef]
Zhang, Z.; Zhang, G.; Su, B. The spatial impacts of air pollution and socio-economic status on public health: Empirical evidence from China. Socio-Econ. Plan. Sci. 2022, 83, 101167. [Google Scholar] [CrossRef]
Méndez-Cubillos, X.C.; de Souza, L.C.G. Using of H_∞ Control Method in Attitude Control System of Rigid-Flexible Satellite. Math. Probl. Eng. 2009, 173145. [Google Scholar] [CrossRef] [Green Version]
Shaked, U.; Theodor, Y. H_∞ optimal estimation: A tutorial. In Proceedings of the 31st IEEE Conference on Decision and Control, Tucson, AZ, USA, 16–18 December 1992. [Google Scholar]
Cox, L.A.T., Jr. Confronting Deep Uncertainties in Risk Analysis. Risk Anal. 2012, 32, 1607–1629. [Google Scholar] [CrossRef]
Lu, J.; Xue, H.; Duan, X. An Adaptive Moving Mesh Method for Solving Optimal Control Problems in Viscous Incompressible Fluid. Symmetry 2022, 14, 707. [Google Scholar] [CrossRef]
Escobedo-Trujillo, B.A.; López-Barrientos, J.D.; Garrido-Meléndez, J. A Constrained Markovian Diffusion Model for Controlling the Pollution Accumulation. Mathematics 2021, 9, 1466. [Google Scholar] [CrossRef]
López-Barrientos, J.D.; Jasso-Fuentes, H.; Escobedo-Trujillo, B.A. Discounted robust control for Markov diffusion processes. Top 2015, 23, 53–76. [Google Scholar] [CrossRef]
Jasso-Fuentes, H.; Escobedo-Trujillo, B.; Mendoza-Pérez, A. The Lagrange and the vanishing discount techniques to controlled diffusions with cost constraints. J. Math. Anal. Appl. 2016, 437, 999–1035. [Google Scholar] [CrossRef]
Escobedo-Trujillo, B.; Hernández-Lerma, O.; Alaffita-Hernández, F. Adaptive control of diffusion processes with a discounted criterion. Appl. Math. 2020, 47, 225–253. [Google Scholar] [CrossRef]
Warga, J. Optimal Control of Differential and Functional Equations; Academic Press: New York, NY, USA, 1972. [Google Scholar]
Arapostathis, A.; Borkar, V.; Ghosh, M. Ergodic control of diffusion processes. In Encyclopedia of Mathematics and Its Applications; Cambridge University Press: Cambridge, UK, 2012; Volume 143. [Google Scholar]
Fleming, W.; Nisio, M. On the stochastic relaxed control for partially observed diffusions. Nagoya Mathhematical J. 1984, 93, 71–108. [Google Scholar] [CrossRef] [Green Version]
Ghosh, M.; Arapostathis, A.; Marcus, S. Optimal control of switching diffusions with applications to flexible manufacturing systems. SIAM J. Control Optim. 1992, 30, 1–23. [Google Scholar]
Mao, X.; Yuan, C. Stochastic Differential Equations with Markovian Switching; World Scientific Publishing Co.: London, UK, 2006. [Google Scholar]
Jasso-Fuentes, H.; Hernández-Lerma, O. Characterizations of overtaking optimality for controlled diffusion processes. Appl. Math. Optim. 2007, 57, 349–369. [Google Scholar] [CrossRef]
Dimarogonas, D.V.; Kyriakopoulos, K.J. Lyapunov-like stability of switched stochastic systems. Proc. 2004 Am. Control Conf. 2004, 2, 1868–1872. [Google Scholar] [CrossRef]
Fathi, M.; Bevrani, H. Optimization in Electrical Engineering; Springer International Publishing: Berlin/Heidelberg, Germany, 2019. [Google Scholar] [CrossRef]
Escobedo-Trujillo, B.A.; Higuera-Chan, C.G.; López-Barrientos, J.D. Controlled Switching Diffusions Under Ambiguity: The Average Criterion. Int. Game Theory Rev. 2021, 23, 2150017. [Google Scholar] [CrossRef]
Pedersen, A.R. Consistency and asymptotic normality of an approximate maximum likelihood estimator for discretely observed diffusions process. Bernoulli 1995, 1, 257–279. [Google Scholar] [CrossRef]
Gilbarg, D.; Trudinger, N.S. Elliptic Partial Differential Equations of Second Order; Springer: Berlin/Heidelberg, Germany, 1998. [Google Scholar]
Yuan, C.; Mao, X. Convergence of the Euler–Maruyama method for stochastic differential equations with Markovian switching. Math. Comput. Simul. 2004, 64, 223–235. [Google Scholar] [CrossRef]
Kloeden, P.; Platen, E. Numerical Solutions of Stochastic Differential Equations. Stochastic Modelling and Applied Probability; Springer: Berlin/Heidelberg, Germany, 1992. [Google Scholar]
Hernández-Lerma, O.; Romera, R. The Scalarization Approach to Multiobjective Markov Control Problems: Why Does It Work? Appl. Math. Optim. 2004, 50, 279–293. [Google Scholar] [CrossRef]
Jasso-Fuentes, H.; López-Martínez, R.; Minjárez-Sosa, J. Some advances on constrained Markov decision processes in Borel spaces with random state-dependent discount factors. Optimization 2022, 2130699. [Google Scholar] [CrossRef]
Gaspar-Cunha, A.; Covas, J. Robustness in multi-objective optimization using evolutionary algorithms. Comput. Optim. Appl. 2008, 39, 75–96. [Google Scholar] [CrossRef]
López-Barrientos, J.D. Basic and Advanced Optimality Criteria for Zero–Sum Stochastic Differential Games; Centro de Investigación y de Estudios Avanzados del IPN: México, 2012; Available online: www.math.cinvestav.mx/sites/default/files/tesis-daniel-2012.pdf (accessed on 18 January 2023).
Alaffita-Hernández, F.A.; Escobedo-Trujillo, B.A.; López-Martínez, R. Constrained stochastic differential games with additive structure: Average and discount payoffs. J. Dyn. Games 2018, 5, 109–141. [Google Scholar]
Adams, R. Sobolev Spaces; Academic Press: New York, NY, USA, 1975. [Google Scholar]
Lieb, E.; Loss, M. Analysis; American Mathematical Society: Providence, RI, USA, 2001. [Google Scholar]

Figure 1. Asymptotic behavior of

x {(t)}^{θ_{L S E_{m}}}

and

ψ (t)

.

Figure 1. Asymptotic behavior of

x {(t)}^{θ_{L S E_{m}}}

and

ψ (t)

.

Figure 2. Asymptotic behavior of the optimal reward

V_{θ_{L S E_{m}}}^{*} (x {(t)}^{θ_{L S E_{m}}}, i)

(vertical axis) using the estimator

θ_{L S E_{m}}

with m = 10,000, 12,500, 16,667, 25,000 and 5000.

Figure 2. Asymptotic behavior of the optimal reward

V_{θ_{L S E_{m}}}^{*} (x {(t)}^{θ_{L S E_{m}}}, i)

(vertical axis) using the estimator

θ_{L S E_{m}}

with m = 10,000, 12,500, 16,667, 25,000 and 5000.

Table 1. Estimated processes using

θ_{L S E_{m}}

and the real processes (

θ = 2.5

).

Table 1. Estimated processes using

θ_{L S E_{m}}

and the real processes (

θ = 2.5

).

m	RMSE	RMSE	RMSE	RMSE
	$(θ - θ_{{LSE}_{m}})$	$(x^{θ} - x^{θ_{{LSE}_{m}}})$	$(f_{θ}^{λ} (i) - f_{θ_{{LSE}_{m}}}^{λ} (i))$	$(V_{θ}^{} (x^{θ}, i) - V_{θ_{LSE m}}^{} (^{θ_{{LSE}_{m}}}, i))$
50,000	0.00496364	0.00059318	$5.72448 \times 10^{- 5}$	5.09465
25,000	0.00497008	0.0423565	$5.72228 \times 10^{- 5}$	1707.61
16,667	3.23512	0.112074	0.00174892	2648.67
12,500	0.0049845	0.0873181	$5.72787 \times 10^{- 5}$	2518.47
10,000	0.00498072	0.10305	$5.72684 \times 10^{- 5}$	2669.93

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Escobedo-Trujillo, B.A.; López-Barrientos, J.D.; Higuera-Chan, C.G.; Alaffita-Hernández, F.A. Robust Statistic Estimation in Constrained Optimal Control Problems of Pollution Accumulation (Part II: Markovian Switchings). Mathematics 2023, 11, 1045. https://doi.org/10.3390/math11041045

AMA Style

Escobedo-Trujillo BA, López-Barrientos JD, Higuera-Chan CG, Alaffita-Hernández FA. Robust Statistic Estimation in Constrained Optimal Control Problems of Pollution Accumulation (Part II: Markovian Switchings). Mathematics. 2023; 11(4):1045. https://doi.org/10.3390/math11041045

Chicago/Turabian Style

Escobedo-Trujillo, Beatris Adriana, José Daniel López-Barrientos, Carmen Geraldi Higuera-Chan, and Francisco Alejandro Alaffita-Hernández. 2023. "Robust Statistic Estimation in Constrained Optimal Control Problems of Pollution Accumulation (Part II: Markovian Switchings)" Mathematics 11, no. 4: 1045. https://doi.org/10.3390/math11041045

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Robust Statistic Estimation in Constrained Optimal Control Problems of Pollution Accumulation (Part II: Markovian Switchings)

Abstract

1. Introduction

Notation and Terminology

2. Model Formulation and Assumptions

3. Discounted Optimality Problems and Main Results

3.1. Discounted Optimality Criterion

3.2. Unconstrained Discounted Optimality

3.3. Convergence of Value Functions and Estimation Methods

4. Numerical Example

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Appendix A. Convergence of the HJB-Equation

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI