Article

Convergence Analysis for an Online Data-Driven Feedback Control Algorithm

1 Department of Mathematics, Florida State University, Tallahassee, FL 32304, USA
2 Citigroup Inc., Wilmington, DE 19801, USA
3 Division of Computational Science and Mathematics, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(16), 2584; https://doi.org/10.3390/math12162584
Submission received: 10 July 2024 / Revised: 9 August 2024 / Accepted: 19 August 2024 / Published: 21 August 2024
(This article belongs to the Special Issue Machine Learning and Statistical Learning with Applications)

Abstract: This paper presents the convergence analysis of a novel data-driven feedback control algorithm designed for generating online controls based on partial noisy observational data. The algorithm comprises a particle filter-enabled state estimation component, which estimates the controlled system's state via indirect observations, alongside an efficient stochastic maximum principle-type optimal control solver. By integrating weak convergence techniques for the particle filter with convergence analysis for the stochastic maximum principle control solver, we derive a weak convergence result for the optimization procedure in search of the optimal data-driven feedback control. Numerical experiments are performed to validate the theoretical findings.

1. Introduction

In this paper, we carry out a numerical analysis demonstrating the convergence of a data-driven feedback control algorithm designed for generating online controls based on partial noisy observational data.
Our focus is on the stochastic feedback control problem, which aims to determine optimal control actions that guide a controlled state dynamical system towards meeting certain optimality conditions, leveraging feedback from the system's current state. There are two practical challenges in solving the feedback control problem. First, when the dimension of the control problem is high, the computational cost of searching for the optimal control escalates exponentially; this is known as the "curse of dimensionality". Second, in numerous scenarios, the state of the controlled system is not directly observable and must be inferred through detectors or observation facilities. These sensors are typically subject to noise originating from the device itself or from the surrounding environment; for instance, radar receives noisy data and processes them through the arctangent function. Therefore, state estimation techniques become necessary to estimate the current state for designing the optimal control, with observations gathered to aid in estimating the hidden state.
To address the aforementioned challenges, a novel online data-driven feedback control algorithm has been developed [1]. This algorithm introduces a stochastic gradient descent optimal control solver within the stochastic maximum principle framework to combat the high dimensionality of optimal control problems. Traditionally, stochastic optimal control problems are solved using dynamic programming or the stochastic maximum principle, both of which require numerical simulations of large differential systems [2,3,4]. However, the stochastic maximum principle stands out for its capability to handle random coefficients in the state model and finite-dimensional terminal state constraints [5]. In the stochastic maximum principle approach, a system of backward stochastic differential equations (BSDEs) is derived as the adjoint equation of the controlled state process. The solution of the adjoint BSDE is then used to formulate the gradient of the cost functional with respect to the control process [6,7]. However, solving BSDEs numerically entails significant computational costs, especially in high-dimensional problems, which demand a large number of random samples [8,9]. To improve efficiency, a sample-wise optimal control solver has been devised [10], in which the solution of the adjoint BSDE is represented using only one realization or a small batch of samples. This approach justifies the application of stochastic approximation in the optimization procedure [11,12], and it shifts the computational cost from solving BSDEs to searching for the optimal control, thereby enhancing overall efficiency [13].
In data-driven feedback control, optimal filtering methods also play a pivotal role in dynamically estimating the state of the controlled system. Two prominent approaches for nonlinear optimal filtering are the Zakai filter and the particle filter. While the Zakai filter aims to compute the conditional probability density function (pdf) of the target dynamical system via a parabolic-type stochastic partial differential equation known as the Zakai equation [14], the particle filter, also known as a sequential Monte Carlo method, approximates the desired conditional pdf by the empirical distribution of a set of random samples (particles) [15]. Although the Zakai filter theoretically offers more accurate approximations of conditional distributions, the particle filter is favored in practical applications due to the high efficiency of the Monte Carlo method in approximating high-dimensional distributions [16].
The aim of this study is to examine the convergence of the data-driven feedback control algorithm proposed in [1], providing mathematical validation for its performance. While convergence in particle filter methods has been well studied [17,18,19], this work adopts the analysis technique outlined in [18] to establish weak convergence results for the particle filter regarding the number of particles. Analysis techniques for BSDEs alongside classical convergence results for stochastic gradient descent [13,20] are crucial for achieving convergence in the stochastic gradient descent optimal control solver. The theoretical framework of this analysis merges the examination of particle filters with the analysis of optimal control, and the overarching objective of this paper is to derive a comprehensive weak convergence result for the optimal data-driven feedback control.
In this paper, we present two numerical examples to demonstrate the baseline performance and convergence trend of our algorithm. The first example involves a classic linear quadratic optimal control problem, comparing the analytical control with the estimated control. The second example addresses a nonlinear scenario, specifically a Dubins vehicle maneuvering problem, where both the system and observations exhibit significant nonlinearity.
The rest of this paper is organized as follows. In Section 2, we introduce the data-driven feedback control algorithm. Convergence analysis will be presented in Section 3, and in Section 4, we conduct two numerical experiments to validate our theoretical findings.

2. An Efficient Algorithm for Data-Driven Feedback Control

We first briefly introduce the data-driven feedback control problem that we consider in this work. Then, we describe our efficient algorithm for solving the data-driven feedback control problem by using a stochastic gradient descent-type optimization procedure for the optimal control.

2.1. Problem Setting for the Data-Driven Optimal Control Problem

In a probability space $(\Omega, \mathcal{F}, \mathbb{P})$, we consider the following augmented system on the time interval $[0, T]$:
$$ d\begin{pmatrix} X_t \\ M_t \end{pmatrix} = \begin{pmatrix} b(t, X_t, u_t) \\ g(X_t) \end{pmatrix} dt + \begin{pmatrix} \sigma(t, X_t, u_t) & 0 \\ 0 & I \end{pmatrix} d\begin{pmatrix} W_t \\ B_t \end{pmatrix}, \qquad X_0 = \xi, \quad M_0 = 0, \tag{1} $$
where $X := \{X_t\}_{t=0}^{T}$ is the $\mathbb{R}^d$-valued controlled state process with drift $b: [0,T] \times \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}^d$; $\sigma: [0,T] \times \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}^{d \times q}$ is the diffusion coefficient for the $q$-dimensional Brownian motion $W$ that perturbs the state $X$; and $u$ is an $m$-dimensional control process, valued in some set $U$, that controls the state process $X$. Since the state $X$ is not directly observable, we have an observation process $M$ that collects partial noisy observations of $X$ through the observation function $g: \mathbb{R}^d \to \mathbb{R}^p$, where $B$ is a $p$-dimensional Brownian motion independent of $W$.
Let $\mathcal{F}^B = \{\mathcal{F}_t^B\}_{t \ge 0}$ be the filtration of $B$ augmented by all the $\mathbb{P}$-null sets in $\mathcal{F}$, and let $\mathcal{F}^{W,B} := \{\mathcal{F}_t^{W,B}\}_{t \ge 0}$ be the filtration generated by $W$ and $B$ (augmented by the $\mathbb{P}$-null sets in $\mathcal{F}$). Under mild conditions, for any square-integrable random variable $\xi$ independent of $W$ and $B$, and any $\mathcal{F}^{W,B}$-progressively measurable process $u$ (valued in $U$), Equation (1) admits a unique solution $(X, M)$, which is $\mathcal{F}^{W,B}$-adapted. Next, we let $\mathcal{F}^M = \{\mathcal{F}_t^M\}_{t \ge 0}$ be the filtration generated by $M$ (augmented by all the $\mathbb{P}$-null sets in $\mathcal{F}$). Clearly, $\mathcal{F}^M \subset \mathcal{F}^{W,B}$, while in general $\mathcal{F}^M \not\subset \mathcal{F}^W$ and $\mathcal{F}^M \not\subset \mathcal{F}^B$. The $\mathcal{F}^M$-progressively measurable control processes, denoted by $u^M$, are control actions driven by the information contained in the observational data.
We introduce the set of data-driven admissible controls as
$$ \mathcal{U}_{ad}[0, T] = \big\{ u^M: [0, T] \times \Omega \to U \subset \mathbb{R}^m \,\big|\, u^M \text{ is } \mathcal{F}^M\text{-progressively measurable} \big\}, \tag{2} $$
and the cost functional that measures the performance of a data-driven control $u^M$ is defined as
$$ J(u^M) = \mathbb{E}\Big[ \int_0^T f(t, X_t, u_t^M)\, dt + h(X_T) \Big], \tag{3} $$
where $f$ is the running cost and $h$ is the terminal cost.
The goal of the data-driven feedback control problem is to find the optimal data-driven control $u^* \in \mathcal{U}_{ad}[0, T]$ such that
$$ J(u^*) = \inf_{u^M \in \mathcal{U}_{ad}[0, T]} J(u^M). $$
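To make the setting concrete, the following minimal Python sketch simulates one path of the augmented system (1) with an Euler-Maruyama discretization. The coefficient choices (mean-reverting drift, constant diffusion, arctangent observation function, zero control) are illustrative assumptions only, not the problems studied later in this paper.

```python
import numpy as np

def simulate_augmented_system(b, sigma, g, u, x0, T=1.0, N=50, d=1, p=1, seed=0):
    """Euler-Maruyama simulation of the augmented system (1): the controlled
    state X is driven by W, and the observation M accumulates g(X) dt + dB."""
    rng = np.random.default_rng(seed)
    dt = T / N
    X = np.zeros((N + 1, d))
    M = np.zeros((N + 1, p))
    X[0] = x0
    for n in range(N):
        t = n * dt
        dW = rng.normal(0.0, np.sqrt(dt), size=d)
        dB = rng.normal(0.0, np.sqrt(dt), size=p)
        un = u(t, M[: n + 1])                      # control may use past observations
        X[n + 1] = X[n] + b(t, X[n], un) * dt + sigma(t, X[n], un) * dW
        M[n + 1] = M[n] + g(X[n]) * dt + dB        # partial noisy observation of X
    return X, M

# Example with placeholder coefficients: mean-reverting drift, arctan observation.
X, M = simulate_augmented_system(
    b=lambda t, x, u: -x + u, sigma=lambda t, x, u: 0.2,
    g=np.arctan, u=lambda t, hist: 0.0, x0=np.zeros(1))
```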

2.2. The Algorithm for Solving the Data-Driven Optimal Control Problem

To solve the data-driven feedback control problem, we will use the algorithm from [1], which is derived from the stochastic maximum principle.

2.2.1. The Optimization Procedure for Optimal Control

When the optimal control $u^*$ is in the interior of $\mathcal{U}_{ad}$, the gradient process of the cost functional $J$ with respect to the control process on the time interval $t \in [0, T]$ can be derived, using the Gâteaux derivative at $u^*$ and the stochastic maximum principle, in the following form:
$$ (J^*)_u(u_t^*) = \mathbb{E}\big[ b_u(t, X_t^*, u_t^*)\, Y_t + \sigma_u(t, X_t^*, u_t^*)\, Z_t + f_u(t, X_t^*, u_t^*) \,\big|\, \mathcal{F}_t^M \big], \tag{4} $$
where the stochastic processes $Y$ and $Z$ are solutions of the following forward-backward stochastic differential equations (FBSDEs) system:
$$
\begin{aligned}
dX_t^* &= b(t, X_t^*, u_t^*)\, dt + \sigma(t, X_t^*, u_t^*)\, dW_t, && X_0^* = \xi,\\
dM_t^* &= g(X_t^*)\, dt + dB_t, && M_0^* = 0,\\
dY_t &= -\big( b_x(t, X_t^*, u_t^*)\, Y_t + \sigma_x(t, X_t^*, u_t^*)\, Z_t + f_x(t, X_t^*, u_t^*) \big)\, dt + Z_t\, dW_t + \zeta_t\, dB_t, && Y_T = h_x(X_T^*),
\end{aligned}\tag{5}
$$
where Z is the martingale representation of Y with respect to W and ζ is the martingale representation of Y with respect to B.
To solve the data-driven feedback optimal control problem, we also use gradient descent-type optimization with the gradient process $(J^*)_u$ defined in (4). Then, we can use the following gradient descent iteration to find the optimal control $u_t^*$ at any time instant $t \in [0, T]$:
$$ u_t^{l+1,M} = u_t^{l,M} - r\, (J^*)_u(u_t^{l,M}), \qquad l = 0, 1, 2, \ldots, \tag{6} $$
where $r$ is the step size for the gradient. The observational information $\mathcal{F}_t^M$ grows as more and more data are collected over time. Therefore, at a given time instant $t$, we target finding the optimal control $u_t^*$ with the accessible information $\mathcal{F}_t^M$. Since evaluating $(J^*)_u(u_t^{l,M})$ requires the trajectories $(Y_s, Z_s)_{t \le s \le T}$, as $Y_t$ and $Z_t$ are solved backwards from $T$ to $t$, we take the conditional expectation $\mathbb{E}[\,\cdot\,|\,\mathcal{F}_t^M]$ of the gradient process $\{(J^*)_u(u_s^{l,M})\}_{t \le s \le T}$, i.e.,
$$ \mathbb{E}\big[ (J^*)_u(u_s^{l,M}) \,\big|\, \mathcal{F}_t^M \big] = \mathbb{E}\big[ b_u(s, X_s, u_s^{l,M})\, Y_s + \sigma_u(s, X_s, u_s^{l,M})\, Z_s + f_u(s, X_s, u_s^{l,M}) \,\big|\, \mathcal{F}_t^M \big], \qquad s \in [t, T], \tag{7} $$
where $X_s$, $Y_s$, and $Z_s$ correspond to the estimated control $u_s^{l,M}$. For the gradient descent iteration (6) on the time interval $[t, T]$, taking the conditional expectation $\mathbb{E}[\,\cdot\,|\,\mathcal{F}_t^M]$, we obtain
$$ \mathbb{E}\big[ u_s^{l+1,M} \,\big|\, \mathcal{F}_t^M \big] = \mathbb{E}\big[ u_s^{l,M} \,\big|\, \mathcal{F}_t^M \big] - r\, \mathbb{E}\big[ (J^*)_u(u_s^{l,M}) \,\big|\, \mathcal{F}_t^M \big], \qquad l = 0, 1, 2, \ldots, \quad s \in [t, T]. $$
When $s > t$, the observational information $\{\mathcal{F}_s^M\}_{t \le s \le T}$ is not yet available at time $t$. We use the conditional expectation $\mathbb{E}[u_s^{l,M} | \mathcal{F}_t^M]$ to replace $u_s^{l,M}$, since it provides the best approximation of $u_s^{l,M}$ given the current observational information $\mathcal{F}_t^M$. We denote
$$ u_s^{l,M}|_t := \mathbb{E}\big[ u_s^{l,M} \,\big|\, \mathcal{F}_t^M \big], $$
and then the gradient descent iteration becomes
$$ u_s^{l+1,M}|_t = u_s^{l,M}|_t - r\, \mathbb{E}\big[ (J^*)_u(u_s^{l,M}|_t) \,\big|\, \mathcal{F}_t^M \big], \qquad l = 0, 1, 2, \ldots, \quad s \in [t, T], \tag{9} $$
where $\mathbb{E}[(J^*)_u(u_s^{l,M}|_t) \,|\, \mathcal{F}_t^M]$ can be obtained by solving the following FBSDEs:
$$
\begin{aligned}
dX_s &= b(s, X_s, u_s^{l,M}|_t)\, ds + \sigma(s, X_s, u_s^{l,M}|_t)\, dW_s, \qquad s \in [t, T],\\
dY_s &= -\big( b_x(s, X_s, u_s^{l,M}|_t)\, Y_s + \sigma_x(s, X_s, u_s^{l,M}|_t)\, Z_s + f_x(s, X_s, u_s^{l,M}|_t) \big)\, ds + Z_s\, dW_s + \zeta_s\, dB_s, \qquad Y_T = h_x(X_T),
\end{aligned}\tag{10}
$$
and evaluated effectively using the numerical algorithm, which will be introduced later.
When the controlled dynamics and the observation function $g$ are nonlinear, we will use optimal filtering techniques to obtain the conditional expectation. Before applying the particle filter method, which is one of the most important particle-based optimal filtering methods, we define
$$ \Psi(s, X_s, u_s^{l,M}|_t) := b_u(s, X_s, u_s^{l,M}|_t)\, Y_s + \sigma_u(s, X_s, u_s^{l,M}|_t)\, Z_s + f_u(s, X_s, u_s^{l,M}|_t) \tag{11} $$
for $s \in [t, T]$. With the conditional probability density function (pdf) $p(X_t | \mathcal{F}_t^M)$ that we obtain through optimal filtering methods, and using the fact that $\Psi(s, X_s, u_s^{l,M}|_t)$ is a stochastic process depending on the state $X_t$, the conditional gradient process $\mathbb{E}[(J^*)_u(u_s^{l,M}|_t) \,|\, \mathcal{F}_t^M]$ in (9) can be obtained by the following integral:
$$ \mathbb{E}\big[ (J^*)_u(u_s^{l,M}|_t) \,\big|\, \mathcal{F}_t^M \big] = \int_{\mathbb{R}^d} \mathbb{E}\big[ \Psi(s, X_s, u_s^{l,M}|_t) \,\big|\, X_t = x \big]\, p(x | \mathcal{F}_t^M)\, dx, \qquad s \in [t, T]. \tag{12} $$
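In the discrete setting, the integral in (12) is naturally evaluated as an average over a particle cloud representing $p(x | \mathcal{F}_t^M)$. A schematic helper, assuming a user-supplied estimator psi_given_x of $\mathbb{E}[\Psi(s, X_s, u_s^{l,M}|_t) \mid X_t = x]$ (a hypothetical function, e.g., built from the FBSDE schemes introduced below), could look as follows:

```python
import numpy as np

def conditional_gradient(psi_given_x, particles):
    """Approximate the integral in (12) by an average over the particle cloud
    representing p(x | F_t^M); psi_given_x is an assumed user-supplied estimator
    of E[Psi(s, X_s, u) | X_t = x]."""
    return np.mean([psi_given_x(x) for x in particles], axis=0)
```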

2.2.2. Numerical Approach for Data-Driven Feedback Control by PF-SGD

For the numerical framework, we need the temporal partition $\Pi_{N_T}$,
$$ \Pi_{N_T} = \{ t_n: 0 = t_0 < t_1 < \cdots < t_{N_T} = T \}, $$
and we use the control sequence $\{u_{t_n}^*\}_{n=1}^{N_T}$ to represent the control process $u^*$ over the time interval $[0, T]$.
  • Numerical Schemes for FBSDEs
For the FBSDE system, we adopt the following schemes:
$$
\begin{aligned}
X_{i+1} &= X_i + b(t_i, X_i, u_{t_i}^{l,M}|_{t_n})\, \Delta t_i + \sigma(t_i, X_i, u_{t_i}^{l,M}|_{t_n})\, \Delta W_{t_i},\\
Y_i &= \mathbb{E}_i[Y_{i+1}] + \mathbb{E}_i\big[ b_x(t_{i+1}, X_{i+1}, u_{t_{i+1}}^{l,M}|_{t_n})\, Y_{i+1} + \sigma_x(t_{i+1}, X_{i+1}, u_{t_{i+1}}^{l,M}|_{t_n})\, Z_{i+1} + f_x(t_{i+1}, X_{i+1}, u_{t_{i+1}}^{l,M}|_{t_n}) \big]\, \Delta t_i,\\
Z_i &= \frac{1}{\Delta t_i}\, \mathbb{E}_i\big[ Y_{i+1}\, \Delta W_{t_i} \big],
\end{aligned}
$$
where $\Delta t_i = t_{i+1} - t_i$, $\Delta W_{t_i} = W_{t_{i+1}} - W_{t_i}$, $\mathbb{E}_i[\cdot]$ denotes the conditional expectation given $X_i$, and $X_{i+1}$, $Y_i$, and $Z_i$ are numerical approximations of $X_{t_{i+1}}$, $Y_{t_i}$, and $Z_{t_i}$, respectively.
Then, the standard Monte Carlo method can approximate the above expectations with $K$ random samples:
$$
\begin{aligned}
X_{i+1}^k &= X_i + b(t_i, X_i, u_{t_i}^{l,M}|_{t_n})\, \Delta t_i + \sigma(t_i, X_i, u_{t_i}^{l,M}|_{t_n})\, \sqrt{\Delta t_i}\, \omega_i^k, \qquad k = 1, 2, \ldots, K,\\
Y_i &= \sum_{k=1}^{K} \frac{Y_{i+1}^k}{K} + \frac{\Delta t_i}{K} \sum_{k=1}^{K} \big[ b_x(t_{i+1}, X_{i+1}^k, u_{t_{i+1}}^{l,M}|_{t_n})\, Y_{i+1}^k + \sigma_x(t_{i+1}, X_{i+1}^k, u_{t_{i+1}}^{l,M}|_{t_n})\, Z_{i+1}^k + f_x(t_{i+1}, X_{i+1}^k, u_{t_{i+1}}^{l,M}|_{t_n}) \big],\\
Z_i &= \frac{1}{\Delta t_i} \sum_{k=1}^{K} \frac{Y_{i+1}^k\, \sqrt{\Delta t_i}\, \omega_i^k}{K},
\end{aligned}\tag{14}
$$
where $\{\omega_i^k\}_{k=1}^K$ is a set of i.i.d. standard Gaussian samples used to describe the randomness of $\Delta W_{t_i}$.
The above schemes solve the FBSDE system (5) as a recursive algorithm, and the convergence of these schemes is well studied; cf. [20,21].
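As an illustration, the following sketch performs one backward sweep of the Monte Carlo scheme (14) for a scalar problem; the coefficient functions and terminal condition in the usage example are placeholder choices, not those of a specific control problem in this paper.

```python
import numpy as np

def backward_sweep(b_x, sig_x, f_x, h_x, X, dW, dt):
    """One backward sweep of the Monte Carlo scheme (14) for a scalar problem.
    X: (N+1, K) forward Euler sample paths; dW: (N, K) Brownian increments."""
    N, K = dW.shape
    Y = np.zeros(N + 1)
    Z = np.zeros(N + 1)
    YK = h_x(X[N])                         # terminal values Y_T^k = h_x(X_T^k)
    for i in range(N - 1, -1, -1):
        # sample means over the K paths approximate the expectations E_i[.]
        Z[i] = np.mean(YK * dW[i]) / dt
        drift = b_x(X[i + 1]) * YK + sig_x(X[i + 1]) * Z[i + 1] + f_x(X[i + 1])
        Y[i] = np.mean(YK) + np.mean(drift) * dt
        YK = np.full(K, Y[i])              # Y_i is deterministic given X_i
    return Y, Z

# Placeholder example: b_x = 0, sig_x = 0, f_x(x) = x, terminal gradient h_x(x) = x.
rng = np.random.default_rng(0)
N, K, dt = 50, 200, 0.02
dW = rng.normal(0.0, np.sqrt(dt), (N, K))
X = np.cumsum(np.vstack([np.zeros((1, K)), dW]), axis=0)   # driftless sample paths
Y, Z = backward_sweep(lambda x: 0.0, lambda x: 0.0, lambda x: x, lambda x: x, X, dW, dt)
```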
  • Particle Filter Method for Conditional Distribution
To apply the particle filter method, we consider the controlled process on the time interval $[t_{n-1}, t_n]$:
$$ X_{t_n} = X_{t_{n-1}} + \int_{t_{n-1}}^{t_n} b(s, X_s, u_s)\, ds + \int_{t_{n-1}}^{t_n} \sigma(s, X_s, u_s)\, dW_s. \tag{15} $$
Assume that at time instant $t_{n-1}$ we have $S$ particles, denoted by $\{x_{n-1}^{(s)}\}_{s=1}^S$, that form the empirical distribution $\pi(X_{t_{n-1}} | \mathcal{F}_{t_{n-1}}^M) := \frac{1}{S} \sum_{s=1}^S \delta_{x_{n-1}^{(s)}}(X_{t_{n-1}})$ as an approximation of $p(X_{t_{n-1}} | \mathcal{F}_{t_{n-1}}^M)$. The prior pdf that we want to find in the prediction stage is approximated as
$$ \tilde{\pi}(X_{t_n} | \mathcal{F}_{t_{n-1}}^M) := \frac{1}{S} \sum_{s=1}^S \delta_{\tilde{x}_n^{(s)}}(X_{t_n}), \tag{16} $$
where $\tilde{x}_n^{(s)}$ is sampled from $\pi(X_{t_{n-1}} | \mathcal{F}_{t_{n-1}}^M)\, p(X_{t_n} | X_{t_{n-1}})$, and $p(X_{t_n} | X_{t_{n-1}})$ is the transition probability derived from the state dynamics (15). As a result, the sample cloud $\{\tilde{x}_n^{(s)}\}_{s=1}^S$ provides an approximate distribution for the prior $p(X_{t_n} | \mathcal{F}_{t_{n-1}}^M)$. Then, in the update stage, we have
$$ \tilde{\pi}(X_{t_n} | \mathcal{F}_{t_n}^M) := \sum_{s=1}^S \delta_{\tilde{x}_n^{(s)}}(X_{t_n})\, \frac{p(M_{t_n} | \tilde{x}_n^{(s)})}{\sum_{s'=1}^S p(M_{t_n} | \tilde{x}_n^{(s')})} = \sum_{s=1}^S w_n^{(s)}\, \delta_{\tilde{x}_n^{(s)}}(X_{t_n}). \tag{17} $$
In this way, we obtain a weighted empirical distribution $\tilde{\pi}(X_{t_n} | \mathcal{F}_{t_n}^M)$ that approximates the posterior pdf $p(X_{t_n} | \mathcal{F}_{t_n}^M)$ with importance weights $w_n^{(s)} \propto p(M_{t_n} | \tilde{x}_n^{(s)})$. Then, to avoid the degeneracy problem, we need a resampling step: drawing $S$ equally weighted particles $\{x_n^{(s)}\}_{s=1}^S$ from $\tilde{\pi}(X_{t_n} | \mathcal{F}_{t_n}^M)$, we obtain
$$ \pi(X_{t_n} | \mathcal{F}_{t_n}^M) = \frac{1}{S} \sum_{s=1}^S \delta_{x_n^{(s)}}(X_{t_n}). \tag{18} $$
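A minimal bootstrap particle filter sketch of the prediction-update-resampling cycle (16)-(18) is given below, assuming a one-step Euler transition for (15) and a Gaussian observation likelihood matching the additive Brownian observation noise; the coefficient functions are supplied by the caller.

```python
import numpy as np

def particle_filter_step(particles, obs_increment, b, sigma, g, u, dt, rng):
    """One predict-update-resample cycle, Equations (16)-(18), for scalar states.
    obs_increment is the observed increment of M over [t_{n-1}, t_n]."""
    S = len(particles)
    # Prediction (16): propagate particles through the Euler transition of (15).
    dW = rng.normal(0.0, np.sqrt(dt), size=S)
    pred = particles + b(particles, u) * dt + sigma(particles, u) * dW
    # Update (17): importance weights w ~ p(M_{t_n} | x); the Gaussian likelihood
    # here is an assumption matching additive Brownian observation noise.
    resid = obs_increment - g(pred) * dt
    w = np.exp(-0.5 * resid**2 / dt)
    w = w / w.sum()
    # Resampling (18): draw S equally weighted particles from the posterior.
    idx = rng.choice(S, size=S, p=w)
    return pred[idx]
```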
  • Stochastic Optimization for Control Process
In this subsection, we combine the numerical schemes for the adjoint FBSDEs system (10) and the particle filter algorithm to formulate an efficient stochastic optimization algorithm to solve the optimal control process u * .
At a time instant $t_n \in \Pi_{N_T}$, we have
$$ \mathbb{E}\big[ (J^*)_u(u_{t_i}^{l,M}|_{t_n}) \,\big|\, \mathcal{F}_{t_n}^M \big] = \int_{\mathbb{R}^d} \mathbb{E}\big[ \Psi(t_i, X_{t_i}, u_{t_i}^{l,M}|_{t_n}) \,\big|\, X_{t_n} = x \big]\, p(x | \mathcal{F}_{t_n}^M)\, dx, \tag{19} $$
where $t_i \ge t_n$ is a time instant after $t_n$.
Then, we use the approximate solutions $(Y_i, Z_i)$ of the FBSDEs from schemes (14) to replace $(Y_{t_i}, Z_{t_i})$, and the conditional distribution $p(X_{t_n} | \mathcal{F}_{t_n}^M)$ is approximated by the empirical distribution $\pi(X_{t_n} | \mathcal{F}_{t_n}^M)$ obtained from the particle filter algorithm (16)-(18). Then, we can solve for the optimal control $u_{t_n}^*$ through the following gradient descent optimization iteration:
$$ u_{t_i}^{l+1,M}|_{t_n} = u_{t_i}^{l,M}|_{t_n} - r\, \frac{1}{S} \sum_{s=1}^S \mathbb{E}\big[ b_u(t_i, X_{t_i}, u_{t_i}^{l,M}|_{t_n})\, Y_i + \sigma_u(t_i, X_{t_i}, u_{t_i}^{l,M}|_{t_n})\, Z_i + f_u(t_i, X_{t_i}, u_{t_i}^{l,M}|_{t_n}) \,\big|\, X_{t_n} = x_n^{(s)} \big]. \tag{20} $$
Then, the standard Monte Carlo method can approximate each expectation $\mathbb{E}[\,\cdot\, | X_{t_n} = x_n^{(s)}]$ with $\Lambda$ samples:
$$ u_{t_i}^{l+1,M}|_{t_n} \approx u_{t_i}^{l,M}|_{t_n} - r\, \frac{1}{S}\, \frac{1}{\Lambda} \sum_{s=1}^S \sum_{\lambda=1}^\Lambda \big[ b_u(t_i, X_{t_i}^{(\lambda, s)}, u_{t_i}^{l,M}|_{t_n})\, Y_i + \sigma_u(t_i, X_{t_i}^{(\lambda, s)}, u_{t_i}^{l,M}|_{t_n})\, Z_i + f_u(t_i, X_{t_i}^{(\lambda, s)}, u_{t_i}^{l,M}|_{t_n}) \big], \tag{21} $$
where $X_{t_i}^{(\lambda, s)}$ denotes the $\lambda$-th sample path started from $X_{t_n} = x_n^{(s)}$.
We can see from the above Monte Carlo approximation that approximating the expectation in one gradient descent iteration step requires generating $S \times \Lambda$ samples, which becomes even more computationally expensive when the controlled system is a high-dimensional process.
Thus, we apply the idea of stochastic gradient descent (SGD) to improve the efficiency of classic gradient descent optimization and combine it with the particle filter method. Instead of a fully calculated Monte Carlo simulation to approximate the conditional expectation, we use only one realization of $X_{t_i}$ to represent the expectation, and we use the particles to describe the conditional distribution of the controlled process. Thus, we have
$$ \mathbb{E}\big[ (J^*)_u(u_{t_i}^{l,M}|_{t_n}) \,\big|\, \mathcal{F}_{t_n}^M \big] \approx b_u(t_i, X_{t_i}^{(\hat{l}, \hat{s})}, u_{t_i}^{l,M}|_{t_n})\, Y_i^{(\hat{l}, \hat{s})} + \sigma_u(t_i, X_{t_i}^{(\hat{l}, \hat{s})}, u_{t_i}^{l,M}|_{t_n})\, Z_i^{(\hat{l}, \hat{s})} + f_u(t_i, X_{t_i}^{(\hat{l}, \hat{s})}, u_{t_i}^{l,M}|_{t_n}), \tag{22} $$
where $l$ is the iteration index, and the index $\hat{l}$ indicates that the random realization of the controlled process varies between the gradient descent iteration steps; $X_{t_i}^{(\hat{l}, \hat{s})}$ denotes a randomly generated realization of the controlled process with a randomly selected initial state $X_{t_n}^{(\hat{l}, \hat{s})} = x_n^{(\hat{s})}$ from the particle cloud $\{x_n^{(s)}\}_{s=1}^S$.
Then, we have the following SGD scheme:
$$ u_{t_i}^{l+1,M}|_{t_n} = u_{t_i}^{l,M}|_{t_n} - r\big( b_u(t_i, X_{t_i}^{(\hat{l}, \hat{s})}, u_{t_i}^{l,M}|_{t_n})\, Y_i^{(\hat{l}, \hat{s})} + \sigma_u(t_i, X_{t_i}^{(\hat{l}, \hat{s})}, u_{t_i}^{l,M}|_{t_n})\, Z_i^{(\hat{l}, \hat{s})} + f_u(t_i, X_{t_i}^{(\hat{l}, \hat{s})}, u_{t_i}^{l,M}|_{t_n}) \big), \tag{23} $$
where $Y_i^{(\hat{l}, \hat{s})}$ is the approximate solution $Y_i$ corresponding to the random sample $X_{t_i}^{(\hat{l}, \hat{s})}$, and the path of $X^{(\hat{l}, \hat{s})}$ is generated as follows:
$$ X_{t_{i+1}}^{(\hat{l}, \hat{s})} = X_{t_i}^{(\hat{l}, \hat{s})} + b(t_i, X_{t_i}^{(\hat{l}, \hat{s})}, u_{t_i}^{l,M}|_{t_n})\, \Delta t_i + \sigma(t_i, X_{t_i}^{(\hat{l}, \hat{s})}, u_{t_i}^{l,M}|_{t_n})\, \sqrt{\Delta t_i}\, \omega_i^{(\hat{l}, \hat{s})}, \tag{24} $$
where $\omega_i^{(\hat{l}, \hat{s})} \sim N(0, 1)$. Then, an estimate for our desired data-driven optimal control at time instant $t_n$ is
$$ \hat{u}_{t_n} := u_{t_n}^{L,M}|_{t_n}. $$
The corresponding scheme for the FBSDEs is
$$
\begin{aligned}
Y_i^{(\hat{l}, \hat{s})} &= Y_{i+1}^{(\hat{l}, \hat{s})} + \big[ b_x(t_{i+1}, X_{t_{i+1}}^{(\hat{l}, \hat{s})}, u_{t_i}^{l,M}|_{t_n})\, Y_{i+1}^{(\hat{l}, \hat{s})} + \sigma_x(t_{i+1}, X_{t_{i+1}}^{(\hat{l}, \hat{s})}, u_{t_i}^{l,M}|_{t_n})\, Z_{i+1}^{(\hat{l}, \hat{s})} + f_x(t_{i+1}, X_{t_{i+1}}^{(\hat{l}, \hat{s})}, u_{t_i}^{l,M}|_{t_n}) \big]\, \Delta t_i,\\
Z_i^{(\hat{l}, \hat{s})} &= \frac{Y_{i+1}^{(\hat{l}, \hat{s})}\, \omega_i^{(\hat{l}, \hat{s})}}{\sqrt{\Delta t_i}}.
\end{aligned}\tag{25}
$$
Then, we have the following Algorithm 1:
Algorithm 1 PF-SGD algorithm for the data-driven feedback control problem.
  • Initialize the particle cloud $\{x_0^{(s)}\}_{s=1}^S \sim \xi$ and the number of iterations $L \in \mathbb{N}$
  • while $n = 0, 1, 2, \ldots, N_T$ do
  •   Initialize an estimated control process $\{u_{t_i}^{0,M}|_{t_n}\}_{i=n}^{N_T}$ and a step size $\rho$
  •   for SGD iteration steps $l = 0, 1, 2, \ldots, L$ do
  •     Simulate one realization of the controlled process $\{X_{t_{i+1}}^{(\hat{l}, \hat{s})}|_{t_n}\}_{i=n}^{N_T - 1}$ through scheme (24), with $X_{t_n}^{(\hat{l}, \hat{s})} = x_n^{(\hat{s})} \in \{x_n^{(s)}\}_{s=1}^S$;
  •     Calculate the solution $\{Y_{t_i}^{(\hat{l}, \hat{s})}|_{t_n}\}_{i=N_T}^{n}$ of the FBSDE system (14) corresponding to $\{X_{t_{i+1}}^{(\hat{l}, \hat{s})}|_{t_n}\}_{i=n}^{N_T - 1}$ through schemes (25);
  •     Update the control process to obtain $\{u_{t_i}^{l+1,M}|_{t_n}\}_{i=n}^{N_T}$ through scheme (23);
  •   end for
  •   The estimated optimal control is given by $\hat{u}_{t_n}^* = u_{t_n}^{L,M}|_{t_n}$;
  •   Propagate the particles through the particle filter algorithm (16)-(18), using the estimated optimal control $\hat{u}_{t_n}^*$, to obtain $\{x_{n+1}^{(s)}\}_{s=1}^S$;
  • end while
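For concreteness, here is a condensed Python sketch of Algorithm 1 for a deliberately simplified scalar model (our own illustrative assumption, not the paper's examples): $dX = u\,dt + \sigma\,dW$, $dM = \arctan(X)\,dt + dB$, running cost $u^2/2$, and terminal cost $x^2/2$, so that $b_u = 1$, $f_u = u$, $h_x(x) = x$, and the $\sigma_u$, $b_x$, $f_x$ terms vanish. It reuses particle_filter_step from the earlier sketch, and the observation increments are placeholders.

```python
import numpy as np

sig, T, NT, S, L, r = 0.2, 1.0, 50, 128, 200, 0.5
dt = T / NT
rng = np.random.default_rng(0)

def sgd_step(u, x0, n):
    """One SGD update over [t_n, T]: forward path (24), backward values (25),
    and control update (23), all for the simplified model above."""
    X = np.zeros(NT + 1)
    X[n] = x0
    w = rng.normal(size=NT)
    for i in range(n, NT):
        X[i + 1] = X[i] + u[i] * dt + sig * np.sqrt(dt) * w[i]
    Y = X[NT]                             # with b_x = f_x = 0, Y_i = h_x(X_T) backwards
    unew = u.copy()
    for i in range(NT - 1, n - 1, -1):    # scheme (23): gradient b_u*Y + f_u = Y + u
        unew[i] -= r * (Y + u[i])
    return unew

u = np.zeros(NT + 1)
particles = rng.normal(0.0, 0.1, S)       # initial cloud {x_0^(s)} ~ xi
obs = np.zeros(NT)                        # placeholder observed M-increments
controls = []
for n in range(NT):                       # outer loop of Algorithm 1
    for l in range(L):
        u = sgd_step(u, particles[rng.integers(S)], n)
    controls.append(u[n])
    particles = particle_filter_step(     # (16)-(18), from the earlier sketch
        particles, obs[n], lambda x, uu: uu * np.ones_like(x),
        lambda x, uu: sig, np.arctan, u[n], dt, rng)
```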

3. Convergence Analysis

Our convergence analysis aims to show the convergence of the estimated distribution of the state to the distribution of the "true state" under a fixed temporal model discretization $N$. We also show the convergence of the estimated control to the "true control" in expectation restricted to a compact set. To proceed, we first introduce our notation and the assumptions required in the proofs in Section 3.1. Then, in Section 3.2, we provide the main convergence theorems.

3.1. Notations and Assumptions

  • Notations
  • We use $U_n: \{t_n, \ldots, T\} \to \mathbb{R}^d$ to denote a control process that starts at time $t_n$ and ends at time $T$. We use
    $$ \mathcal{U}_n := \big\{ U_n \,\big|\, U_n: \{t_n, \ldots, T\} \to \mathbb{R}^d,\ U_n \text{ is } \mathcal{F}_{t_n}^M\text{-adapted} \big\} $$
    to denote the collection of admissible controls starting at time $t_n$.
  • We define the control at time $t_n$ to be $u_n := U_n|_{t_n}$.
  • We define $\mu_n^N := \pi_{t_n | t_n}^N$, the conditional distribution coming from the particle filter algorithm, where the superscript indicates that the measure is obtained through the particle filter method, and so it is random.
  • We use $S^N$ to denote the sampling operator $\pi_{t_n | t_n}^N \mapsto \frac{1}{N} \sum_{i=1}^N \delta_{x_t^{(i)}}$, and $L_n$ to denote the updating step in the particle filter. We use $P_n^N$ to denote the transition operator (the prediction step) under the SGD-particle filter framework, and $P_n$ the deterministic transition operator for the exact case (the control in SGD is exact). We say "deterministic" here to distinguish it from the case where the control $u_n$ may be random due to the SGD optimization algorithm.
  • We use $\langle \cdot, \cdot \rangle$ to denote the deterministic $L^2$ inner product, i.e., if $f, g \in L^2([0, T]; \mathbb{R}^d)$, then
    $$ \langle f, g \rangle := \int_0^T f \cdot g\, dt. $$
  • We define $J_N^x(U_n) := \mathbb{E}[J_N(U_n) \,|\, X_n = x]$. We then have $\mathbb{E}[J_N^{X_n}(U_n)] := \int \mathbb{E}[J_N(U_n) \,|\, X_n = s]\, d\mu_n^N(s)$. We remark that $U_n$ is a process that starts at time $t_n$, and so $X_n$ is essentially the initial condition of the diffusion process.
  • We define the distance between two random measures as
    $$ d(\mu, \nu) := \sup_{\|f\| \le 1} \sqrt{\mathbb{E}_\omega\big[ |\mu^\omega f - \nu^\omega f|^2 \big]}, \tag{27} $$
    where the expectation is taken over the randomness of the measures.
  • We use the total variation distance between two deterministic probability measures $\mu, \nu$:
    $$ d_{TV}(\mu, \nu) := \sup_{\|f\| \le 1} |\mu f - \nu f|. $$
  • We use K n to denote the total number of iterations taken in the SGD algorithm at time t n ; we use N to denote the total number of particles in the system. We use C to denote a generic constant which may vary from line to line.
  • Abusing notation, we will write $J_N^x(U_n)$ in the following way, where the argument $U_n$ can be a vector of any length, $1 \le n \le N$:
    $$ J_N^x(U_n)|_{t_i} := \mathbb{E}_{X_{t_n} = x}\big[ b_u(X_{t_i}, U_n|_{t_i})\, Y_{t_i} + \sigma_u(X_{t_i}, U_n|_{t_i})\, Z_{t_i} + f_u(X_{t_i}, U_n|_{t_i}) \big]. $$
  • Assumptions
  • We assume that $J_N$ satisfies the following strong condition: for any $x \in \mathcal{X}$, there exists a constant $\lambda > 0$ such that for all $U, V \in \mathcal{U}_0$:
    $$ \lambda \|U - V\|^2 \le \big\langle J_N^x(U) - J_N^x(V),\ U - V \big\rangle. \tag{30} $$
    Notice that (30) implies that the same inequality holds for any $U_n, V_n \in \mathcal{U}_n$, as can be seen by simply fixing all the $U_n|_{t_i}, V_n|_{t_i}$, $0 \le i \le n-1$, to be 0.
    This is a very strong assumption, and one should consider relaxing it to
    $$ \lambda \|U - V\|^2 \le \mathbb{E}_\omega\Big[ \mathbb{E}_{\mu_n^{N,\cdot}}\big[ \big\langle J_N^x(U) - J_N^x(V),\ U - V \big\rangle \big] \Big], $$
    that is, requiring the relation to hold in expectation instead of point-wise.
  • Both $b$ and $\sigma$ are deterministic and belong to $C_b^{2,2}(\mathbb{R}^d \times \mathbb{R}^m; \mathbb{R}^d)$ in the space variable $x$ and the control $u$.
  • $b, b_x, b_u, \sigma, \sigma_x, f_x, f_u$ are all uniformly Lipschitz in $x, u$ and uniformly bounded.
  • $\sigma$ satisfies the uniform ellipticity condition.
  • The initial condition $X_0 := \xi \in L^2(\mathcal{F}_0)$.
  • The terminal (loss) function $\Phi$ is $C^1$ and positive, and $\Phi_x$ has at most linear growth at infinity.
  • We assume that the function $g_n$ (related to the Bayesian update step) satisfies the following bound: there exists $0 < \kappa < 1$ such that
    $$ \kappa \le g_n \le \kappa^{-1}. $$

3.2. The Convergence Theorem for the Data-Driven Feedback Control Algorithm

Our algorithm combines the particle filter method and the stochastic gradient descent method. Lemma 1 (combining Lemmas 4.7-4.9 from the book [22]) provides the convergence result for the particle filter method alone. It shows that each prediction and updating step is convergent.
Recall that $S^N$ is the sampling operator with which we sample $N$ particles, $P_n^N$ denotes the transition operator (the prediction step) under the SGD-particle filter framework, $P_n$ denotes the deterministic transition operator assuming that SGD gives the exact control, and $L_n$ denotes the updating step in the particle filter method.
Lemma 1. 
Assume that there exists $\kappa \in (0, 1]$ with $\kappa \le g_n \le \kappa^{-1}$. Then the following hold:
$$ \sup_{\mu \in \mathcal{P}(\mathbb{R}^d)} d(S^N \mu, \mu) \le \frac{1}{\sqrt{N}}, $$
$$ d(P_n^N \mu, P_n^N \nu) \le d(\mu, \nu), \qquad d(P_n \mu, P_n \nu) \le d(\mu, \nu), $$
$$ d(L_n \nu, L_n \mu) \le \frac{2}{\kappa^2}\, d(\nu, \mu). $$
Given Lemma 1, Theorem 4.5 in [22] tells us that the particle filter framework is convergent. Then, following Lemma 1, we can bound the distance between the true distribution of the state and the distribution estimated through the SGD-particle filter framework:
$$
\begin{aligned}
d(\mu_{n+1}^{N,\cdot}, \mu_{n+1}) &\le d(L_n S^N P_n^N \mu_n^{N,\cdot},\ L_n P_n \mu_n)\\
&\le d(L_n S^N P_n^N \mu_n^{N,\cdot},\ L_n S^N P_n \mu_n) + d(L_n S^N P_n \mu_n,\ L_n P_n \mu_n)\\
&\le \frac{2}{\kappa^2}\Big( \frac{2}{\sqrt{N}} + d(P_n^N \mu_n^{N,\cdot},\ P_n \mu_n^{N,\cdot}) + d(S^N P_n \mu_n^{N,\cdot},\ P_n \mu_n) \Big)\\
&\le \frac{2}{\kappa^2}\Big( \frac{3}{\sqrt{N}} + d(P_n^N \mu_n^{N,\cdot},\ P_n \mu_n^{N,\cdot}) + d(\mu_n^{N,\cdot},\ \mu_n) \Big), \tag{34}
\end{aligned}
$$
where in the above inequalities, we have used triangle inequalities and Lemma 1.
Hence, if we can show that an inequality of the following form holds,
$$ d(P_n^N \mu_n^{N,\cdot},\ P_n \mu_n^{N,\cdot}) \le C_n\, d(\mu_n^{N,\cdot}, \mu_n) + \epsilon_n, \tag{35} $$
for some constant $C_n$ and some $\epsilon_n$ that we can tune, then, by recursion with (34), the convergence holds.
Remark. 
We point out that the difficulty lies in showing (35). Recall that the distance between two random measures defined in (27) involves testing over all measurable functions bounded by 1. However, we will see later that it is more desirable to test against Lipschitz functions. Hence, since the underlying measure is a finite Borel probability measure, we first identify the test function with a continuous function on a large compact set (Lusin's theorem). Then, we approximate this continuous function uniformly by a Lipschitz function, which is possible since the domain is now compact. In this way, we can show that a form close to (35) is true.
Remark. 
Notice that the first measure in $d(P_n^N \mu_n^{N,\cdot},\ P_n \mu_n^{N,\cdot})$ carries two sources of randomness: the randomness in $P_n^N$, which comes from the SGD algorithm used to find the control, and the randomness in the measure $\mu_n^{N,\cdot}$. However, we do not distinguish the two when we take the expectation.
To prove the convergence, we need a subspace such that all particles $X_i$ (obtained from the particle filter method) at any time $n$ stay within this bounded subspace; or, in a relaxed form, the probability of any particle $X_i$ escaping from a very large region is very small. Lemma 2 shows that we can restrict the particles to a compact subspace of radius $M$, starting from any particles and any admissible control $U_0$.
Lemma 2. 
There exist $M$ and a constant $C$ such that, under any admissible control $U_0$,
$$ \mathbb{P}\Big( \sup_{i \in \{1, \ldots, N\},\ U_0 \in \mathcal{U}} |X_i| \ge M \Big) \le \frac{C}{M^2}, \qquad X_i \sim \pi_{t_i | t_{i-1}}^N \ \text{or}\ X_i \sim \pi_{t_i | t_i}^N. \tag{36} $$
Proof. 
See Appendix A.1. □
Remark. 
Lemma 2 tells us that, starting from any random selection of particles and any admissible control $U_0$, at any time $t$, all particles are restricted to a compact set $\mathcal{M}$ with $diam(\mathcal{M}) \le M$, in the sense that
$$ \mathbb{P}\Big( \sup_{i \in \{1, \ldots, N\},\ U_0 \in \mathcal{U}} |X_i| \ge diam(\mathcal{M}) \Big) \le \frac{C}{M^2}, \qquad X_i \sim \pi_{t_i | t_{i-1}}^N \ \text{or}\ \pi_{t_i | t_i}^N. $$
We will use the following consequence extensively later:
$$ \mathbb{E}\big[ \mathbb{1}_{\{|X_n| \ge M\}} \big] \le \frac{C}{M^2}, \qquad 1 \le n \le N. $$
The following Lemma 3 bounds the difference between the estimated optimal control $u_n$ and the true control $u_n^*$. Let $\mathcal{G}_k := \{\Delta W_n^i, x^i\}_{i=0}^{k-1}$. Knowing $\mathcal{G}_k$ essentially means knowing the control $u_k$ in the SGD framework at time $t_n$, since, according to our scheme, the control is $\mathcal{G}_k$-measurable.
Lemma 3. 
Under a fixed temporal discretization number $N$, with the particle cloud $\mu^{N,\omega}$, a deterministic $u_n^*$, and a compact domain $K_n$ (such that $\mathbb{E}_\omega \mathbb{E}_{\mu_n^{N,\omega}}[\mathbb{1}_{K_n^c}] \le \frac{C}{M_n^2}$ and $diam(K_n) \le M_n$), we have, for any iteration number $K$, that the following holds:
$$ \mathbb{E}_\omega \mathbb{E}_{\mu_n^{N,\omega}}\big[ \mathbb{1}_{K_n} |u_n^\omega - u_n^*|^2 \,\big|\, X_n = x \big] \le C M_n^2 \sup_{\|q\| \le 1} \mathbb{E}_\omega\big[ |\mu_n^{N,\omega} q - \mu_n q|^2 \big] + \frac{C}{M_n} + \frac{C M_n^2}{K}. \tag{39} $$
Remark. 
The value of $\sup_{\|q\| \le 1} \mathbb{E}_\omega[ |\mu_n^{N,\omega} q - \mu_n q|^2 ]$ depends on $\mu_n^{N,\omega}$, which is obtained from the previous step, and it does not depend on the current $M_n$. As a result, as long as
$$ \sup_{\|q\| \le 1} \mathbb{E}_\omega\big[ |\mu_n^{N,\omega} q - \mu_n q|^2 \big] \to 0, $$
the bound (39) can be made arbitrarily small on any compact domain $K_n$, and this indicates the point-wise convergence of $u$ at any time $t_n$.
Proof. 
For simplicity, in the proof we denote the control by $U_n^{K+1}$, where $K$ is the SGD iteration index and $n$ indicates the current time $t_n$. Let $j_n^{x_k}(U_n^K)$ denote the sampled gradient process in SGD using the estimated control $U_n^K$, and let $J_N^x(U_n^*)$ denote the gradient process using the true control $U_n^*$. The two update rules read
$$ U_n^{K+1} = U_n^K - \eta_k\, j_n^{x_k}(U_n^K), \tag{40} $$
$$ U_n^* = U_n^* - \eta_k\, \mathbb{E}_{\mu_n}\big[ J_N^x(U_n^*) \big], \tag{41} $$
where $x_k$ is drawn from the current distribution $\mu_n^{N,\omega}$ and $\mathbb{E}_{\mu_n}[J_N^x(U_n^*)] = 0$ by the optimality condition. Take the difference between (40) and (41), square both sides, and take the conditional expectation $\mathbb{E}[\,\cdot\,|\,\mathcal{G}_K]$, which is taken with respect to the following sources of randomness:
  • the randomness from the selection of the initial point $x_n^k$;
  • the randomness from the pathwise approximated Brownian motion used for the FBSDEs;
  • the randomness accumulated from the past particle sampling.
We can write $\mathbb{E}[j^x(U^K) \,|\, \mathcal{G}_K] = \mathbb{E}_{\mu_n^{N,\omega}}[J_N^x(U^K) \,|\, \mathcal{G}_K]$, which can be seen from the following:
$$ \mathbb{E}[j^x(U^K) \,|\, \mathcal{G}_K] = \mathbb{E}_{\mu_n^{N,\omega}}\big[ \mathbb{E}_{X_{t_n} = x}[ j^x(U^K) \,|\, \mathcal{G}_K ] \big] = \mathbb{E}_{\mu_n^{N,\omega}}\big[ J_N^x(U^K) \,|\, \mathcal{G}_K \big]. \tag{42} $$
Then, take the squared norm on both sides, multiply by the indicator function $\mathbb{1}_{K_n}$, and take the conditional expectation $\mathbb{E}[\,\cdot\,|\,\mathcal{G}_K]$, noticing that $U_n^K$ is $\mathcal{G}_K$-measurable and $U^*$ is deterministic. In this case, we obtain the following:
$$
\begin{aligned}
&\mathbb{E}\big[ \mathbb{1}_{K_n} \|U_n^{K+1} - U_n^*\|^2 \,\big|\, \mathcal{G}_K \big]\\
&\quad= \mathbb{E}\big[ \mathbb{1}_{K_n} \|U_n^{K} - U_n^*\|^2 \,\big|\, \mathcal{G}_K \big] - 2\eta_k \Big\langle \mathbb{E}_{\mu_n^{N,\omega}}\big[ \mathbb{1}_{K_n} J_N^x(U_n^K) \,\big|\, \mathcal{G}_K \big] - \mathbb{E}_{\mu_n}\big[ \mathbb{1}_{K_n} J_N^x(U_n^*) \big],\ U_n^K - U_n^* \Big\rangle + \eta_K^2\, \mathbb{E}\big[ \mathbb{1}_{K_n} \|j^x(U_n^K) - \mathbb{E}_{\mu_n}[J_N^x(U_n^*)]\|^2 \,\big|\, \mathcal{G}_K \big]\\
&\quad= \mathbb{E}\big[ \mathbb{1}_{K_n} \|U_n^{K} - U_n^*\|^2 \,\big|\, \mathcal{G}_K \big] - 2\eta_k\, \mathbb{E}_{\mu_n^{N,\omega}}\big[ \mathbb{1}_{K_n} \big\langle J_N^x(U_n^K) - J_N^x(U_n^*),\ U_n^K - U_n^* \big\rangle \,\big|\, \mathcal{G}_K \big]\\
&\qquad - 2\eta_k \Big\langle \mathbb{E}_{\mu_n^{N,\omega}}\big[ \mathbb{1}_{K_n} J_N^x(U_n^*) \big] - \mathbb{E}_{\mu_n^{N,\omega}}\big[ \mathbb{1}_{K_n} \mathbb{E}_{\mu_n}[J_N^x(U_n^*)] \big],\ \mathbb{1}_{K_n}(U_n^K - U_n^*) \Big\rangle + \eta_k^2\, \mathbb{E}\big[ \mathbb{1}_{K_n} \|j^x(U_n^K) - \mathbb{E}_{\mu_n}[J_N^x(U_n^*)]\|^2 \,\big|\, \mathcal{G}_K \big]\\
&\quad\le (1 - \lambda \eta_k)\, \mathbb{E}_{\mu_n^{N,\omega}}\big[ \mathbb{1}_{K_n} \|U_n^K - U_n^*\|^2 \,\big|\, \mathcal{G}_K \big] + \frac{\eta_k}{\lambda} \underbrace{\big\| \mathbb{E}_{\mu_n^{N,\omega}}\big[ \mathbb{1}_{K_n} J_N^x(U_n^*) \big] - \mathbb{1}_{K_n} \mathbb{E}_{\mu_n}\big[ J_N^x(U_n^*) \big] \big\|^2}_{**} + \eta_k^2\, \mathbb{E}_{\mu_n^{N,\omega}}\big[ \mathbb{1}_{K_n} \big( C |x^i|^2 + C \big) \big], \tag{43}
\end{aligned}
$$
where in the last line we used the following lemma from [13], which states that there exists $C$ such that
$$ \mathbb{E}\big[ \|j^x(U_n)\|^2 \big] \le C|x|^2 + C. \tag{44} $$
Recalling that $\mathbb{E}_{\mu_n}[J_N^x(U^*)] = 0$, we then have
$$
\begin{aligned}
** &= \big\| \mathbb{E}_{\mu_n^{N,\omega}}[\mathbb{1}_{K_n} J_N^x(U^*)] - \mathbb{E}_{\mu_n^{N,\omega}} \mathbb{E}_{\mu_n}[J_N^x(U^*)] + \mathbb{E}_{\mu_n^{N,\omega}} \mathbb{E}_{\mu_n}[J_N^x(U^*)] - \mathbb{E}_{\mu_n^{N,\omega}}\big[ \mathbb{1}_{K_n} \mathbb{E}_{\mu_n}[J_N^x(U^*)] \big] \big\|^2\\
&\le \big\| \mathbb{E}_{\mu_n^{N,\omega}}[\mathbb{1}_{K_n} J_N^x(U^*)] - \mathbb{E}_{\mu_n}[\mathbb{1}_{K_n} J_N^x(U^*)] + \mathbb{E}_{\mu_n}[\mathbb{1}_{K_n} J_N^x(U^*)] - \mathbb{E}_{\mu_n}[J_N^x(U^*)] \big\|^2\\
&\le (1 + \epsilon)\, \big\| \mathbb{E}_{\mu_n^{N,\omega}}[\mathbb{1}_{K_n} J_N^x(U^*)] - \mathbb{E}_{\mu_n}[\mathbb{1}_{K_n} J_N^x(U^*)] \big\|^2 + \Big( 1 + \frac{1}{\epsilon} \Big)\, \big\| \mathbb{E}_{\mu_n}[\mathbb{1}_{K_n} J_N^x(U^*)] - \mathbb{E}_{\mu_n}[J_N^x(U^*)] \big\|^2\\
&\le C\, \big\| \mathbb{E}_{\mu_n^{N,\omega}}[\mathbb{1}_{K_n} J_N^x(U^*)] - \mathbb{E}_{\mu_n}[\mathbb{1}_{K_n} J_N^x(U^*)] \big\|^2 + \frac{C}{M_n}. \tag{45}
\end{aligned}
$$
Then, we take the expectation on both sides over all the randomness, and we have
$$
\begin{aligned}
\mathbb{E}\big[ \mathbb{1}_{K_n} \|U^{K+1} - U^*\|^2 \big] &\le (1 - \lambda \eta_k)\, \mathbb{E}_{\mu_n^N}\big[ \|U^K - U^*\|^2 \big] + \frac{\eta_k}{\lambda}\, \mathbb{E}_\omega\Big[ C \big\| \mathbb{E}_{\mu_n^{N,\omega}}[\mathbb{1}_{K_n} J_N^x(U^*)] - \mathbb{E}_{\mu_n}[\mathbb{1}_{K_n} J_N^x(U^*)] \big\|^2 + \frac{C}{M_n} \Big] + \eta_k^2\, C M_n^2\\
&\le \frac{\|U^0 - U^*\|^2}{K} + C\, \mathbb{E}_\omega\Big[ \big\| \mathbb{E}_{\mu_n^{N,\omega}}[\mathbb{1}_{K_n} J_N^x(U^*)] - \mathbb{E}_{\mu_n}[\mathbb{1}_{K_n} J_N^x(U^*)] \big\|_2^2 \Big] + \frac{C}{M_n} + \frac{C M_n^2}{K}. \tag{46}
\end{aligned}
$$
Notice that, for the control $U^*$ and a fixed $x$, $J_N^x(U^*)$ is uniformly bounded:
$$
\begin{aligned}
\big\| \mathbb{E}_{\mu_n^{N,\omega}}[\mathbb{1}_{K_n} J_N^x(U^*)] - \mathbb{E}_{\mu_n}[\mathbb{1}_{K_n} J_N^x(U^*)] \big\|_2^2 &= \sum_{i=n}^{N} \Delta t\, \big| \mathbb{E}_{\mu_n^{N,\omega}}[\mathbb{1}_{K_n} J_N^x(U^*)|_i] - \mathbb{E}_{\mu_n}[\mathbb{1}_{K_n} J_N^x(U^*)|_i] \big|^2\\
&\le \sum_{i=n}^{N} \Delta t \sup_{j \in \{n, \ldots, N\}} \big| \mathbb{E}_{\mu_n^{N,\omega}}[\mathbb{1}_{K_n} J_N^x(U^*)|_j] - \mathbb{E}_{\mu_n}[\mathbb{1}_{K_n} J_N^x(U^*)|_j] \big|^2\\
&\le \sup_{j \in \{n, \ldots, N\}} \big| \mathbb{E}_{\mu_n^{N,\omega}}[\mathbb{1}_{K_n} J_N^x(U^*)|_j] - \mathbb{E}_{\mu_n}[\mathbb{1}_{K_n} J_N^x(U^*)|_j] \big|^2.
\end{aligned}
$$
However, since from the result in (44)
$$ \sup_{j \in \{n, \ldots, N\}} \big| J_N^x(U^*)|_j \big|^2 \le C|x|^2 + C, $$
we have that
$$ \mathbb{1}_{K_n} \sup_{j \in \{n, \ldots, N\}} \big| J_N^x(U^*)|_j \big|^2 \le C M_n^2\, \mathbb{1}_{K_n} |q(x)| $$
for some $q(x)$ with $\|q(x)\| \le 1$. As a result, we see that
$$ (46) \le C M_n^2\, \mathbb{E}_\omega \big| \mathbb{E}_{\mu_n^{N,\omega}}[q(x)] - \mathbb{E}_{\mu_n}[q(x)] \big|^2 + \frac{C}{M_n} + \frac{C M_n^2}{K} \le C M_n^2 \sup_{\|q\| \le 1} \mathbb{E}_\omega\big[ |\mu_n^{N,\omega} q - \mu_n q|^2 \big] + \frac{C}{M_n} + \frac{C M_n^2}{K}. $$
Thus, we have
$$ \mathbb{E}_\omega \mathbb{E}_{\mu_n^{N,\omega}}\big[ \mathbb{1}_{K_n} \sup_n |u^{K+1} - u^*|^2 \big] \le C M_n^2 \sup_{\|q\| \le 1} \mathbb{E}_\omega\big[ |\mu_n^{N,\omega} q - \mu_n q|^2 \big] + \frac{C}{M_n} + \frac{C M_n^2}{K}, \tag{50} $$
where we have absorbed the constant term $N$ into $C$. □
Lemma 3 shows that, when the empirical distribution $\mu_n^N$ is close enough to the true distribution $\mu_n$, the difference between $u_n$ and $u_n^*$, in expectation restricted to a compact set, is bounded by the distance between the true distribution of the state and the estimated distribution. Thus, if we can show the convergence of the distribution, we obtain the convergence of the estimated control to the "true control" in expectation restricted to a compact set.
Next, we want to show, in Lemma 4, that after moving forward one step, the distance between the true distribution of the state and the distribution estimated through the SGD-particle filter framework is bounded by the distance at the previous step, up to some constants.
Lemma 4. 
For each $n = 0, 1, \ldots, N-1$, there exist $M_n, L_n, \delta_n, K_n$ such that the following inequality holds:
$$ d(\mu_{n+1}^{N,\cdot}, \mu_{n+1}) \le \frac{2}{\kappa^2}\Big( (1 + C \Delta t\, L_n M_n)\, d(\mu_n^{N,\cdot}, \mu_n) + \frac{C}{M_n} + \frac{C M_n}{\sqrt{K_n}} + 2\delta_n + \frac{3}{\sqrt{N}} \Big). \tag{51} $$
Proof. 
The key step is to estimate the quantity $d^2(P_n^N \mu_n^{N,\cdot},\ P_n \mu_n^{N,\cdot})$ in (34). Without loss of generality, we assume that the supremum is realized by a function $f$ with $\|f\| \le 1$; then, we have
$$ d^2(P_n^N \mu_n^{N,\cdot},\ P_n \mu_n^{N,\cdot}) = \mathbb{E}_\omega\big[ |P_n^N \mu_n^{N,\cdot} f - P_n \mu_n^{N,\cdot} f|^2 \big]. $$
Notice that $P_n^N$ is the prediction operator that uses the control $u_n$, which carries the randomness from SGD, while $P_n$ uses the control $u_n^*$. Hence $P_n^N \mu_n^{N,\omega}$ is a random measure, and we comment that both $u_n^*$ and $\mu_n$ are deterministic.
Without loss of generality, we use $u_n^\omega$ and $\mu_n^{N,\omega}$ to denote the random control and the random measure. (Even though their randomness may differ, we can concatenate $(\omega_1, \omega_2) := \omega$ and treat them as driven by one $\omega$.)
For fixed randomness $\omega$, we have, by Fubini's theorem,
$$
\begin{aligned}
|P_n^N \mu_n^{N,\omega} f - P_n \mu_n^{N,\omega} f|^2
&= \Big| \mathbb{E}_{\mu_n^{N,\omega}} \underbrace{\mathbb{E}\big[ f\big( X_n + b(X_n, u_n^\omega)\Delta t + \sigma(X_n)\Delta W_n \big) \,\big|\, X_n = x \big]}_{f_1^\omega} - \mathbb{E}_{\mu_n^{N,\omega}} \underbrace{\mathbb{E}\big[ f\big( X_n + b(X_n, u_n^*)\Delta t + \sigma(X_n)\Delta W_n \big) \,\big|\, X_n = x \big]}_{f_2} \Big|^2\\
&= \big| \mathbb{E}_{\mu_n^{N,\omega}} \mathbb{E}\big[ f_1^\omega - f_2 \,\big|\, X_n = x \big] \big|^2\\
&\le 2\underbrace{\big| \mathbb{E}_{\mu_n^{N,\omega}} \mathbb{1}_{\mathcal{M}_n}\, \mathbb{E}\big[ f_1^\omega - f_2 \,\big|\, X_n = x \big] \big|^2}_{A_1} + 2\underbrace{\big| \mathbb{E}_{\mu_n^{N,\omega}} \mathbb{1}_{\mathcal{M}_n^c}\, \mathbb{E}\big[ f_1^\omega - f_2 \,\big|\, X_n = x \big] \big|^2}_{A_2},
\end{aligned}
$$
where the inner conditional expectation is taken with respect to $\Delta W_n$.
Now, since we can pick $\mathcal{M}_n$ to be a large compact set containing the origin, we have
$$ \mathbb{P}\Big( \sup_{n, U_0} |X_n| \ge diam(\mathcal{M}_n) \Big) \le \frac{C}{M_n^2}. $$
To deal with $A_1$ and $A_2$, we see that it would be desirable for the function $f$ to have the Lipschitz property. However, $f$ is only measurable in general. The strategy to overcome this difficulty is to first use Lusin's theorem to find a continuous identification $\tilde{f}$ of $f$ on a large compact set; then, on this compact set, we can approximate $\tilde{f}$ uniformly by a Lipschitz function.
We see that
$$ A_1 \le \mathbb{E}_{\mu_n^{N,\omega}} \mathbb{1}_{\mathcal{M}_n}\, \mathbb{E}\big[ |f_1^\omega - f_2|^2 \,\big|\, X_n = x \big]. $$
Then, by taking the expectation on both sides over all the randomness in this quantity, we have
$$ \mathbb{E}_\omega[A_1] \le \mathbb{E}_\omega \mathbb{E}_{\mu_n^{N,\omega}} \mathbb{1}_{\mathcal{M}_n}\, \mathbb{E}\big[ |f_1^\omega - f_2|^2 \big]. $$
We know that there exists a large compact set $K_n$ (hence a large $M_n$) containing the origin such that
$$ \mathbb{P}\Big( \sup_{n, U_0} |X_n| \ge diam(K_n) \Big) \le \frac{C}{M_n^2}, $$
and a continuous $\tilde{f}_n$ with $\tilde{f}_n|_{K_n} = f|_{K_n}$, by Lusin's theorem.
Thus, we know that $\tilde{f}_n|_{K_n \cap \mathcal{M}_n} = f|_{K_n \cap \mathcal{M}_n}$, and we also have the following inequality:
$$ \mathbb{E}_\omega \mathbb{E}_{\mu_n^{N,\omega}} \mathbb{1}_{\mathcal{M}_n}\, \mathbb{E}\big[ |f_1^\omega - f_2|^2 \big] = \mathbb{E}_\omega \mathbb{E}_{\mu_n^{N,\omega}} \big( \mathbb{1}_{\mathcal{M}_n \cap K_n} + \mathbb{1}_{\mathcal{M}_n \cap K_n^c} \big)\, \mathbb{E}\big[ |f_1^\omega - f_2|^2 \big] \le \mathbb{E}_\omega \mathbb{E}_{\mu_n^{N,\omega}} \big( \mathbb{1}_{\mathcal{M}_n \cap K_n} + \mathbb{1}_{K_n^c} \big)\, \mathbb{E}\big[ |f_1^\omega - f_2|^2 \big]. \tag{59} $$
Moreover, since both $K_n$ and $\mathcal{M}_n$ are compact, $K_n := K_n \cap \mathcal{M}_n$ is also compact with $diam(K_n) \le M_n$. From Lemma 2, we know that there exists some constant $C$ such that, for any $\pi_{t_n | t_{n-1}}^N, \pi_{t_n | t_n}^N$ obtained from our particle filter-SGD algorithm, and $X \sim \pi_{t_n | t_{n-1}}^N$ or $\pi_{t_n | t_n}^N$:
$$ \mathbb{E}_\omega \mathbb{E}\big[ \mathbb{1}_{\{X \in K_n^c\}} \big] \le \frac{C}{M_n^2}. $$
Hence, we have that
$$ (59) \le \mathbb{E}_\omega \mathbb{E}_{\mu_n^{N,\omega}} \mathbb{1}_{K_n}\, \mathbb{E}\big[ |f_1^\omega - f_2|^2 \big] + \frac{C}{M_n^2}. $$
To deal with $A_2$, notice that $|f_1^\omega - f_2| \le 2$ by the choice of $f$; we have
$$ A_2 \le \mathbb{E}_{\mu_n^{N,\omega}} \mathbb{1}_{\mathcal{M}_n^c}\, \mathbb{E}\big[ |f_1^\omega - f_2|^2 \,\big|\, X_n = x \big], \qquad \mathbb{E}_\omega[A_2] \le 4\, \mathbb{E}_\omega\big[ \mathbb{E}_{\mu_n^{N,\omega}}[\mathbb{1}_{K_n^c}] \big] \le \frac{C}{M_n^2}, $$
by Lemma 2.
Returning to $A_1$: by the density of Lipschitz functions, there exists $f_n$ with $\|f_n - \tilde{f}_n\|_{\infty, K_n} \le \delta_n$ and Lipschitz constant $L_n$. We point out that $L_n$ may depend on $K_n$, $\delta_n$, and the function $\tilde{f}|_{K_n}$. Now, by taking the expectation on both sides and using the Lipschitz property, we have
$$ \mathbb{E}_\omega[A_1] \le (C \Delta t\, L_n)^2\, \underbrace{\mathbb{E}_\omega \mathbb{E}_{\mu_n^{N,\omega}}\big[ \mathbb{1}_{K_n} |u_n^\omega - u_n^*|^2 \,\big|\, X_n = x \big]}_{*} + \frac{C}{M_n^2} + 4\delta_n^2. $$
We recognize $*$ as the SGD optimization part of the algorithm in expectation, and we note that we have dropped the inner expectation. The expectation $\mathbb{E}_{\mu_n^{N,\omega}}[\cdot]$ means that, given the initial condition $X_n = x$ with $x \in K_n$ and $X_n \sim \mu_n^{N,\omega}$, we measure the expected difference between $u_n$ and $u_n^*$. The outer expectation $\mathbb{E}_\omega[\cdot]$ averages over all the randomness in both the measure and the SGD.
Now, by using (50) in Lemma 3, we obtain the following:
$$ \mathbb{E}_\omega[A_1] \le (C \Delta t\, L_n)^2 N M_n^2 \sup_{\|q\| \le 1} \mathbb{E}_\omega\big[ |\mu_n^{N,\omega} q - \mu_n q|^2 \big] + \frac{C M_n^2}{K_n} + \frac{C}{M_n^2} + 4\delta_n^2. $$
By the definition of the distance between two random measures, we have that
$$
\begin{aligned}
\mathbb{E}[A_1] &\le (C \Delta t\, L_n)^2 N M_n^2\, d^2(\mu_n^{N,\cdot}, \mu_n) + \frac{C M_n^2}{K_n} + \frac{C}{M_n^2} + 4\delta_n^2,\\
\sqrt{\mathbb{E}[A_1]} &\le C \Delta t\, L_n M_n\, d(\mu_n^{N,\cdot}, \mu_n) + \frac{C M_n}{\sqrt{K_n}} + \frac{C}{M_n} + 2\delta_n.
\end{aligned}
$$
Since $\sqrt{\mathbb{E}[A_2]} \le \frac{C}{M_n}$, we have that
$$
\begin{aligned}
(34) &\le \frac{2}{\kappa^2}\Big( \frac{3}{\sqrt{N}} + C \Delta t\, L_n M_n\, d(\mu_n^{N,\cdot}, \mu_n) + \frac{C M_n}{\sqrt{K_n}} + \frac{C}{M_n} + 2\delta_n + \frac{2}{M_n} + d(\mu_n^{N,\cdot}, \mu_n) \Big),\\
d(\mu_{n+1}^{N,\cdot}, \mu_{n+1}) &\le \frac{2}{\kappa^2}\Big( (1 + C \Delta t\, L_n M_n)\, d(\mu_n^{N,\cdot}, \mu_n) + \frac{C}{M_n} + \frac{C M_n}{\sqrt{K_n}} + 2\delta_n + \frac{3}{\sqrt{N}} \Big), \tag{66}
\end{aligned}
$$
where in (66) we have merged $N$ into $C$. □
Remark. 
Lusin's theorem requires the underlying measure to be finite Borel regular, and in this case we are looking at the measure $\tilde{\mu}$ defined as follows: for $A \subset \mathbb{R}^n$, $\tilde{\mu}(A) = \mathbb{P}(\{\omega \mid \text{there exist } n, U_0 \text{ such that } X_n(\omega) \in A\})$. $\tilde{\mu}$ is clearly a probability measure induced on the Polish space $\mathbb{R}^n$, and so it is tight by the converse implication of Prokhorov's theorem (or by the fact that all finite Borel measures defined on a complete metric space are tight). Thus, it is inner regular; since $\tilde{\mu}$ is also clearly locally finite, outer regularity follows as well.
Finally, we can use Lemma 4 repeatedly to show the convergence result:
Theorem 5. 
By taking $\mu_0^N = \mu_0$, there exist $\{M_n \mid M_n \in \mathbb{R},\ n = 0, 1, \ldots, N-1\}$, $\{L_n \mid L_n \in \mathbb{R},\ n = 0, 1, \ldots, N-1\}$, and $\{\delta_n \mid \delta_n \in \mathbb{R},\ n = 0, 1, \ldots, N-1\}$ such that
$$ d(\mu_N^{N,\cdot}, \mu_N) \le \sum_{i=0}^{N-1} \Big( \frac{2}{\kappa^2} \Big)^i \prod_{j=0}^{i-1} C_{N-j}\, \Big( \frac{C}{M_{N-i}} + 2\delta_{N-i} + \frac{C M_{N-i}}{\sqrt{K_{N-i}}} + \frac{3}{\sqrt{N}} \Big), $$
where $C_j := 1 + C \Delta t\, L_j M_j$. Then, for any $M > 0$, by picking $\{M_n\}$, $\{K_n\}$, and $N$ large enough and $\{\delta_n\}$ small enough, the following holds:
$$ d(\mu_N^{N,\cdot}, \mu_N) \le \frac{C}{M} $$
for some fixed constant $C$ which depends only on $\kappa$.
Proof. 
With $C_n$ defined as
$$ C_n := 1 + C \Delta t\, L_n M_n, $$
and by using (66) repeatedly, we obtain the following result:
$$
\begin{aligned}
d(\mu_N^{N,\cdot}, \mu_N) &\le \sum_{i=0}^{N-1} \Big( \frac{2}{\kappa^2} \Big)^i \prod_{j=0}^{i-1} C_{N-j}\, \Big( \frac{C}{M_{N-i}} + \frac{C M_{N-i}}{\sqrt{K_{N-i}}} + 2\delta_{N-i} + \frac{3}{\sqrt{N}} \Big) + \Big( \frac{2}{\kappa^2} \Big)^N \prod_{j=0}^{N-1} C_{N-j}\; d(\mu_0^N, \mu_0)\\
&\le \sum_{i=0}^{N-1} \Big( \frac{2}{\kappa^2} \Big)^i \prod_{j=0}^{i-1} C_{N-j}\, \Big( \frac{C}{M_{N-i}} + \frac{C M_{N-i}}{\sqrt{K_{N-i}}} + 2\delta_{N-i} + \frac{3}{\sqrt{N}} \Big). \tag{70}
\end{aligned}
$$
Since we know that $d(\mu_0^N, \mu_0) = 0$, we now just need to show that (70) vanishes as $K_l$ and $N$ get large and the $\delta_i$ get small, $i \in \{0, 1, \ldots, N\}$. Notice that $M_l$ comes from the domain truncation at each time step, and $\delta_l$ comes from the uniform approximation, which is free to choose. The choice of $\delta_l$ will in turn determine the value of $L_l$.
We fix $M_N := NM$ and $\delta_N := \frac{1}{NM}$, where $N$ is the number of time discretization steps and $M$ is a potentially large number.
Then, we define $\delta_l, M_l$ through the following:
$$ \Big( \frac{2}{\kappa^2} \Big)^{i+1} \prod_{j=0}^{i} C_{N-j}\; 2\delta_{N-i-1} = \Big( \frac{2}{\kappa^2} \Big)^{i} \prod_{j=0}^{i-1} C_{N-j}\; 2\delta_{N-i}, \tag{71} $$
$$ \Big( \frac{2}{\kappa^2} \Big)^{i+1} \prod_{j=0}^{i} C_{N-j}\; \frac{C}{M_{N-i-1}} = \Big( \frac{2}{\kappa^2} \Big)^{i} \prod_{j=0}^{i-1} C_{N-j}\; \frac{C}{M_{N-i}}. \tag{72} $$
Here, we define $C_{N+1} \equiv 1$.
Notice that one should apply (71) and (72) iteratively, since defining $\delta_i$ determines the Lipschitz constant $L_i$ at stage $i$, which is needed for the definition of $C_i$.
Then, we have
$$ \sum_{i=0}^{N-1} \Big( \frac{2}{\kappa^2} \Big)^i \prod_{j=0}^{i-1} C_{N-j}\; \frac{C}{M_{N-i}} \le N\, \frac{C}{NM} \le \frac{C}{M}, $$
and we also have
$$ \sum_{i=0}^{N-1} \Big( \frac{2}{\kappa^2} \Big)^i \prod_{j=0}^{i-1} C_{N-j}\; 2\delta_{N-i} \le N\, \frac{1}{NM} \le \frac{1}{M}. $$
By picking the $K_{N-i}$ large enough, we can then have
$$ \sum_{i=0}^{N-1} \Big( \frac{2}{\kappa^2} \Big)^i \prod_{j=0}^{i-1} C_{N-j}\; \frac{C M_{N-i}}{\sqrt{K_{N-i}}} \le N\, \frac{1}{NM} \le \frac{1}{M}. $$
Last but not least, by taking $N$ so large that
$$ \Big( \sum_{i=0}^{N-1} \Big( \frac{2}{\kappa^2} \Big)^i \prod_{j=0}^{i-1} C_{N-j} \Big)\, \frac{3}{\sqrt{N}} \le \frac{1}{M}, $$
we can see that (70) converges to 0 by taking $M$ to be very large. □
Remark. 
Notice that in Theorem 5 it is natural to have terms that depend on $\frac{1}{K_n}$ and $\frac{1}{\sqrt{N}}$. The presence of $M_n$ and $\delta_n$ is due to technical difficulties: $M_n$ essentially controls the growth of the particles in the worst-case scenario (we want our domain to be compact), while $L_n$ and $\delta_n$ come from the Lipschitz approximation of the test function $f$.

4. Numerical Example

In this section, we carry out two numerical examples. In the first example, we consider a classic linear quadratic optimal control problem, in which the optimal control can be derived analytically. We use this example as a benchmark example to show the baseline performance and the convergence trend of our algorithm. In the second example, we solve a more practical Dubins vehicle maneuvering problem, and we design control actions based on bearing angles to let the target vehicle follow a pre-designed path.

4.1. Example 1. Linear Quadratic Control Problem with Nonlinear Observations

Assume $B$, $K$ are symmetric and positive definite. The forward process $Y$ and the observation process $M$ are given by
$$
\begin{aligned}
dY(t) &= A(u(t) - r(t))\, dt + \sigma B u(t)\, dW_t,\\
dM(t) &= \sin(Y(t))\, dt + dB_t.
\end{aligned}
$$
The cost functional is given by
$$ J[u] = \mathbb{E}\Big[ \frac{1}{2} \int_0^T \langle R(Y_t - Y_t^*),\ Y_t - Y_t^* \rangle\, dt + \frac{1}{2} \int_0^T \langle K u_t,\ u_t \rangle\, dt + \frac{1}{2} \langle Q Y_T,\ Y_T \rangle \Big], \tag{78} $$
and we want to find $J(u^*) = \inf_{u \in \mathcal{U}_{ad}[0, T]} J(u)$.

4.1.1. Experimental Design

An interesting feature of this example is that one can construct a time-deterministic exact solution which depends only on $x_0$.
By simplifying (78), we have
$$ J[u] = \frac{1}{2} \int_0^T \big( \mathbb{E}[Y_t^\top R\, Y_t] - 2\, {Y_t^*}^\top R\, \mathbb{E}[Y_t] + {Y_t^*}^\top R\, Y_t^* + \langle K u_t,\ u_t \rangle \big)\, dt + \frac{1}{2}\, \mathbb{E}\big[ \langle Q Y_T,\ Y_T \rangle \big]. \tag{79} $$
Then, we define
$$ X_t = \mathbb{E}[Y_t] = \mathbb{E}\Big[ Y_0 + \int_0^t A(u(s) - r(s))\, ds + \int_0^t \sigma B u(s)\, dW_s \Big] = \mathbb{E}\Big[ Y_0 + \int_0^t A(u(s) - r(s))\, ds \Big]. $$
Hence, we see that
$$
\begin{aligned}
\mathbb{E}[Y_t^\top Y_t] &= \mathbb{E}\Big[ \Big( Y_0 + \int_0^t A(u(s) - r(s))\, ds + \int_0^t \sigma B u(s)\, dW_s \Big)^2 \Big]\\
&= \mathbb{E}\Big[ Y_0^\top Y_0 + \int_0^t (u(s) - r(s))^\top A^\top A (u(s) - r(s))\, ds + Y_0^\top \int_0^t A(u(s) - r(s))\, ds + \int_0^t (u(s) - r(s))^\top A^\top Y_0\, ds \Big] + \mathbb{E}\Big[ \int_0^t \sigma^2 u(s)^\top B^\top B u(s)\, ds \Big]\\
&= X_t^\top X_t + \sigma^2 \int_0^t u(s)^\top B^\top B u(s)\, ds, \tag{81}
\end{aligned}
$$
and (81) holds because all the terms are deterministic in time given $x_0$. Moreover, we observe that
$$ \mathbb{E}[Y_T^\top Y_T] = \mathbb{E}\Big[ \Big( Y_0 + \int_0^T A(u(s) - r(s))\, ds + \int_0^T \sigma B u(s)\, dW_s \Big)^2 \Big] = X_T^\top X_T + \sigma^2 \int_0^T u(s)^\top B^\top B u(s)\, ds. $$
As a result, we see that (79) now takes the form:
J [ u ] = 1 2 0 T ( X s T R X s 2 X s T R X s * + X s * T R X s * + u s T ( σ 2 B T Q B + K ) u s ) d s + 1 2 σ 2 0 T 0 t u s T B T R B u s d s d t + 1 2 X T T Q X T
and by performing a simple integration by part, we have
J [ u ] = 1 2 0 T ( X s T R X s 2 X s T R X s * + X s * T R X s * + u s T ( σ 2 B T Q B + K ) u s ) d s + 1 2 σ 2 0 T ( T t ) u s T B T R B u s d s + 1 2 X T T Q X T
As a result, we have the following standard deterministic control problem:
J [ u ] = 1 2 0 T ( X s T R X s 2 X s T R X s * + X s * T R X s * + u s T ( σ 2 B T Q B + K ) u s + σ 2 ( T t ) u s T B T R B u s ) 2 f d s + 1 2 X T T Q X T
d X t d t = A ( u ( t ) r ( t ) ) b , X t 0 = X 0
Then, one can form the following Hamiltonian:
$$ H(x, p, u) = b \cdot p + f, $$
where $p = v_x$ and $v$ is the value function.
Then, to find the optimal control, we solve
$$ \nabla_u H = 0, $$
which is
$$ A^\top p + \big( \sigma^2 B^\top R B\, (T - t) + K + \sigma^2 B^\top Q B \big)\, u = 0. $$
Thus, we obtain
$$ u_t = -\big( \sigma^2 B^\top R B\, (T - t) + K + \sigma^2 B^\top Q B \big)^{-1} A^\top p(t). \tag{90} $$
Additionally, notice that
$$ \frac{d}{dt} p(t) = -R(X_t - X_t^*), \qquad p(T) = Q X_T, $$
and then
$$ p(t) = Q X_T + \int_t^T R(X_s - X_s^*)\, ds. \tag{92} $$
Combining (86), (90) and (92) together, we can solve the control of the system.
Then, set
A = 1 0.2 0.2 0.2 0.2 1 0.2 0.2 0.2 0.2 1 0.2 0.2 0.2 0.2 1
Set B, R, K and Q as identity matrices. With X 0 = 0 , we have the following solution according to this setup.
To solve (92), let
X t 1 X t * 1 : = t X t 2 X t * 2 : = c o s ( t ) X t 3 X t * 3 : = t 2 X t 4 X t * 4 : = 2 π s i n ( 2 π t )
Then, we have
p ( t ) = X t 1 X t 2 X t 3 X t 4 + T 2 2 t 2 2 s i n ( T ) s i n ( t ) T 3 3 t 3 3 c o s ( 2 π T ) c o s ( 2 π t )
Let $\hat{r}(t)$ be given by
$$ \hat{r}_t^1 = \frac{t^2/2}{\beta_t}, \qquad \hat{r}_t^2 = \frac{\sin(t)}{\beta_t}, \qquad \hat{r}_t^3 = \frac{t^3/3}{\beta_t}, \qquad \hat{r}_t^4 = \frac{\cos(2\pi t)}{\beta_t}, $$
where $\beta_t = (1 + \sigma^2) + \sigma^2 (T - t)$. Then,
$$ r(t) = A\, \hat{r}(t). $$
Then, plug (94) into (90), and we solve (86)
d X t d t = A ( u ( t ) r ( t ) ) = A ( A p ( t ) β t A r ^ ( t ) ) )
and obtain
X t = X t 1 X t 2 X t 3 X t 4 = α t A 2 σ 2 ( T 2 2 s i n ( T ) T 3 3 c o s ( 2 π T ) X T 1 X T 2 X T 3 X T 4 )
Thus, using $X_t^* = X_t - (X_t - X_t^*)$, we obtain
$$ X_t^* = \begin{pmatrix} X_t^{1,*} \\ X_t^{2,*} \\ X_t^{3,*} \\ X_t^{4,*} \end{pmatrix} = -\begin{pmatrix} t \\ \cos(t) \\ t^2 \\ 2\pi \sin(2\pi t) \end{pmatrix} + \alpha_t\, \frac{A^2}{\sigma^2} \left( \begin{pmatrix} \frac{T^2}{2} \\ \sin(T) \\ \frac{T^3}{3} \\ \cos(2\pi T) \end{pmatrix} - \begin{pmatrix} X_T^1 \\ X_T^2 \\ X_T^3 \\ X_T^4 \end{pmatrix} \right), $$
where the $X_T^i$ can be obtained from the system (96) by letting $t = T$, and
$$ \alpha_t = \ln \frac{1 + \sigma^2 + \sigma^2 T}{(1 + \sigma^2) + \sigma^2 (T - t)}. $$
Then, to find the exact solution following the trajectory of $y_t$ in this setup, one has to solve the following coupled forward-backward ODE system:
$$ \frac{dX_t}{dt} = A(u_t - r_t), \qquad x_{t_n} = y_{t_n}, $$
$$ \frac{dp(t)}{dt} = -(X_t - X_t^*), \qquad p_T = x_T^{t_n, y_{t_n}}, $$
with $u_t = -A\, p_t / \big( \sigma^2 (T - t) + (1 + \sigma^2) \big)$. As a result, we have
$$ \frac{dX_t}{dt} = A\Big( -\frac{A\, p_t}{\sigma^2 (T - t) + (1 + \sigma^2)} - r_t \Big), \qquad x_{t_n} = y_{t_n}, $$
$$ \frac{dp(t)}{dt} = -(X_t - X_t^*), \qquad p_T = x_T^{t_n, y_{t_n}}. $$
That is, we need to solve the above coupled FBODE. Observing that $p_t = x_T^{t_n, y_{t_n}} + \int_t^T (X_s - X_s^*)\, ds$, and writing $a_t := 1 / \big( \sigma^2 (T - t) + (1 + \sigma^2) \big)$, we have
$$ \frac{dX_t}{dt} = -a_t A^2\, X_T - a_t A^2 \int_t^T (X_s - X_s^*)\, ds - A r_t, \qquad x_{t_n} = y_{t_n}. \tag{102} $$
To solve (102) numerically, we conduct a numerical discretization:
x t n + 1 x t n = a t n A 2 X T Δ t a t n ( Δ t ) 2 A 2 i = n N 1 ( X t i X t i * ) A r t A r t a t n ( Δ t ) 2 A 2 i = n N 1 X t i * = X t n X t n + 1 a t n ( Δ t ) 2 A 2 i = n N 1 X t i a t n A 2 X T , x t n = y t n
We can put (103) into a large linear system and solve it numerically.

4.1.2. Performance Experiment

We set the total number of discretization steps to $N = 50$, the number of iterations to $L = 10^4$, $\sigma = 0.1$, the number of particles in each dimension to 128, $T = 1$, and $X_0 = 0$.
In Figure 1, we present the estimated data-driven control and the true optimal control.
In Figure 2, we show the estimated state trajectories with respect to true state trajectories in each dimension.
We can see from these figures that our data-driven feedback control algorithm works very well for this 4-D linear quadratic control problem, despite the nonlinear observations.

4.1.3. Convergence Experiment

In this experiment, we demonstrate the convergence behavior of our algorithm by studying the error decay in the $L^2$ norm with respect to the number of particles used. Each result is the average of $\|u_{est} - u^*\|_2$ over 50 independent tests.
Specifically, we set $L = 10^4$ and increase the number of particles through S = {2, 8, 32, 128, 512, 2048, 4096, 8192, 16,384, 32,768}; the results are shown in Figure 3.
Setting the number of particles to S = {2, 8, 32, 128, 512, 1024, 2048, 4096} with $L = S^2$, we obtained the results in Figure 4.
From the results above, we can see that the error will decrease and converge as we increase the number of particles and the number of iterations.

4.2. Example 2. Two-Dimensional Dubins Vehicle Maneuvering Problem

In this example, we solve a Dubins vehicle maneuvering problem. The controlled process is described by the following nonlinear controlled dynamics:
$$ dS_t = \begin{pmatrix} dX_t \\ dY_t \end{pmatrix} = \begin{pmatrix} \sin(\theta_t)\, dt \\ \cos(\theta_t)\, dt \end{pmatrix} + \sigma\, dW_t, \tag{104} $$
$$ d\theta_t = u_t\, dt + \sigma^2\, dW_t, \tag{105} $$
$$ M_t = \Big[ \arctan\Big( \frac{X_t + 1}{Y_t - 1} \Big),\ \arctan\Big( \frac{X_t - 2}{Y_t - 1} \Big) \Big]^\top + \eta_t, \tag{106} $$
where the pair $(X, Y)$ gives the position of a car-like robot moving in the 2D plane, $\theta$ is the steering angle that controls the moving direction of the robot, which is governed by the control action $u_t$, and $\sigma$ is the noise that perturbs the motion and the control actions. Assume that we do not have direct observations of the robot. Instead, we use two detectors located on different observation platforms at $(-1, 1)$ and $(2, 1)$ to collect bearing angles of the target robot as indirect observations. Thus, we have the observation process $M_t$. Given the expected path $S^*$, the car should follow it and arrive at the terminal position on time. The performance cost functional, based on observational data, that we aim to minimize is defined as
$$ J[u] = \mathbb{E}\Big[ \frac{1}{2} \int_0^T \langle R(S_t - S_t^*),\ S_t - S_t^* \rangle\, dt + \frac{1}{2} \int_0^T \langle K u_t,\ u_t \rangle\, dt + \langle Q(S_T - S_T^*),\ S_T - S_T^* \rangle \Big]. \tag{107} $$
In our numerical experiments, we let the car travel from $(X_0, Y_0) = (0, 0)$ to $(X_T, Y_T) = (1, 1)$. The expected path $S_t^*$ is the circular arc $X_t^2 + (Y_t - 1)^2 = 1$. The other settings are $T = 1$, $\Delta t = 0.02$ (i.e., $N_T = 50$), $\sigma = 0.1$, $\eta_t \sim N(0, 0.1)$, $L = 1000$, $K = 1$, and the initial heading direction is $\pi/2$. To emphasize the importance of following the expected path and arriving at the target location at the terminal time, we set $R = Q = 20$.
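To fix ideas, here is a minimal sketch of the Dubins dynamics (104)-(105) and the bearing-angle observations (106) under a given control sequence; the zero control used in the usage line is a placeholder, and the reading of the angle-noise coefficient as $\sigma^2$ follows our reconstruction of (105).

```python
import numpy as np

def simulate_dubins(u_seq, T=1.0, NT=50, sigma=0.1, seed=0):
    """Euler simulation of the Dubins dynamics (104)-(105) with bearing
    observations (106) from platforms at (-1, 1) and (2, 1)."""
    rng = np.random.default_rng(seed)
    dt = T / NT
    XY = np.zeros((NT + 1, 2))
    theta = np.zeros(NT + 1)
    obs = np.zeros((NT + 1, 2))
    theta[0] = np.pi / 2                          # initial heading direction
    for n in range(NT):
        dW = rng.normal(0.0, np.sqrt(dt), size=3)
        XY[n + 1] = XY[n] + np.array([np.sin(theta[n]), np.cos(theta[n])]) * dt \
                    + sigma * dW[:2]
        theta[n + 1] = theta[n] + u_seq[n] * dt + sigma**2 * dW[2]
        x, y = XY[n + 1]
        eta = rng.normal(0.0, np.sqrt(0.1), size=2)   # observation noise ~ N(0, 0.1)
        obs[n + 1] = np.arctan2([x + 1.0, x - 2.0], [y - 1.0, y - 1.0]) + eta
    return XY, theta, obs

XY, theta, obs = simulate_dubins(u_seq=np.zeros(50))  # placeholder zero control
```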
In Figure 5, we plot our algorithm’s designed trajectory and the estimated trajectory. We can see from this figure that the car moves towards the target along the designed path and is “on target” at the final time with a very small error.
We set $L = 10^3$ and increase the number of particles through S = {2, 8, 32, 128, 512, 1024, 2048, 4096, 8192, 16,384, 32,768}. To assess the convergence of our algorithm on this Dubins vehicle maneuvering problem, we repeated the above experiment 50 times and computed the error $\frac{1}{M_{rept.}} \sum_{m=1}^{M_{rept.}} \langle S_t - S_t^*,\ S_t - S_t^* \rangle$, shown in Figure 6, where $M_{rept.} = 50$.
Setting the number of particles to S = {8, 16, 32, 64, 128, 256, 512, 1024} with $L = S^2$, we obtained the error shown in Figure 7, where the error is the average of $\|S_t - S_t^*\|_2$.
From the results above, we can see that the error will decrease and converge as we increase the number of particles and the number of iterations.

5. Conclusions

In this paper, we present the weak convergence of the data-driven feedback control algorithm proposed in [1]. We do not discuss the convergence rate due to the challenge of determining the radius M of the compact subspace that bounds all particles X i . However, in practice, given a terminal time T, one can use Monte Carlo simulations to find an M that satisfies a certain probability in Lemma 2. Our numerical experiments indicate that both the estimated control and estimated distribution converge at a rate related to the number of particles and iterations.
Future work can focus on analyzing the convergence rate and error bounds for a given state system. This will provide clarity on the number of particles and iterations required to achieve the desired estimation accuracy when applying the algorithm from [1].

Author Contributions

Conceptualization, S.L., H.S. and F.B.; methodology, F.B. and R.A.; software, S.L.; validation, S.L. and F.B.; formal analysis, S.L. and H.S.; investigation, S.L.; resources, S.L.; data curation, S.L.; writing—original draft preparation, S.L. and H.S.; writing—review and editing, F.B. and R.A.; visualization, S.L.; supervision, F.B.; project administration, F.B.; funding acquisition, F.B. and R.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work is partially supported by the U.S. Department of Energy through the FASTMath Institute and the Office of Science, Advanced Scientific Computing Research program, under grant DE-SC0022297. FB would also like to acknowledge support from the U.S. National Science Foundation through project DMS-2142672.

Data Availability Statement

All code written as part of this study will be made available on GitHub upon completing the peer review process for this article.

Conflicts of Interest

Author Hui Sun was employed by the company Citigroup Inc. The research work performed by the author does not represent any corporate opinion. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BSDEs: Backward stochastic differential equations
FBSDEs: Forward–backward stochastic differential equations
PF: Particle filter
SGD: Stochastic gradient descent

Appendix A

Appendix A.1

Proof of Lemma 2.
Proof. 
We start at time $t_0$.
Step 1. Starting from $X_0 \sim \xi$ with $\mathbb{E}[\xi^2] \le C_0$, and fixing an arbitrary control $u_0$, we have for the prediction step:
$$\mathbb{E}[|X_1|^2] = \mathbb{E}\big[ |X_0 + b(X_0, u_0)\Delta t + \sigma(X_0)\Delta W_0|^2 \big] \le \mathbb{E}\Big[ (1+\Delta t)X_0^2 + \Big(1 + \frac{1}{\Delta t}\Big) b^2 (\Delta t)^2 \Big] + C_\sigma^2 \Delta t \le (1+\Delta t)C_0 + \big( C_b^2(\Delta t + 1) + C_\sigma^2 \big)\Delta t =: C_0'. \quad (A1)$$
Step 2. We denote the distribution of $X_1$ by $\mathcal{L}(X_1) = \pi_{t_1|t_0}$; the particle method then performs a random resampling from this distribution and obtains the random distribution
$$\frac{1}{N}\sum_{i=1}^{N} \delta_{x_i(\omega)} =: \pi^N_{t_1|t_0}. \quad (A2)$$
Hence, for $X \sim \pi^N_{t_1|t_0}$, taking the expectation over all randomness in the measure, we have
$$\mathbb{E}[X^2] = \mathbb{E}\big[ \mathbb{E}[X^2 \,|\, \mathcal{G}_1] \big] = \frac{N}{N}\, \mathbb{E}\big[ \mathbb{E}[x_1^2 \,|\, \mathcal{G}_1] \big] = \mathbb{E}[\tilde{X}^2], \quad (A3)$$
where the $x_i \sim \pi_{t_1|t_0}$ are i.i.d. random samples, $\mathcal{G}_1$ contains the sampling randomness, and $\tilde{X} \sim \pi_{t_1|t_0}$; the factor $N/N$ arises from averaging the $N$ identically distributed conditional samples in the empirical measure. The conditional expectation makes explicit that the particles are conditionally independent, which matters because additional randomness accumulates in the history when this argument is applied recursively. Thus, by (A1),
$$\mathbb{E}[\tilde{X}^2] \le C_0'. \quad (A4)$$
Step 3. We now have the random measure $\pi^N_{t_1|t_0}$ and proceed to the analysis step. By definition,
$$X_1 \sim \frac{g(x)\, d\pi^N_{t_1|t_0}(x)}{\int g(x)\, d\pi^N_{t_1|t_0}(x)} =: \tilde{\pi}^N_{t_1|t_1}, \quad (A5)$$
where $\pi^N_{t_1|t_0}$ is the distribution of the state $X_1$ obtained in the previous step. Recalling Assumption 7, we estimate $\mathbb{E}[|X_1|^2]$:
$$\mathbb{E}[|X_1|^2] \le \Big(\frac{1}{\kappa}\Big)^2\, \mathbb{E}\Big[ \int x^2\, d\pi^N_{t_1|t_0}(x) \Big] \le \Big(\frac{1}{\kappa}\Big)^2 C_0' =: C_1. \quad (A6)$$
Step 4. Now, we again apply the random sampling step
$$\frac{1}{N}\sum_{i=1}^{N} \delta_{x_i(\omega)} =: \pi^N_{t_1|t_1}, \quad (A7)$$
where $x_i(\omega) \sim \tilde{\pi}^N_{t_1|t_1}$. Then, for $X \sim \pi^N_{t_1|t_1}$, we have
$$\mathbb{E}[X^2] = \mathbb{E}\big[ \mathbb{E}[X^2 \,|\, \mathcal{G}_1'] \big] = \frac{N}{N}\, \mathbb{E}\big[ \mathbb{E}[x_i^2 \,|\, \mathcal{G}_1'] \big] = \mathbb{E}[\tilde{X}^2], \quad (A8)$$
where $\tilde{X} \sim \tilde{\pi}^N_{t_1|t_1}$ and $\mathcal{G}_1'$ is the filtration generated by $\mathcal{G}_1$ and the randomness of the current sampling. Then, by (A6), we have
$$\mathbb{E}[X^2] \le C_1, \qquad X \sim \pi^N_{t_1|t_1}, \quad (A9)$$
and this completes all the estimates for the first time step. Hence, after one time step, we have
$$C_1 = \kappa^{-2}(1+\Delta t)\, C_0 + C\, \Delta t, \quad (A10)$$
which means that, by applying the same argument, we obtain the following recursion in general:
$$C_{n+1} = \kappa^{-2}(1+\Delta t)\, C_n + C\, \Delta t. \quad (A11)$$
As a result, picking arbitrary controls $u_n$ and repeating the same argument up to step $N$, we have for all $n = 1, \dots, N$:
$$C_n = \big( \kappa^{-2}(1+\Delta t) \big)^n C_0 + \sum_{i=0}^{n-1} \big( \kappa^{-2}(1+\Delta t) \big)^i\, C\, \Delta t, \quad (A12)$$
and we notice that $C_n$ is increasing in $n$, as the closed form given after (A13) makes explicit. As a result, for any $X_n \sim \pi^N_{t_n|t_{n-1}}$ or $X_n \sim \pi^N_{t_n|t_n}$, we have
$$\mathbb{E}[|X_n|^2] \le C_N. \quad (A13)$$
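For clarity, the monotonicity of $C_n$ can be checked directly. Writing $\rho := \kappa^{-2}(1+\Delta t)$ for the growth factor, the geometric sum in (A12) evaluates, whenever $\rho > 1$ (which holds for $\kappa \le 1$), to

$$C_n = \rho^n C_0 + \frac{\rho^n - 1}{\rho - 1}\, C\, \Delta t,$$

and both terms grow with $n$, so $C_n \le C_N$ for all $n \le N$, which is exactly the bound used in (A13).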
Hence, by Chebyshev’s inequality, we have
$$\mathbb{P}\big( |X_n| \ge M \big) \le \frac{C_N}{M^2}, \qquad n \in \{1, 2, \dots, N\}, \quad (A14)$$
and then, by a union bound over $n$, we have
$$\mathbb{P}\Big( \sup_{n} |X_n| \ge M \Big) \le \frac{N C_N}{M^2}. \quad (A15)$$
Since the control values were picked arbitrarily, we conclude that
$$\mathbb{P}\Big( \sup_{n,\, u} |X_n| \ge M \Big) \le \frac{C}{M^2}, \qquad X_n \sim \pi^N_{t_n|t_{n-1}} \ \text{or} \ \pi^N_{t_n|t_n}, \quad (A16)$$
where the supremum is also taken over all admissible control values, which completes the proof. □
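For reference, the one-step particle filter cycle analyzed in Steps 1–4 above can be sketched in code as follows. The drift `b`, diffusion `sigma`, and likelihood `g` are hypothetical placeholders for the model and observation specification of the paper, and both resamplings are done by multinomial sampling from the (weighted) empirical measure; this is a minimal sketch, not the paper's implementation.

```python
import numpy as np

def pf_one_step(particles, u, obs, b, sigma, g, dt, rng):
    """One prediction-resampling-analysis-resampling cycle (Steps 1-4).

    b, sigma, g are placeholder callables: drift b(x, u), diffusion
    sigma(x), and observation likelihood g(obs, x), assumed vectorized.
    """
    N = len(particles)
    # Step 1 (prediction): propagate each particle through the controlled SDE.
    dW = np.sqrt(dt) * rng.standard_normal(N)
    pred = particles + b(particles, u) * dt + sigma(particles) * dW
    # Step 2 (resampling): draw N samples from the empirical measure pi^N_{t1|t0}.
    pred = rng.choice(pred, size=N, replace=True)
    # Step 3 (analysis): Bayes update, reweighting by the likelihood g(obs | x).
    w = g(obs, pred)
    w = w / w.sum()
    # Step 4 (resampling): draw from the weighted measure to obtain pi^N_{t1|t1}.
    return rng.choice(pred, size=N, replace=True, p=w)
```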

References

1. Archibald, R.; Bao, F.; Yong, J.; Zhou, T. An efficient numerical algorithm for solving data driven feedback control problems. J. Sci. Comput. 2020, 85, 51.
2. Bellman, R. Dynamic programming. Science 1966, 153, 34–37.
3. Feng, X.; Glowinski, R.; Neilan, M. Recent developments in numerical methods for fully nonlinear second order partial differential equations. SIAM Rev. 2013, 55, 205–267.
4. Peng, S. A general stochastic maximum principle for optimal control problems. SIAM J. Control Optim. 1990, 28, 966–979.
5. Yong, J.; Zhou, X.Y. Stochastic Controls: Hamiltonian Systems and HJB Equations; Springer Science & Business Media: Cham, Switzerland, 2012.
6. Gong, B.; Liu, W.; Tang, T.; Zhao, W.; Zhou, T. An efficient gradient projection method for stochastic optimal control problems. SIAM J. Numer. Anal. 2017, 55, 2982–3005.
7. Tang, S. The maximum principle for partially observed optimal control of stochastic differential equations. SIAM J. Control Optim. 1998, 36, 1596–1617.
8. Zhang, J. A numerical scheme for BSDEs. Ann. Appl. Probab. 2004, 14, 459–488.
9. Zhao, W.; Fu, Y.; Zhou, T. New kinds of high-order multistep schemes for coupled forward backward stochastic differential equations. SIAM J. Sci. Comput. 2014, 36, A1731–A1751.
10. Archibald, R.; Bao, F.; Yong, J. A stochastic gradient descent approach for stochastic optimal control. East Asian J. Appl. Math. 2020, 10, 635–658.
11. Sato, I.; Nakagawa, H. Approximation analysis of stochastic gradient Langevin dynamics by using Fokker–Planck equation and Ito process. In Proceedings of the International Conference on Machine Learning, Beijing, China, 21–26 June 2014; Volume 32, pp. 982–990.
12. Shapiro, A.; Wardi, Y. Convergence analysis of gradient descent stochastic algorithms. J. Optim. Theory Appl. 1996, 91, 439–454.
13. Archibald, R.; Bao, F.; Cao, Y.; Sun, H. Numerical analysis for convergence of a sample-wise backpropagation method for training stochastic neural networks. SIAM J. Numer. Anal. 2024, 62, 593–621.
14. Zakai, M. On the optimal filtering of diffusion processes. Z. Wahrscheinlichkeitstheorie Verw. Gebiete 1969, 11, 230–243.
15. Gordon, N.J.; Salmond, D.J.; Smith, A.F. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F (Radar Signal Process.) 1993, 140, 107–113.
16. Morzfeld, M.; Tu, X.; Atkins, E.; Chorin, A.J. A random map implementation of implicit filters. J. Comput. Phys. 2012, 231, 2049–2066.
17. Andrieu, C.; Doucet, A.; Holenstein, R. Particle Markov chain Monte Carlo methods. J. R. Statist. Soc. B 2010, 72, 269–342.
18. Crisan, D.; Doucet, A. A survey of convergence results on particle filtering methods for practitioners. IEEE Trans. Signal Process. 2002, 50, 736–746.
19. Künsch, H.R. Particle filters. Bernoulli 2013, 19, 1391–1403.
20. Bao, F.; Cao, Y.; Meir, A.; Zhao, W. A first order scheme for backward doubly stochastic differential equations. SIAM/ASA J. Uncertain. Quantif. 2016, 4, 413–445.
21. Zhao, W.; Zhou, T.; Kong, T. High order numerical schemes for second-order FBSDEs with applications to stochastic optimal control. Commun. Comput. Phys. 2017, 21, 808–834.
22. Law, K.; Stuart, A.; Zygalakis, K. Data Assimilation; Springer: Cham, Switzerland, 2015.
Figure 1. Estimated control vs. true optimal control.
Figure 2. Estimated state vs. true state.
Figure 3. Error vs. number of particles.
Figure 4. Error vs. number of steps.
Figure 5. Controlled trajectory from (0,0) to (1,1).
Figure 6. Error vs. number of particles.
Figure 7. Error vs. number of steps.