1. Introduction: Optimal Control in Stochastic Thermodynamics and the Fokker–Planck Equation
Thermodynamic transitions at the nanoscale occur in highly fluctuating environments. For instance, nanoscale bio-molecular machines deliver only minute power outputs per molecule while experiencing random environmental buffeting of the order of the thermal energy at room temperature [1]. Nanomachines also experience topological randomness, as their motion occurs in inherently non-smooth surroundings: the dimensions of the machine's constituents are close to those of single atoms [1]. The dynamics of nanosystems therefore need to be described in terms of stochastic [2] or, more generally, random differential equations [3]. Consequently, the laws of macroscopic thermodynamics are replaced by identities involving functions of indicators of the state of the system that are naturally expressed by stochastic processes. Addressing fundamental and technological questions of nanoscale physics has thus propelled interest in the field of stochastic thermodynamics over the last few years [4,5,6,7].
A class of important questions in stochastic thermodynamics revolves around finding efficient protocols that natural or artificial nanomachines adopt to perform useful work at the nanoscale. Optimal control theory provides a natural mathematical formulation for this type of question [8]. For instance, the conversion of chemical energy into mechanical work typically implies steering the system probability distribution between two assigned values. Schrödinger bridge problems [9] (English translation in [10]) and their extensions, see, e.g., [11,12,13], capture this idea mathematically. In these problems, protocols optimizing a given functional of the stochastic process describing the state of a nanosystem are determined by solving a Hamilton–Jacobi–Bellman equation [14] coupled to a Fokker–Planck equation. The solution of the Hamilton–Jacobi–Bellman equation determines the value of the optimal action (force) steering the dynamics at any instant of time in the control horizon. However, the boundary condition at the end of the control horizon is assigned on the system probability distribution. Solving the Fokker–Planck equation thus becomes necessary to fully determine the optimal protocols.
Possibly the most prominent physical application of such a setup is the derivation of a tight lower bound on the entropy production in classical stochastic thermodynamics [15]. Remarkably, when the system dynamics is modelled by Langevin–Smoluchowski (overdamped) dynamics, the problem maps into the Monge–Ampère–Kantorovich equations and becomes essentially integrable [16]. This allows one to extract relevant quantitative information about molecular engine efficiency [17,18,19,20] and minimum heat release [21]. The situation is, however, more complicated for more realistic models of nanosystem evolution. Specifically, if we adopt an underdamped (Langevin–Kramers) model of the dynamics [22,23], then even the equation connecting the optimal mechanical potential to the value function solving the dynamic programming equation is not analytically integrable. In the Gaussian case, the solution of a Lyapunov equation [24] specifies the optimal mechanical potential [25]. In general, optimal control duality methods [26] and multiscale perturbative expansions [25] yield lower bounds on the entropy production and approximate expressions of optimal protocols. More detailed quantitative information calls for exact numerical methods. This is particularly challenging, as integration strategies must be adapted to take into account the boundary conditions imposed on the system probability density at the end of the control horizon. Hence, the development of accurate and scalable methods for the numerical integration of the Fokker–Planck equation becomes an essential element of optimization algorithms.
Traditional numerical methods from hydrodynamics, such as the pseudo-spectral method, see, e.g., [27], are certainly accurate, but require boundary conditions that are periodic in space and may not suit problems in stochastic thermodynamics. An even more serious limitation is the exponentially fast increase in computational complexity with the number of degrees of freedom of the problem. Monte Carlo averages over Lagrangian particle paths, i.e., realizations of the solution of the stochastic differential equation associated with the Fokker–Planck equation, circumvent the curse of dimensionality; see [28,29] for example applications to classical and quantum physics. The drawback, however, is that these methods are best suited to computing expectation values of smooth indicators of the stochastic process. They lack accuracy when computing the probability density itself, as this involves averaging over Dirac distributions. These considerations motivate recent works [30,31], which use machine learning methods to construct solutions of the Fokker–Planck equation in the system's state space. These approaches consider the associated probability flow equation [16,30,31], which uses the score function, i.e., the gradient of the logarithm of the probability density [32], to turn the Fokker–Planck equation into a mass conservation equation. The score function can be parametrized by, for example, a neural network [31], and the probability density can then be recovered through a deterministic transport map.
In this article, we propose a Monte Carlo method adapted to the numerical integration of Fokker–Planck equations of diffusion processes driven by a time-dependent mechanical force. Although mathematically non-generic, such equations recur in applications of stochastic thermodynamics, as they describe the evolution of a system under a mechanical potential that may vary in time because of a feedback. We encounter this type of equation in generalized Schrödinger bridge problems instantiating refinements of the second law of thermodynamics [15,21,25]. In this context, the Fokker–Planck equation describes the evolution of the optimal distribution of the state throughout the control horizon. Integrating it directly poses a challenging problem, particularly when the driving mechanical potential is non-linear or the system dimension is high. By using the well-known Girsanov change-of-measure formula [33], we couch the solution of the Fokker–Planck equation as an expectation that can be evaluated numerically from sampled trajectories of the dynamics.
In addition, we consider an application of the Bismut–Elworthy–Li formula [34,35,36] to compute the gradient of the solution of the Hamilton–Jacobi–Bellman equation. This equation determines the value function, which steers the system dynamics over the control horizon, and is coupled to the Fokker–Planck equation through its boundary conditions. Direct access to the gradient of the value function is important, since the stationarity condition in control problems often expresses the optimal control protocol through the gradient of the value function. The Bismut–Elworthy–Li formula is commonly used in finance for the calculation of the Greeks, i.e., price sensitivities [37,38,39]. It has also been used in the numerical integration of non-linear parabolic partial differential equations [40]. We apply the Bismut–Elworthy–Li formula to the underdamped, or degenerate, dynamics [36], and illustrate it with a numerical example.
Numerical approaches to the Schrödinger bridge are often iterative. For example, ref. [41] turns the problem into a pair of Fokker–Planck equations and iteratively integrates them to recompute the boundary conditions via a proximal-operator-based numerical integration method [42]. Machine learning techniques have been used to iteratively solve half-bridge problems [43,44]. We bring together the described Monte Carlo methods for the Fokker–Planck and Hamilton–Jacobi–Bellman equations in a prototype numerical example that solves a Schrödinger bridge minimizing the Kullback–Leibler divergence from a free diffusion. This is done by iterating between updating the drift, parametrized by a neural network, with the stationarity condition, and updating the value function with the probability density. Using Monte Carlo integration allows us to compute the update steps quickly and without spatial discretization.
The remaining sections are organized as follows. In Section 2, we leverage the Girsanov theorem to express the solution of a Fokker–Planck equation as an expectation which can be evaluated numerically. This is complemented by analytical and numerical examples. In Section 3, we extend this setup to the underdamped dynamics, with an accompanying numerical example from a stochastic optimal control model in the underdamped dynamics. Section 4 discusses the application of the Bismut–Elworthy–Li formula to non-degenerate dynamics, with an analytic example in Section 4.3 and a numerical example in Section 4.4 of the Hamilton–Jacobi–Bellman equation coupled to the Fokker–Planck equation from Section 2.2.4. In Section 5, we extend the application of the Bismut–Elworthy–Li formula to the degenerate case. The formula is applied analytically in Section 5.2 and numerically in Section 5.3. Finally, we give an example use case for the derived formulae by solving an optimal control problem in the overdamped dynamics by machine learning.
2. Fokker–Planck for a Time-Dependent Mechanical Overdamped Diffusion Process
We consider the Langevin–Smoluchowski stochastic differential equation
$$\mathrm{d}\chi_{t}=-\,\mu\,(\partial U_{t})(\chi_{t})\,\mathrm{d}t+\sqrt{2\,D}\,\mathrm{d}w_{t},\qquad \chi_{t}\in\mathbb{R}^{d},\qquad(1)$$
where $w_{t}$ denotes a standard Wiener process [2,45]. The diffusion coefficient $D$ is proportional to the temperature of the environment surrounding the system; the Einstein relation fixes $D=\mu\,k_{\mathrm{B}}T$. The positive constant $\mu$ is the motility, with canonical dimensions of a time over a mass. The drift in (1) is the gradient of a time-dependent potential $U_{t}$ that we assume to be sufficiently regular and confining:
$$\lim_{\|x\|\to\infty}U_{t}(x)=+\infty\quad\text{sufficiently fast that}\quad\int_{\mathbb{R}^{d}}\mathrm{d}^{d}x\;e^{-\frac{\mu}{D}U_{t}(x)}<\infty.\qquad(2)$$
Remark 1. Following a well-established convention in stochastic thermodynamics (see, e.g., [46]), we denote functional dependence upon time, i.e., the dynamical parameter, with an underscript. Round brackets express dependence upon state coordinates in configuration or phase space of the diffusion.

The probability density distribution $f_{t}(x)$ of the solution of (1) at any instant of time $t$ satisfies the Fokker–Planck equation
$$\partial_{t}f_{t}(x)=\partial\cdot\big(\mu\,(\partial U_{t})(x)\,f_{t}(x)+D\,\partial f_{t}(x)\big),\qquad(3)$$
whose solution is fully specified by the assignment of an initial datum at time $t_{0}$. The assumption of a confining potential (2) guarantees that the probability density is integrable in $\mathbb{R}^{d}$. The connection between (1) and (3) stems from the representation of the transition probability density as a Monte Carlo average:
$$p_{t|t_{0}}(x\,|\,y)=\mathrm{E}\big(\delta^{(d)}(x-\chi_{t})\,\big|\,\chi_{t_{0}}=y\big).\qquad(4)$$
The expectation value $\mathrm{E}$ is over the probability measure $\mathrm{P}$ weighing the realizations of the solutions of (1). The singular nature of the Dirac delta distribution prevents the accurate evaluation of the transition probability density as a Monte Carlo average. For this reason, we look for the solution in the form
$$f_{t}(x)=e^{-\frac{\mu}{D}\,U_{t}(x)}\,\psi_{t}(x).\qquad(5)$$
Upon inserting into (3), we arrive at
$$\partial_{t}\psi_{t}+\mu\,(\partial U_{t})\cdot\partial\psi_{t}-D\,\partial^{2}\psi_{t}=\frac{\mu}{D}\,(\partial_{t}U_{t})\,\psi_{t}.\qquad(6)$$
Proposition 1. The solution of (3) admits the representation
$$f_{t}(x)=e^{-\frac{\mu}{D}U_{t}(x)}\,\tilde{\mathrm{E}}\Big(e^{\frac{\mu}{D}\big(U_{t_{0}}(\tilde{\chi}_{t_{0}})+\int_{t_{0}}^{t}\mathrm{d}s\,(\partial_{s}U_{s})(\tilde{\chi}_{s})\big)}\,f_{t_{0}}(\tilde{\chi}_{t_{0}})\,\Big|\,\tilde{\chi}_{t}=x\Big),\qquad(7)$$
where $\tilde{\mathrm{P}}$ (with expectation $\tilde{\mathrm{E}}$) is the probability measure over the paths of the backward diffusion process
$$\mathrm{d}\tilde{\chi}_{s}=\mu\,(\partial U_{s})(\tilde{\chi}_{s})\,\mathrm{d}s+\sqrt{2\,D}\,\mathrm{d}\tilde{w}_{s},\qquad(8)$$
naturally complemented by conditions assigned at some time $t_{\mathrm{f}}\geq t$.

Idea of the proof: We start by recalling that, for any test function $\varphi$, Itô's lemma for backward differentials yields
$$\mathrm{d}\varphi_{s}(\tilde{\chi}_{s})=\Big(\partial_{s}\varphi_{s}+\mu\,(\partial U_{s})\cdot\partial\varphi_{s}-D\,\partial^{2}\varphi_{s}\Big)(\tilde{\chi}_{s})\,\mathrm{d}s+\sqrt{2\,D}\,(\partial\varphi_{s})(\tilde{\chi}_{s})\cdot\mathrm{d}\tilde{w}_{s}.$$
We emphasize that $\tilde{w}_{s}$ is just a standard Wiener process, but evolving backward in time. In the stochastic analysis jargon, (8) is a diffusion process with respect to a backward filtration as in, e.g., [47]. As is well known, the stochastic integrals and martingale properties are the same as in forward calculus once one exchanges the pre-point rule with the post-point one, see, e.g., [48]. Let us define the auxiliary function
$$M_{s}=\exp\Big(\frac{\mu}{D}\int_{s}^{t_{\mathrm{f}}}\mathrm{d}r\,(\partial_{r}U_{r})(\tilde{\chi}_{r})\Big)\,\psi_{s}(\tilde{\chi}_{s}).$$
Then, Itô's lemma and (6) immediately imply
$$\mathrm{d}M_{s}=\sqrt{2\,D}\,\exp\Big(\frac{\mu}{D}\int_{s}^{t_{\mathrm{f}}}\mathrm{d}r\,(\partial_{r}U_{r})(\tilde{\chi}_{r})\Big)\,(\partial\psi_{s})(\tilde{\chi}_{s})\cdot\mathrm{d}\tilde{w}_{s}.$$
The equation tells us that the auxiliary function is a local martingale of the backward diffusion (see, e.g., Chapter 7 of [2]). Since we assume that the confining potential also guarantees integrability, we infer that the martingale property
$$\tilde{\mathrm{E}}\big(M_{t_{0}}\,\big|\,\tilde{\chi}_{t_{\mathrm{f}}}=x\big)=M_{t_{\mathrm{f}}}=\psi_{t_{\mathrm{f}}}(x)$$
must also hold true. By construction, we know that
$$\psi_{t_{0}}(x)=e^{\frac{\mu}{D}U_{t_{0}}(x)}\,f_{t_{0}}(x),$$
and we conclude
$$\psi_{t_{\mathrm{f}}}(x)=\tilde{\mathrm{E}}\Big(e^{\frac{\mu}{D}\big(U_{t_{0}}(\tilde{\chi}_{t_{0}})+\int_{t_{0}}^{t_{\mathrm{f}}}\mathrm{d}s\,(\partial_{s}U_{s})(\tilde{\chi}_{s})\big)}\,f_{t_{0}}(\tilde{\chi}_{t_{0}})\,\Big|\,\tilde{\chi}_{t_{\mathrm{f}}}=x\Big).$$
Replacing $t_{\mathrm{f}}$ with $t$ in the above chain of identities completes the proof. □
The upshot is that we can use the Feynman–Kac formula over a backward diffusion to compute the solution of a forward Fokker–Planck equation. Next, we take advantage of Girsanov's change-of-measure formula (see, e.g., Chapter 10 of [2] or Section 3.5 of [45]) to evaluate the conditional expectation in (7) directly over the paths of the Wiener process or, more generally, over the paths of any diffusion that generates a measure with respect to which $\tilde{\mathrm{P}}$ is absolutely continuous. Girsanov's change-of-measure formula is thus the basis of statistical inference for diffusion processes, see, e.g., [49]. We emphasize that we make use of the Girsanov formula while dealing with backward diffusions as, e.g., in [48]. As time evolves from a larger to a smaller value, the roles of "past" and "future" events must correspondingly be exchanged.
Remark 2. As increments of any Wiener process are independent, from now on we write
$$\mathrm{d}\tilde{w}_{s}=\mathrm{d}w_{s}$$
to alleviate the notation.

2.1. Use of Girsanov's Formula
We denote by $\mathrm{Q}$ the probability measure over the paths of
$$\mathrm{d}\xi_{s}=\sqrt{2\,D}\,\mathrm{d}w_{s}.\qquad(9)$$
Our aim is to use Girsanov's formula to express expectations with respect to the path measure $\tilde{\mathrm{P}}$ of (8) in terms of expectations with respect to $\mathrm{Q}$:
$$\tilde{\mathrm{E}}\big(F(\tilde{\chi})\,\big|\,\tilde{\chi}_{t}=x\big)=\mathrm{E}_{\mathrm{Q}}\big(F(\xi)\,\mathrm{g}\,\big|\,\xi_{t}=x\big),$$
where
$$\mathrm{g}=\frac{\mathrm{d}\tilde{\mathrm{P}}}{\mathrm{d}\mathrm{Q}}=\exp\Big(\frac{\mu}{2\,D}\int_{t_{0}}^{t}(\partial U_{s})(\xi_{s})\circ\mathrm{d}\xi_{s}-\frac{\mu^{2}}{4\,D}\int_{t_{0}}^{t}\mathrm{d}s\,\big\|(\partial U_{s})(\xi_{s})\big\|^{2}\Big)\qquad(10)$$
is the Radon–Nikodym derivative. The symbol $\circ$ emphasizes that we define the stochastic integral in (10) using the post-point prescription:
$$\int_{t_{0}}^{t}(\partial U_{s})(\xi_{s})\circ\mathrm{d}\xi_{s}=\lim_{N\to\infty}\sum_{k=0}^{N-1}(\partial U_{s_{k+1}})(\xi_{s_{k+1}})\cdot(\xi_{s_{k+1}}-\xi_{s_{k}}),\qquad s_{k}=t_{0}+\frac{k\,(t-t_{0})}{N},$$
i.e., the integrand is evaluated at the end of each time interval. Accordingly, (10) is a martingale with respect to the backward filtration (i.e., the family of $\sigma$-algebras increases as time decreases) to which we associate the probability measure $\mathrm{Q}$. In other, rougher, words, (10) is a martingale conditional to events occurring at times larger than or equal to the upper bound of integration $t$.
Writing the stochastic integral in the standard pre-point form allows us to simplify the expression of the probability density. We notice that (9) trivially implies
$$\mathrm{d}w_{s}=\frac{\mathrm{d}\xi_{s}}{\sqrt{2\,D}}.$$
Next, we use the relation between stochastic integrals in the post-point, mid-point (or Stratonovich, denoted by the $\diamond$-symbol), and pre-point prescriptions:
$$\int_{t_{0}}^{t}(\partial U_{s})(\xi_{s})\circ\mathrm{d}\xi_{s}=\int_{t_{0}}^{t}(\partial U_{s})(\xi_{s})\diamond\mathrm{d}\xi_{s}+D\int_{t_{0}}^{t}\mathrm{d}s\,(\partial^{2}U_{s})(\xi_{s}).$$
Finally, we recall that ordinary differential calculus holds for the stochastic differentials in Stratonovich form:
$$\int_{t_{0}}^{t}(\partial U_{s})(\xi_{s})\diamond\mathrm{d}\xi_{s}=U_{t}(\xi_{t})-U_{t_{0}}(\xi_{t_{0}})-\int_{t_{0}}^{t}\mathrm{d}s\,(\partial_{s}U_{s})(\xi_{s}).$$
Putting these observations together, we obtain the following representation of the solution of the Fokker–Planck Equation (3):
$$f_{t}(x)=\mathrm{E}_{\mathrm{Q}}\Big(e^{-\frac{\mu}{2D}\big(U_{t}(x)-U_{t_{0}}(\xi_{t_{0}})\big)+\int_{t_{0}}^{t}\mathrm{d}s\,\big(\frac{\mu}{2D}(\partial_{s}U_{s})+\frac{\mu}{2}(\partial^{2}U_{s})-\frac{\mu^{2}}{4D}\|\partial U_{s}\|^{2}\big)(\xi_{s})}\;f_{t_{0}}(\xi_{t_{0}})\,\Big|\,\xi_{t}=x\Big).\qquad(11)$$
In practice, this means that, to compute the probability density at the configuration-space point $x$ at time $t$, we need to average the initial density over solutions of (9) evolved backward in time to $t_{0}$ and weighted by a path-dependent change-of-measure factor.
2.2. Examples and Path Integral Representation
Let us summarize the meaning of (11) in words. Formula (7) tells us that the Fokker–Planck equation of a forward diffusion process with gradient drift, i.e., of the form (3), admits a Feynman–Kac representation in terms of a backward diffusion process. This is because we can use the potential specifying the drift to turn the forward Fokker–Planck equation into a non-homogeneous backward Kolmogorov equation with respect to the backward diffusion process. This latter equation, as is well known, generically specifies a problem with initial data. We now turn to illustrate this fact with two examples.
2.2.1. Analytical Example
Consider a quadratic potential
$$U_{t}(x)=\frac{1}{2}\,x\cdot A_{t}\,x,$$
with $A_{t}$ denoting a $d\times d$ real symmetric time-dependent matrix. The backward stochastic differential Equation (8) reduces to
$$\mathrm{d}\tilde{\chi}_{s}=\mu\,A_{s}\,\tilde{\chi}_{s}\,\mathrm{d}s+\sqrt{2\,D}\,\mathrm{d}w_{s}.$$
Letting $\Phi_{s,t}$ denote the flow solution of the deterministic ordinary differential equation
$$\frac{\mathrm{d}}{\mathrm{d}s}\,\Phi_{s,t}=\mu\,A_{s}\,\Phi_{s,t},\qquad\Phi_{t,t}=\mathbb{1},$$
the solution of the backward stochastic differential equation is
$$\tilde{\chi}_{s}=\Phi_{s,t}\,x+\sqrt{2\,D}\int_{t}^{s}\Phi_{s,r}\,\mathrm{d}w_{r}.$$
The symbol $\top$ as usual denotes matrix transposition. The corresponding transition probability density is Gaussian with mean
$$\Phi_{s,t}\,x$$
(recalling that, for standard backward differential equations, the martingale property arises upon conditioning on future events [47]) and variance matrix
$$\Sigma_{s,t}=2\,D\int_{s}^{t}\mathrm{d}r\,\Phi_{s,r}\,\Phi_{s,r}^{\top}.$$
We need to compute the conditional expectation (7) for this process. If we couch this expression into the form of a path integral [50], we obtain
$$f_{t}(x)=e^{-\frac{\mu}{2D}\,x\cdot A_{t}x}\,\lim\int_{\chi_{t}=x}\mathcal{D}[\chi]\;e^{-\int_{t_{0}}^{t}\mathrm{d}s\,\big(\frac{\|\dot{\chi}_{s}-\mu A_{s}\chi_{s}\|^{2}}{4D}-\frac{\mu}{2D}\,\chi_{s}\cdot\dot{A}_{s}\chi_{s}\big)}\;e^{\frac{\mu}{2D}\,\chi_{t_{0}}\cdot A_{t_{0}}\chi_{t_{0}}}\;f_{t_{0}}(\chi_{t_{0}})\,\mathrm{d}^{d}\chi_{t_{0}}.$$
Here, $\lim$ denotes the limit over finite-dimensional approximations over time lattices in $[t_{0},t]$ of paths satisfying the terminal condition $\chi_{t}=x$. We are free to interpret the path integral in the mid-point sense, because any change of discretization generates a path-independent Jacobian that can be reabsorbed into the normalization constant. As the integral is Gaussian, we can compute it by infinite-dimensional stationary phase using ordinary differential calculus. We are left with
$$f_{t}(x)=\int_{\mathbb{R}^{d}}\mathrm{d}^{d}y\;\frac{\exp\Big(-\frac{1}{2}\,\big(x-\bar{\Phi}_{t,t_{0}}\,y\big)\cdot\bar{\Sigma}_{t,t_{0}}^{-1}\,\big(x-\bar{\Phi}_{t,t_{0}}\,y\big)\Big)}{\sqrt{(2\pi)^{d}\,\det\bar{\Sigma}_{t,t_{0}}}}\;f_{t_{0}}(y),$$
where $\bar{\Phi}_{t,t_{0}}$ is the flow of the ordinary differential equation with reversed drift, $\frac{\mathrm{d}}{\mathrm{d}t}\bar{\Phi}_{t,t_{0}}=-\mu\,A_{t}\,\bar{\Phi}_{t,t_{0}}$, and $\bar{\Sigma}_{t,t_{0}}=2\,D\int_{t_{0}}^{t}\mathrm{d}r\,\bar{\Phi}_{t,r}\,\bar{\Phi}_{t,r}^{\top}$. We now readily recognize that the integral kernel is the path integral expression of the transition probability density of the forward stochastic differential equation
$$\mathrm{d}\chi_{s}=-\,\mu\,A_{s}\,\chi_{s}\,\mathrm{d}s+\sqrt{2\,D}\,\mathrm{d}w_{s}.$$
We therefore recover the Chapman–Kolmogorov representation of the solution of (3):
$$f_{t}(x)=\int_{\mathbb{R}^{d}}\mathrm{d}^{d}y\;p_{t|t_{0}}(x\,|\,y)\,f_{t_{0}}(y),\qquad(12)$$
as expected.
2.2.2. Path-Integral Representation in General
The path integral representation of (7) is
$$f_{t}(x)=e^{-\frac{\mu}{D}U_{t}(x)}\,\lim\int_{\chi_{t}=x}\mathcal{D}[\chi]\;e^{-\int_{t_{0}}^{t}\mathrm{d}s\,\big(\frac{\|\dot{\chi}_{s}-\mu(\partial U_{s})(\chi_{s})\|^{2}}{4D}-\frac{\mu}{D}(\partial_{s}U_{s})(\chi_{s})\big)}\;e^{\frac{\mu}{D}U_{t_{0}}(\chi_{t_{0}})}\;f_{t_{0}}(\chi_{t_{0}})\,\mathrm{d}^{d}\chi_{t_{0}}.$$
As the stochastic integral term is evaluated in the pre-point representation, the path integral exactly recovers the path integral expression of the transition probability density. We have thus verified that (12) holds in general. We refer the reader unfamiliar with path integral calculus to, e.g., [50].
2.2.3. Numerical Example: Time-Independent Drift
In this section, we demonstrate how the method described can be applied to find a numerical solution of a Fokker–Planck equation driven by a mechanical potential. We consider a Fokker–Planck equation of the form (3) with a time-independent drift. By applying the Girsanov theorem as above, we couch the solution of the Fokker–Planck equation at time $t_{\mathrm{f}}$ into a numerical average over simulated trajectories of the auxiliary dynamics, given by
$$f_{t_{\mathrm{f}}}(x)=\mathrm{E}_{\mathrm{Q}}\Big(e^{-\frac{\mu}{2D}\big(U(x)-U(\xi_{t_{0}})\big)+\int_{t_{0}}^{t_{\mathrm{f}}}\mathrm{d}s\,\big(\frac{\mu}{2}(\partial^{2}U)-\frac{\mu^{2}}{4D}\|\partial U\|^{2}\big)(\xi_{s})}\;f_{t_{0}}(\xi_{t_{0}})\,\Big|\,\xi_{t_{\mathrm{f}}}=x\Big),\qquad(13)$$
where $\mathrm{Q}$ is the measure generated by the backward diffusion (9). We approximate the expectation value numerically by repeated sampling of trajectories of the process (9). The trajectories are approximated on a discretization of the time interval $[t_{0},t_{\mathrm{f}}]$ given by
$$t_{0}=s_{0}<s_{1}<\dots<s_{n}=t_{\mathrm{f}},\qquad s_{i+1}-s_{i}=\Delta t.$$
Trajectories of (9) are sampled using the Euler–Maruyama scheme:
$$\xi_{s_{i}}=\xi_{s_{i+1}}-\sqrt{2\,D}\,\Delta w_{i},\qquad(14)$$
where $\Delta w_{i}\sim\sqrt{\Delta t}\,\mathcal{N}(0,\mathbb{1})$ is an increment of Brownian noise, sampled independently from a normal distribution. The Girsanov factor $\mathrm{g}$ is computed as a running cost:
$$\ln\mathrm{g}\approx-\frac{\mu}{2D}\big(U(x)-U(\xi_{s_{0}})\big)+\sum_{i=0}^{n-1}\Delta t\,\Big(\frac{\mu}{2}\,(\partial^{2}U)-\frac{\mu^{2}}{4D}\,\|\partial U\|^{2}\Big)(\xi_{s_{i}}).\qquad(15)$$
This computation is summarized in Algorithm 1. In Figure 1, we integrate an example Fokker–Planck equation driven by a time-independent mechanical potential in two ways. The results of Algorithm 1 are compared to the proximal gradient descent method of [42]. In this method, the solution is found via gradient descent on the space of probability distributions by solving a proximal fixed-point recursion at each time step. Both methods discretize the time interval, but neither requires spatial discretization. In our implementation, the Monte Carlo method performs significantly faster.
Algorithm 1 Integrating the Fokker–Planck equation using the Girsanov theorem

Initialize the evaluation point $x$ and the time grid $t_{0}=s_{0}<\dots<s_{n}=t_{\mathrm{f}}$ with step $\Delta t$
Initialize $\xi_{s_{n}}\leftarrow x$
Initialize $\ln\mathrm{g}\leftarrow 0$
for $i$ in $n-1,\dots,0$ do
  Sample Brownian noise: $\Delta w_{i}\sim\sqrt{\Delta t}\,\mathcal{N}(0,\mathbb{1})$
  Evolve one step of (9): $\xi_{s_{i}}\leftarrow\xi_{s_{i+1}}-\sqrt{2D}\,\Delta w_{i}$
  Accumulate the running cost in (15): $\ln\mathrm{g}\leftarrow\ln\mathrm{g}+\Delta t\,\big(\tfrac{\mu}{2}(\partial^{2}U)-\tfrac{\mu^{2}}{4D}\|\partial U\|^{2}\big)(\xi_{s_{i}})$
end for
Add the boundary terms: $\ln\mathrm{g}\leftarrow\ln\mathrm{g}-\tfrac{\mu}{2D}\big(U(x)-U(\xi_{s_{0}})\big)$
Return $\mathrm{g}\,f_{t_{0}}(\xi_{s_{0}})$, to be averaged over independent trajectories
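To make the procedure concrete, the following sketch implements Algorithm 1 for a one-dimensional time-independent potential. The double-well potential, the parameter values and the function names are our own illustrative choices, not the setup behind Figure 1.

```python
import numpy as np

def U(x):          # illustrative double-well potential
    return x**4 / 4 - x**2 / 2

def grad_U(x):
    return x**3 - x

def lap_U(x):
    return 3.0 * x**2 - 1.0

def fp_density(x, f0, t0=0.0, tf=1.0, mu=1.0, D=1.0,
               n_steps=200, n_paths=100_000):
    """Monte Carlo estimate of f_tf(x) via the Girsanov representation (13)."""
    dt = (tf - t0) / n_steps
    xi = np.full(n_paths, float(x))   # all trajectories start at the evaluation point
    log_g = np.zeros(n_paths)
    for _ in range(n_steps):          # evolve (9) backward from tf to t0
        dw = np.sqrt(dt) * np.random.randn(n_paths)
        xi -= np.sqrt(2.0 * D) * dw
        # running-cost part of the Girsanov weight (time-independent U)
        log_g += dt * (0.5 * mu * lap_U(xi)
                       - mu**2 / (4.0 * D) * grad_U(xi)**2)
    log_g += -mu / (2.0 * D) * (U(x) - U(xi))   # boundary terms of (15)
    return np.mean(np.exp(log_g) * f0(xi))

# usage: propagate a Gaussian initial density
f0 = lambda y: np.exp(-y**2 / 2) / np.sqrt(2 * np.pi)
print(fp_density(0.5, f0))
```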
2.2.4. Numerical Example: “Föllmer’s Drift”
In this section, we apply (7) to a non-trivial example of gradient drift. Specifically, we consider the Föllmer drift, i.e., the solution of the dynamic Schrödinger bridge that steers the system between assigned boundary conditions while minimizing the Kullback–Leibler divergence from a free diffusion [10], given by
$$\mathrm{K}\big(\mathrm{P}\,\|\,\mathrm{Q}\big)=\mathrm{E}_{\mathrm{P}}\,\ln\frac{\mathrm{d}\mathrm{P}}{\mathrm{d}\mathrm{Q}},\qquad(16)$$
in a finite time interval $[t_{0},t_{\mathrm{f}}]$. The boundary conditions are assigned on the initial and final distributions of the position, denoted $p_{0}$ for the initial at time $t_{0}$ and $p_{\mathrm{f}}$ for the final at time $t_{\mathrm{f}}$. We consider boundary conditions of the form
$$f_{t_{0}}(x)=p_{0}(x),\qquad(17a)$$
$$f_{t_{\mathrm{f}}}(x)=p_{\mathrm{f}}(x).\qquad(17b)$$
Föllmer drifts are relevant to machine learning applications, see, e.g., [44,51]. We refer to [12] or [25] and references therein for further details on the mathematics and physics backgrounds, respectively.
We summarize how to construct the Föllmer drift by solving a Schrödinger bridge problem using the iterative method of [41]. In doing so, we also obtain the solution of the Fokker–Planck Equation (3) that we use for comparison with the numerical expression provided by (7). The Schrödinger bridge problem is formulated as the minimization of a Bismut–Pontryagin functional [25]. In this framework, we find that the intermediate density $f$ and a value function $V$ imposing the boundary conditions satisfy the coupled partial differential equations
$$\partial_{t}f_{t}=\partial\cdot\big(\mu\,(\partial U_{t})\,f_{t}+D\,\partial f_{t}\big),\qquad(18a)$$
$$\partial_{t}V_{t}-\mu\,(\partial U_{t})\cdot(\partial V_{t})+D\,\partial^{2}V_{t}=\frac{\mu^{2}}{4\,D}\,\|\partial U_{t}\|^{2},\qquad(18b)$$
along with the stationarity condition
$$\partial U_{t}=-\frac{2\,D}{\mu}\,\partial V_{t}.\qquad(19)$$
We identify (18a) as the Fokker–Planck equation and (18b) as the Hamilton–Jacobi–Bellman equation, which is discussed in later sections. For a known $U$, we can apply the Girsanov theorem to integrate (18a). We find a reference solution of the system (18a) and (18b) using an adaptation of the method of [41], which is briefly described below. The transformation
$$f_{t}(x)=\varphi_{t}(x)\,\hat{\varphi}_{t}(x),\qquad V_{t}(x)=\ln\varphi_{t}(x),\qquad(20)$$
applied to (18a) and (18b) yields the linear coupled equations
$$\partial_{t}\hat{\varphi}_{t}=D\,\partial^{2}\hat{\varphi}_{t},\qquad(21a)$$
$$\partial_{t}\varphi_{t}+D\,\partial^{2}\varphi_{t}=0,\qquad(21b)$$
with boundary conditions
$$\varphi_{t_{0}}(x)\,\hat{\varphi}_{t_{0}}(x)=p_{0}(x),\qquad\varphi_{t_{\mathrm{f}}}(x)\,\hat{\varphi}_{t_{\mathrm{f}}}(x)=p_{\mathrm{f}}(x).$$
We make an initial guess for $\hat{\varphi}_{t_{0}}$, which we use to integrate (21a), recompute the boundary conditions and then integrate (21b), recomputing $\hat{\varphi}_{t_{0}}$, and repeat this process until convergence: see [41] or Section 8.2 of [25] for a more detailed treatment. We reconstruct the value function and intermediate densities using (20).
With these results, we have a numerical approximation of the drift which maps an assigned initial probability density into an assigned final density while minimizing the Kullback–Leibler divergence on the interval $[t_{0},t_{\mathrm{f}}]$. We use this drift to compute the solution of the Fokker–Planck Equation (18a) via Algorithm 1 and compare it to the density resulting from the iteration method of [41] in Figure 2.
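For concreteness, the following sketch shows the skeleton of the boundary-recomputation iteration on a one-dimensional grid. The grid-based heat-kernel convolution `heat_propagate` is a stand-in of our own choosing for whatever integrator one prefers for the linear Equations (21a) and (21b); in practice, the divisions should also be guarded against vanishing densities in the tails.

```python
import numpy as np

def heat_propagate(h, x, D, dt):
    """Convolve h with the heat kernel exp(-y^2/(4 D dt)) / sqrt(4 pi D dt)."""
    dx = x[1] - x[0]
    y = x - x[len(x) // 2]
    kernel = np.exp(-y**2 / (4 * D * dt)) / np.sqrt(4 * np.pi * D * dt)
    return np.convolve(h, kernel, mode="same") * dx

def schroedinger_bridge(p0, pf, x, D=1.0, T=1.0, n_iter=50):
    phi_hat_0 = p0.copy()                                # initial guess at t0
    for _ in range(n_iter):
        phi_hat_f = heat_propagate(phi_hat_0, x, D, T)   # integrate (21a) forward
        phi_f = pf / phi_hat_f                           # recompute boundary at tf
        phi_0 = heat_propagate(phi_f, x, D, T)           # integrate (21b) backward
        phi_hat_0 = p0 / phi_0                           # recompute boundary at t0
    return phi_0, phi_hat_0

# usage: bridge between two displaced Gaussians
x = np.linspace(-6, 6, 512)
p0 = np.exp(-(x + 2)**2 / 2) / np.sqrt(2 * np.pi)
pf = np.exp(-(x - 2)**2 / 2) / np.sqrt(2 * np.pi)
phi_0, phi_hat_0 = schroedinger_bridge(p0, pf, x)
```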
4. Bismut–Elworthy–Li Monte Carlo Representation of Gradients
Numerical integration of Schrödinger-bridge-type problems, in the overdamped [12,13,15,41] and underdamped [22,23,25,53] cases, requires the solution of a Hamilton–Jacobi–Bellman (also known as a dynamic programming) equation, specifying the optimal control potential. In the simplest overdamped setup, the mechanical force is given by (19). The function $V$ solves a Burgers-type equation. More generally, optimization problems often require computing gradients of scalar functions satisfying a non-homogeneous backward Kolmogorov equation in $[t_{0},t_{\mathrm{f}}]$ of the form
$$\big(\partial_{t}V_{t}\big)(x)+\big(b_{t}\cdot\partial V_{t}\big)(x)+\frac{1}{2}\,\big(\sigma_{t}\sigma_{t}^{\top}\big)^{ij}(x)\,\big(\partial_{i}\partial_{j}V_{t}\big)(x)=F_{t}(x),\qquad(34a)$$
$$V_{t_{\mathrm{f}}}(x)=v(x).\qquad(34b)$$
The left-hand side of (34a) is the mean forward derivative (see Chapter 11 of [54]) of $V$ along the paths of the $n$-dimensional system of Itô stochastic differential equations:
$$\mathrm{d}\zeta_{t}=b_{t}(\zeta_{t})\,\mathrm{d}t+\sigma_{t}(\zeta_{t})\,\mathrm{d}w_{t}.\qquad(35)$$
In (34a) and (35), we consider drift $b_{t}$ and volatility $\sigma_{t}$ fields of a more general form than in Section 2 and Section 3. This choice means that the following discussion is applicable to both overdamped and underdamped cases, as well as to more general situations, including non-linear problems [40]. In non-linear problems, the expressions of the solution of (34a) and (34b) and its gradient are iteratively computed in sequences of infinitesimal time horizons $[t,t+\delta t]$ to construct the solution of partial differential equations in which $b$, $\sigma$ and $F$ depend upon the unknown field $V$.
It is well known that Dynkin's formula (see, e.g., Chapter 6 of [2]) yields a Monte Carlo representation of the solution of (34a) and (34b):
$$V_{t}(x)=\mathrm{E}\Big(V_{t_{\mathrm{f}}}(\zeta_{t_{\mathrm{f}}})-\int_{t}^{t_{\mathrm{f}}}\mathrm{d}s\,F_{s}(\zeta_{s})\,\Big|\,\zeta_{t}=x\Big).\qquad(36)$$
Our goal is to find an analogous expression for the gradient of $V_{t}$. The Bismut–Elworthy–Li formula [34,35,55] accomplishes this task.
Remark 3. In what follows, to neaten mathematical formulae, we adopt the push-forward notation for the Jacobian matrix of a vector field. We refer to [56] for a geometrical justification of the notation. For any differentiable $a:\mathbb{R}^{n}\to\mathbb{R}^{n}$, we write
$$\big(a_{*}\big)^{i}{}_{j}(x)=e_{i}\cdot\big(\partial a\big)(x)\,e_{j},\qquad(37)$$
where $e_{i}$ and $e_{j}$ are, respectively, the $i$-th and $j$-th elements of the canonical basis of $\mathbb{R}^{n}$. Under our regularity assumptions, we regard the solution of (35) satisfying the condition $\zeta_{t}=x$ as the image of the stochastic flow [57] such that
$$\zeta_{s}=\phi_{s,t}(x),\qquad s\geq t,$$
and omit reference to the initial data on the left-hand side when no ambiguity arises. According to (37), we denote the cocycle obtained by differentiating the flow with respect to its argument as $J_{s,t}(x)=(\phi_{s,t})_{*}(x)$, implying that $J_{t,t}(x)=\mathbb{1}$. By definition, $J$ enjoys the cocycle property [3], meaning that
$$J_{s,t}(x)=J_{s,r}\big(\phi_{r,t}(x)\big)\,J_{r,t}(x),\qquad t\leq r\leq s.$$

Here, we present a heuristic, physics-style derivation of the formula based on Malliavin's stochastic variational calculus [58], which draws from the mathematically more rigorous exposition in [36] and is close to the original treatment in [34]. To this end, we observe that if $e_{i}$ is the $i$-th element of the canonical basis of $\mathbb{R}^{n}$, then differentiating (36) yields
$$e_{i}\cdot\big(\partial V_{t}\big)(x)=\mathrm{E}\Big(\big(\partial V_{t_{\mathrm{f}}}\big)(\zeta_{t_{\mathrm{f}}})\cdot J_{t_{\mathrm{f}},t}(x)\,e_{i}-\int_{t}^{t_{\mathrm{f}}}\mathrm{d}s\,\big(\partial F_{s}\big)(\zeta_{s})\cdot J_{s,t}(x)\,e_{i}\,\Big|\,\zeta_{t}=x\Big),\qquad(38)$$
where $J_{s,t}$ denotes the matrix-valued process obtained by varying (35) with respect to its initial datum. In other words, if we suppose
$$\mathrm{d}J_{s,t}=\big(b_{s*}\big)(\zeta_{s})\,J_{s,t}\,\mathrm{d}s+\sum_{k=1}^{n}\big(\sigma^{k}_{s*}\big)(\zeta_{s})\,J_{s,t}\,\mathrm{d}w^{k}_{s},\qquad J_{t,t}=\mathbb{1},$$
where $\sigma^{k}_{s}$ denotes the $k$-th column of the volatility matrix, then
$$J_{s,t}(x)\,e_{i}=\lim_{\varepsilon\to 0}\frac{\phi_{s,t}(x+\varepsilon\,e_{i})-\phi_{s,t}(x)}{\varepsilon}.$$
The identity (38) allows us to derive the Bismut–Elworthy–Li formula from Malliavin's integration by parts formula.
4.1. Integration by Parts Formula
Let us consider the equation
$$\mathrm{d}\zeta^{(\varepsilon)}_{s}=b_{s}(\zeta^{(\varepsilon)}_{s})\,\mathrm{d}s+\sigma_{s}(\zeta^{(\varepsilon)}_{s})\,\big(\mathrm{d}w_{s}+\varepsilon\,\dot{h}_{s}\,\mathrm{d}s\big),\qquad\zeta^{(\varepsilon)}_{t}=x.\qquad(39)$$
We assume $h$ to be a differentiable process, although rigorous constructions of the integration by parts formula, see, e.g., [58], weaken this assumption to processes of bounded variation (see Chapter 1 of [2]). Differentiating at $\varepsilon=0$ yields the variational equation
$$\mathrm{d}\nu_{s}=\big(b_{s*}\big)(\zeta_{s})\,\nu_{s}\,\mathrm{d}s+\sum_{k=1}^{n}\big(\sigma^{k}_{s*}\big)(\zeta_{s})\,\nu_{s}\,\mathrm{d}w^{k}_{s}+\sigma_{s}(\zeta_{s})\,\dot{h}_{s}\,\mathrm{d}s,\qquad\nu_{t}=0.\qquad(40)$$
We can always write the solution of this latter equation in terms of the push-forward of the flow of (35):
$$\nu_{s}=\int_{t}^{s}\mathrm{d}r\,J_{s,r}(\zeta_{r})\,\sigma_{r}(\zeta_{r})\,\dot{h}_{r}.\qquad(41)$$
Therefore, for sufficiently small $\varepsilon$,
$$\zeta^{(\varepsilon)}_{s}=\zeta_{s}+\varepsilon\int_{t}^{s}\mathrm{d}r\,J_{s,r}(\zeta_{r})\,\sigma_{r}(\zeta_{r})\,\dot{h}_{r}+\text{h.o.t.}\qquad(42)$$
allows us to regard the solution of (39) as a functional of the solution of (35) (h.o.t. stands for higher order terms). The conclusion is that we can compute the expectation value of any integrable function $g$ of a solution of (39) by expressing it as a function of the solution of (35) via (42) and then averaging with respect to the measure $\mathrm{P}$ generated by (35):
$$\mathrm{E}^{(\varepsilon)}\big(g(\zeta^{(\varepsilon)}_{t_{\mathrm{f}}})\big)=\mathrm{E}_{\mathrm{P}}\Big(g\Big(\zeta_{t_{\mathrm{f}}}+\varepsilon\int_{t}^{t_{\mathrm{f}}}\mathrm{d}r\,J_{t_{\mathrm{f}},r}(\zeta_{r})\,\sigma_{r}(\zeta_{r})\,\dot{h}_{r}+\text{h.o.t.}\Big)\Big).\qquad(43)$$
A second connection comes from Girsanov's change-of-measure formula. Namely, if $\mathrm{P}^{(\varepsilon)}$ is the path measure generated by (39), then, for any test function $g$, we obtain the identity
$$\mathrm{E}^{(\varepsilon)}\big(g(\zeta^{(\varepsilon)}_{t_{\mathrm{f}}})\big)=\mathrm{E}_{\mathrm{P}}\Big(g(\zeta_{t_{\mathrm{f}}})\,\exp\Big(\varepsilon\int_{t}^{t_{\mathrm{f}}}\dot{h}_{s}\cdot\mathrm{d}w_{s}-\frac{\varepsilon^{2}}{2}\int_{t}^{t_{\mathrm{f}}}\mathrm{d}s\,\|\dot{h}_{s}\|^{2}\Big)\Big).\qquad(44)$$
If $g$ is also sufficiently regular, upon differentiating (43) and (44) at $\varepsilon=0$, we arrive at Malliavin's integration by parts formula:
$$\mathrm{E}_{\mathrm{P}}\Big(\big(\partial g\big)(\zeta_{t_{\mathrm{f}}})\cdot\int_{t}^{t_{\mathrm{f}}}\mathrm{d}r\,J_{t_{\mathrm{f}},r}(\zeta_{r})\,\sigma_{r}(\zeta_{r})\,\dot{h}_{r}\Big)=\mathrm{E}_{\mathrm{P}}\Big(g(\zeta_{t_{\mathrm{f}}})\int_{t}^{t_{\mathrm{f}}}\dot{h}_{s}\cdot\mathrm{d}w_{s}\Big).\qquad(45)$$
4.2. Application to Non-Degenerate Diffusion
We set
$$\dot{h}_{r}=\frac{1}{t_{\mathrm{f}}-t}\,\sigma_{r}^{-1}(\zeta_{r})\,J_{r,t}(x)\,e_{i},$$
where $e_{i}$ is the $i$-th element of the canonical basis of $\mathbb{R}^{n}$. This is legitimate because, under standard regularity assumptions, $h$ is a process of finite variation. Upon inserting into (41) and using the cocycle property, we obtain
$$\nu_{s}=\frac{s-t}{t_{\mathrm{f}}-t}\,J_{s,t}(x)\,e_{i}.$$
The integration by parts formula becomes
$$\mathrm{E}\Big(\big(\partial g\big)(\zeta_{t_{\mathrm{f}}})\cdot J_{t_{\mathrm{f}},t}(x)\,e_{i}\Big)=\mathrm{E}\Big(\frac{g(\zeta_{t_{\mathrm{f}}})}{t_{\mathrm{f}}-t}\int_{t}^{t_{\mathrm{f}}}\big(\sigma_{s}^{-1}(\zeta_{s})\,J_{s,t}(x)\,e_{i}\big)\cdot\mathrm{d}w_{s}\Big).$$
The identity holds for arbitrary $t_{\mathrm{f}}>t$. Hence, we can apply it to (38) in order to derive the expression of the gradient of the solution of (34a) and (34b):
$$e_{i}\cdot\big(\partial V_{t}\big)(x)=\mathrm{E}\Big(\frac{V_{t_{\mathrm{f}}}(\zeta_{t_{\mathrm{f}}})}{t_{\mathrm{f}}-t}\int_{t}^{t_{\mathrm{f}}}\big(\sigma_{s}^{-1}(\zeta_{s})\,J_{s,t}(x)\,e_{i}\big)\cdot\mathrm{d}w_{s}-\int_{t}^{t_{\mathrm{f}}}\mathrm{d}s\,\frac{F_{s}(\zeta_{s})}{s-t}\int_{t}^{s}\big(\sigma_{r}^{-1}(\zeta_{r})\,J_{r,t}(x)\,e_{i}\big)\cdot\mathrm{d}w_{r}\,\Big|\,\zeta_{t}=x\Big),\qquad(46)$$
provided the volatility field $\sigma$ is always non-singular.
Application to the Transition Probability Density
It is worth noticing the following consequence of (46) when $F$ vanishes. In such a case, (46) reduces to
$$e_{i}\cdot\big(\partial V_{t}\big)(x)=\mathrm{E}\Big(\frac{V_{t_{\mathrm{f}}}(\zeta_{t_{\mathrm{f}}})}{t_{\mathrm{f}}-t}\int_{t}^{t_{\mathrm{f}}}\big(\sigma_{s}^{-1}(\zeta_{s})\,J_{s,t}(x)\,e_{i}\big)\cdot\mathrm{d}w_{s}\,\Big|\,\zeta_{t}=x\Big).$$
As the identity must hold true for any terminal datum $V_{t_{\mathrm{f}}}$, we can also write
$$e_{i}\cdot\big(\partial_{x}p_{t_{\mathrm{f}}|t}\big)(y\,|\,x)=\mathrm{E}\Big(\frac{\delta^{(n)}(y-\zeta_{t_{\mathrm{f}}})}{t_{\mathrm{f}}-t}\int_{t}^{t_{\mathrm{f}}}\big(\sigma_{s}^{-1}(\zeta_{s})\,J_{s,t}(x)\,e_{i}\big)\cdot\mathrm{d}w_{s}\,\Big|\,\zeta_{t}=x\Big).\qquad(47)$$
A result by Molchanov, Section 5 of [59], allows us to express (47) in terms of an expectation value with respect to a reciprocal process, see, e.g., [60]. Namely, given a Markov process in $[t_{0},t_{\mathrm{f}}]$, we can use it to construct a reciprocal process, i.e., a process conditioned at both ends of the time horizon, from the relations
$$p_{x,y}(x_{1},s_{1};\dots;x_{m},s_{m})=\frac{p_{t_{\mathrm{f}}|s_{m}}(y\,|\,x_{m})\,\prod_{k=1}^{m}p_{s_{k}|s_{k-1}}(x_{k}\,|\,x_{k-1})}{p_{t_{\mathrm{f}}|t}(y\,|\,x)},\qquad t=s_{0}\leq s_{1}\leq\dots\leq s_{m}\leq t_{\mathrm{f}},\quad x_{0}=x.\qquad(48)$$
Upon contrasting (48) with (47), we thus arrive at Bismut's formula (page 78 of [34]) for the gradient of the transition probability density:
$$e_{i}\cdot\big(\partial_{x}\ln p_{t_{\mathrm{f}}|t}\big)(y\,|\,x)=\mathrm{E}_{x,y}\Big(\frac{1}{t_{\mathrm{f}}-t}\int_{t}^{t_{\mathrm{f}}}\big(\sigma_{s}^{-1}(\zeta_{s})\,J_{s,t}(x)\,e_{i}\big)\cdot\mathrm{d}w_{s}\Big).\qquad(49)$$
The subscript $x,y$ here means that we construct all finite-dimensional approximations to the reciprocal process from the transition probability density of (35) according to (48).
Unfortunately, (49) does not directly provide a Monte Carlo representation of the score function, because the derivative acts on the variable expressing the condition. It is, however, possible to use similar ideas, together with those of the previous sections, to obtain a Monte Carlo representation of the score function.
4.3. Analytical Example
It is worth illustrating the use of Dynkin's and the Bismut–Elworthy–Li formulas in a case where all calculations can be performed explicitly. To this end, let us consider the scalar linear stochastic differential equation
$$\mathrm{d}\zeta_{s}=\sigma\,\zeta_{s}\,\mathrm{d}w_{s},\qquad\zeta_{t}=x,$$
whose solution is simply
$$\zeta_{s}=x\,\exp\Big(\sigma\,(w_{s}-w_{t})-\frac{\sigma^{2}\,(s-t)}{2}\Big).$$
Let us consider the partial differential equation
$$\big(\partial_{t}V_{t}\big)(x)+\frac{\sigma^{2}x^{2}}{2}\,\big(\partial^{2}V_{t}\big)(x)=0,\qquad V_{t_{\mathrm{f}}}(x)=x^{2}.$$
It is straightforward to verify that at any time $t\leq t_{\mathrm{f}}$
$$V_{t}(x)=x^{2}\,e^{\sigma^{2}(t_{\mathrm{f}}-t)}.$$
Upon applying Dynkin's formula (36), we verify that
$$\mathrm{E}\big(\zeta_{t_{\mathrm{f}}}^{2}\,\big|\,\zeta_{t}=x\big)=x^{2}\,\mathrm{E}\Big(e^{2\sigma(w_{t_{\mathrm{f}}}-w_{t})-\sigma^{2}(t_{\mathrm{f}}-t)}\Big)=x^{2}\,e^{\sigma^{2}(t_{\mathrm{f}}-t)}=V_{t}(x).$$
Next, we wish to apply Bismut–Elworthy–Li to recover
$$\big(\partial V_{t}\big)(x)=2\,x\,e^{\sigma^{2}(t_{\mathrm{f}}-t)}.$$
To this end, we determine the cocycle solution of the linearized dynamics. The cocycle equation is
$$\mathrm{d}J_{s,t}=\sigma\,J_{s,t}\,\mathrm{d}w_{s},\qquad J_{t,t}=1,$$
from where we obtain the unique solution
$$J_{s,t}=\exp\Big(\sigma\,(w_{s}-w_{t})-\frac{\sigma^{2}\,(s-t)}{2}\Big)=\frac{\zeta_{s}}{x}.$$
To evaluate the Bismut–Elworthy–Li formula, we also need the inverse of the volatility matrix, which is
$$\sigma^{-1}(\zeta_{s})=\frac{1}{\sigma\,\zeta_{s}}.$$
We thus obtain
$$\big(\partial V_{t}\big)(x)=\mathrm{E}\Big(\frac{\zeta_{t_{\mathrm{f}}}^{2}}{t_{\mathrm{f}}-t}\int_{t}^{t_{\mathrm{f}}}\frac{J_{s,t}}{\sigma\,\zeta_{s}}\,\mathrm{d}w_{s}\,\Big|\,\zeta_{t}=x\Big)=\mathrm{E}\Big(\frac{\zeta_{t_{\mathrm{f}}}^{2}\,(w_{t_{\mathrm{f}}}-w_{t})}{\sigma\,x\,(t_{\mathrm{f}}-t)}\,\Big|\,\zeta_{t}=x\Big).$$
Using standard properties of stochastic integrals [2], we recover the expected result:
$$\big(\partial V_{t}\big)(x)=\frac{x^{2}\,e^{-\sigma^{2}(t_{\mathrm{f}}-t)}\,\mathrm{E}\big(e^{2\sigma(w_{t_{\mathrm{f}}}-w_{t})}\,(w_{t_{\mathrm{f}}}-w_{t})\big)}{\sigma\,x\,(t_{\mathrm{f}}-t)}=2\,x\,e^{\sigma^{2}(t_{\mathrm{f}}-t)}.$$
4.4. Numerical Example
In this section, we apply the Bismut–Elworthy–Li formula to compute the gradient of the value function in the optimal control problem minimizing the Kullback–Leibler divergence (16) in the overdamped dynamics, i.e., the gradient of the solution of (18b). This is calculated as a numerical average over sampled trajectories of the diffusion (1) associated with (3). We use the same approximation of the optimal control potential $U$ as in the Fokker–Planck example of Section 2.2.4. We find
$$b_{t}=-\,\mu\,\partial U_{t},\qquad\sigma=\sqrt{2\,D}\,\mathbb{1},\qquad F_{t}=\frac{\mu^{2}}{4\,D}\,\|\partial U_{t}\|^{2}.$$
Hence, (46) becomes
$$\big(\partial V_{t}\big)(x)=\mathrm{E}\Big(\frac{V_{t_{\mathrm{f}}}(\zeta_{t_{\mathrm{f}}})}{\sqrt{2\,D}\,(t_{\mathrm{f}}-t)}\int_{t}^{t_{\mathrm{f}}}J_{s,t}^{\top}\,\mathrm{d}w_{s}-\int_{t}^{t_{\mathrm{f}}}\mathrm{d}s\,\frac{F_{s}(\zeta_{s})}{\sqrt{2\,D}\,(s-t)}\int_{t}^{s}J_{r,t}^{\top}\,\mathrm{d}w_{r}\,\Big|\,\zeta_{t}=x\Big).$$
We repeatedly sample trajectories of the process using the Euler–Maruyama discretization scheme and compute the integrals as running costs over each trajectory, finally taking a numerical expectation. The calculation is summarized in Algorithm 3 and the numerical results are shown in Figure 5.
From the physics point of view, note that we can identify the motility constant with the ratio $\tau/m$ of a relaxation time over a mass, for consistency with the underdamped equations.
Algorithm 3 Monte Carlo integration for the gradient of the value function

Initialize the evaluation point $x$ and the time grid $t=s_{0}<\dots<s_{n}=t_{\mathrm{f}}$ with step $\Delta t$
Initialize $\zeta_{s_{0}}\leftarrow x$
Initialize the cocycle $J\leftarrow\mathbb{1}$, the stochastic integral $I\leftarrow 0$ and the source accumulator $G\leftarrow 0$
Initialize drift $b_{t}=-\mu\,\partial U_{t}$
for $i$ in $0,\dots,n-1$ do
  Sample Brownian noise: $\Delta w_{i}\sim\sqrt{\Delta t}\,\mathcal{N}(0,\mathbb{1})$
  Compute the BEL weights: if $i>0$ then $G\leftarrow G+\Delta t\,F_{s_{i}}(\zeta_{s_{i}})\,I/(s_{i}-s_{0})$ end if; $I\leftarrow I+J^{\top}\Delta w_{i}/\sqrt{2D}$
  Evolve one step of the dynamics: $\zeta_{s_{i+1}}\leftarrow\zeta_{s_{i}}+b_{s_{i}}(\zeta_{s_{i}})\,\Delta t+\sqrt{2D}\,\Delta w_{i}$
  Update the cocycle: $J\leftarrow J-\mu\,(\partial\partial U_{s_{i}})(\zeta_{s_{i}})\,J\,\Delta t$
end for
Return $V_{t_{\mathrm{f}}}(\zeta_{s_{n}})\,I/(t_{\mathrm{f}}-t)-G$, to be averaged over independent trajectories
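A minimal sketch of Algorithm 3 in one dimension may look as follows. The helper callables `grad_U`, `hess_U` and `V_tf`, as well as the quadratic test case at the bottom, are illustrative assumptions of ours, not the drift and terminal condition used for Figure 5.

```python
import numpy as np

def grad_V(x, t, tf, grad_U, hess_U, V_tf, mu=1.0, D=1.0,
           n_steps=200, n_paths=100_000, rng=None):
    """Monte Carlo estimate of dV_t/dx via the Bismut-Elworthy-Li formula."""
    rng = rng or np.random.default_rng()
    dt = (tf - t) / n_steps
    zeta = np.full(n_paths, float(x))
    J = np.ones(n_paths)        # cocycle of the linearized dynamics
    I = np.zeros(n_paths)       # running BEL stochastic integral
    G = np.zeros(n_paths)       # running source (F) contribution
    for i in range(n_steps):
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        if i > 0:               # F-term weight; i > 0 guards the denominator
            F = mu**2 / (4 * D) * grad_U(zeta)**2
            G += dt * F * I / (i * dt)
        I += J * dw / np.sqrt(2 * D)
        Hz = hess_U(zeta)       # evaluate at the pre-point
        zeta += -mu * grad_U(zeta) * dt + np.sqrt(2 * D) * dw
        J += -mu * Hz * J * dt  # cocycle update
    return np.mean(V_tf(zeta) * I / (tf - t) - G)

# illustration with U(x) = x**2 / 2 and terminal datum V_tf(x) = x**2
est = grad_V(0.3, 0.0, 1.0, lambda x: x, lambda x: np.ones_like(x),
             lambda x: x**2)
print(est)
```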
6. Application to Machine Learning
In this section, we return to the overdamped dynamics and demonstrate an application of the numerical methods discussed above. We present a prototype example for the optimal control problem in the overdamped dynamics of minimizing the Kullback–Leibler divergence (16). Inspired by the seminal works [62,63], we model the optimal control protocol by a neural network and use gradient descent to iteratively update it based on the stationarity condition (19).
As before, we formulate the problem in terms of a Bismut–Pontryagin cost functional. Additionally, we enforce the assigned boundary conditions (initial and final conditions on the density of the form (17a) and (17b)) through a Lagrange multiplier $\lambda$. This gives
$$\mathcal{A}=\int_{t_{0}}^{t_{\mathrm{f}}}\mathrm{d}t\int\mathrm{d}^{d}x\;f_{t}\,\frac{\mu^{2}\,\|\partial U_{t}\|^{2}}{4\,D}+\int_{t_{0}}^{t_{\mathrm{f}}}\mathrm{d}t\int\mathrm{d}^{d}x\;V_{t}\,\Big(\partial_{t}f_{t}-\partial\cdot\big(\mu\,(\partial U_{t})\,f_{t}+D\,\partial f_{t}\big)\Big)+\int\mathrm{d}^{d}x\;\lambda\,\big(f_{t_{\mathrm{f}}}-p_{\mathrm{f}}\big).$$
Taking the stationary variation with respect to the density $f$, control protocol $U$ and value function $V$ yields the coupled partial differential Equations (18a) and (18b) and the stationarity condition (19). We identify
$$V_{t_{\mathrm{f}}}=-\,\lambda$$
and the following update rule:
$$\lambda^{(k+1)}=\lambda^{(k)}+\alpha_{\lambda}\,\big(f^{(k)}_{t_{\mathrm{f}}}-p_{\mathrm{f}}\big),\qquad(69)$$
chosen in this way to preserve the integrability conditions of the value function. The stationarity condition gives an update rule for the drift of the control protocol:
$$\partial U^{(k+1)}_{t}=\partial U^{(k)}_{t}-\alpha_{U}\,\Big(\partial U^{(k)}_{t}+\frac{2\,D}{\mu}\,\partial V^{(k)}_{t}\Big).\qquad(70)$$
The parameters $\alpha_{\lambda},\alpha_{U}$ control the step sizes of the gradient descent, known as learning rates. The update for the Lagrange multiplier is a gradient ascent rather than a descent [64].
The right-hand sides of both (69) and (70) can be computed using the Monte Carlo integration techniques discussed in this article. With an appropriate parametrization of the gradient of the control protocol and the Lagrange multiplier $\lambda$, the method could therefore scale to high dimensions. In this prototype example, we use a polynomial regression for fitting $\lambda$ and a neural network for the gradient of the control protocol. The polynomial regression could be replaced with any suitable parametrization, in particular with a second neural network.
The gradient of the optimal control protocol is modelled by a neural network, denoted by $\mathrm{NN}_{\theta}$. We use a feed-forward neural network: connected layers representing affine transformations, with non-linear functions (known as activation functions) between them. The neural network has a set of parameters (weights and biases) associated with the layers, which we denote by $\theta$. The network takes the time $t$ and space coordinates as input. Using a neural network allows for evaluating the optimal control protocol at new space coordinates without using interpolation, meaning that it can easily be used as the drift in the computation of the density and value functions using Algorithms 1 and 3.
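A minimal PyTorch sketch of such a network is given below; the number of layers, their widths and the tanh activation are our own choices, as the text does not prescribe an architecture.

```python
import torch
import torch.nn as nn

class DriftNet(nn.Module):
    """Feed-forward network mapping (t, x) to an approximation of dU/dx."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.Tanh(),      # input: time and position
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),                 # output: gradient of the protocol
        )

    def forward(self, t, x):
        return self.net(torch.stack([t, x], dim=-1)).squeeze(-1)

# usage: evaluate the drift on a batch of space-time points
drift = DriftNet()
t = torch.zeros(128)
x = torch.linspace(-3, 3, 128)
print(drift(t, x).shape)   # torch.Size([128])
```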
The training process can be summarized as follows. Firstly, the function $\lambda$ and the network $\mathrm{NN}_{\theta}$ with a set of parameters $\theta$ are initialized. We use these to find the final density $f_{t_{\mathrm{f}}}$ under this drift with Algorithm 1. The Lagrange multiplier is updated using (69). The new $\lambda$ is used as the terminal condition of the value function. We then use Algorithm 3 to compute the gradient of the value function, $\partial V$, using the current drift and terminal condition. The neural network parameters $\theta$ are updated so that the new drift satisfies (70). Under the updated drift, the final density is recomputed and the process is repeated until convergence. The whole iteration is summarized in Algorithm 5.
The results of Algorithm 5 are illustrated in Figure 7. We show the final density in panel (a). Panels (b)–(g) show the approximation of the gradient of the control protocol by the trained neural network.
Algorithm 5 Learning an Optimal Control Protocol by Gradient Descent

Initialize a neural network $\mathrm{NN}_{\theta}$ with parameters $\theta$
Initialize $\lambda$ as a polynomial with coefficients zero
Initialize learning rates $\alpha_{\lambda},\alpha_{U}$
while the iteration count is below max iters do
  Initialize a batch of $K$ points $(t_{k},x_{k})$
  Compute $f_{t_{\mathrm{f}}}$ using Algorithm 1 with $\mathrm{NN}_{\theta}$ as drift
  Update $\lambda\leftarrow\lambda+\alpha_{\lambda}\,(f_{t_{\mathrm{f}}}-p_{\mathrm{f}})$
  Set $V_{t_{\mathrm{f}}}\leftarrow-\lambda$
  for $k=1,\dots,K$ do
    Compute $(\partial V)(t_{k},x_{k})$ using Algorithm 3 with $\mathrm{NN}_{\theta}$ as drift
    Update $\theta$ such that the residual of (70) is minimized
  end for
end while
return Approximation $\mathrm{NN}_{\theta}$ of the gradient of the optimal control protocol
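In code, the body of Algorithm 5 can be organized as in the following schematic loop. Here, `fp_density`, `grad_V`, `sample_batch`, `x_grid`, `p_f`, `D`, `mu`, `K` and `max_iters` are hypothetical helpers and parameters standing in for the components described above; `DriftNet` is the network sketched earlier.

```python
import torch

drift = DriftNet()
opt = torch.optim.Adam(drift.parameters(), lr=1e-3)
alpha_lam = 0.1
lam = torch.zeros_like(x_grid)                  # Lagrange multiplier on a grid

for it in range(max_iters):
    f_tf = fp_density(x_grid, drift)            # Algorithm 1 under current drift
    lam = lam + alpha_lam * (f_tf - p_f)        # dual ascent step (69)
    V_tf = -lam                                 # terminal condition of V
    t_b, x_b = sample_batch(K)                  # random space-time batch
    target = grad_V(t_b, x_b, drift, V_tf)      # Algorithm 3: gradient of V
    # descent step on the residual of the stationarity condition (70)
    loss = ((drift(t_b, x_b) + 2 * D / mu * target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```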
7. Conclusions
In this article, we discuss two integration methods for partial differential equations that frequently appear in optimal control problems. First, we show how the Girsanov theorem allows a Fokker–Planck equation driven by a mechanical potential to be integrated by taking a numerical expectation over Monte Carlo trajectories of an auxiliary stochastic process. This method applies whether the auxiliary stochastic process is non-degenerate or degenerate. Secondly, we use the Bismut–Elworthy–Li formula to find expressions for the gradient of the value function satisfying a Hamilton–Jacobi–Bellman equation. We show this for both non-degenerate and degenerate diffusions.
The discussed numerical methods are supported by computational examples. We examine the dynamic Schrödinger bridge problem, i.e., the minimization of the Kullback–Leibler divergence from a free diffusion while satisfying boundary conditions on the density at the initial and final times. For the overdamped dynamics, our integration shows good agreement with the iterative approach of Caluya and Halder [41] in Figure 2 and Figure 5. In the underdamped case, we integrate the associated Fokker–Planck equation to support the consistency of the multiscale perturbative approach used in [25]. In particular, we compute an estimate of the evolution of the joint density function of the system state for this problem in Figure 3. We also verify the stationarity condition using the Bismut–Elworthy–Li formula for a degenerate diffusion in Figure 6. Finally, we demonstrate an application of both integration methods in a simple machine learning model in Figure 7.
The optimal control problem discussed here has many applications. One possibility is in machine learning, for instance in the development of diffusion models for image generation [44]. Here, we find an optimal steering protocol between a noise distribution (e.g., a Gaussian) and a target (e.g., an image) by minimizing the Kullback–Leibler divergence. Optimal control problems in the underdamped dynamics are particularly interesting. Underdamped dynamics take into account random thermal fluctuations, noise and the effects of inertia; hence, they are well suited to model non-equilibrium transitions at the nanoscale. Models of certain biological systems require considering more complex dynamics, for example, because of random external noise from the environment [30]. Such models then result in non-linear partial differential equations, making them difficult to integrate. While the implementation of machine learning to solve an optimal control problem used here is a prototype, it may be possible to extend it to a more general setting. Specifically, we have in mind transitions obeying underdamped dynamics and occurring at minimum entropy production, such as those considered in [53].