Article

Hybrid Neural Networks for Solving Fully Coupled, High-Dimensional Forward–Backward Stochastic Differential Equations

School of Mathematics and Statistics, Huazhong University of Science and Technology, Wuhan 430070, China
*
Author to whom correspondence should be addressed.
Mathematics 2024, 12(7), 1081; https://doi.org/10.3390/math12071081
Submission received: 21 February 2024 / Revised: 18 March 2024 / Accepted: 30 March 2024 / Published: 3 April 2024
(This article belongs to the Section Mathematics and Computer Science)

Abstract

The theory of forward–backward stochastic differential equations (FBSDEs) occupies an important position in stochastic analysis and in practical applications. However, progress on the numerical solution of FBSDEs, especially in high-dimensional cases, has been slow. The development of deep learning provides new ideas for solving them in high dimensions. In this paper, we focus on the fully coupled FBSDE. We design a neural network structure tailored to the characteristics of the equation and develop a hybrid BiGRU model for solving it. After discretizing the FBSDE, we introduce the time dimension based on the sequential nature of the resulting data. By considering the interactions between preceding and succeeding time steps, we construct the BiGRU hybrid model, which enables us to capture both long- and short-term dependencies and thus mitigate gradient vanishing and gradient explosion. Residual learning is introduced within the neural network at each time step, and the structure of the loss function is adjusted according to the properties of the equation. The resulting model solves fully coupled forward–backward stochastic differential equations effectively, avoids the effects of the dimensional catastrophe, gradient vanishing, and gradient explosion problems, and offers higher accuracy, stronger stability, and better model interpretability.

1. Introduction

The concepts of stochastic calculus and related theories were introduced in 1942 by the Japanese mathematician Itô, who later introduced stochastic differential equations (SDEs). The linear backward stochastic differential equation (BSDE) was first introduced in 1973 [1]; since then, the BSDE has long been used as an adjoint equation for stochastic optimal control problems, and the related theory has been continuously developed. In 1990, Pardoux and Peng studied general nonlinear BSDEs and proved the existence and uniqueness of solutions to such equations, laying the foundation for the development of the theory of forward–backward stochastic differential equations (FBSDEs) [2]. Kohlmann et al. [3] explored the relationship between BSDEs and stochastic control by interpreting BSDEs as certain stochastic optimization problems. The study of FBSDEs originates from the optimal control problem, and the main objective is to study how to reach a desired goal under given conditions. Peng [10] proposed the general stochastic maximum principle, which, under certain assumptions, defines a Hamiltonian function of the following form:
$$H(x, y, z) = \big\langle y, h(x, v) \big\rangle + \big\langle z, \mu(x, v) \big\rangle + L(x, v)$$
Here, the dual variables y_t, z_t and the state x_t of the system corresponding to the optimal control satisfy a stochastic Hamiltonian system of the following form:
$$dx_t = H_y(x_t, y_t, z_t)\,dt + H_z(x_t, y_t, z_t)\,dB_t,\qquad dy_t = -H_x(x_t, y_t, z_t)\,dt + z_t\,dB_t,\qquad x_0 = a,\quad y_T = h_x(x_T).$$
This stochastic system consists of a class of fully coupled forward–backward stochastic differential equations, and the study of its solvability is of great significance to stochastic optimal control theory. Antonelli [4] was the first to investigate the existence and uniqueness of solutions of FBSDEs; since then, Ma et al. and subsequent authors [5,6,7,8] have given detailed treatments of the existence and uniqueness of FBSDE solutions in different forms and dimensions. Zhen [9] proved the existence and uniqueness of solutions of fully coupled FBSDEs. Today, applications of FBSDE theory to stochastic optimal control [10,11], financial mathematics [12], partial differential equation theory [13,14], and other areas continue to deepen, and FBSDEs have become an important branch of stochastic analysis. However, explicit solutions of FBSDEs are usually unavailable; therefore, the study of numerical methods plays a significant role in the development of the related theory.
Nowadays, the main numerical solution methods for FBSDE are as follows:
(1) Using the relationship between the FBSDE and the corresponding PDE, i.e., the Feynman–Kac formula, a numerical solution of the PDE is found and, from it, a numerical solution of the FBSDE [5];
(2) Local linearization of the BSDE is combined with the Picard iteration method [15];
(3) The Brownian motion and the BSDE generator are approximated, the solution of the BSDE is thereby approximated, and the corresponding FBSDE is then solved [16];
(4) According to the structure of the FBSDE, the theories of stochastic analysis and scientific computing are combined, and difference or integral methods are used for the numerical solution [17,18].
The above methods are limited to low-dimensional FBSDEs; for high-dimensional problems, they inevitably encounter the "dimensional catastrophe" (curse of dimensionality) problem [19]; i.e., in high-dimensional cases, the computational complexity increases exponentially, which restricts the range of scenarios in which FBSDEs can be applied. In recent years, deep learning has been applied in different fields, providing new ideas for the numerical approximation of high-dimensional functions.
The development of deep learning provides new ideas for the numerical solution of such equations. Deep learning has been widely used in recent years in computer vision, natural language processing, speech recognition, and recommender systems. At the heart of deep learning is the neural network, a mathematical model consisting of multiple neurons, or nodes, interconnected by weights. The universal approximation theorem [20] shows that, given enough neurons, a neural network can approximate any continuous function to arbitrary accuracy. In 2017, Han et al. [21] proposed the DBSDE method for solving high-dimensional partial differential equations (PDEs) and backward stochastic differential equations (BSDEs). Ref. [22] extends the DBSDE method to high-dimensional FBSDEs by constructing three different network structures based on various feedback forms.
This paper focuses on applying neural networks to the numerical solution computation of high-dimensional fully coupled FBSDEs. The main ideas are as follows:
(1) Constructing neural networks to solve fully coupled, high-dimensional forward–backward stochastic differential equations to effectively avoid falling into the “dimensional catastrophe” problem;
(2) Determining an equivalent stochastic optimal control problem with the stochastic process X(·) as the feedback and the pair (Y(·), Z(·)) as the control; this involves constructing neural networks to estimate (Y(·), Z(·)) simultaneously and modifying the structure of the loss function so that it adapts effectively to different situations;
(3) Constructing the neural network according to the nature of the FBSDE: the time dimension is introduced, and the long- and short-term dependencies between sequence elements, as well as the influence of preceding and subsequent time steps, are taken into account. The hybrid BiGRU model constructed in this way reduces error, improves computational accuracy and model interpretability, and avoids problems such as gradient vanishing and gradient explosion;
(4) Adjusting the internal structure of the neural network involves using a new weight initialization method, considering the gradient relationship between adjacent time steps, adjusting the activation function types, using the residual learning network, and discussing the data normalization process. Through the above adjustments, the accuracy of the model as well as the convergence speed are further improved.
This article is structured as follows:
Section 2 briefly introduces the theory of forward–backward stochastic differential equations and neural networks, briefly reviews the deep learning approach to solving the numerical solution of the BSDE (DBSDE method), and designs a hybrid BiGRU model to solve the equations. Section 3 designs and conducts numerical experiments to compare the experimental results of different models. Section 4 evaluates the models based on the experimental results and theoretical derivations.

2. Materials and Methods

This section introduces the basic concepts needed for the article, including the forward–backward stochastic differential equation, deep learning, and neural networks. It provides a brief review of the DBSDE method. We then construct a hybrid BiGRU model based on this material and explain the rationale for the modeling.

2.1. Materials

2.1.1. Forward–Backward Stochastic Differential Equation

In this section, starting from the structure of the solution and the structure of the neural network, we study numerical solutions of the fully coupled forward–backward stochastic differential equation (FBSDE) in the high-dimensional case.
A fully coupled forward–backward stochastic differential equation defined on ( Ω , F , F , P ) has the following general form:
$$X_t = X_0 + \int_0^t b(s, X_s, Y_s, Z_s)\,ds + \int_0^t \sigma(s, X_s, Y_s, Z_s)\,dB_s,\qquad Y_t = \xi + \int_t^T f(s, X_s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dB_s$$
where t ∈ [0, T], X_0 ∈ F_0 denotes the initial condition of the forward stochastic differential equation (SDE), and ξ ∈ F_T represents the terminal condition of the backward stochastic differential equation (BSDE); b and σ denote the drift and diffusion coefficients of the SDE, and f denotes the generator of the BSDE. When b(·) and σ(·) do not depend on the solution (Y_t, Z_t) of the BSDE, the above FBSDE simplifies to the following:
$$X_t = X_0 + \int_0^t b(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dB_s,\qquad Y_t = \xi + \int_t^T f(s, X_s, Y_s, Z_s)\,ds - \int_t^T Z_s\,dB_s$$
where t ∈ [0, T]; in this case, the equation is an uncoupled FBSDE.
Assumption 1. 
(1) A(t, v) is uniformly Lipschitz continuous with respect to v;
(2) g(x) is uniformly Lipschitz continuous with respect to x ∈ ℝⁿ.
Assumption 2. 
$$\big\langle A(t, v) - A(t, \bar v),\, v - \bar v \big\rangle \le -\beta_1 |G\hat x|^2 - \beta_2\big(|G^{\mathsf T}\hat y|^2 + |G^{\mathsf T}\hat z|^2\big),\qquad \big\langle g(x) - g(\bar x),\, G(x - \bar x) \big\rangle \ge \mu_1 |G\hat x|^2,$$
where β₁, β₂, and μ₁ are given non-negative constants satisfying β₁ + β₂ > 0 and μ₁ + β₂ > 0; furthermore, when m > n, we have β₁ > 0 and μ₁ > 0. Here,
$$v = (x, y, z),\quad \bar v = (\bar x, \bar y, \bar z),\quad \hat x = x - \bar x,\quad \hat y = y - \bar y,\quad \hat z = z - \bar z,$$
where G is a given full-rank matrix, and we define the following:
$$v = \begin{pmatrix} x \\ y \\ z \end{pmatrix},\qquad A(t, v) = \begin{pmatrix} -G^{\mathsf T} f \\ G b \\ G \sigma \end{pmatrix}(t, v),\qquad G\sigma = \big(G\sigma_1, \ldots, G\sigma_d\big).$$
When Assumptions 1 and 2 hold, there exists a unique adapted solution (X(·), Y(·), Z(·)) of the FBSDE; the proof of this result is given in [6,7].

2.1.2. Deep Learning and Neural Networks

Deep learning is a machine learning method that attempts to automatically learn and extract information from data by mimicking the workings of the human brain to be able to process and analyze complex data. Neural networks can have multiple layers, each with a set of neurons, where the input layer receives data, the output layer produces results, and the middle layer performs information processing. Commonly used neural network models [23] include feedforward neural networks (FNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), etc. At the current stage, deep learning is still rapidly evolving, and based on the models of the neural networks mentioned above, new models are emerging for different problems.
Recurrent neural networks (RNNs) are specialized for processing sequential data and, unlike traditional feedforward neural networks, have a recurrent structure that allows them to process variable-length input and maintain memory across the sequence. Although RNNs have been widely used, some problems remain, such as the long-term dependency problem and the difficulty of capturing long-distance dependencies. These problems are addressed efficiently by the gated recurrent unit (GRU) [24], a model that performs well against gradient vanishing and long-term dependency issues and that is well suited to modeling and processing long sequences. Its structure is shown in Figure 1.
BiGRU (bidirectional GRU) is a variant of recurrent neural networks (RNNs) with update gates and reset gates, which control the passing and forgetting of information and help to solve the gradient vanishing problem. It uses two separate GRU layers, one that handles forward (from the beginning to the end of the sequence) inputs and another that handles reverse (from the end to the beginning of the sequence) inputs, so more contextual information can be captured.
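As a minimal illustration of this structure, the following TensorFlow/Keras sketch stacks a forward and a backward GRU over a sequence of per-step features. The sequence length, feature width, and layer sizes are illustrative assumptions, not the configuration used later in the paper.

```python
import tensorflow as tf

# Minimal bidirectional GRU block: one GRU reads the sequence forward, a second reads it
# backward, and the per-step hidden states are concatenated before a linear read-out.
def build_bigru_block(num_steps=100, feature_dim=101, hidden_units=64, output_dim=100):
    inputs = tf.keras.Input(shape=(num_steps, feature_dim))         # (batch, N, features)
    hidden = tf.keras.layers.Bidirectional(
        tf.keras.layers.GRU(hidden_units, return_sequences=True),   # one output per time step
        merge_mode="concat",
    )(inputs)                                                        # (batch, N, 2 * hidden_units)
    outputs = tf.keras.layers.Dense(output_dim)(hidden)             # per-step read-out
    return tf.keras.Model(inputs, outputs)

model = build_bigru_block()
model.summary()
```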

2.1.3. The DBSDE Method and Its Generalization

Neural networks rely on combinations of simple functions and have been shown to achieve good performance and accuracy on high-dimensional problems. Among such approaches, the deep learning-based backward stochastic differential equation (DBSDE) method [21] is used to solve partial differential equations and forward–backward stochastic differential equations. The method treats the BSDE corresponding to the PDE as a stochastic control problem by means of the Feynman–Kac formula and discretizes time; the gradient of the solution at individual time steps is used as a policy function, and the loss function is minimized by gradient descent. A posteriori estimates for the DBSDE method were further investigated by Han et al. [25]. Sirignano et al. [26] proposed the deep learning-based DGM method; Huré et al. [27] proposed a deep learning-based backward dynamic programming method, constructed two schemes, DBDP1 and DBDP2, according to the different ways of treating the solution, and provided the corresponding convergence analysis.
The main idea of the DBSDE method for solving high-dimensional PDE or BSDE is as follows:
For the following BSDE:
$$dY_t = -f(t, X_t, Y_t, Z_t)\,dt + Z_t\,dB_t,\qquad Y_T = g(X_T)$$
where X(·), Z(·) ∈ M_F²(0, T; ℝⁿ), Y(·) ∈ M_F²(0, T; ℝ), and B_t is an n-dimensional standard Brownian motion. According to the nonlinear Feynman–Kac formula, the BSDE is associated with a PDE, and the solution (Y_t, Z_t) of the BSDE and the solution u(t, x) of the PDE satisfy the following relation:
$$Y_t = u(t, X_t),\qquad Z_t = \sigma^{\mathsf T}(t, X_t)\,\nabla u(t, X_t).$$
The BSDE is transformed into a stochastic optimal control problem whose state equation is the following SDE (assuming the initial time is 0):
$$dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dB_t,\qquad dY_t = -f(t, X_t, Y_t, Z_t)\,dt + Z_t\,dB_t,\qquad X_0 = x,\quad Y_0 = Y_0$$
Consider the following equivalence control problem:
$$\inf_{Y_0,\, \{Z_t\}_{0 \le t \le T}} \mathbb{E}\big|Y_T - g(X_T)\big|^2$$
Here, the process Z(·) and the initial state Y_0 are treated as stochastic controls. By discretizing the forward SDE, the corresponding Euler–Maruyama scheme is obtained as follows:
$$X^{\pi}_{t_{i+1}} = X^{\pi}_{t_i} + b\big(t_i, X^{\pi}_{t_i}\big)(t_{i+1} - t_i) + \sigma\big(t_i, X^{\pi}_{t_i}\big)\big(B_{t_{i+1}} - B_{t_i}\big),$$
$$Y^{\pi}_{t_{i+1}} = Y^{\pi}_{t_i} - f\big(t_i, X^{\pi}_{t_i}, Y^{\pi}_{t_i}, Z^{\pi}_{t_i}\big)(t_{i+1} - t_i) + Z^{\pi}_{t_i}\big(B_{t_{i+1}} - B_{t_i}\big),$$
$$X^{\pi}_{t_0} = x,\qquad Y^{\pi}_{t_0} = Y_0$$
where π: 0 = t_0 < t_1 < ⋯ < t_N = T is a partition of the interval [0, T]. Y_0 and Z_0 are two trainable initial parameters, and a feedforward neural network is constructed to approximate the control Z^π_{t_i}:
$$Z^{\pi}_{t_i} \approx \mathrm{NN}_i\big(X^{\pi}_{t_i}; \theta_i\big),\qquad 1 \le i \le N - 1$$
where θ_i denotes the training parameters at time t_i.
Remark. Neural network approximation refers to the process of using neural networks to approximate complex functions or data. Specifically, neural networks approximate complex nonlinear functions by learning the mapping between inputs and outputs. In this process, the parameters of the neural network are adjusted to minimize the difference between the predicted output and the actual output, so that the network produces predictions close to the true output for a given input. The DBSDE method proposed in the references employs neural networks to approximate Z^π_{t_i}.
We define the loss function of the neural network as follows:
$$\mathrm{loss} = \frac{1}{M}\sum_{m=1}^{M}\Big|Y^{\pi, m}_{t_N} - g\big(X^{\pi, m}_{t_N}\big)\Big|^2$$
where M denotes the number of sampled Brownian motion paths.
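To make this concrete, the following is a minimal, self-contained TensorFlow sketch of one training step in the spirit of the DBSDE method just described, for an uncoupled FBSDE. The coefficient functions b_fn, sigma_fn, f_fn, and g_fn are simple placeholders (not the equations used in the experiments below), and the dimensions and network sizes are illustrative assumptions.

```python
import numpy as np
import tensorflow as tf

d, N, M, T = 10, 20, 64, 1.0          # dimension, time steps, batch size, horizon (illustrative)
dt = T / N

b_fn = lambda t, x: tf.zeros_like(x)                            # placeholder drift
sigma_fn = lambda t, x: 0.2 * x                                  # placeholder diagonal diffusion
f_fn = lambda t, x, y, z: -0.05 * y                              # placeholder generator
g_fn = lambda x: tf.reduce_sum(x ** 2, axis=1, keepdims=True)    # placeholder terminal condition

# One small feedforward sub-network per interior time step approximates Z_{t_i}.
subnets = [tf.keras.Sequential([tf.keras.layers.Dense(d + 10, activation="relu"),
                                tf.keras.layers.Dense(d)]) for _ in range(N - 1)]
y0 = tf.Variable(tf.random.uniform([1, 1]))                      # trainable initial value Y_0
z0 = tf.Variable(tf.random.uniform([1, d], -0.1, 0.1))           # trainable initial Z_0
opt = tf.keras.optimizers.Adam(1e-3)

def train_step(x0):
    dB = tf.random.normal([M, N, d]) * np.sqrt(dt)               # Brownian increments
    with tf.GradientTape() as tape:
        x = tf.tile(x0, [M, 1])
        y = tf.tile(y0, [M, 1])
        z = tf.tile(z0, [M, 1])
        for i in range(N):
            t = i * dt
            # Euler-Maruyama update of Y and X; Z at interior steps comes from the sub-network.
            y = y - f_fn(t, x, y, z) * dt + tf.reduce_sum(z * dB[:, i, :], axis=1, keepdims=True)
            x = x + b_fn(t, x) * dt + sigma_fn(t, x) * dB[:, i, :]
            if i < N - 1:
                z = subnets[i](x)
        loss = tf.reduce_mean((y - g_fn(x)) ** 2)                 # terminal-condition loss
    variables = [y0, z0] + [v for net in subnets for v in net.trainable_variables]
    opt.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss

print(float(train_step(tf.ones([1, d]))))                        # one optimization step
```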
The neural network structure is shown in Figure 2; here and in later figures, X_{t_i}, Y_{t_i}, Z_{t_i} are abbreviated as X_i, Y_i, Z_i.
Comparing the numerical results of the DBSDE method with the explicit solutions of the equations shows that the method approximates the explicit solutions well not only in the low-dimensional case but also in the high-dimensional case, where the results remain satisfactory.
Reference [22] generalizes the above DBSDE method to the numerical solution of high-dimensional FBSDEs and designs algorithms for three different forms of state feedback, with relatively good numerical results. However, the three algorithms do not perform equally well on different equations, and the structures of the FBSDE and the neural network are not taken into account in their design, which limits the applicability and efficiency of the algorithms. This paper builds on the DBSDE method.

2.2. Construction of Hybrid Models

For the following fully coupled FBSDE:
$$dX_t = b(t, X_t, Y_t, Z_t)\,dt + \sigma(t, X_t, Y_t, Z_t)\,dB_t,\qquad dY_t = -f(t, X_t, Y_t, Z_t)\,dt + Z_t\,dB_t,\qquad X_0 = x,\quad Y_T = g(X_T)$$
where t ∈ [0, T], x ∈ F_0 is the initial condition of this FBSDE, and X(·), Y(·), Z(·) are F_t-adapted stochastic processes taking values in ℝⁿ, ℝᵐ, and ℝ^{m×d}, respectively.
$$b: \Omega\times[0,T]\times\mathbb{R}^n\times\mathbb{R}^m\times\mathbb{R}^{m\times d}\to\mathbb{R}^n,\qquad \sigma: \Omega\times[0,T]\times\mathbb{R}^n\times\mathbb{R}^m\times\mathbb{R}^{m\times d}\to\mathbb{R}^{n\times d},$$
$$f: \Omega\times[0,T]\times\mathbb{R}^n\times\mathbb{R}^m\times\mathbb{R}^{m\times d}\to\mathbb{R}^m,\qquad g: \Omega\times[0,T]\times\mathbb{R}^n\to\mathbb{R}^m$$
are given continuous deterministic functions. b and σ are the coefficients of the drift term and diffusion term, respectively, and f is the generator.

2.2.1. Equivalent Stochastic Optimal Control

In this section, the FBSDE is solved numerically from the perspective of stochastic optimal control. In the DBSDE method, Z t is regarded as a control process with states X t , Y t as feedback variables, whereas in the actual computation, there are similar convergence results when Z t is regarded as a control process with state X t as a feedback variable. In this case, the stochastic optimal control problem is focused solely on the terminal time state, which is effective in certain specific cases. For instance, in the context of option theory, European options only require considering the state at maturity. The equivalent stochastic optimal control problem proposed in our paper not only considers the terminal time state but also considers the state at each time, adjusting between the two parts using the parameter λ . This effectively expands the model’s applicability; for example, when λ = 0 , only the terminal time scenario is considered; when λ = 1 , the terminal time and process time are given equal weight, aligning with the financial scenario where option holders have the right to exercise the option at any time. Other scenarios can also be dynamically adjusted through λ .
Therefore, in this paper, we consider X ( · ) as the state process and Y ( · ) , Z ( · ) as the control process; we obtain the equivalent stochastic optimal control process. We consider the following SDE:
$$dX_t = b(t, X_t, Y_t, Z_t)\,dt + \sigma(t, X_t, Y_t, Z_t)\,dB_t,\qquad dY_t = -f(t, X_t, Y_t, Z_t)\,dt + Z_t\,dB_t,\qquad X_0 = x,\quad Y_0 = u_0$$
where t ∈ [0, T]; the control Y(·) of the SDE is denoted by u(·) in order to distinguish it from the Y of the backward equation, and X_0 = x, Y_0 = u_0 is the initial state of the equation. Consider the following equivalent stochastic optimal control problem:
$$\inf_{(u_t, Z_t)_{0 \le t \le T}} \mathbb{E}\left[\big|Y_T - g(X_T)\big|^2 + \lambda\int_0^T \big|Y_t - u_t\big|^2\,dt\right]$$
where (u(·), Z(·)) is the F_t-adapted control process with X(·) as the feedback, and λ ≥ 0 is a parameter used to adjust the weighting of the two terms of the above stochastic optimal control problem. In particular, when λ = 0, the stochastic control problem (7) takes (Y_0, Z(·)) as the control pair and (X(·), Y(·)) as the feedback.
Further, the Euler–Maruyama discrete form of the state equation SDE is obtained as follows:
$$X^{\pi}_{t_{i+1}} = X^{\pi}_{t_i} + b\big(t_i, X^{\pi}_{t_i}, u^{\pi}_{t_i}, Z^{\pi}_{t_i}\big)\Delta t_i + \sigma\big(t_i, X^{\pi}_{t_i}, u^{\pi}_{t_i}, Z^{\pi}_{t_i}\big)\Delta B_{t_i},$$
$$Y^{\pi}_{t_{i+1}} = Y^{\pi}_{t_i} - f\big(t_i, X^{\pi}_{t_i}, Y^{\pi}_{t_i}, Z^{\pi}_{t_i}\big)\Delta t_i + Z^{\pi}_{t_i}\Delta B_{t_i},$$
$$X^{\pi}_0 = x,\qquad Y^{\pi}_0 = u^{\pi}_0$$
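A small NumPy sketch of this rollout may help fix ideas: given candidate feedback maps u_fn and z_fn (the roles the two networks introduced below will play), it simulates the discretized state equation along M Brownian paths. The coefficients and feedback maps shown are placeholders, not those of the equations studied later, and the diffusion is treated elementwise (i.e., as diagonal) for simplicity.

```python
import numpy as np

def rollout(x0, b, sigma, f, u_fn, z_fn, T=1.0, N=100, M=300, seed=0):
    """Euler-Maruyama rollout of the discretized scheme above with X as the feedback variable."""
    rng = np.random.default_rng(seed)
    d, dt = x0.shape[0], T / N
    X = np.tile(x0, (M, 1))
    u = u_fn(X)                  # u_0 plays the role of Y_0
    Y = u.copy()
    for i in range(N):
        t = i * dt
        dB = rng.normal(scale=np.sqrt(dt), size=(M, d))
        Z = z_fn(X)
        Y = Y - f(t, X, Y, Z) * dt + np.sum(Z * dB, axis=1, keepdims=True)
        X = X + b(t, X, u, Z) * dt + sigma(t, X, u, Z) * dB     # diagonal diffusion, elementwise
        u = u_fn(X)              # feedback control at the next time step
    return X, Y

# Toy usage with placeholder coefficients and feedback maps (illustrative only):
X_T, Y_T = rollout(
    x0=np.ones(5),
    b=lambda t, X, u, Z: np.zeros_like(X),
    sigma=lambda t, X, u, Z: 0.1 * np.ones_like(X),
    f=lambda t, X, Y, Z: -0.05 * Y,
    u_fn=lambda X: X.mean(axis=1, keepdims=True),
    z_fn=lambda X: 0.1 * X,
)
print(X_T.shape, Y_T.shape)      # (300, 5) (300, 1)
```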
Lemma 1. 
Solving the fully coupled FBSDE (5) is equivalent to solving the stochastic control problem (7); i.e., the stochastic control problem (7) is the equivalent stochastic optimal control problem of the FBSDE (5).
Proof. 
Let (u(·), Z(·)) be an optimal control pair for the stochastic control problem (7), i.e.,
$$\inf_{(u_t, Z_t)_{0 \le t \le T}} \mathbb{E}\left[\big|Y_T - g(X_T)\big|^2 + \lambda\int_0^T \big|Y_t - u_t\big|^2\,dt\right] = 0$$
Then the terminal condition Y_T = g(X_T) and the relation Y_t = u_t hold. Thus, the state Equation (6) can be written in the corresponding BSDE form; i.e., (X(·), Y(·), Z(·)) is a solution of the FBSDE (5). Since Assumptions 1 and 2 hold, (X(·), Y(·), Z(·)) is the unique solution of the FBSDE (5).
Conversely, when Assumptions 1 and 2 hold, a solution of the FBSDE (5) exists and is unique. Let this solution be (X*(·), Y*(·), Z*(·)), and take (u*(·), Z*(·)) as a control pair of the stochastic control problem (7); we clearly have the following:
$$\inf_{(u_t, Z_t)_{0 \le t \le T}} \mathbb{E}\left[\big|Y_T - g(X_T)\big|^2 + \lambda\int_0^T \big|Y_t - u_t\big|^2\,dt\right] = 0$$
That is, stochastic optimal control pairs exist. □
Kohlmann et al. studied the relationship between FBSDE and stochastic optimal control [3], and through the proof of the lemma, it is not difficult to see that under the conditions of Assumptions 1 and 2, solving the FBSDE (5) is equivalent to solving the stochastic optimal control problem (7). If the coefficients b , σ , f , g in the FBSDE (5) satisfy weaker conditions, we can still obtain that the FBSDE (5) is solvable if and only if the stochastic optimal control problem (7) is solvable [28]. In this case, we can still obtain the solution of the FBSDE by solving the stochastic optimal control problem, but the solution of the FBSDE may not be unique. Therefore, the subsequent design of the cost function is also reasonable.

2.2.2. Neural Network Architecture

Both the DBSDE method and its generalization currently use standard fully connected neural networks, whereas different neural network architectures are likely to improve the results, depending on the type of problem addressed. For example, convolutional neural networks are widely used in computer vision and image generation, while recurrent neural networks are more common in natural language processing. When solving the high-dimensional fully coupled FBSDE, sequence data are obtained after discretizing the equations, and there are clear dependencies between the sequence time steps: the output of the previous time step of the SDE affects the input of the next step, and the output of the subsequent time step of the BSDE also affects the input of the previous step. In this paper, inspired by the DBSDE method, we redesign both the overall architecture of the neural network and the internal structure of each time step in order to solve fully coupled, high-dimensional FBSDEs.
(1) Loss function design.
Consider the equivalent stochastic optimal control problem (7). According to the implicit function theorem, we have the following:
$$u^{\pi}_{t_i} = \varphi^{u}_i\big(X^{\pi}_{t_i}\big),\qquad Z^{\pi}_{t_i} = \varphi^{z}_i\big(X^{\pi}_{t_i}\big).$$
We construct two neural networks to fit u t i π , Z t i π , i.e.,
$$u^{\pi}_{t_i} \approx \mathrm{NN}^{u}\big(X^{\pi}_{t_i}; \theta^{u}_i\big),\qquad Z^{\pi}_{t_i} \approx \mathrm{NN}^{z}\big(X^{\pi}_{t_i}; \theta^{z}_i\big),$$
where θ^u_i and θ^z_i denote the trainable parameters of the two neural networks at time t_i, respectively.
We define the loss function of the neural network as follows:
$$\mathrm{loss} = \frac{1}{M}\sum_{m=1}^{M}\left[\Big|Y^{\pi, m}_{T} - g\big(X^{\pi, m}_{T}\big)\Big|^2 + \lambda\sum_{i=0}^{N-1}\Big|Y^{\pi, m}_{t_i} - u^{\pi, m}_{t_i}\Big|^2\Delta t_i\right]$$
where |Y^{π,m}_T − g(X^{π,m}_T)|² denotes the loss at the terminal time, |Y^{π,m}_{t_i} − u^{π,m}_{t_i}|² Δt_i denotes the loss at each time step, and the parameter λ adjusts the proportion between the terminal error and the continuous-time process error.
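A compact sketch of this loss (a sketch only; names, shapes, and the handling of dt are illustrative assumptions):

```python
import tensorflow as tf

# Terminal error plus the lambda-weighted running error between Y and the fitted control u.
# y_T and g_xT have shape (M, m); y_path and u_path have shape (M, N, m); dt is a scalar step.
def hybrid_loss(y_T, g_xT, y_path, u_path, dt, lam=1.0):
    terminal = tf.reduce_sum((y_T - g_xT) ** 2, axis=-1)                       # |Y_T - g(X_T)|^2
    running = tf.reduce_sum(tf.reduce_sum((y_path - u_path) ** 2, axis=-1) * dt, axis=-1)
    return tf.reduce_mean(terminal + lam * running)                            # mean over M paths
```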
(2) Incorporation of the time dimension.
According to the nature of the sequence after FBSDE discretization, incorporating the time dimension in the neural network can effectively deal with the non-stationary problem and increase the generalization ability of the model.
(3) Based on the BiGRU model.
Due to the Euler discretization error, the target g(X_T) cannot be reached exactly, and the loss function cannot reach exactly zero. If the neural network's output is permitted to depend not just on the variable realizations at time t but also on long- and short-term dependencies, it may be possible to offset the discretization error and find a strategy whose loss is smaller than that of a simple fully connected network. LSTM and GRU models effectively address short- and long-term sequence dependencies, avoid gradient vanishing and gradient explosion, reduce the instability of the numerical computation, and improve the convergence of the algorithm. Compared with the LSTM model, the GRU model limits the number of network parameters that need to be estimated as the temporal discretization is refined and requires less data without significant performance degradation, so the GRU model is used in this paper.
The FBSDE is concerned with how to achieve a desired goal under given conditions, and its solution requires finding a process that satisfies both the initial and terminal conditions; therefore, when the neural network is constructed, the effects of the preceding and subsequent time steps are considered simultaneously, and a bidirectional GRU (BiGRU) model is used. Two separate GRU layers process the forward input and the reverse input, respectively, capturing information from both the preceding and subsequent time steps.

2.2.3. Internal Structure of Neural Networks

We analyze the internal structure of the DBSDE method and attempt to improve the algorithm in the following respects so as to improve the convergence efficiency, accuracy, and stability of the model.
(1) Weight initialization.
Zero initialization makes the network difficult to train, while random initialization suffers from vanishing backpropagation gradients or vanishing forward information flow. The original method lacks any connection between the gradients of adjacent time steps, which does not match the nature of the FBSDE. In this paper, we initialize the weights and biases of the neural network at each time step with the processed weights and biases of the previous time step.
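A sketch of this initialization idea in Keras terms (layer sizes are illustrative; prev stands for the already-processed sub-network of the previous time step):

```python
import tensorflow as tf

def make_subnet(d=100, hidden=110):
    return tf.keras.Sequential([tf.keras.layers.Dense(hidden, activation="elu"),
                                tf.keras.layers.Dense(d)])

prev = make_subnet()
prev.build(input_shape=(None, 100))     # weights of the time step i-1 sub-network (assumed trained)

curr = make_subnet()
curr.build(input_shape=(None, 100))
curr.set_weights(prev.get_weights())    # step i starts from step i-1's weights and biases
```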
(2) Batch normalization (BN) processing.
BN processing not only accelerates the convergence of the model but also alleviates the problem of dispersed feature distributions in deep neural networks, avoiding untrainable situations such as gradient explosion and making training more stable. Note that BN is better suited to convolutional neural networks and is difficult to apply in recurrent neural networks (even where applicable, it can only process static signals and cannot share statistics across time). The DBSDE method uses feedforward neural networks, so adding a normalization (BN) layer before the input of each layer can improve the convergence speed. At the same time, our experiments show that omitting batch normalization in the last layer improves the accuracy of the results, so we treat it in this way.
(3) Adjust the activation function type.
The original method uses ReLU as the activation function, which has the advantages of unilateral inhibition, relatively wide excitation boundaries, and sparse activation. However, its output is not zero-centered, and "neuron death" may occur as training progresses. In this regard, this paper adopts ELU as the activation function: its output mean is close to zero, which speeds up learning, and it avoids vanishing gradients because it acts as the identity for positive inputs. Because of these properties of ELU, batch normalization of the data can be appropriately relaxed to reduce the amount of computation.
(4) ELU is used as an activation function while residual learning is incorporated.
Analysis of the batch-normalization variant and the adjusted-activation variant shows that the two yield similar results, but adjusting the activation function requires fewer network parameters and less computation. Therefore, a residual learning network is added on top of the ELU activation. The main difference between a residual network and an ordinary network is the extra identity-shortcut branch, which creates identity connections across several hidden layers. Because of this branch, during backpropagation the loss can pass the gradient directly through the shortcut to earlier parts of the network, which slows network degradation and speeds up convergence. Beyond that, the residual network adds no new parameters, only an extra addition.
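A minimal Keras sketch of such a residual block with ELU activation (widths are illustrative; the projection on the shortcut is only added when the dimensions of the addition would otherwise not match):

```python
import tensorflow as tf

def residual_block(x, units):
    # Two dense layers plus an identity shortcut; the addition lets gradients flow directly back.
    h = tf.keras.layers.Dense(units, activation="elu")(x)
    h = tf.keras.layers.Dense(units)(h)
    if x.shape[-1] != units:
        x = tf.keras.layers.Dense(units, use_bias=False)(x)   # match dimensions for the addition
    return tf.keras.layers.Activation("elu")(tf.keras.layers.Add()([x, h]))

inputs = tf.keras.Input(shape=(100,))
outputs = residual_block(residual_block(inputs, 110), 110)
model = tf.keras.Model(inputs, outputs)
```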
(5) Optimizer selection.
For the parameters in the neural network, Adam was chosen as the optimizer to iteratively update the network parameters.
The internal restructuring of the neural network for a single time step is shown in Figure 3:
Figure 3a illustrates the inclusion of batch normalization with ReLU activation for each input layer. However, experiments showed that omitting batch normalization in the final layer yielded better outcomes, as depicted in Figure 3b. Figure 3c represents the modification of the activation function in Figure 3b to ELU and the elimination of batch normalization. Figure 3d showcases the incorporation of residual learning to the structure outlined in Figure 3c.

2.2.4. Construction of the Model

In this section, we construct a hybrid BiGRU model and compare it with the model extended from the DBSDE method.
(1) Extension of the DBSDE method.
The equivalent stochastic optimal control process is still used as proposed in the DBSDE method, with ( X t , Y t ) denoting the feedback variables and Z t denoting the control, i.e.,
$$\inf_{\{Z_t\}_{0 \le t \le T}} \mathbb{E}\big|Y_T - g(X_T)\big|^2$$
A feedforward neural network is used to fit Z t , i.e.,
$$Z^{\pi}_{t_i} \approx \mathrm{NN}_i\big(X^{\pi}_{t_i}; \theta_i\big),\qquad 1 \le i \le N - 1$$
Batch normalization is performed before each layer of input.
The neural network structure of the extension of the DBSDE method is shown in Figure 4:
In Figure 4, the red line represents the neural network fitting process and the black line represents the data flow.
The pseudocode for the model is shown in Table 1:
(2) Hybrid BiGRU Model
Using ( X t , Y t ) as the feedback variables, we fit the variable Z t with the BiGRU model; the neural network structure is shown in Figure 5, as follows:
In Figure 5, the red line represents the neural network fitting process and the black line represents the data flow.
The equivalent stochastic optimal control problem uses scheme (7) proposed in this paper. Using X_t as the feedback variable, we construct two different feedforward neural networks to fit the variables u_t and Z_t, respectively, i.e.,
$$u^{\pi}_{t_i} = \varphi^{u}_i\big(X^{\pi}_{t_i}\big),\qquad Z^{\pi}_{t_i} = \varphi^{z}_i\big(X^{\pi}_{t_i}\big).$$
The neural network structure is shown in Figure 6, as follows:
In Figure 6, the red line represents the neural network fitting process and the black line represents the data flow.
Combining the above two adjustments, a hybrid BiGRU model is constructed by using X t as the feedback variable and fitting u t and Z t using the BiGRU model, respectively, i.e.,
$$u_i := \mathrm{NN}^{u}\big(X_i, \Delta B_{t_i}; \theta^{u}_i\big),\qquad Z_i := \mathrm{NN}^{z}\big(X_i, \Delta B_{t_i}; \theta^{z}_i\big)$$
The neural network structure is shown in Figure 7, where the red arrows indicate the neural network fitting process, the black arrows indicate the data flow, the green squares indicate the neural network fitting variables, and the blue squares indicate the structure of BiGRU.
The above network structure can likewise be optimized for a single time step using a similar approach to Figure 3.
The pseudocode for the hybrid BiGRU model is shown in Table 2, as follows:
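As a rough sketch of the two networks in the hybrid model, each of NN_u and NN_z can be realized as a BiGRU over the sequence of (X_i, ΔB_{t_i}) pairs with a per-step linear read-out; the widths below are illustrative assumptions, and the full training loop follows the pseudocode in Table 2.

```python
import tensorflow as tf

def make_bigru_head(n, d, out_dim, hidden=64):
    inp = tf.keras.Input(shape=(None, n + d))                       # sequence of [X_i, dB_i]
    h = tf.keras.layers.Bidirectional(
        tf.keras.layers.GRU(hidden, return_sequences=True))(inp)    # forward + backward pass
    out = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(out_dim))(h)
    return tf.keras.Model(inp, out)

n, m, d = 100, 1, 100
nn_u = make_bigru_head(n, d, out_dim=m)        # per-step estimate of u_i
nn_z = make_bigru_head(n, d, out_dim=m * d)    # per-step estimate of Z_i (reshaped downstream)
```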

3. Results

In this section, two equations are selected for experimentation.
The neural network has two hidden layers, and the learning rate is adjusted with an adaptive strategy to improve training effectiveness and stability. The weights of the neural network are initialized with the Xavier method, and Y_0 is suitably initialized to avoid falling into local optima. A batch size of M = 300 is chosen to balance learning efficiency and speed, and the number of discrete time steps is N = 100. The experiments were conducted on a Lenovo Xiaoxin Pro 14ACH 2021x computer (manufactured in Beijing, China) equipped with an Intel i5 CPU and 16 GB of RAM, in a Python 3.8 and TensorFlow 2.8 environment.
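A sketch of this configuration in TensorFlow terms; the decay boundaries and learning-rate values are illustrative assumptions, since the paper only states that an adaptive learning-rate strategy and Xavier initialization are used.

```python
import tensorflow as tf

M, N = 300, 100                                        # batch size and number of time steps
initializer = tf.keras.initializers.GlorotUniform()    # Xavier initialization
lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[4000, 8000], values=[1e-2, 1e-3, 1e-4])   # illustrative decay schedule
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

hidden_layer = tf.keras.layers.Dense(110, activation="elu",
                                     kernel_initializer=initializer)
```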
We evaluate the experimental results and corresponding loss functions of the model using the following metrics:
(1) Relative error on Y 0 :
$$\text{Relative error on } Y_0 = \frac{\left|Y_{0,\mathrm{est}} - Y_{0,\mathrm{real}}\right|}{\left|Y_{0,\mathrm{real}}\right|}$$
This paper uses the relative error to quantify the difference between the true and predicted values of Y_0. A smaller relative error indicates a closer approximation to the true value, while a larger relative error indicates a greater disparity between the estimated and true values (a small helper implementing this metric is sketched after this list).
(2) Number of iterations to convergence. We recorded the number of iterations required for convergence based on the experimental results obtained to evaluate the training efficiency and performance of the neural network. A smaller number of iterations implies that the network can converge more quickly, reducing training time and improving efficiency. Additionally, the speed of convergence also reflects the extent to which the neural network fits the training data. Generally, faster convergence may indicate better model fitting.
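A small helper implementing the relative-error metric from item (1); the printed value reproduces the first row of Table 3.

```python
def relative_error_y0(y0_est, y0_real):
    # Relative error between the estimated and true initial values Y_0.
    return abs(y0_est - y0_real) / abs(y0_real)

print(relative_error_y0(0.52037, 0.5))   # 0.04074, the DBSDE-extension entry in Table 3
```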
On the one hand, for the hybrid BiGRU model, the following four enhancement strategies are validated against the extension of the DBSDE method to demonstrate their effectiveness.
  • Strategy 1: Using ( X t , Y t ) as the feedback variables, we use a feedforward neural network, adjust the activation function type from ReLU to ELU without batch normalization, and fit the variable Z t ;
  • Strategy 2: Taking ( X t , Y t ) as the feedback variables, we use the feedforward neural network, adjust the activation function type from ReLU to ELU without batch normalization, and also add residual learning to fit the variable Z t ;
  • Strategy 3: Using ( X t , Y t ) as the feedback variables, we fit the variable Z t with the BiGRU model;
  • Strategy 4: Using X t as the feedback variable, we construct two different feedforward neural networks to fit the variables u t , Z t , respectively.
On the other hand, the experimental results of the extended model based on the DBSDE method are then compared with those of the hybrid BiGRU model, and conclusions are drawn.

3.1. Experiment 1

Suppose t ∈ [0, T], the initial state is x = (x_1, …, x_d) ∈ ℝ^d, y and z take values in ℝ and ℝ^d, respectively, and the functions b, σ, f, g are given as follows:
$$b(t, x, y, z) = 0,\qquad \sigma(t, x, y, z) = 0.25\,\mathrm{diag}(x),$$
$$f(t, x, y, z) = 0.25 \times y^{2} + \frac{0.25^{2} \times d}{2} \times 0.25^{2} \times d\sum_{i=1}^{d} z_i,\qquad g(x) = \frac{\exp\!\left(T + \sum_{i=1}^{d} x_i\right)}{1 + \exp\!\left(T + \sum_{i=1}^{d} x_i\right)},$$
where diag(x) denotes the diagonal matrix whose ith diagonal element is x_i.
The analytical solution to this FBSDE is
$$Y(t, X_t) = \frac{\exp\!\left(t + \sum_{i=1}^{d} X_i(t)\right)}{1 + \exp\!\left(t + \sum_{i=1}^{d} X_i(t)\right)},$$
where X_t = (X_1(t), X_2(t), …, X_d(t)).
  • For this FBSDE, we set d = 100, T = 0.5, and x = 0.0 · 1_d (i.e., the zero vector), where 1_d is a d-dimensional vector with all elements equal to one. The explicit solution Y_0 = 0.5 serves as the baseline for comparing our models.
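A quick check of this baseline (it relies on the initial state being the zero vector, which is what the stated value Y_0 = 0.5 requires):

```python
import numpy as np

d, t = 100, 0.0
x0 = np.zeros(d)                      # zero initial vector
s = t + x0.sum()
print(np.exp(s) / (1.0 + np.exp(s)))  # 0.5, the baseline value of Y_0
```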

3.1.1. Validation of Enhancement Strategies

The values for d = 100 are calculated as shown in Table 3.
The above model and the corresponding enhancement strategies are clearly effective. The relative error decreases significantly for strategies 3 and 4: the relative error of the DBSDE extension method is 0.04074, while the relative errors of Strategy 3 and Strategy 4 are 0.00430 and 0.00542, respectively. Regarding the number of iterations required for convergence, the DBSDE extension method requires 1.38 × 10⁴ iterations, while Strategy 3 requires only 1.15 × 10⁴; the other three enhancement strategies show varying degrees of decrease in the number of iterations required for convergence.
The trend plot of the results for Y 0 is shown in Figure 8:
The trend graph for loss is shown in Figure 9:
From the results of Y 0 and loss, it is easy to see that the four strategies show a clear downward trend and tend to be stable, with no overfitting, underfitting, or instability. They can achieve convergence in 14,000 iterations, and the results are valid.
It can be observed from Table 3 and Figure 8 and Figure 9 that all four strategies are effective. The convergence speeds of strategy 2, strategy 3, and strategy 4 are significantly better than that of strategy 1. This indicates that the strategies proposed in this paper, which involve using the equivalent stochastic optimal control problem to construct two separate neural networks for fitting Y t and Z t , substituting the BiGRU model for the feedforward neural network, and employing residual learning, can be applied effectively in the hybrid BiGRU model.

3.1.2. Results of the Hybrid BiGRU Model and the Extension of the DBSDE Method

From the numerical results in Table 4, it can be seen that both models are valid, but the relative error of the hybrid BiGRU model is significantly smaller: the relative error of the DBSDE extension method is 0.04074, while that of the hybrid BiGRU model is only 0.00476. Additionally, the number of iterations required for convergence decreases from 1.38 × 10⁴ to 1.05 × 10⁴.
The trend plot of the results for Y 0 is shown in Figure 10:
The trend plot of the results for loss is shown in Figure 11:
From the results for Y_0 and the loss, it can be seen that both models converge effectively, but the hybrid BiGRU model has relatively smaller losses and converges more efficiently. Meanwhile, the results show that a reasonable initialization of Y_0 can effectively improve the efficiency of convergence and avoid falling into local optima. Since the Brownian motion data are generated by a random walk model, the effect of outliers can be temporarily disregarded.

3.2. Experiment 2

Assuming t ∈ [0, T], x = (x_1, …, x_d) ∈ ℝ^d, y ∈ ℝ, and z ∈ ℝ^d, consider the following fully coupled FBSDE:
$$dX_{i,t} = \frac{d}{\exp\!\left(\frac{1}{d}\sum_{i=1}^{d} X_{i,t}\right)}\, Z_{i,t}\, dB_{i,t},\qquad dY_t = \exp\!\left(\frac{1}{d}\sum_{i=1}^{d} X_{i,t}\right)\sum_{i=1}^{d} Z_{i,t}^{2}\, dt - \sum_{i=1}^{d} Z_{i,t}\, dB_{i,t},$$
$$X_0 = x,\qquad Y_T = \exp\!\left(\frac{1}{d}\sum_{i=1}^{d} X_{i,T}\right),$$
The analytical solution to this FBSDE is as follows:
$$Y(t, X_t) = \exp\!\left(\frac{1}{d}\sum_{i=1}^{d} X_{i,t}\right)$$
Let T = 0.1, X_0 = 1_d, and d = 10, 50, 100; then the analytical solution for Y_0 is Y_0 ≈ 2.7183.

3.2.1. Validation of Enhancement Strategies

The numerical calculations for d = 100 are as follows in Table 5:
The numerical results above show that the relative errors of the four strategies are significantly smaller: the relative error of the DBSDE extension method is 0.066549, while the relative errors of all four enhancement strategies are no higher than 0.0022. In particular, the relative error of Strategy 2 is only 0.001346 and that of Strategy 3 is only 0.001460, a substantial decrease. Both the DBSDE extension method and the enhancement strategies converge quickly. The number of iterations required for convergence is 1.25 × 10⁴ for the DBSDE extension method, 1.11 × 10⁴ for Strategy 2, and 1.12 × 10⁴ for Strategy 3, a slight decrease compared with the DBSDE extension method, while the other two strategies show no comparable reduction in iteration count.
The trend graphs for Y 0 are shown in Figure 12:
The trend graphs for loss are shown in Figure 13:
From the trend graphs of Y 0 as well as loss, it can be seen that the numerical results of the four strategies tend to be stable. The loss function exhibits a significant downward trend, and there is no problem with underfitting or overfitting. Additionally, the loss is smaller when strategy 2, strategy 3, and strategy 4 converge. Combined with the results of relative error, it shows that selecting strategy 2, strategy 3, and strategy 4 to construct the hybrid BiGRU model is reasonable.

3.2.2. Results of the Hybrid BiGRU Model and Extension of the DBSDE Method

The numerical results of the two models for d = 100 are shown in Table 6:
From the above numerical results, compared with the relative error of 0.066549 for the DBSDE extension method, the relative error of the hybrid BiGRU model is significantly lower at 0.000850, a substantial reduction. Additionally, in terms of the number of iterations required for convergence, the hybrid BiGRU model decreases from 1.25 × 10⁴ to 1.09 × 10⁴, a notable improvement. Overall, the hybrid BiGRU model performs well.
The trend graph for Y 0 is shown in Figure 14:
The trend graph for loss is shown in Figure 15:
From the plots of Y 0 as well as the trend change of loss, it can be seen that both models can converge effectively. However, the hybrid BiGRU model achieves significantly higher accuracy in predicting Y 0 , and the value of the loss function at convergence is also significantly smaller. Combined with the results in Table 6, it can be concluded that the hybrid BiGRU model demonstrates relatively better performance in the experiment.
In order to compare the performance effects of different algorithms in different dimensions, we additionally select the dimensions of d = 10 , d = 50 , and d = 100 for comparison, and the results are shown in Table 7:
A comprehensive analysis of the above results shows that both algorithms are effective in different dimensions.
In low dimensions, the results obtained by the extension of the DBSDE method and the hybrid BiGRU model do not differ much. When d = 10, the solutions obtained by the two methods are 2.71659 and 2.71843, with relative errors of 6.29 × 10⁻⁴ and 4.78 × 10⁻⁵, respectively. However, when d = 100, the numerical solutions obtained by the two methods are 2.72427 and 2.71602, with relative errors of 2.196 × 10⁻³ and 8.388 × 10⁻⁴, respectively. In high dimensions, the results of the hybrid BiGRU model are relatively better than those of the extension of the DBSDE method, indicating that the performance of the hybrid BiGRU model is more robust in high dimensions.
In addition, the error is influenced by T: when T increases, the relative error of the algorithms may become larger. In that case, the hybrid BiGRU model gives relatively stable results owing to its consideration of time dependence.

4. Discussion

In this paper, the hybrid BiGRU model is proposed to solve high-dimensional fully coupled FBSDEs. The constructed equivalent stochastic optimal control problem (7) can be applied to different feedback forms by parameter tuning, which greatly improves the applicability of the model. The hybrid BiGRU model is constructed by analyzing the nature of the FBSDE in depth, incorporating the time dimension, and taking into account the interactions between preceding and subsequent time steps of the equations, as well as the long- and short-term dependencies. The internal structure of the individual time steps of the model is adjusted to further improve the stability of the model. At the same time, the hybrid BiGRU model is not affected by the "dimensional catastrophe", "gradient vanishing", or "gradient explosion" problems. We extended the DBSDE method to the high-dimensional solution of fully coupled FBSDEs and refined the original model. Regarding the neural network solution of FBSDEs, we considered not only the universal approximation theorem of neural networks, using feedforward neural networks for the solution, but also aimed to customize the neural network to the equation's inherent properties and structure.
In the numerical experiments as a whole, we first verify the effectiveness of the four enhancement strategies, indicating that the enhancement strategies in the hybrid BiGRU model are effective. Then, comparing the performance of the two models, it can be seen that both are effective in the high-dimensional case. However, the hybrid BiGRU model exhibits better accuracy, convergence speed, and robustness, and it performs better when the dimension or T is increased. We will further investigate the influence of T on different methods in future work.
In future work, based on the ability of the hybrid BiGRU model proposed in this paper to deal effectively with long- and short-term dependencies, we will attempt to extend the model to high-dimensional solutions of non-Markovian forward–backward stochastic differential equations and explore the impact of long-range memory on such equations. Meanwhile, from the perspective of time dependence, models for solving high-dimensional PDEs continue to evolve; for example, the DGM method has achieved good results in high-dimensional numerical PDE solving. In future work, we will consider extending neural network methods for PDEs, such as the DGM, to the high-dimensional solution of FBSDEs. We will provide a rationale for the use of neural networks and compare the results obtained, continuously enriching neural network-based methods for high-dimensional FBSDE problems.

Author Contributions

Writing—original draft, M.W.; Supervision, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The X_0 dataset is the initial condition presented in the equation. In accordance with our research protocol, we are unable to provide the specific Brownian motion data generated in Python. Brownian motion is a stochastic process characterized by random fluctuations, and each simulation may yield different results. However, the Python 3.8 pseudocode provided in our methodology section outlines the procedure for generating Brownian motion data using random walks.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SDE     stochastic differential equation
BSDE    backward stochastic differential equation
FBSDE   forward–backward stochastic differential equation
DBSDE   Deep-BSDE
GRU     gated recurrent unit
BiGRU   bidirectional gated recurrent unit
LSTM    long short-term memory

References

1. Bismut, J.M. Conjugate convex functions in optimal stochastic control. J. Math. Anal. Appl. 1973, 44, 384–404.
2. Pardoux, E.; Peng, S. Adapted solution of a backward stochastic differential equation. Syst. Control. Lett. 1990, 14, 55–61.
3. Kohlmann, M.; Zhou, X.Y. Relationship between backward stochastic differential equations and stochastic controls: A linear-quadratic approach. SIAM J. Control. Optim. 2000, 38, 1392–1407.
4. Antonelli, F. Backward-Forward Stochastic Differential Equations; Purdue University: West Lafayette, IN, USA, 1993.
5. Ma, J.; Protter, P.; Yong, J. Solving forward-backward stochastic differential equations explicitly—A four step scheme. Probab. Theory Relat. Fields 1994, 98, 339–359.
6. Hu, Y.; Peng, S. Solution of forward-backward stochastic differential equations. Probab. Theory Relat. Fields 1995, 103, 273–283.
7. Peng, S.; Wu, Z. Fully coupled forward-backward stochastic differential equations and applications to optimal control. SIAM J. Control. Optim. 1999, 37, 825–843.
8. Hamadene, S. Backward-forward SDE's and stochastic differential games. Stoch. Process. Their Appl. 1998, 77, 1–15.
9. Zhen, W. Maximum principle for optimal control problem of fully coupled forward-backward stochastic systems. Syst. Sci. Math. Sci. 1998, 11, 249–259.
10. Peng, S. A general stochastic maximum principle for optimal control problems. SIAM J. Control. Optim. 1990, 28, 966–979.
11. Peng, S. Backward stochastic differential equations and applications to optimal control. Appl. Math. Optim. 1993, 27, 125–144.
12. El Karoui, N.; Peng, S.; Quenez, M.C. Backward stochastic differential equations in finance. Math. Financ. 1997, 7, 1–71.
13. Peng, S. Probabilistic interpretation for systems of quasilinear parabolic partial differential equations. Stochastics Stochastics Rep. 1991, 37, 61–74.
14. Pardoux, E.; Tang, S. Forward-backward stochastic differential equations and quasilinear parabolic PDEs. Probab. Theory Relat. Fields 1999, 114, 123–150.
15. Peng, S. A linear approximation algorithm using BSDE. Pac. Econ. Rev. 1999, 4, 285–292.
16. Bally, V. Approximation scheme for solutions of BSDE. Pitman Res. Notes Math. Ser. 1997, 364, 177–191.
17. Tang, T.; Zhao, W.; Zhou, T. Deferred correction methods for forward backward stochastic differential equations. Numer. Math. Theory Methods Appl. 2017, 10, 222–242.
18. Zhao, W.; Chen, L.; Peng, S. A new kind of accurate numerical method for backward stochastic differential equations. SIAM J. Sci. Comput. 2006, 28, 1563–1581.
19. Spruyt, V. The curse of dimensionality in classification. Comput. Vis. Dummies 2014, 21, 35–40.
20. Cybenko, G. Approximation by superpositions of a sigmoidal function. Math. Control. Signals Syst. 1989, 2, 303–314.
21. Han, J.; Jentzen, A. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Commun. Math. Stat. 2017, 5, 349–380.
22. Ji, S.; Peng, S.; Peng, Y.; Zhang, X. Three algorithms for solving high-dimensional fully coupled FBSDEs through deep learning. IEEE Intell. Syst. 2020, 35, 71–84.
23. Epelbaum, T. Deep learning: Technical introduction. arXiv 2017, arXiv:1709.01412.
24. Cho, K.; Van Merriënboer, B.; Bahdanau, D.; Bengio, Y. On the properties of neural machine translation: Encoder-decoder approaches. arXiv 2014, arXiv:1409.1259.
25. Han, J.; Long, J. Convergence of the deep BSDE method for coupled FBSDEs. Probab. Uncertain. Quant. Risk 2020, 5, 5.
26. Sirignano, J.; Spiliopoulos, K. DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys. 2018, 375, 1339–1364.
27. Huré, C.; Pham, H.; Warin, X. Deep backward schemes for high-dimensional nonlinear PDEs. Math. Comput. 2020, 89, 1547–1579.
28. Ma, J.; Yong, J. Approximate solvability of forward-backward stochastic differential equations. Appl. Math. Optim. 2002, 45, 1–22.
Figure 1. Structure of the GRU model.
Figure 2. Structure of the DBSDE method.
Figure 3. Single time-step structure.
Figure 4. Structure of the feedforward neural network fitting Z_t.
Figure 5. Structure of the BiGRU model fitting Z_t.
Figure 6. Structure of the feedforward neural networks fitting u_t and Z_t.
Figure 7. Structure of the BiGRU model fitting u_t and Z_t.
Figure 8. Y_0 of the four strategies.
Figure 9. Loss of the four strategies.
Figure 10. Y_0 of the two models.
Figure 11. Loss of the two models.
Figure 12. Y_0 of the four strategies.
Figure 13. Loss of the four strategies.
Figure 14. Y_0 of the two models.
Figure 15. Loss of the two models.
Table 1. Code 1.
Using (X_t, Y_t) as the feedback variables
Inputs: initial state x, Brownian motion increments ΔB_{t_i}, learning rate η, etc.
Output: Y_0
1. X_0 ← x
2. Y_0 ← Y_0
3. for i = 0, 1, ..., N − 1 do
4.     Z_i ← NN_z(X_i, ΔB_{t_i}; θ_i^z)
5.     X_{i+1} ← X_i + b(t_i, X_i, Y_i, Z_i)Δt + σ(t_i, X_i, Y_i, Z_i)ΔB_t
6.     Y_{i+1} ← Y_i − f(t_i, X_i, Y_i, Z_i)Δt + Z_i ΔB_t
7. end for
8. loss ← (1/M) Σ_{m=1}^{M} |Y_N − g(X_N)|²
9. θ ← Adam(θ, loss)
10. Y_0 ← Adam(Y_0, loss)
11. end for
where M is the number of sampled paths; variables such as Y_0 can be initialized appropriately.
Table 2. Code 2.
Using X_t as the feedback variable
Inputs: initial state x, Brownian motion increments ΔB_{t_i}, loss weight λ, learning rate η, etc.
Output: Y_0
1. X_0 ← x
2. Y_0 ← NN_u(X_0, ΔB_{t_0}; θ_0)
3. losst ← 0
4. for i = 0, 1, ..., N − 1 do
5.     Z_i ← NN_z(X_i, ΔB_{t_i}; θ_i^z)
6.     u_i ← NN_u(X_i, ΔB_{t_i}; θ_i^u)
7.     if i = 0 then
8.         Y_i ← u_i
9.     else
10.        Y_{i+1} ← Y_i − f(t_i, X_i, Y_i, Z_i)Δt + Z_i ΔB_t
11.        X_{i+1} ← X_i + b(t_i, X_i, u_i, Z_i)Δt + σ(t_i, X_i, u_i, Z_i)ΔB_t
12.        losst ← losst + |Y_i − u_i|²
13. end for
15. Y_N ← Y_{N−1} − f(t_{N−1}, X_{N−1}, Y_{N−1}, Z_{N−1})Δt + Z_{N−1} ΔB_t
16. lossT ← (1/M) Σ_{m=1}^{M} |Y_N − g(X_N)|²
17. loss ← lossT + λ · losst
18. θ ← Adam(θ, loss)
19. end for
Table 3. Extension of the DBSDE method and the four strategies.
Strategy | Numerical Solution | Relative Error | Number of Iterations to Convergence
Extension of DBSDE | 0.52037 | 0.04074 | 1.38 × 10⁴
Strategy 1 | 0.53249 | 0.06498 | 1.25 × 10⁴
Strategy 2 | 0.48203 | 0.03594 | 1.27 × 10⁴
Strategy 3 | 0.49785 | 0.00430 | 1.15 × 10⁴
Strategy 4 | 0.50271 | 0.00542 | 1.33 × 10⁴
Table 4. Extension of the DBSDE method and the hybrid BiGRU model.
Model | Numerical Solution | Relative Error | Number of Iterations to Convergence
Extension of DBSDE | 0.52037 | 0.04074 | 1.38 × 10⁴
Hybrid BiGRU model | 0.49762 | 0.00476 | 1.05 × 10⁴
Table 5. Extension of the DBSDE method and the four strategies.
Strategy | Numerical Solution | Relative Error | Number of Iterations to Convergence
Extension of DBSDE | 2.53740 | 0.066549 | 1.25 × 10⁴
Strategy 1 | 2.72427 | 0.002189 | 1.27 × 10⁴
Strategy 2 | 2.72198 | 0.001346 | 1.11 × 10⁴
Strategy 3 | 2.72229 | 0.001460 | 1.12 × 10⁴
Strategy 4 | 2.71323 | 0.001870 | 1.28 × 10⁴
Table 6. Extension of the DBSDE method and the hybrid BiGRU model.
Model | Numerical Solution | Relative Error | Number of Iterations to Convergence
Extension of DBSDE | 2.53740 | 0.066549 | 1.25 × 10⁴
Hybrid BiGRU | 2.71602 | 0.000850 | 1.09 × 10⁴
Table 7. Y_0 in different dimensions.
Algorithm | d = 10 | d = 50 | d = 100
Extension of DBSDE | 2.71659 | 2.72246 | 2.72427
Hybrid BiGRU | 2.71843 | 2.71948 | 2.71602
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
