1. Introduction
In contrast to numerical solvers for partial differential equations (PDEs), recent years have seen a growing trend of using neural networks (NNs) to approximate PDE solutions because of their grid-free nature [1,2,3], which, however, requires a large number of data samples for training. Physics-informed neural networks (PINNs) have demonstrated their efficiency in training otherwise physics-uninformed neural networks to solve PDEs [4]. However, a new NN has to be retrained each time the input initial conditions vary; thus, PINNs lack generalization capability across a family of PDEs with different parameters. To tackle this challenge, the Fourier neural operator (FNO) was developed to take in various initial conditions and train an NN that can predict PDE outputs under different conditions [5,6]. The rationale is to propagate information from initial or other boundary conditions by projecting them into a high-dimensional space using the Fourier transform. Physics information can be further incorporated into the FNO, yielding a physics-informed neural operator (PINO) [7]. PINO uses a physics loss to train the neural operator, which further reduces the required training data size by leveraging knowledge of the PDE formulation.
In contrast to classical PDE systems, here we focus on a more complex class of forward–backward PDE systems, which arise from game theory, in particular, mean field games (MFGs). MFGs are micro/macro games that model the strategic interaction among a large number of self-interested agents who make dynamic decisions (corresponding to the backward PDE), while a population distribution is propagated to represent the state of the interacting agents (corresponding to the forward PDE) [8,9,10,11]. The equilibrium of MFGs, the so-called mean field equilibrium (MFE), is characterized by two PDEs, as follows:
(1) Agent dynamics: each individual's dynamics follow optimal control, i.e., a backward Hamilton–Jacobi–Bellman (HJB) equation, solved backward in time using dynamic programming given the terminal state. (2) Mass dynamics: the system evolution arises from each individual's choices, i.e., a forward Fokker–Planck–Kolmogorov (FPK) equation, solved forward in time given the initial state and representing agents' anticipation of other agents' choices and future system dynamics.
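To make the coupled structure concrete, a schematic deterministic, one-dimensional instance of such a system can be written as follows; the notation here ($\rho$ for density, $V$ for the value function, $u$ for the control, $c$ for the running cost) is generic and illustrative rather than the specific formulation introduced later:

```latex
% Schematic coupled MFG system (generic notation, for illustration)
\begin{aligned}
&\text{HJB (backward):} && -\partial_t V(x,t) = \min_{u}\Big\{ c\big(u,\rho(x,t)\big) + u\,\partial_x V(x,t) \Big\}, && V(x,T)\ \text{given},\\
&\text{FPK (forward):}  && \partial_t \rho(x,t) + \partial_x\big(\rho(x,t)\,u^{*}(x,t)\big) = 0, && \rho(x,0)\ \text{given},
\end{aligned}
```

where $u^{*}$ denotes the minimizer in the HJB equation. The backward equation needs the density produced by the forward equation, and the forward equation needs the control produced by the backward equation, which is exactly the coupling that makes the MFE difficult to compute.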
MFE is challenging to solve due to the coupled forward–backward structure of these two PDE systems. Therefore, researchers have sought various machine learning methods, including reinforcement learning (RL) [12,13,14,15,16,17,18,19,20], adversarial learning [21], and PINNs [22,23,24,25]. Unlike traditional methods, which may require fine discretization of the problem space and thus incur high computational loads, learning-based approaches can approximate solutions in a continuous domain. However, once trained, these learning tools can be time-consuming and memory-demanding to adapt to different conditions. Specifically, each unique initial condition may require assigning and retraining a dedicated neural network to obtain the corresponding MFE. Motivated by the MFG framework, this paper aims to solve a set of coupled forward–backward PDE systems with arbitrary initial conditions. To achieve this goal, we train a PINO, which uses a Fourier neural operator (FNO) to establish a functional mapping that approximates the solutions of a family of coupled PDE systems with various boundary conditions. After training the PINO properly, we can obtain the MFE under different initial densities and terminal costs.
The rest of this paper is organized as follows:
Section 2 presents preliminaries about the coupled PDE system and PINO.
Section 3 proposes a PINO learning framework for coupled PDEs.
Section 4 presents the solution approach.
Section 5 demonstrates numerical experiments.
Section 6 concludes the study.
3. Learning ST-MFGs via PINOs
We introduce our PINO framework to capture population dynamics and system value functions. The dynamics of ST-MFGs are influenced by the initial densities and terminal value functions, and are bounded by the periodicity of x on the ring road, i.e., the density, value function, and control are periodic in x.
Figure 1 depicts the workflow of our proposed framework, where the PINO module employs a Fourier neural operator (FNO) to represent the population density and value function over time. The FNO is updated according to the physics residual of the Fokker–Planck–Kolmogorov (FPK) equation, which governs the interplay between population evolution and velocity control. The FNO also expresses the propagation of the system value function according to the Hamilton–Jacobi–Bellman (HJB) equation. The optimal velocity is then derived from the HJB equation given the population density. We elaborate on the PINO framework in the following subsections.
3.1. FPK Module
As shown in Figure 1, the initial density over the space domain in the ST-MFG serves as the FPK module input for the PINO. It is linearly merged with the HJB module input and processed by the FNO. The output of the FPK module is the population density over the spatiotemporal domain. The FNO is composed of N Fourier layers. Each layer performs a Fourier transform to capture frequency-domain features by decomposing the signals into Fourier modes. Following a linear transformation that filters out the high-frequency components, an inverse Fourier transform is applied to recover the signal. Furthermore, for each layer, a linear transformation is also applied to capture time-domain features. The combined output of these transformations is passed through a nonlinear activation function and fed into the subsequent Fourier layer.
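As a concrete illustration, the following is a minimal sketch of a single Fourier layer in PyTorch; the hyperparameter names `width` and `modes` are hypothetical, and the actual architecture and hyperparameters used in this paper may differ.

```python
import torch
import torch.nn as nn

class FourierLayer(nn.Module):
    """One FNO layer: spectral convolution (frequency path) + pointwise linear (local path)."""
    def __init__(self, width: int, modes: int):
        super().__init__()
        self.modes = modes  # number of low-frequency Fourier modes kept
        # Learned complex weights applied to the retained Fourier modes
        scale = 1.0 / (width * width)
        self.spectral_weight = nn.Parameter(
            scale * torch.randn(width, width, modes, dtype=torch.cfloat)
        )
        self.pointwise = nn.Conv1d(width, width, kernel_size=1)  # local linear transform

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        # v: (batch, width, n_grid) features over the spatial grid
        v_hat = torch.fft.rfft(v)                                  # Fourier transform
        out_hat = torch.zeros_like(v_hat)
        # keep only the lowest `modes` frequencies and mix channels there
        out_hat[:, :, :self.modes] = torch.einsum(
            "bim,iom->bom", v_hat[:, :, :self.modes], self.spectral_weight
        )
        spectral = torch.fft.irfft(out_hat, n=v.shape[-1])         # inverse Fourier transform
        return torch.relu(spectral + self.pointwise(v))            # nonlinear activation
```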
The training of the FNO for the FPK module follows the residual (marked in red in Figure 1) determined by the FPK equation of population evolution. The residual is obtained by summing the physics loss over a training set containing various initial densities. The first term of the physics loss measures the gap between the operator's output at time zero and the given initial density, while the second term measures the physical deviation from Equation (3). Two weight parameters are employed to tune the relative significance of these terms. The process culminates in the derivation of the optimal control u, which is obtained from the HJB module.
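A minimal sketch of such a two-term FPK physics loss is shown below. It assumes a deterministic continuity-type FPK equation and simple finite differences on a periodic grid; the actual FPK form, weights, and discretization in the paper may differ.

```python
import torch

def fpk_physics_loss(rho_pred, u, rho0, dx, dt, w_init=1.0, w_pde=1.0):
    """Two-term FPK physics loss (illustrative sketch).

    rho_pred: (nt, nx) predicted density over the spatiotemporal grid
    u:        (nt, nx) velocity field
    rho0:     (nx,)    given initial density
    """
    # Term 1: mismatch between the operator output at t = 0 and the initial density
    init_loss = torch.mean((rho_pred[0] - rho0) ** 2)

    # Term 2: residual of an assumed continuity-type FPK equation
    #         d rho/dt + d(rho * u)/dx = 0, with periodic central differences in x
    flux = rho_pred * u
    drho_dt = (rho_pred[1:] - rho_pred[:-1]) / dt
    dflux_dx = (torch.roll(flux, -1, dims=1) - torch.roll(flux, 1, dims=1)) / (2 * dx)
    pde_loss = torch.mean((drho_dt + dflux_dx[:-1]) ** 2)

    # Weight parameters tune the relative significance of the two terms
    return w_init * init_loss + w_pde * pde_loss
```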
3.2. HJB Module
We solve the HJB equation (Equation (5)) to determine the optimal velocity, given the population dynamics. Since the HJB equation is the backward counterpart of the FPK equation and the two share the same propagation rule, we use the same FNO as in the FPK module. Combined with the initial density, the terminal costs of the ST-MFG system are input into the FNO to obtain the costs (value function) over the spatiotemporal domain.
The training of the FNO for the HJB module follows the residual (marked in red in Figure 1) determined by the HJB equation. The residual is obtained by summing the physics loss over a training set containing various terminal values over the spatial domain. The first term of the physics loss measures the discrepancy between the operator's output at the terminal time and the given terminal values, while the second term measures the physical deviation from Equation (5). Two weight parameters are employed to tune the relative significance of these terms. The total residual of the FNO is thus the sum of the FPK and HJB residuals.
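Analogously, a sketch of the HJB loss and the combined residual is given below; the pointwise HJB residual is left as a placeholder callable, since Equation (5) is not reproduced here.

```python
import torch

def hjb_physics_loss(V_pred, hjb_residual_fn, V_T, w_term=1.0, w_pde=1.0):
    """Two-term HJB physics loss (illustrative sketch).

    V_pred:          (nt, nx) predicted value function
    hjb_residual_fn: callable returning the pointwise residual of the HJB
                     equation (Equation (5) in the paper, not reproduced here)
    V_T:             (nx,) given terminal values
    """
    # Term 1: mismatch between the operator output at t = T and the terminal values
    term_loss = torch.mean((V_pred[-1] - V_T) ** 2)
    # Term 2: residual of the HJB equation evaluated on the predicted value function
    pde_loss = torch.mean(hjb_residual_fn(V_pred) ** 2)
    return w_term * term_loss + w_pde * pde_loss

# Total residual used to update the FNO: sum of the FPK and HJB physics losses, e.g.,
# total_residual = fpk_physics_loss(...) + hjb_physics_loss(...)
```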
We calculate the optimal control after obtaining the population density and the value function. Numerical methods commonly employed for solving the HJB equation include backward induction [16], the Newton method [26], and variational inequality [27]. Learning-based methods, such as RL [28,29] and PIDL [25], can also be used to solve the HJB equation. In this work, we adopt backward induction, since the dynamics of the agents and the cost functions are known in the MFG system.
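As an illustration, a minimal backward-induction sweep over a discretized ring-road grid might look like the following; the running cost `running_cost` and the candidate velocity set `u_candidates` are hypothetical placeholders, not the paper's exact specification.

```python
import numpy as np

def backward_induction(V_T, rho, x_grid, t_grid, u_candidates, running_cost):
    """Backward induction for a deterministic HJB on a periodic (ring-road) grid.

    V_T:          (nx,) terminal values
    rho:          (nt, nx) population density
    u_candidates: 1D array of candidate velocities (hypothetical discretization)
    running_cost: running_cost(u, rho_value) -> instantaneous cost (hypothetical)
    Returns the value function V (nt, nx) and optimal control u_opt (nt, nx).
    """
    nt, nx = len(t_grid), len(x_grid)
    dt = t_grid[1] - t_grid[0]
    L = x_grid[-1] - x_grid[0] + (x_grid[1] - x_grid[0])  # ring-road length
    V = np.zeros((nt, nx))
    u_opt = np.zeros((nt, nx))
    V[-1] = V_T

    for n in range(nt - 2, -1, -1):          # sweep backward in time
        for i, x in enumerate(x_grid):
            best_cost, best_u = np.inf, 0.0
            for u in u_candidates:
                x_next = (x + u * dt) % L    # periodic position after one step
                V_next = np.interp(x_next, x_grid, V[n + 1], period=L)
                cost = running_cost(u, rho[n, i]) * dt + V_next
                if cost < best_cost:
                    best_cost, best_u = cost, u
            V[n, i], u_opt[n, i] = best_cost, best_u
    return V, u_opt
```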
4. Solution Approach
We develop Algorithm 1 based on the proposed scalable learning framework for the autonomous driving velocity control scenario. A population of autonomous vehicles (AVs) navigates a ring road (Figure 2). At a given time t, a generic AV selects a velocity at its position x on the ring road. This speed selection influences the dynamic evolution of the population density of AVs circulating on the ring road. In this model, all AVs are homogeneous, implying that any decision made by one AV is representative of the decisions made by others in the same conditions. The primary objective of this study is to devise an optimal speed control strategy that minimizes the overall cost incurred by a representative AV over the specified time interval. Through the application of mathematical models and optimization techniques, we seek the most cost-effective speed profiles that lead to an optimal distribution of AVs on the road, reducing congestion and improving the collective utility of the transportation system.
In Algorithm 1, we first initialize the neural operator and its parameters. During the i-th iteration of the training process, we sample a batch of initial population densities and terminal costs. We then use the FNO to generate the population density and value function over the entire spatiotemporal domain. From these outputs, we calculate the optimal speed. The parameters of the neural operator are updated according to the residual, and we check the convergence condition for the density and value function obtained in the current iteration. The training process proceeds to the next iteration until the convergence condition holds.
Algorithm 1 PINO for ST-MFG
1: Initialization: initialize the FNO and its parameters;
2: for i = 0 to I do
3:   Sample a batch of initial densities and terminal values from the training set;
4:   Generate the population density and value function over the spatiotemporal domain using the neural operator;
5:   for each density and value function generated by the FNO do
6:     for t = T down to 1 do
7:       Perform one backward-induction step to update the control;
8:     end for
9:     Obtain the optimal control and calculate the physics losses;
10:   end for
11:   Obtain the residual according to Equations (7) and (9);
12:   Update the neural operator according to Equations (8) and (10);
13:   Check convergence (Equation (11));
14: end for
15: Output the population density, value function, and optimal control.
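For readers who prefer code, a condensed Python sketch of the training loop of Algorithm 1 is given below. The FNO model, loss functions, and sampling and control helpers are the hypothetical components sketched earlier, and the optimizer settings and stopping rule are illustrative rather than the paper's exact choices (the paper checks convergence via its Equation (11)).

```python
import torch

def train_pino(fno, sample_batch, fpk_loss_fn, hjb_loss_fn, optimal_control_fn,
               n_iters=200, lr=1e-3, tol=1e-4):
    """Condensed training loop mirroring Algorithm 1 (illustrative sketch)."""
    optimizer = torch.optim.Adam(fno.parameters(), lr=lr)
    for i in range(n_iters):
        rho0_batch, VT_batch = sample_batch()        # step 3: sample initial densities / terminal values
        rho, V = fno(rho0_batch, VT_batch)           # step 4: density and value over space-time
        u = optimal_control_fn(rho, V)               # steps 5-10: optimal speed (e.g., backward induction)
        residual = fpk_loss_fn(rho, u, rho0_batch) + \
                   hjb_loss_fn(V, rho, u, VT_batch)  # step 11: total residual
        optimizer.zero_grad()
        residual.backward()                          # step 12: update the neural operator
        optimizer.step()
        if residual.item() < tol:                    # step 13: illustrative stopping rule
            break
    return fno
```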
We evaluate two PINN-based algorithms for solving ST-MFGs, as detailed in [25]. The first, a hybrid RL and PIDL approach, employs an iterative process in which the HJB equation is solved via the advantage actor–critic method and the FPK equation is addressed using a PINN. This approach allows for the dynamic updating of agents' actions and the population distribution in response to evolving game conditions, leveraging the strengths of both RL and PIDL. The second is a pure PIDL algorithm that updates the agents' states and the population density together using two PINNs. We refer to these baseline algorithms as "RL-PIDL" and "Pure-PIDL", respectively.
Both algorithms, RL-PIDL and Pure-PIDL, offer significant advantages over conventional methods [26,27] in the context of autonomous driving MFGs. Notably, they are not restricted by the spatiotemporal mesh granularity, enhancing their ability to learn the MFE efficiently. Pure-PIDL, in particular, demonstrates superior efficiency in scenarios where the dynamics of the environment are known, requiring less training time and memory than RL-PIDL. However, despite their advances over traditional numerical methods, both RL-PIDL and Pure-PIDL encounter challenges concerning the propagation of information from initial conditions. This limitation necessitates assigning and retraining new NNs for different initial population densities, a process that significantly hampers efficiency and scalability [7]. Conversely, our PINO framework effectively addresses these limitations, avoiding the memory and efficiency constraints observed in the baseline models.
5. Numerical Experiments
We conduct numerical experiments in the scenario shown in
Figure 2, which are designed to explore the dynamics of a MFG focused on AV navigating a circular road network, a scenario framed by specific initial and terminal conditions. The initial condition for this MFG is the distribution of AV population
over the ring road when
and the preference
for their location when
. To model the initial distribution
of AVs, we utilize sinusoidal wave functions characterized by the formula
, where
k represents the ordinary frequency and
denotes the phase of the wave. Both
k and
are randomly selected from uniform distributions, specifically
for
k and
for
. This approach ensures a wide range of sinusoidal patterns, reflecting diverse initial traffic distributions, as illustrated in
Figure 3, which shows a selection of initial density curves drawn from our training set
. We assume AVs have no preference for their locations at time
T, i.e.,
. The boundary condition for this MFG is the periodicity of
x.
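A sketch of how such sinusoidal initial densities could be sampled is given below; the frequency and phase ranges, the grid size, the amplitude, and the normalization are hypothetical choices, since the exact formula and uniform ranges are not reproduced here.

```python
import numpy as np

def sample_initial_density(n_grid=128, road_length=1.0,
                           k_range=(1.0, 3.0), phase_range=(0.0, 2 * np.pi),
                           rng=None):
    """Sample a sinusoidal initial density on a ring road (illustrative).

    The frequency k and the phase are drawn uniformly; the offset and scaling
    below are hypothetical and only ensure the density stays positive and
    integrates to one over the ring road.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.linspace(0.0, road_length, n_grid, endpoint=False)
    k = rng.uniform(*k_range)           # ordinary frequency
    phase = rng.uniform(*phase_range)   # phase of the wave
    rho0 = 1.0 + 0.5 * np.sin(2 * np.pi * k * x / road_length + phase)
    return x, rho0 / (rho0.sum() * (road_length / n_grid))  # normalize to unit mass
```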
Figure 4 demonstrates the performance of the algorithm in solving ST-MFGs; the x-axis represents the iteration index during training. Figure 4a displays the convergence gap. Figure 4b displays the 1-Wasserstein distance (W1 distance), which measures the closeness [30] between our results and the MFE (mean field equilibrium) obtained by numerical methods. Our proposed algorithm converges to the MFE after 200 iterations.
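For reference, the W1 distance between a predicted density and a numerically computed equilibrium density can be evaluated as sketched below using SciPy; the grid and density arrays are placeholders, and the densities are simply renormalized to unit mass before comparison.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def w1_to_equilibrium(x_grid, rho_pred, rho_mfe):
    """1-Wasserstein distance between two densities on the same spatial grid.

    Both densities are treated as weights over the grid points after
    normalization, so the result measures how far the predicted density
    is from the numerically obtained MFE density.
    """
    rho_pred = rho_pred / rho_pred.sum()
    rho_mfe = rho_mfe / rho_mfe.sum()
    return wasserstein_distance(x_grid, x_grid, u_weights=rho_pred, v_weights=rho_mfe)
```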
Figure 5 and Figure 6 show the population density and velocity choice at the MFE for an unseen initial density in two different ST-MFG settings. We implement our method on ST-MFGs with two cost functions (see the illustrative example after this list):
1. ST-MFG1: In this model, AVs tend to decelerate in high-density areas and accelerate in low-density areas.
2. ST-MFG2: The Lighthill–Whitham–Richards (LWR) model is a traditional traffic flow model in which the driving objective is to maintain a desired speed, specified as an arbitrary desired speed function of the density. It is straightforward to show that, at the MFE of the LWR model, vehicles maintain the desired speed on the road.
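For illustration only, a commonly used cost of this LWR type (not necessarily the exact cost adopted in this paper) penalizes deviation from the desired speed, with a Greenshields-type desired speed as an example:

```latex
% Illustrative LWR-style running cost (the paper's exact cost may differ)
c\big(u,\rho\big) \;=\; \tfrac{1}{2}\,\big(u - U(\rho)\big)^{2},
\qquad
U(\rho) \;=\; u_{\max}\Big(1 - \frac{\rho}{\rho_{\mathrm{jam}}}\Big),
```

so that the cost is minimized pointwise at $u = U(\rho)$, consistent with the statement above that vehicles maintain the desired speed at the MFE.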
Figure 5. Population density and speed choice at MFE for ST-MFG 1.
Figure 6. Population density and speed choice at MFE for ST-MFG 2.
In ST-MFG 1, the population of AVs incurs a penalty if they select the same velocity control. In Figures 5 and 6, the x-axis represents position x and the y-axis represents time t. Figure 5a,b and Figure 6a,b use one sampled initial density, while Figure 5c,d and Figure 6c,d use another. Compared to the equilibrium of ST-MFG 2, the population density and velocity in ST-MFG 1 dissipate without waveforms, demonstrating smoother traffic conditions at time T.
Table 1 compares the total training time (unit: s) required to solve ST-MFGs under various initial conditions using different learning methods.
We summarize the results from our numerical experiments as follows:
(1) Our proposed PINO needs fewer neural networks (NNs) and shorter training time than PINN-based algorithms, demonstrating the scalability of our method. This is because the input of the neural architecture in the PINN framework fails to capture varying boundary conditions, which leads to an iterative retraining process to solve MFEs. In contrast to PINNs, PINO generalizes across initial population densities and terminal costs over the space domain.
(2) In this study, the PINO-based learning method is designed to significantly enhance the adaptability of neural networks across various initial conditions. This advancement is pivotal for tackling large-scale Mean Field Games (MFGs), especially within graph-based frameworks. The applications of this method are broad and diverse, encompassing, but not limited to, managing autonomous vehicles on road networks, analyzing pedestrian or crowd movements, optimizing vehicle fleet network operations, routing internet packets efficiently, understanding social opinion trends, and tracking epidemiological patterns.
6. Conclusions
This work presents a scalable learning framework for solving coupled forward–backward PDE systems using a physics-informed neural operator (PINO). PINO allows for efficient training of the forward PDE with varying initial conditions. Compared to traditional physics-informed neural networks (PINNs), our proposed framework overcomes memory and efficiency limitations. We also demonstrate the efficiency of this method on a numerical example motivated by optimal autonomous driving control. The PINO-based framework offers a memory- and data-efficient approach for solving families of complex PDE systems that share the same structure and differ only in boundary conditions.
This work is motivated by the computational challenges faced by mean field games (MFGs). MFGs have gained increasing popularity in recent years in finance, economics, and engineering due to their power to model the strategic interactions among a large number of agents in multi-agent systems. The equilibria associated with MFGs, also known as mean field equilibria (MFE), are challenging to solve due to their coupled forward and backward PDE structure, which is why computational methods based on machine learning have gained momentum. The PINO-based learning method developed in this study enables the trained neural networks to generalize to various initial conditions, which holds the potential to solve large-scale MFGs, in particular graph-based applications, including but not limited to the optimization of autonomous vehicle navigation in complex road networks, the management of pedestrian or crowd movements in urban environments, the efficient coordination of vehicle fleet networks, the strategic routing of Internet packets to enhance network efficiency, the analysis of social opinion dynamics to understand societal trends, and the study of epidemiological models to predict the spread of diseases.