1. Introduction
Large sparse linear algebraic systems are ubiquitous in scientific and engineering computation, arising, for example, from the discretization of partial differential equations (PDEs) and the linearization of nonlinear problems. Designing efficient, robust, and adaptive numerical methods for solving them is a long-term challenge. Iterative methods are an effective way to solve such systems. They can be classified into single-level and multi-level methods. There are two types of single-level methods: stationary and non-stationary [
1]. Due to sluggish convergence, stationary methods, such as weighted Jacobi, Gauss–Seidel and successive over-relaxation methods [
2] are frequently utilized as smoothers in multi-level approaches or as preconditioners. Non-stationary methods typically refer to Krylov subspace methods, such as conjugate gradient (CG) and generalized minimal residual (GMRES) methods [
3,
4], whose convergence rate is heavily influenced by certain factors, such as the initial value. Multi-level methods mainly comprise the geometric multigrid (GMG) method [
5,
6,
7] and the algebraic multigrid (AMG) method [
8,
9]. Both are affected by many factors, such as the smoother and the coarse grid correction, which heavily influence convergence. Identifying these factors for a concrete problem is an art that requires extensive analysis, innovation, and trial.
In recent years, the technique of automatically picking parameters for Krylov and multi-level methods and constructing a learnable iterative scheme based on deep learning has attracted much interest. Many neural solvers have achieved satisfactory results for second-order elliptic equations with smooth coefficients. Hsieh et al. [
10] utilized a convolutional neural network (CNN) to accelerate convergence of the Jacobi method. Luna et al. [
11] accelerated the convergence of GMRES with a learned initial value. Zhang et al. [
12] combined standard relaxation methods and the DeepONet [
13] to target distinct regions in the spectrum of eigenmodes. Significant efforts have also been made in the development of multigrid solvers, such as learning the smoother and the transfer operator [
14,
15] and coarse-fine splitting [
16].
Huang et al. [
17] exploited a CNN to design a more sensible smoother for anisotropic elliptic equations. The results showed that the magnitude of the learned smoother was dispersed along the anisotropic direction. Wang et al. [
18] introduced a learning-based local weighted least-squares method for the AMG interpolation operator and applied it to random diffusion equations and one-dimensional Helmholtz equations with small wavenumbers. Fanaskov [19] constructed the learned smoother and transfer operator of GMG in neural network form.
When the anisotropic strength is mild (within two orders of magnitude), the studies referred to above demonstrate considerable acceleration. Chen et al. [
20] proposed the Meta-MgNet to learn a basis vector of Krylov subspace as the smoother of GMG for strong anisotropic cases. However, the convergence rate was still sensitive to the anisotropic strength. For convection–diffusion equations, Katrutsa et al. [
21] trained the weighted Jacobi smoother and transfer operator of GMG, which had a positive effect on the upwind discretization system and was also applied to solve a one-dimensional Helmholtz equation. For second-order elliptic equations with random diffusion coefficients, Greenfeld et al. [
22] employed a residual network to construct the prolongation operator of AMG for uniform grids. Luz et al. [
23] extended it to non-uniform grids using graph neural networks, which outperformed classical AMG methods. For jumping coefficient problems, Antonietti et al. [
24] presented a neural network to forecast the strong connection parameter to speed up AMG and used it as a preconditioner for CG. For the Helmholtz equation, Stanziola et al. [
25] constructed a fully learnable neural solver, the helmnet, which was built on U-net and a recurrent neural network [
26]. Azulay et al. [
27] developed a preconditioner based on U-net and shift-Laplacian MG [
28] and applied the flexible GMRES [
29] to solve the discrete system. For solid and fluid mechanics equations, several neural solver methods for associated discrete systems have been proposed, such as, but not limited to, learning initial values [
30,
31], constructing preconditioners [
32], learning the search directions of CG [
33], and learning the parameters of GMG [
34,
35].
In this paper, we propose a Fourier neural solver (FNS), a deep learning and fast Fourier transform (FFT)-based [
36] neural solver. The FNS is made up of two modules: a stationary method and a frequency space correction. Since stationary methods, such as the weighted Jacobi method, have difficulty eliminating low-frequency error, the FNS uses FFT and CNN to learn these modes in the frequency space. Local Fourier analysis (LFA) [
5] has shown that the FNS can pick up the error components in frequency space that are challenging to eradicate using stationary methods. In this way, the stationary method and the CNN-based frequency correction play complementary roles in eliminating the error. With the help of the FFT, the single-step iteration of the FNS has an $\mathcal{O}(N \log N)$ computational complexity. All matrix-vector products are implemented using convolution, which is both storage-efficient and straightforward to parallelize. We investigated the effectiveness and robustness of the FNS on three types of convection–diffusion–reaction equations. For anisotropic diffusion equations, numerical experiments showed that the FNS was able to reduce the number of iterations by nearly a factor of ten compared with the state-of-the-art Meta-MgNet when the anisotropic strength was relatively strong. For non-symmetric systems arising from the convection–diffusion equation discretized by the central difference method, the FNS can converge, while the MG and CG methods diverge. In addition, the FNS is faster than other algorithms, such as GMRES and BiCGSTAB(ℓ) [37]. For indefinite systems arising from the Helmholtz equation, the FNS outperforms GMRES and BiCGSTAB for medium wavenumbers. In this paper, we apply the FNS to the above three PDE systems. However, the principles employed by the FNS indicate that it has the potential to be useful for a broad range of sparse linear algebraic systems.
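As a minimal illustration of such a convolution-based matrix–vector product (a sketch under our own assumptions, not code from the paper), the snippet below applies the standard five-point stencil to a grid function with homogeneous Dirichlet boundaries.

```python
import numpy as np
from scipy.ndimage import convolve

def laplacian_matvec(u):
    """Apply the five-point stencil to a 2D grid function u via convolution,
    instead of assembling the sparse matrix; zero padding encodes
    homogeneous Dirichlet boundary values."""
    stencil = np.array([[0., -1., 0.],
                        [-1., 4., -1.],
                        [0., -1., 0.]])
    return convolve(u, stencil, mode='constant', cval=0.0)

Au = laplacian_matvec(np.random.rand(64, 64))  # same result as the sparse mat-vec
```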
The remainder of this paper is organized as follows:
Section 2 proposes a general form of linear convection–diffusion–reaction equation and describes the motivation for designing the FNS.
Section 3 presents the FNS algorithm.
Section 4 examines the performance of the FNS with respect to anisotropy, convection–diffusion, and the Helmholtz equations. Finally,
Section 5 describes the conclusions and potential future work.
2. Motivation
We consider the general linear convection–diffusion–reaction equation with a Dirichlet boundary condition
$$
\begin{cases}
-\nabla \cdot \left( \alpha(\boldsymbol{x}) \nabla u \right) + \boldsymbol{\beta}(\boldsymbol{x}) \cdot \nabla u + \gamma(\boldsymbol{x})\, u = f & \text{in } \Omega, \\
u = g & \text{on } \partial \Omega,
\end{cases}
\tag{1}
$$
where $\Omega \subset \mathbb{R}^{d}$ is an open and bounded domain, $\alpha(\boldsymbol{x})$ is the $d \times d$ diffusion coefficient matrix, $\boldsymbol{\beta}(\boldsymbol{x})$ is the velocity field that the quantity is moving with, $\gamma(\boldsymbol{x})$ is the reaction coefficient, and $f$ is the source term.
We can obtain a linear algebraic system once we discretize Equation (1) by the finite element method (FEM) or finite difference method (FDM):
$$
A u = f,
\tag{2}
$$
where $A \in \mathbb{R}^{N \times N}$, $u, f \in \mathbb{R}^{N}$, and $N$ is the number of spatial discrete degrees of freedom.
Classical stationary iterative methods, such as the Gauss–Seidel and weighted Jacobi methods, have the generic form
$$
u^{(k+1)} = u^{(k)} + B \left( f - A u^{(k)} \right),
\tag{3}
$$
where $B$ is an easily computed operator, such as the inverse of the diagonal matrix (Jacobi method) or the inverse of the lower triangular matrix (Gauss–Seidel method). However, the convergence rate of such methods is relatively low. As an example, we utilize the weighted Jacobi method to solve a special case of Equation (1) and use LFA to analyze the rationale.
Taking $\alpha = I$, $\boldsymbol{\beta} = \boldsymbol{0}$ and $\gamma = 0$, Equation (1) becomes the Poisson equation. With a linear FEM discretization, in stencil notation, the resulting discrete operator reads
$$
A =
\begin{bmatrix}
 & -1 & \\
-1 & 4 & -1 \\
 & -1 &
\end{bmatrix}.
\tag{4}
$$
In the weighted Jacobi method, $B = \omega D^{-1}$, where $D = \operatorname{diag}(A) = 4I$ and $I$ is the identity matrix. Equation (3) can be written in the pointwise form
$$
u^{(k+1)}_{i,j} = u^{(k)}_{i,j} + \frac{\omega}{4} \left( f_{i,j} + u^{(k)}_{i-1,j} + u^{(k)}_{i+1,j} + u^{(k)}_{i,j-1} + u^{(k)}_{i,j+1} - 4 u^{(k)}_{i,j} \right).
\tag{5}
$$
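A minimal NumPy sketch of this pointwise sweep is given below; it assumes the interior unknowns are stored as a 2D array with homogeneous Dirichlet boundaries, and the default weight is an illustrative choice.

```python
import numpy as np

def weighted_jacobi_step(u, f, omega=0.8):
    """One weighted Jacobi sweep, Equation (5), for the five-point stencil (4);
    zeros outside the array encode homogeneous Dirichlet boundary values."""
    up = np.pad(u, 1)  # zero halo for the Dirichlet boundary
    Au = (4.0 * up[1:-1, 1:-1]
          - up[:-2, 1:-1] - up[2:, 1:-1]
          - up[1:-1, :-2] - up[1:-1, 2:])
    return u + (omega / 4.0) * (f - Au)
```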
Let $u^{*}$ be the true solution and define the error $e^{(k)} = u^{*} - u^{(k)}$. Then, we have
$$
e^{(k+1)}_{i,j} = e^{(k)}_{i,j} - \frac{\omega}{4} \left( 4 e^{(k)}_{i,j} - e^{(k)}_{i-1,j} - e^{(k)}_{i+1,j} - e^{(k)}_{i,j-1} - e^{(k)}_{i,j+1} \right).
\tag{6}
$$
Expanding the error in a Fourier series $e^{(k)}_{i,j} = \sum_{\boldsymbol{\theta}} c^{(k)}_{\boldsymbol{\theta}} e^{\iota (i \theta_1 + j \theta_2)}$ and substituting the general term $c^{(k)}_{\boldsymbol{\theta}} e^{\iota (i \theta_1 + j \theta_2)}$, $\boldsymbol{\theta} = (\theta_1, \theta_2) \in [-\pi, \pi)^{2}$, into Equation (6), we have
$$
c^{(k+1)}_{\boldsymbol{\theta}} = \left( 1 - \omega \left( \sin^{2}\frac{\theta_1}{2} + \sin^{2}\frac{\theta_2}{2} \right) \right) c^{(k)}_{\boldsymbol{\theta}}.
$$
The convergence factor of the weighted Jacobi method (also known as the smoothing factor in the MG framework [7]) is
$$
\mu(\boldsymbol{\theta}) = \left| 1 - \omega \left( \sin^{2}\frac{\theta_1}{2} + \sin^{2}\frac{\theta_2}{2} \right) \right|.
$$
Figure 1a shows the distribution of the convergence factor $\mu$ of the weighted Jacobi method in solving a linear system for the Poisson equation. For a better understanding, define the low-frequency region $T^{\mathrm{low}} = [-\pi/2, \pi/2)^{2}$ and the high-frequency region $T^{\mathrm{high}} = [-\pi, \pi)^{2} \setminus T^{\mathrm{low}}$, as shown in Figure 1b. It can be seen that, in the high-frequency region, $\mu$ is approximately zero, whereas, in the low-frequency region, $\mu$ is close to one. As a result, the weighted Jacobi method attenuates high-frequency errors quickly but is mostly ineffective for low-frequency errors.
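The behaviour above can be reproduced with a few lines of NumPy; the weight `omega = 0.8` is an illustrative choice and not necessarily the value used for Figure 1.

```python
import numpy as np

theta = np.linspace(-np.pi, np.pi, 201)
t1, t2 = np.meshgrid(theta, theta, indexing='ij')

# LFA convergence (smoothing) factor of weighted Jacobi for the five-point stencil
omega = 0.8
mu = np.abs(1.0 - omega * (np.sin(t1 / 2) ** 2 + np.sin(t2 / 2) ** 2))

low = (np.abs(t1) < np.pi / 2) & (np.abs(t2) < np.pi / 2)  # low-frequency region
print("max mu on high frequencies:", mu[~low].max())  # bounded well below 1
print("max mu on low  frequencies:", mu[low].max())   # approaches 1
```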
The explanation has two aspects. Firstly, high-frequency components reflect local oscillations, while low-frequency components reflect the global pattern. Since $A$ is sparse and the basic operation of the weighted Jacobi method is local, it is challenging to remove low-frequency, global error components. Secondly, although $A$ is sparse, $A^{-1}$ is commonly dense, which means that the mapping from $f$ to $u$ is non-local, making it difficult to approximate with local operations.
Therefore, we should seek the solution in another space to obtain an effective approximation of the non-local mapping. For example, the Krylov method approximates the solution in a subspace spanned by a basis set. The MG generates a coarse space to broaden the receptive field of the local operation. However, as mentioned in
Section 1, these methods require the careful design of various parameters. In this paper, we propose the FNS, a generic solver that uses FFT to learn solutions in frequency space, with the parameters obtained automatically in a data-driven manner.
3. Fourier Neural Solver
Denote stationary iterative methods of (3) in an operator form
$$
u^{(k+1)} = \Phi \left( u^{(k)}, f \right),
$$
and the $k$-th step residual
$$
r^{(k)} = f - A u^{(k)};
$$
then the $k$-th step error $e^{(k)} = u^{*} - u^{(k)}$ satisfies the residual equation
$$
A e^{(k)} = r^{(k)}.
$$
As shown in the preceding section, the slow convergence rate of stationary methods is due to the difficulty of reducing low-frequency errors. In many cases, even high-frequency errors might not be effectively eliminated by the stationary method. We employ stationary methods to rapidly erase some components of the error and use the FFT to convert the remaining error components to frequency space. The resulting solver is the Fourier neural solver.
Figure 2 shows a flowchart of the $k$-th step of the FNS. The module for solving the residual equation in frequency space consists of three steps: FFT → Hadamard product → IFFT. The weights used in the Hadamard product are the output of a hyper-neural network (HyperNN), whose input is the set of PDE parameters corresponding to the discrete system. During training, the parameters of the HyperNN are the only optimization variables.
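For concreteness, the following minimal NumPy sketch of one FNS step reuses `laplacian_matvec` and `weighted_jacobi_step` from the earlier sketches; the function name `fns_step`, the array `theta_hat` standing in for the HyperNN output, and the weight value are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def fns_step(u, f, theta_hat, omega=0.8):
    """One FNS iteration (sketch): a stationary sweep followed by a
    frequency-space correction. `theta_hat` plays the role of the HyperNN
    output: one complex weight per Fourier mode of the residual."""
    # (1) stationary method: damp the error components it handles well
    u = weighted_jacobi_step(u, f, omega)
    # (2) residual of the current iterate (convolution-based mat-vec)
    r = f - laplacian_matvec(u)
    # (3) frequency-space correction: FFT -> Hadamard product -> IFFT
    e = np.real(np.fft.ifft2(np.fft.fft2(r) * theta_hat))
    return u + e
```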
The three-step operation of the frequency-space correction was inspired by the fast Poisson solver [38]. Let the eigenvalues and eigenvectors of $A$ be $\lambda_{i}$ and $\boldsymbol{v}_{i}$, respectively. Solving Equation (2) entails three steps:
1. Expand the right-hand side in the eigenvector basis, $f = \sum_{i} \hat{f}_{i} \boldsymbol{v}_{i}$;
2. Divide each coefficient by the corresponding eigenvalue, $\hat{u}_{i} = \hat{f}_{i} / \lambda_{i}$;
3. Assemble the solution, $u = \sum_{i} \hat{u}_{i} \boldsymbol{v}_{i}$.
In particular, when $A$ is the system generated by the five-point stencil (4), its eigenvectors are sine functions. The first and third steps above can then be performed with a computational complexity of $\mathcal{O}(N \log N)$ using the fast sine transform (based on the FFT). The computational complexity of each iteration of the FNS is therefore $\mathcal{O}(N \log N)$.
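As an illustration of these three steps for the five-point stencil (4) with homogeneous Dirichlet boundaries, the following SciPy sketch solves the discrete system with the type-I discrete sine transform; the function name and array shapes are illustrative assumptions.

```python
import numpy as np
from scipy.fft import dstn, idstn

def fast_poisson_solve(f):
    """Solve A u = f for the 2D five-point stencil (4) via the three steps:
    (1) expand f in the sine eigenbasis, (2) divide by the eigenvalues,
    (3) transform back. Each transform costs O(N log N)."""
    n1, n2 = f.shape
    k1 = np.arange(1, n1 + 1)
    k2 = np.arange(1, n2 + 1)
    # eigenvalues of the stencil (4): 4 sin^2 + 4 sin^2, no h^2 scaling
    lam = (4 * np.sin(k1 * np.pi / (2 * (n1 + 1))) ** 2)[:, None] \
        + (4 * np.sin(k2 * np.pi / (2 * (n2 + 1))) ** 2)[None, :]
    f_hat = dstn(f, type=1)      # step 1: coefficients in the sine basis
    u_hat = f_hat / lam          # step 2: divide by the eigenvalues
    return idstn(u_hat, type=1)  # step 3: transform back

u = fast_poisson_solve(np.random.rand(63, 63))  # laplacian_matvec(u) ~ f
```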
It is worth noting that, although the stationary method can smooth some components of the error, which components are removed is problem-dependent. As a result, instead of simply filtering out high-frequency modes in frequency space, the frequency-space correction module learns the error components that the stationary method cannot easily eliminate. For a discrete system with a fixed stencil, we can use LFA to demonstrate that the learned correction is complementary to the stationary method.
The loss function used here for training is the relative residual
$$
\mathcal{L} = \frac{1}{M} \sum_{i=1}^{M} \frac{\left\| f_{i} - A_{i} u_{i}^{(K)} \right\|_{2}}{\left\| f_{i} \right\|_{2}},
\tag{12}
$$
where $\{(A_{i}, f_{i})\}_{i=1}^{M}$ are the training data, $M$ is the batch size, and $K$ indicates that the $K$-th step solution $u^{(K)}$ is used to calculate the loss. These specific values will be given in the next section. The training and testing algorithms of the FNS are summarized in Algorithms 1 and 2, respectively.
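A minimal PyTorch-style sketch of loss (12) is given below; the function and argument names, and the batched tensor shapes, are illustrative assumptions rather than the paper's code.

```python
import torch

def relative_residual_loss(A_matvec, u_K, f):
    """Loss (12): mean relative residual ||f - A u^(K)||_2 / ||f||_2 over a batch.
    `A_matvec` applies the discrete operator (e.g., as a convolution);
    `u_K` and `f` are batched tensors of shape (batch, n, n)."""
    r = f - A_matvec(u_K)
    num = torch.linalg.vector_norm(r, dim=(-2, -1))
    den = torch.linalg.vector_norm(f, dim=(-2, -1))
    return (num / den).mean()
```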
Algorithm 1: FNS offline training.
Data: PDE parameters and corresponding discrete systems
Input: HyperNN (with initial parameters), number of FNS steps K, and number of epochs
1   for epoch = 1, …, Epochs do (serial)
2       for each training sample i do (parallel)
3           frequency-space weights ← HyperNN(PDE parameters of sample i)
4           u ← zeros like f_i
5           for k = 1, …, K do (serial)
6               u ← one sweep of the stationary method applied to (u, f_i)
7               r ← f_i − A_i u
8               r̂ ← FFT(r)
9               ê ← frequency-space weights ⊙ r̂   (Hadamard product)
10              e ← IFFT(ê)
11              u ← u + e
12          end
13      end
14      Compute loss function (12)
15      Apply Adam algorithm [39] to update the HyperNN parameters
16  end
17  return learned FNS
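For readers who prefer code, a compact PyTorch-style skeleton of Algorithm 1 is sketched below, reusing `relative_residual_loss` from above; `hypernn`, `fns_step_torch`, the data format, and the default hyperparameters are hypothetical choices, not the paper's interface.

```python
import torch

def train_fns(hypernn, fns_step_torch, batches, K=10, epochs=100, lr=1e-3):
    """Offline training skeleton following Algorithm 1. `batches` yields
    (eta, A_matvec, f): PDE parameters, operator, and right-hand sides."""
    opt = torch.optim.Adam(hypernn.parameters(), lr=lr)
    for _ in range(epochs):                    # serial over epochs
        for eta, A_matvec, f in batches:       # samples handled as a parallel batch
            theta_hat = hypernn(eta)           # frequency-space weights
            u = torch.zeros_like(f)
            for _ in range(K):                 # K differentiable FNS steps
                u = fns_step_torch(u, f, A_matvec, theta_hat)
            loss = relative_residual_loss(A_matvec, u, f)   # loss (12)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return hypernn
```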
Algorithm 2: FNS online testing.
5. Conclusions and Future Work
This paper proposes an interpretable FNS to solve large sparse linear systems. It is composed of a stationary method and a frequency correction, which are used to eliminate errors in different Fourier modes. Numerical experiments undertaken showed that the FNS was more effective and robust than other solvers in solving the anisotropic diffusion equation, the convection–diffusion equation and the Helmholtz equation. The core concepts discussed here are relevant to a broad range of systems.
There is still a great deal of work to do. First, we only considered uniform meshes in this paper. We intend to generalize the FNS to non-uniform grids by exploiting geometric deep learning tools, such as graph neural networks and the graph Fourier transform. Secondly, as previously discussed, the stationary method converges slowly or diverges in some situations, which has prompted researchers to approximate solutions in other transform spaces. This is true for almost all advanced iterative methods, including MG, Krylov subspace methods and the FNS. This specified space, however, may not always be the best choice. In the future, we will investigate additional potential transforms, such as the Chebyshev and Legendre transforms, and potentially learnable, data-driven transforms.