Parallel Algorithm for Solving the Inverse Two-Dimensional Fractional Diffusion Problem of Identifying the Source Term

Akimova, Elena N.; Sultanov, Murat A.; Misilov, Vladimir E.; Nurlanuly, Yerkebulan

doi:10.3390/fractalfract7110801


¹ Ural Branch of RAS, Krasovskii Institute of Mathematics and Mechanics, S. Kovalevskaya Street 16, Ekaterinburg 620108, Russia

² Department of Information Technologies and Control Systems, Institute of Radioelectronics and Information Technology, Ural Federal University, Mira Street 19, Ekaterinburg 620002, Russia

³ Department of Mathematics, Faculty of Natural Science, Khoja Akhmet Yassawi International Kazakh-Turkish University, Turkistan 160200, Kazakhstan

⁴ Department of High Performance Computing Technologies, Institute of Natural Sciences and Mathematics, Ural Federal University, Mira Street 19, Ekaterinburg 620002, Russia

* Author to whom correspondence should be addressed.

Fractal Fract. 2023, 7(11), 801; https://doi.org/10.3390/fractalfract7110801

Submission received: 11 August 2023 / Revised: 14 October 2023 / Accepted: 30 October 2023 / Published: 2 November 2023

(This article belongs to the Special Issue Fractional Diffusion Equations: Numerical Analysis, Modeling and Application)


Abstract

This paper is devoted to the development of a parallel algorithm for solving the inverse problem of identifying the space-dependent source term in the two-dimensional fractional diffusion equation. The inverse problem is solved by the regularized iterative conjugate gradient method. At each iteration of the method, an auxiliary direct initial-boundary value problem must be solved. By using a finite difference scheme, this problem is reduced to solving a large system of linear algebraic equations with a block-tridiagonal matrix at each time step. Solving these systems takes almost the entire computation time. To solve them, we construct and implement the direct parallel matrix sweep algorithm. We establish stability and correctness for this algorithm. The parallel implementations are developed for multicore CPUs using the OpenMP technology. Numerical experiments are performed to study the performance of the parallel implementations.

Keywords:

time-fractional diffusion equation; Caputo fractional derivative; inverse problems; source term identification; finite-difference scheme; block-elimination method; parallel matrix sweep method; parallel computing

1. Introduction

Numerous physical phenomena exhibit memory and nonlocal effects [1]. Mathematical models of these phenomena often rely on fractional calculus [2]. Various fractional derivative operators have distinct definitions and properties. Fractional differential equations are applied in a wide range of fields such as anomalous diffusion [3,4,5], viscoelasticity [6], ferroelectric media [7], fractional multi-pole and neuron modelling [8], and fractional Lévy motion [9]. The review paper [10] presents a comprehensive survey of real-world applications of fractional calculus in various fields, namely, physics; control, signal and image processing; mechanics and dynamic systems; biology; environmental science; material studies; economics; and engineering.

Solving direct and inverse problems for differential equations with fractional derivatives typically incurs substantial computational costs because the operators are nonlocal. Various numerical techniques are available for the approximate solution of initial-boundary value problems for fractional differential equations [11,12,13,14], for example, the finite difference method. An established approach to reducing the computation time is parallel computing [15,16,17]. Problems for fractional differential equations may be solved using various original parallel algorithms [18,19].

Additionally, while methods for direct problems are usually well developed, inverse problems pose significant challenges in obtaining stable solutions [20]. Here, by a direct problem, we mean the classical initial-boundary value problem of finding the unknown function from the equation and the given boundary and initial conditions. Inverse problems include identifying unknown coefficients of an equation or unknown boundary or initial conditions [21]. In this case, a priori information is used to ensure the uniqueness of the solution. Such problems are often ill-posed, and regularization methods are used to obtain stable approximate solutions. Hence, the development and implementation of parallel algorithms for solving inverse problems arising from fractional differential equations is of great importance.

In [22], an algorithm for solving the inverse problem of identifying the space-dependent source is constructed on the basis of the regularized Landweber iterative method. An iterative algorithm based on the conjugate gradient method is constructed in [23]; to smooth the values, the authors used the Savitzky–Golay filter.

In [24], existence, uniqueness, and stability estimates are established for the inverse problem of recovering a space-dependent source term for a generalized subdiffusion equation. The Tikhonov regularization method was proposed for reconstructing a time-dependent source term in a time-fractional diffusion-wave equation [25]. Convergence estimates and parameter choice rules for Tikhonov regularization for the inverse source problem for a time-fractional diffusion equation were proposed in [26].

In our previous works [27,28], parallel algorithms for solving direct and inverse problems for the one-dimensional time-fractional diffusion equation are constructed on the basis of the parallel sweep method for solving systems of linear algebraic equations with tridiagonal matrices.

The goal of our work is to construct and implement an efficient algorithm for solving the inverse problem of identifying the stationary source term for the two-dimensional time-fractional diffusion equation. To solve the inverse problem, we apply the regularized iterative conjugate gradient method. It requires solving an auxiliary direct subproblem at each iteration. By using a finite difference scheme, we reduce the direct problem to solving a series of systems of linear algebraic equations (SLAEs) with block-tridiagonal matrices. Note that solving these SLAEs for the two-dimensional fractional diffusion problem takes most of the computation time (up to 99% of the total). We construct the direct parallel matrix sweep method for solving such systems and establish its stability and correctness. We have developed a parallel algorithm and a parallel code for solving the inverse problem. The code is intended for multicore processors and is implemented in C++17 with the OpenMP 4.0 extension. To evaluate the validity of the developed numerical methods and the performance of the parallel algorithms, numerical experiments are conducted. In the future, we plan to use our algorithm and code for solving applied problems.

Below is a breakdown of the article’s structure. In Section 2, we formulate the direct and inverse problems and introduce the discretization and difference scheme for solving the direct initial-boundary value problem. We demonstrate the block-tridiagonal structure of the coefficient matrix of the SLAE. In Section 3, we describe the direct methods for solving SLAEs with the block-tridiagonal matrix. We establish the stability and correctness of the parallel matrix sweep algorithm. In Section 4, we describe the conjugate gradient algorithm for solving the inverse problem. Section 5 describes the development of the parallel code that implements the numerical algorithms described above. Section 6 presents the results of the performed numerical experiments. In Section 7, we discuss these results. Section 8 concludes our work.

2. Problem

2.1. Statement of the Problem

Consider the basic time-fractional partial differential equation with an elliptic spatial operator in the following form:

\frac{\partial^{α} U}{\partial t^{α}} + L U = ψ (x) η (x, t), x \in Ω, 0 < t \leq T .

(1)

Here, $x = (x_{1}, x_{2}, \dots, x_{d}) \in \bar{Ω} = \prod_{i = 1}^{d} [0, γ_{i}]$, $0 < α < 1$, $η (x, t) \neq 0$, and $L$ is an elliptic operator

L U = - \sum_{i = 1}^{d} \frac{\partial}{\partial x_{i}} (k_{i} (x, t) \frac{\partial U}{\partial x_{i}}), x_{i} \in (0, γ_{i}), 0 < t \leq T .

The boundary condition is

U (x, t) = 0, x \in \partial Ω .

(2)

The initial condition is

U (x, 0) = g_{0} (x), x \in Ω .

(3)

where $g_{0} (x)$ is a given function.

For simplicity, we have formulated our fractional diffusion equation with homogeneous Dirichlet boundary conditions. Note that a problem with inhomogeneous boundary conditions may be reduced to a problem with homogeneous conditions. To do this, we represent the function $U$ as the sum $U (x, t) = U_{0} (x, t) + V (x, t)$ of some function $U_{0}$ that satisfies the inhomogeneous boundary conditions (for example, found by solving the Dirichlet problem for Laplace’s equation) and a remainder function $V$ that satisfies the homogeneous conditions. The algorithms and approaches presented below may also be applied to more general formulations with Neumann or mixed boundary conditions.

For this study, we consider the Caputo fractional derivative of order $α$ in the form [29]

D^{α} f (x) = \frac{1}{Γ (m - α)} \int_{0}^{x} \frac{f^{(m)} (t)}{{(x - t)}^{α - m + 1}} d t,

with $α \in (m - 1, m)$, $m \in \mathbb{N}$, $x > 0$.

Assuming that the solution $U (x, t)$ exists and satisfies the Dirichlet boundary conditions, for the case of $0 < α < 1$ ($m = 1$) we can consider the following formula for the Caputo fractional partial derivative:

\frac{\partial^{α} U (x, t)}{\partial t^{α}} = \frac{1}{Γ (1 - α)} \int_{0}^{t} \frac{\partial U (x, s)}{\partial s} {(t - s)}^{- α} d s .

(4)
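As a quick sanity check of formula (4), it can be evaluated by hand for a function that does not depend on $x$. The choice $U = t$ below is an illustrative assumption, not an example taken from the experiments in this paper.

```latex
\frac{\partial^{α} t}{\partial t^{α}}
  = \frac{1}{Γ(1 - α)} \int_{0}^{t} (t - s)^{-α} \, ds
  = \frac{t^{1 - α}}{(1 - α) \, Γ(1 - α)}
  = \frac{t^{1 - α}}{Γ(2 - α)},
\qquad
α = \tfrac{1}{2}: \quad
\frac{\partial^{1/2} t}{\partial t^{1/2}} = \frac{2 \sqrt{t}}{\sqrt{π}} .
```

The last equality uses $Γ(3/2) = \sqrt{π} / 2$. A closed form of this kind is convenient as a reference value when testing a discrete approximation of the derivative.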

The direct initial-boundary value problem consists in finding the unknown function $U (x, t)$ when all other components of Equation (1) are known.

In the present work, we study the inverse problem of reconstructing the space-dependent factor $ψ (x)$ of the right-hand side. Thus, the problem consists in finding the pair of unknown functions $[U (x, t), ψ (x)]$. Additional information for the inverse problem is given in the form of the final overdetermination condition

U (x, T) = φ (x), x \in Ω .

(5)

Conditions for the uniqueness of the solution of this inverse problem for a general multi-dimensional equation are formulated in [30].

2.2. Discretization of Equation and Difference Scheme

In this paper, for simplicity, we consider the case of $d = 2$ and $k_{1} = k_{2} = 1$. For an arbitrary elliptic operator, the general structure of the coefficient matrix of the SLAE remains the same, and the methods and algorithms presented below remain applicable.

Equation (1) will take the following form:

\frac{\partial^{α} U (x_{1}, x_{2}, t)}{\partial t^{α}} = \frac{\partial^{2} U (x_{1}, x_{2}, t)}{\partial x_{1}^{2}} + \frac{\partial^{2} U (x_{1}, x_{2}, t)}{\partial x_{2}^{2}} + ψ (x_{1}, x_{2}) η (x_{1}, x_{2}, t),

(6)

where $U (x_{1}, x_{2}, t)$ and $ψ (x_{1}, x_{2})$ are the sought functions; $a (x_{1}, x_{2}), b (x_{1}, x_{2}), c (x_{1}, x_{2})$, $η (x_{1}, x_{2}, t)$ are the given functions; and $0 < α < 1$ is the order of the fractional derivative.

The problem is considered in the domain $Ω : 0 \leq x_{1} \leq γ_{1}, 0 \leq x_{2} \leq γ_{2}$ on the time interval $0 \leq t \leq T$.

To discretize Equation (6), we partition the solution domain. On the intervals $[0, γ_{1}]$ and $[0, γ_{2}]$, we introduce uniform grids with $n$ and $N$ subintervals, respectively. The steps are $h_{1} = Δ x_{1} = γ_{1} / n$ and $h_{2} = Δ x_{2} = γ_{2} / N$. For the time interval $[0, T]$, we use a uniform grid with $\tilde{N}$ subintervals. The step is $τ = Δ t = T / \tilde{N}$. We denote the nodes as $x_{1, i_{1}} = i_{1} h_{1}, x_{2, i_{2}} = i_{2} h_{2}, i_{1} \in \{0, 1, \dots, n\}, i_{2} \in \{0, 1, \dots, N\}$, and $t_{j} = j τ, j \in \{0, 1, \dots, \tilde{N}\}$. The values of the discretized function $U$ and of the other functions are denoted as $U_{i_{1}, i_{2}, j} = U (x_{1, i_{1}}, x_{2, i_{2}}, t_{j})$.

In this work, the Grünwald–Letnikov formula [2] is used for approximating the Caputo fractional derivative in Equation (6):

\begin{matrix} D_{t}^{α} U_{i_{1}, i_{2}, j} ≅ σ_{α, τ} \sum_{ℓ = 1}^{j} w_{ℓ}^{(α)} (U_{i_{1}, i_{2}, j - ℓ + 1} - U_{i_{1}, i_{2}, j - ℓ}), \\ σ_{α, τ} = \frac{1}{Γ (1 - α) (1 - α) τ^{α}}, w_{ℓ}^{(α)} = ℓ^{1 - α} - {(ℓ - 1)}^{1 - α} . \end{matrix}

(7)
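The approximation (7) can be tried out on functions whose Caputo derivative is known in closed form. The following Python sketch is only an illustration under stated assumptions: the function name `caputo_approx` and the test functions $u = t$ and $u = t^{2}$ are hypothetical choices, not part of the authors' C++ code.

```python
import math

def caputo_approx(u, tau, alpha):
    """Approximate the Caputo derivative of order alpha at t_j = j * tau
    from the samples u[0..j] (j = len(u) - 1), following formula (7)."""
    j = len(u) - 1
    sigma = 1.0 / (math.gamma(1.0 - alpha) * (1.0 - alpha) * tau**alpha)
    total = 0.0
    for l in range(1, j + 1):
        w = l ** (1.0 - alpha) - (l - 1) ** (1.0 - alpha)   # weight w_l
        total += w * (u[j - l + 1] - u[j - l])
    return sigma * total

alpha, tau, steps = 0.5, 0.01, 100
t = [k * tau for k in range(steps + 1)]

# u(t) = t: the exact derivative is t^(1 - alpha) / Gamma(2 - alpha)
approx_linear = caputo_approx(t, tau, alpha)
exact_linear = t[-1] ** (1.0 - alpha) / math.gamma(2.0 - alpha)

# u(t) = t^2: the exact derivative is 2 t^(2 - alpha) / Gamma(3 - alpha)
approx_quad = caputo_approx([s * s for s in t], tau, alpha)
exact_quad = 2.0 * t[-1] ** (2.0 - alpha) / math.gamma(3.0 - alpha)
```

For the linear function the weights telescope, so the two values agree up to rounding; for the quadratic one a small discretization error remains and shrinks as `tau` decreases.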

The implicit two-step finite difference scheme is used for approximating Equation (6). For the grid point $(x_{1, i_{1}}, x_{2, i_{2}})$ at the time level $t_{j}$, the difference equation has the form

\begin{matrix} σ_{α, τ} \sum_{ℓ = 1}^{j} w_{ℓ}^{(α)} (U_{i_{1}, i_{2}, j - ℓ + 1} - U_{i_{1}, i_{2}, j - ℓ}) = \\ = \frac{U_{i_{1} - 1, i_{2}, j} - 2 U_{i_{1}, i_{2}, j} + U_{i_{1} + 1, i_{2}, j}}{h_{1}^{2}} + \frac{U_{i_{1}, i_{2} - 1, j} - 2 U_{i_{1}, i_{2}, j} + U_{i_{1}, i_{2} + 1, j}}{h_{2}^{2}} + ψ_{i_{1}, i_{2}} η_{i_{1}, i_{2}, j} . \end{matrix}

(8)

2.3. Constructing the SLAE

Let us apply the following transformations:

\begin{matrix} σ_{α, τ} (U_{i_{1}, i_{2}, j} - U_{i_{1}, i_{2}, j - 1}) + σ_{α, τ} \sum_{ℓ = 2}^{j} w_{ℓ}^{(α)} (U_{i_{1}, i_{2}, j - ℓ + 1} - U_{i_{1}, i_{2}, j - ℓ}) = \\ = r_{i_{1}, i_{2}} U_{i_{1}, i_{2} - 1, j} + p_{i_{1}, i_{2}} U_{i_{1} - 1, i_{2}, j} + (- 2 p_{i_{1}, i_{2}} - 2 r_{i_{1}, i_{2}}) U_{i_{1}, i_{2}, j} + \\ + p_{i_{1}, i_{2}} U_{i_{1} + 1, i_{2}, j} + r_{i_{1}, i_{2}} U_{i_{1}, i_{2} + 1, j} + ψ_{i_{1}, i_{2}} η_{i_{1}, i_{2}, j}; \end{matrix}

\begin{matrix} - r_{i_{1}, i_{2}} U_{i_{1}, i_{2} - 1, j} - p_{i_{1}, i_{2}} U_{i_{1} - 1, i_{2}, j} + q_{i_{1}, i_{2}} U_{i_{1}, i_{2}, j} - p_{i_{1}, i_{2}} U_{i_{1} + 1, i_{2}, j} - r_{i_{1}, i_{2}} U_{i_{1}, i_{2} + 1, j} = \\ = σ_{α, τ} (U_{i_{1}, i_{2}, j - 1} - \sum_{ℓ = 2}^{j} w_{ℓ}^{(α)} (U_{i_{1}, i_{2}, j - ℓ + 1} - U_{i_{1}, i_{2}, j - ℓ})) + ψ_{i_{1}, i_{2}} η_{i_{1}, i_{2}, j}, \end{matrix}

where

p_{i_{1}, i_{2}} = \frac{1}{h_{1}^{2}}, r_{i_{1}, i_{2}} = \frac{1}{h_{2}^{2}}, q_{i_{1}, i_{2}} = σ_{α, τ} + 2 p_{i_{1}, i_{2}} + 2 r_{i_{1}, i_{2}} .

Let us denote

\begin{matrix} f_{i_{1}, i_{2}, j} = σ_{α, τ} (U_{i_{1}, i_{2}, j - 1} - \sum_{ℓ = 2}^{j} w_{ℓ}^{(α)} (U_{i_{1}, i_{2}, j - ℓ + 1} - U_{i_{1}, i_{2}, j - ℓ})) + ψ_{i_{1}, i_{2}} η_{i_{1}, i_{2}, j}, j > 1, \\ f_{i_{1}, i_{2}, 1} = σ_{α, τ} U_{i_{1}, i_{2}, 0} + ψ_{i_{1}, i_{2}} η_{i_{1}, i_{2}, 1} . \end{matrix}

(9)

The difference equation will take the form

- r_{i_{1}, i_{2}} U_{i_{1}, i_{2} - 1, j} - p_{i_{1}, i_{2}} U_{i_{1} - 1, i_{2}, j} + q_{i_{1}, i_{2}} U_{i_{1}, i_{2}, j} - p_{i_{1}, i_{2}} U_{i_{1} + 1, i_{2}, j} - r_{i_{1}, i_{2}} U_{i_{1}, i_{2} + 1, j} = f_{i_{1}, i_{2}, j} .

(10)
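The passage from scheme (8) to the five-point form (10) is a pure rearrangement, which can be checked numerically at a single grid node. The Python sketch below does this for random data. It assumes that the central coefficient is the sum of $σ_{α, τ}$, $2 / h_{1}^{2}$, and $2 / h_{2}^{2}$, and that the history term of the right-hand side starts from the value at level $j - 1$. The helper names `coefficients`, `weight`, and `rhs_f` are hypothetical.

```python
import math
import random

def coefficients(alpha, tau, h1, h2):
    """Return sigma, p, r, q for constant unit diffusion coefficients."""
    sigma = 1.0 / (math.gamma(1.0 - alpha) * (1.0 - alpha) * tau**alpha)
    p = 1.0 / h1**2
    r = 1.0 / h2**2
    q = sigma + 2.0 * p + 2.0 * r
    return sigma, p, r, q

def weight(l, alpha):
    """Weight w_l of formula (7)."""
    return l ** (1.0 - alpha) - (l - 1) ** (1.0 - alpha)

def rhs_f(hist, j, sigma, alpha, source):
    """Right-hand side f at one node and time level j.
    hist[k] is the node value at level k; source stands for psi * eta."""
    memory = sum(weight(l, alpha) * (hist[j - l + 1] - hist[j - l])
                 for l in range(2, j + 1))      # empty sum for j = 1
    return sigma * (hist[j - 1] - memory) + source

random.seed(0)
alpha, tau, h1, h2, j = 0.7, 0.05, 0.1, 0.2, 6
sigma, p, r, q = coefficients(alpha, tau, h1, h2)
centre = [random.random() for _ in range(j + 1)]    # node history, levels 0..j
west, east, south, north = (random.random() for _ in range(4))  # level j
source = random.random()

# Residual of scheme (8): fractional derivative minus Laplacian minus source
lhs8 = sigma * sum(weight(l, alpha) * (centre[j - l + 1] - centre[j - l])
                   for l in range(1, j + 1))
rhs8 = ((west - 2.0 * centre[j] + east) / h1**2
        + (south - 2.0 * centre[j] + north) / h2**2 + source)
res8 = lhs8 - rhs8

# Residual of the five-point form (10)
lhs10 = -r * south - p * west + q * centre[j] - p * east - r * north
res10 = lhs10 - rhs_f(centre, j, sigma, alpha, source)
```

Both residuals coincide up to rounding for arbitrary data, which confirms that the two forms describe the same equation.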

Note that the values $U_{0, i_{2}, j}, U_{n, i_{2}, j}$, $i_{2} \in \{0, 1, \dots, N\}$, and $U_{i_{1}, 0, j}, U_{i_{1}, N, j}$, $i_{1} \in \{0, 1, \dots, n\}$, at the boundaries are given. Then, for all inner points $(x_{1, i_{1}}, x_{2, i_{2}})$, $i_{1} \in \{1, 2, \dots, n - 1\}$, $i_{2} \in \{1, 2, \dots, N - 1\}$, difference Equation (10) constitutes an SLAE

A Y = F,

(11)

where $A$ is the block-tridiagonal matrix of dimension $(n - 1) (N - 1) \times (n - 1) (N - 1)$

A = \begin{bmatrix} C_{1} & - B_{1} & & & \\ - A_{2} & C_{2} & - B_{2} & & \\ & \ddots & \ddots & \ddots & \\ & & - A_{N - 2} & C_{N - 2} & - B_{N - 2} \\ & & & - A_{N - 1} & C_{N - 1} \end{bmatrix} .

The blocks $A_{i_{2}}, B_{i_{2}}, C_{i_{2}}$, $i_{2} \in \{1, 2, \dots, N - 1\}$, of dimension $(n - 1) \times (n - 1)$ are defined as

A_{i_{2}} = B_{i_{2}} = \begin{bmatrix} r_{1, i_{2}} & & & \\ & r_{2, i_{2}} & & \\ & & \ddots & \\ & & & r_{n - 1, i_{2}} \end{bmatrix},

C_{i_{2}} = \begin{bmatrix} q_{1, i_{2}} & - p_{1, i_{2}} & & & \\ - p_{2, i_{2}} & q_{2, i_{2}} & - p_{2, i_{2}} & & \\ & \ddots & \ddots & \ddots & \\ & & - p_{n - 2, i_{2}} & q_{n - 2, i_{2}} & - p_{n - 2, i_{2}} \\ & & & - p_{n - 1, i_{2}} & q_{n - 1, i_{2}} \end{bmatrix} .

The sought vector consists of the values $U_{i_{1}, i_{2}, j}$ collapsed in row-major order

\begin{matrix} Y = {[Y_{1}, Y_{2}, \dots, Y_{N - 1}]}^{⊺}, \\ Y_{1} = [U_{1, 1, j}, U_{2, 1, j}, \dots, U_{n - 2, 1, j}, U_{n - 1, 1, j}]; \\ Y_{2} = [U_{1, 2, j}, U_{2, 2, j}, \dots, U_{n - 2, 2, j}, U_{n - 1, 2, j}]; \\ \dots \\ Y_{N - 1} = [U_{1, N - 1, j}, U_{2, N - 1, j}, \dots, U_{n - 1, N - 1, j}] . \end{matrix}

The right-hand side vector is constructed similarly from the values $f_{i_{1}, i_{2}, j}$ defined in (9), taking into account the boundary points

\begin{matrix} F = {[F_{1}, F_{2}, \dots, F_{N - 1}]}^{⊺}; \\ F_{1} = [f_{1, 1, j} + r_{1, 1} U_{1, 0, j} + p_{1, 1} U_{0, 1, j}, f_{2, 1, j} + r_{2, 1} U_{2, 0, j}, \dots, \\ f_{n - 2, 1, j} + r_{n - 2, 1} U_{n - 2, 0, j}, f_{n - 1, 1, j} + r_{n - 1, 1} U_{n - 1, 0, j} + p_{n - 1, 1} U_{n, 1, j}]; \\ F_{2} = [f_{1, 2, j} + p_{1, 2} U_{0, 2, j}, f_{2, 2, j}, \dots, f_{n - 2, 2, j}, f_{n - 1, 2, j} + p_{n - 1, 2} U_{n, 2, j}]; \\ \dots \\ F_{N - 1} = [f_{1, N - 1, j} + r_{1, N - 1} U_{1, N, j} + p_{1, N - 1} U_{0, N - 1, j}, f_{2, N - 1, j} + r_{2, N - 1} U_{2, N, j}, \dots, \\ f_{n - 1, N - 1, j} + r_{n - 1, N - 1} U_{n - 1, N, j} + p_{n - 1, N - 1} U_{n, N - 1, j}] . \end{matrix}

To numerically solve the initial-boundary value problem, we need to solve system (11) at each successive time level $j = 1, \dots, \tilde{N}$.
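The block structure of the matrix of system (11) can be inspected on a small grid. The Python sketch below assembles the matrix densely, with one unknown per inner node $i_{1} = 1, \dots, n - 1$, $i_{2} = 1, \dots, N - 1$, stored in row-major order. It is meant only as an illustration: the function name `assemble_matrix` and the parameter values are hypothetical, and a practical code would store only the nonzero blocks.

```python
def assemble_matrix(n, N, sigma, h1, h2):
    """Dense matrix of the five-point system for the inner nodes.
    Unknown (i1, i2) has index (i2 - 1) * (n - 1) + (i1 - 1)."""
    p, r = 1.0 / h1**2, 1.0 / h2**2
    q = sigma + 2.0 * p + 2.0 * r
    m = n - 1                              # size of one diagonal block
    size = m * (N - 1)
    A = [[0.0] * size for _ in range(size)]
    for i2 in range(1, N):
        for i1 in range(1, n):
            row = (i2 - 1) * m + (i1 - 1)
            A[row][row] = q                # diagonal of block C
            if i1 > 1:
                A[row][row - 1] = -p       # sub-diagonal of block C
            if i1 < n - 1:
                A[row][row + 1] = -p       # super-diagonal of block C
            if i2 > 1:
                A[row][row - m] = -r       # diagonal block -A
            if i2 < N - 1:
                A[row][row + m] = -r       # diagonal block -B
    return A

n, N, sigma = 4, 5, 2.0
A = assemble_matrix(n, N, sigma, 0.25, 0.2)
size = len(A)

# Margin of strict diagonal dominance over all rows
min_margin = min(A[i][i] - sum(abs(v) for k, v in enumerate(A[i]) if k != i)
                 for i in range(size))
is_symmetric = all(A[i][k] == A[k][i] for i in range(size) for k in range(size))
```

In this setting every row is strictly diagonally dominant with a margin of at least `sigma`, the value attained at nodes that have four inner neighbours.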

3. Numerical Methods for Solving the SLAE

To solve the SLAEs with various matrix structures, different numerical methods may be used. In this work, for solving the block-tridiagonal SLAE (11) we will use the block-elimination method [31] and parallel matrix sweep method [32].

3.1. Block-Elimination Method

Let us consider an SLAE with a block tridiagonal matrix in the following form:

\{\begin{matrix} C_{0} {\bar{Y}}_{0} - B_{0} {\bar{Y}}_{1} & = & {\bar{F}}_{0}, & i = 0, \\ - A_{i} {\bar{Y}}_{i - 1} + C_{i} {\bar{Y}}_{i} - B_{i} {\bar{Y}}_{i + 1} & = & {\bar{F}}_{i}, & i = 1, 2, \dots, N - 1, \\ - A_{N} {\bar{Y}}_{N - 1} + C_{N} {\bar{Y}}_{N} & = & {\bar{F}}_{N}, & i = N, \end{matrix}

(12)

where ${\bar{Y}}_{i}$ are the sought vectors of dimension $n$; ${\bar{F}}_{i}$ are the given right-hand side vectors of dimension $n$; and $A_{i}, B_{i}, C_{i}$ are square matrices of dimension $n \times n$.

The direct block-elimination (or matrix sweep) method is intended for solving SLAEs with block-tridiagonal matrices. The auxiliary coefficients $α_{i}$ (matrices of dimension $n \times n$) and $β_{i}$ (vectors of dimension $n$) are found by the recurrence formulae (the forward elimination phase)

\begin{matrix} α_{0} = C_{0}^{- 1} B_{0}, α_{i} = {(C_{i} - A_{i} α_{i - 1})}^{- 1} B_{i}, i = 1, 2, \dots, N - 1, \\ β_{0} = C_{0}^{- 1} {\bar{F}}_{0}, β_{i} = {(C_{i} - A_{i} α_{i - 1})}^{- 1} ({\bar{F}}_{i} + A_{i} β_{i - 1}), i = 1, 2, \dots, N . \end{matrix}

(13)

The solution vectors ${\bar{Y}}_{i}$ are found by the formulae (the backward substitution phase)

\begin{matrix} {\bar{Y}}_{N} = β_{N}, {\bar{Y}}_{i} = α_{i} {\bar{Y}}_{i + 1} + β_{i}, i = N - 1, N - 2, \dots, 1, 0 . \end{matrix}

(14)
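The forward and backward phases (13) and (14) translate into a short routine. The following Python sketch uses plain lists and a small Gaussian elimination helper instead of explicit inverses, and it assumes the forward recurrence uses the coefficient $α_{i - 1}$ from the previous step. All function names are hypothetical; this is not the authors' C++ implementation, and vectors are stored as $n \times 1$ columns to keep the helpers short.

```python
def solve(M, R):
    """Solve M X = R (M is n x n, R is n x k) by Gaussian elimination
    with partial pivoting; M is assumed to be nonsingular."""
    n, k = len(M), len(R[0])
    aug = [list(M[i]) + list(R[i]) for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for i in range(c + 1, n):
            fac = aug[i][c] / aug[c][c]
            for t in range(c, n + k):
                aug[i][t] -= fac * aug[c][t]
    X = [[0.0] * k for _ in range(n)]
    for i in reversed(range(n)):
        for t in range(k):
            s = aug[i][n + t] - sum(aug[i][c] * X[c][t] for c in range(i + 1, n))
            X[i][t] = s / aug[i][i]
    return X

def matmul(P, Q):
    return [[sum(P[i][s] * Q[s][t] for s in range(len(Q)))
             for t in range(len(Q[0]))] for i in range(len(P))]

def matadd(P, Q):
    return [[x + y for x, y in zip(rp, rq)] for rp, rq in zip(P, Q)]

def matsub(P, Q):
    return [[x - y for x, y in zip(rp, rq)] for rp, rq in zip(P, Q)]

def block_sweep(A, B, C, F):
    """Solve system (12); A[0] and B[N] are not used."""
    N = len(C) - 1
    alpha, beta = [solve(C[0], B[0])], [solve(C[0], F[0])]
    for i in range(1, N + 1):                       # forward elimination
        D = matsub(C[i], matmul(A[i], alpha[i - 1]))
        if i < N:
            alpha.append(solve(D, B[i]))
        beta.append(solve(D, matadd(F[i], matmul(A[i], beta[i - 1]))))
    Y = [None] * (N + 1)
    Y[N] = beta[N]
    for i in range(N - 1, -1, -1):                  # backward substitution
        Y[i] = matadd(matmul(alpha[i], Y[i + 1]), beta[i])
    return Y

# Manufactured test: pick a solution, build F from it, and solve again.
n, N = 3, 4
p, r, sigma = 4.0, 9.0, 2.5
q = sigma + 2 * p + 2 * r
Cblk = [[q if a == b else (-p if abs(a - b) == 1 else 0.0) for b in range(n)]
        for a in range(n)]
Rblk = [[r if a == b else 0.0 for b in range(n)] for a in range(n)]
A, B, C = [Rblk] * (N + 1), [Rblk] * (N + 1), [Cblk] * (N + 1)
exact = [[[float(i + a + 1)] for a in range(n)] for i in range(N + 1)]
F = []
for i in range(N + 1):
    col = matmul(C[i], exact[i])
    if i > 0:
        col = matsub(col, matmul(A[i], exact[i - 1]))
    if i < N:
        col = matsub(col, matmul(B[i], exact[i + 1]))
    F.append(col)
Y = block_sweep(A, B, C, F)
max_err = max(abs(Y[i][a][0] - exact[i][a][0])
              for i in range(N + 1) for a in range(n))
```

Solving with the matrix $C_{i} - A_{i} α_{i - 1}$ rather than forming its inverse is a design choice of this sketch; the formulae themselves are unchanged.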

Remark 1.

The algorithms (13) and (14) are correct if the matrices $C_{0}$ and $(C_{i} - A_{i} α_{i - 1})$ are nonsingular for $i = 1, 2, \dots, N$. The algorithm is stable if $∥α_{i}∥ \leq 1$ for $i = 0, 1, \dots, N - 1$.

A sufficient condition for the stability of this algorithm is the following.

Lemma 1.

If the matrices $C_{i}$, $i = 0, 1, \dots, N$, are nonsingular, the matrices $A_{i}, B_{i}$, $i = 1, \dots, N - 1$, are non-null, and the conditions

∥ C_{0}^{- 1} B_{0} ∥ \leq 1, ∥ C_{N}^{- 1} A_{N} ∥ \leq 1, ∥ C_{i}^{- 1} A_{i} ∥ + ∥ C_{i}^{- 1} B_{i} ∥ \leq 1, i = 1, \dots, N - 1,

(15)

are satisfied, where at least one of the inequalities is strict, then the algorithms (13) and (14) are stable and correct.
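To see how condition (15) can be checked in practice, the following sketch applies it to blocks of the kind that arise from difference Equation (10). It assumes constant coefficients $p$, $r$ and $q = σ + 2 p + 2 r$ with $σ > 0$, blocks $A_{i} = B_{i} = r I$ and $C_{i} = \mathrm{tridiag}(- p, q, - p)$ of size $m \times m$, and the spectral norm. These are illustrative assumptions, and the argument is not taken from the paper.

```latex
λ_{k}(C_{i}) = q - 2 p \cos\frac{k π}{m + 1} > q - 2 p = σ + 2 r,
\quad k = 1, \dots, m,
\\
∥ C_{i}^{-1} ∥_{2} = \frac{1}{\min_{k} λ_{k}(C_{i})} < \frac{1}{σ + 2 r},
\\
∥ C_{i}^{-1} A_{i} ∥_{2} + ∥ C_{i}^{-1} B_{i} ∥_{2}
  = 2 r \, ∥ C_{i}^{-1} ∥_{2} < \frac{2 r}{σ + 2 r} < 1 .
```

The first and last block rows satisfy the one-sided conditions with the bound $r / (σ + 2 r)$, so under these assumptions all inequalities in (15) hold strictly.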

Remark 2.

The coefficients $α_{i}, β_{i}$ depend on the previous ones $α_{i - 1}, β_{i - 1}$. Thus, these calculations cannot be distributed among independent workers, and the parallelization is limited to the operations of matrix inversion and multiplication.

3.2. Parallel Matrix Sweep Method

To construct a direct parallel algorithm, let us split the index range $i = 0, 1, \dots, N$ into $L$ subintervals of length $M$ such that $N = L \times M$. Consider the unknown vectors ${\bar{Y}}_{K}$, $K = 0, M, \dots, N$, as parameters.

Now, let us construct the reduced SLAE for ${\bar{Y}}_{K}$. To do this, consider the following problems on the interval $(K, K + M)$:

\{\begin{matrix} - A_{i} {\bar{U}}_{i - 1}^{1} + C_{i} {\bar{U}}_{i}^{1} - B_{i} {\bar{U}}_{i + 1}^{1} = 0, & {\bar{U}}_{K}^{1} = {(1, 0, \dots, 0)}^{⊺}, & {\bar{U}}_{K + M}^{1} = {(0, 0, \dots, 0)}^{⊺}, \\ \vdots & \vdots & \vdots \\ - A_{i} {\bar{U}}_{i - 1}^{n} + C_{i} {\bar{U}}_{i}^{n} - B_{i} {\bar{U}}_{i + 1}^{n} = 0, & {\bar{U}}_{K}^{n} = {(0, 0, \dots, 1)}^{⊺}, & {\bar{U}}_{K + M}^{n} = {(0, 0, \dots, 0)}^{⊺}, \end{matrix}

(16)

\{\begin{matrix} - A_{i} {\bar{V}}_{i - 1}^{1} + C_{i} {\bar{V}}_{i}^{1} - B_{i} {\bar{V}}_{i + 1}^{1} = 0, & {\bar{V}}_{K}^{1} = {(0, 0, \dots, 0)}^{⊺}, & {\bar{V}}_{K + M}^{1} = {(1, 0, \dots, 0)}^{⊺}, \\ \vdots & \vdots & \vdots \\ - A_{i} {\bar{V}}_{i - 1}^{n} + C_{i} {\bar{V}}_{i}^{n} - B_{i} {\bar{V}}_{i + 1}^{n} = 0, & {\bar{V}}_{K}^{n} = {(0, 0, \dots, 0)}^{⊺}, & {\bar{V}}_{K + M}^{n} = {(0, 0, \dots, 1)}^{⊺}, \end{matrix}

(17)

\{\begin{matrix} - A_{i} {\bar{W}}_{i - 1} + C_{i} {\bar{W}}_{i} - B_{i} {\bar{W}}_{i + 1} = {\bar{F}}_{i}, & {\bar{W}}_{K} = {(0, 0, \dots, 0)}^{⊺}, & {\bar{W}}_{K + M} = {(0, 0, \dots, 0)}^{⊺}, \end{matrix}

(18)

where $i = K + 1, \dots, K + M - 1$.

Theorem 1.

If ${\bar{U}}_{i}^{1}, \dots, {\bar{U}}_{i}^{n}$ are solutions of auxiliary problem (16); ${\bar{V}}_{i}^{1}, \dots, {\bar{V}}_{i}^{n}$ are solutions of problem (17); ${\bar{W}}_{i}$ are solutions of problem (18); and ${\bar{Y}}_{i}$ are solutions of the basic problem (12) on the interval $(K, K + M)$, then

{\bar{Y}}_{i} = ({\bar{U}}_{i}^{1} {\bar{U}}_{i}^{2} \dots {\bar{U}}_{i}^{n}) {\bar{Y}}_{K} + ({\bar{V}}_{i}^{1} {\bar{V}}_{i}^{2} \dots {\bar{V}}_{i}^{n}) {\bar{Y}}_{K + M} + {\bar{W}}_{i} .

(19)

Proof.

Consider system (12) on the inner subinterval $(K, K + M)$:

- A_{i} {\bar{Y}}_{i - 1} + C_{i} {\bar{Y}}_{i} - B_{i} {\bar{Y}}_{i + 1} = {\bar{F}}_{i}, i = K + 1, \dots, K + M - 1 .

(20)

This system contains the parameters ${\bar{Y}}_{K}$ and ${\bar{Y}}_{K + M}$.

Let us rewrite (20) in the form

\{\begin{matrix} C_{K + 1} {\bar{Y}}_{K + 1} - B_{K + 1} {\bar{Y}}_{K + 2} = A_{K + 1} {\bar{Y}}_{K} + {\bar{F}}_{K + 1}, \\ - A_{i} {\bar{Y}}_{i - 1} + C_{i} {\bar{Y}}_{i} - B_{i} {\bar{Y}}_{i + 1} = {\bar{F}}_{i}, i = K + 2, \dots, K + M - 2, \\ - A_{K + M - 1} {\bar{Y}}_{K + M - 2} + C_{K + M - 1} {\bar{Y}}_{K + M - 1} = B_{K + M - 1} {\bar{Y}}_{K + M} + {\bar{F}}_{K + M - 1} . \end{matrix}

(21)

In system (21), the products $A_{K + 1} {\bar{Y}}_{K}$ and $B_{K + M - 1} {\bar{Y}}_{K + M}$ have the following form:

A_{K + 1} {\bar{Y}}_{K} = \begin{pmatrix} A_{K + 1}^{11} & \dots & A_{K + 1}^{1 n} \\ \vdots & \ddots & \vdots \\ A_{K + 1}^{n 1} & \dots & A_{K + 1}^{n n} \end{pmatrix} \begin{pmatrix} Y_{K}^{1} \\ \vdots \\ Y_{K}^{n} \end{pmatrix} = \begin{pmatrix} A_{K + 1}^{11} \\ \vdots \\ A_{K + 1}^{n 1} \end{pmatrix} Y_{K}^{1} + \dots + \begin{pmatrix} A_{K + 1}^{1 n} \\ \vdots \\ A_{K + 1}^{n n} \end{pmatrix} Y_{K}^{n} = {\bar{A}}_{K + 1}^{1} Y_{K}^{1} + {\bar{A}}_{K + 1}^{2} Y_{K}^{2} + \dots + {\bar{A}}_{K + 1}^{n} Y_{K}^{n};

B_{K + M - 1} {\bar{Y}}_{K + M} = \begin{pmatrix} B_{K + M - 1}^{11} & \dots & B_{K + M - 1}^{1 n} \\ \vdots & \ddots & \vdots \\ B_{K + M - 1}^{n 1} & \dots & B_{K + M - 1}^{n n} \end{pmatrix} \begin{pmatrix} Y_{K + M}^{1} \\ \vdots \\ Y_{K + M}^{n} \end{pmatrix} = \begin{pmatrix} B_{K + M - 1}^{11} \\ \vdots \\ B_{K + M - 1}^{n 1} \end{pmatrix} Y_{K + M}^{1} + \dots + \begin{pmatrix} B_{K + M - 1}^{1 n} \\ \vdots \\ B_{K + M - 1}^{n n} \end{pmatrix} Y_{K + M}^{n} = {\bar{B}}_{K + M - 1}^{1} Y_{K + M}^{1} + {\bar{B}}_{K + M - 1}^{2} Y_{K + M}^{2} + \dots + {\bar{B}}_{K + M - 1}^{n} Y_{K + M}^{n} .

(22)

Here, a bar over $A$ or $B$ with a single superscript denotes the corresponding column of the matrix, and $Y_{K}^{1}, \dots, Y_{K}^{n}$ are the components of the vector ${\bar{Y}}_{K}$.

Taking into account formulae (22), SLAE (21) takes the following form:

\begin{pmatrix} C_{K + 1} & - B_{K + 1} & & \\ - A_{K + 2} & C_{K + 2} & - B_{K + 2} & \\ & \ddots & \ddots & \ddots \\ & & - A_{K + M - 1} & C_{K + M - 1} \end{pmatrix} \begin{pmatrix} {\bar{Y}}_{K + 1} \\ {\bar{Y}}_{K + 2} \\ \vdots \\ {\bar{Y}}_{K + M - 1} \end{pmatrix} =

= \begin{pmatrix} {\bar{F}}_{K + 1} \\ {\bar{F}}_{K + 2} \\ \vdots \\ {\bar{F}}_{K + M - 1} \end{pmatrix} + \begin{pmatrix} {\bar{A}}_{K + 1}^{1} \\ 0 \\ \vdots \\ 0 \end{pmatrix} Y_{K}^{1} + \dots + \begin{pmatrix} {\bar{A}}_{K + 1}^{n} \\ 0 \\ \vdots \\ 0 \end{pmatrix} Y_{K}^{n} +

+ \begin{pmatrix} 0 \\ \vdots \\ 0 \\ {\bar{B}}_{K + M - 1}^{1} \end{pmatrix} Y_{K + M}^{1} + \dots + \begin{pmatrix} 0 \\ \vdots \\ 0 \\ {\bar{B}}_{K + M - 1}^{n} \end{pmatrix} Y_{K + M}^{n} .

(23)

System (23) is equivalent to the following one:

Λ \bar{Y} = \bar{F} + {\bar{A}}^{1} Y_{K}^{1} + \dots + {\bar{A}}^{n} Y_{K}^{n} + {\bar{B}}^{1} Y_{K + M}^{1} + \dots + {\bar{B}}^{n} Y_{K + M}^{n},

(24)

where $Λ$ is the submatrix of the basic block-tridiagonal matrix of system (12) that corresponds to the interval $(K, K + M)$.

If the matrix $Λ$ is invertible, then

\bar{Y} = Λ^{- 1} \bar{F} + Λ^{- 1} {\bar{A}}^{1} Y_{K}^{1} + \dots + Λ^{- 1} {\bar{A}}^{n} Y_{K}^{n} + Λ^{- 1} {\bar{B}}^{1} Y_{K + M}^{1} + \dots + Λ^{- 1} {\bar{B}}^{n} Y_{K + M}^{n} .

From this, it follows that

\bar{Y} = {\bar{U}}^{1} Y_{K}^{1} + \dots + {\bar{U}}^{n} Y_{K}^{n} + {\bar{V}}^{1} Y_{K + M}^{1} + \dots + {\bar{V}}^{n} Y_{K + M}^{n} + \bar{W},

(25)

where

\begin{matrix} {\bar{U}}^{k} \text{ is a solution of } Λ {\bar{U}}^{k} = {\bar{A}}^{k}, k = 1, \dots, n, \\ {\bar{V}}^{k} \text{ is a solution of } Λ {\bar{V}}^{k} = {\bar{B}}^{k}, k = 1, \dots, n, \\ \bar{W} \text{ is a solution of } Λ \bar{W} = \bar{F} . \end{matrix}

(26)

Let us write ${\bar{U}}^{1}, \dots, {\bar{U}}^{n}, {\bar{V}}^{1}, \dots, {\bar{V}}^{n}, \bar{W}$ in more detail:

{\bar{U}}^{k} = \begin{pmatrix} {\bar{U}}_{K + 1}^{k} \\ \vdots \\ {\bar{U}}_{K + M - 1}^{k} \end{pmatrix}, {\bar{V}}^{k} = \begin{pmatrix} {\bar{V}}_{K + 1}^{k} \\ \vdots \\ {\bar{V}}_{K + M - 1}^{k} \end{pmatrix}, k = 1, \dots, n, \bar{W} = \begin{pmatrix} {\bar{W}}_{K + 1} \\ \vdots \\ {\bar{W}}_{K + M - 1} \end{pmatrix} .

(27)

Taking into account (23) and (27), problems (26) are equivalent to the following:

\{\begin{matrix} C_{K + 1} {\bar{U}}_{K + 1}^{1} - B_{K + 1} {\bar{U}}_{K + 2}^{1} = {\bar{A}}_{K + 1}^{1}, \\ \dots \\ - A_{K + M - 1} {\bar{U}}_{K + M - 2}^{1} + C_{K + M - 1} {\bar{U}}_{K + M - 1}^{1} = 0; \end{matrix}

\dots

\{\begin{matrix} C_{K + 1} {\bar{U}}_{K + 1}^{n} - B_{K + 1} {\bar{U}}_{K + 2}^{n} = {\bar{A}}_{K + 1}^{n}, \\ \dots \\ - A_{K + M - 1} {\bar{U}}_{K + M - 2}^{n} + C_{K + M - 1} {\bar{U}}_{K + M - 1}^{n} = 0; \end{matrix}

\{\begin{matrix} C_{K + 1} {\bar{V}}_{K + 1}^{1} - B_{K + 1} {\bar{V}}_{K + 2}^{1} = 0, \\ \dots \\ - A_{K + M - 1} {\bar{V}}_{K + M - 2}^{1} + C_{K + M - 1} {\bar{V}}_{K + M - 1}^{1} = {\bar{B}}_{K + M - 1}^{1}; \end{matrix}

\dots

\{\begin{matrix} C_{K + 1} {\bar{V}}_{K + 1}^{n} - B_{K + 1} {\bar{V}}_{K + 2}^{n} = 0, \\ \dots \\ - A_{K + M - 1} {\bar{V}}_{K + M - 2}^{n} + C_{K + M - 1} {\bar{V}}_{K + M - 1}^{n} = {\bar{B}}_{K + M - 1}^{n}; \end{matrix}

\{\begin{matrix} C_{K + 1} {\bar{W}}_{K + 1} - B_{K + 1} {\bar{W}}_{K + 2} = {\bar{F}}_{K + 1}, \\ \dots \\ - A_{K + M - 1} {\bar{W}}_{K + M - 2} + C_{K + M - 1} {\bar{W}}_{K + M - 1} = {\bar{F}}_{K + M - 1} . \end{matrix}

(28)

Systems (28) for the intervals $(K, K + M)$, $K = 0, M, \dots, N - M$, are equivalent to problems (16)–(18).

Thus, solution (25) on the interval $(K, K + M)$ has the form

{\bar{Y}}_{i} = ({\bar{U}}_{i}^{1} {\bar{U}}_{i}^{2} \dots {\bar{U}}_{i}^{n}) {\bar{Y}}_{K} + ({\bar{V}}_{i}^{1} {\bar{V}}_{i}^{2} \dots {\bar{V}}_{i}^{n}) {\bar{Y}}_{K + M} + {\bar{W}}_{i}, i = K + 1, \dots, K + M - 1 .

□
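The smallest nontrivial case makes representation (19) easy to verify by hand. The example below assumes scalar blocks ($n = 1$) and subintervals of length $M = 2$, so that each interval $(K, K + 2)$ contains the single inner node $i = K + 1$. It is an illustration added here, not part of the original proof.

```latex
- A_{K+1} Y_{K} + C_{K+1} Y_{K+1} - B_{K+1} Y_{K+2} = F_{K+1}
\;\Longrightarrow\;
Y_{K+1} = \underbrace{\frac{A_{K+1}}{C_{K+1}}}_{U_{K+1}} Y_{K}
        + \underbrace{\frac{B_{K+1}}{C_{K+1}}}_{V_{K+1}} Y_{K+2}
        + \underbrace{\frac{F_{K+1}}{C_{K+1}}}_{W_{K+1}} .
```

The same values follow from problems (16)–(18): setting $U_{K} = 1$, $U_{K + 2} = 0$ in the homogeneous equation gives $U_{K + 1} = A_{K + 1} / C_{K + 1}$, and analogously for $V_{K + 1}$ and $W_{K + 1}$.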

Remark 3.

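The matrices $U_{i} = ({\bar{U}}_{i}^{1} {\bar{U}}_{i}^{2} \dots {\bar{U}}_{i}^{n})$, $V_{i} = ({\bar{V}}_{i}^{1} {\bar{V}}_{i}^{2} \dots {\bar{V}}_{i}^{n})$ and the vector ${\bar{W}}_{i}$ can be obtained by the block-elimination method (13) and (14). For arbitrary $L$, the block-elimination formulae for the intervals $(K, K + M)$, $K = 0, M, \dots, N - M$, have the following form.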

The forward phase for Equations (16)–(18) is as follows:

\begin{matrix} α_{K + 1} = C_{K + 1}^{- 1} B_{K + 1}, β_{K + 1} = C_{K + 1}^{- 1} A_{K + 1}, {\bar{γ}}_{K + 1} = C_{K + 1}^{- 1} {\bar{F}}_{K + 1}, \\ α_{i} = {[C_{i} - A_{i} α_{i - 1}]}^{- 1} B_{i}, β_{i} = {[C_{i} - A_{i} α_{i - 1}]}^{- 1} A_{i} β_{i - 1}, \\ {\bar{γ}}_{i} = {[C_{i} - A_{i} α_{i - 1}]}^{- 1} ({\bar{F}}_{i} + A_{i} {\bar{γ}}_{i - 1}), i = K + 2, \dots, K + M - 1 . \end{matrix}

(29)

The backward phase for Equations (16)–(18) is as follows:

\begin{matrix} U_{K + M - 1} = β_{K + M - 1}, V_{K + M - 1} = α_{K + M - 1}, {\bar{W}}_{K + M - 1} = {\bar{γ}}_{K + M - 1}, \\ U_{i} = α_{i} U_{i + 1} + β_{i}, V_{i} = α_{i} V_{i + 1}, {\bar{W}}_{i} = α_{i} {\bar{W}}_{i + 1} + {\bar{γ}}_{i}, i = K + M - 2, \dots, K + 1 . \end{matrix}

(30)

Substituting expression (19) into the equations of the basic system (12) with indices $K = 0, M, \dots, N$, we obtain a reduced SLAE for the parametric unknowns (vectors ${\bar{Y}}_{K}$). This system has a structure similar to that of (12) but is of smaller size:

\{\begin{matrix} [C_{0} - B_{0} U_{1}] {\bar{Y}}_{0} - [B_{0} V_{1}] {\bar{Y}}_{M} = {\bar{F}}_{0} + B_{0} {\bar{W}}_{1}, K = 0, \\ - [A_{K} U_{K - 1}] {\bar{Y}}_{K - M} + [C_{K} - A_{K} V_{K - 1} - B_{K} U_{K + 1}] {\bar{Y}}_{K} - [B_{K} V_{K + 1}] {\bar{Y}}_{K + M} = \\ = {\bar{F}}_{K} + A_{K} {\bar{W}}_{K - 1} + B_{K} {\bar{W}}_{K + 1}, K = M, \dots, N - M, \\ - [A_{N} U_{N - 1}] {\bar{Y}}_{N - M} + [C_{N} - A_{N} V_{N - 1}] {\bar{Y}}_{N} = {\bar{F}}_{N} + A_{N} {\bar{W}}_{N - 1}, K = N, \end{matrix}

(31)

where $U_{K \pm 1}$ and $V_{K \pm 1}$ are matrices of dimension $n \times n$.

Problem (31) is solved by the block-elimination method (13) and (14) in the single-threaded mode or on a single node of a cluster. Auxiliary problems (16)–(18) are solved independently for each of the $L$ intervals. Thus, this workload can be distributed among $L$ threads or processes. After obtaining ${\bar{Y}}_{K}$, the rest of the unknown values are found by formula (19). This work can also be performed independently for each of the $L$ intervals.

The parallel matrix sweep algorithm for solving system (12) is presented in Listing 1.

Listing 1. Parallel matrix sweep algorithm for solving SLAE.

1.: Find values $U_{i}, V_{i}, {\bar{W}}_{i}$ from problems (16)–(18) for inner points $i \in (K, K + M)$ of each subinterval $K = 0, M, 2 M, . . ., N$ . Methods (29) and (30) are used to solve the subproblems independently on each of the subintervals.
2.: Calculate the coefficients for reduced system (31). The coefficients may be calculated independently for each K, but to solve the resulting reduced system, we need to transfer these coefficients to a single process. This requires synchronization or gather-type communication.
3.: Find values ${\bar{Y}}_{K}$ from the reduced system (31). Compared to the basic system (12), its dimension is much smaller. It is solved by the block elimination algorithms (13) and (14) in serial mode. The computed values ${\bar{Y}}_{K}$ must then be transmitted to the processors. This step requires synchronization or communication.
4.: Use formula (19) to calculate the sought values ${\bar{Y}}_{i}, i \in (K, K + M)$ . These computations may be performed independently for each subinterval K.

Thus, steps 1, 2, and 4 of this algorithm may be parallelized, while step 3 must be performed in serial mode.
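The four steps of Listing 1 can be illustrated with a minimal scalar sketch (block size n = 1, so every matrix block becomes a number). It is not the authors' OpenMP/MKL code. The function names `sweep` and `parallel_sweep` are hypothetical, and the boundary values used for the local problems (U_K = 1, U_{K+M} = 0; V_K = 0, V_{K+M} = 1; W_K = W_{K+M} = 0) are an assumption chosen to be consistent with formula (19) and reduced system (31). The loop over subintervals runs serially here; it marks the place where independent threads would work.

```python
def sweep(a, c, b, f):
    """Serial sweep for -a[i]*y[i-1] + c[i]*y[i] - b[i]*y[i+1] = f[i].

    a[0] and b[-1] are ignored because the first and last rows have no outer neighbour.
    """
    m = len(c)
    alpha, beta = [0.0] * m, [0.0] * m
    alpha[0], beta[0] = b[0] / c[0], f[0] / c[0]
    for i in range(1, m):                       # forward phase
        denom = c[i] - a[i] * alpha[i - 1]
        alpha[i] = b[i] / denom
        beta[i] = (f[i] + a[i] * beta[i - 1]) / denom
    y = [0.0] * m
    y[-1] = beta[-1]
    for i in range(m - 2, -1, -1):              # backward phase
        y[i] = alpha[i] * y[i + 1] + beta[i]
    return y


def parallel_sweep(a, c, b, f, M):
    """Scalar sketch of Listing 1 for unknowns y[0..N], with N divisible by M and M >= 2."""
    N = len(c) - 1
    assert M >= 2 and N % M == 0
    nodes = list(range(0, N + 1, M))            # parametric indices K = 0, M, ..., N
    U, V, W = [0.0] * (N + 1), [0.0] * (N + 1), [0.0] * (N + 1)

    # Step 1: three local problems per subinterval; subintervals are independent.
    for K in nodes[:-1]:
        inner = range(K + 1, K + M)
        la, lc, lb = [a[i] for i in inner], [c[i] for i in inner], [b[i] for i in inner]
        fu = [0.0] * (M - 1)
        fu[0] = a[K + 1]                        # assumed boundary values U_K = 1, U_{K+M} = 0
        fv = [0.0] * (M - 1)
        fv[-1] = b[K + M - 1]                   # assumed boundary values V_K = 0, V_{K+M} = 1
        fw = [f[i] for i in inner]              # assumed boundary values W_K = W_{K+M} = 0
        for target, rhs in ((U, fu), (V, fv), (W, fw)):
            target[K + 1:K + M] = sweep(la, lc, lb, rhs)

    # Step 2: coefficients of the reduced system (31), one row per parametric index K.
    ra, rc, rb, rf = [], [], [], []
    for K in nodes:
        left, right = K > 0, K < N
        diag, rhs = c[K], f[K]
        if left:
            diag -= a[K] * V[K - 1]
            rhs += a[K] * W[K - 1]
        if right:
            diag -= b[K] * U[K + 1]
            rhs += b[K] * W[K + 1]
        ra.append(a[K] * U[K - 1] if left else 0.0)
        rb.append(b[K] * V[K + 1] if right else 0.0)
        rc.append(diag)
        rf.append(rhs)

    # Step 3: the small reduced system is solved serially.
    Y = sweep(ra, rc, rb, rf)

    # Step 4: recover inner unknowns by formula (19); again independent per subinterval.
    y = [0.0] * (N + 1)
    for k, K in enumerate(nodes):
        y[K] = Y[k]
    for k, K in enumerate(nodes[:-1]):
        for i in range(K + 1, K + M):
            y[i] = U[i] * Y[k] + V[i] * Y[k + 1] + W[i]
    return y


# Small diagonally dominant test system (illustrative values only).
N, M = 12, 4
a, b, c = [1.0] * (N + 1), [1.0] * (N + 1), [2.5] * (N + 1)
f = [float(i + 1) for i in range(N + 1)]
y_serial = sweep(a, c, b, f)
y_parallel = parallel_sweep(a, c, b, f, M)
print(max(abs(u - v) for u, v in zip(y_serial, y_parallel)))
```

In the block case the divisions become block inversions and the products become matrix products, but the data flow between the four steps stays the same.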

To establish the stability (see Remark 1) of the parallel matrix sweep algorithm, let us prove the following theorems.

Theorem 2.

If original system (12) satisfies the condition

∥ C_{i} ∥ \geq ∥ A_{i} ∥ + ∥ B_{i} ∥ + δ, δ > 0,

then reduced system (31) also satisfies this condition in the form

∥ C_{K} - A_{K} V_{K - 1} - B_{K} U_{K + 1} ∥ \geq ∥ A_{K} U_{K - 1} ∥ + ∥ B_{K} V_{K + 1} ∥ + δ .

Proof.

\begin{matrix} ∥ C_{K} - A_{K} V_{K - 1} - B_{K} U_{K + 1} ∥ \geq ∥ C_{K} ∥ - ∥ A_{K} V_{K - 1} ∥ - ∥ B_{K} U_{K + 1} ∥ \geq \\ \geq & ∥ C_{K} ∥ - ∥ A_{K} (I - U_{K - 1}) ∥ - ∥ B_{K} (I - V_{K + 1}) ∥ \geq \\ \geq & ∥ C_{K} ∥ - ∥ A_{K} ∥ + ∥ A_{K} U_{K - 1} ∥ - ∥ B_{K} ∥ + ∥ B_{K} V_{K + 1} ∥ \geq \\ \geq & ∥ A_{K} U_{K - 1} ∥ + ∥ B_{K} V_{K + 1} ∥ + ∥ C_{K} ∥ - ∥ A_{K} ∥ - ∥ B_{K} ∥ \geq \\ \geq & ∥ A_{K} U_{K - 1} ∥ + ∥ B_{K} V_{K + 1} ∥ + δ, \end{matrix}

since

∥ U_{K} ∥ + ∥ V_{K} ∥ \leq 1

. □

Theorem 3.

If basic system (12) satisfies the stability conditions of the matrix sweep method (Lemma 1), then these conditions are sufficient for the stability of the matrix sweep method for reduced system (31) for

{\bar{Y}}_{K}

.

Proof.

Let us construct a proof by the mathematical induction method.

We will utilize the following statement [31]. If a square matrix S satisfies

∥ S ∥ \leq q < 1

, then the matrix

{(E - S)}^{- 1}

exists and

∥ {(E - S)}^{- 1} ∥ \leq 1 / (1 - q)

.

Let

{\tilde{α}}_{1}, \dots, {\tilde{α}}_{K}, \dots, {\tilde{α}}_{N}

be the elimination coefficients for the matrix sweep method for system (31).

1. Let us demonstrate that

∥ {\tilde{α}}_{1} ∥ \leq 1 .

∥ C_{0}^{- 1} B_{0} U_{1} ∥ \leq ∥ C_{0}^{- 1} B_{0} ∥ \cdot ∥ U_{1} ∥ < 1,

therefore, there are

{(E - C_{0}^{- 1} B_{0} U_{1})}^{- 1}

and

∥ {\tilde{α}}_{1} ∥ = ∥ {(C_{0} - B_{0} U_{1})}^{- 1} B_{0} V_{1} ∥ = ∥ {(E - C_{0}^{- 1} B_{0} U_{1})}^{- 1} C_{0}^{- 1} B_{0} V_{1} ∥ \leq

\leq ∥ {(E - C_{0}^{- 1} B_{0} U_{1})}^{- 1} ∥ \cdot ∥ C_{0}^{- 1} B_{0} ∥ \cdot ∥ V_{1} ∥ \leq \frac{1}{1 - ∥ C_{0}^{- 1} B_{0} ∥ \cdot ∥ U_{1} ∥} \cdot ∥ C_{0}^{- 1} B_{0} ∥ \cdot ∥ V_{1} ∥ \leq

\leq \frac{∥ V_{1} ∥}{1 - ∥ U_{1} ∥} \leq \frac{∥ V_{1} ∥}{∥ V_{1} ∥} = 1, since ∥ U_{1} ∥ + ∥ V_{1} ∥ \leq 1 .

2. Assume

∥ {\tilde{α}}_{K} ∥ \leq 1 .

Let us demonstrate that

∥ {\tilde{α}}_{K + 1} ∥ \leq 1 .

∥ {\tilde{α}}_{K + 1} ∥ = ∥ {(C_{K} - A_{K} V_{K - 1} - B_{K} U_{K + 1} - A_{K} U_{K - 1} {\tilde{α}}_{K})}^{- 1} \cdot B_{K} V_{K + 1} ∥ .

Consider

∥ C_{K}^{- 1} A_{K} V_{K - 1} + C_{K}^{- 1} B_{K} U_{K + 1} + C_{K}^{- 1} A_{K} U_{K - 1} {\tilde{α}}_{K} ∥ \leq

\leq ∥ C_{K}^{- 1} A_{K} ∥ \cdot ∥ V_{K - 1} ∥ + ∥ C_{K}^{- 1} B_{K} ∥ \cdot ∥ U_{K + 1} ∥ + ∥ C_{K}^{- 1} A_{K} ∥ \cdot ∥ U_{K - 1} ∥ \leq

\leq ∥ C_{K}^{- 1} A_{K} ∥ + ∥ C_{K}^{- 1} B_{K} ∥ \cdot ∥ U_{K + 1} ∥ \leq 1 - ∥ C_{K}^{- 1} B_{K} ∥ + ∥ C_{K}^{- 1} B_{K} ∥ \cdot ∥ U_{K + 1} ∥ \leq

\leq 1 - ∥ C_{K}^{- 1} B_{K} ∥ \cdot (1 - ∥ U_{K + 1} ∥) \leq 1 - ∥ C_{K}^{- 1} B_{K} ∥ \cdot ∥ V_{K + 1} ∥ < 1,

since

∥ U_{K + 1} ∥ + ∥ V_{K + 1} ∥ \leq 1 .

Therefore, there are

{(E - C_{K}^{- 1} A_{K} V_{K - 1} - C_{K}^{- 1} B_{K} U_{K + 1} - C_{K}^{- 1} A_{K} U_{K - 1} {\tilde{α}}_{K})}^{- 1} and

∥ {\tilde{α}}_{K + 1} ∥ \leq ∥ {(E - C_{K}^{- 1} A_{K} V_{K - 1} - C_{K}^{- 1} B_{K} U_{K + 1} - C_{K}^{- 1} A_{K} U_{K - 1} {\tilde{α}}_{K})}^{- 1} ∥ \times

\times ∥ C_{K}^{- 1} B_{K} V_{K + 1} ∥ \leq \frac{1}{∥ C_{K}^{- 1} B_{K} ∥ \cdot ∥ V_{K + 1} ∥} \cdot ∥ C_{K}^{- 1} B_{K} ∥ \cdot ∥ V_{K + 1} ∥ = 1 .

3. Let us demonstrate that

∥ {\tilde{α}}_{N} ∥ \leq 1 .

∥ C_{N}^{- 1} A_{N} V_{N - 1} ∥ \leq ∥ C_{N}^{- 1} A_{N} ∥ \cdot ∥ V_{N - 1} ∥ < 1,

therefore, there are

{(E - C_{N}^{- 1} A_{N} V_{N - 1})}^{- 1} .

Consider

∥ {\tilde{α}}_{N} ∥ = ∥ {(C_{N} - A_{N} V_{N - 1})}^{- 1} A_{N} U_{N - 1} ∥ \leq

\leq ∥ {(E - C_{N}^{- 1} A_{N} V_{N - 1})}^{- 1} ∥ \cdot ∥ C_{N}^{- 1} A_{N} ∥ \cdot ∥ U_{N - 1} ∥ \leq

\leq \frac{1}{1 - ∥ C_{N}^{- 1} A_{N} ∥ \cdot ∥ V_{N - 1} ∥} \cdot ∥ C_{N}^{- 1} A_{N} ∥ \cdot ∥ U_{N - 1} ∥ \leq \frac{∥ U_{N - 1} ∥}{1 - ∥ V_{N - 1} ∥} \leq 1,

since

∥ U_{N - 1} ∥ + ∥ V_{N - 1} ∥ \leq 1 .

□
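The auxiliary statement from [31] used in the proof of Theorem 3 can be traced back to the Neumann series. The following is a brief sketch, assuming a submultiplicative matrix norm with ∥E∥ = 1 and a bound q strictly less than 1:

```latex
(E - S)^{-1} = \sum_{k=0}^{\infty} S^{k},
\qquad
\| (E - S)^{-1} \| \leq \sum_{k=0}^{\infty} \| S \|^{k}
\leq \sum_{k=0}^{\infty} q^{k} = \frac{1}{1 - q},
\qquad \| S \| \leq q < 1 .
```

The series converges because its terms are dominated by the geometric series in q. This is why each step of the proof first shows that the relevant matrix has a norm below 1.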

4. Numerical Method for Solving the Inverse Problems

To solve the inverse problems (1)–(3) and (5), we use the iterative conjugate gradient method [23,33]. Consider that additional information

φ

may contain a random perturbation

φ^{δ} = φ \cdot (1 + rand (- δ, δ))

. To overcome this, we will regularize the inverse problems using the Lavrentyev scheme [34].

The resulting algorithm for solving the inverse problems is presented in Listing 2, where

ε > 0

is the regularization parameter.

Listing 2. Regularized conjugate gradient algorithm for solving the inverse problems.

Initialization:

1.: Set $s = 0$ as the iterative step.
2.: Set the initial approximation $ψ_{0} (x)$ , for example, $ψ_{0} (x) = 0$ .
3.: Solve the initial boundary problems (1)–(3) by substituting the right-hand part $ψ$ with $ψ_{0}$ ; obtain $U_{T}^{0} = U (x, T) |_{ψ_{0}}$ .
4.: Calculate the initial residual $r_{0} (x) = φ (x) - (U_{T}^{0} + ε ψ_{0})$ and initial estimation $p_{0} (x) = r_{0} (x)$ , where $ε > 0$ is the regularization parameter.

Iterations:

5.: Set $s = s + 1$ .
6.: Solve the initial boundary problems by substituting the right-hand part with $p_{s} (x)$ ; obtain $U_{T}^{s} = U (x, T) |_{p_{s}}$ .
7.: Calculate the coefficient $α_{s} = (r_{s}, r_{s}) / (p_{s}, (U_{T}^{s} + ε p_{s}))$ .
8.: Calculate the estimation and residual for next step $ψ_{s + 1} = ψ_{s} + α_{s} p_{s}, r_{s + 1} = r_{s} - α_{s} (U_{T}^{s} + ε p_{s})$ .
9.: Calculate the coefficient $β_{s} = (r_{s + 1}, r_{s + 1}) / (r_{s}, r_{s})$ .
10.: Calculate $p_{s + 1} = r_{s + 1} + β_{s} p_{s} .$
11.: Check the stopping rule $∥r_{s}∥ / ∥φ∥ < μ$ , $0 < μ < 1$ . If not met, go to step 5.
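The steps of Listing 2 can be sketched compactly as follows. This is a minimal illustration, not the authors' implementation: the callable `forward` is a hypothetical stand-in for "solve the direct problem with the given right-hand part and return U(x, T)", and here it simply applies a small symmetric positive definite matrix, so that the loop can be run and checked without a PDE solver.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def regularized_cg(forward, phi, eps, mu, max_iter=100):
    """Regularized conjugate gradient loop following the steps of Listing 2.

    forward(p) is assumed to be a linear, symmetric, positive map (stand-in for the direct solver).
    Returns the approximation psi and the number of iterations performed.
    """
    psi = [0.0] * len(phi)                                   # step 2: psi_0 = 0
    r = [ph - (u + eps * ps) for ph, u, ps in zip(phi, forward(psi), psi)]  # steps 3-4
    p = list(r)
    norm_phi = math.sqrt(dot(phi, phi))
    rr = dot(r, r)
    if rr == 0.0:
        return psi, 0
    for s in range(1, max_iter + 1):
        q = [u + eps * pi for u, pi in zip(forward(p), p)]   # step 6: U_T^s + eps * p_s
        alpha = rr / dot(p, q)                               # step 7
        psi = [ps + alpha * pi for ps, pi in zip(psi, p)]    # step 8
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = dot(r, r)
        if math.sqrt(rr_new) / norm_phi < mu:                # step 11: stopping rule
            return psi, s
        beta = rr_new / rr                                   # step 9
        p = [ri + beta * pi for ri, pi in zip(r, p)]         # step 10
        rr = rr_new
    return psi, max_iter


def forward(p):
    """Hypothetical stand-in for the direct solver: applies tridiag(-1, 2, -1)."""
    n = len(p)
    return [2.0 * p[i] - (p[i - 1] if i > 0 else 0.0) - (p[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


psi_exact = [1.0, 2.0, 3.0, 2.0, 1.0]
phi = forward(psi_exact)
psi, iterations = regularized_cg(forward, phi, eps=0.0, mu=1e-10)
print(iterations, max(abs(u - v) for u, v in zip(psi, psi_exact)))
```

The stopping rule is tested before the coefficient of step 9 is formed, which only avoids a division by a vanishing residual. In the actual method, each call to `forward` is one full solve of the initial boundary problem, which is why the SLAE solver dominates the cost.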

5. Parallel Implementation of Algorithms for Solving the Inverse Problem

Numerically solving problems that involve fractional differential equations is computationally expensive and requires a lot of computing time.

The most time-consuming subroutine of the regularized conjugate gradient algorithm (see Listing 2) is solving the auxiliary initial boundary problem at each iteration. In turn, this procedure consists in forming and solving SLAEs (11) at each subsequent time step.

5.1. Efficient Computation of the Right-Hand Parts

Forming SLAE requires the calculation of the right-hand parts using formula (9). In our earlier work [27], when solving one-dimensional problems, the fraction of time spent to calculate the right-hand part was up to 70% of the total time. To optimize this procedure, we implemented the logarithmic memory approach. It consists of using the non-uniform time grid when computing the approximation of the fractional derivative. The fine time step is used for the latest history part. For the more distant history, successively larger time steps are used. This approach allowed us to reduce the computing time for the one-dimensional case by up to 1.5 times.

This approach may also be utilized for the two-dimensional problem. For solving system (11), a modified variant of formula (9) takes the form

\begin{matrix} f_{i_{1}, i_{2}, 1} = σ_{α, τ} U_{i_{1}, i_{2}, 0} + ψ_{i_{1}, i_{2}} η_{i_{1}, i_{2}, 0}, \\ f_{i_{1}, i_{2}, j} = σ_{α, τ} U_{i_{1}, i_{2}, j - 1} - \sum_{(ℓ, k)} σ_{α, τ}^{(ℓ, k)} w_{ℓ, k}^{(α)} (U_{i_{1}, i_{2}, j - ℓ + 1} - U_{i_{1}, i_{2}, j - k}) + ψ_{i_{1}, i_{2}} η_{i_{1}, i_{2}, j}, j > 1, \end{matrix}

(32)

\begin{matrix} (ℓ, k) \in \{(2, 2 + θ^{0}), (2 + θ^{0}, 2 + θ^{1}), (2 + θ^{1}, 2 + θ^{2}), \dots, (2 + θ^{⌊ {log}_{θ} j ⌋}, j)\}, \\ σ_{α, τ}^{(ℓ, k)} = \frac{1}{Γ (1 - α) (1 - α) θ^{k - ℓ} τ^{α}}, w_{ℓ, k}^{(α)} = {(k)}^{1 - α} - {(ℓ - 1)}^{1 - α}, 0 < α < 1, \end{matrix}

where

θ \in N

is the stretching coefficient and

⌊ {log}_{θ} n ⌋

is the integer part (floor) of the logarithm. Note that this approach has complexity

O (\tilde{N} \cdot log \tilde{N})

in contrast to the

O ({\tilde{N}}^{2})

complexity of the uniform time grid.

In the next section, we will explore the usefulness of this approach for the case of a two-dimensional problem.
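The difference in cost between the uniform and the logarithmic history grids in formula (32) can be made concrete by counting the terms of the history sum. The sketch below is illustrative only. The helper names are hypothetical, the breakpoints 2, 2 + θ^0, 2 + θ^1, … are clamped at j (the clamping is an assumption about how the last pair is handled), and the uniform sum is assumed to have j − 1 terms at time level j.

```python
def log_memory_pairs(j, theta=2):
    """Index pairs (l, k) of the history sum at time level j, following the pattern in (32).

    Breakpoints 2, 2 + theta**0, 2 + theta**1, ... are clamped at j (an assumption).
    """
    assert theta >= 2 and j >= 2
    breakpoints = [2]
    power = 1
    while 2 + power < j:
        breakpoints.append(2 + power)
        power *= theta
    breakpoints.append(j)
    return list(zip(breakpoints[:-1], breakpoints[1:]))


def log_history_terms(n_steps, theta=2):
    """Total number of history terms over all time levels for the logarithmic grid."""
    return sum(len(log_memory_pairs(j, theta)) for j in range(2, n_steps + 1))


def uniform_history_terms(n_steps):
    """Total number of history terms for the uniform grid (assumed j - 1 terms at level j)."""
    return sum(j - 1 for j in range(2, n_steps + 1))


print(log_memory_pairs(10))
print(log_history_terms(256), uniform_history_terms(256))
```

The number of pairs per time level grows like the logarithm of j, so the total grows like Ñ log Ñ, whereas the uniform count grows quadratically in Ñ.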

5.2. Parallel Implementation of the SLAE Solver

To speed up solving the SLAE, we implemented the parallel matrix sweep algorithm (see Listing 1). For comparison, we also implemented the serial block-elimination method (13) and (14).

The parallel algorithm for solving the inverse problem of finding the source term of the time-fractional diffusion equation was implemented for the multicore processor using OpenMP technology [35] and the Intel MKL library [36]. The parallelization is performed as follows.

The workload of calculating the right-hand parts (9) and (32) is distributed among OpenMP threads using the same subinterval decomposition that is used for the parallel matrix sweep algorithm.
The matrix operations and the inversion of the matrix blocks are performed with MKL routines (gemv, gemm, getrf, and getri).
In the parallel matrix sweep algorithm (Listing 1), steps 1, 2, and 4 are performed by the individual OpenMP threads on their corresponding subintervals. To perform step 3, thread synchronization is required. This is performed by the ‘#pragma omp barrier’ directive. Note that this synchronization requires additional time.

6. Numerical Experiments

In this section, we present the numerical experiments of solving the direct and inverse problems for the two-dimensional fractional diffusion equation. The experiments were performed using the developed code on the Intel i9-12900k CPU, which has 8 P-cores. The goal is to study the validity of the proposed numerical methods, as well as the efficiency of the parallel code.

6.1. Problem 1

Consider the two-dimensional equation

\frac{\partial^{α} U (x_{1}, x_{2}, t)}{\partial t^{α}} = \frac{\partial^{2} U (x_{1}, x_{2}, t)}{\partial x_{1}^{2}} + \frac{\partial^{2} U (x_{1}, x_{2}, t)}{\partial x_{2}^{2}} + (\frac{2}{Γ (3 - α)} t^{2 - α} + 2 t^{2}) sin (x_{1}) sin (x_{2})

(33)

with initial and boundary conditions

U (x_{1}, x_{2}, 0) = 0,

U (x_{1}, 0, t) = 0, U (x_{1}, π, t) = 0,

U (0, x_{2}, t) = 0, U (π, x_{2}, t) = 0,

and area

0 \leq x_{1}, x_{2} \leq γ_{1} = γ_{2} = π, 0 \leq t \leq T = 1,

for order

0 < α < 1 .

Paper [37] presents the exact solution for this equation

U (x_{1}, x_{2}, t) = t^{2} sin (x_{1}) sin (x_{2}) .
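That this function satisfies Equation (33) can be checked directly. The short verification below assumes the Caputo fractional derivative, for which the derivative of a power function is known in closed form:

```latex
\frac{\partial^{\alpha} t^{2}}{\partial t^{\alpha}}
  = \frac{\Gamma(3)}{\Gamma(3 - \alpha)}\, t^{2 - \alpha}
  = \frac{2}{\Gamma(3 - \alpha)}\, t^{2 - \alpha},
\qquad
\left( \frac{\partial^{2}}{\partial x_{1}^{2}} + \frac{\partial^{2}}{\partial x_{2}^{2}} \right)
  \sin(x_{1}) \sin(x_{2}) = -2 \sin(x_{1}) \sin(x_{2}),
```

```latex
-2 t^{2} \sin(x_{1}) \sin(x_{2})
  + \left( \frac{2}{\Gamma(3 - \alpha)}\, t^{2 - \alpha} + 2 t^{2} \right) \sin(x_{1}) \sin(x_{2})
  = \frac{2}{\Gamma(3 - \alpha)}\, t^{2 - \alpha} \sin(x_{1}) \sin(x_{2}) .
```

The right-hand side of (33) therefore equals the fractional time derivative of U, and U vanishes at t = 0 and on the boundary of the square.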

6.1.1. Experiment 1

Experiment 1 consists of solving the forward (initial boundary) problem for Equation (33) on the various grids

n = N = {128; 256; 512}

,

\tilde{N} = {64; 128; 256}

and various parameters

α = {0.5; 0.8; 0.95}

. It was solved using the difference scheme described in Section 2.2. For solving the SLAE, two methods were applied, namely, the classical serial block-elimination method (13) and (14) and parallel matrix sweep method (see Listing 1).

Figure 1 shows the exact solution

U (x_{1}, x_{2}, 1) = sin (x_{1}) sin (x_{2})

and approximate solution

\tilde{U} (x_{1}, x_{2}, 1)

for Problem 1 obtained by the parallel matrix sweep algorithm for grid size

n = N = 512

,

\tilde{N} = 256

, and the order of fractional derivative

α = 0.5

. The approximate solutions obtained by the matrix sweep and parallel matrix sweep methods coincide with each other up to the machine precision (

10^{- 15}

, as we used the double precision format).

Table 1 contains the relative error of the solutions

∥U - \tilde{U}∥ / ∥U∥

for various grid sizes and parameters

α

. The experiments show that taking a finer grid either for space or time reduces the relative error of the resulting solution. For the time grid, the rate of convergence is close to linear (increasing the number of grid points twofold reduces the error approximately by two times). As parameter

α

approaches 1, the error of the solution increases. To achieve higher accuracy, we need to use a finer grid.
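The convergence rate in time can be estimated from the entries of Table 1 as the base-2 logarithm of the ratio of errors on two successive time grids. The sketch below uses the row n = N = 256 of Table 1. The variable and function names are hypothetical, and the estimate ignores the spatial part of the error.

```python
import math

# Relative errors from Table 1, row n = N = 256, for time grids of 64, 128, and 256 points.
errors = {
    0.5: [3.45e-4, 1.26e-4, 4.93e-5],
    0.8: [2.13e-3, 9.25e-4, 4.04e-4],
    0.95: [5.12e-3, 2.46e-3, 1.18e-3],
}


def observed_orders(errs):
    """Observed order between successive grids: log2(error_coarse / error_fine)."""
    return [math.log2(coarse / fine) for coarse, fine in zip(errs, errs[1:])]


orders = {alpha: observed_orders(errs) for alpha, errs in errors.items()}
for alpha, values in orders.items():
    print(alpha, [round(v, 2) for v in values])
```

An observed order of 1 corresponds to the error halving when the number of time points doubles.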

Table 2 presents the total computing time

T_{L}

of solving the direct problem. For the parallel matrix sweep method, the computing times for various numbers L of OpenMP threads are presented. Total time

T_{L}

consists of time

T_{SLAE}

spent on solving the SLAEs plus time

T_{Right}

spent on computing the right-hand parts of these SLAEs using formula (9). The table shows that in the two-dimensional case, where the SLAE has a more complex (block-tridiagonal) matrix structure, solving the SLAE takes up to about 600 times longer than computing the right-hand part. Thus, the optimized approach to computing the fractional derivative is less relevant here, as it would bring only a minuscule speedup.
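The speedup and parallel efficiency implied by Table 2 can be computed directly from its entries. The sketch below uses the usual definitions (speedup as serial time divided by parallel time, efficiency as speedup divided by the number of threads). These definitions and all names in the code are assumptions, since the text does not spell them out.

```python
# Times in minutes taken from Table 2.
serial_total = 28.6                      # serial matrix sweep (13) and (14)
parallel_runs = {                        # threads: (T_L, T_SLAE, T_Right)
    2: (30.7, 30.63, 0.06),
    4: (16.3, 16.26, 0.03),
    8: (10.1, 10.08, 0.016),
    16: (10.5, 10.48, 0.017),
}

speedups = {threads: serial_total / total for threads, (total, _, _) in parallel_runs.items()}
efficiencies = {threads: s / threads for threads, s in speedups.items()}
slae_to_right = {threads: t_slae / t_right for threads, (_, t_slae, t_right) in parallel_runs.items()}

best_threads = max(speedups, key=speedups.get)
for threads in sorted(parallel_runs):
    print(threads, round(speedups[threads], 2), round(efficiencies[threads], 2),
          round(slae_to_right[threads]))
```

With these definitions, the two-thread run is slightly slower than the serial block-elimination code, which is consistent with the extra work that the parallel algorithm performs on each subinterval (three local problems instead of one sweep).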

6.1.2. Experiment 2

Experiment 2 consists of solving the inverse problem for Equation (33). We assume that

η (t) = (\frac{2}{Γ (3 - α)} t^{2 - α} + 2 t^{2})

and

φ (x_{1}, x_{2}) = sin (x_{1}) sin (x_{2})

. Thus, we need to solve the inverse problem of finding unknown

[U (x_{1}, x_{2}, T), ψ (x_{1}, x_{2})]

. For this experiment, we introduce the varying level

δ

of random perturbation to the a priori data

φ^{δ} = φ \cdot (1 + rand (- δ, δ))

.

Remark 4.

Note that this level corresponds to an error in the infinity norm, i.e.,

δ \approx {∥φ - φ^{δ}∥}_{\infty} / {∥φ∥}_{\infty}

. In the rest of the paper, the norm is implied to be

L_{2}

-norm. In tables, we provide the corresponding

δ_{2} = ∥φ - φ^{δ}∥ / ∥φ∥

.
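The relation between the two noise levels in Remark 4 can be reproduced numerically. The sketch below assumes that rand(−δ, δ) denotes a uniformly distributed random number and uses a hypothetical 65 × 65 grid. Under this assumption the mean square of the perturbation is δ²/3, so the relative L_2 error is close to δ/√3, about 0.58 δ.

```python
import math
import random


def perturb(phi, delta, rng):
    """phi^delta = phi * (1 + rand(-delta, delta)), with rand assumed to be uniform."""
    return [v * (1.0 + rng.uniform(-delta, delta)) for v in phi]


def relative_errors(phi, phi_delta):
    """Relative errors of phi_delta in the infinity norm and in the discrete L2 norm."""
    diff = [p - q for p, q in zip(phi, phi_delta)]
    rel_inf = max(abs(d) for d in diff) / max(abs(p) for p in phi)
    rel_l2 = math.sqrt(sum(d * d for d in diff)) / math.sqrt(sum(p * p for p in phi))
    return rel_inf, rel_l2


n = 64                                   # hypothetical grid: 65 x 65 points on [0, pi]^2
grid = [i * math.pi / n for i in range(n + 1)]
phi = [math.sin(x1) * math.sin(x2) for x1 in grid for x2 in grid]

rng = random.Random(0)                   # fixed seed so that the run is repeatable
delta = 0.01
delta_inf, delta_2 = relative_errors(phi, perturb(phi, delta, rng))
print(delta_inf, delta_2)
```

This is consistent with the pairs of noise levels listed in Tables 3 and 4, where δ = 0.01 corresponds to δ_2 of about 0.006.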

The inverse problem was solved by the regularized conjugate gradient algorithm (see Listing 2). For solving the SLAEs, the parallel matrix sweep method was used. The grid size was

n = N = 256

,

\tilde{N} = 256

.

Figure 2 shows the exact solution

ψ (x_{1}, x_{2}) = sin (x_{1}) sin (x_{2})

and the approximate solution

\tilde{ψ} (x_{1}, x_{2})

for Problem 1 for the noise level

δ = 0.02

.

Table 3 presents the results of Experiment 2 for varying levels of noise

δ, δ_{2}

. It contains the values of regularization parameter

ε

, the threshold

μ

for the stopping criterion (we used

μ = δ_{2}

for experiments), number of iterations S, and the relative error of the resulting solution.

6.2. Problem 2

Consider the two-dimensional equation [22]

\frac{\partial^{α} U (x_{1}, x_{2}, t)}{\partial t^{α}} = \frac{\partial^{2} U (x_{1}, x_{2}, t)}{\partial x_{1}^{2}} + \frac{\partial^{2} U (x_{1}, x_{2}, t)}{\partial x_{2}^{2}} + sin (x_{1}) sin (x_{2}) + sin (2 x_{1}) sin (2 x_{2})

(34)

with initial and boundary conditions

U (x_{1}, x_{2}, 0) = 0,

U (x_{1}, 0, t) = 0, U (x_{1}, π, t) = 0,

U (0, x_{2}, t) = 0, U (π, x_{2}, t) = 0,

and area

0 \leq x_{1}, x_{2} \leq π, 0 \leq t \leq 1,

for order

0 < α < 1 .

The numerical experiment consists of solving the inverse problem for Equation (34). We assume that

η (x_{1}, x_{2}, t) = 1

. Additional data

φ (x_{1}, x_{2}) = U (x_{1}, x_{2}, 1)

are obtained by solving the direct problem substituting exact

ψ (x_{1}, x_{2}) = sin (x_{1}) sin (x_{2}) + sin (2 x_{1}) sin (2 x_{2})

. They are shown in Figure 3.

Table 4 presents the results of experiments for Problem 2 for varying levels of noise

δ, δ_{2}

. It contains the values of the regularization parameter

ε

, the threshold

μ

for the stopping criterion (we used

μ = δ_{2}

for experiments), the number of iterations S, and the relative error of the resulting solution. The grid size was

n = N = 256

,

\tilde{N} = 256

, and order

α = 0.5

.

Figure 4 shows the approximate solutions

\tilde{ψ} (x_{1}, x_{2})

for Problem 2 for various noise levels.

7. Discussion

According to the experiments, the relative error of the direct problem solution decreases as the grid is refined. This experimentally confirms the convergence of the finite difference scheme.

In the two-dimensional case, solving the SLAE takes significantly more time (up to about 600 times more) than computing its right-hand part. This makes the parallel implementation of the SLAE solver more important than the optimization of the procedures for computing the fractional derivative.

Experiments show that the parallel matrix sweep method for solving the SLAE has good parallel efficiency. The minimal computing time is achieved by using eight OpenMP threads on an eight-core processor.

The parallel code performance is mainly limited by memory bandwidth. Adding more than eight threads does not reduce the computing time. The largest speedup is only about threefold for the

512 \times 512

spatial grid. Figure 5 presents the roofline analysis performed by the Intel Advisor tool. Most subroutines of the parallel code lie primarily below and near the slanted line that represents DRAM bandwidth. This indicates that the code is memory-bound.

Several approaches can be used to overcome this limitation. Computing hardware with larger memory bandwidth, such as graphics processors (GPUs), can be used. Central processors with DDR4 or DDR5 RAM can achieve up to 100 GB/s, while modern GPUs have a bandwidth of 1000 GB/s or higher. Another option is to use massively parallel distributed-memory systems. Since each node works independently, the memory bandwidths of the individual nodes are effectively summed. Moreover, this makes it possible to solve larger problems whose data cannot fit in the memory of a single computing node.

The experiments in Table 3 and Table 4 show that for the model problems, the regularized conjugate gradient method allows us to solve the inverse problem even with noisy data. The results are comparable with other works in terms of accuracy (for example, see [23], Table 3).

We also note that while this work is devoted to the case of the two-dimensional equation, the results may be extended to three-dimensional elliptic equations. The structure of matrix A in Equation (11) will remain block tridiagonal, but the inner structure of the blocks will be more complex.

8. Conclusions

In this work, we construct the parallel algorithm for solving the inverse problem of finding the space-dependent component of a source term in a two-dimensional fractional diffusion equation. The considered inverse problem is solved by the iterative conjugate gradient method. At each iteration, it is necessary to solve an auxiliary direct initial-boundary value problem. Applying the finite difference scheme, we reduce the initial-boundary value problem to solving an SLAE with block tridiagonal matrices at each subsequent time level. For the efficient solution of such SLAEs, we construct and implement the direct parallel matrix sweep method. Stability and correctness for the parallel matrix sweep method are established. In the two-dimensional case, computing the fractional derivative (the right-hand part of the SLAE) takes little time in comparison with solving the SLAE.

The algorithm is implemented for multicore processors using the OpenMP technology. In the numerical experiments, we investigated the validity of the numerical methods and the efficiency and speedup of the parallel algorithm. The parallel sweep algorithm reduces the computing time by up to three times on an eight-core processor. The Lavrentyev regularization method allows us to solve the inverse problem with perturbed data.

In the future, the authors plan to apply a similar approach to solving the retrospective inverse problem (identifying the initial value) for a fractional differential equation. The developed algorithms may be utilized for real applications. The parallel algorithms will be implemented on graphics processors.

Author Contributions

Conceptualization, E.N.A., M.A.S. and V.E.M.; methodology, E.N.A., M.A.S. and V.E.M.; validation, E.N.A., M.A.S., V.E.M. and Y.N.; formal analysis, E.N.A., M.A.S., V.E.M. and Y.N.; investigation, E.N.A., V.E.M. and Y.N.; resources, E.N.A., M.A.S. and V.E.M.; writing—original draft preparation, E.N.A., M.A.S. and V.E.M.; writing—review and editing, E.N.A., M.A.S., V.E.M. and Y.N.; supervision, E.N.A. and M.A.S.; project administration, V.E.M.; and funding acquisition, M.A.S. All authors have read and agreed to the published version of the manuscript.

Funding

The second author (M.A.S.) and fourth author (Y.N.) were financially supported by the Ministry of Science and Higher Education of the Republic of Kazakhstan (project AP09258836). The first author (E.N.A.) and third author (V.E.M.) received no external funding.

Data Availability Statement

The data presented in this study are the model data. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Machado, J.T.; Galhano, A.; Trujillo, J. Science metrics on fractional calculus development since 1966. Fract. Calc. Appl. Anal. 2013, 16, 479–500.
2. Podlubny, I. Fractional differential equations. Math. Sci. Eng. 1999, 198, 41–119.
3. Metzler, R.; Jeon, J.H.; Cherstvy, A.G.; Barkai, E. Anomalous diffusion models and their properties: Non-stationarity, non-ergodicity, and ageing at the centenary of single particle tracking. Phys. Chem. Chem. Phys. 2014, 16, 24128–24164.
4. Tateishi, A.A.; Ribeiro, H.V.; Lenzi, E.K. The Role of Fractional Time-Derivative Operators on Anomalous Diffusion. Front. Phys. 2017, 5, 52.
5. Yegenova, A.; Sultanov, M.; Brener, A. Nonlinear Wave Model for Transport Phenomena in Media with Non-local Effects. Chem. Eng. Trans. 2021, 86, 1201–1206.
6. Li, X.; Han, X.; Wang, X. Numerical modeling of viscoelastic flows using equal low-order finite elements. Comput. Methods Appl. Mech. Eng. 2010, 199, 570–581.
7. Maslovskaya, A.; Moroz, L. Time-fractional Landau–Khalatnikov model applied to numerical simulation of polarization switching in ferroelectrics. Nonlinear Dyn. 2023, 111, 4543–4557.
8. Benson, D.A.; Wheatcraft, S.W.; Meerschaert, M.M. Application of a fractional advection-dispersion equation. Water Resour. Res. 2000, 36, 1403–1412.
9. Laskin, N.; Lambadaris, I.; Harmantzis, F.; Devetsikiotis, M. Fractional Lévy motion and its application to network traffic modeling. Comput. Netw. 2002, 40, 363–375.
10. Sun, H.; Zhang, Y.; Baleanu, D.; Chen, W.; Chen, Y. A new collection of real world applications of fractional calculus in science and engineering. Commun. Nonlinear Sci. Numer. Simul. 2018, 64, 213–231.
11. Diethelm, K.; Ford, N.; Freed, A.; Luchko, Y. Algorithms for the fractional calculus: A selection of numerical methods. Comput. Methods Appl. Mech. Eng. 2005, 194, 743–773.
12. Baleanu, D.; Diethelm, K.; Scalas, E.; Trujillo, J.J. Fractional Calculus: Models and Numerical Methods; World Scientific: Singapore, 2012; Volume 3.
13. Li, C.; Zeng, F. Numerical Methods for Fractional Calculus; Chapman and Hall/CRC: Boca Raton, FL, USA, 2019.
14. Sultanov, M.A.; Durdiev, D.K.; Rahmonov, A.A. Construction of an Explicit Solution of a Time-Fractional Multidimensional Differential Equation. Mathematics 2021, 9, 2052.
15. Gong, C.; Bao, W.; Tang, G.; Jiang, Y.; Liu, J. A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method. Sci. World J. 2014, 2014, 219580.
16. Akimova, E.N.; Misilov, V.E.; Sultanov, M.A. Regularized gradient algorithms for solving the nonlinear gravimetry problem for the multilayered medium. Math. Methods Appl. Sci. 2022, 45, 8760–8768.
17. Li, X.; Su, Y. A parallel in time/spectral collocation combined with finite difference method for the time fractional differential equations. J. Algorithms Comput. Technol. 2021, 15, 17483026211008409.
18. De Luca, P.; Galletti, A.; Ghehsareh, H.; Marcellino, L.; Raei, M. A GPU-CUDA framework for solving a two-dimensional inverse anomalous diffusion problem. Parallel Comput. Technol. Trends 2020, 36, 311.
19. Yang, X.; Wu, L. A New Kind of Parallel Natural Difference Method for Multi-Term Time Fractional Diffusion Model. Mathematics 2020, 8, 596.
20. Berdyshev, A.S.; Sultanov, M.A. On Stability of the Solution of Multidimensional Inverse Problem for the Schrödinger Equation. Math. Model. Nat. Phenom. 2017, 12, 119–133.
21. Samarskii, A.A.; Vabishchevich, P.N. Numerical Methods for Solving Inverse Problems of Mathematical Physics; Walter de Gruyter: Berlin, Germany, 2007; Volume 52.
22. Yang, F.; Ren, Y.P.; Li, X.X.; Li, D.G. Landweber iterative method for identifying a space-dependent source for the time-fractional diffusion equation. Bound. Value Probl. 2017, 2017, 163.
23. Su, L.D.; Vasil’ev, V.I.; Jiang, T.S.; Wang, G. Identification of stationary source in the anomalous diffusion equation. Inverse Probl. Sci. Eng. 2021, 29, 3406–3422.
24. Bazhlekova, E. An Inverse Source Problem for the Generalized Subdiffusion Equation with Nonclassical Boundary Conditions. Fractal Fract. 2021, 5, 63.
25. Gong, X.; Wei, T. Reconstruction of a time-dependent source term in a time-fractional diffusion-wave equation. Inverse Probl. Sci. Eng. 2019, 27, 1577–1594.
26. Nguyen, H.T.; Le, D.L.; Nguyen, V.T. Regularized solution of an inverse source problem for a time fractional diffusion equation. Appl. Math. Model. 2016, 40, 8244–8264.
27. Sultanov, M.A.; Akimova, E.N.; Misilov, V.E.; Nurlanuly, Y. Parallel Direct and Iterative Methods for Solving the Time-Fractional Diffusion Equation on Multicore Processors. Mathematics 2022, 10, 323.
28. Akimova, E.N.; Sultanov, M.A.; Misilov, V.E.; Nurlanuly, Y. Parallel sweep algorithm for solving direct and inverse problems for time-fractional diffusion equation. Numer. Methods Program. 2022, 23, 275–287. (In Russian)
29. Zhang, Y. A finite difference method for fractional partial differential equation. Appl. Math. Comput. 2009, 215, 524–529.
30. Slodička, M.; Šišková, K.; Bockstal, K.V. Uniqueness for an inverse source problem of determining a space dependent source in a time-fractional diffusion equation. Appl. Math. Lett. 2019, 91, 15–21.
31. Samarskii, A.; Nikolaev, E. Numerical Methods for Grid Equations, Volume I: Direct Methods; Birkhäuser: Basel, Switzerland, 1989.
32. Akimova, E.N. Parallel Algorithms for Solving the Gravimetry, Magnetometry, and Elasticity Problems on Multiprocessor Systems with Distributed Memory. Doctor of Physical and Mathematical Sciences, Institute of Mathematics and Mechanics, Ural Branch of Russian Academy of Sciences, Ekaterinburg, Russia, 2009. (In Russian)
33. Saad, Y. Iterative Methods for Sparse Linear Systems; SIAM: Philadelphia, PA, USA, 2003.
34. Vasin, V.V.; Eremin, I.I. Operators and Iterative Processes of Fejér Type: Theory and Applications; De Gruyter: Berlin, Germany; New York, NY, USA, 2009.
35. OpenMP Community. OpenMP Application Programming Interface Specification. Available online: https://www.openmp.org (accessed on 1 August 2023).
36. Intel Corporation. Accelerate Fast Math with Intel oneAPI Math Kernel Library. Available online: https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl.html (accessed on 1 August 2023).
37. Zhang, Y.N.; Sun, Z.Z. Alternating direction implicit schemes for the two-dimensional fractional sub-diffusion equation. J. Comput. Phys. 2011, 230, 8713–8728.

Figure 1. Results of Experiment 1 for Problem 1: (a) exact solution $U (x_{1}, x_{2}, T)$; (b) approximate solution $\tilde{U} (x_{1}, x_{2}, T)$ obtained by the parallel sweep algorithm.

Figure 2. Results of Experiment 2 for Problem 1: (a) exact solution $ψ (x_{1}, x_{2})$; (b) approximate solution $\tilde{ψ} (x_{1}, x_{2})$ obtained by the regularized conjugate gradient method with noise level $δ = 0.02$.

Figure 3. A priori data $φ (x_{1}, x_{2})$ for Problem 2.

Figure 4. Results of experiments for Problem 2: approximate solution $\tilde{ψ} (x_{1}, x_{2})$ obtained by the regularized conjugate gradient method with noise level (a) $δ = 0.00$; (b) $δ = 0.01$; (c) $δ = 0.02$; and (d) $δ = 0.05$.

Figure 5. Roofline analysis for various subroutines (represented by dots) of parallel code for 16 OpenMP threads.

Table 1. Results of Experiment 1 for Problem 1: relative error of solving the direct problem.

	Grid Size	$\tilde{N} = 64$	$\tilde{N} = 128$	$\tilde{N} = 256$
$α = 0.5$	$n = N = 128$	$3.64 \times 10^{- 4}$	$1.48 \times 10^{- 4}$	$7.07 \times 10^{- 5}$
	$n = N = 256$	$3.45 \times 10^{- 4}$	$1.26 \times 10^{- 4}$	$4.93 \times 10^{- 5}$
	$n = N = 512$	$3.40 \times 10^{- 4}$	$1.21 \times 10^{- 4}$	$1.98 \times 10^{- 5}$
$α = 0.8$	$n = N = 128$	$2.25 \times 10^{- 3}$	$9.44 \times 10^{- 4}$	$4.23 \times 10^{- 4}$
	$n = N = 256$	$2.13 \times 10^{- 3}$	$9.25 \times 10^{- 4}$	$4.04 \times 10^{- 4}$
	$n = N = 512$	$2.12 \times 10^{- 3}$	$9.20 \times 10^{- 4}$	$4.02 \times 10^{- 4}$
$α = 0.95$	$n = N = 128$	$5.14 \times 10^{- 3}$	$2.48 \times 10^{- 3}$	$1.20 \times 10^{- 3}$
	$n = N = 256$	$5.12 \times 10^{- 3}$	$2.46 \times 10^{- 3}$	$1.18 \times 10^{- 3}$
	$n = N = 512$	$5.11 \times 10^{- 3}$	$2.45 \times 10^{- 3}$	$1.18 \times 10^{- 3}$

Table 2. Results of Experiment 1 for Problem 1: computing time of solving the direct problem.

Method	Number L of OpenMP Threads	$T_{L}$ (Minutes)	$T_{SLAE}$	$T_{Right}$
Matrix Sweep (13) and (14)	Serial	28.6	28.5	0.13
Parallel Matrix Sweep (Listing 1)	2	30.7	30.63	0.06
Parallel Matrix Sweep	4	16.3	16.26	0.03
Parallel Matrix Sweep	8	10.1	10.08	0.016
Parallel Matrix Sweep	16	10.5	10.48	0.017

Table 3. Results of Experiment 2 for Problem 1: solving the inverse problem.

Noise Level $δ$	Noise Level $δ_{2}$	Regularization Parameter $ε$	Stopping Rule $μ$	Number of Iterations S	Error of Solution $∥ψ - \tilde{ψ}∥ / ∥ψ∥$
0	0	0	0.001	3	5 $\times 10^{- 5}$
0.01	0.006	0.1	0.006	2	0.1
0.02	0.01	0.1	0.01	2	0.14
0.05	0.03	0.2	0.03	1	0.16

Table 4. Results of experiments for Problem 2: solving the inverse problem.

Noise Level $δ$	Noise Level $δ_{2}$	Regularization Parameter $ε$	Stopping Rule $μ$	Number of Iterations S	Error of Solution $∥ψ - \tilde{ψ}∥ / ∥ψ∥$
0	0	0	0.001	3	$3.2 \times 10^{-4}$
0.01	0.006	0.02	0.006	3	0.13
0.02	0.01	0.02	0.01	3	0.19
0.05	0.03	0.05	0.03	1	0.23

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Akimova, E.N.; Sultanov, M.A.; Misilov, V.E.; Nurlanuly, Y. Parallel Algorithm for Solving the Inverse Two-Dimensional Fractional Diffusion Problem of Identifying the Source Term. Fractal Fract. 2023, 7, 801. https://doi.org/10.3390/fractalfract7110801
