Abstract
We propose a new iterative method for solving a generalized Sylvester matrix equation with given square coefficient matrices and an unknown rectangular matrix X. The method constructs a sequence of approximated solutions that converges to the exact solution, regardless of the initial value. We decompose each coefficient matrix into the sum of its diagonal part and the remainder. The recursive formula for the iteration is derived from the gradients of quadratic norm-error functions, together with the hierarchical identification principle. We find equivalent conditions on a convergence factor, which rely on the eigenvalues of the associated iteration matrix, so that the method is applicable as desired. The convergence rate and the error estimates of the method are governed by the spectral norm of the related iteration matrix. Furthermore, we present numerical examples of the proposed method to show its capability and efficiency, compared with recent gradient-based iterative methods.
Keywords:
generalized Sylvester matrix equation; iterative method; gradient; Kronecker product; matrix norm

MSC:
65F45; 15A12; 15A60; 15A69
1. Introduction
In control engineering, certain problems concerning the analysis and design of control systems can be formulated as the Sylvester matrix equation:

AX + XB = C, (1)

where X ∈ R^{m×n} is an unknown matrix, and A ∈ R^{m×m}, B ∈ R^{n×n}, C ∈ R^{m×n} are known matrices. Here, R^{m×n} stands for the set of m-by-n real matrices. Let us denote by A^T the transpose of a matrix A. When B = A^T, the equation is reduced to the Lyapunov equation, which is often found in continuous- and discrete-time stability analysis [1,2]. The Sylvester equation is a special case of a generalized Sylvester matrix equation:

AXB + CXD = E, (2)
where X is unknown, and A, B, C, D, E are known constant matrices of appropriate dimensions. This equation also includes the equation AXB = E and the Kalman–Yakubovich equation X − AXB = C as special cases. All of these equations have profound applications in linear system theory and related areas.
Normally, a direct way to solve the generalized Sylvester Equation (2) is to reduce it to a linear system by applying the vector operator. Then, Equation (2) reduces to P vec(X) = vec(E), where

P = B^T ⊗ A + D^T ⊗ C.

Here, vec(·) is the vector operator, which stacks the columns of a matrix, and ⊗ is the Kronecker multiplication. So, Equation (2) has a unique solution if and only if the square matrix P is invertible. However, P is not easy to work with when the sizes of A, B, C, D are not small, since the size of P can be very large. Such a size problem leads to computational difficulty, in that excessive computer memory is required for the inversion of large matrices. Thus, another way exists, which transforms the coefficient matrices into a Schur or Hessenberg form, for which solutions may be readily computed—see [3,4].
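To make the reduction concrete, the following sketch builds P and solves a small random instance, assuming the generalized Sylvester equation has the standard form AXB + CXD = E; the matrices here are illustrative stand-ins, not data from the paper.

```python
import numpy as np

# Hypothetical small instance of the generalized Sylvester equation
# A X B + C X D = E (the standard form assumed here).
rng = np.random.default_rng(0)
m, n = 3, 2
A, C = rng.standard_normal((m, m)), rng.standard_normal((m, m))
B, D = rng.standard_normal((n, n)), rng.standard_normal((n, n))
X_true = rng.standard_normal((m, n))
E = A @ X_true @ B + C @ X_true @ D

# vec(A X B + C X D) = (B^T kron A + D^T kron C) vec(X), with the
# column-stacking vec; NumPy is row-major, so we pass order='F'.
P = np.kron(B.T, A) + np.kron(D.T, C)
x = np.linalg.solve(P, E.flatten(order='F'))
X = x.reshape((m, n), order='F')
assert np.allclose(X, X_true)
```

Forming P already costs O(m^2 n^2) memory, which is exactly the size problem mentioned above: for m = n = 1000, P has 10^12 entries.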
For matrix equations of large dimensions, iterative algorithms for finding an approximate or exact solution are attractive. There are many techniques for constructing an iterative procedure for Equation (2) or its special cases—e.g., the matrix sign function [5], block successive over-relaxation [6], block recursion [7,8], Krylov subspaces [9,10], and a truncated low-rank algorithm [11]. Lately, there have been some variants of Hermitian and skew-Hermitian splitting—e.g., a generalized modified Hermitian and skew-Hermitian splitting algorithm [12], an accelerated double-step scale splitting algorithm [13], the PHSS algorithm [14], and the four-parameter PSS algorithm [15]. Furthermore, the idea of conjugate gradients leads to finite-step iterative methods that find the exact solution, such as the generalized conjugate direction algorithm [16], the conjugate gradient least-squares algorithm [17], and generalized product-type methods based on the bi-conjugate gradient algorithm [18].
In the last decade, many authors have developed gradient-based iterative (GI) algorithms for certain linear matrix equations that satisfy asymptotic stability (AS) in the following sense:
(AS): The sequence of approximated solutions converges to the exact solution, regardless of the initial value.
The first GI algorithm for solving (1) was developed by Ding and Chen [19]. In that paper, a sufficient condition on a convergence factor is determined so that the algorithm satisfies the (AS) property. By introducing a relaxation parameter, Niu et al. [20] suggested a relaxed gradient-based iterative (RGI) algorithm for solving (1). Numerical studies show that, when the relaxation factor is chosen properly, Niu's algorithm converges faster than Ding's algorithm. Zhang and Sheng [21] introduced an RGI algorithm for finding the symmetric (skew-symmetric) solution of Equation (1). Xie et al. [22] improved the RGI algorithm into an accelerated GI (AGBI) algorithm, on the basis of the information generated in the previous half-step and a relaxation factor. Ding and Chen [23] also applied the ideas of gradients and least squares to formulate the least-squares iterative (LSI) algorithm. In [24], Fan et al. observed that the matrix multiplications in GI would take considerable time and space if A and B were large and dense, so they proposed the following Jacobi-gradient iterative (JGI) method.
Method 1
(Jacobi-Gradient based Iterative (JGI) algorithm [24]). For , let be the diagonal part of . Given any initial matrices . Set and compute . For , do:
After that, Tian et al. [25] proposed an accelerated Jacobi-gradient iterative (AJGI) algorithm for solving the Sylvester matrix equation, which relies on two relaxation factors and a half-step update. However, suitable parameter values for the algorithm are difficult to find, since they are characterized by a nonlinear inequality. For the generalized Sylvester Equation (2), the gradient iterative (GI) algorithm [19] and the least-squares iterative (LSI) algorithm [26] were established as follows.
Method 2
(GI algorithm [19]). Given any two initial matrices . Set and compute . For , do:
A sufficient condition for which the algorithm satisfies (AS) is
Method 3
(LSI algorithm [26]). Given any two initial matrices . Set and compute . For , do:
If , then the algorithm satisfies (AS).
In this paper, we shall propose a new iterative method for solving the generalized Sylvester matrix Equation (2). This algorithm requires only one initial value and a single parameter, called the convergence factor. We decompose each coefficient matrix into the sum of its diagonal part and the remainder. The recursive formula for the iteration is derived from the gradients of quadratic norm-error functions, together with the hierarchical identification principle. Under assumptions on the signs of the real parts of the eigenvalues of the associated matrices, we find necessary and sufficient conditions on the convergence factor for which (AS) holds. The convergence rate and the error estimates are governed by the spectral radius of the iteration matrix. In particular, when the iteration matrix is symmetric, we obtain a convergence criterion, error estimates, and the optimal convergence factor in terms of spectral norms and the condition number. Moreover, numerical simulations are provided to illustrate our results for (2) and (1). We compare the efficiency of our algorithm with the LSI, GI, RGI, AGBI and JGI algorithms.
Let us recall some terminology from matrix analysis—see e.g., [27]. For any square matrix X, denote by σ(X) its spectrum, by ρ(X) its spectral radius, and by tr(X) its trace. Let us denote the largest and the smallest eigenvalues of a symmetric matrix by λ_max(·) and λ_min(·), respectively. Recall that the spectral norm and the Frobenius norm of a matrix A are, respectively, defined by

‖A‖_2 = (λ_max(A^T A))^{1/2} and ‖A‖_F = (tr(A^T A))^{1/2}.
The condition number of an invertible matrix A is defined by κ(A) = ‖A‖_2 ‖A^{-1}‖_2.
Denote the real part of a complex number z by Re(z).
The rest of the paper is organized as follows. We propose a modified Jacobi-gradient iterative algorithm in Section 2. Convergence criteria, convergence rate, error estimates, and the optimal convergence factor are discussed in Section 3. In Section 4, we provide numerical simulations of the algorithm. Finally, we conclude the paper in Section 5.
2. A Modified Jacobi-Gradient Iterative Method for the Generalized Sylvester Equation
In this section, we propose an iterative algorithm for solving the generalized Sylvester equation, called a modified Jacobi-gradient iterative algorithm.
Throughout, let A, C ∈ R^{m×m}, B, D ∈ R^{n×n}, and E ∈ R^{m×n}. We would like to find a matrix X ∈ R^{m×n}, such that

AXB + CXD = E. (3)
Write each coefficient matrix as the sum of its diagonal part and the remaining part, and denote by D_A, D_B, D_C, D_D the diagonal parts of A, B, C, D, respectively. A necessary and sufficient condition for (3) to have a unique solution is the invertibility of the square matrix

P = B^T ⊗ A + D^T ⊗ C.

In this case, the solution is given by vec(X) = P^{-1} vec(E).
To obtain an iterative algorithm for solving (3), we recall the hierarchical identification principle in [19]. We rewrite (3) as
Define two matrices
From (4) and (5), we shall find the approximated solution of the following two subsystems
so that the following norm-error functions are minimized:
From the gradient formula
we can deduce the gradient of the error as follows:
Similarly, we have
Let X1(k) and X2(k) be the estimates or iterative solutions of the subsystems (6) at the k-th iteration. The recursive formulas for X1(k) and X2(k) come from the gradient formulas (8) and (9), as follows:
Based on the hierarchical identification principle, the unknown variable X is replaced by its estimate at the (k − 1)-th iteration. To avoid duplicated computation, we introduce the matrix
so we have
Since any diagonal matrix is sparse, the operation count in the computation (10) can be substantially reduced. Indeed, the product D_A X D_B is a matrix whose (i, j)-th entry is the product of the i-th diagonal entry of D_A, the (i, j)-th entry of X, and the j-th diagonal entry of D_B. A similar observation applies to D_C X D_D. Thus,
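The entrywise shortcut described above can be verified directly; D_A and D_B below are hypothetical diagonal matrices represented by their diagonal vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
a = rng.standard_normal(m)   # diagonal entries of D_A
b = rng.standard_normal(n)   # diagonal entries of D_B
X = rng.standard_normal((m, n))

# Full matrix products versus the O(mn) entrywise formula
# (D_A X D_B)_{ij} = a_i * x_{ij} * b_j.
full = np.diag(a) @ X @ np.diag(b)
entrywise = a[:, None] * X * b[None, :]
assert np.allclose(full, entrywise)
```

The entrywise form needs only O(mn) multiplications, instead of the cost of two full matrix products.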
The above discussion leads to the following Algorithm 1.
| Algorithm 1: Modified Jacobi-gradient based iterative (MJGI) algorithm |
The operation count for each step of the algorithm is dominated by the matrix multiplications. When m = n, the cost per iteration is O(n³), so the algorithm runs in cubic time. The convergence of the algorithm relies on the convergence factor μ. Appropriate values of this parameter are determined in the next section.
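As a concrete illustration, the following sketch implements one plausible reading of the MJGI iteration: gradient steps in which the transposed coefficients are replaced by their diagonal parts, with the shared residual M(k) computed once per sweep. The update formula and the residual-based stopping rule are assumptions for illustration, not the authors' exact pseudocode.

```python
import numpy as np

def mjgi(A, B, C, D, E, mu, X0=None, tol=1e-10, max_iter=10_000):
    """Sketch of a modified Jacobi-gradient iteration for
    A X B + C X D = E.  The update below (diagonal parts of the
    coefficients in place of their full transposes) is an assumed form."""
    m, n = A.shape[0], B.shape[0]
    X = np.zeros((m, n)) if X0 is None else X0.copy()
    dA, dB = np.diag(A), np.diag(B)   # diagonal parts, as vectors
    dC, dD = np.diag(C), np.diag(D)
    for _ in range(max_iter):
        R = E - A @ X @ B - C @ X @ D          # shared residual M(k)
        if np.linalg.norm(R) < tol:
            break
        # Jacobi-gradient step: diagonal parts replace full transposes.
        X = X + mu * (dA[:, None] * R * dB[None, :]
                      + dC[:, None] * R * dD[None, :])
    return X
```

Under the vec identity, this update reads x(k+1) = x(k) + μ P_D (vec(E) − P x(k)), where P_D is the diagonal part of P = B^T ⊗ A + D^T ⊗ C, which is a linear process of the kind analyzed in Section 3.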
3. Convergence Analysis of the Proposed Method
In this section, we analyze the convergence of Algorithm 1. First, we transform it into a first-order linear iterative process of the form x(k+1) = T x(k), where x is a vector variable and T is a matrix. The iteration matrix T determines the convergence criteria, the convergence rate, and the error estimates of the algorithm.
3.1. Convergence Criteria
Theorem 1.
Assume that the generalized Sylvester matrix Equation (3) has a unique solution. Denote by P_D the diagonal part of P, let H = P_D P, and write T = I − μH. Let (X(k)) be a sequence generated from Algorithm 1.
- (1)
- Then, (AS) holds if and only if ρ(T) < 1.
- (2)
- If Re(λ) > 0 for all λ ∈ σ(H), then (AS) holds if and only if 0 < μ < min{2 Re(λ)/|λ|² : λ ∈ σ(H)}.
- (3)
- If Re(λ) < 0 for all λ ∈ σ(H), then (AS) holds if and only if max{2 Re(λ)/|λ|² : λ ∈ σ(H)} < μ < 0.
- (4)
- If H is symmetric, then (AS) holds if and only if λ_min(H) and λ_max(H) have the same sign, and μ is chosen so that 0 < μ < 2/λ_max(H) when λ_min(H) > 0, or 2/λ_min(H) < μ < 0 when λ_max(H) < 0.
Proof.
From Algorithm 1, we start by considering the error matrices
We will show that the error matrices tend to zero or, equivalently, that X(k) → X as k → ∞. A direct computation reveals that
By taking the vector operator and using properties of the Kronecker product, we have
Let us denote the diagonal part of P by P_D. Indeed,
Thus, we arrive at a linear iterative process
where T = I − μH and H = P_D P. Hence, the following statements are equivalent:
- (i)
- X(k) → X for any initial value X(0).
- (ii)
- System (11) has an asymptotically-stable zero solution.
- (iii)
- The iteration matrix T = I − μH has spectral radius less than 1.
Indeed, since T is a polynomial in H, we get σ(T) = {1 − μλ : λ ∈ σ(H)}.
Thus, ρ(T) < 1 if and only if |1 − μλ| < 1 for all λ ∈ σ(H). Write λ = α + iβ, where α and β are real. It follows that the condition |1 − μλ| < 1 is equivalent to (1 − μα)² + (μβ)² < 1, or μ²(α² + β²) < 2μα.
Thus, we arrive at two alternative conditions:
- (i)
- μ > 0 and 2α_j > μ(α_j² + β_j²) for each eigenvalue λ_j = α_j + iβ_j of H;
- (ii)
- μ < 0 and 2α_j < μ(α_j² + β_j²) for each eigenvalue λ_j = α_j + iβ_j of H.
- Case 1
- α_j > 0 for all j. In this case, ρ(T) < 1 if and only if 0 < μ < min_j 2α_j/(α_j² + β_j²).
- Case 2
- α_j < 0 for all j. In this case, ρ(T) < 1 if and only if max_j 2α_j/(α_j² + β_j²) < μ < 0.
Now, suppose that H is a symmetric matrix. Then T is also symmetric, and thus all its eigenvalues are real. Hence,

ρ(T) = max{|1 − μλ_min(H)|, |1 − μλ_max(H)|}. (16)
It follows that ρ(T) < 1 if and only if |1 − μλ_min(H)| < 1 and |1 − μλ_max(H)| < 1.
So, λ_min(H) and λ_max(H) cannot be zero.
- Case 1
- If λ_min(H) > 0, then the condition (16) is equivalent to 0 < μ < 2/λ_max(H).
- Case 2
- If λ_max(H) < 0, then the condition (16) is equivalent to 2/λ_min(H) < μ < 0.
- Case 3
- If λ_min(H) < 0 < λ_max(H), then (16) would require both μ > 0 and μ < 0, which is a contradiction.
Therefore, the condition (16) holds if and only if λ_min(H) and λ_max(H) have the same sign and μ is chosen according to the above conditions. □
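Statement (2) of the theorem can be checked numerically. The helper below computes the upper bound min 2Re(λ)/|λ|² over the spectrum of a hypothetical matrix H with eigenvalues in the right half-plane, and verifies that the spectral radius of I − μH lies below 1 exactly for μ inside the admissible interval.

```python
import numpy as np

def mu_upper_bound(H):
    """Largest admissible convergence factor when every eigenvalue of H
    has positive real part: mu < min_j 2*Re(lam_j)/|lam_j|**2."""
    lam = np.linalg.eigvals(H)
    assert np.all(lam.real > 0), "criterion requires Re(lam) > 0"
    return np.min(2.0 * lam.real / np.abs(lam) ** 2)

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

# Hypothetical H with a complex-conjugate eigenvalue pair, Re > 0.
H = np.array([[3.0, 1.0], [-1.0, 2.0]])
bound = mu_upper_bound(H)
I = np.eye(2)
assert spectral_radius(I - 0.5 * bound * H) < 1      # inside the interval
assert spectral_radius(I - 1.5 * bound * H) >= 1     # outside the interval
```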
3.2. Convergence Rate and Error Estimate
We now discuss the convergence rate and error estimates of Algorithm 1 from the iterative process (11).
Suppose that Algorithm 1 satisfies the (AS) property—i.e., ρ(T) < 1. From (11), we have
It follows inductively that for each ,
Hence, the spectral norm of T describes how fast the approximated solution X(k) converges to the exact solution X. The smaller the spectral radius, the faster X(k) goes to X. In that case, since ρ(T) < 1, if X(k) ≠ X (i.e., X(k) is not the exact solution), then
Thus, the error at each iteration gets smaller than the previous one.
The above discussion is summarized in the following theorem.
Theorem 2.
Suppose that the parameter μ is chosen as in Theorem 1 so that Algorithm 1 satisfies (AS). Then, the convergence rate of the algorithm is governed by the spectral radius (16). Moreover, the error estimates relative to the previous step and to the first step are provided by (17) and (18), respectively. In particular, the error at each iteration is smaller than that of the previous one, as in (19).
From (16), if the eigenvalues of μH are close to 1, then the spectral radius of the iteration matrix is close to 0, and hence the error converges to 0 faster.
Remark 1.
The convergence criteria and the convergence rate of Algorithm 1 depend on A, B, C and D but not on E. However, the matrix E can be used for the stopping criteria.
The next proposition determines the number of iterations after which the approximated solution X(k) is close to the exact solution X, in the sense that the error is less than a given tolerance ε.
Proposition 1.
According to Algorithm 1, for each given error ε > 0, we have ‖X(k) − X‖ < ε after k iterations, for any k such that
Proof.
From the estimation (18), we have
This means precisely that for each given ε > 0, there is an iteration number k* such that ‖X(k) − X‖ < ε for all k ≥ k*.
Taking logarithms, we see that this condition is equivalent to (20). Thus, if we run Algorithm 1 for k* iterations, then we get ‖X(k*) − X‖ < ε, as desired. □
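The iteration count in (20) follows from solving ρ^k ‖X(0) − X‖ ≤ ε for k; a small helper makes the computation explicit (here ρ stands for the contraction factor ‖T‖, assumed to be less than 1).

```python
import math

def iterations_needed(rho, err0, eps):
    """Smallest k with rho**k * err0 <= eps, under the bound
    ||X(k) - X|| <= rho**k * ||X(0) - X|| with contraction factor rho < 1."""
    assert 0 < rho < 1 and err0 > 0 and eps > 0
    if err0 <= eps:
        return 0
    return math.ceil(math.log(eps / err0) / math.log(rho))

# With rho = 0.9 and initial error 1, reaching eps = 1e-6 takes k steps
# satisfying 0.9**k <= 1e-6 < 0.9**(k-1).
k = iterations_needed(0.9, err0=1.0, eps=1e-6)
assert 0.9 ** k <= 1e-6 < 0.9 ** (k - 1)
```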
3.3. Optimal Parameter
We discuss the fastest convergence factor for Algorithm 1.
Theorem 3.
The optimal convergence factor μ for which Algorithm 1 satisfies (AS) is the one that minimizes ρ(T). If, in addition, H is symmetric, then the optimal convergence factor for which the algorithm satisfies (AS) is determined by

μ_opt = 2/(λ_min(H) + λ_max(H)). (21)
In this case, the convergence rate is governed by

ρ(T) = (κ − 1)/(κ + 1), (22)
where κ denotes the condition number of H, and we have the following estimates:
Proof.
From Theorem 2, it is clear that the fastest convergence is attained at the convergence factor that minimizes ρ(T). Now, assume that H is symmetric. Then, T is also symmetric; thus, all its eigenvalues are real and
For convenience, denote , , and
First, we consider the case λ_min(H) > 0. To obtain the fastest convergence factor, according to (15), we must solve the following optimization problem
We obtain that the minimizer is given by (21), so that the convergence rate is (22). For the case λ_max(H) < 0, we solve the following optimization problem
A similar argument yields the same minimizer (21) and the same convergence rate (22). From (17), (18) and (25), we obtain the bounds (23) and (24). □
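For a symmetric positive definite H, the optimal factor and rate above coincide with the classical Richardson iteration result; the following is a numerical check with an arbitrary SPD matrix standing in for H.

```python
import numpy as np

# Arbitrary symmetric positive definite stand-in for H.
rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5))
H = M @ M.T + 5 * np.eye(5)

lam = np.linalg.eigvalsh(H)               # sorted ascending
lam_min, lam_max = lam[0], lam[-1]
mu_opt = 2.0 / (lam_min + lam_max)        # optimal convergence factor (21)
kappa = lam_max / lam_min                 # condition number of H
rho_opt = (kappa - 1.0) / (kappa + 1.0)   # optimal spectral radius (22)

T = np.eye(5) - mu_opt * H
rho = np.max(np.abs(np.linalg.eigvalsh(T)))
assert np.isclose(rho, rho_opt)

# No other mu does better than mu_opt.
for mu in [0.5 * mu_opt, 0.9 * mu_opt, 1.1 * mu_opt]:
    r = np.max(np.abs(np.linalg.eigvalsh(np.eye(5) - mu * H)))
    assert r >= rho_opt - 1e-12
```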
4. Numerical Simulations
In this section, we report numerical results to illustrate the effectiveness of Algorithm 1. We consider various sizes of matrix systems, namely small, medium and large. For the generalized Sylvester equation, we compare the performance of Algorithm 1 with the GI and LSI algorithms. For the Sylvester equation, we compare our algorithm with the GI, RGI, AGBI and JGI algorithms. All iterations were carried out in the same environment: MATLAB R2017b, on an Intel(R) Core(TM) i7-7660U CPU @ 2.5 GHz, with 8.00 GB RAM and bus speed 2133 MHz. We abbreviate IT for the number of iterations and CPU for the CPU time in seconds. At the k-th step of the iteration, we consider the following error:
where is the k-th approximated solution of the corresponding system.
4.1. Numerical Simulation for the Generalized Sylvester Matrix Equation
Example 1.
Consider the matrix equation where
Then, the exact solution of X is
Choose an initial matrix X(0). In this case, all eigenvalues of H have positive real parts. The effect of changing the convergence factor μ is illustrated in Figure 1. According to Theorem 1, the criterion for convergence is that μ lies in the admissible interval. Since the chosen values satisfy this criterion, the error becomes smaller and goes to zero as k increases, as shown in Figure 1. Among them, one choice gives the fastest convergence. For the two values that do not meet the criterion, the error does not converge to zero.
Figure 1.
Error of Example 1.
Example 2.
Suppose that where are matrices where
Here, E is a heptadiagonal matrix—i.e., a band matrix with bandwidth 3. Choose the initial matrix X(0) to be the n-by-n zero matrix. We compare Algorithm 1 with the direct method and the LSI and GI algorithms. Table 1 shows the errors at the final step of the iteration, as well as the computation time after 75 iterations. Figure 2 illustrates that the approximated solutions via LSI diverge, while those via GI and MJGI converge. Table 1 and Figure 2 imply that our algorithm requires significantly less computation time and produces smaller errors than the others.
Table 1.
Computational time and error for Example 2.
Figure 2.
Error of Example 2.
Example 3.
We consider the equation in which are matrices determined by
The initial matrix is given by We run LSI, GI and MJGI algorithms by using
respectively. The reported result in Table 2 and Figure 3 illustrate that the approximated solution generated from LSI diverges, while those from GI or MJGI converge. Both computational time and the error from MJGI are less than those from GI.
Table 2.
Computational time and error for Example 3.
Figure 3.
Comparison of Example 3.
4.2. Numerical Simulation for Sylvester Matrix Equation
Assume that the Sylvester equation

AX + XB = C (26)

has a unique solution. This condition is equivalent to the invertibility of the associated Kronecker sum or, equivalently, to the requirement that all possible sums between eigenvalues of A and eigenvalues of B are nonzero. To solve (26), we propose Algorithm 2:
| Algorithm 2: Modified Jacobi-gradient based iterative (MJGI) algorithm for Sylvester equation |
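Since the pseudocode of Algorithm 2 is given as a figure, the following sketch shows one plausible specialization of the MJGI update to AX + XB = C, obtained from the generalized form by identity substitutions; the exact update and stopping rule are assumptions for illustration.

```python
import numpy as np

def mjgi_sylvester(A, B, C, mu, X0=None, tol=1e-10, max_iter=10_000):
    """Assumed specialization of the MJGI iteration to A X + X B = C:
    the transposed coefficients in the gradient step are replaced by
    their diagonal parts."""
    m, n = A.shape[0], B.shape[0]
    X = np.zeros((m, n)) if X0 is None else X0.copy()
    dA, dB = np.diag(A), np.diag(B)   # diagonal parts, as vectors
    for _ in range(max_iter):
        R = C - A @ X - X @ B          # residual
        if np.linalg.norm(R) < tol:
            break
        X = X + mu * (dA[:, None] * R + R * dB[None, :])
    return X
```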
Example 4.
Consider the equation in which E is the same matrix as in the previous example,
In this case, all eigenvalues of the iteration matrix have positive real parts, so we can apply our algorithm. We compare our algorithm with the GI, RGI, AGBI and JGI algorithms. The results after running 100 iterations are shown in Figure 4 and Table 3. According to the errors and CTs in Table 3 and Figure 4, our algorithm uses less computational time and attains smaller errors than the others.
Figure 4.
Errors of Example 4.
Table 3.
CTs and errors for Example 4.
5. Conclusions and Suggestion
A modified Jacobi-gradient (MJGI) algorithm (Algorithm 1) was proposed for solving the generalized Sylvester matrix Equation (3). For the MJGI algorithm to be applicable for any size of matrix system and any initial matrix, the convergence factor must be chosen properly, according to Theorem 1. In this case, the iteration matrix has a spectral radius less than 1. When the iteration matrix is symmetric, we determined the optimal convergence factor, for which the algorithm attains its fastest convergence rate. The asymptotic convergence rate of the algorithm is governed by the spectral radius of the iteration matrix; thus, if the eigenvalues of μH are close to 1, then the algorithm converges faster in the long run. The numerical examples reveal that our algorithm is suitable for small, medium and large matrix systems. In addition, the MJGI algorithm performs well compared with recent gradient-based iterative algorithms. For future work, one may add another parameter for an updating step to make the algorithm converge faster—see [25]. Another possibility is to apply the ideas in this paper to derive iterative algorithms for nonlinear matrix equations.
Author Contributions
Supervision, P.C.; software, N.S.; writing—original draft preparation, N.S.; writing—review and editing, P.C. All authors contributed equally and significantly in writing this article. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Acknowledgments
The first author received financial support through the RA-TA graduate scholarship from the Faculty of Science, King Mongkut's Institute of Technology Ladkrabang, Grant No. RA/TA-2562-M-001, during his Master's study.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Shang, Y. Consensus seeking over Markovian switching networks with time-varying delays and uncertain topologies. Appl. Math. Comput. 2016, 273, 1234–1245. [Google Scholar] [CrossRef]
- Shang, Y. Average consensus in multi-agent systems with uncertain topologies and multiple time-varying delays. Linear Algebra Appl. 2014, 459, 411–429. [Google Scholar] [CrossRef]
- Golub, G.H.; Nash, S.; Van Loan, C.F. A Hessenberg-Schur method for the matrix AX + XB = C. IEEE Trans. Automat. Control. 1979, 24, 909–913. [Google Scholar] [CrossRef]
- Ding, F.; Chen, T. Hierarchical least squares identification methods for multivariable systems. IEEE Trans. Automat. Control 1997, 42, 408–411. [Google Scholar] [CrossRef]
- Benner, P.; Quintana-Orti, E.S. Solving stable generalized Lyapunov equations with the matrix sign function. Numer. Algorithms 1999, 20, 75–100. [Google Scholar] [CrossRef]
- Starke, G.; Niethammer, W. SOR for AX − XB = C. Linear Algebra Appl. 1991, 154–156, 355–375. [Google Scholar] [CrossRef]
- Jonsson, I.; Kagstrom, B. Recursive blocked algorithms for solving triangular systems—Part I: One-sided and coupled Sylvester-type matrix equations. ACM Trans. Math. Softw. 2002, 28, 392–415. [Google Scholar] [CrossRef]
- Jonsson, I.; Kagstrom, B. Recursive blocked algorithms for solving triangular systems—Part II: Two-sided and generalized Sylvester and Lyapunov matrix equations. ACM Trans. Math. Softw. 2002, 28, 416–435. [Google Scholar] [CrossRef]
- Kaabi, A.; Kerayechian, A.; Toutounian, F. A new version of successive approximations method for solving Sylvester matrix equations. Appl. Math. Comput. 2007, 186, 638–648. [Google Scholar] [CrossRef]
- Lin, Y.Q. Implicitly restarted global FOM and GMRES for nonsymmetric matrix equations and Sylvester equations. Appl. Math. Comput. 2005, 167, 1004–1025. [Google Scholar] [CrossRef]
- Kressner, D.; Sirkovic, P. Truncated low-rank methods for solving general linear matrix equations. Numer. Linear Algebra Appl. 2015, 22, 564–583. [Google Scholar] [CrossRef]
- Dehghan, M.; Shirilord, A. A generalized modified Hermitian and skew-Hermitian splitting (GMHSS) method for solving complex Sylvester matrix equation. Appl. Math. Comput. 2019, 348, 632–651. [Google Scholar] [CrossRef]
- Dehghan, M.; Shirilord, A. Solving complex Sylvester matrix equation by accelerated double-step scale splitting (ADSS) method. Eng. Comput. 2019. [Google Scholar] [CrossRef]
- Li, S.Y.; Shen, H.L.; Shao, X.H. PHSS iterative method for solving generalized Lyapunov equations. Mathematics 2019, 7, 38. [Google Scholar] [CrossRef]
- Shen, H.L.; Li, Y.R.; Shao, X.H. The four-parameter PSS method for solving the Sylvester equation. Mathematics 2019, 7, 105. [Google Scholar] [CrossRef]
- Hajarian, M. Generalized conjugate direction algorithm for solving the general coupled matrix equations over symmetric matrices. Numer. Algorithms 2016, 73, 591–609. [Google Scholar] [CrossRef]
- Hajarian, M. Extending the CGLS algorithm for least squares solutions of the generalized Sylvester-transpose matrix equations. J. Frankl. Inst. 2016, 353, 1168–1185. [Google Scholar] [CrossRef]
- Dehghan, M.; Mohammadi-Arani, R. Generalized product-type methods based on Bi-conjugate gradient (GPBiCG) for solving shifted linear systems. Comput. Appl. Math. 2017, 36, 1591–1606. [Google Scholar] [CrossRef]
- Ding, F.; Chen, T. Gradient based iterative algorithms for solving a class of matrix equations. IEEE Trans. Automat. Control 2005, 50, 1216–1221. [Google Scholar] [CrossRef]
- Niu, Q.; Wang, X.; Lu, L.-Z. A relaxed gradient based algorithm for solving Sylvester equation. Asian J. Control 2011, 13, 461–464. [Google Scholar] [CrossRef]
- Zhang, X.D.; Sheng, X.P. The relaxed gradient based iterative algorithm for the symmetric (skew symmetric) solution of the Sylvester equation AX + XB = C. Math. Probl. Eng. 2017, 2017, 1624969. [Google Scholar] [CrossRef]
- Xie, Y.J.; Ma, C.F. The accelerated gradient based iterative algorithm for solving a class of generalized Sylvester-transpose matrix equation. Appl. Math. Comput. 2012, 218, 5620–5628. [Google Scholar] [CrossRef]
- Ding, F.; Chen, T. Iterative least-squares solutions of coupled Sylvester matrix equations. Syst. Control Lett. 2005, 54, 95–107. [Google Scholar] [CrossRef]
- Fan, W.; Gu, C.; Tian, Z. Jacobi-gradient iterative algorithms for Sylvester matrix equations. In Proceedings of the 14th Conference of the International Linear Algebra Society, Shanghai University, Shanghai, China, 16–20 July 2007. [Google Scholar]
- Tian, Z.; Tian, M.; Gu, C.; Hao, X. An accelerated Jacobi-gradient based iterative algorithm for solving Sylvester matrix equations. Filomat 2017, 31, 2381–2390. [Google Scholar] [CrossRef]
- Ding, F.; Liu, P.X.; Chen, T. Iterative solutions of the generalized Sylvester matrix equations by using the hierarchical identification principle. Appl. Math. Comput. 2008, 197, 41–50. [Google Scholar] [CrossRef]
- Horn, R.A.; Johnson, C.R. Topics in Matrix Analysis; Cambridge University Press: New York, NY, USA, 1991. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).