Conjugate Gradient Algorithm for Least-Squares Solutions of a Generalized Sylvester-Transpose Matrix Equation

Tansri, Kanjanaporn; Chansangiam, Pattrawut

doi:10.3390/sym14091868

by Kanjanaporn Tansri^† and Pattrawut Chansangiam^{*,†}

Department of Mathematics, School of Science, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Symmetry 2022, 14(9), 1868; https://doi.org/10.3390/sym14091868

Submission received: 2 August 2022 / Revised: 30 August 2022 / Accepted: 2 September 2022 / Published: 7 September 2022

(This article belongs to the Section Mathematics)

Abstract:

We derive a conjugate-gradient type algorithm to produce approximate least-squares (LS) solutions for an inconsistent generalized Sylvester-transpose matrix equation. The algorithm is always applicable for any given initial matrix and will arrive at an LS solution within finite steps. When the matrix equation has many LS solutions, the algorithm can search for the one with minimal Frobenius-norm. Moreover, given a matrix Y, the algorithm can find a unique LS solution closest to Y. Numerical experiments show the relevance of the algorithm for square/non-square dense/sparse matrices of medium/large sizes. The algorithm works well in both the number of iterations and the computation time, compared to the direct Kronecker linearization and well-known iterative methods.

Keywords:

generalized Sylvester-transpose matrix equation; conjugate gradient algorithm; least-squares solution; orthogonality; Kronecker product

MSC:

65F45; 65F10; 15A60; 15A69

1. Introduction

Sylvester-type matrix equations are closely related to ordinary differential equations (ODEs), which can be adapted to several problems in control engineering and information sciences; see e.g., monographs [1,2,3]. The Sylvester matrix equation

A X + X B = E

and the famous special case Lyapunov equation

A X + X A^{T} = E

have several applications in numerical methods for ODEs, control and system theory, signal processing, and image restoration; see e.g., [4,5,6,7,8,9]. The Sylvester-transpose equation

A X + X^{T} B = C

is utilized in eigenstructure assignment in descriptor systems [10], pole assignment [3], and fault identification in dynamic systems [11]. In addition, if we require the solution X to be symmetric, then the Sylvester-transpose equation coincides with the Sylvester equation. A generalized Sylvester equation

A X B + C X D = E

can be applied to implicit ODEs and to general dynamical linear models for vibration and structural analysis, as well as robot and spaceship control; see e.g., [12,13].

The mentioned matrix equations are special cases of a generalized Sylvester-transpose matrix equation:

\begin{matrix} A X B + C X^{T} D = E, \end{matrix}

(1)

or more generally

\begin{matrix} \sum_{i = 1}^{s} A_{i} X B_{i} + \sum_{j = 1}^{t} C_{j} X^{T} D_{j} = E . \end{matrix}

(2)

A direct algebraic method to find a solution of Equation (2) is to use the Kronecker linearization, which transforms the matrix equation into an equivalent linear system; see e.g., [14] (Ch. 4). The same technique, together with the notion of the weighted Moore–Penrose inverse, was adapted to find least-squares (LS) solutions of coupled inconsistent Sylvester-type matrix equations [15]. Another algebraic method is to apply a generalized Sylvester mapping [13], so that the solution is expressed in terms of polynomial matrices. However, when the coefficient matrices are of moderate or large size, matrix factorizations and other traditional methods are inconvenient, since they require a large amount of memory to calculate an exact solution. Thus, the Kronecker linearization and other algebraic methods are only suitable for small matrices. This has led many researchers to develop algorithms that reduce the time and memory needed to solve large matrix equations.

In the literature, there are two notable techniques to derive iterative procedures for solving linear matrix equations; see more information in a monograph [16]. The conjugate gradient (CG) technique aims to create a sequence of approximate solutions such that the respective residual matrices form an orthogonal basis. The desired solution is obtained in the final step of the iteration. In the last decade, many authors developed CG-type algorithms for Equation (2) and its special cases, e.g., BiCG [17], BiCR [17], CGS [18], GCD [19], GPBiCG [20], and CMRES [21]. The second technique, known as the gradient-based iterative (GI) technique, constructs a sequence of approximate solutions from the gradient of the associated norm-error function. If the parameters of a GI algorithm are set carefully, then the generated sequence converges to the desired solution. In the last five years, many GI algorithms have been introduced; see e.g., GI [22,23], relaxed GI [24], accelerated GI [25], accelerated Jacobi GI [26], modified Jacobi GI [27], gradient-descent algorithm [28], and global generalized Hessenberg algorithm [29]. For LS solutions of Sylvester-type matrix equations, there are iterative solvers, e.g., [30,31].

Recently, the work [32] developed an effective gradient-descent iterative algorithm to produce approximate LS solutions of Equation (2). When Equation (2) is consistent, a CG-type algorithm was derived to obtain a solution within finitely many steps; see [33]. This work is a continuation of [33], i.e., we consider Equation (1) with rectangular coefficient matrices and a rectangular unknown matrix X. Suppose that Equation (1) is inconsistent. We propose a CG-type algorithm to approximate LS solutions, which will solve the following problems:

Problem 1.

Find a matrix

\hat{X} \in R^{n \times p}

that minimizes

∥ E - A X B - C X^{T} D ∥_{F}

.

Let

L

be the set of least-squares solutions of Equation (1). The second problem is to find an LS solution with the minimal norm:

Problem 2.

Find the matrix

X^{*}

such that

\begin{matrix} {∥ X^{*} ∥}_{F} = min_{\hat{X} \in L} {∥ \hat{X} ∥}_{F} . \end{matrix}

(3)

The last one is to find an LS solution closest to a given matrix:

Problem 3.

Let

Y \in R^{n \times p}

. Find the matrix

\overset{ˇ}{X}

such that

\begin{matrix} {∥ \overset{ˇ}{X} - Y ∥}_{F} = min_{\hat{X} \in L} {∥ \hat{X} - Y ∥}_{F} . \end{matrix}

(4)

Moreover, we extend our studies to the matrix Equation (2). We verify the results from theoretical and numerical points of view.

The organization of this article is as follows. In Section 2, we recall preliminary results from matrix theory that will be used in later discussions. In Section 3, we explain how the Kronecker linearization can transform Equation (1) into an equivalent linear system to obtain LS solutions. In Section 4, we propose a CG-type algorithm to solve Problem 1 and verify the theoretical capability of the algorithm. After that, Problems 2 and 3 are investigated in Section 5 and Section 6, respectively. To verify the theory, we provide numerical experiments in Section 7 to show the applicability and efficiency of the algorithm, compared to the Kronecker linearization and recent iterative algorithms. We summarize the whole work in the last section.

2. Auxiliary Results from Matrix Theory

Throughout, let us denote by

R^{m \times n}

the set of all m-by-n real matrices. Recall that the standard (Frobenius) inner product of

A, B \in R^{m \times n}

is defined by

\begin{matrix} 〈 A, B 〉 : = tr (B^{T} A) = tr (A B^{T}) . \end{matrix}

(5)

If

〈 A, B 〉 = 0

, we say that A is orthogonal to B. A well-known property of the inner product is that

〈 A, B C D 〉 = 〈 B^{T} A D^{T}, C 〉,

(6)

for any matrices

A, B, C, D

with appropriate dimensions. The Frobenius norm of matrix

A \in R^{m \times n}

is defined by

{∥ A ∥}_{F} : = \sqrt{tr (A^{T} A)} .
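The inner product (5), the adjoint-type identity (6), and the Frobenius norm can be checked numerically. The following is a minimal NumPy sketch; the matrix sizes, the random seed, and the helper name `inner` are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative sizes (assumption): A is 4x5, and B C D is also 4x5.
A = rng.standard_normal((4, 5))
B = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 2))
D = rng.standard_normal((2, 5))

def inner(X, Y):
    """Frobenius inner product <X, Y> = tr(Y^T X), as in Equation (5)."""
    return np.trace(Y.T @ X)

lhs = inner(A, B @ C @ D)        # <A, B C D>
rhs = inner(B.T @ A @ D.T, C)    # <B^T A D^T, C>, the right-hand side of (6)
fro = np.sqrt(inner(A, A))       # ||A||_F = sqrt(tr(A^T A))
```

Up to rounding, `lhs` and `rhs` agree, and `fro` matches `np.linalg.norm(A, "fro")`.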

The Kronecker product

A \otimes B

of

A = [a_{i j}] \in R^{m \times n}

and

B \in R^{p \times q}

is defined to be the

m p

-by-

n q

matrix whose each

(i, j)

-th block is given by

a_{i j} B

.

Lemma 1

([14]). For any real matrices A and B, we have

{(A \otimes B)}^{T} = A^{T} \otimes B^{T}

.

The vector operator

Vec (\cdot)

transforms a matrix

A = [a_{i j}] \in R^{m \times n}

to the vector

\begin{matrix} Vec A : = {[a_{11} \dots a_{m 1} a_{12} \dots a_{m 2} \dots a_{1 n} \dots a_{m n}]}^{T} \in R^{m n} . \end{matrix}

The vector operator is bijective, linear, and related to the usual matrix multiplication as follows.

Lemma 2

([14]). For any

A \in R^{m \times n}

,

B \in R^{n \times p}

and

C \in R^{p \times q}

, we have

\begin{matrix} Vec A B C = (C^{T} \otimes A) Vec B . \end{matrix}
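Lemma 2 can be verified numerically once the vector operator is implemented as column stacking. This is a small sketch; the helper name `vec`, the sizes, and the seed are assumptions made for illustration.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into one vector (column-major order)."""
    return X.reshape(-1, order="F")

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))   # m x n
B = rng.standard_normal((4, 2))   # n x p
C = rng.standard_normal((2, 5))   # p x q

lhs = vec(A @ B @ C)                  # Vec(ABC)
rhs = np.kron(C.T, A) @ vec(B)        # (C^T kron A) Vec(B)
```

Note that NumPy stores arrays in row-major order by default, so `order="F"` is needed to match the column-stacking convention used here.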

For each

m, n \in N

, we define a commutation matrix

\begin{matrix} P (m, n) : = \sum_{i = 1}^{m} \sum_{j = 1}^{n} E_{i j} \otimes E_{i j}^{T} \in R^{m n \times m n}, \end{matrix}

(7)

where the

(i, j)

-th position of

E_{i j} \in R^{m \times n}

is 1 and all other entries are 0. Indeed,

P (m, n)

acts on a vector by permuting its entries as follows.

Lemma 3

([14]). For any matrix

A \in R^{m \times n}

, we have

Vec (A^{T}) = P (m, n) Vec (A) .

(8)

Moreover, commutation matrices permute the entries of

A \otimes B

as follows.

Lemma 4

([14]). For any

A \in R^{m \times n}

and

B \in R^{p \times q}

, we have

B \otimes A = P {(m, p)}^{T} (A \otimes B) P (n, q) .

(9)
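The commutation matrix in Equation (7) and the identities (8) and (9) can be illustrated as follows. This sketch builds P(m, n) literally from the definition; the function names, sizes, and seed are assumptions for illustration only.

```python
import numpy as np

def vec(X):
    """Column-stacking vector operator."""
    return X.reshape(-1, order="F")

def commutation(m, n):
    """P(m, n) = sum_{i,j} E_ij kron E_ij^T with E_ij in R^{m x n}, Equation (7)."""
    P = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            E = np.zeros((m, n))
            E[i, j] = 1.0
            P += np.kron(E, E.T)
    return P

rng = np.random.default_rng(2)
m, n, p, q = 3, 4, 2, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((p, q))

# Lemma 3: Vec(A^T) = P(m, n) Vec(A)
lemma3_ok = np.allclose(vec(A.T), commutation(m, n) @ vec(A))
# Lemma 4: B kron A = P(m, p)^T (A kron B) P(n, q)
lemma4_ok = np.allclose(
    np.kron(B, A),
    commutation(m, p).T @ np.kron(A, B) @ commutation(n, q),
)
```

Since P(m, n) is a permutation matrix, a practical implementation would store only the permutation of indices rather than the dense matrix.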

The next result will be used in the later discussions.

Lemma 5

([14]). For any matrices

A, B, C, D

with appropriate dimensions, we get

tr (A^{T} D^{T} B C) = {(Vec D)}^{T} (A \otimes B) Vec C .

(10)
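Identity (10) is the form in which Lemma 5 is used later. A quick numerical check, with sizes chosen only so that all products are defined (an assumption), might look as follows.

```python
import numpy as np

def vec(X):
    """Column-stacking vector operator."""
    return X.reshape(-1, order="F")

rng = np.random.default_rng(3)
# Sizes chosen so that A^T D^T B C is square (illustrative assumption).
A = rng.standard_normal((3, 4))
B = rng.standard_normal((2, 5))
C = rng.standard_normal((5, 4))
D = rng.standard_normal((2, 3))

lhs = np.trace(A.T @ D.T @ B @ C)            # tr(A^T D^T B C)
rhs = vec(D) @ np.kron(A, B) @ vec(C)        # (Vec D)^T (A kron B) Vec C
```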

3. Least-Squares Solutions via the Kronecker Linearization

From now on, we investigate the generalized Sylvester-transpose matrix Equation (1), with corresponding coefficient matrices

A \in R^{m \times n}

,

B \in R^{p \times q}

,

C \in R^{m \times p}

,

D \in R^{n \times q}

,

E \in R^{m \times q}

, and a rectangular unknown matrix

X \in R^{n \times p}

. We focus on the case when Equation (1) is inconsistent. In this case, we seek an LS solution, that is, a matrix

X^{*}

that solves the following minimization problem:

\begin{matrix} min_{X \in R^{n \times p}} {∥E - A X B - C X^{T} D∥}_{F} . \end{matrix}

(11)

A traditional algebraic way to solve a linear matrix equation is the Kronecker linearization, which transforms the matrix equation into an equivalent linear system using the vector operator and Kronecker products. Indeed, applying the vector operator to Equation (1) and using Lemmas 2 and 3 yields

\begin{matrix} Vec E & = Vec (A X B + C X^{T} D) = (B^{T} \otimes A) Vec X + (D^{T} \otimes C) Vec X^{T} \\ = (B^{T} \otimes A) Vec X + (D^{T} \otimes C) P (n, p) Vec X . \end{matrix}

(12)

Let us denote

x = Vec X

,

e = Vec E

, and

\begin{matrix} M & = (B^{T} \otimes A) + (D^{T} \otimes C) P (n, p) . \end{matrix}

(13)

Thus, a matrix X is an LS solution of Equation (1) if and only if x is an LS solution of the linear system

M x = e

, or equivalently, a solution of the associated normal equation

\begin{matrix} M^{T} M x = M^{T} e . \end{matrix}

(14)

The linear system (14) is always consistent, i.e., Equation (1) always has an LS solution. From the normal Equation (14) and Lemmas 2 and 3, we can deduce:

Lemma 6

([34]). Problem 1 is equivalent to the following consistent matrix equation

\begin{matrix} A^{T} (A X B + C X^{T} D) B^{T} + D (B^{T} X^{T} A^{T} + D^{T} X C^{T}) C = A^{T} E B^{T} + D E^{T} C . \end{matrix}

(15)
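The linearization (12) to (15) can be sketched in NumPy as follows. The code forms M from Equation (13), solves the LS problem with `numpy.linalg.lstsq`, and evaluates both sides of Equation (15). The sizes, the seed, and the helper names `vec` and `commutation` are illustrative assumptions.

```python
import numpy as np

def vec(X):
    """Column-stacking vector operator."""
    return X.reshape(-1, order="F")

def commutation(m, n):
    """Permutation matrix P(m, n) with P @ vec(A) == vec(A.T) for A in R^{m x n}."""
    idx = np.arange(m * n).reshape((m, n), order="F")  # idx[i, j]: position of a_ij in vec(A)
    return np.eye(m * n)[idx.reshape(-1)]              # row-major order of A equals vec(A.T)

rng = np.random.default_rng(3)
m, n, p, q = 6, 3, 2, 5                                # illustrative sizes (assumption)
A, B = rng.standard_normal((m, n)), rng.standard_normal((p, q))
C, D = rng.standard_normal((m, p)), rng.standard_normal((n, q))
E = rng.standard_normal((m, q))

# Equation (13): M = (B^T kron A) + (D^T kron C) P(n, p)
M = np.kron(B.T, A) + np.kron(D.T, C) @ commutation(n, p)

# LS solution of M x = e, reshaped back to an n-by-p matrix
x_ls = np.linalg.lstsq(M, vec(E), rcond=None)[0]
X_ls = x_ls.reshape((n, p), order="F")

# Both sides of Equation (15) evaluated at the LS solution
F = A @ X_ls @ B + C @ X_ls.T @ D
lhs15 = A.T @ F @ B.T + D @ F.T @ C
rhs15 = A.T @ E @ B.T + D @ E.T @ C
```

Here M has size mq-by-np, which is why this direct approach becomes costly in memory as the matrix sizes grow.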

Moreover, the normal Equation (14) has a unique solution if and only if the matrix M is of full-column rank, i.e.,

M^{T} M

is invertible. In this case, the unique solution is given by

x^{*} = {(M^{T} M)}^{- 1} M^{T} e

, and the LS error can be computed as follows:

\begin{matrix} {∥ M x^{*} - e ∥}^{2} = {∥ e ∥}^{2} - e^{T} M x^{*} . \end{matrix}

(16)

If M does not have full column rank (i.e., the kernel of M is nontrivial), then the system

M x = e

has infinitely many LS solutions. In this case, the LS solutions appear in the form

x^{*} = M^{†} e + u

where

M^{†}

is the Moore–Penrose inverse of M, and u is an arbitrary vector in the kernel of M. Among all these solutions,

\begin{matrix} x^{*} = M^{†} e \end{matrix}

(17)

is the unique one having minimal norm.
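The rank-deficient case can be illustrated with a small generic example. The sketch below uses a random rank-3 matrix in place of the structured M from Equation (13), which is an assumption made to keep the example short. It compares the minimal-norm solution (17) with another LS solution of the form M^† e + u and evaluates the right-hand side of (16), which follows from the normal equation and therefore holds for any LS solution.

```python
import numpy as np

rng = np.random.default_rng(4)
# An 8x5 matrix of rank 3 (illustrative stand-in for M)
M = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 5))
e = rng.standard_normal(8)

# Equation (17); an explicit cutoff treats tiny singular values as zero
x_min = np.linalg.pinv(M, rcond=1e-10) @ e

# A unit vector u in the kernel of M, taken from the SVD
_, s, Vt = np.linalg.svd(M)
u = Vt[-1]                       # right singular vector of a (numerically) zero singular value
x_other = x_min + u              # another LS solution

err_min = np.linalg.norm(M @ x_min - e)
err_other = np.linalg.norm(M @ x_other - e)
rhs16 = np.linalg.norm(e) ** 2 - e @ M @ x_min   # right-hand side of (16)
```

Both solutions give the same LS error, while `x_min` has the smaller norm.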

4. Least-Squares Solution via a Conjugate Gradient Algorithm

In this section, we propose a CG-type algorithm to solve Problem 1. We do not impose any assumption on the matrix M, so that LS solutions of Equation (1) may not be unique.

We shall adapt the conjugate-gradient technique to solve the equivalent matrix Equation (15). Recall that the set of LS solutions of Equation (1) is denoted by

L

. From Lemma 6, observe that the residual of a matrix

X \in R^{n \times p}

according to Equation (1) is given by

\begin{matrix} R_{X} : = A^{T} E B^{T} + D E^{T} C - A^{T} (A X B + C X^{T} D) B^{T} - D (B^{T} X^{T} A^{T} + D^{T} X C^{T}) C . \end{matrix}

(18)

Lemma 6 states that

X \in L

if and only if

R_{X} = 0

. From this, we propose the following algorithm. In each step, the next approximate solution

X_{r + 1}

is obtained from the current approximation

X_{r}

by moving along a search direction

U_{r + 1}

with a suitable step size.

Algorithm 1: A conjugate gradient iterative algorithm for Equation (1)
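The body of Algorithm 1 is not reproduced in this text. The following NumPy sketch is a reconstruction from the recurrences used in the proofs of Lemmas 7 to 9, namely U_1 = R_0, α_{r+1} = tr(H_{r+1}^T U_{r+1}), the update (19) for R_{r+1}, and U_{r+2} = R_{r+1} + (‖R_{r+1}‖_F^2 / ‖R_r‖_F^2) U_{r+1}. The function names, the relative stopping tolerance `tol`, and the iteration cap are assumptions and may differ from the authors' formulation.

```python
import numpy as np

def normal_op(X, A, B, C, D):
    """Left-hand side of Equation (15) applied to X."""
    F = A @ X @ B + C @ X.T @ D
    return A.T @ F @ B.T + D @ F.T @ C

def cg_ls(A, B, C, D, E, X0, tol=1e-10, max_iter=None):
    """CG-type iteration for LS solutions of A X B + C X^T D = E (a sketch)."""
    X = np.array(X0, dtype=float)
    R = A.T @ E @ B.T + D @ E.T @ C - normal_op(X, A, B, C, D)  # R_0, Equation (18)
    U = R.copy()                                                 # U_1 = R_0
    r0 = np.linalg.norm(R)
    max_iter = 10 * X.size if max_iter is None else max_iter     # assumed safety cap
    for _ in range(max_iter):
        rr = np.sum(R * R)                     # ||R_r||_F^2
        if np.sqrt(rr) <= tol * r0:            # assumed relative stopping rule
            break
        H = normal_op(U, A, B, C, D)           # H_{r+1}
        alpha = np.sum(H * U)                  # alpha_{r+1} = tr(H_{r+1}^T U_{r+1})
        X = X + (rr / alpha) * U               # X_{r+1}
        R = R - (rr / alpha) * H               # R_{r+1}, Equation (19)
        U = R + (np.sum(R * R) / rr) * U       # U_{r+2}
    return X

rng = np.random.default_rng(5)
m, n, p, q = 6, 3, 2, 5                        # illustrative sizes (assumption)
A, B = rng.standard_normal((m, n)), rng.standard_normal((p, q))
C, D = rng.standard_normal((m, p)), rng.standard_normal((n, q))
E = rng.standard_normal((m, q))

X_cg = cg_ls(A, B, C, D, E, np.zeros((n, p)))
R_final = A.T @ E @ B.T + D @ E.T @ C - normal_op(X_cg, A, B, C, D)
```

Each iteration needs only a few matrix products with A, B, C, D and never forms the mq-by-np matrix M, which is the main practical difference from the Kronecker linearization.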

Remark 1.

To terminate the algorithm, one can alternatively set the stopping rule to be

{∥ R_{r} ∥}_{F} - δ ⩽ ϵ^{'}

where

δ : = ∥ M x^{*} - e ∥

is the positive square root of the LS error described in Equation (16) and

ϵ^{'} > 0

is a small tolerance.

For any given initial matrix

X_{0}

, we will show that Algorithm 1 generates a sequence of approximate solutions

X_{r}

of Equation (1), so that the set of residual matrices

R_{r}

is orthogonal. It follows that an LS solution is obtained within finitely many steps.

Lemma 7.

Assume that the sequences

{R_{r}}

and

{H_{r}}

are generated by Algorithm 1. We get

\begin{matrix} R_{r + 1} = R_{r} - \frac{{∥ R_{r} ∥}_{F}^{2}}{α_{r + 1}} H_{r + 1}, for r = 1, 2, \dots . \end{matrix}

(19)

Proof.

From Algorithm 1, we have that for any r,

\begin{matrix} R_{r + 1} & = A^{T} E B^{T} + D E^{T} C - A^{T} (A X_{r + 1} B + C X_{r + 1}^{T} D) B^{T} - D (B^{T} X_{r + 1}^{T} A^{T} + D^{T} X_{r + 1} C^{T}) C \\ = A^{T} E B^{T} + D E^{T} C - A^{T} (A (X_{r} + \frac{{∥ R_{r} ∥}_{F}^{2}}{α_{r + 1}} U_{r + 1}) B + C {(X_{r} + \frac{{∥ R_{r} ∥}_{F}^{2}}{α_{r + 1}} U_{r + 1})}^{T} D) B^{T} \\ - D (B^{T} {(X_{r} + \frac{{∥ R_{r} ∥}_{F}^{2}}{α_{r + 1}} U_{r + 1})}^{T} A^{T} + D^{T} (X_{r} + \frac{{∥ R_{r} ∥}_{F}^{2}}{α_{r + 1}} U_{r + 1}) C^{T}) C \\ = A^{T} E B^{T} + D E^{T} C - A^{T} (A X_{r} B + \frac{{∥ R_{r} ∥}_{F}^{2}}{α_{r + 1}} A U_{r + 1} B + C X_{r}^{T} D + \frac{{∥ R_{r} ∥}_{F}^{2}}{α_{r + 1}} C U_{r + 1}^{T} D) B^{T} \\ - D (B^{T} X_{r}^{T} A^{T} + \frac{{∥ R_{r} ∥}_{F}^{2}}{α_{r + 1}} B^{T} U_{r + 1}^{T} A^{T} + D^{T} X_{r} C^{T} + \frac{{∥ R_{r} ∥}_{F}^{2}}{α_{r + 1}} D^{T} U_{r + 1} C^{T}) C \\ = A^{T} E B^{T} + D E^{T} C - A^{T} (A X_{r} B + C X_{r}^{T} D) B^{T} - D (B^{T} X_{r}^{T} A^{T} + D^{T} X_{r} C^{T}) C \\ - \frac{{∥ R_{r} ∥}_{F}^{2}}{α_{r + 1}} [A^{T} (A U_{r + 1} B + C U_{r + 1}^{T} D) B^{T} + D (B^{T} U_{r + 1}^{T} A^{T} + D^{T} U_{r + 1} C^{T}) C] \\ = R_{r} - \frac{{∥ R_{r} ∥}_{F}^{2}}{α_{r + 1}} H_{r + 1} . \end{matrix}

□

Lemma 8.

The sequences

{U_{r}}

and

{H_{r}}

generated by Algorithm 1 satisfy

\begin{matrix} tr (U_{m}^{T} H_{n}) = tr (H_{m}^{T} U_{n}), for any m, n . \end{matrix}

(20)

Proof.

Using the properties of the Kronecker product and the vector operator in Lemmas 1–5, we have

\begin{matrix} tr (H_{m}^{T} U_{n}) & = {(Vec H_{m})}^{T} Vec U_{n} \\ = {[Vec (A^{T} (A U_{m} B + C U_{m}^{T} D) B^{T} + D (B^{T} U_{m}^{T} A^{T} + D^{T} U_{m} C^{T}) C)]}^{T} Vec U_{n} \\ = {[Vec (A^{T} A U_{m} B B^{T})]}^{T} Vec U_{n} + {[Vec (A^{T} C U_{m}^{T} D B^{T})]}^{T} Vec U_{n} \\ + {[Vec (D B^{T} U_{m}^{T} A^{T} C)]}^{T} Vec U_{n} + {[Vec (D D^{T} U_{m} C^{T} C)]}^{T} Vec U_{n} \\ = {(Vec U_{m})}^{T} (B B^{T} \otimes A^{T} A) Vec U_{n} + {(Vec U_{m}^{T})}^{T} (D B^{T} \otimes C^{T} A) Vec U_{n} \\ + {(Vec U_{m}^{T})}^{T} (A^{T} C \otimes B D^{T}) Vec U_{n} + {(Vec U_{m})}^{T} (C^{T} C \otimes D D^{T}) Vec U_{n} \\ = tr (B B^{T} U_{m}^{T} A^{T} A U_{n}) + tr (A^{T} C U_{m}^{T} D B^{T} U_{n}^{T}) + tr (D B^{T} U_{m}^{T} A^{T} C U_{n}^{T}) \\ + tr (C^{T} C U_{m}^{T} D D^{T} U_{n}) \\ = {(Vec (B B^{T} U_{n}^{T} A^{T} A))}^{T} Vec U_{m}^{T} + {(Vec (C^{T} A U_{n} B D^{T}))}^{T} Vec U_{m}^{T} \\ + {(Vec (B D^{T} U_{n} C^{T} A))}^{T} Vec U_{m}^{T} + {(Vec (C^{T} C U_{n}^{T} D D^{T}))}^{T} Vec U_{m}^{T} \\ = {[Vec (A^{T} A U_{n} B B^{T})]}^{T} Vec U_{m} + {[Vec (D B^{T} U_{n}^{T} A^{T} C)]}^{T} Vec U_{m} \\ + {[Vec (A^{T} C U_{n}^{T} D B^{T})]}^{T} Vec U_{m} + {[Vec (D D^{T} U_{n} C^{T} C)]}^{T} Vec U_{m} \\ = {[Vec (A^{T} (A U_{n} B + C U_{n}^{T} D) B^{T} + D (B^{T} U_{n}^{T} A^{T} + D^{T} U_{n} C^{T}) C)]}^{T} Vec U_{m} \\ = {(Vec H_{n})}^{T} Vec U_{m} \\ = tr (H_{n}^{T} U_{m}) \\ = tr (U_{m}^{T} H_{n}) . \end{matrix}

□

Lemma 9.

The sequences

{R_{r}}

,

{U_{r}}

and

{H_{r}}

generated by Algorithm 1 satisfy

tr (R_{r}^{T} R_{r - 1}) = 0, and tr (U_{r + 1}^{T} H_{r}) = 0, for any r .

(21)

Proof.

We use the induction principle to prove (21). In order to calculate related terms, we utilize Lemmas 7 and 8. For

r = 1,

we get

\begin{matrix} tr (R_{1}^{T} R_{0}) & = tr (R_{0}^{T} R_{0}) - \frac{{∥ R_{0} ∥}_{F}^{2}}{α_{1}} tr (H_{1}^{T} R_{0}) = 0, \\ tr (U_{2}^{T} H_{1}) & = tr (R_{1}^{T} H_{1}) + \frac{{∥ R_{1} ∥}_{F}^{2}}{{∥ R_{0} ∥}_{F}^{2}} tr (U_{1}^{T} H_{1}) \\ = - \frac{α_{1}}{{∥ R_{0} ∥}_{F}^{2}} tr (R_{1}^{T} R_{1}) + α_{1} \frac{{∥ R_{1} ∥}_{F}^{2}}{{∥ R_{0} ∥}_{F}^{2}} = 0 . \end{matrix}

These imply that (21) holds for

r = 1

. Now, we proceed to the inductive step by assuming that

tr (R_{r}^{T} R_{r - 1}) = 0

and

tr (U_{r + 1}^{T} H_{r}) = 0

. Then

\begin{matrix} tr (R_{r + 1}^{T} R_{r}) & = tr (R_{r}^{T} R_{r}) - \frac{{∥ R_{r} ∥}_{F}^{2}}{α_{r + 1}} tr (H_{r + 1}^{T} (U_{r + 1} - \frac{{∥ R_{r} ∥}_{F}^{2}}{{∥ R_{r - 1} ∥}_{F}^{2}} U_{r})) \\ = {∥ R_{r} ∥}_{F}^{2} - \frac{{∥ R_{r} ∥}_{F}^{2}}{α_{r + 1}} tr (H_{r + 1}^{T} U_{r + 1}) = 0, \\ tr (U_{r + 2}^{T} H_{r + 1}) & = tr (R_{r + 1}^{T} (\frac{- α_{r + 1}}{{∥ R_{r} ∥}_{F}^{2}} (R_{r + 1} - R_{r}))) + \frac{{∥ R_{r + 1} ∥}_{F}^{2}}{{∥ R_{r} ∥}_{F}^{2}} α_{r + 1} \\ = \frac{α_{r + 1}}{{∥ R_{r} ∥}_{F}^{2}} tr [(R_{r + 1}^{T} R_{r}) - (R_{r + 1}^{T} R_{r + 1})] + \frac{{∥ R_{r + 1} ∥}_{F}^{2}}{{∥ R_{r} ∥}_{F}^{2}} α_{r + 1} = 0 . \end{matrix}

Hence, Equation (21) holds for any r. □

Lemma 10.

Suppose the sequences

{R_{r}}

,

{U_{r}}

and

{H_{r}}

are constructed from Algorithm 1. Then

\begin{matrix} tr (R_{r}^{T} R_{0}) = 0, tr (U_{r + 1}^{T} H_{1}) = 0, for any r . \end{matrix}

(22)

Proof.

The initial step

r = 1

holds due to Lemma 9. Now, assume that Equation (22) is valid for all

r = 1, \dots, k

. From Lemmas 7 and 8, we get

\begin{matrix} tr (R_{k + 1}^{T} R_{0}) & = tr ({(R_{k} - \frac{{∥ R_{k} ∥}_{F}^{2}}{α_{k + 1}} H_{k + 1})}^{T} R_{0}) = tr (R_{k}^{T} R_{0}) - \frac{{∥ R_{k} ∥}_{F}^{2}}{α_{k + 1}} tr (H_{k + 1}^{T} R_{0}) \\ = - \frac{{∥ R_{k} ∥}_{F}^{2}}{α_{k + 1}} tr (H_{k + 1}^{T} U_{1}) = - \frac{{∥ R_{k} ∥}_{F}^{2}}{α_{k + 1}} tr (U_{k + 1}^{T} H_{1}) = 0, \end{matrix}

and

\begin{matrix} tr (U_{k + 2}^{T} H_{1}) & = tr (H_{k + 2}^{T} U_{1}) = tr ({(\frac{- α_{k + 2}}{{∥ R_{k + 1} ∥}_{F}^{2}} (R_{k + 2} - R_{k + 1}))}^{T} U_{1}) \\ = \frac{- α_{k + 2}}{{∥ R_{k + 1} ∥}_{F}^{2}} [tr (R_{k + 2}^{T} R_{0}) - tr (R_{k + 1}^{T} R_{0})] = 0 . \end{matrix}

Hence, Equation (22) holds for any r. □

Lemma 11.

Suppose the sequences

{R_{r}}

,

{U_{r}}

and

{H_{r}}

are constructed from Algorithm 1. Then for any integers m and n such that

m \neq n

, we have

\begin{matrix} tr (R_{m - 1}^{T} R_{n - 1}) = 0, and tr (U_{m}^{T} H_{n}) = 0 . \end{matrix}

(23)

Proof.

According to Lemma 8 and the fact that

tr (R_{m - 1}^{T} R_{n - 1}) = tr (R_{n - 1}^{T} R_{m - 1})

for any integers m and n, it remains to prove two equalities in (23) for only

m, n

such that

m > n

. For

m = n + 1

, the two equalities hold by Lemma 9. For

m = n + 2

, we have

\begin{matrix} tr (R_{n + 2}^{T} R_{n}) & = tr ({(R_{n + 1} - \frac{{∥ R_{n + 1} ∥}_{F}^{2}}{α_{n + 2}} H_{n + 2})}^{T} R_{n}) \\ = - \frac{{∥ R_{n + 1} ∥}_{F}^{2}}{α_{n + 2}} tr (H_{n + 2}^{T} (U_{n + 1} - \frac{{∥ R_{n} ∥}_{F}^{2}}{{∥ R_{n - 1} ∥}_{F}^{2}} U_{n})) \\ = \frac{{∥ R_{n + 1} ∥}_{F}^{2}}{α_{n + 2}} \frac{{∥ R_{n} ∥}_{F}^{2}}{{∥ R_{n - 1} ∥}_{F}^{2}} tr ({(R_{n + 1} + \frac{{∥ R_{n + 1} ∥}_{F}^{2}}{{∥ R_{n} ∥}_{F}^{2}} U_{n + 1})}^{T} H_{n}) \\ = \frac{{∥ R_{n + 1} ∥}_{F}^{2}}{α_{n + 2}} \frac{{∥ R_{n} ∥}_{F}^{2}}{{∥ R_{n - 1} ∥}_{F}^{2}} \frac{α_{n}}{{∥ R_{n - 1} ∥}_{F}^{2}} tr (R_{n + 1}^{T} R_{n - 1}), \end{matrix}

and, similarly, we have

\begin{matrix} tr (R_{n + 1}^{T} R_{n - 1}) & = \frac{{∥ R_{n} ∥}_{F}^{2}}{α_{n + 1}} \frac{{∥ R_{n - 1} ∥}_{F}^{2}}{{∥ R_{n - 2} ∥}_{F}^{2}} \frac{α_{n - 1}}{{∥ R_{n - 2} ∥}_{F}^{2}} tr (R_{n}^{T} R_{n - 2}), \\ tr (R_{n}^{T} R_{n - 2}) & = \frac{{∥ R_{n - 1} ∥}_{F}^{2}}{α_{n}} \frac{{∥ R_{n - 2} ∥}_{F}^{2}}{{∥ R_{n - 3} ∥}_{F}^{2}} \frac{α_{n - 2}}{{∥ R_{n - 3} ∥}_{F}^{2}} tr (R_{n - 1}^{T} R_{n - 3}) . \end{matrix}

Moreover,

\begin{matrix} tr (U_{n + 2}^{T} H_{n}) & = tr ({(R_{n + 1} + \frac{{∥ R_{n + 1} ∥}_{F}^{2}}{{∥ R_{n} ∥}_{F}^{2}} U_{n + 1})}^{T} H_{n}) \\ = tr (R_{n + 1}^{T} (\frac{- α_{n}}{{∥ R_{n - 1} ∥}_{F}^{2}} (R_{n} - R_{n - 1}))) \\ = \frac{α_{n}}{{∥ R_{n - 1} ∥}_{F}^{2}} tr ({(R_{n} - \frac{{∥ R_{n} ∥}_{F}^{2}}{α_{n + 1}} H_{n + 1})}^{T} R_{n - 1}) \\ = - \frac{α_{n}}{{∥ R_{n - 1} ∥}_{F}^{2}} \frac{{∥ R_{n} ∥}_{F}^{2}}{α_{n + 1}} [tr (H_{n + 1}^{T} U_{n}) - \frac{{∥ R_{n - 1} ∥}_{F}^{2}}{{∥ R_{n - 2} ∥}_{F}^{2}} tr (H_{n + 1}^{T} U_{n - 1})] \\ = \frac{α_{n}}{{∥ R_{n - 1} ∥}_{F}^{2}} \frac{{∥ R_{n} ∥}_{F}^{2}}{α_{n + 1}} \frac{{∥ R_{n - 1} ∥}_{F}^{2}}{{∥ R_{n - 2} ∥}_{F}^{2}} tr (U_{n + 1}^{T} H_{n - 1}), \end{matrix}

and, similarly,

\begin{matrix} tr (U_{n + 1}^{T} H_{n - 1}) & = \frac{α_{n - 1}}{{∥ R_{n - 2} ∥}_{F}^{2}} \frac{{∥ R_{n - 1} ∥}_{F}^{2}}{α_{n}} \frac{{∥ R_{n - 2} ∥}_{F}^{2}}{{∥ R_{n - 3} ∥}_{F}^{2}} tr (U_{n}^{T} H_{n - 2}) . \end{matrix}

In a similar way, we can write

tr (R_{n + 1}^{T} R_{n - 1})

and

tr (U_{n + 2}^{T} H_{n})

in terms of

tr (R_{n}^{T} R_{n - 2})

and

tr (U_{n + 1}^{T} H_{n - 1})

, respectively. We continue this process until the terms

tr (R_{2}^{T} R_{0})

and

tr (U_{3}^{T} H_{1})

appear. By Lemma 10, we get

tr (R_{n + 1}^{T} R_{n - 1}) = 0

and

tr (U_{n + 2}^{T} H_{n}) = 0

. Similarly, we have

tr (R_{m}^{T} R_{n - 1}) = 0

and

tr (U_{m}^{T} H_{n}) = 0

for

m = n + 3, \dots, k

.

Suppose that

tr (R_{m - 1}^{T} R_{n - 1}) = tr (U_{m}^{T} H_{n}) = 0

for

m \in {n + 1, \dots, k}

. Then for

m = k + 1

, we have

\begin{matrix} tr (R_{k}^{T} R_{n - 1}) & = tr (R_{k - 1}^{T} R_{n - 1}) - \frac{{∥ R_{k - 1} ∥}_{F}^{2}}{α_{k}} tr (H_{k}^{T} R_{n - 1}) \\ = - \frac{{∥ R_{k - 1} ∥}_{F}^{2}}{α_{k}} tr (H_{k}^{T} (U_{n} - \frac{{∥ R_{n - 1} ∥}_{F}^{2}}{{∥ R_{n - 2} ∥}_{F}^{2}} U_{n - 1})) \\ = - \frac{{∥ R_{k - 1} ∥}_{F}^{2}}{α_{k}} [tr (H_{k}^{T} U_{n}) - \frac{{∥ R_{n - 1} ∥}_{F}^{2}}{{∥ R_{n - 2} ∥}_{F}^{2}} tr (H_{k}^{T} U_{n - 1})] = 0 . \end{matrix}

and

\begin{matrix} tr (U_{k + 1}^{T} H_{n}) & = tr (R_{k}^{T} H_{n}) + \frac{{∥ R_{k} ∥}_{F}^{2}}{{∥ R_{k - 1} ∥}_{F}^{2}} tr (U_{k}^{T} H_{n}) = tr (R_{k}^{T} (\frac{- α_{n}}{{∥ R_{n - 1} ∥}_{F}^{2}} (R_{n} - R_{n - 1}))) \\ = \frac{- α_{n}}{{∥ R_{n - 1} ∥}_{F}^{2}} tr (R_{k}^{T} R_{n} - R_{k}^{T} R_{n - 1}) = 0 . \end{matrix}

Hence,

tr (R_{m - 1}^{T} R_{n - 1}) = 0

and

tr (U_{m}^{T} H_{n}) = 0

for any

m, n

such that

m \neq n

. □

Theorem 1.

Algorithm 1 solves Problem 1 within finite steps. More precisely, for any given initial matrix

X_{0} \in R^{n \times p}

, the sequence

{X_{r}}

constructed from Algorithm 1 converges to an LS solution of Equation (1) in at most

n p

iterations.

Proof.

Assume that

R_{r} \neq 0

for

r = 0, 1, \dots, n p - 1

. Suppose, for a contradiction, that

R_{n p} \neq 0

. By Lemma 11, the set

{R_{0}, R_{1}, \dots, R_{n p}}

of residual matrices is orthogonal in

R^{n \times p}

with respect to the Frobenius inner product (5). Therefore, the set

{R_{0}, R_{1}, \dots, R_{n p}}

of

n p + 1

elements is linearly independent. This contradicts the fact that the dimension of

R^{n \times p}

is

n p

. Thus,

R_{n p} = 0

, and

X_{n p}

satisfies Equation (15) in Lemma 6. Hence

X_{n p}

is an LS solution of Equation (1). □
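The orthogonality of the residual matrices, on which the proof of Theorem 1 rests, can be observed numerically. The sketch below runs the recurrences from Lemmas 7 to 9 for at most np steps on a small random problem and computes the normalized Gram matrix of the residuals. The sizes, the seed, and the thresholds are assumptions; in floating-point arithmetic, orthogonality and finite termination hold only up to rounding error.

```python
import numpy as np

def normal_op(X, A, B, C, D):
    """Left-hand side of Equation (15) applied to X."""
    F = A @ X @ B + C @ X.T @ D
    return A.T @ F @ B.T + D @ F.T @ C

rng = np.random.default_rng(6)
m, n, p, q = 6, 3, 2, 5                        # illustrative sizes (assumption)
A, B = rng.standard_normal((m, n)), rng.standard_normal((p, q))
C, D = rng.standard_normal((m, p)), rng.standard_normal((n, q))
E = rng.standard_normal((m, q))

X = np.zeros((n, p))
R = A.T @ E @ B.T + D @ E.T @ C - normal_op(X, A, B, C, D)   # R_0
U = R.copy()                                                  # U_1 = R_0
residuals = [R.copy()]
for _ in range(n * p):                         # at most n*p steps (Theorem 1)
    rr = np.sum(R * R)
    if np.sqrt(rr) <= 1e-12 * np.linalg.norm(residuals[0]):
        break
    H = normal_op(U, A, B, C, D)
    step = rr / np.sum(H * U)
    X, R = X + step * U, R - step * H
    U = R + (np.sum(R * R) / rr) * U
    residuals.append(R.copy())

# Normalized Gram matrix of the residuals that are not at rounding level
r0 = np.linalg.norm(residuals[0])
kept = [Rk for Rk in residuals if np.linalg.norm(Rk) > 1e-6 * r0]
G = np.array([[np.sum(Ri * Rj) for Rj in kept] for Ri in kept])
d = np.sqrt(np.diag(G))
max_offdiag = np.max(np.abs(G / np.outer(d, d) - np.eye(len(kept))))
final_rel_residual = np.linalg.norm(residuals[-1]) / r0
```

A small value of `max_offdiag` indicates that the retained residuals are mutually orthogonal with respect to the inner product (5).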

We adapt the same idea as that for Algorithm 1 to derive an algorithm for Equation (2) as follows:

Algorithm 2: A conjugate gradient iterative algorithm for Equation (2)

The stopping rule of Algorithm 2 may be described as

{∥ R_{r} ∥}_{F} - δ ⩽ ϵ^{'}

where

δ

is the positive square root of the associated LS error and

ϵ^{'} > 0

is a small tolerance.
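The body of Algorithm 2 is likewise not reproduced in this text. Assuming it follows the same pattern as the recurrences in Lemmas 7 to 9, with the single products replaced by the sums in Equation (2), a sketch could look as follows. The names `forward`, `adjoint`, and `cg_ls_general`, the relative tolerance, and the iteration cap are hypothetical.

```python
import numpy as np

def forward(X, As, Bs, Cs, Ds):
    """Left-hand side of Equation (2): sum_i A_i X B_i + sum_j C_j X^T D_j."""
    return (sum(A @ X @ B for A, B in zip(As, Bs))
            + sum(C @ X.T @ D for C, D in zip(Cs, Ds)))

def adjoint(W, As, Bs, Cs, Ds):
    """Adjoint map: sum_k A_k^T W B_k^T + sum_l D_l W^T C_l."""
    return (sum(A.T @ W @ B.T for A, B in zip(As, Bs))
            + sum(D @ W.T @ C for C, D in zip(Cs, Ds)))

def cg_ls_general(As, Bs, Cs, Ds, E, X0, tol=1e-10, max_iter=None):
    """CG-type iteration for LS solutions of Equation (2) (a sketch)."""
    X = np.array(X0, dtype=float)
    R = adjoint(E - forward(X, As, Bs, Cs, Ds), As, Bs, Cs, Ds)  # residual of the normal equation
    U = R.copy()
    r0 = np.linalg.norm(R)
    max_iter = 10 * X.size if max_iter is None else max_iter      # assumed safety cap
    for _ in range(max_iter):
        rr = np.sum(R * R)
        if np.sqrt(rr) <= tol * r0:                               # assumed stopping rule
            break
        H = adjoint(forward(U, As, Bs, Cs, Ds), As, Bs, Cs, Ds)
        step = rr / np.sum(H * U)
        X, R = X + step * U, R - step * H
        U = R + (np.sum(R * R) / rr) * U
    return X

rng = np.random.default_rng(8)
m, n, p, q, s, t = 6, 3, 2, 5, 2, 1            # illustrative sizes (assumption)
As = [rng.standard_normal((m, n)) for _ in range(s)]
Bs = [rng.standard_normal((p, q)) for _ in range(s)]
Cs = [rng.standard_normal((m, p)) for _ in range(t)]
Ds = [rng.standard_normal((n, q)) for _ in range(t)]
E = rng.standard_normal((m, q))

X_cg = cg_ls_general(As, Bs, Cs, Ds, E, np.zeros((n, p)))
```

With s = t = 1 this reduces to the iteration for Equation (1).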

Theorem 2.

Consider Equation (2) where

A_{i} \in R^{m \times n}

,

B_{i} \in R^{p \times q}

,

C_{j} \in R^{m \times p}

,

D_{j} \in R^{n \times q}

, and

E \in R^{m \times q}

are given constant matrices and

X \in R^{n \times p}

is an unknown matrix. Assume that the matrix

\begin{matrix} M : = \sum_{i = 1}^{s} (B_{i}^{T} \otimes A_{i}) + \sum_{j = 1}^{t} (D_{j}^{T} \otimes C_{j}) P (n, p) \end{matrix}

(24)

is of full-column rank. Then, for any given initial matrix

X_{0} \in R^{n \times p}

, the sequence

{X_{r}}

constructed from Algorithm 2 converges to the unique LS solution.

Proof.

The proof is similar to that of Theorem 1. □

5. Minimal-Norm Least-Squares Solution via Algorithm 1

In this section, we investigate Problem 2. That is, we consider the case when the matrix M may not have full-column rank, so that Equation (1) may have many LS solutions. We seek an element of

L

with the minimal Frobenius norm.

Lemma 12.

Assume

\hat{X} \in L

. Then, any arbitrary element

\tilde{X} \in L

can be expressed as

\hat{X} + Z

for some matrix

Z \in R^{n \times p}

satisfying

\begin{matrix} A^{T} (A Z B + C Z^{T} D) B^{T} + D (B^{T} Z^{T} A^{T} + D^{T} Z C^{T}) C = 0 . \end{matrix}

(25)

Proof.

Let us denote the residual of the LS solutions

\hat{X}

and

\tilde{X}

, according to Equation (18), by

R_{\hat{X}}

and

R_{\tilde{X}}

, respectively. We consider the difference

Z : = \tilde{X} - \hat{X}

. Now, we compute

\begin{matrix} - R_{\tilde{X}} & = A^{T} (A (\hat{X} + Z) B + C {(\hat{X} + Z)}^{T} D) B^{T} + D (B^{T} {(\hat{X} + Z)}^{T} A^{T} + D^{T} (\hat{X} + Z) C^{T}) C \\ - A^{T} E B^{T} - D E^{T} C \\ = A^{T} (A Z B + C Z^{T} D) B^{T} + D (B^{T} Z^{T} A^{T} + D^{T} Z C^{T}) C - R_{\hat{X}} . \end{matrix}

Since

\hat{X}, \tilde{X} \in L

, by Lemma 6 we have

R_{\hat{X}} = R_{\tilde{X}} = 0

. It follows that Equation (25) holds as desired. □

Theorem 3.

Algorithm 1 solves Problem 2 in at most

n p

iterations by starting with the initial matrix

\begin{matrix} X_{0} = A^{T} (A V_{0} B + C V_{0}^{T} D) B^{T} + D (B^{T} V_{0}^{T} A^{T} + D^{T} V_{0} C^{T}) C, \end{matrix}

(26)

where

V_{0} \in R^{n \times p}

is an arbitrary matrix, or in particular

X_{0} = 0

.

Proof.

If we run Algorithm 1 starting with (26), then every iterate has the same form as (26), so the resulting LS solution

X^{*}

can be written as

X^{*} = A^{T} (A V^{*} B + C V^{* T} D) B^{T} + D (B^{T} V^{* T} A^{T} + D^{T} V^{*} C^{T}) C,

for some matrix

V^{*} \in R^{n \times p}

. Now, assume that

\tilde{X}

is an arbitrary element in

L

. By Lemma 12, there is a matrix

Z \in R^{n \times p}

such that

\tilde{X} = X^{*} + Z

and

A^{T} (A Z B + C Z^{T} D) B^{T} + D (B^{T} Z^{T} A^{T} + D^{T} Z C^{T}) C = 0 .

Using the property (6), we get

\begin{matrix} 〈X^{*}, Z〉 & = 〈A^{T} (A V^{*} B + C V^{* T} D) B^{T} + D (B^{T} V^{* T} A^{T} + D^{T} V^{*} C^{T}) C, Z〉 \\ = 〈V^{*}, A^{T} (A Z B + C Z^{T} D) B^{T} + D (B^{T} Z^{T} A^{T} + D^{T} Z C^{T}) C〉 \\ = 0 . \end{matrix}

Since

X^{*}

is orthogonal to Z, it follows from the Pythagorean theorem that

\begin{matrix} {∥ \tilde{X} ∥}_{F}^{2} = {∥ X^{*} + Z ∥}_{F}^{2} = {∥ X^{*} ∥}_{F}^{2} + {∥ Z ∥}_{F}^{2} \geq {∥ X^{*} ∥}_{F}^{2} . \end{matrix}

This implies that

X^{*}

is the minimal-norm solution. □
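Theorem 3 can be illustrated on a small rank-deficient example. In the sketch below, one column of A and the matching row of D are set to zero, which forces M to have a nontrivial kernel; this construction, the sizes, the seed, and the reconstructed iteration `cg_ls` (based on the recurrences in Lemmas 7 to 9) are assumptions. The result started from X_0 = 0 is compared with the minimal-norm solution (17).

```python
import numpy as np

def vec(X):
    """Column-stacking vector operator."""
    return X.reshape(-1, order="F")

def commutation(m, n):
    """Permutation matrix P(m, n) with P @ vec(A) == vec(A.T)."""
    idx = np.arange(m * n).reshape((m, n), order="F")
    return np.eye(m * n)[idx.reshape(-1)]

def normal_op(X, A, B, C, D):
    """Left-hand side of Equation (15) applied to X."""
    F = A @ X @ B + C @ X.T @ D
    return A.T @ F @ B.T + D @ F.T @ C

def cg_ls(A, B, C, D, E, X0, tol=1e-10):
    """Reconstructed CG-type iteration (a sketch, not the authors' listing)."""
    X = np.array(X0, dtype=float)
    R = A.T @ E @ B.T + D @ E.T @ C - normal_op(X, A, B, C, D)
    U, r0 = R.copy(), np.linalg.norm(R)
    for _ in range(10 * X.size):               # assumed safety cap
        rr = np.sum(R * R)
        if np.sqrt(rr) <= tol * r0:
            break
        H = normal_op(U, A, B, C, D)
        step = rr / np.sum(H * U)
        X, R = X + step * U, R - step * H
        U = R + (np.sum(R * R) / rr) * U
    return X

rng = np.random.default_rng(7)
m, n, p, q = 5, 3, 2, 4
A = rng.standard_normal((m, n)); A[:, 1] = 0.0   # zero column of A
D = rng.standard_normal((n, q)); D[1, :] = 0.0   # matching zero row of D
B, C = rng.standard_normal((p, q)), rng.standard_normal((m, p))
E = rng.standard_normal((m, q))

X_min = cg_ls(A, B, C, D, E, np.zeros((n, p)))   # X_0 = 0, as in Theorem 3
X_other = cg_ls(A, B, C, D, E, np.ones((n, p)))  # X_0 not of the form (26)

M = np.kron(B.T, A) + np.kron(D.T, C) @ commutation(n, p)
x_ref = np.linalg.pinv(M, rcond=1e-10) @ vec(E)  # Equation (17)
```

Both runs return LS solutions with the same fitted value A X B + C X^T D, but only the run started from X_0 = 0 reproduces the minimal-norm solution `x_ref`.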

Theorem 4.

Consider the sequence

{X_{r}}

generated by Algorithm 2 starting with the initial matrix

\begin{matrix} X_{0} = \sum_{k = 1}^{s} A_{k}^{T} (\sum_{i = 1}^{s} A_{i} V_{0} B_{i} + \sum_{j = 1}^{t} C_{j} V_{0}^{T} D_{j}) B_{k}^{T} + \sum_{l = 1}^{t} D_{l} (\sum_{i = 1}^{s} B_{i}^{T} V_{0}^{T} A_{i}^{T} + \sum_{j = 1}^{t} D_{j}^{T} V_{0} C_{j}^{T}) C_{l}, \end{matrix}

where

V_{0} \in R^{n \times p}

is an arbitrary matrix, or in particular

X_{0} = 0 \in R^{n \times p}

. Then the sequence

{X_{r}}

converges to the minimal-norm LS solution of Equation (2) in at most

n p

iterations.

Proof.

The proof is similar to that of Theorem 3. □

6. Least-Squares Solution Closest to a Given Matrix

In this section, we investigate Problem 3. In this case, Equation (1) may have many LS solutions. We seek the one that is closest to a given matrix with respect to the Frobenius norm.

Theorem 5.

Algorithm 1 solves Problem 3 by substituting E with

E_{1} = E - (A Y B + C Y^{T} D)

, and choosing the initial matrix to be

\begin{matrix} W_{0} = A^{T} (A V B + C V^{T} D) B^{T} + D (B^{T} V^{T} A^{T} + D^{T} V C^{T}) C, \end{matrix}

(27)

where

V \in R^{n \times p}

is arbitrary, or in particular

W_{0} = 0 \in R^{n \times p}

.

Proof.

Let

Y \in R^{n \times p}

be given. We can translate Problem 3 into Problem 2 as follows:

\begin{matrix} min_{X \in R^{n \times p}} & {∥ A X B + C X^{T} D - E ∥}_{F} \\ = min_{X \in R^{n \times p}} {∥ A X B + C X^{T} D - E - A Y B - C Y^{T} D + A Y B + C Y^{T} D ∥}_{F} \\ = min_{X \in R^{n \times p}} {∥ A (X - Y) B + C {(X - Y)}^{T} D - E + A Y B + C Y^{T} D ∥}_{F} . \end{matrix}

Now, substituting

E_{1} = E - (A Y B + C Y^{T} D)

and

W = X - Y

, we see that the solution

\overset{ˇ}{X}

of Problem 3 is equal to

W^{*} + Y

where

W^{*}

is the minimal-norm LS solution of the equation

\begin{matrix} A W B + C W^{T} D = E_{1}, \end{matrix}

in unknown W. By Theorem 3, the matrix

W^{*}

can be solved by Algorithm 1 with the initial matrix (27) where

V \in R^{n \times p}

is an arbitrary matrix, or in particular

W_{0} = 0

. □
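The reduction in the proof of Theorem 5 translates directly into code: shift the right-hand side to E_1, compute the minimal-norm LS solution W^* from W_0 = 0, and add Y back. The sketch below reuses a reconstructed iteration `cg_ls` based on the recurrences in Lemmas 7 to 9; the rank-deficient test problem (a zero column of A with a matching zero row of D), the sizes, and the seed are assumptions.

```python
import numpy as np

def normal_op(X, A, B, C, D):
    """Left-hand side of Equation (15) applied to X."""
    F = A @ X @ B + C @ X.T @ D
    return A.T @ F @ B.T + D @ F.T @ C

def cg_ls(A, B, C, D, E, X0, tol=1e-10):
    """Reconstructed CG-type iteration (a sketch, not the authors' listing)."""
    X = np.array(X0, dtype=float)
    R = A.T @ E @ B.T + D @ E.T @ C - normal_op(X, A, B, C, D)
    U, r0 = R.copy(), np.linalg.norm(R)
    for _ in range(10 * X.size):               # assumed safety cap
        rr = np.sum(R * R)
        if np.sqrt(rr) <= tol * r0:
            break
        H = normal_op(U, A, B, C, D)
        step = rr / np.sum(H * U)
        X, R = X + step * U, R - step * H
        U = R + (np.sum(R * R) / rr) * U
    return X

rng = np.random.default_rng(9)
m, n, p, q = 5, 3, 2, 4
A = rng.standard_normal((m, n)); A[:, 1] = 0.0   # forces a nontrivial kernel
D = rng.standard_normal((n, q)); D[1, :] = 0.0
B, C = rng.standard_normal((p, q)), rng.standard_normal((m, p))
E = rng.standard_normal((m, q))
Y = rng.standard_normal((n, p))                  # the given matrix in Problem 3

E1 = E - (A @ Y @ B + C @ Y.T @ D)               # shifted right-hand side
W_star = cg_ls(A, B, C, D, E1, np.zeros((n, p))) # minimal-norm LS solution in W
X_closest = W_star + Y                           # LS solution closest to Y

X_min = cg_ls(A, B, C, D, E, np.zeros((n, p)))   # minimal-norm LS solution, for comparison
dist_closest = np.linalg.norm(X_closest - Y)
dist_min = np.linalg.norm(X_min - Y)
```

In this example `dist_closest` does not exceed `dist_min`, since `X_min` is itself an LS solution and `X_closest` minimizes the distance to Y over all of them.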

Theorem 6.

Suppose that the matrix Equation (2) is inconsistent. Let

Y \in R^{n \times p}

be given. Consider Algorithm 2 when we replace the matrix E by

\begin{matrix} E_{1} = E - \sum_{i = 1}^{s} A_{i} Y B_{i} - \sum_{j = 1}^{t} C_{j} Y^{T} D_{j}, \end{matrix}

and choose the initial matrix

\begin{matrix} W_{0} = \sum_{k = 1}^{s} A_{k}^{T} (\sum_{i = 1}^{s} A_{i} F B_{i} + \sum_{j = 1}^{t} C_{j} F^{T} D_{j}) B_{k}^{T} + \sum_{l = 1}^{t} D_{l} (\sum_{i = 1}^{s} B_{i}^{T} F^{T} A_{i}^{T} + \sum_{j = 1}^{t} D_{j}^{T} F C_{j}^{T}) C_{l}, \end{matrix}

where

F \in R^{n \times p}

is arbitrary, or

W_{0} = 0 \in R^{n \times p}

. Then, the sequence

{X_{r}}

obtained by Algorithm 2 converges to the LS solution of (2) closest to Y within

n p

iterations.

Proof.

The proof of the theorem is similar to that of Theorem 5. □

7. Numerical Experiments

In this section, we provide numerical results to show the efficiency and effectiveness of Algorithm 2 (denoted by CG), which is an extension of Algorithm 1. We perform experiments when the coefficients in a given matrix equation are dense/sparse rectangular matrices of moderate/large sizes. We denote by

ones (m, n)

the m-by-n matrix whose entries are all 1. Each random matrix

rand (m, n)

has all entries belonging to the interval

(0, 1)

. Each experiment compares Algorithm 2 with the direct Kronecker linearization as well as with well-known iterative algorithms. All computations were performed in MATLAB R2021a on a Mac (M1 chip 8C CPU/8C GPU/8GB/512GB). The performance of the algorithms is assessed through the number of iterations, the norm of the residual matrices, and the CPU time. The latter is measured in seconds using the MATLAB functions tic and toc.
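For readers working outside MATLAB, the notation above can be mirrored as follows. This is a hedged sketch in Python/NumPy, not the setup used in the paper: the helper names `ones` and `rand`, the seed, and the use of `time.perf_counter` as an analogue of tic/toc are assumptions made here. Note that `perf_counter` measures elapsed wall-clock time.

```python
import time
import numpy as np

rng = np.random.default_rng(2022)  # arbitrary seed, chosen only for reproducibility

def ones(m, n):
    """m-by-n matrix whose entries are all 1."""
    return np.ones((m, n))

def rand(m, n):
    """m-by-n matrix with entries drawn uniformly from [0, 1)."""
    return rng.random((m, n))

# A coefficient matrix of the form used in the examples: entries centered around 0
A = 0.5 * ones(50, 50) - rand(50, 50)

start = time.perf_counter()             # plays the role of tic
product = A @ A.T                       # any computation to be timed
elapsed = time.perf_counter() - start   # plays the role of toc

print(f"elapsed time: {elapsed:.6f} s")
```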

The following is an example of Problem 1.

Example 1.

Consider a generalized Sylvester-transpose matrix equation

\begin{matrix} A_{1} X B_{1} + A_{2} X B_{2} + A_{3} X B_{3} + C_{1} X^{T} D_{1} + C_{2} X^{T} D_{2} = E, \end{matrix}

(28)

where the coefficient matrices are given by

\begin{matrix} A_{1} = 0.5 ones (m, n) - rand (m, n) \in R^{50 \times 50}, & A_{2} = 0.5 ones (m, n) - rand (m, n) \in R^{50 \times 50}, \\ A_{3} = 0.5 ones (m, n) - rand (m, n) \in R^{50 \times 50}, & B_{1} = 0.5 ones (p, q) - rand (p, q) \in R^{40 \times 50}, \\ B_{2} = 0.5 ones (p, q) - rand (p, q) \in R^{40 \times 50}, & B_{3} = 0.5 ones (p, q) - rand (p, q) \in R^{40 \times 50}, \\ C_{1} = 0.5 ones (m, p) - rand (m, p) \in R^{50 \times 40}, & C_{2} = 0.5 ones (m, p) - rand (m, p) \in R^{50 \times 40}, \\ D_{1} = 0.5 ones (n, q) - rand (n, q) \in R^{50 \times 50}, & D_{2} = 0.5 ones (n, q) - rand (n, q) \in R^{50 \times 50}, \\ E = 0.5 ones (m, q) - rand (m, q) \in R^{50 \times 50} . \end{matrix}

In fact, we have

rank M = 2000 \neq 2001 = rank [M e]

, i.e., the matrix equation does not have an exact solution. However, M is of full-column rank, so this equation has a unique LS solution. We run Algorithm 2 using an initial matrix

X_{0} = 0 \in R^{50 \times 40}

and a tolerance error

ϵ = ∥ M x^{*} - e ∥ = 6.4812

, where

x^{*} = {(M^{T} M)}^{- 1} M^{T} e

. It turns out that Algorithm 2 takes 20 iterations to reach a least-squares solution, consuming around 0.2 s, while the direct method consumes around 7 s. Thus, Algorithm 2 is about 35 times faster than the direct method. We compare the performance of Algorithm 2 with other well-known iterative algorithms: the GI method [31], the LSI method [31], and the TAUOpt method [32]. The numerical results are shown in Table 1 and Figure 1. After 20 iterations, Algorithm 2 consumes slightly more CPU time than the other iterative methods, but its relative error ${∥ R_{r} ∥}_{F}$ is smaller than those of the others. Hence, Algorithm 2 is applicable and performs well.
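The consistency test and the tolerance used in Example 1 can be reproduced on a smaller instance. The sketch below is an illustration in Python/NumPy rather than the authors' MATLAB code; the sizes (6, 5, 4, 6), the seed, and the helper names `coeff` and `L` are assumptions made here for speed and readability. It builds M column by column by applying the operator to unit matrices (equivalent to the Kronecker linearization up to the ordering of the vec operator), compares rank M with rank [M e], and evaluates the tolerance with a least-squares solver instead of forming the inverse of M^T M explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
m, n, p, q = 6, 5, 4, 6          # small stand-ins for 50, 50, 40, 50

def coeff(r, c):
    """Random coefficient matrix 0.5*ones(r, c) - rand(r, c)."""
    return 0.5 * np.ones((r, c)) - rng.random((r, c))

As = [coeff(m, n) for _ in range(3)]
Bs = [coeff(p, q) for _ in range(3)]
Cs = [coeff(m, p) for _ in range(2)]
Ds = [coeff(n, q) for _ in range(2)]
E = coeff(m, q)

def L(X):
    """Left-hand side of Equation (28) for an n-by-p matrix X."""
    return (sum(A @ X @ B for A, B in zip(As, Bs))
            + sum(C @ X.T @ D for C, D in zip(Cs, Ds)))

# Column k of M is the flattened image of the k-th unit matrix under L
M = np.column_stack([L(U.reshape(n, p)).ravel() for U in np.eye(n * p)])
e = E.ravel()

rank_M = np.linalg.matrix_rank(M)
rank_aug = np.linalg.matrix_rank(np.column_stack([M, e]))

# LS solution and the resulting tolerance eps = ||M x* - e||
x_star = np.linalg.lstsq(M, e, rcond=None)[0]
eps = np.linalg.norm(M @ x_star - e)

print("rank M =", rank_M, " rank [M e] =", rank_aug)
print("inconsistent:", rank_M != rank_aug, " unique LS solution:", rank_M == n * p)
print("tolerance eps =", eps)
```

When rank M equals np, the LS solution is unique, and eps is the smallest residual norm that any iterative method can reach for this equation.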

The following is an example of Problem 2.

Example 2.

Consider a generalized Sylvester-transpose matrix equation

\begin{matrix} A_{1} X B_{1} + C_{1} X^{T} D_{1} + C_{2} X^{T} D_{2} = E, \end{matrix}

(29)

where

\begin{matrix} A_{1} = - 0.08 \times ones (m, n) \in R^{30 \times 25}, & B_{1} = tridiag (0.11, - 0.61, - 0.29) \in R^{30 \times 30}, \\ C_{1} = tridiag (- 0.03, - 0.22, - 0.1) \in R^{30 \times 30}, & C_{2} = tridiag (0.38, 0.29, - 0.41) \in R^{30 \times 30}, \\ D_{1} = - 0.13 \times ones (n, q) \in R^{25 \times 30}, & D_{2} = 0.04 \times ones (n, q) \in R^{25 \times 30}, \\ E = - 0.01 \times I_{30} \in R^{30 \times 30} . \end{matrix}

In this case, Equation (29) is inconsistent and the associated matrix M is not of full column rank. Thus, Equation (29) has infinitely many LS solutions. The direct method based on the Moore–Penrose inverse (17) takes 0.627019 s to obtain the minimal-norm LS solution. Alternatively, the MNLS method [35] can also be applied to this kind of problem. However, some coefficient matrices are tridiagonal matrices with many zero entries, which causes the MNLS algorithm to diverge, so it cannot provide an answer. Therefore, let us apply Algorithm 2 using a tolerance error

ϵ = 10^{- 5}

. According to Theorem 4, we choose three different matrices

V_{0}

to generate the initial matrix

X_{0}

. The numerical results are shown in Table 2 and Figure 2.

Figure 2, which plots the logarithm of the relative errors ${∥ R_{r} ∥}_{F}$, shows that the errors of the CG algorithm decrease rapidly to zero for all three initial matrices. Each run takes around 0.037 s to arrive at the desired solution, which is about 16 times faster than the direct method.
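One way to see why different choices of the generating matrix lead to the same minimal-norm solution is to check that the resulting initial matrices have no component in the null space of M. The sketch below does this for a scaled-down version of Example 2. It is an illustration in Python/NumPy under stated assumptions: the initial matrix is taken to have the same form as the one in Theorem 6 (the adjoint operator applied to the image of V), the sizes are 6, 5, 6, 6 instead of 30, 25, 30, 30, tridiag(a, b, c) is read as (sub-diagonal, main diagonal, super-diagonal), and the helper names are hypothetical.

```python
import numpy as np

def tridiag(sub, main, sup, size):
    # Assumed convention: (sub-diagonal, main diagonal, super-diagonal)
    return sub * np.eye(size, k=-1) + main * np.eye(size) + sup * np.eye(size, k=1)

m, n, p, q = 6, 5, 6, 6   # scaled-down stand-ins for 30, 25, 30, 30
A1 = -0.08 * np.ones((m, n))
B1 = tridiag(0.11, -0.61, -0.29, p)
C1 = tridiag(-0.03, -0.22, -0.1, m)
C2 = tridiag(0.38, 0.29, -0.41, m)
D1 = -0.13 * np.ones((n, q))
D2 = 0.04 * np.ones((n, q))

def L(X):
    """Left-hand side of Equation (29)."""
    return A1 @ X @ B1 + C1 @ X.T @ D1 + C2 @ X.T @ D2

def L_adj(R):
    """Adjoint of L with respect to the Frobenius inner product."""
    return A1.T @ R @ B1.T + D1 @ R.T @ C1 + D2 @ R.T @ C2

# Linearization: column k of M is the flattened image of the k-th unit matrix
M = np.column_stack([L(U.reshape(n, p)).ravel() for U in np.eye(n * p)])

# Numerical rank and an orthonormal basis N of the null space of M
_, s, Vt = np.linalg.svd(M)
rank = int(np.sum(s > 1e-10 * s[0]))
N = Vt[rank:].T

# Three generating matrices, as in Table 2
choices = [np.zeros((n, p)), 0.02 * np.ones((n, p)), -0.01 * np.eye(n, p)]
null_parts = []
for V in choices:
    X0 = L_adj(L(V))                                   # initial matrix built from V
    null_parts.append(np.linalg.norm(N.T @ X0.ravel()))

print("rank M =", rank, "out of", n * p, "columns")
print("null-space components of the initial matrices:", null_parts)
```

Since CG-type updates also stay in the range of the adjoint operator, an iterate that starts with no null-space component keeps none, which is the property that characterizes the minimal-norm LS solution.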

The following is an example of Problem 3.

Example 3.

Consider a generalized Sylvester-transpose matrix equation

\begin{matrix} A_{1} X B_{1} + C_{1} X^{T} D_{1} + C_{2} X^{T} D_{2} = E, \end{matrix}

(30)

where

\begin{matrix} A_{1} = 0.2 \times ones (m, n) \in R^{50 \times 40}, & B_{1} = tridiag (- 0.2, 0.3, 0.3) \in R^{50 \times 50}, \\ C_{1} = tridiag (0.4, - 0.2, - 0.1) \in R^{50 \times 50}, & C_{2} = tridiag (0.7, - 0.2, 0.3) \in R^{50 \times 50}, \\ D_{1} = - 0.2 \times ones (n, q) \in R^{40 \times 50}, & D_{2} = 0.1 \times ones (n, q) \in R^{40 \times 50}, \\ E = I_{50} \in R^{50 \times 50} . \end{matrix}

In fact, Equation (30) is inconsistent and has many LS solutions. The first task is to find the LS solution of Equation (30) closest to

Y = 0.1 \times ones (n, p) \in R^{40 \times 50}

. According to Theorem 6, we apply Algorithm 2 with two different matrices V to construct the initial matrix

W_{0}

. The runs of Algorithm 2 with

V = 0

and

V = - 0.19 \times I

are denoted in Figure 3 by

C G_{1}

and

C G_{2}

, respectively.

The second task is to solve Problem 3 when we are given

Y = I \in R^{40 \times 50}

. Similarly, we use two different matrices

V = 0

and

V = 0.02 \times ones (n, p) \in R^{40 \times 50}

to construct the initial matrix.

Figure 3. The logarithm of the relative error for Example 3 with $Y = 0.1 \times ones (n, p)$.

Figure 4. The logarithm of the relative error for Example 3 with Y = I.

Table 3. Relative error and computational time for Example 3.

Y	Initial V	Iterations	CPU time (s)	${∥ R_{r} ∥}_{F}$	${∥ X^{*} - Y ∥}_{F}$
$0.1 \times ones (n, p)$	0	18	0.104135	0.000006	4.3116
$0.1 \times ones (n, p)$	$- 0.19 \times I$	20	0.108153	0.000005	4.3116
I	0	18	0.113960	0.000009	0.8580
I	$0.02 \times ones (n, p)$	20	0.108499	0.000006	0.8580

We apply Algorithm 2 with a tolerance error

ϵ = 10^{- 5}

. The numerical results in Figures 3 and 4 and Table 3 illustrate that, in each case, the relative error converges rapidly to zero within 20 iterations, consuming around 0.1 s. Thus, Algorithm 2 performs well in both the number of iterations and the computational time. Moreover, changing the initial matrix or the desired matrix Y does not significantly affect the performance of the algorithm.

8. Conclusions

We propose CG-type iterative algorithms, namely, Algorithms 1 and 2, to generate approximate solutions for the generalized Sylvester-transpose matrix Equations (1) and (2), respectively. When the matrix equation is inconsistent, the algorithm arrives at an LS solution within $n p$ iterations in the absence of round-off errors. When the matrix equation has many LS solutions, the algorithm can search for the one with minimal Frobenius norm within $n p$ steps. Moreover, given a matrix Y, the algorithm can find the LS solution closest to Y within $n p$ steps. The numerical simulations validate the relevance of the algorithm for square/non-square matrices of medium/large sizes. The algorithm is always applicable for any given initial matrix and any given matrix Y. The algorithm performs well in both the number of iterations and the computational time, compared to the direct Kronecker linearization and well-known iterative methods.

Author Contributions

Writing—original draft preparation, K.T.; writing—review and editing, P.C.; data curation, K.T.; supervision, P.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research project is supported by National Research Council of Thailand (NRCT): (N41A640234).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare there is no conflict of interest.

References

Geir, E.D.; Fernando, P. A Course in Robust Control Theory: A Convex Approach; Springer: New York, NY, USA, 1999.
Lewis, F. A survey of linear singular systems. Circ. Syst. Signal Process. 1986, 5, 3–36.
Dai, L. Singular Control Systems; Springer: Berlin, Germany, 1989.
Enright, W.H. Improving the efficiency of matrix operations in the numerical solution of stiff ordinary differential equations. ACM Trans. Math. Softw. 1978, 4, 127–136.
Aliev, F.A.; Larin, V.B. Optimization of Linear Control Systems: Analytical Methods and Computational Algorithms; Stability Control Theory, Methods Applications; CRC Press: Boca Raton, FL, USA, 1998.
Calvetti, D.; Reichel, L. Application of ADI iterative methods to the restoration of noisy images. SIAM J. Matrix Anal. Appl. 1996, 17, 165–186.
Duan, G.R. Eigenstructure assignment in descriptor systems via output feedback: A new complete parametric approach. Int. J. Control 1999, 72, 345–364.
Duan, G.R. Parametric approaches for eigenstructure assignment in high-order linear systems. Int. J. Control Autom. Syst. 2005, 3, 419–429.
Kim, Y.; Kim, H.S. Eigenstructure assignment algorithm for second order systems. J. Guid. Control Dyn. 1999, 22, 729–731.
Fletcher, L.R.; Kuatsky, J.; Nichols, N.K. Eigenstructure assignment in descriptor systems. IEEE Trans. Autom. Control 1986, 31, 1138–1141.
Frank, P.M. Fault diagnosis in dynamic systems using analytical and knowledge-based redundancy a survey and some new results. Automatica 1990, 26, 459–474.
Epton, M. Methods for the solution of AXD - BXC = E and its applications in the numerical solution of implicit ordinary differential equations. BIT Numer. Math. 1980, 20, 341–345.
Zhou, B.; Duan, G.R. On the generalized Sylvester mapping and matrix equations. Syst. Control Lett. 2008, 57, 200–208.
Horn, R.; Johnson, C. Topics in Matrix Analysis; Cambridge University Press: New York, NY, USA, 1991.
Kilicman, A.; Al Zhour, Z.A. Vector least-squares solutions for coupled singular matrix equations. Comput. Appl. Math. 2007, 206, 1051–1069.
Simoncini, V. Computational methods for linear matrix equations. SIAM Rev. 2016, 58, 377–441.
Hajarian, M. Developing BiCG and BiCR methods to solve generalized Sylvester-transpose matrix equations. Int. J. Autom. Comput. 2014, 11, 25–29.
Hajarian, M. Matrix form of the CGS method for solving general coupled matrix equations. Appl. Math. Lett. 2014, 34, 37–42.
Hajarian, M. Generalized conjugate direction algorithm for solving the general coupled matrix equations over symmetric matrices. Numer. Algorithms 2016, 73, 591–609.
Dehghan, M.; Mohammadi-Arani, R. Generalized product-type methods based on Bi-conjugate gradient (GPBiCG) for solving shifted linear systems. Comput. Appl. Math. 2017, 36, 1591–1606.
Zadeh, N.A.; Tajaddini, A.; Wu, G. Weighted and deflated global GMRES algorithms for solving large Sylvester matrix equations. Numer. Algorithms 2019, 82, 155–181.
Kittisopaporn, A.; Chansangiam, P.; Lewkeeratiyukul, W. Convergence analysis of gradient-based iterative algorithms for a class of rectangular Sylvester matrix equation based on Banach contraction principle. Adv. Differ. Equ. 2021, 2021, 17.
Boonruangkan, N.; Chansangiam, P. Convergence analysis of a gradient iterative algorithm with optimal convergence factor for a generalized Sylvester-transpose matrix equation. AIMS Math. 2021, 6, 8477–8496.
Zhang, X.; Sheng, X. The relaxed gradient based iterative algorithm for the symmetric (skew symmetric) solution of the Sylvester equation AX + XB = C. Math. Probl. Eng. 2017, 2017.
Xie, Y.J.; Ma, C.F. The accelerated gradient based iterative algorithm for solving a class of generalized Sylvester transpose matrix equation. Appl. Math. Comp. 2016, 273, 1257–1269.
Tian, Z.; Tian, M.; Gu, C.; Hao, X. An accelerated Jacobi-gradient based iterative algorithm for solving Sylvester matrix equations. Filomat 2017, 31, 2381–2390.
Sasaki, N.; Chansangiam, P. Modified Jacobi-gradient iterative method for generalized Sylvester matrix equation. Symmetry 2020, 12, 1831.
Kittisopaporn, A.; Chansangiam, P. Gradient-descent iterative algorithm for solving a class of linear matrix equations with applications to heat and Poisson equations. Adv. Differ. Equ. 2020, 2020, 324.
Heyouni, M.; Saberi-Movahed, F.; Tajaddini, A. On global Hessenberg based methods for solving Sylvester matrix equations. Comp. Math. Appl. 2018, 2019, 77–92.
Hajarian, M. Extending the CGLS algorithm for least squares solutions of the generalized Sylvester-transpose matrix equations. J. Franklin Inst. 2016, 353, 1168–1185.
Xie, L.; Ding, J.; Ding, F. Gradient based iterative solutions for general linear matrix equations. Comput. Math. Appl. 2009, 58, 1441–1448.
Kittisopaporn, A.; Chansangiam, P. Approximated least-squares solutions of a generalized Sylvester-transpose matrix equation via gradient-descent iterative algorithm. Adv. Differ. Equ. 2021, 2021, 266.
Tansri, K.; Choomklang, S.; Chansangiam, P. Conjugate gradient algorithm for consistent generalized Sylvester-transpose matrix equations. AIMS Math. 2022, 7, 5386–5407.
Wang, M.; Cheng, X. Iterative algorithms for solving the matrix equation AXB + CX^TD = E. Appl. Math. Comput. 2007, 187, 622–629.
Chen, X.; Ji, J. The minimum-norm least-squares solution of a linear system and symmetric rank-one updates. Electron. J. Linear Algebra 2011, 22, 480–489.

Figure 1. The logarithm of the relative error ${∥ R_{r} ∥}_{F}$ for Example 1.

Figure 2. The logarithm of the relative error for Example 2.

Table 1. Relative error and computational time for Example 1.

Method	Iterations	CPU time (s)	${∥ R_{r} ∥}_{F}$
CG	20	0.199308	6.407766
GI	20	0.129715	10.907665
LSI	20	0.179449	14.390460
TAUOpt	20	0.073866	7.806273
Direct	−	7.048632	0

Table 2. Relative error and computational time for Example 2.

$V_{0}$	Iterations	CPU time (s)	${∥ R_{r} ∥}_{F}$
0	6	0.036523	0.000008
$0.02 \times ones (n, p)$	11	0.036540	0.000009
$- 0.01 \times I$	10	0.038425	0.000009

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
