1. Introduction
In 1843, Sir William Rowan Hamilton [1] introduced quaternions in an effort to expand the concept of complex numbers into spaces of higher dimensions. Quaternions and quaternion matrices play a critical role in many applications, such as quantum mechanics, computer graphics, quaternion principal component analysis (QPCA), and image processing [2,3,4,5,6]. Due to the non-commutative property of quaternion multiplication, the eigenvalues of quaternion matrices are distinguished into left and right types, with the right eigenvalue problem having garnered widespread attention [7,8,9,10,11].
Quaternionic analysis extends complex analysis to the quaternion algebra $\mathbb{H}$, exploring the differentiation, integration, and regularity of quaternion-valued functions. The non-commutativity of quaternions introduces significant complexities, leading to two dominant frameworks for defining regularity: Fueter regularity [12,13], which is based on the Cauchy–Riemann–Fueter equations, and slice regularity [14,15], which requires holomorphicity restricted to complex slices of $\mathbb{H}$. Modern research focuses on unifying these approaches and expanding their applications in fields such as physics (e.g., 3D rotations) and engineering (e.g., hypercomplex signal processing) [16,17,18].
In recent years, a series of numerical methods have been developed to compute the eigenvalues of quaternion matrices, particularly focusing on the eigenvalue problems of Hermitian matrices. These numerical methods can be broadly categorized into three classes. The first class involves direct quaternion arithmetic operations. For instance, Bunse-Gerstner proposed a quaternion QR algorithm for solving the right eigenvalue problem of quaternion matrices [19]. However, due to the complexity of quaternion arithmetic, this algorithm requires significant computational effort. The second class is based on the real or complex counterparts of quaternion matrices. By studying the real or complex counterpart structures and properties of quaternion matrices and by leveraging stable orthogonal transformations, real or complex structure-preserving methods have been developed to solve the right eigenvalue problem of quaternion Hermitian matrices [20,21]. The third class is also based on the real counterparts of quaternion matrices and has led to the development of numerous structure-preserving iterative algorithms. Examples include the explicitly restarted quaternion Arnoldi method (ERQAM) [22], designed to compute standard right eigenpairs of general quaternion matrices, and the novel quaternion power method introduced in [23] for computing the dominant standard right eigenvalue and its corresponding eigenvector. Structure-preserving methods exhibit significant advantages in terms of storage space and computational efficiency.
In the field of quaternion optimization, significant progress has been made with the generalized HR (GHR) calculus [24,25,26]. The GHR calculus leverages quaternion rotations in a general orthogonal system, offering a way to compute the derivatives and gradients of functions of quaternion variables, thereby providing a solid theoretical foundation for the development of quaternion optimization methods. Subsequently, based on the GHR calculus, Diao et al. [27] proposed a gradient projection algorithm for maximizing the quaternion Rayleigh quotient under a unit constraint. This algorithm demonstrated good performance and contributed to the development of quaternion optimization algorithms.
In this paper, we first equivalently transform the principal eigenvalue problem of quaternion Hermitian matrices into a maximization problem over the quaternion skew field. Leveraging the generalized HR calculus, we propose a quaternion Nesterov accelerated projected gradient (Q-NAPG) algorithm to solve it. Subsequently, we conduct a convergence analysis of the Q-NAPG algorithm, proving that a real differentiable function with a Lipschitz continuous gradient possesses a quadratic upper bound, and we theoretically prove the convergence of the Q-NAPG algorithm. Finally, we compare our algorithm with two other methods, and numerical experiments indicate that our algorithm exhibits superior performance in terms of both accuracy and time efficiency.
The rest of this paper is organized as follows: Section 2 introduces some basic notation and fundamental properties of quaternions, including the definitions of the quaternion modulus, similarity, and rotation, with particular emphasis on the relevant definitions and properties of the generalized HR calculus. In Section 3, we design a quaternion Nesterov accelerated projected gradient algorithm to compute the principal eigenvalue and the corresponding eigenvector of a quaternion Hermitian matrix. Section 4 provides a convergence analysis of the quaternion Nesterov accelerated projected gradient algorithm. In Section 5, we conduct numerical experiments to validate the proposed method. Finally, in Section 6, we summarize this paper.
2. Preliminaries
In this section, some quaternion notations and basic definitions are introduced, which are used in the rest of the paper.
2.1. Notations
Throughout this paper, scalars, vectors, real or complex matrices, and quaternion matrices are distinguished as follows: scalars are denoted by lowercase Greek letters; quaternions are denoted by lowercase letters, and quaternion vectors by bold lowercase letters; real or complex matrices are denoted by uppercase letters; and quaternion matrices are denoted by bold uppercase letters. The notations $(\cdot)^{T}$, $\overline{(\cdot)}$, and $(\cdot)^{H}$ denote the transpose, conjugate, and conjugate transpose, respectively. MATLAB function commands are denoted by typewriter letters.
2.2. Quaternions and Quaternion Matrices
Denote the set of quaternions as
$$\mathbb{H} = \left\{ q = q_0 + q_1\mathbf{i} + q_2\mathbf{j} + q_3\mathbf{k} \mid q_0, q_1, q_2, q_3 \in \mathbb{R} \right\}, \tag{1}$$
where $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ are the three imaginary units of quaternions, satisfying
$$\mathbf{i}^2 = \mathbf{j}^2 = \mathbf{k}^2 = \mathbf{i}\mathbf{j}\mathbf{k} = -1.$$
The scalar (real) part of $q$ is denoted by $\mathrm{Re}(q) = q_0$, and the vector (imaginary) part of $q$ is denoted by $\mathrm{Im}(q) = q_1\mathbf{i} + q_2\mathbf{j} + q_3\mathbf{k}$. A quaternion is called imaginary when its real part is equal to zero. The multiplication of quaternions adheres to the distributive law but is non-commutative.
The zero element in $\mathbb{H}$ is $0 = 0 + 0\mathbf{i} + 0\mathbf{j} + 0\mathbf{k}$, and the unit element is $1 = 1 + 0\mathbf{i} + 0\mathbf{j} + 0\mathbf{k}$. For any $q \in \mathbb{H}$, the conjugate of a quaternion is defined as
$$\bar{q} = q_0 - q_1\mathbf{i} - q_2\mathbf{j} - q_3\mathbf{k}.$$
The magnitude of $q$ is $|q| = \sqrt{q\bar{q}} = \sqrt{q_0^2 + q_1^2 + q_2^2 + q_3^2}$. It follows that the inverse of a nonzero quaternion $q$ is given by $q^{-1} = \bar{q}/|q|^2$. If $|q| = 1$, then we call $q$ a unit quaternion.
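For concreteness, the following plain-MATLAB sketch encodes a quaternion as a 4-vector $[q_0; q_1; q_2; q_3]$ and implements the Hamilton product, conjugate, modulus, and inverse described above; the helper names (qmul, qconj, qnorm, qinv) are illustrative only and are not taken from any toolbox.

```matlab
% Quaternion stored as a 4-vector [q0; q1; q2; q3]; helper names are illustrative only.
qmul  = @(p,q) [p(1)*q(1) - p(2)*q(2) - p(3)*q(3) - p(4)*q(4); ...
                p(1)*q(2) + p(2)*q(1) + p(3)*q(4) - p(4)*q(3); ...
                p(1)*q(3) - p(2)*q(4) + p(3)*q(1) + p(4)*q(2); ...
                p(1)*q(4) + p(2)*q(3) - p(3)*q(2) + p(4)*q(1)];   % Hamilton product p*q
qconj = @(q) [q(1); -q(2:4)];                                     % conjugate
qnorm = @(q) norm(q);                                             % modulus |q|
qinv  = @(q) qconj(q) / qnorm(q)^2;                               % inverse of a nonzero q

p = [1; 2; -1; 0.5];  q = [0.3; -1; 2; 1];
disp(qmul(p, qinv(p)))                        % [1;0;0;0], the unit element
disp(qnorm(qmul(p,q)) - qnorm(p)*qnorm(q))    % |pq| = |p||q|, so this is ~0
disp(qmul(p,q) - qmul(q,p))                   % generally nonzero: multiplication is non-commutative
```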
Two quaternions $p$ and $q$ are said to be similar if there exists a nonzero quaternion $u$ such that $u^{-1} p u = q$; this is written as $p \sim q$. Obviously, $p$ and $q$ are similar if and only if there is a unit quaternion $u$ such that $u^{-1} p u = q$, and two similar quaternions have the same norm. It is routine to check that $\sim$ is an equivalence relation on the quaternions. We denote by $[q]$ the equivalence class containing $q$. If $\mathrm{Re}(p) = \mathrm{Re}(q)$ and $|p| = |q|$, then $p$ and $q$ are similar, namely, $[p] = [q]$.
Quaternions can also be expressed in polar form as $q = |q|(\cos\theta + \hat{\mu}\sin\theta)$, where $\hat{\mu}$ is a pure unit quaternion and $\theta$ denotes the angle (or argument) of the quaternion. Next, we introduce the quaternion rotation and involution operators.
Definition 1 (quaternion rotation [28]). For any quaternion $q$, the transformation
$$q^{\mu} \triangleq \mu q \mu^{-1} \tag{2}$$
geometrically describes a three-dimensional rotation of the vector part of $q$ by an angle $2\theta$ about the vector part of $\mu$, where $\mu = |\mu|(\cos\theta + \hat{\mu}\sin\theta)$ is any nonzero quaternion. Specifically, if $\mu$ in (2) is an imaginary unit, then the quaternion rotation (2) reduces to the quaternion involution [29], defined by
$$q^{\eta} = \eta q \eta^{-1} = -\eta q \eta,$$
where $\eta \in \{\mathbf{i}, \mathbf{j}, \mathbf{k}\}$. Below, we list some properties of quaternion rotation, including
$$(pq)^{\mu} = p^{\mu} q^{\mu}$$
and
$$\overline{q^{\mu}} = \bar{q}^{\mu}.$$
Note that the representation in (1) can be extended to a general orthogonal basis $\{1, \mathbf{i}^{\mu}, \mathbf{j}^{\mu}, \mathbf{k}^{\mu}\}$, where the following properties hold [28]:
$$(\mathbf{i}^{\mu})^2 = (\mathbf{j}^{\mu})^2 = (\mathbf{k}^{\mu})^2 = \mathbf{i}^{\mu}\mathbf{j}^{\mu}\mathbf{k}^{\mu} = -1.$$
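The rotation map and the two properties listed above can be verified numerically with the same 4-vector convention as in the previous sketch; again, the helpers (qmul, qconj, qinv, qrot) are our own illustrations.

```matlab
% Verify q^mu = mu*q*mu^{-1} and its properties numerically (4-vector convention).
qmul  = @(p,q) [p(1)*q(1)-p(2)*q(2)-p(3)*q(3)-p(4)*q(4); ...
                p(1)*q(2)+p(2)*q(1)+p(3)*q(4)-p(4)*q(3); ...
                p(1)*q(3)-p(2)*q(4)+p(3)*q(1)+p(4)*q(2); ...
                p(1)*q(4)+p(2)*q(3)-p(3)*q(2)+p(4)*q(1)];
qconj = @(q) [q(1); -q(2:4)];
qinv  = @(q) qconj(q) / norm(q)^2;
qrot  = @(q,mu) qmul(qmul(mu,q), qinv(mu));            % quaternion rotation q^mu

p  = randn(4,1);  q = randn(4,1);  mu = randn(4,1);
% (pq)^mu = p^mu * q^mu
disp(norm(qrot(qmul(p,q),mu) - qmul(qrot(p,mu), qrot(q,mu))))
% conj(q^mu) = (conj q)^mu
disp(norm(qconj(qrot(q,mu)) - qrot(qconj(q),mu)))
% Involution with mu = i: q^i keeps q0, q1 and flips the signs of q2, q3.
disp(qrot(q,[0;1;0;0]) - [q(1); q(2); -q(3); -q(4)])
```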
Denote the set of $m \times n$ quaternion matrices as
$$\mathbb{H}^{m \times n} = \left\{ \mathbf{A} = A_0 + A_1\mathbf{i} + A_2\mathbf{j} + A_3\mathbf{k} \mid A_0, A_1, A_2, A_3 \in \mathbb{R}^{m \times n} \right\}.$$
The conjugate transpose of $\mathbf{A} \in \mathbb{H}^{m \times n}$ is $\mathbf{A}^{H} = \bar{\mathbf{A}}^{T}$. We say that a square quaternion matrix $\mathbf{A} = (a_{st}) \in \mathbb{H}^{n \times n}$ is normal if $\mathbf{A}\mathbf{A}^{H} = \mathbf{A}^{H}\mathbf{A}$; Hermitian if $\mathbf{A}^{H} = \mathbf{A}$, i.e., $a_{st} = \bar{a}_{ts}$ and, in particular, the diagonal entries $a_{ss}$ are real; unitary if $\mathbf{A}^{H}\mathbf{A} = \mathbf{A}\mathbf{A}^{H} = I_n$, where $I_n$ is the identity matrix; and invertible (nonsingular) if there exists a matrix $\mathbf{B} \in \mathbb{H}^{n \times n}$ such that $\mathbf{A}\mathbf{B} = \mathbf{B}\mathbf{A} = I_n$. In this case, we denote $\mathbf{B} = \mathbf{A}^{-1}$. We have $(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$ if $\mathbf{A}$ and $\mathbf{B}$ are invertible, and $(\mathbf{A}^{H})^{-1} = (\mathbf{A}^{-1})^{H}$ if $\mathbf{A}$ is invertible. The 2-norm of a given quaternion vector $\mathbf{x} = (x_1, \ldots, x_n)^{T} \in \mathbb{H}^{n}$ is defined as $\|\mathbf{x}\|_2 = \sqrt{\mathbf{x}^{H}\mathbf{x}} = \left( \sum_{s=1}^{n} |x_s|^2 \right)^{1/2}$.
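In implementations that do not rely on a quaternion toolbox, a quaternion matrix $\mathbf{A} = A_0 + A_1\mathbf{i} + A_2\mathbf{j} + A_3\mathbf{k}$ can simply be stored as four real matrices. The following sketch (our own storage layout, not prescribed by the paper) builds a random quaternion Hermitian matrix, forms the product $\mathbf{A}\mathbf{x}$ componentwise, and checks numerically that $\mathbf{x}^{H}\mathbf{A}\mathbf{x}$ has no imaginary part when $\mathbf{A}$ is Hermitian.

```matlab
% Quaternion matrix A = A0 + A1*i + A2*j + A3*k stored as four real matrices.
n  = 5;
A0 = randn(n);  A0 = (A0 + A0')/2;        % real part: symmetric
A1 = randn(n);  A1 = (A1 - A1')/2;        % i-part:    antisymmetric
A2 = randn(n);  A2 = (A2 - A2')/2;        % j-part:    antisymmetric
A3 = randn(n);  A3 = (A3 - A3')/2;        % k-part:    antisymmetric  => A is Hermitian

% Quaternion matrix-vector product y = A*x, with x = x0 + x1*i + x2*j + x3*k.
x0 = randn(n,1); x1 = randn(n,1); x2 = randn(n,1); x3 = randn(n,1);
y0 = A0*x0 - A1*x1 - A2*x2 - A3*x3;
y1 = A0*x1 + A1*x0 + A2*x3 - A3*x2;
y2 = A0*x2 - A1*x3 + A2*x0 + A3*x1;
y3 = A0*x3 + A1*x2 - A2*x1 + A3*x0;

% 2-norm of the quaternion vector x.
xnorm = sqrt(norm(x0)^2 + norm(x1)^2 + norm(x2)^2 + norm(x3)^2);
% x^H*A*x is real when A is Hermitian; e.g., its i-component should vanish:
s1 = x0'*y1 - x1'*y0 - x2'*y3 + x3'*y2;    % i-component of x^H*(A*x)
disp(s1)                                    % ~0 up to rounding
```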
2.3. GHR Calculus
We now introduce the generalized HR (GHR) derivatives, which come equipped with both product and chain rules; see [24,26] for more details.
Definition 2 (real-differentiability [24]). Let $q = q_0 + q_1\mathbf{i} + q_2\mathbf{j} + q_3\mathbf{k} \in \mathbb{H}$; then, a function $f(q) = f_0 + f_1\mathbf{i} + f_2\mathbf{j} + f_3\mathbf{k}$ is called real differentiable when $f_0$, $f_1$, $f_2$, and $f_3$ are differentiable with respect to the real variables $q_0$, $q_1$, $q_2$, and $q_3$.

Definition 3 (GHR derivatives [24]). If $f$ is real differentiable, then the GHR derivatives of $f$ with respect to $q^{\mu}$ and $\bar{q}^{\mu}$ ($0 \neq \mu \in \mathbb{H}$) are defined as
$$\frac{\partial f}{\partial q^{\mu}} = \frac{1}{4}\left( \frac{\partial f}{\partial q_0} - \frac{\partial f}{\partial q_1}\mathbf{i}^{\mu} - \frac{\partial f}{\partial q_2}\mathbf{j}^{\mu} - \frac{\partial f}{\partial q_3}\mathbf{k}^{\mu} \right)$$
and
$$\frac{\partial f}{\partial \bar{q}^{\mu}} = \frac{1}{4}\left( \frac{\partial f}{\partial q_0} + \frac{\partial f}{\partial q_1}\mathbf{i}^{\mu} + \frac{\partial f}{\partial q_2}\mathbf{j}^{\mu} + \frac{\partial f}{\partial q_3}\mathbf{k}^{\mu} \right),$$
where $\frac{\partial f}{\partial q_0}$, $\frac{\partial f}{\partial q_1}$, $\frac{\partial f}{\partial q_2}$, and $\frac{\partial f}{\partial q_3}$ are the partial derivatives of $f$ with respect to $q_0$, $q_1$, $q_2$, and $q_3$, while the set $\{1, \mathbf{i}^{\mu}, \mathbf{j}^{\mu}, \mathbf{k}^{\mu}\}$ is an orthogonal basis of $\mathbb{H}$.

Definition 4 (quaternion gradient [24]).
Let $f: \mathbb{H}^{n} \rightarrow \mathbb{R}$ and $\mathbf{q} \in \mathbb{H}^{n}$; then, the two quaternion gradients of $f$ are defined as
$$\nabla_{\mathbf{q}} f = \left( \frac{\partial f}{\partial \mathbf{q}} \right)^{T} \in \mathbb{H}^{n}$$
and
$$\nabla_{\bar{\mathbf{q}}} f = \left( \frac{\partial f}{\partial \bar{\mathbf{q}}} \right)^{T} \in \mathbb{H}^{n},$$
where $\frac{\partial f}{\partial \mathbf{q}}$ and $\frac{\partial f}{\partial \bar{\mathbf{q}}}$ collect the entrywise GHR derivatives of $f$.

Based on the definitions of the GHR calculus provided above, we consider a simple quadratic function $f(\mathbf{q}) = \mathbf{q}^{H}\mathbf{A}\mathbf{q}$, where $\mathbf{q} \in \mathbb{H}^{n}$ and $\mathbf{A} \in \mathbb{H}^{n \times n}$ is a quaternion Hermitian matrix; then, the gradient of this function $f$ is given by
$$\nabla_{\bar{\mathbf{q}}} f = \frac{1}{2}\mathbf{A}\mathbf{q},$$
in which $\nabla_{\bar{\mathbf{q}}} f$ is the steepest ascent direction [26].
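As a sanity check of the gradient formula above (a sketch under the GHR conventions of Definitions 3 and 4, with quaternion vectors stored as n-by-4 real arrays and with Hermitian components generated as in the previous sketch; all variable names are ours), the entrywise finite-difference GHR conjugate derivative of $f(\mathbf{q}) = \mathbf{q}^{H}\mathbf{A}\mathbf{q}$ can be compared with $\frac{1}{2}\mathbf{A}\mathbf{q}$:

```matlab
% Finite-difference check of the GHR conjugate gradient of f(q) = q^H*A*q.
n  = 4;
A0 = randn(n); A0 = (A0+A0')/2;  A1 = randn(n); A1 = (A1-A1')/2;
A2 = randn(n); A2 = (A2-A2')/2;  A3 = randn(n); A3 = (A3-A3')/2;

% f returns the (real) value of q^H*A*q for q stored as an n-by-4 array [q0 q1 q2 q3].
f = @(Q) Q(:,1)'*(A0*Q(:,1)-A1*Q(:,2)-A2*Q(:,3)-A3*Q(:,4)) ...
       + Q(:,2)'*(A0*Q(:,2)+A1*Q(:,1)+A2*Q(:,4)-A3*Q(:,3)) ...
       + Q(:,3)'*(A0*Q(:,3)-A1*Q(:,4)+A2*Q(:,1)+A3*Q(:,2)) ...
       + Q(:,4)'*(A0*Q(:,4)+A1*Q(:,3)-A2*Q(:,2)+A3*Q(:,1));

Q = randn(n,4);  h = 1e-6;
G = zeros(n,4);                       % numerical real partials df/dq_{l,c}
for l = 1:n
    for c = 1:4
        Qp = Q; Qp(l,c) = Qp(l,c) + h;
        Qm = Q; Qm(l,c) = Qm(l,c) - h;
        G(l,c) = (f(Qp) - f(Qm)) / (2*h);
    end
end
ghr_grad = G/4;          % entrywise (1/4)(df/dq0 + df/dq1*i + df/dq2*j + df/dq3*k)
% Analytic candidate: (1/2)*A*q, computed componentwise.
half_Aq = 0.5*[A0*Q(:,1)-A1*Q(:,2)-A2*Q(:,3)-A3*Q(:,4), ...
               A0*Q(:,2)+A1*Q(:,1)+A2*Q(:,4)-A3*Q(:,3), ...
               A0*Q(:,3)-A1*Q(:,4)+A2*Q(:,1)+A3*Q(:,2), ...
               A0*Q(:,4)+A1*Q(:,3)-A2*Q(:,2)+A3*Q(:,1)];
% ~1e-9 or smaller: f is quadratic, so central differences are exact up to rounding.
disp(max(abs(ghr_grad - half_Aq), [], 'all'))
```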
3. Quaternion Nesterov Accelerated Projected Gradient (Q-NAPG)
In this section, we introduce the quaternion Nesterov accelerated projected gradient algorithm. To this end, we first review the definition related to the eigenvalue of quaternion Hermitian matrices.
Definition 5 (right eigenvalue [10,11]). Let $\mathbf{A} \in \mathbb{H}^{n \times n}$ be a quaternion matrix. Then, $\lambda \in \mathbb{H}$ is called a right eigenvalue of $\mathbf{A}$ if there exists a nonzero vector $\mathbf{x} \in \mathbb{H}^{n}$ such that
$$\mathbf{A}\mathbf{x} = \mathbf{x}\lambda.$$
Here, $\mathbf{x}$ is the eigenvector corresponding to the eigenvalue $\lambda$.

For a quaternion matrix $\mathbf{A} \in \mathbb{H}^{n \times n}$, if there exists a quaternion $\lambda$ and a nonzero vector $\mathbf{x} \in \mathbb{H}^{n}$ such that $\mathbf{A}\mathbf{x} = \mathbf{x}\lambda$, then for any invertible quaternion $\beta$, it follows that
$$\mathbf{A}(\mathbf{x}\beta) = \mathbf{x}\lambda\beta = (\mathbf{x}\beta)(\beta^{-1}\lambda\beta).$$
Here, $\beta^{-1}\lambda\beta$ is also a right eigenvalue of $\mathbf{A}$, with the corresponding eigenvector $\mathbf{x}\beta$. This demonstrates that, for a general quaternion square matrix, there exist an infinite number of right eigenvalues [10].
Due to the non-commutativity of quaternion multiplication, general quaternion square matrices have distinct left and right eigenvalues. However, for a quaternion Hermitian matrix $\mathbf{A} \in \mathbb{H}^{n \times n}$, if $\mathbf{A}$ has a right eigenvalue $\lambda$ and a corresponding eigenvector $\mathbf{x}$, it is straightforward to show that
$$\mathbf{x}^{H}\mathbf{A}\mathbf{x} = \mathbf{x}^{H}\mathbf{x}\lambda,$$
and by dividing both sides of the above equation by $\mathbf{x}^{H}\mathbf{x}$, we obtain
$$\lambda = \frac{\mathbf{x}^{H}\mathbf{A}\mathbf{x}}{\mathbf{x}^{H}\mathbf{x}}, \tag{3}$$
where (3) represents the Rayleigh quotient on the quaternion skew field. Since $\mathbf{A}$ is Hermitian, $\mathbf{x}^{H}\mathbf{A}\mathbf{x}$ is real and $\mathbf{x}^{H}\mathbf{x} > 0$; therefore, the eigenvalues of quaternion Hermitian matrices are all real numbers, and, thus, there is no distinction between left and right eigenvalues.

Our goal is to compute the principal eigenvalue of a given quaternion Hermitian matrix. By defining the objective function as $f(\mathbf{x}) = \mathbf{x}^{H}\mathbf{A}\mathbf{x}$ and imposing the normalization constraint $\|\mathbf{x}\|_2 = 1$, we can equivalently transform the problem of finding the principal eigenvalue of a given quaternion Hermitian matrix into the following maximization problem on the quaternion skew field:
$$\max_{\mathbf{x} \in \mathbb{H}^{n}} \ \mathbf{x}^{H}\mathbf{A}\mathbf{x} \quad \text{s.t.} \quad \|\mathbf{x}\|_2 = 1. \tag{4}$$
The above problem (4) can be addressed using a quaternion gradient projection algorithm. Since the introduction of the Nesterov accelerated gradient (NAG) method [30], the incorporation of momentum has become a conventional approach to overcome the shortsightedness of plain gradient algorithms [31,32]. To tackle problem (4), we propose a quaternion Nesterov accelerated projected gradient algorithm (Q-NAPG). Given an initial unit vector $\mathbf{x}_0 \in \mathbb{H}^{n}$ and setting $\mathbf{x}_{-1} = \mathbf{x}_0$, the Q-NAPG method repeats, for $k = 0, 1, 2, \ldots$,
$$\begin{aligned}
\mathbf{y}_k &= \mathbf{x}_k + \gamma\,(\mathbf{x}_k - \mathbf{x}_{k-1}),\\
\tilde{\mathbf{x}}_{k+1} &= \mathbf{y}_k + \alpha\,\mathbf{A}\mathbf{y}_k,\\
\mathbf{x}_{k+1} &= \tilde{\mathbf{x}}_{k+1}/\|\tilde{\mathbf{x}}_{k+1}\|_2,
\end{aligned}$$
where $\alpha$ and $\gamma$ are the step size and momentum parameters, respectively, and $\mathbf{A}\mathbf{y}_k$ is the steepest ascent direction of $f$ at the extrapolation point (the GHR gradient up to a constant positive factor absorbed into $\alpha$). When the momentum parameter $\gamma = 0$, Q-NAPG simplifies to standard gradient ascent (GA). When $\gamma > 0$, it is possible to achieve accelerated rates of convergence for certain combinations of $\alpha$ and $\gamma$ in the deterministic setting. The framework of the proposed algorithm is detailed below (Algorithm 1).
Algorithm 1 Quaternion Nesterov Accelerated Projected Gradient (Q-NAPG)
Input: The quaternion Hermitian matrix $\mathbf{A} \in \mathbb{H}^{n \times n}$, the step size $\alpha$, the momentum coefficient $\gamma$, the tolerable error $\varepsilon$, and the maximum number of iterations $K$.
Output: The principal eigenvalue $\lambda$ and its corresponding eigenvector $\mathbf{x}$.
1: Initialize: a unit quaternion vector $\mathbf{x}_0$.
2: Set $\mathbf{x}_{-1} = \mathbf{x}_0$ and $\lambda_0 = \mathbf{x}_0^{H}\mathbf{A}\mathbf{x}_0$.
3: for $k = 0$ to $K$ do
4:   Momentum extrapolation: $\mathbf{y}_k = \mathbf{x}_k + \gamma(\mathbf{x}_k - \mathbf{x}_{k-1})$.
5:   Compute the gradient at the extrapolation point: $\mathbf{g}_k = \mathbf{A}\mathbf{y}_k$.
6:   Gradient ascent: $\tilde{\mathbf{x}}_{k+1} = \mathbf{y}_k + \alpha\,\mathbf{g}_k$.
7:   Normalization: $\mathbf{x}_{k+1} = \tilde{\mathbf{x}}_{k+1}/\|\tilde{\mathbf{x}}_{k+1}\|_2$.
8:   $\lambda_{k+1} = \mathbf{x}_{k+1}^{H}\mathbf{A}\mathbf{x}_{k+1}$.
9:   if $|\lambda_{k+1} - \lambda_k| < \varepsilon$ then
10:    Break
11:  end if
12: end for
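For concreteness, the following plain-MATLAB function is a minimal sketch of Algorithm 1, storing quaternion matrices and vectors by their four real components as in the sketches of Section 2. The function name q_napg_sketch, the n-by-4 storage layout, and the stopping rule based on the change of the eigenvalue estimate are our own choices for illustration, not the paper's reference implementation.

```matlab
function [lambda, X] = q_napg_sketch(A0, A1, A2, A3, alpha, gamma, tol, K)
% Sketch of Algorithm 1 (Q-NAPG). A = A0 + A1*i + A2*j + A3*k is Hermitian and
% quaternion vectors are stored as n-by-4 real arrays [x0 x1 x2 x3].
n = size(A0, 1);
X = randn(n, 4);  X = X / qvnorm(X);       % step 1: random unit quaternion vector
Xprev  = X;                                % step 2: x_{-1} = x_0 ...
lambda = rayleigh(X);                      % ... and initial eigenvalue estimate
for k = 1:K                                % step 3
    Y    = X + gamma*(X - Xprev);          % step 4: momentum extrapolation
    G    = qmatvec(Y);                     % step 5: ascent direction A*y at the look-ahead point
    Xnew = Y + alpha*G;                    % step 6: gradient ascent
    Xnew = Xnew / qvnorm(Xnew);            % step 7: projection back onto the unit sphere
    lnew = rayleigh(Xnew);                 % step 8: Rayleigh quotient x^H*A*x
    Xprev = X;  X = Xnew;
    if abs(lnew - lambda) < tol            % step 9: stop when the estimate stagnates (our choice)
        lambda = lnew;  break
    end
    lambda = lnew;
end

    function Yv = qmatvec(V)               % quaternion product A*v, componentwise
        Yv = [A0*V(:,1)-A1*V(:,2)-A2*V(:,3)-A3*V(:,4), ...
              A0*V(:,2)+A1*V(:,1)+A2*V(:,4)-A3*V(:,3), ...
              A0*V(:,3)-A1*V(:,4)+A2*V(:,1)+A3*V(:,2), ...
              A0*V(:,4)+A1*V(:,3)-A2*V(:,2)+A3*V(:,1)];
    end
    function nrm = qvnorm(V)               % 2-norm of a quaternion vector
        nrm = norm(V(:));
    end
    function val = rayleigh(V)             % real value v^H*A*v
        Av = qmatvec(V);  val = sum(sum(V .* Av));
    end
end
```

The step size alpha and momentum gamma must be supplied by the caller; the particular values used in the experiments of Section 5 are not reproduced here.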
Remark 1. After obtaining the principal eigenvalue $\lambda_1$ and its corresponding unit eigenvector $\mathbf{x}_1$ of the quaternion Hermitian matrix, we can employ a deflation technique by updating $\mathbf{A} \leftarrow \mathbf{A} - \lambda_1 \mathbf{x}_1 \mathbf{x}_1^{H}$ and continue to apply the Q-NAPG algorithm. By repeating this process, all eigenvalues and their corresponding eigenvectors can be obtained.
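A possible realization of the deflation loop of Remark 1, reusing q_napg_sketch and the componentwise Hermitian components A0, A1, A2, A3 and parameters alpha, gamma, tol, K from the sketches above; the translation of the update $\mathbf{A} \leftarrow \mathbf{A} - \lambda\,\mathbf{x}\mathbf{x}^{H}$ into four real component updates is our own illustration.

```matlab
% Deflation loop: compute the m largest eigenvalues one by one (illustrative sketch).
m = 3;
lambdas = zeros(m, 1);
for t = 1:m
    [lam, X] = q_napg_sketch(A0, A1, A2, A3, alpha, gamma, tol, K);
    lambdas(t) = lam;
    x0 = X(:,1); x1 = X(:,2); x2 = X(:,3); x3 = X(:,4);
    % A <- A - lambda * x * x^H, written out for the four real components;
    % x*x^H is Hermitian and lambda is real, so A stays Hermitian.
    A0 = A0 - lam*( x0*x0' + x1*x1' + x2*x2' + x3*x3');
    A1 = A1 - lam*(-x0*x1' + x1*x0' - x2*x3' + x3*x2');
    A2 = A2 - lam*(-x0*x2' + x1*x3' + x2*x0' - x3*x1');
    A3 = A3 - lam*(-x0*x3' - x1*x2' + x2*x1' + x3*x0');
end
disp(lambdas.')    % eigenvalue estimates in the order found
```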
The proposed quaternion Nesterov accelerated projected gradient (Q-NAPG) algorithm is a novel method for computing the principal eigenvalue of quaternion Hermitian matrices using the GHR calculus. The core improvement of the algorithm lies in the introduction of a momentum term and the calculation of a look-ahead gradient. This look-ahead adjustment allows the algorithm to correct the momentum direction in advance, reducing oscillations and converging faster to the optimal solution compared to traditional quaternion projected gradient algorithms. The non-commutativity of quaternion multiplication requires special handling in matrix decomposition algorithms, often necessitating conversion to complex or real equivalent forms, which may lead to the loss of quaternion structural information. The advantage of this method over such conventional methods is its direct operation on quaternion vectors, avoiding the conversion to complex or real representations. The algorithm has a simple workflow and fully utilizes quaternion gradient information to preserve the intrinsic quaternion structure. Furthermore, the algorithm can be seamlessly extended to quaternion sparse optimization and manifold-constrained optimization problems, demonstrating strong applicability and extensibility.
5. Numerical Experiments
In this section, we provide numerical examples to demonstrate the feasibility and effectiveness of the quaternion Nesterov accelerated projected gradient algorithm for the eigenvalue problem of quaternion Hermitian matrices. In the specific implementation of Algorithm 1, the constant step size $\alpha$, the momentum parameter $\gamma$, and the tolerable error $\varepsilon$ are fixed throughout the experiments.
All experiments are performed using Windows 11 and MATLAB version 23.2.0.2365128 (R2023b), with an AMD Ryzen 7 5800H with Radeon Graphics CPU at 3.20 GHz and 16 GB of memory.
Example 1. Given a quaternion Hermitian matrix $\mathbf{A} \in \mathbb{H}^{3 \times 3}$, in this experiment we employ the quaternion Nesterov accelerated projected gradient method (Algorithm 1) to compute all three eigenvalues of $\mathbf{A}$ and their corresponding eigenvectors, and we report the three residuals $\|\mathbf{A}\mathbf{x}_i - \mathbf{x}_i\lambda_i\|_2$, $i = 1, 2, 3$.
It is evident that the residuals are controlled within an ideal range, demonstrating the feasibility and effectiveness of Algorithm 1 in computing the eigenvalues of quaternion Hermitian matrices.
Example 2. In this experiment, we utilize MATLAB's built-in functions to randomly generate three quaternion Hermitian matrices of different sizes and compare Algorithm 1 with the QPGA method [27], the eigQ method [21], and the eig function in the Quaternion Toolbox for MATLAB (QTFM) [33]. We first test the performance of Algorithm 1, the QPGA method, the eigQ method, and the eig function in computing the principal eigenvalues of three different types and sizes of quaternion Hermitian matrices. The numerical results are presented in Table 1, which includes three evaluation metrics: the number of iterations, the residual, and the runtime. Superior results are highlighted in bold. The symbol "−" indicates that the computer ran out of memory and the algorithm was forcibly terminated without producing a result. The symbol * indicates that the corresponding method is not iterative and therefore has no iteration count. It can be observed that Algorithm 1 outperforms the other algorithms in terms of the number of iterations, problem residuals, and runtime, demonstrating clear advantages when computing large-scale quaternion Hermitian matrices.
Subsequently, we plot the variation curves of the objective function values for the first 50 iterations of Algorithm 1 and the QPGA algorithm, as shown in Figure 1. It is evident that our algorithm achieves a faster increase in the objective function, demonstrating higher efficiency in obtaining the maximum eigenvalue. Figure 2 illustrates the residual variation curves generated by Algorithm 1 and the QPGA algorithm with respect to the number of iterations. Across the tested matrix dimensions, our algorithm consistently achieves higher accuracy and efficiency.