Article

A Covariance-Free Strictly Complex-Valued Relevance Vector Machine for Reducing the Order of Linear Time-Invariant Systems

School of Mathematics and Statistics, Shaoguan University, Shaoguan 512000, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(19), 2991; https://doi.org/10.3390/math12192991
Submission received: 2 September 2024 / Revised: 20 September 2024 / Accepted: 24 September 2024 / Published: 25 September 2024
(This article belongs to the Special Issue Applied Mathematics in Data Science and High-Performance Computing)

Abstract

Multiple-input multiple-output (MIMO) linear time-invariant (LTI) systems exhibit enormous computational costs for high-dimensional problems. To address this problem, we propose a novel approach for reducing the dimensionality of MIMO systems. The method leverages the Takenaka–Malmquist basis and incorporates the strictly complex-valued relevance vector machine (SCRVM). We refer to this method as covariance-free maximum likelihood (CoFML). The proposed method avoids the explicit computation of the covariance matrix; instead, CoFML solves multiple linear systems to obtain the required posterior statistics. This is achieved by exploiting a preconditioning matrix and a matrix diagonal element estimation rule. We provide theoretical justification for this approximation and show why our method scales well in high-dimensional settings. By employing the CoFML algorithm, we approximate MIMO systems in parallel, resulting in significant computational time savings. The effectiveness of this method is demonstrated through three well-known examples.

1. Introduction

A reduction in dimensionality is crucial in multiple-input multiple-output (MIMO) system models due to the high computational costs associated with high-dimensional problems. A MIMO linear time-invariant (LTI) system with l outputs and k inputs can be represented by a matrix of transfer functions in the following form [1,2,3,4,5,6]:
$$F(s) = \frac{B_{m-1}s^{m-1} + \cdots + B_0}{d_m s^m + d_{m-1}s^{m-1} + \cdots + d_1 s + 1}. \qquad (1)$$
Here, $B_i \in \mathbb{R}^{l \times k}$, $m$ is the system order, and $d_i \in \mathbb{R}$, $i = 0, 1, 2, \ldots, m$.
The n-th approximate model is given by the following [1,2,3,4,5,6]:
$$\hat{F}_n(s) = \frac{\hat{B}_1 s^{n-1} + \cdots + \hat{B}_n}{\hat{d}_0 s^n + \hat{d}_1 s^{n-1} + \cdots + 1}, \qquad (2)$$
where $\hat{B}_i \in \mathbb{R}^{l \times k}$, $n$ is the reduced system order, and $\hat{d}_i \in \mathbb{R}$, $i = 0, 1, 2, \ldots, n$.
Numerous techniques have been developed for the model reduction of linear time-invariant systems. These techniques encompass a wide range of approaches, including linear matrix inequalities [7], error minimization [8], magnitude and phase criteria [9], balanced truncation [10,11], rational interpolation [12], the Krylov method [13], adaptive Fourier decomposition (AFD) [14], Routh approximations, and Padé-type model reductions [15,16]. However, these methods incur substantial computational costs and require prior knowledge of the actual system, which is typically unavailable in practice. Consequently, there is a persistent demand for fast and efficient methods to reduce the order of LTI systems. To address this research gap, this study explores novel MIMO system model reduction approaches that are computationally efficient and do not rely on explicit knowledge of the underlying system. By leveraging recent advances in the field, we seek to develop effective techniques that reliably reduce the order of LTI systems while preserving their essential dynamic characteristics.
AFD and SCRVM methods are effective in reducing LTI systems. These methods rely on the rational orthogonal basis, also called the Takenaka–Malmquist (TM) system [17]. On the open right-half plane $\Pi = \{ s \in \mathbb{C} : \Re(s) > 0 \}$, the TM system is defined as follows:
$$B_k(s) = \frac{\sqrt{2\Re(a_k)}}{s + \bar{a}_k} \prod_{l=1}^{k-1} \frac{s - a_l}{s + \bar{a}_l}, \quad k = 1, 2, \ldots, \qquad (3)$$
where $a_k \in \Pi$ and $\Re(\cdot)$ denotes the real part of a complex number. The system $\{ B_k \}_{k=0}^{\infty}$ (with $B_0 = 1$) forms a basis of the Hardy space $H^2(\Pi)$ if and only if
$$\sum_{k=1}^{\infty} \frac{2\Re(a_k)}{1 + |a_k|^2} = \infty. \qquad (4)$$
The shifted Cauchy kernel, denoted as $B(s) = e_a(s) = \frac{\sqrt{2\Re(a)}}{s + \bar{a}}$, is the building block of the TM system. These systems are commonly used in model reduction due to their linear-in-parameters model structure [18,19,20,21,22].
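To make the construction concrete, here is a minimal Python sketch (our own illustration; the function name and the pole grid are assumptions, not from the paper) that evaluates the shifted Cauchy kernels and assembles them into a dictionary of the kind used for model fitting below:

```python
import numpy as np

def build_dictionary(s_points, poles):
    """Dictionary of shifted Cauchy kernels: Omega[j, i] = e_{a_i}(s_j)
    = sqrt(2*Re(a_i)) / (s_j + conj(a_i)), for poles a_i in the right half-plane."""
    s = np.asarray(s_points, dtype=complex).reshape(-1, 1)   # N x 1
    a = np.asarray(poles, dtype=complex).reshape(1, -1)      # 1 x M
    return np.sqrt(2.0 * a.real) / (s + np.conj(a))          # N x M

# Example resembling the paper's settings: samples on the imaginary axis,
# real poles in [0.1, 10]
s_points = 1j * np.linspace(-5.0, 5.0, 251)
poles = np.linspace(0.1, 10.0, 14)
Omega = build_dictionary(s_points, poles)   # shape (251, 14)
```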
In recent years, the relevance vector machine (RVM) has emerged as a robust framework for solving sparse coding problems and providing uncertainty quantification [23,24,25]. RVM, also called sparse Bayesian learning (SBL), was proposed by Tipping [26] in 2001. RVM has achieved significant success in hyperspectral image classification [27,28,29] and reconstruction [30]. It has also found applications in various fields such as direction-of-arrival (DOA) estimation [31,32,33], classification [25], compressive sensing [34], feature selection [35], signal processing [36,37,38], image reconstruction [30,39], financial prediction [40], and more.
The RVM offers computational efficiency by progressively “pruning” irrelevant vectors during inference, reducing the computation time spent on covariance matrix inversion. This advantage is particularly valuable for high-dimensional problems. However, when dealing with large-scale datasets, RVM faces challenges due to the computational costs of the iterative process, which scales as $O(TM^3)$ in time and $O(M^2)$ in space, where $T$ is the number of iterations and $M$ is the number of parameters. Additionally, for complex-valued data, the computational time increases to $O(T(2M)^3)$ with a storage requirement of $O((2M)^2)$. Several methods exist to reduce the cost of the iterative process, such as iteratively reweighted least-squares (IRLS) [41], approximate message passing (AMP) [42], and variational inference (VI) [43]. A popular method that does not require computing the inverse matrix, called inverse-free sparse Bayesian learning (IFSBL) [33,43], is often faster in practice. IFSBL bypasses matrix inversion via a relaxed evidence lower bound (ELBO), employing a variational EM scheme for efficient, fast-converging SBL. However, these methods still lack scalability at very high dimensions $M$ and accuracy in recovering the sparse codes. Researchers have proposed approximate inference algorithms and acceleration techniques such as CoFEM [44], which has significant advantages in both scalability and accuracy over other SBL approaches. Unfortunately, CoFEM only handles real-valued data. This paper introduces a novel method called covariance-free maximum likelihood (CoFML) to address the complex-valued case in a manner similar to CoFEM. CoFML omits explicit covariance matrix calculations and obtains unbiased posterior estimates through linear system solutions and numerical linear algebra techniques, resulting in a fast, accurate, and sparse approximate model.
The paper is organized as follows: Section 2 presents an innovative approach, which we call CoFML, to enhance the computational efficiency of the strictly complex-valued relevance vector machine (SCRVM) algorithm in high-dimensional scenarios. Section 3 provides a theoretical analysis of CoFML. Numerical examples are presented in Section 4, followed by conclusions in Section 5.

2. Reduction in MIMO Systems

2.1. Strictly Complex-Valued Relevance Vector Machine Inference

We expand the MIMO LTI system $F(s)$ in (1) row by row into $L$ SISO LTI systems $F_l(s)$, with $F_l$ denoting the $l$-th system of $F(s)$, where $l = 1, 2, \ldots, L$ and $L = p \times q$. We define the single-input signal $s = [s_1, \ldots, s_N]^T$ and the single-output signal $z_l = [z_{l1}, \ldots, z_{lN}]^T$, where $N$ is the number of samples. The transfer function of the $l$-th system is represented as follows:
$$z_l = F_l(s; \theta_l) + v_l = \Omega_l \theta_l + v_l, \qquad (5)$$
where $\theta_l = [\theta_{l1}, \ldots, \theta_{lM}]^T \in \mathbb{C}^{M \times 1}$,
$$\Omega_l = \begin{bmatrix} e_{a_1}(s_1) & \cdots & e_{a_M}(s_1) \\ \vdots & \ddots & \vdots \\ e_{a_1}(s_N) & \cdots & e_{a_M}(s_N) \end{bmatrix} \in \mathbb{C}^{N \times M},$$
and $e_{a_i}(s_j) = \frac{\sqrt{2\Re(a_i)}}{s_j + \bar{a}_i}$, $i = 1, 2, \ldots, M$, $j = 1, 2, \ldots, N$. The noise vector $v_l = [v_{l1}, v_{l2}, \ldots, v_{lN}]^T$ is assumed to be complex Gaussian-distributed, with $v_{ln} \sim \mathcal{N}(v_{ln} \mid \mathbf{0}, \sigma^2 I_2)$. Here, $\mathbf{0}$ is the zero vector, $v_{ln}$ includes both the real and imaginary components of the complex noise, and $\sigma^2$ denotes the variance. Notably, these systems share the same denominator, so we have $\Omega_1 = \Omega_2 = \cdots = \Omega_L =: \Omega$. By considering the real and imaginary parts separately and assuming independence among the $z_l$, the likelihood function for the complete data set can be expressed as follows:
$$p(Z_l \mid \Theta_l, \sigma^2) = \mathcal{N}(Z_l \mid K\Theta_l, \sigma^2 I_{2N}) = (2\pi\sigma^2)^{-N} \exp\left\{ -\frac{1}{2} (Z_l - K\Theta_l)^T \sigma^{-2} I_{2N} (Z_l - K\Theta_l) \right\}, \qquad (6)$$
where
$$K = \begin{bmatrix} \Re(\Omega) & -\Im(\Omega) \\ \Im(\Omega) & \Re(\Omega) \end{bmatrix}, \quad Z_l = \begin{bmatrix} \Re(z_l) \\ \Im(z_l) \end{bmatrix}, \quad S = \begin{bmatrix} \Re(s) \\ \Im(s) \end{bmatrix}, \quad \Theta_l = \begin{bmatrix} \Re(\theta_l) \\ \Im(\theta_l) \end{bmatrix},$$
and $\Im(\cdot)$ is the imaginary part of a complex number.
The above notation represents a $2N$-dimensional Gaussian distribution over $Z_l$ with mean $K\Theta_l$ and covariance $\sigma^2 I_{2N}$. For simplicity, the implicit conditioning on the set of input vectors $S$ is omitted in Equation (6) and subsequent expressions during inference.
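For readers reproducing the real-composite representation, the following sketch (our own illustration, not the authors' code) builds $K$ and $Z_l$ from $\Omega$ and $z_l$ and verifies the equivalence numerically:

```python
import numpy as np

def realify(Omega, z):
    """Real composite form of the complex model z = Omega @ theta,
    i.e., the K and Z_l appearing in (6)."""
    K = np.block([[Omega.real, -Omega.imag],
                  [Omega.imag,  Omega.real]])   # 2N x 2M
    Z = np.concatenate([z.real, z.imag])        # 2N
    return K, Z

# Sanity check: K @ Theta reproduces the real/imaginary parts of Omega @ theta
rng = np.random.default_rng(0)
Omega = rng.normal(size=(5, 3)) + 1j * rng.normal(size=(5, 3))
theta = rng.normal(size=3) + 1j * rng.normal(size=3)
K, Z = realify(Omega, Omega @ theta)
Theta = np.concatenate([theta.real, theta.imag])
assert np.allclose(K @ Theta, Z)
```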
The prior distribution for the coefficients Θ l is modeled as a zero-mean Gaussian distribution:
$$p(\Theta_l \mid \alpha) = \mathcal{N}(\Theta_l \mid \mathbf{0}, A^{-1}),$$
where $\mathbf{0} = [0, \ldots, 0]^T \in \mathbb{R}^{2M \times 1}$, $A = \operatorname{diag}(\alpha)$, and $\alpha = [\alpha_1, \ldots, \alpha_{2M}]^T$. To enforce sparsity with equal sparsity patterns for the real and imaginary parts, we set $\alpha_l = \alpha_{M+l} > 0$ for $l = 1, 2, \ldots, M$. For convenience, we define $A_1 = \operatorname{diag}([\alpha_1, \ldots, \alpha_M]^T)$, so that $A = \begin{bmatrix} A_1 & O \\ O & A_1 \end{bmatrix}$, where $O$ is the zero matrix. Additionally, we assign Gamma distributions [24] as hyperpriors for $\alpha_m$ and the noise variance $\sigma^2$:
$$p(\alpha) = \prod_{m=1}^{2M} \Gamma(\alpha_m^{-1} \mid a, b), \quad p(\sigma^2) = \Gamma(\sigma^{-2} \mid c, d).$$
We set $a = b = c = d = 10^{-4}$ [26] to ensure flat priors.
By using Bayes’ rule, the posterior covariance and mean are given by
$$\Sigma = (\sigma^{-2} K^T K + A)^{-1},$$
and
$$U_l = \sigma^{-2} \Sigma K^T Z_l,$$
respectively. To facilitate subsequent inference, we introduce the notation $\mu_i^l = U_i^l + j U_{i+M}^l$, where $i = 1, 2, \ldots, M$. We define $\mu_l = (\mu_1^l, \mu_2^l, \ldots, \mu_M^l)^T$, leading to $U_l = \begin{bmatrix} \Re(\mu_l) \\ \Im(\mu_l) \end{bmatrix}$.
To infer the SCRVM, we use the augmented vectors $\underline{z}_l = \begin{bmatrix} z_l \\ z_l^* \end{bmatrix}$, $\underline{\theta}_l = \begin{bmatrix} \theta_l \\ \theta_l^* \end{bmatrix}$, $\underline{\mu}_l = \begin{bmatrix} \mu_l \\ \mu_l^* \end{bmatrix}$, and the augmented matrix $\underline{\Omega} = \begin{bmatrix} \Omega & O \\ O & \Omega^* \end{bmatrix}$ [45]. The composite augmented vectors and matrix satisfy the simple relations $\underline{z}_l = T Z_l$, $\underline{\theta}_l = T \Theta_l$, $\underline{\mu}_l = T U_l$, and $\underline{\Omega} = \frac{1}{2} T K T^H$, where $T = \begin{bmatrix} I & jI \\ I & -jI \end{bmatrix} \in \mathbb{C}^{2N \times 2N}$ and $T^H T = T T^H = 2I$. Hence, we have $Z_l = \frac{1}{2} T^H \underline{z}_l$, $\Theta_l = \frac{1}{2} T^H \underline{\theta}_l$, and $U_l = \frac{1}{2} T^H \underline{\mu}_l$. By using these simple transformations, we obtain the following:
$$p(\theta_l \mid z_l, \alpha, \sigma^2) = \frac{1}{\pi^M |\Sigma_{\theta_l \theta_l}|^{1/2}} \exp\left\{ -\frac{1}{2} (\underline{\theta}_l - \underline{\mu}_l)^H \Sigma_{\theta_l \theta_l}^{-1} (\underline{\theta}_l - \underline{\mu}_l) \right\}, \qquad (10)$$
where
$$\Sigma_{\theta_l \theta_l}^{-1} = \frac{1}{4} T \Sigma^{-1} T^H = \begin{bmatrix} \Lambda^{-1} & O \\ O & (\Lambda^*)^{-1} \end{bmatrix}, \quad |\Sigma_{\theta_l \theta_l}| = 2^{2M} |\Sigma|,$$
and
$$\underline{\mu}_l = T U_l = (2\sigma^2)^{-1} \Sigma_{\theta_l \theta_l} \underline{\Omega}^H \underline{z}_l = \begin{bmatrix} \mu_l \\ \mu_l^* \end{bmatrix},$$
where $\Lambda = \left( (2\sigma^2)^{-1} \Omega^H \Omega + \frac{A_1}{2} \right)^{-1}$ and $\mu_l = (2\sigma^2)^{-1} \Lambda \Omega^H z_l$.
To simplify the inference, we introduce the notation $c^2 = 2\sigma^2$ and $E = \frac{1}{2} A_1$ (with diagonal entries $b_i = \alpha_i / 2$), yielding the following:
$$\Lambda = (c^{-2} \Omega^H \Omega + E)^{-1}, \qquad (13)$$
and
$$\mu_l = c^{-2} \Lambda \Omega^H z_l. \qquad (14)$$
Therefore, the distribution in Equation (10) can also be rewritten as follows:
$$p(\theta_l \mid z_l, \alpha, \sigma^2) = \frac{1}{\pi^M |\Lambda|} \exp\left\{ -(\theta_l - \mu_l)^H \Lambda^{-1} (\theta_l - \mu_l) \right\}.$$
Then, we also have
$$p(z_l \mid \alpha, \sigma^2) = \pi^{-N} |D|^{-1} \exp\left\{ -z_l^H D^{-1} z_l \right\},$$
where $D = 2\sigma^2 I + 2\Omega A_1^{-1} \Omega^H = c^2 I + \Omega E^{-1} \Omega^H$ is a Hermitian positive definite matrix [46].
So, we can represent the log-likelihood of the MIMO LTI systems as follows:
$$\mathcal{L}(\alpha, \sigma^2) = \sum_{l=1}^{L} \log p(z_l \mid \alpha, \sigma^2) = -\left( L N \log \pi + L \log |D| + \sum_{l=1}^{L} z_l^H D^{-1} z_l \right).$$
Using the maximum likelihood method [24] to find its maximum, we obtain the following:
$$\alpha_i^{\mathrm{new}} = \frac{\gamma_i}{\frac{1}{L} \sum_{l=1}^{L} |\mu_i^l|^2}, \qquad (18)$$
and
$$(\sigma^2)^{\mathrm{new}} = \frac{1}{2N} \left( \frac{1}{L} \sum_{l=1}^{L} \| z_l - \Omega \mu_l \|^2 + \operatorname{Tr}(\Lambda \Omega^H \Omega) \right),$$
where $\gamma_i = 2 - \alpha_i \Lambda_{ii}$, $\mu_i^l$ is the $i$-th element of (14), and $\operatorname{Tr}(\cdot)$ is the trace of a matrix.
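For reference, these updates can be implemented directly with an explicit inverse; the following sketch (our own illustration, using the quantities as reconstructed above) is the $O(M^3)$ baseline that CoFML is designed to avoid:

```python
import numpy as np

def scrvm_step(Omega, Z, alpha, sigma2):
    """One explicit-covariance SCRVM update (reference baseline only).
    Omega: (N, M) complex dictionary; Z: (N, L) outputs z_1..z_L as columns."""
    N, M = Omega.shape
    c2 = 2.0 * sigma2
    G = Omega.conj().T @ Omega                          # Gram matrix
    Lam = np.linalg.inv(G / c2 + np.diag(alpha / 2.0))  # (13), costs O(M^3)
    Mu = Lam @ (Omega.conj().T @ Z) / c2                # (14); column l is mu_l
    gamma = 2.0 - alpha * np.real(np.diag(Lam))         # gamma_i = 2 - alpha_i * Lam_ii
    alpha_new = gamma / np.mean(np.abs(Mu) ** 2, axis=1)            # (18)
    resid = np.mean(np.sum(np.abs(Z - Omega @ Mu) ** 2, axis=0))
    sigma2_new = (resid + np.real(np.trace(Lam @ G))) / (2.0 * N)   # noise update
    return alpha_new, sigma2_new, Mu
```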

2.2. An Estimator for the Diagonal of the Covariance Matrix

We adopt the technique in [47] to estimate the diagonal components of Λ .
Proposition 1. 
Let $v_k = [v_{k,1}, v_{k,2}, \ldots, v_{k,M}]^T \in \mathbb{C}^{M \times 1}$ ($k = 1, 2, \ldots, 2M$) be random probe vectors whose elements are independent and identically distributed such that $\mathbb{E}\left[\sum_{k=1}^{2M} v_{k,i} \cdot v_{k,j}\right] = 0$ for all $i \neq j$, $i, j = 1, \ldots, M$. Moreover, assume that $\sum_{k=1}^{2M} v_{k,i} \cdot v_{k,j}$ is independent of $\sum_{k=1}^{2M} v_{k,i} \cdot v_{k,j'}$ for $j \neq j'$. For each $v_k$, let $r_k = \Lambda v_k$, where $\Lambda$ is given by (13) and $r_k = [r_{k,1}, \ldots, r_{k,M}]^T$. Consider the estimator $\tau \in \mathbb{R}^{M \times 1}$ defined, for each $i = 1, \ldots, M$, by
$$\tau_i = \frac{\sum_{k=1}^{2M} v_{k,i} \cdot r_{k,i}}{\sum_{k=1}^{2M} v_{k,i}^2}. \qquad (20)$$
Then, $\tau_i$ provides an unbiased estimate of $\Lambda_{ii}$.
Proof. 
The expected value of $\tau_i$ is given by the following:
$$\mathbb{E}[\tau_i] = \Lambda_{i,i} + \sum_{j \neq i} \Lambda_{i,j} \cdot \mathbb{E}\left[ \frac{\sum_{k=1}^{2M} v_{k,j} \cdot v_{k,i}}{\sum_{k=1}^{2M} v_{k,i}^2} \right].$$
Because $\mathbb{E}\left[\sum_{k=1}^{2M} v_{k,i} v_{k,j}\right] = 0$ for all $i \neq j$, we have $\mathbb{E}[\tau_i] = \Lambda_{i,i}$. The proof is completed.    □
In particular, we assume that $v_1, v_2, \ldots, v_{2M}$ have independent Rademacher entries, which means $P(v_{k,i} = 1) = P(v_{k,i} = -1) = 0.5$. Then, since $\sum_{k=1}^{2M} v_{k,i}^2 = 2M$, we can simplify (20) to the following form:
$$\tau_i = \frac{1}{2M} \sum_{k=1}^{2M} v_{k,i} \cdot r_{k,i}. \qquad (21)$$
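A minimal sketch (our own illustration) of the probe-based estimator (21): given any routine that applies $\Lambda$ to a set of vectors, the diagonal is estimated without ever forming $\Lambda$:

```python
import numpy as np

def estimate_diag(apply_Lam, M, n_probes, rng):
    """Unbiased diagonal estimate (21): tau = mean over probes of v_k * (Lam v_k),
    elementwise, with Rademacher probes v_k."""
    V = rng.choice([-1.0, 1.0], size=(M, n_probes))  # probes as columns
    R = apply_Lam(V)                                 # r_k = Lam v_k, all k at once
    return np.mean(V * R, axis=1)

# Quick check against a dense matrix (the paper uses n_probes = 2M)
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 50))
Lam = np.linalg.inv(A @ A.T + np.eye(50))
tau = estimate_diag(lambda V: Lam @ V, M=50, n_probes=100, rng=rng)
err = np.max(np.abs(tau - np.diag(Lam)))             # small estimation error
```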
Theorem 1. 
Let $\tau_i$ be the estimator given by Equation (21), satisfying the conditions of Proposition 1. Then, $\tau_i$ is an optimal estimator for $\Lambda_{ii}$.
Proof. 
By Proposition 1 and the calculation method of [44], which confirms $\mathbb{E}[\tau_i] = \Lambda_{i,i}$, we can calculate the variance $\varsigma_i^2$ of $\tau_i$ as follows:
$$\varsigma_i^2 = \mathbb{E}\left[ (\tau_i - \mathbb{E}[\tau_i])^2 \right] = \mathbb{E}\left[ \left( \sum_{i' \neq i} \Lambda_{i,i'} \cdot \frac{\sum_{k=1}^{2M} v_{k,i'} \cdot v_{k,i}}{\sum_{k=1}^{2M} v_{k,i}^2} \right)^2 \right] = \mathbb{E}\left[ \sum_{i' \neq i} \sum_{i'' \neq i} \Lambda_{i,i'} \cdot \Lambda_{i,i''} \cdot \frac{e_{i',i}}{e_{i,i}} \cdot \frac{e_{i'',i}}{e_{i,i}} \right],$$
where $e_{i,l} := \sum_{k=1}^{2M} v_{k,i} \cdot v_{k,l}$. In the numerator, due to the independence of $e_{i',i}$ and $e_{i'',i}$ when $i' \neq i''$, we observe that $\mathbb{E}[e_{i',i} \cdot e_{i'',i}] = \mathbb{E}[e_{i',i}] \, \mathbb{E}[e_{i'',i}] = 0$. Consequently, $\tau_i$ serves as an optimal estimator for $\Lambda_{i,i}$. This completes the proof.    □
Then, we transform the inversion problem into the problem of solving a linear system; that is, solving the linear equation $C y = b$, where $C := \Lambda^{-1}$ and $b := v_k$. Furthermore, we can solve these systems concurrently by considering the matrix equation $CY = B$, where the inputs $C \in \mathbb{C}^{M \times M}$ and $B \in \mathbb{C}^{M \times (2M + L)}$ are defined as follows:
$$C := c^{-2} \Omega^H \Omega + E, \quad B := [b_1, b_2, \ldots, b_{2M+L}] = \left[ v_1, v_2, \ldots, v_{2M}, \, c^{-2} \Omega^H z_1, \ldots, c^{-2} \Omega^H z_L \right].$$
By labeling the columns of the solution matrix $Y \in \mathbb{C}^{M \times (2M + L)}$ as
$$Y := [y_1, \ldots, y_{2M}, y_{2M+1}, \ldots, y_{2M+L}] = [r_1, \ldots, r_{2M}, \mu_1, \ldots, \mu_L],$$
our desired quantities for CoFML, $\mu_l$ and $\tau$, can be calculated using (21). Then, we perform the update in (18) as
$$\alpha_i^{\mathrm{new}} = \frac{\gamma_i}{\frac{1}{L} \sum_{l=1}^{L} |\mu_i^l|^2},$$
where $\gamma_i = 2 - \alpha_i \tau_i$.
Then, we choose the conjugate gradient (CG) algorithm to solve the multiple linear systems $CY = B$. This only requires converting the matrix–vector multiplications of the single-system CG algorithm into matrix–matrix multiplications.
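The following sketch (our own code, not the authors' implementation) shows CG vectorized over all right-hand sides at once; each column keeps its own step sizes, and the per-iteration work becomes a single matrix–matrix product:

```python
import numpy as np

def cg_multi(matvec, B, tol=1e-8, max_iter=None):
    """Conjugate gradient on C Y = B for all columns of B simultaneously
    (C Hermitian positive definite, accessed only through matvec)."""
    B = np.asarray(B, dtype=complex)
    Y = np.zeros_like(B)
    R = B.copy()                 # per-column residuals
    D = R.copy()                 # per-column search directions
    rs = np.sum(np.abs(R) ** 2, axis=0)
    tiny = np.finfo(float).tiny  # guards columns that have already converged
    for _ in range(max_iter or B.shape[0]):
        Q = matvec(D)            # the only C-multiplication: a matrix-matrix product
        a = rs / np.maximum(np.real(np.sum(np.conj(D) * Q, axis=0)), tiny)
        Y = Y + D * a
        R = R - Q * a
        rs_new = np.sum(np.abs(R) ** 2, axis=0)
        if np.max(rs_new) < tol ** 2:
            break
        D = R + D * (rs_new / np.maximum(rs, tiny))
        rs = rs_new
    return Y
```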
Lemma 1. 
Consider the CG algorithm applied to solve $C y_i = b_i$ for $i = 1, 2, \ldots, 2M + L$, where $C \in \mathbb{C}^{M \times M}$ is a positive definite matrix and $B \in \mathbb{C}^{M \times (2M + L)}$; here, $y_i$ and $b_i$ denote the $i$-th columns of $Y$ and $B$, respectively. Let $y_i^0 \in \mathbb{C}^{M \times 1}$ be the initial solution, $y_i^*$ the exact solution, and $y_i^k$ the solution obtained by the CG algorithm at the $k$-th step. We can establish the following relationship:
$$\| y_i^k - y_i^* \|_C \leq 2 \left( \frac{\sqrt{K} - 1}{\sqrt{K} + 1} \right)^k \| y_i^0 - y_i^* \|_C, \qquad (23)$$
where $\| x \|_C := \sqrt{x^H C x}$ denotes the norm induced by the positive definite matrix $C$ for any $x \in \mathbb{C}^{M \times 1}$, and $K = \operatorname{cond}_2(C)$ [48]. From (23), we observe that when $C$ is ill-conditioned ($K \gg 1$), the convergence of the CG algorithm tends to be slow. In SBL iterations, however, many entries of $\alpha$ are pushed towards infinity [26], resulting in a large value of $K$. To address this issue, we incorporate a preconditioning matrix into the CG algorithm, which is discussed in the next section.

3. Preconditioned Conjugate Gradient Method for SCRVM

3.1. Preconditioned Matrix

Here, we would rather solve the equivalent system $\tilde{C}\tilde{Y} = \tilde{B}$ than $CY = B$, where $\tilde{C} := P^{-1/2} C P^{-1/2}$, $\tilde{B} := P^{-1/2} B$, and $\tilde{Y} := P^{1/2} Y$. The matrix $P$ is called the preconditioning matrix. Before presenting the convergence proof of the preconditioned conjugate gradient method, we outline the parallel conjugate gradient (PCG) algorithm, as shown in Algorithm 1.
Algorithm 1  PCG($C$, $B$, $P$, $T$, $2M$)
1: Initialize $\alpha_i^{(1)} \leftarrow 1$ for $i = 1, \ldots, M$.
2: for $t = 1, 2, \ldots, T$ do
3:  Define $C \leftarrow c^{-2} \Omega^H \Omega + E$.
4:  Draw $v_1, v_2, \ldots, v_{2M} \sim$ Rademacher distribution.
5:  Define $B \leftarrow [v_1, v_2, \ldots, v_{2M}, c^{-2} \Omega^H z_1, c^{-2} \Omega^H z_2, \ldots, c^{-2} \Omega^H z_L]$.
6:  $\tilde{C} \leftarrow P^{-1/2} C P^{-1/2}$, $\tilde{B} \leftarrow P^{-1/2} B$.
7:  $\tilde{Y} \leftarrow \mathrm{CG}(\tilde{C}, \tilde{B})$.
8:  $Y \leftarrow P^{-1/2} \tilde{Y}$.
9:  Compute $\tau_i \leftarrow \frac{1}{2M} \sum_{k=1}^{2M} v_{k,i} \cdot r_{k,i}$ for $i = 1, \ldots, M$.
10: if $\alpha_i^{(t+1)} = \infty$ and $t < T$ then
11:  Delete the corresponding columns of $\Omega$.
12: else
13:  Compute $\gamma_i \leftarrow 2 - \alpha_i^{(t)} \tau_i$.
14:  Update $\alpha_i^{(t+1)} \leftarrow \gamma_i / \left( \frac{1}{L} \sum_{l=1}^{L} |\mu_i^l|^2 \right)$ for $i = 1, \ldots, M$.
15: end if
16: end for
17: return $\alpha^{(T)}$, $\mu_1, \mu_2, \ldots, \mu_L$, $\tau$.
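Putting the pieces together, one iteration of the loop above can be sketched as follows (our own illustration, reusing cg_multi and the diagonal estimator from the earlier sketches; the pruning test is left to the caller):

```python
import numpy as np

def cofml_step(Omega, Z, alpha, sigma2, rng):
    """One CoFML iteration (steps 3-14 of Algorithm 1); Lambda is never formed.
    Omega: (N, M) complex; Z: (N, L) outputs; alpha: (M,) current hyperparameters."""
    N, M = Omega.shape
    c2 = 2.0 * sigma2
    e = alpha / 2.0                                  # diagonal of E
    pis = 1.0 / np.sqrt(1.0 / c2 + e)                # diagonal of P^{-1/2}, P = c^{-2} I + E

    def matvec(X):                                   # applies P^{-1/2} C P^{-1/2}
        Xp = pis[:, None] * X
        CX = Omega.conj().T @ (Omega @ Xp) / c2 + e[:, None] * Xp
        return pis[:, None] * CX

    V = rng.choice([-1.0, 1.0], size=(M, 2 * M))     # Rademacher probes (step 4)
    B = np.concatenate([V, Omega.conj().T @ Z / c2], axis=1)   # (step 5)
    Yt = cg_multi(matvec, pis[:, None] * B)          # preconditioned solves (steps 6-7)
    Y = pis[:, None] * Yt                            # undo preconditioning (step 8)
    Rr, Mu = Y[:, :2 * M], Y[:, 2 * M:]              # probe solutions and posterior means
    tau = np.real(np.mean(V * Rr, axis=1))           # diagonal estimate (step 9)
    gamma = 2.0 - alpha * tau                        # (step 13)
    alpha_new = gamma / np.mean(np.abs(Mu) ** 2, axis=1)   # (step 14)
    return alpha_new, Mu, tau                        # caller prunes where alpha_new > 1e12
```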
However, in practice, the condition $\alpha_i^{(t+1)} = \infty$ in Algorithm 1 is never met exactly, so we usually replace it with $\alpha_i^{(t+1)} > 10^{12}$. We call the resulting Algorithm 1 CoFML. After running Algorithm 1, we can obtain the approximate model (2). In particular, when $n = 2$ and the poles $a_1, a_2$ are real, the $l$-th entry of the second-order approximating model is given by the following:
$$\hat{F}_{2,l}(s) = \frac{\mu_{1,l} \sqrt{2\Re(a_1)}}{s + a_1} + \frac{\mu_{2,l} \sqrt{2\Re(a_2)}}{s + a_2} = \frac{\left( \mu_{1,l} \sqrt{2\Re(a_1)} + \mu_{2,l} \sqrt{2\Re(a_2)} \right) s + \left( \mu_{1,l} a_2 \sqrt{2\Re(a_1)} + \mu_{2,l} a_1 \sqrt{2\Re(a_2)} \right)}{s^2 + (a_1 + a_2) s + a_1 a_2} = \frac{\left( \mu_{1,l} \sqrt{2 a_1} + \mu_{2,l} \sqrt{2 a_2} \right) s + \left( \mu_{1,l} a_2 \sqrt{2 a_1} + \mu_{2,l} a_1 \sqrt{2 a_2} \right)}{s^2 + (a_1 + a_2) s + a_1 a_2}.$$
To ensure that the $l$-th steady-state value of the reduced model equals that of the original system (1), we impose the following constraint:
$$\frac{\mu_{1,l} a_2 \sqrt{2 a_1} + \mu_{2,l} a_1 \sqrt{2 a_2}}{a_1 a_2} = \frac{B_{0,l}}{b_0}.$$
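In code, forming the second-order model and its steady-state value is immediate; the sketch below (our own, with hypothetical $\mu$ values for illustration) mirrors the two expressions above:

```python
import numpy as np

def reduced_second_order(mu1, mu2, a1, a2):
    """Numerator/denominator coefficients of the second-order model above,
    F2(s) = (n1 s + n0) / (s^2 + d1 s + d0), for real poles a1, a2."""
    n1 = mu1 * np.sqrt(2 * a1) + mu2 * np.sqrt(2 * a2)
    n0 = mu1 * a2 * np.sqrt(2 * a1) + mu2 * a1 * np.sqrt(2 * a2)
    return [n1, n0], [1.0, a1 + a2, a1 * a2]

# Hypothetical coefficients, for illustration only
num, den = reduced_second_order(mu1=1.8, mu2=0.9, a1=2.3846, a2=6.9538)
steady_state = num[1] / den[2]   # F2(0), to be matched with B_{0,l}/b_0
```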

3.2. Convergence of CoFML

When the values of $a, b, c, d$ are small and $t \to \infty$, it is common for some $\alpha$ values to approach infinity, effectively “pruning” the associated “nuisance” parameters, while the remaining $\alpha$ values stay finite [26]. We designate these retained parameters as the “true” parameters.
Definition 1 
(CoFML Convergence). In Algorithm 1, let $\hat{\alpha} := \lim_{t \to \infty} \alpha^{(t)}$. We say that $(\mathcal{N}, \mathcal{T}, \hat{\alpha})$-convergence holds for CoFML if the index set $N_M := \{1, 2, \ldots, M\}$ can be partitioned into a “nuisance” set $\mathcal{N} \subseteq N_M$ and a “true” set $\mathcal{T} := N_M \setminus \mathcal{N}$ such that $\hat{\alpha}_i$ is finite for $i \in \mathcal{T}$, while $\hat{\alpha}_i = \infty$ for $i \in \mathcal{N}$.
Here, we denote by $\Omega_{\mathcal{T}}$ and $\Omega_{\mathcal{N}}$ the matrices obtained by retaining only the “true” and “nuisance” columns of $\Omega$, respectively. By leveraging the expression for the inverse of a partitioned matrix, we can demonstrate the following relationship:
$$\Lambda = \left( c^{-2} \Omega^H \Omega + E \right)^{-1} = \left( c^{-2} \begin{bmatrix} \Omega_{\mathcal{T}}^H \\ \Omega_{\mathcal{N}}^H \end{bmatrix} \begin{bmatrix} \Omega_{\mathcal{T}} & \Omega_{\mathcal{N}} \end{bmatrix} + \begin{bmatrix} E_{\mathcal{T}} & O \\ O & E_{\mathcal{N}} \end{bmatrix} \right)^{-1} = \begin{bmatrix} c^{-2} \Omega_{\mathcal{T}}^H \Omega_{\mathcal{T}} + E_{\mathcal{T}} & c^{-2} \Omega_{\mathcal{T}}^H \Omega_{\mathcal{N}} \\ c^{-2} \Omega_{\mathcal{N}}^H \Omega_{\mathcal{T}} & c^{-2} \Omega_{\mathcal{N}}^H \Omega_{\mathcal{N}} + E_{\mathcal{N}} \end{bmatrix}^{-1} \xrightarrow{t \to \infty} \begin{bmatrix} \Lambda_{\mathcal{T}} & O \\ O & O \end{bmatrix}.$$
Building upon the derivation above, we can establish the convergence theorem for the preconditioned matrix.
Theorem 2 
(PCG Convergence). Let $C^{(t)} := c^{-2} \Omega^H \Omega + E^{(t)}$ and $P^{(t)} := c^{-2} I + E^{(t)}$, respectively, denote the inverse-covariance matrix and the preconditioning matrix at the $t$-th iteration of Algorithm 1. Let $\tilde{C}^{(t)} := (P^{(t)})^{-1/2} C^{(t)} (P^{(t)})^{-1/2}$ and $\tilde{b}^{(t)} := (P^{(t)})^{-1/2} b^{(t)}$, let $y^0 \in \mathbb{C}^{M \times 1}$ be the initial solution, $y^*$ the exact solution, and $y^{k,(t)}$ the solution obtained by the algorithm at the $k$-th step of the $t$-th iteration. Then, given $(\mathcal{N}, \mathcal{T}, \hat{\alpha})$-convergence, it follows that
$$\lim_{t \to \infty} \| y^{k,(t)} - y^* \|_{\hat{C}} \leq 2 \exp\left\{ -k \sqrt{\frac{1 - \eta}{1 + \eta}} \right\} \| y^0 - y^* \|_{\hat{C}}, \qquad (25)$$
where $\eta = \| \Omega_{\mathcal{T}}^H \Omega_{\mathcal{T}} - I \|_2$.
Proof. 
From Lemma 1, we have the following bound on the residual:
$$\| y^{k,(t)} - y^* \|_{\tilde{C}^{(t)}} \leq 2 \left( \frac{\sqrt{K^{(t)}} - 1}{\sqrt{K^{(t)}} + 1} \right)^k \| y^0 - y^* \|_{\tilde{C}^{(t)}} \leq 2 \left( 1 - \frac{1}{\sqrt{K^{(t)}}} \right)^k \| y^0 - y^* \|_{\tilde{C}^{(t)}} \leq 2 \exp\left\{ -\frac{k}{\sqrt{K^{(t)}}} \right\} \| y^0 - y^* \|_{\tilde{C}^{(t)}}, \qquad (26)$$
where $K^{(t)} = \lambda_{\max}(\tilde{C}^{(t)}) / \lambda_{\min}(\tilde{C}^{(t)})$ and $\hat{K} = \lim_{t \to \infty} K^{(t)}$. From (26), we see that our goal is to bound $\hat{K} = \lambda_{\max}(\hat{C}) / \lambda_{\min}(\hat{C})$.
Let Ψ : = c 2 Ω H Ω c 2 I . So, we obtain the following:
C ( t ) = P ( t ) + Ψ C ( t ) = I + P ( t ) 1 2 Ψ P ( t ) 1 2 C ^ : = lim t C ( t ) = I + P ^ T , T 1 2 Ψ T , T P ^ T , T 1 2 O O O ,
where P ^ T , T : = c 2 I T + E T .
Equation (27) shows that if $\lambda$ is an eigenvalue of $\hat{P}_{\mathcal{T},\mathcal{T}}^{-1/2} \Psi_{\mathcal{T},\mathcal{T}} \hat{P}_{\mathcal{T},\mathcal{T}}^{-1/2}$, then $1 + \lambda$ is an eigenvalue of $\hat{C}$. Recall that a matrix $M$ is similar to $N$ if there exists an invertible matrix $T$ such that $N = T^{-1} M T$, and that similar matrices have the same eigenvalues [49]. Taking $M := \hat{P}_{\mathcal{T},\mathcal{T}}^{-1/2} \Psi_{\mathcal{T},\mathcal{T}} \hat{P}_{\mathcal{T},\mathcal{T}}^{-1/2}$, $N := \hat{P}_{\mathcal{T},\mathcal{T}}^{-1} \Psi_{\mathcal{T},\mathcal{T}}$, and $T := \hat{P}_{\mathcal{T},\mathcal{T}}^{1/2}$, it follows that $\lambda$ is also an eigenvalue of $\hat{P}_{\mathcal{T},\mathcal{T}}^{-1} \Psi_{\mathcal{T},\mathcal{T}}$. Since the absolute value of any eigenvalue of a matrix does not exceed its spectral norm, we have
$$| \lambda | \leq \| \hat{P}_{\mathcal{T},\mathcal{T}}^{-1} \Psi_{\mathcal{T},\mathcal{T}} \|_2 = \| \hat{P}_{\mathcal{T},\mathcal{T}}^{-1} ( c^{-2} \Omega_{\mathcal{T}}^H \Omega_{\mathcal{T}} - c^{-2} I ) \|_2 \leq \| ( c^{-2} I )^{-1} ( c^{-2} \Omega_{\mathcal{T}}^H \Omega_{\mathcal{T}} - c^{-2} I ) \|_2 = \| \Omega_{\mathcal{T}}^H \Omega_{\mathcal{T}} - I \|_2. \qquad (28)$$
It follows that $\lambda_{\max}(\hat{C}) \leq 1 + \| \Omega_{\mathcal{T}}^H \Omega_{\mathcal{T}} - I \|_2$ and $\lambda_{\min}(\hat{C}) \geq 1 - \| \Omega_{\mathcal{T}}^H \Omega_{\mathcal{T}} - I \|_2$, so we have
$$\hat{K} = \frac{\lambda_{\max}(\hat{C})}{\lambda_{\min}(\hat{C})} \leq \frac{1 + \| \Omega_{\mathcal{T}}^H \Omega_{\mathcal{T}} - I \|_2}{1 - \| \Omega_{\mathcal{T}}^H \Omega_{\mathcal{T}} - I \|_2} = \frac{1 + \eta}{1 - \eta}.$$
Letting $t \to \infty$ on both sides of (26) and substituting (28) then yields (25). The proof is completed. □
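The bound of Theorem 2 is easy to check numerically on a synthetic “true” dictionary; in the following sketch (our own illustration, not from the paper), the columns are orthonormal directions with mildly uneven gains so that $\eta < 1$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, c2 = 60, 8, 2.0
# "True" columns: orthonormal directions with mildly uneven gains,
# so that eta = ||Omega_T^H Omega_T - I||_2 < 1 (required by the bound)
Q = np.linalg.qr(rng.normal(size=(N, M)) + 1j * rng.normal(size=(N, M)))[0]
Omega_T = Q @ np.diag(rng.uniform(0.8, 1.2, size=M))
alpha_T = rng.uniform(0.5, 2.0, size=M)

C = Omega_T.conj().T @ Omega_T / c2 + np.diag(alpha_T / 2.0)  # limit of C^(t)
p = 1.0 / c2 + alpha_T / 2.0                                  # diagonal of P
P_is = np.diag(1.0 / np.sqrt(p))
K_hat = np.linalg.cond(P_is @ C @ P_is)                       # condition number of C_hat
eta = np.linalg.norm(Omega_T.conj().T @ Omega_T - np.eye(M), 2)
assert K_hat <= (1 + eta) / (1 - eta) + 1e-9                  # bound of Theorem 2
```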

3.3. Computational Complexities of CoFML

After the convergence conditions above are satisfied, we analyze the algorithm's overall complexity. During each of the $T$ iterations of CoFML, at most $H$ ($H < M$) conjugate gradient (CG) steps are required, resulting in an overall time complexity of $O(TH)$ and a space complexity of $O(H)$ for each of the $L$ systems.

4. Examples

This section provides three examples to illustrate the algorithm described above. The $l$-th system impulse response is defined as follows:
$$h_l(t) = \frac{1}{2\pi j} \int_{Q - j\infty}^{Q + j\infty} e^{\zeta t} F_l(\zeta) \, d\zeta,$$
where $F_l(\zeta) \in H^2(\Pi)$ is a high-order transfer function, and $Q$ is chosen to be greater than the real part of the largest pole of $F_l(\zeta)$. The $l$-th system impulse response energy (IRE) is calculated as follows:
$$IRE_l = \int_0^{\infty} h_l^2(t) \, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} | F_l(j\omega) |^2 \, d\omega.$$
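In practice, the IRE can be approximated by truncating the frequency-domain integral; here is a sketch (our own, using SciPy's freqs and simpson, with an assumed truncation limit):

```python
import numpy as np
from scipy import signal, integrate

def ire(num, den, w_max=1e4, n_points=200001):
    """Impulse response energy (1/2pi) * int |F(jw)|^2 dw, with the integral
    truncated to [-w_max, w_max] and evaluated by Simpson's rule."""
    w = np.linspace(-w_max, w_max, n_points)
    _, F = signal.freqs(num, den, worN=w)
    return integrate.simpson(np.abs(F) ** 2, x=w) / (2.0 * np.pi)
```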
Then, we consider three well-known examples with $L = 1, 2, 4$, respectively.
Example 1. 
We first consider a SISO LTI system [16,50,51] as follows:
$$F_{10}(s) = \frac{540.70748 \times 10^{17}}{\prod_{l=1}^{10} (s + b_l)},$$
where $b_1 = 2.04$, $b_2 = 18.3$, $b_3 = 50.13$, $b_4 = 95.15$, $b_5 = 148.85$, $b_6 = 205.16$, $b_7 = 257.21$, $b_8 = 298.03$, $b_9 = 320.97$, and $b_{10} = 404.16$.
In this example, we use $N = 251$ frequency-domain measurements within the interval $[-5j, 5j]$ and $M = 14$ basis functions with poles within the interval $[0.1, 10]$. By applying the CoFML method, we obtain $a_1 = 2.3846$ and $a_2 = 6.9538$ and take the real part of $\mu$. Then, the 2nd partial sum is given by the following:
$$\hat{F}_2(s) = \frac{0.01053 s + 16.26}{s^2 + 9.338 s + 16.58}.$$
Figure 1 shows the step response comparison of the original system and the simplified model obtained by other methods, while Table 1 compares the IRE with different techniques. Figure 1 and Table 1 show that the CoFML method is effective.
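As an illustration of how the quantities in Figure 1 and Table 1 can be reproduced, the reduced model's IRE and step response follow from the ire helper above (our own verification sketch):

```python
from scipy import signal

# Reduced model from the CoFML run above
num, den = [0.01053, 16.26], [1.0, 9.338, 16.58]
energy = ire(num, den)                                # IRE, cf. Table 1

# Step response for a comparison plot like Figure 1
t, y = signal.step(signal.TransferFunction(num, den))
```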
Example 2. 
The next example, studied in [53], is a 4th-order system with two outputs, where
$$F_4(s) = \frac{\begin{bmatrix} 28 \\ 12 \end{bmatrix} s^3 + \begin{bmatrix} 496 \\ 528 \end{bmatrix} s^2 + \begin{bmatrix} 1800 \\ 1440 \end{bmatrix} s + \begin{bmatrix} 2400 \\ 4320 \end{bmatrix}}{2 s^4 + 36 s^3 + 204 s^2 + 360 s + 240}.$$
Here, we use $N = 334$ frequency-domain measurements in the interval $[-5j, 5j]$ and $M = 14$ basis functions with poles in the interval $[0.1, 10]$. Using the CoFML method, we obtain $a_1 = 0.8615$ and $a_2 = 1.6231$ and take the real part of $\mu$. Then, the 2nd partial sum is given by the following:
$$\hat{F}_2(s) = \frac{\begin{bmatrix} 12.3043 \\ 9.5249 \end{bmatrix} s + \begin{bmatrix} 14.4694 \\ 26.2957 \end{bmatrix}}{s^2 + 2.4846 s + 1.3983}.$$
Figure 2 shows the step responses of the original system and the other reduced models, while Table 2 compares the IREs of the different methods. Figure 2 and Table 2 show that the CoFML model is adequate.
Example 3. 
We finally consider the transfer function studied in [54]:
$$F_4(s) = \frac{\begin{bmatrix} 14.96 (s + 1.7)(s + 100) & 95150 (s + 1.898)(s + 10) \\ 85.20 (s + 1.44)(s + 100) & 124000 (s + 2.077)(s + 10) \end{bmatrix}}{(s + 1.338354)(s + 1.886647)(s + 10)(s + 100)}.$$
In this example, we use $N = 334$ frequency-domain measurements within the interval $[-5j, 5j]$ and $M = 8$ basis functions with poles within the interval $[0.1, 12]$. By applying the CoFML method, we obtain $a_1 = 1.8$ and $a_2 = 3.5$ and take the real part of $\mu$. With these values, the 2nd partial sum $\hat{F}_2(s)$ is given by:
$$\hat{F}_2(s) = \frac{\begin{bmatrix} 0.4517 s + 9.0010 & 724.0193 s + 6141.1035 \\ 3.6572 s + 46.0935 & 856.7064 s + 8562.8471 \end{bmatrix}}{s^2 + 7 s + 9.36}.$$
Figure 3 shows the step responses of the original system and the reduced models obtained using different methods, while Table 3 and Table 4 compare the reduced models and their IREs. From Figure 3, Table 3, and Table 4, we can observe that the CoFML method is effective.

5. Conclusions

In this paper, we developed the CoFML method to accelerate SCRVM. By solving the SCRVM inversion problem with unbiased estimates of the covariance matrix's diagonal elements, CoFML showed superior time and space efficiency compared to existing SCRVM techniques, especially when the number of unknowns M is large. We then theoretically analyzed the convergence of the CoFML algorithm, including its convergence under preconditioning. In addition, results on three well-known examples demonstrated that CoFML outperforms existing model reduction methods for MIMO systems, even with small data sets. Moreover, the applicability of CoFML extends to any scenario involving complex-valued sparse Bayesian methods where covariance computation is required. This versatility opens up opportunities for its use in fields such as compressed sensing, direction-of-arrival (DOA) estimation, multi-target tracking, and beyond.

Author Contributions

Methodology, W.X.; writing—original draft, J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Science Foundation of Shaoguan University (No. SY2021KJ11) and the Scientific Computing Research Innovation Team of Guangdong (No. 2021KCXTD052).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ogata, K. Discrete-Time Control Systems; Prentice-Hall, Inc.: Upper Saddle River, NJ, USA, 1995. [Google Scholar]
  2. d’Azzo, J.J.; Houpis, C.D. Linear Control System Analysis and Design: Conventional and Modern; McGraw-Hill Higher Education: New York, NY, USA, 1995. [Google Scholar]
  3. Khalil, I.S.; Doyle, J.C.; Glover, K. Robust and Optimal Control; Prentice Hall: Upper Saddle River, NJ, USA, 1996; Volume 2. [Google Scholar]
  4. Goodwin, G.C.; Graebe, S.F.; Salgado, M.E. Control System Design; Prentice Hall: Upper Saddle River, NJ, USA, 2001; Volume 240. [Google Scholar]
  5. Lathi, B.P.; Green, R.A. Linear Systems and Signals; Oxford University Press: New York, NY, USA, 2005; Volume 2. [Google Scholar]
  6. Albertos, P.; Antonio, S. Multivariable Control Systems: An Engineering Approach; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
  7. Geromel, J.; Kawaoka, F.; Egas, R. Model reduction of discrete time systems through linear matrix inequalities. Int. J. Control 2004, 77, 978–984. [Google Scholar] [CrossRef]
  8. Mittal, A.; Prasad, R.; Sharma, S. Reduction of linear dynamics systems using an error minimization technique. J.-Inst. Eng. India Part Electr. Eng. Div. 2004, 84, 201–206. [Google Scholar]
  9. Sandberg, H.; Lanzon, A.; Anderson, B.D. Model approximation using magnitude and phase criteria: Implications for model reduction and system identification. Int. J. Robust Nonlinear Control-IFAC-Affil. J. 2007, 17, 435–461. [Google Scholar] [CrossRef]
  10. Gugercin, S.; Sorensen, D.; Antoulas, A. A modified low-rank Smith method for large-scale Lyapunov equations. Numer. Algorithms 2003, 32, 27–55. [Google Scholar] [CrossRef]
  11. Penzl, T. Algorithms for model reduction of large dynamical systems. Linear Algebra Its Appl. 2006, 415, 322–343. [Google Scholar] [CrossRef]
  12. Gugercin, S.; Antoulas, A.C.; Beattie, C. H2 model reduction for large-scale linear dynamical systems. SIAM J. Matrix Anal. Appl. 2008, 30, 609–638. [Google Scholar] [CrossRef]
  13. Magruder, C.; Beattie, C.; Gugercin, S. Rational Krylov methods for optimal L2 model reduction. In Proceedings of the 49th IEEE Conference on Decision and Control (CDC), Atlanta, GA, USA, 15–17 December 2010; pp. 6797–6802. [Google Scholar]
  14. Mi, W.; Qian, T.; Wan, F. A fast adaptive model reduction method based on Takenaka–Malmquist systems. Syst. Control Lett. 2012, 61, 223–230. [Google Scholar] [CrossRef]
  15. Freund, R.W. Padé–Type Model Reduction of Second-Order and Higher-Order Linear Dynamical Systems. In Dimension Reduction of Large-Scale Systems: Proceedings of the Workshop held in Oberwolfach, Germany, 19–25 October 2003; Benner, P., Sorensen, D.C., Mehrmann, V., Eds.; Lecture Notes in Computational Science and Engineering; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
  16. Parmar, G.; Mukherjee, S.; Prasad, R. System reduction using eigen spectrum analysis and Padé approximation technique. Int. J. Comput. Math. 2007, 84, 1871–1880. [Google Scholar] [CrossRef]
  17. Walsh, J.L. Interpolation and Approximation by Rational Functions in the Complex Domain; American Mathematical Soc.: Providence, RI, USA, 1935; Volume 20. [Google Scholar]
  18. Heuberger, P.S.C.; Hof, P.M.J.V.D.; Bosgra, O.H. A generalized orthonormal basis for linear dynamical systems. In Proceedings of the 32nd IEEE Conference on Decision and Control, New Orleans, LA, USA, 13–15 December 1995. [Google Scholar]
  19. Ward, N.F.D.; Partington, J.R. Rational wavelet decompositions of transfer functions in hardy-sobolev classes. Math. Control Signals Syst. 1995, 8, 257–278. [Google Scholar] [CrossRef]
  20. Akçay, H.; Ninness, B. Orthonormal basis functions for modelling continuous-time systems. Signal Process. 1999, 77, 261–274. [Google Scholar] [CrossRef]
  21. Akçay, H.; Heuberger, P. A frequency-domain iterative identification algorithm using general orthonormal basis functions. Automatica 2001, 37, 663–674. [Google Scholar] [CrossRef]
  22. Hof, P.V.D.; Wahlberg, B.; Heuberger, P.; Ninness, B.; Bokor, J.; e Silva, T.O. Modelling and Identification with Rational Orthogonal Basis Functions. IFAC Proc. Vol. 2000, 33, 445–455. [Google Scholar]
  23. Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4. [Google Scholar]
  24. Berger, J.O. Statistical Decision Theory and Bayesian Analysis; Springer: Berlin/Heidelberg, Germany, 1985. [Google Scholar]
  25. Luo, J.; Vong, C.M.; Wong, P.K. Sparse Bayesian extreme learning machine for multi-classification. IEEE Trans. Neural Netw. Learn. Syst. 2013, 25, 836–843. [Google Scholar]
  26. Tipping, M.E. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 2001, 1, 211–244. [Google Scholar]
  27. Demir, B.; Erturk, S. Hyperspectral image classification using relevance vector machines. IEEE Geosci. Remote Sens. Lett. 2007, 4, 586–590. [Google Scholar] [CrossRef]
  28. Mianji, F.A.; Zhang, Y. Robust hyperspectral classification using relevance vector machine. IEEE Trans. Geosci. Remote Sens. 2011, 49, 2100–2112. [Google Scholar] [CrossRef]
  29. Liu, X.; Chen, X.; Li, J.; Zhou, X.; Chen, Y. Facies identification based on multikernel relevance vector machine. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7269–7282. [Google Scholar] [CrossRef]
  30. Ospina-Acero, D.; Marashdeh, Q.M.; Teixeira, F.L. Relevance vector machine image reconstruction algorithm for electrical capacitance tomography with explicit uncertainty estimates. IEEE Sens. J. 2020, 20, 4925–4939. [Google Scholar] [CrossRef]
  31. Zhang, J.; Qiu, T.; Luan, S. An efficient real-valued sparse Bayesian learning for non-circular signal’s DOA estimation in the presence of impulsive noise. Digit. Signal Process. 2020, 106, 102838. [Google Scholar] [CrossRef]
  32. Dai, J.; So, H.C. Real-valued sparse Bayesian learning for DOA estimation with arbitrary linear arrays. IEEE Trans. Signal Process. 2021, 69, 4977–4990. [Google Scholar] [CrossRef]
  33. Lu, J.; Yang, Y.; Yang, L. An efficient off-grid direction-of-arrival estimation method based on inverse-free sparse Bayesian learning. Appl. Acoust. 2023, 211, 109521. [Google Scholar] [CrossRef]
  34. Ji, S.; Dunson, D.; Carin, L. Multitask compressive sensing. IEEE Trans. Signal Process. 2008, 57, 92–106. [Google Scholar] [CrossRef]
  35. Cheng, H.; Chen, H.; Jiang, G.; Yoshihira, K. Nonlinear feature selection by relevance feature vector machine. In Proceedings of the International Workshop on Machine Learning and Data Mining in Pattern Recognition, Leipzig, Germany, 18–20 July 2007; Springer: Berlin/Heidelberg, Germany, 2007; pp. 144–159. [Google Scholar]
  36. Wipf, D.P.; Rao, B.D. Bayesian learning for sparse signal reconstruction. In Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’03), Hong Kong, 6–10 April 2003; Volume 6, p. VI–601. [Google Scholar]
  37. Huang, K.; Aviyente, S. Sparse representation for signal classification. Adv. Neural Inf. Process. Syst. 2006, 19. [Google Scholar] [CrossRef]
  38. Zhang, Z.; Rao, B.D. Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning. IEEE J. Sel. Top. Signal Process. 2011, 5, 912–926. [Google Scholar] [CrossRef]
  39. Bilgic, B.; Goyal, V.K.; Adalsteinsson, E. Multi-contrast reconstruction with Bayesian compressed sensing. Magn. Reson. Med. 2011, 66, 1601–1615. [Google Scholar] [CrossRef]
  40. Hossain, A.; Nasser, M. Recurrent support and relevance vector machines based model with application to forecasting volatility of financial returns. J. Intell. Learn. Syst. Appl. 2011, 3, 230. [Google Scholar] [CrossRef]
  41. Wipf, D.; Nagarajan, S. Iterative Reweighted ℓ1 and ℓ2 Methods for Finding Sparse Solutions. IEEE Trans. Signal Process. 2010, 4, 317–329. [Google Scholar]
  42. Fang, J.; Zhang, L.; Li, H. Two-Dimensional Pattern-Coupled Sparse Bayesian Learning via Generalized Approximate Message Passing. IEEE Trans. Image Process. 2016, 25, 2920–2930. [Google Scholar] [CrossRef]
  43. Duan, H.; Yang, L.; Fang, J.; Li, H. Fast inverse-free sparse Bayesian learning via relaxed evidence lower bound maximization. IEEE Signal Process. Lett. 2017, 24, 774–778. [Google Scholar] [CrossRef]
  44. Lin, A.; Song, A.H.; Bilgic, B.; Ba, D. Covariance-free sparse Bayesian learning. IEEE Trans. Signal Process. 2022, 70, 3818–3831. [Google Scholar] [CrossRef]
  45. Boloix-Tortosa, R.; Murillo-Fuentes, J.J.; Velázquez, I.S.; Pérez-Cruz, F. Complex-Valued Kernel Methods for Regression. arXiv 2016, arXiv:1610.09915. [Google Scholar]
  46. Schreier, P.J.; Scharf, L.L. Statistical Signal Processing of Complex-Valued Data: The Theory of Improper and Noncircular Signals; Cambridge University Press: Cambridge, UK, 2010. [Google Scholar]
  47. Bekas, C.; Kokiopoulou, E.; Saad, Y. An estimator for the diagonal of a matrix. Appl. Numer. Math. 2007, 57, 1214–1229. [Google Scholar] [CrossRef]
  48. Young, D.M. Iterative Solution of Large Linear Systems; Elsevier: Amsterdam, The Netherlands, 2014. [Google Scholar]
  49. Strang, G. Linear Algebra and Its Applications; Pearson Education India: Bangalore, India, 2012. [Google Scholar]
  50. Mukherjee, S. Order Reduction of Linear Systems using Eigenspectrum Analysis. J. Inst. Eng. (India) Electr. Eng. Div. 1996, 77, 76–79. [Google Scholar]
  51. Therapos, C.P.; Diamessis, J.E. A new method for linear system reduction. J. Frankl. Inst. 1984, 317, 359–371. [Google Scholar] [CrossRef]
  52. Edgar, T.F. Least squares model reduction using step response. Int. J. Control 1975, 22, 261–270. [Google Scholar] [CrossRef]
  53. Hutton, M.; Friedland, B. Routh approximations for reducing order of linear, time-invariant systems. IEEE Trans. Autom. Control 1975, 20, 329–337. [Google Scholar] [CrossRef]
  54. Shamash, Y. Linear system reduction using Pade approximation to allow retention of dominant modes. Int. J. Control 1975, 21, 257–272. [Google Scholar] [CrossRef]
Figure 1. Step responses of the original and reduced models.
Figure 2. Step responses of the original and reduced models.
Figure 3. Step responses of the original and reduced models.
Table 1. IRE of the reduced models.

| Method | Reduced Model | IRE |
|---|---|---|
| Original system | $F_{10}(s)$ | 0.1503 |
| CoFML | $\hat{F}_2(s)$ | 0.1422 |
| AFD [14] | $\frac{0.5367 s + 20.96}{s^2 + 11.86 s + 20.97}$ | 0.1498 |
| G. Parmar et al. [16] | $\frac{28.367 s + 647.60193}{s^2 + 359.999 s + 647.60193}$ | 8.2056 |
| Edgar [52] | $\frac{0.93 s + 26.28}{s^2 + 14.92 s + 26.4961}$ | 0.1580 |
| Therapos and Diamessis [51] | $\frac{1.999638 s + 37.32915}{s^2 + 20.34 s + 37.332}$ | 0.2027 |
Table 2. IRE of the reduced models.

| Method | Reduced Model | IRE$_1$ | IRE$_2$ |
|---|---|---|---|
| Original system | $F_4(s)$ | 11.7242 | 20.5297 |
| CoFML | $\hat{F}_2(s)$ | 10.8311 | 19.4217 |
| Routh [53] | $\frac{\begin{bmatrix} 30 \\ 24 \end{bmatrix} s + \begin{bmatrix} 40 \\ 72 \end{bmatrix}}{3 s^2 + 6 s + 4}$ | 10.2140 | 20.6014 |
| S. Gugercin et al. [10,11] | $\frac{\begin{bmatrix} 0.7108 \\ 0.9026 \end{bmatrix} s^2 + \begin{bmatrix} 10.4003 \\ 6.4600 \end{bmatrix} s + \begin{bmatrix} 15.9178 \\ 28.6521 \end{bmatrix}}{s^2 + 2.2218 s + 1.5918}$ | 8.4488 | 17.9249 |
Table 3. Reduced models obtained by different methods.

| Model Reduction Method | Reduced Model |
|---|---|
| Original system | $F_4(s)$ |
| CoFML | $\hat{F}_2(s)$ |
| Padé [54] | $\frac{\begin{bmatrix} 2.5095 + 1.2406 s & 1782.9822 + 931.9822 s \\ 12.1064 + 7.2706 s & 2492.4439 + 1213.9036 s \end{bmatrix}}{s^2 + 3.1976 s + 2.4916}$ |
| S. Gugercin et al. [10,11] | $\frac{\begin{bmatrix} 0.241 s^2 + 0.7760 s + 2.8787 & 8.9938 s^2 + 932.5588 s + 2044.2005 \\ 0.6891 s^2 + 7.7785 s + 13.8874 & 10.4496 s^2 + 1202.2714 s + 2915.2559 \end{bmatrix}}{s^2 + 3.4801 s + 2.8581}$ |
Table 4. IRE of the reduced models.

| Method | IRE$_1$ | IRE$_2$ | IRE$_3$ | IRE$_4$ |
|---|---|---|---|---|
| Original system | 0.1293 | 3.4819 | $6.5224 \times 10^4$ | $1.2412 \times 10^5$ |
| CoFML | 0.1280 | 3.5463 | $6.9290 \times 10^4$ | $1.2854 \times 10^5$ |
| Padé [54] | 0.1348 | 3.7616 | $7.1415 \times 10^4$ | $1.3136 \times 10^5$ |
| S. Gugercin et al. [10,11] | 0.2164 | 6.0371 | $7.4435 \times 10^4$ | $1.3888 \times 10^5$ |