Article

Spectrum-Adapted Polynomial Approximation for Matrix Functions with Applications in Graph Signal Processing

1 Institute for Computational & Mathematical Engineering, Stanford University, Stanford, CA 94305, USA
2 Department of Mathematics, Statistics, and Computer Science, Macalester College, St. Paul, MN 55105, USA
3 IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
4 Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
* Author to whom correspondence should be addressed.
Algorithms 2020, 13(11), 295; https://doi.org/10.3390/a13110295
Submission received: 16 October 2020 / Revised: 16 October 2020 / Accepted: 29 October 2020 / Published: 13 November 2020
(This article belongs to the Special Issue Efficient Graph Algorithms in Machine Learning)

Abstract: We propose and investigate two new methods to approximate $f(\mathbf{A})\mathbf{b}$ for large, sparse, Hermitian matrices $\mathbf{A}$. Computations of this form play an important role in numerous signal processing and machine learning tasks. The main idea behind both methods is to first estimate the spectral density of $\mathbf{A}$, and then find polynomials of a fixed order that better approximate the function f on areas of the spectrum with a higher density of eigenvalues. Compared to state-of-the-art methods such as the Lanczos method and truncated Chebyshev expansion, the proposed methods tend to provide more accurate approximations of $f(\mathbf{A})\mathbf{b}$ at lower polynomial orders, and for matrices $\mathbf{A}$ with a large number of distinct interior eigenvalues and a small spectral width. We also explore the application of these techniques to (i) fast estimation of the norms of localized graph spectral filter dictionary atoms, and (ii) fast filtering of time-vertex signals.

1. Introduction

Efficiently computing $f(\mathbf{A})\mathbf{b}$, a function of a large, sparse Hermitian matrix times a vector, is an important component in numerous signal processing, machine learning, applied mathematics, and computer science tasks. Application examples include graph-based semi-supervised learning methods [1,2,3]; graph spectral filtering in graph signal processing [4]; convolutional neural networks/deep learning [5,6]; clustering [7,8]; approximating the spectral density of a large matrix [9]; estimating the numerical rank of a matrix [10,11]; approximating spectral sums such as the log-determinant of a matrix [12] or the trace of a matrix inverse for applications in physics, biology, information theory, and other disciplines [13]; solving semidefinite programs [14]; simulating random walks [15] (Chapter 8); and solving ordinary and partial differential equations [16,17,18].
References [19] (Chapter 13), [20,21,22] survey different approaches to this well-studied problem of efficiently computing

$$ f(\mathbf{A})\mathbf{b} := \mathbf{V} f(\boldsymbol{\Lambda}) \mathbf{V}^\top \mathbf{b}, \tag{1} $$

where the columns of $\mathbf{V}$ are the eigenvectors of the Hermitian matrix $\mathbf{A} \in \mathbb{R}^{N \times N}$; $\boldsymbol{\Lambda}$ is a diagonal matrix whose diagonal elements are the corresponding eigenvalues of $\mathbf{A}$, which we denote by $\lambda_1, \lambda_2, \ldots, \lambda_N$; and $f(\boldsymbol{\Lambda})$ is a diagonal matrix whose kth diagonal entry is given by $f(\lambda_k)$. For large matrices, it is not practical to explicitly compute the eigenvalues of $\mathbf{A}$ in order to approximate (1). Rather, the most common techniques, all of which avoid a full eigendecomposition of $\mathbf{A}$, include (i) truncated orthogonal polynomial expansions, including Chebyshev [23,24,25] and Jacobi; (ii) rational approximations [21] (Section 3.4); (iii) Krylov subspace methods such as the Lanczos method [23,26,27,28,29]; and (iv) quadrature/contour integral methods [19] (Section 13.3).
Our focus in this work is on polynomial approximation methods. Let $p_K(\lambda) = c_0 + \sum_{k=1}^{K} c_k \lambda^k$ be a degree K polynomial approximation to the function f on a known interval $[\underline{\lambda}, \overline{\lambda}]$ containing all of the eigenvalues of $\mathbf{A}$. Then the approximation $p_K(\mathbf{A})\mathbf{b}$ can be computed recursively, either through a three-term recurrence for specific types of polynomials (see Section 3 for more details), or through a nested multiplication iteration [30] (Section 9.2.4), letting $\mathbf{x}^{(0)} = c_K \mathbf{b}$, and then iterating

$$ \mathbf{x}^{(l)} = c_{K-l}\, \mathbf{b} + \mathbf{A}\, \mathbf{x}^{(l-1)}, \quad l = 1, 2, \ldots, K. \tag{2} $$
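For concreteness, the iteration can be implemented in a few lines. The following is a minimal Python sketch (the reproducible code accompanying this paper is in MATLAB; the names and interfaces here are illustrative), with the coefficients ordered leading-first so that coeffs[0] = c_K as in (2):

```python
import numpy as np
import scipy.sparse as sp

def apply_poly(A, b, coeffs):
    """Compute p_K(A)b for p_K(x) = c_0 + c_1 x + ... + c_K x^K, with
    coefficients ordered leading-first (coeffs[0] = c_K), via (2)."""
    x = coeffs[0] * b                 # x^(0) = c_K b
    for c in coeffs[1:]:
        x = c * b + A @ x             # x^(l) = c_{K-l} b + A x^(l-1)
    return x

# Example: a degree-3 polynomial applied to a sparse symmetric matrix.
A = sp.random(100, 100, density=0.05, format="csr")
A = 0.5 * (A + A.T)                   # symmetrize
b = np.random.randn(100)
coeffs = [-0.125, 0.25, -0.5, 1.0]    # c_3, c_2, c_1, c_0
print(apply_poly(A, b, coeffs)[:5])
```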
The computational cost of either of these approaches is dominated by multiplying the sparse matrix $\mathbf{A}$ by K different vectors. The approximation error is bounded by

$$ \| f(\mathbf{A}) - p_K(\mathbf{A}) \|_2 = \max_{\ell = 1, 2, \ldots, N} \left| f(\lambda_\ell) - p_K(\lambda_\ell) \right| \tag{3} $$
$$ \leq \sup_{\lambda \in [\underline{\lambda}, \overline{\lambda}]} \left| f(\lambda) - p_K(\lambda) \right|. \tag{4} $$

If, for example, $p_K$ is a degree K truncated Chebyshev series approximation of an analytic function f, the upper bound in (4) converges geometrically to 0 as K increases, at a rate of $O(\rho^{-K})$, where $\rho > 1$ is the radius of an open Bernstein ellipse on which f is analytic and bounded (see, e.g., [31] (Theorem 5.16), [32] (Theorem 8.2)).
In addition to the computational efficiency and convergence guarantees, a third advantage of polynomial approximation methods is that they can be implemented in a distributed setting [33]. A fourth advantage is that the ith element of $p_K(\mathbf{A})\mathbf{b}$ only depends on the elements of $\mathbf{b}$ within K hops of vertex i on the graph associated with $\mathbf{A}$ (e.g., if $\mathbf{A}$ is a graph Laplacian matrix, a nonzero entry in the (i, j)th element of $\mathbf{A}$, where $i \neq j$, corresponds to an edge connecting vertices i and j in the graph). This localization property is important in many graph-based data analysis applications, such as graph spectral filtering [34] and deep learning [5]. Finally, as opposed to other methods that incorporate prior knowledge about $\mathbf{b}$ into the choice of the approximating polynomial (e.g., [35] considers vectors $\mathbf{b}$ from a zero-mean distribution with a known covariance matrix), the polynomial approximations resulting from the methods we propose do not depend on $\mathbf{b}$ or any information about $\mathbf{b}$. Thus, in applications where the computation of $f(\mathbf{A})\mathbf{b}$ is repeated for many different vectors $\mathbf{b}$ with the same f and $\mathbf{A}$, the polynomial coefficients only need to be computed a single time.
While the classical truncated orthogonal polynomial expansion methods (e.g., Chebyshev, Legendre, Jacobi) aim to approximate the function f throughout the full interval $[\underline{\lambda}, \overline{\lambda}]$, it is only the polynomial approximation error at the eigenvalues of $\mathbf{A}$ that affects the overall error in (3). With knowledge of the complete set of eigenvalues, we could do better, for example, by fitting a degree K polynomial via the discrete least squares problem $\min_{p \in \mathcal{P}_K} \sum_{\ell=1}^{N} \left[ f(\lambda_\ell) - p(\lambda_\ell) \right]^2$. In Figure 1, we show an example of such a discrete least squares fitting. The resulting approximation error $\| f(\mathbf{A}) - p_K(\mathbf{A}) \|_2$ for K = 5 is 0.020, as opposed to 0.347 for the degree 5 truncated Chebyshev approximation. This is despite the fact that $\sup_{\lambda \in [\underline{\lambda}, \overline{\lambda}]} | f(\lambda) - p_K(\lambda) |$ is equal to 0.650 for the discrete least squares approximation, as opposed to 0.347 for the Chebyshev approximation.
While in our setting we do not have access to the complete set of eigenvalues, our approach in this work is to leverage recent developments in efficiently estimating the spectral density of the matrix $\mathbf{A}$ to adapt the polynomial to the spectrum, in order to achieve better approximation accuracy at the (unknown) eigenvalues. After reviewing spectral density estimation in the next section, we present two new classes of spectrum-adapted approximation techniques in Section 3. In Section 4, we perform numerical experiments, approximating $f(\mathbf{A})\mathbf{b}$ for different matrices $\mathbf{A}$ and functions f, and discuss the situations in which the proposed methods work better than the state-of-the-art methods. In Section 5 and Section 6, we explore the application of the proposed technique to fast estimation of the norms of localized graph spectral filter dictionary atoms and fast filtering of time-vertex signals, respectively.

2. Spectral Density Estimation

The cumulative spectral density function or empirical spectral cumulative distribution of the matrix $\mathbf{A}$ is defined as

$$ P_\lambda(z) := \frac{1}{N} \sum_{\ell=1}^{N} \mathbb{1}\{\lambda_\ell \leq z\}, \tag{5} $$

where $\mathbb{1}\{C\} = 1$ if statement C is true and $\mathbb{1}\{C\} = 0$ otherwise. The spectral density function [36] (Chapter 6) (also called the Density of States or empirical spectral distribution [37] (Chapter 2.4)) of $\mathbf{A}$ is the probability measure defined as

$$ p_\lambda(z) := \frac{1}{N} \sum_{\ell=1}^{N} \mathbb{1}\{\lambda_\ell = z\}. $$
Lin et al. [9] provide an overview of methods to approximate these functions. In this work, we use a variant of the Kernel Polynomial Method (KPM) [38,39,40] described in [9,41] to estimate the cumulative spectral density function $P_\lambda(z)$ of $\mathbf{A}$. Namely, for each of S linearly spaced points $\{\xi_i\}_{i=1}^{S}$ between $\underline{\lambda}$ and $\overline{\lambda}$, we estimate the number of eigenvalues less than or equal to $\xi_i$ via stochastic trace estimation [42,43]. Let $\mathbf{x}$ denote a Gaussian random vector with distribution $\mathcal{N}(\mathbf{0}, \mathbf{I})$, $\{\mathbf{x}^{(j)}\}_{j=1}^{J}$ denote a sample of size J from this distribution, and $\tilde{\Theta}_{\xi_i}$ denote a Jackson–Chebyshev polynomial approximation to $\Theta_{\xi_i}(\lambda) := \mathbb{1}\{\lambda \leq \xi_i\}$ [44,45]. The stochastic trace estimate of the number of eigenvalues less than or equal to $\xi_i$ is then given by

$$ \eta_i = \mathrm{tr}\big( \Theta_{\xi_i}(\mathbf{A}) \big) = \mathbb{E}\big[ \mathbf{x}^\top \Theta_{\xi_i}(\mathbf{A})\, \mathbf{x} \big] \approx \frac{1}{J} \sum_{j=1}^{J} \mathbf{x}^{(j)\top}\, \tilde{\Theta}_{\xi_i}(\mathbf{A})\, \mathbf{x}^{(j)}. \tag{6} $$
As in [46], we then form an approximation $\tilde{P}_\lambda(z)$ to $P_\lambda(z)$ by performing monotonic piecewise cubic interpolation [47] on the series of points $\big\{ \big( \xi_i, \frac{\eta_i}{N} \big) \big\}_{i=1,2,\ldots,S}$. Analytically differentiating $\tilde{P}_\lambda(z)$ yields an approximation $\tilde{p}_\lambda(z)$ to the spectral density function $p_\lambda(z)$. Since $\tilde{P}_\lambda(z)$ is a monotonic cubic spline, we can also analytically compute its inverse function $\tilde{P}_\lambda^{-1}(y)$. The spectrum-adapted methods we propose in Section 3 utilize both $\tilde{p}_\lambda(z)$ and $\tilde{P}_\lambda^{-1}(y)$ to focus on regions of the spectrum with higher eigenvalue density when generating polynomial approximations. Figure 2 shows examples of the estimated cumulative spectral density functions for eight real, symmetric matrices $\mathbf{A}$: the graph Laplacians of the Erdős–Rényi graph (gnp) from Figure 1, the Minnesota traffic network [48] (N = 2642), and the Stanford bunny graph [49] (N = 2503); the normalized graph Laplacian of a random sensor network (N = 5000) from [50]; and the net25 (N = 9520), si2 (N = 769), cage9 (N = 3534), and saylr4 (N = 3564) matrices from the SuiteSparse Matrix Collection [51]. (We use $\frac{\mathbf{A} + \mathbf{A}^\top}{2}$ for cage9, and for net25 and saylr4, we generate graph Laplacians based on the off-diagonal elements of $\mathbf{A}$. For saylr4, we scale the entire Laplacian by a factor of $\frac{1}{2000}$.)
The computational complexity of forming the estimate $\tilde{P}_\lambda(z)$ is $O(M J K_\Theta)$, where M is the number of nonzero entries in $\mathbf{A}$, J is the number of random vectors in (6) (in our experiments, J = 10 suffices), and $K_\Theta$ is the degree of the Jackson–Chebyshev polynomial approximations $\tilde{\Theta}_{\xi_i}$ [41]. While this cost is non-negligible if computing $f(\mathbf{A})\mathbf{b}$ for a single f and a single $\mathbf{b}$, it only needs to be incurred once for each $\mathbf{A}$ if repeating this calculation for multiple functions f or multiple vectors $\mathbf{b}$, as is often the case in the applications mentioned above.
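The following Python sketch illustrates this estimation pipeline under stated assumptions (it is not the paper's MATLAB implementation): the step function $\Theta_{\xi_i}$ is approximated by a least squares Chebyshev fit damped with Jackson factors in the form common in the KPM literature (e.g., Weiße et al., 2006), and the stochastic trace estimate (6) is vectorized over the J random vectors.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator
from numpy.polynomial import chebyshev as C

def jackson_factors(Kc):
    # Damping factors g_0, ..., g_Kc that suppress Gibbs oscillations.
    k = np.arange(Kc + 1)
    a = np.pi / (Kc + 2)
    return ((Kc - k + 2) * np.cos(k * a) + np.sin(k * a) / np.tan(a)) / (Kc + 2)

def estimate_cdf(A, lmin, lmax, S=10, J=10, Kc=30, seed=0):
    N = A.shape[0]
    rng = np.random.default_rng(seed)
    xi = np.linspace(lmin, lmax, S)
    g = jackson_factors(Kc)
    grid = np.linspace(lmin, lmax, 2000)
    # Affine map sending [lmin, lmax] to [-1, 1], applied to A inside T_k.
    c0, c1 = 2.0 / (lmax - lmin), -(lmax + lmin) / (lmax - lmin)
    X = rng.standard_normal((N, J))
    eta = np.zeros(S)
    for i, x in enumerate(xi):
        # Jackson-damped Chebyshev coefficients of the step 1{λ ≤ x}.
        coef = C.Chebyshev.fit(grid, (grid <= x).astype(float), Kc,
                               domain=[lmin, lmax]).coef * g
        # Three-term recurrence: accumulate sum_k coef[k] T̄_k(A) X.
        W_prev, W = X, c0 * (A @ X) + c1 * X
        acc = coef[0] * W_prev + coef[1] * W
        for k in range(2, Kc + 1):
            W_next = 2 * (c0 * (A @ W) + c1 * W) - W_prev
            W_prev, W = W, W_next
            acc += coef[k] * W
        eta[i] = np.sum(X * acc) / J          # stochastic trace estimate (6)
    eta = np.clip(np.maximum.accumulate(eta) / N, 0.0, 1.0)
    return PchipInterpolator(xi, eta)          # monotone cubic estimate of P_λ(z)
```

The analytical derivative and inverse of the returned monotone spline can then be formed from the PCHIP object, e.g., via its .derivative() method and by re-interpolating swapped coordinates on a fine grid.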

3. Spectrum-Adapted Methods

In this section, we introduce two new classes of degree K polynomial approximations $p_K(\mathbf{A})\mathbf{b}$ to $f(\mathbf{A})\mathbf{b}$, both of which leverage the estimated cumulative spectral density function $\tilde{P}_\lambda(z)$.

3.1. Spectrum-Adapted Polynomial Interpolation

In the first method, we take $y_k := \frac{1}{2}\left[ \cos\left( \frac{k\pi}{K} \right) + 1 \right]$, for $k = 0, 1, \ldots, K$, which are the K + 1 extrema of the degree K Chebyshev polynomial, shifted to the interval [0, 1]. We then warp these points via the inverse of the estimated cumulative spectral density function by setting $x_k = \tilde{P}_\lambda^{-1}(y_k)$, before finding the unique degree K polynomial interpolation through the points $\{(x_k, f(x_k))\}_{k=0,1,\ldots,K}$. The intuition behind warping the interpolation points is that (i) a better approximation is attained in areas of the spectrum (domain) with more interpolation points, (ii) the error in (3) only depends on the errors at the eigenvalues of $\mathbf{A}$, so we would like the approximation to be best in regions with many eigenvalues, and (iii) as shown in Figure 3, using the inverse of the estimated cumulative spectral density function as the warping function leads to a higher density of the warped points $\{x_k\}$ falling in higher density regions of the spectrum of $\mathbf{A}$. Thus, the warping should ideally lead to more interpolation points in high density regions of the spectrum, better approximation of the target function in these regions, and, in turn, a reduction in the error (3), as compared to interpolations generated from points spread more evenly across the spectrum.
To find the (unique) degree K polynomial interpolation, our numerical implementation uses MATLAB's polyfit function, which centers and scales the data and then solves the resulting system of equations via a QR decomposition. Once the interpolating polynomial coefficients are obtained, $p_K(\mathbf{A})\mathbf{b}$ can be computed, e.g., via (2) or by representing the interpolating polynomial as a linear combination of Chebyshev polynomials and using the Chebyshev coefficients in the associated three-term recurrence [23,32]. This entire procedure is detailed in Algorithm 1.
Algorithm 1 Spectrum-adapted polynomial interpolation.
Input: Hermitian matrix $\mathbf{A} \in \mathbb{R}^{N \times N}$, vector $\mathbf{b} \in \mathbb{R}^N$, function f, polynomial degree K
Output: degree K approximation $p_K(\mathbf{A})\mathbf{b} \approx f(\mathbf{A})\mathbf{b} \in \mathbb{R}^N$
1: Compute $\tilde{P}_\lambda(z)$, an estimate of the cumulative spectral density of $\mathbf{A}$
2: for k in 0 : K do
3:   $y_k \leftarrow \frac{1}{2}\left[ \cos\left( \frac{k\pi}{K} \right) + 1 \right]$
4:   $x_k \leftarrow \tilde{P}_\lambda^{-1}(y_k)$
5: end for
6: Find the (unique) degree K polynomial interpolation $p_K$ through the points $\{(x_k, f(x_k))\}_{k=0,1,\ldots,K}$
7: Compute $p_K(\mathbf{A})\mathbf{b}$ via (2)
8: return $p_K(\mathbf{A})\mathbf{b}$
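A minimal Python sketch of Algorithm 1 follows (interfaces are assumptions: `cdf` is a monotone estimate such as the one returned by the KPM sketch in Section 2, and `apply_poly` is the Horner routine sketched in Section 1; np.polyfit conveniently returns coefficients leading-first, matching (2)):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def invert_cdf(cdf, lmin, lmax):
    # Build an inverse of the estimated CDF by monotone interpolation of
    # swapped coordinates on a fine grid.
    z = np.linspace(lmin, lmax, 2000)
    y = np.maximum.accumulate(cdf(z))          # enforce monotonicity numerically
    y, idx = np.unique(y, return_index=True)   # strictly increasing knots
    return PchipInterpolator(y, z[idx])

def spectrum_adapted_interp(f, cdf_inv, K):
    # Steps 2-6 of Algorithm 1: warp Chebyshev extrema, then interpolate.
    k = np.arange(K + 1)
    y = 0.5 * (np.cos(k * np.pi / K) + 1.0)    # y_k on [0, 1]
    x = cdf_inv(y)                             # warped abscissae x_k
    return np.polyfit(x, f(x), K)              # degree-K interpolant (K+1 points)
```

For larger K, np.polyfit may warn about ill-conditioning of the underlying Vandermonde system; this mirrors the instability of the interpolation approach at higher orders observed in Section 4.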

3.2. Spectrum-Adapted Polynomial Regression/Orthogonal Polynomial Expansion

A second approach is to solve the weighted least squares polynomial regression problem

$$ \min_{p \in \mathcal{P}_K} \sum_{m=1}^{M} w_m \big[ f(x_m) - p(x_m) \big]^2, \tag{7} $$
where the abscissae $\{x_m\}_{m=1,2,\ldots,M}$ and weights $\{w_m\}_{m=1,2,\ldots,M}$ are chosen to capture the estimated spectral density function. We investigated several methods to set the points (e.g., linearly spaced points, Chebyshev points on the interval $[\underline{\lambda}, \overline{\lambda}]$, Chebyshev points on each subinterval $[\xi_i, \xi_{i+1}]$, and warped points via the inverse of the estimated cumulative spectral density function as in Section 3.1) and the weights (e.g., the analytically computed estimate $\tilde{p}_\lambda$ of the spectral density function, a discrete estimate of the spectral density function based on the eigenvalue counts in (6), the original KPM density of states method based on a truncated Chebyshev expansion [9] (Equation 3.11), or equal weights for warped points). Without going into extensive detail about these various options, we remark that choosing abscissae $\{x_m\}$ that are very close to each other, which may occur when using points warped by the inverse of the estimated density function, may lead to numerical instabilities when solving the weighted least squares problem. In the numerical experiments, we use M evenly spaced points on the interval $[\underline{\lambda}, \overline{\lambda}]$ (i.e., $x_m = \frac{m-1}{M-1}(\overline{\lambda} - \underline{\lambda}) + \underline{\lambda}$), and set the weights to be $w_m = \tilde{p}_\lambda(x_m)$. To solve (7), we use a customized variant of MATLAB's polyfit function that solves, again via QR decomposition, the normal equations of the weighted least squares problem:
$$ \mathbf{\Psi}_x^\top \mathbf{W} \mathbf{\Psi}_x\, \mathbf{c} = \mathbf{\Psi}_x^\top \mathbf{W} \mathbf{y}, $$

where $\mathbf{\Psi}_x$ is the Vandermonde matrix associated with the points $\{x_m\}$, $\mathbf{W}$ is a diagonal matrix with diagonal elements equal to the weights $\{w_m\}$, $\mathbf{y}$ is a column vector with entries equal to $\{f(x_m)\}$, and $\mathbf{c}$ is the vector of unknown polynomial coefficients. Once the coefficients $\mathbf{c}$ of the optimal polynomial $p_K^*$ are obtained, $p_K^*(\mathbf{A})\mathbf{b}$ can once again be computed, e.g., via (2). A summary of this method is detailed in Algorithm 2. (As pointed out by an anonymous reviewer, since our estimate $\tilde{p}_\lambda$ of the spectral density function is a piecewise quadratic function, the optimization problem (7) could be replaced by its continuous analog, $\min_{p \in \mathcal{P}_K} \int_{\underline{\lambda}}^{\overline{\lambda}} [f(x) - p(x)]^2\, \tilde{p}_\lambda(x)\, dx$, which can be solved analytically for many functions $f(\cdot)$.)
Algorithm 2 Spectrum-adapted polynomial regression.
Input: Hermitian matrix $\mathbf{A} \in \mathbb{R}^{N \times N}$, vector $\mathbf{b} \in \mathbb{R}^N$, function f, polynomial degree K, number of grid points M
Output: degree K approximation $p_K(\mathbf{A})\mathbf{b} \approx f(\mathbf{A})\mathbf{b} \in \mathbb{R}^N$
1: Compute $\tilde{p}_\lambda(z)$, an estimate of the spectral density of $\mathbf{A}$
2: for m in 1 : M do
3:   $x_m \leftarrow \frac{m-1}{M-1}(\overline{\lambda} - \underline{\lambda}) + \underline{\lambda}$
4:   $w_m \leftarrow \tilde{p}_\lambda(x_m)$
5: end for
6: $p_K \leftarrow \operatorname{argmin}_{p \in \mathcal{P}_K} \sum_{m=1}^{M} w_m \left[ f(x_m) - p(x_m) \right]^2$
7: Compute $p_K(\mathbf{A})\mathbf{b}$ via (2)
8: return $p_K(\mathbf{A})\mathbf{b}$
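A corresponding sketch of Algorithm 2, again with assumed interfaces (`pdf` stands in for the estimated density $\tilde{p}_\lambda$, e.g., the derivative of the PCHIP estimate from Section 2):

```python
import numpy as np

def spectrum_adapted_wls(f, pdf, lmin, lmax, K, M=100):
    # Evenly spaced abscissae x_m and density weights w_m = p̃_λ(x_m).
    x = np.linspace(lmin, lmax, M)
    w = np.maximum(pdf(x), 0.0)
    # np.polyfit minimizes sum_m (w'_m [f(x_m) - p(x_m)])^2, i.e., it squares
    # its weight argument, so we pass w'_m = sqrt(w_m) to realize (7).
    return np.polyfit(x, f(x), K, w=np.sqrt(w))
```

The returned coefficients can be applied with the same `apply_poly` routine as before. Because the monomial basis is ill-conditioned at higher K, the expansion in discrete orthogonal polynomials described next is the numerically preferable route.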
An alternative way to view this weighted least squares method [52] is as a truncated expansion in polynomials orthogonal with respect to the discrete measure $d\lambda_M$ with finite support at the points $\{x_m\}$, and an associated inner product [53] (Section 1.1)

$$ \langle f, g \rangle_{d\lambda_M} = \int_{\mathbb{R}} f(x)\, g(x)\, d\lambda_M = \sum_{m=1}^{M} w_m\, f(x_m)\, g(x_m). $$

The M discrete monic orthogonal polynomials $\{\pi_{k,M}\}_{k=0,1,\ldots,M-1}$ satisfy the three-term recurrence relation [53] (Section 1.3)

$$ \pi_{k+1,M}(x) = (x - \alpha_{k,M})\, \pi_{k,M}(x) - \beta_{k,M}\, \pi_{k-1,M}(x), \quad k = 0, 1, \ldots, M-1, \tag{8} $$

with $\pi_{-1,M}(x) = 0$, $\pi_{0,M}(x) = 1$, $\beta_{0,M} = \sum_{m=1}^{M} w_m$,

$$ \alpha_{k,M} = \frac{\langle x\, \pi_{k,M}, \pi_{k,M} \rangle_{d\lambda_M}}{\langle \pi_{k,M}, \pi_{k,M} \rangle_{d\lambda_M}}, \quad k = 0, 1, \ldots, M-1, \qquad \text{and} \qquad \beta_{k,M} = \frac{\langle \pi_{k,M}, \pi_{k,M} \rangle_{d\lambda_M}}{\langle \pi_{k-1,M}, \pi_{k-1,M} \rangle_{d\lambda_M}}, \quad k = 1, 2, \ldots, M-1. $$
Given the abscissae $\{x_m\}$ and weights $\{w_m\}$, the three-term recursion coefficients $\{\alpha_{k,M}\}_{k=0,1,\ldots,M-1}$ and $\{\beta_{k,M}\}_{k=1,2,\ldots,M-1}$ can also be computed through a stable Lanczos-type algorithm on an $(M+1) \times (M+1)$ matrix [53] (Section 2.2.3), [54]. In matrix-vector notation, the vectors $\boldsymbol{\pi}_{k,M} \in \mathbb{R}^M$, which are the discrete orthogonal polynomials evaluated at the M abscissae, can be computed iteratively by the relation

$$ \boldsymbol{\pi}_{k+1,M} = \big( \mathrm{diag}(\{x_m\}) - \alpha_{k,M} \mathbf{I}_M \big)\, \boldsymbol{\pi}_{k,M} - \beta_{k,M}\, \boldsymbol{\pi}_{k-1,M}, \quad k = 0, 1, \ldots, M-1, $$

with $\boldsymbol{\pi}_{-1,M} = \mathbf{0}_M$ and $\boldsymbol{\pi}_{0,M} = \mathbf{1}_M$. Figure 4 shows an example of these discrete orthogonal polynomials.
Finally, the degree K polynomial approximation to $f(\mathbf{A})\mathbf{b}$ is computed as

$$ p_K(\mathbf{A})\mathbf{b} = \sum_{k=0}^{K} \frac{\langle f, \pi_{k,M} \rangle_{d\lambda_M}}{\langle \pi_{k,M}, \pi_{k,M} \rangle_{d\lambda_M}}\, \pi_{k,M}(\mathbf{A})\, \mathbf{b}, $$

with $\pi_{-1,M}(\mathbf{A})\mathbf{b} = \mathbf{0}_N$, $\pi_{0,M}(\mathbf{A})\mathbf{b} = \mathbf{b}$, and

$$ \pi_{k+1,M}(\mathbf{A})\mathbf{b} = (\mathbf{A} - \alpha_{k,M} \mathbf{I}_N)\, \pi_{k,M}(\mathbf{A})\mathbf{b} - \beta_{k,M}\, \pi_{k-1,M}(\mathbf{A})\mathbf{b}, \quad k = 0, 1, \ldots, K-1 \ \ (\text{where } K \leq M - 1). $$
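The following sketch makes these recurrences concrete (illustrative interfaces; the Stieltjes procedure used here to generate the recursion coefficients is a standard alternative to the Lanczos-type algorithm cited above):

```python
import numpy as np

def stieltjes(x, w, K):
    # Recursion coefficients α_k, β_k of the monic polynomials orthogonal
    # w.r.t. the discrete measure sum_m w_m δ_{x_m} (Stieltjes procedure).
    alpha, beta = np.zeros(K + 1), np.zeros(K + 1)
    pi_prev, pi_cur = np.zeros_like(x), np.ones_like(x)
    beta[0] = w.sum()
    nrm_prev = None
    for k in range(K + 1):
        nrm = w @ (pi_cur * pi_cur)
        alpha[k] = (w @ (x * pi_cur * pi_cur)) / nrm
        if k > 0:
            beta[k] = nrm / nrm_prev
        pi_prev, pi_cur = pi_cur, (x - alpha[k]) * pi_cur - beta[k] * pi_prev
        nrm_prev = nrm
    return alpha, beta

def ortho_expansion_apply(A, b, f, x, w, K):
    # p_K(A)b = sum_k (<f, π_k>/<π_k, π_k>) π_k(A) b, with the same
    # three-term recurrence driving both π_k(x_m) and π_k(A)b.
    # (Monic polynomials can grow quickly in magnitude; an orthonormal
    # variant is preferable for large K.)
    alpha, beta = stieltjes(x, w, K)
    pi_prev, pi_cur = np.zeros_like(x), np.ones_like(x)
    v_prev, v_cur = np.zeros_like(b), b.copy()
    fx, out = f(x), np.zeros_like(b)
    for k in range(K + 1):
        out = out + (w @ (fx * pi_cur)) / (w @ (pi_cur * pi_cur)) * v_cur
        pi_prev, pi_cur = pi_cur, (x - alpha[k]) * pi_cur - beta[k] * pi_prev
        v_prev, v_cur = v_cur, (A @ v_cur - alpha[k] * v_cur) - beta[k] * v_prev
    return out
```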
Before proceeding to numerical experiments, we briefly comment on the relationship between the spectrum-adapted approximation proposed in this section and the Lanczos approximation to $f(\mathbf{A})\mathbf{b}$, which is given by [19] (Section 13.2), [23]

$$ \mathbf{Q}_K\, f(\mathbf{T}_K)\, \mathbf{Q}_K^\top \mathbf{b} = \|\mathbf{b}\|_2\, \mathbf{Q}_K\, f(\mathbf{T}_K)\, \mathbf{e}_1, \tag{9} $$

where $\mathbf{Q}_K$ is an $N \times (K+1)$ matrix whose columns form an orthonormal basis for the Krylov subspace $\mathcal{K}_K(\mathbf{A}, \mathbf{b}) = \mathrm{span}\{\mathbf{b}, \mathbf{A}\mathbf{b}, \ldots, \mathbf{A}^K \mathbf{b}\}$. In (9), $\mathbf{T}_K = \mathbf{Q}_K^\top \mathbf{A} \mathbf{Q}_K$ is a $(K+1) \times (K+1)$ tridiagonal Jacobi matrix. The first column of $\mathbf{Q}_K$ is equal to $\frac{\mathbf{b}}{\|\mathbf{b}\|}$. The approximation (9) can also be written as $q_K(\mathbf{A})\mathbf{b}$, where $q_K$ is the degree K polynomial that interpolates the function f at the K + 1 eigenvalues of $\mathbf{T}_K$ [19] (Theorem 13.5), [55]. Thus, unlike classical polynomial approximation methods, such as the truncated Chebyshev expansion, the Lanczos method is indirectly adapted to the spectrum of $\mathbf{A}$. The Lanczos method differs from the proposed method in that $\mathbf{T}_K$ and the Lanczos approximating polynomial $q_K$ depend on the initial vector $\mathbf{b}$. Specifically, the polynomials $\{\tilde{\pi}_k\}$ generated from the three-term recurrence of the form (8),

$$ \gamma_{k+1}\, \tilde{\pi}_{k+1}(x) = (x - \alpha_k)\, \tilde{\pi}_k(x) - \gamma_k\, \tilde{\pi}_{k-1}(x), $$

with the $\{\alpha_k\}_{k=0,1,\ldots,K}$ and $\{\gamma_k\}_{k=1,2,\ldots,K}$ coefficients taken from the diagonal and superdiagonal entries of $\mathbf{T}_K$, respectively, are orthogonal with respect to the piecewise-constant measure

$$ \mu(x) = \begin{cases} 0, & x < \lambda_1, \\ \sum_{j=1}^{i} [\hat{b}(j)]^2, & \lambda_i \leq x < \lambda_{i+1}, \\ \sum_{j=1}^{N} [\hat{b}(j)]^2 = 1, & \lambda_N \leq x, \end{cases} $$

where $\hat{\mathbf{b}} = \mathbf{V}^\top \mathbf{q}_1 = \frac{\mathbf{V}^\top \mathbf{b}}{\|\mathbf{b}\|}$, and $\hat{b}(j)$ is its jth component [56] (Theorem 4.2). If $\hat{\mathbf{b}}$ happens to be a constant vector, then $\mu(x) = P_\lambda(x)$ from (5). If $\mathbf{A}$ is a graph Laplacian, $\hat{\mathbf{b}}$ is the graph Fourier transform [4] of $\mathbf{b}$, normalized to have unit energy.
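For comparison, here is a minimal sketch of the Lanczos approximation (9) — the standard method, not part of the proposed approach — with full reorthogonalization for numerical robustness:

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

def lanczos_fAb(A, b, f, K):
    # Build an orthonormal basis Q of K_K(A, b) and the tridiagonal T_K,
    # then return ||b||_2 Q f(T_K) e_1 as in (9). Assumes no breakdown
    # (all γ_k > 0).
    N = len(b)
    Q = np.zeros((N, K + 1))
    alpha, gamma = np.zeros(K + 1), np.zeros(K)
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(K + 1):
        v = A @ Q[:, k]
        alpha[k] = Q[:, k] @ v
        v = v - alpha[k] * Q[:, k]
        if k > 0:
            v = v - gamma[k - 1] * Q[:, k - 1]
        v = v - Q[:, :k + 1] @ (Q[:, :k + 1].T @ v)   # reorthogonalize
        if k < K:
            gamma[k] = np.linalg.norm(v)
            Q[:, k + 1] = v / gamma[k]
    theta, S = eigh_tridiagonal(alpha, gamma)          # eigenpairs of T_K
    # f(T_K) e_1 = S f(Θ) S^T e_1, with S^T e_1 equal to the first row of S.
    return np.linalg.norm(b) * (Q @ (S @ (f(theta) * S[0, :])))
```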

4. Numerical Examples and Discussion

In Figure 5, for different functions $f(\lambda)$ and matrices $\mathbf{A}$, we approximate $f(\mathbf{A})\mathbf{b}$ with $\mathbf{b} = \mathbf{V}\mathbf{1}$ and polynomial approximation orders ranging from K = 3 to 25. To estimate the cumulative spectral density function $\tilde{P}_\lambda(z)$ with parameters S = 10, J = 10, and $K_\Theta = 30$, we use the KPM, as shown in Figure 2. Based on the analytical derivative and inverse function of $\tilde{P}_\lambda(z)$, we obtain the two proposed spectrum-adapted polynomial approximations for $f(\lambda)$, before computing each $p_K(\mathbf{A})\mathbf{b}$ via the corresponding three-term recursion. We compare the proposed methods to the truncated Chebyshev expansion and the Lanczos method with the same polynomial order. Note that when $\mathbf{b}$ is a constant vector in the spectral domain of $\mathbf{A}$, the relative error $\frac{\| f(\mathbf{A})\mathbf{b} - p_K(\mathbf{A})\mathbf{b} \|_2^2}{\| f(\mathbf{A})\mathbf{b} \|_2^2}$ is equal to $\frac{\sum_{\ell=1}^{N} [f(\lambda_\ell) - p_K(\lambda_\ell)]^2}{\sum_{\ell=1}^{N} [f(\lambda_\ell)]^2}$, the numerator of which is the discrete least squares objective mentioned in Section 1. The first column of Figure 5 displays the errors at all eigenvalues of $\mathbf{A}$ for each order 10 polynomial approximation of $f(\lambda) = e^{-\lambda}$. The second column examines the convergence of relative errors in approximating $e^{-\mathbf{A}}\mathbf{b}$ for matrices with various spectral distributions, for each of the four methods.
In the field of graph signal processing [4], it is common to analyze or modify a graph signal $\mathbf{b} \in \mathbb{R}^N$, where b(i) is the value of the graph signal at vertex i of a weighted, connected graph $\mathcal{G}$ with N vertices, by applying a graph spectral filter $f_l$. The filtered signal is exactly the product (1) of a function of the graph Laplacian (or some other symmetric matrix) and a vector, the graph signal $\mathbf{b}$. In Figure 6, we show examples of collections of such functions, commonly referred to as graph spectral filter banks. In the right three columns of Figure 5, we examine the relative errors incurred by approximating $f_l(\mathbf{A})\mathbf{b}$ for the lowpass, bandpass, and highpass graph spectral filters $f_1$, $f_3$, and $f_5$ shown in Figure 6a. We expand on the applications of such graph spectral filters in Section 5.
We make two observations based on the numerical examples:
  • The spectrum-adapted interpolation method often works well for low degree approximations (K ≤ 10), but is not very stable at higher orders due to overfitting of the polynomial interpolant to the specific K + 1 interpolation points (i.e., the interpolant is highly oscillatory).
  • The proposed spectrum-adapted weighted least squares method tends to outperform the Lanczos method for matrices such as si2 and cage9 that have a large number of distinct interior eigenvalues.

5. Application I: Estimation of the Norms of Localized Graph Spectral Filter Dictionary Atoms

A common method to extract information from data residing on a weighted, undirected graph is to represent the graph signal as a linear combination of building block graph signals called atoms, a collection of which is called a dictionary. In this section, we consider localized spectral graph filter dictionaries of the form $\mathcal{D} = \{\varphi_{i,l}\}_{i=1,2,\ldots,N;\ l=1,2,\ldots,L}$, where each atom is defined as $\varphi_{i,l} := f_l(\mathbf{L})\, \boldsymbol{\delta}_i$, with $\mathbf{L}$ being the graph Laplacian and $\boldsymbol{\delta}_i$ having a value of 1 at vertex i and 0 elsewhere. Each atom can be interpreted as the result of localizing a spectral pattern characterized by the filter function $f_l$ to be centered at vertex i in the graph. See [58] for more details about localized spectral graph filter dictionaries and their applications as transforms and regularizers in myriad signal processing and machine learning tasks.
For large, sparse graphs, the dictionary atoms are never explicitly computed; rather, their inner products with the graph signal are approximated by $\langle \mathbf{b}, \varphi_{i,l} \rangle = \boldsymbol{\delta}_i^\top f_l(\mathbf{L})\, \mathbf{b} \approx \boldsymbol{\delta}_i^\top p_{l,K}(\mathbf{L})\, \mathbf{b}$, using polynomial approximation methods such as those described in Section 1 or those proposed in this work. However, in graph signal processing applications such as thresholding for denoising or compression [58,59,60] or non-uniform random sampling and interpolation of graph signals [41,45,58,61], it is often important to form a fast estimate of the norms of the dictionary atoms, $\{\|\varphi_{i,l}\|_2\}_{i,l}$. Since $\|\varphi_{i,l}\|_2^2 = \langle \varphi_{i,l}, \varphi_{i,l} \rangle = \boldsymbol{\delta}_i^\top f_l^2(\mathbf{L})\, \boldsymbol{\delta}_i$ is a bilinear form of the type $\mathbf{u}^\top f(\mathbf{A})\, \mathbf{u}$, the norm of a single atom can be estimated via quadrature methods such as Lanczos quadrature [53] (Ch. 3.1.7), [56] (Ch. 7), [62]; however, doing this for all NL atoms is not computationally tractable, since each requires a different combination of function and starting vector. Other alternatives include the methods discussed in [63] for estimating diagonal elements of a matrix that is not explicitly available but for which matrix-vector products are easy to evaluate, as $\|\varphi_{i,l}\|_2 = \sqrt{\big[ f_l^2(\mathbf{L}) \big]_{i,i}}$.
In this application example, we estimate the norms of the dictionary atoms through the products of matrix functions with random vectors, as follows. Let $\mathbf{x}$ be a random vector whose components are independent and identically distributed standard normal random variables (in fact, we only utilize the property that the components have unit variance). Then we have

$$ \mathrm{var}\big( \langle \varphi_{i,l}, \mathbf{x} \rangle \big) = \mathrm{var}\left( \sum_{n=1}^{N} x(n)\, \varphi_{i,l}(n) \right) = \sum_{n=1}^{N} [\varphi_{i,l}(n)]^2\, \mathrm{var}(x(n)) = \|\varphi_{i,l}\|_2^2. $$

Thus, to estimate $\|\varphi_{i,l}\|_2$, it suffices to estimate $\mathrm{sd}\big( \langle \varphi_{i,l}, \mathbf{x} \rangle \big) = \mathrm{sd}\big( \boldsymbol{\delta}_i^\top f_l(\mathbf{L})\, \mathbf{x} \big) \approx \mathrm{sd}\big( \boldsymbol{\delta}_i^\top p_{l,K}(\mathbf{L})\, \mathbf{x} \big)$ for a degree K polynomial approximation $p_{l,K}$ to $f_l$. We therefore define each atom norm estimate as the sample standard deviation

$$ \nu_{i,l} := \mathrm{sd}\Big( \big\{ \boldsymbol{\delta}_i^\top p_{l,K}(\mathbf{L})\, \mathbf{x}^{(j)} \big\}_{j=1,2,\ldots,J} \Big) \approx \|\varphi_{i,l}\|_2, \tag{10} $$
where each $\mathbf{x}^{(j)}$ is a realization of the random vector $\mathbf{x}$. In the numerical experiments, we compare the estimates resulting from Chebyshev polynomial approximation to those resulting from spectrum-adapted weighted least squares polynomial approximation. As a computational aside, in the process of estimating the spectral density via KPM in (6), we have already computed $\bar{T}_k(\mathbf{L})\, \mathbf{x}^{(j)}$ for each $k = 0, 1, \ldots, K$ and each j, where $\bar{T}_k$ are the Chebyshev polynomials shifted to the interval $[0, \lambda_{\max}]$. From these quantities, we can easily compute the $p_{l,K}(\mathbf{L})\, \mathbf{x}^{(j)}$ vectors in (10) for different values of l (different filters). See [41] (Sec. III.B.1) for details.
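A sketch of this estimator, reusing the `apply_poly` routine from the Section 1 sketch (`coeffs` holds the polynomial coefficients of $p_{l,K}$; all names are illustrative):

```python
import numpy as np

def estimate_atom_norms(L, coeffs, J=50, seed=0):
    # Returns the estimates ν_{i,l} for all vertices i at once, for one
    # filter f_l approximated by the polynomial with coefficients coeffs.
    rng = np.random.default_rng(seed)
    N = L.shape[0]
    Y = np.empty((N, J))
    for j in range(J):
        x = rng.standard_normal(N)        # only unit variance is needed
        Y[:, j] = apply_poly(L, x, coeffs)
    # Row i collects δ_i^T p_{l,K}(L) x^{(j)} over j; its sample standard
    # deviation estimates ||φ_{i,l}||_2 as in (10).
    return Y.std(axis=1, ddof=1)
```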
In the top row of Figure 7, we demonstrate the estimation of the norms of the atoms of a spectral graph wavelet dictionary generated by localizing the four filters in Figure 6b to each of the vertices of the bunny graph. Figure 7a shows the exact norms of the NL = 2503 · 4 = 10,012 dictionary atoms. In Figure 7b, we plot the estimated norms of the atoms, generated via degree K = 8 spectrum-adapted weighted least squares polynomial approximation with J = 50 random vectors in (10), against the actual atom norms. The ratios of the estimated atom norms to the actual norms are shown in Figure 7c. We show the mean of the relative error $\left| \frac{\nu_{i,l}}{\|\varphi_{i,l}\|_2} - 1 \right|$ across all of these atoms as a single point in Figure 7d, and also repeat this experiment with different values of K and J and different classes of approximating polynomials, as well as for different graphs, in Figure 7e–f. On all three graphs and at both values of J, for low degrees K, the estimates generated from the spectrum-adapted polynomial least squares method have lower mean relative error than those based on Chebyshev polynomial approximation. While these examples are on small to medium graphs for comparison to the exact atom norms, the method scales efficiently to dictionaries designed for large, sparse graphs.

6. Application II: Fast Filtering of Time-Vertex Signals

In this section, we demonstrate the use of the spectrum-adapted approximation methods of Section 3 to accelerate the joint filtering of time-vertex signals in both the time and graph spectral domains [50,64,65].

6.1. Time-Vertex Signals

We consider a weighted, undirected graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}, \mathbf{W}_G\}$ with N vertices, where $\mathcal{V}$ is the set of vertices, $\mathcal{E}$ is the set of edges, and $\mathbf{W}_G$ is the weighted adjacency matrix. The combinatorial graph Laplacian is defined as $\mathbf{L}_G := \mathbf{D}_G - \mathbf{W}_G$, where $\mathbf{D}_G$ is diagonal, with $D_G(i,i)$ equal to the degree of the ith vertex. At each vertex, we observe a time series of T observations. Thus, the time-varying graph signal can be represented as a matrix $\mathbf{X} \in \mathbb{R}^{N \times T}$, where $X_{i,j}$ is the value on the ith vertex at the jth time. Figure 8 shows an example of a time-vertex signal.

6.2. Time-Vertex Filtering

Fixing a point in time, each column of $\mathbf{X}$ is a graph signal defined on $\mathcal{G}$. Let $\mathbf{x}_{t_j} \in \mathbb{R}^{N \times 1}$ denote the jth column of $\mathbf{X}$, $j = 1, \ldots, T$. Based on the graph structure of $\mathcal{G}$, we can perform high-dimensional graph signal processing tasks on $\mathbf{x}_{t_j}$, such as filtering, denoising, inpainting, and compression [4]. In particular, for a filter $g: \sigma(\mathbf{L}_G) \to \mathbb{C}$ defined on the eigenvalues of $\mathbf{L}_G$, the graph spectral filtering of $\mathbf{x}_{t_j}$ can be computed as $g(\mathbf{L}_G)\, \mathbf{x}_{t_j} = \mathbf{U}_G\, g(\boldsymbol{\Lambda}_G)\, \mathbf{U}_G^*\, \mathbf{x}_{t_j}$, where $\mathbf{U}_G^*$ is the conjugate transpose of $\mathbf{U}_G$, and $\mathbf{L}_G = \mathbf{U}_G \boldsymbol{\Lambda}_G \mathbf{U}_G^*$ is the spectral decomposition of $\mathbf{L}_G$.
Conversely, focusing on one vertex of $\mathcal{G}$, the ith row of $\mathbf{X}$ is a discrete time signal $\mathbf{x}_{v_i} \in \mathbb{R}^{1 \times T}$, which indicates how the signal value changes over time on the ith vertex. We can compute the one-dimensional discrete Fourier transform (DFT) of $\mathbf{x}_{v_i}$ by $\tilde{\mathbf{x}}_{v_i} = \mathbf{x}_{v_i} \bar{\mathbf{U}}_R$, where $\mathbf{U}_R$ is the normalized DFT matrix of size T, and $\bar{\mathbf{U}}_R$ is its complex conjugate. The DFT converts a signal from the time domain to the frequency domain, and allows for the amplification or attenuation of different frequency components of the signal. This process is referred to as frequency filtering in classical signal processing [4]. Frequency filtering of classical one-dimensional signals is equivalent to graph spectral filtering of graph signals on a ring graph [66] (Theorem 5.1). Let $\mathbf{L}_R$ denote the graph Laplacian of a ring graph with T vertices. Its spectral decomposition gives $\mathbf{L}_R = \mathbf{U}_R \boldsymbol{\Lambda}_R \mathbf{U}_R^*$, where $\mathbf{U}_R$ comprises the DFT basis vectors (normalized to have length 1), and the kth eigenvalue is given by $\Lambda_R(k,k) = 2 - 2\cos\left( \frac{2\pi(k-1)}{T} \right)$, for $k = 1, \ldots, T$.
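As a quick numerical illustration of this equivalence (a check, not taken from the paper), the normalized DFT vectors diagonalize the ring-graph Laplacian with exactly these eigenvalues:

```python
import numpy as np

T = 8
# Ring-graph Laplacian: 2I minus the cyclic shift and its transpose.
P = np.roll(np.eye(T), 1, axis=1)
L_R = 2 * np.eye(T) - P - P.T
# Normalized DFT basis U_R[n, k] = exp(2πi nk / T) / sqrt(T).
n = np.arange(T)
U_R = np.exp(2j * np.pi * np.outer(n, n) / T) / np.sqrt(T)
lam = 2 - 2 * np.cos(2 * np.pi * n / T)
D = U_R.conj().T @ L_R @ U_R              # should equal diag(lam)
assert np.allclose(D, np.diag(lam))
```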
The joint time-vertex Fourier transform of a time-varying graph signal $\mathbf{X}$ is defined as the combination of a graph Fourier transform and a DFT [50]:

$$ \hat{\mathbf{X}} := \mathbf{U}_G^*\, \mathbf{X}\, \bar{\mathbf{U}}_R. $$
A joint time-vertex filter $h: \sigma(\mathbf{L}_G) \times \sigma(\mathbf{L}_R) \to \mathbb{C}$ is defined for all combinations of $(\lambda_G, \lambda_R)$, where $\lambda_G$ is an eigenvalue of $\mathbf{L}_G$ and $\lambda_R$ is an eigenvalue of $\mathbf{L}_R$. The joint time-vertex filtering is defined accordingly as

$$ h(\mathbf{L}_G, \mathbf{L}_R)\, \mathbf{X} := \mathbf{U}_G \Big( \mathbf{H} \circ \big( \mathbf{U}_G^*\, \mathbf{X}\, \bar{\mathbf{U}}_R \big) \Big)\, \mathbf{U}_R^\top, \tag{11} $$

where $\mathbf{H} \in \mathbb{R}^{N \times T}$ has entries $H_{i,j} = h(\lambda_{G_i}, \lambda_{R_j})$, and $\circ$ denotes the entry-wise product of two matrices. Figure 9 shows two examples of time-vertex filters: an ideal lowpass filter

$$ h(\lambda_G, \lambda_R) = \mathbb{1}\left\{ \lambda_G < \tfrac{1}{2} \lambda_{G\max},\ \lambda_R < 2 \right\}, \tag{12} $$

and a wave filter
$$ h(\lambda_G, \lambda_R) = 5\, \exp\left( -100 \left[ \arccos\left( \frac{2 - \lambda_R}{2} \right) - \arccos\left( 1 - \frac{\lambda_G}{2\, \lambda_{G\max}} \right) \right]^2 \right), \tag{13} $$

where $\lambda_{G\max}$ is the largest eigenvalue of $\mathbf{L}_G$. The eigenvalues of $\mathbf{L}_R$ range from 0 to 4 regardless of the size of T, so $\tfrac{1}{2} \lambda_{R\max} = 2$.
As a two-dimensional extension of spectral filtering, the time-vertex filtering decomposes the input signal into NT orthogonal components, where each component corresponds to an outer product of a graph Laplacian eigenvector and a DFT basis function. The components are then amplified or attenuated by the corresponding scalars $h(\lambda_G, \lambda_R)$, and the scaled components are added together to obtain the filtered signal.
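A direct implementation of (11) via full eigendecompositions, feasible only for small N and T, can serve as the exact reference that the fast method of Section 6.3 approximates (a sketch with assumed interfaces):

```python
import numpy as np

def joint_filter_exact(L_G, X, h):
    N, T = X.shape
    lam_G, U_G = np.linalg.eigh(L_G)        # graph spectral basis
    n = np.arange(T)
    U_R = np.exp(2j * np.pi * np.outer(n, n) / T) / np.sqrt(T)
    lam_R = 2 - 2 * np.cos(2 * np.pi * n / T)
    H = h(lam_G[:, None], lam_R[None, :])   # H_{ij} = h(λ_{G_i}, λ_{R_j})
    X_hat = U_G.T @ X @ U_R.conj()          # joint Fourier transform
    # Filter, then invert both transforms; output is real for real inputs.
    return (U_G @ (H * X_hat) @ U_R.T).real
```

For instance, the ideal lowpass filter (12) can be passed as `h = lambda lG, lR: ((lG < 0.5 * lG_max) & (lR < 2)).astype(float)`, with `lG_max` precomputed.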

6.3. Spectrum-Adapted Approximation of Time-Vertex Filtering

Due to the high complexity of the spectral decomposition required to compute $\mathbf{U}_G$, approximation methods have been developed to accelerate joint time-vertex filtering, such as Chebyshev2D [64], ARMA2D [67], and the Fast Fourier Chebyshev algorithm [50].
As outlined in Algorithm 3, we can use the methods described in Section 3 to efficiently approximate the filtering of time-vertex signals. The overall complexity of our method is $O(NT \log T + TKM)$, where M is the number of nonzero entries in $\mathbf{L}_G$. The FFT of N discrete-time signals of length T takes $O(NT \log T)$ operations. The loop computes T spectrum-adapted approximations to matrix functions with polynomials of degree K, and thus has a complexity of $O(TKM)$ for sparse $\mathbf{L}_G$.
Algorithm 3 Spectrum-adapted approximate time-vertex filtering.
Input: weighted undirected graph $\mathcal{G}$ with N vertices, time-vertex signal $\mathbf{X} \in \mathbb{R}^{N \times T}$, filter h
Output: time-vertex filtered signal $\mathbf{Y} = h(\mathbf{L}_G, \mathbf{L}_R)\, \mathbf{X} \in \mathbb{R}^{N \times T}$
1: $\tilde{\mathbf{X}} \leftarrow$ FFT of $\mathbf{X}$, where the nth row of $\tilde{\mathbf{X}}$ is the 1D FFT of the nth row of $\mathbf{X}$, for $n = 1, \ldots, N$
2: Estimate the spectral density of $\mathcal{G}$
3: for t in 1 : T do
4:   $f_t(\lambda_G) = h(\lambda_G, \lambda_{R_t})$
5:   Find the best degree K polynomial approximation $p_K(\lambda_G)$ to $f_t(\lambda_G)$ via interpolation on the warped Chebyshev points (described in Section 3.1), or weighted least squares regression with evenly spaced abscissae and weights from the estimated spectral PDF (described in Section 3.2)
6:   Compute $p_K(\mathbf{L}_G)\, \tilde{\mathbf{x}}_t$, where $\tilde{\mathbf{x}}_t$ is the tth column of $\tilde{\mathbf{X}}$
7:   tth column of $\tilde{\mathbf{Y}} \leftarrow p_K(\mathbf{L}_G)\, \tilde{\mathbf{x}}_t$
8: end for
9: $\mathbf{Y} \leftarrow$ inverse FFT of $\tilde{\mathbf{Y}}$, where the nth row of $\mathbf{Y}$ is the 1D inverse FFT of the nth row of $\tilde{\mathbf{Y}}$
10: return $\mathbf{Y}$
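A Python sketch of Algorithm 3, reusing the `spectrum_adapted_wls` and `apply_poly` routines sketched in Section 3 (interfaces are assumptions):

```python
import numpy as np

def fast_time_vertex_filter(L_G, X, h, pdf, lmin, lmax, K, M=100):
    N, T = X.shape
    X_t = np.fft.fft(X, axis=1, norm="ortho")      # 1D FFT of each row
    lam_R = 2 - 2 * np.cos(2 * np.pi * np.arange(T) / T)
    Y_t = np.empty_like(X_t)
    for t in range(T):
        f_t = lambda lG, lR=lam_R[t]: h(lG, lR)    # filter at the tth frequency
        coeffs = spectrum_adapted_wls(f_t, pdf, lmin, lmax, K, M)
        Y_t[:, t] = apply_poly(L_G, X_t[:, t], coeffs)  # polynomial graph filtering
    return np.fft.ifft(Y_t, axis=1, norm="ortho").real
```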

6.4. Numerical Experiments

We consider the ideal lowpass filter (12) and the wave filter (13). We approximate $h(\mathbf{L}_G, \mathbf{L}_R)\, \mathbf{X}$ for both filter functions, with T = 1000 observations, and for different graphs $\mathcal{G}$: gnp (N = 500), saylr4 (N = 3564), and a random sensor network (N = 5000), the cumulative spectral densities of which are shown in Figure 2. In each case, we choose $\mathbf{X} = \frac{1}{\sqrt{NT}} \sum_{i=1,\ldots,N;\ j=1,\ldots,T} \mathbf{v}_{G_i} \mathbf{v}_{T_j}^\top$, i.e., a constant vector in the joint spectral domain of $\mathbf{L}_G$ and $\mathbf{L}_R$, in order to test the average performance of the approximation methods over different combinations of eigenvalue pairs. With polynomial approximation orders ranging from K = 1 to 30, we follow the procedure described in Algorithm 3 to approximate $h(\mathbf{L}_G, \mathbf{L}_R)\, \mathbf{X}$. We estimate the cumulative spectral density functions $\tilde{P}_\lambda(z)$ with parameters S = 10, J = 10, and $K_\Theta = 30$. We use M = 100 in the spectrum-adapted polynomial regression/orthogonal polynomial expansion when finding the best polynomial approximation. We compare the proposed methods to the truncated Chebyshev expansion and the Lanczos method with the same polynomial order. For each method, we examine the convergence of the relative error in Frobenius norm as a function of K for graphs with various structures (and thus various distributions of Laplacian eigenvalues). The results are summarized in Figure 10. Similar to our observation in Figure 5, we see that the spectrum-adapted interpolation method performs well for lower polynomial orders (K ≤ 5), but tends to be unstable at higher polynomial orders.

6.5. Dynamic Mesh Denoising

Finally, we replicate a dynamic mesh denoising example presented in [50] (Sec. VI.B). The original time-varying sequence of 3D meshes of a walking dog features meshes from T = 59 time instances, each with N = 2502 points in 3D space (x-y-z coordinates). This sequence is denoted by the 2502 × 59 × 3 matrix $\mathbf{X}$. The original 3D mesh from time t = 5 is shown in Figure 11a. The dynamic mesh sequence is corrupted by adding Gaussian noise (mean 0, standard deviation equal to $0.2\, \frac{\|\mathbf{X}\|_F}{\sqrt{2502 \cdot 59 \cdot 3}}$) to each mesh point coordinate. We denote the noisy 3D mesh sequence, one element of which is shown in Figure 11b, by $\mathbf{Y}$. A single graph is created by subtracting the mean x, y, and z coordinates of each noisy mesh from that mesh, averaging the centered noisy mesh coordinates across all 59 time instances, and then building a 5-nearest neighbor graph on the average centered noisy mesh coordinates. The resulting graph and its spectral density function are shown in Figure 11e–f. The dynamic mesh sequence is denoised by solving the Tikhonov regularization problem [50] (Equation (30))

$$ \mathbf{X}_{\text{denoised}} = \operatorname*{argmin}_{\mathbf{Z}}\ \| \mathbf{Z} - \mathbf{Y} \|_F^2 + \tau_1\, \mathrm{tr}\big( \mathbf{Z}^\top \mathbf{L}_G \mathbf{Z} \big) + \tau_2\, \mathrm{tr}\big( \mathbf{Z}\, \mathbf{L}_R\, \mathbf{Z}^\top \big). \tag{14} $$
The optimization problem (14) has the closed-form solution

$$ \mathbf{X}_{\text{denoised}}(:,:,i) = h_{\text{tik}}(\mathbf{L}_G, \mathbf{L}_R)\, \mathbf{Y}(:,:,i), \quad i = 1, 2, 3; \tag{15} $$

i.e., the joint time-vertex filtering operation defined in (11) is applied to each of the noisy x, y, and z coordinates, using the same joint non-separable lowpass filter $h_{\text{tik}}$, which is defined in the joint spectral domain as [50] (Equation (31))

$$ h_{\text{tik}}(\lambda_G, \lambda_R) = \frac{1}{1 + \tau_1 \lambda_G + \tau_2 \lambda_R}. $$
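In code, the closed-form solution amounts to three joint filtering calls, one per coordinate plane; a usage sketch with the `joint_filter_exact` routine from Section 6.2 and the parameter values reported below (`Y_noisy` is assumed to hold the noisy 2502 × 59 × 3 sequence):

```python
import numpy as np

tau1, tau2 = 7.20, 0.45
h_tik = lambda lG, lR: 1.0 / (1.0 + tau1 * lG + tau2 * lR)

# Filter each of the x, y, z coordinate planes independently, as in (15):
# X_denoised = np.stack(
#     [joint_filter_exact(L_G, Y_noisy[:, :, i], h_tik) for i in range(3)],
#     axis=2)
```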
We perform a grid search to find the values of the parameters $\tau_1$ and $\tau_2$ that minimize the relative error $\frac{\| \mathbf{X}_{\text{denoised}} - \mathbf{X} \|_F}{\| \mathbf{X} \|_F}$, averaged over multiple realizations of the noise. In Figure 11g, we show the resulting filter $h_{\text{tik}}(\lambda_G, \lambda_R)$ with $\tau_1 = 7.20$ and $\tau_2 = 0.45$ on the joint spectrum of the graph shown in Figure 11e. The dashed black line in Figure 11h shows the relative error between the denoised sequence computed exactly via (15) and the original dynamic 3D mesh sequence, averaged over 20 realizations of the additive Gaussian noise. The other three curves in the same image show the average relative error when the computation in (15) is approximated by the fast Fourier Chebyshev method of [50], the Chebyshev2D method of [64], and the spectrum-adapted approximate time-vertex filtering (fast Fourier weighted least squares) of Algorithm 3, for different values of the polynomial degree K. All three approximations converge to the exact solution and corresponding error, but at low degrees (K ≤ 10), the spectrum-adapted fast Fourier weighted least squares method yields a better approximation. Figure 11c–d display examples of the denoised mesh at a single time (t = 5) resulting from two of these approximations, with K = 6. The difference between the two is subtle, but can be seen, e.g., by scanning the top of the dog, where the mesh points form a slightly smoother surface.

7. Conclusions

We presented two novel spectrum-adapted polynomial methods for approximating $f(\mathbf{A})\mathbf{b}$ for large, sparse matrices: spectrum-adapted interpolation and spectrum-adapted weighted least squares/orthogonal polynomial expansion. These methods leverage an estimate of the cumulative spectral density of the matrix to build polynomials of a fixed order K that yield better approximations to f in the higher density regions of the matrix spectrum. In terms of approximation accuracy, numerical experiments showed that, relative to the state-of-the-art Lanczos and Chebyshev polynomial approximation techniques, the proposed methods often yield more accurate approximations at lower polynomial orders; however, the proposed spectrum-adapted interpolation method is not very stable at higher degrees (K > 10) due to overfitting. The proposed spectrum-adapted weighted least squares method performs particularly well in terms of accuracy for matrices with many distinct interior eigenvalues, whereas the Lanczos method, e.g., is often more accurate when K is higher and/or the matrix $\mathbf{A}$ has many extreme eigenvalues. One potential extension would be to investigate a hybrid method that combines the Lanczos and spectrum-adapted weighted least squares approaches. We did not notice consistent trends regarding relative approximation accuracy with respect to the shape of the function f.
In terms of computational complexity, the cost of our methods, like Chebyshev polynomial approximation, is $O(MK)$, where $M = \mathrm{nnz}(\mathbf{A})$ is the number of nonzero entries of $\mathbf{A}$. For very large, sparse matrices, this complexity reduces to $O(NK)$, where $\mathbf{A}$ is an $N \times N$ matrix. The Lanczos method, on the other hand, incurs an additional $O(NK^2)$ cost due to the orthogonalization step, making it more expensive for large enough K. Finally, the proposed spectrum-adapted methods, like the Chebyshev approximation, are amenable to efficient parallel and distributed computation via communication between neighboring nodes [33]. The inner products of the Lanczos method, on the other hand, may lead to additional communication expense or severe loss of efficiency in certain distributed computation environments (e.g., GPUs).

Author Contributions

Conceptualization, T.F., D.I.S., S.U., and Y.S.; data curation, T.F., D.I.S., and S.U.; formal analysis, T.F., D.I.S., and S.U.; funding acquisition, D.I.S.; investigation, T.F., D.I.S., and S.U.; methodology, T.F., D.I.S., S.U., and Y.S.; project administration, D.I.S.; software, T.F., D.I.S., and S.U.; supervision, D.I.S.; validation, T.F., D.I.S., and S.U.; visualization, T.F., D.I.S., and S.U.; writing—original draft, T.F., D.I.S., and S.U.; writing—review and editing, T.F., D.I.S., S.U., and Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

We thank the anonymous reviewers for their constructive comments on an earlier version of this article.

Conflicts of Interest

The authors declare no conflict of interest.

Reproducible Research

MATLAB code for all numerical experiments in this paper is available at http://www.macalester.edu/~dshuman1/publications.html. It leverages the open access GSPBox [57].

References

  1. Smola, A.J.; Kondor, R. Kernels and Regularization on Graphs. In Learning Theory and Kernel Machines; Lecture Notes in Computer Science; Schölkopf, B., Warmuth, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2003; pp. 144–158. [Google Scholar]
  2. Belkin, M.; Matveeva, I.; Niyogi, P. Regularization and Semi-Supervised Learning on Large Graphs. In Learning Theory; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2004; pp. 624–638. [Google Scholar]
  3. Zhou, D.; Bousquet, O.; Lal, T.N.; Weston, J.; Schölkopf, B. Learning with Local and Global Consistency; Advances Neural Information Processing Systems; Thrun, S., Saul, L., Schölkopf, B., Eds.; MIT Press: Cambridge, MA, USA, 2004; Volume 16, pp. 321–328. [Google Scholar]
  4. Shuman, D.I.; Narang, S.K.; Frossard, P.; Ortega, A.; Vandergheynst, P. The Emerging Field of Signal Processing on Graphs: Extending High-Dimensional Data Analysis to Networks and Other Irregular Domains. IEEE Signal Process. Mag. 2013, 30, 83–98. [Google Scholar] [CrossRef] [Green Version]
  5. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering; Advances Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2016; pp. 3844–3852. [Google Scholar]
  6. Bronstein, M.M.; Bruna, J.; LeCun, Y.; Szlam, A.; Vandergheynst, P. Geometric Deep Learning: Going Beyond Euclidean Data. IEEE Signal Process. Mag. 2017, 34, 18–42. [Google Scholar] [CrossRef] [Green Version]
  7. Tremblay, N.; Puy, G.; Gribonval, R.; Vandergheynst, P. Compressive Spectral Clustering. Proc. Int. Conf. Mach. Learn. 2016, 48, 1002–1011. [Google Scholar]
  8. Orecchia, L.; Sachdeva, S.; Vishnoi, N.K. Approximating the Exponential, the Lanczos Method and an Õ(m)-Time Spectral Algorithm for Balanced Separator. Proc. ACM Symp. Theory Comput. 2012, 1141–1160. [Google Scholar] [CrossRef]
  9. Lin, L.; Saad, Y.; Yang, C. Approximating Spectral Densities of Large Matrices. SIAM Rev. 2016, 58, 34–65. [Google Scholar] [CrossRef]
  10. Ubaru, S.; Saad, Y. Fast Methods for Estimating the Numerical Rank of Large Matrices. Proc. Int. Conf. Mach. Learn. 2016, 48, 468–477. [Google Scholar]
  11. Ubaru, S.; Saad, Y.; Seghouane, A.K. Fast Estimation of Approximate Matrix Ranks Using Spectral Densities. Neural Comput. 2017, 29, 1317–1351. [Google Scholar] [CrossRef]
  12. Ubaru, S.; Chen, J.; Saad, Y. Fast Estimation of tr(f(A)) Via Stochastic Lanczos Quadrature. SIAM J. Matrix Anal. Appl. 2017, 38, 1075–1099. [Google Scholar] [CrossRef]
  13. Han, I.; Malioutov, D.; Avron, H.; Shin, J. Approximating Spectral Sums of Large-Scale Matrices Using Stochastic Chebyshev Approximations. SIAM J. Sci. Comput. 2017, 39, A1558–A1585. [Google Scholar] [CrossRef] [Green Version]
  14. Arora, S.; Kale, S. A Combinatorial, Primal-Dual Approach to Semidefinite Programs. Proc. ACM Symp. Theory Comput. 2007, 227–236. [Google Scholar] [CrossRef]
  15. Sachdeva, S.; Vishnoi, N.K. Faster Algorithms Via Approximation Theory. Found. Trends Theor. Comput. Sci. 2014, 9, 125–210. [Google Scholar] [CrossRef]
  16. Hochbruck, M.; Lubich, C.; Selhofer, H. Exponential Integrators for Large Systems of Differential Equations. SIAM J. Sci. Comput. 1998, 19, 1552–1574. [Google Scholar] [CrossRef] [Green Version]
  17. Friesner, R.A.; Tuckerman, L.; Dornblaser, B.; Russo, T.V. A Method for Exponential Propagation of Large Systems of Stiff Nonlinear Differential Equations. J. Sci. Comput. 1989, 4, 327–354. [Google Scholar] [CrossRef]
  18. Gallopoulos, E.; Saad, Y. Efficient Solution of Parabolic Equations by Krylov Approximation Methods. SIAM J. Sci. Stat. Comput. 1992, 13, 1236–1264. [Google Scholar] [CrossRef] [Green Version]
  19. Higham, N.J. Functions of Matrices; SIAM: Philadelphia, PA, USA, 2008. [Google Scholar]
  20. Davies, P.I.; Higham, N.J. Computing f(A)b for Matrix Functions f. In QCD and Numerical Analysis III; Springer: Berlin/Heidelberg, Germany, 2005; pp. 15–24. [Google Scholar]
  21. Frommer, A.; Simoncini, V. Matrix Functions. In Model Order Reduction: Theory, Research Aspects and Applications; Springer: Berlin/Heidelberg, Germany, 2008; pp. 275–303. [Google Scholar]
  22. Moler, C.; Van Loan, C. Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later. SIAM Rev. 2003, 45, 3–49. [Google Scholar] [CrossRef]
  23. Druskin, V.L.; Knizhnerman, L.A. Two Polynomial Methods of Calculating Functions of Symmetric Matrices. USSR Comput. Maths. Math. Phys. 1989, 29, 112–121. [Google Scholar] [CrossRef]
  24. Saad, Y. Filtered Conjugate Residual-Type Algorithms with Applications. SIAM J. Matrix Anal. Appl. 2006, 28, 845–870. [Google Scholar] [CrossRef] [Green Version]
  25. Chen, J.; Anitescu, M.; Saad, Y. Computing f(A)b Via Least Squares Polynomial Approximations. SIAM J. Sci. Comp. 2011, 33, 195–222. [Google Scholar] [CrossRef] [Green Version]
  26. Druskin, V.; Knizhnerman, L. Extended Krylov Subspaces: Approximation of the Matrix Square Root and Related Functions. SIAM J. Matrix Anal. Appl. 1998, 19, 755–771. [Google Scholar] [CrossRef]
  27. Eiermann, M.; Ernst, O.G. A Restarted Krylov Subspace Method for the Evaluation of Matrix Functions. SIAM J. Numer. Anal. 2006, 44, 2481–2504. [Google Scholar] [CrossRef] [Green Version]
  28. Afanasjew, M.; Eiermann, M.; Ernst, O.G.; Güttel, S. Implementation of a Restarted Krylov Subspace Method for the Evaluation of Matrix Functions. Lin. Alg. Appl. 2008, 429, 2293–2314. [Google Scholar] [CrossRef]
  29. Frommer, A.; Lund, K.; Schweitzer, M.; Szyld, D.B. The Radau-Lanczos Method for Matrix Functions. SIAM J. Matrix Anal. Appl. 2017, 38, 710–732. [Google Scholar] [CrossRef] [Green Version]
  30. Golub, G.H.; Van Loan, C.F. Matrix Computations; Johns Hopkins University Press: Baltimore, MD, USA, 2013. [Google Scholar]
  31. Mason, J.C.; Handscomb, D.C. Chebyshev Polynomials; Chapman and Hall: London, UK, 2003. [Google Scholar]
  32. Trefethen, L.N. Approximation Theory and Approximation Practice; SIAM: Philadelphia, PA, USA, 2013. [Google Scholar]
  33. Shuman, D.I.; Vandergheynst, P.; Kressner, D.; Frossard, P. Distributed Signal Processing via Chebyshev Polynomial Approximation. IEEE Trans. Signal Inf. Process. Netw. 2018, 4, 736–751. [Google Scholar] [CrossRef] [Green Version]
  34. Hammond, D.K.; Vandergheynst, P.; Gribonval, R. Wavelets on Graphs Via Spectral Graph Theory. Appl. Comput. Harmon. Anal. 2011, 30, 129–150. [Google Scholar] [CrossRef] [Green Version]
  35. Segarra, S.; Marques, A.G.; Ribeiro, A. Optimal Graph-Filter Design and Applications to Distributed Linear Network Operators. IEEE Trans. Signal Process. 2017, 65, 4117–4131. [Google Scholar] [CrossRef]
  36. Van Mieghem, P. Graph Spectra for Complex Networks; Cambridge University Press: Cambridge, UK, 2011. [Google Scholar]
  37. Tao, T. Topics in Random Matrix Theory; American Mathematical Society: Providence, RI, USA, 2012. [Google Scholar]
  38. Silver, R.N.; Röder, H. Densities of States of Mega-Dimensional Hamiltonian Matrices. Int. J. Mod. Phys. C 1994, 5, 735–753. [Google Scholar] [CrossRef]
  39. Silver, R.N.; Röder, H.; Voter, A.F.; Kress, J.D. Kernel Polynomial Approximations for Densities of States and Spectral Functions. J. Comput. Phys. 1996, 124, 115–130. [Google Scholar] [CrossRef]
  40. Wang, L.W. Calculating the Density of States and Optical-Absorption Spectra of Large Quantum Systems by the Plane-Wave Moments Method. Phys. Rev. B 1994, 49, 10154. [Google Scholar] [CrossRef]
  41. Li, S.; Jin, Y.; Shuman, D.I. Scalable M-channel critically sampled filter banks for graph signals. IEEE Trans. Signal Process. 2019, 67, 3954–3969. [Google Scholar] [CrossRef] [Green Version]
  42. Girard, D. Un Algorithme Simple et Rapide Pour la Validation Croisée Généralisée sur des Problèmes de Grande Taille; Technical Report; Université Grenoble Alpes: St Martin d’Hères, France, 1987. [Google Scholar]
  43. Girard, A. A Fast 'Monte-Carlo Cross-Validation' Procedure for Large Least Squares Problems with Noisy Data. Numer. Math. 1989, 56, 1–23. [Google Scholar] [CrossRef]
  44. Di Napoli, E.; Polizzi, E.; Saad, Y. Efficient Estimation of Eigenvalue Counts in an Interval. Numer. Linear Algebra Appl. 2016, 23, 674–692. [Google Scholar] [CrossRef] [Green Version]
  45. Puy, G.; Pérez, P. Structured Sampling and Fast Reconstruction of Smooth Graph Signals. Inf. Inference A J. IMA 2018, 7, 657–688. [Google Scholar] [CrossRef] [Green Version]
  46. Shuman, D.I.; Wiesmeyr, C.; Holighaus, N.; Vandergheynst, P. Spectrum-Adapted Tight Graph Wavelet and Vertex-Frequency Frames. IEEE Trans. Signal Process. 2015, 63, 4223–4235. [Google Scholar] [CrossRef] [Green Version]
  47. Fritsch, F.N.; Carlson, R.E. Monotone Piecewise Cubic Interpolation. SIAM J. Numer. Anal. 1980, 17, 238–246. [Google Scholar] [CrossRef]
  48. Gleich, D. The MatlabBGL Matlab Library. Available online: http://www.cs.purdue.edu/homes/dgleich/packages/matlab_bgl/index.html (accessed on 11 November 2020).
  49. Stanford University Computer Graphics Laboratory. The Stanford 3D Scanning Repository. Available online: http://graphics.stanford.edu/data/3Dscanrep/ (accessed on 11 November 2020).
  50. Grassi, F.; Loukas, A.; Perraudin, N.; Ricaud, B. A Time-Vertex Signal Processing Framework: Scalable Processing and Meaningful Representations for Time-Series on Graphs. IEEE Trans. Signal Process. 2018, 66, 817–829. [Google Scholar] [CrossRef] [Green Version]
  51. Davis, T.A.; Hu, Y. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw. 2011, 38, 1:1–1:25. [Google Scholar] [CrossRef]
  52. Forsythe, G.E. Generation and Use of Orthogonal Polynomials for Data-Fitting with a Digital Computer. J. SIAM 1957, 5, 74–88. [Google Scholar] [CrossRef]
  53. Gautschi, W. Orthogonal Polynomials: Computation and Approximation; Oxford University Press: Oxford, UK, 2004. [Google Scholar]
  54. Gragg, W.B.; Harrod, W.J. The Numerically Stable Reconstruction of Jacobi Matrices from Spectral Data. Numer. Math. 1984, 44, 317–335. [Google Scholar] [CrossRef]
  55. Saad, Y. Analysis of Some Krylov Subspace Approximations to the Matrix Exponential Operator. SIAM J. Numer. Anal. 1992, 29, 209–228. [Google Scholar] [CrossRef]
  56. Golub, G.H.; Meurant, G. Matrices, Moments and Quadrature with Applications; Princeton University Press: Princeton, NJ, USA, 2010. [Google Scholar]
  57. Perraudin, N.; Paratte, J.; Shuman, D.I.; Kalofolias, V.; Vandergheynst, P.; Hammond, D.K. GSPBOX: A Toolbox for Signal Processing on Graphs. arXiv 2014, arXiv:1408.5781. Available online: https://epfl-lts2.github.io/gspbox-html/ (accessed on 11 November 2020).
  58. Shuman, D.I. Localized Spectral Graph Filter Frames: A Unifying Framework, Survey of Design Considerations, and Numerical Comparison. IEEE Signal Process. Mag. 2020, 37, 43–63. [Google Scholar] [CrossRef]
  59. Göbel, F.; Blanchard, G.; von Luxburg, U. Construction of tight frames on graphs and application to denoising. In Handbook of Big Data Analytics; Springer: Berlin/Heidelberg, Germany, 2018; pp. 503–522. [Google Scholar]
  60. de Loynes, B.; Navarro, F.; Olivier, B. Data-Driven Thresholding in Denoising with Spectral Graph Wavelet Transform. arXiv 2019, arXiv:1906.01882. [Google Scholar]
  61. Puy, G.; Tremblay, N.; Gribonval, R.; Vandergheynst, P. Random sampling of bandlimited signals on graphs. Appl. Comput. Harmon. Anal. 2018, 44, 446–475. [Google Scholar] [CrossRef] [Green Version]
  62. Benzi, M.; Boito, P. Quadrature Rule-Based Bounds for Functions of Adjacency Matrices. Lin. Alg. Appl. 2010, 433, 637–652. [Google Scholar] [CrossRef] [Green Version]
  63. Bekas, C.; Kokiopoulou, E.; Saad, Y. An Estimator for the Diagonal of a Matrix. Appl. Numer. Math. 2007, 57, 1214–1229. [Google Scholar] [CrossRef] [Green Version]
  64. Loukas, A.; Foucard, D. Frequency Analysis of Time-Varying Graph Signals. In Proceedings of the 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Washington, DC, USA, 7–9 December 2016; pp. 346–350. [Google Scholar]
  65. Perraudin, N.; Loukas, A.; Grassi, F.; Vandergheynst, P. Towards Stationary Time-Vertex Signal Processing. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 3914–3918. [Google Scholar]
  66. Grady, L.J.; Polimeni, J.R. Discrete Calculus; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
  67. Isufi, E.; Loukas, A.; Simonetto, A.; Leus, G. Separable Autoregressive Moving Average Graph-Temporal Filters. In Proceedings of the 2016 24th European Signal Processing Conference (EUSIPCO), Budapest, Hungary, 29 August–2 September 2016; pp. 200–204. [Google Scholar]
Figure 1. Degree 5 polynomial approximations of the function $f(\lambda) = e^{-\lambda}$ of the graph Laplacian of a random Erdős–Rényi graph with 500 vertices and edge probability 0.2. The discrete least squares approximation incurs larger errors in the lower end of the spectrum. However, since the eigenvalues are concentrated at the upper end of the spectrum, it yields a lower approximation error $\| f(\mathbf{A}) - p_5(\mathbf{A}) \|_2$.
Figure 2. Estimated and actual cumulative spectral density functions for eight real, symmetric matrices $\mathbf{A}$. We estimate the eigenvalue counts for S = 10 linearly spaced points on $[\underline{\lambda}, \overline{\lambda}]$ via (6), with degree $K_\Theta = 30$ polynomials and J = 10 random vectors $\mathbf{x}^{(j)}$.
Figure 3. Construction of six interpolation points for the same graph Laplacian matrix described in Figure 1. The interpolation points $\{x_k\}$ on the horizontal axis are computed by applying the inverse of the estimated cumulative spectral density function to the initial Chebyshev points $\{y_k\}$ on the vertical axis.
Figure 4. Comparison of (top) the first six discrete orthogonal polynomials defined in (8), adapted to the estimated cumulative spectral density of the random Erdős–Rényi graph from Figure 1, Figure 2 and Figure 3, to (bottom) the first six standard shifted Chebyshev polynomials, of degrees $k = 0$ to 5. In the region of high spectral density, shown in the zoomed boxes on the right, the discrete orthogonal polynomials feature more oscillation while maintaining small amplitudes, enabling better approximation of smooth functions in this region.
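Polynomials orthogonal with respect to a discrete measure, such as those in the top row of Figure 4, can be generated by the classical Stieltjes procedure (a three-term recurrence whose coefficients are computed against the measure). The sketch below takes an arbitrary discrete measure as input; the paper's definition (8) fixes a particular measure built from the estimated spectral CDF, which is not reproduced here.

```python
import numpy as np

def stieltjes(x, w, K):
    """Values of the first K+1 orthonormal polynomials (w.r.t. the discrete
    measure with support points x and weights w) at the points x. Assumes K >= 1."""
    P = np.zeros((K + 1, len(x)))
    P[0] = 1 / np.sqrt(np.sum(w))
    alpha = np.sum(w * x * P[0] ** 2)
    q = (x - alpha) * P[0]
    beta = np.sqrt(np.sum(w * q ** 2))
    P[1] = q / beta
    for k in range(1, K):
        alpha = np.sum(w * x * P[k] ** 2)       # diagonal recurrence coefficient
        q = (x - alpha) * P[k] - beta * P[k - 1]
        beta = np.sqrt(np.sum(w * q ** 2))      # off-diagonal coefficient
        P[k + 1] = q / beta
    return P
```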
Figure 5. Approximations of $f(A)b$ with $b = V\mathbf{1}$ and $f(\lambda)$ equal to $e^{-\lambda}$ (first two columns) and lowpass, bandpass, and highpass spectral graph filters (last three columns, respectively).
Figure 6. Spectral graph filter bank examples. (a) A set of five uniform translates of an itersine kernel $\sin\left(\frac{\pi}{2}\cos^2(\pi x)\right)$ [46,57]. (b) A set of four log-warped translates (also called octave-band or wavelet filters) [46]. We plot both sets of spectral graph filters on the spectrum of the Stanford bunny graph with $N = 2503$ vertices; however, the design only depends on the spectral range $[0, \lambda_{\max}]$, so the filters will look the same (except stretched) for all graphs considered, e.g., in Figure 5.
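For concreteness, one plausible realization of the uniform itersine translates in panel (a) is sketched below. The identity $k(x)^2 + k(x - 1/2)^2 = 1$ for $k(x) = \sin(\frac{\pi}{2}\cos^2(\pi x))$ means that spacing the translates half a support width apart yields a partition of unity of the squared responses; the exact scaling conventions of [46] may differ from this illustrative choice.

```python
import numpy as np

def itersine(x):
    """Itersine kernel, supported on [-1/2, 1/2]."""
    return np.where(np.abs(x) <= 0.5,
                    np.sin(0.5 * np.pi * np.cos(np.pi * x) ** 2), 0.0)

def uniform_itersine_translates(lam_max, M):
    """M translates with centers m * lam_max / (M - 1); half-support spacing
    gives sum_m g_m(lam)^2 = 1 on the interior of [0, lam_max]."""
    return [lambda lam, m=m: itersine(lam * (M - 1) / (2 * lam_max) - m / 2)
            for m in range(M)]
```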
Figure 7. Estimation of the norms of the atoms of spectral graph wavelet dictionaries. (a) Exact atom norms, $\{\|\varphi_{i,l}\|_2\}$, colored by the indices of the generating filters (shown in Figure 6), each of which is localized to every vertex on the bunny graph. (b) Comparison of the estimated norms, $\{\nu_{i,l}\}$, to the corresponding exact norms (shown in black). The atoms are sorted in descending order of exact norm to aid visual comparison. (c) The ratios of the estimated norms to the exact atom norms. (d–f) The mean relative errors across all atoms for dictionaries generated by the same set of filters on three different graphs, with varying polynomial approximation methods, polynomial degrees $K$, and numbers of random vectors $J$.
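A minimal sketch of the random-vector estimator behind panels (b,c), assuming Gaussian probes: since each dictionary filter $g$ is a symmetric function of $L$, the $l$-th entry of $g(L)x$ with $x \sim \mathcal{N}(0, I)$ has variance $\|g(L)\delta_l\|_2^2$, so averaging squared entries over $J$ probes estimates all $N$ atom norms of one filter at the cost of $J$ filtered vectors. Here `apply_poly_filter` is an assumed stand-in for whichever polynomial approximation of $g(L)$ is used (Chebyshev, spectrum-adapted, etc.), not the paper's code.

```python
import numpy as np

def estimate_atom_norms(apply_poly_filter, N, J=30, rng=None):
    """Estimate nu_l ~ ||g(L) delta_l||_2 for l = 1..N from J Gaussian probes.
    apply_poly_filter(X) should return (an approximation of) g(L) @ X."""
    rng = np.random.default_rng(rng)
    X = rng.standard_normal((N, J))
    Y = apply_poly_filter(X)                 # one sweep of matvecs per probe
    return np.sqrt(np.mean(Y ** 2, axis=1))  # row-wise RMS = norm estimate
```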
Figure 8. A time-vertex signal defined on a sensor graph $G$ with $N = 64$ vertices and $T = 4$ observations, visualized on a multilayer graph structure. Each layer is a copy of $G$ observed at one time point.
Figure 9. Two time-vertex filters defined for a random Erdős–Rényi graph with 500 vertices and edge probability 0.2. (a) An ideal lowpass filter; (b) a wave filter. The blue dots correspond to the positions of eigenvalue pairs.
Figure 10. Approximation errors $\|h(L_G, L_R)X - Y\|_F \,/\, \|h(L_G, L_R)X\|_F$ for $h(L_G, L_R)X$ when $h$ is an ideal lowpass filter (12) and a wave filter (13).
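As one concrete route to approximations of the kind compared in Figure 10 (a sketch under stated assumptions, not necessarily the paper's exact scheme): the ring-graph Laplacian $L_R$ is diagonalized exactly by the DFT, with eigenvalues $2 - 2\cos(2\pi k / T)$, so a joint filter $h(L_G, L_R)$ can be applied by an FFT along the time dimension followed by a univariate polynomial filter in $L_G$ for each temporal frequency. Here `fit_poly` is an assumed helper returning a callable that applies a fitted degree-`deg` polynomial via matrix-vector products.

```python
import numpy as np

def apply_joint_filter(LG, X, h, fit_poly, lam_max, deg=10):
    """Approximate h(L_G, L_R) X for an N x T time-vertex signal X."""
    N, T = X.shape
    Xhat = np.fft.fft(X, axis=1)                            # exact in time
    omega = 2 - 2 * np.cos(2 * np.pi * np.arange(T) / T)    # eigenvalues of L_R
    Yhat = np.empty_like(Xhat)
    for k in range(T):
        # Univariate slice of the joint filter at temporal frequency omega[k].
        p = fit_poly(lambda lam, w=omega[k]: h(lam, w), deg, lam_max)
        Yhat[:, k] = p(LG, Xhat[:, k])                      # polynomial in L_G
    return np.real(np.fft.ifft(Yhat, axis=1))
```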
Figure 11. Denoising of a time-varying sequence of 3D meshes of a walking dog. Top row: one element each of the dynamic sequences of (a) original, (b) noisy, and (c,d) denoised (using two different approximation methods) meshes. Bottom row: (e,f) a 5-nearest neighbor graph (and its spectral CDF) constructed from the entire sequence of noisy meshes, which is used to denoise the meshes at all time instances; (g) the joint time-vertex lowpass filter (16); and (h) the average relative errors between the original sequence of meshes and the denoised versions attained by exactly or approximately computing (15) with different polynomial approximation methods, for a range of polynomial degrees.