1. Introduction
Approximation using a sparse linear combination of elements from a fixed redundant family is widely used because of its concise representations and increased computational efficiency. It has been applied to signal processing, image compression, machine learning and PDE solvers (see [1,2,3,4,5,6,7,8,9,10]). Among others, simultaneous sparse approximation has been utilized in signal vector processing and multi-task learning (see [11,12,13,14]). It is well known that greedy-type algorithms are powerful tools for generating such sparse approximations (see [15,16,17,18,19]). In particular, vector greedy algorithms are very efficient at approximating a given finite number of target elements simultaneously (see [20,21,22,23]). In this article, we propose a new vector greedy algorithm, the Vector Weak Rescaled Pure Greedy Algorithm (VWRPGA), for simultaneous approximation. We estimate the error of the VWRPGA and show that its convergence rate on the convex hull of the dictionary is optimal.
Let $X$ be a real Banach space with norm $\|\cdot\|$. We say a set of elements $\mathcal{D}$ from $X$ is a dictionary if $\|g\|=1$ for each $g\in\mathcal{D}$ and $\overline{\operatorname{span}}\,\mathcal{D}=X$. We assume that every dictionary is symmetric, i.e., $g\in\mathcal{D}$ implies $-g\in\mathcal{D}$.
If $G_m$ is the output of a greedy algorithm after $m$ iterations, then the efficiency of the approximation can be measured by the decay of the error $\|f-G_m\|$ as $m\to\infty$. We are mainly concerned with this error: we want to know whether it tends to zero as $m\to\infty$ and, if it does, at what rate. To answer these questions, we need the following classes of elements.
For a general dictionary $\mathcal{D}$, we define the class of elements
$$\mathcal{A}_1^o(\mathcal{D},M):=\Big\{f\in X:\ f=\sum_{k\in\Lambda}c_kg_k,\ g_k\in\mathcal{D},\ \#\Lambda<\infty,\ \sum_{k\in\Lambda}|c_k|\le M\Big\},$$
and $\mathcal{A}_1(\mathcal{D},M)$ as the closure of $\mathcal{A}_1^o(\mathcal{D},M)$. Let $\mathcal{A}_1(\mathcal{D})$ be the union of the classes $\mathcal{A}_1(\mathcal{D},M)$ over all $M>0$. Denote $\mathcal{A}_1:=\mathcal{A}_1(\mathcal{D},1)$, the closure of the convex hull of the symmetric dictionary $\mathcal{D}$. For $f\in\mathcal{A}_1(\mathcal{D})$, we define its norm as
$$|f|_{\mathcal{A}_1(\mathcal{D})}:=\inf\{M>0:\ f\in\mathcal{A}_1(\mathcal{D},M)\}.$$
We recall some related results in a Hilbert space, since this kind of space is distinguished by its geometric features and practical applications. Let $H$ be a real Hilbert space with an inner product $\langle\cdot,\cdot\rangle$ and the norm $\|\cdot\|:=\langle\cdot,\cdot\rangle^{1/2}$.
The most natural greedy algorithm in a Hilbert space is the Pure Greedy Algorithm (PGA). This algorithm is also known as Matching Pursuit in signal processing [24]. We recall its definition from [15].
•Step 0: Define $f_0:=f$, $G_0:=0$.
•Step m:
- If $f_{m-1}=0$, stop the algorithm and define $G_k:=G_{m-1}$ for $k\ge m$.
- If $f_{m-1}\ne0$, choose an element $\varphi_m\in\mathcal{D}$ such that
$$|\langle f_{m-1},\varphi_m\rangle|=\sup_{g\in\mathcal{D}}|\langle f_{m-1},g\rangle|.$$
Define the next approximant to be
$$G_m:=G_{m-1}+\langle f_{m-1},\varphi_m\rangle\varphi_m,\qquad f_m:=f-G_m,$$
and proceed to Step $m+1$.
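To make the selection and update steps concrete, the following is a minimal numerical sketch of the PGA over a finite dictionary in $\mathbb{R}^d$ (a simplifying assumption of ours; the algorithm above is defined in an arbitrary Hilbert space, and over a finite dictionary the supremum becomes a maximum). All names and sizes are illustrative.

```python
import numpy as np

def pga(f, D, m):
    """Pure Greedy Algorithm: m iterations over a dictionary D (columns = unit atoms)."""
    G = np.zeros_like(f)
    for _ in range(m):
        r = f - G                          # residual f_{m-1}
        c = D.T @ r                        # inner products <f_{m-1}, g> for every atom
        k = int(np.argmax(np.abs(c)))      # atom with the largest |<f_{m-1}, g>|
        if c[k] == 0:                      # residual orthogonal to the dictionary
            break
        G = G + c[k] * D[:, k]             # G_m = G_{m-1} + <f_{m-1}, phi_m> phi_m
    return G

rng = np.random.default_rng(0)
D = rng.standard_normal((50, 200))
D /= np.linalg.norm(D, axis=0)             # normalize the atoms
f = D[:, :5] @ rng.standard_normal(5)      # a target that is 5-sparse in D
print(np.linalg.norm(f - pga(f, D, 30)))   # the error decays as m grows
```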
The first upper bound on the rate of convergence of the PGA for $f\in\mathcal{A}_1(\mathcal{D})$ was obtained in [15] as follows:
$$\|f-G_m(f,\mathcal{D})\|\le|f|_{\mathcal{A}_1(\mathcal{D})}\,m^{-1/6}.$$
Later, the above estimate of the PGA was improved in [25,26]: the exponent $1/6$ was replaced first by $11/62$, and then by a slightly larger exponent expressed in terms of the root $s$ of a certain equation on a closed interval.
Note that when $\mathcal{D}$ is an orthonormal basis of $H$, it is not difficult to prove that for any $f\in\mathcal{A}_1(\mathcal{D})$, there holds
$$\|f-G_m(f,\mathcal{D})\|\le|f|_{\mathcal{A}_1(\mathcal{D})}\,(m+1)^{-1/2}.\qquad(1)$$
In addition, there exists an element $f\in\mathcal{A}_1(\mathcal{D})$ (see [27]) such that
$$\|f-G_m(f,\mathcal{D})\|\ge C\,m^{-1/2}.$$
Thus, inequality (1) cannot be improved for orthonormal bases. A natural question arises: does inequality (1) hold for any dictionary $\mathcal{D}$? Unfortunately, the answer is negative.
In fact, Livshitz and Temlyakov [28] proved that there exist a dictionary $\mathcal{D}$, a positive constant $C$ and an element $f\in\mathcal{A}_1(\mathcal{D})$ such that
$$\|f-G_m(f,\mathcal{D})\|\ge C\,m^{-0.27}.$$
This lower bound on the convergence rate of the PGA indicates that this algorithm does not attain the rate $m^{-1/2}$ for all $f\in\mathcal{A}_1(\mathcal{D})$.
In [15], the idea of best approximation was introduced into the greedy algorithm, which led to the Orthogonal Greedy Algorithm (OGA). To construct an approximation, the OGA takes the orthogonal projection of $f$ onto the subspace generated by all the chosen atoms $\varphi_1,\dots,\varphi_m$. We recall its definition from [15].
•Step 0: Define $f_0:=f$, $G_0:=0$.
•Step m:
- If $f_{m-1}=0$, stop the algorithm and define $G_k:=G_{m-1}$ for $k\ge m$.
- If $f_{m-1}\ne0$, choose an element $\varphi_m\in\mathcal{D}$ such that
$$|\langle f_{m-1},\varphi_m\rangle|=\sup_{g\in\mathcal{D}}|\langle f_{m-1},g\rangle|.$$
Define the next approximant to be
$$G_m:=P_{H_m}(f),\qquad f_m:=f-G_m,$$
and proceed to Step $m+1$, where $P_{H_m}$ is the orthogonal projection onto $H_m:=\mathrm{span}\{\varphi_1,\dots,\varphi_m\}$.
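Under the same finite-dictionary assumptions as before, a sketch of the OGA follows; the orthogonal projection onto the span of the selected atoms is computed here by least squares.

```python
import numpy as np

def oga(f, D, m):
    """Orthogonal Greedy Algorithm: PGA-type selection plus projection on the span."""
    idx = []
    G = np.zeros_like(f)
    for _ in range(m):
        r = f - G
        c = D.T @ r
        k = int(np.argmax(np.abs(c)))
        if c[k] == 0:
            break
        if k not in idx:
            idx.append(k)
        A = D[:, idx]                                 # atoms chosen so far
        coef, *_ = np.linalg.lstsq(A, f, rcond=None)  # least squares = projection
        G = A @ coef                                  # orthogonal projection of f
    return G
```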
In [15], it is shown that for any $f\in\mathcal{A}_1(\mathcal{D})$, the output of the OGA($H$, $\mathcal{D}$) satisfies
$$\|f-G_m(f,\mathcal{D})\|\le|f|_{\mathcal{A}_1(\mathcal{D})}\,m^{-1/2}.$$
Note that when $\mathcal{D}$ is an orthonormal basis of $H$, the OGA($H$, $\mathcal{D}$) coincides with the PGA($H$, $\mathcal{D}$). So, the rate $m^{-1/2}$ is sharp.
The Relaxed Greedy Algorithm (RGA) is another modification of the PGA. We recall its definition from [15].
•Step 0: Define $f_0:=f$, $G_0:=0$.
•Step m:
- If $f_{m-1}=0$, stop the algorithm and define $G_k:=G_{m-1}$ for $k\ge m$.
- If $f_{m-1}\ne0$, choose an element $\varphi_m\in\mathcal{D}$ such that
$$\langle f_{m-1},\varphi_m\rangle=\sup_{g\in\mathcal{D}}\langle f_{m-1},g\rangle.$$
For $m\ge1$, define the next approximant to be
$$G_m:=\Big(1-\frac1m\Big)G_{m-1}+\frac1m\varphi_m,\qquad f_m:=f-G_m,$$
and proceed to Step $m+1$.
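A sketch of the RGA under the same finite-dictionary assumptions; we use the classical relaxation weight $1/m$ at step $m$, which is tailored to targets in the convex hull of the dictionary.

```python
import numpy as np

def rga(f, D, m):
    """Relaxed Greedy Algorithm: convex combination of G_{m-1} and the chosen atom."""
    G = np.zeros_like(f)
    for j in range(1, m + 1):
        r = f - G
        c = D.T @ r
        k = int(np.argmax(np.abs(c)))
        atom = np.sign(c[k]) * D[:, k]       # symmetric dictionary: use the signed atom
        G = (1.0 - 1.0 / j) * G + atom / j   # relaxation weight 1/m at step m
    return G
```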
It is shown in [15] that the RGA also achieves the rate $m^{-1/2}$ on $\mathcal{A}_1(\mathcal{D})$.
The Rescaled Pure Greedy Algorithm (RPGA) [17] is another kind of greedy algorithm: it modifies the PGA by rescaling the PGA output at each iteration. It is defined as follows.
•Step 0: Define $f_0:=f$, $G_0:=0$.
•Step m:
- If $f_{m-1}=0$, stop the algorithm and define $G_k:=G_{m-1}$ for $k\ge m$.
- If $f_{m-1}\ne0$, choose an element $\varphi_m\in\mathcal{D}$ such that
$$|\langle f_{m-1},\varphi_m\rangle|=\sup_{g\in\mathcal{D}}|\langle f_{m-1},g\rangle|.$$
With
$$\widehat{G}_m:=G_{m-1}+\langle f_{m-1},\varphi_m\rangle\varphi_m,\qquad s_m:=\frac{\langle f,\widehat{G}_m\rangle}{\|\widehat{G}_m\|^2},$$
define the next approximant to be
$$G_m:=s_m\widehat{G}_m,\qquad f_m:=f-G_m,$$
and proceed to Step $m+1$.
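A sketch of the RPGA under the same finite-dictionary assumptions. The only change from the PGA is the final rescaling, a single one-dimensional projection per iteration.

```python
import numpy as np

def rpga(f, D, m):
    """Rescaled Pure Greedy Algorithm: one PGA step, then rescale the approximant."""
    G = np.zeros_like(f)
    for _ in range(m):
        r = f - G
        c = D.T @ r
        k = int(np.argmax(np.abs(c)))
        if c[k] == 0:
            break
        Ghat = G + c[k] * D[:, k]          # the plain PGA update
        nrm2 = Ghat @ Ghat
        if nrm2 == 0:
            break
        G = ((f @ Ghat) / nrm2) * Ghat     # rescaling: project f onto span{Ghat}
    return G
```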
In [17], the convergence rate of the RPGA on $\mathcal{A}_1(\mathcal{D})$ was shown to be the optimal rate $m^{-1/2}$.
It is worth noting that the supremum of the inner product might not be attained. To remedy this problem, the original condition on the selection of $\varphi_m$ is replaced by
$$|\langle f_{m-1},\varphi_m\rangle|\ge t_m\sup_{g\in\mathcal{D}}|\langle f_{m-1},g\rangle|,$$
where $t_m\in(0,1]$. This is often referred to as the "weak" condition, and $\tau=\{t_m\}_{m=1}^\infty$ is called the weakness sequence. The study of the weak versions of the above algorithms can be found in [15,25,29,30].
Meanwhile, building simultaneous approximations for a given vector of elements leads to the so-called vector greedy algorithms. Instead of running the algorithm separately for each element of a finite collection $f^{(1)},\dots,f^{(N)}$, a vector greedy algorithm obtains a simultaneous approximation of all the elements in a single run. Hence, the computational cost and the storage of information can be reduced greatly. The question is then how well this type of algorithm performs; namely, we need to measure its efficiency via its error bound. The Vector Weak Pure Greedy Algorithm (VWPGA, also referred to as the Vector Weak Greedy Algorithm (VWGA)) and the Vector Weak Orthogonal Greedy Algorithm (VWOGA) have been introduced and studied in [21,22,23].
We recall the definitions of the VWPGA and the VWOGA from [23] as follows. Let $\tau=\{t_m\}_{m=1}^\infty$, $t_m\in(0,1]$, be a given sequence.
VWPGA($H$, $\mathcal{D}$, $\tau$): $\mathbf{f}=(f^{(1)},\dots,f^{(N)})$ is the target element.
•Step 0: Define $f^{(i)}_0:=f^{(i)}$ and $G^{(i)}_0:=0$ for $i=1,\dots,N$.
•Step m:
- If $f^{(i)}_{m-1}=0$ for all $i$, stop the algorithm and define $G^{(i)}_k:=G^{(i)}_{m-1}$ for $k\ge m$.
- Otherwise, choose an element $\varphi_m\in\mathcal{D}$ such that
$$\max_{1\le i\le N}|\langle f^{(i)}_{m-1},\varphi_m\rangle|\ge t_m\sup_{g\in\mathcal{D}}\max_{1\le i\le N}|\langle f^{(i)}_{m-1},g\rangle|.$$
Define the next approximant to be
$$G^{(i)}_m:=G^{(i)}_{m-1}+\langle f^{(i)}_{m-1},\varphi_m\rangle\varphi_m,\qquad f^{(i)}_m:=f^{(i)}-G^{(i)}_m,\quad i=1,\dots,N,$$
and proceed to Step $m+1$.
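A sketch of the VWPGA for $N$ targets, again over a finite dictionary. The selection rule coded below (the atom with the largest inner product over all residuals) is one natural reading of the weak condition; the precise rule of [23] should be consulted for the exact formulation.

```python
import numpy as np

def vwpga(F, D, m, t=1.0):
    """VWPGA sketch: one shared atom per step updates all N residuals (columns of F)."""
    G = np.zeros_like(F)
    for _ in range(m):
        R = F - G                           # residuals f^(i)_{m-1}, one per column
        C = D.T @ R                         # K x N matrix of inner products
        score = np.max(np.abs(C), axis=1)   # best achievable value for each atom
        best = score.max()
        if best == 0:
            break
        k = int(np.flatnonzero(score >= t * best)[0])   # weak greedy choice
        G = G + np.outer(D[:, k], C[k, :])  # PGA update of every target with phi_m
    return G
```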
VWOGA($H$, $\mathcal{D}$, $\tau$): $\mathbf{f}=(f^{(1)},\dots,f^{(N)})$ is the target element.
•Step 0: Define $f^{(i)}_0:=f^{(i)}$ and $G^{(i)}_0:=0$ for $i=1,\dots,N$.
•Step m:
- If $f^{(i)}_{m-1}=0$ for all $i$, stop the algorithm and define $G^{(i)}_k:=G^{(i)}_{m-1}$ for $k\ge m$.
- Otherwise, let $j:=j(m)$ be such that
$$\|f^{(j)}_{m-1}\|=\max_{1\le i\le N}\|f^{(i)}_{m-1}\|.$$
Choose an element $\varphi_m\in\mathcal{D}$ such that
$$|\langle f^{(j)}_{m-1},\varphi_m\rangle|\ge t_m\sup_{g\in\mathcal{D}}|\langle f^{(j)}_{m-1},g\rangle|.$$
Define the next approximant to be
$$G^{(i)}_m:=P_{H_m}\big(f^{(i)}\big),\qquad f^{(i)}_m:=f^{(i)}-G^{(i)}_m,\quad i=1,\dots,N,$$
and proceed to Step $m+1$, where $P_{H_m}$ is the orthogonal projection onto $H_m:=\mathrm{span}\{\varphi_1,\dots,\varphi_m\}$.
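A sketch of the VWOGA under the same assumptions: the worst residual drives the selection, after which every target is projected onto the span of the chosen atoms (an $m$-dimensional problem for each target).

```python
import numpy as np

def vwoga(F, D, m, t=1.0):
    """VWOGA sketch: select by the worst residual, then project every target."""
    idx, G = [], np.zeros_like(F)
    for _ in range(m):
        R = F - G
        j = int(np.argmax(np.linalg.norm(R, axis=0)))   # j(m): the largest residual
        c = D.T @ R[:, j]
        best = np.abs(c).max()
        if best == 0:
            break
        k = int(np.flatnonzero(np.abs(c) >= t * best)[0])
        if k not in idx:
            idx.append(k)
        A = D[:, idx]
        coef, *_ = np.linalg.lstsq(A, F, rcond=None)    # N projections onto the span
        G = A @ coef
    return G
```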
We list the results on the convergence rates of the VWPGA and the VWOGA from [23] as follows.

Theorem 1. Let $\tau=\{t_m\}_{m=1}^\infty$ be a given real sequence. Then, for any $f^{(1)},\dots,f^{(N)}\in\mathcal{A}_1(\mathcal{D})$, the output of the VWPGA satisfies the error bound established in [23].

Theorem 2. Let $\tau=\{t_m\}_{m=1}^\infty$ be a given real sequence. Then, for any $f^{(1)},\dots,f^{(N)}\in\mathcal{A}_1(\mathcal{D})$, the output of the VWOGA satisfies the error bound established in [23].

Improvements to the above estimates are made in [19,21,22]. The results indicate that the VWOGA achieves a better convergence rate on $\mathcal{A}_1(\mathcal{D})$ than the VWPGA.
In [23], the authors gave a sufficient condition for the convergence of the VWPGA.

Theorem 3. Assume that $\sum_{m=1}^\infty\frac{t_m}{m}=\infty$. Then, for any dictionary $\mathcal{D}$ and any finite collection of elements $f^{(1)},\dots,f^{(N)}\in H$, the VWPGA satisfies
$$\lim_{m\to\infty}\|f^{(i)}_m\|=0,\quad i=1,\dots,N.$$

Motivated by these studies, we design the Vector Weak Rescaled Pure Greedy Algorithm (VWRPGA) and study its efficiency. The remainder of the paper is organized as follows. In Section 2, we deal with the case of Hilbert spaces. In Section 3, we deal with the case of Banach spaces. In Section 4, we draw the conclusions. Below, we provide more details.
In Section 2, we define the VWRPGA in Hilbert spaces and study its approximation properties. We first prove that $\sum_{m=1}^\infty t_m^2=\infty$ is a sufficient condition for the convergence of the VWRPGA for any dictionary $\mathcal{D}$ and any $f^{(i)}\in H$, $i=1,\dots,N$. This convergence condition is weaker than that of the VWPGA. Then, we prove an error bound for the VWRPGA on $\mathcal{A}_1(\mathcal{D})$.
When $t_m=t\in(0,1]$ for all $m$, we show that the convergence rate of the VWRPGA on $\mathcal{A}_1(\mathcal{D})$ is of order $m^{-1/2}$, which is sharp. This convergence rate is better than that of the VWPGA. In particular, this advantage is more obvious when $N$ is large. The VWRPGA is also more efficient than the VWOGA from the viewpoint of computational complexity. This is because, for $N$ target elements, the VWRPGA only needs to solve $N$ one-dimensional optimization problems at each step, while the VWOGA involves $N$ $m$-dimensional optimization problems.
In Section 3, we define the VWRPGA for uniformly smooth Banach spaces and obtain a sufficient condition for its convergence in this setting. This appears to be the first convergence result for vector greedy algorithms in the Banach space setting. Then, we derive the error bound of the VWRPGA. The results show that the convergence rate of the VWRPGA on $\mathcal{A}_1(\mathcal{D})$ is sharp. We compare the approximation properties of the VWRPGA with those of the Vector Weak Chebyshev Greedy Algorithm (VWCGA) and the Vector Weak Relaxed Greedy Algorithm (VWRGA). We show that the VWRPGA has better convergence properties than the VWRGA. Similarly, the computational complexity of the VWRPGA is essentially smaller than that of the VWCGA and the VWRGA.
In Section 4, we draw the conclusions of our study. Our results show that the VWRPGA is the simplest vector greedy algorithm for simultaneous approximation, with the best convergence properties and an optimal convergence rate. We also discuss possible applications of the VWRPGA in multi-task learning and signal vector processing.
2. The VWRPGA for Hilbert Spaces
In this section, we define the VWRPGA in Hilbert spaces and obtain a sufficient condition for its convergence, together with an estimate of its error bound. Based on these results, we compare the VWRPGA with the VWPGA and the VWOGA.
Firstly, we recall the definition of the Weak Rescaled Pure Greedy Algorithm (WRPGA) in Hilbert spaces from [17]. Let $\tau=\{t_m\}_{m=1}^\infty$, $t_m\in(0,1]$, be a given sequence. The WRPGA consists of the following steps:
•Step 0: Define $f_0:=f$, $G_0:=0$.
•Step m:
- If $f_{m-1}=0$, stop the algorithm and define $G_k:=G_{m-1}$ for $k\ge m$.
- If $f_{m-1}\ne0$, choose an element $\varphi_m\in\mathcal{D}$ such that
$$|\langle f_{m-1},\varphi_m\rangle|\ge t_m\sup_{g\in\mathcal{D}}|\langle f_{m-1},g\rangle|.$$
With
$$\widehat{G}_m:=G_{m-1}+\langle f_{m-1},\varphi_m\rangle\varphi_m,\qquad s_m:=\frac{\langle f,\widehat{G}_m\rangle}{\|\widehat{G}_m\|^2},$$
define the next approximation to be
$$G_m:=s_m\widehat{G}_m,\qquad f_m:=f-G_m,$$
and proceed to Step $m+1$.
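A sketch of the WRPGA with a constant weakness parameter $t$ (our simplification): any atom capturing at least a $t$-fraction of the best inner product is admissible, and we simply take the first such atom.

```python
import numpy as np

def wrpga(f, D, m, t=0.5):
    """WRPGA sketch: weak greedy selection followed by the RPGA rescaling."""
    G = np.zeros_like(f)
    for _ in range(m):
        r = f - G
        c = D.T @ r
        best = np.abs(c).max()
        if best == 0:
            break
        k = int(np.flatnonzero(np.abs(c) >= t * best)[0])  # weak greedy selection
        Ghat = G + c[k] * D[:, k]
        nrm2 = Ghat @ Ghat
        if nrm2 == 0:
            break
        G = ((f @ Ghat) / nrm2) * Ghat                     # rescale as in the RPGA
    return G
```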
The error bound of the WRPGA has been obtained as follows.

Theorem 4 (see Theorem 4.1 in [17]). If $f\in\mathcal{A}_1(\mathcal{D})$, then the output of the WRPGA satisfies the error estimate
$$\|f-G_m\|\le|f|_{\mathcal{A}_1(\mathcal{D})}\Big(1+\sum_{k=1}^m t_k^2\Big)^{-1/2}.$$

Based on the WRPGA, we can define the VWRPGA. Let $\tau=\{t_m\}_{m=1}^\infty$, $t_m\in(0,1]$, be a given sequence. Now, we define the VWRPGA using the following steps:
VWRPGA($H$, $\mathcal{D}$, $\tau$): Given $\mathbf{f}=(f^{(1)},\dots,f^{(N)})\in H^N$:
•Step 0: Define $f^{(i)}_0:=f^{(i)}$ and $G^{(i)}_0:=0$ for $i=1,\dots,N$.
•Step m:
- If $f^{(i)}_{m-1}=0$ for all $i$, stop the algorithm and define $G^{(i)}_k:=G^{(i)}_{m-1}$ for $k\ge m$.
- Otherwise, let $j:=j(m)$ be such that
$$\|f^{(j)}_{m-1}\|=\max_{1\le i\le N}\|f^{(i)}_{m-1}\|.$$
Choose an element $\varphi_m\in\mathcal{D}$ such that
$$|\langle f^{(j)}_{m-1},\varphi_m\rangle|\ge t_m\sup_{g\in\mathcal{D}}|\langle f^{(j)}_{m-1},g\rangle|.$$
With
$$\widehat{G}^{(i)}_m:=G^{(i)}_{m-1}+\langle f^{(i)}_{m-1},\varphi_m\rangle\varphi_m,\qquad s^{(i)}_m:=\frac{\langle f^{(i)},\widehat{G}^{(i)}_m\rangle}{\|\widehat{G}^{(i)}_m\|^2},$$
define the next approximation to be
$$G^{(i)}_m:=s^{(i)}_m\widehat{G}^{(i)}_m,\qquad f^{(i)}_m:=f^{(i)}-G^{(i)}_m,\quad i=1,\dots,N,$$
and proceed to Step $m+1$.
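A sketch of the VWRPGA just defined, under the finite-dictionary assumptions used throughout: the worst residual selects the atom, and each target is then updated by its own rescaled one-dimensional step.

```python
import numpy as np

def vwrpga(F, D, m, t=1.0):
    """VWRPGA sketch: worst-residual selection, then one rescaled step per target."""
    G = np.zeros_like(F)
    for _ in range(m):
        R = F - G
        j = int(np.argmax(np.linalg.norm(R, axis=0)))   # j(m): the largest residual
        c = D.T @ R[:, j]
        best = np.abs(c).max()
        if best == 0:
            break
        k = int(np.flatnonzero(np.abs(c) >= t * best)[0])
        phi = D[:, k]
        for i in range(F.shape[1]):                     # N one-dimensional problems
            Ghat = G[:, i] + (phi @ R[:, i]) * phi
            nrm2 = Ghat @ Ghat
            if nrm2 > 0:
                G[:, i] = ((F[:, i] @ Ghat) / nrm2) * Ghat   # rescale each approximant
    return G
```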
We establish in this section two typical results on the approximation properties of the VWRPGA($H$, $\mathcal{D}$, $\tau$). We first give a sufficient condition for the convergence of the VWRPGA for any dictionary $\mathcal{D}$ and any $f^{(i)}\in H$, $i=1,\dots,N$.

Theorem 5. Assume $\sum_{m=1}^\infty t_m^2=\infty$. Then, the VWRPGA converges for any dictionary $\mathcal{D}$ and any $f^{(i)}\in H$, $i=1,\dots,N$.
In the proof of Theorem 5, we will reduce the approximation of a general element to that of an element from $\mathcal{A}_1(\mathcal{D})$. To this end, we recall from [31] the following lemmas on the approximation properties of $\mathcal{A}_1(\mathcal{D})$.

Lemma 1. Let $X$ be a Banach space and let $\mathcal{D}$ be a dictionary. Then, for any $f\in X$ and any $\epsilon>0$, there exists $f^{\epsilon}$ such that
$$\|f-f^{\epsilon}\|\le\epsilon\quad\text{and}\quad f^{\epsilon}/A(\epsilon)\in\mathcal{A}_1(\mathcal{D})$$
with some number $A(\epsilon)>0$.

Lemma 2. For any $f\in H$ and any dictionary $\mathcal{D}$, we have
$$\sup_{g\in\mathcal{D}}|\langle f,g\rangle|=\sup_{h\in\mathcal{A}_1(\mathcal{D})}|\langle f,h\rangle|.$$

Proof of Theorem 5. Note that $G^{(i)}_m$ is the orthogonal projection of $f^{(i)}$ onto the one-dimensional space $\mathrm{span}\{\widehat{G}^{(i)}_m\}$. Thus, it is the best approximation of $f^{(i)}$ from $\mathrm{span}\{\widehat{G}^{(i)}_m\}$.
Let $f^{(i)}_m:=f^{(i)}-G^{(i)}_m$, $i=1,\dots,N$, be the residuals of the VWRPGA. By the definition of $G^{(i)}_m$ and the choice of $s^{(i)}_m$, we have
$$\|f^{(i)}_m\|^2\le\|f^{(i)}_{m-1}\|^2-\langle f^{(i)}_{m-1},\varphi_m\rangle^2.\qquad(2)$$
The latter inequality implies that $\{\|f^{(i)}_m\|\}_{m\ge0}$ is a decreasing sequence. According to the Monotone Convergence Theorem, we know that $\lim_{m\to\infty}\|f^{(i)}_m\|$ exists, $i=1,\dots,N$.
We prove that $\lim_{m\to\infty}\|f^{(i)}_m\|=0$ by contradiction. Assume, on the contrary, that $\lim_{m\to\infty}\|f^{(i)}_m\|=\alpha>0$ for some $i$. Then, for any $m$, we have $\|f^{(i)}_m\|\ge\alpha$. By (2), we obtain (3). By the inequality $1-x\le e^{-x}$, $x\ge0$, we obtain (4).
Next, we derive a lower estimate for $\sup_{g\in\mathcal{D}}|\langle f^{(i)}_m,g\rangle|$.
Set $\epsilon>0$. In view of Lemma 1, we can find $f^{\epsilon}$ such that
$$\|f^{(i)}-f^{\epsilon}\|\le\epsilon\quad\text{and}\quad f^{\epsilon}/A(\epsilon)\in\mathcal{A}_1(\mathcal{D})$$
with some number $A(\epsilon)>0$.
Since $G^{(i)}_m$ is the orthogonal projection of $f^{(i)}$ onto $\mathrm{span}\{\widehat{G}^{(i)}_m\}$, we have (5). Combining (4) and (5) with the choice of $f^{\epsilon}$, we obtain (6). Combining (3) with (6), we obtain an upper estimate for the residual. The assumption $\sum_{m=1}^\infty t_m^2=\infty$ implies that this upper estimate tends to zero as $m\to\infty$. On the other hand, $\|f^{(i)}_m\|\ge\alpha>0$ for all $m$. Hence, we obtain a contradiction, which proves this theorem. □
Remark 1. It is known from Theorem 2.1 in [32] that $\sum_{m=1}^\infty t_m^2=\infty$ is also a necessary condition for the convergence of the VWRPGA.

Remark 2. According to the Cauchy–Schwarz inequality, if $\sum_{m=1}^\infty t_m^2<\infty$, then $\sum_{m=1}^\infty t_m/m<\infty$; hence, the convergence condition of the VWPGA implies that of the VWRPGA. On the other hand, taking $t_m=m^{-1/2}$, we notice that $\sum_{m=1}^\infty t_m^2=\infty$ while $\sum_{m=1}^\infty t_m/m<\infty$. Therefore, the convergence condition of the VWRPGA is weaker than that of the VWPGA.
The following theorem gives the error bound of the VWRPGA($H$, $\mathcal{D}$, $\tau$) for targets from $\mathcal{A}_1(\mathcal{D})$.

Theorem 6. Let $\tau=\{t_m\}_{m=1}^\infty$ be a weakness sequence. If $f^{(i)}\in\mathcal{A}_1(\mathcal{D})$, $i=1,\dots,N$, then the output of the VWRPGA($H$, $\mathcal{D}$, $\tau$) satisfies an error estimate of WRPGA type (cf. Theorem 4).

Proof. We establish the approximation error of the VWRPGA based on the methods of [25]. The main idea of this proof is that the VWRPGA can be seen as a realization of the WRPGA with a particular weakness sequence.
Let $f^{(i)}\in\mathcal{A}_1(\mathcal{D})$, $i=1,\dots,N$. Under this assumption, and using the fact that the sequence $\{\|f^{(i)}_m\|\}_{m\ge0}$ is decreasing, we only need to prove the estimate (7) below.
At step $k$, the VWRPGA chooses $\varphi_k$ from the dictionary $\mathcal{D}$ in terms of only one remainder $f^{(j(k))}_{k-1}$ from among $f^{(1)}_{k-1},\dots,f^{(N)}_{k-1}$. Thus, each target has been used a different number of times to choose atoms when the VWRPGA arrives at step $m$. Now, we record the usage of each target. For every $i\in\{1,\dots,N\}$, denote $\Lambda_i:=\{k\le m:\ j(k)=i\}$ ($j(k)$ is defined in the definition of the VWRPGA). Then, we have
$$\sum_{i=1}^N|\Lambda_i|=m.$$
Using $\sum_{i=1}^N|\Lambda_i|=m$, we can find $i_0$ such that
$$|\Lambda_{i_0}|\ge m/N.$$
Next, let $i=i_0$. Now, we only consider the element $f^{(i)}$. For $k\le m$, we can obtain $f^{(i)}_k$ as an application of the WRPGA with the weakness sequence $\tau_i=\{t'_k\}$ given by
$$t'_k:=\begin{cases}t_k,&k\in\Lambda_i,\\0,&k\notin\Lambda_i.\end{cases}$$
Therefore, by Theorem 4, we obtain the corresponding error bound for $f^{(i_0)}$.
Together with (7), we complete the proof of Theorem 6. □
We recall the theorem in [21] about the error estimate of the VWPGA.

Theorem 7. Let $\tau=\{t_m\}_{m=1}^\infty$ be a decreasing sequence. Then, for any $f^{(1)},\dots,f^{(N)}\in\mathcal{A}_1(\mathcal{D})$, the output of the VWPGA satisfies the error bound of [21].

We observe from Theorem 7 that, for a fixed $m$, the error of the VWPGA increases as the number of target elements increases; the exponent in the bound is close to zero as soon as $N$ is sufficiently large.

Taking $t_m=t$, $m=1,2,\dots$, in Theorem 7, we obtain the following theorem, which gives the convergence rate of the VWPGA.

Theorem 8. Let $t_m=t$, $m=1,2,\dots$. Then, for any $f^{(1)},\dots,f^{(N)}\in\mathcal{A}_1(\mathcal{D})$, the output of the VWPGA satisfies the corresponding bound of Theorem 7.

Again, by taking $t_m=t$ in Theorem 6, we obtain the following theorem.

Theorem 9. Let $t_m=t$, $m=1,2,\dots$. If $f^{(i)}\in\mathcal{A}_1(\mathcal{D})$, $i=1,\dots,N$, then the VWRPGA($H$, $\mathcal{D}$, $\tau$) achieves the rate $m^{-1/2}$, up to a constant depending on $t$ and $N$.

Remark 3. From Theorems 8 and 9, we see that the VWRPGA provides a significantly better convergence rate than the VWPGA. In particular, this advantage is more obvious when $N$ is large.
Remark 4. It is known from Theorems 2 and 9 that the approximation properties of the VWRPGA are almost the same as those of the VWOGA, while the VWRPGA is simpler than the VWOGA from the viewpoint of computational complexity. For $N$ target elements, one can see from the definitions of the algorithms that the VWOGA needs to solve $N$ $m$-dimensional optimization problems at step $m$, whereas the VWRPGA only needs to solve $N$ one-dimensional optimization problems. This makes the VWRPGA easier to implement than the VWOGA in practical applications.
3. The VWRPGA for Banach Spaces
In this section, we consider the VWRPGA in the context of Banach spaces. We remark that there are two natural generalizations of the PGA to a Banach space $X$: the X-greedy algorithm and the dual greedy algorithm. However, there are no general results on the convergence and error bounds of these two algorithms; cf. [29]. On the other hand, the WOGA, WRGA, WRPGA and VWOGA have been successfully generalized to the Banach space setting. We first recall from [31] the definition of the Weak Chebyshev Greedy Algorithm (WCGA), which is a natural generalization of the WOGA.
For any non-zero element $f\in X$, we denote by $F_f$ a norming functional for $f$:
$$\|F_f\|=1,\qquad F_f(f)=\|f\|.$$
The existence of such a functional is guaranteed by the Hahn–Banach theorem. Let $\tau=\{t_m\}_{m=1}^\infty$, $t_m\in(0,1]$, be a given sequence. The WCGA is defined as follows.
•Step 0: Define $f_0:=f$, $G_0:=0$.
•Step m:
- If $f_{m-1}=0$, stop the algorithm and define $G_k:=G_{m-1}$ for $k\ge m$.
- If $f_{m-1}\ne0$, choose an element $\varphi_m\in\mathcal{D}$ such that
$$F_{f_{m-1}}(\varphi_m)\ge t_m\sup_{g\in\mathcal{D}}F_{f_{m-1}}(g).$$
Define $G_m$ to be the best approximant to $f$ from $\Phi_m:=\mathrm{span}\{\varphi_1,\dots,\varphi_m\}$, set $f_m:=f-G_m$, and proceed to Step $m+1$.
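A sketch of the WCGA in the sequence space $\ell_p$ (our choice of space; there the norming functional has the explicit form coded below). The best approximant from the span is computed by a generic numerical minimizer, so this is an illustration rather than an efficient implementation.

```python
import numpy as np
from scipy.optimize import minimize

def norming_functional(f, p):
    """Norming functional of a nonzero f in l_p, represented as a vector in l_q."""
    return np.sign(f) * (np.abs(f) / np.linalg.norm(f, ord=p)) ** (p - 1)

def wcga(f, D, m, p=4.0, t=1.0):
    """WCGA sketch in l_p: dual greedy selection + best approximation from the span."""
    idx, G = [], np.zeros_like(f)
    for _ in range(m):
        r = f - G
        if np.linalg.norm(r, ord=p) < 1e-10:
            break
        vals = np.abs(norming_functional(r, p) @ D)   # |F_{f_{m-1}}(g)| for every atom
        k = int(np.flatnonzero(vals >= t * vals.max())[0])
        if k not in idx:
            idx.append(k)
        A = D[:, idx]
        res = minimize(lambda c: np.linalg.norm(f - A @ c, ord=p),
                       np.zeros(len(idx)), method="Powell")
        G = A @ res.x                                 # Chebyshev (best) approximant
    return G
```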
To estimate the error of the WCGA, we shall utilize some geometric aspects of Banach spaces. For a Banach space $X$, we define $\rho(u)$, the modulus of smoothness of $X$, as
$$\rho(u):=\sup_{\|x\|=\|y\|=1}\Big(\frac{\|x+uy\|+\|x-uy\|}{2}-1\Big),\qquad u>0.$$
A uniformly smooth Banach space is one with the property
$$\lim_{u\to0}\frac{\rho(u)}{u}=0.$$
We shall only consider Banach spaces whose modulus of smoothness satisfies the inequality
$$\rho(u)\le\gamma u^q,\qquad 1<q\le2,$$
where $\gamma>0$ is a constant independent of $u$.
A typical example of a uniformly smooth Banach space is the Lebesgue space $L_p$, $1<p<\infty$. It is known from [33] that
$$\rho(u)\le\begin{cases}u^p/p,&1<p\le2,\\(p-1)u^2/2,&2\le p<\infty.\end{cases}\qquad(8)$$
Moreover, we obtain from [34] that for any $X$,
$$\rho(u)\ge(1+u^2)^{1/2}-1,$$
so, in the assumption $\rho(u)\le\gamma u^q$, the restriction $q\le2$ is natural.
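These bounds are easy to probe numerically. The following sketch samples random pairs of unit vectors in a discrete $\ell_p$ space (our stand-in for $L_p$) and checks that the sampled values of the expression defining $\rho(u)$ stay below the bounds in (8).

```python
import numpy as np

rng = np.random.default_rng(1)

def rho_sample(p, u, trials=5000, n=20):
    """Lower estimate of rho(u) from random pairs of unit vectors in l_p."""
    best = 0.0
    for _ in range(trials):
        x, y = rng.standard_normal((2, n))
        x /= np.linalg.norm(x, ord=p)
        y /= np.linalg.norm(y, ord=p)
        val = (np.linalg.norm(x + u * y, ord=p) +
               np.linalg.norm(x - u * y, ord=p)) / 2 - 1
        best = max(best, val)
    return best

u = 0.3
print(rho_sample(1.5, u), u**1.5 / 1.5)          # sampled value vs. bound, p = 1.5
print(rho_sample(4.0, u), (4.0 - 1) * u**2 / 2)  # sampled value vs. bound, p = 4
```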
The following error bound of the WCGA on $\mathcal{A}_1(\mathcal{D})$ has been established in [31].
Theorem 10. Let $X$ be a Banach space with modulus of smoothness $\rho(u)\le\gamma u^q$, $1<q\le2$. If $f\in\mathcal{A}_1(\mathcal{D})$, then the output of the WCGA($X$, $\mathcal{D}$, $\tau$) satisfies the inequality
$$\|f-G_m\|\le C(q,\gamma)\,|f|_{\mathcal{A}_1(\mathcal{D})}\Big(1+\sum_{k=1}^m t_k^p\Big)^{-1/p},\qquad p=\frac{q}{q-1},$$
where the constant $C(q,\gamma)$ depends only on $q$ and $\gamma$.

Taking $t_m=1$, $m=1,2,\dots$, Theorem 10 implies the following corollary, which can be seen in [31].

Corollary 1. Let $X$ be a Banach space with modulus of smoothness $\rho(u)\le\gamma u^q$, $1<q\le2$. Then, for any $f\in\mathcal{A}_1(\mathcal{D})$, the output of the WCGA($X$, $\mathcal{D}$) satisfies the inequality
$$\|f-G_m\|\le C(q,\gamma)\,|f|_{\mathcal{A}_1(\mathcal{D})}\,m^{-1+1/q}.$$

In order to show that this convergence rate cannot be improved, we now take $X=L_p$ as an example.
Let $1<p\le2$ be fixed. Combining Corollary 1 with inequality (8), we have for any dictionary $\mathcal{D}$ and any $f\in\mathcal{A}_1(\mathcal{D})$
$$\|f-G_m\|_{L_p}\le C(p)\,|f|_{\mathcal{A}_1(\mathcal{D})}\,m^{-1+1/p}.\qquad(9)$$
When $\mathcal{D}$ is a wavelet basis of $L_p$, it is known from [35] that there is an $f\in\mathcal{A}_1(\mathcal{D})$ for which the rate in (9) is attained. Thus, inequality (9) could not be improved.
Similarly, let $2\le p<\infty$ be fixed. Combining Corollary 1 with inequality (8), we have for any dictionary $\mathcal{D}$ and any $f\in\mathcal{A}_1(\mathcal{D})$
$$\|f-G_m\|_{L_p}\le C(p)\,|f|_{\mathcal{A}_1(\mathcal{D})}\,m^{-1/2}.\qquad(10)$$
When $\mathcal{D}$ is the trigonometric system of $L_p$, it is known from [36] that there is an $f\in\mathcal{A}_1(\mathcal{D})$ for which the rate in (10) is attained. Thus, inequality (10) could not be improved.
Hence, the convergence rate in Corollary 1 serves as a benchmark for the performance of greedy algorithms in uniformly smooth Banach spaces.
Next, we recall the definition of the WRGA in the Banach space setting from [31]. Let $\tau=\{t_m\}_{m=1}^\infty$, $t_m\in(0,1]$, be a given sequence. The WRGA is defined as follows.
•Step 0: Define $f_0:=f$, $G_0:=0$.
•Step m:
- If $f_{m-1}=0$, stop the algorithm and define $G_k:=G_{m-1}$ for $k\ge m$.
- If $f_{m-1}\ne0$, choose an element $\varphi_m\in\mathcal{D}$ such that
$$F_{f_{m-1}}(\varphi_m-G_{m-1})\ge t_m\sup_{g\in\mathcal{D}}F_{f_{m-1}}(g-G_{m-1}).$$
Find $0\le\lambda_m\le1$ such that
$$\|f-((1-\lambda_m)G_{m-1}+\lambda_m\varphi_m)\|=\inf_{0\le\lambda\le1}\|f-((1-\lambda)G_{m-1}+\lambda\varphi_m)\|.$$
Define $G_m:=(1-\lambda_m)G_{m-1}+\lambda_m\varphi_m$, $f_m:=f-G_m$, and proceed to Step $m+1$.
The following error bound of the WRGA on $\mathcal{A}_1(\mathcal{D})$ has been established in [31].

Theorem 11. Let $X$ be a Banach space with a modulus of smoothness $\rho(u)\le\gamma u^q$, $1<q\le2$. If $f\in\mathcal{A}_1(\mathcal{D})$, then the output of the WRGA($X$, $\mathcal{D}$, $\tau$) satisfies the inequality
$$\|f-G_m\|\le C(q,\gamma)\Big(1+\sum_{k=1}^m t_k^p\Big)^{-1/p},\qquad p=\frac{q}{q-1},$$
where the constant $C(q,\gamma)$ depends only on $q$ and $\gamma$.

Now, we turn to the vector greedy algorithms. Let $\tau=\{t_m\}_{m=1}^\infty$, $t_m\in(0,1]$, be a given sequence. The Vector Weak Chebyshev Greedy Algorithm (VWCGA) [22] is defined as follows.
•Step 0: Define $f^{(i)}_0:=f^{(i)}$ and $G^{(i)}_0:=0$ for $i=1,\dots,N$.
•Step m:
- If $f^{(i)}_{m-1}=0$ for all $i$, stop the algorithm and define $G^{(i)}_k:=G^{(i)}_{m-1}$ for $k\ge m$.
- Otherwise, let $j:=j(m)$ be such that
$$\|f^{(j)}_{m-1}\|=\max_{1\le i\le N}\|f^{(i)}_{m-1}\|.$$
Choose an element $\varphi_m\in\mathcal{D}$ such that
$$F_{f^{(j)}_{m-1}}(\varphi_m)\ge t_m\sup_{g\in\mathcal{D}}F_{f^{(j)}_{m-1}}(g).$$
Define $G^{(i)}_m$ to be the best approximant to $f^{(i)}$ from $\Phi_m:=\mathrm{span}\{\varphi_1,\dots,\varphi_m\}$, set $f^{(i)}_m:=f^{(i)}-G^{(i)}_m$, $i=1,\dots,N$, and proceed to Step $m+1$.
Let $\tau=\{t_m\}_{m=1}^\infty$, $t_m\in(0,1]$, be a given sequence. The Vector Weak Relaxed Greedy Algorithm (VWRGA) [22] is defined as follows.
•Step 0: Define $f^{(i)}_0:=f^{(i)}$ and $G^{(i)}_0:=0$ for $i=1,\dots,N$.
•Step m:
- If $f^{(i)}_{m-1}=0$ for all $i$, stop the algorithm and define $G^{(i)}_k:=G^{(i)}_{m-1}$ for $k\ge m$.
- Otherwise, let $j:=j(m)$ be such that
$$\|f^{(j)}_{m-1}\|=\max_{1\le i\le N}\|f^{(i)}_{m-1}\|.$$
Choose an element $\varphi_m\in\mathcal{D}$ such that
$$F_{f^{(j)}_{m-1}}(\varphi_m-G^{(j)}_{m-1})\ge t_m\sup_{g\in\mathcal{D}}F_{f^{(j)}_{m-1}}(g-G^{(j)}_{m-1}).$$
Find $0\le\lambda^{(i)}_m\le1$ such that
$$\|f^{(i)}-((1-\lambda^{(i)}_m)G^{(i)}_{m-1}+\lambda^{(i)}_m\varphi_m)\|=\inf_{0\le\lambda\le1}\|f^{(i)}-((1-\lambda)G^{(i)}_{m-1}+\lambda\varphi_m)\|,\quad i=1,\dots,N.$$
Define $G^{(i)}_m:=(1-\lambda^{(i)}_m)G^{(i)}_{m-1}+\lambda^{(i)}_m\varphi_m$, $f^{(i)}_m:=f^{(i)}-G^{(i)}_m$, and proceed to Step $m+1$.
The error bounds of the VWCGA and the VWRGA on $\mathcal{A}_1(\mathcal{D})$ have been established in [22].

Theorem 12. Let $X$ be a Banach space with a modulus of smoothness $\rho(u)\le\gamma u^q$, $1<q\le2$. Then, for a weakness sequence $\tau$ and any $f^{(1)},\dots,f^{(N)}\in\mathcal{A}_1(\mathcal{D})$, the outputs of the VWCGA and the VWRGA satisfy the error bounds established in [22].

Now, we start to define the VWRPGA($X$, $\mathcal{D}$, $\tau$). To accomplish this, we recall the definition of the WRPGA from [17]. Let $X$ be a Banach space with a modulus of smoothness $\rho(u)\le\gamma u^q$, $1<q\le2$, and let $\tau=\{t_m\}_{m=1}^\infty$, $t_m\in(0,1]$, be a given sequence.
•Step 0: Define $f_0:=f$, $G_0:=0$.
•Step m:
- If $f_{m-1}=0$, stop the algorithm and define $G_k:=G_{m-1}$ for $k\ge m$.
- If $f_{m-1}\ne0$, choose an element $\varphi_m\in\mathcal{D}$ such that
$$|F_{f_{m-1}}(\varphi_m)|\ge t_m\sup_{g\in\mathcal{D}}|F_{f_{m-1}}(g)|.$$
With $\lambda_m$ minimizing $\|f_{m-1}-\lambda\varphi_m\|$ over $\lambda\in\mathbb{R}$ and $\widehat{G}_m:=G_{m-1}+\lambda_m\varphi_m$, define the next approximant to be $G_m:=s_m\widehat{G}_m$, where $s_m$ minimizes $\|f-s\widehat{G}_m\|$ over $s\in\mathbb{R}$; set $f_m:=f-G_m$ and proceed to Step $m+1$.
The sufficient conditions for the convergence of the WRPGA in terms of the weakness sequence and the modulus of smoothness can be found in [17]. Moreover, the following theorem gives the error bound of the WRPGA on $\mathcal{A}_1(\mathcal{D})$.
Theorem 13 (see Theorem 6.1 in [17]). Let $X$ be a Banach space with modulus of smoothness $\rho(u)\le\gamma u^q$, $1<q\le2$. If $f\in\mathcal{A}_1(\mathcal{D})$, then the output of the WRPGA($X$, $\mathcal{D}$, $\tau$) satisfies the inequality
$$\|f-G_m\|\le C(q,\gamma)\,|f|_{\mathcal{A}_1(\mathcal{D})}\Big(1+\sum_{k=1}^m t_k^p\Big)^{-1/p},\qquad p=\frac{q}{q-1},$$
where the constant $C(q,\gamma)$ depends only on $q$ and $\gamma$.

Let $X$ be a Banach space with modulus of smoothness $\rho(u)\le\gamma u^q$, $1<q\le2$. Let $\tau=\{t_m\}_{m=1}^\infty$, $t_m\in(0,1]$, be a given sequence. We define the VWRPGA($X$, $\mathcal{D}$, $\tau$) as follows.
VWRPGA($X$, $\mathcal{D}$, $\tau$): Given $\mathbf{f}=(f^{(1)},\dots,f^{(N)})\in X^N$:
•Step 0: Define $f^{(i)}_0:=f^{(i)}$ and $G^{(i)}_0:=0$ for $i=1,\dots,N$.
•Step m:
- If $f^{(i)}_{m-1}=0$ for all $i$, stop the algorithm and define $G^{(i)}_k:=G^{(i)}_{m-1}$ for $k\ge m$.
- Otherwise, let $j:=j(m)$ be such that
$$\|f^{(j)}_{m-1}\|=\max_{1\le i\le N}\|f^{(i)}_{m-1}\|.$$
Choose an element $\varphi_m\in\mathcal{D}$ such that
$$|F_{f^{(j)}_{m-1}}(\varphi_m)|\ge t_m\sup_{g\in\mathcal{D}}|F_{f^{(j)}_{m-1}}(g)|.$$
Define the next approximant $G^{(i)}_m$, $i=1,\dots,N$, by applying the rescaled step of the WRPGA to each $f^{(i)}$ with the chosen atom $\varphi_m$; set $f^{(i)}_m:=f^{(i)}-G^{(i)}_m$ and proceed to Step $m+1$.
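A sketch of the VWRPGA($X$, $\mathcal{D}$, $\tau$) in the sequence space $\ell_p$ (our choice). The analytic step-size choices are replaced here by numerical one-dimensional minimizations, which also makes the complexity claim visible: each iteration costs two one-dimensional problems per target.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def vwrpga_lp(F, D, m, p=4.0, t=1.0):
    """VWRPGA sketch in l_p: dual selection by the worst residual, then two
    one-dimensional minimizations (greedy step and rescaling) per target."""
    lp = lambda x: np.linalg.norm(x, ord=p)
    G = np.zeros_like(F)
    for _ in range(m):
        R = F - G
        norms = [lp(R[:, i]) for i in range(F.shape[1])]
        j = int(np.argmax(norms))                         # j(m): the largest residual
        if norms[j] < 1e-10:
            break
        r = R[:, j]
        Ff = np.sign(r) * (np.abs(r) / lp(r)) ** (p - 1)  # norming functional of r
        vals = np.abs(Ff @ D)
        k = int(np.flatnonzero(vals >= t * vals.max())[0])
        phi = D[:, k]
        for i in range(F.shape[1]):
            fi = F[:, i]
            lam = minimize_scalar(lambda a: lp(fi - G[:, i] - a * phi)).x
            Ghat = G[:, i] + lam * phi                    # greedy step (1-D problem)
            s = minimize_scalar(lambda a: lp(fi - a * Ghat)).x
            G[:, i] = s * Ghat                            # rescaling (1-D problem)
    return G
```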
In this section, we obtain the convergence properties and the error bound of the VWRPGA.
Firstly, we establish a theorem on the convergence of the VWRPGA. It appears to be the first result on the convergence of vector greedy algorithms in the Banach space setting.
Theorem 14. Let $X$ be a Banach space with modulus of smoothness $\rho(u)\le\gamma u^q$, $1<q\le2$. Assume
$$\sum_{m=1}^\infty t_m^p=\infty,\qquad p=\frac{q}{q-1}.$$
Then, for any $f^{(i)}\in X$, $i=1,\dots,N$, and any dictionary $\mathcal{D}$, the VWRPGA converges.
The idea of the proof of Theorem 14 is similar to that of Theorem 5. However, because of the more complicated geometry of Banach spaces, a number of arguments in the subsequent analysis must be modified, replaced, or generalized. Some useful results for Hilbert spaces have been generalized to Banach spaces, as shown in the following lemmas.
Lemma 3. Let $X$ be a Banach space with modulus of smoothness $\rho(u)\le\gamma u^q$, $1<q\le2$. For any two nonzero elements $f,g\in X$ and any $u>0$, we have
$$\|f-ug\|\le\|f\|\Big(1+2\gamma\frac{u^q\|g\|^q}{\|f\|^q}\Big)-uF_f(g).$$

Proof. The proof of this lemma follows from the proof of Lemma 6.1 in [29] and the fact that the modulus of smoothness of $X$ satisfies $\rho(u)\le\gamma u^q$. □
Lemma 4. Let $X$ be a Banach space with modulus of smoothness $\rho(u)\le\gamma u^q$, $1<q\le2$. Let $G^{(i)}_m$ be the output of the VWRPGA at Step $m$ for $f^{(i)}$, $i=1,\dots,N$. If $f^{(i)}_m\ne0$, then we have
$$F_{f^{(i)}_m}\big(\widehat{G}^{(i)}_m\big)=0.$$

Proof. Denote $L:=\mathrm{span}\{\widehat{G}^{(i)}_m\}$. By the definition of the VWRPGA, $G^{(i)}_m$ is the best approximant to $f^{(i)}$ from $L$ for $i=1,\dots,N$. Thus, the conclusion of the lemma follows from Lemma 2.1 in [31]. □
Lemma 5 (see Lemma 2.2 in [31]). For any bounded linear functional $F$ and any dictionary $\mathcal{D}$ from a Banach space, we have
$$\sup_{g\in\mathcal{D}}F(g)=\sup_{h\in\mathcal{A}_1(\mathcal{D})}F(h).$$

Now, we prove Theorem 14.
Proof of Theorem 14. Let $f^{(i)}_m:=f^{(i)}-G^{(i)}_m$, $i=1,\dots,N$, be the residuals of the VWRPGA. It is known from the definition of the VWRPGA that the residual of the selected target satisfies a one-step error bound. We apply Lemma 3 to the latter inequality with $f=f^{(j)}_{m-1}$ and $g=\varphi_m$ and obtain (11). By the choice of $\varphi_m$, we have $\|f^{(i)}_m\|\le\|f^{(i)}_{m-1}\|$ for each $i$.
Thus, it is easy to see that $\{\|f^{(i)}_m\|\}_{m\ge0}$ is a decreasing sequence. According to the Monotone Convergence Theorem, we know that $\lim_{m\to\infty}\|f^{(i)}_m\|$ exists for $i=1,\dots,N$.
Next, we prove that $\lim_{m\to\infty}\|f^{(i)}_m\|=0$ by contradiction. Assume, on the contrary, that the limit is positive for some $i$. Then, for any $m$, the norm $\|f^{(i)}_m\|$ is bounded below by a positive constant. By (11), we obtain (12). By the inequality $1-x\le e^{-x}$, $x\ge0$, we obtain (13).
Then, we proceed with a lower estimate for $\sup_{g\in\mathcal{D}}F_{f^{(j)}_m}(g)$.
By Lemma 1, we set $\epsilon>0$ and find $f^{\epsilon}$ such that
$$\|f^{(i)}-f^{\epsilon}\|\le\epsilon\quad\text{and}\quad f^{\epsilon}/A(\epsilon)\in\mathcal{A}_1(\mathcal{D})$$
with some number $A(\epsilon)>0$. We obtain from Lemma 5 the estimate (15). Inequalities (14), (15) and the choice of $f^{\epsilon}$ result in (16).
Combining (12) with (16), we obtain (17). Combining (17) with (13), we arrive at an upper estimate for $\|f^{(i)}_m\|$. The assumption $\sum_{m=1}^\infty t_m^p=\infty$ implies that this upper estimate tends to zero as $m\to\infty$. Thus, for sufficiently large $m$, we obtain a contradiction, which proves this theorem. □
Remark 5. According to Theorem 3.1 in [32], we know that $\sum_{m=1}^\infty t_m^p=\infty$ is also a necessary condition for the convergence of the VWRPGA.

Remark 6. Since the WRGA only converges for target elements from $\mathcal{A}_1(\mathcal{D})$ (see [31]), the VWRGA also only converges for target elements from $\mathcal{A}_1(\mathcal{D})$. Thus, the convergence property of the VWRPGA is better than that of the VWRGA.

Remark 7. For $f^{(i)}\in\mathcal{A}_1(\mathcal{D})$, the sufficient condition for the convergence of the VWCGA follows from Theorem 12. Theorem 15 gives the convergence condition for any $f^{(i)}\in X$.
By using the same method, it is not difficult to prove the following theorem on the convergence of the VWCGA.

Theorem 15. Let $X$ be a Banach space with modulus of smoothness $\rho(u)\le\gamma u^q$, $1<q\le2$. Assume
$$\sum_{m=1}^\infty t_m^p=\infty,\qquad p=\frac{q}{q-1}.$$
Then, for any $f^{(i)}\in X$, $i=1,\dots,N$, and any dictionary $\mathcal{D}$, the VWCGA converges.
Next, we give the theorem about the error bound of the VWRPGA($X$, $\mathcal{D}$, $\tau$) on $\mathcal{A}_1(\mathcal{D})$.
Theorem 16. Let $X$ be a Banach space with modulus of smoothness $\rho(u)\le\gamma u^q$, $1<q\le2$. Then, for a weakness sequence $\tau$ and any $f^{(1)},\dots,f^{(N)}\in\mathcal{A}_1(\mathcal{D})$, the output of the VWRPGA satisfies an error estimate of WRPGA type (cf. Theorem 13).

Proof. It is known from (11) that the sequences $\{\|f^{(i)}_m\|\}_{m\ge0}$, $i=1,\dots,N$, are decreasing. Fix $i$. The desired inequality follows from the assumption $f^{(i)}\in\mathcal{A}_1(\mathcal{D})$ and the fact that $\{\|f^{(i)}_m\|\}_{m\ge0}$ is decreasing. Thus, we only need to prove the reduced estimate for the most frequently used target.
We define the sets $\Lambda_i$ just as we did in the proof of Theorem 6. It is obvious that
$$\sum_{i=1}^N|\Lambda_i|=m.$$
Thus, there exists $i_0$ such that $|\Lambda_{i_0}|\ge m/N$. As in the proof of Theorem 6, for $i=1,\dots,N$, the sequences $\{f^{(i)}_m\}$ are the outputs of the WRPGA with the weakness sequence $\tau_i$. Therefore, using Theorem 13, we obtain the required estimate. The proof of Theorem 16 is completed. □
Remark 8. We know from Theorems 12 and 16 that the error bound of the VWRPGA is almost the same as those of the VWCGA and the VWRGA, while the computational complexity of the VWRPGA is essentially smaller than that of the VWCGA and the VWRGA.