The Smallest Singular Value Anomaly and the Condition Number Anomaly

Hydrological Service, P.O. Box 36118, Jerusalem 91360, Israel
Axioms 2022, 11(3), 99; https://doi.org/10.3390/axioms11030099
Submission received: 18 January 2022 / Revised: 10 February 2022 / Accepted: 15 February 2022 / Published: 25 February 2022
(This article belongs to the Special Issue Numerical Linear Algebra with Applications in Data Analysis)

Abstract

Let A be an arbitrary matrix in which the number of rows, m, is considerably larger than the number of columns, n. Let the submatrix A_i, i = 1, …, m, be composed of the first i rows of A. Let β_i denote the smallest singular value of A_i, and let k_i denote the condition number of A_i. In this paper, we examine the behavior of the sequences β_1, …, β_m and k_1, …, k_m. The behavior of the smallest singular values is somewhat surprising: the first part of the sequence, β_1, …, β_n, is descending, while the second part, β_n, …, β_m, is ascending. This phenomenon is called “the smallest singular value anomaly”. The sequence of condition numbers has a different character. Its first part, k_1, …, k_n, always ascends toward k_n, which can be very large. The condition number anomaly occurs when the second part, k_n, …, k_m, descends toward a value k_m that is considerably smaller than k_n. It is shown that this is likely to happen whenever the rows of A satisfy two conditions: all the rows are about the same size, and the directions of the rows scatter in some random way. These conditions hold in a wide range of random matrices, as well as in other types of matrices. The practical importance of these phenomena lies in the use of iterative methods for solving large linear systems: several iterative solvers have the property that a large condition number results in a slow rate of convergence, while a small condition number yields fast convergence. Consequently, a condition number anomaly leads to a similar anomaly in the number of iterations. The paper ends with numerical experiments that illustrate the above phenomena.

1. Introduction

Let A ∈ ℝ^{m×n} be a given matrix in which the number of rows, m, is considerably larger than the number of columns, n. Let the rows of A be denoted by a_iᵀ, i = 1, …, m, where a_i ∈ ℝⁿ. That is, A = [a_1, …, a_m]ᵀ. Let
A_i = [a_1, …, a_i]ᵀ ∈ ℝ^{i×n}   (1)
be the submatrix of A composed of the first i rows of A. Let α_i denote the largest singular value of A_i, let β_i denote the smallest singular value of A_i, and let k_i denote the condition number of this matrix. In this paper, we investigate the behavior of the sequences α_1, …, α_m, β_1, …, β_m, and k_1, …, k_m. We start by showing that adding rows causes the largest singular value to increase,
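These sequences are easy to examine experimentally. The following sketch (NumPy rather than the MATLAB used in the paper's experiments; the matrix size, distribution, and seed are illustrative choices) computes α_i, β_i, and k_i for every leading row-submatrix:

```python
import numpy as np

def sv_sequences(A):
    """Largest/smallest singular values and condition numbers of the
    leading row-submatrices A_1, ..., A_m of A."""
    m, n = A.shape
    alphas, betas = np.empty(m), np.empty(m)
    for i in range(1, m + 1):
        s = np.linalg.svd(A[:i, :], compute_uv=False)
        alphas[i - 1] = s[0]                 # alpha_i, the largest singular value
        betas[i - 1] = s[min(i, n) - 1]      # beta_i, the smallest singular value
    return alphas, betas, alphas / betas     # k_i = alpha_i / beta_i

rng = np.random.default_rng(0)
m, n = 200, 10
alphas, betas, ks = sv_sequences(rng.uniform(-1.0, 1.0, (m, n)))

assert np.all(np.diff(alphas) >= -1e-12)         # alpha_i ascends
assert np.all(np.diff(betas[:n]) <= 1e-12)       # beta_1 >= ... >= beta_n
assert np.all(np.diff(betas[n - 1:]) >= -1e-12)  # beta_n <= ... <= beta_m
```

The asserted monotonicity properties are exactly the ones proved in the following sections.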
α_i ≤ α_{i+1} for i = 1, …, m − 1,   (2)
and study the reasons for a large, or small, increase. Next, we consider the behavior of the smallest singular values, which is somewhat surprising: at first, adding rows causes the smallest singular value to decrease,
β_i ≥ β_{i+1} for i = 1, …, n − 1.   (3)
Then, as i passes n, adding rows increases the smallest singular value. That is,
β_i ≤ β_{i+1} for i = n, …, m − 1.   (4)
This behavior is called “the smallest singular value anomaly”. The study of this phenomenon explains the reasons for a large, or small, difference between β_{i+1} and β_i.
The last observation implies that β_n is the smallest number in the sequence β_1, …, β_m. Assume for simplicity that β_n > 0. In this case, β_i ≥ β_n > 0 for i = 1, …, m, and the ratio k_i = α_i/β_i is the condition number of A_i. This number affects the results of certain computations, such as the solution of linear equations, e.g., [1,2,3]. It is interesting, therefore, to examine the behavior of the sequence k_i, i = 1, …, m. The inequalities (2) and (3) show that as i moves from 1 to n, the value of k_i increases. That is,
k_i ≤ k_{i+1} for i = 1, …, n − 1.   (5)
However, as i passes n, both α_i and β_i are increasing, and the behavior of k_i is not straightforward. The fact that the sequence β_i, i = n, …, m, is increasing tempts one to expect that the sequence k_i, i = n, …, m, will decrease. That is,
k_i ≥ k_{i+1} for i = n, …, m − 1.   (6)
The situation in which (6) holds is called the condition number anomaly. In this case, the sequence k_i, i = 1, …, n, increases toward k_n, which can be quite large, while the sequence k_i, i = n, …, m, decreases toward a value k_m that is considerably smaller than k_n. The inequalities that assess the increase in the sequences (2) and (4) enable us to derive a useful bound on the ratio k_{i+1}/k_i. The bound explains the reasons behind the condition number anomaly and characterizes situations that invite (or exclude) such behavior.
One type of matrix that exhibits the condition number anomaly is the dense random matrix in which each element is independently sampled from the same probability distribution. In particular, if each element of A comes from an independent standard normal distribution, then A is a Gaussian random matrix, and AᵀA is a Wishart matrix. The problem of estimating the largest and the smallest singular values of large Gaussian matrices has been studied by several authors. See [4,5,6,7,8,9,10,11,12,13,14] and the references therein. In this case, when n is very large and i > n, we have the estimates
α_i / √i ≈ 1 + √(n/i),   (7)
β_i / √i ≈ 1 − √(n/i),   (8)
and
k_i ≈ (1 + √(n/i)) / (1 − √(n/i)),   (9)
which means that very large Gaussian matrices possess the condition number anomaly (for very large n and i = n, we have β_n ≈ 1/√n; see [7,12]).
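These asymptotic estimates are easy to check empirically (NumPy sketch; the sizes and seed are arbitrary, and the tolerances are loose enough for the stated dimensions):

```python
import numpy as np

# Empirical check of the asymptotic estimates for a Gaussian random matrix.
rng = np.random.default_rng(1)
n, i = 200, 2000                       # i > n, both reasonably large
G = rng.standard_normal((i, n))
s = np.linalg.svd(G, compute_uv=False)
alpha, beta = s[0], s[-1]

assert abs(alpha / np.sqrt(i) - (1 + np.sqrt(n / i))) < 0.05
assert abs(beta / np.sqrt(i) - (1 - np.sqrt(n / i))) < 0.05
k_est = (1 + np.sqrt(n / i)) / (1 - np.sqrt(n / i))
assert abs(alpha / beta - k_est) < 0.2
```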
Our analysis shows that the condition number anomaly is not restricted to large Gaussian matrices. It is shared by a wide range of matrices, from small random matrices to large sparse matrices. The bounds that we derive have a simple geometric interpretation that helps to see what makes k_n large and what forces the sequence k_n, k_{n+1}, …, k_m to decrease. Roughly speaking, the condition number anomaly is expected whenever all the rows of the matrix have about the same size and the directions of the rows are randomly scattered. The paper presents several numerical examples that illustrate this feature.
The practical interest in the condition number anomaly comes from the use of iterative methods for solving large sparse linear systems, e.g., [15,16,17,18]. Some of these methods have the property that the asymptotic rate of convergence depends on the condition number of the related matrix. That is, a large condition number results in slow convergence, while a small condition number yields fast convergence. Assume now that such a method is used to solve a linear system whose matrix has the condition number anomaly. Then the last property implies a similar anomaly in the number of iterations. This phenomenon is called “iterations anomaly”. The discussion in Section 5 demonstrates this property in the methods of Richardson, Cimmino, and Jacobi. See Table 12.

2. The Ascending Behavior of the Largest Singular Values

In this section, we investigate the behavior of the sequence α_1, …, α_m. The first assertion establishes the ascending property of this sequence.
Theorem 1.
The sequence α_1, …, α_m satisfies
α_i² ≤ α_{i+1}² for i = 1, …, m − 1.   (10)
Proof. 
Observe that α_i² is the largest eigenvalue of the matrix A_iA_iᵀ, which is a principal submatrix of A_{i+1}A_{i+1}ᵀ. Hence, (10) is a direct consequence of the Cauchy interlace theorem. For statements and proofs of this theorem, see, for example, Refs. [2] (p. 441), [19] (p. 185), [20] (p. 149), [21] and [22] (p. 186). A second way to prove (10) is given below. This approach enables a closer inspection of the ascending process.
Here, we use the fact that α_i² is the largest eigenvalue of the cross-product matrix A_iᵀA_i. Let the unit vector u_i denote the corresponding dominant eigenvector of A_iᵀA_i. Then
A_iᵀA_i u_i = α_i² u_i   (11)
and
u_iᵀA_iᵀA_i u_i = α_i² = max{ xᵀA_iᵀA_i x | x ∈ ℝⁿ and ‖x‖₂ = 1 },   (12)
where ‖x‖₂ = (xᵀx)^{1/2} denotes the Euclidean vector norm. Note also that
A_iᵀA_i = ∑_{j=1}^{i} a_j a_jᵀ,   (13)
A_{i+1}ᵀA_{i+1} = A_iᵀA_i + a_{i+1}a_{i+1}ᵀ,   (14)
and
u_iᵀA_{i+1}ᵀA_{i+1} u_i ≤ u_{i+1}ᵀA_{i+1}ᵀA_{i+1} u_{i+1} = α_{i+1}².   (15)
Consequently,
α_{i+1}² ≥ u_iᵀA_{i+1}ᵀA_{i+1} u_i = u_iᵀA_iᵀA_i u_i + (u_iᵀa_{i+1})(a_{i+1}ᵀu_i) = α_i² + (u_iᵀa_{i+1})² ≥ α_i²,   (16)
which proves (10). □
Next, we provide an upper bound on the increase in α_{i+1}².
Theorem 2.
The inequality
α_{i+1}² ≤ α_i² + ‖a_{i+1}‖₂²   (17)
holds for i = 1 , , m 1 .
Proof. 
Observe that
u_{i+1}ᵀA_iᵀA_i u_{i+1} ≤ u_iᵀA_iᵀA_i u_i = α_i².   (18)
Hence, a further use of (14) gives
α_{i+1}² = u_{i+1}ᵀA_{i+1}ᵀA_{i+1} u_{i+1} = u_{i+1}ᵀA_iᵀA_i u_{i+1} + (u_{i+1}ᵀa_{i+1})² ≤ α_i² + |u_{i+1}ᵀa_{i+1}|² ≤ α_i² + ‖a_{i+1}‖₂²,   (19)
which proves (17). The last inequality in (19) is due to the Cauchy–Schwarz inequality
|u_{i+1}ᵀa_{i+1}| ≤ ‖u_{i+1}‖₂ ‖a_{i+1}‖₂   (20)
and the fact that ‖u_{i+1}‖₂ = 1. □
Combining (10) and (17) shows that
α_i² ≤ α_{i+1}² ≤ α_i² + ‖a_{i+1}‖₂².   (21)
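The two-sided bound above is cheap to verify numerically (NumPy sketch; sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((40, 8))
# alpha_i^2 for every leading submatrix; norm(., 2) is the largest singular value
alphas_sq = np.array([np.linalg.norm(A[:i, :], 2) ** 2 for i in range(1, 41)])
rows_sq = np.einsum('ij,ij->i', A, A)        # ||a_i||_2^2 for each row

# alpha_i^2 <= alpha_{i+1}^2 <= alpha_i^2 + ||a_{i+1}||_2^2
assert np.all(alphas_sq[:-1] <= alphas_sq[1:] + 1e-9)
assert np.all(alphas_sq[1:] <= alphas_sq[:-1] + rows_sq[1:] + 1e-9)
```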
This raises the question: for which directions of a_{i+1} does α_{i+1}² attain these bounds? The key to answering this question lies in the following observation.
Lemma 1.
Assume for a moment that a_{i+1} is an eigenvector of the matrix A_iᵀA_i. That is,
A_iᵀA_i a_{i+1} = τ a_{i+1},   (22)
where τ ∈ ℝ is a nonnegative scalar. In this case, the matrix A_{i+1}ᵀA_{i+1} has the same set of eigenvectors as A_iᵀA_i. The eigenvector a_{i+1} satisfies
A_{i+1}ᵀA_{i+1} a_{i+1} = (τ + ‖a_{i+1}‖₂²) a_{i+1},   (23)
while all the other eigenpairs remain unchanged.
Proof. 
Since a_{i+1} is an eigenvector of A_iᵀA_i, substituting the spectral decomposition of A_iᵀA_i into (14) yields the spectral decomposition of A_{i+1}ᵀA_{i+1}. □
The possibility that α_{i+1}² achieves its upper bound is characterized by the following assertion.
Theorem 3.
Assume for a moment that a_{i+1} is a dominant eigenvector of A_iᵀA_i. In this case,
α_{i+1}² = α_i² + ‖a_{i+1}‖₂².   (24)
Otherwise, when a_{i+1} is not pointing toward a dominant eigenvector,
α_{i+1}² < α_i² + ‖a_{i+1}‖₂².   (25)
Proof. 
The first claim is a direct consequence of Lemma 1. To prove the second claim, we consider two cases. The first one occurs when u_{i+1} = u_i. Since a_{i+1} is not in the direction of u_{i+1}, in this case there is a strict inequality in (20), which yields a strict inequality in (19). In the second case, u_{i+1} ≠ u_i, so now we have a strict inequality in (18), which leads to a strict inequality in (19). □
Finally, we consider the possibility that α_{i+1}² = α_i².
Theorem 4.
Assume that i ≥ n and that a_{i+1} is an eigenvector of A_iᵀA_i that corresponds to the smallest eigenvalue of this matrix. That is,
A_iᵀA_i a_{i+1} = β_i² a_{i+1}.   (26)
If
‖a_{i+1}‖₂² ≤ α_i² − β_i²,   (27)
then α_{i+1}² = α_i². Otherwise, when
‖a_{i+1}‖₂² > α_i² − β_i²,   (28)
the value of α_{i+1}² satisfies
α_{i+1}² = β_i² + ‖a_{i+1}‖₂² > α_i².   (29)
Proof. 
From Lemma 1, we obtain that a_{i+1} is an eigenvector of A_{i+1}ᵀA_{i+1} whose eigenvalue equals β_i² + ‖a_{i+1}‖₂². Therefore, if (27) holds, then α_i² remains the largest eigenvalue. Otherwise, when (28) holds, β_i² + ‖a_{i+1}‖₂² is the largest eigenvalue of A_{i+1}ᵀA_{i+1}. □
The restriction i ≥ n is due to the fact that if i < n, then the smallest eigenvalue of A_iᵀA_i is always zero. The extension of Theorem 4 to cover this case is achieved by replacing β_i² with zero. Similar results are obtained when a_{i+1} points toward other eigenvectors of A_iᵀA_i.

3. The Smallest Singular Value Anomaly

In this section, we explore the behavior of the smallest singular values. We shall start by proving that the sequence β_1, …, β_n is descending. The proof uses the fact that for i ≤ n, the smallest eigenvalue of A_iA_iᵀ is β_i².
Theorem 5.
For i = 1, …, n − 1, we have the inequality
β_i² ≥ β_{i+1}².   (30)
Proof. 
The matrix A_iA_iᵀ is a principal submatrix of A_{i+1}A_{i+1}ᵀ. Hence, (30) is a direct corollary of the Cauchy interlace theorem. □
Next, we show that the sequence β_n, β_{n+1}, …, β_m is ascending.
Theorem 6.
For i = n, n + 1, …, m − 1, we have the inequality
β_{i+1}² ≥ β_i².   (31)
Proof. 
One way to conclude (31) is by using the fact that A_iA_iᵀ is a principal submatrix of A_{i+1}A_{i+1}ᵀ. Let
λ_1(A_iA_iᵀ) ≥ ⋯ ≥ λ_i(A_iA_iᵀ) ≥ 0
and
λ_1(A_{i+1}A_{i+1}ᵀ) ≥ ⋯ ≥ λ_i(A_{i+1}A_{i+1}ᵀ) ≥ λ_{i+1}(A_{i+1}A_{i+1}ᵀ) ≥ 0
denote the eigenvalues of these matrices. Then, since i ≥ n, β_i² = λ_n(A_iA_iᵀ), β_{i+1}² = λ_n(A_{i+1}A_{i+1}ᵀ), and (31) is a direct consequence of the Cauchy interlace theorem.
As before, a second proof is obtained by comparing the matrices A_iᵀA_i and A_{i+1}ᵀA_{i+1}, and this approach provides us with useful inequalities. Let the unit vector v_i, i = n, …, m, denote an eigenvector of A_iᵀA_i that corresponds to β_i². Then
A_iᵀA_i v_i = β_i² v_i   (32)
and v_i has the minimum property
v_iᵀA_iᵀA_i v_i = β_i² = min{ xᵀA_iᵀA_i x | x ∈ ℝⁿ and ‖x‖₂ = 1 }.   (33)
The last property implies the inequality
v_{i+1}ᵀA_iᵀA_i v_{i+1} ≥ v_iᵀA_iᵀA_i v_i = β_i²,   (34)
while a further use of (14) gives
β_{i+1}² = v_{i+1}ᵀA_{i+1}ᵀA_{i+1} v_{i+1} = v_{i+1}ᵀA_iᵀA_i v_{i+1} + (v_{i+1}ᵀa_{i+1})² ≥ v_iᵀA_iᵀA_i v_i + (v_{i+1}ᵀa_{i+1})² = β_i² + (v_{i+1}ᵀa_{i+1})² ≥ β_i².   (35)
The inequality (35) implies that the growth of β_{i+1}² depends on the size of the scalar product v_{i+1}ᵀa_{i+1}. In general, it is difficult to estimate this product, but Lemma 1 and Theorem 4 give some insight. For example, if a_{i+1} is an eigenvector of A_iᵀA_i whose eigenvalue differs from β_i², then β_{i+1}² = β_i². If a_{i+1} is an eigenvector that corresponds to β_i², there are two possibilities to consider. If β_i² is a multiple eigenvalue, then, again, β_{i+1}² = β_i². Otherwise, when β_i² is a simple eigenvalue,
β_{i+1}² = β_i² + min{ δ, ‖a_{i+1}‖₂² },   (36)
where δ > 0 is the difference between the two smallest eigenvalues of A_iᵀA_i.
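The case of a simple smallest eigenvalue can be verified numerically. A sketch (NumPy; the sizes, seed, and scaling factors t are arbitrary choices) appends a new row a_{i+1} = t·v, where v is the eigenvector of the simple smallest eigenvalue, and checks the formula above:

```python
import numpy as np

rng = np.random.default_rng(3)
n, i = 5, 12
A = rng.standard_normal((i, n))
evals, evecs = np.linalg.eigh(A.T @ A)   # ascending eigenvalues of A_i^T A_i
beta_sq, delta = evals[0], evals[1] - evals[0]
v = evecs[:, 0]                          # eigenvector of the smallest eigenvalue

for t in (0.1, 10.0):                    # a short and a long new row a_{i+1} = t v
    A_next = np.vstack([A, t * v])
    beta_next_sq = np.linalg.svd(A_next, compute_uv=False)[-1] ** 2
    # beta_{i+1}^2 = beta_i^2 + min(delta, ||a_{i+1}||^2)
    assert np.isclose(beta_next_sq, beta_sq + min(delta, t * t))
```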
We have seen that the sequence β_1, …, β_n is descending, while the sequence β_n, …, β_m is ascending. This behavior is called the smallest singular value anomaly. The fact that β_n is the smallest singular value in the whole sequence raises the question of what makes β_n small. Clearly, β_n is always smaller than
min{ ‖a_1‖₂, …, ‖a_n‖₂ }.   (37)
Thus, to obtain a meaningful answer, we make the simplifying assumption
‖a_i‖₂ = 1 for i = 1, …, m,   (38)
which enables the following bounds.
Lemma 2.
Assume that (38) holds and define
μ = max{ |a_iᵀa_j| : i, j = 1, …, n and i ≠ j }.   (39)
Then,
α_n² ≥ 1 + μ   (40)
and
β_n² ≤ 1 − μ.   (41)
Proof. 
It is possible to assume that the above maximum is attained for the first two rows and that a_1ᵀa_2 = μ > 0. In this case,
A_2A_2ᵀ = [ 1  μ ; μ  1 ],   (42)
and the eigenvalues of this matrix are λ_1 = 1 + μ and λ_2 = 1 − μ. Therefore, since A_2A_2ᵀ is a principal submatrix of A_nA_nᵀ, the Cauchy interlace theorem implies (40) and (41). □
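A quick numerical check of Lemma 2 (NumPy; the size and seed are arbitrary, and the rows are normalized so that assumption (38) holds):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
A = rng.standard_normal((n, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)   # unit rows, assumption (38)
G = A @ A.T
mu = np.max(np.abs(G - np.eye(n)))              # max |a_i^T a_j| over i != j
s = np.linalg.svd(A, compute_uv=False)

assert s[0] ** 2 >= 1 + mu - 1e-9               # alpha_n^2 >= 1 + mu, (40)
assert s[-1] ** 2 <= 1 - mu + 1e-9              # beta_n^2 <= 1 - mu, (41)
```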
Usually, the bound (41) is a crude estimate of β_n. Yet, in some cases, it is the reason for a small value of β_n.

4. The Condition Number Anomaly

In this section, we investigate the behavior of the sequence k_i = α_i/β_i, i = 1, …, m. The discussion is carried out under the assumption that β_n > 0, which ensures that β_i > 0 for i = 1, …, m. We have seen that the sequence α_1, …, α_n is ascending while the sequence β_1, …, β_n is descending. This proves that the sequence k_1, …, k_n is ascending. That is,
k_i ≤ k_{i+1} for i = 1, …, n − 1.   (43)
It is also known that the sequences α_n, …, α_m and β_n, …, β_m are ascending, but this does not provide decisive information about the behavior of the sequence k_n, …, k_m. We shall start with examples that illustrate this point.
Example 1.
This example shows that k_{i+1} can be larger than k_i. For this purpose, consider the case when a_{i+1} is a dominant eigenvector of A_iᵀA_i. Then, from Lemma 1, we see that α_{i+1}² = α_i² + ‖a_{i+1}‖₂² but β_{i+1} = β_i, which means that k_{i+1} > k_i.
Example 2.
A similar situation arises when A has the following property. Assume that as i grows, the sequence of row directions a_i/‖a_i‖₂, i = 1, 2, …, converges rapidly toward some vector. In this case, the sequence u_i, i = 1, 2, …, converges to the same vector, which brings us close to the situation of Example 1 (Tables 3 and 10 illustrate this possibility).
Example 3.
The third example shows that k_{i+1} can be smaller than k_i. Consider the case described in Theorem 4, when (27) holds. Here, α_{i+1} = α_i, β_{i+1} > β_i, and k_{i+1} < k_i. More reasons that force a decrease are given in Corollary 1 below.
Example 4.
The fourth example describes a situation in which the condition number behaves in a cyclic manner. Let B ∈ ℝ^{l×n} be a given matrix with l ≥ n. Let the matrix A be obtained by duplicating B k times. That is, m = k × l and
A = [Bᵀ, Bᵀ, …, Bᵀ]ᵀ ∈ ℝ^{m×n}.   (44)
Then, when i takes the values i = j × l, j = 1, …, k, the matrix A_iᵀA_i has the form
A_iᵀA_i = j BᵀB.   (45)
Hence, for these values of i, we have α_i = √j α_l and β_i = √j β_l, but k_i = k_l.
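This cyclic behavior is easy to confirm (NumPy sketch; the block size, seed, and number of duplications are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
l, n, dup = 12, 5, 4
B = rng.standard_normal((l, n))
A = np.vstack([B] * dup)                 # A duplicates B several times
sB = np.linalg.svd(B, compute_uv=False)

for j in range(1, dup + 1):
    s = np.linalg.svd(A[:j * l, :], compute_uv=False)
    assert np.allclose(s[0], np.sqrt(j) * sB[0])     # alpha scales by sqrt(j)
    assert np.allclose(s[-1], np.sqrt(j) * sB[-1])   # beta scales by sqrt(j)
```

The scaling follows because A_iᵀA_i = j BᵀB at these values of i, so all eigenvalues scale by j while the condition number stays fixed.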
The situation in which the sequence k_n, …, k_m is descending,
k_i ≥ k_{i+1} for i = n, …, m − 1,   (46)
is called the condition number anomaly. The reasons behind this behavior are explained below.
Theorem 7.
Let the positive parameters η_i and ν_i be defined by the equalities
η_i² = (u_{i+1}ᵀa_{i+1})² / α_i²
and
ν_i² = (v_{i+1}ᵀa_{i+1})² / β_i².
Then, for i = n, …, m − 1,
k_{i+1}² ≤ k_i² (1 + η_i²)/(1 + ν_i²).   (47)
Proof. 
From (19), we see that
α_{i+1}² ≤ α_i² + (u_{i+1}ᵀa_{i+1})² = α_i²(1 + η_i²).   (48)
Similarly, from (35), we obtain
β_{i+1}² ≥ β_i² + (v_{i+1}ᵀa_{i+1})² = β_i²(1 + ν_i²).   (49)
Hence, combining these inequalities gives (47). □
Corollary 1.
The inequality
ν_i² ≥ η_i²   (50)
implies
k_i ≥ k_{i+1}.   (51)
The last corollary is a key observation that indicates in which situations the condition number anomaly is likely to occur. Assume for a moment that the direction of a_{i+1} is chosen in some random way. Then, the scalar products (u_{i+1}ᵀa_{i+1})² and (v_{i+1}ᵀa_{i+1})² are likely to be about the same size. However, since β_i² is (considerably) smaller than α_i², the term (v_{i+1}ᵀa_{i+1})²/β_i² is expected to be larger than (u_{i+1}ᵀa_{i+1})²/α_i², which implies (50).
Summarizing the above discussion, we see that the condition number anomaly is likely to occur whenever the rows of the matrix satisfy two conditions: all the rows have about the same size, and the directions of the rows are scattered in some random way. This conclusion means that the phenomenon is shared by a wide range of matrices. The examples in Section 6 illustrate this point.
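This prediction can be tested directly on a matrix that satisfies both conditions (NumPy sketch; the sizes, distribution, and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 1000, 50
A = rng.uniform(-1.0, 1.0, (m, n))   # rows of similar size, random directions

def cond(M):
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[-1]

k_n, k_m = cond(A[:n, :]), cond(A)   # square submatrix vs. full matrix
assert k_m < k_n / 5                 # the condition number anomaly
```

Here k_n is typically in the hundreds while k_m is close to (√m + √n)/(√m − √n), a small number.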

5. Iterations Anomaly

Let a_i, A_i, α_i, β_i, and k_i, i = 1, …, m, be as in the previous sections. Let b = (b_1, …, b_m)ᵀ ∈ ℝᵐ be an arbitrary given vector, which is used to define the vectors b_i = (b_1, …, b_i)ᵀ ∈ ℝⁱ, i = 1, …, m. In this section, we examine how the condition number anomaly affects the convergence of certain iterative methods for solving a linear system of the form
A_i x = b_i.   (52)
We shall start by considering the Richardson method for solving the normal equations
A_iᵀA_i x = A_iᵀb_i,   (53)
e.g., [16,17,18]. Given x_k, the k-th iteration, k = 1, 2, …, of the Richardson method has the form
x_{k+1} = x_k − w A_iᵀ(A_i x_k − b_i),   (54)
where w > 0 is a pre-assigned relaxation parameter. Recall that A_iᵀ(A_i x_k − b_i) is the gradient vector of the least-squares objective function
F(x) = (1/2)‖A_i x − b_i‖₂²   (55)
at the point x_k. Hence, iteration (54) can be viewed as a steepest descent method for minimizing F(x) that uses a fixed step length. An equivalent way to write (54) is
x_{k+1} = (I − w A_iᵀA_i) x_k + w A_iᵀb_i,   (56)
which shows that the rate of convergence of the method depends on the spectral radius of the iteration matrix
H_w = I − w A_iᵀA_i.   (57)
Let ρ(H_w) denote the spectral radius of H_w. Then the theory of iterative methods tells us that the method converges whenever
ρ(H_w) < 1,   (58)
and the smaller ρ(H_w) is, the faster the convergence; see, for example, Refs. [16,17,18]. Observe that the eigenvalues of H_w lie in the interval [1 − wα_i², 1 − wβ_i²]. This shows that (58) holds for values of w that satisfy
0 < w < 2/α_i².   (59)
Furthermore, let w_opt denote the optimal value of w, for which ρ(H_w) attains its smallest value. Then
w_opt = 2/(α_i² + β_i²)   (60)
and
ρ(I − w_opt A_iᵀA_i) = w_opt α_i² − 1 = 2α_i²/(α_i² + β_i²) − 1 = (α_i² − β_i²)/(α_i² + β_i²) = (k_i² − 1)/(k_i² + 1).   (61)
See [17] (pp. 22–23) and [18] (pp. 114–115) for detailed discussion of these results. Consequently, as k i increases, the spectral radius of the iteration matrix approaches 1, and the rate of convergence slows down. That is, the condition number anomaly results in a similar anomaly in the number of iterations. See Table 12.
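As an illustration, a minimal Richardson solver with the optimal relaxation parameter w_opt = 2/(α_i² + β_i²) (NumPy sketch; the problem size, seed, and tolerance are illustrative assumptions):

```python
import numpy as np

def richardson(A, b, w, tol=1e-10, max_iter=200_000):
    """Iteration (54): x <- x - w A^T (A x - b); returns (x, iterations)."""
    x = np.zeros(A.shape[1])
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        r = A @ x - b
        if np.linalg.norm(r) / b_norm <= tol:
            return x, k
        x -= w * (A.T @ r)
    return x, max_iter

rng = np.random.default_rng(7)
A = rng.standard_normal((200, 10))
x_true = np.ones(10)
b = A @ x_true                            # so x_true solves the system
s = np.linalg.svd(A, compute_uv=False)
w_opt = 2.0 / (s[0] ** 2 + s[-1] ** 2)    # optimal relaxation parameter
x, iters = richardson(A, b, w_opt)

assert np.linalg.norm(x - x_true) < 1e-7
assert iters < 1000   # a well-conditioned A_i converges quickly
```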
Another useful iterative method for solving large sparse linear systems is the Cimmino method, e.g., [15,16,18,23]. Let the unit vectors
ã_i = a_i/‖a_i‖₂, i = 1, …, m,   (62)
be obtained by normalizing the rows of A. Let Ã_i be the i × n matrix whose rows are ã_jᵀ, j = 1, …, i, and let D_i denote the i × i diagonal matrix
D_i = diag{ a_1ᵀa_1, …, a_iᵀa_i }.   (63)
Then
Ã_i = [ã_1, …, ã_i]ᵀ = D_i^{−1/2} A_i   (64)
for i = 1, …, m. Similarly, we define the scalars b̃_j = b_j/‖a_j‖₂, j = 1, …, m, and the vectors b̃_i = (b̃_1, …, b̃_i)ᵀ = D_i^{−1/2} b_i. Then the Cimmino method is aimed at solving the linear system
Ã_i x = b̃_i,   (65)
or the related normal equations
Ã_iᵀÃ_i x = Ã_iᵀb̃_i.   (66)
The k-th iteration of the Cimmino method, k = 1, 2, …, has the form
x_{k+1} = x_k − w ∑_{j=1}^{i} ν_j ã_j (ã_jᵀx_k − b̃_j),   (67)
where w > 0 is a pre-assigned relaxation parameter, and ν_j, j = 1, …, i, are weighting parameters that satisfy
ν_j > 0 for j = 1, …, i and ν_1 + ⋯ + ν_i = 1.   (68)
Observe that the point
x_k − ã_j (ã_jᵀx_k − b̃_j)   (69)
is the projection of x_k onto the hyperplane { x ∈ ℝⁿ | a_jᵀx = b_j }, and the point
x_k − ∑_{j=1}^{i} ν_j ã_j (ã_jᵀx_k − b̃_j)   (70)
is a weighted average of these projections. The usual way to apply the Cimmino method is with equal weights, that is, ν_j = 1/i for j = 1, …, i. This enables us to rewrite the Cimmino iteration in the form
x_{k+1} = x_k − w Ã_iᵀ(Ã_i x_k − b̃_i),   (71)
which is the Richardson iteration for solving the normal equations (66). Therefore, from (61), we conclude that the optimal rate of convergence of the Cimmino method depends on the ratio
ρ(I − w_opt Ã_iᵀÃ_i) = (k̃_i² − 1)/(k̃_i² + 1),   (72)
where k̃_i is the condition number of Ã_i.
Another example is the Jacobi iteration for solving the equations
A_iA_iᵀ y = b_i.   (73)
The basic iteration of this method has the form
y_{k+1} = (I − w D_i^{−1} A_iA_iᵀ) y_k + w D_i^{−1} b_i,   (74)
where D_i is the diagonal matrix (63) and w > 0 is a pre-assigned relaxation parameter. Now, the equalities
D_i^{1/2} (I − w D_i^{−1} A_iA_iᵀ) D_i^{−1/2} = I − w D_i^{−1/2} A_iA_iᵀ D_i^{−1/2} = I − w Ã_iÃ_iᵀ   (75)
indicate that the iteration matrix of the Jacobi method is similar to the matrix I − w Ã_iÃ_iᵀ. Hence, as before, the optimal rate of convergence depends on the ratio (k̃_i² − 1)/(k̃_i² + 1). Thus, again, a condition number anomaly invites a similar anomaly in the number of iterations.
We shall finish this section by mentioning two further methods that share this behavior. The first one is the conjugate gradient algorithm for solving the normal equations (53), whose rate of convergence slows down as the condition number of A_i increases. See, for example, Refs. [1] (pp. 312–314), [3] (pp. 299–300) and [18] (pp. 203–205). The second is Kaczmarz’s method, which is a popular “row-action” method; see Refs. [15,16,23,24,25]. The use of this method to solve Ax = b is equivalent to the SOR method for solving the system AAᵀy = b, and both methods have the property that a small condition number results in fast convergence while a large condition number slows it down [24,25].

6. Numerical Examples

In this section, we present some examples that illustrate the actual behavior of the anomaly phenomena. The first examples consider small m × n matrices.
Table 1 describes the anomaly in a “two-ones” matrix. This matrix has m = n(n − 1)/2 different rows. Each row has only two nonzero entries, and each nonzero entry has the value 1 (a matrix with n columns has at most n(n − 1)/2 different rows of this type). This matrix exhibits a moderate anomaly, due to the fact that A_n is well conditioned.
Table 2 describes the anomaly in a small m × n segment of the Hilbert matrix. Here, the (i, j) entry equals 1/(i + j − 1). Consequently, the sequence of row directions a_i/‖a_i‖₂, i = 1, 2, …, converges slowly toward the vector e/‖e‖₂, where e = (1, 1, …, 1)ᵀ ∈ ℝⁿ. Hence, the decrease in the sequence k_n, k_{n+1}, …, k_m is quite moderate.
In Table 3, we consider a small m × n segment of the Pascal matrix. Recall that the entries of this matrix are built in the following way: a_{1j} = 1 for j = 1, …, n, and a_{i1} = 1 for i = 1, …, m. The other entries are obtained from the rule
a_{ij} = a_{i,j−1} + a_{i−1,j} for i = 2, …, m and j = 2, …, n.   (76)
In this matrix, the norm of the rows grows very fast, while the sequence of row directions a_i/‖a_i‖₂, i = 1, 2, …, converges rapidly toward the vector e_n = (0, 0, …, 0, 1)ᵀ ∈ ℝⁿ. Thus, as i becomes considerably larger than n, both a_i/‖a_i‖₂ and u_i approach e_n, which causes k_{i+1} to be larger than k_i.
The random matrices that are tested in Table 4 and Table 5 provide nice examples of the anomaly phenomenon. In these matrices, each entry is a random number from the interval [−1, 1]. To generate these matrices, and the other random matrices, we used MATLAB’s command “rand”, whose random number generator uses a uniform distribution. Similar results are obtained when “rand” is replaced with “randn”, which uses a normal distribution.
The nonnegative random matrix that is tested in Table 6 is obtained by MATLAB’s command A = rand(m,n). That is, here, each entry of A is a random number from the interval [0, 1]. This yields a more ill-conditioned matrix and a sharper anomaly.
Table 7 and Table 8 consider a different type of random matrix. As its name says, the entries of the “−1 or 1” matrix are either −1 or 1, with equal probability. In practice, the (i, j) entry, a_{ij}, is defined in the following way. First, sample a random number, r, from the interval [−1, 1]. If r > 0, then a_{ij} = 1; otherwise, a_{ij} = −1. The entries of the “0 or 1” matrix are defined in a similar manner: if r > 0, then a_{ij} = 1; otherwise, a_{ij} = 0. Both matrices display a strong anomaly. The “0 or 1” matrix is slightly more ill conditioned and, therefore, has a sharper anomaly.
The results of Table 9 and Table 10 are quite instructive. Both matrices are highly ill conditioned but display different behavior. The “narrow range” matrix is a random matrix whose entries are sampled from the small interval [0.99, 1.01]. However, the directions of the rows are not converging, and the matrix displays a nice anomaly. The “converging rows” matrix is defined in a slightly different way. Here, the entries in the i-th row, i = 1, …, m, are random numbers from the interval [1 − 1/i, 1 + 1/i]. Hence, the related sequence of row directions, a_i/‖a_i‖₂, i = 1, 2, …, converges toward the vector e/‖e‖₂, which is the situation described in Example 2. Consequently, when i becomes much larger than n, we see a moderate increase in the value of k_i.
Other matrices that possess the anomaly phenomena are large sparse matrices. The matrix in Table 11 is created by using MATLAB’s command A = sprand(m, n, density) with m = 100,000, n = 10,000, and density = 100/n. This way, each row of A has nearly 100 nonzero entries with random values and random locations. Although not illustrated in this paper, our experience shows that the smaller the density, the sharper the anomaly.
Table 12 illustrates the iterations anomaly phenomenon when using the methods of Richardson, Cimmino, and Jacobi. The first two methods were used to solve linear systems of the form
A_i x = b_i, i = 1, …, m.   (77)
As before, each A_i is an i × n submatrix that is composed of the first i rows of a given m × n matrix A. The construction of A is done in two steps. First, we generate a random matrix as in Table 4 and Table 5. Then, the rows of the matrix are normalized to be unit vectors. The vector b_i is defined by the product
b_i = A_i e,   (78)
which ensures that e solves the linear system. Since A_i has unit rows, Cimmino iteration (71) coincides with Richardson iteration (54). The value of w that we use is the optimal one,
w_opt = 2/(α_i² + β_i²),   (79)
and the iterations start from the point x_0 = 0. The iterative process is terminated as soon as the residual vector satisfies
‖A_i x_k − b_i‖₂ / ‖b_i‖₂ ≤ 10⁻¹⁰.   (80)
The number of iterations which are required to satisfy this condition is displayed in the last column of Table 12.
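A compact version of this experiment (NumPy in place of the MATLAB used for the tables; m, n, the seed, and the iteration cap are illustrative assumptions, with the cap only guarding against a very ill-conditioned square case) reproduces the iterations anomaly by solving with i = n rows and with i = m rows:

```python
import numpy as np

def richardson_iters(A, b, tol=1e-10, cap=200_000):
    """Iterations of (54) with the optimal relaxation parameter (79)."""
    s = np.linalg.svd(A, compute_uv=False)
    w = 2.0 / (s[0] ** 2 + s[-1] ** 2)
    x, b_norm = np.zeros(A.shape[1]), np.linalg.norm(b)
    for k in range(cap):
        r = A @ x - b
        if np.linalg.norm(r) / b_norm <= tol:
            return k
        x -= w * (A.T @ r)
    return cap

rng = np.random.default_rng(8)
m, n = 200, 20
A = rng.uniform(-1.0, 1.0, (m, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)    # unit rows, as in the paper
e = np.ones(n)                                   # b_i = A_i e, so e is the solution

iters_square = richardson_iters(A[:n, :], A[:n, :] @ e)   # i = n
iters_full   = richardson_iters(A,        A @ e)          # i = m
assert iters_full < iters_square    # the iterations anomaly
```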
The Jacobi method was used to solve the linear systems (73), where A_i and b_i are defined as above. Since A_i has unit rows, D_i is a unit matrix, and Jacobi iteration (74) reduces to
y_{k+1} = y_k − w (A_iA_iᵀ y_k − b_i).   (81)
The last iteration uses the optimal value of w, given in (79). It starts from the point y_0 = 0 and terminates as soon as the residual vector satisfies
‖A_iA_iᵀ y_k − b_i‖₂ / ‖b_i‖₂ ≤ 10⁻¹⁰.   (82)
The number of required iterations is nearly identical to that of the Richardson method. This is not surprising, since multiplying (81) by A_iᵀ shows that the sequence
x_k = A_iᵀ y_k, k = 0, 1, 2, …,   (83)
is generated by Richardson iteration (54). (There were only two minor exceptions: for i = 48, the Jacobi method required 36,959 iterations instead of 36,960, while for i = 50, it required 6,772,151 iterations instead of 6,760,589. In all other cases, the two methods required exactly the same number of iterations.)
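This equivalence is easy to observe directly (NumPy sketch; sizes and seed are arbitrary; unit rows make D_i the identity, as in the experiment above):

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((30, 6))
A /= np.linalg.norm(A, axis=1, keepdims=True)   # unit rows, so D_i = I
b = A @ np.ones(6)
s = np.linalg.svd(A, compute_uv=False)
w = 2.0 / (s[0] ** 2 + s[-1] ** 2)

x = np.zeros(6)      # Richardson iterate for A^T A x = A^T b
y = np.zeros(30)     # Jacobi iterate (81) for A A^T y = b
for _ in range(50):
    x = x - w * (A.T @ (A @ x - b))
    y = y - w * (A @ (A.T @ y) - b)
    assert np.allclose(x, A.T @ y)   # x_k = A^T y_k at every step
```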
The figures in Table 12 demonstrate the close link between the condition number and the rate of convergence. As anticipated from (61), for large values of k_i, the spectral radius approaches 1 and the rate of convergence slows down. Thus, a large condition number results in a large number of iterations. Conversely, a small value of k_i implies a small spectral radius and a small number of iterations. In other words, a condition number anomaly invites a similar anomaly in the number of iterations.
Usually, it is reasonable to assume that the computational effort in solving a linear system is proportional to the number of rows: the more rows we have, the more computation time is needed. From this point of view, the iterations anomaly phenomenon is somewhat surprising, as solving a linear system with i = 10n rows needs considerably less time than solving a linear system with i = n rows.

7. Concluding Remarks

As an old adage says, the whole is sometimes much more than the sum of its parts. The basic ascending (descending) properties of singular values are easily concluded from the Cauchy interlace theorem, while the inequalities that we derive enable us to see what causes a large, or small, increase. Combining these results gives a better overview of the whole situation. One consequence regards the anomalous behavior of the smallest singular values sequence β_1, …, β_m, and the fact that β_n is the smallest number in this sequence. The second observation is about the condition number anomaly. It is easy to conclude that the condition number sequence k_1, …, k_n increases, but the Cauchy interlace theorem does not tell us how the rest of this sequence behaves. The answer is obtained by considering the bound on the ratio k_{i+1}²/k_i². This bound explains the reasons behind the condition number anomaly and characterizes situations that invite (or exclude) such behavior. We see that the anomaly phenomenon is likely to occur in “random-like” matrices whose rows satisfy two conditions: all the rows have about the same size, and the directions of the rows scatter in some random way. This suggests that the condition number anomaly is common in several types of matrices, and the numerical examples illustrate this point.
The practical importance of the condition number anomaly lies in the use of iterative methods for solving large linear systems. As we have seen, several iterative solvers have the property that the rate of convergence depends on the condition number. Therefore, when solving “random-like” systems, a fast rate of convergence is expected in under-determined or over-determined systems, while a slower rate is expected in (nearly) square systems.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The author declares no conflict of interest.

Table 1. The anomaly in small “two-ones” matrix.

| Number of Rows, i | Number of Columns, n | Largest Singular Value, α_i | Smallest Singular Value, β_i | Condition Number, k_i |
|---|---|---|---|---|
| 2 | 10 | 1.732 | 1.000 | 1.732 |
| 4 | 10 | 2.236 | 1.000 | 2.236 |
| 6 | 10 | 2.646 | 1.000 | 2.646 |
| 8 | 10 | 3.000 | 1.000 | 3.000 |
| 10 | 10 | 3.171 | 0.377 | 8.416 |
| 15 | 10 | 3.286 | 0.568 | 5.781 |
| 20 | 10 | 3.464 | 0.874 | 3.963 |
| 30 | 10 | 3.785 | 1.294 | 2.924 |
| 40 | 10 | 4.067 | 2.048 | 1.986 |
| 45 | 10 | 4.243 | 2.828 | 1.500 |
Table 2. The anomaly in small Hilbert matrix.

| Number of Rows, i | Number of Columns, n | Largest Singular Value, α_i | Smallest Singular Value, β_i | Condition Number, k_i |
|---|---|---|---|---|
| 2 | 5 | 1.394 | 1.129 × 10⁻¹ | 1.234 × 10¹ |
| 3 | 5 | 1.481 | 5.428 × 10⁻³ | 2.727 × 10² |
| 4 | 5 | 1.532 | 1.711 × 10⁻⁴ | 8.956 × 10³ |
| 5 | 5 | 1.567 | 3.288 × 10⁻⁶ | 4.766 × 10⁵ |
| 6 | 5 | 1.592 | 6.400 × 10⁻⁶ | 2.488 × 10⁵ |
| 7 | 5 | 1.611 | 9.679 × 10⁻⁶ | 1.664 × 10⁵ |
| 10 | 5 | 1.648 | 1.928 × 10⁻⁵ | 8.549 × 10⁴ |
| 15 | 5 | 1.679 | 3.231 × 10⁻⁵ | 5.197 × 10⁴ |
| 20 | 5 | 1.696 | 4.182 × 10⁻⁵ | 4.056 × 10⁴ |
| 30 | 5 | 1.714 | 5.422 × 10⁻⁵ | 3.161 × 10⁴ |
| 50 | 5 | 1.723 | 6.689 × 10⁻⁵ | 2.585 × 10⁴ |
| 100 | 5 | 1.746 | 7.845 × 10⁻⁵ | 2.219 × 10⁴ |
Table 3. Failure of the anomaly in small Pascal matrix.

| Number of Rows, i | Number of Columns, n | Largest Singular Value, α_i | Smallest Singular Value, β_i | Condition Number, k_i |
|---|---|---|---|---|
| 2 | 5 | 7.691 × 10⁰ | 9.194 × 10⁻¹ | 8.366 × 10⁰ |
| 3 | 5 | 2.068 × 10¹ | 3.495 × 10⁻¹ | 5.915 × 10¹ |
| 4 | 5 | 4.649 × 10¹ | 8.258 × 10⁻² | 5.630 × 10² |
| 5 | 5 | 9.229 × 10¹ | 1.084 × 10⁻² | 8.517 × 10³ |
| 6 | 5 | 1.672 × 10² | 2.623 × 10⁻² | 6.376 × 10³ |
| 7 | 5 | 2.826 × 10² | 4.751 × 10⁻² | 5.948 × 10³ |
| 10 | 5 | 1.020 × 10³ | 1.391 × 10⁻¹ | 7.331 × 10³ |
| 20 | 5 | 1.539 × 10⁴ | 5.132 × 10⁻¹ | 2.998 × 10⁴ |
Table 4. The anomaly in small random matrix.

| Number of Rows, i | Number of Columns, n | Largest Singular Value, α_i | Smallest Singular Value, β_i | Condition Number, k_i |
|---|---|---|---|---|
| 4 | 16 | 2.772 | 1.492 | 1.858 |
| 8 | 16 | 3.194 | 1.006 | 3.189 |
| 12 | 16 | 3.817 | 0.286 | 13.337 |
| 16 | 16 | 4.068 | 0.030 | 136.876 |
| 20 | 16 | 4.258 | 0.295 | 14.439 |
| 30 | 16 | 4.683 | 1.179 | 3.971 |
| 50 | 16 | 5.956 | 2.273 | 2.621 |
| 80 | 16 | 6.891 | 3.338 | 2.064 |
| 120 | 16 | 7.960 | 4.412 | 1.804 |
| 160 | 16 | 8.875 | 5.291 | 1.677 |
Table 5. The anomaly in random matrix.

| Number of Rows, i | Number of Columns, n | Largest Singular Value, α_i | Smallest Singular Value, β_i | Condition Number, k_i |
|---|---|---|---|---|
| 20 | 100 | 7.976 | 3.505 | 2.276 |
| 40 | 100 | 8.782 | 2.681 | 3.276 |
| 60 | 100 | 9.679 | 1.476 | 6.556 |
| 80 | 100 | 10.350 | 0.706 | 14.653 |
| 100 | 100 | 10.802 | 0.044 | 245.346 |
| 120 | 100 | 11.429 | 0.753 | 15.183 |
| 150 | 100 | 12.164 | 1.471 | 8.270 |
| 200 | 100 | 13.413 | 2.409 | 5.569 |
| 300 | 100 | 15.329 | 4.537 | 3.379 |
| 400 | 100 | 16.872 | 6.084 | 2.773 |
| 500 | 100 | 18.458 | 7.301 | 2.528 |
| 1000 | 100 | 23.266 | 12.593 | 1.879 |
Table 6. The anomaly in nonnegative random matrix.

| Number of Rows, i | Number of Columns, n | Largest Singular Value, α_i | Smallest Singular Value, β_i | Condition Number, k_i |
|---|---|---|---|---|
| 20 | 100 | 22.309 | 1.836 | 12.151 |
| 40 | 100 | 31.532 | 1.317 | 23.949 |
| 60 | 100 | 38.462 | 0.653 | 58.927 |
| 80 | 100 | 44.355 | 0.359 | 123.504 |
| 100 | 100 | 49.634 | 0.030 | 1662.0 |
| 120 | 100 | 54.511 | 0.357 | 152.782 |
| 150 | 100 | 61.162 | 0.734 | 83.301 |
| 200 | 100 | 70.735 | 1.211 | 58.391 |
| 300 | 100 | 86.710 | 2.269 | 38.213 |
| 400 | 100 | 100.217 | 3.102 | 32.306 |
| 500 | 100 | 112.018 | 3.649 | 30.702 |
| 1000 | 100 | 158.344 | 6.296 | 25.148 |
Table 7. The anomaly in random “−1 or 1” matrix.

| Number of Rows, i | Number of Columns, n | Largest Singular Value, α_i | Smallest Singular Value, β_i | Condition Number, k_i |
|---|---|---|---|---|
| 20 | 100 | 13.526 | 6.472 | 2.090 |
| 40 | 100 | 15.624 | 4.205 | 3.715 |
| 60 | 100 | 16.949 | 2.647 | 6.404 |
| 80 | 100 | 18.019 | 1.092 | 16.503 |
| 100 | 100 | 19.204 | 0.0355 | 541.70 |
| 120 | 100 | 20.095 | 1.106 | 18.169 |
| 150 | 100 | 21.600 | 2.601 | 8.305 |
| 200 | 100 | 23.172 | 4.052 | 5.718 |
| 300 | 100 | 26.536 | 7.337 | 3.617 |
| 400 | 100 | 29.177 | 9.943 | 2.934 |
| 500 | 100 | 31.626 | 12.834 | 2.464 |
| 1000 | 100 | 41.216 | 21.866 | 1.883 |
Table 8. The anomaly in random “0 or 1” matrix.

| Number of Rows, i | Number of Columns, n | Largest Singular Value, α_i | Smallest Singular Value, β_i | Condition Number, k_i |
|---|---|---|---|---|
| 20 | 100 | 22.404 | 3.337 | 6.714 |
| 40 | 100 | 31.705 | 2.283 | 13.886 |
| 60 | 100 | 38.671 | 1.344 | 28.776 |
| 80 | 100 | 44.727 | 0.5560 | 80.443 |
| 100 | 100 | 50.027 | 0.0207 | 2416.8 |
| 120 | 100 | 54.963 | 0.4955 | 110.91 |
| 150 | 100 | 61.745 | 1.314 | 47.001 |
| 200 | 100 | 71.292 | 2.026 | 35.192 |
| 300 | 100 | 87.315 | 3.720 | 23.469 |
| 400 | 100 | 100.78 | 4.981 | 20.234 |
| 500 | 100 | 112.74 | 6.398 | 17.620 |
| 1000 | 100 | 159.07 | 10.981 | 14.486 |
Table 9. The anomaly in random “narrow range” matrix.

| Number of Rows, i | Number of Columns, n | Largest Singular Value, α_i | Smallest Singular Value, β_i | Condition Number, k_i |
|---|---|---|---|---|
| 20 | 100 | 44.716 | 3.667 × 10⁻² | 1.219 × 10³ |
| 40 | 100 | 63.240 | 2.621 × 10⁻² | 2.413 × 10³ |
| 60 | 100 | 77.451 | 1.303 × 10⁻² | 5.946 × 10³ |
| 80 | 100 | 89.432 | 7.184 × 10⁻³ | 1.245 × 10⁴ |
| 100 | 100 | 99.989 | 5.916 × 10⁻⁴ | 1.690 × 10⁵ |
| 120 | 100 | 109.54 | 7.901 × 10⁻³ | 1.545 × 10⁴ |
| 150 | 100 | 122.47 | 1.473 × 10⁻² | 8.316 × 10³ |
| 200 | 100 | 141.43 | 2.423 × 10⁻² | 5.836 × 10³ |
| 300 | 100 | 173.20 | 4.540 × 10⁻² | 3.815 × 10³ |
| 400 | 100 | 200.00 | 6.204 × 10⁻² | 3.224 × 10³ |
| 500 | 100 | 223.61 | 7.297 × 10⁻² | 3.064 × 10³ |
| 1000 | 100 | 316.23 | 1.259 × 10⁻¹ | 2.511 × 10³ |
Table 10. Failure of anomaly in matrix with “converging rows”.

| Number of Rows, i | Number of Columns, n | Largest Singular Value, α_i | Smallest Singular Value, β_i | Condition Number, k_i |
|---|---|---|---|---|
| 20 | 100 | 44.700 | 2.373 × 10⁻¹ | 1.883 × 10² |
| 40 | 100 | 63.219 | 8.897 × 10⁻² | 7.106 × 10² |
| 60 | 100 | 77.427 | 3.257 × 10⁻² | 2.378 × 10³ |
| 80 | 100 | 89.409 | 1.486 × 10⁻² | 6.017 × 10³ |
| 100 | 100 | 99.968 | 1.061 × 10⁻³ | 9.418 × 10⁴ |
| 120 | 100 | 109.51 | 1.018 × 10⁻² | 1.076 × 10⁴ |
| 150 | 100 | 122.45 | 1.781 × 10⁻² | 6.875 × 10³ |
| 200 | 100 | 141.40 | 2.536 × 10⁻² | 5.576 × 10³ |
| 300 | 100 | 173.18 | 3.333 × 10⁻² | 5.189 × 10³ |
| 400 | 100 | 199.98 | 3.724 × 10⁻² | 5.369 × 10³ |
| 500 | 100 | 223.59 | 3.934 × 10⁻² | 5.684 × 10³ |
| 1000 | 100 | 316.22 | 4.319 × 10⁻² | 7.321 × 10³ |
Table 11. The anomaly in large sparse random matrix.

| Number of Rows, i | Number of Columns, n | Largest Singular Value, α_i | Smallest Singular Value, β_i | Condition Number, k_i |
|---|---|---|---|---|
| 2000 | 10,000 | 23.207 | 3.122 | 7.434 |
| 4000 | 10,000 | 32.275 | 2.085 | 15.479 |
| 6000 | 10,000 | 39.294 | 1.285 | 30.571 |
| 8000 | 10,000 | 45.201 | 0.605 | 74.710 |
| 9000 | 10,000 | 47.893 | 0.291 | 164.79 |
| 10,000 | 10,000 | 50.426 | 4.064 × 10⁻⁴ | 1.241 × 10⁵ |
| 11,000 | 10,000 | 52.854 | 0.279 | 189.50 |
| 12,000 | 10,000 | 55.161 | 0.5437 | 101.45 |
| 15,000 | 10,000 | 61.590 | 1.284 | 47.956 |
| 20,000 | 10,000 | 71.033 | 2.372 | 29.953 |
| 30,000 | 10,000 | 86.895 | 4.189 | 20.744 |
| 40,000 | 10,000 | 100.32 | 5.708 | 17.573 |
| 50,000 | 10,000 | 112.14 | 7.072 | 15.856 |
Table 12. Iterations anomaly in the methods of Richardson, Cimmino, and Jacobi.

| Number of Rows, i | Number of Columns, n | Largest Singular Value, α_i | Smallest Singular Value, β_i | Condition Number, k_i | Number of Iterations |
|---|---|---|---|---|---|
| 10 | 50 | 1.372 | 0.6454 | 2.125 | 50 |
| 30 | 50 | 1.673 | 0.2265 | 7.385 | 580 |
| 40 | 50 | 1.808 | 0.1241 | 14.568 | 2192 |
| 45 | 50 | 1.861 | 9.303 × 10⁻² | 20.008 | 4154 |
| 48 | 50 | 1.916 | 3.259 × 10⁻² | 58.791 | 36,960 |
| 50 | 50 | 1.917 | 2.415 × 10⁻³ | 793.96 | 6,760,589 |
| 52 | 50 | 1.937 | 3.606 × 10⁻² | 53.730 | 30,976 |
| 55 | 50 | 1.955 | 0.1078 | 18.133 | 3559 |
| 60 | 50 | 2.029 | 0.1468 | 13.825 | 2107 |
| 80 | 50 | 2.176 | 0.2744 | 7.929 | 681 |
| 100 | 50 | 2.337 | 0.4617 | 5.061 | 266 |
| 200 | 50 | 3.030 | 1.021 | 2.966 | 98 |
| 500 | 50 | 4.098 | 2.217 | 1.848 | 37 |
Dax, A. The Smallest Singular Value Anomaly and the Condition Number Anomaly. Axioms 2022, 11, 99. https://doi.org/10.3390/axioms11030099