Article

Eigenvalue Estimates Using the Kolmogorov-Sinai Entropy

Shih-Feng Shieh
Department of Mathematics, National Taiwan Normal University, 88 Sec. 4, Ting Chou Road, Taipei 11677, Taiwan
Entropy 2011, 13(12), 2036-2048; https://doi.org/10.3390/e13122036
Submission received: 31 October 2011 / Revised: 28 November 2011 / Accepted: 12 December 2011 / Published: 20 December 2011
(This article belongs to the Special Issue Concepts of Entropy and Their Applications)

Abstract

The scope of this paper is twofold. First, we use the Kolmogorov-Sinai entropy to estimate lower bounds for dominant eigenvalues of nonnegative matrices. The lower bound is better than the Rayleigh quotient. Second, we use this estimate to give a nontrivial lower bound for the gap between the dominant eigenvalues of $A$ and $A + V$.

1. Introduction

The main concern of this paper is to relate eigenvalue estimates to the Kolmogorov-Sinai entropy for Markov shifts. We begin with the definition of the Kolmogorov-Sinai entropy. Let $A = (a_{ij}) \in \mathbb{R}^{N \times N}$ be an irreducible nonnegative matrix. By an irreducible matrix $A$, we mean that for each $1 \le i, j \le N$ there exists a positive integer $k$ such that $(A^k)_{ij} \neq 0$. A matrix $P = (p_{ij}) \in \mathbb{R}^{N \times N}$ is said to be a stochastic matrix compatible with $A$ if $P$ satisfies
  • $0 < p_{ij} \le 1$ if $a_{ij} > 0$,
  • $p_{ij} = 0$ if $a_{ij} = 0$,
  • $\sum_{j=1}^{N} p_{ij} = 1$ for all $i = 1, \dots, N$.
We denote by $\mathcal{P}_A$ the set of all stochastic matrices compatible with $A$. By the Perron-Frobenius Theorem, it is easily seen that every stochastic matrix $P$ has a unique left eigenvector $q > 0$ corresponding to the eigenvalue $1$ with $\sum_{i=1}^{N} q_i = 1$. We say that $q$ is the stationary probability vector associated with $P$. For a transition matrix $A$, i.e., $a_{ij} = 1$ or $0$ for each $1 \le i, j \le N$, the subshift of finite type generated by $A$ is defined by
$$\Sigma_A = \{\, \mathbf{i} = (i_0, i_1, \dots) \mid i_j \in \{1, \dots, N\},\ a_{i_j, i_{j+1}} = 1,\ j = 0, 1, 2, \dots \,\}$$
and the shift map on $\Sigma_A$ is defined by $\sigma_A(i_0, i_1, \dots) = (i_1, i_2, \dots)$. A cylinder of $\Sigma_A$ is the set
$$C_{j_0, j_1, \dots, j_n} = \{\, \mathbf{i} \in \Sigma_A \mid i_0 = j_0, \dots, i_n = j_n \,\}$$
for any $n \ge 0$. Disjoint unions of cylinders form an algebra which generates the Borel $\sigma$-algebra of $\Sigma_A$. For any $P \in \mathcal{P}_A$ and its associated stationary probability vector $q$, the Markov measure of a cylinder may then be defined by
$$\mu_{P,q}(C_{j_0, j_1, \dots, j_n}) = q_{j_0}\, p_{j_0, j_1} \cdots p_{j_{n-1}, j_n}$$
Here $\mu_{P,q}$ is an invariant measure under the shift map $\sigma_A$ (see, e.g., [8]). The Kolmogorov-Sinai entropy (also called the measure-theoretic entropy) of $\sigma_A$ under the invariant measure $\mu_{P,q}$ is defined by
$$h_{P,q}(\sigma_A) = \lim_{n \to \infty} \frac{1}{n} \sum_{j_0, j_1, \dots, j_n} H\big(\mu_{P,q}(C_{j_0, j_1, \dots, j_n})\big)$$
where $H(x) = -x \log x$ and the convention $0 \log 0 = 0$ is adopted. The notion of the Kolmogorov-Sinai entropy was first studied by Kolmogorov in 1958 in connection with problems arising from information theory and the dimension of function spaces; it measures the uncertainty of a dynamical system (see, e.g., [6,7]). It is shown in [8] (p. 221) that
$$h_{P,q}(\sigma_A) = -\sum_{i,j} q_i\, p_{ij} \log p_{ij} \qquad (1)$$
where the summation in (1) is taken over all $i, j$ with $a_{ij} = 1$. On the other hand, it is shown by Parry [9] (Theorems 6 and 7) that the Kolmogorov-Sinai entropy of $\sigma_A$ has the upper bound $\log \lambda_N(A)$.
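As a quick illustration of Equation (1) and of Parry's bound stated below, the following sketch (not part of the original paper; the matrix and the row-normalized choice of $P$ are hypothetical) computes the Kolmogorov-Sinai entropy of a Markov shift with NumPy and compares it with $\log \lambda_N(A)$.

```python
# Hypothetical example: KS entropy of a Markov measure vs. log of the dominant eigenvalue.
import numpy as np

A = np.array([[1, 1, 0],
              [1, 0, 1],
              [1, 1, 1]], dtype=float)        # an irreducible transition matrix (assumed example)

# One stochastic matrix compatible with A: normalize each row of A.
P = A / A.sum(axis=1, keepdims=True)

# Stationary probability vector q: left eigenvector of P for the eigenvalue 1.
w, V = np.linalg.eig(P.T)
q = np.real(V[:, np.argmin(np.abs(w - 1.0))])
q = q / q.sum()

# Kolmogorov-Sinai entropy h_{P,q}(sigma_A) = -sum_{ij} q_i p_ij log p_ij over a_ij = 1.
logP = np.log(np.where(A > 0, P, 1.0))        # log p_ij where defined, 0 elsewhere
h = -np.sum(q[:, None] * P * logP)

lam = np.max(np.real(np.linalg.eigvals(A)))   # dominant eigenvalue lambda_N(A)
print(h, np.log(lam))                         # Parry's Theorem: h <= log(lambda_N(A))
```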
Theorem 1.1 (Parry’s Theorem).
Let $A$ be an $N \times N$ irreducible transition matrix. Then for any $P \in \mathcal{P}_A$ and its associated stationary probability vector $q$, we have
$$h_{P,q}(\sigma_A) \le \log \lambda_N(A) \qquad (2)$$
where $\lambda_N(A)$ denotes the dominant eigenvalue of $A$. Moreover, if $A$ is regular ($A^n > 0$ for some $n > 0$), the equality in (2) holds for a unique $P \in \mathcal{P}_A$ and $q$ the stationary probability vector associated with $P$.
Parry's Theorem shows that the Kolmogorov-Sinai entropy of a Markov shift is less than or equal to its topological entropy (that is, $\log \lambda_N(A)$), and that exactly one of the Markov measures on $\Sigma_A$ maximizes the Kolmogorov-Sinai entropy of $\sigma_A$, provided the shift is topologically mixing. This is also a crucial lemma for the Variational Property of Entropy [8] (Proposition 8.1) in ergodic theory. From the viewpoint of eigenvalue problems, however, the combination of (1) and (2) gives a lower bound for the dominant eigenvalue of the transition matrix $A$. In this paper, we generalize Parry's Theorem to general $N \times N$ irreducible nonnegative matrices. Toward this end, we extend the entropy to irreducible nonnegative matrices by
$$h_{P,q,A} = -\sum_{i,j} q_i\, p_{ij} \log \frac{p_{ij}}{a_{ij}}$$
It is easily seen that $h_{P,q,A} = h_{P,q}(\sigma_A)$ whenever $A$ is a transition matrix.
Theorem 1.2 (Main Result 1: The Generalized Parry’s Theorem).
Let $A \in \mathbb{R}^{N \times N}$ be an irreducible nonnegative matrix, let $P \in \mathcal{P}_A$, and let $q$ be the stationary probability vector associated with $P$. Then we have
$$h_{P,q,A} \le \log \lambda_N(A) \qquad (3)$$
where the summation defining $h_{P,q,A}$ is taken over all $i, j$ with $a_{ij} > 0$. Moreover, the equality in (3) holds when
$$P = \frac{1}{\lambda_N(A)}\, \operatorname{diag}(x)^{-1} A \operatorname{diag}(x)$$
and
$$q = \frac{y \circ x}{y^{\top} x}$$
where $x > 0$ and $y > 0$ are, respectively, the right and left eigenvectors of $A$ corresponding to the eigenvalue $\lambda_N(A)$. Here, $\operatorname{diag}(x)$ denotes the diagonal matrix with $x$ on its diagonal, $y \circ x$ denotes the vector $(y_1 x_1, \dots, y_N x_N)$, and $y^{\top}$ denotes the transpose of the column vector $y$.
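The maximizing pair can be checked numerically. The following hedged sketch (the matrix below is a made-up example, and extracting the Perron eigenvectors with numpy.linalg.eig is an implementation choice, not part of the paper) builds $(P, q)$ from the eigenvectors and verifies that $h_{P,q,A}$ equals $\log \lambda_N(A)$ up to rounding error.

```python
# Hypothetical example: the entropy-maximizing (P, q) of Theorem 1.2.
import numpy as np

A = np.array([[0.0, 2.0, 1.0],
              [1.0, 0.0, 3.0],
              [2.0, 1.0, 0.0]])               # assumed irreducible nonnegative matrix

w, V = np.linalg.eig(A)
k = np.argmax(np.real(w))
lam = np.real(w[k])
x = np.abs(np.real(V[:, k]))                  # right Perron eigenvector, taken positive

wl, U = np.linalg.eig(A.T)
y = np.abs(np.real(U[:, np.argmax(np.real(wl))]))   # left Perron eigenvector, taken positive

P = np.diag(1.0 / x) @ A @ np.diag(x) / lam   # P = diag(x)^{-1} A diag(x) / lambda_N(A)
q = (y * x) / (y @ x)                         # q = (y o x) / (y' x)

mask = A > 0
ratio = np.where(mask, P, 1.0) / np.where(mask, A, 1.0)   # p_ij / a_ij where a_ij > 0
h = -np.sum(q[:, None] * np.where(mask, P, 0.0) * np.log(ratio))

print(h, np.log(lam))                         # equality in (3): the two values agree
```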
Lower bound estimates for the dominant eigenvalue of a symmetric irreducible nonnegative matrix play an important role in various fields, e.g., the complexity of a symbolic dynamical system [5], the synchronization problem of coupled systems [10], or the ground state estimates of Schrödinger operators [2]. A usual way to estimate a lower bound for $\lambda_N(A)$ is the Rayleigh quotient
$$\lambda_N(A) \ge \frac{x^{\top} A x}{x^{\top} x}$$
It is also well known (see, e.g., [4] (Theorem 8.1.26)) that
$$\min_{1 \le i \le N} \frac{1}{x_i} \sum_{j=1}^{N} a_{ij} x_j \;\le\; \lambda_N(A) \;\le\; \max_{1 \le i \le N} \frac{1}{x_i} \sum_{j=1}^{N} a_{ij} x_j \qquad (4)$$
provided that $A \in \mathbb{R}^{N \times N}$ is nonnegative and $x \in \mathbb{R}^{N}$ is positive. Comparing the lower bound estimate (3) with (4), as well as with the Rayleigh quotient, we have the following result.
Corollary 1.3.
Let $A \in \mathbb{R}^{N \times N}$ be a symmetric, irreducible nonnegative matrix and let $x \in \mathbb{R}^{N}$ be positive. Then the matrix $P = \operatorname{diag}(Ax)^{-1} A \operatorname{diag}(x)$ belongs to $\mathcal{P}_A$ and $q = \dfrac{x \circ (Ax)}{x^{\top} A x}$ is the stationary probability vector associated with $P$. In addition,
$$h_{P,q,A} \ge \log \min_{1 \le i \le N} \frac{1}{x_i} \sum_{j=1}^{N} a_{ij} x_j$$
and
$$h_{P,q,A} \ge \log \frac{x^{\top} A x}{x^{\top} x}$$
Here, each equality holds if and only if $x$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda_N(A)$.
Here we remark that for an arbitrary irreducible nonnegative matrix $A$, the entropy $h_{P,q,A}$ involves the left eigenvector $q$ of $P$; hence the lower bound estimate (3) is merely a formal expression. However, for a symmetric irreducible nonnegative matrix $A$ and $P$ chosen as in Corollary 1.3, the vector $q$ can be expressed explicitly, and therefore $h_{P,q,A}$ can be written in an explicit form. We shall further show in Proposition 2.6 that $h_{P,q,A} = -\frac{1}{x^{\top} y} \sum_{i=1}^{N} x_i y_i \log \frac{x_i}{y_i}$, where $y = A x$.
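The explicit form above is easy to evaluate. The sketch below (a hypothetical example; the matrix and the positive test vector are assumptions, not data from the paper) computes $h_{P,q,A}$ via Proposition 2.6 for a symmetric $A$ and compares it with the two lower bounds of Corollary 1.3 and with $\log \lambda_N(A)$.

```python
# Hypothetical example: the explicit entropy of Corollary 1.3 / Proposition 2.6.
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [2.0, 0.0, 1.0],
              [0.0, 1.0, 3.0]])               # assumed symmetric irreducible nonnegative matrix
x = np.array([1.0, 2.0, 1.5])                 # any positive test vector

y = A @ x
h = -(x * y) @ np.log(x / y) / (x @ y)        # h_{P,q,A} = -(1/x'y) sum x_i y_i log(x_i/y_i)

row_bound = np.log(np.min(y / x))             # log min_i (Ax)_i / x_i
rayleigh  = np.log((x @ A @ x) / (x @ x))     # log of the Rayleigh quotient
lam = np.max(np.linalg.eigvalsh(A))

print(row_bound, rayleigh, h, np.log(lam))    # each value is <= the next one
```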
Considering a symmetric nonnegative matrix $A$ and its perturbation $A + V$, it is easily seen that $\lambda_N(A+V) - \lambda_N(A) \ge x^{\top} V x$, where $x$ is the normalized eigenvector of $A$ corresponding to $\lambda_N(A)$. This gives a trivial lower bound for the gap between $\lambda_N(A+V)$ and $\lambda_N(A)$. Upper bound estimates for the gap are well studied in perturbation theory [4,11]. By considering $A + V$ as a low-rank perturbation of $A$, the interlacing structure of the eigenvalues of $A + V$ and of $A$ has been studied in [1,3]. In the second result of this paper, we give a nontrivial lower bound for $\lambda_N(A+V) - \lambda_N(A)$.
Theorem 1.4 (Main Result 2).
Let $A \in \mathbb{R}^{N \times N}$ be an irreducible nonnegative matrix and let $x > 0$ be the eigenvector of $A$ corresponding to $\lambda_N(A)$ with $\|x\|_2 = 1$. Suppose $A$ is symmetric. Then for any nonnegative $V = \operatorname{diag}(v_1, \dots, v_N)$, we have
$$\lambda_N(A+V) - \lambda_N(A) \;\ge\; \frac{f(1/\lambda_N(A)) - 1}{1/\lambda_N(A)} \qquad (5)$$
where
$$f(z) = \prod_{i=1}^{N} (1 + v_i z)^{\frac{(1 + v_i z)\, x_i^2}{\sum_{j=1}^{N} (1 + v_j z)\, x_j^2}}$$
Here $\big(f(1/\lambda_N(A)) - 1\big)\, \lambda_N(A) \ge x^{\top} V x$. Furthermore, the equality in (5) holds if and only if $v_1 = \cdots = v_N$.
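To make the bound of Theorem 1.4 concrete, here is a small hedged sketch (the matrix $A$ and the perturbation $V$ are invented for illustration): it evaluates $f(1/\lambda_N(A))$, the resulting lower bound for the gap, the actual gap, and the trivial bound $x^{\top} V x$.

```python
# Hypothetical example: the gap bound of Theorem 1.4 versus the actual gap.
import numpy as np

A = np.array([[1.0, 1.0, 0.5],
              [1.0, 0.0, 1.0],
              [0.5, 1.0, 2.0]])               # assumed symmetric irreducible nonnegative matrix
v = np.array([0.3, 0.0, 1.2])                 # assumed nonnegative diagonal perturbation
V = np.diag(v)

lam = np.max(np.linalg.eigvalsh(A))
x = np.abs(np.linalg.eigh(A)[1][:, -1])       # Perron eigenvector of A with ||x||_2 = 1

def f(z):
    # f(z) = prod_i (1 + v_i z)^{ (1 + v_i z) x_i^2 / sum_j (1 + v_j z) x_j^2 }
    w = (1.0 + v * z) * x**2
    return np.prod((1.0 + v * z) ** (w / w.sum()))

z = 1.0 / lam
bound   = (f(z) - 1.0) / z
gap     = np.max(np.linalg.eigvalsh(A + V)) - lam
trivial = x @ V @ x

print(trivial, bound, gap)                    # trivial <= bound <= gap (Theorem 1.4, Prop. 3.3)
```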
This paper is organized as follows. In Section 2, we prove the generalized Parry's Theorem in three steps. First, we prove the case in which the matrix $A$ has only integer entries. Next, we show that Theorem 1.2 is true for nonnegative matrices with rational entries. Finally, we show that it holds for all irreducible nonnegative matrices. The proof of Corollary 1.3 is given at the end of that section. In Section 3, we give the proof of Theorem 1.4. We conclude the paper in Section 4.
Throughout this paper, we use boldface letters (or symbols) to denote matrices (or vectors). For $u, v \in \mathbb{R}^{N}$, the Hadamard product of $u$ and $v$ is their elementwise product, denoted by $u \circ v = (u_i v_i)_{1 \le i \le N}$. The notation $\operatorname{diag}(u)$ denotes the $N \times N$ diagonal matrix with $u$ on its diagonal. A matrix $A = (a_{ij}) \in \mathbb{R}^{N \times N}$ is said to be a transition matrix if $a_{ij} = 1$ or $0$ for all $1 \le i, j \le N$. The eigenvalues of a nonnegative matrix $A$ are ordered as $\lambda_1(A), \dots, \lambda_N(A)$, so that $\lambda_N(A)$ denotes its dominant eigenvalue.

2. Proof of the Generalized Parry’s Theorem and Corollary 1.3

In this section, we shall prove the generalized Parry’s Theorem and Corollary 1.3. To prove inequality (3), we proceed in three steps.

Step 1: Inequality (3) is true for all irreducible nonnegative matrices with integer entries.

Let $A$ be an irreducible nonnegative matrix with integer entries. To apply Parry's Theorem, we shall construct a transition matrix $\bar A$ corresponding to $A$ for which $\lambda_{\bar N}(\bar A) = \lambda_N(A)^{1/2}$. To this end, we define the index sets
$$I = \{1, \dots, N\}, \qquad E = \{\, ij^{(k)} \mid a_{ij} \neq 0,\ 1 \le k \le a_{ij} \,\}$$
Let $\tilde N = \sum_{i,j=1}^{N} a_{ij} = \# E$ and $\bar N = N + \tilde N$. The transition matrix $\bar A \in \mathbb{R}^{\bar N \times \bar N}$ corresponding to $A$, with index set $I \cup E$, is defined as follows:
  (6a) $\bar a_{i,\, ij^{(k)}} = 1$ for all $1 \le k \le a_{ij}$ if $a_{ij} \neq 0$;
  (6b) $\bar a_{ij^{(k)},\, j} = 1$ for all $1 \le k \le a_{ij}$ if $a_{ij} \neq 0$;
  (6c) the remaining entries are set to zero.
It is easily seen that $\bar A$ can be written in the block form
$$\bar A = \begin{pmatrix} 0_{N \times N} & \bar A_{IE} \\ \bar A_{EI} & 0_{\tilde N \times \tilde N} \end{pmatrix} \qquad (7)$$
where $0_{N \times N}$ and $0_{\tilde N \times \tilde N}$ are, respectively, the zero matrices in $\mathbb{R}^{N \times N}$ and $\mathbb{R}^{\tilde N \times \tilde N}$, $\bar A_{IE} \in \mathbb{R}^{N \times \tilde N}$, and $\bar A_{EI} \in \mathbb{R}^{\tilde N \times N}$.
Proposition 2.1.
$\lambda_{\bar N}(\bar A) = \lambda_N(A)^{1/2}$.
Proof.
From (7), we see that
$$\bar A^2 = \begin{pmatrix} \bar A_{IE} \bar A_{EI} & 0_{N \times \tilde N} \\ 0_{\tilde N \times N} & \bar A_{EI} \bar A_{IE} \end{pmatrix}$$
From (6a) and (6b), for each $i, j$ with $a_{ij} \neq 0$, we have
$$\sum_{k=1}^{a_{ij}} \bar a_{i,\, ij^{(k)}}\, \bar a_{ij^{(k)},\, j} = a_{ij} \qquad (8)$$
Using (8), together with (6c), we have
$$(\bar A_{IE} \bar A_{EI})_{ij} = \sum_{\alpha \in E} \bar a_{i\alpha}\, \bar a_{\alpha j} = \begin{cases} \sum_{k} \bar a_{i,\, ij^{(k)}}\, \bar a_{ij^{(k)},\, j} = a_{ij} & \text{if } a_{ij} \neq 0 \\ 0 = a_{ij} & \text{if } a_{ij} = 0 \end{cases} \qquad (9)$$
From (9) we see that $\bar A_{IE} \bar A_{EI} = A$. Hence $\lambda_{\bar N}(\bar A^2) = \lambda_N(\bar A_{IE} \bar A_{EI}) = \lambda_{\tilde N}(\bar A_{EI} \bar A_{IE}) = \lambda_N(A)$. On the other hand, $\bar A$ is a nonnegative matrix, so by the Perron-Frobenius Theorem its dominant eigenvalue is nonnegative. The assertion follows. ☐
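The construction of $\bar A$ is easy to automate. The following sketch (an illustration under the stated rules; the helper list `edges` is my own bookkeeping, not notation from the paper) builds $\bar A$ for the matrix of Figure 1 and verifies Proposition 2.1 numerically.

```python
# Hypothetical check of Proposition 2.1: lambda(A_bar) = lambda(A)^(1/2).
import numpy as np

A = np.array([[1, 2],
              [1, 0]])                        # the example matrix of Figure 1

N = A.shape[0]
# Enumerate the auxiliary indices ij^(k): one per unit of a_ij (the set E).
edges = [(i, j, k) for i in range(N) for j in range(N) for k in range(A[i, j])]
Nt = len(edges)                               # N_tilde = sum of the entries of A

A_bar = np.zeros((N + Nt, N + Nt))
for idx, (i, j, k) in enumerate(edges):
    A_bar[i, N + idx] = 1.0                   # rule (6a): edge i -> ij^(k)
    A_bar[N + idx, j] = 1.0                   # rule (6b): edge ij^(k) -> j

lam_A   = np.max(np.real(np.linalg.eigvals(A)))
lam_bar = np.max(np.real(np.linalg.eigvals(A_bar)))
print(lam_bar, np.sqrt(lam_A))                # agree up to rounding error
```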
Remark 2.1. In the language of graph theory, $a_{ij}$ represents the number of directed edges from vertex $i$ to vertex $j$. Hence $\sum_{i,j} (A^n)_{ij}$ equals the number of all possible routes of length $n+1$, i.e.,
$$\#\{\text{all possible routes of length } n+1\} = \sum_{i,j} (A^n)_{ij} = O(\lambda_N(A)^n)$$
In the construction of $\bar A$, we add an additional vertex on every edge from vertex $i$ to vertex $j$ (see Figure 1 for an illustration). Hence, each route that obeys the rule defined by $A$,
$$(i_1, i_2, \dots, i_j, i_{j+1}, \dots, i_{n-1}, i_n), \quad \text{provided } a_{i_j i_{j+1}} > 0 \text{ for all } j = 1, \dots, n-1 \qquad (10)$$
now becomes one of the following routes according to the rule defined by $\bar A$:
$$(i_1,\ i_1 i_2^{(k_1)},\ i_2,\ \dots,\ i_j,\ i_j i_{j+1}^{(k_j)},\ i_{j+1},\ \dots,\ i_{n-1},\ i_{n-1} i_n^{(k_{n-1})},\ i_n) \qquad (11)$$
where $1 \le k_j \le a_{i_j i_{j+1}}$, $j = 1, \dots, n-1$. A route of the form (11) is equivalent to one of the form (10), but its length is doubled. Hence $O(\lambda_{\bar N}(\bar A)^{2n}) = O(\lambda_N(A)^n)$.
Figure 1. Illustration for Remark 2.1 with the example $A = \begin{pmatrix} 1 & 2 \\ 1 & 0 \end{pmatrix}$.
Now, let $P \in \mathcal{P}_A$ be given and let $q$ be its associated stationary probability vector. We shall accordingly define a stochastic matrix $\bar P \in \mathcal{P}_{\bar A}$ and its associated stationary probability vector $\bar q$. The stochastic matrix $\bar P$ is defined as follows:
  (12a) $\bar p_{i,\, ij^{(k)}} = \dfrac{p_{ij}}{a_{ij}}$ for all $1 \le k \le a_{ij}$, provided $a_{ij} > 0$;
  (12b) $\bar p_{ij^{(k)},\, j} = 1$ for all $1 \le k \le a_{ij}$, provided $a_{ij} > 0$;
  (12c) the remaining entries are set to zero.
From (6) and (12), it is easily seen that $\bar P$ is a stochastic matrix compatible with $\bar A$. Let the vector $\bar q \in \mathbb{R}^{N + \tilde N}$ be defined by
$$\bar q_i = \frac{q_i}{2}, \qquad 1 \le i \le N \qquad (13a)$$
and
$$\bar q_{ij^{(k)}} = \frac{q_i\, p_{ij}}{2 a_{ij}}, \qquad \text{for all } 1 \le k \le a_{ij} \text{ with } a_{ij} > 0 \qquad (13b)$$
Proposition 2.2.
$\bar q$ is the stationary probability vector associated with $\bar P$.
Proof.
We first show that $\bar q$ is a left eigenvector of $\bar P$ with corresponding eigenvalue $1$. For any $1 \le j \le N$, using (12b), (13b), and the fact that $q P = q$, we have
$$(\bar q \bar P)_j = \sum_{i,k} \bar q_{ij^{(k)}}\, \bar p_{ij^{(k)},\, j} = \sum_{i:\, a_{ij} > 0} \sum_{k=1}^{a_{ij}} \frac{q_i\, p_{ij}}{2 a_{ij}} \cdot 1 = \sum_{i} \frac{q_i\, p_{ij}}{2} = \frac{q_j}{2} = \bar q_j \qquad (14)$$
On the other hand, using (12a) and (13a), for all $ij^{(k)}$ with $a_{ij} > 0$ and $1 \le k \le a_{ij}$, we have
$$(\bar q \bar P)_{ij^{(k)}} = \bar q_i\, \bar p_{i,\, ij^{(k)}} = \frac{q_i}{2} \cdot \frac{p_{ij}}{a_{ij}} = \bar q_{ij^{(k)}}$$
In (14) and the display above, we have proved $\bar q \bar P = \bar q$. Now we show that the total sum of the entries of $\bar q$ is $1$. Using the fact that
$$\sum_{i,j} \sum_{k=1}^{a_{ij}} \bar q_{ij^{(k)}} = \sum_{i,j} \sum_{k=1}^{a_{ij}} \frac{q_i\, p_{ij}}{2 a_{ij}} = \sum_{i,j} \frac{q_i\, p_{ij}}{2} = \frac{1}{2} \sum_i q_i$$
we conclude that
$$\sum_{\alpha \in I \cup E} \bar q_\alpha = \sum_i \bar q_i + \sum_{i,j} \sum_{k=1}^{a_{ij}} \bar q_{ij^{(k)}} = \frac{1}{2} \sum_i q_i + \frac{1}{2} \sum_i q_i = 1$$
The proof is complete. ☐
From the construction of the transition matrix $\bar A$, it is easily seen that $\bar A$ is irreducible. In (12) and Proposition 2.2, we have shown that $\bar P \in \mathcal{P}_{\bar A}$ and that the vector $\bar q$ defined by (13) is its associated stationary probability vector. Hence the Kolmogorov-Sinai entropy $h_{\bar P, \bar q}(\sigma_{\bar A})$ is well defined. We now give the relationship between the quantities $h_{\bar P, \bar q}(\sigma_{\bar A})$ and $h_{P,q,A}$ defined in Equation (3).
Proposition 2.3.
$h_{\bar P, \bar q}(\sigma_{\bar A}) = \frac{1}{2}\, h_{P,q,A}$
Proof.
We note that, by (12b), $\log \bar p_{ij^{(k)},\, j} = 0$ if $a_{ij} > 0$. Using the definition of $\bar P$ and $\bar q$ in (12) and (13), as well as the entropy formula (1), we have
$$h_{\bar P, \bar q}(\sigma_{\bar A}) = -\sum_{i,j:\, a_{ij} > 0} \sum_{k=1}^{a_{ij}} \bar q_i\, \bar p_{i,\, ij^{(k)}} \log \bar p_{i,\, ij^{(k)}} = -\sum_{i,j:\, a_{ij} > 0} \sum_{k=1}^{a_{ij}} \frac{q_i}{2} \cdot \frac{p_{ij}}{a_{ij}} \log \frac{p_{ij}}{a_{ij}} = -\sum_{i,j:\, a_{ij} > 0} \frac{q_i\, p_{ij}}{2} \log \frac{p_{ij}}{a_{ij}} = \frac{1}{2}\, h_{P,q,A}$$
The proof is complete. ☐
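Propositions 2.2 and 2.3 can likewise be checked numerically. The sketch below (hypothetical data; the row-normalized choice of $P$ and the helper `ks_entropy` are mine) builds $\bar P$ and $\bar q$ from (12) and (13), confirms that $\bar q$ is stationary for $\bar P$, and confirms the entropy halving $h_{\bar P, \bar q}(\sigma_{\bar A}) = \tfrac{1}{2} h_{P,q,A}$.

```python
# Hypothetical check of Propositions 2.2 and 2.3.
import numpy as np

A = np.array([[1, 2],
              [1, 0]])                        # integer irreducible nonnegative matrix (Figure 1)
N = A.shape[0]
P = A / A.sum(axis=1, keepdims=True)          # one stochastic matrix compatible with A

w, V = np.linalg.eig(P.T)                     # stationary probability vector q of P
q = np.real(V[:, np.argmin(np.abs(w - 1.0))]); q /= q.sum()

edges = [(i, j, k) for i in range(N) for j in range(N) for k in range(A[i, j])]
Nt = len(edges)
P_bar = np.zeros((N + Nt, N + Nt))
q_bar = np.zeros(N + Nt)
q_bar[:N] = q / 2.0                                       # (13a)
for idx, (i, j, k) in enumerate(edges):
    P_bar[i, N + idx] = P[i, j] / A[i, j]                 # (12a)
    P_bar[N + idx, j] = 1.0                               # (12b)
    q_bar[N + idx] = q[i] * P[i, j] / (2.0 * A[i, j])     # (13b)

print(np.allclose(q_bar @ P_bar, q_bar), np.isclose(q_bar.sum(), 1.0))   # Proposition 2.2

def ks_entropy(Pm, qv, Am):
    # h_{P,q,A} = -sum_{a_ij > 0} q_i p_ij log(p_ij / a_ij)
    m = Am > 0
    ratio = np.where(m, Pm, 1.0) / np.where(m, Am, 1.0)
    return -np.sum(qv[:, None] * np.where(m, Pm, 0.0) * np.log(ratio))

A_bar = (P_bar > 0).astype(float)             # A_bar is the 0/1 pattern of P_bar
print(ks_entropy(P_bar, q_bar, A_bar), ks_entropy(P, q, A.astype(float)) / 2.0)   # Prop. 2.3
```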
Using Propositions 2.1 and 2.3 together with Parry's Theorem 1.1, it follows that
$$\frac{1}{2}\, h_{P,q,A} = h_{\bar P, \bar q}(\sigma_{\bar A}) \le \log \lambda_{\bar N}(\bar A) = \frac{1}{2} \log \lambda_N(A) \qquad (15)$$

Step 2: Inequality (3) is true for all irreducible nonnegative matrices with rational entries.

Any $N \times N$ nonnegative matrix whose entries are all rational can be written as $A/n$, where $A$ is a nonnegative matrix with integer entries and $n$ is a positive integer. Suppose $A$ is irreducible and $P \in \mathcal{P}_{A/n}$. Note that $\mathcal{P}_{A/n} = \mathcal{P}_A$. Letting $q$ be a stationary probability vector associated with $P$, inequality (3) for $A/n$ follows from the following proposition.
Proposition 2.4.
$h_{P,q,A/n} \le \log \lambda_N(A/n)$
Proof.
From the definition of $h_{P,q,A/n}$, we see that
$$h_{P,q,A/n} = -\sum_{i,j:\, a_{ij} > 0} q_i\, p_{ij} \log \frac{n\, p_{ij}}{a_{ij}} = -\sum_{i,j:\, a_{ij} > 0} q_i\, p_{ij} \log \frac{p_{ij}}{a_{ij}} - \sum_{i,j} q_i\, p_{ij} \log n = h_{P,q,A} - \sum_{i,j} q_i\, p_{ij} \log n \qquad (16)$$
On the other hand, since $q P = q$ and $\sum_i q_i = 1$, we have
$$\sum_{i,j} q_i\, p_{ij} \log n = \log n \qquad (17)$$
Substituting (17) into (16) and using the result (15) in Step 1, we have
$$h_{P,q,A/n} = h_{P,q,A} - \log n \le \log \lambda_N(A) - \log n = \log \lambda_N(A/n)$$

Step 3: Inequality (3) is true for all irreducible nonnegative matrices.

It remains to show that (3) holds for all irreducible nonnegative $A$ having some irrational entries. For a fixed $P \in \mathcal{P}_A$ with stationary probability vector $q$, approximate $A$ by rational matrices with the same zero pattern; the assertion then follows from Step 2 and the continuous dependence of $h_{P,q,A}$ and of the eigenvalues on the matrix entries.
Now, we give the proof of the second assertion of Theorem 1.2.
Proposition 2.5.
The equality in (3) holds when one chooses
$$P = \frac{1}{\lambda_N(A)}\, \operatorname{diag}(x)^{-1} A \operatorname{diag}(x)$$
and
$$q = \frac{y \circ x}{y^{\top} x}$$
where $x > 0$ and $y > 0$ are, respectively, the right and left eigenvectors of $A$ corresponding to the eigenvalue $\lambda_N(A)$.
Proof.
By normalizing so that $y^{\top} x = 1$, we may write
$$p_{ij} = \frac{a_{ij}\, x_j}{\lambda_N(A)\, x_i} \qquad \text{and} \qquad q_i = x_i\, y_i$$
To ease the notation, set $\lambda_N = \lambda_N(A)$. Hence, we have
$$\begin{aligned} h_{P,q,A} &= -\sum_{i,j} x_i y_i\, \frac{a_{ij} x_j}{\lambda_N x_i} \log \frac{x_j}{\lambda_N x_i} \\ &= \sum_{i,j} \frac{y_i}{\lambda_N} (a_{ij} x_j) \log(\lambda_N x_i) - \sum_{i,j} \frac{x_j}{\lambda_N} (y_i a_{ij}) \log x_j \\ &= \sum_i y_i x_i \log(\lambda_N x_i) - \sum_j x_j y_j \log x_j \qquad \text{(using } \textstyle\sum_j a_{ij} x_j = \lambda_N x_i \text{ and } \sum_i y_i a_{ij} = \lambda_N y_j\text{)} \\ &= \sum_i x_i y_i \log \lambda_N = \log \lambda_N \end{aligned}$$
The proof of Theorem 1.2 is complete. ☐
In the following, we give the proof of Corollary 1.3. We first prove a useful proposition, which will also be used in Section 3.
Proposition 2.6.
Let $A \in \mathbb{R}^{N \times N}$ be an irreducible nonnegative matrix. Suppose $A$ is symmetric and $x \in \mathbb{R}^{N}$ is positive. If $P = \operatorname{diag}(Ax)^{-1} A \operatorname{diag}(x)$ and $q = \dfrac{x \circ y}{x^{\top} y}$, where $y = A x$, then
$$h_{P,q,A} = -\frac{1}{x^{\top} y} \sum_{i=1}^{N} x_i\, y_i \log \frac{x_i}{y_i} \qquad (18)$$
From Proposition 2.5, we see that the matrix $P$ defined there is a stochastic matrix compatible with $A$ and that $q$ is its associated stationary probability vector; hence the entropy $h_{P,q,A}$ is well defined. We now give the proof of this proposition.
Proof. Since $A \ge 0$ is irreducible and $x > 0$, it follows that $A x > 0$, and hence $\operatorname{diag}(Ax)^{-1}$ is well defined. It is easily seen that $p_{ij} = 0$ if and only if $a_{ij} = 0$. Moreover, $P \mathbf{e} = \operatorname{diag}(Ax)^{-1}(A x) = \mathbf{e}$, where $\mathbf{e}$ denotes the vector of all ones. This shows that $P \in \mathcal{P}_A$. On the other hand, since $A$ is symmetric, we see that $y^{\top} = x^{\top} A$. Hence
$$q P = \frac{(x \circ (Ax))\, \operatorname{diag}(Ax)^{-1} A \operatorname{diag}(x)}{x^{\top} A x} = q$$
We have proved the first assertion of this proposition. By the definition of $h_{P,q,A}$ in (3), we have
$$h_{P,q,A} = -\sum_{i,j} \frac{a_{ij}\, x_i\, x_j}{x^{\top} y} \log \frac{x_j}{y_i} = \frac{1}{x^{\top} y} \left( \sum_{i=1}^{N} x_i y_i \log y_i - \sum_{j=1}^{N} x_j y_j \log x_j \right) = -\frac{1}{x^{\top} y} \sum_{i=1}^{N} x_i y_i \log \frac{x_i}{y_i}$$
This completes the proof. ☐
Now, we are in a position to give the proof of Corollary 1.3.
Proof of Corollary 1.3.
For convenience, we let $y = A x$. Hence $q = \dfrac{x \circ y}{x^{\top} y}$ and $p_{ij} = \dfrac{a_{ij}\, x_j}{y_i}$. Using Proposition 2.6, we have
$$h_{P,q,A} = -\frac{1}{x^{\top} y} \sum_{i=1}^{N} x_i y_i \log \frac{x_i}{y_i} \;\ge\; -\log \frac{x^{\top} x}{x^{\top} y} \;=\; \log \frac{x^{\top} A x}{x^{\top} x} \qquad (19)$$
Here inequality (19) follows from Jensen's inequality (see, e.g., [12] (Theorem 7.35)) applied to the convex function $-\log$, together with the fact that $\sum_{i=1}^{N} \frac{x_i y_i}{x^{\top} y} = 1$. Similarly, using Proposition 2.6 and the monotonicity of $\log$, we also see that
$$h_{P,q,A} = -\frac{1}{x^{\top} y} \sum_{i=1}^{N} x_i y_i \log \frac{x_i}{y_i} \;\ge\; \frac{1}{x^{\top} y} \sum_{i=1}^{N} x_i y_i \log \min_{1 \le i \le N} \frac{y_i}{x_i} \;=\; \log \min_{1 \le i \le N} \frac{y_i}{x_i} \qquad (20)$$
This proves the first assertion of Corollary 1.3. It is easily seen that if $x$ is an eigenvector corresponding to $\lambda_N(A)$, then both equalities in (19) and (20) hold. From the assumption that $A \ge 0$ is irreducible and $x > 0$, it follows that $y > 0$ as well, so all $N$ terms in (18) are positive. Hence equality in (19) or in (20) can hold only if the ratios $x_i / y_i$, $i = 1, \dots, N$, are all equal; that is, $y = A x = \lambda x$ for some eigenvalue $\lambda$ of $A$. Since $x > 0$, the Perron-Frobenius Theorem implies $\lambda = \lambda_N(A)$. The proof is complete. ☐

3. Proof of Theorem 1.4

In this section, we shall give the proof of Theorem 1.4. We first prove (5).
Proposition 3.1.
Let $A$, $V$, and $x$ be as defined in Theorem 1.4. Then we have
$$\lambda_N(A+V) - \lambda_N(A) \;\ge\; \frac{f(1/\lambda_N(A)) - 1}{1/\lambda_N(A)} \qquad (21)$$
where
$$f(z) = \prod_{i=1}^{N} (1 + v_i z)^{\frac{(1 + v_i z)\, x_i^2}{\sum_{j=1}^{N} (1 + v_j z)\, x_j^2}}$$
The equality holds in (21) if and only if $v_1 = \cdots = v_N$.
Proof.
To ease the notation, we shall denote $\lambda = \lambda_N(A)$. Let $y = (A+V)x = \lambda x + V x$, $q = \dfrac{x \circ y}{x^{\top} y}$, and $P = \operatorname{diag}(y)^{-1}(A+V)\operatorname{diag}(x) \in \mathcal{P}_{A+V}$. From Theorem 1.2 and Proposition 2.6, we have
$$\log \lambda_N(A+V) \;\ge\; h_{P,q,A+V} = \frac{1}{x^{\top}(A+V)x} \sum_{i=1}^{N} (\lambda + v_i)\, x_i^2 \log(\lambda + v_i) \qquad (22)$$
We note that
$$\log \lambda_N(A) = \frac{1}{x^{\top}(A+V)x} \sum_{i=1}^{N} (\lambda + v_i)\, x_i^2 \log \lambda \qquad (23)$$
Subtracting (23) from (22), we have
$$\log \frac{\lambda_N(A+V)}{\lambda_N(A)} \;\ge\; \frac{1}{\sum_{i=1}^{N} (1 + v_i/\lambda)\, x_i^2} \sum_{i=1}^{N} (1 + v_i/\lambda)\, x_i^2 \log(1 + v_i/\lambda)$$
and hence,
$$\lambda_N(A+V) - \lambda_N(A) \;\ge\; \lambda_N(A)\,\big(f(1/\lambda_N(A)) - 1\big)$$
This proves (21). Now we prove the second assertion of the proposition. It is easily seen that $v_1 = \cdots = v_N$ implies that the equality in (21) holds. Conversely, suppose the equality in (21) holds; this is equivalent to the equality in (22). We now write (22) in an alternative form:
$$\begin{aligned} \frac{1}{x^{\top}(A+V)x} \sum_{i=1}^{N} (\lambda + v_i)\, x_i^2 \log(\lambda + v_i) \;&\le\; \log\!\left( \frac{1}{x^{\top}(A+V)x} \sum_{i=1}^{N} (\lambda + v_i)^2\, x_i^2 \right) && (24) \\ &=\; \log \frac{x^{\top}(A+V)^2 x}{x^{\top}(A+V)x} \\ &\le\; \log \lambda_N(A+V) && (25) \end{aligned}$$
Here (24) follows from Jensen's inequality and the concavity of $\log$. Hence, if the equality in (22) holds, then the equality in (25) also holds. This means that $x$ is also an eigenvector of $A + V$. However, since $x > 0$ is the eigenvector of $A$ corresponding to $\lambda_N(A)$, we conclude that $v_1 = \cdots = v_N$. This completes the proof. ☐
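For the equality case just established, a short check (again with invented numbers) is that when $V = cI$ the function $f$ reduces to $f(z) = 1 + cz$, so the bound of Theorem 1.4 returns exactly the true gap $c$.

```python
# Hypothetical check of the equality case v_1 = ... = v_N in Theorem 1.4.
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0]])                    # assumed symmetric irreducible nonnegative matrix
c = 0.7
v = np.full(2, c)                             # V = c * I
x = np.abs(np.linalg.eigh(A)[1][:, -1])       # Perron eigenvector, ||x||_2 = 1
lam = np.max(np.linalg.eigvalsh(A))

def f(z):
    w = (1.0 + v * z) * x**2
    return np.prod((1.0 + v * z) ** (w / w.sum()))

bound = lam * (f(1.0 / lam) - 1.0)
gap = np.max(np.linalg.eigvalsh(A + np.diag(v))) - lam
print(bound, gap, c)                          # all three coincide
```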
The following proposition can be obtained from a standard calculation.
Proposition 3.2.
Let f be the real-valued function in Proposition 3.1. Then we have
$$f'(z) = \left[ \frac{b}{1 + bz} + \frac{g(z)}{(1 + bz)^2} \right] f(z) \qquad (26a)$$
$$f''(z) = \left[ \frac{g'(z)}{(1 + bz)^2} + \frac{g(z)^2}{(1 + bz)^4} \right] f(z) \qquad (26b)$$
where $b = \sum_{i=1}^{N} x_i^2\, v_i$ and
$$g(z) = \sum_{i=1}^{N} x_i^2 \sum_{j=1}^{N} x_j^2\, (v_i - v_j) \log(1 + v_i z) \qquad (27a)$$
$$g'(z) = \frac{1}{2} \sum_{i,j=1}^{N} x_i^2\, x_j^2\, (v_i - v_j)^2\, \frac{1}{(1 + v_i z)(1 + v_j z)} \qquad (27b)$$
In the following, we show that the lower bound estimate (5) for $\lambda_N(A+V) - \lambda_N(A)$ is no smaller than $x^{\top} V x$.
Proposition 3.3.
Let f be the real-valued function in Proposition 3.1. Then we have
$$\frac{f(1/\lambda_N(A)) - 1}{1/\lambda_N(A)} \;\ge\; x^{\top} V x$$
Proof.
It is easily seen from the definition of $f(z)$ that $f(0) = 1$. Hence, by the Mean Value Theorem, there exists $\zeta \in (0,\, 1/\lambda_N(A))$ such that
$$\frac{f(1/\lambda_N(A)) - 1}{1/\lambda_N(A)} = f'(\zeta) \qquad (28)$$
From (26a) and (27a), we see that $f'(0) = b = x^{\top} V x$. From (26b), (27a), and (27b), we also see that $f''(z) \ge 0$ for all $z \ge 0$. This implies
$$f'(\zeta) \ge f'(0) = x^{\top} V x \qquad (29)$$
The assertion of this proposition follows from (28) and (29) directly. ☐

4. Conclusions

In this paper, we first generalize Parry's Theorem to general nonnegative matrices. This can be regarded as a lower bound estimate for the dominant eigenvalue of a nonnegative matrix. Second, we use the generalized Parry's Theorem to obtain a nontrivial lower bound for $\lambda_N(A+V) - \lambda_N(A)$, provided that $A \ge 0$ is symmetric and $V \ge 0$ is a diagonal matrix. The bound is optimal but implicit: it can be applied when $\lambda_N(A)$ and its corresponding eigenvector are known. As an interesting topic for future study, one may wish to derive an inequality similar to (3) for a general square matrix, or for a generalized eigenvalue problem $A x = \lambda B x$, rather than for a nonnegative matrix eigenvalue problem.

References

  1. Arbenz, P.; Golub, G.H. On the spectral decomposition of Hermitian matrices modified by low rank perturbations with applications. SIAM J. Matrix Anal. Appl. 1988, 9, 40–58.
  2. Chang, S.-M.; Lin, W.-W.; Shieh, S.-F. Gauss-Seidel-type methods for energy states of a multi-component Bose-Einstein condensate. J. Comp. Phys. 2005, 22, 367–390.
  3. Golub, G.H. Some modified matrix eigenvalue problems. SIAM Rev. 1973, 15, 318–334.
  4. Horn, R.A.; Johnson, C.R. Matrix Analysis; Cambridge University Press: Cambridge, UK, 1985.
  5. Juang, J.; Shieh, S.-F.; Turyn, L. Cellular neural networks: Space-dependent template, mosaic patterns and spatial chaos. Internat. J. Bifur. Chaos Appl. Sci. Engrg. 2002, 12, 1717–1730.
  6. Kolmogorov, A.N. A new metric invariant of transitive dynamical systems and automorphisms of Lebesgue spaces. Dokl. Akad. Nauk SSSR 1958, 119, 861–864.
  7. Kolmogorov, A.N. On the entropy per time unit as a metric invariant of automorphisms. Dokl. Akad. Nauk SSSR 1958, 21, 754–755.
  8. Mañé, R. Ergodic Theory and Differentiable Dynamics; Springer-Verlag: Berlin, Germany, 1987.
  9. Parry, W. Intrinsic Markov chains. Trans. Amer. Math. Soc. 1964, 112, 55–66.
  10. Shieh, S.-F.; Wang, Y.Q.; Wei, G.W.; Lai, C.-H. Mathematical analysis of the wavelet method of chaos control. J. Math. Phys. 2006, 47, 082701.
  11. Stewart, G.W.; Sun, J.-G. Matrix Perturbation Theory; Academic Press: Boston, MA, USA, 1990.
  12. Wheeden, R.L.; Zygmund, A. Measure and Integral: An Introduction to Real Analysis; Monographs and Textbooks in Pure and Applied Mathematics; Marcel Dekker: New York, NY, USA, 1977.
