Article

On Asymptotic Equipartition Property for Stationary Process of Moving Averages

Yuanyuan Ren 1 and Zhongzhi Wang 2,*
1 School of Mathematics and Statistics, Xinyang College, Xinyang 464000, China
2 School of Microelectronics and Data Science, Anhui University of Technology, Ma’anshan 243000, China
* Author to whom correspondence should be addressed.
Symmetry 2024, 16(7), 827; https://doi.org/10.3390/sym16070827
Submission received: 26 May 2024 / Revised: 24 June 2024 / Accepted: 27 June 2024 / Published: 1 July 2024
(This article belongs to the Section Mathematics)

Abstract

Let $\{X_n\}_{n\in\mathbb{Z}}$ be a stationary process with values in a finite set. In this paper, we present a moving-average version of the Shannon–McMillan–Breiman theorem, which generalizes the corresponding classical results. A sandwich argument reduces the proof to direct applications of the moving strong law of large numbers. The result generalizes the work of Algoet and Cover while relying on a similar sandwich method. It is worth noting that, in a certain sense, the indices $a_n$ and $\phi(n)$ are symmetric: for any integer $n$, if the growth rate of $(a_n)_{n\in\mathbb{Z}}$ is slow enough, all conclusions in this article still hold.

1. Introduction

Information theory is mainly concerned with stationary random processes $X = \{X_n\}_{n\in\mathbb{Z}}$, where $X_n$ takes values in a set $\mathcal{X}$ of cardinality $|\mathcal{X}| < \infty$. The strong convergence of the entropy at time $n$ of a random process, divided by $n$, to a constant limit, called the entropy rate of the process, is known as the ergodic theorem of information theory or the asymptotic equipartition property (AEP) [1]; it is, in some sense, the statement that
$$\lim_{n\to\infty}\left[-\frac{1}{n}\log p(X_0,\ldots,X_{n-1})\right] = \text{a constant}.$$
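As a quick illustration of this statement (our own numerical sketch, not part of the original development; the pmf and sample size are arbitrary illustrative choices), the sample entropy of an i.i.d. source converges to the entropy $H = -\sum_x p(x)\log_2 p(x)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# An i.i.d. source on {0, 1, 2}; for i.i.d. processes the entropy rate
# equals the marginal entropy H = -sum_x p(x) log2 p(x).
p = np.array([0.5, 0.25, 0.25])
H = -np.sum(p * np.log2(p))               # 1.5 bits

n = 100_000
x = rng.choice(len(p), size=n, p=p)

# -(1/n) log2 p(X_0, ..., X_{n-1}) = -(1/n) * sum_i log2 p(X_i)
sample_entropy = -np.mean(np.log2(p[x]))

print(f"entropy rate H      : {H:.4f} bits")
print(f"sample entropy at n : {sample_entropy:.4f} bits")  # close to H, by the AEP
```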
Its original version, proven in the 1950s for ergodic stationary processes, is known as the Shannon–McMillan theorem for convergence in mean and as the Shannon–McMillan–Breiman theorem [2,3,4] for almost everywhere convergence. Since then, generalized versions of the Shannon–McMillan–Breiman limit theorem have been developed by many authors [1,2,4,5]. Extensions have been made in the direction of weakening the assumptions on the reference measure, the state space, the index set, and the required properties of the process. For the general development, see Girardin [6] and the references therein.
In statistics, to smooth data is to construct an approximating function that captures the important patterns in the data while leaving out noise. One of the most widely used smoothing methods is the moving average (MA), sketched in code below. A number of authors have studied the question of almost everywhere convergence of moving averages for an invertible measure-preserving transformation, e.g., Akcoglu and del Junco [7]; Bellow, Jones, and Rosenblatt [8]; del Junco and Steele [9]; Schwartz [10]; and Haili and Nair [11]. Recently, Wang and Yang [12,13] proposed a new concept of the generalized entropy density and established generalized entropy ergodic theorems for time-nonhomogeneous Markov chains and for non-null stationary processes. Shi, Wang et al. [14] studied the generalized entropy ergodic theorem for nonhomogeneous Markov chains indexed by a binary tree.
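For concreteness, a minimal moving-average smoother (our own sketch; the signal and the window length are arbitrary illustrative choices):

```python
import numpy as np

def moving_average(signal, window):
    """Smooth a 1-D signal by averaging over a sliding window of fixed length."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only the windows that lie fully inside the signal.
    return np.convolve(signal, kernel, mode="valid")

rng = np.random.default_rng(1)
t = np.linspace(0, 4 * np.pi, 400)
noisy = np.sin(t) + 0.3 * rng.standard_normal(t.size)
smooth = moving_average(noisy, window=25)  # keeps the sine trend, damps the noise
```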
Motivated by the work above, in this paper we give a moving-average version of the Shannon–McMillan–Breiman theorem; the results generalize those of [2]. It is worth noting that, in some sense, the indices $a_n$ and $\phi(n)$ are symmetric. In this paper we discuss the so-called forward moving average; if the growth rate of $(a_n)_{n\in\mathbb{Z}}$ with respect to the integer $n$ is slow enough, all conclusions in this article still hold, i.e., the backward moving-average versions remain valid.
The method used in proving the main results is the “sandwich” approximation approach of Algoet and Cover [2], which depends strongly on the moving strong law of large numbers: the sample entropy is asymptotically sandwiched between two quantities whose limits can be determined from the moving SLLN.
This paper is organized as follows. In Section 2, we present the required preliminaries and three lemmas. In Section 3, we state the main results, study some of their properties, and give examples of applications.

2. Preliminaries

Throughout this section, let $(\Omega, \mathcal{F}, P)$ denote a fixed probability space and let $\{X_n\}_{n\in\mathbb{Z}}$ be a stationary sequence taking values in a finite set $\mathcal{X} = \{1, 2, \ldots, b\}$. For the sequence $\{X_n\}_{n\in\mathbb{Z}}$, denote the partial sequence $X_i, \ldots, X_j$ by $X_i^j$ and $x_i, \ldots, x_j$ by $x_i^j$ for $i < j$. Likewise, we write $X_{-\infty}^n$ and $x_{-\infty}^n$ for the sequences $\{X_i\}_{i\le n}$ and $\{x_i\}_{i\le n}$, respectively. Let
$$p(x_i^j) = P(X_i^j = x_i^j)$$
and
$$p(x_j \mid x_i^{j-1}) = P(X_j = x_j \mid X_i^{j-1} = x_i^{j-1})$$
wherever the conditioning event has positive probability. Define the random variables
$$p(X_i^j) \quad \text{and} \quad p(X_j \mid X_i^{j-1})$$
by setting $X_j = X_j(\omega)$ in the corresponding definitions. Since
$$P\left(p(X_i^j) = 0\right) = 0,$$
the conditional probability makes sense $P$-a.e. (i.e., almost everywhere with respect to the measure $P$).
Definition 1
(see, e.g., [2]). The canonical Markov approximation of order m to the probability is defined, for $j > m$, as
$$p^{[m]}(X_i^{i+j-1}) = p(X_i^{i+m-1}) \prod_{k=i+m}^{i+j-1} p(X_k \mid X_{k-m}^{k-1}).$$
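To see how the definition operates, here is a small sketch (ours; the helper callables and the toy memoryless source are illustrative assumptions) that assembles $\log_2 p^{[m]}$ from an initial $m$-block probability and the conditional factors:

```python
import numpy as np

def log_p_markov_approx(x, m, p_block, p_cond):
    """log2 of the order-m canonical Markov approximation
    p^[m](x) = p(x_0^{m-1}) * prod_{k >= m} p(x_k | x_{k-m}^{k-1}).

    p_block: callable giving the probability of the initial m-block;
    p_cond : callable giving p(symbol | length-m context)."""
    logp = np.log2(p_block(tuple(x[:m])))
    for k in range(m, len(x)):
        logp += np.log2(p_cond(x[k], tuple(x[k - m:k])))
    return logp

# Toy check with m = 1 for a memoryless bit source, where p(x_k | x_{k-1}) = p(x_k),
# so p^[1] coincides with the true block probability.
p = {0: 0.7, 1: 0.3}
x = [0, 1, 1, 0, 0, 0, 1]
print(log_p_markov_approx(x, 1, lambda block: p[block[0]],
                          lambda sym, ctx: p[sym]))
```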
We will prove a new version of the AEP for a stationary process $\{X_n\}_{n\in\mathbb{Z}}$. Before developing the main theme of the paper, we shall need to derive some basic lemmas. Let $\{(a_n, \phi(n))\}_{n\in\mathbb{Z}}$ be pairs of positive integers such that $\phi(n)\to\infty$ as $n\to\infty$ and, for every $\varepsilon > 0$, $\sum_{n=1}^{\infty} 2^{-\varepsilon\phi(n)} < \infty$.
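To make the summability condition concrete (a worked example of ours, not from the original text): the choice $\phi(n) = n$ qualifies, since for every $\varepsilon > 0$ the geometric series converges,
$$\sum_{n=1}^{\infty} 2^{-\varepsilon n} = \frac{2^{-\varepsilon}}{1 - 2^{-\varepsilon}} < \infty,$$
whereas $\phi(n) = \lceil \log_2 n \rceil$ does not qualify, because $\sum_{n} 2^{-\varepsilon \lceil \log_2 n \rceil} \ge \sum_{n} 2^{-\varepsilon(\log_2 n + 1)} = 2^{-\varepsilon}\sum_{n} n^{-\varepsilon} = \infty$ whenever $\varepsilon \le 1$.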
Lemma 1.
Let $\{X_n\}_{n\in\mathbb{Z}}$ be a stationary process with values in a finite set $\mathcal{X}$; then, we have
$$\limsup_{n\to\infty} \frac{1}{\phi(n)} \log \frac{p^{[m]}(X_{a_n}^{a_n+\phi(n)-1})}{p(X_{a_n}^{a_n+\phi(n)-1})} \le 0, \quad P\text{-a.e.}$$
and
$$\limsup_{n\to\infty} \frac{1}{\phi(n)} \log \frac{p(X_{a_n}^{a_n+\phi(n)-1})}{p(X_{a_n}^{a_n+\phi(n)-1} \mid X_{-\infty}^{a_n-1})} \le 0, \quad P\text{-a.e.},$$
where the base of the logarithm is taken to be 2.
Proof.
Let $A$ be the support set of $p(X_{a_n}^{a_n+\phi(n)-1})$; then,
$$E_P\left[\frac{p^{[m]}(X_{a_n}^{a_n+\phi(n)-1})}{p(X_{a_n}^{a_n+\phi(n)-1})}\right] = \sum_{x_{a_n}^{a_n+\phi(n)-1} \in A} \frac{p^{[m]}(x_{a_n}^{a_n+\phi(n)-1})}{p(x_{a_n}^{a_n+\phi(n)-1})}\, p(x_{a_n}^{a_n+\phi(n)-1}) = \sum_{x_{a_n}^{a_n+\phi(n)-1} \in A} p^{[m]}(x_{a_n}^{a_n+\phi(n)-1}) = p^{[m]}(A) \le 1,$$
where $E_P$ denotes expectation under the measure $P$.
Similarly, let $B(X_{-\infty}^{a_n-1})$ denote the support set of $p(\cdot \mid X_{-\infty}^{a_n-1})$. Then, we have
$$E_P\left[\frac{p(X_{a_n}^{a_n+\phi(n)-1})}{p(X_{a_n}^{a_n+\phi(n)-1} \mid X_{-\infty}^{a_n-1})}\right] = E_P\left\{E_P\left[\frac{p(X_{a_n}^{a_n+\phi(n)-1})}{p(X_{a_n}^{a_n+\phi(n)-1} \mid X_{-\infty}^{a_n-1})} \,\middle|\, X_{-\infty}^{a_n-1}\right]\right\} = E_P\left[\sum_{x_{a_n}^{a_n+\phi(n)-1} \in B(X_{-\infty}^{a_n-1})} \frac{p(x_{a_n}^{a_n+\phi(n)-1})}{p(x_{a_n}^{a_n+\phi(n)-1} \mid X_{-\infty}^{a_n-1})}\, p(x_{a_n}^{a_n+\phi(n)-1} \mid X_{-\infty}^{a_n-1})\right] = E_P\left[\sum_{x_{a_n}^{a_n+\phi(n)-1} \in B(X_{-\infty}^{a_n-1})} p(x_{a_n}^{a_n+\phi(n)-1})\right] \le 1.$$
By Markov’s inequality and Equation (4), we have, for any ε > 0 ,
P 1 ϕ ( n ) log p [ m ] ( X a n a n + ϕ ( n ) 1 ) p ( X a n a n + ϕ ( n ) 1 ) ε 1 2 ε ϕ ( n )
Noting that n = 1 2 ε ϕ ( n ) < , we see by the Borel–Cantelli lemma that the event
P ω : 1 ϕ ( n ) log p [ m ] ( X a n a n + ϕ ( n ) 1 ) p ( X a n a n + ϕ ( n ) 1 ) ε i . o . = 0
By the arbitrariness of ε , we have
lim sup n 1 ϕ ( n ) log p [ m ] ( X a n a n + ϕ ( n ) 1 ) p ( X a n a n + ϕ ( n ) 1 ) 0 , P a . e .
Applying the same arguments using Markov’s inequality to Equation (3), we obtain
lim sup n 1 ϕ ( n ) log p [ m ] ( X a n a n + ϕ ( n ) 1 ) p ( X a n a n + ϕ ( n ) 1 | X a n 1 ) 0 , P a . e .
This proves the lemma. □
Lemma 2.
(SLLN for MA): For a stationary stochastic process $\{X_n\}_{n\in\mathbb{Z}}$,
$$\lim_{n\to\infty} -\frac{1}{\phi(n)} \log p^{[m]}(X_{a_n}^{a_n+\phi(n)-1}) = H_m, \quad P\text{-a.e.}$$
and
$$\lim_{n\to\infty} -\frac{1}{\phi(n)} \log p(X_{a_n}^{a_n+\phi(n)-1} \mid X_{-\infty}^{a_n-1}) = H_\infty, \quad P\text{-a.e.},$$
where $H_m = E_P\{-\log p(X_0 \mid X_{-m}^{-1})\}$ and $H_\infty = E_P\{-\log p(X_0 \mid X_{-\infty}^{-1})\}$.
Proof.
It is not difficult to verify that $E_P\left[p(X_{a_n}^{a_n+m-1})^{-1}\right] \le b^m < \infty$, since the support of $X_{a_n}^{a_n+m-1}$ contains at most $b^m$ points. An argument similar to the one used in Lemma 1 then shows that
$$\lim_{n\to\infty} \frac{1}{\phi(n)} \log p(X_{a_n}^{a_n+m-1}) = 0, \quad P\text{-a.e.}$$
Let $s \in [-\frac{1}{2}, \frac{1}{2}] \setminus \{0\}$, and define
$$\Lambda_{a_n,\phi(n)}(s,\omega) = \frac{2^{-s\sum_{k=a_n+m}^{a_n+\phi(n)-1} \log p(X_k \mid X_{k-m}^{k-1})}}{\prod_{k=a_n+m}^{a_n+\phi(n)-1} E_{P^{[m]}}\!\left[2^{-s\log p(X_k \mid X_{k-m}^{k-1})} \,\middle|\, X_{k-m}^{k-1}\right]}, \quad n \in \mathbb{Z}.$$
Since, by the tower property and the Markov property under $P^{[m]}$ (the conditional expectation of the last factor given $X_{a_n}^{a_n+\phi(n)-2}$ depends only on the last $m$ coordinates and cancels the corresponding denominator),
$$\begin{aligned}
E_{P^{[m]}}\,\Lambda_{a_n,\phi(n)}(s,\omega) &= E_{P^{[m]}}\!\left[E_{P^{[m]}}\!\left(\Lambda_{a_n,\phi(n)}(s,\omega) \,\middle|\, X_{a_n}^{a_n+\phi(n)-2}\right)\right]\\
&= E_{P^{[m]}}\!\left[\Lambda_{a_n,\phi(n)-1}(s,\omega)\cdot \frac{E_{P^{[m]}}\!\left(2^{-s\log p(X_{a_n+\phi(n)-1} \mid X_{a_n+\phi(n)-m-1}^{a_n+\phi(n)-2})} \,\middle|\, X_{a_n}^{a_n+\phi(n)-2}\right)}{E_{P^{[m]}}\!\left(2^{-s\log p(X_{a_n+\phi(n)-1} \mid X_{a_n+\phi(n)-m-1}^{a_n+\phi(n)-2})} \,\middle|\, X_{a_n+\phi(n)-m-1}^{a_n+\phi(n)-2}\right)}\right]\\
&= E_{P^{[m]}}\,\Lambda_{a_n,\phi(n)-1}(s,\omega) \qquad (\text{by the Markov property})\\
&= \cdots = 1.
\end{aligned}$$
It is straightforward to show, as in the proof of Lemma 1, that
$$\limsup_{n\to\infty} \frac{1}{\phi(n)} \log \Lambda_{a_n,\phi(n)}(s,\omega) \le 0, \quad P^{[m]}\text{-a.e.}$$
Note that
$$\frac{1}{\phi(n)} \log \Lambda_{a_n,\phi(n)}(s,\omega) = \frac{1}{\phi(n)} \sum_{k=a_n+m}^{a_n+\phi(n)-1} \left[-s \log p(X_k \mid X_{k-m}^{k-1}) - \log E_{P^{[m]}}\!\left(2^{-s\log p(X_k \mid X_{k-m}^{k-1})} \,\middle|\, X_{k-m}^{k-1}\right)\right].$$
Combining the two preceding displays with the properties of the limit superior, we have
$$\limsup_{n\to\infty} \frac{1}{\phi(n)} \sum_{k=a_n+m}^{a_n+\phi(n)-1} \left[-s \log p(X_k \mid X_{k-m}^{k-1})\right] \le \limsup_{n\to\infty} \frac{1}{\phi(n)} \sum_{k=a_n+m}^{a_n+\phi(n)-1} \log E_{P^{[m]}}\!\left(2^{-s\log p(X_k \mid X_{k-m}^{k-1})} \,\middle|\, X_{k-m}^{k-1}\right), \quad P^{[m]}\text{-a.e.}$$
Taking $s \in (0, \frac{1}{2}]$ and dividing both sides by $s$, we obtain
$$\limsup_{n\to\infty} \frac{1}{\phi(n)} \sum_{k=a_n+m}^{a_n+\phi(n)-1} \left[-\log p(X_k \mid X_{k-m}^{k-1})\right] = \limsup_{n\to\infty} \frac{1}{\phi(n)} \left[-\log \prod_{k=a_n+m}^{a_n+\phi(n)-1} p(X_k \mid X_{k-m}^{k-1})\right] \le \limsup_{n\to\infty} \frac{1}{\phi(n)} \sum_{k=a_n+m}^{a_n+\phi(n)-1} \frac{1}{s} \log E_{P^{[m]}}\!\left(2^{-s\log p(X_k \mid X_{k-m}^{k-1})} \,\middle|\, X_{k-m}^{k-1}\right), \quad P^{[m]}\text{-a.e.}$$
Using the inequalities $\log x \le \frac{x-1}{\ln 2}$ ($x > 0$) and $0 \le 2^{x} - 1 - x\ln 2 \le \frac{1}{2}(x\ln 2)^2 e^{|x\ln 2|}$ ($x \in \mathbb{R}$), it follows from the last display that
$$\begin{aligned}
&\limsup_{n\to\infty} \frac{1}{\phi(n)} \sum_{k=a_n+m}^{a_n+\phi(n)-1} \frac{1}{s} \log E_{P^{[m]}}\!\left(2^{-s\log p(X_k \mid X_{k-m}^{k-1})} \,\middle|\, X_{k-m}^{k-1}\right)\\
&\quad\le E_{P^{[m]}}\!\left[-\log p(X_0 \mid X_{-m}^{-1})\right] + \limsup_{n\to\infty} \frac{1}{\phi(n)} \sum_{k=a_n+m}^{a_n+\phi(n)-1} E_{P^{[m]}}\!\left[\frac{2^{-s\log p(X_k \mid X_{k-m}^{k-1})} - 1}{s\ln 2} + \log p(X_k \mid X_{k-m}^{k-1}) \,\middle|\, X_{k-m}^{k-1}\right]\\
&\quad= H_m + \limsup_{n\to\infty} \frac{1}{\phi(n)} \sum_{k=a_n+m}^{a_n+\phi(n)-1} E_{P^{[m]}}\!\left[\frac{2^{-s\log p(X_k \mid X_{k-m}^{k-1})} - 1 + s\ln 2\,\log p(X_k \mid X_{k-m}^{k-1})}{s\ln 2} \,\middle|\, X_{k-m}^{k-1}\right]\\
&\quad\le H_m + \frac{s}{2\ln 2}\limsup_{n\to\infty} \frac{1}{\phi(n)} \sum_{k=a_n+m}^{a_n+\phi(n)-1} E_{P^{[m]}}\!\left[\ln^2 p(X_k \mid X_{k-m}^{k-1})\, e^{s\left|\ln p(X_k \mid X_{k-m}^{k-1})\right|} \,\middle|\, X_{k-m}^{k-1}\right]\\
&\quad\le H_m + \frac{s}{2\ln 2}\limsup_{n\to\infty} \frac{1}{\phi(n)} \sum_{k=a_n+m}^{a_n+\phi(n)-1} E_{P^{[m]}}\!\left[\ln^2 p(X_k \mid X_{k-m}^{k-1})\, p^{-\frac{1}{2}}(X_k \mid X_{k-m}^{k-1}) \,\middle|\, X_{k-m}^{k-1}\right], \quad P^{[m]}\text{-a.e.}
\end{aligned}$$
By the fact that $\max\{t^{\frac{1}{2}} \ln^2 t : 0 \le t \le 1\} = 16e^{-2}$, we have
$$E_{P^{[m]}}\!\left[\ln^2 p(X_k \mid X_{k-m}^{k-1})\, p^{-\frac{1}{2}}(X_k \mid X_{k-m}^{k-1}) \,\middle|\, X_{k-m}^{k-1}\right] = \sum_{j=1}^{b} \ln^2 p(j \mid X_{k-m}^{k-1})\, p^{-\frac{1}{2}}(j \mid X_{k-m}^{k-1})\cdot p(j \mid X_{k-m}^{k-1}) = \sum_{j=1}^{b} \ln^2 p(j \mid X_{k-m}^{k-1})\, p^{\frac{1}{2}}(j \mid X_{k-m}^{k-1}) \le 16be^{-2}.$$
Combining the three preceding displays, we have
$$\limsup_{n\to\infty} \frac{1}{\phi(n)} \left[-\log \prod_{k=a_n+m}^{a_n+\phi(n)-1} p(X_k \mid X_{k-m}^{k-1})\right] \le H_m + 8(\log e)\, s\, b\, e^{-2}, \quad P^{[m]}\text{-a.e.}$$
Letting $s \downarrow 0$, we obtain
$$\limsup_{n\to\infty} \frac{1}{\phi(n)} \left[-\log \prod_{k=a_n+m}^{a_n+\phi(n)-1} p(X_k \mid X_{k-m}^{k-1})\right] \le H_m, \quad P^{[m]}\text{-a.e.}$$
Replacing $s \in (0, \frac{1}{2}]$ by $s \in [-\frac{1}{2}, 0)$ in the above argument, we can obtain
$$\liminf_{n\to\infty} \frac{1}{\phi(n)} \left[-\log \prod_{k=a_n+m}^{a_n+\phi(n)-1} p(X_k \mid X_{k-m}^{k-1})\right] \ge H_m, \quad P^{[m]}\text{-a.e.}$$
These imply that
$$\lim_{n\to\infty} \frac{1}{\phi(n)} \left[-\log \prod_{k=a_n+m}^{a_n+\phi(n)-1} p(X_k \mid X_{k-m}^{k-1})\right] = H_m, \quad P^{[m]}\text{-a.e.}$$
Note that $P \ll P^{[m]}$; therefore, the preceding limit also holds under $P$:
$$\lim_{n\to\infty} \frac{1}{\phi(n)} \left[-\log \prod_{k=a_n+m}^{a_n+\phi(n)-1} p(X_k \mid X_{k-m}^{k-1})\right] = H_m, \quad P\text{-a.e.}$$
Since $p^{[m]}(X_{a_n}^{a_n+\phi(n)-1}) = p(X_{a_n}^{a_n+m-1}) \prod_{k=a_n+m}^{a_n+\phi(n)-1} p(X_k \mid X_{k-m}^{k-1})$, the first assertion of the lemma follows immediately from this together with the fact that $\frac{1}{\phi(n)}\log p(X_{a_n}^{a_n+m-1}) \to 0$, $P$-a.e.
Similarly, let $s$ be a nonzero real number, and define
$$\Delta_{a_n,\phi(n)}(s,\omega) = \frac{2^{-s\sum_{k=a_n}^{a_n+\phi(n)-1} \log p(X_k \mid X_{-\infty}^{k-1})}}{\left[E_P\!\left(2^{-s\log p(X_0 \mid X_{-\infty}^{-1})} \,\middle|\, X_{-\infty}^{-1}\right)\right]^{\phi(n)}}, \quad n \in \mathbb{Z}.$$
Note that
$$\begin{aligned}
E_P\,\Delta_{a_n,\phi(n)}(s,\omega) &= E_P\!\left[E_P\!\left(\Delta_{a_n,\phi(n)}(s,\omega) \,\middle|\, X_{-\infty}^{a_n+\phi(n)-2}\right)\right]\\
&= E_P\!\left[\Delta_{a_n,\phi(n)-1}(s,\omega)\cdot \frac{E_P\!\left(2^{-s\log p(X_{a_n+\phi(n)-1} \mid X_{-\infty}^{a_n+\phi(n)-2})} \,\middle|\, X_{-\infty}^{a_n+\phi(n)-2}\right)}{E_P\!\left(2^{-s\log p(X_0 \mid X_{-\infty}^{-1})} \,\middle|\, X_{-\infty}^{-1}\right)}\right]\\
&= E_P\,\Delta_{a_n,\phi(n)-1}(s,\omega) \qquad (\text{by stationarity})\\
&= \cdots = 1.
\end{aligned}$$
The remainder of the argument is analogous to the proof of the first assertion and is left to the reader. □
Lemma 3.
(No gap): $H_m \searrow H_\infty$ and $H_\infty = H$.
Proof.
We know that for stationary processes $H_m \searrow H$, so it remains to show that $H_m \to H_\infty$.
Let $Z_0 = -\log p(X_0)$ and $Z_n = -\log p(X_0 \mid X_{-n}^{-1})$ for $n \ge 1$. Since $E(Z_n) = H(X_0 \mid X_{-n}^{-1}) \le H(X_0) < \infty$, each $Z_n$ is integrable. Now, since all the random variables are discrete, we may write
$$\begin{aligned}
E_P\!\left(Z_{n+1} \,\middle|\, X_{-n}^{0} = x_{-n}^{0}\right) &= \sum_{x_{-(n+1)}} p\!\left(x_{-(n+1)} \mid x_{-n}^{0}\right)\left[-\log p\!\left(x_0 \mid x_{-(n+1)}^{-1}\right)\right]\\
&= \sum_{x_{-(n+1)}} \frac{p(x_{-(n+1)}^{0})}{p(x_{-n}^{0})} \log \frac{p(x_{-(n+1)}^{-1})}{p(x_{-(n+1)}^{0})}\\
&\le \log \sum_{x_{-(n+1)}} \frac{p(x_{-(n+1)}^{-1})}{p(x_{-n}^{0})} \qquad (\text{Jensen's inequality})\\
&= \log \frac{p(x_{-n}^{-1})}{p(x_{-n}^{0})} = -\log p(x_0 \mid x_{-n}^{-1}).
\end{aligned}$$
Therefore, $E_P(Z_{n+1} \mid X_{-n}^{0}) \le Z_n$, and $Z_n$ is measurable with respect to the $\sigma$-field $\sigma(X_0, X_{-1}, \ldots, X_{-n})$; thus $\{Z_n\}_{n\ge1}$ is a non-negative supermartingale and hence converges a.e. to an integrable limit.
Note that, for any $m$,
$$H_m = E_P\{-\log p(X_{m+a_n} \mid X_{a_n}^{a_n+m-1})\} = E_P\{-\log p(X_0 \mid X_{-m}^{-1})\},$$
where the last equality follows from stationarity.
Since $\mathcal{X}$ is finite, $-p\log p$ is bounded and continuous in $p$ on $[0,1]$, and $p(x_0 \mid X_{-m}^{-1}) \to p(x_0 \mid X_{-\infty}^{-1})$ a.e. by Lévy's martingale convergence theorem, the bounded convergence theorem allows the interchange of expectation and limit, yielding
$$\lim_{m\to\infty} H_m = \lim_{m\to\infty} E_P\left\{\sum_{x_0\in\mathcal{X}} -p(x_0 \mid X_{-m}^{-1}) \log p(x_0 \mid X_{-m}^{-1})\right\} = E_P\left\{\sum_{x_0\in\mathcal{X}} -p(x_0 \mid X_{-\infty}^{-1}) \log p(x_0 \mid X_{-\infty}^{-1})\right\} = H_\infty.$$
Thus, $H_m \to H_\infty = H$. □

3. Main Results

With the preliminaries accounted for, one might wish to conclude directly that
$$-\frac{1}{\phi(n)} \log p(X_{a_n}^{a_n+\phi(n)-1}) = -\frac{1}{\phi(n)} \sum_{i=0}^{\phi(n)-1} \log p(X_{a_n+i} \mid X_{a_n}^{a_n+i-1}) \to \lim_{n\to\infty} E_P\left[-\log p(X_n \mid X_0^{n-1})\right].$$
This convergence is not easy to prove directly. However, the closely related conditional probabilities $p(X_{a_n+\phi(n)} \mid X_{a_n+\phi(n)-m}^{a_n+\phi(n)-1})$ and $p(X_{a_n+\phi(n)} \mid X_{-\infty}^{a_n+\phi(n)-1})$ are easily identified with entropy rates.
Recall that the entropy rate is given by
$$H = \lim_{m\to\infty} H_m = \lim_{n\to\infty} \frac{1}{n} \sum_{m=0}^{n-1} H_m.$$
Of course, $H_m \searrow H$ by stationarity and the fact that conditioning does not increase entropy. It will be crucial that $H_m \searrow H_\infty = H$.
With the help of the preceding lemmas, we can now prove the following theorem:
Theorem 1.
(AEP): If $H$ is the entropy rate of a finite-valued stationary process $\{X_n\}_{n\in\mathbb{Z}}$, then
$$\lim_{n\to\infty} -\frac{1}{\phi(n)} \log p(X_{a_n}^{a_n+\phi(n)-1}) = H, \quad P\text{-a.e.}$$
Remark 1.
In the case $a_n \equiv 1$ and $\phi(n) = n$, Theorem 1 reduces to the famous Shannon–McMillan–Breiman theorem, the fundamental theorem of information theory. Letting $a_n = n$ gives a delayed-average version of the AEP.
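As a numerical illustration of Theorem 1 (our own sketch; the two-state chain and the delayed-average choice $a_n = n$, $\phi(n) = n$ of Remark 1 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# A stationary two-state Markov source.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])
pi = np.array([0.8, 0.2])                    # stationary law: pi @ P == pi
H = -np.sum(pi[:, None] * P * np.log2(P))    # entropy rate in bits

def sample_chain(length):
    x = np.empty(length, dtype=int)
    x[0] = rng.choice(2, p=pi)
    for t in range(1, length):
        x[t] = rng.choice(2, p=P[x[t - 1]])
    return x

n = 2_000
a, phi = n, n                                # a_n = n, phi(n) = n (delayed average)
x = sample_chain(a + phi)
block = x[a:a + phi]

# -(1/phi) log2 p(X_a^{a+phi-1}); the block law starts from pi by stationarity.
logp = np.log2(pi[block[0]]) + np.sum(np.log2(P[block[:-1], block[1:]]))
print(f"H = {H:.4f}   -(1/phi) log p = {-logp / phi:.4f}")   # nearly equal
```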
Proof.
We argue that the sequence of random variables $-\frac{1}{\phi(n)}\log p(X_{a_n}^{a_n+\phi(n)-1})$ is asymptotically sandwiched between the upper bound $H_m$ and the lower bound $H_\infty$ for all $m \ge 0$. The AEP will then follow, since $H_m \searrow H$ and $H_\infty = H$.
From Lemma 1, we have
$$\limsup_{n\to\infty} \frac{1}{\phi(n)} \log \frac{p^{[m]}(X_{a_n}^{a_n+\phi(n)-1})}{p(X_{a_n}^{a_n+\phi(n)-1})} \le 0, \quad P\text{-a.e.},$$
which we rewrite, taking into account the existence of $\lim_{n\to\infty} -\frac{1}{\phi(n)} \log p^{[m]}(X_{a_n}^{a_n+\phi(n)-1})$ (Lemma 2), as
$$\limsup_{n\to\infty} \frac{1}{\phi(n)} \log \frac{1}{p(X_{a_n}^{a_n+\phi(n)-1})} \le \lim_{n\to\infty} \frac{1}{\phi(n)} \log \frac{1}{p^{[m]}(X_{a_n}^{a_n+\phi(n)-1})} = H_m$$
for $m = 1, 2, \ldots$. Also, from Lemma 1, we have
$$\limsup_{n\to\infty} \frac{1}{\phi(n)} \log \frac{p(X_{a_n}^{a_n+\phi(n)-1})}{p(X_{a_n}^{a_n+\phi(n)-1} \mid X_{-\infty}^{a_n-1})} \le 0, \quad P\text{-a.e.},$$
which we rewrite as
$$\liminf_{n\to\infty} \frac{1}{\phi(n)} \log \frac{1}{p(X_{a_n}^{a_n+\phi(n)-1})} \ge \lim_{n\to\infty} \frac{1}{\phi(n)} \log \frac{1}{p(X_{a_n}^{a_n+\phi(n)-1} \mid X_{-\infty}^{a_n-1})} = H_\infty.$$
Putting the two bounds together, and recalling the definition of $H_\infty$ in Lemma 2, we have
$$H_\infty \le \liminf_{n\to\infty} -\frac{1}{\phi(n)} \log p(X_{a_n}^{a_n+\phi(n)-1}) \le \limsup_{n\to\infty} -\frac{1}{\phi(n)} \log p(X_{a_n}^{a_n+\phi(n)-1}) \le H_m$$
for all $m$. But, by Lemma 3, $H_m \searrow H_\infty = H$. Consequently,
$$\lim_{n\to\infty} -\frac{1}{\phi(n)} \log p(X_{a_n}^{a_n+\phi(n)-1}) = H, \quad P\text{-a.e.} \qquad \Box$$
We now give some interesting applications of the main results in the following examples.
Example 1.
Let $\{X_n\}_{n\in\mathbb{Z}}$ be independent, identically distributed random variables drawn according to the probability mass function $p(x)$; then,
$$\lim_{n\to\infty} -\frac{1}{\phi(n)} \log p(X_{a_n}^{a_n+\phi(n)-1}) = H(X), \quad \text{a.e.}$$
Example 2.
Let
$$X = \begin{cases} 1 & \text{with probability } \frac{1}{2}, \\ 2 & \text{with probability } \frac{1}{4}, \\ 3 & \text{with probability } \frac{1}{4}, \end{cases}$$
and let $\{X_n\}_{n\in\mathbb{Z}}$ be drawn i.i.d. according to this distribution. Since $H(X) = \frac{1}{2}\log 2 + \frac{1}{4}\log 4 + \frac{1}{4}\log 4 = \frac{1}{4}\log 64 = \frac{3}{2}$ bits, we have
$$\lim_{n\to\infty} p(X_{a_n}^{a_n+\phi(n)-1})^{\frac{1}{\phi(n)}} = 2^{-\frac{1}{4}\log 64} = 2^{-\frac{3}{2}}, \quad \text{a.e.}$$
Example 3.
Let $\{X_n\}_{n\in\mathbb{Z}}$ be independent, identically distributed random variables drawn according to the probability mass function $p(x)$, $x \in \mathcal{X}$. Thus, $p(x_{a_n}^{a_n+\phi(n)-1}) = \prod_{i=a_n}^{a_n+\phi(n)-1} p(x_i)$. Let $q(x_{a_n}^{a_n+\phi(n)-1}) = \prod_{i=a_n}^{a_n+\phi(n)-1} q(x_i)$, where $q$ is another probability mass function on $\mathcal{X}$; then,
$$\lim_{n\to\infty} -\frac{1}{\phi(n)} \log \frac{q(X_{a_n}^{a_n+\phi(n)-1})}{p(X_{a_n}^{a_n+\phi(n)-1})} = D(p\|q), \quad \text{a.e.},$$
where $D(p\|q)$ is the informational divergence between the two probability distributions $p$ and $q$ on the common alphabet $\mathcal{X}$.
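A quick empirical check of Example 3 (our own sketch; the pmfs $p$ and $q$ and the moving-block indices are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)

p = np.array([0.5, 0.25, 0.25])     # true source pmf
q = np.array([1/3, 1/3, 1/3])       # competing pmf on the same alphabet
D = np.sum(p * np.log2(p / q))      # informational divergence D(p||q)

n = 50_000
a, phi = n, n                       # illustrative moving-block choice
x = rng.choice(3, size=a + phi, p=p)
block = x[a:a + phi]

# -(1/phi) log2 [ q(block) / p(block) ]  ->  D(p||q)  a.e., by Example 3.
estimate = -np.mean(np.log2(q[block] / p[block]))
print(f"D(p||q) = {D:.4f}   estimate = {estimate:.4f}")
```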
Since convergence almost everywhere implies convergence in probability, Theorem 1 has the following implication:
Definition 2.
The typical set $A_\varepsilon^{(a_n,\phi(n))}$ with respect to $P$ is the set of sequences $x_{a_n}^{a_n+\phi(n)-1} \in \mathcal{X}^{\phi(n)}$ with the property
$$2^{-\phi(n)(H+\varepsilon)} \le p(x_{a_n}^{a_n+\phi(n)-1}) \le 2^{-\phi(n)(H-\varepsilon)}.$$
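For intuition, a brute-force enumeration of the typical set (our own sketch, feasible only for a tiny window $\phi(n)$; the source and parameters are illustrative) that also checks the cardinality and probability bounds established in Proposition 1 below:

```python
import itertools
import numpy as np

p = np.array([0.5, 0.25, 0.25])
H = -np.sum(p * np.log2(p))            # 1.5 bits
phi, eps = 10, 0.2                     # tiny window, so 3**10 sequences

typical = [
    seq for seq in itertools.product(range(len(p)), repeat=phi)
    if 2 ** (-phi * (H + eps))
       <= np.prod(p[list(seq)])
       <= 2 ** (-phi * (H - eps))
]

prob = sum(np.prod(p[list(seq)]) for seq in typical)
print(len(typical) <= 2 ** (phi * (H + eps)))   # size bound of Proposition 1
print(prob)                                     # tends to 1 as phi grows
```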
As a consequence of Theorem 1, we can show that the set $A_\varepsilon^{(a_n,\phi(n))}$ has the following properties:
Proposition 1.
Let $\{X_n\}_{n\in\mathbb{Z}}$ be independent, identically distributed random variables drawn according to the probability mass function $p(x)$; then:
(1). If $x_{a_n}^{a_n+\phi(n)-1} \in A_\varepsilon^{(a_n,\phi(n))}$, then
$$H - \varepsilon \le -\frac{1}{\phi(n)} \log p(x_{a_n}^{a_n+\phi(n)-1}) \le H + \varepsilon.$$
(2). $P(A_\varepsilon^{(a_n,\phi(n))}) > 1 - \varepsilon$ for sufficiently large $n$.
(3). $|A_\varepsilon^{(a_n,\phi(n))}| \le 2^{\phi(n)(H+\varepsilon)}$, where $|A|$ denotes the number of elements in the set $A$.
(4). $|A_\varepsilon^{(a_n,\phi(n))}| \ge (1-\varepsilon)\, 2^{\phi(n)(H-\varepsilon)}$ for sufficiently large $n$.
Proof.
Property (1) is immediate from the definition of $A_\varepsilon^{(a_n,\phi(n))}$.
Property (2) follows directly from Theorem 1, since the probability of the event $\{X_{a_n}^{a_n+\phi(n)-1} \in A_\varepsilon^{(a_n,\phi(n))}\}$ tends to 1 as $n \to \infty$. Thus, for any $\delta > 0$, there exists an $n_0$ such that, for all $n \ge n_0$,
$$P\left\{\left|-\frac{1}{\phi(n)} \log p(X_{a_n}^{a_n+\phi(n)-1}) - H(X)\right| < \varepsilon\right\} > 1 - \delta.$$
Setting $\delta = \varepsilon$ yields property (2).
To prove property (3), note that
$$1 = \sum_{x_{a_n}^{a_n+\phi(n)-1} \in \mathcal{X}^{\phi(n)}} p(x_{a_n}^{a_n+\phi(n)-1}) \ge \sum_{x_{a_n}^{a_n+\phi(n)-1} \in A_\varepsilon^{(a_n,\phi(n))}} p(x_{a_n}^{a_n+\phi(n)-1}) \ge \sum_{x_{a_n}^{a_n+\phi(n)-1} \in A_\varepsilon^{(a_n,\phi(n))}} 2^{-\phi(n)(H(X)+\varepsilon)} = 2^{-\phi(n)(H(X)+\varepsilon)}\left|A_\varepsilon^{(a_n,\phi(n))}\right|,$$
where the second inequality follows from the definition of the typical set; hence,
$$\left|A_\varepsilon^{(a_n,\phi(n))}\right| \le 2^{\phi(n)(H+\varepsilon)}.$$
Finally, for sufficiently large $n$, $P(A_\varepsilon^{(a_n,\phi(n))}) > 1 - \varepsilon$, so that
$$1 - \varepsilon < P(A_\varepsilon^{(a_n,\phi(n))}) \le \sum_{x_{a_n}^{a_n+\phi(n)-1} \in A_\varepsilon^{(a_n,\phi(n))}} 2^{-\phi(n)(H(X)-\varepsilon)} = 2^{-\phi(n)(H(X)-\varepsilon)}\left|A_\varepsilon^{(a_n,\phi(n))}\right|,$$
where the second inequality follows from Definition 2. Therefore,
$$\left|A_\varepsilon^{(a_n,\phi(n))}\right| \ge (1-\varepsilon)\, 2^{\phi(n)(H-\varepsilon)}.$$
This completes the proof of the proposition. □
Example 4.
Let $\{X_n\}_{n\in\mathbb{Z}}$ be i.i.d. $\sim p(x)$, $x \in \mathcal{X}$, and let $H = -\sum_x p(x)\log p(x)$. Let
$$A^{(a_n,\phi(n))} = \left\{x_{a_n}^{a_n+\phi(n)-1} \in \mathcal{X}^{\phi(n)} : \left|-\frac{1}{\phi(n)}\log p(x_{a_n}^{a_n+\phi(n)-1}) - H\right| \le \epsilon\right\}$$
and
$$B^{(a_n,\phi(n))} = \left\{x_{a_n}^{a_n+\phi(n)-1} \in \mathcal{X}^{\phi(n)} : \left|\frac{1}{\phi(n)}\sum_{i=a_n}^{a_n+\phi(n)-1} x_i - EX_1\right| \le \epsilon\right\}.$$
Then we have the following:
(1) $\lim_{n\to\infty} P\{X_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))}\} = 1$;
(2) $\lim_{n\to\infty} P\{X_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}\} = 1$;
(3) $|A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}| \le 2^{\phi(n)(H+\epsilon)}$ for all $n$;
(4) $|A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}| \ge \frac{1}{2}\, 2^{\phi(n)(H-\epsilon)}$ for sufficiently large $n$.
Proof.
(1) By Theorem 1, the probability that $X_{a_n}^{a_n+\phi(n)-1}$ is typical tends to 1.
(2) By the strong law of large numbers for moving averages, $P(X_{a_n}^{a_n+\phi(n)-1} \in B^{(a_n,\phi(n))}) \to 1$. Hence, for any $\epsilon > 0$, there exists $N_1$ such that $P(X_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))}) > 1 - \frac{\epsilon}{2}$ for all $n > N_1$, and there exists $N_2$ such that $P(X_{a_n}^{a_n+\phi(n)-1} \in B^{(a_n,\phi(n))}) > 1 - \frac{\epsilon}{2}$ for all $n > N_2$. So, for all $n > \max(N_1, N_2)$,
$$P\left(X_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}\right) = P\left(X_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))}\right) + P\left(X_{a_n}^{a_n+\phi(n)-1} \in B^{(a_n,\phi(n))}\right) - P\left(X_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))} \cup B^{(a_n,\phi(n))}\right) > 1 - \frac{\epsilon}{2} + 1 - \frac{\epsilon}{2} - 1 = 1 - \epsilon.$$
So for any $\epsilon > 0$ there exists $N = \max(N_1, N_2)$ such that $P(X_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}) > 1 - \epsilon$ for all $n > N$; therefore, $P(X_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}) \to 1$.
(3) By the law of total probability, $\sum_{x_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}} p(x_{a_n}^{a_n+\phi(n)-1}) \le 1$. Also, for $x_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))}$, the definition of $A^{(a_n,\phi(n))}$ gives $p(x_{a_n}^{a_n+\phi(n)-1}) \ge 2^{-\phi(n)(H+\epsilon)}$. Combining these two facts gives
$$1 \ge \sum_{x_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}} p(x_{a_n}^{a_n+\phi(n)-1}) \ge \sum_{x_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}} 2^{-\phi(n)(H+\epsilon)} = \left|A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}\right| 2^{-\phi(n)(H+\epsilon)}.$$
Multiplying through by $2^{\phi(n)(H+\epsilon)}$ gives the result $|A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}| \le 2^{\phi(n)(H+\epsilon)}$.
(4) Since, from (2), $P\{X_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}\} \to 1$, there exists $N$ such that $P\{X_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}\} \ge \frac{1}{2}$ for all $n > N$. By the definition of $A^{(a_n,\phi(n))}$, for $x_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))}$ we have $p(x_{a_n}^{a_n+\phi(n)-1}) \le 2^{-\phi(n)(H-\epsilon)}$. Combining these two facts gives
$$\frac{1}{2} \le \sum_{x_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}} p(x_{a_n}^{a_n+\phi(n)-1}) \le \sum_{x_{a_n}^{a_n+\phi(n)-1} \in A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}} 2^{-\phi(n)(H-\epsilon)} = \left|A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}\right| 2^{-\phi(n)(H-\epsilon)}.$$
Multiplying through by $2^{\phi(n)(H-\epsilon)}$ gives the result $|A^{(a_n,\phi(n))} \cap B^{(a_n,\phi(n))}| \ge \frac{1}{2}\, 2^{\phi(n)(H-\epsilon)}$ for sufficiently large $n$. □

Author Contributions

Writing—original draft, Y.R. and Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the NSF of Anhui University, China (No. KJ2021A0386).

Data Availability Statement

No new data were created or analyzed in this study.

Acknowledgments

It is a pleasure to acknowledge our debt to Weicai Peng, who suggested the problem addressed herein. We are grateful to the three anonymous referees and the editor for their useful comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2005. [Google Scholar]
  2. Algoet, P.H.; Cover, T.M. A sandwich proof of the Shannon-McMillan-Breiman theorem. Ann. Probab. 1988, 16, 899–909. [Google Scholar] [CrossRef]
  3. Breiman, L. The Individual Ergodic Theorem of Information Theory. Ann. Math. Stat. 1957, 28, 809–811. [Google Scholar] [CrossRef]
  4. McMillan, B. The basic theorems of information. Ann. Math. Stat. 1953, 24, 196–219. [Google Scholar] [CrossRef]
  5. Neshveyev, S.; Størmer, E. The McMillan Theorem for a Class of Asymptotically Abelian C*-Algebras. Ergod. Theory Dyn. Syst. 2002, 22, 889–897. [Google Scholar] [CrossRef]
  6. Girardin, V. On the different extensions of the ergodic theorem of information theory. In Recent Advances in Applied Probability; Springer Science+Business Media: Berlin/Heidelberg, Germany, 2005; pp. 163–179. [Google Scholar]
  7. Akcoglu, M.A.; del Junco, A. Convergence of averages of point transformations. Proc. Am. Math. Soc. 1975, 49, 265–266. [Google Scholar] [CrossRef]
  8. Bellow, A.; Jones, R.; Rosenblatt, J.M. Convergence for moving averages. Ergod. Theory Dyn. Syst. 1990, 10, 43–62. [Google Scholar] [CrossRef]
  9. del Junco, A.; Steele, J.M. Moving averages of ergodic processes. Metrika 1977, 24, 35–43. [Google Scholar] [CrossRef]
  10. Schwartz, M. Polynomially moving ergodic averages. Proc. Am. Math. Soc. 1988, 103, 252–254. [Google Scholar] [CrossRef]
  11. Haili, H.K.; Nair, R. Optimal continued fractions and the moving average ergodic theorem. Period. Math. Hung. 2013, 66, 95–103. [Google Scholar] [CrossRef]
  12. Wang, Z.Z.; Yang, W.G. The generalized entropy ergodicity theorem for nonhomogeneous Markov chains. J. Theor. Probab. 2016, 29, 761–775. [Google Scholar] [CrossRef]
  13. Wang, Z.Z.; Yang, W.G. Markov approximation and the generalized entropy ergodic theorem for non-null stationary process. Proc. Indian Acad. Sci. (Math. Sci.) 2020, 130, 13. [Google Scholar] [CrossRef]
  14. Shi, Z.Y.; Wang, Z.Z.; Zhong, P.P. The generalized entropy ergodicity theorem for nonhomogeneous bifurcating Markov chains indexed by a binary tree. J. Theor. Probab. 2022, 35, 1367–1390. [Google Scholar] [CrossRef]

