1. Introduction
Shannon introduced entropy, originally a thermodynamic quantity, into information theory to measure the uncertainty of random phenomena [1]. Shannon's entropy has since been applied successfully in many engineering fields, including vibration-signal feature extraction [2], chaotic image encryption [3], and groundwater quality evaluation [4]. In order to accurately compute the average uncertainty of a stochastic process, the existence and properties of the entropy rate must be established [5,6]. Many approaches have been adopted to improve the theoretical integrity of the entropy rate. One of the most famous results is the Shannon–McMillan–Breiman theorem, also called the entropy ergodic theorem or the asymptotic equipartition property (AEP), which asserts the almost sure (a.s.) convergence of the sample entropy to a constant. Liu and Yang [7] proposed an extension of the Shannon–McMillan–Breiman theorem and some limit properties for nonhomogeneous Markov chains. Yang [8] proved the AEP for a nonhomogeneous Markov information source. Ordentlich et al. [9] used the Blackwell measure to compute the entropy rate. The entropy rate of Hidden Markov Models (HMMs) has been expressed in terms of upper and lower bounds [10]. However, even when the state space of an HMM is finite, a closed-form expression for the entropy rate generally does not exist, mainly because the set of predictive features is generically infinite. To address this problem, Jurgens et al. [11] evolved the mixed state according to an iterated function system and sequentially sampled the entropy of the place-dependent probability distribution at each step; over an arbitrarily long word, the mean of these entropies converges to the entropy rate.
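To make the mixed-state approach of Jurgens et al. [11] concrete, the following minimal sketch estimates the entropy rate of a small homogeneous HMM by evolving the belief (mixed) state along one sampled word and averaging the entropies of the place-dependent predictive distributions. The two-state matrices A and E, the initial belief, and the function name are hypothetical illustrative choices, not taken from [11].

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state HMM (illustrative parameters only).
A = np.array([[0.9, 0.1],
              [0.3, 0.7]])      # A[i, j] = P(X_{n+1} = j | X_n = i)
E = np.array([[0.8, 0.2],
              [0.1, 0.9]])      # E[i, y] = P(Y_n = y | X_n = i)
mu0 = np.array([0.5, 0.5])      # initial distribution of the hidden state

def entropy_rate_mixed_state(n_steps: int = 100_000) -> float:
    """Average the entropies of the predictive distributions P(Y_n | past)
    obtained by evolving the mixed (belief) state along one sampled word."""
    x = rng.choice(2, p=mu0)             # hidden state used to sample the word
    mu = mu0.copy()                      # belief P(X_n = . | observed past)
    total = 0.0
    for _ in range(n_steps):
        pred = mu @ E                    # predictive distribution of Y_n
        total += -np.sum(pred * np.log(pred))   # its entropy (in nats)
        y = rng.choice(2, p=E[x])        # emit an observation from state x
        post = mu * E[:, y]              # Bayes update on the emitted symbol
        mu = (post / post.sum()) @ A     # propagate the belief one step
        x = rng.choice(2, p=A[x])        # move the hidden chain
    return total / n_steps               # sample mean of the step entropies

print(entropy_rate_mixed_state())
```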
The entropy rate is defined as an average of random variables over the entire process. In practice, however, we often encounter time series that appear to be only "locally stationary", so it is natural to average over a window of the recent past. The generalized entropy rate, written in the form of delayed averages, discards redundant initial information while preserving stationarity and therefore has better practical value. Essentially, generalized entropies are nonnegative functions defined on probability distributions that satisfy continuity, maximality, and expansibility. Delayed averages of random variables were first discussed by Zygmund [12]. Using the limiting behavior of delayed averages, Chow [13] proposed necessary and sufficient conditions for the Borel summability of independent identically distributed random variables. Lai [14] studied analogues of the law of the iterated logarithm for delayed averages of independent random variables. On this basis, Gut and Stadtmüller [15] studied the strong law of large numbers for delayed averages of random fields. Wang [16] discussed limit theorems of delayed averages for row-wise conditionally independent stochastic arrays and a class of asymptotic properties of moving averages for Markov chains in Markovian environments. These studies show that the theory of delayed averages rests on a solid foundation and applies naturally to the entropy rate.
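For concreteness, and as a sketch of the standard convention in this literature (the precise window convention stated here is our assumption), the delayed average of a real sequence $\{z_k\}$ over the window $(a_n, a_n+\phi(n)]$ is
$$\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}z_k,$$
where $\{a_n\}$ and $\{\phi(n)\}$ are sequences of non-negative integers with $\phi(n)\to\infty$; taking $a_n\equiv 0$ and $\phi(n)=n$ recovers the classical Cesàro average $\frac{1}{n}\sum_{k=1}^{n}z_k$.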
Combining the generalized entropy rate with nonhomogeneous Markov chains, Wang and Yang [17,18] studied generalized entropy ergodic theorems with a.s. and $\mathcal{L}_1$ convergence for time-nonhomogeneous Markov chains, and they obtained generalized entropy ergodic theorems for non-null stationary processes using Markov approximation. Shi et al. [19] studied the generalized AEP of higher-order nonhomogeneous Markov information sources by establishing several strong deviation theorems. The entropy rate is an important characteristic of HMMs and plays an important role in applications such as communication decoding, compression, and sorting; theoretical research on the entropy rate of HMMs is therefore necessary. The classical HMMs were first introduced by Baum and Petrie, and they have been widely applied in various fields, including speech recognition, facial expression recognition, gene prediction, gesture recognition, musical composition, bioinformatics, and big data ranking [20,21,22,23]. The power of these models lies in the fact that they can be implemented and simulated very efficiently. A homogeneous HMM contains two stochastic processes: the observed process is assumed to be conditionally temporally independent given the hidden process, and the hidden process is assumed to evolve according to a first-order Markov chain. A Nonhomogeneous Hidden Markov Model (NHMM) generalizes this setup by allowing the transition matrices of the hidden states to depend on a set of observed covariates; that is, whereas a homogeneous HMM has a single transition matrix, an NHMM allows the transition matrices to depend on time. From the perspective of model structure, NHMMs are natural extensions of homogeneous HMMs. In the last ten years, new theories on NHMMs have emerged. Yang et al. [24] stated the law of large numbers for countable NHMMs. Zhang et al. [25] studied stability analysis and controller design for a family of nonhomogeneous hidden semi-Markov jump systems with limited information on sojourn-time probability density functions. Shahzadi et al. [26] proposed a class of Nonhomogeneous Hidden Semi-Markov Models for modelling partially observed processes that do not necessarily behave in a stationary and memoryless manner.
Although there have been fruitful achievements in the two fields of generalized entropy ergodic theorems and NHMMs, research on generalized entropy ergodic theorems for NHMMs is limited. Therefore, we consider extending the application scenarios of the generalized entropy rate. Motivated by the above work, the main focus of this paper is to obtain a strong limit theorem for delayed averages of real-valued functions and generalized entropy ergodic theorems with almost sure convergence for NHMMs. These results provide a general approach for the relevant theoretical proofs and concise formulas for the computation and estimation of the generalized entropy rate of NHMMs, and they lay the necessary mathematical foundation for the reliability of model applications. The rest of this paper is organized as follows: in Section 2, a detailed description of NHMMs and related definitions are introduced. Section 3 presents some limit properties that are used in Section 4. In Section 4, the main results and their proofs are given. Section 5 summarizes the main content and discusses the significance of the research.
2. Preliminaries
This section provides preliminaries for the subsequent contents. Following the Introduction, we state basic concepts and properties of NHMMs, point out the relationship between NHMMs and homogeneous HMMs, and on this basis define the generalized entropy, the generalized entropy rate, and other commonly used notation.
Firstly, we give the definition and properties of NHMMs. Let $\{X_n\}_{n\ge0}$ be a nonhomogeneous Markov chain defined on the probability space $(\Omega,\mathcal F,P)$ taking values in a finite state space $\mathcal X=\{1,2,\dots,M\}$, and let $\{Y_n\}_{n\ge0}$ be a stochastic process defined on the same probability space taking values in a finite state space $\mathcal Y=\{1,2,\dots,N\}$. $\{(X_n,Y_n)\}_{n\ge0}$ is called an NHMM if and only if it meets the following forms and conditions:

(i) The initial distribution of the nonhomogeneous Markov chain $\{X_n\}_{n\ge0}$ is
$$\mu=(\mu(1),\dots,\mu(M)),\qquad \mu(i)=P(X_0=i),\ i\in\mathcal X,$$
and the transition matrices are
$$P_k=(p_k(i,j))_{M\times M},\qquad k\ge1,$$
where $p_k(i,j)=P(X_k=j\mid X_{k-1}=i)$, $i,j\in\mathcal X$.

(ii) For any $n\ge0$,
$$P(Y_0^n=y_0^n\mid X_0^n=x_0^n)=\prod_{k=0}^{n}P(Y_k=y_k\mid X_k=x_k),$$
where $x_0^n=(x_0,x_1,\dots,x_n)$ and $y_0^n=(y_0,y_1,\dots,y_n)$ are the realizations of $X_0^n=(X_0,\dots,X_n)$ and $Y_0^n=(Y_0,\dots,Y_n)$. We write $q_k(j,l)=P(Y_k=l\mid X_k=j)$ for the emission probabilities.

If for any $n$, $P_n\equiv Q$, and the conditional probabilities $P(Y_n=y\mid X_n=x)$ do not depend on $n$, where $Q$ is a stochastic matrix and $x$, $y$ are realizations of $X_n$ and $Y_n$, then $\{X_n\}_{n\ge0}$ is a homogeneous Markov chain and $\{(X_n,Y_n)\}_{n\ge0}$ is a homogeneous HMM.
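As an illustration of the two defining conditions, the following sketch simulates an NHMM path: the hidden chain moves by time-varying transition matrices, and each observation depends only on the current hidden state. The parametric forms of P_k and q_k below are hypothetical choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 2, 2                      # sizes of the state spaces X and Y

def P_k(k: int) -> np.ndarray:
    """Hypothetical time-varying transition matrix P_k (rows sum to 1)."""
    eps = 0.2 / (k + 1)          # perturbation that fades as k grows
    return np.array([[0.7 + eps, 0.3 - eps],
                     [0.4 - eps, 0.6 + eps]])

def q_k(k: int) -> np.ndarray:
    """Hypothetical time-varying emission matrix, q_k[j, l] = P(Y_k=l | X_k=j)."""
    eps = 0.1 / (k + 1)
    return np.array([[0.8 - eps, 0.2 + eps],
                     [0.1 + eps, 0.9 - eps]])

def sample_nhmm(n: int, mu0=np.array([0.5, 0.5])):
    """Sample (x_0^n, y_0^n): X evolves by P_k; Y_k depends only on X_k."""
    x = np.empty(n + 1, dtype=int)
    y = np.empty(n + 1, dtype=int)
    x[0] = rng.choice(M, p=mu0)
    y[0] = rng.choice(N, p=q_k(0)[x[0]])
    for k in range(1, n + 1):
        x[k] = rng.choice(M, p=P_k(k)[x[k - 1]])   # hidden Markov step
        y[k] = rng.choice(N, p=q_k(k)[x[k]])       # conditionally independent emission
    return x, y

xs, ys = sample_nhmm(10)
print(xs, ys)
```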
Next, we list some properties of NHMMs, which are also equivalent definitions of the model and play a role in proving the theorems. For any $0\le m\le n$, let $X_m^n=(X_m,X_{m+1},\dots,X_n)$, and let $x_m^n$ and $y_m^n$ be the realizations of $X_m^n$ and $Y_m^n$, respectively. In addition, for any $0\le m\le n$, let $Y_m^n=(Y_m,\dots,Y_n)$.

$\{(X_n,Y_n)\}_{n\ge0}$ is an NHMM if and only if for any $n\ge0$,
$$P(X_0^n=x_0^n,\,Y_0^n=y_0^n)=\mu(x_0)q_0(x_0,y_0)\prod_{k=1}^{n}p_k(x_{k-1},x_k)q_k(x_k,y_k).$$
$\{(X_n,Y_n)\}_{n\ge0}$ is an NHMM if and only if $\{X_n\}_{n\ge0}$ is a nonhomogeneous Markov chain and for any $n\ge0$,
$$P(Y_0^n=y_0^n\mid X_0^n=x_0^n)=\prod_{k=0}^{n}q_k(x_k,y_k).$$
$\{(X_n,Y_n)\}_{n\ge0}$ is an NHMM if and only if for any $n\ge1$,
$$P(X_n=x_n\mid X_0^{n-1}=x_0^{n-1},\,Y_0^{n-1}=y_0^{n-1})=P(X_n=x_n\mid X_{n-1}=x_{n-1})\qquad(7)$$
and
$$P(Y_n=y_n\mid X_0^{n}=x_0^{n},\,Y_0^{n-1}=y_0^{n-1})=P(Y_n=y_n\mid X_n=x_n).\qquad(8)$$
By Equations (7) and (8), we have
$$P(X_n=x_n,\,Y_n=y_n\mid X_0^{n-1}=x_0^{n-1},\,Y_0^{n-1}=y_0^{n-1})=p_n(x_{n-1},x_n)q_n(x_n,y_n).\qquad(9)$$
In the following text, probability measures are widely used. Therefore, for any $0\le m\le n$, denote
$$P(x_m^n,y_m^n)=P(X_m^n=x_m^n,\,Y_m^n=y_m^n)$$
and
$$P(x_m^n)=P(X_m^n=x_m^n),\qquad P(y_m^n)=P(Y_m^n=y_m^n).$$
A delayed average is one of the widely known technical indicators used to predict future data in time series analysis, and we define the entropy and the entropy rate in the form of delayed averages. Let $\{a_n\}_{n\ge0}$ and $\{\phi(n)\}_{n\ge0}$ be sequences of non-negative integers such that $\phi(n)\to\infty$ as $n\to\infty$.

For simplicity, we use the natural logarithm here; thus, the generalized entropy is measured in nats. According to information theory, the definition of the entropy of an NHMM $\{(X_n,Y_n)\}_{n\ge0}$ over the window $[a_n,a_n+\phi(n)]$ is
$$H\big(X_{a_n}^{a_n+\phi(n)},Y_{a_n}^{a_n+\phi(n)}\big)=E\big[-\log P\big(X_{a_n}^{a_n+\phi(n)},Y_{a_n}^{a_n+\phi(n)}\big)\big],$$
where $E$ represents the expectation, and the definition of the generalized entropy rate of an NHMM is
$$h=\lim_{n\to\infty}\frac{1}{\phi(n)}H\big(X_{a_n}^{a_n+\phi(n)},Y_{a_n}^{a_n+\phi(n)}\big),$$
provided that the limit exists. Combining the concept of the generalized entropy with the properties of NHMMs, we have
$$H\big(X_{a_n}^{a_n+\phi(n)},Y_{a_n}^{a_n+\phi(n)}\big)=E\Big[-\log P(X_{a_n},Y_{a_n})-\sum_{k=a_n+1}^{a_n+\phi(n)}\log\big(p_k(X_{k-1},X_k)\,q_k(X_k,Y_k)\big)\Big],$$
where $p_k(X_{k-1},X_k)$ and $q_k(X_k,Y_k)$ denote the transition probabilities and the emission probabilities evaluated along the process, respectively, and we use the distribution notation $P(\cdot)$ introduced above, which is the same below.
To prepare for the following text, we introduce concepts that will appear frequently.
Definition 1. Let $\{(X_n,Y_n)\}_{n\ge0}$ be an NHMM defined as above. Define the generalized entropy density as
$$f_{a_n,\phi(n)}(\omega)=-\frac{1}{\phi(n)}\log P\big(X_{a_n}^{a_n+\phi(n)},Y_{a_n}^{a_n+\phi(n)}\big),$$
where $\omega\in\Omega$ denotes the sample point. In the following text, we prove that the limit of the generalized entropy density is the generalized entropy rate.
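As a numerical illustration of Definition 1, the sketch below evaluates the generalized entropy density along one simulated path, using the factorization of the joint distribution given above. The window sequences and all model parameters are hypothetical choices for this example only.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 2, 2
mu0 = np.array([0.5, 0.5])

def P_k(k):   # hypothetical transition matrices
    eps = 0.2 / (k + 1)
    return np.array([[0.7 + eps, 0.3 - eps], [0.4 - eps, 0.6 + eps]])

def q_k(k):   # hypothetical emission matrices
    eps = 0.1 / (k + 1)
    return np.array([[0.8 - eps, 0.2 + eps], [0.1 + eps, 0.9 - eps]])

def sample_path(n):
    x = [rng.choice(M, p=mu0)]
    y = [rng.choice(N, p=q_k(0)[x[0]])]
    for k in range(1, n + 1):
        x.append(rng.choice(M, p=P_k(k)[x[-1]]))
        y.append(rng.choice(N, p=q_k(k)[x[-1]]))
    return x, y

def entropy_density(x, y, a, phi):
    """f_{a,phi} = -(1/phi) log P(x_a^{a+phi}, y_a^{a+phi}), computed via
    P = P(X_a=x_a) q_a(x_a,y_a) * prod_k p_k(x_{k-1},x_k) q_k(x_k,y_k)."""
    mu = mu0.copy()
    for k in range(1, a + 1):            # marginal distribution of X_a
        mu = mu @ P_k(k)
    logp = np.log(mu[x[a]]) + np.log(q_k(a)[x[a], y[a]])
    for k in range(a + 1, a + phi + 1):
        logp += np.log(P_k(k)[x[k - 1], x[k]]) + np.log(q_k(k)[x[k], y[k]])
    return -logp / phi

a, phi = 50, 2000
x, y = sample_path(a + phi)
print(entropy_density(x, y, a, phi))
```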
Definition 2 ([21]). Let $\{X_n\}_{n\ge0}$ be a homogeneous Markov chain. Let Q be a transition matrix of $\{X_n\}_{n\ge0}$. Q is called strongly ergodic if there exists a probability distribution $\pi=(\pi_1,\dots,\pi_M)$ on $\mathcal X$ satisfying
$$\lim_{n\to\infty}\nu Q^{n}=\pi,\qquad(19)$$
where $\nu$ is a starting vector (an arbitrary initial probability distribution). Obviously, Equation (19) implies $\pi Q=\pi$, and π is called the stationary distribution determined by Q.

Definition 3. Let $\alpha=(\alpha_1,\dots,\alpha_M)$ be a vector, and then the norm of α is defined by
$$\|\alpha\|=\sum_{i=1}^{M}|\alpha_i|.$$
Let $A=(a_{ij})_{M\times M}$ be a square matrix, and then the norm of A is defined by
$$\|A\|=\max_{1\le i\le M}\sum_{j=1}^{M}|a_{ij}|.$$

3. Some Limit Properties
In this section, we give some lemmas that are used to prove the main conclusions. These lemmas include a strong limit theorem for NHMMs and limit properties stated in terms of norms. Lemma 1 provides a strong limit theorem for bounded functions of NHMMs. Lemma 2 gives the convergence of delayed averages of bounded functions of real sequences. Lemma 3 states that, when the transition matrices of the hidden chain converge, the delayed counting averages of the Markov chain converge to the stationary distribution of the irreducible limit matrix. Lemma 4 establishes the convergence of delayed averages of products of transition matrices. Lemma 5 relates the convergence of delayed averages of vector sequences to the convergence of subsequences.
Lemma 1. Let $\{(X_n,Y_n)\}_{n\ge0}$ be an NHMM. Let $\{g_k(i,j,l)\}_{k\ge1}$ be a sequence of bounded real-valued functions defined on $\mathcal X^2\times\mathcal Y$. If for any $\epsilon>0$,
$$\sum_{n=1}^{\infty}e^{-\epsilon\phi(n)}<\infty,\qquad(22)$$
and there exists a positive number γ such that for any $k\ge1$,
$$\max_{i,j\in\mathcal X,\,l\in\mathcal Y}|g_k(i,j,l)|\le\gamma,\qquad(23)$$
then
$$\lim_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\big\{g_k(X_{k-1},X_k,Y_k)-E[g_k(X_{k-1},X_k,Y_k)\mid X_{k-1}]\big\}=0\quad\text{a.s.},\qquad(24)$$
where $E[\,\cdot\mid X_{k-1}]$ represents conditional expectation.

Proof. Let $u$ be a nonzero real number, and define
$$t_n(u,\omega)=\prod_{k=a_n+1}^{a_n+\phi(n)}\frac{e^{u\,g_k(X_{k-1},X_k,Y_k)}}{E\big[e^{u\,g_k(X_{k-1},X_k,Y_k)}\mid X_{k-1}\big]},\qquad(25)$$
where $E[\,\cdot\mid X_{k-1}]$ represents conditional expectation. By the properties of conditional expectations and Equation (9), we have
$$E[t_n(u,\omega)]=1.$$
For any $\epsilon>0$, by the Markov inequality and Equation (25), we have
$$P\Big(\frac{1}{\phi(n)}\log t_n(u,\omega)\ge\epsilon\Big)\le e^{-\epsilon\phi(n)}E[t_n(u,\omega)]=e^{-\epsilon\phi(n)}.$$
Combining the Borel–Cantelli lemma, Equation (22), and the arbitrariness of $\epsilon$, we have
$$\limsup_{n\to\infty}\frac{1}{\phi(n)}\log t_n(u,\omega)\le0\quad\text{a.s.}\qquad(28)$$
Expanding Equation (28), we have
$$\limsup_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\big\{u\,g_k(X_{k-1},X_k,Y_k)-\log E\big[e^{u\,g_k(X_{k-1},X_k,Y_k)}\mid X_{k-1}\big]\big\}\le0\quad\text{a.s.}$$
Let $u>0$. Using the inequalities $\log x\le x-1$ $(x>0)$ and $e^{x}\le1+x+\frac{x^{2}}{2}e^{|x|}$, we have
$$\limsup_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\big\{g_k(X_{k-1},X_k,Y_k)-E[g_k(X_{k-1},X_k,Y_k)\mid X_{k-1}]\big\}\le\frac{u\gamma^{2}e^{u\gamma}}{2}\quad\text{a.s.}\qquad(30)$$
Let $u\to0^{+}$ in Equation (30); then
$$\limsup_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\big\{g_k(X_{k-1},X_k,Y_k)-E[g_k(X_{k-1},X_k,Y_k)\mid X_{k-1}]\big\}\le0\quad\text{a.s.}\qquad(31)$$
Similarly, taking $u<0$ and letting $u\to0^{-}$, we have
$$\liminf_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\big\{g_k(X_{k-1},X_k,Y_k)-E[g_k(X_{k-1},X_k,Y_k)\mid X_{k-1}]\big\}\ge0\quad\text{a.s.}\qquad(32)$$
Equation (24) follows immediately from inequalities (31) and (32). □
The key to proving Lemma 1 is to construct the likelihood ratio $t_n(u,\omega)$. By approximating through the upper and lower limits, one obtains the almost sure convergence for bounded functions. This lemma is used to prove Theorem 1.
Lemma 2 ([17]). Let $f$ be a bounded function defined on a real-number interval $D$, and let $\{z_n\}$ be a sequence in $D$. If $\lim_{n\to\infty}z_n=z$, $f$ is continuous at the point $z$, and Equation (22) holds, then
$$\lim_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}f(z_k)=f(z).\qquad(33)$$

Lemma 3 ([17]). Let Q be an irreducible transition matrix. If Equation (22) holds, and for any $i,j\in\mathcal X$,
$$\lim_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}|p_k(i,j)-q(i,j)|=0,\qquad(34)$$
then
$$\lim_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\mathbf 1_{\{X_{k-1}=i\}}=\pi_i\quad\text{a.s.}\qquad(35)$$
holds for any $i\in\mathcal X$, where $\mathbf 1_{\{\cdot\}}$ is an indicator function and $\pi=(\pi_1,\dots,\pi_M)$ is the unique stationary distribution determined by Q. These two lemmas serve as support for Theorem 2. It should be emphasized that Q in Lemma 3 is irreducible, so this lemma still holds for ergodic matrices.
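A minimal numerical sketch of Lemma 3, under hypothetical transition matrices P_k converging to an irreducible Q and an arbitrary choice of window sequences: the delayed counting average of visits approaches the stationary distribution π of Q.

```python
import numpy as np

rng = np.random.default_rng(3)
Q = np.array([[0.7, 0.3],
              [0.4, 0.6]])                      # irreducible target matrix

def P_k(k):                                     # hypothetical P_k -> Q
    eps = 0.2 / (k + 1)
    return Q + eps * np.array([[1, -1], [-1, 1]])

# Stationary distribution pi of Q: solve pi Q = pi with entries summing to 1.
w, v = np.linalg.eig(Q.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

# Delayed counting average over the window (a_n, a_n + phi(n)].
n = 5000
a_n, phi_n = n // 10, n                          # hypothetical window choice
x = rng.choice(2)
visits = np.zeros(2)
for k in range(1, a_n + phi_n + 1):
    if a_n < k <= a_n + phi_n:
        visits[x] += 1                           # counts X_{k-1} before stepping
    x = rng.choice(2, p=P_k(k)[x])
print("counting average:", visits / phi_n, " pi:", pi)
```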
Lemma 4. Let $\{X_n\}_{n\ge0}$ be a nonhomogeneous Markov chain with transition matrices $\{P_n\}_{n\ge1}$. Let Q be a periodic strongly ergodic stochastic matrix. Assume that $c$ is a left eigenvector of Q, i.e., the unique solution of the equations $cQ=c$ and $\sum_i c_i=1$. Let B be a constant matrix, where each row of B is $c$. If Equation (22) holds and
$$\lim_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\|P_k-Q\|=0,\qquad(36)$$
then
$$\lim_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\|P^{(m,k)}-B\|=0\qquad(37)$$
holds for any $m\ge0$, where $P^{(m,k)}=P_{m+1}P_{m+2}\cdots P_k$ and $P^{(m,m)}=I$ (I is the identity matrix). The proof of Lemma 4 is similar to that of Theorem 1 of [27]. It should be emphasized that the matrices appearing in Lemma 4 are composed of constants, so the norm can be calculated. Lemma 4 is one of the prerequisites for Theorem 3.
Lemma 5. Let $\{\alpha_n\}_{n\ge1}$ and β be column vectors with real entries. If Equation (22) holds and
$$\lim_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\|\alpha_k-\beta\|=0,\qquad(38)$$
then there exists a subsequence $\{\alpha_{n_m}\}$ of $\{\alpha_n\}$ such that
$$\lim_{m\to\infty}\|\alpha_{n_m}-\beta\|=0.\qquad(39)$$

Proof. Constructing an inequality, we note that
$$\min_{a_n<k\le a_n+\phi(n)}\|\alpha_k-\beta\|\le\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\|\alpha_k-\beta\|.\qquad(40)$$
From Equation (40), it can be concluded that for every $n$ there exists $k_n$ with $a_n<k_n\le a_n+\phi(n)$ such that
$$\|\alpha_{k_n}-\beta\|\le\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\|\alpha_k-\beta\|\qquad(41)$$
holds. Choose a positive sequence $\{\epsilon_m\}$ with $\epsilon_m\downarrow0$; by Equations (38) and (41), the right-hand side of (41) tends to 0 as $n\to\infty$. Therefore, there exists $n_1$ such that $\|\alpha_{n_1}-\beta\|<\epsilon_1$. By (40), we can repeat the argument on later windows. Similarly, there exists $n_2>n_1$ such that $\|\alpha_{n_2}-\beta\|<\epsilon_2$. In general, we can obtain a subsequence $\{\alpha_{n_m}\}$ of $\{\alpha_n\}$ such that $\|\alpha_{n_m}-\beta\|<\epsilon_m$. Equation (39) follows immediately. □
The key to the proof of Lemma 5 is to construct inequality (40). By the properties of convergent sequences, even if the sequence $\{\|\alpha_n-\beta\|\}$ is not monotonic, a suitable index exists in each window, and inequality (40) holds. Lemma 5 is also one of the prerequisites for Theorem 3.
4. Generalized Entropy Ergodic Theorems
In this section, we give the main results and their proofs. Generalized entropy ergodic theorems with almost sure convergence for NHMMs are presented. These results provide concise formulas for the computation and estimation of the generalized entropy rate of NHMMs.
Theorem 1. Let $\{(X_n,Y_n)\}_{n\ge0}$ be an NHMM. If Equation (22) holds for any $\epsilon>0$, and there exists a positive number γ such that for any $k\ge1$, $i,j\in\mathcal X$, and $l\in\mathcal Y$,
$$p_k(i,j)\ge\gamma,\qquad q_k(j,l)\ge\gamma,$$
then
$$\lim_{n\to\infty}\Big\{f_{a_n,\phi(n)}(\omega)-\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}H_k(X_{k-1})\Big\}=0\quad\text{a.s.},$$
where $H_k(i)=-\sum_{j\in\mathcal X}\sum_{l\in\mathcal Y}p_k(i,j)q_k(j,l)\log[p_k(i,j)q_k(j,l)]$.

Proof. Set $g_k(X_{k-1},X_k,Y_k)=-\log[p_k(X_{k-1},X_k)\,q_k(X_k,Y_k)]$ in Lemma 1. Using the inequalities $\gamma\le p_k(i,j)\le1$ and $\gamma\le q_k(j,l)\le1$, we can conclude that for any $k\ge1$,
$$\max_{i,j\in\mathcal X,\,l\in\mathcal Y}|g_k(i,j,l)|\le-2\log\gamma,$$
so the functions $\{g_k\}$ are uniformly bounded. It is not hard to verify that
$$f_{a_n,\phi(n)}(\omega)=\frac{1}{\phi(n)}\Big[-\log P(X_{a_n},Y_{a_n})+\sum_{k=a_n+1}^{a_n+\phi(n)}g_k(X_{k-1},X_k,Y_k)\Big]$$
and that $E[g_k(X_{k-1},X_k,Y_k)\mid X_{k-1}]=H_k(X_{k-1})$. The transition and emission probabilities fall within the interval $[\gamma,1]$, so their logarithms are bounded; in particular, $-\log P(X_{a_n},Y_{a_n})$ is bounded. In addition, an infinitesimal multiplied by a bounded quantity is still infinitesimal, so the initial term is erased by the factor $1/\phi(n)$. Hence the conclusion can be deduced immediately from Lemma 1. □
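As a numerical illustration of Theorem 1, the entropy density over one long simulated window, with the vanishing initial term omitted as in the proof, can be compared against the delayed average of the conditional entropies $H_k(X_{k-1})$; the two quantities should nearly coincide for large $\phi$. All model parameters and window sizes below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
M, N = 2, 2
mu0 = np.array([0.5, 0.5])

def P_k(k):   # hypothetical transitions, entries bounded away from 0
    eps = 0.2 / (k + 1)
    return np.array([[0.7 + eps, 0.3 - eps], [0.4 - eps, 0.6 + eps]])

def q_k(k):   # hypothetical emissions, entries bounded away from 0
    eps = 0.1 / (k + 1)
    return np.array([[0.8 - eps, 0.2 + eps], [0.1 + eps, 0.9 - eps]])

a, phi = 100, 20000
x = [rng.choice(M, p=mu0)]
y = [rng.choice(N, p=q_k(0)[x[0]])]
for k in range(1, a + phi + 1):
    x.append(rng.choice(M, p=P_k(k)[x[-1]]))
    y.append(rng.choice(N, p=q_k(k)[x[-1]]))

# Entropy density f (ignoring the vanishing initial term, as in the proof).
f = -sum(np.log(P_k(k)[x[k - 1], x[k]]) + np.log(q_k(k)[x[k], y[k]])
         for k in range(a + 1, a + phi + 1)) / phi

# Delayed average of the conditional entropies H_k(X_{k-1}).
def H_k(k, i):
    joint = np.outer(P_k(k)[i], np.ones(N)) * q_k(k)   # p_k(i,j) q_k(j,l)
    return -np.sum(joint * np.log(joint))

avg_H = sum(H_k(k, x[k - 1]) for k in range(a + 1, a + phi + 1)) / phi
print(f - avg_H)   # should be close to 0 for large phi
```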
Theorem 2. Let $\{(X_n,Y_n)\}_{n\ge0}$ be an NHMM. Assume that Equation (22) holds for any $\epsilon>0$ and that there exists a positive number γ such that for any $k\ge1$, $i,j\in\mathcal X$, and $l\in\mathcal Y$, $p_k(i,j)\ge\gamma$ and $q_k(j,l)\ge\gamma$. If for any $i,j\in\mathcal X$ and $l\in\mathcal Y$,
$$\lim_{k\to\infty}p_k(i,j)=q(i,j),\qquad\lim_{k\to\infty}q_k(j,l)=r(j,l)$$
holds, where the $q(i,j)$ represent terms of an irreducible transition matrix Q and the $r(j,l)$ are the entries of a stochastic matrix R, then
$$\lim_{n\to\infty}f_{a_n,\phi(n)}(\omega)=-\sum_{i\in\mathcal X}\pi_i\sum_{j\in\mathcal X}\sum_{l\in\mathcal Y}q(i,j)r(j,l)\log[q(i,j)r(j,l)]\quad\text{a.s.},\qquad(52)$$
where $\pi_i$ belongs to $\pi=(\pi_1,\dots,\pi_M)$, which is a stationary distribution of Q.

Proof. Set $f(z)=-z\log z$, which is bounded and continuous on $D=[\gamma^2,1]$, in Lemma 2; then, summing over $j\in\mathcal X$ and $l\in\mathcal Y$, for any $i\in\mathcal X$,
$$\lim_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}H_k(i)=H(i):=-\sum_{j\in\mathcal X}\sum_{l\in\mathcal Y}q(i,j)r(j,l)\log[q(i,j)r(j,l)].\qquad(53)$$
Using the absolute value inequality, we have
$$\Big|\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}H_k(X_{k-1})-\sum_{i\in\mathcal X}\pi_iH(i)\Big|\le\sum_{i\in\mathcal X}\Big|\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\mathbf 1_{\{X_{k-1}=i\}}\big[H_k(i)-H(i)\big]\Big|+\sum_{i\in\mathcal X}H(i)\Big|\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\mathbf 1_{\{X_{k-1}=i\}}-\pi_i\Big|,\qquad(54)$$
where $\pi_i$ belongs to $\pi=(\pi_1,\dots,\pi_M)$, which is a stationary distribution of Q. By Lemma 3, Theorem 1, and Equation (53), Equation (52) follows from Equation (54). □
The proof of Theorem 1 mainly relies on the construction of inequalities, and the proof of Theorem 2 utilizes the properties of norms. Together, the two theorems show that the limit of the generalized entropy density is the generalized entropy rate.
Theorem 3. Let $\{(X_n,Y_n)\}_{n\ge0}$ be an NHMM with transition matrices $\{P_k\}_{k\ge1}$. $Q$ is another transition matrix, and assume that Q is periodic and strongly ergodic. Let
$$h_k(i)=-\sum_{j\in\mathcal X}\sum_{l\in\mathcal Y}p_k(i,j)q_k(j,l)\log[p_k(i,j)q_k(j,l)],\quad i\in\mathcal X,$$
where $h_k(i)$ and $\beta(i)$ are the elements of the column vectors $h_k$ and β, respectively, and assume that the $h_k$ are uniformly bounded. If Equation (22) holds,
$$\lim_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\|P_k-Q\|=0,\qquad(57)$$
and
$$\lim_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\|h_k-\beta\|=0,\qquad(58)$$
then the generalized entropy rate of $\{(X_n,Y_n)\}_{n\ge0}$ exists, and
$$h=\lim_{n\to\infty}\frac{1}{\phi(n)}H\big(X_{a_n}^{a_n+\phi(n)},Y_{a_n}^{a_n+\phi(n)}\big)=\pi\beta=\sum_{i\in\mathcal X}\pi_i\beta(i),$$
where $\pi=(\pi_1,\dots,\pi_M)$ is the unique stationary distribution determined by Q.

Proof. Let $\mu_k=(P(X_k=1),\dots,P(X_k=M))$ be a row vector with elements $P(X_k=i)$, $i\in\mathcal X$. Hence, using the definition and properties of conditional entropy, we have
$$\frac{1}{\phi(n)}H\big(X_{a_n}^{a_n+\phi(n)},Y_{a_n}^{a_n+\phi(n)}\big)=\frac{1}{\phi(n)}\Big[H(X_{a_n},Y_{a_n})+\sum_{k=a_n+1}^{a_n+\phi(n)}\mu_{k-1}h_k\Big].$$
Simply take B as a constant matrix whose rows are all equal to π. Note that $\mu_{k-1}=\mu_0P^{(0,k-1)}$ and $\mu_0B=\pi$, where $\mu_0$ is the initial distribution of the Markov chain. Since
$$\big|\mu_{k-1}h_k-\pi\beta\big|=\big|\mu_0\big(P^{(0,k-1)}-B\big)h_k+\pi(h_k-\beta)\big|\le\|P^{(0,k-1)}-B\|\max_{i}|h_k(i)|+\|h_k-\beta\|,\qquad(61)$$
where $P^{(0,k)}=P_1P_2\cdots P_k$ and $P^{(0,0)}=I$ (I is the identity matrix), by Equations (57), (58), and (61), Lemma 4, and Lemma 5, we have
$$\lim_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\mu_{k-1}h_k=\pi\beta.\qquad(62)$$
By Equation (58) and Lemma 5, there exists a subsequence $\{h_{n_m}\}$ of $\{h_k\}$ such that
$$\lim_{m\to\infty}\|h_{n_m}-\beta\|=0.$$
Hence, $\pi\beta$ is finite. By Equation (62) and the properties of entropy (the entropy of the pair $(X_{a_n},Y_{a_n})$ is bounded by $\log(MN)$, so $\frac{1}{\phi(n)}H(X_{a_n},Y_{a_n})\to0$), we have
$$\lim_{n\to\infty}\frac{1}{\phi(n)}H\big(X_{a_n}^{a_n+\phi(n)},Y_{a_n}^{a_n+\phi(n)}\big)=\pi\beta=\sum_{i\in\mathcal X}\pi_i\beta(i).$$
This completes the proof of Theorem 3. □
Theorem 3 gives a method to compute the generalized entropy rate of an NHMM under some mild conditions, and when the model degenerates into a homogeneous HMM or a nonhomogeneous Markov chain, the formula remains valid. The two corollaries below recover existing results, which indirectly demonstrates the correctness of this theorem.
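As a usage sketch of the formula in Theorem 3, with a hypothetical limiting transition matrix Q and a hypothetical emission matrix R standing in for the limit quantities, the generalized entropy rate πβ can be computed directly:

```python
import numpy as np

# Hypothetical limiting transition matrix Q and emission matrix R.
Q = np.array([[0.7, 0.3],
              [0.4, 0.6]])
R = np.array([[0.8, 0.2],
              [0.1, 0.9]])

# Stationary distribution pi of Q (pi Q = pi, entries sum to 1).
w, v = np.linalg.eig(Q.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()

# beta(i) = -sum_{j,l} Q[i,j] R[j,l] log(Q[i,j] R[j,l])  (limit of h_k(i)).
joint = Q[:, :, None] * R[None, :, :]          # joint[i, j, l] = Q[i,j] R[j,l]
beta = -np.sum(joint * np.log(joint), axis=(1, 2))

print("entropy rate pi.beta =", pi @ beta)     # h = sum_i pi_i beta(i)
```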
Corollary 1. Let $\{(X_n,Y_n)\}_{n\ge0}$ be an NHMM with a periodic and strongly ergodic transition matrix $Q=(q(i,j))$ (i.e., $P_k\equiv Q$) and an emission probability matrix $R=(r(j,l))$, where $r(j,l)=P(Y_k=l\mid X_k=j)$ for all $k$. Let
$$\beta(i)=-\sum_{j\in\mathcal X}\sum_{l\in\mathcal Y}q(i,j)r(j,l)\log[q(i,j)r(j,l)],\quad i\in\mathcal X,$$
where the $\beta(i)$ are the elements of the column vector β. Assume that β is bounded. If Equation (22) holds, then the generalized entropy rate of $\{(X_n,Y_n)\}_{n\ge0}$ exists, and
$$h=\pi\beta=\sum_{i\in\mathcal X}\pi_i\beta(i),$$
where $\pi=(\pi_1,\dots,\pi_M)$ is the unique stationary distribution determined by Q.

Corollary 2. Let $X=\{X_n\}_{n\ge0}$ be a nonhomogeneous Markov chain with transition matrices $\{P_k\}_{k\ge1}$. $Q$ is another transition matrix, and assume that Q is periodic and strongly ergodic. Let
$$h_k(i)=-\sum_{j\in\mathcal X}p_k(i,j)\log p_k(i,j),\quad i\in\mathcal X,$$
where $h_k(i)$ and $\beta(i)$ are the elements of the column vectors $h_k$ and β, respectively. Assume that the $h_k$ are bounded. If Equation (22) holds,
$$\lim_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\|P_k-Q\|=0,$$
and
$$\lim_{n\to\infty}\frac{1}{\phi(n)}\sum_{k=a_n+1}^{a_n+\phi(n)}\|h_k-\beta\|=0,$$
then the generalized entropy rate of X exists, and
$$h=\pi\beta=\sum_{i\in\mathcal X}\pi_i\beta(i),$$
where $\pi=(\pi_1,\dots,\pi_M)$ is the unique stationary distribution determined by Q.