**3. Applications**

This section borrows its statements largely from Dębowski [1–3] and is provided only to sketch some context for our research and justify its applicability to statistical language modeling. Let $(X\_i)\_{i\in\mathbb{Z}}$ be a two-sided infinite stationary process over a countable alphabet $\mathbb{X}$ on a probability space $(\mathbb{X}^{\mathbb{Z}}, \mathcal{X}^{\mathbb{Z}}, P)$, where $X\_k((\omega\_i)\_{i\in\mathbb{Z}}) := \omega\_k$. We denote random blocks $X\_j^k := (X\_i)\_{j \le i \le k}$ and complete $\sigma$-fields $\mathcal{G}\_j^k := \sigma(X\_j^k)$ generated by them. By the generalized calculus of Shannon information measures, i.e., Theorems 1 and 2, we can define the entropy rate $h\_P$ and the excess entropy $E\_P$ of process $(X\_i)\_{i\in\mathbb{Z}}$ as

$$h\_P := \lim\_{n \to \infty} H\_P(\mathcal{G}\_0 | \mathcal{G}\_{-n}^{-1}) = H\_P(\mathcal{G}\_0 | \mathcal{G}\_{-\infty}^{-1}) \text{ if } \mathbb{X} \text{ is finite},\tag{41}$$

$$E\_P := \lim\_{n \to \infty} I\_P(\mathcal{G}\_{-n}^{-1}; \mathcal{G}\_0^{n-1}) = I\_P(\mathcal{G}\_{-\infty}^{-1}; \mathcal{G}\_0^{\infty}), \tag{42}$$

see [10] for more background.
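To make the limits (41) and (42) concrete, here is a minimal numerical sketch; the two-state Markov chain and all of its parameters are illustrative assumptions, not taken from the paper. It computes block entropies $H\_P(X\_1^n)$ by brute force and approximates $h\_P$ by the conditional entropy $H(n) - H(n-1)$ and $E\_P$ by the block mutual information $2H(n) - H(2n)$:

```python
from itertools import product
from math import log2

# Toy two-state stationary Markov chain (illustrative parameters, an assumption).
A = [[0.9, 0.1], [0.3, 0.7]]   # A[s][t] = P(X_{i+1} = t | X_i = s)
pi = [0.75, 0.25]              # stationary distribution: pi = pi A

def block_entropy(n):
    """H_P(X_1^n) in bits, by brute-force enumeration of all 2^n blocks."""
    H = 0.0
    for x in product(range(2), repeat=n):
        p = pi[x[0]]
        for s, t in zip(x, x[1:]):
            p *= A[s][t]
        H -= p * log2(p)
    return H

# Finite-n approximants of the limits (41) and (42):
h_est = block_entropy(9) - block_entropy(8)       # H(X_n | X_1^{n-1}) -> h_P
E_est = 2 * block_entropy(8) - block_entropy(16)  # I(X_1^n; X_{n+1}^{2n}) -> E_P
```

For a first-order Markov chain, $H(n) = H(X\_1) + (n-1)h\_P$ holds exactly, so both finite-$n$ quantities already equal their limits $h\_P$ and $E\_P = H(X\_1) - h\_P$.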

Let $T((\omega\_i)\_{i\in\mathbb{Z}}) := (\omega\_{i+1})\_{i\in\mathbb{Z}}$ be the shift operation and let $\mathcal{I} := \{A \in \mathcal{X}^{\mathbb{Z}} : T^{-1}(A) = A\}$ be the invariant $\sigma$-field. By the Birkhoff ergodic theorem [11], we have $\sigma(\mathcal{I}) \subset \sigma(\mathcal{G}\_{-\infty}) \cap \sigma(\mathcal{G}\_{\infty})$ for the tail $\sigma$-fields $\mathcal{G}\_{-\infty} := \bigcap\_{n=1}^{\infty} \mathcal{G}\_{-\infty}^{-n}$ and $\mathcal{G}\_{\infty} := \bigcap\_{n=1}^{\infty} \mathcal{G}\_n^{\infty}$. Hence, by Theorems 1 and 2 we further obtain the expressions

$$h\_P = H\_P(\mathcal{G}\_0 | \mathcal{G}\_{-\infty}^{-1}) = H\_P(\mathcal{G}\_0 | \mathcal{G}\_{-\infty}^{-1} \vee \mathcal{I}) \text{ if } \mathbb{X} \text{ is finite},\tag{43}$$

$$E\_P = I\_P(\mathcal{G}\_{-\infty}^{-1}; \mathcal{G}\_0^{\infty}) = H\_P(\mathcal{I}) + I\_P(\mathcal{G}\_{-\infty}^{-1}; \mathcal{G}\_0^{\infty}|\mathcal{I}).\tag{44}$$
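The second equality in (44) can be sketched as follows, assuming that the generalized calculus of Theorems 1 and 2 supplies the usual chain rule for mutual information and that $\mathcal{I}$ is contained in both tail $\sigma$-fields up to completion, as granted by the Birkhoff ergodic theorem above:

$$I\_P(\mathcal{G}\_{-\infty}^{-1}; \mathcal{G}\_0^{\infty}) = I\_P(\mathcal{G}\_{-\infty}^{-1}; \mathcal{I}) + I\_P(\mathcal{G}\_{-\infty}^{-1}; \mathcal{G}\_0^{\infty} | \mathcal{I}),$$

where the chain rule applies since $\mathcal{G}\_0^{\infty} \vee \mathcal{I} = \mathcal{G}\_0^{\infty}$ almost surely, and $I\_P(\mathcal{G}\_{-\infty}^{-1}; \mathcal{I}) = H\_P(\mathcal{I}) - H\_P(\mathcal{I} | \mathcal{G}\_{-\infty}^{-1}) = H\_P(\mathcal{I})$ because $\mathcal{I}$ is almost surely determined by the infinite past.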

Denoting the conditional probability $F(A) := P(A|\mathcal{I})$, which is a random stationary ergodic measure by the ergodic decomposition theorem [12], we notice that $H\_P(\mathcal{G}\_0|\mathcal{G}\_{-\infty}^{-1} \vee \mathcal{I}) = \mathbf{E}\_P H\_F(\mathcal{G}\_0|\mathcal{G}\_{-\infty}^{-1})$ and $I\_P(\mathcal{G}\_{-\infty}^{-1}; \mathcal{G}\_0^{\infty}|\mathcal{I}) = \mathbf{E}\_P I\_F(\mathcal{G}\_{-\infty}^{-1}; \mathcal{G}\_0^{\infty})$, and consequently we obtain the ergodic decomposition of the entropy rate and excess entropy, which reads

$$h\_P = \mathbf{E}\_P h\_F \text{ if } \mathbb{X} \text{ is finite},\tag{45}$$

$$E\_P = H\_P(\mathcal{I}) + \mathbf{E}\_P E\_F.\tag{46}$$

Formulae (45) and (46) were derived by Gray and Davisson [13] and Dębowski [1], respectively. The ergodic decomposition of the entropy rate (45) states that a stationary process is asymptotically deterministic, i.e., $h\_P = 0$, if and only if almost all its ergodic components are asymptotically deterministic, i.e., $h\_F = 0$ almost surely. In contrast, the ergodic decomposition of the excess entropy (46) states that a stationary process is infinitary, i.e., $E\_P = \infty$, if some of its ergodic components are infinitary, i.e., $E\_F = \infty$ with a nonzero probability, or if $H\_P(\mathcal{I}) = \infty$, i.e., in particular if the process is strongly nonergodic; see [14,15].
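Formula (46) can be illustrated numerically on a toy nonergodic process; the mixture construction and its parameters below are illustrative assumptions, not an example from the paper. Take a mixture of two i.i.d. Bernoulli components, one chosen once and for all with equal probability: each ergodic component has $E\_F = 0$, while $H\_P(\mathcal{I}) = 1$ bit, so (46) gives $E\_P = 1$ bit, and the finite-block mutual information $2H(n) - H(2n)$ approaches this value:

```python
from math import comb, log2

def block_entropy(n, thetas, weights):
    """H_P(X_1^n) in bits for a mixture of i.i.d. Bernoulli(theta) components.
    By exchangeability, P(x_1^n) depends only on k = number of ones."""
    H = 0.0
    for k in range(n + 1):
        p_k = sum(w * th ** k * (1 - th) ** (n - k) for w, th in zip(weights, thetas))
        H -= comb(n, k) * p_k * log2(p_k)
    return H

def excess_estimate(n, thetas, weights):
    """I(X_1^n; X_{n+1}^{2n}) = 2 H(n) - H(2n), converging to E_P."""
    return 2 * block_entropy(n, thetas, weights) - block_entropy(2 * n, thetas, weights)

# Nonergodic mixture of Bernoulli(0.2) and Bernoulli(0.8): E_P = H_P(I) = 1 bit.
E_mix = excess_estimate(64, [0.2, 0.8], [0.5, 0.5])
# A single ergodic component: E_P = 0 exactly.
E_pure = excess_estimate(64, [0.3], [1.0])
```

Here the infinite past reveals the chosen component almost surely, so the mutual information between past and future converges to the one bit stored in the invariant $\sigma$-field, as (46) predicts.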

The linguistic interpretation of the above results is as follows. Hilberg [16] hypothesized that the excess entropy of natural language is infinite. This hypothesis is partly supported by the original estimates of conditional entropy by Shannon [17], by the power-law decay of the estimates of the entropy rate given by the PPM compression algorithm [18], by the approximately power-law growth of vocabulary known as Heaps' or Herdan's law [2,3,19,20], and by further experiments with neural statistical language models [21,22]. In parallel, Dębowski [1–3] supposed that the very large excess entropy of natural language may arise because texts in natural language describe a relatively slowly evolving and very complex reality. Indeed, it can be proved mathematically that if the abstract reality described by random texts is unchangeable and infinitely complex, then the resulting stochastic process is strongly nonergodic, i.e., in particular $H\_P(\mathcal{I}) = \infty$ [1–3]. Consequently, its excess entropy is infinite by formula (46). We suppose that a similar mechanism may work for natural language; see [23–26] for further examples of abstract stochastic mechanisms leading to infinitary processes.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The author declares no conflict of interest.
