
The McMillan Theorem for Colored Branching Processes and Dimensions of Random Fractals

1 Department of Mathematics, IT and Landscape Architecture, John Paul II Catholic University of Lublin, Konstantynuv Str. 1H, 20-708 Lublin, Poland
2 Department of Mechanics and Mathematics, Belarusian State University, Nezavisimosti Ave. 4, 220030 Minsk, Belarus
Entropy 2014, 16(12), 6624-6653; https://doi.org/10.3390/e16126624
Submission received: 16 September 2014 / Revised: 18 November 2014 / Accepted: 12 December 2014 / Published: 19 December 2014
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract: For the simplest colored branching process, we prove an analog to the McMillan theorem and calculate the Hausdorff dimensions of random fractals defined in terms of the limit behavior of empirical measures generated by finite genetic lines. In this setting, the role of Shannon’s entropy is played by the Kullback–Leibler divergence, and the Hausdorff dimensions are computed by means of the so-called Billingsley–Kullback entropy, defined in the paper.

1. Introduction

Let us fix a finite set Ω = {1,…, r}, whose elements denote different colors, and consider a finite random set X containing a random number of elements of colors 1, …, r. The simplest colored branching process can be defined as an evolution of a population in which all individuals live the same fixed time and then, when the lifetime ends, each individual generates (independently of the others) a random set of “children” containing individuals of different colors and distributed as X. We will assume that the evolution starts with a unique initial individual. It is convenient to represent this process as a random genealogical tree with individuals as vertices, each vertex connected with its “children” by edges. Denote by Xn the set of all genetic lines of length n starting from the initial individual. The colored branching process can degenerate (when, starting from some n, all of the sets Xn are empty) or, otherwise, evolve endlessly. Every genetic line x = (x1, …, xn) ∈ Xn generates an empirical measure δx,n on the set of colors Ω in accordance with the following rule: for each i ∈ Ω, the value of δx,n(i) is the fraction of elements in the string (x1, …, xn) having the color i.
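For concreteness, the empirical measure of a finite genetic line can be computed directly from its color string; the following is a minimal illustrative sketch in Python (the function name is ours, not the paper's):

```python
from collections import Counter

def empirical_measure(x):
    """delta_{x,n}: the fraction of entries of the line x = (x_1, ..., x_n)
    that have each color i."""
    n = len(x)
    counts = Counter(x)
    return {i: counts[i] / n for i in sorted(counts)}

print(empirical_measure((1, 2, 2, 1, 2, 2)))   # {1: 0.333..., 2: 0.666...}
```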
Denote by μ(i), where i ∈ Ω, the expectation of the number of all elements x ∈ X having the color i. Let ν be an arbitrary probability measure on Ω. An analog to the McMillan theorem that will be proven below (see Theorem 11) asserts that under the condition of non-degeneracy of the colored branching process, the cardinality of the set {x ∈ Xn | δx,n ≈ ν} almost surely (i.e., with probability one) has asymptotics (up to logarithmic equivalence) of order e^{−nρ(ν,μ)}, where:
$$\rho(\nu,\mu)=\sum_{i\in\Omega}\nu(i)\ln\frac{\nu(i)}{\mu(i)}.$$
Formally, the value of ρ(ν, μ) coincides with the usual Kullback–Leibler divergence and differs from it only in the fact that in our setting, the measure μ is not a probability, and so ρ(ν, μ) can be negative.
This is the first main result of the paper. It generalizes the usual McMillan theorem, which deals with the whole of Ω^n rather than with the random subsets Xn. Its proof is based essentially on the Legendre duality between the Kullback–Leibler divergence and the so-called spectral potential, which was first introduced in [1] and then investigated profoundly in [2,3].
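Formula (1) is straightforward to transcribe into code; the sketch below (our illustration, using the convention ρ(ν, μ) = +∞ when ν is not absolutely continuous with respect to μ, as in Section 4) emphasizes that μ need not be normalized, so the result may be negative:

```python
import math

def kullback_action(nu, mu):
    """rho(nu, mu) = sum_i nu(i) ln(nu(i)/mu(i)); mu is a positive,
    not necessarily normalized, measure, so the result may be negative."""
    total = 0.0
    for n, m in zip(nu, mu):
        if n > 0.0:
            if m == 0.0:
                return math.inf   # nu is not absolutely continuous w.r.t. mu
            total += n * math.log(n / m)
    return total

# mu sums to 4/3 > 1 and dominates nu coordinatewise, so rho < 0:
print(kullback_action([1/3, 2/3], [2/3, 2/3]))   # -(1/3) ln 2 ~ -0.231
```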
We also investigate the Hausdorff dimensions of random fractals defined in terms of the limit behavior of the sequence of empirical measures δx,n. Let X^∞ be the set of all infinite genetic lines. Fix an arbitrary vector θ = (θ(1), …, θ(r)) ∈ (0, 1)^r and define the following metric on X^∞:
$$\operatorname{dist}(x,y)=\prod_{t=1}^{n}\theta(x_t),\quad\text{where } n=\inf\{t\mid x_t\neq y_t\}-1.$$
Denote by V any set of probability measures on Ω. The second main result of the paper (concerning Hausdorff dimensions of random fractals) is the following: under the condition of non-degeneracy of the colored branching process,
$$\dim_H\{x\in X^\infty\mid \delta_{x,n}\rightsquigarrow V\}=\sup_{\nu\in V} d(\nu,\mu,\theta)$$
almost surely, where δx,n ⇝ V means that the sequence δx,n has at least one limit point in V and d(ν, μ, θ) is the Billingsley–Kullback entropy defined as:
$$d(\nu,\mu,\theta)=\frac{\rho(\nu,\mu)}{\sum_{i=1}^{r}\nu(i)\ln\theta(i)}=\frac{\sum_{i=1}^{r}\nu(i)\ln\frac{\nu(i)}{\mu(i)}}{\sum_{i=1}^{r}\nu(i)\ln\theta(i)}.$$
Moreover, almost surely:
$$\dim_H\{x\in X^\infty\mid \delta_{x,n}\to\nu\}=d(\nu,\mu,\theta).$$
These results are the essence of Theorems 12 and 17 below.
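Formula (4) likewise admits a direct transcription (an illustrative sketch; when μ ≡ 1, it reduces to the classical Billingsley entropy recalled in Section 6):

```python
import math

def billingsley_kullback(nu, mu, theta):
    """d(nu, mu, theta) = rho(nu, mu) / sum_i nu(i) ln theta(i)."""
    rho = sum(n * math.log(n / m) for n, m in zip(nu, mu) if n > 0)
    denom = sum(n * math.log(t) for n, t in zip(nu, theta))   # negative
    return rho / denom

# With mu = 1, this is Shannon entropy over -sum nu ln theta:
print(billingsley_kullback([0.5, 0.5], [1.0, 1.0], [0.5, 0.5]))   # 1.0
```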
The systematic study of random self-similar fractals was initiated in the works of Falconer [4], Mauldin and Williams [5] and Graf [6]. They considered the attractor K of a random version of an iterated function system (IFS) in ℝ^n, letting the scaling factors θ(i) ∈ [0, 1) be independent and identically distributed (i.i.d.) random variables. The main result proven in [4–6] is the following: if the open set condition holds and E∑_i θ(i) > 1, then K ≠ ∅ with positive probability, and with the same probability (i.e., almost surely, provided K ≠ ∅), its Hausdorff dimension s is the unique solution to the equation E∑_i θ(i)^s = 1. Recently, the research was extended to the case of random self-affine fractals in ℝ^n (see [7,8]) and random affine code tree fractals [9].
On the other hand, Billingsley in [10,11] considered fractals in the space of infinite sequences Ω^∞, where Ω is a finite set, defined in terms of the limit behavior of empirical measures. He investigated the Hausdorff dimensions of such fractals and, in particular, obtained formulas analogous to Formulas (2) and (4) (with Ω^∞ instead of X^∞ and μ(i) ≡ 1). Afterwards, Billingsley’s ideas and methods were transferred to strange attractors of dynamical systems in ℝ^n (see [12,13]).
The present research combines features of both random self-similar fractals and subsets of nonrandom self-similar fractals defined in terms of the limit behavior of empirical measures. This combination constitutes its novelty.
The paper can be divided into two parts. The first one (Sections 3–7) contains preliminary information necessary for our research. All theorems in this part are known and supplied with appropriate references; however, some are modified for convenience of further use, and most are proven anew for completeness of presentation and the reader’s convenience. The second part (Sections 2 and 8–11) contains essentially new results, which have been briefly formulated above.

2. Examples and Discussion

Example 1. Suppose that Ω = {1, 2}, and the random set X ⊂ Ω contains the elements 1, 2 with probabilities p(1), p(2), respectively. Obviously, for i = 1, 2, the expectation of the number of elements in X that are equal to i is p(i). Therefore, in the notation of the previous section, μ(i) = p(i).
Let us define a sequence of random sets Xn ⊂ Ω^n in the following way: each Xn contains each of the 2^n strings x = (x1, …, xn) ∈ Ω^n with probability p(x1) ⋯ p(xn). In other words, under the condition (x1, …, xn) ∈ Xn, the set Xn+1 contains the string (x1, …, xn, 1) with probability p(1) and the string (x1, …, xn, 2) with probability p(2), independently. In particular, Xn+1 contains both with probability p(1)p(2) and contains neither of them with probability (1 − p(1))(1 − p(2)).
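This construction is easy to simulate (a sketch under the distribution just described; the names are ours):

```python
import random

def simulate_X(p, n_steps, seed=1):
    """Grow X_1, ..., X_n from Example 1: every string of X_n is prolonged
    by the symbol i (i = 1, 2) independently with probability p[i]."""
    rng = random.Random(seed)
    level = [()]                          # X_0: the empty initial string
    levels = []
    for _ in range(n_steps):
        level = [s + (i,) for s in level for i in (1, 2)
                 if rng.random() < p[i]]
        levels.append(level)
        if not level:                     # the process has degenerated
            break
    return levels

levels = simulate_X({1: 2/3, 2: 2/3}, 8)
print([len(X_n) for X_n in levels])       # E|X_n| = (4/3)^n here
```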
Let ν be the probability distribution on Ω such that ν(1) = 1/3 and ν(2) = 2/3. Question: how many strings (x1, …, xn) are there in the random set Xn in which the fraction of elements equal to one is close to 1/3 and the fraction of elements equal to two is close to 2/3? According to Formula (1), under the condition of non-degeneracy of the stochastic process X1, X2, X3, …, the number of such strings with probability one has asymptotics of order e^{−nρ(ν,μ)}, where:
$$\rho(\nu,\mu)=\sum_{i\in\Omega}\nu(i)\ln\frac{\nu(i)}{\mu(i)}=\frac13\ln\frac{1/3}{p(1)}+\frac23\ln\frac{2/3}{p(2)}.$$
In particular, if p(1) = p(2) = 2/3, then ρ(ν, μ) = −(1/3) ln 2 and, respectively, e^{−nρ(ν,μ)} = 2^{n/3}. On the other hand, if p(1) = 1 and p(2) = 1/3, then ρ(ν, μ) = (1/3) ln(4/3) and e^{−nρ(ν,μ)} = (3/4)^{n/3}. The latter means that for all large enough n, there are no strings x = (x1, …, xn) in the random set Xn such that δx,n(1) ≈ 1/3 and δx,n(2) ≈ 2/3.
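Both cases can be checked numerically (a small sanity check of the numbers above):

```python
import math

nu = [1/3, 2/3]
for p in ([2/3, 2/3], [1.0, 1/3]):
    rho = sum(n * math.log(n / m) for n, m in zip(nu, p))
    print(f"p = {p}: rho = {rho:+.4f}, counts ~ {math.exp(-rho):.4f}^n")
# p = [2/3, 2/3]: rho = -(1/3) ln 2,   so counts grow like 2^(n/3)
# p = [1.0, 1/3]: rho = +(1/3) ln(4/3), so counts decay like (3/4)^(n/3)
```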
Example 2. (random Cantor dust). To each string x = (x1, …, xn) ∈ Xn from Example 1, let us put in correspondence a line segment I(x) ⊂ ℝ as follows:
$$I(\varnothing)=[0,1],\qquad I(1)=[0,1/4],\qquad I(2)=[1/2,1],$$
$$I(1,1)=[0,1/16],\quad I(1,2)=[1/8,1/4],\quad I(2,1)=[1/2,5/8],\quad I(2,2)=[3/4,1],$$
and so on. If a certain segment I(x) = I(x1, …, xn) has already been defined, then I(x1, …, xn, 1) and I(x1, …, xn, 2) are the left quarter and the right half of I(x), respectively. The random Cantor dust C is defined as:
$$C=\bigcap_{n=1}^{\infty}C_n,\quad\text{where } C_n=\bigcup_{x\in X_n}I(x).$$
Put θ(1) = 1/4 and θ(2) = 1/2. Let the distance function on the random set of infinite strings X^∞ be defined as:
$$\operatorname{dist}(x,y)=\prod_{t=1}^{n}\theta(x_t),\quad\text{where } n=\inf\{t\mid x_t\neq y_t\}-1.$$
Consider the mapping π: X^∞ → C that takes any infinite string x = (x1, x2,…) ∈ X^∞ to the point π(x) = ∩n I(x1, …, xn). Obviously, this mapping is bi-Lipschitz, and hence, the Hausdorff dimensions of any subset A ⊂ X^∞ and its image π(A) ⊂ ℝ are the same. Question: what is the Hausdorff dimension of the set π(A), where:
$$A=\{x\in X^\infty\mid \delta_{x,n}(1)\to 1/3,\ \delta_{x,n}(2)\to 2/3\}?$$
The answer is given by Formulas (3) and (4). For example, if for each segment I(x1, …, xn), the “probabilities of survival” of its left quarter I(x1, …, xn, 1) and its right half I(x1, …, xn, 2) are p(1) = p(2) = 2/3, then under the condition π(A) ≠ ∅ with probability one, we have:
$$\dim_H\pi(A)=d(\nu,\mu,\theta)=\frac{\sum_{i=1}^{2}\nu(i)\ln\frac{\nu(i)}{\mu(i)}}{\sum_{i=1}^{2}\nu(i)\ln\theta(i)}=\frac{\frac13\ln\frac{1/3}{2/3}+\frac23\ln\frac{2/3}{2/3}}{\frac13\ln\frac14+\frac23\ln\frac12}=\frac14.$$
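Plugging the data of Example 2 into Formula (4) reproduces this value (a numerical restatement of the computation above):

```python
import math

nu    = {1: 1/3, 2: 2/3}     # target limit frequencies
mu    = {1: 2/3, 2: 2/3}     # expected numbers of surviving children
theta = {1: 1/4, 2: 1/2}     # contraction ratios of the two pieces

rho   = sum(nu[i] * math.log(nu[i] / mu[i]) for i in (1, 2))
denom = sum(nu[i] * math.log(theta[i]) for i in (1, 2))
print(rho / denom)           # 0.25 = dim_H pi(A)
```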
As far as I know, no theorem proven or published previously gives answers to the questions put in Examples 1 and 2. On the other hand, if, in the setting of Example 2, one replaces the set π(A) by the whole of π(X^∞), then dimH π(X^∞) is computed, for instance, in [4–6] and Theorem 15.2 in [14]. The answer is given there as a solution to the so-called Bowen equation. Moreover, in those papers, dimH π(X^∞) was computed in a more general setting, where the scaling factors θ(i) ∈ [0, 1) are random.
Therefore, the following problem arises naturally: how can one compute dimH A and dimH π(A) for the set A defined in Formula (5) in the case of random scaling factors θ(i)? The present work does not solve this problem.
Falconer and Miao started the investigation of random self-affine fractals [7]. They considered a random iterated function system (RIFS) generated by a finite collection of contracting affine maps Si: ℝ^n → ℝ^n and encoded by a random tree T that is defined exactly in the same manner as the random set of infinite genetic lines X^∞ above. It turned out that under the condition of non-degeneracy (T ≠ ∅) and some additional assumptions, the attractor of the RIFS with probability one has a constant dimension (Hausdorff and box-counting), which can be found as a solution to a certain equation. Recently, analogous results were obtained for random affine code tree fractals [9] and for uniformly random self-similar fractals [8]. Thereby, a natural problem arises: is it possible to compute almost sure Hausdorff dimensions for subsets of random fractals like those in [7–9], provided the subsets are defined in terms of the limit behavior of the corresponding empirical measures (say, as in Formula (5))?
One more possible direction in which to generalize the results of the paper is the case when the distribution ν, the empirical measures δx,n and/or the scaling factors θ(i) are defined not on the set of colors Ω, but on the whole space of infinite strings Ω^∞.

3. The Spectral Potential

The notion of spectral potential was introduced in [1–3]. It was defined there as the logarithm of the spectral radius of a weighted shift or transfer operator. We will use below a simplified version of the spectral potential that corresponds to an operator of weighted integration.
Let X be an arbitrary finite set. Denote by B(X) the space of all real-valued functions on X, by M(X) the set of all positive measures on X and by M1(X) the set of all probability distributions on X.
Every measure μM(X) defines a linear functional on B(X) of the form:
$$\mu[f]=\int_X f\,d\mu=\sum_{x\in X}f(x)\mu(x).$$
It is easily seen that this functional is positive (i.e., it takes nonnegative values on nonnegative functions). If, in addition, the measure μ is a probability, then this functional is normalized (i.e., μ[1] = 1).
Consider the nonlinear functional:
$$\lambda(\varphi,\mu)=\ln\mu[e^{\varphi}],$$
where φ ∈ B(X) and μ ∈ M(X). We will call it the spectral potential. It is monotone (if φ ≥ ψ, then λ(φ, μ) ≥ λ(ψ, μ)), additively homogeneous (that is, λ(φ + t, μ) = λ(φ, μ) + t for each constant t) and analytic in φ.
Define a family of probability measures μφ on X, depending on the functional parameter φB(X), by means of the formula:
$$\mu_\varphi[f]=\frac{\mu[e^{\varphi}f]}{\mu[e^{\varphi}]},\qquad f\in B(X).$$
Evidently, each measure μφ is equivalent to μ and has the density e^{φ−λ(φ,μ)} with respect to μ.
Let us compute the first two derivatives of the spectral potential with respect to the argument φ. We introduce the notation:
$$\lambda'(\varphi,\mu)[f]=\left.\frac{d\lambda(\varphi+tf,\mu)}{dt}\right|_{t=0}.$$
This is nothing else than the derivative of the spectral potential in the direction f at the point φ. An elementary computation shows that:
$$\lambda'(\varphi,\mu)[f]=\left.\frac{d\ln\mu[e^{\varphi+tf}]}{dt}\right|_{t=0}=\frac{\mu[e^{\varphi}f]}{\mu[e^{\varphi}]}=\mu_\varphi[f].$$
In other words, the derivative λ′(φ, μ) coincides with the probability measure μφ. Now, put:
$$\lambda''(\varphi,\mu)[f,g]=\left.\frac{\partial^{2}\lambda(\varphi+tf+sg,\mu)}{\partial s\,\partial t}\right|_{s,t=0}$$
and compute this derivative using the just obtained Formula (7):
$$\lambda''(\varphi,\mu)[f,g]=\left.\frac{\partial}{\partial s}\left(\frac{\mu[e^{\varphi+sg}f]}{\mu[e^{\varphi+sg}]}\right)\right|_{s=0}=\frac{\mu[e^{\varphi}fg]}{\mu[e^{\varphi}]}-\frac{\mu[e^{\varphi}f]\,\mu[e^{\varphi}g]}{(\mu[e^{\varphi}])^{2}}=\mu_\varphi[fg]-\mu_\varphi[f]\mu_\varphi[g].$$
In probability theory, the expression μφ[f] is usually called the expectation of a random variable f with respect to the probability distribution μφ, and the expression μφ[fg] − μφ[f]μφ[g] is called the covariance of random variables f and g. In particular, the second derivative:
$$\left.\frac{d^{2}\lambda(\varphi+tf,\mu)}{dt^{2}}\right|_{t=0}=\mu_\varphi[f^{2}]-\mu_\varphi[f]^{2}=\mu_\varphi[(f-\mu_\varphi[f])^{2}]$$
is equal to the variance of the random variable f with respect to the distribution μφ. Since the variance is nonnegative, it follows that the spectral potential is convex in φ.
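The identity λ′(φ, μ)[f] = μφ[f] is easy to probe numerically (an illustrative sketch; a finite difference stands in for the directional derivative, and the test data are arbitrary):

```python
import math

def spectral_potential(phi, mu):
    """lambda(phi, mu) = ln mu[e^phi] for a positive measure mu on a finite set."""
    return math.log(sum(m * math.exp(p) for p, m in zip(phi, mu)))

def tilted(phi, mu):
    """The probability measure mu_phi with density e^(phi - lambda(phi, mu))."""
    z = sum(m * math.exp(p) for p, m in zip(phi, mu))
    return [m * math.exp(p) / z for p, m in zip(phi, mu)]

phi, mu, f = [0.3, -1.0, 0.5], [0.5, 1.2, 0.8], [2.0, -1.0, 0.7]
h = 1e-6
numeric = (spectral_potential([p + h * v for p, v in zip(phi, f)], mu)
           - spectral_potential(phi, mu)) / h
exact = sum(w * v for w, v in zip(tilted(phi, mu), f))   # mu_phi[f]
print(numeric, exact)                                    # agree up to O(h)
```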

4. The Kullback Action

Denote by B*(X) the space of all linear functionals on B(X). Obviously,
$$M_1(X)\subset M(X)\subset B^*(X).$$
For any measure μM(X) and nonnegative function φB(X), define a product φμM(X) by means of the formula (φμ)(x) = φ(x)μ(x).
The Kullback action is the following functional of two arguments ν ∈ B*(X) and μ ∈ M(X):
$$\rho(\nu,\mu)=\begin{cases}\mu[\varphi\ln\varphi]=\nu[\ln\varphi],&\text{if }\nu\in M_1(X)\text{ and }\nu=\varphi\mu,\\ +\infty&\text{in all of the other cases}.\end{cases}$$
To be more precise, “all of the other cases” fit into at least one of the three categories: (1) probability measures ν, such that ν(x0) > 0 and μ(x0) = 0 for a certain x0X; (2) non-normalized functionals ν; and (3) nonpositive functionals ν.
As far as we know, in the literature, this functional was defined only for probability measures ν and μ. Different authors call it differently: the relative entropy, the deviation function, the Kullback–Leibler information function and the Kullback–Leibler divergence.
When ν is a probability measure, the Kullback action can be defined by the explicit formula:
$$\rho(\nu,\mu)=\sum_{x\in X}\nu(x)\ln\frac{\nu(x)}{\mu(x)}.$$
In particular, if μ(x) ≡ 1, then the Kullback action differs only in sign from Shannon’s entropy:
$$H(\nu)=-\sum_{x\in X}\nu(x)\ln\nu(x).$$
In the case of a probability measure μ, the Kullback action is nonnegative and vanishes only if ν = μ. Indeed, if the functional ν is not a probability measure that is absolutely continuous with respect to μ, then ρ(ν, μ) = +∞. Otherwise, if ν is a probability measure of the form ν = φμ, then Jensen’s inequality and strict convexity of the function f(x) = x ln x imply:
$$\rho(\nu,\mu)=\mu[f(\varphi)]\ \ge\ f(\mu[\varphi])=0$$
(since μ[φ] = ν[1] = 1) and the equality ρ(ν, μ) = 0 holds if and only if φ is constant almost everywhere, and therefore, ν coincides with μ.
Every measure μ ∈ M(X) can be written in the form μ = cμ1, where c = μ[1] and μ1 ∈ M1(X). If ν ∈ M1(X), then Formula (9) implies:
$$\rho(\nu,\mu)=\rho(\nu,\mu_1)-\ln c\ \ge\ -\ln\mu[1].$$
In the case ν ∉ M1(X), this inequality holds as well, since the Kullback action is infinite.
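Formula (11) can be checked numerically as well (a small sketch with a measure of our choosing):

```python
import math

def rho(nu, mu):
    return sum(n * math.log(n / m) for n, m in zip(nu, mu) if n > 0)

nu = [0.2, 0.5, 0.3]
mu = [0.9, 1.1, 0.4]                   # positive, non-normalized
c = sum(mu)                            # c = mu[1] = 2.4
mu1 = [m / c for m in mu]
print(rho(nu, mu), rho(nu, mu1) - math.log(c))   # equal
print(-math.log(c))                    # the lower bound -ln mu[1]
```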
Theorem 1. The spectral potential and the Kullback action satisfy the Young inequality:
$$\rho(\nu,\mu)\ \ge\ \nu[\psi]-\lambda(\psi,\mu),$$
which turns into an equality if and only if ν = μψ.
Proof. If ρ(ν, μ) = +∞, then the Young inequality is trivial. If ρ(ν, μ) < +∞, then by the definition of the Kullback action, the functional ν is an absolutely continuous probability measure of the form ν = φμ, where φ is a nonnegative density. In this case:
$$\lambda(\psi,\mu)=\ln\mu[e^{\psi}]\ \ge\ \ln\!\sum_{\varphi(x)>0}e^{\psi(x)}\mu(x)=\ln\!\sum_{\varphi(x)>0}e^{\psi(x)-\ln\varphi(x)}\nu(x)=\ln\nu[e^{\psi-\ln\varphi}]\ \ge\ \nu[\psi-\ln\varphi]$$
(where at the last step, we have used Jensen’s inequality and concavity of the logarithm function). Since ρ(ν, μ) = ν[ln φ], this formula implies inequality (12).
Recall that μψ = e^{ψ−λ(ψ,μ)}μ. Therefore, if ν = μψ, then by definition:
$$\rho(\nu,\mu)=\nu[\psi-\lambda(\psi,\mu)]=\nu[\psi]-\lambda(\psi,\mu).$$
Vice versa, suppose that ρ(ν, μ) = ν[ψ] − λ(ψ, μ). Then, subtracting from the above equality the Young inequality ρ(ν, μ) ≥ ν[φ] − λ(φ, μ), one obtains:
$$\lambda(\varphi,\mu)-\lambda(\psi,\mu)\ \ge\ \nu[\varphi-\psi],$$
which implies that ν = λ′(ψ, μ). Finally, λ′(ψ, μ) coincides with μψ. □
Theorem 2. The Kullback action ρ(ν, μ) is the Legendre transform of the spectral potential:
$$\rho(\nu,\mu)=\sup_{\psi\in B(X)}\{\nu[\psi]-\lambda(\psi,\mu)\},\qquad \nu\in B^*(X),\ \mu\in M(X).$$
Proof. By the Young inequality, the left-hand side of Formula (13) is not less than the right-hand one. Therefore, it is enough to associate with any functional νB(X) a family of functions ψt, depending on the real-valued parameter t, on which the equality in Formula (13) is attained.
Suppose first that ν is a probability measure that is absolutely continuous with respect to μ. Therefore, ν has the form ν = φμ, where φ is a nonnegative density. Consider the family of functions:
$$\psi_t(x)=\begin{cases}\ln\varphi(x),&\text{if }\varphi(x)>0,\\ -t,&\text{if }\varphi(x)=0.\end{cases}$$
When t → +∞, we have the following relations:
$$\mu[e^{\psi_t}]=\sum_{\varphi(x)>0}\varphi(x)\mu(x)+\sum_{\varphi(x)=0}e^{-t}\mu(x)\ \longrightarrow\ \sum_{x\in X}\varphi(x)\mu(x)=1,$$
$$\nu[\psi_t]=\sum_{\varphi(x)>0}\varphi(x)\ln\varphi(x)\,\mu(x)-\sum_{\varphi(x)=0}t\,\varphi(x)\mu(x)=\mu[\varphi\ln\varphi],$$
$$\nu[\psi_t]-\lambda(\psi_t,\mu)=\nu[\psi_t]-\ln\mu[e^{\psi_t}]\ \longrightarrow\ \mu[\varphi\ln\varphi]=\rho(\nu,\mu),$$
and so, Formula (13) is proven.
In all of the other cases, when ν is not an absolutely continuous probability measure, by definition, we have ρ(ν, μ) = +∞. Let us consider these cases one after another.
If ν is a probability measure that is singular with respect to μ, then there exists x0 ∈ X, such that μ(x0) = 0 and ν(x0) > 0. In this case, consider the family of functions:
$$\psi_t(x)=\begin{cases}t,&\text{if }x=x_0,\\ 0,&\text{if }x\neq x_0.\end{cases}$$
It is easily seen that:
$$\nu[\psi_t]-\lambda(\psi_t,\mu)=t\,\nu(x_0)-\ln\mu[e^{\psi_t}]=t\,\nu(x_0)-\ln\mu[1].$$
The right-hand side of this formula goes to +∞ as t increases, and so, Formula (13) holds.
If the functional ν is not normalized, then put ψt(x) ≡ t. In this case, the expression:
$$\nu[\psi_t]-\lambda(\psi_t,\mu)=\nu[t]-\ln\mu[e^{t}]=t(\nu[1]-1)-\ln\mu[1]$$
is unbounded from above, and hence, Formula (13) is still valid.
Finally, if the functional ν is not positive, then there exists a nonnegative function φ, such that ν[φ] < 0. Consider the family ψt = −tφ, where t > 0. For this family:
$$\nu[\psi_t]-\lambda(\psi_t,\mu)\ \ge\ -t\,\nu[\varphi]-\lambda(0,\mu)\ \longrightarrow\ +\infty$$
as t → +∞, and Formula (13) still remains in force. □
Corollary 1. The functional ρ(⋅, μ) is convex and lower semicontinuous on the space B*(X).
Proof. These are properties of the Legendre transform. □
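Theorem 2 can be probed numerically: for ν = φμ with a positive density φ, the supremum in Formula (13) is attained at ψ = ln φ, where λ(ψ, μ) = 0 (an illustrative sketch with arbitrary test data):

```python
import math, random

def lam(psi, mu):
    return math.log(sum(m * math.exp(p) for p, m in zip(psi, mu)))

nu, mu = [0.2, 0.5, 0.3], [0.9, 1.1, 0.4]
rho = sum(n * math.log(n / m) for n, m in zip(nu, mu))

# At psi* = ln(nu/mu), we have lam(psi*, mu) = ln sum(nu) = 0 and equality:
psi_star = [math.log(n / m) for n, m in zip(nu, mu)]
print(sum(n * p for n, p in zip(nu, psi_star)) - lam(psi_star, mu), rho)

# Random psi never exceed rho (the Young inequality):
rng = random.Random(1)
best = -math.inf
for _ in range(10000):
    psi = [rng.uniform(-3.0, 3.0) for _ in nu]
    best = max(best, sum(n * p for n, p in zip(nu, psi)) - lam(psi, mu))
print(best, "<=", rho)
```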

5. The Local Large Deviations Principle and the McMillan Theorem

As above, we keep the following notation: X is a finite set; B(X) stands for the space of real-valued functions on X; B*(X) is the space of linear functionals on B(X); M1(X) is the set of all probability measures on X; and M(X) is the set of all positive measures on X.
With each finite sequence x = (x1,…, xn) ∈ Xn, we associate an empirical measure δx,nM1(X), which is supported on the set {x1,…, xn} and assigns to every point xi the measure 1/n. The integral of any function f with respect to δx,n looks like:
$$\delta_{x,n}[f]=\frac{f(x_1)+\dots+f(x_n)}{n}.$$
For any measure μM(X), we denote by μn its Cartesian power, which is defined on Xn.
Theorem 3 (The local large deviations principle). For any positive measure μM(X), functional νB(X) and ε > 0, there exists a neighborhood O(ν), such that:
$$\mu^n\{x\in X^n\mid \delta_{x,n}\in O(\nu)\}\ \le\ e^{-n(\rho(\nu,\mu)-\varepsilon)}.$$
On the other hand, for any ε > 0 and any neighborhood O(ν), the following asymptotic estimate holds:
$$\mu^n\{x\in X^n\mid \delta_{x,n}\in O(\nu)\}\ \ge\ e^{-n(\rho(\nu,\mu)+\varepsilon)},\qquad n\to\infty.$$
If ρ(ν, μ) = +∞, then by the difference ρ(ν, μ) − ε in Formula (14), we mean an arbitrary positive number.
In the case of a probability measure μ, Theorem 3 is a special case of Varadhan’s large deviations principle (the explicit formulation of which can be found, e.g., in [15,16]). Therefore, this theorem can be deduced from Varadhan’s large deviations principle by means of mere renormalization of μ. Nevertheless, we will prove it independently for the sake of completeness.
Proof. By Theorem 2, for any ε > 0, there exists ψ ∈ B(X), such that:
$$\rho(\nu,\mu)-\varepsilon/2\ <\ \nu[\psi]-\lambda(\psi,\mu).$$
Consider the probability measure μψ = e^{ψ−λ(ψ,μ)}μ. Obviously,
$$\frac{\mu^n(x)}{\mu_\psi^n(x)}=\prod_{i=1}^{n}\frac{\mu(x_i)}{\mu_\psi(x_i)}=\prod_{i=1}^{n}e^{\lambda(\psi,\mu)-\psi(x_i)}=e^{n(\lambda(\psi,\mu)-\delta_{x,n}[\psi])}.$$
Define a neighborhood of the functional ν as follows:
$$O(\nu)=\{\delta\in B^*(X)\mid \delta[\psi]>\nu[\psi]-\varepsilon/2\}.$$
Then, it follows from Formulas (17) and (16) that under the condition δx,n ∈ O(ν),
$$\frac{\mu^n(x)}{\mu_\psi^n(x)}\ <\ e^{n(\lambda(\psi,\mu)-\nu[\psi]+\varepsilon/2)}\ <\ e^{-n(\rho(\nu,\mu)-\varepsilon)}.$$
Consequently,
$$\mu^n\{x\in X^n\mid\delta_{x,n}\in O(\nu)\}=\sum_{\delta_{x,n}\in O(\nu)}\mu^n(x)\ \le\ \sum_{\delta_{x,n}\in O(\nu)}e^{-n(\rho(\nu,\mu)-\varepsilon)}\mu_\psi^n(x)\ \le\ e^{-n(\rho(\nu,\mu)-\varepsilon)}.$$
Thus, the first part of Theorem 3 is proven.
If ρ(ν, μ) = +∞, then estimate (15) is trivial. Therefore, it is enough to prove it only in the case when ν is a probability measure of the form ν = φμ and the Kullback action ρ(ν, μ) = ν[ln φ] is finite. Fix any number ε > 0 and any neighborhood O(ν). Define the sets:
$$Y_n=\{x\in X^n\mid \delta_{x,n}\in O(\nu),\ |\delta_{x,n}[\ln\varphi]-\nu[\ln\varphi]|<\varepsilon/2\}$$
(the last inequality in the braces implies that φ(xi) > 0 at each point of the sequence x = (x1,…,xn)). Note that for xYn,
$$\frac{\mu^n(x)}{\nu^n(x)}=\prod_{i=1}^{n}\frac{\mu(x_i)}{\nu(x_i)}=\prod_{i=1}^{n}\frac{1}{\varphi(x_i)}=e^{-n\,\delta_{x,n}[\ln\varphi]}\ >\ e^{-n(\nu[\ln\varphi]+\varepsilon/2)}.$$
Consequently,
$$\mu^n(Y_n)=\sum_{x\in Y_n}\mu^n(x)\ \ge\ \sum_{x\in Y_n}e^{-n(\nu[\ln\varphi]+\varepsilon/2)}\nu^n(x)=e^{-n\rho(\nu,\mu)-n\varepsilon/2}\,\nu^n(Y_n).$$
By the law of large numbers, ν^n(Yn) → 1. Hence, Formula (18) implies Formula (15). □
Corollary 2 (The McMillan theorem). For any probability measure ν ∈ M1(X) and ε > 0, there exists a neighborhood O(ν), such that:
$$\#\{x=(x_1,\dots,x_n)\in X^n\mid\delta_{x,n}\in O(\nu)\}\ \le\ e^{n(H(\nu)+\varepsilon)}.$$
On the other hand, for any neighborhood O(ν) and ε > 0:
$$\#\{x\in X^n\mid\delta_{x,n}\in O(\nu)\}\ \ge\ e^{n(H(\nu)-\varepsilon)}\quad\text{as } n\to\infty.$$
Here, H(ν) denotes Shannon’s entropy defined in Formula (10).
Proof. This follows from Formulas (9) and (10) and Theorem 3 if we set μ(x) ≡ 1 for all x ∈ X. □
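For a two-letter alphabet, the McMillan count can be verified by brute force (an illustrative sketch; the "neighborhood" is a crude ±0.05 box around ν, and the agreement is up to the e^{±nε} slack of the theorem):

```python
import math
from itertools import product

n, target, eps = 16, (0.25, 0.75), 0.05
count = sum(1 for x in product((0, 1), repeat=n)
            if abs(x.count(0) / n - target[0]) <= eps)
H = -sum(t * math.log(t) for t in target)
print(count, math.exp(n * H))   # 1820 vs ~8103: equal on the log scale
```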

6. Hausdorff Dimension and the Maximal Dimension Principle

Let us define the Hausdorff dimension of an arbitrary metric space Ω.
Suppose that Ω is covered by at most a countable collection of sets U = {Ui}. Denote by |U| the diameter of this covering: |U| = sup |Ui|, where |Ui| is the diameter of Ui. For every α ∈ ℝ, put:
$$\operatorname{mes}(U,\alpha)=\sum_i |U_i|^{\alpha}.$$
The Hausdorff measure (of dimension α) of the metric space Ω is:
$$\operatorname{mes}(\Omega,\alpha)=\lim_{\varepsilon\to 0}\ \inf_{|U|\le\varepsilon}\operatorname{mes}(U,\alpha),$$
where the infimum is taken over all at most countable coverings U of Ω with |U| ≤ ε. Obviously,
$$\operatorname{mes}(U,\beta)\ \le\ \operatorname{mes}(U,\alpha)\,|U|^{\beta-\alpha}\qquad\text{if }\beta\ge\alpha.$$
This implies the basic property of the Hausdorff measure: if mes(Ω, α) < ∞ for some α, then mes(Ω, β) = 0 for all β > α.
The Hausdorff dimension of the space Ω is the number:
$$\dim_H\Omega=\inf\{\alpha\mid\operatorname{mes}(\Omega,\alpha)=0\}.$$
In other words, dimH Ω = α0 if mes(Ω, α) = 0 for all α > α0 and mes(Ω, α) = ∞ for all α < α0.
Note that, according to this definition, we obtain dimH Ω = −∞ when Ω = ∅ and dimH Ω = 0 when Ω is nonempty and finite or countable.
Below, we will consider the space of sequences:
$$X^\infty=\{x=(x_1,x_2,x_3,\dots)\},\qquad x_i\in X=\{1,\dots,r\}.$$
Let x = (x1, x2, …) ∈ X^∞. Denote by Zn(x) the set of all sequences y = (y1, y2, …) whose first n coordinates coincide with the same coordinates of x. This set will be called a cylinder of rank n. In particular, Z0(x) = X^∞. All cylinders generate the Tychonoff topology on the space X^∞ and the cylinder σ-algebra of subsets of X^∞.
On the set of all cylinders, we fix an arbitrary positive function η that possesses the following two properties: first, if Zn(x) ⊂ Zm(y), then η(Zn(x)) ≤ η(Zm(y)) and, second, η(Zn(x)) → 0 as n → ∞ for each point x ∈ X^∞. Define the cylindrical metric on X^∞ by means of the formula:
$$\operatorname{dist}(x,y)=\eta(Z_n(x)),\quad\text{where } n=\max\{m\mid Z_m(x)=Z_m(y)\}.$$
Evidently, the diameter of Zn(x) in this metric coincides with η(Zn(x)).
Suppose that, in addition to cylindrical metric (20) on X^∞, there is given a Borel measure μ on X^∞. The function:
$$d_\mu(x)=\liminf_{n\to\infty}\frac{\ln\mu(Z_n(x))}{\ln|Z_n(x)|}$$
is called the (lower) pointwise dimension of the measure μ.
The next theorem provides an effective tool for computing the Hausdorff dimensions of various subsets of X^∞.
Theorem 4. Suppose A ⊂ X^∞. If there exists a finite Borel measure μ on X^∞, such that dμ(x) ≤ d for each point x ∈ A, then dimH A ≤ d. Conversely, if dμ(x) ≥ d for each x ∈ A and the outer measure μ(A) is positive, then dimH A ≥ d.
It follows that if dμ(x) ≡ d on the whole of the subset A ⊂ X^∞, then its dimension is equal to d.
A weakened version of the second part of Theorem 4, in which the condition dμ(x) ≥ d is replaced by the stronger one μ(Zn(x)) ≤ |Zn(x)|^d, is usually called the mass distribution principle.
Theorem 4 was first proven by Billingsley in the case when the function η in Formula (20) is a probability measure on X^∞ (see Theorems 2.1 and 2.2 in [11]). An analog of this theorem for subsets A ⊂ ℝ^r was proven in [12,13].
Proof. Every cylinder Zn(x) is, in fact, a ball in metric (20), whose radius equals its diameter, and vice versa, any ball in this metric coincides with a cylinder. Besides, any two cylinders Zn(x) and Zm(y) either have an empty intersection or one of them is embedded in the other. Therefore, while computing the Hausdorff measure and the dimension of a subset A ⊂ X^∞, it is enough to operate only with disjoint coverings of A by cylinders.
Suppose first that dμ(x) < α for all points x ∈ A. Then, for each x ∈ A, there exist arbitrarily small cylinders Zn(x) satisfying the condition |Zn(x)|^α < μ(Zn(x)). Using cylinders of this kind, we can arrange a disjoint covering U of the set A of arbitrarily small diameter. For this covering, we have:
$$\operatorname{mes}(U,\alpha)=\sum_{Z_n(x)\in U}|Z_n(x)|^{\alpha}\ \le\ \sum_{Z_n(x)\in U}\mu(Z_n(x))\ \le\ \mu(X^\infty),$$
and hence, dimH A ≤ α. Thus, the first part of the theorem is proven.
Suppose now that dμ(x) > α for all points x ∈ A. Define the sets:
$$A_\varepsilon=\{x\in A\ \mid\ |Z_n(x)|^{\alpha}>\mu(Z_n(x))\ \text{whenever}\ |Z_n(x)|<\varepsilon\}.$$
Obviously, A = ∪ε>0 Aε. Hence, there exists an ε, such that μ(Aε) > 0. Let U be a disjoint covering of A by cylinders of diameters less than ε. From the definition of Aε, it follows that mes(U, α) ≥ μ(Aε). Therefore, dimH A ≥ α, and thus, the second part of the theorem is proven. □
Each point x = (x1, x2, …) ∈ X^∞ generates a sequence of empirical measures δx,n on the set X:
$$\delta_{x,n}(i)=\frac{\#\{t\mid x_t=i,\ t\le n\}}{n},\qquad i\in X.$$
In other words, δx,n(i) is the fraction of coordinates of the vector (x1,, xn) coinciding with i.
For every probability measure ν ∈ M1(X), let us define its basin B(ν) as the set of all points x ∈ X^∞, such that δx,n converges to ν:
$$B(\nu)=\{x\in X^\infty\mid \delta_{x,n}\to\nu\}.$$
Evidently, the basins of different measures do not intersect each other and are nonempty. If x ∈ B(ν) and y ∈ X^∞ differs from x in only a finite number of coordinates, then y ∈ B(ν). This implies the density of each basin in X^∞.
Every measure ν ∈ M1(X) generates the Bernoulli distribution Pν = ν^∞ on X^∞. By the strong law of large numbers, the basin B(ν) has probability one with respect to the Bernoulli distribution Pν, and its complement has zero Pν-probability. In particular, any basin different from B(ν) has zero Pν-probability.
Points that do not belong to the union of all basins will be called irregular. The set of irregular points has zero probability with respect to any distribution Pν, where ν ∈ M1(X). As a result, X^∞ turns out to be decomposed into the disjoint union of the different basins and the set of irregular points.
Let us fix some numbers θ(i) ∈ (0, 1) for all elements i ∈ X = {1,…, r} and define a cylindrical θ-metric on X^∞ by the rule:
$$\operatorname{dist}(x,y)=\prod_{t=1}^{n}\theta(x_t),\qquad n=\inf\{t\mid x_t\neq y_t\}-1.$$
It is a special case of cylindrical metric (20).
For each measure ν ∈ M1(X) and θ-metric (21), define the quantity:
$$S(\nu,\theta)=\frac{\sum_{i=1}^{r}\nu(i)\ln\nu(i)}{\sum_{i=1}^{r}\nu(i)\ln\theta(i)}.$$
We will call it the Billingsley entropy, because Billingsley was the first to write down this formula and apply it to the computation of Hausdorff dimensions [10]. He also expressed this quantity in terms of Shannon’s entropy and the Kullback action:
$$S(\nu,\theta)=\frac{H(\nu)}{H(\nu)+\rho(\nu,\theta)}.$$
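Both expressions for S(ν, θ) define the same quantity, which is easy to confirm numerically (a sketch with arbitrary data):

```python
import math

def shannon(nu):
    return -sum(v * math.log(v) for v in nu if v > 0)

def rho(nu, mu):
    return sum(n * math.log(n / m) for n, m in zip(nu, mu) if n > 0)

nu, theta = [0.2, 0.3, 0.5], [0.3, 0.5, 0.4]
S_direct = (sum(v * math.log(v) for v in nu)
            / sum(v * math.log(t) for v, t in zip(nu, theta)))
S_via_rho = shannon(nu) / (shannon(nu) + rho(nu, theta))
print(S_direct, S_via_rho)   # identical
```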
Theorem 5. The Hausdorff dimension of any basin B(ν) with respect to θ-metric (21) is equal to the Billingsley entropy S(ν, θ).
A special case of this theorem, in which θ(1) = ⋯ = θ(r) = 1/r, was first proven by Eggleston [17]. In complete form, this theorem and its generalizations were proven by Billingsley [10,11].
Proof. Assume first that ν(i) > 0 for every i = 1,, r. Obviously,
$$\frac{\ln P_\nu(Z_n(x))}{\ln|Z_n(x)|}=\frac{\sum_{t=1}^{n}\ln\nu(x_t)}{\sum_{t=1}^{n}\ln\theta(x_t)}=\frac{\sum_{i=1}^{r}n\,\delta_{x,n}(i)\ln\nu(i)}{\sum_{i=1}^{r}n\,\delta_{x,n}(i)\ln\theta(i)}.$$
Hence, for each point xB(ν), we have:
$$d_{P_\nu}(x)=\liminf_{n\to\infty}\frac{\ln P_\nu(Z_n(x))}{\ln|Z_n(x)|}=\frac{\sum_{i=1}^{r}\nu(i)\ln\nu(i)}{\sum_{i=1}^{r}\nu(i)\ln\theta(i)}=S(\nu,\theta).$$
Applying Theorem 4 to the set A = B(ν) and measure μ = Pν, we obtain the statement of Theorem 5.
In the general case, the same argument provides only the lower bound d_{Pν}(x) ≥ S(ν, θ), which implies the lower bound dimH B(ν) ≥ S(ν, θ). The reverse inequality is provided by the next lemma.
Lemma 1. Suppose that X^∞ is equipped with metric (21). Then, for any measure ν ∈ M1(X) and ε > 0, there exists a neighborhood O(ν), such that the Hausdorff dimension of the set:
$$A=\{x\in X^\infty\mid \forall N\ \exists n>N:\ \delta_{x,n}\in O(\nu)\}$$
does not exceed S(ν, θ) + ε.
Proof. Fix a measure ν ∈ M1(X) and an arbitrary positive number κ. By McMillan’s theorem, there exists a neighborhood O(ν), such that for each positive integer n:
$$\#\{Z_n(x)\mid \delta_{x,n}\in O(\nu)\}\ \le\ e^{n(H(\nu)+\kappa)}.$$
Decrease this neighborhood in such a way that, in addition to Formula (24), for every measure δ ∈ O(ν), the next inequality holds:
$$\sum_{i=1}^{r}\delta(i)\ln\theta(i)\ <\ \sum_{i=1}^{r}\nu(i)\ln\theta(i)+\kappa.$$
Then, for every cylinder Zn(x) satisfying the condition δx,n ∈ O(ν), we have the estimate:
$$|Z_n(x)|=\prod_{t=1}^{n}\theta(x_t)=\exp\Big\{\sum_{t=1}^{n}\ln\theta(x_t)\Big\}=\exp\Big\{n\sum_{i=1}^{r}\delta_{x,n}(i)\ln\theta(i)\Big\}\ <\ \exp\Big\{n\sum_{i=1}^{r}\nu(i)\ln\theta(i)+n\kappa\Big\}.$$
For any positive integer N, the set A is covered by the collection of cylinders:
$$U_N=\bigcup_{n=N}^{\infty}\{Z_n(x)\mid \delta_{x,n}\in O(\nu)\}.$$
Evidently, the diameter of this covering goes to zero when N increases. Now, we can evaluate mes(UN, α) by means of Formulas (24) and (25):
$$\operatorname{mes}(U_N,\alpha)=\sum_{Z_n(x)\in U_N}|Z_n(x)|^{\alpha}\ \le\ \sum_{n=N}^{\infty}e^{n(H(\nu)+\kappa)}\exp\Big\{\alpha n\sum_{i=1}^{r}\nu(i)\ln\theta(i)+\alpha n\kappa\Big\}=\sum_{n=N}^{\infty}\exp\Big\{n\Big(-\sum_{i=1}^{r}\nu(i)\ln\nu(i)+\alpha\sum_{i=1}^{r}\nu(i)\ln\theta(i)+\kappa+\alpha\kappa\Big)\Big\}.$$
If α > S(ν, θ), then we can choose κ > 0 so small that the exponent in braces in Formula (26) is negative, and the whole sum in Formula (26) goes to zero as N → ∞. Therefore, the Hausdorff measure (of dimension α) of the set A is zero, and hence, dimH A does not exceed α. □
We will say that a sequence of empirical measures δx,n condenses on a subset V ⊂ M1(X) (notation: δx,n ⇝ V) if it has at least one limit point in V.
By analogy with the well-known large deviations principle of Varadhan [15,16], it is natural to call the next theorem the maximal dimension principle.
Theorem 6. Let the space X^∞ be equipped with cylindrical θ-metric (21). Then, for any nonempty subset V ⊂ M1(X):
$$\dim_H\{x\in X^\infty\mid \delta_{x,n}\rightsquigarrow V\}=\sup_{\nu\in V}S(\nu,\theta).$$
Proof. The set A = {x ∈ X^∞ | δx,n ⇝ V} contains the basins of all measures ν ∈ V. Therefore, by Theorem 5, its dimension is not less than the right-hand side of Formula (27).
It is easily seen from the definition (22) of the Billingsley entropy S(ν, θ) that it depends continuously on the measure ν ∈ M1(X). Consider the closure V̄ of V. Obviously, it is compact. Fix any ε > 0. By Lemma 1, for any measure ν ∈ V̄, there exists a neighborhood O(ν), such that:
$$\dim_H\{x\in X^\infty\mid \delta_{x,n}\rightsquigarrow O(\nu)\}\ \le\ S(\nu,\theta)+\varepsilon\ \le\ \sup_{\nu\in V}S(\nu,\theta)+\varepsilon.$$
Let us take a finite covering of V̄ composed of neighborhoods of this sort. Then, the set A = {x ∈ X^∞ | δx,n ⇝ V} will be covered by a finite collection of sets of the form {x ∈ X^∞ | δx,n ⇝ O(ν)} satisfying inequality (28). By the arbitrariness of ε, this implies the statement of Theorem 6. □
A result very similar to Theorem 6 was proven by Billingsley in Theorem 7.1 in [10].

7. Branching Processes

First, let us recall the basic notions about the simplest Galton–Watson branching process.
Suppose that a random variable Z takes nonnegative integer values k ∈ ℤ+ with probabilities pk. The Galton–Watson branching process is a sequence of random variables Z0, Z1, Z2,…, such that Z0 ≡ 1, Z1 = Z, and further, each Zn+1 is defined as the sum of Zn independent copies of the random variable Z. In particular, if Zn = 0, then Zn+1 = 0, as well. Usually, Zn is understood as the total number of descendants in the n-th generation of a unique common ancestor under the condition that each descendant, independently of the others, gives birth to Z children.
It is known that in some cases, the posterity of the initial ancestor may degenerate (when starting from a certain n, all Zn are zeros), and in other cases, it can “flourish” (when Zn grows exponentially). The type of behavior of a branching process depends on the mean number of children of any individual:
$$m=\mathbb{E}Z=\sum_{k=0}^{\infty}k\,p_k$$
and on the generating function of that number:
$$f(s)=f_1(s)=\sum_{k=0}^{\infty}p_k s^{k}.$$
Obviously, the restriction of the function f(s) to the segment [0, 1] is nonnegative, nondecreasing, convex and satisfies f(1) = 1 and f′(1) = m.
In the theory of branching processes (see, for instance, [18,19]), the following statements were proven.
Theorem 7. The generating functions of the number of descendants in the n-th generation:
$$f_n(s)=\sum_{k=0}^{\infty}\mathbb{P}\{Z_n=k\}\,s^{k}$$
satisfy the recursion relation fn+1(s) = f(fn(s)).
Theorem 8. If m ≤ 1, then the branching process degenerates almost surely (except in the case when each individual gives birth to exactly one child). If m > 1, then the probability of degeneration q is less than one and coincides with the unique non-unit root of the equation f(s) = s on the segment [0, 1].
Theorem 9. If m > 1 and EZ² < ∞, then the sequence Wn = Zn/m^n converges almost surely to a random variable W, such that P{W > 0} = 1 − q. If m > 1 and EZ² = ∞, then, for any number m0 < m, with probability 1 − q:
$$\limsup_{n\to\infty}Z_n/m^{n}<\infty,\qquad \lim_{n\to\infty}Z_n/(m_0)^{n}=\infty$$
(here, q is the probability of degeneration of the branching process).
Thereby, in the case m > 1, there is an alternative for the total number of descendants Zn: either it vanishes at a certain moment n0 (with probability q < 1) or it is asymptotically equivalent to Wm^n (with the complementary probability 1 − q), where the random variable W > 0 does not depend on n (except in the case EZ² = ∞, when only the logarithmic equivalence ln Zn ∼ ln m^n is guaranteed). All of the other types of behavior of the totality of descendants have zero probability.
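Theorem 8's characterization of q translates directly into a fixed-point iteration (a sketch for an offspring law of our choosing):

```python
def extinction_probability(p, tol=1e-12):
    """Iterate q <- f(q) from q = 0; this converges to the smallest
    root of f(s) = s on [0, 1], where f(s) = sum_k p_k s^k."""
    q = 0.0
    while True:
        q_next = sum(pk * q**k for k, pk in enumerate(p))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next

p = [0.2, 0.3, 0.5]               # P{Z=0}, P{Z=1}, P{Z=2}; here m = 1.3 > 1
print(extinction_probability(p))  # 0.4, the non-unit root of f(s) = s
```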
We will exploit these theorems in the study of colored branching processes.
Suppose now that each individual may give birth to children of r different colors (or r different genders, if one likes). We will assume that the posterity of each individual in the first generation is a random set X containing a random number k1 of children of the first color, a random number k2 of children of the second color, and so on, up to a random number kr of children of color r. All of the elements of X (including elements of the same color) are treated as different. The ordered array k = (k1,…, kr) ∈ ℤ+^r will be called the color structure of the set X of children. Denote by pk the probability of the birth of a set X with color structure k = (k1, k2,…, kr). Naturally, all of the probabilities pk are nonnegative and:
$$\sum_{k\in\mathbb{Z}_+^r}p_k=1.$$
If an individual x1 gave birth to x2, then x2 gave birth to x3, and so on, up to an individual xn, then the sequence x = (x1,…, xn) will be called a genetic line of length n.
Let us construct a new branching process taking into account not only the total number of descendants, but also the color of each individual and all its upward and downward lineal relations. This process may be treated as a random genealogical tree with a common ancestor in the root and all its descendants in the vertices, where each parent is linked with all its children. In the case of a degenerating population, its genealogical tree is finite, and in the case of a “flourishing” one, the tree is infinite.
Formally, it is convenient to define such a process as a sequence of random sets Xn containing all genetic lines of length n. As the first set X1, we take X. The subsequent sets Xn are built by induction: if Xn is already known, then for each genetic line (x1,…, xn) ∈ Xn, one defines a disjoint independent random set of children X(x1,…, xn), with color structure distributed as that of X, and Xn+1 is given by:
$$X_{n+1}=\{(x_1,\dots,x_n,x_{n+1})\mid (x_1,\dots,x_n)\in X_n,\ x_{n+1}\in X(x_1,\dots,x_n)\}.$$
The stochastic process X1, X2, X3,… built in this way will be referred to as a colored branching process (or unconditional colored branching process, if one wishes to emphasize that the posterity of any individual is independent of its color and genealogy).
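An unconditional colored branching process is straightforward to simulate (an illustrative sketch; individuals of the same color within one brood are tracked with multiplicity):

```python
import random

def grow(offspring_dist, n_gens, seed=0):
    """Simulate the sets X_1, ..., X_n of genetic lines, recording each
    line as its string of colors.  offspring_dist maps a color structure
    k = (k_1, ..., k_r) to its probability p_k."""
    rng = random.Random(seed)
    structures = list(offspring_dist)
    weights = [offspring_dist[k] for k in structures]
    lines, history = [()], []          # X_0: the unique initial individual
    for _ in range(n_gens):
        nxt = []
        for line in lines:
            k = rng.choices(structures, weights)[0]
            for color, count in enumerate(k, start=1):
                nxt.extend(line + (color,) for _ in range(count))
        lines = nxt
        history.append(lines)
        if not lines:                  # degeneration
            break
    return history

# Two colors: structures (1,1) and (0,2) with equal probability,
# so mu = (0.5, 1.5) and m = 2 > 1.
history = grow({(1, 1): 0.5, (0, 2): 0.5}, 6)
print([len(X_n) for X_n in history])
```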

8. The McMillan Theorem for Colored Branching Processes

Consider a colored branching process X1, X2, X3,… with a given finite collection of colors Ω = {1,…, r} and a probability distribution {pk | k ∈ ℤ+^r}, where k = (k1,…, kr) is the color structure of each individual’s set of children X. We will always assume that X1 = X is generated by a unique initial individual.
For any genetic line x = (x1,…, xn) ∈ Xn, define the spectrum δx,n as the corresponding empirical measure on Ω given by the rule:
$$\delta_{x,n}(i)=\frac{\#\{t\mid g(x_t)=i\}}{n},\qquad i\in\Omega,$$
where g(xt) denotes the color of xt. In other words, δx,n(i) is the fraction of individuals of color i in the genetic line x. Our next goal is to obtain asymptotic estimates for the cardinalities of the random sets:
$$\{x=(x_1,\dots,x_n)\in X_n\mid \delta_{x,n}\in O(\nu)\},$$
where O(ν) is a small neighborhood of the distribution ν on the set of colors Ω.
Denote by μ(i) the expected number of members of color i in X:
$$\mu(i)=\sum_{k\in\mathbb{Z}_+^r}k_i\,p_k,\qquad i=1,\dots,r.$$
Provided all μ(i) are finite, the vector μ = (μ(1),…, μ(r)) can be regarded as a measure on the set of colors Ω. This measure generates the measure μ^n on Ω^n as a Cartesian product.
Define a mapping G: Xn → Ω^n by means of the formula:
$$G(x_1,\dots,x_n)=(g(x_1),\dots,g(x_n)),$$
where g(xt) is the color of xt.
Lemma 2. For any ω = (ω1,…, ωn) ∈ Ω^n, we have:
$$\mathbb{E}\,\#\{x\in X_n\mid G(x)=\omega\}=\prod_{t=1}^{n}\mu(\omega_t)=\mu^n(\omega).$$
Proof. Discard the last coordinate in ω, and let ω′ = (ω1,…, ωn−1). For every genetic line (x1,…, xn−1) ∈ Xn−1, by virtue of the definition of the unconditional colored branching process, we have:
$$\mathbb{E}\,\#\{x_n\in X(x_1,\dots,x_{n-1})\mid g(x_n)=\omega_n\}=\mu(\omega_n).$$
Evidently, this expression does not depend on x′ = (x1,…, xn−1). Therefore,
$$\mathbb{E}\,\#\{x\in X_n\mid G(x)=\omega\}=\mathbb{E}\,\#\{x'\in X_{n-1}\mid G(x')=\omega'\}\cdot\mu(\omega_n).$$
Repeated application of the latter equality gives Formula (31). □
For the measure μ from Formula (30), let us define the Kullback action:
$$\rho(\nu,\mu)=\sum_{i\in\Omega}\nu(i)\ln\frac{\nu(i)}{\mu(i)},\qquad \nu\in M_1(\Omega),$$
where M1(Ω) is the set of all probability measures on Ω. This formula is a copy of Formula (9).
Theorem 10. Suppose X1, X2, X3,… is an unconditional colored branching process with a finite collection of colors Ω. Then, for any ε > 0 and probability measure ν ∈ M1(Ω), there exists a neighborhood O(ν) ⊂ M1(Ω), such that for all natural n:
$$\mathbb{E}\,\#\{x\in X_n\mid \delta_{x,n}\in O(\nu)\}\ \le\ e^{n(-\rho(\nu,\mu)+\varepsilon)}.$$
On the other hand, for any ε > 0 and any neighborhood O(ν):
$$\mathbb{E}\,\#\{x\in X_n\mid \delta_{x,n}\in O(\nu)\}\ \ge\ e^{n(-\rho(\nu,\mu)-\varepsilon)}\quad\text{as } n\to\infty.$$
If ρ(ν, μ) = +∞, the expression −ρ(ν, μ) + ε in Formula (32) should be treated as an arbitrary negative real number.
Proof. It follows from Formula (29) that, for every genetic line x ∈ Xn, its spectrum δx,n coincides with the empirical measure δω,n, where ω = G(x). Therefore,
$$\mathbb{E}\,\#\{x\in X_n\mid \delta_{x,n}\in O(\nu)\}=\sum_{\omega\in\Omega^n:\ \delta_{\omega,n}\in O(\nu)}\mathbb{E}\,\#\{x\in X_n\mid G(x)=\omega\}.$$
It follows from Formulas (31) and (34) that:
$$\mathbb{E}\,\#\{x\in X_n\mid \delta_{x,n}\in O(\nu)\}=\mu^n\{\omega\in\Omega^n\mid \delta_{\omega,n}\in O(\nu)\}.$$
The latter equality converts estimates (32) and (33) into the already proven estimates (14) and (15) from the local large deviations principle. □
It is remarkable that the last reference to the large deviations principle serves as a unique “umbilical cord” linking the first three Sections of the paper with the others.
Now, we are ready to state an analog of the McMillan theorem for colored branching processes. Let q be the probability of degeneration of the process (the probability of the event that, starting from a certain number n, all of the sets Xn turn out to be empty).
Theorem 11. Suppose that X1, X2,… is an unconditional colored branching process with a finite collection of colors Ω. Then, for any ε > 0 and any probability measure ν ∈ M1(Ω), there exists a neighborhood O(ν) ⊂ M1(Ω), such that almost surely:
$$\#\{x\in X_n\mid \delta_{x,n}\in O(\nu)\}\ \le\ e^{n(-\rho(\nu,\mu)+\varepsilon)}\quad\text{as } n\to\infty.$$
On the other hand, if ρ(ν, μ) < 0, then, for any neighborhood O(ν) and positive ε, the estimate:
$$\#\{x\in X_n\mid \delta_{x,n}\in O(\nu)\}\ \ge\ e^{n(-\rho(\nu,\mu)-\varepsilon)}\quad\text{as } n\to\infty$$
holds with probability 1 − q (or almost surely under the condition that our branching process does not degenerate).
Proof. The application of Chebyshev’s inequality to Formula (32) gives:
$$\mathbb{P}\Big\{\#\{x\in X_n\mid \delta_{x,n}\in O(\nu)\}\ \ge\ e^{n(-\rho(\nu,\mu)+2\varepsilon)}\Big\}\ \le\ e^{-n\varepsilon}.$$
Summing up these inequalities over all n ≥ N, one obtains:
$$\mathbb{P}\Big\{\exists\,n\ge N:\ \#\{x\in X_n\mid \delta_{x,n}\in O(\nu)\}\ \ge\ e^{n(-\rho(\nu,\mu)+2\varepsilon)}\Big\}\ \le\ \frac{e^{-N\varepsilon}}{1-e^{-\varepsilon}}.$$
This implies Formula (35) with constant 2ε instead of ε, which does not change its sense.
Now, we proceed to the second part of the theorem. Let κ = −ρ(ν, μ) − ε, where the number ε is so small that κ > 0. By the second part of Theorem 10, for any neighborhood O(ν), there exists N, such that:
$$\mathbb{E}\,\#\{x\in X_N\mid \delta_{x,N}\in O(\nu)\}\ >\ e^{N\kappa}.$$
Without loss of generality, we may assume that O(ν) is convex.
Now, we wish to construct a Galton–Watson branching process Z1, Z2,… satisfying the conditions:
$$Z_1=\#\{x\in X_N\mid \delta_{x,N}\in O(\nu)\},$$
$$Z_n\ \le\ \#\{x\in X_{nN}\mid \delta_{x,nN}\in O(\nu)\},\qquad n=2,3,\dots$$
Let the random variable Z1 be defined by Formula (37). For n > 1, let Zn be the total number of genetic lines:
$$(x_1,\dots,x_N,\dots,x_{(n-1)N+1},\dots,x_{nN})\in X_{nN}$$
such that the spectrum of each segment (x_{kN+1},…, x_{(k+1)N}) belongs to O(ν). In other words, we treat as “individuals” of the process Z1, Z2, Z3,… those segments (x_{kN+1},…, x_{(k+1)N}) of genetic lines of the initial process whose spectrum lies in O(ν). Now, Formula (38) follows from the convexity of O(ν), since δx,nN is the arithmetical mean of the n block spectra; and from the unconditionality of the initial colored branching process, it can be concluded that the sequence Z1, Z2,… in fact forms a Galton–Watson branching process.
By construction, one has EZ1 > e^{Nκ}. In this setting, Theorem 9 asserts that there is an alternative for the sequence Zn: either it tends to zero with a certain probability q̃ < 1 or it grows faster than e^{nNκ} with probability 1 − q̃. In the second case, Formula (38) implies:
$$\#\{x\in X_{nN}\mid \delta_{x,nN}\in O(\nu)\}\ \ge\ e^{nN\kappa},\qquad n\to\infty.$$
To finish the proof, we have to do two things: verify that Formula (39) is valid with probability 1 − q and get rid of the multiplier N there. To do this, we will exploit two ideas. First, if the colored branching process X1, X2,… had been generated by m initial individuals instead of a unique one, then Formula (39) would be valid with probability at least 1 − q̃^m. Second, if one genetic line is a part of another one and the ratio of their lengths is close to one, then their spectra are close to each other, as well.
Obviously, the total number of individuals in the n-th generation of the initial branching process X1, X2, X3,… equals |Xn|. The sequence of random variables |Xn| forms a Galton–Watson branching process with probability of degeneration q, which does not exceed q̃. Therefore, the sequence |Xn| grows exponentially with probability 1 − q.
Consider the colored branching process X_{k+1}, X_{k+2}, X_{k+3},… obtained from the initial one by truncation of the first k generations. It can be thought of as a union of |Xk| independent branching processes generated by all of the individuals of the k-th generation. Therefore, it satisfies Formula (39) with probability at least 1 − q̃^{|Xk|}. Hence, for the initial process, with even greater probability, we obtain the condition:
$$\#\{x\in X_{k+nN}\mid \delta_{x,k+nN}\in O^*(\nu)\}\ \ge\ e^{nN\kappa},\qquad n\to\infty,$$
where O*(ν) is an arbitrary neighborhood of ν containing the closure of O(ν).
Suppose that the sequence |Xn| grows exponentially. For every m ∈ ℕ, we define the numbers:
$$k_i=\min\{k:\ |X_k|\ge m,\ k\equiv i\ (\mathrm{mod}\ N)\},\qquad i=0,1,\dots,N-1.$$
For each k = ki, condition (40) holds with probability at least 1 − q̃^m, and together, they give the estimate:
$$\#\{x\in X_n\mid \delta_{x,n}\in O^*(\nu)\}\ \ge\ e^{n\kappa},\qquad n\to\infty,$$
with probability at least 1 − Nq̃^m. By the arbitrariness of m, this estimate is valid almost surely (under the condition |Xn| → ∞, which takes place with probability 1 − q). It is equivalent to Formula (36). □

9. Dimensions of Random Fractals (Upper Bounds)

Here, we continue the investigation of a colored branching process X1, X2,… with a finite collection of colors Ω = {1,…, r}. Let us consider the corresponding set of infinite genetic lines:
$$X^\infty=\{x=(x_1,x_2,x_3,\dots)\mid (x_1,\dots,x_n)\in X_n\ \forall n\}.$$
Define the cylindrical θ-metric on X^∞:
$$\operatorname{dist}(x,y)=\prod_{t=1}^{n}\theta(x_t),\qquad n=\inf\{t\mid x_t\neq y_t\}-1,$$
where the numbers θ(1),…, θ(r) are taken from (0, 1).
We will be interested in the Hausdorff dimensions of both the space X^∞ and its various subsets defined in terms of the partial limits of the empirical measures on Ω (we call these measures spectra and denote them by δx,n; see Formula (29)). If the colored branching process degenerates, then X^∞ = ∅. Therefore, the only case of interest is when m = E|X1| > 1, and the cardinality of Xn increases with a rate of order m^n.
As before, we denote by μ(i), where i ∈ Ω, the expected number of individuals of color i in the random set X1. It will always be assumed that μ(i) < ∞. Consider any probability measure ν ∈ M1(Ω). It will be proven below that the dimension of the set {x ∈ X^∞ | δx,n → ν} can be computed by means of the function:
$$d(\nu,\mu,\theta)=\frac{\rho(\nu,\mu)}{\sum_{i=1}^{r}\nu(i)\ln\theta(i)}=\frac{\sum_{i=1}^{r}\nu(i)\ln\frac{\nu(i)}{\mu(i)}}{\sum_{i=1}^{r}\nu(i)\ln\theta(i)}.$$
We will call it the Billingsley–Kullback entropy.
The numerator in Formula (42) is the Kullback action, and the denominator is negative. If μ were a probability measure on Ω, the Kullback action would be nonnegative. However, in our setting, this is not the case, since m = μ(1) + ⋯ + μ(r) > 1. In particular, if μ(i) > ν(i) for all i ∈ Ω, then the Kullback action is negative, and the Billingsley–Kullback entropy is positive. Note, in addition, that if μ(1) = ⋯ = μ(r) = 1, then −ρ(ν, μ) is equal to Shannon’s entropy H(ν), and the Billingsley–Kullback entropy turns into the Billingsley entropy (22).
Lemma 3. Let the space X^∞ of infinite genetic lines be equipped with metric (41). Then, for any probability measure ν ∈ M1(Ω) and any ε > 0, there exists a neighborhood O(ν), such that the Hausdorff dimension of the set:
$$A=\{x\in X^\infty\mid \forall N\ \exists n>N:\ \delta_{x,n}\in O(\nu)\}$$
does not exceed d(ν, μ, θ) + ε almost surely.
The proof is carried out in the same manner as that of Lemma 1. Take any κ > 0. By Theorem 11, there exists a neighborhood O(ν), such that almost surely:
$$\#\{x\in X_n\mid \delta_{x,n}\in O(\nu)\}\ \le\ e^{n(-\rho(\nu,\mu)+\kappa)}\quad\text{as } n\to\infty.$$
Let us reduce this neighborhood in such a way that, in addition to Formula (43), for all δ ∈ O(ν),
$$\sum_{i=1}^{r}\delta(i)\ln\theta(i)\ <\ \sum_{i=1}^{r}\nu(i)\ln\theta(i)+\kappa.$$
Then, for each cylinder Zn(x) satisfying the condition δx,n ∈ O(ν), the following estimate holds:
$$|Z_n(x)|=\prod_{t=1}^{n}\theta(x_t)=\exp\Big\{\sum_{t=1}^{n}\ln\theta(x_t)\Big\}=\exp\Big\{n\sum_{i=1}^{r}\delta_{x,n}(i)\ln\theta(i)\Big\}\ <\ \exp\Big\{n\sum_{i=1}^{r}\nu(i)\ln\theta(i)+n\kappa\Big\}.$$
For every natural N, the set A is covered by the collection of cylinders:
$$U_N=\bigcup_{n=N}^{\infty}\{Z_n(x)\mid \delta_{x,n}\in O(\nu)\}.$$
Evidently, the diameter of this covering tends to zero as N → ∞. Hence, mes(UN, α) can be estimated by means of Formulas (43) and (44):
$$\operatorname{mes}(U_N,\alpha)=\sum_{Z_n(x)\in U_N}|Z_n(x)|^{\alpha}\ \le\ \sum_{n=N}^{\infty}e^{n(-\rho(\nu,\mu)+\kappa)}\exp\Big\{\alpha n\sum_{i=1}^{r}\nu(i)\ln\theta(i)+\alpha n\kappa\Big\}=\sum_{n=N}^{\infty}\exp\Big\{n\Big(-\sum_{i=1}^{r}\nu(i)\ln\frac{\nu(i)}{\mu(i)}+\alpha\sum_{i=1}^{r}\nu(i)\ln\theta(i)+\kappa+\alpha\kappa\Big)\Big\}.$$
If α > d(ν, μ, θ), then κ can be chosen so small that the exponent in braces is negative, and the whole sum (45) tends to zero as N → ∞. Thus, the Hausdorff measure (of dimension α) of the set A is zero, and its dimension does not exceed α. □
As before, we say that a sequence of empirical measures δx,n condenses on a subset V ⊂ M1(Ω) (notation: δx,n ⇝ V) if it has a limit point in V.
Theorem 12. Let X1, X2,… be an unconditional colored branching process with a finite set of colors Ω, and let the set X^∞ of all infinite genetic lines be equipped with a cylindrical θ-metric (41). Then, for any subset V ⊂ M1(Ω), almost surely:
$$\dim_H\{x\in X^\infty\mid \delta_{x,n}\rightsquigarrow V\}\ \le\ \sup_{\nu\in V}d(\nu,\mu,\theta).$$
In particular, dimH X^∞ ≤ s almost surely, where s is the unique root of the “Bowen equation”:
$$\sum_{i=1}^{r}\mu(i)\,\theta(i)^{s}=1.$$
Proof. It follows from the definition of the Billingsley–Kullback entropy d(ν, μ, θ) that it depends continuously on the measure ν ∈ M1(Ω). Let V̄ be the closure of V. Obviously, it is compact. Take an arbitrary ε > 0. By Lemma 3, for any measure ν ∈ V̄, there exists a neighborhood O(ν), such that almost surely:
$$\dim_H\{x\in X^\infty\mid \delta_{x,n}\rightsquigarrow O(\nu)\}\ \le\ d(\nu,\mu,\theta)+\varepsilon\ \le\ \sup_{\nu\in V}d(\nu,\mu,\theta)+\varepsilon.$$
Choose a finite covering of the set V̄ by neighborhoods of this kind. Then, the set {x ∈ X^∞ | δx,n ⇝ V} will be covered by a finite collection of sets of the form {x ∈ X^∞ | δx,n ⇝ O(ν)} satisfying Formula (48). By the arbitrariness of ε, this implies the first statement of Theorem 12.
Let s be a solution to Equation (47). Note that for any ν ∈ M1(Ω), the concavity of the logarithm function implies that:
$$s\sum_{i=1}^{r}\nu(i)\ln\theta(i)-\sum_{i=1}^{r}\nu(i)\ln\frac{\nu(i)}{\mu(i)}=\sum_{i=1}^{r}\nu(i)\ln\frac{\mu(i)\theta(i)^{s}}{\nu(i)}\ \le\ \ln\Big\{\sum_{\nu(i)>0}\mu(i)\theta(i)^{s}\Big\}\ \le\ 0.$$
Consequently, d(ν, μ, θ) ≤ s. Now, the second part of the theorem follows from the first one if we take V = M1(Ω). □
Remark. Usually, the term “Bowen equation” is used for the equation P(sφ) = 0, where P(sφ) is the topological pressure of the weight function sφ in some dynamical system (more detailed explanations can be found in [13]). If we replace the topological pressure P(sφ) by the spectral potential:
$$\lambda(s\varphi,\mu)=\ln\sum_{i=1}^{r}e^{s\varphi(i)}\mu(i),\qquad\text{where }\varphi(i)=\ln\theta(i),$$
then the Bowen equation turns into the equation λ(sφ, μ) = 0, which is equivalent to Formula (47).
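Since the left-hand side of Equation (47) strictly decreases in s, its root is easily found by bisection (a sketch assuming m = μ(1) + ⋯ + μ(r) > 1, so that a root s > 0 exists):

```python
def bowen_dimension(mu, theta, tol=1e-12):
    """Solve sum_i mu[i] * theta[i]**s = 1 by bisection; assumes
    sum(mu) > 1 and 0 < theta[i] < 1, so the LHS decreases in s."""
    F = lambda s: sum(m * t**s for m, t in zip(mu, theta))
    lo, hi = 0.0, 1.0
    while F(hi) > 1.0:
        hi *= 2.0                        # expand until F(hi) <= 1
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if F(mid) > 1.0 else (lo, mid)
    return (lo + hi) / 2.0

print(bowen_dimension([2/3, 2/3], [1/4, 1/2]))   # ~0.2812, Example 2's dust
```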

10. Block Selections of Colored Branching Processes

Let ξ1, ξ2, ξ3,… be a sequence of independent identically distributed random variables taking values of zero or one (independent Bernoulli trials).
Lemma 4. If 0 < p′ < p < 1 and P{ξi = 1} ≥ p, then:
$$\mathbb{P}\{\xi_1+\dots+\xi_k\ \ge\ p'k\}\ \to\ 1\quad\text{as } k\to\infty$$
uniformly with respect to all distributions satisfying P{ξi = 1} ≥ p.
Proof. In the case P{ξi = 1} = p, this follows from the law of large numbers. If P{ξi = 1} increases, then the probability on the left-hand side of Formula (49) increases as well. □
Consider a colored branching process X1, X2,… with a finite set of colors Ω = {1,…, r}. Each Xn consists of genetic lines (x1, x2,…, xn) of length n, in which every subsequent individual is born by the previous one. Fix a (large enough) natural N. We will split genetic lines of length divisible by N into blocks of length N:
$$(x_1,x_2,\dots,x_{nN})=(y_1,\dots,y_n),\qquad\text{where } y_k=(x_{(k-1)N+1},\dots,x_{kN}).$$
Each block yk generates an empirical measure δ_{yk} (its spectrum) on Ω by the rule:
$$\delta_{y_k}(i)=\frac{\#\{t\mid g(x_t)=i,\ (k-1)N<t\le kN\}}{N},$$
where g(xt) denotes the color of xt.
A block selection of order N from a colored branching process X1, X2,… is any sequence of random subsets Yn ⊂ XnN with the following property: if (y1,…, yn+1) ∈ Yn+1, then (y1,…, yn) ∈ Yn. In this case, the sequence of blocks (y1,…, yn+1) will be called a prolongation of the sequence (y1,…, yn).
As above (see Formula (30)), denote by μ(i) the expectation of children of color i born by each individual and by μ the corresponding measure on Ω.
Theorem 13. Let X1, X2,… be an unconditional colored branching process with a finite set of colors Ω and probability of degeneration q* < 1. If a measure ν ∈ M1(Ω) satisfies the condition ρ(ν, μ) < 0, then for any neighborhood O(ν) ⊂ M1(Ω) and any number ε > 0, with probability 1 − q*, one can extract from the branching process a block selection Y1, Y2,… of an order N, such that each sequence of blocks (y1,…, yn) ∈ Yn has at least l(N) prolongations in Yn+1, where:
$$l(N)=e^{N(-\rho(\nu,\mu)-\varepsilon)},$$
and the spectra of all blocks belong to O(ν).
Proof. Fix any numbers p and ε satisfying the conditions:
$$0<p<p+\varepsilon<1-q^*,\qquad \rho(\nu,\mu)+\varepsilon<0.$$
By the second part of Theorem 11, for all large enough N, we have:
$$\mathbb{P}\Big\{\#\{x\in X_N\mid \delta_{x,N}\in O(\nu)\}\ \ge\ e^{N(-\rho(\nu,\mu)-\varepsilon/2)}\Big\}\ \ge\ p+\varepsilon.$$
Further, we will consider finite sequences of random sets X1,…, XnN and extract from them block selections Y1,…, Yn of order N, such that the spectra of all of their blocks belong to O(ν) and each sequence of blocks (y1,…, yk) ∈ Yk has at least l(N) prolongations in Yk+1. Denote by An the event of the existence of a block selection with these properties. Define one more event A by the condition:
$$\#\{x\in X_N\mid \delta_{x,N}\in O(\nu)\}\ >\ l(N)\,e^{N\varepsilon/2}.$$
It follows from Formulas (50) and (51) that P(A) ≥ p + ε. Evidently, A ⊂ A1. Therefore, P(A1) ≥ p + ε. Now, we are going to prove by induction that P(An) ≥ p whenever the order N of our block selection is large enough. Let us perform the step of induction. Suppose that P(An) ≥ p is valid for some n. Consider the conditional probability P(An+1|A). By the definition of the events An+1 and A, it cannot be less than the probability of the following event: there are at least l(N) wins in a sequence of [l(N)e^{Nε/2}] independent Bernoulli trials with the probability of a win P(An) in each. Choosing N large enough and using Lemma 4 (with p′ = p/2 and k = [l(N)e^{Nε/2}]), one can make this probability greater than 1 − ε. Then,
$$\mathbb{P}(A_{n+1})\ \ge\ \mathbb{P}(A)\,\mathbb{P}(A_{n+1}\mid A)\ >\ (p+\varepsilon)(1-\varepsilon)\ >\ p.$$
Thus, the inequality P(An) > p is proven for all n.
This means that with probability greater than p, one can extract from the sequence X1,…, XnN a block selection Y1,…, Yn of order N, such that the spectra of all blocks belong to the neighborhood O(ν) and each sequence of blocks (y1,…, yk) ∈ Yk has at least l(N) prolongations in Yk+1.
To obtain a block selection of infinite length with the same properties, we will construct finite block selections Y1,…, Yn in the following manner. Initially, suppose that every Yk, where k ≤ n, consists of all of the sequences of blocks (y1,…, yk) ∈ XkN, such that the spectrum of each block lies in O(ν). At the first step, we exclude from Yn−1 all of the sequences of blocks having less than l(N) prolongations in Yn and then exclude from Yn all prolongations of the sequences that were excluded from Yn−1. At the second step, we exclude from Yn−2 all of the sequences of blocks having after the first step less than l(N) prolongations in the modified Yn−1, and then, exclude from Yn−1 and Yn all prolongations of the sequences that were excluded from Yn−2. Proceeding further in the same manner, after n steps, we will obtain a block selection Y1,…, Yn, such that each sequence of blocks from any Yk has at least l(N) prolongations in Yk+1. Evidently, this selection will be the maximal one among all of the selections of order N having the mentioned “l(N)-prolongations” property. Therefore, with probability at least p, all of the sets Yk are nonempty.
For every n, let us construct, as described above, the maximal block selection $Y_1^{(n)}, \ldots, Y_n^{(n)}$ possessing the mentioned “l(N)-prolongations” property. From the maximality of these selections, it follows that:
$$Y_n^{(n)} \supset Y_n^{(n+1)} \supset Y_n^{(n+2)} \supset \cdots$$
Define the sets $Y_n = \bigcap_{k \ge n} Y_n^{(k)}$. Then, with probability at least p, all of them are nonempty and compose an infinite block selection as required in Theorem 13. Since p may be chosen arbitrarily close to 1 − q*, such selections exist with probability 1 − q*.
Theorem 13 can be strengthened by taking several measures in place of a single measure ν ∈ M1(Ω).
Theorem 14. Let X1, X2, X3,… be an unconditional colored branching process with a finite set of colors Ω and probability of degeneration q* < 1. If a finite collection of measures νi ∈ M1(Ω), where i = 1,…, k, satisfies the inequalities ρ(νi, μ) < 0, then for any neighborhoods O(νi) ⊂ M1(Ω) and any ε > 0, with probability 1 − q*, one can extract from the branching process a block selection Y1, Y2,… of an order N, such that for every i = 1,…, k, each sequence of blocks (y1,…, yn) ∈ Yn has at least:
$$l_i(N) = e^{N(-\rho(\nu_i,\mu)-\varepsilon)}$$
prolongations (y1,…, yn, y) ∈ Yn+1 with the property δy ∈ O(νi).
Theorem 14 can be proven in the same manner as the previous one. Here, the event An should be understood as the existence of a finite block selection Y1,…, Yn satisfying the conclusion of Theorem 14, and the event A should be defined by the system of inequalities:
$$\#\{x \in X_N \mid \delta_{x,N} \in O(\nu_i)\} \ >\ l_i(N)\,e^{N\varepsilon/2}, \qquad i = 1, \ldots, k.$$
We leave the details to the reader.

11. Dimensions of Random Fractals (Lower Bounds)

Now, we continue the investigation of the space of infinite genetic lines:
$$X_\infty = \{x = (x_1, x_2, x_3, \ldots) \mid (x_1, \ldots, x_n) \in X_n \ \text{for all } n\},$$
which is generated by an unconditional colored branching process X1, X2,… with a finite set of colors Ω = {1,…, r}. It is assumed that there is a measure:
$$\mu = (\mu(1), \ldots, \mu(r))$$
on Ω, where μ(i) denotes the expectation of the number of children of color i born by each individual, and X∞ is equipped with the cylindrical θ-metric (41).
Theorem 15. Let X1, X2, X3,… be an unconditional colored branching process with a finite set of colors Ω and probability of degeneration q* < 1. If a measure ν ∈ M1(Ω) satisfies the condition d(ν, μ, θ) > 0, then with probability 1 − q*, for any neighborhood O(ν), we have the lower bound:
$$\dim_H \{x \in X_\infty \mid \exists\,N\ \forall\,n > N:\ \delta_{x,n} \in O(\nu)\} \ \ge\ d(\nu, \mu, \theta).$$
Proof. Let us fix any number α < d(ν, μ, θ) and ε > 0 so small that:
$$d(\nu,\mu,\theta) = \frac{\rho(\nu,\mu)}{\sum_{i=1}^r \nu(i)\ln\theta(i)} \ >\ \frac{\rho(\nu,\mu) + 2\varepsilon}{\sum_{i=1}^r \nu(i)\ln\theta(i) - \varepsilon} \ >\ \alpha.$$
Then, we choose a convex neighborhood O*(ν) whose closure lies in O(ν) and such that, for any measure δ ∈ O*(ν):
$$\sum_{i=1}^r \delta(i)\ln\theta(i) \ >\ \sum_{i=1}^r \nu(i)\ln\theta(i) - \varepsilon.$$
By Theorem 13, with probability 1 − q*, one can extract from the branching process under consideration a block selection Y1, Y2,… of an order N, such that any sequence of blocks (y1,…, yn) from Yn has at least l(N) prolongations in Yn+1, where:
$$l(N) = e^{N(-\rho(\nu,\mu)-\varepsilon)},$$
and for each block $y_k = (x_{(k-1)N+1}, \ldots, x_{kN})$, the corresponding empirical measure δyk (its spectrum) belongs to O*(ν). Let us exclude from this selection a certain part of the genetic lines in such a way that each of the remaining sequences of blocks (y1,…, yn) ∈ Yn has exactly [l(N)] prolongations in Yn+1.
Define the random set:
$$Y = \{y = (y_1, y_2, \ldots) \mid (y_1, \ldots, y_n) \in Y_n,\ n = 1, 2, \ldots\}.$$
Any sequence y = (y1, y2,…) ∈ Y consists of blocks of length N. Writing down in order the elements of all of these blocks, we obtain from y an infinite genetic line x = (x1, x2,…) ∈ X∞; denote it by π(y). By the definition of Y, the spectrum of each block yk belongs to O*(ν). For every point x = π(y), where y ∈ Y, the empirical measure δx,nN is the arithmetic mean $\delta_{x,nN} = (\delta_{y_1} + \cdots + \delta_{y_n})/n$ of the empirical measures corresponding to the first n blocks of y, and since O*(ν) is convex, it belongs to O*(ν) as well. It follows that:
$$\pi(Y) \ \subset\ \{x \in X_\infty \mid \exists\,N\ \forall\,n > N:\ \delta_{x,n} \in O(\nu)\}.$$
The family of all cylinders of the form ZnN(x), where x ∈ π(Y), generates a σ-algebra on π(Y). Define a probability measure P on this σ-algebra such that:
$$P(Z_{nN}(x)) = [l(N)]^{-n}.$$
Then, for all large enough N, all x ∈ π(Y) and all natural n:
$$P(Z_{nN}(x)) \ \le\ e^{nN(\rho(\nu,\mu)+2\varepsilon)},$$
since $[l(N)] \ge l(N) - 1 = e^{N(-\rho(\nu,\mu)-\varepsilon)} - 1 \ge e^{N(-\rho(\nu,\mu)-2\varepsilon)}$ for all large enough N.
On the other hand, by Formula (54):
$$|Z_{nN}(x)| = \prod_{t=1}^{nN}\theta(x_t) = \exp\Bigl\{\sum_{t=1}^{nN}\ln\theta(x_t)\Bigr\} = \exp\Bigl\{nN\sum_{i=1}^r \delta_{x,nN}(i)\ln\theta(i)\Bigr\} \ \ge\ \exp\Bigl\{nN\Bigl(\sum_{i=1}^r \nu(i)\ln\theta(i) - \varepsilon\Bigr)\Bigr\}.$$
It follows from the last two formulas and Formula (53) that:
$$|Z_{nN}(x)|^{\alpha} \ \ge\ \exp\Bigl\{nN\alpha\Bigl(\sum_{i=1}^r \nu(i)\ln\theta(i) - \varepsilon\Bigr)\Bigr\} \ \ge\ e^{nN(\rho(\nu,\mu)+2\varepsilon)} \ \ge\ P(Z_{nN}(x)).$$
Now, we are ready to compute the Hausdorff measure of dimension α of the set π(Y). If, while computing the Hausdorff measure, we used coverings of π(Y) not by arbitrary cylinders, but only by cylinders of orders divisible by N, then the last formula would imply that such a measure is not less than P(π(Y)) = 1. Any cylinder can be embedded in a cylinder of order divisible by N, such that the difference of their orders is less than N and the ratio of their diameters is greater than $\min_i \theta(i)^N$. Therefore,
$$\operatorname{mes}(\pi(Y), \alpha) \ \ge\ \min_i \theta(i)^{N\alpha},$$
and hence, dimH π(Y) ≥ α.
The set defined in the right-hand side of Formula (55) contains π(Y). Thus, its dimension is at least α. Recall that we have proven this fact by means of a block selection that exists with probability 1 − q*. By the arbitrariness of α < d(ν, μ, θ), this implies the desired bound (52) with the same probability.
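We remark that the covering argument above is essentially the mass distribution principle (see, e.g., [14]), in the following informal form: if a probability measure P satisfies

$$P(U) \ \le\ C\,|U|^{\alpha}$$

for every set U from a family sufficient for computing the Hausdorff measure of a set E, then mes(E, α) ≥ P(E)/C and, hence, dimH E ≥ α whenever P(E) > 0. In the proof above, E = π(Y) and $C = \min_i \theta(i)^{-N\alpha}$.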
Theorem 16. Let s be the root of the “Bowen equation”:
$$\sum_{i=1}^r \mu(i)\,\theta(i)^s \ =\ 1.$$
If s ≤ 0, then X∞ = ∅ almost surely. Otherwise, if s > 0, then X∞ is nonempty with a positive probability, and with the same probability, its dimension equals s.
We note that a more general version of Theorem 16, in which the similarity coefficients θ(i) are random, is proven in [4–6,14].
Proof. The expectation of the total number of children born by each individual in the branching process generating the set X∞ is equal to m = μ(1) + ⋯ + μ(r). If s ≤ 0, then m ≤ 1: indeed, since every θ(i) < 1, we have θ(i)^s ≥ 1 and, hence, $m \le \sum_i \mu(i)\theta(i)^s = 1$. In this case, by Theorem 8, our branching process degenerates almost surely, and X∞ = ∅.
If s > 0, then, by the same argument, m > 1. In this case, by Theorem 8, our branching process does not degenerate with a positive probability, and X∞ is nonempty. Define a measure ν ∈ M1(Ω) by means of the equality:
$$\nu(i) = \mu(i)\,\theta(i)^s, \qquad i \in \Omega.$$
The Bowen equation guarantees that ν is indeed a probability measure. Then, evidently, d(ν, μ, θ) = s, since $\rho(\nu,\mu) = \sum_i \nu(i)\ln\theta(i)^s = s\sum_i \nu(i)\ln\theta(i)$. By the previous theorem, dimH X∞ ≥ s with the same probability with which X∞ ≠ ∅. On the other hand, by Theorem 12, the reverse inequality holds almost surely.
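A minimal numeric illustration may be helpful here; it is ours, with toy values of μ and θ and the helper name bowen_root as assumptions. It finds the root s of the Bowen equation by bisection (the left-hand side strictly decreases in s when every θ(i) < 1, so the root is unique) and checks numerically that ν(i) = μ(i)θ(i)^s is a probability measure with d(ν, μ, θ) = s.

```python
import math

def bowen_root(mu, theta, tol=1e-12):
    """Solve the Bowen equation  sum_i mu[i] * theta[i]**s = 1  by bisection."""
    f = lambda s: sum(m * t ** s for m, t in zip(mu, theta)) - 1.0
    lo, hi = -50.0, 50.0  # a bracket that is wide enough for toy data
    assert f(lo) > 0.0 > f(hi), "adjust the bracket"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)

# Toy data: on average 1.2 children of color 1 and 0.9 children of color 2.
mu, theta = [1.2, 0.9], [0.5, 1 / 3]
s = bowen_root(mu, theta)

# nu(i) = mu(i) * theta(i)**s should be a probability measure whose
# Billingsley-Kullback entropy d(nu, mu, theta) equals s.
nu = [m * t ** s for m, t in zip(mu, theta)]
rho = sum(n * math.log(n / m) for n, m in zip(nu, mu))
d = rho / sum(n * math.log(t) for n, t in zip(nu, theta))
print(f"s = {s:.6f},  sum(nu) = {sum(nu):.6f},  d(nu, mu, theta) = {d:.6f}")
```

Since m = 2.1 > 1 in this example, the root s is positive, and the printed values confirm that Σi ν(i) = 1 and d(ν, μ, θ) = s up to the tolerance of the bisection.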
For every probability measure ν ∈ M1(Ω), we define its basin B(ν) ⊂ X∞ as the set of all infinite genetic lines x = (x1, x2, x3,…), such that the corresponding sequence of empirical measures δx,n converges to ν. What is the dimension of B(ν)? By Theorem 12, it does not exceed the Billingsley–Kullback entropy d(ν, μ, θ) with probability one. On the other hand, the reverse inequality does not follow from the previous results (and, in particular, from Theorem 15). To obtain it, we ought to enhance the machinery of block selections.
Lemma 5. Let $Q_1, \ldots, Q_{2^r}$ be the vertices of a cube in $\mathbb{R}^r$. Then, there exists a law of choice $i\colon \mathbb{R}^r \to \{1, \ldots, 2^r\}$, such that if the neighborhoods O(Qi) are small enough, and the sequences:
$$\delta_n \qquad \text{and} \qquad \Delta_n = \frac{\delta_1 + \cdots + \delta_n}{n}$$
satisfy the conditions $\delta_{n+1} \in O(Q_{i(\Delta_n)})$ and $\delta_1 \in O(Q_1) \cup \cdots \cup O(Q_{2^r})$, then the sequence Δn converges to the center of the cube.
Proof. First, we consider the case r = 1, when the cube is a segment. Let, for definiteness, Q1 = −1 and Q2 = 1. Set:
$$i(\Delta) = \begin{cases} 1, & \text{if } \Delta \ge 0, \\ 2, & \text{if } \Delta < 0. \end{cases}$$
Take any neighborhoods O(Q1) and O(Q2) with radii at most one. Then, for any sequence δn satisfying the conditions $\delta_{n+1} \in O(Q_{i(\Delta_n)})$ and |δ1| < 2, we have the estimate |Δn| < 2/n. It may be easily proven by induction: if |nΔn| < 2, then δn+1 lies between 0 and ±2 on the side opposite to the sign of Δn, whence |(n + 1)Δn+1| = |nΔn + δn+1| < 2. Thus, in the one-dimensional case, the lemma is proven. To prove it in the multidimensional case, one should choose a coordinate system with the origin at the center of the cube and the axes parallel to its edges and apply the law of choice (56) to each of the coordinates independently.
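The one-dimensional mechanism is easy to observe numerically. The following sketch is ours (in particular, modeling the neighborhoods O(Q1), O(Q2) by uniform perturbations of radius 1/2 is an assumption): it drives the running mean Δn to the center 0 of the segment [−1, 1] and checks the estimate |Δn| < 2/n at every step.

```python
import random

def law_of_choice(delta_mean):
    # Formula (56) with Q1 = -1 and Q2 = 1: aim at the vertex lying on
    # the side opposite to the sign of the current mean Delta_n.
    return -1.0 if delta_mean >= 0.0 else 1.0

random.seed(0)
N = 10_000
total = mean = 0.0
for n in range(1, N + 1):
    # delta_1 may lie near either vertex; afterwards the law of choice applies
    target = random.choice([-1.0, 1.0]) if n == 1 else law_of_choice(mean)
    delta = target + random.uniform(-0.5, 0.5)  # a point of the chosen neighborhood
    total += delta
    mean = total / n  # Delta_n = (delta_1 + ... + delta_n) / n
    assert abs(mean) < 2.0 / n  # the estimate |Delta_n| < 2/n from the proof
print(f"Delta_N = {mean:+.6f} after N = {N} steps; bound 2/N = {2.0 / N:.6f}")
```

The assertion never fails, in agreement with the inductive estimate, and the final mean lies within 2/N of the center of the segment.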
Theorem 17. Let X1, X2, X3,… be an unconditional colored branching process with a finite set of colors Ω and probability of degeneration q* < 1. If a measure ν ∈ M1(Ω) satisfies the condition d(ν, μ, θ) > 0, then with probability 1 − q*:
$$\dim_H B(\nu) = d(\nu, \mu, \theta).$$
Proof. Fix any number α < d(ν, μ, θ) and ε > 0 so small that:
$$d(\nu,\mu,\theta) = \frac{\rho(\nu,\mu)}{\sum_{i=1}^r \nu(i)\ln\theta(i)} \ >\ \frac{\rho(\nu,\mu) + 3\varepsilon}{\sum_{i=1}^r \nu(i)\ln\theta(i) - \varepsilon} \ >\ \alpha.$$
The set M1(Ω) is, in fact, a simplex of dimension r − 1, where r = |Ω|. Suppose first that ν is an inner point of this simplex (in other words, ν(i) > 0 for all i ∈ Ω). Take a small convex neighborhood O(ν) ⊂ M1(Ω), such that for any measure δ ∈ O(ν):
$$\rho(\delta, \mu) < \rho(\nu, \mu) + \varepsilon, \qquad \sum_{i=1}^r \delta(i)\ln\theta(i) > \sum_{i=1}^r \nu(i)\ln\theta(i) - \varepsilon.$$
Let $Q_1, \ldots, Q_{2^{r-1}}$ be the vertices of some cube in O(ν) with the center at ν. Define for them small neighborhoods O(Qi) ⊂ O(ν) as in Lemma 5. By Theorem 14, with probability 1 − q*, one can extract from the colored branching process X1, X2,… a block selection Y1, Y2,… of an order N, such that for every $i \le 2^{r-1}$, each sequence of blocks (y1,…, yn) ∈ Yn has at least:
$$l(N) = e^{N(-\rho(\nu,\mu)-2\varepsilon)}$$
prolongations (y1,…, yn, y) ∈ Yn+1 possessing the property δy ∈ O(Qi).
We exclude from this block selection a certain part of the genetic lines, so that each of the remaining sequences of blocks (y1,…, yn) ∈ Yn has exactly [l(N)] prolongations (y1,…, yn, y) ∈ Yn+1, and all of these prolongations satisfy the law of choice from Lemma 5, namely,
$$\delta_y \in O(Q_{i(\Delta_n)}), \qquad \text{where } \Delta_n = \frac{\delta_{y_1} + \cdots + \delta_{y_n}}{n}.$$
Let π(Y) be the set of all infinite genetic lines (x1, x2,…) ∈ X∞ for which every initial segment of length nN, being partitioned into blocks of length N, turns into an element of Yn. Then, by Lemma 5, we have the inclusion π(Y) ⊂ B(ν).
Reproducing the reasoning from the proof of Theorem 15, one can verify that the dimension of π(Y) is at least α. Since α can be taken arbitrarily close to d(ν, μ, θ), we obtain the lower bound dimH B(ν) ≥ d(ν, μ, θ). The reverse inequality, as was mentioned above, follows from Theorem 12. Thus, in the case of an inner point ν ∈ M1(Ω), the theorem is proven.
If the measure ν belongs to the boundary of the simplex M1(Ω), then one should exclude from Ω all elements i with ν(i) = 0 and consider the set:
$$\Omega_0 = \{i \in \Omega \mid \nu(i) > 0\}.$$
Let us exclude from the branching process X1, X2,… all of the genetic lines containing elements of colors not in Ω0 and denote by $X'_1, X'_2, \ldots$ the resulting branching process (with the set of colors Ω0). The corresponding set of infinite genetic lines $X'_\infty$ is contained in X∞. It follows from the definition of the Billingsley–Kullback entropy d(ν, μ, θ) that it is the same for the sets of colors Ω and Ω0. Besides, the measure ν lies in the interior of the simplex M1(Ω0). Therefore, $\dim_H B(\nu) \cap X'_\infty = d(\nu, \mu, \theta)$ with the same probability as the probability of the event $X'_\infty \neq \emptyset$.
The theorem would be completely proven if the probability of the event $X'_\infty \neq \emptyset$ were equal to 1 − q*. However, it may be less than 1 − q*. This obstacle can be overcome as follows.
Let $m' = \sum_{i \in \Omega_0} \mu(i)$. This is nothing else than the expectation of the number of children of each individual in the branching process $X'_1, X'_2, \ldots$ If m′ ≤ 1, then Formula (11) implies the inequality ρ(ν, μ) ≥ 0, which contradicts the condition d(ν, μ, θ) > 0 of our theorem. Therefore, m′ > 1, and, respectively, the probability of the event $X'_\infty = \emptyset$ is strictly less than one. Let us denote it by q′.
If the branching process $X'_1, X'_2, \ldots$ were generated not by a unique initial element, but by k initial elements, then the probability of the event $X'_\infty = \emptyset$ would be equal to (q′)^k. Recall that the cardinality of Xn grows exponentially with probability 1 − q*. If this is the case, one can first wait for the event |Xn| ≥ k and then consider separately |Xn| independent counterparts of the branching process $X'_1, X'_2, \ldots$ generated by different elements of Xn. This trick allows one to obtain the bound dimH B(ν) ≥ d(ν, μ, θ) with conditional probability at least 1 − (q′)^k under the condition |Xn| → ∞. Since k is arbitrary, the above-mentioned conditional probability is, in fact, equal to one, and the total probability cannot be less than 1 − q*.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Bakhtin, V.I. Positive processes. Ergod. Theor. Dyn. Syst. 2007, 27, 639–670.
2. Antonevich, A.B.; Bakhtin, V.I.; Lebedev, A.V. On t-entropy and variational principle for the spectral radius of transfer and weighted shift operators. Ergod. Theor. Dyn. Syst. 2011, 31, 995–1042.
3. Antonevich, A.B.; Bakhtin, V.I.; Lebedev, A.V. A road to the spectral radius of transfer operators. Dyn. Syst. Group Actions 2012, 567, 17–51.
4. Falconer, K.J. Random fractals. Math. Proc. Camb. Philos. Soc. 1986, 100, 559–582.
5. Mauldin, R.D.; Williams, S.C. Random recursive constructions: Asymptotic geometric and topological properties. Trans. Am. Math. Soc. 1986, 295, 325–346.
6. Graf, S. Statistically self-similar fractals. Probab. Theory Relat. Fields 1987, 74, 357–392.
7. Falconer, K.; Miao, J. Random subsets of self-affine fractals. Mathematika 2010, 56, 61–76.
8. Koivusalo, H. Dimension of uniformly random self-similar fractals. Real Anal. Exch. 2014, 39, 73–90.
9. Järvenpää, E.; Järvenpää, M.; Käenmäki, A.; Koivusalo, H.; Stenflo, Ö.; Suomala, V. Dimensions of random affine code tree fractals. Ergod. Theor. Dyn. Syst. 2014, 34, 854–875.
10. Billingsley, P. Hausdorff dimension in probability theory. Ill. J. Math. 1960, 4, 187–209.
11. Billingsley, P. Hausdorff dimension in probability theory II. Ill. J. Math. 1961, 5, 291–298.
12. Young, L.-S. Dimension, entropy and Lyapunov exponents. Ergod. Theor. Dyn. Syst. 1982, 2, 109–124.
13. Pesin, Y.B. Dimension Theory in Dynamical Systems: Contemporary Views and Applications; Chicago University Press: Chicago, IL, USA, 1998.
14. Falconer, K. Fractal Geometry: Mathematical Foundations and Applications, 2nd ed.; Wiley: Chichester, UK, 2003.
15. Deuschel, J.-D.; Stroock, D.W. Large Deviations: Pure and Applied Mathematics; Academic Press: Boston, MA, USA, 1989; Volume 137.
16. Varadhan, S.R.S. Large deviations. Ann. Probab. 2008, 36, 397–419.
17. Eggleston, H.G. The fractional dimension of a set defined by decimal properties. Q. J. Math. 1949, 20, 31–36.
18. Athreya, K.B.; Ney, P.E. Branching Processes; Springer: Berlin/Heidelberg, Germany, 1972.
19. Harris, T.E. The Theory of Branching Processes; Springer: Berlin/Göttingen/Heidelberg, Germany, 1963.
