Article

Entropy-Variance Curves of Binary Sequences Generated by Random Substitutions of Constant Length

by Juan Carlos Nuño *,† and Francisco J. Muñoz †
Department of Applied Mathematics, Universidad Politécnica de Madrid, 28040 Madrid, Spain
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Entropy 2022, 24(2), 290; https://doi.org/10.3390/e24020290
Submission received: 17 January 2022 / Revised: 10 February 2022 / Accepted: 15 February 2022 / Published: 18 February 2022
(This article belongs to the Section Complexity)

Abstract:
We study some properties of binary sequences generated by random substitutions of constant length. Specifically, assuming the alphabet $\{0,1\}$, we consider the following asymmetric substitution rule of length $k$: $0 \to 0, 0, \ldots, 0$ and $1 \to Y_1, Y_2, \ldots, Y_k$, where $Y_i$ is a Bernoulli random variable with parameter $p \in [0,1]$. We obtain by recurrence the discrete probability distribution of the stochastic variable that counts the number of ones in the sequence formed after $i$ substitutions (iterations). We derive its first two statistical moments, mean and variance, as well as the entropy of the generated sequences, as functions of the substitution length $k$ for each iteration $i$, and characterize the values of $p$ where the maxima of these measures occur. Finally, we obtain the parametric entropy-variance curves for each iteration and substitution length. We find two regimes of dependence between these two variables that, to our knowledge, have not been previously described. Moreover, these curves allow us to compare sequences with the same entropy but different variance, and vice versa.

1. Introduction

Binary sequences (strings or chains) appear naturally in physical systems for describing growing processes generated from an initial state. Examples are ubiquitous, from spin up-down systems in physics to codification in information theory and its reflection in biomolecules such as DNA and RNA (where two sets of nucleotides exist: purines and pyrimidines). They are also common objects in mathematics, representing, for example, sets of words formed from the letters of an alphabet, or symbolic representations of dynamical systems, where a mapping can be defined to transform real trajectories into two-state series [1]. Binary sequences can also codify fractal sets obtained by an iterative process, as occurs in the classical Cantor set [2,3].
In order to study the properties of systems with fractal geometry, Mandelbrot defined some random sets that are generated recursively from a given initial set as a percolation process [2,4]. These sets are formed by successive application of a defined set of rules that, either divide the initial set and discard some subsets in each partition, or enlarge the initial set by substituting the existing subsets by multiple replicas of themselves. These sets can be embedded in 1D, 2D and 3D spaces, conditioning some of their characteristics, in particular, percolation.
In this paper, we study the properties of the sequences formed by performing random substitutions of constant length (see, for instance, [5]). Specifically, we consider a binary alphabet, which we denote as $\{0, 1\}$, and substitutions of length $k = 2, 3, \ldots$. As in the classical percolation process, we assume that, once a void site is formed, i.e., a 0 occurs at this site, every subsequent substitution of it yields $k$ zeros. On the other hand, filled sites, i.e., sites with a 1, are substituted by random words of $k$ letters with a (uniform) probability $p$ of inserting a 1. In this substitution, the probability of inserting a 0 is $q = 1 - p$.
The process defined in this way gives rise to sequences of increasing size. The number of ones (filled sites) in the sequence at iteration $i$ is a stochastic variable that depends on the probability $p$ and the length $k$. We study the probabilistic properties of these sequences. Specifically, we derive the probability distribution of the number of ones at iteration $i$. We also obtain the expected values of $X$ and $X^2$ and the corresponding variance $VAR$, together with some of their properties. As an alternative measure of uncertainty, we calculate the mean entropy of these sequences and compare it with their variance [6,7]. Finally, we find the parametric curves that relate both measures for each iteration and substitution length. These curves present two regimes: for low values of $p$ the entropy-variance dependence is concave (concave down), whereas for large values of $p$ this dependence is practically lost. Therefore, for each iteration and substitution length, we can generate sequences with the same entropy but with different variance, and vice versa.

2. Sequence Generation

The generation of the sequences studied in this work is a one-dimensional example of Mandelbrot's well-known percolation process [4]. Contrary to the classical formulation, where an initial segment of length $L$ is subdivided into $k$ subsegments of length $L/k$ and, with probability $p$, some of them are chosen to be further divided, the model we study here considers an initial digit, denoted by 1, that is substituted by a word of length $k$ formed by digits $\{0, 1\}$. As in the classical model, we assume that digits denoted by 0 are substituted by a word of $k$ zeros (see Figure 1), whereas sites occupied by ones are substituted by a word of $k$ letters, each being 1 with independent probability $p$. This procedure is applied successively to generate a binary sequence of finite length after $i$ iterations or, in the limit, of infinite length. When $k = 3$, this procedure can be viewed as a probabilistic version of the Cantor set for $0 < p < 1$ [2].
These procedures are called substitutions, as they are obtained by replacing a digit by a word of digits [1,8,9]. In this context, we study the following map that applies at each random substitution of length k:
$$0 \to 0\,0\cdots0 \qquad\qquad 1 \to Y_1\,Y_2\cdots Y_k$$
where $Y_j$ is a Bernoulli random variable with parameter $p \in [0,1]$; that is, $p$ is the (uniform) probability of inserting a 1 at each position, independently for all $j$. We note that an alternative description of this substitution process can be given using the Kronecker product [10,11,12,13].
The deterministic version of this procedure brings about well known sequences. For the classical Cantor set, this map reduces to:
$$0 \to 000 \qquad\qquad 1 \to 101$$
which, starting from a unique 1, yields the sequences:
$$s_1 = (1, 0, 1)$$
$$s_2 = (1, 0, 1, 0, 0, 0, 1, 0, 1)$$
$$s_3 = (1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1)$$
The limit of this iterative process is the infinite sequence that corresponds to the classical Cantor set. For this particular case, the length of the $i$-th vector is $L_i = 3^i$. In general, for substitutions of constant length $k$, the $i$-th iterated vector has length $L_i = k^i$.
There are other well-known sequences generated iteratively by substitutions of different lengths. For instance, the Fibonacci sequence is generated by applying the following map [14,15]:
$$0 \to 1 \qquad\qquad 1 \to 10$$
Starting from 0, after six iterations we obtain the binary vector $(1, 0, 1, 1, 0, 1, 0, 1)$. Here, the number of ones in the $i$-th sequence represents the fertile population of rabbits; thus, the number of mature pairs of rabbits after six generations is 5. Notice that, in principle, the position in the sequence has no spatial meaning. It is also worth remarking that this sequence can be generated by different maps [14].
Another self-similar sequence generated by a recursive process is the so-called Morse-Thue (MT), which is obtained by the following map:
$$0 \to 01 \qquad\qquad 1 \to 10$$
starting from 0 (although it could also be initiated by 1) [14,15]. For this example too, there is an alternative way of generating the sequence: simply take the sequence of the preceding step and append its complement. Moreover, this sequence is aperiodic, although completely deterministic, and it can be proven that it exhibits short- and long-range correlations [14].
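These deterministic substitutions can be sketched with a generic iterator (the helper name `iterate` is ours, for illustration only); it reproduces the Cantor, Fibonacci and Morse-Thue sequences quoted above:

```python
# Generic iterator for the deterministic substitutions discussed above
# (the helper name `iterate` is illustrative, not from the paper).
def iterate(rules, seed, steps):
    s = seed
    for _ in range(steps):
        s = "".join(rules[c] for c in s)
    return s

cantor = {"0": "000", "1": "101"}
assert iterate(cantor, "1", 2) == "101000101"      # s_2 of the Cantor map

fibonacci = {"0": "1", "1": "10"}
# with this counting convention, the 8-digit Fibonacci vector (1,0,1,1,0,1,0,1)
# appears after five applications of the map starting from "0"
assert iterate(fibonacci, "0", 5) == "10110101"

thue_morse = {"0": "01", "1": "10"}
# each Morse-Thue iterate is the previous one followed by its complement
s = iterate(thue_morse, "0", 4)
comp = s.translate(str.maketrans("01", "10"))
assert iterate(thue_morse, "0", 5) == s + comp
```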

3. Main Statistical Properties

In this section, we study the statistical properties of the sequences generated by applying rule (1) recurrently, starting from the initial sequence. A classical issue to be considered is the probability distribution of each of the letters of the alphabet. In particular, in the case of binary sequences, this problem reduces to knowing the probability distribution of, for instance, the number of ones in any population of sequences generated from a set of random rules, e.g., random substitutions.
As a starting point, the classical Bernoulli process is illustrative. This process can also be viewed as a random substitution of constant length $k$, where both letters, 0 and 1, are identically substituted according to the rule:
$$0 \to Z_1, Z_2, \ldots, Z_k \qquad\qquad 1 \to Y_1, Y_2, \ldots, Y_k$$
where $Z_j$ and $Y_j$ are Bernoulli random variables with parameter $p \in [0,1]$. At each iteration $i$, the length of the sequence is $n = k^i$, and the probability distribution of the number of ones, $X$, is given by the binomial distribution:
$$P(X = x) = \binom{n}{x}\,p^x\,(1-p)^{n-x}$$
By applying this distribution function, the expected values of the number of ones in a sequence of length $n$ and of its square, $E_B(X_n)$ and $E_B(X_n^2)$ respectively, can be calculated:
$$E_B(X_n) = np$$
$$E_B(X_n^2) = np\,\big(1 + (n-1)p\big)$$
for $n = 1, 2, 3, \ldots$. Using these expressions, the variance of $X$ reads:
$$VAR_B(X_n) = E_B(X_n^2) - \big(E_B(X_n)\big)^2 = np(1-p)$$
Note that, for the binomial distribution, the variance-to-mean ratio equals $1-p$ for any $n$.
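As a quick sanity check, these binomial moments can be estimated by simulation (a sketch; the seed and sample sizes are arbitrary illustrative choices):

```python
# Sanity check (sketch) of the binomial moments by simulation;
# the seed and sample sizes are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(seed=1)
n, p, trials = 64, 0.3, 200_000

X = rng.binomial(n, p, size=trials)   # number of ones in each Bernoulli sequence

mean_th = n * p                       # E_B(X_n) = np
var_th = n * p * (1 - p)              # VAR_B(X_n) = np(1 - p)

assert abs(X.mean() - mean_th) < 0.1
assert abs(X.var() - var_th) < 0.5
```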
Similarly, we can compute these moments for the sequences generated from random substitutions defined in the previous section. In this case, a recurrent formula provides this distribution for any iteration i and for any value of the substitution length k.
Let us present first the results for a substitution of length k = 2 . As stated before, we start the generation of the binary sequences from a sequence with a unique one. Thus, the probability of having this initial sequence is assumed to be 1:
$$P(X_0 = 1) = 1$$
In the first step, a sequence of length two is formed whose distribution of ones depends on the probability $p$ (see Figure 2). To obtain the probability distribution of ones in this first step, $P(X_1)$, we multiply $P(X_0)$ from the left by the substitution matrix:
$$M_{3\times1} = \begin{pmatrix}\binom{2}{0}(1-p)^2\\[2pt] \binom{2}{1}p(1-p)\\[2pt] \binom{2}{2}p^2\end{pmatrix}$$
giving rise to the vector:
$$P(X_1) = \begin{pmatrix}P_1(X=0)\\ P_1(X=1)\\ P_1(X=2)\end{pmatrix} = M_{3\times1}\,P(X_0) = M_{3\times1}$$
where $X_1 = (0, 1, 2)$.
In the next substitution, a four-digit sequence is formed, and the probability distribution of ones can be computed from the previous one as follows:
$$P(X_2) = \begin{pmatrix}P(X=0)\\ P(X=1)\\ P(X=2)\\ P(X=3)\\ P(X=4)\end{pmatrix} = \begin{pmatrix}1 & \binom{2}{0}(1-p)^2 & \binom{4}{0}(1-p)^4\\ 0 & \binom{2}{1}p(1-p) & \binom{4}{1}p(1-p)^3\\ 0 & \binom{2}{2}p^2 & \binom{4}{2}p^2(1-p)^2\\ 0 & 0 & \binom{4}{3}p^3(1-p)\\ 0 & 0 & \binom{4}{4}p^4\end{pmatrix}\begin{pmatrix}P(X=0)\\ P(X=1)\\ P(X=2)\end{pmatrix} = M_{5\times3}\,P(X_1)$$
where $X_2 = (0, 1, 2, 3, 4)$. The third iteration confirms the formula (see Figure 2):
$$P(X_3) = M_{9\times5}\,P(X_2)$$
with
$$P(X_3) = \big(P(X=0),\ P(X=1),\ \ldots,\ P(X=8)\big)^T$$
where $X_3 = (0, 1, 2, \ldots, 8)$, and
$$M_{9\times5} = \begin{pmatrix}
1 & \binom{2}{0}(1-p)^2 & \binom{4}{0}(1-p)^4 & \binom{6}{0}(1-p)^6 & \binom{8}{0}(1-p)^8\\
0 & \binom{2}{1}p(1-p) & \binom{4}{1}p(1-p)^3 & \binom{6}{1}p(1-p)^5 & \binom{8}{1}p(1-p)^7\\
0 & \binom{2}{2}p^2 & \binom{4}{2}p^2(1-p)^2 & \binom{6}{2}p^2(1-p)^4 & \binom{8}{2}p^2(1-p)^6\\
0 & 0 & \binom{4}{3}p^3(1-p) & \binom{6}{3}p^3(1-p)^3 & \binom{8}{3}p^3(1-p)^5\\
0 & 0 & \binom{4}{4}p^4 & \binom{6}{4}p^4(1-p)^2 & \binom{8}{4}p^4(1-p)^4\\
0 & 0 & 0 & \binom{6}{5}p^5(1-p) & \binom{8}{5}p^5(1-p)^3\\
0 & 0 & 0 & \binom{6}{6}p^6 & \binom{8}{6}p^6(1-p)^2\\
0 & 0 & 0 & 0 & \binom{8}{7}p^7(1-p)\\
0 & 0 & 0 & 0 & \binom{8}{8}p^8
\end{pmatrix}$$
The general expression of the recurrent formula for the probability distribution of the number of ones at the $i$-th iteration is:
$$P(X_i) = M_{(2^i+1)\times(2^{i-1}+1)}\,P(X_{i-1})$$
where the vector $X_i = (0, 1, 2, \ldots, 2^i)$ and the coefficients of the general substitution matrix $M_{(2^i+1)\times(2^{i-1}+1)}$ are given by:
$$m_{ab}^i = \begin{cases}\binom{2(b-1)}{a-1}\,p^{a-1}(1-p)^{2(b-1)-(a-1)} & \text{if } a \le 2b-1\\[2pt] 0 & \text{otherwise}\end{cases}$$
for $i = 2, 3, \ldots$, with $1 \le a \le 2^i+1$ and $1 \le b \le 2^{i-1}+1$. For $i = 1$, the coefficients of the matrix $M_{3\times1}$ (12) are given by:
$$m_{a1}^1 = \binom{2}{a-1}\,p^{a-1}(1-p)^{2-(a-1)} \qquad \text{for } 1 \le a \le 3$$
By recurrence, Formula (18) can be expanded into a chain of matrix products:
$$P(X_i) = M_{(2^i+1)\times(2^{i-1}+1)}\,M_{(2^{i-1}+1)\times(2^{i-2}+1)}\cdots M_{9\times5}\,M_{5\times3}\,P(X_1)$$
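This matrix recurrence can be sketched numerically for $k = 2$ (the helper names `substitution_matrix` and `distribution` are ours, not from the paper); the mean of the resulting distribution matches the closed form $E(X_i) = (2p)^i$ obtained in the next subsection:

```python
# Sketch of the matrix recurrence for k = 2 (illustrative helper names).
import numpy as np
from math import comb

def substitution_matrix(i, p):
    """M maps P(X_{i-1}) (length 2^(i-1)+1) onto P(X_i) (length 2^i+1):
    column b holds the Binomial(2b, p) distribution of the ones produced
    by a sequence that contained b ones before the substitution."""
    M = np.zeros((2**i + 1, 2**(i - 1) + 1))
    for b in range(M.shape[1]):
        for a in range(2 * b + 1):
            M[a, b] = comb(2 * b, a) * p**a * (1 - p)**(2 * b - a)
    return M

def distribution(i, p):
    """P(X_i) for k = 2, obtained by chaining the substitution matrices."""
    P = np.array([0.0, 1.0])          # P(X_0 = 0) = 0, P(X_0 = 1) = 1
    for step in range(1, i + 1):
        P = substitution_matrix(step, p) @ P
    return P

P5 = distribution(5, 0.7)
assert abs(P5.sum() - 1.0) < 1e-9             # a probability vector
mean = np.arange(len(P5)) @ P5
assert abs(mean - (2 * 0.7)**5) < 1e-9        # E(X_i) = (2p)^i
```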
Figure 3 depicts the probability distributions for different values of $p$ at iteration $i = 7$. Figure 4 shows both the analytical (black continuous curves) and the corresponding numerical distributions obtained from 1000 realizations. As can be seen, the distributions exhibit a peak at $X = 0$ whose height decreases as $p$ increases. Besides this peak, the distributions also present other maxima whose heights depend on $p$, being smaller for lower values of $p$. This multimodality could be a consequence of the generation rule, which fosters the emergence of clusters of scalable sizes. As a matter of fact, the fractal properties of these sequences are the main subject of a forthcoming paper.
The probability of generating a null sequence, i.e., one formed entirely of zeros, after $i$ substitutions, $P(X_i = 0)$, corresponds to the first component of $P(X_i)$. The following recurrence formula can be obtained [16]:
$$P(X_{i+1} = 0) = \big(P(X_i = 0)\,p + (1-p)\big)^2$$
for $i = 1, 2, \ldots$. Note that $p\,P(X_i \neq 0)$ represents the probability that a single inserted digit is a 1 that goes on to yield a non-null sequence; hence, $1 - p\,P(X_i \neq 0)$ represents the probability of yielding a null sequence from a substitution of length 1. This expression can be rewritten as:
$$1 - p\,P(X_i \neq 0) = 1 - p\,\big(1 - P(X_i = 0)\big) = 1 - p + p\,P(X_i = 0)$$
Because the substitution has length $k = 2$ and is formed by independent digits, this expression must be squared, resulting in Equation (22).
The first term in this sequence is:
$$P(X_1 = 0) = (1-p)^2$$
which yields the second one:
$$P(X_2 = 0) = \big((1-p)^2\,p + (1-p)\big)^2$$
It is worth remarking that the expansions of $P(X_i = 0)$ as polynomials in $p$ and $q = 1-p$ have as coefficients the triangle, read by rows, of OEIS A202019 [17].
Assuming that this sequence of probabilities converges to $\phi$, the following equation holds:
$$\phi = \big(\phi\,p + (1-p)\big)^2$$
whose lowest solution:
$$\phi(p) = \min\left\{1,\ \left(\frac{1-p}{p}\right)^2\right\}$$
depends on $p$ as depicted in Figure 5. Note that, for $p$ below $p_c = \frac{1}{2}$, the probability of generating a null infinite sequence is one, i.e., $\phi(p) = 1$ for $p < p_c$.
Equation (22) can be generalized straightforwardly to any substitution length $k$ using the same argument as above (see [16]). Now, the probability of having a null sequence at iteration $i+1$ is obtained after multiplying $k$ times the factor $1 - p\,P(X_i \neq 0)$:
$$P(X_{i+1} = 0) = \big(1 - p\,P(X_i \neq 0)\big)^k = \big(P(X_i = 0)\,p + (1-p)\big)^k$$
As before, in the limit this equation converges to:
$$\phi = \big(\phi\,p + (1-p)\big)^k$$
which also exhibits a critical value of $p$:
$$p_c(k) = \frac{1}{k}$$
As can be observed in Figure 5, as the substitution length $k$ increases, the probability of generating a null sequence decreases for $p > p_c(k)$; so does the range of $p$ where this probability is one, i.e., $p_c(k) \to 0$ as $k \to \infty$.
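The convergence of the null-sequence probability to this fixed point can be illustrated numerically (a sketch; the function name and iteration count are our choices):

```python
# Iterating the recurrence P_{i+1} = (p*P_i + 1 - p)**k (illustrative sketch).
def null_prob_limit(p, k, iters=10_000):
    P = (1 - p)**k                    # P(X_1 = 0)
    for _ in range(iters):
        P = (p * P + 1 - p)**k
    return P

k = 2
# below p_c(k) = 1/k the null probability converges to 1 (sure extinction)
assert abs(null_prob_limit(0.4, k) - 1.0) < 1e-9
# above p_c it converges to the nontrivial fixed point, ((1-p)/p)^2 for k = 2
assert abs(null_prob_limit(0.8, k) - ((1 - 0.8) / 0.8)**2) < 1e-9
```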

Expected Values and Variance

Having the probability distribution of the number of ones, it is straightforward to compute its expected value after $i$ iterations:
$$E(X) = \sum_{j=0}^{2^i} j\,P_i(X = j)$$
This equation can be written in matrix form:
$$E(X_i) = X_i \cdot P(X_i)$$
Now, by applying Formula (18):
$$E(X_i) = X_i\,M_{(2^i+1)\times(2^{i-1}+1)}\,P(X_{i-1})$$
It turns out that
$$X_i\,M_{(2^i+1)\times(2^{i-1}+1)} = \mathbf{E}_B(X_i)$$
where $\mathbf{E}_B(X_i)$ denotes the vector whose entries are the expectations of binomial random variables $Y_j \sim Bin(2j, p)$ for $j = 0, 1, 2, \ldots, 2^{i-1}$. Thus:
$$\mathbf{E}_B(X_i) = 2p\,X_{i-1} = (0,\ 2p,\ 4p,\ \ldots,\ 2^i p)$$
Then, the following recurrence formula holds:
$$E(X_{i+1}) = 2p\,E(X_i)$$
for $i = 0, 1, 2, \ldots$, with $E(X_1) = 2p$. This recurrence yields the expression:
$$E(X_i) = \mu_i(p, 2) = (2p)^i$$
for $i = 0, 1, 2, \ldots$.
This formula can be straightforwardly generalized to any length k:
$$E(X_i) = \mu_i(p, k) = (kp)^i \qquad i = 0, 1, 2, \ldots$$
The mean number of zeros in the sequence formed after $i$ substitutions of length $k$ is:
$$\mu_i^{0}(p, k) = k^i - (kp)^i = k^i\,(1 - p^i)$$
which yields the ratio $\#1/\#0$ at the $i$-th iteration:
$$\mu_i^{10}(p) = \frac{(kp)^i}{k^i(1-p^i)} = \frac{p^i}{1-p^i}$$
This ratio tends to 0 as $i \to \infty$ for all $0 \le p < 1$ and for all $k$. Furthermore, the limit of $\mu_i^{10}(p)$ as $p \to 1$ is infinite for all values of $i = 1, 2, \ldots$. Note that this ratio equals one for $p_{\mu=1} = 2^{-1/i}$, independently of $k$.
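Since zeros never regenerate ones, the whole process can be simulated by tracking only the count of ones: each 1 contributes Binomial($k$, $p$) ones to the next iteration. A Monte Carlo sketch (illustrative names) checking $E(X_i) = (kp)^i$:

```python
# Monte Carlo sketch of the substitution process (illustrative names).
# Zeros never produce ones, so only the count of ones must be tracked:
# each 1 spawns k independent Bernoulli(p) digits in the next iteration.
import numpy as np

rng = np.random.default_rng(seed=2)

def ones_after(i, p, k, trials):
    ones = np.ones(trials, dtype=np.int64)   # start from the single digit 1
    for _ in range(i):
        ones = rng.binomial(k * ones, p)
    return ones

X = ones_after(4, 0.6, 3, 100_000)
assert abs(X.mean() - (3 * 0.6)**4) < 0.2    # E(X_i) = (kp)^i = 1.8^4
```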
Similarly, the expected value of $X^2$, $E(X^2)$, can be calculated. First, for the case $k = 2$ we obtain:
$$E(X_i^2) = \sum_{j=0}^{2^i} j^2\,P(X = j)$$
For the first iteration, it is straightforward to calculate this value from the definition:
$$E(X_1^2) = (0,\ 1,\ 4)\begin{pmatrix}(1-p)^2\\ 2p(1-p)\\ p^2\end{pmatrix} = 2p\,(1+p)$$
Using the recurrence of the probability distributions for successive iterations, the expected value at the next iteration is also computed:
$$E(X_2^2) = (0,\ 1,\ 4,\ 9,\ 16)\begin{pmatrix}1 & \binom{2}{0}(1-p)^2 & \binom{4}{0}(1-p)^4\\ 0 & \binom{2}{1}p(1-p) & \binom{4}{1}p(1-p)^3\\ 0 & \binom{2}{2}p^2 & \binom{4}{2}p^2(1-p)^2\\ 0 & 0 & \binom{4}{3}p^3(1-p)\\ 0 & 0 & \binom{4}{4}p^4\end{pmatrix}\begin{pmatrix}(1-p)^2\\ 2p(1-p)\\ p^2\end{pmatrix}$$
which simplifies to:
$$E(X_2^2) = \big(0,\ 2p(1+p),\ 4p(1+3p)\big)\begin{pmatrix}(1-p)^2\\ 2p(1-p)\\ p^2\end{pmatrix} = 4p^2\,(2p^2 + p + 1)$$
Here we have used the expected value of X 2 for the binomial distribution for the number of ones in a sequence of length n.
To calculate the expected value of $X^2$ at successive iterations, we find the following recursive formula:
$$E(X_i^2) = 2p(1-p)\,E(X_{i-1}) + (2p)^2\,E(X_{i-1}^2)$$
which can be derived using Equation (8) for the expected value of $X^2$. Effectively,
$$E(X_i^2) = X_i^2\,M_{(2^i+1)\times(2^{i-1}+1)}\,P(X_{i-1}) = \mathbf{E}_B(X_i^2)\cdot P(X_{i-1})$$
where
$$X_i^2 = \big(0,\ 1,\ 2^2,\ 3^2,\ \ldots,\ (2^i)^2\big)$$
As before, $\mathbf{E}_B(X_i^2)$ represents the vector with entries $E(Y_j^2)$, where $Y_j \sim Bin(2j, p)$ for $j = 0, 1, 2, \ldots, 2^{i-1}$, and is consequently given by:
$$\mathbf{E}_B(X_i^2) = \big(0,\ 2p(1+p),\ 2^2p(1+3p),\ \ldots,\ 2^i p\,(1 + (2^i - 1)p)\big)$$
Then, since this vector can be split into two vectors:
$$\mathbf{E}_B(X_i^2) = 2p(1-p)\,\big(0,\ 1,\ 2,\ \ldots,\ 2^{i-1}\big) + (2p)^2\,\big(0,\ 1,\ 2^2,\ \ldots,\ (2^{i-1})^2\big)$$
Equation (45) holds.
It can be proven that the expected value of $X^2$ at the $i$-th iteration depends on $p$ as follows:
$$E(X_i^2) = 2^{i-1} p^i \left(1 + \sum_{j=0}^{i} (2p)^j\right)$$
for $i = 1, 2, 3, \ldots$. Replacing the geometric sum:
$$E(X_i^2) = \frac{1}{2}\,(2p)^i\left(1 + \frac{1 - (2p)^{i+1}}{1 - 2p}\right)$$
This expression can be applied to calculate the variance of the distribution of the number of ones in a sequence generated after $i$ iterations, i.e., $VAR(X_i) = E(X_i^2) - (E(X_i))^2$:
$$VAR(X_i) = \frac{1}{2}\,(2p)^i\left(1 + \frac{1 - (2p)^{i+1}}{1 - 2p}\right) - (2p)^{2i}$$
which simplifies to:
$$VAR(X_i) = \frac{1-p}{1-2p}\,(2p)^i\,\big(1 - (2p)^i\big)$$
This expression can be further compacted using the expression (37) for the mean:
$$VAR(X_i) = \frac{1-p}{1-2p}\,\mu_i(p)\,\big(1 - \mu_i(p)\big)$$
These computations can be equally performed for any length $k = 3, 4, \ldots$. A general formulation for any value of $k$ yields the following functions of $p$ for the expected values of $X$ and $X^2$ at the $i$-th iteration:
$$E(X_i)(p, k) = (kp)^i$$
$$E(X_i^2)(p, k) = (kp)^i\left(1 + (k-1)\,p\,\frac{1 - (kp)^i}{1 - kp}\right)$$
Then, the variance is straightforwardly calculated and yields:
$$VAR(X_i) = \frac{1-p}{1-kp}\,(kp)^i\,\big(1 - (kp)^i\big)$$
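As a consistency check, this closed-form variance can be compared with the exact moment recursions $E(X_i) = kp\,E(X_{i-1})$ and $E(X_i^2) = kp(1-p)\,E(X_{i-1}) + (kp)^2\,E(X_{i-1}^2)$, the $k$-generalization of the $k = 2$ recursion above (our derivation, sketched with illustrative names):

```python
# Consistency sketch: closed-form variance vs. the exact moment recursions
# E(X_i) = kp*E(X_{i-1}) and
# E(X_i^2) = kp(1-p)*E(X_{i-1}) + (kp)^2*E(X_{i-1}^2).
def var_closed(i, p, k):
    a = (k * p)**i
    return (1 - p) / (1 - k * p) * a * (1 - a)

def var_recursive(i, p, k):
    m, m2 = 1.0, 1.0                  # E(X_0) = E(X_0^2) = 1 (single digit 1)
    for _ in range(i):
        m2 = k * p * (1 - p) * m + (k * p)**2 * m2
        m = k * p * m
    return m2 - m * m

for (i, p, k) in [(3, 0.3, 2), (5, 0.7, 2), (4, 0.45, 3)]:
    assert abs(var_closed(i, p, k) - var_recursive(i, p, k)) < 1e-9
```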
The limits as $i$ tends to infinity of the mean $\mu$ and the variance $VAR$ depend on the value of $p$: for $0 < p < 1/k$, both limits equal 0, whereas for $1/k < p < 1$ they are infinite.
Contrary to the Bernoulli sequences, the variance reaches a maximum at an intermediate value of $p$ that depends on both $k$ and $i$ (see Figure 6).
An interesting index that could provide information about the stochasticity of the generation process is the variance-mean ratio, also known as the dispersion index, D. For iteration i, it can be written as:
$$D_i(p, k) = \frac{VAR(X_i)}{E(X_i)} = \frac{1-p}{1-kp}\,\big(1 - (kp)^i\big)$$
By comparison with the Poisson distribution, which has a dispersion value equal to 1, a process with a $D$-value lower than 1 is said to be "under-dispersed"; on the contrary, a $D$-value larger than 1 reflects "over-dispersed" stochasticity. Since $D_i(0, k) = 1$, the derivative of $D_i(p, k)$ is positive at $p = 0$ for all $i > 1$, and $D_i(1, k) = 0$, there exists a value $p_1 = p(k, i)$ such that $D_i(p_1, k) = 1$. This value of $p$ is unique because $D_i(p, k)$ exhibits a unique maximum in $(0, 1)$. Moreover, it is very close to 1 for all $k$ and $i$ and tends to 1 as $i$ tends to infinity, meaning that under-dispersion only occurs when the probability of inserting a 0 is very low.
As observed in Figure 6, the standard deviation of $X$ exhibits a maximum at $p = p_m$. This maximum exists for any $k$ and $i$ and is achieved at the value of $p$ such that:
$$\frac{d\,VAR(X_i)}{dp} = 0$$
which can be written as follows:
$$k^i p^{i-1}\left(i + \sum_{j=1}^{i-1}(i+j)\,k^{j-1}p^j - i\,k^i p^i\right) = 0$$
which implies that:
$$\sum_{j=1}^{i-1}(i+j)\,k^{j-1}p^j = i\,\big((kp)^i - 1\big)$$
for any $k$ and for each iteration $i$. Evaluating the finite sum on the left-hand side of this equation yields:
$$\frac{(1+i)\,kp - i\,k^2p^2 - 2i\,k^ip^i + (2i-1)\,k^{i+1}p^{i+1}}{k\,(1-kp)^2} = i\,\big((kp)^i - 1\big)$$
This equation cannot be solved explicitly for $p$ for arbitrary $k$ and $i$. Nonetheless, the roots of the polynomial:
$$r_{k,i}(p) = i + \sum_{j=1}^{i-1}(i+j)\,k^{j-1}p^j - i\,k^ip^i$$
can be obtained numerically. For each substitution length $k = 1, 2, \ldots$, we are able to fit the roots of the polynomial $r_{k,i}(p)$ to the following function of $i$:
$$r_k(i) = 1 - \frac{1 - \alpha(k)}{\alpha(k) + (i-1)^{\beta(k)}}$$
The values of the fitting parameters $\alpha(k)$ and $\beta(k)$, as well as the corresponding residual sum of squares, are shown in Table 1. As can be observed, as the $k$-value increases, it seems that $\alpha(k) \to 0.5$ and $\beta(k) \to 1$, that is:
$$r_k(i) \approx r(i) = 1 - \frac{1}{2i}$$
This suggests that the roots of the polynomials, i.e., the values of $p$ where the variance of $X$ achieves its maximum, tend to 1 as $i$ tends to infinity. In addition, the variance of $X$ at the maximum tends to infinity although, by definition, this variance is null at $p = 1$. We do not have a rigorous explanation for the near-independence of the roots of the polynomial $r_{k,i}(p)$ from $k$; it is likely related to the self-similar properties of the generation process.
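The location of the variance maximum can also be found by a direct grid search on the closed-form $VAR(X_i)$, avoiding the polynomial roots altogether (a sketch with illustrative names):

```python
# Grid-search sketch for the p that maximizes VAR(X_i) (illustrative names);
# it works directly on the closed-form variance instead of r_{k,i}.
import numpy as np

def var_xi(p, i, k):
    a = (k * p)**i
    return (1 - p) / (1 - k * p) * a * (1 - a)

p = np.linspace(0.001, 0.999, 100_000)
for k in (2, 3, 5):
    p_max = p[np.argmax(var_xi(p, 5, k))]
    # the maximizing p is close to 1 - 1/(2i) = 0.9 and nearly independent of k
    assert abs(p_max - 0.9) < 0.05
```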

4. Entropy

In this section, we study the mean entropy of a population of sequences generated from the substitution rules defined in the previous section. As an illustration, we first consider the entropy of the Bernoulli process with probability $p$:
$$H(p) = -p\,\log_2(p) - (1-p)\,\log_2(1-p)$$
This function is symmetric and exhibits a maximum at $p = \frac{1}{2}$. If we now consider sequences of $n$ stochastic variables that follow a Bernoulli process, we can compute the entropy of a sequence of length $n$ with $j$ ones in terms of the frequency $\frac{j}{n}$, which coincides, on average, with $p$:
$$H(j) = -\frac{j}{n}\,\log_2\left(\frac{j}{n}\right) - \left(1 - \frac{j}{n}\right)\log_2\left(1 - \frac{j}{n}\right)$$
Obviously, the sequences with minimum information content, j n = p = 0 and j n = p = 1 , have null entropy, i.e., H ( 0 ) = H ( 1 ) = 0 . In contrast, the maximum entropy is achieved for j = n 2 and takes the value: H ( n 2 ) = 1 .
For a population of Bernoulli sequences with probability $p$, the mean entropy can be calculated by applying the binomial distribution to the entropies of the sequences generated with probability $p$:
$$H_n^B(p) = \mathbf{H}\cdot P(p) = \sum_{k=0}^{n} H(k)\binom{n}{k}p^k(1-p)^{n-k}$$
where $\mathbf{H}$ is the row entropy vector for a sequence of length $n$, whose components $H(k)$ are the corresponding entropies of Equation (67) for all possible numbers of ones. For instance, for $n = 1$, this formula reduces to:
$$H_1^B(p) = \binom{2}{0}p^0(1-p)^2\,H(0) + \binom{2}{1}p^1(1-p)^1\,H(1) + \binom{2}{2}p^2(1-p)^0\,H(2) = 2p(1-p)$$
where $H(k)$ are the entropies of sequences with 0, 1 and 2 ones: $H(0) = 0$, $H(1) = \log_2(2) = 1$ and $H(2) = 0$. Similarly, the mean entropy for length $n = 2$ can be calculated as follows:
$$H_2^B(p) = \sum_{k=0}^{3}\binom{3}{k}p^k(1-p)^{3-k}\,H_2(k) = 3p(1-p)\left(\log_2(3) - \frac{2}{3}\right)$$
To compute the mean entropy of a population of sequences randomly generated by substitutions according to the rules defined in Section 2, we have to know the probability of appearance of each of the frequencies of ones in order to weight the sequence entropies adequately. As an example, let us start with the substitution length $k = 2$. In this case, the probability distributions of ones at the $i$-th iteration are given by Equation (18), and this yields the mean entropy:
$$H_i(p) = \mathbf{H}_i \cdot P(X_i)$$
Here, the components of the entropy vector $\mathbf{H}_i$ are given by:
$$H_i(j) = -\frac{j}{2^i}\,\log_2\left(\frac{j}{2^i}\right) - \left(1 - \frac{j}{2^i}\right)\log_2\left(1 - \frac{j}{2^i}\right)$$
for a sequence of length $n = 2^i$ with a number of ones $j$ such that $0 \le j \le 2^i$.
Using the matrix expression for $P$ (Equation (21)), this equation reduces to:
$$H_i(p) = \mathbf{H}_i\,M_{(2^i+1)\times(2^{i-1}+1)}\,M_{(2^{i-1}+1)\times(2^{i-2}+1)}\cdots M_{5\times3}\,M_{3\times1}\,P(X_0) = \mathbf{H}_i\,M_{(2^i+1)\times1}$$
since $P(X_0) = 1$.
The entropy at the first iteration, i.e., after substituting the initial 1 by applying rule (1) with $k = 2$, is given by:
$$H_1(p) = \left(0,\ -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2},\ 0\right)\begin{pmatrix}\binom{2}{0}(1-p)^2\\[2pt] \binom{2}{1}p(1-p)\\[2pt] \binom{2}{2}p^2\end{pmatrix}$$
which yields:
$$H_1(p) = 2p(1-p)$$
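A numerical sketch of the mean entropy $H_i(p) = \mathbf{H}_i \cdot P(X_i)$ for $k = 2$, reusing the matrix recurrence of Section 3 (the helper names are ours); the first iteration reproduces the closed form $H_1(p) = 2p(1-p)$:

```python
# Mean entropy of the substitution sequences for k = 2 (illustrative sketch).
import numpy as np
from math import comb

def distribution(i, p):
    """P(X_i) built by chaining the substitution matrices of Section 3."""
    P = np.array([0.0, 1.0])                    # P(X_0 = 1) = 1
    for s in range(1, i + 1):
        M = np.zeros((2**s + 1, 2**(s - 1) + 1))
        for b in range(M.shape[1]):
            for a in range(2 * b + 1):
                M[a, b] = comb(2 * b, a) * p**a * (1 - p)**(2 * b - a)
        P = M @ P
    return P

def mean_entropy(i, p):
    n = 2**i
    f = np.arange(n + 1) / n                    # frequencies j / 2^i
    with np.errstate(divide="ignore", invalid="ignore"):
        H = -f * np.log2(f) - (1 - f) * np.log2(1 - f)
    H = np.nan_to_num(H)                        # H(0) = H(n) = 0 by convention
    return H @ distribution(i, p)

# the first iteration reproduces the closed form H_1(p) = 2p(1-p)
for p in (0.2, 0.5, 0.8):
    assert abs(mean_entropy(1, p) - 2 * p * (1 - p)) < 1e-12
```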
In the next iteration, the entropy can be calculated by the formula:
$$H_2(p) = \mathbf{H}_2\,M_{5\times3}\,P(X_1)$$
where
$$\mathbf{H}_2 = \left(0,\ -\tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{3}{4}\log_2\tfrac{3}{4},\ -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2},\ -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4},\ 0\right)$$
Multiplying the matrices, we get:
$$H_2(p) = 10p^2(1-p)\left(p^3 - p^2 + \frac{1}{5}\,p + \frac{4}{5} - \frac{3}{5}\left(p^3 - p^2 + \frac{1}{2}\right)\log_2(3)\right)$$
In the same way, we can calculate the entropy of the 8-digit binary sequences obtained after three substitutions. The resulting closed form, $H_3(p)$, is proportional to $p^3(1-p)$ times a polynomial of degree 10 in $p$ with rational coefficients, including terms proportional to $\log_2(3)$, $\log_2(5)$ and $\log_2(7)$; we do not reproduce the full expression here.
In Figure 7, we depict the entropies calculated analytically for the first three iterations, as well as their corresponding numerical estimates. Note that, contrary to the Bernoulli sequences, the entropies of these sequences are not symmetric and achieve a maximum at a value of $p$ that depends on the iteration. Furthermore, it can be shown that the entropy per digit, i.e., $\frac{1}{2^i}H_i$, tends to 0 as $i \to \infty$ for all values of $p$.
The differential entropy for successive iterations:
$$h_i(p) = H_{i+1}(p) - H_i(p) \qquad \text{for } i = 1, 2, \ldots$$
presents an interesting behaviour, as can be seen in Figure 7b. There is a critical value that separates two regimes: for $p < p_c^h(i)$, $h_i(p)$ is negative, whereas for $p > p_c^h(i)$, $h_i(p)$ is positive. As can be seen, $p_c^h(i)$ depends on the iteration: as $i \to \infty$, $p_c^h(i) \to 1$. As a consequence, the $p$-interval where $h_i(p)$ is positive shrinks to 0. Moreover, $h_i(p)$ tends to 0 for all $p$, which means that the entropy converges to the function $H(p)$ of the infinite sequence.
Because both variance and entropy are measures of uncertainty, it is interesting to look for a functional dependence between them (see Figure 7 and Figure 8). The curves we obtain for the case $k = 2$ can be parametrized either by $p$, for each iteration $i$, or by $i$, for each $p$. The parametric equations for each $p$ and $i$ are given by Equations (53) and (71) for the variance and the entropy, respectively. As can be observed, the relationship between $H$ and $VAR$ depends on $p$ for each iteration $i$. It is important to remark that, contrary to what is known for classical probability distributions [6,7], the dependence of $H$ on $VAR$ is not the graph of a function: as $p$ varies over $[0,1]$, a closed curve is traced out. Certainly, for $p < p_r$, this functional dependence coincides with that of the classical probability distributions [6]. On the contrary, for $p > p_r$, where both variables decrease with $p$, we find an almost linear dependence between them. Figure 8 depicts the complete parametric curves for the first iterations obtained analytically.
An interesting consequence of this dependence is that, for the same variance, we can find a value of $p$ that maximizes the entropy. This can be done for any iteration $i > 1$ (recall that for $i = 1$, $H(p) = VAR(p)$ for all $p \in [0,1]$). Note that the dispersion index $D$ of these sequences would differ, because the mean changes between the two values of $p$. As an example, we find two values of $p$ with a similar variance, $VAR(0.85) \approx VAR(0.99) \approx 8700$, but with mean entropy values $H(0.85) = 0.67$ and $H(0.99) = 0.41$. Obviously, we can also find values of $p$ that give rise to sequences with the same entropy but different variance. For example, a similar mean entropy is achieved for $p = 0.79$ and $p = 0.99$, whereas the variances of these populations are very distant: $VAR(0.79) \approx 3368$ and $VAR(0.99) \approx 8741$.
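These example values can be checked against the closed-form variance; their magnitudes are consistent with iteration $i = 10$ and $k = 2$ (our inference; the text does not state $i$ explicitly here):

```python
# Checking the quoted example values against the closed-form variance;
# i = 10, k = 2 is our inference from the magnitudes, not stated in the text.
def var_xi(p, i=10, k=2):
    a = (k * p)**i
    return (1 - p) / (1 - k * p) * a * (1 - a)

assert abs(var_xi(0.85) - 8666) < 10     # ~8700, close to VAR(0.99)
assert abs(var_xi(0.99) - 8742) < 10
assert abs(var_xi(0.79) - 3369) < 10
```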

5. Concluding Remarks

Motivated by the classical percolation process defined by Mandelbrot [4], we have studied a class of random substitutions of constant length that map $0 \to 0, 0, \ldots, 0$ and $1 \to Y_1, Y_2, \ldots, Y_k$, where $Y_j$ is a Bernoulli random variable with parameter $p \in [0,1]$. Contrary to the classical Bernoulli process, this is an asymmetric rule that yields infinite sequences with an average ratio $\#1/\#0$ that tends to 0 for any $0 \le p < 1$. By applying this substitution map, we have randomly generated binary sequences whose length increases with the substitution length $k$ as $k^i$ at iteration $i = 0, 1, 2, \ldots$. Specifically, we have analyzed the combinatorial, statistical and entropic properties of this substitution process starting from a sequence formed by the digit 1.
This asymmetry is reflected in the mass distribution of the number of ones, specifically with regard to its uncertainty. We have computed the variance of this distribution and the entropy of the ensemble of sequences generated at each iteration as a function of $p$. The relationship between these two magnitudes is depicted in the $H$-$VAR$ curves of Figure 8. It is interesting to remark on the two regimes that appear in these curves, i.e., concavity down for $p < p_r$ and up for $p > p_r$, which, to the best of our knowledge, have not been shown before [6]. This allows us to compare sequences generated with different values of $H$ but with the same variance and, conversely, sequences that have the same entropy and different variance. We have left for a forthcoming paper the study of other properties of the generated sequences, for instance, the distribution of the sizes of the sets of ones and their fractality, measured by the Hausdorff dimension.
The same kind of substitution rules can also be applied in the plane or in space to generate sets with specific properties. In particular, random substitutions have been applied to study percolation properties of 2D fractal objects [2,4]. Using an iterated function system [3], random fractals are generated as a function of a probabilistic control parameter, e.g., the probability of any site being occupied, and their percolation characteristics are quantified in terms of this parameter.

Author Contributions

Conceptualization, J.C.N. and F.J.M.; Formal analysis, J.C.N. and F.J.M.; Investigation, J.C.N. and F.J.M.; Software, F.J.M.; Writing—original draft, J.C.N. and F.J.M.; Writing—review and editing, J.C.N. and F.J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been partially funded by the project PID2019-110330GB-C22 (Spanish Ministry of Science and Innovation).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We would like to thank the reviewers for their fruitful comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Fogg, N.P.; Berthé, V.; Ferenczi, S.; Mauduit, C.; Siegel, A. (Eds.) Substitutions in Dynamics, Arithmetics and Combinatorics; Springer: Berlin/Heidelberg, Germany, 2002.
  2. Dekking, F.M.; Meester, R.W.J. On the Structure of Mandelbrot’s Percolation Process and Other Random Cantor Sets. J. Stat. Phys. 1990, 58, 1109–1126.
  3. Falconer, K. Fractal Geometry: Mathematical Foundations and Applications; John Wiley & Sons Ltd.: Hoboken, NJ, USA, 1990.
  4. Mandelbrot, B.B. The Fractal Geometry of Nature; W. H. Freeman and Company: New York, NY, USA, 1983.
  5. Barbé, A.; Von Haeseler, F.; Skordev, G. Limit sets of restricted random substitutions. Fractals 2006, 14, 37–47.
  6. Mukherjee, D.; Ratnaparkhi, M.V. On the functional relationship between entropy and variance with related applications. Commun. Stat.-Theory Methods 1986, 15, 291–311.
  7. Ebrahimi, N.; Maasoumi, E.; Soofi, E.S. Ordering univariate distributions by entropy and variance. J. Econom. 1999, 90, 317–336.
  8. García-Morales, V. Substitution systems and nonextensive statistics. Phys. A Stat. Mech. Appl. 2015, 440, 110–117.
  9. Wolfram, S. A New Kind of Science; Wolfram Media: Champaign, IL, USA, 2002.
  10. Shallit, J.; Stolfi, J. Two methods for generating fractals. Comput. Graph. 1989, 13, 185–191.
  11. Gelß, P.; Schütte, C. Tensor-generated fractals: Using tensor decompositions for creating self-similar patterns. arXiv 2018, arXiv:1812.00814v1.
  12. Voevudko, A.E. Fractal dimension of the Kronecker product based fractals. arXiv 2018, arXiv:1803.02766v1.
  13. Xue, D.; Zhu, Y.; Zhu, G.X.; Yan, X. Generalized Kronecker product and fractals. In Proceedings of the Fourth International Conference on Computer-Aided Design and Computer Graphics, Wuhan, China, 23–25 October 1995; International Society for Optics and Photonics: Washington, DC, USA, 1996.
  14. Schroeder, M.R. Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise; Courier Corporation: Massachusetts, MA, USA, 1991.
  15. Baake, M.; Grimm, U.; Penrose, R. Aperiodic Order; Cambridge University Press: Cambridge, UK, 2013.
  16. Chayes, L. Aspects of the fractal percolation process. In Fractal Geometry and Stochastics; Progress in Probability, Vol. 37; Springer: Berlin/Heidelberg, Germany, 1995.
  17. Sloane, N.J.A. Sequence A202019 in The On-Line Encyclopedia of Integer Sequences. Available online: http://www.oeis.org (accessed on 16 February 2022).
Figure 1. A schematic representation of the sequence generation (from the top) for the cases (a) k = 2 and (b) k = 3, after six and five substitutions, respectively. For both cases, p = 0.9. Black and white represent 1 and 0, respectively.
Figure 2. A tree representation of the probabilities of obtaining a given number of ones after i = 3 iterations using rule (1). As usual, C(i, j) denotes the binomial coefficient C(i, j) = i!/(j!(i − j)!). The rightmost column shows the number of ones for each branch.
Figure 3. Discrete probability distributions of the number of ones at the seventh iteration (i = 7) for different values of p: p = 0.5 (a), p = 0.8 (b), p = 0.9 (c) and p = 0.99 (d). As can be seen, the distributions for low values of p exhibit a peak at X = 0 that disappears as p tends to 1.
Figure 4. Analytical (black continuous curve) and the corresponding numerical distributions for p = 0.9 (a), p = 0.92 (b), p = 0.94 (c) and p = 0.96 (d). The numerical distributions are histograms (gray bars) obtained from 1000 realizations. Note the multimodal character of the distributions; for large values of p, in addition to the peak at X = 0, other peaks appear at X ≈ 18, 30, 54 and 85. The heights of these maxima vary with p, and new peaks become visible as p increases (although, in the limit p = 1, only one remains).
Figure 5. Probability ϕ of obtaining a null sequence in the limit of the generating process, for k = 2 (black), k = 3 (blue), k = 4 (red) and k = 5 (green), as a function of p.
Figure 6. Standard deviation σ = √VAR(X_i) for the substitution lengths k = 2 (black) and k = 3 (blue) after i = 10 substitutions, as a function of p. Note the different scale on the Y-axis, likely a consequence of the different sequence sizes, 2^10 vs. 3^10.
Figure 7. (a) Entropy as a function of p for the first 10 iterations. Note the displacement of the maximum towards larger values of p as i increases. (b) Differential entropy h_i(p) = H_{i+1} − H_i for i = 1, 2, 3, …, 9. As can be seen, h_i(p) is positive for p > p_ch(i). As the iteration number increases, the interval of p where h_i is positive shrinks and, in the limit, it tends to 0. Furthermore, h_i tends to 0 as i → ∞, which indicates convergence to a limit entropy for the infinite sequence. The substitution length is k = 2.
Figure 8. Parametric H-VAR curves in terms of p for the first 10 iterations. Each curve can be split into two regimes: (i) for 0 < p < p_r, where H has a concave (down) dependence on VAR, and (ii) for p > p_r, where both variables decrease with p almost linearly. The value of p_r coincides with the point of the curve farthest from the origin; the dashed line connects these points. Because the variance is large compared with the corresponding entropy values, p_r tends to the value of p where the maximum of VAR occurs. An interesting consequence of this dependence is that, for a given iteration i, we can find two values of p that yield sequences with the same variance but different entropies, one of which attains the maximum value. The substitution length is k = 2.
Table 1. Values of the fitting parameters for some values of the substitution length k. As can be seen, as k increases the values of α(k) and β(k) seem to converge to 0.5 and 1, respectively. The last column shows the residual sum of squares (RSS) of the fitted model.

k   | α      | β      | RSS
2   | 0.6256 | 1.0645 | 1.78 × 10⁻⁵
3   | 0.5832 | 1.0486 | 2.35 × 10⁻⁵
4   | 0.5622 | 1.0383 | 2.10 × 10⁻⁵
5   | 0.5497 | 1.0316 | 1.73 × 10⁻⁵
10  | 0.5248 | 1.0167 | 6.99 × 10⁻⁶
100 | 0.5025 | 1.0017 | 1.05 × 10⁻⁷