Article

On the Convergence Rate for the Longest at Most T-Contaminated Runs of Heads

by István Fazekas 1,*, Borbála Fazekas 2 and László Fórián 1
1 Faculty of Informatics, University of Debrecen, Kassai Street 26, 4028 Debrecen, Hungary
2 Institute of Mathematics, University of Debrecen, Egyetem Square 1, 4032 Debrecen, Hungary
* Author to whom correspondence should be addressed.
Entropy 2025, 27(1), 33; https://doi.org/10.3390/e27010033
Submission received: 1 November 2024 / Revised: 29 December 2024 / Accepted: 31 December 2024 / Published: 3 January 2025
(This article belongs to the Special Issue The Random Walk Path of Pál Révész in Probability)

Abstract

In this paper, we study the usual coin tossing experiment. We call a run at most T-contaminated if it contains at most T tails. We approximate the distribution of the length of the longest at most T-contaminated run. We offer a more precise approximation than the previous one.

1. Introduction

Consider the usual coin tossing experiment. Let $p$ be the probability of heads and $q = 1 - p$ be the probability of tails. Here, $p$ is a fixed number with $0 < p < 1$. We toss a coin $N$ times independently. We write 1 for heads and 0 for tails. Therefore, we consider independent identically distributed random variables $X_1, X_2, \dots, X_N$ with distribution $P(X_i = 1) = p$ and $P(X_i = 0) = q = 1 - p$, $i = 1, 2, \dots, N$.
Let T be a fixed non-negative integer. We shall study the length of at most T-contaminated (in other words, at most T-interrupted) runs of heads. It means that there are at most T zeros in an m-length sequence of ones and zeros.
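The definition above can be made concrete with a short Python sketch. It is only an illustration, not code from the paper: the function name `longest_contaminated_run` is hypothetical, and the sketch treats a run as any window of consecutive tosses containing at most T zeros, confined to the given finite sequence.

```python
def longest_contaminated_run(tosses, T):
    """Length of the longest window of consecutive tosses with at most T zeros.

    `tosses` is a sequence of 0/1 values (1 = head, 0 = tail).
    """
    best = 0    # longest admissible window seen so far
    left = 0    # left end of the current window
    zeros = 0   # number of zeros inside the current window
    for right, value in enumerate(tosses):
        if value == 0:
            zeros += 1
        # Shrink the window from the left until it holds at most T zeros.
        while zeros > T:
            if tosses[left] == 0:
                zeros -= 1
            left += 1
        best = max(best, right - left + 1)
    return best


if __name__ == "__main__":
    sample = [1, 1, 0, 1, 1, 1, 0, 1]
    print(longest_contaminated_run(sample, 0))  # pure head run
    print(longest_contaminated_run(sample, 1))  # at most one tail allowed
```

The two-pointer scan visits each toss at most twice, so the cost is linear in the length of the sequence.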
There are several well-known results on the length of the pure head runs. Fair coins were studied in the paper of Erdős and Rényi [1]. Almost sure limit results for the length of the longest runs containing at most T tails were obtained in [2]. Földes [3] presented asymptotic results for the distribution of the number of T-contaminated head runs, the first hitting time of a T-contaminated head run having a fixed length, and the length of the longest T-contaminated head run. Móri [4] proved an almost sure limit theorem for the longest T-contaminated head run.
Gordon, Schilling, and Waterman [5] applied extreme value theory to obtain the asymptotic behaviour of the expectation and the variance of the length of the longest T-contaminated head run. Then, accompanying distributions were obtained for the length of the longest T-contaminated head run. Novak [6] proved results on the accuracy of the approximation to the distribution of the length of the longest head run in a Markov chain.
In this paper, we follow the lines of Arratia, Gordon, and Waterman [7], where Poisson approximation was used to find the asymptotic behaviour of the length of the longest at most T-contaminated head run. We shall use the basic results presented in [7], and give a new approximation for the distribution of the length of the longest at most T-contaminated head run. We show that for $T > 0$ the rate of the approximation in our new result is $O\left(1/(\log(n))^2\right)$, where $\log$ denotes the logarithm to base $1/p$. Here and in what follows, $f(n) = O(h(n))$ means that $f(n)/h(n)$ is bounded as $n \to \infty$. We shall see that for $T > 0$ the rate of the approximation offered by [7] is $O\left(\log(\log(n))/\log(n)\right)$, so our result considerably improves the former result. In our opinion the much better rate $O\left(\log(n)/n\right)$ presented without detailed proof in [7] is just a misprint, that is true only for $T = 0$. The main result is Theorem 1. For completeness, we give a proof of the former result, see Proposition 1. In Section 4, we present some simulation results supporting our theorem.
For $T = 1$ and $T = 2$, our result is the same as our former result in [8], where a powerful lemma by Csáki, Földes and Komlós [9] was used in the proof.

2. The Approximation of Arratia, Gordon, and Waterman

Using the notation of [7], let $S_i = X_1 + \dots + X_i$, and let $S_{n,t}$ be the largest increment in the sequence $S_i$ in $t$ steps; more precisely, $S_{n,t}$ is the maximal number of heads in a window of length $t$ starting in the first $n$ tosses. Let $R_n(T)$ be the length of the longest at most T-interrupted runs of heads starting in the first $n$ tosses. (One can see that $R_n(T)$ is the length of the longest precisely T-interrupted runs of heads starting in the first $n$ tosses.) Then,
$$\{ R_n(T) < t \} = \{ S_{n,t} < t - T \}.$$
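The identity between the two events can be checked by brute force on a random sequence. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, and the finite sequence is taken long enough (at least $n + t - 1$ tosses) that every window of length $t$ starting in the first $n$ tosses fits into it.

```python
import random


def max_heads_in_window(x, n, t):
    """S_{n,t}: maximal number of ones in a length-t window starting in the first n positions."""
    return max(sum(x[i:i + t]) for i in range(n))


def longest_run_from_first_n(x, n, T):
    """R_n(T): longest window with at most T zeros that starts in the first n positions."""
    best = 0
    for i in range(n):
        zeros = 0
        length = 0
        for value in x[i:]:
            if value == 0:
                zeros += 1
                if zeros > T:
                    break
            length += 1
        best = max(best, length)
    return best


def identity_holds(x, n, t_max, T_max):
    """Check {R_n(T) < t} == {S_{n,t} < t - T} for all 1 <= t <= t_max and 0 <= T <= T_max."""
    assert len(x) >= n + t_max - 1, "sequence too short for the largest window"
    for T in range(T_max + 1):
        r = longest_run_from_first_n(x, n, T)
        for t in range(1, t_max + 1):
            s = max_heads_in_window(x, n, t)
            if (r < t) != (s < t - T):
                return False
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    n, t_max = 40, 20
    x = [1 if rng.random() < 0.6 else 0 for _ in range(n + t_max - 1)]
    print(identity_holds(x, n, t_max, T_max=3))
```

The check works because a window of length at least $t$ with at most $T$ zeros always contains a prefix of length exactly $t$ with at most $T$ zeros, that is, with at least $t - T$ heads.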
According to Theorem 1 of [7], for the distribution of $S_{n,t}$, we have the following approximation. For positive integers $n$, $s$, and $t$ with $s \le t$ and $s/t > p$,
$$\left| P(S_{n,t} < s) - e^{-EW} \right| \le 7 t \, P(X_1 + \dots + X_t = s) + P(X_1 + \dots + X_t > s), \tag{1}$$
$$e^{-n \left( \frac{s}{t} - p \right) P(X_1 + \dots + X_t = s)} \cdot e^{-2n \left( 1 - \frac{s}{t} \right) P(X_1 + \dots + X_t = s) \, P(X_1 + \dots + X_t > s)} \le e^{-EW} \le e^{-n \left( \frac{s}{t} - p \right) P(X_1 + \dots + X_t = s)}. \tag{2}$$
In the above inequalities $EW$ is the expectation of the random variable $W$ defined in [7]. We shall use inequalities (1) and (2) with $s = t - T$. Using notation $\alpha = n \left( \frac{s}{t} - p \right) P(X_1 + \dots + X_t = s)$ and $\beta = 2n \left( 1 - \frac{s}{t} \right) P(X_1 + \dots + X_t = s) \, P(X_1 + \dots + X_t > s)$, the above inequality is of the form
$$e^{-\alpha} e^{-\beta} \le e^{-EW} \le e^{-\alpha}. \tag{3}$$
In this paper, the approximation of $e^{-\alpha}$ will serve as the main term.
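The quantities $\alpha$ and $\beta$ are explicit binomial expressions, so the two-sided bound on $e^{-EW}$ can be evaluated numerically. The sketch below is a hedged illustration; the function names are hypothetical and the probabilities are computed directly from the binomial distribution of $X_1 + \dots + X_t$.

```python
import math


def binom_pmf(t, s, p):
    """P(X_1 + ... + X_t = s) for i.i.d. Bernoulli(p) tosses."""
    return math.comb(t, s) * p**s * (1.0 - p) ** (t - s)


def binom_upper_tail(t, s, p):
    """P(X_1 + ... + X_t > s)."""
    return sum(binom_pmf(t, i, p) for i in range(s + 1, t + 1))


def alpha_beta(n, t, s, p):
    """alpha and beta as defined in the text, for s <= t and s/t > p."""
    pmf = binom_pmf(t, s, p)
    alpha = n * (s / t - p) * pmf
    beta = 2.0 * n * (1.0 - s / t) * pmf * binom_upper_tail(t, s, p)
    return alpha, beta


def bounds_on_exp_minus_EW(n, t, T, p):
    """Lower and upper bounds exp(-alpha - beta) <= exp(-EW) <= exp(-alpha) with s = t - T."""
    alpha, beta = alpha_beta(n, t, t - T, p)
    return math.exp(-alpha - beta), math.exp(-alpha)


if __name__ == "__main__":
    # Illustrative parameter values only.
    print(bounds_on_exp_minus_EW(n=10**6, t=30, T=3, p=0.5))
```

For $T = 0$ the factor $1 - s/t$ vanishes, so $\beta = 0$ and the two bounds coincide.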
Now, we shall analyse that approximation of $R_n(T)$ which was proposed in [7]. The centering constant in [7] is
$$c_n(T) = \log n + T \log\log n - \log(T!) + \log\left( q^{T+1} p^{-T} \right).$$
Let $x$ be a fixed number so that $c_n(T) + x = t$ is an integer. We want to estimate $P(R_n(T) - c_n(T) < x) = P(S_{n,t} < t - T)$. In the following we shall use both $\exp(x)$ and $e^x$ for the usual exponential function.
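As a small numerical aid, the centering constant can be evaluated as follows. This is a sketch with hypothetical function names; it uses the fact that, for the logarithm to base $1/p$, $\log p = -1$, so $\log\left(q^{T+1} p^{-T}\right) = (T+1)\log q + T$. It assumes $n > 1$ so that $\log\log n$ is defined.

```python
import math


def log_base_inv_p(x, p):
    """Logarithm of x to base 1/p."""
    return math.log(x) / math.log(1.0 / p)


def centering_constant(n, T, p):
    """c_n(T) = log n + T log log n - log(T!) + log(q^(T+1) p^(-T)), logs to base 1/p."""
    q = 1.0 - p
    log_n = log_base_inv_p(n, p)
    return (log_n
            + T * log_base_inv_p(log_n, p)
            - log_base_inv_p(math.factorial(T), p)
            + (T + 1) * log_base_inv_p(q, p)
            + T)


if __name__ == "__main__":
    # Illustrative values: a fair coin, one million tosses, T = 3.
    print(centering_constant(10**6, 3, 0.5))
```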
Proposition 1.
Let $[c_n(T)]$ be the integer part of $c_n(T)$ and $\{c_n(T)\} = c_n(T) - [c_n(T)]$ be its fractional part.
If $T = 0$, then for any integer $l$,
$$P\left( R_n(T) - [c_n(0)] < l \right) = \exp\left( -p^{\, l - \{c_n(0)\}} \right) \left( 1 + O\left( \frac{\log n}{n} \right) \right).$$
If $T > 0$, then for any integer $l$,
$$P\left( R_n(T) - [c_n(T)] < l \right) = \exp\left( -p^{\, l - \{c_n(T)\}} \right) \left( 1 + O\left( \frac{\log\log n}{\log n} \right) \right).$$
Remark 1.
In Corollary 3 of [7], the same remainder term $O\left( \frac{\log n}{n} \right)$ is given for the case $T > 0$, too. However, in our opinion, it contains only a part of the remainder terms.
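The main term of Proposition 1 is easy to evaluate. The following Python sketch (hypothetical function names, remainder term ignored) returns $\exp\left(-p^{\,l - \{c_n(T)\}}\right)$ as an approximation of $P(R_n(T) - [c_n(T)] < l)$. Note that the value is sensitive to floating-point rounding when $c_n(T)$ happens to be numerically very close to an integer, because the fractional part then jumps between 0 and 1.

```python
import math


def centering_constant(n, T, p):
    """c_n(T) with logarithms to base 1/p (so that log p = -1)."""
    c = math.log(1.0 / p)
    q = 1.0 - p
    log_n = math.log(n) / c
    return (log_n + T * math.log(log_n) / c
            - math.log(math.factorial(T)) / c
            + (T + 1) * math.log(q) / c + T)


def prop1_main_term(l, n, T, p):
    """Main term exp(-p^(l - {c_n(T)})) approximating P(R_n(T) - [c_n(T)] < l)."""
    c_n = centering_constant(n, T, p)
    frac = c_n - math.floor(c_n)
    return math.exp(-p ** (l - frac))


if __name__ == "__main__":
    # Illustrative values only: approximate distribution function for l = -2, ..., 5.
    for l in range(-2, 6):
        print(l, prop1_main_term(l, n=10**6, T=3, p=0.5))
```

For $T = 0$ the main term agrees with $\exp(-n q p^{k})$ for $k = [c_n(0)] + l$, which matches the computation in the proof below.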
Proof of Proposition 1.
As our remainder term and the remainder term offered by [7] are different, we give the details of the more or less simple calculation. First, we calculate the right hand side of inequality (1) for $s = t - T$ and $t = c_n(T) + x$, where $x$ is chosen so that $t$ is an integer.
$$P(X_1 + \dots + X_t = t - T) = \binom{t}{T} p^t (q/p)^T \le \kappa \frac{(\log n)^T}{(1/p)^{\log n + T \log\log n}} = O\left( \frac{1}{n} \right).$$
Here and in what follows, $\kappa$ is an appropriate finite positive constant. Therefore,
$$7 t \, P(X_1 + \dots + X_t = t - T) = O\left( \frac{\log n}{n} \right).$$
For $T > 0$, we have
$$P(X_1 + \dots + X_t > t - T) \le T \binom{t}{t - T + 1} p^{t - T + 1} \le \kappa t^{T-1} p^t \le \kappa \frac{(\log n)^{T-1}}{n (\log n)^T} = O\left( \frac{1}{n \log n} \right).$$
So we obtain
$$\left| P(S_{n,t} < t - T) - e^{-EW} \right| = O\left( \frac{\log n}{n} \right).$$
This last formula is valid for $T = 0$, too.
Now, we turn to the other parts of the approximation. First, consider $T = 0$. Then, the main term of the approximation, i.e., $e^{-\alpha}$ in Formula (3) is
$$e^{-\alpha} = e^{-n \left( \frac{t}{t} - p \right) P(X_1 + \dots + X_t = t)} = e^{-p^{-\log(nq) + t}}.$$
We have to approximate $P(R_n(0) - [c_n(0)] < l)$, where $l$ is an integer, $c_n(0) = \log n + \log q$, and $[\,.\,]$ denotes the integer part. So, we should apply the previous equality with $t = [c_n(0)] + l$, so we obtain
$$e^{-\alpha} = e^{-p^{\, l - \{c_n(0)\}}},$$
where $\{\,.\,\}$ denotes the fractional part. We see that, if $T = 0$, then $\beta = 0$, so in inequality (3), we have equality. So, for $T = 0$, this part of the approximation is precise, i.e., the main term does not contain a remainder part.
Now, we consider the approximation of the main term for $T > 0$.
$$e^{-\alpha} = e^{-n \left( \frac{t - T}{t} - p \right) P(X_1 + \dots + X_t = t - T)} = e^{-n \left( q - \frac{T}{t} \right) \binom{t}{T} q^T p^{t - T}}.$$
Now, denote by $L$ the base $1/p$ logarithm of the negative of the exponent, that is, $L = \log \alpha$. So,
$$L = \log n + \log\left( q - \frac{T}{t} \right) + \log\left( t (t-1) \cdots (t - T + 1) \right) - \log T! + T \log q + T - t.$$
We shall use $t = c_n(T) + x$. Applying Taylor’s expansion of the logarithm function, $\log(x_0 + y) = \log x_0 + \frac{y}{c x_0} - \frac{y^2}{2 c \tilde{x}_0^2}$, where $\tilde{x}_0$ is between $x_0$ and $x_0 + y$, and where $c = \ln(1/p)$, we obtain
$$\begin{aligned}
L &= \log n + \log q - \frac{T}{c q t} - O\left( \frac{1}{t^2} \right) + \log t^T - \frac{t^{T-1} \binom{T}{2}}{c \, t^T} + O\left( \frac{1}{t^2} \right) - \log T! + T \log q + T - t \\
&= \log n + T \log t - \frac{1}{c t} \left( \frac{T}{q} + \binom{T}{2} \right) + O\left( \frac{1}{t^2} \right) - \log T! + (T + 1) \log q + T - t.
\end{aligned}$$
We insert $t = c_n(T) + x = \log n + T \log\log n + E$, where $E$ is defined by the equation at hand so it does not depend on $n$. Using again Taylor’s expansions of the logarithm function as $\log(x_0 + y) = \log x_0 + \frac{y}{c x_0} - \frac{y^2}{2 c x_0^2} + \frac{y^3}{3 c \tilde{x}_0^3}$, where $\tilde{x}_0$ is between $x_0$ and $x_0 + y$, and for the $1/t$ function, as $\frac{1}{x_0 + y} = \frac{1}{x_0} - \frac{y}{x_0^2} + \frac{y^2}{\tilde{x}_0^3}$, where $\tilde{x}_0$ is between $x_0$ and $x_0 + y$, we obtain
$$\begin{aligned}
L ={} & \log n + T \left( \log\log n + \frac{T \log\log n + E}{c \log n} - \frac{(T \log\log n + E)^2}{2 c (\log n)^2} + O\left( \frac{(\log\log n)^3}{(\log n)^3} \right) \right) \\
& - \frac{1}{c} \left( \frac{T}{q} + \binom{T}{2} \right) \left( \frac{1}{\log n} - \frac{T \log\log n + E}{(\log n)^2} + O\left( \frac{(\log\log n)^2}{(\log n)^3} \right) \right) + O\left( \frac{1}{t^2} \right) \\
& - \log T! + (T + 1) \log q + T - t.
\end{aligned}$$
Now, using $t = c_n(T) + x$ and inserting the value of $c_n(T)$, we obtain
$$L = -x + \frac{T^2 \log\log n}{c \log n} + O\left( \frac{1}{\log n} \right),$$
which implies that
$$L = -x + O\left( \frac{\log\log n}{\log n} \right),$$
and this rate is not improvable. We remark that this relation is valid for $T = 0$, too, since then $L = -x$ exactly.
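Therefore, by applying the Taylor series expansion $e^y = 1 + y + e^{\tilde{y}} \frac{y^2}{2}$ twice, where $\tilde{y}$ is between $0$ and $y$, we obtain
$$\begin{aligned}
e^{-\alpha} = e^{-(1/p)^L} &= \exp\left( -p^{x} \left( 1 + \ln\frac{1}{p} \cdot \frac{T^2 \log\log n}{c \log n} + O\left( \frac{1}{\log n} \right) \right) \right) \\
&= \exp\left( -p^{\, l - \{c_n(T)\}} \right) \left( 1 + O\left( \frac{\log\log n}{\log n} \right) \right),
\end{aligned}$$
and this rate is not improvable.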
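Now, we consider the $e^{-\beta}$ part. Here,
$$\beta = 2 \frac{T}{t} \sum_{i = t - T + 1}^{t} \binom{t}{i} p^i q^{t - i} \; n \binom{t}{T} p^{t - T} q^T$$
with $t = c_n(T) + x = \log n + T \log\log n + E$. The largest term in the above sum is the first one, and it is
$$\binom{t}{T - 1} p^t \left( \frac{q}{p} \right)^{T - 1} = O\left( \frac{1}{n \log n} \right).$$
Then,
$$\binom{t}{T} p^{t - T} q^T = O\left( \frac{1}{n} \right).$$
Using Taylor’s expansion,
$$\frac{T}{t} = O\left( \frac{1}{\log n} \right).$$
So, $\beta = O\left( 1 / (n (\log n)^2) \right)$, and
$$e^{-\beta} = 1 - O\left( \frac{1}{n (\log n)^2} \right).$$
Therefore,
$$e^{-\alpha} e^{-\beta} = \exp\left( -p^{\, l - \{c_n(T)\}} \right) \left( 1 + O\left( \frac{\log\log n}{\log n} \right) \right) \left( 1 - O\left( \frac{1}{n (\log n)^2} \right) \right) = \exp\left( -p^{\, l - \{c_n(T)\}} \right) \left( 1 + O\left( \frac{\log\log n}{\log n} \right) \right).$$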

3. A New Approximation

Theorem 1.
Let $T \ge 1$ be an integer. Let
$$\begin{aligned}
\tilde{c}_n(T) ={} & \log(qn) + T \log(\log(qn)) + \frac{T^2 \log(\log(qn))}{c \log(qn)} - \frac{T}{c q_0 \log(qn)} - \frac{T^3}{2c} \left( \frac{\log(\log(qn))}{\log(qn)} \right)^2 \\
& + \frac{T^2 \log(\log(qn))}{c q_0 (\log(qn))^2} + \frac{T^3 \log(\log(qn))}{(c \log(qn))^2} \\
& + \left( T \log\frac{q}{p} - \log(T!) \right) \left( 1 + \frac{T}{c \log(qn)} - \frac{T^2 \log(\log(qn))}{c (\log(qn))^2} \right),
\end{aligned}$$
where $\log$ denotes the logarithm to base $1/p$, $c = \ln(1/p)$, $\ln$ denotes the natural logarithm to base $e$, and $q_0 = \frac{2q}{2 + Tq - q}$. Let $[\tilde{c}_n(T)]$ denote the integer part of $\tilde{c}_n(T)$, while $\{\tilde{c}_n(T)\}$ denotes the fractional part of $\tilde{c}_n(T)$, i.e. $\{\tilde{c}_n(T)\} = \tilde{c}_n(T) - [\tilde{c}_n(T)]$.
Then,
$$P\left( R_n(T) - [\tilde{c}_n(T)] < l \right) = \exp\left( -p^{\, (l - \{\tilde{c}_n(T)\}) \left( 1 - \frac{T}{c \log(qn)} + \frac{T^2 \log(\log(qn))}{c (\log(qn))^2} \right)} \right) \left( 1 + O\left( \frac{1}{(\log n)^2} \right) \right)$$
for any integer $l$, where $f(n) = O(h(n))$ means that $f(n)/h(n)$ is bounded as $n \to \infty$.
Proof.
We use the same approach as in the previous section. First, we calculate the right hand side of inequality (1) for $s = t - T$ and $t = \tilde{c}_n(T) + x$, where $x$ is chosen so that $t$ is an integer. As
$$\tilde{c}_n(T) = \log(n) + T \log(\log(n)) + O(1),$$
we obtain
$$P(X_1 + \dots + X_t = t - T) = \binom{t}{T} p^t (q/p)^T \le \kappa \frac{(\log n)^T}{(1/p)^{\log n + T \log\log n}} = O\left( \frac{1}{n} \right).$$
Therefore,
$$7 t \, P(X_1 + \dots + X_t = t - T) = O\left( \frac{\log n}{n} \right).$$
Similarly,
$$P(X_1 + \dots + X_t > t - T) \le \kappa t^{T - 1} p^t = O\left( \frac{1}{n \log n} \right).$$
So,
$$\left| P(S_{n,t} < t - T) - e^{-EW} \right| = O\left( \frac{\log n}{n} \right).$$
Now, we turn to the approximation of the main term $e^{-\alpha}$. Denote by $L$ again the base $1/p$ logarithm of the negative of the exponent, so
$$L = \log \alpha = \log n + \log\left( q - \frac{T}{t} \right) + \log\left( t (t-1) \cdots (t - T + 1) \right) - \log T! + T \log q + T - t.$$
We shall apply it for $t = \tilde{c}_n(T) + x$. Therefore,
$$\begin{aligned}
L &= \log\left( q - \frac{T}{t} \right) + \log n + \log\left( t^T - \frac{T(T-1)}{2} t^{T-1} + O(t^{T-2}) \right) - t + \log\left( (q/p)^T \right) - \log(T!) \\
&= \log\left( q - \frac{T}{t} \right) + \log n + \log(t^T) - \frac{\frac{T(T-1)}{2} t^{T-1}}{c \, t^T} + O\left( \frac{1}{t^2} \right) - t + \log\left( (q/p)^T \right) - \log(T!) \\
&= \log q - \frac{T}{c q t} + \log n + T \log t - \frac{T(T-1)}{2 c t} - t + \log\left( (q/p)^T \right) - \log(T!) + O\left( \frac{1}{(\log n)^2} \right) \\
&= \log(qn) - \frac{T}{c q_0 t} + T \log t - t + \log\left( (q/p)^T \right) - \log(T!) + O\left( \frac{1}{(\log n)^2} \right),
\end{aligned}$$
where we applied Taylor’s expansion of the log function up to the second order and used the notation $q_0 = \frac{2q}{2 + Tq - q}$.
Introduce notation
$$\begin{aligned}
D ={} & -\frac{T^3}{2c} \left( \frac{\log(\log(qn))}{\log(qn)} \right)^2 + \frac{T^2 \log(\log(qn))}{c q_0 (\log(qn))^2} + \frac{T^3 \log(\log(qn))}{(c \log(qn))^2} \\
& + \left( T \log\frac{q}{p} - \log(T!) \right) \left( \frac{T}{c \log(qn)} - \frac{T^2 \log(\log(qn))}{c (\log(qn))^2} \right),
\end{aligned}$$
$$B = \frac{T^2 \log(\log(qn))}{c \log(qn)} - \frac{T}{c q_0 \log(qn)} + D$$
and
$$A = T \log(\log(qn)) + B.$$
Then, $t = \tilde{c}_n(T) + x = \tilde{c}_n(T) + l - \{\tilde{c}_n(T)\}$, where $l$ is an integer, so
$$t = T \log\frac{q}{p} - \log(T!) + \log(qn) + A + l - \{\tilde{c}_n(T)\}.$$
Inserting this value of $t$ into the term $-t$ of $L$, we obtain
$$L = -\frac{T}{c q_0 t} + T \log t - A - l + \{\tilde{c}_n(T)\} + O\left( \frac{1}{(\log n)^2} \right).$$
Then, use Taylor’s expansion for the function $1/t$ to obtain
$$\begin{aligned}
L ={} & -\frac{T}{c q_0 \log(qn)} + \frac{T^2 \log(\log(qn))}{c q_0 (\log(qn))^2} \\
& + T \log\left( \log(qn) + T \log(\log(qn)) + B + \log\left( (q/p)^T \right) - \log(T!) + l - \{\tilde{c}_n(T)\} \right) \\
& - A - l + \{\tilde{c}_n(T)\} + O\left( \frac{1}{(\log n)^2} \right).
\end{aligned}$$
Now, by Taylor’s expansion for the $\log(x)$ function, we obtain
$$\begin{aligned}
L ={} & -\frac{T}{c q_0 \log(qn)} + \frac{T^2 \log(\log(qn))}{c q_0 (\log(qn))^2} + T \log(\log(qn)) \\
& + T \, \frac{T \log(\log(qn)) + B + \log\left( (q/p)^T \right) - \log(T!) + l - \{\tilde{c}_n(T)\}}{c \log(qn)} \\
& - \frac{1}{2} T \, \frac{\left( T \log(\log(qn)) + B + \log\left( (q/p)^T \right) - \log(T!) + l - \{\tilde{c}_n(T)\} \right)^2}{c (\log(qn))^2} \\
& - A - l + \{\tilde{c}_n(T)\} + O\left( \frac{1}{(\log n)^2} \right).
\end{aligned}$$
Now, we can omit $B$ from the quadratic term. Then, we apply $A = T \log(\log(qn)) + B$, so we obtain
$$\begin{aligned}
L ={} & -\frac{T}{c q_0 \log(qn)} + \frac{T^2 \log(\log(qn))}{c q_0 (\log(qn))^2} + \frac{T^2 \log(\log(qn))}{c \log(qn)} + \frac{T \left( \log\left( (q/p)^T \right) - \log(T!) \right)}{c \log(qn)} \\
& + \frac{T^3 \log(\log(qn))}{(c \log(qn))^2} - \frac{T^2}{q_0 (c \log(qn))^2} + \frac{T D}{c \log(qn)} + \frac{T \left( l - \{\tilde{c}_n(T)\} \right)}{c \log(qn)} \\
& - \frac{1}{2} \, \frac{T^3 (\log(\log(qn)))^2}{c (\log(qn))^2} - \frac{1}{2} T \, \frac{\left( \log\left( (q/p)^T \right) - \log(T!) + l - \{\tilde{c}_n(T)\} \right)^2}{c (\log(qn))^2} \\
& - \frac{2T}{2} \, \frac{T \log(\log(qn)) \left( \log\left( (q/p)^T \right) - \log(T!) + l - \{\tilde{c}_n(T)\} \right)}{c (\log(qn))^2} - B - l + \{\tilde{c}_n(T)\} + O\left( \frac{1}{(\log n)^2} \right) \\
={} & \left( l - \{\tilde{c}_n(T)\} \right) \left( \frac{T}{c \log(qn)} - \frac{T^2 \log(\log(qn))}{c (\log(qn))^2} - 1 \right) + O\left( \frac{1}{(\log n)^2} \right).
\end{aligned}$$
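So,
$$e^{-\alpha} = \exp\left( -p^{\, (l - \{\tilde{c}_n(T)\}) \left( 1 - \frac{T}{c \log(qn)} + \frac{T^2 \log(\log(qn))}{c (\log(qn))^2} \right) + O\left( \frac{1}{(\log n)^2} \right)} \right).$$
Using Taylor’s expansion again,
$$e^{-\alpha} = \exp\left( -p^{\, (l - \{\tilde{c}_n(T)\}) \left( 1 - \frac{T}{c \log(qn)} + \frac{T^2 \log(\log(qn))}{c (\log(qn))^2} \right)} \right) \left( 1 + O\left( \frac{1}{(\log n)^2} \right) \right).$$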
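Now, turn to the $e^{-\beta}$ part, where
$$\beta = 2 \frac{T}{t} \sum_{i = t - T + 1}^{t} \binom{t}{i} p^i q^{t - i} \; n \binom{t}{T} p^{t - T} q^T$$
and $t = \tilde{c}_n(T) + x$. Simple calculations show that $\beta \le \kappa / \left( n (\log n)^2 \right)$, and so
$$e^{-\beta} = 1 + O\left( \frac{1}{n (\log n)^2} \right).$$
Therefore,
$$e^{-\alpha} e^{-\beta} = \exp\left( -p^{\, (l - \{\tilde{c}_n(T)\}) \left( 1 - \frac{T}{c \log(qn)} + \frac{T^2 \log(\log(qn))}{c (\log(qn))^2} \right)} \right) \left( 1 + O\left( \frac{1}{(\log n)^2} \right) \right).$$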

4. Simulation Results

We performed several computer simulation studies for certain fixed values of $p$ and $T$. Here, we present the results of three simulations. The length of each simulated sequence was $N = 10^6$, and $s = 2000$ was the number of repetitions of the $N$-length sequences in each case. In each case, the number of contaminations was $T = 3$.
Figure 1, Figure 2 and Figure 3 present the results of the simulations. The left hand side of each figure shows the empirical distribution function of the longest at most T-contaminated run and its approximation suggested by our Theorem 1. The asterisk (i.e., ∗) denotes the result of the simulation, i.e., the empirical distribution of the longest at most T-contaminated run, and the circle (∘) denotes the approximation offered by Theorem 1. The right hand side of each figure shows the approximation by the former result. The asterisk denotes the result of the simulation again, and the circle (∘) denotes the approximation offered by Proposition 1. The simulation results support that our new theorem offers a better approximation than the previous one.
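A simulation of this kind can be sketched in a few lines of Python. The code below is not the authors' program: the function names and the default seed are hypothetical, and the run is confined to the $N$ simulated tosses, which is assumed to differ negligibly from a run starting in the first $n$ tosses. The demo uses much smaller values than $N = 10^6$ and $s = 2000$, because pure Python is slow at that scale.

```python
import random


def longest_contaminated_run(tosses, T):
    """Longest window of consecutive tosses containing at most T zeros."""
    best = left = zeros = 0
    for right, value in enumerate(tosses):
        if value == 0:
            zeros += 1
        while zeros > T:          # shrink until at most T zeros remain
            if tosses[left] == 0:
                zeros -= 1
            left += 1
        best = max(best, right - left + 1)
    return best


def simulate_longest_runs(N, T, p, repetitions, seed=0):
    """Simulate `repetitions` sequences of N tosses; return the longest run of each."""
    rng = random.Random(seed)
    results = []
    for _ in range(repetitions):
        tosses = [1 if rng.random() < p else 0 for _ in range(N)]
        results.append(longest_contaminated_run(tosses, T))
    return results


def empirical_cdf_strict(samples, k):
    """Fraction of samples strictly less than k, an estimate of P(R < k)."""
    return sum(1 for r in samples if r < k) / len(samples)


if __name__ == "__main__":
    # Reduced sizes for a quick demonstration.
    samples = simulate_longest_runs(N=10**4, T=3, p=0.5, repetitions=200)
    for k in range(min(samples), max(samples) + 2):
        print(k, empirical_cdf_strict(samples, k))
```

The strict inequality in `empirical_cdf_strict` matches the form $P(R_n(T) - [\tilde{c}_n(T)] < l)$ used in Theorem 1, so the output can be compared with the main term at $k = [\tilde{c}_n(T)] + l$.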

5. Discussion

We were able to obtain a practically applicable approximation for the distribution of the longest at most T-contaminated head-run. We presented both detailed mathematical proof and simulation evidence.

Author Contributions

Conceptualization, I.F.; methodology, I.F.; software, L.F.; validation, B.F.; formal analysis, I.F., B.F. and L.F.; investigation, I.F., B.F. and L.F.; writing—original draft preparation, I.F., B.F. and L.F.; writing—review and editing, I.F.; visualization, L.F.; supervision, I.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Erdős, P.; Rényi, A. On a new law of large numbers. J. Analyse Math. 1970, 23, 103–111.
  2. Erdős, P.; Révész, P. On the length of the longest head-run. In Topics in Information Theory (Second Colloq., Keszthely, 1975); Colloq. Math. Soc. János Bolyai; North-Holland: Amsterdam, The Netherlands, 1977; Volume 16, pp. 219–228.
  3. Földes, A. The limit distribution of the length of the longest head-run. Period. Math. Hungar. 1979, 10, 301–310.
  4. Móri, T.F. The a.s. limit distribution of the longest head run. Canad. J. Math. 1993, 45, 1245–1262.
  5. Gordon, L.; Schilling, M.F.; Waterman, M.S. An extreme value theory for long head runs. Probab. Theory Relat. Fields 1986, 72, 279–287.
  6. Novak, S.Y. On the length of the longest head run. Statist. Probab. Lett. 2017, 130, 111–114.
  7. Arratia, R.; Gordon, L.; Waterman, M.S. The Erdős-Rényi Law in Distribution, for Coin Tossing and Sequence Matching. Ann. Statist. 1990, 18, 539–570.
  8. Fazekas, I.; Fazekas, B.; Ochieng Suja, M. Convergence rate for the longest T-contaminated runs of heads. Statist. Probab. Lett. 2024, 208, 110059.
  9. Csáki, E.; Földes, A.; Komlós, J. Limit theorems for Erdős-Rényi type problems. Studia Sci. Math. Hungar. 1987, 22, 321–332.
Figure 1. Longest at most $T = 3$ contaminated run when $p = 0.4$.
Figure 2. Longest at most $T = 3$ contaminated run when $p = 0.5$.
Figure 3. Longest at most $T = 3$ contaminated run when $p = 0.6$.