Article

On the Entropy and Letter Frequencies of Powerfree Words

Department of Mathematics & Statistics, The Open University, Walton Hall, Milton Keynes MK7 6AA, UK
*
Author to whom correspondence should be addressed.
Entropy 2008, 10(4), 590-612; https://doi.org/10.3390/e10040590
Submission received: 24 May 2008 / Accepted: 7 August 2008 / Published: 12 November 2008

Abstract

We review the recent progress in the investigation of powerfree words, with particular emphasis on binary cubefree and ternary squarefree words. Besides various bounds on the entropy, we provide bounds on letter frequencies and consider their empirical distribution obtained by an enumeration of binary cubefree words up to length 80.


1. Introduction

The interest in combinatorics on words goes back to the work of Axel Thue at the beginning of the 20th century [1]. He showed, in particular, that the famous morphism
\[ \rho:\quad 0 \mapsto 01, \qquad 1 \mapsto 10, \tag{1} \]
called Thue-Morse morphism since the work of Morse [2], is cubefree. Its iteration on the initial word 0 produces an infinite cubefree word
0110100110010110100101100110100110010110011010010110100110010110
over a binary alphabet, which means that it does not contain any subword of the form $0^3 = 000$, $1^3 = 111$, $(01)^3 = 010101$, $(10)^3 = 101010$ and so forth. Moreover, the statement that the morphism is cubefree means that it maps any cubefree word to a cubefree word, so it preserves this property. Generally, the iteration of a powerfree morphism is a convenient way to produce infinite powerfree words.
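The iteration described above is easy to reproduce. The following minimal sketch applies the Thue-Morse morphism six times to the initial word 0 and checks the result for cubes by brute force; the function names `apply_morphism`, `has_cube` and `iterate` are illustrative choices, not notation from the literature.

```python
def apply_morphism(rho, word):
    """Apply a morphism, given as a dict letter -> image, letter by letter."""
    return "".join(rho[letter] for letter in word)


def has_cube(word):
    """Return True if some factor of `word` has the form uuu with u non-empty."""
    n = len(word)
    for period in range(1, n // 3 + 1):
        for start in range(n - 3 * period + 1):
            if word[start:start + 3 * period] == word[start:start + period] * 3:
                return True
    return False


def iterate(rho, seed, steps):
    """Apply the morphism `steps` times, starting from the word `seed`."""
    word = seed
    for _ in range(steps):
        word = apply_morphism(rho, word)
    return word


THUE_MORSE = {"0": "01", "1": "10"}

word = iterate(THUE_MORSE, "0", 6)  # the length doubles in every step: 2**6 = 64 letters
print(word)
print("contains a cube:", has_cube(word))
```

The brute-force check is quadratic in the word length for each period, which is adequate for short words but not for serious enumeration.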
The investigation of powerfree, or more generally of pattern-avoiding words, is one particular aspect of combinatorics on words; we refer the reader to the book series [3,4,5] for a comprehensive overview of the area, including algebraic formulations and applications. The area has attracted considerable activity in the past decades [6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24], and continues to do so, see [25,26,27,28,29,30,31,32,33] for some recent work. Beyond the realm of combinatorics on words and coding theory, substitution sequences, such as the Thue-Morse sequence, have been investigated for instance in the context of symbolic dynamics [34,35,36] and aperiodic order [37], to name but two. In the latter case, one is interested in systems which display order without periodicity, and substitution sequences often provide paradigmatic models, which are used in many applications in physics and materials science. However, sequences produced by a substitution such as in Eq. (1) have subexponential complexity and hence zero combinatorial entropy, cf. Definition 12 below. A natural generalisation to interesting sets of positive entropy is provided by powerfree or pattern-avoiding words.
In this article, we review the recent progress on powerfree words, with emphasis on the two ‘classic’ cases of binary cubefree and ternary squarefree words. We include a summary of relevant results which are scattered over 25 years of literature, and also discuss some new results as well as conjectures on cubefree morphisms and letter frequencies in binary cubefree words.
The first term of interest is the combinatorial entropy of the set of powerfree words. Due to the fact that every subword of a powerfree word is again powerfree, the entropy of powerfree words exists as a limit. It is a measure for the exponential growth rate of the number of powerfree words of length n. Unfortunately, neither an explicit expression for the entropy of k-powerfree words nor an easy way to compute it numerically is known. Nevertheless, there are several strategies to derive upper and lower bounds for this limit. Upper bounds can be obtained, for example, by enumeration of all powerfree words up to a certain length, or by the derivation of generating functions for the number of powerfree words, see Section 4. Until recently all methods to achieve lower bounds relied on powerfree morphisms. However, the lower bounds obtained in this way are not particularly good, since they are considerably smaller than the upper bounds as well as reliable numerical estimates of the actual value of the entropy. A completely different approach introduced recently by Kolpakov [29], which amounts to choosing a parameter value to satisfy a number of inequalities derived from a Perron-Frobenius-type argument, provides surprisingly good lower bounds for the entropy of ternary squarefree and binary cubefree words.
In the following section, we briefly introduce the notation and basic terminology; see [3] for a more detailed introduction. We continue with a summary of results on k-powerfree morphisms, which can be used to derive lower bounds for the corresponding entropy. We then proceed by introducing the entropy of k-powerfree words and summarise the methods to derive upper and lower bounds in general, and for binary cubefree and ternary squarefree words in particular. We conclude with a discussion of the frequencies of letters in binary cubefree and ternary squarefree words.

2. Powerfree words and morphisms

Define an alphabet $A$ as a finite non-empty set of symbols called letters. The cardinality of $A$ is denoted by $\mathrm{Card}(A)$. Finite or infinite sequences of elements from $A$ are called words. The empty word is denoted by ε. The set of all finite words, the operation of concatenation of words and the empty word ε form the free monoid $A^*$. The free semigroup generated by $A$ is $A^+ := A^* \setminus \{\varepsilon\}$.
The length of a word $u \in A^*$, denoted by $|u|$, is the number of letters that $u$ consists of. The length of the empty word is $|\varepsilon| := 0$.
For two words $u, v \in A^*$, we say that $v$ is a subword or a factor of $u$ if there are words $x, y \in A^*$ such that $u = xvy$. If $x = \varepsilon$, the factor $v$ is called a prefix of $u$, and if $y = \varepsilon$, $v$ is called a suffix of $u$. Given a set of words $X \subset A^*$ (here and in what follows, the symbol ⊂ is meant to include the possibility that both sets are equal), the set of all factors of words in $X$ is denoted by $\mathrm{Fact}(X)$.
A map $\rho: A^* \to B^*$, where $A$ and $B$ are alphabets, is called a morphism if $\rho(uv) = \rho(u)\rho(v)$ holds for all $u, v \in A^*$. Obviously, a morphism ρ is completely determined by $\rho(a)$ for all $a \in A$, and satisfies $\rho(\varepsilon) = \varepsilon$. A morphism $\rho: A^* \to B^*$ is called ℓ-uniform if $|\rho(a)| = \ell$ for all $a \in A$.
For a word $u$, we define $u^0 := \varepsilon$, $u^1 := u$ and, for an integer $k > 1$, the power $u^k$ as the concatenation of $k$ occurrences of the word $u$. If $u \neq \varepsilon$, $u^k$ is called a k-power. A word $v$ contains a k-power if at least one of its factors is a k-power. If a word does not contain any k-power as a factor, it is called k-powerfree. If a word does not contain the k-power of any word up to a certain length $p$ as a factor, it is called length-p k-powerfree, i.e., $w = x u^k y$ implies that $u = \varepsilon$ whenever $x, u, y \in A^*$ with $|u| \le p$. We denote the set of k-powerfree words in an alphabet $A$ by $F^{(k)}(A) \subset A^*$ and the set of length-p k-powerfree words by $F^{(k,p)}(A) \subset A^*$. By definition, the empty word ε is k-powerfree for all k. A word $w \in A^*$ is called primitive if $w = v^n$, with $v \in A^*$ and $n \in \mathbb{N}$, implies that $n = 1$, meaning that $w$ is not a proper power of another word $v$.
A morphism $\rho: A^* \to B^*$ is called k-powerfree if $\rho(u)$ is k-powerfree for every k-powerfree word $u$. In other words, ρ is k-powerfree if $\rho(F^{(k)}(A)) \subset F^{(k)}(B)$. A test-set for k-powerfreeness of morphisms on an alphabet $A$ is a set $T \subset A^*$ such that, for any morphism $\rho: A^* \to B^*$, ρ is k-powerfree if and only if $\rho(T)$ is k-powerfree. A morphism is called powerfree if it is a k-powerfree morphism for every $k \ge 2$.
In particular, 2-powerfree and 3-powerfree words and morphisms are called squarefree and cubefree, respectively. A morphism from $A^*$ to $B^*$ with $\mathrm{Card}(A) = 2$ is also called a binary morphism. The notion of powerfreeness can be extended to non-integer powers; see, for instance, Ref. [25] for an investigation of k-powerfree binary words for $k \ge 2$. However, in this article we shall concentrate on the cases $k = 2$ and $k = 3$, and hence restrict the discussion to integer powers.
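The definitions of this section translate directly into brute-force checks. The sketch below is only meant to make the terminology concrete; `contains_k_power` and `is_primitive` are hypothetical helper names, and the optional argument `max_period` plays the role of the length bound p in the definition of length-p k-powerfree words.

```python
def contains_k_power(word, k, max_period=None):
    """True if `word` has a factor u**k with 1 <= |u| <= max_period.

    With max_period=None, every possible length of u is tested, so the
    negation of the result says whether `word` is k-powerfree.
    """
    n = len(word)
    limit = n // k if max_period is None else min(max_period, n // k)
    for period in range(1, limit + 1):
        for start in range(n - k * period + 1):
            if word[start:start + k * period] == word[start:start + period] * k:
                return True
    return False


def is_primitive(word):
    """True if the non-empty `word` is not a proper power v**n with n > 1."""
    n = len(word)
    return not any(n % d == 0 and word == word[:d] * (n // d) for d in range(1, n))


print(contains_k_power("0110", 2))                   # has the square 11
print(contains_k_power("0110", 3))                   # cubefree
print(contains_k_power("010101", 3))                 # is the cube (01)^3
print(contains_k_power("010101", 3, max_period=1))   # but length-1 cubefree
print(is_primitive("01"), is_primitive("0101"))
```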

3. Characterisations of k-powerfree morphisms

In what follows, we summarise a number of relevant results on k-powerfree morphisms. In particular, we are interested in the question of how to test a specified morphism for k-powerfreeness. We start with results relating to the case $k = 2$.

3.1. Characterisations of squarefree morphisms

A sufficient (but in general not necessary) condition for the squarefreeness of a morphism has been known since 1979.
Theorem 1 (Bean et al. [8]). A morphism $\rho: A^* \to B^*$ is squarefree if
(i) $\rho(w)$ is squarefree for every squarefree word $w \in A^*$ of length $|w| \le 3$;
(ii) $a = b$ whenever $a, b \in A$ and $\rho(a)$ is a factor of $\rho(b)$.
If the morphism ρ is uniform, this condition is in fact also necessary, because in this case $\rho(a)$ being a factor of $\rho(b)$ implies that $\rho(a) = \rho(b)$. If $a, b \in A$ exist with $a \neq b$ and $\rho(a) = \rho(b)$, then clearly ρ is not squarefree since $\rho(ab) = \rho(a)\rho(b)$ is a square. This gives the following corollary.
Corollary 2. A uniform morphism $\rho: A^* \to B^*$ is squarefree if and only if $\rho(w)$ is squarefree for every squarefree word $w \in A^*$ of length $|w| \le 3$.
This corollary corresponds to Brandenburg’s Theorem 2 in Ref. [11], which only demands that $\rho(w)$ is squarefree for every squarefree word $w \in A^*$ of length exactly 3. A short calculation reveals that this condition is equivalent to (i), because every squarefree word of length smaller than 3 occurs as a factor of a squarefree word of length 3.
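Corollary 2 turns the squarefreeness of a uniform morphism into a finite check. A minimal sketch of that check follows; the two example morphisms are toy cases chosen here for illustration (a cyclic relabelling of the letters, which is trivially squarefree, and a 2-uniform map that fails already on the word 01), and all function names are hypothetical.

```python
from itertools import product


def has_square(word):
    """Return True if some factor of `word` has the form uu with u non-empty."""
    n = len(word)
    for period in range(1, n // 2 + 1):
        for start in range(n - 2 * period + 1):
            if word[start:start + period] == word[start + period:start + 2 * period]:
                return True
    return False


def squarefree_words(alphabet, max_len):
    """Yield all squarefree words over `alphabet` of length 1..max_len."""
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if not has_square(word):
                yield word


def uniform_morphism_is_squarefree(rho):
    """Criterion of Corollary 2: test the images of squarefree words of length <= 3."""
    if len({len(image) for image in rho.values()}) != 1:
        raise ValueError("Corollary 2 only applies to uniform morphisms")
    for word in squarefree_words(sorted(rho), 3):
        image = "".join(rho[letter] for letter in word)
        if has_square(image):
            return False
    return True


cyclic_shift = {"0": "1", "1": "2", "2": "0"}    # toy example: relabels the letters
overlapping = {"0": "01", "1": "12", "2": "20"}  # toy example: rho(01) = 0112 contains 11

print(uniform_morphism_is_squarefree(cyclic_shift))
print(uniform_morphism_is_squarefree(overlapping))
```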
For the next characterisation, we need the notion of a pre-square with respect to a morphism ρ. Let $A$ be an alphabet, $w \in A^*$ a squarefree word and $\rho: A^* \to B^*$ a morphism. A factor $u \neq \varepsilon$ of $\rho(w) = \alpha u \beta$ is called a pre-square with respect to ρ if there exists a word $w' \in A^*$ satisfying: $ww'$ is squarefree and $u$ is a prefix of $\beta\rho(w')$, or $w'w$ is squarefree and $u$ is a suffix of $\rho(w')\alpha$. Obviously, if $u$ is a pre-square, then either $\rho(ww')$ or $\rho(w'w)$ contains $u^2$ as a factor.
Theorem 3 (Crochemore [9]). A morphism $\rho: A^* \to B^*$ is squarefree if and only if
(i) $\rho(w)$ is squarefree for every squarefree word $w \in A^*$ of length $|w| \le 3$;
(ii) for any $a \in A$, $\rho(a)$ does not have any internal pre-squares.
It follows that, for a ternary alphabet $A$, a finite test-set exists, as specified in the following corollary. However, the subsequent theorem shows that, as soon as we consider an alphabet with $\mathrm{Card}(A) > 3$, no such finite test-sets exist, so the situation becomes more complex when considering larger alphabets.
Corollary 4 (Crochemore [9]). Let $\mathrm{Card}(A) = 3$. A morphism $\rho: A^* \to B^*$ is squarefree if and only if $\rho(w)$ is squarefree for every squarefree word $w \in A^*$ of length $|w| \le 5$.
Theorem 5 (Crochemore [9]). Let $\mathrm{Card}(A) > 3$. For any integer n, there exists a morphism $\rho: A^* \to B^*$ which is not squarefree, but maps all squarefree words of length up to n to squarefree words.

3.2. Characterisations of cubefree and k-powerfree morphisms

We now move on to characterisations of cubefree and k-powerfree morphisms for k > 3 . We start with a recent result on cubefree binary morphisms.
Theorem 6 (Richomme, Wlazinski [23]). A set $T \subset \{a,b\}^*$ is a test-set for cubefree morphisms from $A^* = \{a,b\}^*$ to $B^*$ with $\mathrm{Card}(B) \ge 2$ if and only if $T$ is cubefree and $\mathrm{Fact}(T) \supset T_{\min}$, where
\[ T_{\min} := \{ abbabba,\ baabaab,\ ababba,\ babaab,\ abbaba,\ baabab,\ aabba,\ bbaab,\ abbaa,\ baabb,\ ababa,\ babab \}. \]
Obviously, the set $T_{\min}$ itself is a test-set for cubefree binary morphisms. Another test-set is the set of cubefree words of length 7, as each word of $T_{\min}$ appears as a factor of this set. There are even single words which contain all the elements of $T_{\min}$ as factors. For instance, the cubefree word $aabbababbabbaabaababaabb$ is one of the 56 words of length 24 which are test-sets for cubefree morphisms on $\{a,b\}$. The length of this word is optimal: no cubefree word of length 23 contains all the words of $T_{\min}$ as factors.
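As an illustration of how Theorem 6 is used in practice, the sketch below tests a binary morphism by applying it to the twelve words of $T_{\min}$, and confirms that the word of length 24 quoted above is cubefree and contains every element of $T_{\min}$ as a factor. The function names are illustrative, and the second morphism is a deliberately bad toy example.

```python
T_MIN = [
    "abbabba", "baabaab", "ababba", "babaab", "abbaba", "baabab",
    "aabba", "bbaab", "abbaa", "baabb", "ababa", "babab",
]

TEST_WORD = "aabbababbabbaabaababaabb"  # the word of length 24 quoted in the text


def has_cube(word):
    """Return True if some factor of `word` has the form uuu with u non-empty."""
    n = len(word)
    for period in range(1, n // 3 + 1):
        for start in range(n - 3 * period + 1):
            if word[start:start + 3 * period] == word[start:start + period] * 3:
                return True
    return False


def binary_morphism_is_cubefree(rho):
    """Theorem 6: a morphism on {a, b} is cubefree iff the images of T_MIN are cubefree."""
    return all(
        not has_cube("".join(rho[letter] for letter in word)) for word in T_MIN
    )


thue_morse = {"a": "01", "b": "10"}
doubling = {"a": "00", "b": "1"}  # toy example: the image of aabba contains 000

print(all(t in TEST_WORD for t in T_MIN), has_cube(TEST_WORD))
print(binary_morphism_is_cubefree(thue_morse))
print(binary_morphism_is_cubefree(doubling))
```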
The following sufficient characterisation of k-powerfree morphisms generalises Theorem 1 to integer powers k > 2 .
Theorem 7 (Bean et al. [8]). Let $\rho: A^* \to B^*$ be a morphism for alphabets $A$ and $B$ and let $k > 2$. Then ρ is k-powerfree if
(i) $\rho(w)$ is k-powerfree whenever $w \in A^*$ is k-powerfree and of length $|w| \le k + 1$;
(ii) $a = b$ whenever $a, b \in A$ with $\rho(a)$ a factor of $\rho(b)$;
(iii) the equality $x\rho(a)y = \rho(b)\rho(c)$, with $a, b, c \in A$ and $x, y \in B^*$, implies that either $x = \varepsilon$, $a = b$ or $y = \varepsilon$, $a = c$.
As in the squarefree case above, a uniform morphism ρ for which (i) holds also meets (ii), because uniformity implies that $\rho(a) = \rho(b)$ whenever $\rho(a)$ is a factor of $\rho(b)$. If $a \neq b$, the word $a^{k-1}b$ is k-powerfree but $\rho(a^{k-1}b) = \rho(a)^k$ is a k-power, which contradicts (i). The condition (iii) means that, for all letters $a \in A$, the images $\rho(a)$ do not occur as an inner factor of $\rho(bc)$ for any $b, c \in A$. In general, this is not necessary for uniform morphisms; an example is given by the Thue-Morse morphism ρ of Eq. (1). For instance, $\rho(00) = 0101 = 0\rho(1)1$, which violates condition (iii) in Theorem 7. Nevertheless, the Thue-Morse morphism is cubefree [1].
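Condition (iii) of Theorem 7 can be checked mechanically by searching every product $\rho(b)\rho(c)$ for occurrences of an image $\rho(a)$. The following sketch (with the hypothetical helper name `condition_iii_violations`) lists all decompositions $x\rho(a)y = \rho(b)\rho(c)$ that are not of the two permitted forms, and recovers the violation quoted above for the Thue-Morse morphism.

```python
from itertools import product


def condition_iii_violations(rho):
    """List all (a, b, c, x, y) with x rho(a) y == rho(b) rho(c) that violate (iii)."""
    violations = []
    for b, c in product(rho, repeat=2):
        target = rho[b] + rho[c]
        for a in rho:
            image = rho[a]
            for i in range(len(target) - len(image) + 1):
                if target[i:i + len(image)] != image:
                    continue
                x, y = target[:i], target[i + len(image):]
                # Permitted: rho(a) is the first factor rho(b) or the last factor rho(c).
                allowed = (x == "" and a == b) or (y == "" and a == c)
                if not allowed:
                    violations.append((a, b, c, x, y))
    return violations


thue_morse = {"0": "01", "1": "10"}
for a, b, c, x, y in condition_iii_violations(thue_morse):
    print(f"rho({b}{c}) = {x} rho({a}) {y}")
```

For the Thue-Morse morphism this reports the decomposition of $\rho(00)$ from the text together with its mirror image under the exchange of the two letters.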
Alphabets with $\mathrm{Card}(B) < 2$ only provide trivial results, because the only k-powerfree morphism from $A^*$ to $\{\varepsilon\}^*$ is the empty morphism ε, and for $\mathrm{Card}(B) = 1$ the only additional morphism is the map for $\mathrm{Card}(A) = 1$ that maps the single element in $A$ to the single letter in $B$. From now on, we consider alphabets with $\mathrm{Card}(B) \ge 2$. First, we deal with the case $\mathrm{Card}(A) \ge 3$.
Theorem 8 (Richomme, Wlazinski [23]). Given two alphabets $A$ and $B$ such that $\mathrm{Card}(A) \ge 3$ and $\mathrm{Card}(B) \ge 2$, and given any integer $k \ge 3$, there is no finite test-set for k-powerfree morphisms from $A^*$ to $B^*$.
This again is a negative result, which shows that the general situation is difficult to handle. In general, no finite set of words suffices to verify the k-powerfreeness of a morphism. The situation improves if we restrict ourselves to uniform morphisms, and look for test-sets for this restricted class of morphisms only. Here, a test-set for k-powerfreeness of uniform morphisms on $A^*$ is a set $T \subset A^*$ such that, for every uniform morphism ρ on $A^*$, ρ is k-powerfree if and only if $\rho(T)$ is k-powerfree.
The existence of finite test-sets of this type was recently established by Richomme and Wlazinski [28]. Let $\mathrm{Card}(A) \ge 2$ and $k \ge 3$ be an integer. Define
\[ T^{(k)}(A) := U^{(k)}(A) \cup \bigl( F^{(k)}(A) \cap V^{(k)}(A) \bigr) \]
where $U^{(k)}(A)$ is the set of k-powerfree words over $A$ of length at most $k + 1$, and $V^{(k)}(A)$ is the set of words over $A$ that can be written in the form $a_0 w_1 a_1 w_2 \cdots a_{k-1} w_k a_k$ with letters $a_0, a_1, \ldots, a_k \in A$ and words $w_1, w_2, \ldots, w_k \in A^*$ which contain every letter of $A$ at most once and satisfy $\bigl| |w_i| - |w_j| \bigr| \le 1$. Obviously, this set is finite and comprises words with a maximum length of
\[ \max \{ |w| : w \in T^{(k)}(A) \} \le k \, (\mathrm{Card}(A) + 1) + 1 . \]
Theorem 9 (Richomme, Wlazinski [28]). Let $\mathrm{Card}(A) \ge 2$ and $k \ge 3$ be an integer. The finite set $T^{(k)}(A)$ is a test-set for k-powerfreeness of uniform morphisms on $A^*$.
Due to the upper bound on the maximum length of words in $T^{(k)}(A)$, the following corollary is immediate.
Corollary 10 (Richomme, Wlazinski [28]). A uniform morphism ρ on $A^*$ is k-powerfree for an integer power $k \ge 3$ if and only if $\rho(w)$ is k-powerfree for all k-powerfree words $w$ of length at most $k \, (\mathrm{Card}(A) + 1) + 1$.
Although this result provides an explicit test-set for k-powerfreeness, it is of limited practical use, simply because the test-set becomes large very quickly. Already for $\mathrm{Card}(A) = 4$ and $k = 3$, the set $T^{(3)}(A)$ has 26247020 elements. For comparison, the set of cubefree words in four letters of length 16, as required in Corollary 10, has 1939267560 elements, so is still much larger.
Finally, let us quote the following result of Keränen [38], which characterises k-powerfree binary morphisms and indicates that the test-set of Theorem 9 is far from optimal.
Theorem 11 (Keränen [38]). Let $\rho: \{a,b\}^* \to B^*$ be a uniform morphism with $\rho(a) \neq \rho(b)$ and primitive words $\rho(a)$, $\rho(b)$ and $\rho(ab)$. For every k-powerfree word $w \in \{a,b\}^*$, $\rho(w)$ is k-powerfree if and only if $\rho(v)$ is k-powerfree for every subword $v$ of $w$ with
\[ |v| \le \begin{cases} 4 & \text{for } 3 \le k \le 6; \\ \tfrac{2}{3}(k+1) & \text{for } k \ge 7. \end{cases} \]

4. Entropy of powerfree words

Let $A$ be an alphabet. A subset $X \subset A^*$ is called factorial if for any word $x \in X$ all factors of $x$ are also contained in $X$. Define for a factorial subset $X \subset A^*$ the number of words of length n occurring in $X$ by $c_X(n)$. This number gives some idea of the complexity of $X$: the larger the number of words of length n, the more diverse or complicated is the set. That is why $c_X: \mathbb{N} \to \mathbb{N}$ is called the complexity function of $X$.
Definition 12. The (combinatorial) entropy of an infinite factorial set $X \subset A^*$ is defined by
\[ h(X) = \lim_{n \to \infty} \frac{1}{n} \log c_X(n) . \]
The requirement that $X$ is factorial ensures the existence of the limit, see for example [39, Lemma 1].
We note the following:
(i) if $X \subset A^*$ with $\mathrm{Card}(A) = r$, then $1 \le c_X(n) \le r^n$ for all n, which implies $0 \le h(X) \le \log r$;
(ii) if $X = A^*$ with $\mathrm{Card}(A) = r$, then $c_X(n) = r^n$ and $h(X) = \log r$.
The set of k-powerfree words $F^{(k)}(A)$ over an alphabet $A$ is obviously a factorial subset of $A^*$, which is infinite for suitable values of k for a given alphabet $A$. The precise value of the corresponding entropy, which coincides with the topological entropy [40], is not known, but lower and upper bounds exist for many cases. Recently, much improved upper and lower bounds have been established for $h(F^{(2)}(\{0,1,2\}))$ and $h(F^{(3)}(\{0,1\}))$, which will be outlined below. Generally, it is easier to find upper bounds than to give lower bounds, due to the factorial nature of the set of k-powerfree words, so we start with describing several methods to produce upper bounds on the entropy.

4.1. Upper bounds for the entropy

A simple way to provide upper bounds is based on the enumeration of the set of k-powerfree words up to some length. Clearly, for the case of $r = \mathrm{Card}(A)$ letters, the number of words $c(n) := c_{F^{(k)}(A)}(n)$ is bounded by $r^n$, so the corresponding entropy is $h := h(F^{(k)}(A)) \le \log r$, as mentioned above. Suppose we know the actual value of $c(n)$ for some fixed n. Then, due to the factorial nature of the set $F^{(k)}(A)$,
\[ c(mn) \le c(n)^m \]
for any $m \ge 1$. Hence
\[ h = \lim_{m \to \infty} \frac{\log c(mn)}{mn} \le \frac{\log c(n)}{n} , \tag{2} \]
which, for any n, yields an upper bound for h. Obviously, the larger the value of n, the better the bound obtained in this way. In some cases, the bound can be slightly improved by considering words that overlap in a couple of letters; see [39] for an example.
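The enumeration bound is simple enough to try out directly. The sketch below builds k-powerfree words one letter at a time (only the suffixes of the extended word need to be inspected, since all other factors were already checked) and prints the resulting bound $\log c(n)/n$; the function names are illustrative, and keeping every word in memory limits this naive version to small n.

```python
from math import log


def ends_with_k_power(word, k):
    """True if some non-empty suffix of `word` is a k-power."""
    n = len(word)
    for period in range(1, n // k + 1):
        if word[n - k * period:] == word[n - period:] * k:
            return True
    return False


def count_powerfree(alphabet, k, max_len):
    """Return [c(0), c(1), ..., c(max_len)] for k-powerfree words over `alphabet`."""
    counts = [1]          # the empty word
    current = [""]
    for _ in range(max_len):
        current = [
            word + letter
            for word in current
            for letter in alphabet
            if not ends_with_k_power(word + letter, k)
        ]
        counts.append(len(current))
    return counts


for name, alphabet, k in [("binary cubefree", "01", 3), ("ternary squarefree", "012", 2)]:
    counts = count_powerfree(alphabet, k, 12)
    n = len(counts) - 1
    print(name, counts)
    print("  upper bound log c(n)/n for n =", n, ":", log(counts[n]) / n)
```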
Sharper upper bounds can be produced by following a different approach, namely by considering a set of words that do not contain k-powers of a fixed finite set of words, for instance k-powers of all words up to a given length. This limitation means that the number of forbidden words is finite, and that the resulting factorial set contains the set of k-powerfree words, so its entropy provides an upper bound for the entropy of the latter. Again, by increasing the number of forbidden words, the bounds can be systematically improved.
As Noonan and Zeilberger pointed out [41], it is possible to calculate the generating function for the numbers of words avoiding a finite set of forbidden words by solving a system of linear equations. The generating functions are rational functions, and the location of the pole closest to the origin determines the radius of convergence, and hence the entropy of the corresponding set of words. This approach has been applied in Ref. [26] to derive an upper bound for the set of squarefree words in three letters, and generating functions for cubefree words in two letters are discussed below.
A related, though computationally easier approach is based on a Perron-Frobenius argument. It is sometimes referred to as the ‘transfer matrix’ or the ‘cluster’ approach. Here, a matrix is constructed, which determines how k-powerfree words of a given length can be concatenated to form k-powerfree words, and the growth rate is then determined by the maximum eigenvalue of this matrix. Both methods yield upper bounds that can be improved by increasing the length of the words involved, and in principle can approximate the entropy arbitrarily well, though in practice this is limited by the computational problem of computing the leading eigenvalue of a large matrix, or solving a large system of linear equations; see, for instance, [27] for details.
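To make the Perron-Frobenius idea concrete, here is a deliberately simplified sketch; it is not the cluster method of [27]. It considers words all of whose factors of a fixed length `window` are k-powerfree, uses the k-powerfree words of length `window - 1` as states of a transfer matrix, and estimates the leading eigenvalue by power iteration. The function names and the iteration count are assumptions of this sketch, and it presumes that the transfer matrix has a single dominant eigenvalue and that admissible words of every length exist.

```python
from itertools import product
from math import log


def contains_k_power(word, k):
    """True if some factor of `word` is a k-power of a non-empty word."""
    n = len(word)
    for period in range(1, n // k + 1):
        for start in range(n - k * period + 1):
            if word[start:start + k * period] == word[start:start + period] * k:
                return True
    return False


def growth_rate_bound(alphabet, k, window, iterations=300):
    """Growth rate of words whose factors of length `window` are all k-powerfree."""
    states = ["".join(t) for t in product(alphabet, repeat=window - 1)]
    states = [s for s in states if not contains_k_power(s, k)]
    # Transition s -> (s + a)[1:] whenever the window s + a is k-powerfree.
    edges = {
        s: [(s + a)[1:] for a in alphabet if not contains_k_power(s + a, k)]
        for s in states
    }
    weights = {s: 1.0 for s in states}
    rate = 0.0
    for _ in range(iterations):
        new = {s: 0.0 for s in states}
        for s, w in weights.items():
            for t in edges[s]:
                new[t] += w
        total = sum(new.values())
        rate = total / sum(weights.values())
        weights = {s: w / total for s, w in new.items()}  # normalise to avoid overflow
    return rate


rate3 = growth_rate_bound("01", 3, 3)   # forbids only 000 and 111
rate9 = growth_rate_bound("01", 3, 9)   # forbids all cubes of length at most 9
print(log(rate3), log(rate9))
```

With `window = 3` only the cubes 000 and 111 are excluded, so the estimate reproduces the golden-ratio growth rate of binary words without three equal consecutive letters; larger windows can only lower the bound.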

4.2. Lower bounds for the entropy

Until very recently, all methods used to prove that the entropy of k-powerfree words is positive and to establish lower bounds on the entropy were based on k-powerfree morphisms. Clearly, a k-powerfree morphism, iterated on a single letter, produces k-powerfree words of increasing length and suffices to show the existence of infinite k-powerfree words. For example, the fact that the Thue-Morse morphism (1) is cubefree shows the existence of cubefree words of arbitrary length in two letters. To prove that the entropy is actually positive, one has to show that the number of k-powerfree words grows exponentially with their length. Essentially, this is achieved by considering k-powerfree morphisms from a larger alphabet. The following theorem is a generalisation of Brandenburg’s method, compare [11], and provides a path to produce lower bounds for the entropy of k-powerfree words.
Theorem 13. Let $A$ and $B$ be alphabets with $\mathrm{Card}(A) = r \, \mathrm{Card}(B)$, where $r > 1$ is an integer. If there exists an ℓ-uniform k-powerfree morphism $\rho: A^* \to B^*$, then
\[ h(F^{(k)}(B)) \ge \frac{\log r}{\ell - 1} . \]
Proof. For this proof define $h := h(F^{(k)}(B))$, $c(n) := c_{F^{(k)}(B)}(n)$ and $s := \mathrm{Card}(B)$. Label the elements of $A$ as $\{a_{11}, \ldots, a_{1r}, a_{21}, \ldots, a_{2r}, \ldots, a_{s1}, \ldots, a_{sr}\}$ and the elements of $B$ as $\{b_1, \ldots, b_s\}$. Define the map $\phi: A^* \to B^*$ as $\phi(a_{ij}) := b_i$ for $i = 1, \ldots, s$ and $j = 1, \ldots, r$. Hence $\mathrm{Card}(\phi^{-1}(b_i)) = r$. Every k-powerfree word of length m over $B$ has $r^m$ different preimages under ϕ which, by construction, consist only of k-powerfree words. These words are mapped by ρ, which is injective due to its k-powerfreeness, to different k-powerfree words of length $\ell m$ over $B$. This implies the inequality
\[ c(\ell m) \ge r^m \, c(m) \tag{3} \]
for any $m > 0$. This means
\[ \left( \frac{c(\ell m)}{c(m)} \right)^{1/m} \ge r , \]
and hence
\[ \ell \, \frac{\log c(\ell m)}{\ell m} - \frac{\log c(m)}{m} \ge \log r \]
for any $m > 0$. Taking the limit as $m \to \infty$ gives
\[ (\ell - 1) \, h \ge \log r , \]
thus establishing the lower bound. ☐
This result means that, whenever we can find a uniform k-powerfree morphism from a sufficiently large alphabet, it provides a lower bound for the entropy. Clearly, the larger r and the smaller ℓ, the better the bound, so one is particularly interested in uniform k-powerfree morphisms from large alphabets of minimal length.
Another method due to Brinkhuis [12], which is related to Brandenburg’s method, can be generalised as follows. Let again $B = \{b_1, \ldots, b_s\}$ be an alphabet and $r \in \mathbb{N}$. For $i = 1, \ldots, s$ let
\[ U_i := \{ U_{i,1}, U_{i,2}, \ldots, U_{i,r} \} \]
with $U_{i,j} \in F^{(k)}_{\ell}(B)$, where the latter denotes the words in $F^{(k)}(B)$ which have length ℓ. The set $U = \{U_1, \ldots, U_s\}$ is called a $(k, \ell, r)$-Brinkhuis-set if the ℓ-uniform substitution (in the context of formal language theory) ϕ from $B^*$ to itself defined by
\[ \phi: b_i \mapsto U_i \quad \text{for } i = 1, \ldots, s \]
has the property $\phi(F^{(k)}(B)) \subset F^{(k)}(B)$. In other words, $U$ is a $(k, \ell, r)$-Brinkhuis-set if the substitution of every letter $b_i$, occurring in a k-powerfree word, by an element of $U_i$ results in a k-powerfree word over $B$. The existence of a $(k, \ell, r)$-Brinkhuis-set delivers the lower bound
\[ h(F^{(k)}(B)) \ge \frac{\log r}{\ell - 1} \]
because every k-powerfree word of length m is mapped to $r^m$ k-powerfree words of length $\ell m$; compare Eq. (3).
The method of Brinkhuis is stronger than the method of Brandenburg. Not every $(k, \ell, r)$-Brinkhuis-set implies a map according to Theorem 13, see [42, p. 287] for an example. Conversely, if there exists an ℓ-uniform k-powerfree morphism $\rho: A^* \to B^*$ according to Theorem 13, then there exists a $(k, \ell, r)$-Brinkhuis-set, namely $U_i = \{\rho(a_{i1}), \ldots, \rho(a_{ir})\}$ for $i = 1, \ldots, s$, with the notation of Theorem 13.
Brinkhuis’ method was applied in Refs. [21,43,44]; see also below for a summary of bounds obtained for binary cubefree and ternary squarefree words. These bounds have in common that they are nowhere near the actual value of the entropy, and while a systematic improvement is possible by increasing the value of r in Theorem 13 (which, however, also means that one has to consider larger values of ℓ), it will always result in a much smaller growth rate, because only a subset of words is obtained in this way.
Recently, a different approach has been proposed [29], based essentially on the derivation of an inequality
\[ S_m(n+1) \ge \alpha \, S_m(n) \]
for the weighted sum $S_m(n)$ of the number of elements in a certain subset (which depends on the choice of $m \in \mathbb{N}$) of squarefree (resp. cubefree) words of length n over a ternary (resp. binary) alphabet and a parameter $\alpha > 1$ which satisfies two inequalities for $i = m, m+1, \ldots, n-1$. The estimation of $S_m(n+1)$ starts from a Perron-Frobenius argument and concludes with the observation that the order of growth of the number of squarefree (resp. cubefree) words cannot be less than the order of growth of $S_m(n)$, which is α. This implies
\[ h(F^{(k)}(A)) > \log(\alpha) \]
for $k = 2$ and $A = \{0,1,2\}$ or $k = 3$ and $A = \{0,1\}$, with the corresponding values for α. In the end, this method leads to a recipe to check, with a computer, several conditions for the parameters (including m), which ensure that the inequality for $S_m$ holds. By increasing the parameter m, it appears to be possible to estimate the growth rate of cubefree and squarefree words with an arbitrary precision. For details, we refer the reader to Ref. [29].

5. Bounds on the entropy of binary cubefree and ternary squarefree words

We now consider the two main examples, binary cubefree and ternary squarefree words, in more detail, reviewing the bounds derived by the various approaches mentioned above. We start with the discussion of binary cubefree words, and then give a brief summary of the analogous results for ternary squarefree words.

5.1. Binary cubefree words

Define for this section $b(n) := c_{F^{(3)}(\{0,1\})}(n)$ as the number of binary cubefree words of length n and $h := h(F^{(3)}(\{0,1\}))$ as the entropy of cubefree words over the alphabet $\{0,1\}$. The values for $b(n)$ with $n \le 47$ are given in [45]; an extended list for $n \le 80$ is shown in Table 1. They were obtained by a straightforward iterative construction of cubefree words, appending a single letter at a time. According to Eq. (2), the corresponding upper limit for the entropy h is
\[ h \le \frac{\log b(80)}{80} \approx 0.389855 . \]
For comparison, the limit obtained using the number of words of length 79 is 0.390020, which indicates that these limits are still considerably larger than the actual value of h. As in the case of ternary squarefree words [26], the asymptotic behaviour of $b(n)$ fits a simple form $b(n) \sim A \, x_c^{-n}$ as $n \to \infty$, pointing at a simple pole as the dominating singularity of the corresponding generating function at $x = x_c$. The estimated values of the coefficients are $A \approx 2.847$ and $x_c^{-1} \approx 1.4575773$, leading to a numerical estimate of $h = -\log(x_c) \approx 0.3767757$ for the entropy.
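The two bounds quoted above follow from the last two entries of Table 1 by plain arithmetic. The short check below also prints the ratio $b(80)/b(79)$ as a crude growth-rate estimate; this ratio is used here only for comparison and is not the fitting procedure behind the quoted values of $A$ and $x_c$.

```python
from math import log

# Values taken from Table 1.
b79 = 24060906866922
b80 = 35070631260904

print("log b(80) / 80 =", log(b80) / 80)
print("log b(79) / 79 =", log(b79) / 79)

# A crude estimate of the growth rate from two consecutive counts.
ratio = b80 / b79
print("b(80) / b(79)  =", ratio)
print("log of ratio   =", log(ratio))
```

The logarithm of the ratio lies far closer to the numerical estimate of h than either enumeration bound, which illustrates how slowly $\log b(n)/n$ converges.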
Let us compare this with the upper limit derived from the generating function of the number of binary length-p cubefree words. To this end, let $b_p(n) := c_{F^{(3,p)}(\{0,1\})}(n)$ denote the number of length-p cubefree words of length n, and define
\[ B_p(x) = \sum_{n=0}^{\infty} b_p(n) \, x^n \tag{5} \]
to be the generating function for the number of binary length-p cubefree words. These functions of x are rational [41]. The first few generating functions read
\[ \begin{aligned}
B_0(x) &= \frac{1}{1 - 2x} = 1 + 2x + 4x^2 + 8x^3 + 16x^4 + 32x^5 + 64x^6 + \ldots , \\
B_1(x) &= \frac{1 + x + x^2}{1 - x - x^2} = 1 + 2x + 4x^2 + 6x^3 + 10x^4 + 16x^5 + 26x^6 + \ldots , \\
B_2(x) &= \frac{1 + 2x + 3x^2 + 3x^3 + 3x^4 + 3x^5 + 2x^6}{1 - x^2 - x^3 - x^4 - x^5} = 1 + 2x + 4x^2 + 6x^3 + 10x^4 + 16x^5 + 24x^6 + \ldots
\end{aligned} \]
The degrees of the numerator and denominator polynomials for $p \le 14$ are given in Table 2. The generating functions $B_p(x)$ have a finite radius of convergence, determined by the location of the zero $x_c$ of the denominator polynomial which lies closest to the origin. A plot of the location of poles of $B_{14}(x)$ is shown in Figure 1. It very much resembles the analogous distribution for ternary squarefree words [26]; again, the poles seem to accumulate, with increasing p, on or near the unit circle, which may indicate the presence of a natural boundary beyond which the generating function for cubefree binary words (corresponding to taking $p \to \infty$) cannot be analytically continued; see [26] for a discussion of this phenomenon in the case of ternary squarefree words.
Table 1. The number $b(n)$ of binary cubefree words of length n for $n \le 80$.

| n | b(n) | n | b(n) | n | b(n) | n | b(n) |
|---|---|---|---|---|---|---|---|
| 1 | 2 | 21 | 7754 | 41 | 14565048 | 61 | 27286212876 |
| 2 | 4 | 22 | 11320 | 42 | 21229606 | 62 | 39771765144 |
| 3 | 6 | 23 | 16502 | 43 | 30943516 | 63 | 57970429078 |
| 4 | 10 | 24 | 24054 | 44 | 45102942 | 64 | 84496383550 |
| 5 | 16 | 25 | 35058 | 45 | 65741224 | 65 | 123160009324 |
| 6 | 24 | 26 | 51144 | 46 | 95822908 | 66 | 179515213688 |
| 7 | 36 | 27 | 74540 | 47 | 139669094 | 67 | 261657313212 |
| 8 | 56 | 28 | 108664 | 48 | 203577756 | 68 | 381385767316 |
| 9 | 80 | 29 | 158372 | 49 | 296731624 | 69 | 555899236430 |
| 10 | 118 | 30 | 230800 | 50 | 432509818 | 70 | 810266077890 |
| 11 | 174 | 31 | 336480 | 51 | 630416412 | 71 | 1181025420772 |
| 12 | 254 | 32 | 490458 | 52 | 918879170 | 72 | 1721435861086 |
| 13 | 378 | 33 | 714856 | 53 | 1339338164 | 73 | 2509125828902 |
| 14 | 554 | 34 | 1041910 | 54 | 1952190408 | 74 | 3657244826158 |
| 15 | 802 | 35 | 1518840 | 55 | 2845468908 | 75 | 5330716904964 |
| 16 | 1168 | 36 | 2213868 | 56 | 4147490274 | 76 | 7769931925578 |
| 17 | 1716 | 37 | 3226896 | 57 | 6045283704 | 77 | 11325276352154 |
| 18 | 2502 | 38 | 4703372 | 58 | 8811472958 | 78 | 16507465616784 |
| 19 | 3650 | 39 | 6855388 | 59 | 12843405058 | 79 | 24060906866922 |
| 20 | 5324 | 40 | 9992596 | 60 | 18720255398 | 80 | 35070631260904 |
As a consequence of Pringsheim’s theorem [46, Sec. 7.2], there is a dominant singularity on the positive real axis; we denote the position of the singularity by $x_c$. For the cases we considered, this simple pole appears to be the only dominant singularity. Since the radius of convergence of the power series $B_p(x)$ is given by $\bigl( \limsup_{n \to \infty} b_p(n)^{1/n} \bigr)^{-1}$, the entropy $h_p$ of the set of binary length-p cubefree words is $h_p = -\log x_c$. Clearly, $h_p \ge h_{p'}$ for $p \le p'$, and $h = \lim_{p \to \infty} h_p$, so for any finite p the entropy $h_p$ provides an upper bound of the entropy h of binary cubefree words. The values of the entropy $h_p$ for $p \le 14$ are given in Table 2. As was observed for ternary squarefree words [26], the values appear to converge very quickly with increasing p, but it is difficult to extract a reliable estimate of the true value of the entropy without making assumptions on the asymptotic behaviour.
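For the three generating functions written out explicitly, the relation $h_p = -\log x_c$ can be checked with a few lines of code. The sketch below expands a rational function into its power series and locates the smallest positive zero of the denominator by bisection; the function names are illustrative, and the bisection assumes that the denominator is positive at 0, negative at 1 and changes sign only once in between, which holds for the three denominators used here.

```python
from math import log


def series(num, den, terms):
    """First `terms` Taylor coefficients of num(x)/den(x), given as coefficient lists."""
    if den[0] != 1:
        raise ValueError("this sketch assumes a denominator with constant term 1")
    coeffs = []
    for n in range(terms):
        value = num[n] if n < len(num) else 0
        for j in range(1, min(n, len(den) - 1) + 1):
            value -= den[j] * coeffs[n - j]
        coeffs.append(value)
    return coeffs


def evaluate(poly, x):
    """Evaluate the polynomial with coefficient list `poly` at x."""
    return sum(c * x**i for i, c in enumerate(poly))


def entropy_from_denominator(den, steps=100):
    """Return -log(x_c), where x_c is the zero of `den` in (0, 1) found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if evaluate(den, mid) > 0:
            lo = mid
        else:
            hi = mid
    return -log((lo + hi) / 2)


B = {
    0: ([1], [1, -2]),
    1: ([1, 1, 1], [1, -1, -1]),
    2: ([1, 2, 3, 3, 3, 3, 2], [1, 0, -1, -1, -1, -1]),
}
for p, (num, den) in B.items():
    print(p, series(num, den, 7), entropy_from_denominator(den))
```

The printed entropies can be compared with the first three rows of Table 2.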
Already in 1983, Brandenburg [11] showed that
\[ 2 \cdot 2^{n/9} \le b(n) \le 2 \cdot 1251^{n/17} \]
which leads in our setting to $0.07701 \le h \le 0.41952$. The currently best upper bounds are due to Edlin [45] and Ochem and Reix [27]. Analysing length-15 cubefree words up to a finite length, Edlin [45] arrives at the bound of $h \le 0.376777$ (which is what we would expect to find if we extended Table 2 to $p = 15$, but this would require huge computational effort to compute the corresponding generating function completely), while using the transfer matrix (or cluster) approach described above, Ochem and Reix obtained an upper bound on the growth rate of 1.45758131, which corresponds to the bound
\[ h \le 0.3767784 \]
on the entropy.
Table 2. The entropy $h_p$ of binary length-$p$ cubefree words, obtained from the radius of convergence of the generating functions $B_p(x)$ of Eq. (5). Here, $d_{\mathrm{num}}$ and $d_{\mathrm{den}}$ denote the degree of the polynomial in the numerator and denominator of $B_p(x)$, respectively.
$p$   $d_{\mathrm{num}}$   $d_{\mathrm{den}}$   $h_p$
0     0      1      0.693147
1     2      2      0.481212
2     6      5      0.427982
3     21     13     0.394948
4     29     17     0.385103
5     43     25     0.380594
6     85     57     0.378213
7     127    99     0.377332
8     165    127    0.377179
9     300    254    0.376890
10    450    395    0.376835
11    569    513    0.376811
12    1098   1031   0.376790
13    1750   1656   0.376783
14    2627   2540   0.376779
Figure 1. Location of poles of the generating function $B_{14}(x)$.
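The first rows of Table 2 can be cross-checked numerically without computing the generating functions. The sketch below assumes that a length-$p$ cubefree word is one that avoids all cubes $xxx$ with $1 \le |x| \le p$ (an assumption consistent with the values for $p = 0$ and $p = 1$ in the table); the function names are illustrative. It counts such words with a finite-state recursion on the last $3p - 1$ letters and estimates $h_p$ from the ratio of consecutive counts, which is a numerical estimate rather than the exact pole position $x_c$.

```python
from math import log


def ends_with_short_cube(word, p):
    """True if `word` ends with a cube xxx whose period |x| is at most p."""
    for period in range(1, p + 1):
        if len(word) >= 3 * period and word[-3 * period:] == word[-period:] * 3:
            return True
    return False


def count_length_p_cubefree(p, n_max):
    """Counts b_p(0..n_max), tracking only the last 3p-1 letters of each word."""
    keep = max(3 * p - 1, 0)
    states = {"": 1}          # suffix -> number of words ending in that suffix
    totals = [1]
    for _ in range(n_max):
        new_states = {}
        for suffix, multiplicity in states.items():
            for letter in "01":
                word = suffix + letter
                if ends_with_short_cube(word, p):
                    continue
                key = word[-keep:] if keep else ""
                new_states[key] = new_states.get(key, 0) + multiplicity
        states = new_states
        totals.append(sum(states.values()))
    return totals


def entropy_estimate(p, n=200):
    """Estimate h_p = -log(x_c) by log(b_p(n) / b_p(n-1))."""
    totals = count_length_p_cubefree(p, n)
    return log(totals[n] / totals[n - 1])


if __name__ == "__main__":
    for p in range(4):
        print(p, round(entropy_estimate(p), 6))
```

The number of states grows like $2^{3p-1}$, so this direct recursion is only a check for small $p$.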
We now move on to the lower bound and cubefree morphisms. We have already seen one example above, the Thue-Morse morphism, which is a cubefree morphism from a binary alphabet to a binary alphabet. As explained above, it is also useful to find uniform cubefree morphisms from larger alphabets, because these provide lower bounds on the entropy. Clearly, if we have a uniform cubefree morphism $\rho\colon A^* \to \{0,1\}^*$ of length $\ell$, with $\mathrm{Card}(A) = r$, it is completely specified by the $r$ words $w_i$, $i = 1, \dots, r$, which are the images of the letters in $A$. Since any permutation of the letters in $A$ will again yield a uniform cubefree morphism, the set $\{w_1, \dots, w_r\} \subset \{0,1\}^{\ell}$ of generating words determines the morphism up to permutation of the letters in $A$.
Moreover, the set $\{\overline{w_1}, \dots, \overline{w_r}\}$, where $\overline{w}$ denotes the image of $w$ under the permutation $0 \leftrightarrow 1$, also defines cubefree morphisms, as does $\{\widetilde{w_1}, \dots, \widetilde{w_r}\}$, where $\widetilde{w}$ denotes the reversal of $w$, i.e., the word $w$ read backwards. This is obvious because the test-sets of Theorem 9 are invariant under these operations. Unless the words are palindromic (which means that $\widetilde{w} = w$), the set $\{w_1, \dots, w_r\}$ thus represents four different morphisms (not taking into account permutations of the letters in $A$), the fourth obtained by performing both operations, yielding $\{\widetilde{\overline{w_1}}, \dots, \widetilde{\overline{w_r}}\}$.
For cubefree morphisms from a three-letter alphabet $A$ to two letters, one needs words of length at least six. For length six, there are twelve inequivalent (with respect to permutations of the letters in $A$) cubefree morphisms. The corresponding sets of generating words are
$$\{w_1, w_2, w_4\}, \quad \{w_2, \overline{w_3}, \widetilde{\overline{w_3}}\}, \quad \{w_2, \overline{w_3}, w_4\},$$
and the corresponding images under the two operations explained above. Here, the four words are
$$w_1 = 001011, \quad w_2 = 001101, \quad w_3 = 010110, \quad w_4 = 011001.$$
It turns out that none of these morphisms actually satisfies the sufficient criterion of Theorem 7, but cubefreeness was verified using the test-set of Theorem 9.
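A quick plausibility check of such a morphism is to apply it to all short cubefree words over the source alphabet and to confirm that the images are cubefree. The sketch below does this for the generating set $\{w_1, w_2, w_4\}$; the letter names `a`, `b`, `c` and the function names are illustrative. This finite check is a necessary condition only and is not the test-set of Theorem 9.

```python
from itertools import product


def is_cubefree(word):
    """True if `word` contains no factor of the form xxx with x non-empty."""
    n = len(word)
    for period in range(1, n // 3 + 1):
        for start in range(n - 3 * period + 1):
            if word[start:start + 3 * period] == word[start:start + period] * 3:
                return False
    return True


def apply_morphism(images, word):
    """Replace every letter of `word` by its image under the morphism."""
    return "".join(images[letter] for letter in word)


def images_stay_cubefree(images, max_len):
    """Check that all cubefree source words up to `max_len` have cubefree images."""
    alphabet = sorted(images)
    for length in range(1, max_len + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if is_cubefree(word) and not is_cubefree(apply_morphism(images, word)):
                return False
    return True


w1, w2, w3, w4 = "001011", "001101", "010110", "011001"
morphism = {"a": w1, "b": w2, "c": w4}

if __name__ == "__main__":
    print(images_stay_cubefree(morphism, 4))
```

A morphism that fails this check for some short word is certainly not cubefree; passing it for a fixed `max_len` does not by itself prove cubefreeness.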
One has to go to length nine to find cubefree morphisms from four to two letters. There are 16 inequivalent morphisms with respect to permutations of the four letters. Explicitly, they are given by the generating sets
$$\{w_1, w_2, \widetilde{\overline{w_2}}, \widetilde{\overline{w_3}}\}, \quad \{w_4, \overline{w_6}, \overline{w_7}, \overline{w_9}\}, \quad \{w_5, \overline{w_5}, w_8, \overline{w_8}\}, \quad \{w_5, \overline{w_5}, \widetilde{w_8}, \widetilde{\overline{w_8}}\}, \quad \{\overline{w_6}, \widetilde{w_7}, \widetilde{\overline{w_8}}, w_9\}$$
with words
$$w_1 = 001001101, \; w_2 = 001010011, \; w_3 = 001011001, \; w_4 = 001101001, \; w_5 = 010010110, \; w_6 = 010011010, \; w_7 = 010100110, \; w_8 = 011001001, \; w_9 = 011010110.$$
Note that $w_9 = \widetilde{w_9}$ is a palindrome, and that two of the five sets are invariant under the permutation $0 \leftrightarrow 1$, which explains why they only represent 16 different morphisms.
Beyond four letters, the test-set of Theorem 9 becomes unwieldy, but the sufficient criterion of Theorem 7 can be used to obtain morphisms. However, these may not have the optimal length, as the examples here show – again for length nine all morphisms violate the conditions of Theorem 7. Still, this need not be the case; for instance, morphisms from a five-letter alphabet that satisfy the sufficient criterion exist for length 12, which in this case is the optimal length.
As a consequence of Theorem 13, the morphisms (6) from a four-letter alphabet show that the entropy of cubefree binary words is positive, and that
$$h \ge \frac{\log 2}{8} \approx 0.08664.$$
Using the sufficient condition, this bound can be improved. For instance, for length 15, one can find cubefree morphisms from 10 letters, which yields a lower bound of
$$h \ge \frac{\log 5}{14} \approx 0.11496.$$
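The numerical values of these bounds are easy to reproduce. The sketch below assumes the form $\log r / (\ell - 1)$ of the bound, with $\ell$ the length of the morphism and $r$ the number of source letters per target letter (so $r = 4/2$ and $r = 10/2$ in the two examples above); the function name is illustrative. It also evaluates the exponents in Brandenburg's bounds.

```python
from math import log


def morphism_lower_bound(r, length):
    """Entropy lower bound log(r) / (length - 1) from a uniform morphism."""
    return log(r) / (length - 1)


# Four source letters, binary target alphabet, length 9.
bound_four_letters = morphism_lower_bound(2, 9)
# Ten source letters, binary target alphabet, length 15.
bound_ten_letters = morphism_lower_bound(5, 15)

# Exponents of Brandenburg's bounds 2 * 2^(n/9) <= b(n) <= 2 * 1251^(n/17).
brandenburg_lower = log(2) / 9
brandenburg_upper = log(1251) / 17

if __name__ == "__main__":
    print(f"{bound_four_letters:.5f} {bound_ten_letters:.5f}")
    print(f"{brandenburg_lower:.5f} {brandenburg_upper:.5f}")
```

Both morphism bounds remain far below the upper bound $h \le 0.3767784$, which illustrates the size of the gap before Kolpakov's work.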
However, a large step towards closing the gap between these lower bounds and the upper bound was achieved by Kolpakov [29]. With his approach, a lower bound of
$$h \ge 0.37676,$$
which is the best lower bound so far, has been established. The difference between this bound and the upper bound $0.3767784$ by Ochem and Reix [27] is of the order of $10^{-5}$, showing the huge improvement over the previously available estimates.

5.2. Ternary squarefree words

Denote by $a(n) := c_{F^{(2)}(\{0,1,2\})}(n)$ the number of ternary squarefree words of length $n$ and by $a_p(n)$ the number of length-$p$ squarefree words of length $n$. For this section, let $h := h_{F^{(2)}(\{0,1,2\})}$ be the entropy of squarefree words over the alphabet $\{0,1,2\}$. See [39] for a list of $a(n)$ for $n \le 90$ and [21] for $91 \le n \le 110$. The generating functions are defined in analogy to the binary cubefree case. The first four of them are stated in [26, Sec. 3], which also contains a list of their radii of convergence for $p \le 24$. Already in 1983, Brandenburg [11] showed that
$$6 \cdot 2^{n/22} \le a(n) \le 6 \cdot 1172^{n/22},$$
which leads in our setting to
$$0.03151 \le h \le 0.32120.$$
In 1999, Noonan and Zeilberger [41] lowered the upper bound to $0.26391$ by means of generating functions for the number of words avoiding squares of up to length 23. Grimm and Richard [26] used the same method to improve the upper bound to $0.263855$. At the moment, the best known upper bound is $0.263740$, which was established by Ochem in 2006 using an approach based on the transfer matrix (or cluster) method; see [27] for details.
In 1998, Zeilberger showed that a Brinkhuis pair of length 18 exists, which by Theorem 13 implies that the entropy is bounded by $h \ge \log(2)/17 \approx 0.04077$ [47]. By going to larger alphabets, this was subsequently improved to $h \ge \log(65)/40 \approx 0.10436$ by Grimm [21] and $h \ge \log(110)/42 \approx 0.11192$ by Sun [44]. Again, the recent work of Kolpakov [29] has made a large difference to the lower bounds; he achieved the best current lower bound, which is $h \ge 0.26369$. The difference between the best known upper and lower bound is now just $5 \times 10^{-5}$.
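To get a feeling for these numbers, the first values of $a(n)$ can be enumerated directly, and each of them gives a crude upper bound: since any factor of a squarefree word is squarefree, $a(m+n) \le a(m)\,a(n)$, and hence $h \le \log a(n) / n$ for every $n$. The sketch below is illustrative only (the function names are not from the cited works) and cannot compete with the bounds quoted above.

```python
from math import log


def ends_with_square(word):
    """Return True if `word` has a suffix of the form xx with x non-empty."""
    n = len(word)
    for period in range(1, n // 2 + 1):
        if word[n - period:] == word[n - 2 * period:n - period]:
            return True
    return False


def count_squarefree(n_max, alphabet="012"):
    """Return [a(0), ..., a(n_max)] for squarefree words over `alphabet`."""
    counts = [1]
    words = [""]
    for _ in range(n_max):
        words = [w + c for w in words for c in alphabet
                 if not ends_with_square(w + c)]
        counts.append(len(words))
    return counts


def crude_upper_bound(counts, n):
    """Upper bound log(a(n)) / n on the entropy, from submultiplicativity."""
    return log(counts[n]) / n


if __name__ == "__main__":
    a = count_squarefree(20)
    print(a[:11])
    print(round(crude_upper_bound(a, 20), 5))
    print(round(log(2) / 17, 5), round(log(65) / 40, 5), round(log(110) / 42, 5))
```

The bound $\log a(n)/n$ decreases only slowly with $n$, which is why the generating function and transfer matrix approaches are needed to reach values close to $0.2637$.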

6. Letter frequencies

For a finite word $w$ of length $n$, the frequency of the letter $a$ is $\#_a(w)/n \in [0,1]$, where $\#_a(w)$ denotes the number of occurrences of the letter $a$ in $w$. In general, infinite $k$-powerfree words need not have well-defined letter frequencies. However, we can define upper and lower frequencies $f_a^+ \ge f_a^-$ of a letter $a \in A$ of a word $w \in A^*$ as
$$f_a^+ := \sup_{\{w_n\}} \limsup_{n\to\infty} \frac{\#_a(w_n)}{n}, \qquad f_a^- := \inf_{\{w_n\}} \liminf_{n\to\infty} \frac{\#_a(w_n)}{n},$$
where $w_n$ is an $n$-letter subword of $w$. Here, we take the supremum and infimum over all sequences $\{w_n\}$. Alternatively, we can compute these frequencies from $a_n^+ = \max_{w_n \subset w} \#_a(w_n)$ and $a_n^- = \min_{w_n \subset w} \#_a(w_n)$ by $f_a^{\pm} = \lim_{n\to\infty} a_n^{\pm}/n$. The limits exist due to the subadditivity of the sequences $\{a_n^+\}$ and $\{1 - a_n^-\}$. If the infinite word $w$ is such that $f_a^+ = f_a^- =: f_a$, we call $f_a$ the frequency of the letter $a$ in $w$.
The requirement that a word is $k$-powerfree for some $k$ restricts the possible letter frequencies. For instance, for cubefree binary words, there cannot be three consecutive zeros, and hence the frequency of the letter 0 is certainly bounded from above by $2/3$. Due to symmetry under permutation of letters, it is bounded from below by $1/3$. In a similar way, considering maximum and minimum frequencies of letters in finite $k$-powerfree words produces bounds on the possible (upper and lower) frequencies of letters in infinite words. It is of interest to determine at which letter frequency $k$-powerfree words cease to exist, and how the entropy of $k$-powerfree words depends on the letter frequency. To answer these questions, $k$-powerfree morphisms are exploited once again, and in two ways. Firstly, the argument using frequencies in finite words only produces ‘negative’ results, in the sense that one can exclude the existence of $k$-powerfree words for certain ranges of the frequency. To show that $k$-powerfree words of a certain frequency actually exist, these are produced as fixed points of $k$-powerfree morphisms. The letter frequency for an infinite word obtained as a fixed point of a morphism $\rho$ on the alphabet $A = \{a_1, a_2, \dots, a_m\}$ is well-defined, and obtained from the (statistically normalised) right Perron-Frobenius eigenvector of the associated $m \times m$ substitution matrix $M$ with elements $M_{ij} = \#_{a_i}(\rho(a_j))$; see for instance [48]. For example, for the Thue-Morse morphism (1), the substitution matrix is $M = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$ with Perron-Frobenius eigenvalue 2 and corresponding eigenvector $(\tfrac{1}{2}, \tfrac{1}{2})^T$, so both letters occur with frequency $1/2$ in the infinite Thue-Morse word.
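The computation of letter frequencies from a substitution matrix can be sketched in a few lines. The code below builds $M_{ij} = \#_{a_i}(\rho(a_j))$ from the images of the letters and approximates the statistically normalised right Perron-Frobenius eigenvector by power iteration; the function names are illustrative, and the iteration is assumed to converge, which holds for primitive matrices such as the Thue-Morse one.

```python
def substitution_matrix(images, alphabet):
    """M[i][j] = number of occurrences of alphabet[i] in the image of alphabet[j]."""
    return [[images[source].count(letter) for source in alphabet]
            for letter in alphabet]


def letter_frequencies(matrix, iterations=200):
    """Approximate the normalised right Perron-Frobenius eigenvector of `matrix`."""
    size = len(matrix)
    vector = [1.0 / size] * size
    for _ in range(iterations):
        image = [sum(matrix[i][j] * vector[j] for j in range(size))
                 for i in range(size)]
        total = sum(image)
        vector = [entry / total for entry in image]   # entries sum to 1
    return vector


thue_morse = {"0": "01", "1": "10"}

if __name__ == "__main__":
    M = substitution_matrix(thue_morse, "01")
    print(M)                      # substitution matrix of the Thue-Morse morphism
    print(letter_frequencies(M))  # frequencies of the letters 0 and 1
```

For larger alphabets, a linear algebra library would be the more robust choice; the plain power iteration is used here only to keep the example free of dependencies.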
To show that there exist exponentially many words with a given letter frequency, or, in other words, that the entropy of the set of k-powerfree words with a given letter frequency is positive, a variant of Theorem 13 is used.
Theorem 14.
Let $A = \{a_{11}, \dots, a_{1r}, a_{21}, \dots, a_{2r}, \dots, a_{s1}, \dots, a_{sr}\}$ and $B = \{b_1, \dots, b_s\}$ be alphabets with $\mathrm{Card}(A) = rs$ and $\mathrm{Card}(B) = s$, where $r, s > 1$ are integers. Assume that there exists an $\ell$-uniform $k$-powerfree morphism $\rho\colon A^* \to B^*$ with
$$\#_b(\rho(a_{ij})) = \#_b(\rho(a_{ij'}))$$
for all $b \in B$, $1 \le i \le s$ and $1 \le j, j' \le r$. Define the $s \times s$ matrix $M$ with elements
$$M_{ij} = \#_{b_i}(\rho(a_{j1})),$$
and denote its right Perron-Frobenius eigenvector (with eigenvalue $\ell$) by $(f_1, \dots, f_s)^T$, with statistical normalisation $f_1 + \dots + f_s = 1$. Then, the entropy $h$ of the set of $k$-powerfree words in $B$ with prescribed letter frequencies $f_i$ of $b_i$, $1 \le i \le s$, is bounded by
$$h \ge \frac{\log r}{\ell - 1}.$$
Proof.
The bound is the same as in Theorem 13, and the statement thus follows by showing that the infinite words obtained from the uniform $k$-powerfree morphism $\rho$ have letter frequencies given by $f_1, \dots, f_s$.
We again introduce the morphism $\phi\colon A^* \to B^*$ by $\phi(a_{ij}) := b_i$ for $i = 1, \dots, s$ and $j = 1, \dots, r$. Every $k$-powerfree word of length $m$ over $B$ has $r^m$ different preimages under $\phi$ which, by construction, consist only of $k$-powerfree words. These words are mapped by $\rho$, which is injective due to its $k$-powerfreeness, to different $k$-powerfree words of length $m\ell$ over $B$. Due to the condition $\#_b(\rho(a_{ij})) = \#_b(\rho(a_{ij'}))$ on $\rho$, the letter statistics do not depend on the choice of the preimage under $\phi$. The letter frequencies of words obtained by the procedure described in the proof of Theorem 13 are thus well defined, and given by the right Perron-Frobenius eigenvector of the $s \times s$ matrix $M$. ☐
Some results for binary cubefree words, as well as a discussion of the empirical frequency distribution of cubefree binary words obtained from the enumeration up to length 80, are detailed below.

6.1. Binary cubefree words

When counting the numbers $b(n)$ of binary cubefree words of length $n$ shown in Table 1, we also counted the number $b(n, n_0)$ of words with $n_0$ occurrences of the letter 0. Clearly, these numbers satisfy
$$b(n) = \sum_{n_0 = 0}^{n} b(n, n_0)$$
and $b(n, n - n_0) = b(n, n_0)$ as a consequence of the symmetry under permutation of letters. Their values for $n = 80$ are given in Table 3.
Table 3. The number of binary cubefree words of length 80 with given excess $e = n_0 - 40$ of the letter 0.
$|e|$   $b(80, 40 + e)$
0 9502419002570
1 7575510051076
2 3805516412947
3 1172047753336
4 210113470848
5 20038955440
6 866998237
7 12460464
8 26819
9 0
Obviously, there are at least 32 and at most 48 occurrences of the letter 0 in any cubefree binary word of length 80, so the frequency of a letter is bounded by $2/5 \le f_0 \le 3/5$. A stronger bound has been obtained by Ochem [30], who showed (amongst many results for a number of rational powers) that $f_0 > \frac{115}{283} \approx 0.40636$, using a backtracking algorithm.
One is interested in locating the minimum frequency $f_{\min}$, such that infinite cubefree words with frequency $f_0 = f_{\min}$ exist, but not for any $f_0 < f_{\min}$. Clearly, the lower bound above is a lower bound for $f_{\min}$. In order to obtain an upper bound, we need to prove the existence of infinite binary cubefree words with a given letter frequency. This is again done by using a cubefree morphism, which provides an infinite word with well-defined letter frequencies. For instance,
$$0 \mapsto 011011010110110011011010110, \qquad 1 \mapsto 011011010110110011010110110$$
is a uniform morphism of length 27 with substitution matrix $\begin{pmatrix} 11 & 11 \\ 16 & 16 \end{pmatrix}$, so the infinite fixed point word has letter frequencies $f_0 = \frac{11}{27}$ and $f_1 = \frac{16}{27}$. Hence we deduce that
$$0.406360 \approx \frac{115}{283} < f_{\min} \le \frac{11}{27} \approx 0.407407.$$
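The letter counts of this morphism and the resulting window for $f_{\min}$ can be verified with exact rational arithmetic. The following sketch only checks the counting; it does not test cubefreeness of the morphism, and the variable names are illustrative.

```python
from fractions import Fraction

images = {
    "0": "011011010110110011011010110",
    "1": "011011010110110011010110110",
}

# Both images have the same letter counts, so every image word, and hence
# the fixed point, has the same proportion of zeros.
counts = {letter: (word.count("0"), word.count("1"))
          for letter, word in images.items()}
length = len(images["0"])

upper = Fraction(counts["0"][0], length)   # frequency of 0 in the fixed point
lower = Fraction(115, 283)                 # lower bound quoted from Ochem [30]
window = upper - lower                     # width of the interval for f_min

if __name__ == "__main__":
    print(counts, length)
    print(float(lower), float(upper), window)
```

The window has width $8/7641 \approx 0.001$, so the minimal frequency of a letter in infinite binary cubefree words is already determined to about three decimal places.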
Using the data from our enumeration of binary cubefree words up to length 80, we can study the empirical distribution for small length, and try to conjecture the behaviour for large words. Figure 2 shows a plot of the normalised data $b(80, 40 + e)/b(80)$ of Table 3, compared with a Gaussian distribution, which appears to fit the data very well. Here, the Gaussian profile was determined from the variance $\sigma^2$ of the data points, which is approximately $\sigma^2 \approx 2.124$.
Figure 2. Distribution of cubefree words of length 80 as a function of the excess $e = n_0 - 40$, compared to a Gaussian distribution with the same variance $\sigma^2$.
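The data of Table 3 allow two simple consistency checks: the entries, counted with the symmetry $e \leftrightarrow -e$, must add up to $b(80)$, and the variance of the excess should reproduce the value quoted above. A minimal sketch, with illustrative names:

```python
# Table 3: |e| -> b(80, 40 + e)
table3 = {
    0: 9502419002570,
    1: 7575510051076,
    2: 3805516412947,
    3: 1172047753336,
    4: 210113470848,
    5: 20038955440,
    6: 866998237,
    7: 12460464,
    8: 26819,
    9: 0,
}


def total_count(table):
    """Sum over all excesses e, using b(n, n/2 + e) = b(n, n/2 - e)."""
    return table[0] + 2 * sum(count for e, count in table.items() if e > 0)


def excess_variance(table):
    """Variance of the excess e; the mean vanishes by symmetry."""
    second_moment = 2 * sum(e * e * count for e, count in table.items())
    return second_moment / total_count(table)


if __name__ == "__main__":
    print(total_count(table3))
    print(round(excess_variance(table3), 3))
    print(max(e for e, count in table3.items() if count > 0))
```

The largest excess with a non-zero count is 8, which corresponds to the range of 32 to 48 occurrences of the letter 0 in words of length 80.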
To draw any conclusions on the limit of large word length, we need to consider the scaling of the distribution with the word length $n$. The first step is to determine how the variance scales with $n$. A plot of the numerical data is given in Figure 3, which shows that, for large $n$, the variance appears to scale linearly with $n$. A least squares fit to the data points for $40 \le n \le 80$ gives a slope of $0.021616$.
Figure 3. Variance of the distribution of the letter frequency in binary cubefree words of length $n$.
Assuming that the distribution for fixed $n$ is Gaussian, the suitably re-scaled data
$$g_n(x) = \sqrt{n}\, \frac{b(n, \tfrac{n}{2} + e)}{b(n)},$$
considered as a function of the rescaled letter excess
$$x = \frac{e}{\sqrt{n}},$$
should approach a Gaussian distribution
$$G(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right)$$
with variance $\sigma^2 \approx 0.021616$. Figure 4 shows a plot of this distribution, together with the data points obtained for $40 \le n \le 80$. Clearly, there are some deviations, which is to be expected because the relationship between the variance and the length shown in Figure 3, while being asymptotically linear, is not a proportionality; however, the overall agreement is reasonable. A plausible conjecture, therefore, is that the scaled distribution becomes Gaussian in the limit of large word length. In terms of the entropy, the observed concentration property is consistent with the entropy maximum occurring at letter frequency $1/2$, and a lower entropy for other letter frequencies. This is similar to the observed and conjectured behaviour for ternary squarefree words in Ref. [26].
Figure 4. The scaled data $g_n(x)$ for lengths $40 \le n \le 80$, compared to a Gaussian distribution $G(x)$.
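The size of the deviations can be illustrated with the $n = 80$ data. The sketch below evaluates the scaled data $g_{80}(x)$ for the three smallest excesses of Table 3 and compares them with $G(x)$ for the fitted slope $\sigma^2 \approx 0.021616$; the function names are illustrative, and only values stated in the text and in Table 3 are used.

```python
from math import exp, pi, sqrt

SLOPE = 0.021616          # least squares slope of the variance, from the text
B80 = 35070631260904      # b(80)
COUNTS = {0: 9502419002570, 1: 7575510051076, 2: 3805516412947}  # Table 3


def gaussian(x, variance=SLOPE):
    """Gaussian density G(x) with mean zero and the given variance."""
    return exp(-x * x / (2 * variance)) / sqrt(2 * pi * variance)


def scaled_point(n, excess, count, total):
    """Return (x, g_n(x)) with x = e / sqrt(n) and g_n = sqrt(n) * b(n, n/2 + e) / b(n)."""
    return excess / sqrt(n), sqrt(n) * count / total


if __name__ == "__main__":
    for e, count in COUNTS.items():
        x, g = scaled_point(80, e, count, B80)
        print(f"e={e}  x={x:.4f}  g_80(x)={g:.4f}  G(x)={gaussian(x):.4f}")
```

At $x = 0$ the data point lies visibly below the Gaussian, in line with the remark that the variance at $n = 80$ (about $2.124/80 \approx 0.0266$) is larger than the fitted slope.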
By an application of Theorem 14, the cubefree morphisms of Eq. (6) show that the entropy for the case of letter frequency $f_0 = f_1 = 1/2$ is positive. More interesting is the case of non-equal letter frequencies. As an example, consider the 13-uniform morphism
$$a_{11} \mapsto 0010010110011, \quad a_{12} \mapsto 0010011010011, \quad a_{21} \mapsto 0010110010011, \quad a_{22} \mapsto 0100101001011,$$
where all words on the right-hand side comprise seven letters 0 and six letters 1. One can check that this morphism satisfies the criterion of Theorem 9, and hence is cubefree. Consequently, the matrix $M$ of Theorem 14 is $M = \begin{pmatrix} 7 & 7 \\ 6 & 6 \end{pmatrix}$, and the letter frequencies of any word constructed by this morphism are $f_0 = 7/13$ and $f_1 = 6/13$. Hence, the set of binary cubefree words with letter frequencies $f_0 = 7/13$ and $f_1 = 6/13$ has positive entropy bounded by $h \ge \frac{1}{6} \log 2 \approx 0.115525$ (and, by symmetry, this also holds for $f_0 = 6/13$ and $f_1 = 7/13$). Again, as in the case of ternary squarefree words discussed in Ref. [26], it is plausible to conjecture that the entropy is positive on an entire interval of letter frequencies around $1/2$ (where it is maximal), presumably on $(f_{\min}, 1 - f_{\min})$.
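The counting condition of Theorem 14 and the resulting frequencies can be checked mechanically for this morphism. In the sketch below, the source letters $a_{ij}$ are represented by index pairs `(i, j)`, and the function names are illustrative; cubefreeness itself is not tested here.

```python
from fractions import Fraction

TARGET = "01"
images = {
    (1, 1): "0010010110011",
    (1, 2): "0010011010011",
    (2, 1): "0010110010011",
    (2, 2): "0100101001011",
}


def letter_vector(word):
    """Numbers of occurrences of the target letters 0 and 1 in `word`."""
    return tuple(word.count(b) for b in TARGET)


def satisfies_counting_condition(images):
    """Check that the images of a_ij and a_ij' have equal letter counts for each i."""
    for i in {index[0] for index in images}:
        vectors = {letter_vector(w) for (row, _), w in images.items() if row == i}
        if len(vectors) != 1:
            return False
    return True


def frequency_matrix(images):
    """Matrix with entries M[i][j] = number of letters TARGET[i] in the image of a_(j+1)1."""
    size = len(TARGET)
    return [[images[(j + 1, 1)].count(TARGET[i]) for j in range(size)]
            for i in range(size)]


def normalised_eigenvector(matrix, iterations=20):
    """Exact power iteration, normalised so that the entries sum to 1."""
    size = len(matrix)
    vector = [Fraction(1, size)] * size
    for _ in range(iterations):
        image = [sum(matrix[i][j] * vector[j] for j in range(size))
                 for i in range(size)]
        vector = [entry / sum(image) for entry in image]
    return vector


if __name__ == "__main__":
    print(satisfies_counting_condition(images))
    print(frequency_matrix(images))
    print(normalised_eigenvector(frequency_matrix(images)))
```

Because both columns of $M$ coincide here, the iteration reaches the eigenvector after a single step; the general routine is kept so that the same check applies to morphisms whose images differ in their letter counts between the rows $i$.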

6.2. Ternary squarefree words

Letter frequencies in ternary squarefree words were first studied by Tarannikov [49]. He showed that the minimal letter frequency $f_{\min}$ is bounded by
$$0.274649 \approx \frac{1780}{6481} \le f_{\min} \le \frac{64}{233} \approx 0.274678,$$
see [49, Thm. 4.2]. These bounds have recently been improved by Ochem [30] to
$$0.2746498 \approx \frac{1000}{3641} \le f_{\min} \le \frac{883}{3215} \approx 0.2746501,$$
who also showed that the maximum frequency $f_{\max}$ of a letter in a ternary squarefree word is bounded by
$$f_{\max} \le \frac{255}{653} \approx 0.390505;$$
see [30, Thm. 1]. Very recently, Khalyavin [33] proved that the minimum frequency is indeed equal to Ochem’s upper bound, so
$$f_{\min} = \frac{883}{3215},$$
which finally settles this question.
By constructing suitable squarefree morphisms in accordance with Theorem 14, Richard and Grimm [26] showed that, for a number of letter frequencies, the number of ternary squarefree words grows exponentially. This has recently been further investigated by Ochem [32].

7. Summary and Outlook

In this paper, we reviewed recent progress on the combinatorics of k-powerfree words, with particular emphasis on the examples of binary cubefree and ternary squarefree words, which have attracted most attention over the years. Recent work in this area, using extensive computer searches, but also new methods, has led to a drastic improvement of the known bounds for the entropy of these sets. No analytic expression for the entropy is known to date, and the results on the generating function for the sets of length-p powerfree words indicate that this may be out of reach. However, considerable progress has been made on other combinatorial questions, such as letter frequencies, where again bounds have been improved, but eventually also a definite answer has emerged, in this case on the minimum letter frequency in ternary squarefree words.
We also presented some new results on binary cubefree words, including an enumeration of the number of words and their letter frequencies for length up to 80. The empirical distribution of the number of words as a function of the excess of one letter is investigated, and conjectured to become Gaussian in the limit of infinite word lengths after suitable scaling. We also found bounds on the letter frequency in binary cubefree words, and showed that exponentially many words with unequal letter frequency exist, as in the case of ternary squarefree words. The analysis of the generating functions of length-$p$ binary cubefree words, which we calculated for $p \le 14$, also shows striking similarity to the case of ternary squarefree words, suggesting that the observed behaviour may be generic for sets of $k$-powerfree words.
While a lot of progress has been made, there remain many open questions. For instance, is there an explanation for the observed accumulation of poles and zeros of the generating functions on or near the unit circle, and is it possible to prove what happens in the limit when $p \to \infty$? How does the entropy depend on the power, say for binary $k$-powerfree words? A partial answer to this question is given in Ref. [25], but it would be nice to show that, at least in some region, the entropy increases by a finite amount at any rational value of $k$, as one might expect. Concerning powerfree words with given letter frequencies, how does the entropy vary as a function of the frequency? One might conjecture that the entropy changes continuously, but at present all we have are results that the entropy is positive for some very specific frequencies, where powerfree morphisms have been found. Some of these questions may be too hard to hope for an answer in full generality, but the recent progress in the area shows that one should keep looking for alternative approaches which may succeed.

Acknowledgements

This research has been supported by EPSRC grant EP/D058465/1. The authors thank Christoph Richard for useful comments and discussions. UG would like to thank the University of Tasmania (Hobart) and the AMSI (Melbourne) for their kind hospitality, and for the opportunity to present his research as part of the AMSI/MASCOS Theme Program Concepts of Entropy and Their Applications.

References and Notes

  1. Thue, A. Selected mathematical papers; Nagell, T., Selberg, A., Selberg, S., Thalberg, K., Eds.; Universitetsforlaget: Oslo, 1977.
  2. Morse, H. M. Recurrent geodesics on a surface of negative curvature. Trans. Amer. Math. Soc. 1921, 22(1), 84–100.
  3. Lothaire, M. Combinatorics on words; Cambridge Mathematical Library; Cambridge University Press: Cambridge, 1997; corrected reprint.
  4. Lothaire, M. Algebraic combinatorics on words; Encyclopedia of Mathematics and its Applications; Cambridge University Press: Cambridge, 2002.
  5. Lothaire, M. Applied combinatorics on words; Encyclopedia of Mathematics and its Applications; Cambridge University Press: Cambridge, 2005.
  6. Zech, T. Wiederholungsfreie Folgen. Z. angew. Math. Mech. 1958, 38, 206–209.
  7. Pleasants, P. A. B. Nonrepetitive sequences. Proc. Cambr. Philos. Soc. 1970, 68, 267–274.
  8. Bean, D.; Ehrenfeucht, A.; McNulty, G. Avoidable patterns in strings of symbols. Pacific J. Math. 1979, 95, 261–294.
  9. Crochemore, M. Sharp characterizations of squarefree morphisms. Theoret. Comput. Sci. 1982, 18, 221–226.
  10. Shelton, R. O. On the structure and extendibility of squarefree words. In Combinatorics on Words; Cummings, J. L., Ed.; Academic Press: Toronto, 1983; pp. 101–118.
  11. Brandenburg, F.-J. Uniformly growing k-th power-free homomorphisms. Theoret. Comput. Sci. 1983, 23, 69–82.
  12. Brinkhuis, J. Nonrepetitive sequences on three symbols. Quart. J. Math. Oxford Ser. (2) 1983, 34, 145–149.
  13. Leconte, M. A characterization of power-free morphisms. Theoret. Comput. Sci. 1985, 38(1), 117–122.
  14. Leconte, M. k-th power-free codes. In Automata on Infinite Words; Nivat, M., Perrin, D., Eds.; Springer: Berlin, 1985; Lecture Notes in Computer Science; Vol. 192, pp. 172–187.
  15. Séébold, P. Overlap-free sequences. In Automata on Infinite Words; Nivat, M., Perrin, D., Eds.; Springer: Berlin, 1985; Lecture Notes in Computer Science; Vol. 192, pp. 207–215.
  16. Kobayashi, Y. Repetition-free words. Theoret. Comput. Sci. 1986, 44, 175–197.
  17. Keränen, V. On the k-freeness of morphisms on free monoids. Lecture Notes in Computer Science 1987, 247, 180–188.
  18. Baker, K.; McNulty, G.; Taylor, W. Growth problems for avoidable words. Theoret. Comput. Sci. 1989, 69, 319–345.
  19. Currie, J. Open problems in pattern avoidance. Amer. Math. Monthly 1993, 100, 790–793.
  20. Kolpakov, R.; Kucherov, G.; Tarannikov, Y. On repetition-free binary words of minimal density. Theoret. Comput. Sci. 1999, 218, 161–175.
  21. Grimm, U. Improved bounds on the number of ternary square-free words. J. Integer Seq. 2001, 4(2). Article 01.2.7.
  22. Currie, J. There are circular square-free words of length n for n≥18. Electron. J. Combin. 2002, 9, #N10.
  23. Richomme, G.; Wlazinski, F. Some results on k-power-free morphisms. Theoret. Comput. Sci. 2002, 273, 119–142.
  24. Kucherov, G.; Ochem, P.; Rao, M. How many square occurrences must a binary sequence contain? Electron. J. Combin. 2003, 10, #R12.
  25. Karhumäki, J.; Shallit, J. Polynomial versus exponential growth in repetition-free binary words. J. Combin. Theory Ser. A 2004, 105, 335–347.
  26. Richard, C.; Grimm, U. On the entropy and letter frequencies of ternary square-free words. Electron. J. Combin. 2004, 11(1), #R14.
  27. Ochem, P.; Reix, T. Upper bound on the number of ternary square-free words. Presented at the Workshop on Words and Automata (WOWA’06); St. Petersburg, 2006.
  28. Richomme, G.; Wlazinski, F. Existence of finite test-sets for k-powerfreeness of uniform morphisms. Discrete Applied Math. 2007, 155, 2001–2016.
  29. Kolpakov, R. Efficient lower bounds on the number of repetition-free words. J. Integer Seq. 2007, 10(3). Article 07.3.2.
  30. Ochem, P. Letter frequency in infinite repetition-free words. Theoret. Comput. Sci. 2007, 380(3), 388–392.
  31. Chalopin, J.; Ochem, P. Dejean’s conjecture and letter frequency. Electronic Notes in Discrete Mathematics 2007, 501–505.
  32. Ochem, P. Unequal letter frequencies in ternary square-free words. In Proceedings of 6th International Conference on Words (WORDS 2007), Marseille, 2007.
  33. Khalyavin, A. The minimal density of a letter in an infinite ternary square-free word is 883/3215. J. Integer Seq. 2007, 10. Article 07.6.5.
  34. Queffélec, M. Substitution dynamical systems - spectral analysis; Springer-Verlag: Berlin, 1987; Lecture Notes in Mathematics.
  35. Fogg, N. P. Substitutions in dynamics, arithmetics and combinatorics; Berthé, V., Ferenczi, S., Mauduit, C., Siegel, A., Eds.; Springer-Verlag: Berlin, 2002; Lecture Notes in Mathematics.
  36. Allouche, J.-P.; Shallit, J. Automatic sequences; Cambridge University Press: Cambridge, 2003.
  37. Moody, R. V. The mathematics of long-range aperiodic order; Kluwer Academic Publishers Group: Dordrecht, 1997; NATO Advanced Science Institutes Series C: Mathematical and Physical Sciences; Vol. 489.
  38. Keränen, V. On k-repetition free words generated by length uniform morphisms over a binary alphabet. Lecture Notes in Computer Science 1985, 194, 338–347.
  39. Baake, M.; Elser, V.; Grimm, U. The entropy of square-free words. Math. Comput. Modelling 1997, 26, 13–26.
  40. Walters, P. An introduction to ergodic theory; Springer-Verlag: New York, 1982.
  41. Noonan, J.; Zeilberger, D. The Goulden-Jackson cluster method: Extensions, applications, and implementations. J. Differ. Equations Appl. 1999, 5, 355–377.
  42. Berstel, J. Growth of repetition-free words – a review. Theoret. Comput. Sci. 2005, 340(2), 280–290.
  43. Elser, V. Repeat-free sequences. Lawrence Berkeley Laboratory report 1983, LBL-16632.
  44. Sun, X. New lower-bound on the number of ternary square-free words. J. Integer Seq. 2003, 6. Article 03.3.2.
  45. Edlin, A. E. The number of binary cube-free words of length up to 47 and their numerical analysis. J. Differ. Equations Appl. 1999, 5(4-5), 353–354.
  46. Titchmarsh, E. C. The theory of functions; Oxford University Press: Oxford, 1976.
  47. Ekhad, S.; Zeilberger, D. There are more than $2^{n/17}$ n-letter ternary square-free words. J. Integer Seq. 1998, 1. Article 98.1.9.
  48. Baake, M.; Grimm, U.; Joseph, D. Trace maps, invariants, and some of their applications. Int. J. Mod. Phys. B 1993, 7, 1527–1550.
  49. Tarannikov, Y. The minimal density of a letter in an infinite ternary square-free word is 0.2746. J. Integer Seq. 2002, 5. Article 02.2.2.

Share and Cite

MDPI and ACS Style

Grimm, U.; Heuer, M. On the Entropy and Letter Frequencies of Powerfree Words. Entropy 2008, 10, 590-612. https://doi.org/10.3390/e10040590
