Zipf’s, Heaps’ and Taylor’s Laws are Determined by the Expansion into the Adjacent Possible

Tria, Francesca; Loreto, Vittorio; Servedio, Vito D. P.

doi:10.3390/e20100752

Open AccessArticle

Zipf’s, Heaps’ and Taylor’s Laws are Determined by the Expansion into the Adjacent Possible

by

Francesca Tria

^1,*

,

Vittorio Loreto

^1,2,3 and

Vito D. P. Servedio

³

¹

Physics Department, Sapienza University of Rome, P.le Aldo Moro 5, 00185 Rome, Italy

²

Sony Computer Science Laboratories, 6, rue Amyot, 75005 Paris, France

³

Complexity Science Hub Vienna, Josefstädter Strasse 39, A-1080 Vienna, Austria

^*

Author to whom correspondence should be addressed.

Entropy 2018, 20(10), 752; https://doi.org/10.3390/e20100752

Submission received: 31 July 2018 / Revised: 17 September 2018 / Accepted: 25 September 2018 / Published: 30 September 2018

(This article belongs to the Special Issue Economic Fitness and Complexity)

Download

Browse Figures

Versions Notes

Abstract

:

Zipf’s, Heaps’ and Taylor’s laws are ubiquitous in many different systems where innovation processes are at play. Together, they represent a compelling set of stylized facts regarding the overall statistics, the innovation rate and the scaling of fluctuations for systems as diverse as written texts and cities, ecological systems and stock markets. Many modeling schemes have been proposed in literature to explain those laws, but only recently a modeling framework has been introduced that accounts for the emergence of those laws without deducing the emergence of one of the laws from the others or without ad hoc assumptions. This modeling framework is based on the concept of adjacent possible space and its key feature of being dynamically restructured while its boundaries get explored, i.e., conditional to the occurrence of novel events. Here, we illustrate this approach and show how this simple modeling framework, instantiated through a modified Pólya’s urn model, is able to reproduce Zipf’s, Heaps’ and Taylor’s laws within a unique self-consistent scheme. In addition, the same modeling scheme embraces other less common evolutionary laws (Hoppe’s model and Dirichlet processes) as particular cases.

Keywords:

innovation dynamics; stylized facts; Zipf’s law; Heaps’ law, Taylor’s law; adjacent possible; Pólya’s Urns; Poisson-Dirichlet processes

1. Introduction

Innovation processes are ubiquitous. New elements constantly appear in virtually all systems and the occurrence of the new goes well beyond what we now call innovation. The term innovation refers to a complex set of phenomena that includes not only the appearance of new elements in a given system, e.g., technologies, ideas, words, cultural products, etc., but also their adoption by a given population of individuals. From this perspective one can distinguish between a personal, or local, experience of the new—for instance when we discover a new favorite writer or a new song—and a global occurrence of the new, i.e., every time something appears that never appeared before—for instance, if we write a new book or write a new song. In all these cases there is something new entering the history of a given system or a given individual.

Given the paramount relevance of innovation processes, it is highly important to grasp their nature and understand how the new emerges in all its possible instantiations. To this end, it is essential to fix a certain number of stylized facts characterizing the overall phenomenology of the new and quantifying its occurrence and its dynamical properties. Here we focus in particular on three basic laws whose general validity has been assessed in virtually all systems displaying innovation. The Zipf’s law [1,2,3,4], quantifying the frequency distribution of elements in a given system, the Heaps’ law [5,6], quantifying the rate at which new elements enter a given system and the Taylor’s law [7], quantifying the intrinsic fluctuations of variables associated to the occurrence of the new. Any basic theory, supposedly close to the actual phenomenology of innovation processes, should be able at least to explain those three laws from first principles. Despite an abundant literature on the subject related to many different disciplines, a clear and self-consistent framework to explain the above-mentioned stylized facts has been missing for a very long time. Many approaches have been proposed so far, often adopting ad-hoc assumptions or attempting to derive the three laws while taking the others for granted. The aim of this paper is that of trying to put order in the often scattered and disordered literature, by proposing a self-consistent framework that, in its simplicity and generality, is able to account for the existence of the three laws from very first principles.

The framework we propose is based on the notion of “Adjacent Possible” and, more generally, on the interplay between what Francois Jacob named the dichotomy between the “actual” and the “possible”, the actual realization of a given phenomenon and the space of possibilities still unexplored. Originally introduced by the famous biologist and complex-systems scientist Stuart Kauffman, the notion of the adjacent possible [8,9] refers to the progressive expansion, or restructuring, of the space of possibilities, conditional to the occurrence of novel events. Based on this early intuition, we recently introduced, in collaboration with Steven Strogatz, a mathematical framework [10,11] to investigate the dynamics of the new via the adjacent possible. The modeling scheme is based on older schemes, named Polya’s urns and it mathematically predicts the notion that “one thing leads to another”, i.e., the intuitive idea, presumably we all have, that innovation processes are non-linear and the conditions for the occurrence of a given event could realize only after something else happened.

It turns out that the mathematical framework encoding the notion of adjacent possible represents a sufficient first-principle scheme to explain the Zipf’s, Heaps’ and Taylor’s laws on the same ground. In this paper we present this approach and we discuss the links it bears with other approaches. In particular, we discuss the relation of our approach with well known stochastic processes, widely studied in the framework of nonparametric Bayesian inference, namely the Dirichlet and the Poisson-Dirichlet processes [12,13,14]. In addition, based on this comparison, a coherent framework emerges where the importance of the adjacent possible scheme appears as crucial to understand the basic phenomenology of innovation processes. Though we can only conjecture that the expansion of the adjacent possible space is also a necessary condition for the validity of the three laws mentioned above, no counterexamples have been found so far that, without a dynamical space of possibilities, that one can use to satisfactorily explain the empirically observed laws.

2. Zipf’s and Heaps’ Laws

2.1. Frequency-Rank Relations: the Estoup-Zipf’s Law

Let us consider a generic text and count the number of occurrences of each word. Now, suppose one repeats the same operation for all the distinct words in a long text, and ranks all the words according to their frequency of occurrence and plots them in a graph showing the number of occurrences vs. the rank. This is what George Kingsley Zipf did [2,3,4] in the 1920s. A more recent analysis of the same behaviour is reported in Figure 1, based on data of the Gutenberg corpus [15].

The existence of straight lines in the log-log plot is the signature of power-law functions of the form:

f (R) \sim R^{- α}

(1)

The original result obtained by Zipf, corresponding to the first slope with

α ≃ 1

, revealed a striking regularity in the way words are adopted in texts: said

f (1)

the frequency of the most frequent word (rank

R = 1

), the frequency of the second most frequent word is

f (1) / 2

, that of the third

f (1) / 3

and so on. For high rankings, i.e., highly infrequent words, one observes a second slope, with an exponent larger than two.

It should be remarked that perhaps the first one to observe the above reported law was Jean-Baptiste Estoup, who was the General Secretary of the Institut Sténographique de France. In his book Gammes sténographiques [1,16], pioneered the investigation of the regularity of word frequencies and observed that the frequency with which words are used in texts appears to follow a power law behaviour. This observation was later acknowledged by Zipf [2] and examined in depth to bring what is also known as Estoup-Zipf’s law. From now on we shall refer to this law as Zipf’s law.

It is also important to remark that Zipf-like behaviours have been observed in a large variety of cases and situations. Zipf itself reported [4] about the distribution of metropolitan districts in 1940 in the USA, as well as service establishments, manufacturers, and retails stores in the USA in 1939. As the years pass, examples and situations where Zipf’s law has been invoked has been steadily growing. For instance, Zipf’s law has been invoked in city populations, the statistics of webpage visits and other internet traffic data, company sizes and other economic data, science citations and other bibliometric data, as well as in scaling in natural and physical phenomena. A thorough account of all these cases is out of the scope of the present paper and we refer to recent reviews and references therein for an account of the latest developments [17,18,19].

2.2. The Innovation Rate: Herdan-Heaps’ Law

Let us now make a step forward and look at a generic text (or, without loss of generality, at a generic sequence of characters) and focus now on the occurrence of the novelties. For a generic text, one can ask when new words, i.e., never occurred before in the text, appear. Now, if one plots the number of new words as a function of the number of words read (which is our measure of the intrinsic time), one gets a plot like that of Figure 2, where one observes two main behaviors.

A linear growth for short times where at the beginning, basically all the words are appearing for the first time. Later on the growth slows down and an asymptotic behavior is observed of the form:

D (N) \sim N^{γ}

(2)

with

γ \in [0, 1]

. In the specific case of Figure 2

γ ≃ 0.45

but the exponent slightly changes from text to text. The relation of Equation (2) is known as Heaps’ law from Harold Stanley Heaps [6], who formulated it in the framework of information retrieval (see also [20]), though its first discovery is due to Gustav Herdan [5] in the framework of linguistics (see also [21,22]). From now onward we shall refer to it as Heaps’ law.

2.3. Zipf’s vs. Heaps’ Laws

In this section we compare the two laws just observed, Zipf’s law for the frequencies of occurrence of the elements in a system and Heaps’ law for their temporal appearance. It has often been claimed that Heaps’ and Zipf’s law are trivially related and that one can derive Heaps’s law once the Zipf’s is known. This is not true in general. It turns out to be true only under the specific hypothesis of random-sampling as follows. Suppose the existence of a strict power-law behaviour of the frequency-rank distribution,

f (R) \sim R^{- α}

, and construct a sequence of elements by randomly sampling from this Zipf distribution

f (R)

. Through this procedure, one recovers a Heaps’ law with the functional form

D (t) \sim t^{γ}

[23,24] with

γ = 1 / α

. In order to do that we need to consider the correct expression for

f (R)

that includes the normalisation factor, whose expression can be derived through the following approximated integral:

\int_{1}^{R_{max}} f (\tilde{R}) d \tilde{R} = 1 .

(3)

Let us now distinguish the two cases. For

α \neq 1

one has

f (R) = \frac{1 - α}{R_{max}^{1 - α} - 1} R^{- α} .

(4)

while for

α = 1

one obtains:

f (R) = \frac{1}{log R_{max}} R^{- 1} .

(5)

When

α > 1

, one can neglect the term

R_{max}^{1 - α}

in Equation (4), and when

α < 1

, one can write

R_{max}^{1 - α} - 1 ≃ R_{max}^{1 - α}

.

Summarizing one has then:

\begin{matrix} α > 1 : & f (R) ≃ (α - 1) R^{- α} . \end{matrix}

(6)

\begin{matrix} α = 1 : & f (R) ≃ \frac{R^{- 1}}{ln R_{max}} . \end{matrix}

(7)

\begin{matrix} 0 < α < 1 : & f (R) ≃ (1 - α) \frac{R^{- α}}{R_{max}^{1 - α}} . \end{matrix}

(8)

We are now interested in estimating the number, D, of distinct elements appearing in the sequence as a function of its length N. To do that, let us consider the entrance of a new element (never appeared before) in the sequence and let the number of distinct elements in the sequence be D after this entrance. This new element will have maximum rank

R_{max} = D

, and frequency

f (R_{max}) = 1 / N

. From Equation (6) we obtain:

f (D) ≃ (α - 1) D^{- α} = \frac{1}{N}

(9)

which, after an inversion gives:

\begin{matrix} D ≃ N^{γ} with & γ = \frac{1}{α} \end{matrix}

(10)

The same reasoning can be extended to generic functional forms for

D (N)

as follows:

\begin{matrix} α > 1 : & f (D) ≃ (α - 1) D^{- α} = \frac{1}{N} . \end{matrix}

(11)

\begin{matrix} α = 1 : & f (D) ≃ \frac{1}{D ln D} = \frac{1}{N} . \end{matrix}

(12)

\begin{matrix} 0 < α < 1 : & f (D) ≃ \frac{1 - α}{D^{1 - α} - 1} D^{- α} = \frac{1}{N} . \end{matrix}

(13)

Inverting these relations, one eventually finds:

\begin{matrix} α > 1 : & D ≃ N^{γ} with γ = 1 / α \end{matrix}

(14)

\begin{matrix} α = 1 : & D ≃ N / ln N with γ ≃ 1 \end{matrix}

(15)

\begin{matrix} 0 < α < 1 : & D ≃ N with γ = 1 . \end{matrix}

(16)

Summarizing, under the hypothesis of random sampling from a frequency rank distribution expressed by a power-law function

f (R) \sim R^{- α}

, one recovers a Heaps’ law

D (N) \sim N^{γ}

with the following relation between

γ

and

α

:

\begin{matrix} α > 1 : & γ = 1 / α \\ 0 < α \leq 1 : & γ = 1 . \end{matrix}

(17)

In [24], it has been demonstrated that finite-size effects can affect the above-seen relationships that happen to be true only for very long sequences. For short enough sequences, one observes a systematic deviation from Equation (17), especially for

α

values close to 1.

Another important observation is now in order. The assumption of random sampling considered above is strong and sometimes unrealistic (e.g., [25]). First of all it implies the a priori existence of Zipf’s law with an infinite support. In addition, the frequency-rank plots one empirically observes are far from featuring a pure power-law behavior. In all those cases, the relation between the Zipf’s law and the Heaps’ law seen above and summarized by Equation (17) happens to hold only when looking at the tail of the Zipf’s plot, i.e., for high ranks (small frequencies) in the frequency-rank plots and long times, i.e., high N, in the plot expressing the Heaps’ law. In a later section we shall also discuss the so-called Taylor’s law that connects the standard deviation s of a random variable (for instance the size D of the dictionary) to its mean

μ

. Simple analytic calculations [26] show that the poissonian sampling of a power-law leads to a Taylor’s law with exponent

1 / 2

, i.e.,

s \propto \sqrt{μ}

. This is not the case for real texts for which one observes an exponent close to 1 [26].

The ensemble of all these facts implies that the explanation of the empirical findings of the Zipf’s and Heaps’ law cannot be done by only deriving one of the laws and deducing the other one accordingly, based on Equation (17). Rather, both Zipf’s and Heaps’ laws and the Taylor’s law should all be derived in the framework of a self-consistent theory. This is precisely the aim of this paper.

3. Urn Model with Triggering

We now introduce a simple modeling scheme able to reproduce both Zipfs’ and Heaps’ laws simultaneously. Crucial for this result is the conditional expansion of the space of possibilities, that we will elucidate in the following. In [8,9], S. Kauffman introduces and discusses the notion of the adjacent possible, which is all those things that are one step away from what actually exists. The idea is that evolution does not proceed by leaps, but moves in a space where each element should be connected with its precursor. The Kauffman’s theoretical concept of adjacent possible, originally discussed in his investigations of molecular and biological evolution, has also been applied to the study of innovation and technological evolution [27,28]. To clarify the concept, let us think about a baby that is learning to talk. We can say almost surely that she will not utter “serendipity” as the first word in her life. More than this, we can safely guess that her first word will be “papa”, or “mama”, or one among a list of few other possibilities. In other words, in the period of lallation, only few words belong to the space of the adjacent possible and can be actualized in the next future. Once the baby has learned how to utter simple words, she can try more sophisticated ones, involving more demanding articulation efforts. In the process of learning, her space of possibilities (her adjacent possible) considerably grows, with the result that guessing a priori the first 100 words learned by a child is much less obvious than guessing which will be the first one.

Here we formalize the idea that by opening up new possibilities, an innovation paves the way for other innovations, explicitly introducing this concept in a Pólya’s urn based model. In particular, we will discuss the simplest version of the model introduced in [10], which we will name Pólya’s urn model with triggering (PUT). The interest of this model lies, on the one hand, in its generality, the only assumptions it makes refer to the general and not system-specific mechanisms for the expansion into the adjacent possible; on the other hand, its simplicity allows to draw analytical solutions.

The model works as follows (please refer to Figure 3). An urn

U

initially contains

N_{0}

distinct elements, represented by balls of different colors. By randomly extracting elements from the urn, we construct a sequence

S

mimicking the evolution of our system (e.g., the sequence of words in a given text). Both the urn and the sequence enlarge during the process: (i) at each time step t, an element

s_{t}

is drawn at random from the urn, added to the sequence, and put back in the urn along with

ρ

additional copies of it (Figure 3A); (ii) iff the chosen element

s_{t}

is new (i.e., it appears for the first time in the sequence

S

),

ν + 1

brand new distinct elements are also added to the urn (Figure 3B). These new elements represent the set of new possibilities opened up by the seed

s_{t}

. Hence

ν + 1

is the size of the new adjacent possible available once an innovation occurs.

Simple asymptotic formulas for the number

D (n)

of distinct elements appearing in the sequence as a function of the sequence’s length n (Heaps’ law), and for the asymptotic power-law behavior of the frequency-rank distribution (Zipf’s law), in terms of the model parameters

ρ

and

ν

can be derived. In order to do so, one can write a recursive formula for

D (n)

as:

D (n + 1) = D (n) + P_{N} (n)

(18)

where we have defined

P_{N} (n)

as the probability of drawing a new ball (never extracted before) at time n (note that we consider intrinsic time, that is we identify the time elapsed with the length of the sequence constructed). The probability

P_{N} (n)

is equal to the ratio (at time n) between the number of elements in the urn never extracted and the total number of elements in the urn. Approximating Equation (18) with its continuous limit, we can write:

\frac{d D}{d n} = \frac{N_{0} + ν D}{N_{0} + (ν + 1) D + ρ n},

(19)

where

N_{0}

is the number of balls, all distinct, initially placed in the urn. This equation can be integrated analytically in the limit of large n, when

N_{0}

can be neglected, by performing a change of variable

z = \frac{D}{n}

. After some algebra (detailed computations can be found in [11] and in Appendix A for an extended model), we find the asymptotic solutions (valid for large n):

\begin{matrix} ρ > ν & \Rightarrow & D (n) \sim {(\frac{ρ - ν}{ρ + 1})}^{\frac{ν}{ρ}} n^{\frac{ν}{ρ}}; \end{matrix}

(20)

\begin{matrix} ρ < ν & \Rightarrow & D (n) \sim \frac{ν - ρ}{ν + 1} n; \end{matrix}

(21)

\begin{matrix} ρ = ν & \Rightarrow & D (n) log D \sim \frac{ν}{ν + 1} n \to D \sim \frac{ν}{ν + 1} \frac{n}{log n} . \end{matrix}

(22)

For the derivation of the Zipf’s law we refer the reader to the SI of [10] and to Appendix B for an alternative derivation based on the continuous approximation. Results contrasting numerical results and theoretical predictions for the Heaps’ and Zipf’s laws are reported in Figure 4.

The Role of the Adjacent Possible: Heaps’ and Zipf’s Laws in the Classic Multicolors Pólya Urn Model

One question that naturally emerges concerns the relevance of the notion of adjacent possible and its conditional growth. One could for instance argue that the same predictions of the PUT model could be replicated having all the possible outcomes of a process immediately available from the outset, instead of appearing progressively through the conditional process related to the very notion of adjacent possible. In order to remove all doubt, we consider an urn initially filled with

N_{0}

distinct colors, with

N_{0}

arbitrarily large, with no other colors entering into the urn during the process of construction of the sequence

S

. This is the Pólya multicolors urn model [29] and we here briefly discuss the Heaps’ and Zipf’s laws emerging from it. Let us thus consider an urn initially containing

N_{0}

balls, all of different colors. At each time step, a ball is withdrawn at random, added to a sequence, and placed back in the urn along with

ρ

additional copies of it. This process corresponds to the one depicted in Figure 3A, that is to the rule of the PUT model in the case that the drawn element is not new.

Note that although in this case the urn does not acquire new colors during the process, we can still study the dynamic of innovation by looking at the entrance of new color in the growing sequence. Let us then consider

N_{0}

very large, so that we can consider a long time interval far from saturation (when there are still many colors in the urn that have not already appeared in

S

). The number of different colors

D (n)

added to the sequence at time n follows the equation (when the continuous limit is taken):

\frac{d D}{d n} = \frac{N_{0} - D (n)}{N_{0} + ρ n}, D (0) = 0 \Rightarrow D (n) = N_{0} [1 - {(1 + \frac{ρ n}{N_{0}})}^{- \frac{1}{ρ}}] .

(23)

We thus obtain that for

ρ n ≪ N_{0}

,

D (n)

follows a linear behaviour (

D (n) ≃ n

), while for large n saturates at

D (n) ≃ N_{0}

, failing to predict the power law (sublinear) growth of new elements. In Figure 5, we report results for both the Heaps’ and Zipf’s laws predicted by the model along with their theoretical predictions, referring the reader to [30] for a detailed derivation of the Zipf’s law. It is evident that a simple exploration of a static, though large, space of possibilities cannot account for the empirical observations summarized by the Zipf’s and the Heaps’s laws.

4. Connection of the Urn Model with Triggering and with Stochastic Processes Featuring Innovation

The PUT model is closely related to well known stochastic processes, widely studied in the framework of nonparametric Bayesian inference, namely the Dirichlet and the Poisson-Dirichlet processes. We will discuss here those processes in terms of their predictive probabilities, referring to excellent reviews [12,13,14] for a complete and formal definition of them.

The problem can be framed in the following way. Given a sequence of events

x_{1}, \dots, x_{n}

, we want to estimate the probability that the next event will be

\tilde{x}

, where

\tilde{x}

can be one of the already seen events

x_{i}

,

i = 1, \dots, n

, or a completely new one, unseen until the intrinsic time n.

4.1. Urn Model with Triggering and the Poisson-Dirichlet Process

Let us first discuss the Poisson-Dirichlet process, whose predictive conditional probability reads:

p (\tilde{x} | x_{1}, \dots, x_{n}; α, θ, p_{0}) = \frac{θ + D α}{θ + n} P_{0} (\tilde{x}) + \sum_{i}^{D} \frac{n_{i} - α}{θ + n} δ_{{\tilde{x}}_{i}, \tilde{x}}

(24)

where

0 \leq α < 1

and

θ > - α

are parameters of the model,

P_{0}

a given continuous probability distribution defined a priori on the possible values the variables

x_{i}

can take, named base probability distribution, and

{\tilde{x}}_{i}

the D distinct values appearing in the sequence

x_{1}, \dots, x_{n}

, respectively with multiplicity

n_{i}

. Let us briefly discuss Equation (24). The first term on the right hand side refers to the probability that

x_{n + 1}

takes a value that has never appeared before, i.e., a novel event. This happens with probability

\frac{θ + D α}{θ + n}

, depending both on the total number n of events seen until time n, and on the total number D of distinct events seen until time n. In this way, in the Poisson-Dirichlet process, the concept that the more novelties that are actualized, the higher the probability of encountering further novelties is implicit. The second term in Equation (24) weights the probability that

x_{n + 1}

equals one of the events that has previously occurred, and differs from a bare proportionality rule when

α > 0

.

The Poisson-Dirichlet process predicts an asymptotic power-law behavior for the number

D (n)

of distinct elements seen as a function of the sequence length n. The exact expression for the expected value of

D (n)

can be found in [12]. Here we report the results obtained under the same approximations made for the urn model with triggering:

\frac{d D}{d n} = \frac{θ + α D}{θ + n}, D (0) = 0,

(25)

that can be solved by separation of variables, leading to:

D (n) \sim \frac{θ^{1 - α} {(θ + n)}^{α}}{α} - \frac{θ}{α} .

(26)

Note that the Poisson-Dirichlet process predicts a sublinear power law behavior for

D (n)

but cannot reproduce a linear growth for it, being only defined for

α < 1

.

The ubiquity of the Poisson-Dirichlet process is due, together with its ability of producing sequences featuring Heaps’ and Zipf’s laws, to the fundamental property of exchangeability [12,31]. This refers to the fact that the probability of a sequence generated by the Poisson-Dirichlet process does not depend on the order of the elements in the sequence:

p (x_{1}, \dots, x_{n}; α, θ, p_{0}) = p (π (x_{1}), \dots, π (x_{n}); α, θ, p_{0})

for any permutation

π

of the sequence elements, so that we can write the joint probability distribution

p (n_{1}, \dots, n_{D}; α, θ, p_{0})

for the number of occurrences of the variables

x_{i}

. Exchangeability is a powerful property related to the de Finetti theorem [32,33]; it is also a strong and sometimes unrealistic assumption on the lack of correlations and causality in the data.

Returning to the PUT model, we observe that the model produces, in general, sequences that are not exchangeable. It recovers exchangeability in a particular case, corresponding to a slightly different definition of rule (i): the drawn element

s_{t}

is put back in the urn along with

ρ

additional copies of it iff

s_{t}

is not new; in the other case (i.e., when we apply rule (ii)),

s_{t}

is put back in the urn along with

\tilde{ρ}

additional copies of it, with

\tilde{ρ} = ρ - (ν + 1)

. In this particular case the PUT model corresponds exactly to the Poisson-Dirichlet process, with

θ = \frac{N_{0}}{ρ}

and

α = \frac{ν}{ρ}

. In this case, at odds with the previously discussed version of the model, the urn acquires the same number of balls at each time step, regardless of whether a novelty occurs. This variant makes the generated sequences exchangeable, but imposes the constraint

ρ \geq (ν + 1)

, and thus in this case we cannot recover the linear growth of

D (n)

; this is the same for the Poisson-Dirichlet process. We demonstrate in Appendix A that the dependence of the power law’s exponents of the Heaps’ and Zipf’s laws on the PUT model’s parameters

ρ

and

ν

reads the same as in Equations (20–22) if we modify rule (i) with any

\tilde{ρ} \geq 0

.

Here we wish to remark that the urn representation of the PUT model allows for straightforward generalizations where correlations can be explicitly taken into account (see for instance [10] for a first step in this direction). In addition, it can be easily rephrased in terms of walks in a complex space (for instance a graph), allowing to consider more complex underlying structures for the space of possibilities (see for instance the SI of [10,34,35]).

4.2. Urn Model with Triggering, Dirichlet Process and Hoppe Model

By setting

α = 0

in Equation (24), we obtain the predictive conditional probability for the Dirichlet process, predicting a logarithmic growth of

D (n)

[12]. Correspondingly, if we chose

ν = 0

in the urn model, we obtain:

\frac{d D}{d n} = \frac{N_{0}}{N_{0} + D + ρ n}

(27)

If we now neglect

D (n)

in the denominator of (27), we can solve in the large limit of large n:

D (n) \sim \frac{N_{0}}{ρ} log (1 + \frac{ρ}{N_{0}} n) .

(28)

The same asymptotic growth of

D (n)

is also found in one of the first models introducing innovation in the framework of Pólya’s urn, namely the Hoppe’s model [36]. The motivation of the Hoppe’s work was to derive the Ewens’ sampling formula [37], describing the allelic partition at equilibrium of a sample from a population evolved according to a discrete time Wright-Fisher process [38,39]. In the Hoppe model, innovations are introduced through a special ball, the “mutator”. In particular, the process starts with only the mutator in the urn, with a mass

θ

. At any time n, a ball is withdrawn with a probability proportional to its mass, and, if the ball is the mutator, it is placed back in the urn along with a ball of a brand new color, with unitary mass, thus increasing the number of different colors present in the urn. Otherwise, the selected ball is placed back in the urn along with another ball of the same color. Writing the recursive formula for

D (n)

and taking the continuous limit, we obtain:

\frac{d D (n)}{d n} = \frac{θ}{θ + n}, D (0) = 0,

(29)

that is exactly Equation (27) with

α = 0

. It predicts a logarithmic increase of new colors in the urn:

D (n) = θ ln (θ + n) - θ ln (θ) = θ ln (1 + \frac{n}{θ}),

(30)

corresponds to Equation (28), by identifying

\frac{N_{0}}{ρ}

with

θ

. Hoppe’s urn scheme is non-cooperative in the sense that one novelty does nothing to facilitate another. In other words, while in the Hoppe’s model a mechanism is already present that allows for the expansion of the space of possibilities, this mechanism is completely independent of the actual realization of a novelty, and fails to reproduce both the Heaps’ and the Zipf’s laws.

5. Fluctuation Scaling (Taylor’s Law)

From Equations (14)–(16), it is clear that randomly sampling a Zipf’s law with a given exponent results in a Heaps’ law with linear and sublinear exponents tuned by the exponent of the Zipf’s. On the other hand, Equations (20)–(22) show that the PUT model is also producing the same Heaps’ exponents with the same relation to the Zipf’s exponent as in the random sampling. Therefore, one legitimate question is whether the PUT is also actually performing a kind of sophisticated random sampling of an underlying Zipf’s law. One possible way to discriminate PUT from a random sampling is to look at the fluctuation scaling, i.e., the Taylor’s law discussed in Section 2.3, which connects the standard deviation s of a random variable to its mean

μ

. Simple analytic calculations [26] show that the Poissonian sampling of a power-law leads to a Taylor’s law with exponent 1/2, i.e.,

s \propto \sqrt{μ}

.

Real text analysis shows instead a Taylor exponent of 1 [26], which points to the obvious conclusion that the process of writing texts is not an uncorrelated choice of words from a fixed distribution. In [26] this was explained by a “topic-dependent frequencies of individual words”. The empirical observation therein, was that the frequency of a given word changes according to the topic of the writing. For example, the term “electron” has a high frequency in physics books and a low frequency in fairy tales, so that its rank is low in the first case and high in the second. The result is that there exist different Zipf’s laws with the same exponent according to the topic and the enhanced variance of the dictionary size is ascribable to this multitude of Zipf’s laws that add a further variability to the sampling process.

In PUT there is certainly no topicality as in real texts. Nevertheless, we find numerically a linear Taylor’s law in the case of sublinear Heaps’ exponents (

ν < ρ

). In PUT, there is no Zipf’s law beforehand: it is built during the process instead and this is sufficient to boost the variance of the dictionary, on average, at any given time.

In Figure 6, we show the numerical results of two simulations of PUT with

ν < ρ

, one with

ν = ρ

and one with

ν > ρ

, in order to cover all the possible cases of Equations (20)–(22), plus the random sampling from a zipfian distribution with exponent

α = - 2

. Besides the interesting linearity of the fluctuation scaling in the case of

ν < ρ

, its behaviour in case of fast growing spaces

ν > ρ

can also be pointed out. In that case, Heaps’ law is linear as shown in Equation (21), and the traditional model of reference is the Yule-Simon model (YSM) [40]. The Yule-Simon model generates a sequence of characters with the following iterative rule. Starting from an initial character, at each time with a constant probability p, a brand new character is chosen while with probability

1 - p

one selects one of the characters already present in the sequence (which implies drawing them with their multiplicity). In this way, in YSM, the rate of growth of different characters is constant and equal to p and this constant rate of innovation yields a linear Heaps’ law. The preferential attachment rule leads to a Zipf’s law with exponent

| α | = 1 - p

. This is consistent with Equation (16) of random sampling and even with Equation (21) of PUT. The difference between YSM and PUT can be appreciated with Taylor’s law. In YSM, new characters appear with probability p so that the average number of different characters at step N is

μ = p N

, and the variance

σ^{2} = N p (1 - p) = μ (1 - p)

as in the binomial distribution. As a result, in YSM one gets the Poissonian result

σ \propto \sqrt{μ}

. In contrast, PUT features numerically an exponent of

≃ 0.58

, i.e., larger than

1 / 2

but still less than 1 (see Figure 6).

Given the intrinsic inability of YSM to accomplish for sub-linear dictionary growths, Zanette and Montemurro [41] proposed a simple variant of it. In this variant (ZM), the rate of introduction of new characters, i.e., p, is not constant any more. It is made instead time-dependent with an ad hoc chosen functional form able to reproduce the right range for the Heaps’ exponents. For a Heaps’ exponent

γ

, the rate of innovation p is chosen proportional to

t^{γ - 1}

. This expedient allows to reproduce both Zipf’s law and, by construction, Heaps’ law. The two mechanisms for Zipf’s and Heaps’ production are independent of each other as in YSM so that we expect for Taylor’s law the same behavior of YSM, i.e., an exponent

0.5

. After all, ZM can be seen as a YSM with a diluting time flow, which might not affect the scaling of the fluctuations of YSM at a given time. In Figure 6 we show that indeed ZM features a Taylor’s exponent of

0.5

(magenta curve).

For the Poisson-Dirichlet and the Dirichlet processes, analytical solutions can be computed for the moments of the probability distribution

P (D (n))

[13,42], yielding the asymptotic exponents respectively 1 and

1 / 2

in the Taylor’s law. Numerical results are given in Figure 6. Note that a non trivial exponent in the Taylor’s law is featured by the Poisson-Dirichlet process, where the probability of a novelty to occur does depend on the number of previous novelties, while the Dirichlet process lacks both properties.

6. Discussion

In this paper we have argued that the notion of adjacent possible is key to explain the occurrence of the Zipf’s, Heaps’ and Taylor’s laws in a very general way. We have presented a mathematical framework, based on the notion of adjacent possible, and instantiated through a Polya’s urn modelling scheme, that accounts for the simultaneous validity of the three laws just mentioned in all their possible regimes.

We think this a very important result that will help in assessing the relevance and the scope of the many approaches proposed so far in the literature. In order to be as clear as possible, let us itemize the key points:

The first point we make is about the many claims made in literature about the possibility to deduce the Heaps’ law by simply sampling a Zipf-like distribution of frequencies of events. Though, as seen above, it is possible to deduce a power-law behaviour for the growth of distinct elements by randomly drawing from a Zipf-like distribution, this procedure does not allow to reproduce the empirical results. It has been conjectured in [26], that texts are subject to a topicality phenomenon, i.e., writers do not sample the same Zipf’s law. This implies that the same word can appear at different ranking positions depending on the specific context. Though this is an interesting point, we think that the deduction of the Heaps’ law from the sampling of a Zipfian distribution is not satisfactory from two different points of view. First of all, the empirical Heaps’ and Zipf’s laws are never pure power-laws. We have seen for instance that for written texts the frequency-rank plot features a double slope. Nevertheless, we have seen that a relation exists between the exponent of the frequency-rank distribution at high ranks (rare words) and the asymptotic exponent of the Heaps’ law. In other words, the behaviour of the rarest words is responsible for the entrance rate of new words (or new items). Even though a pure power-law behaviour was observed, we have shown that the statistics of fluctuations, represented by the Taylor’s law, would not reproduce the empirical results (unless a specific sampling procedure based on the hypothesis of topicality is adopted [26]). The conclusion to be taken is that in general the Heaps’ and the Zipf’s laws are non-trivially related and their explanation should be made based instead on first-principle.
Models featuring a fixed space of possibilities are not able to reproduce the simultaneous occurrence of the three laws. For instance, a multicolor Polya’s urn model [29] does not even produce power-law-like behaviours for the Zipf’s and the Heaps’ laws. It rather features a saturation phenomenon, related to the exploration of the predefined boundaries of the space of possibilities. The conclusion here is that one needs a modelling scheme featuring a space of possibilities with dynamical boundaries, for instance expanding ones.
Models that incorporate the possibility to expand the space of possibilities like the Yule-Simon [40] model or the Hoppe model fail in explaining the empirical results. In the Yule-Simon model, the innovation rate is constant and the the Heaps’ law is reproduced with the trivial unitary exponent. An ad-hoc correction to this has been proposed by Zanette and Montemurro [41], who postulate a sublinear power-law Heaps’s law form the outset, without providing any first-principle explanation for it. In addition, in this case the result is not satisfactory because the resulting time-series does not obey the Taylor’s law, being instead compatible with a series of i.i.d variables. The question is now why this approach is not reproducing Taylor’s law despite the fact that it fixes the expansion of the space of possibilities. In our opinion what is lacking in the scheme by Zanette and Montemurro is the interplay between the preferential attachment mechanism and the exploration of new possibilities. In other words, the triggering effect which is instead a key features of the PUT model (see next item). The situation for the Hoppe model is different [36], i.e., a multicolor Polya’s urn with a special replicator color. In this case, though a self-consistent expansion of the space of possibilities is in place, an explicit mechanism of triggering, in which the realization of an innovation facilitates the realization of further innovations, lacks. In this case the innovation rate is too weak and the Heap’s law features only a logarithmic growth, i.e., it is slower than any power-law sublinear behaviour.
The Polya’s urn model with triggering (PUT) [10], incorporating the notion of adjacent possible, allows to simultaneously account for the three laws, Zipf’s, Heaps’ and Taylor’s, in all their regimes, without ad-hoc or arbitrary assumptions. In this case, the space of possibilities expands conditional to the occurrence of novel events in a way that is compatible with the empirical findings. From the mathematical point of view, the expansion into the adjacent possible solves another issue related to Zipf’s and Heaps’ generative models. In fact, in PUT one can switch with continuity from the sublinear to the linear regime of the dictionary growth and vice-versa and this by tuning one parameter only: the ratio $ν / ρ$ . This ratio is not limited to a ratio of integers. In fact, in the SI of [10] it was demonstrated that the same expressions for the Heaps’ and Zipf’s laws are recovered if one uses parameters $ρ$ and $ν$ extracted from a distribution with fixed means. One possible strategy is to fix an integer $ρ$ while $ν$ can assume any value in the real numbers (in simulations this is a floating point value), and the mantissa can be taken into account by resorting to probabilities. Therefore, it is perfectly sound to state that one switches with continuity from the sublinear regime to the linear one in the interval $| ν / ρ - 1 | < ε$ , with $ε ≪ 1$ , although the rigorous mathematical characterization of the transition is far from being understood.
It should be remarked that the Poisson-Dirichlet process [12,13,14] is also able to explain the three Zipf’s, Heaps’ and Taylor’s laws only in the strict sub-linear regime for the Heaps’ law. It cannot however account for a constant innovation rate as in the PUT modelling scheme. We also point out that the PUT model embraces the Poisson-Dirichlet and the Dirichlet processes as particular cases.

In this paper we highlighted that the simultaneous occurrence of the Zipf’s, Heaps’ and Taylor’s laws can be explained in the framework of the adjacent possible scheme. This implies considering a space of possibilities that expands or gets restructured conditional to the occurrence of a novel event. The Pólya’s urn with triggering features these properties. Poisson-Dirichlet processes can also be said to belong to the adjacent possible scheme. Though no explicit mention is made about the space of possibilities in those schemes, the probability of the occurrence of a novel event closely depends on how many novelties occurred in the past. We recall that the PUT model includes Dirichlet-like processes as particular cases. From this perspective, PUT-like models seem to be good candidates to explain higher order features connected to innovation processes. We conclude by saying that the very notion of adjacent possible, though sufficient to explain the stylized facts of innovation processes, can be only conjecture as a necessary condition for the validity of the three laws mentioned above. No counterexamples have been found so far with which, without a dynamically restructured space of possibilities, one can satisfactorily explain the empirically observed laws.

Author Contributions

Conceptualization, F.T., V.L. and V.D.P.S.; Data curation, F.T. and V.D.P.S.; Formal analysis, F.T. and V.D.P.S.; Funding acquisition, V.L.; Investigation, F.T., V.L. and V.D.P.S.; Methodology, F.T., V.L. and V.D.P.S.; Resources, V.L.; Software, V.D.P.S., F.T.; Validation, F.T., V.L. and V.D.P.S.; Writing—Original Draft, F.T., V.L. and V.D.P.S.; Writing—Review & Editing, F.T., V.L. and V.D.P.S.

Funding

This research was partially funded by the Sony Computer Science Laboratories Paris.

Acknowledgments

We thank S. H. Strogatz for interesting discussions. V.D.P.S. is grateful to M. Osella for interesting discussions on the relevance of Taylor’s law. V.D.P.S. acknowledges the Austrian Research Promotion Agency FFG under grant #857136 for financial support.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

PUT:	Pólya’s Urn with Triggering [10]
PD:	Poisson-Dirichlet process
Dir:	Dirichlet process
RS:	Random Sampling
YSM:	Yule-Simon Model [40]
ZM:	Zanette-Montemurro model [41]

Appendix A. Analytic Derivation of Heaps’ law in the Urn Model with Triggering

We here derive the Heaps’ law for a general variation of the urn model with triggering. For the sake of completeness, we recall here the model: An urn

U

initially contains

N_{0}

distinct elements, represented by balls of different colors. By randomly extracting elements from the urn, we construct a sequence

S

. Both the urn and the sequence enlarge during the process. At each time step t, an element

s_{t}

is drawn at random from the urn: (i) iff the chosen element

s_{t}

is old (i.e., it already appeared in the sequence

S

), it is added to the sequence, and put back in the urn along with

ρ

additional copies of it; (ii) iff the chosen element

s_{t}

is new (i.e., it appears for the first time in the sequence

S

), it is added to the sequence, and put back in the urn along with

\tilde{ρ}

additional copies of it. Further,

ν + 1

brand new distinct elements are also added to the urn.

We can now write the equation governing the growth of the number of distinct elements

D (n)

as a function of the total number n of elements in the sequence (n is also obviously denoting the time step t above):

\frac{d D}{d n} = \frac{N_{0} + ν D}{N_{0} + a D + ρ n},

(A1)

where we have defined

a = ν + 1 - ρ + \tilde{ρ}

.

By defining

z = \frac{D}{n}

and neglecting

N_{0}

, we can write:

\frac{d z}{d n} = \frac{1}{n} \frac{d D}{n} - \frac{D}{n^{2}} \Rightarrow \frac{d D}{d n} = n \frac{d z}{d n} + z = \frac{ν z}{a z + ρ}

(A2)

which gives:

\int_{z (n_{0})}^{z (n)} \frac{a z + ρ}{z (ν - a z - ρ)} d z = \int_{n_{0}}^{n} \frac{d n}{n} .

(A3)

Here we note that by definition

0 \leq z \leq 1

, and

z (n_{0}) = D (n_{0}) / n_{0}

, for a given

n_{0}

such that the solutions we found are valid for any

n \geq n_{0}

. In order to integrate Equation (A3) we need to study the sign of the expression

ν - a z - ρ

. Let us do this by considering separately the case

ρ > ν

and

ρ < ν

, and postponing the computation for

ρ = ν

.

Appendix A.1. Case 1: ρ>ν

In this case, if

a \geq 0

we have

ν - a z - ρ < 0

(and thus obviously

a z + ρ - ν > 0

), while if

a < 0

it exists a

z_{0}

such that

ν - a z - ρ < 0

for

z < z_{0}

. Thus, if

z (n)

is decreasing in n, we can safely perform the integration for any

n \geq n_{0}

, for some

n_{0}

. Let us make this assumption and verify it at the end of the computation. By integrating Equation (A3) we thus obtain:

- log (a z + ρ - ν) (1 + \frac{ρ}{ν - ρ}) ∣_{z (n_{0})}^{z (n)} + \frac{ρ}{ν - ρ} log z ∣_{z (n_{0})}^{z (n)} = log n ∣_{n_{0}}^{n}

(A4)

and solving:

{(a z + ρ - ν)}^{ν} = A n^{ρ - ν} z^{ρ}, A = exp (C), C = log \frac{{(a z (n_{0}) + ρ - ν)}^{ν}}{z {(n_{0})}^{ρ}} - log n_{0}^{ρ - ν} .

(A5)

We can now substitute

z = \frac{D}{n}

, and after some algebra we can write:

D - \frac{a}{A^{\frac{1}{ν}}} D^{\frac{ν}{ρ}} = B n^{\frac{ν}{ρ}}, B = A^{-^{\frac{1}{ρ}}} {(ρ - ν)}^{\frac{ν}{ρ}}

(A6)

that gives the solution:

D (n) = B n^{\frac{ν}{ρ}} + O (n^{\frac{ν^{2}}{ρ^{2}}}) .

(A7)

We observe that

{lim}_{n \to \infty} z (n) = {lim}_{n \to \infty} \frac{D (n)}{n} = 0

monotonically, so that the assumption made above is satisfied.

Appendix A.2. Case 2: ρ<ν

Let us now assume

ν - a z - ρ > 0

and let us verify the assumption at the end. We thus write:

- log (ν - a z - ρ) (1 + \frac{ρ}{ν - ρ}) ∣_{z (n_{0})}^{z (n)} + \frac{ρ}{ν - ρ} log z ∣_{z (n_{0})}^{z (n)} = log n ∣_{n_{0}}^{n} .

(A8)

After similar calculation as in the case

ρ > ν

, we arrive to the relation:

D + \frac{A^{\frac{1}{ν}}}{a} D^{\frac{ρ}{ν}} = \frac{ν - ρ}{a} n, A = exp (C), C = log \frac{{(ν - a z (n_{0}) - ρ)}^{ν}}{z {(n_{0})}^{ρ}} - log n_{0}^{ρ - ν}

(A9)

that gives the solution:

D (n) = \frac{ν - ρ}{a} n - \frac{A^{\frac{1}{ν}}}{a} O (n^{\frac{ρ}{ν}})

(A10)

Note that

a \geq 0

per

ν > ρ

. We already discussed in the main text the case

a = 0

, that has to be treated separately, so we here consider

a > 0

. From (A10) we observe that

z (n) = D (n) / n

is increasing in n:

z (n) = \frac{ν - ρ}{a} - \tilde{z} (n)

with

{lim}_{n \to \infty} \tilde{z} (n) = 0

. Thus

ν - ρ - a z = a \tilde{z} (n) > 0

, with limit zero in the asymptotic limit

n \to \infty

. The initial assumption is thus satisfied in the entire range of the z values.

Appendix B. Analytic Determination of Zipf’s Law in the Continuous Approximation

In the following we derive the expression of Zipf’s exponents for the model of Pólya’s urn with innovations, by exploiting the continuous approximation.

Appendix B.1. Preliminary Considerations

The time evolution of the number of different colors D in the stream can be approximated by the following equation (see Equation (19) with

N_{0} = 1

):

\dot{D} = \frac{1 + ν D}{1 + ρ t + (ν + 1) D} with D (0) = 1 .

(A11)

Putting aside the particular cases

ν = 0

and

ν = ρ

, Equation (A11) can be solved analytically to yield, in the leading terms at large t (see also Appendix A):

D (t) \approx \{\begin{matrix} \frac{ν - ρ}{ν + 1} t & if ν > ρ \\ {(\frac{ρ (ρ - ν)}{ν (ρ - 1) + 2 ρ} t)}^{ν / ρ} & if ν < ρ \end{matrix} with t ≫ 1 .

(A12)

The two regimes given by the relative values of

ν

and

ρ

result in two different Heaps’ exponents

γ

, i.e.,

γ = 1

and

γ = ν / ρ

.

In the denominator of Equation (A11), the total number of balls in the urn appears:

N (t) = 1 + ρ t + (ν + 1) D

, so that we can write:

N (t) = \frac{ν D}{\dot{D}} \approx \frac{ν t}{γ} \approx \{\begin{matrix} ν t if ν > ρ \\ ρ t if ν < ρ \end{matrix} .

(A13)

Appendix B.2. Master Equation

We denote with

N_{k}

the number of balls with a given color occurring k-times in the urn. In particular, we have

\sum_{k} N_{k} = D

. The following master equation can be written for the

N_{k}

:

\frac{\partial N_{k}}{\partial t} = \frac{(k - ρ) N_{k - ρ}}{N (t)} - \frac{k N_{k}}{N (t)} \approx - \frac{ρ}{N (t)} \frac{\partial k N_{k}}{\partial k} \approx - \frac{ρ \dot{D}}{ν D} \frac{\partial k N_{k}}{\partial k} .

(A14)

We introduce now the probability

p_{k}

that a given color appears k-times in the urn, i.e., the corresponding normalized version of the number of occurrences

N_{k}

. In order to have

\sum p_{k} = 1

, we must choose

N_{k} = D p_{k}

. The idea is that, as the time runs, the probabilities

p_{k}

will tend to a stationary distribution, i.e., a distribution independent of t. By substituting

N_{k} = D p_{k}

in Equation (A14), we get

p_{k} = - \frac{ρ}{ν} \frac{\partial k p_{k}}{\partial k} .

(A15)

This equation can be solved easily by substituting

p_{k} \propto k^{- β}

and solving for

β

, which leads to

β = 1 + \frac{ν}{ρ}

(A16)

and conversely the frequency-rank exponent

α = ρ / ν

. Note that, while

γ

depends on the relative values of

ν

and

ρ

,

α

does not. To relate the

p_{k}

distribution in the urn to that of the stream is an easy task.

Appendix B.3. Particular case ν=ρ

In addition, in this case Equation (A11) can be solved analytically with the solution including the Lambert W function. At large values of t, the solution can be approximated as

D (t) \approx \frac{ν}{ν + 1} \frac{t}{log t},

(A17)

so that

N (t) \approx ν t

. Equation (A14) can be written as before as:

\frac{\partial D p_{k}}{\partial t} = - \frac{ν \dot{D}}{ν D} \frac{\partial D k p_{k}}{\partial k} \approx ⟶ p_{k} = - \frac{\partial k p_{k}}{\partial k},

(A18)

which results in

β = 2

and

α = 1

.

Appendix B.4. Particular Case ν=0

This case is identical to the Hoppe’s urn model. When a ball with a brand new color is extracted, exactly one new color enters the urn so that the number of unobserved colors stays the same during the whole dynamics. If we start with one single ball, there will always be only one unobserved color in the urn and this color would have exactly the same function of the black ball with weight one in Hoppe’s model. The equation for the growth of novelties will be:

\dot{D} = \frac{1}{1 + ρ t + D} \overset{t \to \infty}{\approx} \frac{1}{ρ t} ⟶ D \approx \frac{1}{ρ} log t,

(A19)

while the frequency-rank will be decaying exponentially. In order to introduce the equivalent counterpart of the weight of the black ball in the Hoppe’s model, whenever a novelty is extracted, w balls of the same brand new color could be added to the urn.

References

Estoup, J.B. Les Gammes Sténographiques; Institut Sténographique de France: Paris, France, 1916. [Google Scholar]
Zipf, G.K. Relative Frequency as a Determinant of Phonetic Change. Harvard Stud. Class. Philol. 1929, 40, 1–95. [Google Scholar] [CrossRef]
Zipf, G.K. The Psychobiology of Language; Houghton-Mifflin: New York, NY, USA, 1935. [Google Scholar]
Zipf, G.K. Human Behavior and the Principle of Least Effort; Addison-Wesley: Reading, MA, USA, 1949. [Google Scholar]
Herdan, G. Type-Token Mathematics: A Textbook of Mathematical Linguistics; Janua linguarum. Series Maior. No. 4; Mouton en Company: The Hague, The Netherlands, 1960. [Google Scholar]
Heaps, H.S. Information Retrieval-Computational and Theoretical Aspects; Academic Press: Orlando, FL, USA, 1978. [Google Scholar]
Taylor, L. Aggregation, Variance and the Mean. Nature 1961, 189, 732. [Google Scholar] [CrossRef]
Kauffman, S.A. Investigations: The Nature of Autonomous Agents and the Worlds They Mutually Create; SFI Working Papers; Santa Fe Institute: Santa Fe, NM, USA, 1996. [Google Scholar]
Kauffman, S.A. Investigations; Oxford University Press: New York, NY, USA; Oxford, UK, 2000. [Google Scholar]
Tria, F.; Loreto, V.; Servedio, V.D.P.; Strogatz, S.H. The dynamics of correlated novelties. Nat. Sci. Rep. 2014, 4. [Google Scholar] [CrossRef] [PubMed]
Loreto, V.; Servedio, V.D.P.; Tria, F.; Strogatz, S.H. Dynamics on expanding spaces: modeling the emergence of novelties. In Universality and Creativity in Language; Altmann, E., Esposti, M.D., Pachet, F., Eds.; Springer: Cham, Switzerland, 2016; pp. 59–83. [Google Scholar]
Pitman, J. Combinatorial stochastic processes. In Lecture Notes in Mathematics; Springer-Verlag: Berlin, Germany, 2006; Volume 1875, p. x+256. [Google Scholar]
Buntine, W.; Hutter, M. A Bayesian View of the Poisson-Dirichlet Process. arXiv, 2010; arXiv:1007.0296. [Google Scholar]
De Blasi, P.; Favaro, S.; Lijoi, A.; Mena, R.H.; Pruenster, I.; Ruggiero, M. Are Gibbs-Type Priors the Most Natural Generalization of the Dirichlet Process? IEEE Trans. Pattern Anal. Mach. Intel. 2015, 37, 212–229. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Hart, M. Project Gutenberg. 1971. Available online: http://www.gutenberg.org/.
Petruszewycz, M. L’histoire de la loi d’Estoup-Zipf: Documents. Math. Sci. Hum. 1973, 44, 41–56. [Google Scholar]
Li, W. Zipf’s Law everywhere. Glottometrics 2002, 5, 14–21. [Google Scholar]
Newman, M.E.J. Power laws, Pareto distributions and Zipf’s law. Contemp. Phys. 2005, 46, 323–351. [Google Scholar] [CrossRef]
Piantadosi, S.T. Zipf’s word frequency law in natural language: A critical review and future directions. Psychon. Bull. Rev. 2014, 21, 1112–1130. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Baeza-Yates, R.; Navarro, G. Block addressing indices for approximate text retrieval. J. Am. Soc. Inf. Sci. 2000, 51, 69–82. [Google Scholar] [CrossRef] [Green Version]
Baayen, R. Word Frequency Distributions; Number v. 1 in Text, Speech and Language Technology; Springer: Dordrecht, The Netherlands, 2001. [Google Scholar]
Egghe, L. Untangling Herdan’s law and Heaps’ Law: Mathematical and informetric arguments. J. Am. Soc. Inf. Sci. Technol. 2007, 58, 702–709. [Google Scholar] [CrossRef]
Serrano, M.A.; Flammini, A.; Menczer, F. Modeling statistical properties of written text. PLoS ONE 2009, 4, e5372. [Google Scholar] [CrossRef] [PubMed]
Lü, L.; Zhang, Z.K.; Zhou, T. Zipf’s law leads to Heaps’ law: Analyzing their relation in finite-size systems. PLoS ONE 2010, 5, e14139. [Google Scholar] [CrossRef] [PubMed]
Cristelli, M.; Batty, M.; Pietronero, L. There is More than a Power Law in Zipf. Sci. Rep. 2012, 2, 812. [Google Scholar] [CrossRef] [PubMed]
Gerlach, M.; Altmann, E.G. Scaling laws and fluctuations in the statistics of word frequencies. New J. Phys. 2014, 16, 113010. [Google Scholar] [CrossRef] [Green Version]
Johnson, S. Where Good Ideas Come From: The Natural History of Innovation; Riverhead Hardcover: New York, NY, USA, 2010. [Google Scholar]
Wagner, A.; Rosen, W. Spaces of the possible: universal Darwinism and the wall between technological and biological innovation. J. R. Soc. Interface 2014, 11, 20131190. [Google Scholar] [CrossRef] [PubMed]
Gouet, R. Strong Convergence of Proportions in a Multicolor P’olya Urn. J. Appl. Probab. 1997, 34, 426–435. [Google Scholar] [CrossRef]
Tria, F. The dynamics of innovation through the expansion in the adjacent possible. Nuovo Cim. C Geophys. Space Phys. C 2016, 39, 280. [Google Scholar]
Pitman, J. Exchangeable and partially exchangeable random partitions. Probab. Theory Relat. Fields 1995, 102, 145–158. [Google Scholar] [CrossRef]
De Finetti, B. La Prévision: Ses Lois Logiques, Ses Sources Subjectives. Annales de l’Institut Henri Poincaré 1937, 17, 1–68. [Google Scholar]
Zabell, S. Predicting the unpredictable. Synthese 1992, 90, 205–232. [Google Scholar] [CrossRef]
Monechi, B.; Ruiz-Serrano, A.; Tria, F.; Loreto, V. Waves of Novelties in the Expansion into the Adjacent Possible. PLoS ONE 2017, 12, e0179303. [Google Scholar] [CrossRef] [PubMed]
Iacopini, I.; Milojević, S.C.V.; Latora, V. Network Dynamics of Innovation Processes. Phys. Rev. Lett. 2018, 120, 048301. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Hoppe, F.M. Pólya-like urns and the Ewens’ sampling formula. J. Math. Biol. 1984, 20, 91–94. [Google Scholar] [CrossRef]
Ewens, W. The Sampling Theory of Selectively Neutral Alleles. Theor. Popul. Biol. 1972, 3, 87–112. [Google Scholar] [CrossRef]
Fisher, R.A. The Genetical Theory of Natural Selection; Clarendon Press: Oxford, UK, 1930. [Google Scholar]
Wright, S. Evolution in Mendelian populations. Genetics 1931, 16, 97. [Google Scholar] [PubMed]
Simon, H. On a class of skew distribution functions. Biometrika 1955, 42, 425–440. [Google Scholar] [CrossRef]
Zanette, D.; Montemurro, M. Dynamics of Text Generation with Realistic Zipf’s Distribution. J. Quant. Linguist. 2005, 12, 29. [Google Scholar] [CrossRef]
Yamato, H.; Shibuya, M. Moments of some statistics of pitman sampling formula. Bull. Inf. Cybern. 2000, 32, 1–10. [Google Scholar]

Figure 1. Zipf’s law computed on the Gutenberg corpus [15]. In this case, the exponent of the asymptotic behaviour is

α ≃ 2.25

. Similar behaviours are observed in many other systems.

Figure 1. Zipf’s law computed on the Gutenberg corpus [15]. In this case, the exponent of the asymptotic behaviour is

α ≃ 2.25

. Similar behaviours are observed in many other systems.

Figure 2. Growth of the number of distinct words computed on the Gutenberg corpus of texts [15]. The position of texts in the corpus is chosen at random. In this case

γ ≃ 0.44

. Similar behaviours are observed in many other systems.

Figure 2. Growth of the number of distinct words computed on the Gutenberg corpus of texts [15]. The position of texts in the corpus is chosen at random. In this case

γ ≃ 0.44

. Similar behaviours are observed in many other systems.

Figure 3. Urn model with triggering. (A) An element that had previously been drawn from the urn, is drawn again: the element is added to

S

and it is put back in the urn along with

ρ

additional copies of it. (B) An element that never appeared in the sequence is drawn: the element is added to

S

, put back in the urn along with

ρ

additional copies of it, and

ν + 1

brand new and distinct balls are also added to the urn.

Figure 3. Urn model with triggering. (A) An element that had previously been drawn from the urn, is drawn again: the element is added to

S

and it is put back in the urn along with

ρ

additional copies of it. (B) An element that never appeared in the sequence is drawn: the element is added to

S

, put back in the urn along with

ρ

additional copies of it, and

ν + 1

brand new and distinct balls are also added to the urn.

Figure 4. Heaps’ law (left) and Zipf’s law (right) in the urn model with triggering. Straight lines in the Heaps’ law plots show functions of the form

f (x) = a x^{γ}

with the exponent

γ = ν / ρ

as predicted by the analytic results and confirmed in the numerical simulations. Straight lines in the Zipf’s law plots show functions of the form

f (x) = a x^{- α}

, with

α = γ^{- 1} = ρ / ν

.

Figure 4. Heaps’ law (left) and Zipf’s law (right) in the urn model with triggering. Straight lines in the Heaps’ law plots show functions of the form

f (x) = a x^{γ}

with the exponent

γ = ν / ρ

as predicted by the analytic results and confirmed in the numerical simulations. Straight lines in the Zipf’s law plots show functions of the form

f (x) = a x^{- α}

, with

α = γ^{- 1} = ρ / ν

.

Figure 5. Results for the multicolors Pólya urn model without innovation. Results are reported both from simulations of the process (points) and from the analytical predictions (straight lines), for different values of the initial number of balls

N_{0}

and of the reinforcement parameter

ρ

. Left: Number of different colors

D (n)

added in the sequence as a function of the total number t of extracted balls. The curves from analytical predictions of Equation (23) exactly overlap the simulated points. Right: Frequency-rank distribution. Simulations of the process are here reported along with both: (i) the prediction obtained by inverting the relation

R ≃ = \frac{N_{0}}{Γ (\frac{1}{ρ})} Γ (\frac{1}{ρ}, \frac{N_{0} - 1 - ρ}{ρ} f)

; (ii) the asymptotic solution, valid for

R ≫ 1

, obtained by inverting Equation (23)

f (R) ≃ \frac{ρ}{N_{0}} {[{(1 - \frac{R}{N_{0}})}^{- r} - 1]}^{- 1}

(refer to [30] for their derivation).

Figure 5. Results for the multicolors Pólya urn model without innovation. Results are reported both from simulations of the process (points) and from the analytical predictions (straight lines), for different values of the initial number of balls

N_{0}

and of the reinforcement parameter

ρ

. Left: Number of different colors

D (n)

added in the sequence as a function of the total number t of extracted balls. The curves from analytical predictions of Equation (23) exactly overlap the simulated points. Right: Frequency-rank distribution. Simulations of the process are here reported along with both: (i) the prediction obtained by inverting the relation

R ≃ = \frac{N_{0}}{Γ (\frac{1}{ρ})} Γ (\frac{1}{ρ}, \frac{N_{0} - 1 - ρ}{ρ} f)

; (ii) the asymptotic solution, valid for

R ≫ 1

, obtained by inverting Equation (23)

f (R) ≃ \frac{ρ}{N_{0}} {[{(1 - \frac{R}{N_{0}})}^{- r} - 1]}^{- 1}

(refer to [30] for their derivation).

Figure 6. Taylor’s law in various generative models. (Left panel) Models that do not display a square root dependence of the dictionary standard deviation versus the dictionary itself are shown in color, the others in gray. Curves are listed from top to bottom according to their visual ordering. The Pólya’s urn model with triggering (PUT) shows an exponent one when

ν < ρ

and exponents in the range from 1/2 to ca. 0.87 when

ν \geq ρ

. The Poisson-Dirichlet (PD) process also displays a unity exponent. (Right panel) Models with a square root dependence of the dictionary standard deviation versus the dictionary itself are shown in color, the rest (highlighted in the left panel) in gray. The models are Zanette-Montemurro (ZM), Random Sampling (RS), Yule-Simon Model (YSM) and the Dirichlet process (Dir). All these four as well as the PUT with parameters

ν = 1

and

ρ = 2

, the PD with

α = 1 / 2

and

θ = 1

produce the same Heaps’ law with exponent

γ = 1 / 2

. Each curve is the result of 100 runs of

10^{6}

steps each. The dashed lines with exponents 1/2 and 1 are shown as a guide for the eye.

Figure 6. Taylor’s law in various generative models. (Left panel) Models that do not display a square root dependence of the dictionary standard deviation versus the dictionary itself are shown in color, the others in gray. Curves are listed from top to bottom according to their visual ordering. The Pólya’s urn model with triggering (PUT) shows an exponent one when

ν < ρ

and exponents in the range from 1/2 to ca. 0.87 when

ν \geq ρ

. The Poisson-Dirichlet (PD) process also displays a unity exponent. (Right panel) Models with a square root dependence of the dictionary standard deviation versus the dictionary itself are shown in color, the rest (highlighted in the left panel) in gray. The models are Zanette-Montemurro (ZM), Random Sampling (RS), Yule-Simon Model (YSM) and the Dirichlet process (Dir). All these four as well as the PUT with parameters

ν = 1

and

ρ = 2

, the PD with

α = 1 / 2

and

θ = 1

produce the same Heaps’ law with exponent

γ = 1 / 2

. Each curve is the result of 100 runs of

10^{6}

steps each. The dashed lines with exponents 1/2 and 1 are shown as a guide for the eye.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Tria, F.; Loreto, V.; Servedio, V.D.P. Zipf’s, Heaps’ and Taylor’s Laws are Determined by the Expansion into the Adjacent Possible. Entropy 2018, 20, 752. https://doi.org/10.3390/e20100752

AMA Style

Tria F, Loreto V, Servedio VDP. Zipf’s, Heaps’ and Taylor’s Laws are Determined by the Expansion into the Adjacent Possible. Entropy. 2018; 20(10):752. https://doi.org/10.3390/e20100752

Chicago/Turabian Style

Tria, Francesca, Vittorio Loreto, and Vito D. P. Servedio. 2018. "Zipf’s, Heaps’ and Taylor’s Laws are Determined by the Expansion into the Adjacent Possible" Entropy 20, no. 10: 752. https://doi.org/10.3390/e20100752

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Zipf’s, Heaps’ and Taylor’s Laws are Determined by the Expansion into the Adjacent Possible

Abstract

1. Introduction

2. Zipf’s and Heaps’ Laws

2.1. Frequency-Rank Relations: the Estoup-Zipf’s Law

2.2. The Innovation Rate: Herdan-Heaps’ Law

2.3. Zipf’s vs. Heaps’ Laws

3. Urn Model with Triggering

The Role of the Adjacent Possible: Heaps’ and Zipf’s Laws in the Classic Multicolors Pólya Urn Model

4. Connection of the Urn Model with Triggering and with Stochastic Processes Featuring Innovation

4.1. Urn Model with Triggering and the Poisson-Dirichlet Process

4.2. Urn Model with Triggering, Dirichlet Process and Hoppe Model

5. Fluctuation Scaling (Taylor’s Law)

6. Discussion

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

Abbreviations

Appendix A. Analytic Derivation of Heaps’ law in the Urn Model with Triggering

Appendix A.1. Case 1: ρ>ν

Appendix A.2. Case 2: ρ<ν

Appendix B. Analytic Determination of Zipf’s Law in the Continuous Approximation

Appendix B.1. Preliminary Considerations

Appendix B.2. Master Equation

Appendix B.3. Particular case ν=ρ

Appendix B.4. Particular Case ν=0

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI