*3.2. Divergence D*α

To test the relationship between the sample size and *D*α for different α-values, we computed *D*α between a "text" that consists of the first *n* = 2<sup>*k*</sup> word tokens and a "text" that consists of the last *n* = 2<sup>*k*</sup> word tokens for each version of our database for *k* = 6, 7, ..., 26, and took averages. As for *H*α above, we then calculated the Spearman correlation between the sample size and *D*α for different minimum sample sizes. The idea here is that both "texts" come from the same population, i.e., all *Der Spiegel* articles, so one should expect that, with growing sample sizes, *D*α fluctuates around 0 with no systematic relationship between *D*α and the sample size. Table 3 summarizes the results, while Figure 4B visualizes the convergence pattern. For all settings, there is a strong monotonic relationship between the sample size and *D*α that passes the permutation test in almost every case. For α = 0.25, the Spearman correlation coefficients are positive. This seems to be due to the fact that *H*<sub>α=0.25</sub> is dominated by word types from the lower end of the frequency spectrum (cf. Table 1): for example, word types that occur only once contribute almost half of *H*<sub>α=0.25</sub>. Each of those word types then appears either in the first 2<sup>*k*</sup> or in the last 2<sup>*k*</sup> word tokens.
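The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the reported results. It assumes that *H*α is the generalized entropy of order α, (1 − Σ *p*<sub>i</sub><sup>α</sup>)/(α − 1), with the Shannon entropy as the limit for α = 1, and that *D*α is the corresponding generalized Jensen-Shannon divergence with equal weights for both texts. The function names `generalized_entropy`, `divergence`, and `first_last_divergence` are hypothetical.

```python
import math
from collections import Counter


def generalized_entropy(weights, alpha):
    """Generalized entropy of order alpha (assumed form, see lead-in).

    `weights` maps word types to counts or probabilities; they are
    normalized to relative frequencies before the entropy is computed.
    """
    total = sum(weights.values())
    probs = [w / total for w in weights.values() if w > 0]
    if alpha == 1.0:
        # The limit alpha -> 1 is the Shannon entropy (natural logarithm).
        return -sum(p * math.log(p) for p in probs)
    return (sum(p ** alpha for p in probs) - 1.0) / (1.0 - alpha)


def divergence(tokens_a, tokens_b, alpha):
    """Assumed D_alpha: H(mixture) - 0.5 * H(a) - 0.5 * H(b)."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = len(tokens_a), len(tokens_b)
    # Equal-weight mixture of the two relative frequency distributions.
    mixture = Counter()
    for word, count in counts_a.items():
        mixture[word] += 0.5 * count / n_a
    for word, count in counts_b.items():
        mixture[word] += 0.5 * count / n_b
    return (generalized_entropy(mixture, alpha)
            - 0.5 * generalized_entropy(counts_a, alpha)
            - 0.5 * generalized_entropy(counts_b, alpha))


def first_last_divergence(tokens, k, alpha):
    """D_alpha between the first and the last n = 2**k word tokens."""
    n = 2 ** k
    if n > len(tokens):
        raise ValueError("token sequence is shorter than 2**k")
    return divergence(tokens[:n], tokens[-n:], alpha)


if __name__ == "__main__":
    toy_tokens = "a b a c b a d a b c a b".split()
    for k in (1, 2, 3):
        print(k, first_last_divergence(toy_tokens, k, alpha=2.0))
```

Averaging over the different versions of the database, as described in the text, would amount to calling `first_last_divergence` once per version and taking the mean for each *k*.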


**Table 3.** Spearman correlation between the sample size and *D*α for different α-values \*.

\* An asterisk indicates that the corresponding correlation coefficient passed the permutation test at *p* < 0.001. For minimum sample sizes above 2<sup>19</sup>, an exact permutation test is calculated.

For α = 0.25, the results demonstrate that the larger the sample size, the larger *D*α (cf. the pink line in Figure 4B). For α = 0.75, a similar pattern is observed for smaller sample sizes (cf. the orange line in Figure 4B). However, at around *k* = 15, the pattern changes. For *k* ≥ 15, there is a perfect monotonic negative relationship between *D*<sub>α=0.75</sub> and the sample size. Surprisingly, there is a perfect monotonic negative relationship for all settings for α ≥ 1.00, even if we restrict the calculation to relatively large sample sizes. However, the corresponding values are very small. For instance, *D*<sub>α=2.00</sub> = 7.91 × 10<sup>−8</sup> for *n* = 2<sup>24</sup>, *D*<sub>α=2.00</sub> = 4.08 × 10<sup>−8</sup> for *n* = 2<sup>25</sup>, and *D*<sub>α=2.00</sub> = 1.379 × 10<sup>−8</sup> for *n* = 2<sup>26</sup>. One might object that this systematic sample size dependence is practically irrelevant. In the next section, we show that, unfortunately, this is not the case.
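To make the notion of a "perfect monotonic negative relationship" concrete, the following sketch computes the Spearman correlation for the three *D*<sub>α=2.00</sub> values quoted above and illustrates an exact permutation test. It is a simplified illustration under stated assumptions, not the procedure used for Table 3: the two-sided form of the test and the helper names `ranks`, `pearson`, `spearman`, and `exact_permutation_p` are assumptions made here.

```python
import math
from itertools import permutations


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def exact_permutation_p(x, y):
    """Two-sided exact p-value (assumed form of the test).

    Enumerates every ordering of y and counts how often the absolute
    rank correlation is at least as large as the observed one.
    Only feasible for short series (n! orderings).
    """
    rank_x, rank_y = ranks(x), ranks(y)
    observed = abs(pearson(rank_x, rank_y))
    hits = total = 0
    for perm in permutations(rank_y):
        total += 1
        if abs(pearson(rank_x, perm)) >= observed - 1e-12:
            hits += 1
    return hits / total


if __name__ == "__main__":
    # Values quoted in the text for alpha = 2.00 and n = 2**24, 2**25, 2**26.
    k_values = [24, 25, 26]
    d_values = [7.91e-8, 4.08e-8, 1.379e-8]
    print(spearman(k_values, d_values))
    print(exact_permutation_p(k_values, d_values))
```

Restricting the calculation to a minimum sample size corresponds to dropping all pairs with *k* below a chosen threshold before calling `spearman`. Under this two-sided sketch, the smallest attainable *p*-value for a series of *m* distinct points is 2/*m*!, so very short series cannot reach *p* < 0.001 even when the relationship is perfectly monotonic.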
