2.4.2. Authors

We select all books from the 20 most prolific authors (selected from the authors of the 100 most downloaded books in order to avoid authors such as "Anonymous"). For each author, we draw 1000 pairs of books (*i*, *j*) from the same author and compare their distances *D<sub>i,j</sub>* with those of 1000 pairs (*i*, *j*) in which *j* comes from a different author. We observe that the distance between books from the same author is consistently smaller than the distance between two books from different authors, not only in terms of the median but also in terms of a much smaller spread in the values of *D<sub>i,j</sub>* (Figure 5). This consistent difference across authors suggests potential applicability to the study of stylistic differences, such as in problems of authorship attribution [55,56].

**Figure 5.** Distance between books from the same author is significantly smaller than distance between books from different authors. For each author, the boxplot shows the 5th, 25th, 50th, 75th, and 95th percentiles of the distribution of distances for 1000 pairs of books from the same author (**green**) and for 1000 pairs in which the second book comes from a different author (**gray**).
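The pair-sampling comparison described above could be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for Figure 5: the names `sample_pair_distances`, `boxplot_percentiles`, and `books_by_author` are hypothetical, `distance` is a placeholder for whichever measure *D<sub>i,j</sub>* is used, and drawing pairs independently (so that a pair may repeat) is an assumption the text does not specify.

```python
import random
import numpy as np

def sample_pair_distances(books_by_author, distance, n_pairs=1000, seed=0):
    """Return {author: (same_author_distances, different_author_distances)}.

    books_by_author: dict mapping an author to a list of at least two books
                     (hypothetical structure, any book representation works).
    distance:        callable (book_i, book_j) -> float, a placeholder for D_ij.
    """
    rng = random.Random(seed)
    authors = list(books_by_author)
    result = {}
    for author in authors:
        own = books_by_author[author]
        # All books written by any other author in the selection.
        others = [b for a in authors if a != author for b in books_by_author[a]]
        same, diff = [], []
        for _ in range(n_pairs):
            # Same-author pair: two distinct books by this author.
            i, j = rng.sample(own, 2)
            same.append(distance(i, j))
            # Different-author pair: i by this author, j by another author.
            i = rng.choice(own)
            j = rng.choice(others)
            diff.append(distance(i, j))
        result[author] = (np.array(same), np.array(diff))
    return result

def boxplot_percentiles(values):
    """5th, 25th, 50th, 75th, and 95th percentiles, as summarized in the boxplots."""
    return np.percentile(values, [5, 25, 50, 75, 95])
```

Calling `boxplot_percentiles` on each of the two arrays for an author gives the five summary values that one box in such a figure would display.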
