2.4.3. Time

We compare the distance *D*<sub>*i*,*j*</sub> between pairs of books *i*, *j*, each taken from a 20-year time period *t*<sub>*i*</sub>, *t*<sub>*j*</sub> ∈ {1800–1820, ..., 1980–2000}. In Figure 6, we show the distance between two time windows, *D*<sub>*t*<sub>*i*</sub>,*t*<sub>*j*</sub></sub>, obtained by averaging over 1000 pairs of books for each pair of windows. We observe that the average distance increases with the separation between the time periods. However, we emphasize that a substantial increase in *D*<sub>*t*<sub>*i*</sub>,*t*<sub>*j*</sub></sub> is only observed for large separations between *t*<sub>*i*</sub> and *t*<sub>*j*</sub> and for later time periods (after 1900). This could be caused by the rough approximation of the publication year and by a potential change in the composition of the SPGC after 1930 due to copyright laws. In fact, the observed effects are likely a mixture of temporal and topical variability, because the topical composition of PG over time is certainly not uniform. This suggests that PG books have limited applicability for diachronic studies without further filtering (for example, by subject or bookshelf). Other resources such as the COHA corpus might be more adequate in this case, although a more genre-balanced version of the SPGC could potentially be created using the provided metadata.

**Figure 6.** Distance between books increases with their time separation. Average and standard error of the distance between 1000 pairs of books, where the two books in each pair are drawn from two different 20-year time intervals. We fix the first interval and successively shift the second interval to later periods.
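The window-averaging procedure described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from this section: each book is represented as a mapping from word to count, the distance between two books is the base-2 Jensen-Shannon divergence of their word-frequency distributions, and pairs are sampled with replacement. The names `window_of`, `jensen_shannon`, and `window_distance` are hypothetical helpers introduced only for illustration.

```python
import math
import random


def window_of(year, start=1800, width=20):
    """Return the (lower, upper) bounds of the 20-year window containing `year`."""
    lower = start + ((year - start) // width) * width
    return (lower, lower + width)


def jensen_shannon(p_counts, q_counts):
    """Base-2 Jensen-Shannon divergence between two word-count mappings.

    Assumed distance measure for this sketch; it lies between 0 and 1.
    """
    n_p = sum(p_counts.values())
    n_q = sum(q_counts.values())
    jsd = 0.0
    for word in set(p_counts) | set(q_counts):
        p = p_counts.get(word, 0) / n_p
        q = q_counts.get(word, 0) / n_q
        m = 0.5 * (p + q)  # mixture probability of the word
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return jsd


def window_distance(books_a, books_b, n_pairs=1000, seed=0):
    """Mean and standard error of the distance over random cross-window pairs.

    `books_a` and `books_b` are lists of word-count mappings, one list per
    time window. Pairs are drawn with replacement (an assumption).
    """
    rng = random.Random(seed)
    dists = [
        jensen_shannon(rng.choice(books_a), rng.choice(books_b))
        for _ in range(n_pairs)
    ]
    mean = sum(dists) / n_pairs
    var = sum((d - mean) ** 2 for d in dists) / (n_pairs - 1)  # sample variance
    return mean, math.sqrt(var / n_pairs)
```

To produce a curve like those in Figure 6 under these assumptions, one would group the books by `window_of(year)`, hold one window fixed, and call `window_distance` against each later window in turn.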
