3.2.2. Results

There are 44,722 noun tokens and 45,159 verb tokens in the analyzed sample. With 5817 unique types, nouns are a far more lexically diverse category than verbs, which have 2574 types. By contrast, the 116,960 function word tokens are represented by only 144 unique types.

Figure 3 shows the token distributions of the three largest grammatical categories. Both verbs and nouns fit a power-law distribution more closely than a geometric one: *R*<sup>2</sup><sub>pl</sub> = 0.976 vs. *R*<sup>2</sup><sub>geom</sub> = 0.701 for verbs, and *R*<sup>2</sup><sub>pl</sub> = 0.971 vs. *R*<sup>2</sup><sub>geom</sub> = 0.772 for nouns.

**Figure 3.** Word frequency distributions of nouns, verbs, and function words in the Buckeye Corpus [32] show that the substantially smaller set of verbs (compared to nouns) has a closer fit to a power-law distribution, indicating more aggregation. The shape of the function word distribution suggests that function words form a natural empirical distribution.

By contrast, the 144 unique function words (*n*<sub>tokens</sub> = 11,696) show an almost perfect fit to the geometric distribution: *R*<sup>2</sup><sub>pl</sub> = 0.796, *R*<sup>2</sup><sub>geom</sub> = 0.992. A separate analysis shows a better fit to the geometric distribution than to the power law for determiners (*n* = 16, *R*<sup>2</sup><sub>geom</sub> = 0.953, *R*<sup>2</sup><sub>pl</sub> = 0.830), pronouns (*n* = 28, *R*<sup>2</sup><sub>geom</sub> = 0.957, *R*<sup>2</sup><sub>pl</sub> = 0.741), and prepositions/subordinating conjunctions (*n* = 78, *R*<sup>2</sup><sub>geom</sub> = 0.983, *R*<sup>2</sup><sub>pl</sub> = 0.863). Aggregating these subcategories into a single set of function words, however, further improves the fit to the geometric distribution.
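To illustrate how such *R*<sup>2</sup> comparisons can be made, the sketch below fits both distributions to a rank-frequency curve by least squares on log frequencies: a power law is linear in log frequency vs. log rank, while a geometric distribution is linear in log frequency vs. rank. This is a minimal sketch using synthetic data (not the corpus values), and the paper's exact fitting procedure may differ.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination of a fit."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def power_law_fit(freqs):
    # Power law f(r) = C * r^(-alpha): log f is linear in log r.
    r = np.arange(1, len(freqs) + 1)
    log_f = np.log(freqs)
    slope, intercept = np.polyfit(np.log(r), log_f, 1)
    return r_squared(log_f, slope * np.log(r) + intercept)

def geometric_fit(freqs):
    # Geometric f(r) = C * q^r: log f is linear in r itself.
    r = np.arange(1, len(freqs) + 1)
    log_f = np.log(freqs)
    slope, intercept = np.polyfit(r, log_f, 1)
    return r_squared(log_f, slope * r + intercept)

# Synthetic rank-frequency curves (illustrative only).
ranks = np.arange(1, 101)
zipf_like = 1000.0 / ranks          # power-law-shaped frequencies
geom_like = 1000.0 * 0.9 ** ranks   # geometrically decaying frequencies

print(power_law_fit(zipf_like), geometric_fit(zipf_like))
print(power_law_fit(geom_like), geometric_fit(geom_like))
```

On the synthetic curves, each fit attains *R*<sup>2</sup> near 1 on the data generated from its own family and a lower value on the other, mirroring the comparison reported above.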
