**6. Conclusions**

In this work, we tried to capture two dimensions of morphological complexity. Languages with a high TTR can encode many different functions at the word level and therefore produce many different word forms. In addition, we proposed that the entropy rate of a sub-word language model could reflect how uncertain the sequences of morphological elements within a word are: languages with high entropy may have many irregular phenomena that are harder to predict than those of other languages. We were particularly interested in this latter dimension, since fewer quantitative methods based on raw corpora exist for measuring it.

The measures were consistent across two different parallel corpora. Moreover, the correlation between the different complexity measures suggests that our entropy rate approach captures a different complexity dimension than measures such as TTR or *CWALS*.

Deeper linguistic analysis is needed; however, corpus-based quantitative measures can complement and deepen the study of morphological complexity.

**Author Contributions:** Conceptualization, V.M. and X.G.-V.; Investigation, X.G.-V. and V.M.; Methodology, X.G.-V. and V.M.; Writing—original draft, X.G.-V. and V.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the Swiss Government Excellence Scholarship and the Mexican Council of Science and Technology (CONACYT), fellowships 2019.0022 and 442471.

**Acknowledgments:** We thank the reviewers, and Tanja Samardzic, for their valuable comments.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Complexity Measures for JW300 Corpus**


**Table A1.** Complexity measures on the JW300 corpus (for all languages).


#### **Appendix B. Complexity Measures for Bibles Corpus**

**Table A2.** Complexity measures on the Bibles corpus (for all languages).




#### **Appendix C. Complexity Measures Using** *CWALS*

**Table A3.** *CWALS* complexity for the subset of languages shared with the Bibles corpus.



#### **Appendix D. Complexity Measures Using** *MCC*

#### **Appendix E. Correlation Using Typological Classifications**

For each language in the intersection of the Bibles and JW300 corpora, we extracted its value for WALS feature 20A, "Fusion of Selected Inflectional Formatives". We focused on the languages classified as "concatenative" or "isolating". For each corpus, we calculated the correlations between complexity measures separately for the concatenative languages and for the isolating ones (Tables A5 and A6).
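The grouping step described above can be illustrated with a minimal sketch. It is not the code used for Tables A5 and A6. The record layout (a `fusion` key holding the typological class, plus one key per complexity measure) and all function names are assumptions made for illustration. Spearman's correlation is computed here as the Pearson correlation of average ranks.

```python
from collections import defaultdict


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def correlation_by_class(rows, measure_a, measure_b):
    """Group languages by their (assumed) 'fusion' label and correlate two measures."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["fusion"]].append((row[measure_a], row[measure_b]))
    return {
        label: spearman(*zip(*pairs))  # unzip pairs into two value sequences
        for label, pairs in groups.items()
    }
```

Calling `correlation_by_class(rows, "ttr", "entropy")` on one corpus would return one coefficient per class, for example one for "concatenative" and one for "isolating"; the measure keys `ttr` and `entropy` are likewise hypothetical names.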

**Table A5.** Spearman's correlation between complexity measures in concatenative and isolating languages (Bibles corpus).


**Table A6.** Spearman's correlation between complexity measures in concatenative and isolating languages (JW300 corpus).


#### **Appendix F. Correlation Using Average Word Length**

We calculated the average word length per language in both corpora, defined as the mean number of characters per word. Tables A7 and A8 show the correlations of the average word length with the other measures for the Bibles and JW300 corpora, respectively.
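As an illustration of this definition, the following sketch computes the mean number of characters per word for a text. It assumes whitespace tokenization, averages over word tokens rather than word types, and does not strip punctuation; these choices are assumptions, since the preprocessing is not specified here. The function name is hypothetical.

```python
def average_word_length(text: str) -> float:
    """Mean number of characters per whitespace-delimited token (assumed tokenization)."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return sum(len(word) for word in words) / len(words)
```

Applying the function to the full text of each language in a corpus yields one value per language, which can then be correlated with the other complexity measures.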

**Table A7.** Spearman's correlation between complexity measures and the average length per word in the Bibles corpus.


**Table A8.** Spearman's correlation between complexity measures and the average length per word in the JW300 corpus.

