An Information Theoretic Approach to Symbolic Learning in Synthetic Languages
Abstract
1. Introduction
2. Synthetic Language Symbols
2.1. Aspects of Symbolization
2.2. Zipf–Mandelbrot–Li Symbolization
2.3. Maximum Intelligibility Symbolization
3. Learning Synthetic Language Symbols
3.1. A Linguistic Constrained EM Symbolization Algorithm (LCEM)
4. Example Results
4.1. Authorship Classification
4.2. Symbol Learning Using an LCEM Algorithm
4.3. Potential Translation of Animal Behavior into Human Language
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Derivation of Zipf–Mandelbrot–Li Probabilistic Symbolization Algorithm
Appendix B. Derivation of Intelligibility Maximization (MaxIntel) Algorithm
Appendix C. Derivation of LCEM Symbolization Algorithm
References