Advanced Word Game Design Based on Statistics: A Cross-Linguistic Study with Extended Experiments
Abstract
1. Introduction
2. Scientific Background
- We developed a new dataset for the Uzbek language with the help of language experts. This is a useful contribution because Uzbek is a low-resource language for which few corpora are publicly available.
- We developed a novel methodology based on vowel–consonant patterns and statistical analysis to design a cubic-oriented MLG for low-resource and high-resource languages.
- The performance of the model was evaluated by comprehensive experiments on 12 datasets.
3. Methodology
3.1. Data Preparation
- (1) Collection of words. We extracted 3–5-letter words appropriate for young learners from the largest dictionary of the Uzbek language [20], and 6–7-letter words from Ref. [21], as well as from a syntactically tagged corpus for the Uzbek language [22]. For English, we extracted 3–5-letter words from ESL Forums (https://eslforums.com, accessed on 20 September 2024); for Kazakh, from Ref. [23]; for Russian, from Ref. [24]; and for Slovenian, from Ref. [25]. All words for German, Tatar, Spanish, Kazakh, Malay, Polish, Turkish, and French were extracted from Ref. [21].
- (2) Normalization. After generating the word list for our dataset, we normalized it to simplify the coding.
  - (a) The Uzbek alphabet includes digraphs with diacritic markings, which complicate the calculation of letter frequency. To avoid this problem, we replaced g’ and o’ with the modified characters ḡ and ō, converting each digraph into a single character. Each of these digraphs would otherwise be counted as two separate characters in a word; with the substitution, letter frequency can be determined accurately. The digraphs ch and sh were analyzed in terms of their component letters (s and h). As c is not itself a letter of the Uzbek alphabet, we retained ch as a digraph and treated it as a single character when calculating letter frequency.
  - (b) The Uzbek alphabet contains a glottal stop character (tutuq belgisi), which is not a letter but is included in the alphabet. Since only 18 words in the Uzbek corpus contained this character, we removed these words during filtering to simplify the analysis.
- (3) Filtering. After normalization, the Uzbek corpus consisted of 18,523 words. To make the corpus more appropriate for children and teenagers, four volunteer experts from the Uzbek Linguistics Department of Urgench State University helped eliminate infrequent and unfamiliar words not suitable for young learners. The filtered corpus consisted of 4558 three- to five-letter words and 8456 six- to seven-letter words. All other datasets were generated by keeping only 3- to 7-letter words and removing the less frequently occurring ones. The datasets are publicly available at https://github.com/UlugbekSalaev/MatchingLetterGame (accessed on 14 February 2025).
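The normalization and length filtering described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`normalize_uzbek`, `prepare`), the set of apostrophe variants, and the rule that any apostrophe left after substitution marks a glottal stop are all assumptions made for illustration. The expert review step is manual and is not modelled.

```python
import re

# Assumption: raw text may use several apostrophe-like marks after g/o.
APOSTROPHES = "'\u2019\u2018\u02bb\u02bc`"
# Single-character stand-ins for the digraphs g' and o' (precomposed g-macron, o-macron).
DIGRAPH_MAP = {"g": "\u1e21", "o": "\u014d"}
DIGRAPH_RE = re.compile(f"([go])[{re.escape(APOSTROPHES)}]")


def normalize_uzbek(word: str) -> str:
    """Lower-case a word and collapse g'/o' into one character each."""
    return DIGRAPH_RE.sub(lambda m: DIGRAPH_MAP[m.group(1)], word.lower())


def prepare(words, min_len=3, max_len=7):
    """Normalize, drop glottal-stop words, and keep words of the target length."""
    kept = set()
    for raw in words:
        word = normalize_uzbek(raw)
        # Hypothetical rule: an apostrophe that survives normalization is
        # treated as the glottal stop, so the word is discarded.
        if any(ch in APOSTROPHES for ch in word):
            continue
        if min_len <= len(word) <= max_len and word.isalpha():
            kept.add(word)
    return sorted(kept)
```

For example, `prepare(["G'isht", "ma'no", "ot", "kitob"])` keeps "kitob" and the normalized form of "g'isht", drops "ma'no" because of the remaining apostrophe, and drops "ot" because it is too short.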
3.2. Descriptive Statistics
3.3. Proposed Method
Algorithm 1 Generation of Letter Frequencies (GLF) based on a character-level N-gram technique
- Input: An alphabet A, dataset D, number of characters n
- Output: Dictionary (list of key–value pairs)
- Initialization: Empty dictionary to store the frequency percentage of each character N-gram, assign 0 to
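The body of Algorithm 1 is not reproduced here, so the following Python sketch shows only one plausible reading of its inputs and output: count every character n-gram in the dataset and report each count as a percentage of all n-grams. The function name `letter_frequencies`, the rounding to one decimal place, and the decision to skip n-grams containing characters outside the alphabet are assumptions.

```python
from collections import Counter


def letter_frequencies(alphabet, dataset, n=1):
    """Return {n-gram: percentage} sorted from most to least frequent."""
    allowed = set(alphabet)
    counts = Counter()
    for word in dataset:
        # Slide a window of width n over the word.
        for i in range(len(word) - n + 1):
            gram = word[i:i + n]
            # Assumption: n-grams with out-of-alphabet characters are ignored.
            if all(ch in allowed for ch in gram):
                counts[gram] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: round(100.0 * c / total, 1) for gram, c in counts.most_common()}
```

With `n=1` this yields single-letter percentages of the kind listed in the letter-frequency tables; with `n=2` it yields bigram percentages.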
Algorithm 2 Elimination of Infrequent Letters (EIL)
- Input: An alphabet A, dataset D, threshold p
- Output: A
- Initialization: Empty
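As a hedged illustration of the interface of Algorithm 2, the sketch below removes from the alphabet every letter whose share of all letter occurrences falls below the threshold p (in percent). The name `eliminate_infrequent` and the use of a non-strict comparison (letters at exactly p are kept) are assumptions; the source does not show whether words containing the removed letters are also dropped, so the sketch leaves the dataset untouched.

```python
from collections import Counter


def eliminate_infrequent(alphabet, dataset, p):
    """Return the letters of `alphabet` whose frequency is at least p percent."""
    allowed = set(alphabet)
    counts = Counter(ch for word in dataset for ch in word if ch in allowed)
    total = sum(counts.values())
    if total == 0:
        return []
    # Preserve the original alphabet order in the reduced alphabet.
    return [a for a in alphabet if 100.0 * counts[a] / total >= p]
```

For instance, with the dataset `["abc", "cab", "bac", "aq"]` and `p=10`, the letter q accounts for 1 of 11 occurrences (about 9.1%) and is removed.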
Algorithm 3 Generation of Potential Letter Cubes (GPLC)
- Input: (2D array with size of N×6)
- Output: A set U, optimized cube configurations (each configuration being a set of N cubes with 6 distinct letters).
- Initialization: Assign an empty set U to store combinations of cubes, number of cubes N from .
Algorithm 4 Generation of the optimized letter cube
- Input: Alphabet A, a list of vowels V, dataset D, number of cubes N
- Output: Set of ( is a 2D array with size of N×6)
- Initialization: Empty list L to store frequent letters, empty list to store the duplicate letters, empty list to store the sequence of letters, empty 2D array with size of N×6 to store cubic letters
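Algorithm 4 produces N cubes with six letters each. One way to check such a configuration, sketched below, is to test whether a word can be spelled when each cube supplies at most one face, and then to report the share of dataset words that pass. This is an illustrative assumption about how a configuration might be scored, not a description of the authors' evaluation; the names `can_spell` and `coverage` are hypothetical. The check is a bipartite matching between word positions and cubes, solved here with augmenting paths.

```python
def can_spell(word, cubes):
    """True if each letter of `word` can be taken from a distinct cube."""
    if len(word) > len(cubes):
        return False
    match = {}  # cube index -> word position currently using that cube

    def try_assign(pos, seen):
        # Try to give position `pos` a cube, re-routing earlier positions if needed.
        for ci, cube in enumerate(cubes):
            if word[pos] in cube and ci not in seen:
                seen.add(ci)
                if ci not in match or try_assign(match[ci], seen):
                    match[ci] = pos
                    return True
        return False

    return all(try_assign(pos, set()) for pos in range(len(word)))


def coverage(words, cubes):
    """Percentage of `words` that can be spelled with `cubes`."""
    if not words:
        return 0.0
    return 100.0 * sum(can_spell(w, cubes) for w in words) / len(words)
```

A greedy left-to-right assignment can fail on words whose letters compete for the same cube, which is why the sketch uses matching with re-routing instead.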
4. Experimental Results
5. Discussion of Results
6. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
ACG | Advanced Cubic-oriented Game |
MLG | Matching Letter Game |
GLF | Generation of Letter Frequencies |
EIL | Elimination of Infrequent Letters |
GPLC | Generation of Potential Letter Cubes |
de | German |
en | English |
es | Spanish |
fr | French |
kz | Kazakh |
ms | Malay |
pl | Polish |
ru | Russian |
sl | Slovenian |
tr | Turkish |
tt | Tatar |
uz | Uzbek |
References
1. Viera, R.T. Vocabulary knowledge in the production of written texts: A case study on EFL language learners. Rev. Tecnol. ESPOL (RTE) 2017, 30, 89–105.
2. Alqahtani, M. The importance of vocabulary in language learning and how to be taught. Int. J. Teach. Educ. 2015, 3, 21–34.
3. Azar, A.S. The Effect of Games on EFL Learners’ Vocabulary Learning Strategies. Int. J. Basic Appl. Sci. 2012, 1, 252–256.
4. Rohani, M.; Pourgharib, B. The Effect of Games on Learning Vocabulary. Int. J. Basic Appl. Sci. 2013, 4, 3540–3543.
5. Alavi, G.; Gilakjani, A.P. The Effectiveness of Games in Enhancing Vocabulary Learning among Iranian Third Grade High School Students. Malays. J. ELT Res. 2019, 16, 1.
6. Najjar, M.; Masri, A. The Effect of Using Word Games on Primary Stage Students’ Achievement in English Language Vocabulary in Jordan. Am. Int. J. Contemp. Res. 2014, 4, 144–152.
7. Bakhsh, S. Using Games as a Tool in Teaching Vocabulary to Young Learners. Engl. Lang. Teach. 2016, 9, 120.
8. Huyen, N.; Nga, K. Learning Vocabulary Through Games. Asian EFL J. 2003, 5, 4.
9. Uberman, A. The use of games for vocabulary presentation and revision. Forum 1998, 36, 20–27.
10. Shchukina, T.J.; Mardieva, L.A.; Alyokine, T.A. Teaching Russian Language: The Role of Word Formation. In Teacher Education-IFTE 2016, Volume 12. European Proceedings of Social and Behavioural Sciences; Valeeva, R., Ed.; Future Academy, Kazan Federal University: Kazan, Russia, 2016; pp. 190–196.
11. Whitney, C. How the brain encodes the order of letters in a printed word: The SERIOL model and selective literature review. Psychon. Bull. Rev. 2001, 8, 221–243.
12. Aristides, V.; Monica, G.; Maria, F.; Christos, T. Utilizing NLP Tools for the Creation of School Educational Games. In Educating Engineers for Future Industrial Revolutions. ICL 2020. Advances in Intelligent Systems and Computing; Auer, M.E., Rüütmann, T., Eds.; Springer: Tallinn, Estonia, 2020.
13. Mattiev, J.; Salaev, U.; Kavsek, B. Word Game Modeling Using Character-Level N-Gram and Statistics. Mathematics 2023, 11, 1380.
14. Salaev, U. UzMorphAnalyser: A morphological analysis model for the Uzbek language using inflectional endings. AIP Conf. Proc. 2024, 3244, 030058.
15. Zaitun, M.; Fitri, A.J.E. Big Cube Game: An Instructional Medium Used in Students’ Vocabulary Mastery. J. Engl. Lit. Educ. 2020, 7, 101–106.
16. Anadón, X.; Sanahuja, P.; Traver, V.J.; Lopez, A.; Ribelles, J. Characterising Players of a Cube Puzzle Game with a Two-Level Bag of Words. In Proceedings of the 29th ACM Conference on User Modeling, Adaptation and Personalization (UMAP ’21), Utrecht, The Netherlands, 21–25 June 2021; pp. 47–53.
17. Jiang, L. Word Data Prediction Based on Statistical Method. Trans. Comput. Sci. Intell. Syst. Res. 2024, 5, 1662–1670.
18. Vu, N.N.; Linh, P.T.M.; Lien, N.T.H.; Van, N.T.T. Using Word Games to Improve Vocabulary Retention in Middle School EFL Classes. In Proceedings of the 18th International Conference of the Asia Association of Computer-Assisted Language Learning (AsiaCALL–2-2021), Advances in Social Science, Volume 621, Education and Humanities Research, Ho Chi Minh City, Vietnam, 26–27 November 2021; pp. 97–108.
19. Anugerah, R.; Wijaya, B.; Bunau, E. The use of build-a-sentence cubes game in teaching simple past tense. J. Pendidik. Dan Pembelajaran Khatulistiwa (JPPK) 2016, 5.
20. Madvaliyev, A.; Begmatov, E. O’zbek Tilining Imlo Lug‘ati; Mahmudov, N., Ed.; Akadem-nashr: Tashkent, Uzbekistan, 2012.
21. Goldhahn, D.; Eckart, T.; Quasthoff, U. Building Large Monolingual Dictionaries at the Leipzig Corpora Collection: From 100 to 200 Languages. In Proceedings of the 8th International Language Resources and Evaluation (LREC’12), Istanbul, Turkey, 21–27 May 2012.
22. Sharipov, M.; Mattiev, J.; Sobirov, J.; Baltayev, R. Creating a Morphological and Syntactic Tagged Corpus for the Uzbek Language. In Proceedings of the International Conference and Workshop on Agglutinative Language Technologies as a Challenge of Natural Language Processing, ALTNLP 2022, Koper, Slovenia, 6–8 June 2022; pp. 93–98.
23. Allaberdiev, B.; Matlatipov, G.; Kuriyozov, E.; Rakhmonov, Z. Parallel texts dataset for Uzbek-Kazakh machine translation. Data Brief 2024, 53.
24. OpenCorpora: An Open Source Initiative for Building a Free and Comprehensive Corpora for Russian and Other Slavic Languages. Available online: http://opencorpora.org/ (accessed on 15 October 2024).
25. Kaja, D.; Simon, K.; Peter, H.; Tomaž, E.; Miro, R.; Špela, A.H.; Jaka, Č.; Luka, K.; Marko, R.-Š. Morphological Lexicon Sloleks 2.0, Slovenian Language Resource Repository CLARIN.SI, ISSN 2820-4042. Available online: http://hdl.handle.net/11356/1230 (accessed on 10 October 2024).
26. Jurafsky, D.; Martin, J.H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition with Language Models, 3rd ed.; Pearson/Prentice Hall: Upper Saddle River, NJ, USA, 2009; Available online: https://web.stanford.edu/~jurafsky/slp3/ (accessed on 15 January 2025).
27. Cox, C.R.; Haebig, E. Child-Oriented Word Associations Improve Models of Early Word Learning. Behav. Res. 2023, 55, 16–37.
| de | % | en | % | es | % | fr | % | kz | % | ms | % | pl | % | ru | % | sl | % | tr | % | tt | % | uz | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| e | 14.6 | e | 10.4 | a | 15.1 | e | 11.2 | а | 11.8 | a | 16.6 | a | 9.9 | а | 10.8 | a | 11.2 | a | 12.8 | а | 10.9 | a | 14.1 |
| t | 8.8 | a | 9.2 | o | 10.0 | a | 8.4 | е | 6.8 | i | 8.3 | o | 6.2 | о | 9.0 | e | 9.3 | e | 8.4 | ы | 6.6 | i | 8.3 |
| a | 7.1 | s | 8.5 | e | 9.3 | s | 7.7 | ы | 6.7 | u | 6.9 | i | 6.0 | к | 6.2 | o | 8.5 | i | 7.6 | т | 6.2 | o | 8.2 |
| n | 6.7 | o | 7.4 | s | 7.4 | r | 7.1 | т | 6.1 | e | 6.1 | e | 5.5 | р | 6.0 | r | 6.8 | r | 5.6 | е | 6.2 | r | 6.0 |
| r | 6.4 | r | 6.0 | r | 6.6 | i | 7.1 | і | 5.2 | r | 5.8 | r | 5.2 | т | 5.9 | i | 6.6 | k | 5.6 | к | 5.7 | l | 5.1 |
| s | 5.2 | l | 5.4 | i | 5.6 | o | 6.3 | р | 5.0 | t | 5.8 | u | 4.4 | е | 5.2 | t | 5.3 | n | 5.5 | р | 5.1 | s | 4.9 |
| i | 5.2 | t | 5.4 | l | 5.1 | t | 5.7 | н | 4.9 | s | 5.8 | s | 4.3 | л | 5.0 | n | 5.3 | l | 4.9 | н | 5.1 | t | 4.9 |
| l | 4.8 | i | 5.4 | t | 4.6 | u | 5.2 | с | 4.6 | k | 5.5 | y | 4.2 | и | 4.8 | k | 5.3 | t | 4.6 | ә | 5.0 | u | 4.8 |
| o | 4.6 | d | 4.6 | n | 4.5 | l | 5.0 | л | 3.9 | n | 5.1 | k | 4.2 | с | 4.6 | l | 4.7 | s | 4.1 | и | 4.5 | n | 4.4 |
| h | 4.1 | n | 4.3 | c | 4.2 | n | 4.8 | қ | 3.6 | l | 5.0 | n | 4.0 | н | 4.5 | s | 4.0 | m | 3.9 | л | 4.5 | m | 3.7 |
| u | 3.5 | c | 4.0 | u | 4.0 | é | 4.5 | у | 3.5 | o | 4.4 | t | 4.0 | у | 3.6 | p | 3.5 | ı | 3.9 | у | 3.8 | q | 3.6 |
| g | 3.4 | u | 3.7 | p | 3.3 | c | 3.6 | к | 3.3 | m | 3.9 | m | 3.6 | м | 3.1 | v | 3.3 | u | 3.9 | с | 3.6 | k | 3.4 |
| m | 3.1 | b | 3.6 | d | 3.2 | m | 3.1 | о | 3.2 | p | 3.4 | d | 3.6 | п | 3.1 | d | 3.2 | d | 3.4 | м | 3.3 | y | 3.1 |
| b | 3.0 | p | 3.2 | m | 3.1 | p | 3.0 | д | 3.0 | b | 3.2 | w | 3.5 | в | 3.0 | u | 2.9 | o | 3.0 | г | 2.3 | h | 3.0 |
| d | 2.9 | m | 2.9 | b | 2.4 | d | 2.7 | м | 2.9 | d | 2.9 | z | 3.4 | д | 3.0 | m | 2.9 | y | 2.8 | о | 2.3 | b | 2.9 |
| f | 2.4 | h | 2.7 | g | 2.1 | g | 2.4 | п | 2.5 | h | 2.6 | p | 3.3 | б | 2.7 | b | 2.6 | ü | 2.3 | я | 2.2 | e | 2.6 |
| k | 2.3 | g | 2.5 | v | 1.5 | b | 2.2 | ш | 2.3 | g | 2.2 | l | 3.2 | ь | 2.4 | j | 2.4 | z | 2.3 | п | 2.2 | d | 2.5 |
| w | 2.0 | f | 2.2 | h | 1.5 | v | 2.1 | б | 2.3 | c | 1.5 | b | 2.8 | г | 2.0 | c | 2.3 | b | 2.1 | б | 2.2 | z | 2.5 |
| c | 1.8 | y | 2.2 | f | 1.4 | f | 1.7 | з | 2.0 | j | 1.4 | ł | 2.8 | з | 1.9 | g | 1.9 | ş | 1.9 | з | 2.1 | v | 2.1 |
| p | 1.7 | k | 2.0 | j | 1.1 | h | 1.3 | и | 1.9 | f | 1.1 | c | 2.5 | я | 1.7 | z | 1.9 | p | 1.8 | ч | 2.1 | ō | 1.9 |
| z | 1.3 | w | 1.7 | y | 1.0 | y | 0.6 | ж | 1.8 | w | 0.9 | ą | 2.4 | ч | 1.4 | š | 1.7 | ç | 1.6 | ш | 2.1 | p | 1.4 |
| ü | 1.0 | v | 1.1 | z | 0.8 | x | 0.6 | ұ | 1.8 | y | 0.7 | g | 1.9 | ы | 1.4 | ž | 1.4 | h | 1.5 | д | 2.0 | f | 1.4 |
| ä | 1.0 | x | 0.5 | k | 0.6 | j | 0.5 | й | 1.7 | v | 0.4 | ę | 1.7 | ш | 1.3 | h | 1.2 | g | 1.3 | ү | 1.9 | g | 1.3 |
| v | 0.9 | z | 0.4 | ñ | 0.4 | k | 0.5 | ү | 1.5 | z | 0.3 | j | 1.7 | й | 1.3 | č | 1.1 | ö | 1.2 | ө | 1.5 | j | 1.2 |
| ö | 0.5 | j | 0.4 | x | 0.4 | è | 0.4 | ө | 1.4 | x | 0.1 | ó | 1.4 | х | 1.3 | f | 0.8 | f | 1.2 | й | 1.4 | ḡ | 1.1 |
| j | 0.5 | q | 0.1 | q | 0.3 | z | 0.3 | ғ | 1.3 | q | 0.1 | ż | 1.2 | ж | 1.2 |  |  | v | 1.2 | җ | 0.8 | x | 1.0 |
| ß | 0.4 |  |  | w | 0.3 | â | 0.3 | г | 1.0 |  |  | f | 0.7 | ё | 1.0 |  |  | c | 0.9 | ң | 0.8 | c | 0.6 |
| y | 0.4 |  |  |  |  | ê | 0.3 | ң | 1.0 |  |  | ć | 0.7 | ф | 0.9 |  |  | ğ | 0.6 | х | 0.7 |  |  |
| x | 0.3 |  |  |  |  | w | 0.2 | ә | 0.9 |  |  | h | 0.7 | ц | 0.6 |  |  | j | 0.1 | ю | 0.6 |  |  |
| q | 0.1 |  |  |  |  | ô | 0.2 | я | 0.6 |  |  | ś | 0.5 | ю | 0.5 |  |  |  |  | ф | 0.6 |  |  |
|  |  |  |  |  |  | û | 0.2 | х | 0.4 |  |  | ń | 0.2 | щ | 0.3 |  |  |  |  | в | 0.6 |  |  |
|  |  |  |  |  |  | q | 0.2 | ф | 0.3 |  |  | ź | 0.1 | э | 0.3 |  |  |  |  | э | 0.5 |  |  |
|  |  |  |  |  |  | î | 0.1 | ю | 0.3 |  |  |  |  | ъ | 0.1 |  |  |  |  | һ | 0.2 |  |  |
|  |  |  |  |  |  | ï | 0.1 | в | 0.2 |  |  |  |  |  |  |  |  |  |  | ь | 0.2 |  |  |
|  |  |  |  |  |  | œ | 0.1 | э | 0.1 |  |  |  |  |  |  |  |  |  |  | ж | 0.1 |  |  |
|  |  |  |  |  |  | ç | 0.1 | ь | 0.1 |  |  |  |  |  |  |  |  |  |  | ц | 0.1 |  |  |
|  |  |  |  |  |  | ë | 0.1 | ц | 0.1 |  |  |  |  |  |  |  |  |  |  | ъ | 0.0 |  |  |
|  |  |  |  |  |  | à | 0.1 | ч | 0.1 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  |  |  |  |  |  | æ | 0.1 | һ | 0.1 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  |  |  |  |  |  |  |  | щ | 0.1 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  |  |  |  |  |  |  |  | ъ | 0.1 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  |  |  |  |  |  |  |  | ё | 0.1 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| de | % | en | % | es | % | fr | % | kz | % | ms | % | pl | % | ru | % | sl | % | tr | % | tt | % | uz | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| e | 17.7 | e | 13.1 | a | 16.2 | e | 12.3 | а | 12.8 | a | 16.5 | a | 10.1 | о | 9.7 | o | 10.0 | a | 12.8 | а | 11.5 | a | 14.6 |
| t | 9.4 | s | 8.5 | e | 10.5 | r | 8.7 | ы | 8.5 | n | 8.7 | o | 7.1 | а | 8.8 | a | 10.0 | e | 8.9 | ы | 7.7 | i | 12.8 |
| n | 8.5 | r | 7.8 | r | 8.8 | s | 8.2 | е | 7.0 | i | 8.6 | i | 6.9 | е | 7.5 | i | 9.4 | i | 7.7 | е | 6.9 | o | 6.2 |
| r | 7.4 | a | 7.4 | o | 8.7 | a | 8.0 | н | 6.6 | e | 8.5 | e | 6.4 | и | 6.7 | e | 9.3 | n | 6.7 | н | 6.5 | r | 6.2 |
| a | 5.5 | i | 6.9 | s | 7.5 | i | 7.0 | т | 6.3 | r | 6.2 | r | 5.2 | р | 5.7 | n | 6.6 | r | 6.1 | р | 6.2 | n | 5.9 |
| s | 5.3 | t | 6.4 | i | 6.1 | t | 6.3 | і | 5.8 | u | 5.4 | z | 4.7 | т | 5.6 | r | 5.8 | l | 5.6 | л | 5.8 | l | 5.8 |
| i | 5.3 | n | 6.1 | n | 5.6 | n | 5.8 | р | 5.3 | t | 5.4 | n | 4.5 | н | 5.6 | l | 5.2 | ı | 5.6 | ә | 5.8 | s | 5.2 |
| l | 5.1 | o | 5.3 | t | 4.9 | é | 5.5 | л | 4.5 | s | 5.1 | w | 4.3 | л | 4.7 | t | 4.3 | k | 4.7 | т | 5.6 | t | 4.7 |
| g | 4.4 | l | 5.2 | d | 4.4 | o | 5.3 | с | 4.3 | k | 4.7 | y | 4.1 | с | 4.7 | s | 4.1 | t | 4.3 | к | 5.4 | d | 3.9 |
| h | 3.8 | d | 5.2 | l | 4.4 | u | 4.8 | д | 4.0 | m | 4.3 | k | 4.1 | к | 4.0 | v | 3.9 | m | 4.2 | с | 3.6 | u | 3.3 |
| u | 3.0 | c | 3.8 | c | 4.3 | l | 4.5 | қ | 3.3 | l | 4.1 | s | 3.9 | в | 4.0 | k | 3.8 | d | 3.9 | и | 3.5 | h | 3.3 |
| b | 2.8 | u | 3.5 | u | 3.5 | c | 4.1 | к | 2.8 | p | 3.3 | c | 3.6 | м | 3.7 | p | 3.7 | s | 3.7 | м | 3.1 | k | 3.1 |
| o | 2.5 | g | 3.3 | m | 2.7 | p | 2.9 | у | 2.7 | g | 3.3 | t | 3.5 | у | 3.2 | d | 3.5 | y | 3.6 | г | 3.1 | m | 3.1 |
| m | 2.5 | p | 3.0 | p | 2.6 | m | 2.7 | м | 2.7 | d | 3.2 | d | 3.4 | д | 3.1 | j | 3.1 | u | 3.5 | у | 2.6 | g | 2.8 |
| d | 2.3 | m | 2.5 | b | 2.1 | d | 2.6 | о | 2.2 | o | 2.9 | u | 3.2 | п | 2.9 | m | 3.1 | o | 2.3 | д | 2.5 | b | 2.6 |
| f | 2.3 | h | 2.3 | g | 1.8 | g | 2.1 | б | 2.1 | b | 2.9 | m | 3.1 | ы | 2.8 | z | 2.5 | ü | 2.1 | б | 2.1 | y | 2.5 |
| c | 2.2 | b | 2.0 | v | 1.5 | v | 1.8 | п | 2.0 | h | 2.0 | p | 3.0 | з | 2.0 | u | 2.3 | z | 1.7 | о | 1.9 | e | 2.4 |
| k | 2.1 | y | 1.7 | f | 1.0 | b | 1.6 | ғ | 1.7 | y | 1.2 | ł | 2.9 | б | 1.9 | b | 1.9 | b | 1.7 | ш | 1.8 | q | 2.3 |
| p | 1.6 | f | 1.7 | j | 0.9 | f | 1.6 | ш | 1.7 | j | 1.0 | l | 2.5 | я | 1.8 | č | 1.6 | ş | 1.6 | з | 1.8 | z | 1.7 |
| z | 1.3 | k | 1.2 | h | 0.8 | h | 1.3 | й | 1.6 | c | 1.0 | b | 2.0 | г | 1.6 | g | 1.6 | g | 1.2 | ч | 1.7 | ō | 1.3 |
| w | 1.2 | w | 1.2 | z | 0.6 | x | 0.5 | з | 1.6 | f | 0.6 | ą | 1.9 | й | 1.5 | h | 1.3 | ç | 1.2 | п | 1.5 | v | 1.3 |
| ü | 1.0 | v | 1.1 | ñ | 0.3 | q | 0.5 | ж | 1.5 | w | 0.5 | j | 1.7 | х | 1.4 | š | 1.1 | p | 1.0 | ү | 1.5 | p | 1.1 |
| ä | 1.0 | x | 0.3 | q | 0.3 | y | 0.4 | и | 1.4 | v | 0.3 | g | 1.6 | ч | 1.2 | c | 0.9 | ğ | 1.0 | й | 1.3 | c | 0.9 |
| v | 0.7 | z | 0.2 | x | 0.2 | è | 0.4 | ң | 1.4 | z | 0.2 | ę | 1.1 | ь | 1.1 | ž | 0.9 | c | 1.0 | я | 1.3 | f | 0.8 |
| ö | 0.5 | j | 0.2 | y | 0.2 | j | 0.3 | г | 1.3 | q | 0.1 | ó | 1.1 | ж | 1.0 | f | 0.3 | h | 1.0 | ө | 1.0 | x | 0.7 |
| ß | 0.2 | q | 0.2 | k | 0.1 | k | 0.2 | ұ | 1.1 | x | 0.1 | h | 1.0 | ю | 0.7 |  |  | ö | 0.9 | ң | 0.9 | ḡ | 0.7 |
| j | 0.2 |  |  | w | 0.1 | ê | 0.1 | ө | 1.0 |  |  | ć | 0.9 | ш | 0.7 |  |  | v | 0.9 | в | 0.6 | j | 0.6 |
| y | 0.1 |  |  |  |  | z | 0.1 | ү | 1.0 |  |  | ż | 0.8 | ц | 0.6 |  |  | f | 0.7 | х | 0.6 |  |  |
| x | 0.1 |  |  |  |  | ô | 0.1 | ә | 0.6 |  |  | ś | 0.6 | ё | 0.6 |  |  | j | 0.1 | җ | 0.6 |  |  |
| q | 0.1 |  |  |  |  | ç | 0.1 | я | 0.3 |  |  | f | 0.4 | ф | 0.4 |  |  |  |  | ф | 0.4 |  |  |
|  |  |  |  |  |  | â | 0.1 | ф | 0.2 |  |  | ń | 0.2 | щ | 0.3 |  |  |  |  | э | 0.3 |  |  |
|  |  |  |  |  |  | û | 0.1 | х | 0.2 |  |  | ź | 0.1 | э | 0.1 |  |  |  |  | ю | 0.3 |  |  |
|  |  |  |  |  |  | î | 0.1 | в | 0.2 |  |  |  |  | ъ | 0.1 |  |  |  |  | ь | 0.2 |  |  |
|  |  |  |  |  |  | ï | 0.1 | ц | 0.1 |  |  |  |  |  |  |  |  |  |  | һ | 0.1 |  |  |
|  |  |  |  |  |  | w | 0.1 | ь | 0.1 |  |  |  |  |  |  |  |  |  |  | ж | 0.1 |  |  |
|  |  |  |  |  |  | œ | 0.1 | ю | 0.1 |  |  |  |  |  |  |  |  |  |  | ц | 0.1 |  |  |
|  |  |  |  |  |  | ë | 0.1 | э | 0.1 |  |  |  |  |  |  |  |  |  |  | ъ | 0.1 |  |  |
|  |  |  |  |  |  |  |  | ч | 0.1 |  |  |  |  |  |  |  |  |  |  | щ | 0.1 |  |  |
|  |  |  |  |  |  |  |  | һ | 0.1 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  |  |  |  |  |  |  |  | ъ | 0.1 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
|  |  |  |  |  |  |  |  | щ | 0.1 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
Dataset | # of 3-Letter Words | # of 4-Letter Words | # of 5-Letter Words | Total (3–5 Letters) | # of 6-Letter Words | # of 7-Letter Words | Total (6–7 Letters) |
---|---|---|---|---|---|---|---|
de | 439 | 924 | 1763 | 3126 | 2944 | 4100 | 7044 |
en | 1026 | 2499 | 2499 | 6024 | 3295 | 3908 | 7203 |
es | 483 | 1514 | 3541 | 5538 | 2500 | 3707 | 6207 |
fr | 350 | 991 | 2293 | 3634 | 3430 | 4633 | 8063 |
kz | 493 | 1210 | 2836 | 4539 | 4068 | 5371 | 9439 |
ms | 438 | 1129 | 2262 | 3829 | 2275 | 2995 | 5270 |
pl | 350 | 1274 | 3094 | 4718 | 4108 | 5448 | 9556 |
ru | 516 | 1285 | 2507 | 4308 | 4017 | 5068 | 9085 |
sl | 515 | 1598 | 4167 | 6280 | 4198 | 4991 | 9189 |
tr | 417 | 1151 | 2824 | 4392 | 3563 | 5191 | 8754 |
tt | 403 | 1089 | 2574 | 4066 | 4077 | 5401 | 9478 |
uz | 518 | 1165 | 2875 | 4558 | 3774 | 4682 | 8456 |
Dataset | # of Generated Cubes | 3 Letters | 4 Letters | 5 Letters | Total | Max Case |
---|---|---|---|---|---|---|
de | 596 | 91.8 ± 0.6 | 82.9 ± 0.9 | 70.5 ± 1.3 | 77.1 ± 0.9 | 78.9 |
en | 512 | 92.4 ± 0.2 | 92.1 ± 0.4 | 88.4 ± 0.8 | 90.6 ± 0.5 | 91.8 |
es | 525 | 84.0 ± 0.1 | 87.4 ± 0.3 | 86.3 ± 0.9 | 86.4 ± 0.6 | 87.5 |
fr | 489 | 83.6 ± 0.7 | 78.5 ± 0.7 | 74.5 ± 0.9 | 76.5 ± 0.8 | 77.9 |
kz | 598 | 77.4 ± 0.7 | 72.6 ± 0.9 | 55.8 ± 1.2 | 62.6 ± 0.9 | 65.7 |
ms | 503 | 95.6 ± 0.2 | 95.2 ± 0.3 | 90.4 ± 0.8 | 92.4 ± 0.5 | 93.2 |
pl | 591 | 86.3 ± 0.8 | 75.8 ± 1.1 | 58.0 ± 1.2 | 64.9 ± 1.0 | 67.8 |
ru | 617 | 84.8 ± 0.7 | 76.7 ± 1.0 | 56.8 ± 1.8 | 66.1 ± 1.3 | 69.8 |
sl | 544 | 95.4 ± 0.5 | 93.3 ± 0.5 | 84.8 ± 1.1 | 87.8 ± 0.8 | 90.0 |
tr | 578 | 91.2 ± 0.7 | 85.3 ± 0.7 | 69.5 ± 1.1 | 75.7 ± 0.9 | 78.3 |
tt | 600 | 75.1 ± 1.2 | 66.4 ± 1.1 | 47.5 ± 1.0 | 55.3 ± 0.9 | 58.7 |
uz | 577 | 94.0 ± 0.5 | 87.6 ± 0.9 | 73.9 ± 2.0 | 79.7 ± 1.5 | 84.0 |
Dataset | # of Generated Cubes | 3 Letters | 4 Letters | 5 Letters | Total | Max Case |
---|---|---|---|---|---|---|
de | 790 | 96.8 ± 0.3 | 93.4 ± 0.4 | 86.3 ± 1.0 | 89.8 ± 0.6 | 91.4 |
en | 720 | 94.5 ± 0.1 | 94.2 ± 0.3 | 93.5 ± 0.7 | 94.0 ± 0.4 | 94.8 |
es | 740 | 90.5 ± 0.1 | 93.0 ± 0.2 | 91.5 ± 0.7 | 91.8 ± 0.5 | 93.0 |
fr | 687 | 87.4 ± 0.1 | 84.9 ± 0.2 | 86.4 ± 0.3 | 86.1 ± 0.2 | 86.8 |
kz | 792 | 86.5 ± 0.4 | 87.8 ± 0.4 | 78.9 ± 0.9 | 82.1 ± 0.7 | 83.4 |
ms | 728 | 99.6 ± 0.1 | 98.8 ± 0.3 | 95.0 ± 0.9 | 96.7 ± 0.6 | 98.2 |
pl | 791 | 92.4 ± 0.6 | 88.7 ± 0.7 | 78.9 ± 1.2 | 82.5 ± 1.0 | 85.0 |
ru | 797 | 94.2 ± 0.3 | 92.0 ± 0.5 | 85.9 ± 0.8 | 88.7 ± 0.6 | 90.5 |
sl | 750 | 99.2 ± 0.1 | 97.4 ± 0.5 | 93.3 ± 1.0 | 94.8 ± 0.8 | 96.6 |
tr | 770 | 97.4 ± 0.3 | 95.1 ± 0.4 | 87.2 ± 1.0 | 90.3 ± 0.7 | 92.1 |
tt | 773 | 90.6 ± 0.6 | 86.2 ± 0.6 | 78.0 ± 0.8 | 81.5 ± 0.6 | 83.2 |
uz | 761 | 99.4 ± 0.2 | 97.7 ± 0.4 | 93.9 ± 0.8 | 95.5 ± 0.6 | 96.5 |
Dataset | # of Generated Cubes | 6 Letters | 7 Letters | Avg Total | Max Case |
---|---|---|---|---|---|
de | 790 | 73.2 ± 2.1 | 52.9 ± 2.6 | 61.4 ± 2.3 | 67.7 |
en | 720 | 84.9 ± 1.6 | 67.5 ± 2.4 | 75.4 ± 2.0 | 79.7 |
es | 740 | 86.7 ± 1.6 | 70.1 ± 2.5 | 76.8 ± 2.1 | 82.2 |
fr | 687 | 86.1 ± 0.9 | 75.8 ± 1.7 | 80.2 ± 1.3 | 82.8 |
kz | 792 | 64.5 ± 1.4 | 43.3 ± 2.0 | 52.4 ± 1.7 | 56.9 |
ms | 728 | 88.1 ± 1.7 | 72.1 ± 2.3 | 79.0 ± 2.0 | 85.8 |
pl | 791 | 65.9 ± 1.6 | 45.9 ± 2.0 | 54.5 ± 1.8 | 58.3 |
ru | 797 | 70.3 ± 1.1 | 48.8 ± 1.6 | 58.3 ± 1.3 | 61.4 |
sl | 750 | 87.3 ± 1.7 | 71.5 ± 2.6 | 78.7 ± 2.2 | 83.6 |
tr | 770 | 73.4 ± 1.3 | 49.7 ± 2.0 | 59.3 ± 1.7 | 62.9 |
tt | 773 | 63.7 ± 1.1 | 43.3 ± 1.8 | 52.0 ± 1.4 | 54.7 |
uz | 761 | 84.9 ± 1.5 | 66.4 ± 2.3 | 74.7 ± 1.9 | 77.8 |
Dataset | # of Generated Cubes | 6 Letters | 7 Letters | Avg Total | Max Case |
---|---|---|---|---|---|
de | 996 | 75.2 ± 3.1 | 58.5 ± 3.8 | 65.5 ± 3.5 | 76.0 |
en | 977 | 87.3 ± 1.4 | 71.2 ± 2.8 | 78.6 ± 2.2 | 84.8 |
es | 971 | 94.0 ± 1.2 | 83.7 ± 2.3 | 87.8 ± 1.8 | 92.4 |
fr | 921 | 90.5 ± 1.0 | 84.7 ± 1.9 | 87.2 ± 1.5 | 90.3 |
kz | 1014 | 80.7 ± 1.1 | 64.3 ± 1.9 | 71.5 ± 1.5 | 74.0 |
ms | 986 | 89.9 ± 1.9 | 75.5 ± 3.2 | 81.7 ± 2.6 | 88.3 |
pl | 1003 | 88.0 ± 0.8 | 76.6 ± 1.3 | 81.5 ± 1.1 | 84.3 |
ru | 1023 | 86.5 ± 1.1 | 74.6 ± 1.8 | 79.8 ± 1.4 | 83.2 |
sl | 974 | 94.7 ± 0.7 | 86.1 ± 1.4 | 90.0 ± 1.1 | 92.8 |
tr | 987 | 83.9 ± 1.1 | 66.9 ± 1.6 | 73.8 ± 1.4 | 76.4 |
tt | 1001 | 80.6 ± 1.0 | 68.5 ± 1.6 | 73.7 ± 1.3 | 75.4 |
uz | 991 | 92.2 ± 1.2 | 78.9 ± 2.6 | 84.8 ± 1.9 | 88.7 |
Dataset | 6–7 Letters, 9 Cubes | 6–7 Letters, 8 Cubes | 3–5 Letters, 8 Cubes | 3–5 Letters, 7 Cubes |
---|---|---|---|---|
de | 0.6962 | 0.5316 | 0.3197 | 0.2798 |
en | 0.6420 | 0.4042 | 0.3260 | 0.2737 |
es | 0.5455 | 0.4413 | 0.3214 | 0.2885 |
fr | 0.5389 | 0.6160 | 0.3962 | 0.4126 |
kz | 0.7133 | 0.6570 | 0.4005 | 0.5063 |
ms | 0.7697 | 0.2471 | 0.1608 | 0.2819 |
pl | 0.7588 | 0.5665 | 0.3090 | 0.1833 |
ru | 0.6798 | 0.6512 | 0.4210 | 0.4813 |
sl | 0.6780 | 0.5591 | 0.3924 | 0.3796 |
tr | 0.5905 | 0.5379 | 0.3469 | 0.2871 |
tt | 0.8481 | 0.8310 | 0.5322 | 0.3065 |
uz | 0.7402 | 0.4367 | 0.3263 | 0.2874 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Mattiev, J.; Salaev, U.; Kavšek, B. Advanced Word Game Design Based on Statistics: A Cross-Linguistic Study with Extended Experiments. Big Data Cogn. Comput. 2025, 9, 103. https://doi.org/10.3390/bdcc9040103