Genre Classification of Books in Russian with Stylometric Features: A Case Study
Abstract
1. Introduction
- RQ1: Do stylometric features improve genre classification accuracy?
- RQ2: What genres are easier to classify?
- RQ3: Does contrastive learning perform better for genre classification than fine-tuned transformer models and traditional models?
- RQ4: Does removing punctuation decrease classification accuracy for genre classification?
- RQ5: Does a transformer model pre-trained on Russian perform better than a multi-lingual transformer model?
2. Related Work
3. The SONATA Dataset
- A wide, up-to-date, and legitimate selection of titles, and agreements with leading Russian and international publishers;
- Clear genre labels and a wide selection of genres;
- The option to freely and legally download a significant number of text samples in .txt format;
- A convenient site structure that allows automated data collection.
3.1. The Genres
3.2. Data Processing and Statistics
4. Binary and Multi-Class Genre Classification
4.1. The Pipeline
4.2. Preprocessing and Data Setup
4.3. Text Representations
4.3.1. Sentence Embeddings
4.3.2. BOW Vectors with tf-idf and n-Gram Weights
4.3.3. Stylometric Features
4.4. Classification Models
4.4.1. Traditional Models
4.4.2. Voting Ensemble of Traditional Models
4.4.3. Fine-Tuned Transformers
4.4.4. Dual Contrastive Learning
5. Experimental Results
5.1. Hardware Setup
5.2. Software Setup
5.3. Models and Representations
5.4. Metrics
5.5. Binary Genre Classification Results
5.5.1. Traditional Models
5.5.2. The Voting Model
5.5.3. Fine-Tuned Transformers
5.5.4. Dual Contrastive Learning
5.6. Multi-Class Genre Classification Results
5.6.1. Traditional Models
5.6.2. The Voting Model
5.6.3. Fine-Tuned Transformer Models
5.6.4. Dual Contrastive Learning
5.7. Punctuation Importance
6. Conclusions
7. Limitations and Future Research Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
NLP | Natural Language Processing |
RQ | Research Question |
RF | Random Forest |
XGB | eXtreme Gradient Boosting |
LR | Logistic Regression |
CL | Contrastive Learning |
RNN | Recurrent Neural Network |
SVM | Support Vector Machine |
BERT | Bidirectional Encoder Representations from Transformers |
R | Recall |
P | Precision |
F1 | F1 measure |
Appendix A
Appendix A.1. The List of Stylometric Features
Lexical Features |
---|
type-token ratio for word lemmas, content words, function words, content word types, function word types, nouns in plural, nouns in singular, proper names, personal names, animate nouns, inanimate nouns, neutral nouns, feminine nouns, masculine nouns, feminine proper nouns, masculine proper nouns, surnames, given names, flat multiword expressions, direct objects, indirect objects, nouns in Nominative case, nouns in Genitive case, nouns in Dative case, nouns in Accusative case, nouns in Instrumental case, nouns in Locative case, qualitative adj positive, relative adj, qualitative comparative adj, qualitative superlative adj, direct adjective, indirect adjective, punctuation, dots, comma, semicolon, colon, dashes, numerals, relative pronouns, indexical pronouns, reflexive pronoun, possessive pronoun, negative pronoun, positive adverbs, comparative adverbs, superlative adverbs |
Part-of-Speech Features |
---|
verbs, nouns, adjectives, adverbs, determiners, interjections, conjunctions, particles, numerals, prepositions, pronouns, code-switching, number of words in narrative sentences, number of words in negative sentences, number of words in parataxis sentences, number of words in sentences that do not have any root verbs, words in sentences with quotation marks, number of words in exclamatory sentences, number of words in interrogative sentences, number of words in general questions, number of words in special questions, number of words in alternative questions, number of words in tag questions, number of words in elliptic sentences, number of positionings, number of words in conditional sentences, number of words in imperative sentences, number of words in amplified sentences |
Grammar Features |
---|
root verbs in imperfect aspect; all verbs in imperfect aspect; active voice; root verbs in perfect form; all verbs in perfect form; verbs in the present tense, indicative mood, imperfect aspect; verbs in the past tense, indicative mood, imperfect aspect; verbs in the past tense, indicative mood, perfect aspect; verbs in the future tense, indicative mood, perfect aspect; verbs in the future tense, indicative mood, imperfect aspect, simple verb forms; verbs in the future tense, indicative mood, complex verb forms; verbs in infinitive; verbs in the passive form; transitive verbs; intransitive verbs; impersonal verbs; passive participles; active participles; adverbial perfect participles; adverbial imperfect participles |
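Most of the features listed above are counts or ratios normalized by text length. As a rough illustration, a handful of the lexical features (type-token ratio, punctuation frequencies, average word length) can be computed as follows; this is a simplified, hypothetical stand-in for the StyloMetrix tool actually used in the paper, and the function name is ours:

```python
import re

def stylometric_features(text):
    """Compute a few illustrative stylometric features.

    A simplified sketch: the paper relies on StyloMetrix, which
    computes the full feature set listed in Appendix A.1.
    """
    tokens = re.findall(r"\w+", text.lower())
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    return {
        "type_token_ratio": len(set(tokens)) / n,   # lexical diversity
        "dots_per_word": text.count(".") / n,       # punctuation: dots
        "commas_per_word": text.count(",") / n,     # punctuation: commas
        "avg_word_length": sum(map(len, tokens)) / n,
    }

features = stylometric_features("Он шёл, шёл и шёл. Потом остановился.")
```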
Appendix A.2. The List of Russian Stopwords
Stopwords (Russian) |
---|
и, в, во, не, что, он, на, я, с, со, как, а, то, все, она, так, его, но, да, ты, к, у, же, вы, за, бы, по, только, ее, мне, было, вот, от, меня, еще, нет, о, из, ему, теперь, когда, даже, ну, вдруг, ли, если, уже, или, ни, быть, был, него, до, вас, нибудь, опять, уж, вам, ведь, там, потом, себя, ничего, ей, может, они, тут, где, есть, надо, ней, для, мы, тебя, их, чем, была, сам, чтоб, без, будто, чего, раз, тоже, себе, под, будет, ж, тогда, кто, этот, того, потому, этого, какой, совсем, ним, здесь, этом, один, почти, мой, тем, чтобы, нее, сейчас, были, куда, зачем, всех, никогда, можно, при, наконец, два, об, другой, хоть, после, над, больше, тот, через, эти, нас, про, всего, них, какая, много, разве, три, эту, моя, впрочем, хорошо, свою, этой, перед, иногда, лучше, чуть, том, нельзя, такой, им, более, всегда, конечно, всю, между |
Translation |
---|
and, in, in the, not, that, he, on, I, with, with, like, and, then, all, she, so, his, but, yes, you, to, at, already, you (plural), behind, would, by, only, her, to me, was, here, from, me, yet, no, about, to him, now, when, even, well, suddenly, whether, if, already, or, neither, to be, was, him, before, to you, ever, again, already, you (plural), after all, there, then, oneself, nothing, to her, can, they, here, where, there is, need, her, for, we, you (singular), them, than, was, oneself, without, as if, of what, time, also, to oneself, under, will be, what, then, who, this, of that, therefore, of this, what kind, completely, him, here, in this, one, almost, my, by, her, now, were, where, why, all, never, can, at, finally, two, about, other, even if, after, above, more, that, through, these, us, about, all, what kind of, many, whether, three, this, my, however, well, her own, this, before, sometimes, better, a bit, that, cannot, such, to them, more, always, of course, whole, between |
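Stopword filtering with such a list amounts to a set-membership lookup per token. A minimal sketch (only a few of the listed words are included here for brevity; the helper name is an illustration, not the paper's code):

```python
# A small subset of the Russian stopword list from Appendix A.2;
# in practice the full list above would be loaded into this set.
RUSSIAN_STOPWORDS = {
    "и", "в", "не", "что", "он", "на", "я", "с",
    "как", "а", "то", "все", "она", "так",
}

def remove_stopwords(tokens):
    """Drop stopwords while preserving the order of the remaining tokens."""
    return [t for t in tokens if t.lower() not in RUSSIAN_STOPWORDS]

filtered = remove_stopwords(["он", "медленно", "шёл", "и", "думал"])
```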
Appendix A.3. BOW Vector Statistics for the No-Punctuation SONATA Data Sample
Genre | Char n-Gram Vector Size (n = [1, 2, 3]) | Word n-Gram Vector Size (n = [1, 2, 3]) | tf-idf Vector Size |
---|---|---|---|
action | 10,593 | 35,521 | 17,923 |
adventure | 9085 | 18,236 | 11,506 |
children’s | 11,647 | 27,262 | 14,581 |
classic | 11,907 | 27,456 | 15,905 |
contemporary | 11,844 | 35,396 | 19,132 |
detective | 11,750 | 34,963 | 18,532 |
fantasy | 9334 | 35,635 | 18,044 |
non-fiction | 13,900 | 34,900 | 17,867 |
romance | 10,867 | 33,398 | 16,553 |
science-fiction | 9590 | 35,824 | 18,799 |
short-stories | 9298 | 23,673 | 13,442 |
all | 23,902 | 315,812 | 87,706 |
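The per-genre vector sizes in this table correspond to the vocabulary sizes learned by the vectorizers. A minimal scikit-learn sketch of the three BOW representations, assuming `CountVectorizer`/`TfidfVectorizer` with n = [1, 2, 3] (the paper's exact vectorizer settings are not reproduced here):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus; in the paper each genre's text chunks would be vectorized.
texts = [
    "Он медленно шёл по тёмной улице.",
    "Корабль вышел на орбиту далёкой планеты.",
]

# Character n-grams with n = 1, 2, 3
char_vec = CountVectorizer(analyzer="char", ngram_range=(1, 3))
X_char = char_vec.fit_transform(texts)

# Word n-grams with n = 1, 2, 3
word_vec = CountVectorizer(analyzer="word", ngram_range=(1, 3))
X_word = word_vec.fit_transform(texts)

# tf-idf weighted word unigrams
tfidf_vec = TfidfVectorizer()
X_tfidf = tfidf_vec.fit_transform(texts)

# The reported "vector size" is the size of the learned vocabulary
print(len(char_vec.vocabulary_), len(word_vec.vocabulary_), len(tfidf_vec.vocabulary_))
```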
Appendix A.4. Changing the Number of Samples
Genre | Char n-Gram Vector Size (n = [1, 2, 3]) | Word n-Gram Vector Size (n = [1, 2, 3]) | tf-idf Vector Size |
---|---|---|---|
action | 12,104 | 23,470 | 13,037 |
adventure | 11,845 | 18,024 | 11,108 |
children’s | 11,803 | 23,115 | 12,414 |
classic | 13,332 | 23,692 | 14,159 |
contemporary | 12,956 | 23,747 | 14,167 |
detective | 11,680 | 24,236 | 13,564 |
fantasy | 10,910 | 24,235 | 13,639 |
non-fiction | 14,435 | 24,656 | 13,676 |
romance | 10,852 | 22,598 | 12,048 |
science-fiction | 11,866 | 24,234 | 14,016 |
short-stories | 12,853 | 20,803 | 11,874 |
all | 26,192 | 222,487 | 69,362 |
Genre | Char n-Gram Vector Size (n = [1, 2, 3]) | Word n-Gram Vector Size (n = [1, 2, 3]) | tf-idf Vector Size |
---|---|---|---|
action | 16,721 | 77,373 | 31,126 |
adventure | 13,319 | 29,283 | 16,241 |
children’s | 15,911 | 58,789 | 24,921 |
classic | 18,091 | 62,025 | 29,327 |
contemporary | 19,068 | 84,345 | 36,132 |
detective | 16,964 | 93,493 | 35,765 |
fantasy | 15,759 | 95,349 | 37,162 |
non-fiction | 23,288 | 90,602 | 35,625 |
romance | 15,637 | 88,178 | 32,476 |
science-fiction | 17,351 | 95,400 | 39,043 |
short-stories | 16,684 | 46,214 | 21,788 |
all | 37,228 | 687,100 | 139,054 |
Representation | Classifier | Acc (N = 50) | Acc (N = 100) | Acc (N = 150) |
---|---|---|---|---|
SE | RF | 0.3645 | 0.3375 ↓ | 0.3462 ↑ |
SE | LR | 0.3785 | 0.4293 ↑ | 0.4095 ↓ |
SE | XGB | 0.2617 | 0.2978 ↑ | 0.3163 ↑ |
SE + stylometry | RF | 0.3458 | 0.3400 ↓ | 0.3374 ↓ |
SE + stylometry | LR | 0.3738 | 0.4367 ↑ | 0.4130 ↓ |
SE + stylometry | XGB | 0.2710 | 0.3127 ↑ | 0.3603 ↑ |
char n-grams | RF | 0.2710 | 0.2333 ↓ | 0.2882 ↑ |
char n-grams | LR | 0.2477 | 0.2705 ↑ | 0.2953 ↑ |
char n-grams | XGB | 0.1729 | 0.2432 ↑ | 0.3005 ↑ |
char n-grams + stylometry | RF | 0.2336 | 0.2531 ↑ | 0.2882 ↑ |
char n-grams + stylometry | LR | 0.2477 | 0.2506 ↑ | 0.3146 ↑ |
char n-grams + stylometry | XGB | 0.2290 | 0.2382 ↑ | 0.3040 ↑ |
n-grams | RF | 0.2570 | 0.2333 ↓ | 0.2794 ↑ |
n-grams | LR | 0.3084 | 0.2754 ↓ | 0.2988 ↑ |
n-grams | XGB | 0.1168 | 0.1712 ↑ | 0.2320 ↑ |
n-grams + stylometry | RF | 0.2757 | 0.2878 ↑ | 0.3076 ↑ |
n-grams + stylometry | LR | 0.3037 | 0.2779 ↓ | 0.2917 ↑ |
n-grams + stylometry | XGB | 0.2523 | 0.2233 ↓ | 0.2812 ↑ |
tfidf | RF | 0.2383 | 0.2432 ↑ | 0.2865 ↑ |
tfidf | LR | 0.2991 | 0.2903 ↓ | 0.3409 ↑ |
tfidf | XGB | 0.0981 | 0.1439 ↑ | 0.2021 ↑ |
tfidf + stylometry | RF | 0.2523 | 0.2804 ↑ | 0.2917 ↑ |
tfidf + stylometry | LR | 0.2430 | 0.2531 ↑ | 0.3093 ↑ |
tfidf + stylometry | XGB | 0.2664 | 0.2134 ↓ | 0.2654 ↑ |
Appendix A.5. Full Results of Traditional Models for the Multi-Class Genre Classification Task
Representation | Classifier | P | R | F1 | Acc |
---|---|---|---|---|---|
SE | RF | 0.3221 | 0.3310 | 0.3113 | 0.3275 |
SE | LR | 0.4289 | 0.4332 | 0.4264 | 0.4293 |
SE | XGB | 0.3154 | 0.3019 | 0.2997 | 0.3027 |
SE + stylometry | RF | 0.3361 | 0.3182 | 0.3003 | 0.3176 |
SE + stylometry | LR | 0.4415 | 0.4471 | 0.4386 | 0.4367 |
SE + stylometry | XGB | 0.3082 | 0.3075 | 0.2961 | 0.2978 |
char n-grams | RF | 0.2163 | 0.2315 | 0.2034 | 0.2333 |
char n-grams | LR | 0.2865 | 0.2650 | 0.2711 | 0.2705 |
char n-grams | XGB | 0.2180 | 0.2373 | 0.2188 | 0.2357 |
char n-grams + stylometry | RF | 0.2150 | 0.2436 | 0.2080 | 0.2407 |
char n-grams + stylometry | LR | 0.2694 | 0.2471 | 0.2550 | 0.2506 |
char n-grams + stylometry | XGB | 0.2444 | 0.2480 | 0.2357 | 0.2457 |
n-grams | RF | 0.2055 | 0.2494 | 0.2062 | 0.2333 |
n-grams | LR | 0.3004 | 0.2866 | 0.2800 | 0.2754 |
n-grams | XGB | 0.2011 | 0.1771 | 0.1600 | 0.1712 |
n-grams + stylometry | RF | 0.2962 | 0.2911 | 0.2617 | 0.2878 |
n-grams + stylometry | LR | 0.2956 | 0.2868 | 0.2784 | 0.2779 |
n-grams + stylometry | XGB | 0.2373 | 0.2349 | 0.2190 | 0.2233 |
tfidf | RF | 0.2234 | 0.2332 | 0.1939 | 0.2283 |
tfidf | LR | 0.2996 | 0.2990 | 0.2372 | 0.2903 |
tfidf | XGB | 0.2071 | 0.1964 | 0.1794 | 0.1911 |
tfidf + stylometry | RF | 0.2503 | 0.2617 | 0.2308 | 0.2581 |
tfidf + stylometry | LR | 0.2325 | 0.2549 | 0.2190 | 0.2531 |
tfidf + stylometry | XGB | 0.2187 | 0.2289 | 0.1975 | 0.2233 |
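The P, R, F1, and Acc columns in these tables are averaged over the genre classes. A small sketch of how such macro-averaged scores can be computed with scikit-learn on a toy multi-class prediction (the labels here are illustrative, not the paper's data):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical gold labels and predictions for four text chunks
y_true = ["fantasy", "detective", "fantasy", "romance"]
y_pred = ["fantasy", "fantasy", "fantasy", "romance"]

# Macro averaging: compute P/R/F1 per genre, then average unweighted;
# zero_division=0 scores a genre 0 when it is never predicted.
p, r, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
acc = accuracy_score(y_true, y_pred)
```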
Appendix A.6. The Python Script Used to Download Books from knigogo.net
- The url_download_for_each_book function takes a URL (in this case, a page with links to free books) and retrieves the HTML content. It then parses the HTML to extract the URLs of book download pages, specifically those that match a pattern: they start with https://knigogo.net/knigi/ (accessed on 1 January 2024) and end with /#lib_book_download.
- The url_text_download_for_each_book function takes the list of book download URLs obtained in the previous step and retrieves the HTML content of each page. It then parses these pages to extract URLs of the actual text files.
- The download_url function attempts to download the content of a given URL and returns the content if successful.
- The download_book function receives a text file URL, a book ID, and a save path. It downloads the text file’s content and saves it locally as a .txt file in the specified directory.
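The four steps above can be sketched as follows. The function names mirror the description; the use of only the standard library, the regular expressions, and the assumption that text-file links simply end in .txt are ours (the paper's actual script is not reproduced here):

```python
import re
from html.parser import HTMLParser
from pathlib import Path
from urllib.request import urlopen

# Assumed pattern for book download pages, per the description above
BOOK_PAGE_PATTERN = re.compile(r"^https://knigogo\.net/knigi/.+/#lib_book_download$")

class LinkExtractor(HTMLParser):
    """Collect all href attributes from <a> tags in an HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links

def download_url(url):
    """Attempt to download a URL; return its content as bytes, or None on failure."""
    try:
        with urlopen(url, timeout=30) as response:
            return response.read()
    except OSError:
        return None

def url_download_for_each_book(catalog_url):
    """Return URLs of book download pages found on the catalog page."""
    content = download_url(catalog_url)
    if content is None:
        return []
    links = extract_links(content.decode("utf-8", errors="replace"))
    return [u for u in links if BOOK_PAGE_PATTERN.match(u)]

def url_text_download_for_each_book(book_page_urls):
    """Return URLs of the actual .txt files linked from each download page."""
    text_urls = []
    for url in book_page_urls:
        content = download_url(url)
        if content is not None:
            links = extract_links(content.decode("utf-8", errors="replace"))
            text_urls += [u for u in links if u.endswith(".txt")]
    return text_urls

def download_book(text_url, book_id, save_path):
    """Save the text file at text_url locally as <book_id>.txt under save_path."""
    content = download_url(text_url)
    if content is not None:
        Path(save_path, f"{book_id}.txt").write_bytes(content)
```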
Appendix A.7. Validation on Texts from a Different Source
Genre | Representation | Classifier | F1 | Acc |
---|---|---|---|---|
science-fiction | SE | RF | 0.6900 | 0.6900 |
science-fiction | SE | LR | 0.7097 | 0.7100 |
science-fiction | SE | XGB | 0.6387 | 0.6400 |
science-fiction | SE + stylometry | RF | 0.7698 | 0.7700 |
science-fiction | SE + stylometry | LR | 0.6995 | 0.7000 |
science-fiction | SE + stylometry | XGB | 0.6297 | 0.6300 |
science-fiction | char n-grams | RF | 0.6394 | 0.6400 |
science-fiction | char n-grams | LR | 0.6808 | 0.6900 |
science-fiction | char n-grams | XGB | 0.6673 | 0.6700 |
science-fiction | char n-grams + stylometry | RF | 0.7796 | 0.7800 |
science-fiction | char n-grams + stylometry | LR | 0.6784 | 0.6900 |
science-fiction | char n-grams + stylometry | XGB | 0.5900 | 0.5900 |
science-fiction | n-grams | RF | 0.6970 | 0.7000 |
science-fiction | n-grams | LR | 0.7100 | 0.7100 |
science-fiction | n-grams | XGB | 0.6052 | 0.6100 |
science-fiction | n-grams + stylometry | RF | 0.7300 | 0.7300 |
science-fiction | n-grams + stylometry | LR | 0.7100 | 0.7100 |
science-fiction | n-grams + stylometry | XGB | 0.6096 | 0.6100 |
science-fiction | tfidf | RF | 0.6532 | 0.6600 |
science-fiction | tfidf | LR | 0.6800 | 0.6800 |
science-fiction | tfidf | XGB | 0.5512 | 0.5600 |
science-fiction | tfidf + stylometry | RF | 0.7093 | 0.7100 |
science-fiction | tfidf + stylometry | LR | 0.6255 | 0.6300 |
science-fiction | tfidf + stylometry | XGB | 0.6394 | 0.6400 |
Genre | Representation | F1 | Acc |
---|---|---|---|
science-fiction | SE | 0.6800 | 0.6800 |
science-fiction | SE + stylometry | 0.6999 | 0.7000 |
science-fiction | char n-grams | 0.6862 | 0.6900 |
science-fiction | char n-grams + stylometry | 0.7383 | 0.7400 |
science-fiction | n-grams | 0.6768 | 0.6800 |
science-fiction | n-grams + stylometry | 0.6995 | 0.7000 |
science-fiction | tfidf | 0.6305 | 0.6400 |
science-fiction | tfidf + stylometry | 0.6753 | 0.6800 |
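The voting model evaluated above combines the predictions of the traditional classifiers. A minimal sketch with scikit-learn's `VotingClassifier` on synthetic data; `GradientBoostingClassifier` stands in for XGBoost so the sketch needs only scikit-learn, and the soft-voting choice is an assumption rather than the paper's configuration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class data standing in for genre feature vectors
X, y = make_classification(
    n_samples=200, n_features=20, n_classes=3, n_informative=10, random_state=0
)

voting = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("gb", GradientBoostingClassifier(random_state=0)),  # stand-in for XGB
    ],
    voting="soft",  # average predicted class probabilities across the models
)
voting.fit(X, y)
preds = voting.predict(X)
```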
Genre (Russian) | Translation |
---|---|
Фантастика | Science fiction |
Детектив | Detective |
Роман | Romance |
Фэнтези | Fantasy |
Классика | Classics |
Боевик | Action |
Нехудожественная литература | Non-fiction |
Современная литература | Contemporary literature |
Приключения | Adventure |
Новеллы, рассказы | Short stories |
Детские книги | Children’s books |
Genre | Number of Books | Number of Chunks |
---|---|---|
action | 640 | 59,040 |
adventure | 120 | 6969 |
children’s | 282 | 4801 |
classic | 463 | 20,919 |
detective | 1303 | 51,022 |
science-fiction | 1909 | 244,217 |
fantasy | 2595 | 113,044 |
non-fiction | 896 | 22,632 |
contemporary | 811 | 28,648 |
short stories | 206 | 3798 |
romance | 1219 | 39,779 |
all genres | 10,444 | 594,869 |
all genres, unique | 8189 | 414,574 |
Genre | Total Chunks | Avg Chars per Chunk | Unique Words | Word(s) with the Highest Frequency | Translation |
---|---|---|---|---|---|
action | 630 | 2604.4 | 48,058 | просто | simply |
adventure | 120 | 2596.1 | 16,241 | время, сказал | time, said |
children’s | 279 | 2549.8 | 26,423 | сказал | said |
classic | 479 | 2164.9 | 40,127 | сказал | said |
contemporary | 802 | 2587.3 | 61,526 | очень | very |
detective | 989 | 2578.1 | 63,937 | очень | very |
fantasy | 992 | 2621.2 | 67,054 | просто | simply |
non-fiction | 886 | 2730.2 | 61,922 | которые | which/what |
romance | 994 | 2528.7 | 58,459 | просто | simply |
science-fiction | 989 | 2632.1 | 70,551 | просто | simply |
short-stories | 206 | 2511.0 | 21,788 | очень | very |
all | 7366 | 2576.5 | 200,285 | просто | simply |
Genre | Char n-Gram Vector Size (n = [1, 2, 3]) | Word n-Gram Vector Size (n = [1, 2, 3]) | tf-idf Vector Size |
---|---|---|---|
action | 14,665 | 46,877 | 21,781 |
adventure | 13,319 | 29,283 | 16,241 |
children’s | 13,727 | 36,189 | 17,489 |
classic | 16,278 | 42,994 | 22,414 |
contemporary | 16,053 | 46,980 | 23,588 |
detective | 13,962 | 46,734 | 21,841 |
fantasy | 13,389 | 47,811 | 22,830 |
non-fiction | 19,688 | 48,554 | 23,176 |
romance | 13,237 | 44,325 | 19,902 |
science-fiction | 14,759 | 48,227 | 23,806 |
short-stories | 14,765 | 31,974 | 16,561 |
all | 32,500 | 403,194 | 101,350 |
Genre | Representation | Classifier | P | R | F1 | Acc |
---|---|---|---|---|---|---|
action | SE + stylometry | RF | 0.8144 | 0.8084 | 0.7997 | 0.8000 |
adventure | SE + stylometry | LR | 0.8548 | 0.8542 | 0.8541 | 0.8542 |
children’s | tfidf + stylometry | RF | 0.8486 | 0.8555 | 0.8397 | 0.8400 |
classic | SE + stylometry | RF | 0.9355 | 0.9130 | 0.9179 | 0.9200 |
contemporary | tfidf + stylometry | RF | 0.7192 | 0.7150 | 0.7159 | 0.7200 |
detective | SE | LR | 0.7905 | 0.7456 | 0.7453 | 0.7600 |
fantasy | char n-grams | XGB | 0.7420 | 0.7432 | 0.7399 | 0.7400 |
non-fiction | SE | LR | 0.8800 | 0.8824 | 0.8798 | 0.8800 |
romance | SE | XGB | 0.7603 | 0.7552 | 0.7565 | 0.7600 |
science-fiction | SE | LR | 0.7890 | 0.7866 | 0.7799 | 0.7800 |
short-stories | SE | RF | 0.6899 | 0.6899 | 0.6800 | 0.6800 |
Genre | Classifier | P | R | F1 | Acc | Comparison to the Best Trad Model |
---|---|---|---|---|---|---|
action | SE | 0.7890 | 0.7866 | 0.7799 | 0.7800 | ↓ |
adventure | SE | 0.7937 | 0.7917 | 0.7913 | 0.7917 | ↓ |
children’s | tfidf + stylometry | 0.8621 | 0.8621 | 0.8400 | 0.8400 | ↓ |
classic | n-grams | 0.9200 | 0.9227 | 0.9199 | 0.9200 | ↓ |
contemporary | n-grams + stylometry | 0.6782 | 0.6747 | 0.6753 | 0.6800 | ↓ |
detective | n-grams + stylometry | 0.7585 | 0.7585 | 0.7585 | 0.7600 | ↓ |
fantasy | SE + stylometry | 0.7388 | 0.7399 | 0.7391 | 0.7400 | ↓ |
non-fiction | SE + stylometry | 0.9010 | 0.8977 | 0.8990 | 0.9000 | ↑ |
romance | SE + stylometry | 0.7388 | 0.7399 | 0.7391 | 0.7400 | ↓ |
science-fiction | SE | 0.7734 | 0.7681 | 0.7596 | 0.7600 | ↓ |
short-stories | SE | 0.6659 | 0.6672 | 0.6599 | 0.6600 | ↓ |
Genre | mlBERT F1 | mlBERT Acc | ruBERT F1 | ruBERT Acc |
---|---|---|---|---|
action | 0.4762 | 0.5600 | 0.4156 | 0.5000 |
adventure | 0.4045 | 0.4792 | 0.4678 | 0.4792 |
children’s | 0.3107 | 0.4000 | 0.5833 | 0.6000 |
classic | 0.4156 | 0.5000 | 0.3189 | 0.3200 |
contemporary | 0.4165 | 0.5400 | 0.3151 | 0.4600 |
detective | 0.3506 | 0.5400 | 0.4746 | 0.5000 |
fantasy | 0.3689 | 0.4600 | 0.4058 | 0.4400 |
non-fiction | 0.4000 | 0.4000 | 0.4802 | 0.6000 |
romance | 0.3151 | 0.4600 | 0.3151 | 0.4600 |
science-fiction | 0.3506 | 0.5400 | 0.5833 | 0.6000 |
short-stories | 0.4283 | 0.5600 | 0.3810 | 0.4800 |
Model | Genre | ruBERT F1 | ruBERT Acc | mlBERT F1 | mlBERT Acc |
---|---|---|---|---|---|
DualCL | action | 0.5703 | 0.6200 | 0.5331 | 0.5600 |
DualCL | adventure | 0.5623 | 0.5625 | 0.5279 | 0.5625 |
DualCL | children’s | 0.5536 | 0.5600 | 0.5484 | 0.5600 |
DualCL | classic | 0.6716 | 0.6800 | 0.4900 | 0.5800 |
DualCL | contemporary | 0.6486 | 0.6600 | 0.3658 | 0.5000 |
DualCL | detective | 0.6394 | 0.6400 | 0.5942 | 0.6000 |
DualCL | fantasy | 0.6394 | 0.6400 | 0.3969 | 0.5600 |
DualCL | non-fiction | 0.7391 | 0.7400 | 0.3867 | 0.5400 |
DualCL | romance | 0.6162 | 0.6200 | 0.5066 | 0.6400 |
DualCL | science-fiction | 0.5942 | 0.6000 | 0.4172 | 0.6000 |
DualCL | short-stories | 0.5824 | 0.6200 | 0.5785 | 0.5800 |
Representation | Classifier | P | R | F1 | Acc |
---|---|---|---|---|---|
SE | LR | 0.4289 | 0.4332 | 0.4264 | 0.4293 |
SE + stylometry | LR | 0.4415 | 0.4471 | 0.4386 | 0.4367 |
char n-grams | LR | 0.2865 | 0.2650 | 0.2711 | 0.2705 |
char n-grams + stylometry | LR | 0.2694 | 0.2471 | 0.2550 | 0.2506 |
n-grams | LR | 0.3004 | 0.2866 | 0.2800 | 0.2754 |
n-grams + stylometry | RF | 0.2962 | 0.2911 | 0.2617 | 0.2878 |
tfidf | LR | 0.2996 | 0.2990 | 0.2372 | 0.2903 |
tfidf + stylometry | RF | 0.2503 | 0.2617 | 0.2308 | 0.2581 |
Genre | P | R | F1 |
---|---|---|---|
action | 0.3654 | 0.3878 | 0.3762 |
adventure | 0.7353 | 0.5814 | 0.6494 |
children’s | 0.3519 | 0.5000 | 0.4130 |
classic | 0.3478 | 0.3902 | 0.3678 |
contemporary | 0.6000 | 0.7241 | 0.6562 |
detective | 0.3600 | 0.3333 | 0.3462 |
fantasy | 0.3514 | 0.4062 | 0.3768 |
non-fiction | 0.4500 | 0.4500 | 0.4500 |
romance | 0.4375 | 0.2593 | 0.3256 |
science-fiction | 0.5610 | 0.6571 | 0.6053 |
short-stories | 0.2963 | 0.2286 | 0.2581 |
Representation | P | R | F1 | Acc | vs. Best Trad. Model |
---|---|---|---|---|---|
SE | 0.4076 | 0.4052 | 0.3995 | 0.3970 | ↓ |
SE + stylometry | 0.3978 | 0.3977 | 0.3924 | 0.3921 | ↓ |
char n-grams | 0.2779 | 0.2635 | 0.2651 | 0.2655 | ↓ |
char n-grams + stylometry | 0.2740 | 0.2595 | 0.2630 | 0.2605 | ↑ |
n-grams | 0.2915 | 0.2965 | 0.2799 | 0.2854 | ↑ |
n-grams + stylometry | 0.3181 | 0.3044 | 0.2923 | 0.2953 | ↑ |
tfidf | 0.2286 | 0.2459 | 0.2072 | 0.2382 | ↓ |
tfidf + stylometry | 0.2935 | 0.2956 | 0.2640 | 0.2878 | ↓ |
Genre | P | R | F1 | vs. Best Trad. Model |
---|---|---|---|---|
action | 0.3000 | 0.3061 | 0.3030 | ↓ |
adventure | 0.7931 | 0.5349 | 0.6389 | ↓ |
children’s | 0.3704 | 0.5263 | 0.4348 | ↑ |
classic | 0.3542 | 0.4146 | 0.3820 | ↑ |
contemporary | 0.5556 | 0.6897 | 0.6154 | ↓ |
detective | 0.3200 | 0.2963 | 0.3077 | ↓ |
fantasy | 0.3500 | 0.4375 | 0.3889 | ↑ |
non-fiction | 0.4000 | 0.3000 | 0.3429 | ↓ |
romance | 0.4103 | 0.2963 | 0.3441 | ↑ |
science-fiction | 0.5366 | 0.6286 | 0.5789 | ↓ |
short-stories | 0.2308 | 0.1714 | 0.1967 | ↓ |
Classifier | P | R | F1 | Acc
---|---|---|---|---|
ruBERT | 0.0087 | 0.0841 | 0.0158 | 0.0918

Per-genre scores:

Genre | P | R | F1
---|---|---|---|
action | 0.0000 | 0.0000 | 0.0000
adventure | 0.0000 | 0.0000 | 0.0000
children’s | 0.0000 | 0.0000 | 0.0000
classic | 0.0000 | 0.0000 | 0.0000
contemporary | 0.0000 | 0.0000 | 0.0000
detective | 0.0000 | 0.0000 | 0.0000
fantasy | 0.0000 | 0.0000 | 0.0000
non-fiction | 0.0000 | 0.0000 | 0.0000
romance | 0.0000 | 0.0000 | 0.0000
science-fiction | 0.0000 | 0.0000 | 0.0000
short-stories | 0.0956 | 0.9250 | 0.1733

Classifier | P | R | F1 | Acc
---|---|---|---|---|
mlBERT | 0.0091 | 0.0909 | 0.0166 | 0.0993

Per-genre scores:

Genre | P | R | F1
---|---|---|---|
action | 0.0000 | 0.0000 | 0.0000
adventure | 0.0000 | 0.0000 | 0.0000
children’s | 0.0000 | 0.0000 | 0.0000
classic | 0.0000 | 0.0000 | 0.0000
contemporary | 0.0000 | 0.0000 | 0.0000
detective | 0.0000 | 0.0000 | 0.0000
fantasy | 0.1005 | 1.0000 | 0.1826
non-fiction | 0.0000 | 0.0000 | 0.0000
romance | 0.0000 | 0.0000 | 0.0000
science-fiction | 0.0000 | 0.0000 | 0.0000
short-stories | 0.0000 | 0.0000 | 0.0000
Classifier | P | R | F1 | Acc
---|---|---|---|---|
DualCL (ruBERT) | 0.3706 | 0.3400 | 0.3400 | 0.3704

Per-genre scores:

Genre | P | R | F1
---|---|---|---|
action | 0.2761 | 0.3700 | 0.3162
adventure | 0.1020 | 0.2083 | 0.1370
children’s | 0.5714 | 0.2857 | 0.3810
classic | 0.4497 | 0.6979 | 0.5469
contemporary | 0.2039 | 0.3100 | 0.2460
detective | 0.3220 | 0.1900 | 0.2390
fantasy | 0.4353 | 0.3700 | 0.4000
non-fiction | 0.8228 | 0.6500 | 0.7263
romance | 0.3763 | 0.3500 | 0.3627
science-fiction | 0.4561 | 0.2600 | 0.3312
short-stories | 0.0606 | 0.0476 | 0.0533

Classifier | P | R | F1 | Acc
---|---|---|---|---|
DualCL (mlBERT) | 0.3582 | 0.3571 | 0.3354 | 0.3758

Per-genre scores:

Genre | P | R | F1
---|---|---|---|
action | 0.3684 | 0.2100 | 0.2675
adventure | 0.1400 | 0.2917 | 0.1892
children’s | 0.3011 | 0.5000 | 0.3758
classic | 0.6092 | 0.5521 | 0.5792
contemporary | 0.1970 | 0.2600 | 0.2241
detective | 0.3409 | 0.1500 | 0.2083
fantasy | 0.3286 | 0.4600 | 0.3833
non-fiction | 0.7660 | 0.7200 | 0.7423
romance | 0.3473 | 0.5800 | 0.4345
science-fiction | 0.3750 | 0.1800 | 0.2432
short-stories | 0.1667 | 0.0238 | 0.0417
Genre | Representation | Classifier | P | R | F1 | Acc | vs. Best Punct Acc |
---|---|---|---|---|---|---|---|
action | SE | LR | 0.6987 | 0.6997 | 0.6989 | 0.7000 | ↓ |
adventure | SE | RF | 0.7500 | 0.7500 | 0.7500 | 0.7500 | ↓ |
children’s | SE | LR | 0.8346 | 0.7989 | 0.8069 | 0.8200 | ↓ |
classic | char n-grams | XGB | 0.5455 | 0.5197 | 0.4673 | 0.5800 | ↓ |
contemporary | SE + stylometry | RF | 0.7167 | 0.7093 | 0.6989 | 0.7000 | ↓ |
detective | char n-grams | LR | 0.6575 | 0.6562 | 0.6566 | 0.6600 | ↓ |
fantasy | SE + stylometry | RF | 0.7788 | 0.7802 | 0.7792 | 0.7800 | ↑ |
non-fiction | tfidf + stylometry | RF | 0.9600 | 0.9630 | 0.9599 | 0.9600 | ↑ |
romance | SE | LR | 0.8622 | 0.8639 | 0.8599 | 0.8600 | ↑ |
science-fiction | SE + stylometry | LR | 0.6502 | 0.6473 | 0.6394 | 0.6400 | ↓ |
short-stories | tfidf + stylometry | LR | 0.6619 | 0.6430 | 0.6380 | 0.6531 | ↓ |
Representation | Classifier | Acc with Punctuation | Acc without Punctuation |
---|---|---|---|
SE | RF | 0.3375 | 0.2926 |
SE | LR | 0.4293 | 0.3232 |
SE | XGB | 0.2978 | 0.2850 |
SE + stylometry | RF | 0.3400 | 0.3181 |
SE + stylometry | LR | 0.4367 | 0.3282 |
SE + stylometry | XGB | 0.3127 | 0.2799 |
char n-grams | RF | 0.2333 | 0.2214 |
char n-grams | LR | 0.2705 | 0.2366 |
char n-grams | XGB | 0.2432 | 0.1934 |
char n-grams + stylometry | RF | 0.2531 | 0.2087 |
char n-grams + stylometry | LR | 0.2506 | 0.2316 |
char n-grams + stylometry | XGB | 0.2382 | 0.1985 |
n-grams | RF | 0.2333 | 0.2341 |
n-grams | LR | 0.2754 | 0.2748 |
n-grams | XGB | 0.1712 | 0.1501 |
n-grams + stylometry | RF | 0.2878 | 0.2316 |
n-grams + stylometry | LR | 0.2779 | 0.2621 |
n-grams + stylometry | XGB | 0.2233 | 0.1985 |
tfidf | RF | 0.2432 | 0.2341 |
tfidf | LR | 0.2903 | 0.2545 |
tfidf | XGB | 0.1439 | 0.1908 |
tfidf + stylometry | RF | 0.2804 | 0.2545 |
tfidf + stylometry | LR | 0.2531 | 0.2545 |
tfidf + stylometry | XGB | 0.2134 | 0.1832 |
Representation | Acc with Punctuation | Acc without Punctuation |
---|---|---|
SE | 0.3970 | 0.3282 |
SE + stylometry | 0.3921 | 0.3435 |
char n-grams | 0.2655 | 0.2392 |
char n-grams + stylometry | 0.2605 | 0.2494 |
n-grams | 0.2854 | 0.2468 |
n-grams + stylometry | 0.2953 | 0.2875 |
tfidf | 0.2382 | 0.2265 |
tfidf + stylometry | 0.2878 | 0.2723 |
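For readers reproducing these tables, the P, R, and F1 columns are class-level scores aggregated over genres. The sketch below (not from the paper's code; the toy labels and the choice of macro averaging are our assumptions, since the exact averaging scheme is not restated in this excerpt) shows how such macro-averaged precision, recall, F1, and accuracy can be computed from a set of gold and predicted genre labels:

```python
# Hypothetical sketch: macro-averaged P/R/F1 and accuracy over genre labels.
# The label lists below are invented toy data, not results from the paper.

def macro_scores(y_true, y_pred, labels):
    precisions, recalls, f1s = [], [], []
    for c in labels:
        # Per-class counts in a one-vs-rest fashion.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro averaging: unweighted mean over classes.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n, acc

y_true = ["fantasy", "romance", "fantasy", "detective", "romance"]
y_pred = ["fantasy", "fantasy", "fantasy", "detective", "romance"]
P, R, F1, Acc = macro_scores(y_true, y_pred, ["fantasy", "romance", "detective"])
```

Note that with strongly imbalanced per-genre scores (e.g., the degenerate single-class predictions of the fine-tuned transformers above), macro F1 can sit far below accuracy, which is why both columns are reported.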
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Vanetik, N.; Tiamanova, M.; Kogan, G.; Litvak, M. Genre Classification of Books in Russian with Stylometric Features: A Case Study. Information 2024, 15, 340. https://doi.org/10.3390/info15060340