Vegetarianism Discourse in Russian Social Media: A Case Study
Abstract
1. Introduction
- RQ1: Which text representations work best for our classification task?
- RQ2: Do lemmatization and topic modeling improve classification accuracy?
- RQ3: Does sentiment analysis improve classification accuracy?
- RQ4: Do transformer-based models outperform traditional ones?
- RQ5: Is contrastive learning more effective at classifying opinions on vegetarianism than traditional or transformer-based models?
2. Background
2.1. Russian NLP
2.2. Vegetarianism in Text Analysis
2.3. Contrastive Learning
3. The VegRuCorpus Dataset
3.1. Data Collection
3.2. Annotation
4. Text Representations
4.1. Preprocessing
4.2. Syntactic Representations
The tf-idf score of a term t in a document d is

$$\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log \frac{N}{\mathrm{df}(t)},$$

where
- $\mathrm{tf}(t, d)$ is the term frequency of t in document d;
- $N$ is the number of documents in the corpus;
- $\mathrm{df}(t)$ is the number of documents in the corpus containing t.
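A minimal sketch of this scoring over a toy tokenized corpus (the helper name and toy data are ours, not from the paper); at corpus scale a library implementation such as scikit-learn's TfidfVectorizer typically replaces hand-rolled code like this:

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    """tf-idf of one term in one tokenized document.

    tf(t, d) -- raw count of the term in the document
    N        -- number of documents in the corpus
    df(t)    -- number of documents containing the term
    """
    tf = Counter(doc_tokens)[term]
    N = len(corpus)
    df = sum(1 for doc in corpus if term in doc)
    return tf * math.log(N / df) if df else 0.0

corpus = [["мясо", "вред"], ["мясо", "польза"], ["овощи", "польза"]]
print(tf_idf("мясо", corpus[0], corpus))  # 1 * log(3/2) ≈ 0.405
```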
4.3. Semantic Representations
4.4. Representation Visualization and Analysis
4.5. Representation Enhancement
4.5.1. Topic Modeling
4.5.2. Sentiment Analysis
5. Classification Models
5.1. Traditional Models
5.2. Fine-Tuned Transformers
5.3. Contrastive Learning
6. Experimental Evaluation
6.1. Hardware Setup
6.2. Software Setup
6.3. Metrics
6.4. Results
7. Conclusions
8. Applications and Limitations
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix A.1. VegRuCorpus Text Samples
Text | Translation |
---|---|
positive | |
Что еще, кроме нравственно-моральных аспектов привлекает людей в вегетарианстве? По мнению приверженцев этой системы питания, отказ от пищи животного происхождения позволяет очистить тело от гормона страха, который выбрасывается в кровь животного в момент смерти. И хоть ни одно исследование не нашло этому научного подтверждения, вегетарианцы продолжают в это верить. Весьма спорным считается и утверждение, что отказ от мяса делает человека более мягким и менее агрессивным. Практика показывает, что японцы, которые в древности вообще не ели мяса, а питались в основном рисом и рыбой, никогда не были миролюбивой нацией, то же можно сказать и об Адольфе Гитлере, который придерживался вегетарианства. Впрочем, у этой системы питания есть неоспоримые плюсы. Среди них: низкий процент развития сердечно-сосудистых заболеваний среди вегетарианцев (растительная пища не содержит холестерина, который откладывается на сосудах, провоцируя развитие инфаркта, атеросклероза, инсульта); среди вегетарианцев практически нет людей с избыточным весом (растительная пища богата клетчаткой, которая быстро заполняет желудок, вызывая чувство сытости и при этом содержит мало калорий); низкий уровень развития онкологических заболеваний среди вегетарианцев (злаки, овощи и фрукты содержат большое количество витаминов и антиоксидантов, которые блокируют процессы старения и перерождения клеток). | What else, besides the moral and ethical aspects, attracts people to vegetarianism? According to the adherents of this diet, giving up food of animal origin allows you to cleanse the body of the hormone of fear, which is released into the blood of an animal at the moment of death. And although no study has found scientific confirmation of this, vegetarians continue to believe in it. The assertion that giving up meat makes a person softer and less aggressive is also considered quite controversial. Practice shows that the Japanese, who in ancient times did not eat meat at all, but ate mainly rice and fish, have never been a peace-loving nation; the same can be said about Adolf Hitler, who adhered to vegetarianism. However, this diet has undeniable advantages. Among them: a low percentage of cardiovascular diseases among vegetarians (plant foods do not contain cholesterol, which is deposited on blood vessels, causing the development of heart attacks, atherosclerosis, and strokes); among vegetarians, there are practically no overweight people (plant foods are rich in fiber, which quickly fills the stomach, causing a feeling of satiety and at the same time contains few calories); a low level of cancer among vegetarians (cereals, vegetables, and fruits contain a large number of vitamins and antioxidants that block the processes of aging and cell degeneration). |
negative | |
Отмечается и негативное влияние вегетарианства и веганства на физическое здоровье человека. Результаты исследований показывают, что люди, которые отказываются от мяса или других продуктов животного происхождения, могут страдать от дефицита питательных веществ — витаминов B12 и D, омега-3 жирных кислот, кальция, железа и цинка, — что может сильно сказываться на здоровье. Также появляется всё больше доказательств того, что отказ от употребления мяса связан с психическими расстройствами и более высокими рисками психологических проблем. По сравнению с людьми, употребляющими мясо (назовём их «мясоедами»), вегетарианцы чаще страдают от тяжёлой депрессии или тревожного расстройства, они более склонны к самоубийствам и причинению себе вреда. Но свидетельства, связывающие вегетарианство с психическими расстройствами, неоднозначны. В 2010 и 2015 годах исследователи обнаружили, что в отношении некоторых аспектов оценки психического здоровья вегетарианцы оказались здоровее мясоедов. В 2017 году Всемирная организация здравоохранения (ВОЗ) сообщила, что психические заболевания являются основной причиной инвалидности во всём мире и оказывают серьёзное влияние на вероятность сердечно-сосудистых заболеваний (главную причину смертности в мире). По оценкам исследователей ВОЗ, более 300 миллионов человек страдают от депрессии (4,4% населения) и более 260 миллионов человек (3,6% населения) от тревожности. Эти оценки отражают, что за последние два десятилетия значительно увеличилось число людей, живущих с психическими расстройствами и заболеваниями. Учитывая рост числа людей с психическими расстройствами и популяризацию вегетарианства, актуально определить связь между отказом от употребления мяса и психологическим здоровьем. | There are also concerns about the negative impact of vegetarianism and veganism on physical health. Research suggests that people who give up meat or other animal products may suffer from nutrient deficiencies—vitamins B12 and D, omega-3 fatty acids, calcium, iron, and zinc—which can have significant health consequences. There is also growing evidence that not eating meat is associated with mental health disorders and higher risks of psychological problems. Compared with people who eat meat (let us call them “meat eaters”), vegetarians are more likely to suffer from severe depression or anxiety disorders, and are more likely to commit suicide and harm themselves. But the evidence linking vegetarianism to mental health disorders is mixed. In 2010 and 2015, researchers found that vegetarians were healthier than meat eaters on some measures of mental health. In 2017, the World Health Organization (WHO) reported that mental illness is the leading cause of disability worldwide and has a significant impact on the risk of cardiovascular disease (the leading cause of death worldwide). WHO researchers estimate that more than 300 million people suffer from depression (4.4% of the population) and more than 260 million people (3.6% of the population) from anxiety. These estimates reflect a significant increase in the number of people living with mental disorders and illnesses over the past two decades. Given the rise in mental disorders and the growing popularity of vegetarianism, it is important to determine the link between giving up meat and psychological health. |
Appendix A.2. Stopwords in Russian Language
и | в | что | не | на | с | для |
от | к | из | у | как | а | это |
о | или | по | также | но | его | есть |
может | которые | только | при | более | чем | |
все | так | их | во | он | я | со |
она | да | ты | вы | за | бы | ее |
мне | было | вот | от | меня | еще | нет |
ему | теперь | когда | даже | ну | вдруг | ли |
если | уже | ни | быть | был | до | вас |
нибудь | опять | уж | вам | ведь | там | потом |
себя | ничего | ей | они | тут | где | надо |
ней | мы | тебя | чем | была | сам | чтоб |
без | будто | чего | раз | тоже | себе | под |
будет | ж | тогда | кто | этот | того | потому |
этого | какой | совсем | ним | здесь | этом | один |
почти | мой | тем | чтобы | нее | сейчас | были |
куда | зачем | всех | никогда | можно | наконец | два |
об | другой | хоть | после | над | больше | тот |
через | эти | нас | про | всего | них | какая |
много | разве | три | эту | моя | впрочем | хорошо |
свою | этой | перед | иногда | лучше | чуть | том |
нельзя | такой | им | всегда | конечно | всю | между |
которые |
and | in | what | not | on | with | for |
from | to | at | how | this | about | or |
by | also | but | his | is | that | can |
which | only | when | more | than | all | so |
same | their | he | I | she | yes | you |
her | me | was | here | still | no | him |
now | even | well | suddenly | whether | if | neither |
be | before | someone | again | because | there | then |
self | nothing | they | where | must | without | as if |
once | under | will | who | this | that | because |
one | almost | my | to | were | why | never |
finally | two | other | though | after | over | more |
through | these | us | them | many | really | three |
my | however | good | own | before | sometimes | better |
slightly | can’t | such | always | between | of course | which |
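NLTK ships a Russian stopword list that closely matches the table above; a minimal filtering sketch, assuming the NLTK stopwords corpus is the list the preprocessing step relies on:

```python
import nltk

nltk.download("stopwords", quiet=True)
from nltk.corpus import stopwords

RUSSIAN_STOPWORDS = set(stopwords.words("russian"))

def remove_stopwords(tokens):
    # Drop tokens that appear in the Russian stopword list.
    return [t for t in tokens if t.lower() not in RUSSIAN_STOPWORDS]

print(remove_stopwords(["отказ", "от", "мяса", "и", "рыбы"]))
# ['отказ', 'мяса', 'рыбы']
```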
Appendix A.3. Full Results of Traditional Models for Syntactic Text Representations
Representation | Classifier | P | R | F1 | Acc |
---|---|---|---|---|---|
char n-grams | RF | 0.8009 | 0.7879 | 0.7873 | 0.7902 |
char n-grams | LR | 0.7966 | 0.7960 | 0.7951 | 0.7951 |
char n-grams | XGB | 0.7355 | 0.7243 | 0.7227 | 0.7268 |
char n-grams | VC | 0.7871 | 0.7843 | 0.7845 | 0.7854 |
word n-grams | RF | 0.8028 | 0.7931 | 0.7929 | 0.7951 |
word n-grams | LR | 0.7806 | 0.7800 | 0.7802 | 0.7805 |
word n-grams | XGB | 0.7293 | 0.6669 | 0.6468 | 0.6732 |
word n-grams | VC | 0.7996 | 0.7826 | 0.7816 | 0.7854 |
tf-idf | RF | 0.8024 | 0.7824 | 0.7810 | 0.7854 |
tf-idf | LR | 0.7996 | 0.7826 | 0.7816 | 0.7854 |
tf-idf | XGB | 0.7489 | 0.7021 | 0.6911 | 0.7073 |
tf-idf | VC | 0.8056 | 0.7821 | 0.7803 | 0.7854 |
lemmatization + n-grams | RF | 0.8069 | 0.7981 | 0.7981 | 0.8000 |
lemmatization + n-grams | LR | 0.8100 | 0.8093 | 0.8095 | 0.8098 |
lemmatization + n-grams | XGB | 0.7585 | 0.7276 | 0.7222 | 0.7317 |
lemmatization + n-grams | VC | 0.8150 | 0.8026 | 0.8024 | 0.8049 |
lemmatization + tf-idf | RF | 0.8024 | 0.7988 | 0.7991 | 0.8000 |
lemmatization + tf-idf | LR | 0.8151 | 0.8081 | 0.8083 | 0.8098 |
lemmatization + tf-idf | XGB | 0.7440 | 0.7181 | 0.7131 | 0.7220 |
lemmatization + tf-idf | VC | 0.8293 | 0.8229 | 0.8232 | 0.8244 |
Representation | Classifier | P | R | F1 | Acc |
---|---|---|---|---|---|
char n-grams + topics | RF | 0.7994 | 0.7936 | 0.7937 | 0.7951 |
char n-grams + topics | LR | 0.7966 | 0.7960 | 0.7951 | 0.7951 |
char n-grams + topics | XGB | 0.7502 | 0.7336 | 0.7312 | 0.7366 |
char n-grams + topics | VC | 0.7871 | 0.7843 | 0.7845 | 0.7854 |
word n-grams + topics | RF | 0.7606 | 0.7433 | 0.7412 | 0.7463 |
word n-grams + topics | LR | 0.7806 | 0.7800 | 0.7802 | 0.7805 |
word n-grams + topics | XGB | 0.7630 | 0.7067 | 0.6940 | 0.7122 |
word n-grams + topics | VC | 0.7663 | 0.7429 | 0.7396 | 0.7463 |
tf-idf + topics | RF | 0.7846 | 0.7733 | 0.7727 | 0.7756 |
tf-idf + topics | LR | 0.7739 | 0.7693 | 0.7693 | 0.7707 |
tf-idf + topics | XGB | 0.7489 | 0.7021 | 0.6911 | 0.7073 |
tf-idf + topics | VC | 0.7804 | 0.7524 | 0.7488 | 0.7561 |
char n-grams + SA | RF | 0.8034 | 0.7876 | 0.7868 | 0.7902 |
char n-grams + SA | LR | 0.7795 | 0.7740 | 0.7741 | 0.7756 |
char n-grams + SA | XGB | 0.7530 | 0.6917 | 0.6759 | 0.6976 |
char n-grams + SA | VC | 0.8024 | 0.7824 | 0.7810 | 0.7854 |
word n-grams + SA | RF | 0.8056 | 0.7769 | 0.7743 | 0.7805 |
word n-grams + SA | LR | 0.8252 | 0.8124 | 0.8123 | 0.8146 |
word n-grams + SA | XGB | 0.7197 | 0.6621 | 0.6426 | 0.6683 |
word n-grams + SA | VC | 0.8096 | 0.7767 | 0.7735 | 0.7805 |
tf-idf + SA | RF | 0.8164 | 0.7867 | 0.7843 | 0.7902 |
tf-idf + SA | LR | 0.8238 | 0.7914 | 0.7890 | 0.7951 |
tf-idf + SA | XGB | 0.7065 | 0.6524 | 0.6321 | 0.6585 |
tf-idf + SA | VC | 0.8187 | 0.7762 | 0.7718 | 0.7805 |
Appendix A.4. Full Results of Traditional Models for Semantic Text Representations
Representation | Classifier | P | R | F1 | Acc |
---|---|---|---|---|---|
ruBERT SE | RF | 0.7513 | 0.7507 | 0.7508 | 0.7512 |
ruBERT SE | LR | 0.7609 | 0.7607 | 0.7608 | 0.7610 |
ruBERT SE | XGB | 0.6637 | 0.6624 | 0.6623 | 0.6634 |
ruBERT SE | VC | 0.7470 | 0.7455 | 0.7456 | 0.7463 |
SBERT SE | RF | 0.7961 | 0.7943 | 0.7945 | 0.7951 |
SBERT SE | LR | 0.7804 | 0.7802 | 0.7803 | 0.7805 |
SBERT SE | XGB | 0.7432 | 0.7402 | 0.7403 | 0.7415 |
SBERT SE | VC | 0.7894 | 0.7838 | 0.7839 | 0.7854 |
ruRoBERTa SE | RF | 0.7999 | 0.8000 | 0.7999 | 0.8000 |
ruRoBERTa SE | LR | 0.8438 | 0.8438 | 0.8438 | 0.8439 |
ruRoBERTa SE | XGB | 0.7124 | 0.7114 | 0.7115 | 0.7122 |
ruRoBERTa SE | VC | 0.8389 | 0.8390 | 0.8390 | 0.8390 |
Representation | Classifier | P | R | F1 | Acc |
---|---|---|---|---|---|
ruBERT SE + topics | RF | 0.7270 | 0.7262 | 0.7263 | 0.7268 |
ruBERT SE + topics | LR | 0.7413 | 0.7412 | 0.7412 | 0.7415 |
ruBERT SE + topics | XGB | 0.6700 | 0.6667 | 0.6660 | 0.6683 |
ruBERT SE + topics | VC | 0.7222 | 0.7212 | 0.7213 | 0.7220 |
SBERT SE + topics | RF | 0.7811 | 0.7798 | 0.7800 | 0.7805 |
SBERT SE + topics | LR | 0.7852 | 0.7852 | 0.7852 | 0.7854 |
SBERT SE + topics | XGB | 0.7342 | 0.7302 | 0.7301 | 0.7317 |
SBERT SE + topics | VC | 0.7916 | 0.7893 | 0.7895 | 0.7902 |
ruRoBERTa SE + topics | RF | 0.7999 | 0.8000 | 0.7999 | 0.8000 |
ruRoBERTa SE + topics | LR | 0.8389 | 0.8390 | 0.8390 | 0.8390 |
ruRoBERTa SE + topics | XGB | 0.7316 | 0.7317 | 0.7316 | 0.7317 |
ruRoBERTa SE + topics | VC | 0.8194 | 0.8195 | 0.8194 | 0.8195 |
ruBERT SE + SA | RF | 0.7719 | 0.7698 | 0.7699 | 0.7707 |
ruBERT SE + SA | LR | 0.7511 | 0.7512 | 0.7511 | 0.7512 |
ruBERT SE + SA | XGB | 0.6740 | 0.6719 | 0.6716 | 0.6732 |
ruBERT SE + SA | VC | 0.7415 | 0.7410 | 0.7411 | 0.7415 |
SBERT SE + SA | RF | 0.7569 | 0.7552 | 0.7554 | 0.7561 |
SBERT SE + SA | LR | 0.7852 | 0.7852 | 0.7852 | 0.7854 |
SBERT SE + SA | XGB | 0.7297 | 0.7252 | 0.7249 | 0.7268 |
SBERT SE + SA | VC | 0.7950 | 0.7950 | 0.7950 | 0.7951 |
ruRoBERTa SE + SA | RF | 0.7907 | 0.7907 | 0.7902 | 0.7902 |
ruRoBERTa SE + SA | LR | 0.8536 | 0.8536 | 0.8536 | 0.8537 |
ruRoBERTa SE + SA | XGB | 0.7072 | 0.7069 | 0.7070 | 0.7073 |
ruRoBERTa SE + SA | VC | 0.8487 | 0.8488 | 0.8487 | 0.8488 |
Query | Translation |
---|---|
вегетарианство за и против | vegetarianism pros and cons |
вегетарианство вред и польза | vegetarianism benefits and harms |
вегетарианство достоинства и недостатки | vegetarianism advantages and disadvantages |
вегетарианство плюсы и минусы | vegetarianism pluses and minuses |
Cohen’s Kappa Score | Interpretation |
---|---|
<0 | No agreement |
0.00–0.20 | None to mild agreement |
0.21–0.40 | Fair agreement |
0.41–0.60 | Moderate agreement |
0.61–0.80 | Substantial agreement |
0.81–1.00 | Nearly perfect agreement |
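Cohen's kappa itself is a one-liner with scikit-learn; a toy sketch with two hypothetical annotators (the labels below are illustrative, not drawn from VegRuCorpus):

```python
from sklearn.metrics import cohen_kappa_score

# Two annotators labeling the same ten texts.
annotator_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "pos"]
annotator_b = ["pos", "pos", "neg", "pos", "pos", "neg", "pos", "neg", "neg", "pos"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.3f}")  # ≈ 0.583, "moderate" in the table above
```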
Text | Translation | Final Annotation |
---|---|---|
За последние два года количество выбросов углекислого газа в атмосферу Земли выросло до рекордных 37 миллиардов тонн. Пока борцы за экологию и сочувствующие им вегетарианцы утверждают, что производство мясных продуктов является основным фактором, загрязняющим атмосферу, специалисты доказывают обратное. | In the past two years, carbon dioxide emissions into Earth’s atmosphere have risen to a record 37 billion tons. While environmental activists and supportive vegetarians claim that meat production is the main factor polluting the atmosphere, experts prove the opposite. | Neg |
Для того, чтобы не столкнуться с недостатком витаминов и минералов, необходимо разнообразно питаться. Насытиться растительной пищей сложнее, поэтому её нужно готовить в больших количествах и, соответственно, тратить больше денег на продукты. С другой стороны, люди перестают тратить деньги на мясо и начинают покупать фрукты, поэтому разница может быть не такой уж и большой. | To avoid vitamin and mineral deficiencies, one must eat a varied diet. It is harder to feel full on plant-based food, so you need to cook it in larger quantities and, accordingly, spend more money on groceries. On the other hand, people stop spending money on meat and start buying fruits, so the difference may not be that significant. | Pos |
Label | Texts | Avg Words | Avg Chars |
---|---|---|---|
pos | 526 | 147.92 | 895.85 |
neg | 498 | 118.43 | 892.71 |
total | 1024 | 145.22 | 884.95 |
Vector | Range | Size |
---|---|---|
tf-idf | − | 19,877 |
lemmatized tf-idf | − | 10,541 |
word n-grams | (1,3) | 20,499 |
lemmatized word n-grams | (1,3) | 19,398 |
character n-grams | (1,3) | 19,220 |
Range | Top n-Grams | Count | Translation |
---|---|---|---|
(1, 1) | это | 705 | this |
 | мяса | 485 | meat |
 | питания | 343 | nutrition |
(2, 2) | животного происхождения | 183 | animal origin |
 | вегетарианская диета | 98 | vegetarian diet |
 | питательных веществ | 69 | nutrients |
(3, 3) | продуктов животного происхождения | 59 | products of animal origin |
 | продуктах животного происхождения | 32 | in products of animal origin |
 | продукты животного происхождения | 28 | products of animal origin |
(1, 3) | это | 705 | this |
 | мяса | 485 | meat |
 | питания | 343 | nutrition |
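The n-gram ranges, vocabulary sizes, and frequency counts above can be reproduced with scikit-learn vectorizers; a sketch on a toy corpus (variable names and corpus are ours):

```python
from sklearn.feature_extraction.text import CountVectorizer

texts = ["отказ от мяса", "продукты животного происхождения",
         "продукты животного происхождения и мяса"]

# Word n-grams over the (1, 3) range, as in the tables above;
# analyzer="char" would give the character n-gram variant instead.
vectorizer = CountVectorizer(analyzer="word", ngram_range=(1, 3))
X = vectorizer.fit_transform(texts)

# Most frequent n-grams across the corpus, as reported in the table.
counts = X.sum(axis=0).A1
vocab = vectorizer.get_feature_names_out()
print(sorted(zip(vocab, counts), key=lambda p: -p[1])[:5])
```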
Model | Settings |
---|---|
RF | n_estimators = 200; no maximum depth for each tree; Gini impurity for splitting; square root of the number of features considered for splitting at each node. |
XGB | learning rate = 0.1; maximum depth = 3 for each tree; logistic loss function for binary classification. |
LR | L2 regularization penalty; regularization strength (C) = 1.0; lbfgs solver. |
VC | combined predictions of RF and LR using a majority voting scheme. |
ruBERT, SBERT | pretrained with masked language modeling (MLM) and next-sentence prediction (NSP) objectives; WordPiece subword tokenization; vocabulary size ≈ 12 × 10⁴ tokens; fine-tuned for sentence classification on relevant data. |
ruRoBERTa | pretrained with the MLM objective only; byte-level BPE tokenization; vocabulary size ≈ 5 × 10⁴ tokens; fine-tuned for sentence classification on relevant data. |
DualCL | dual contrastive learning; uses ruBERT, SBERT, or ruRoBERTa for feature representations; trained for 20 epochs; contrastive loss with label-aware data augmentation. |
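A sketch instantiating the traditional models with the listed settings, assuming scikit-learn and xgboost; hyperparameters not named in the table stay at library defaults:

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# RF: 200 trees, unlimited depth, Gini splits, sqrt(#features) per node.
rf = RandomForestClassifier(n_estimators=200, max_depth=None,
                            criterion="gini", max_features="sqrt")

# XGB: logistic loss for the binary pos/neg task.
xgb = XGBClassifier(learning_rate=0.1, max_depth=3,
                    objective="binary:logistic")

# LR: L2 penalty, C = 1.0, lbfgs solver.
lr = LogisticRegression(penalty="l2", C=1.0, solver="lbfgs", max_iter=1000)

# VC: majority ("hard") voting over RF and LR.
vc = VotingClassifier(estimators=[("rf", rf), ("lr", lr)], voting="hard")
```

Each model is then trained with `fit(X_train, y_train)` on any of the feature matrices from Section 4.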
Representation | Classifier | P | R | F1 | Acc |
---|---|---|---|---|---|
char n-grams | LR | 0.7966 | 0.7960 | 0.7951 | 0.7951 |
word n-grams | RF | 0.8028 | 0.7931 | 0.7929 | 0.7951 |
tf-idf | LR | 0.7996 | 0.7826 | 0.7816 | 0.7854 |
lemmatization + word n-grams | LR | 0.8100 | 0.8093 | 0.8095 | 0.8098 |
lemmatization + tf-idf | LR | 0.8151 | 0.8081 | 0.8083 | 0.8098 |
char n-grams + topics | LR | 0.7966 | 0.7960 | 0.7951 | 0.7951 |
word n-grams + topics | LR | 0.7806 | 0.7800 | 0.7802 | 0.7805 |
tf-idf + topics | RF | 0.7846 | 0.7733 | 0.7727 | 0.7756 |
topics | LR | 0.6225 | 0.6110 | 0.6037 | 0.6146 |
char n-grams + SA | RF | 0.8034 | 0.7876 | 0.7868 | 0.7902 |
word n-grams + SA | LR | 0.8252 | 0.8124 | 0.8123 | 0.8146 |
tf-idf + SA | LR | 0.8238 | 0.7914 | 0.7890 | 0.7951 |
SA | LR | 0.5767 | 0.5733 | 0.5670 | 0.5707 |
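The "+ SA" rows concatenate sentiment scores onto the base features. A sketch of that augmentation, assuming a recent transformers version; blanchefort/rubert-base-cased-sentiment is a publicly available Russian sentiment model used here as a stand-in, and whether it matches the paper's exact SA component is our assumption:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import pipeline

texts = ["Вегетарианство полезно для сердца.",
         "Отказ от мяса вредит здоровью."]

# Base block: tf-idf features.
X_tfidf = TfidfVectorizer().fit_transform(texts)

# Extra block: per-class sentiment probabilities (top_k=None returns all labels).
sentiment = pipeline("text-classification",
                     model="blanchefort/rubert-base-cased-sentiment", top_k=None)
scores = np.array([[d["score"] for d in sorted(preds, key=lambda d: d["label"])]
                   for preds in sentiment(texts)])

X = hstack([X_tfidf, csr_matrix(scores)])  # the "tf-idf + SA" feature matrix
```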
Representation | Classifier | P | R | F1 | Acc |
---|---|---|---|---|---|
ruBERT SE | LR | 0.7609 | 0.7607 | 0.7608 | 0.7610 |
SBERT SE | RF | 0.7961 | 0.7943 | 0.7945 | 0.7951 |
ruRoBERTa SE | LR | 0.8438 | 0.8438 | 0.8438 | 0.8439 |
ruBERT SE + topics | LR | 0.7413 | 0.7412 | 0.7412 | 0.7415 |
SBERT SE + topics | LR | 0.7852 | 0.7852 | 0.7852 | 0.7854 |
ruRoBERTa SE + topics | LR | 0.8389 | 0.8390 | 0.8390 | 0.8390 |
ruBERT SE + SA | RF | 0.7719 | 0.7698 | 0.7699 | 0.7707 |
SBERT SE + SA | LR | 0.7852 | 0.7852 | 0.7852 | 0.7854 |
ruRoBERTa SE + SA | LR | 0.8536 | 0.8536 | 0.8536 | 0.8537 |
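A sketch of how the sentence embeddings (SE) feeding these classifiers can be extracted; mean pooling over non-padding tokens is our assumption, and the paper may pool differently (e.g., via the CLS token):

```python
import torch
from transformers import AutoModel, AutoTokenizer

name = "ai-forever/ruRoberta-large"  # ruBERT/SBERT checkpoints plug in the same way
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

def sentence_embedding(text: str) -> torch.Tensor:
    enc = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state   # (1, tokens, dim)
    mask = enc["attention_mask"].unsqueeze(-1)    # zero out padding positions
    return (hidden * mask).sum(1) / mask.sum(1)   # (1, dim) mean-pooled vector

emb = sentence_embedding("Вегетарианство снижает риск болезней сердца.")
```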
Representation | Classifier | P | R | F1 | Acc |
---|---|---|---|---|---|
FT ruBERT | SE | 0.5317 | 0.4288 | 0.4747 | 0.5050 |
FT ruRoBERTa | SE | 0.4926 | 0.3988 | 0.4410 | 0.4690 |
FT SBERT | SE | 0.5707 | 0.5563 | 0.5634 | 0.5478 |
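For reference, a minimal fine-tuning (FT) sketch with the Hugging Face Trainer; the toy dataset and hyperparameters are ours, and the paper's exact training schedule may differ:

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

name = "DeepPavlov/rubert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

# Toy stand-in for VegRuCorpus: 0 = negative, 1 = positive opinion.
data = Dataset.from_dict({"text": ["Отказ от мяса вредит здоровью.",
                                   "Растительная пища полезна."],
                          "label": [0, 1]})
data = data.map(lambda b: tokenizer(b["text"], truncation=True,
                                    padding="max_length", max_length=128),
                batched=True)

trainer = Trainer(model=model,
                  args=TrainingArguments(output_dir="ft-rubert",
                                         num_train_epochs=3,
                                         per_device_train_batch_size=8),
                  train_dataset=data)
trainer.train()
```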
Classifier | Base Model | P | R | F1 | Acc |
---|---|---|---|---|---|
CL | ruBERT | 0.7817 | 0.7791 | 0.7795 | 0.7805 |
CL | SBERT | 0.8193 | 0.8195 | 0.8194 | 0.8195 |
CL | ruRoBERTa | 0.8784 | 0.8787 | 0.8780 | 0.8780 |
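The CL rows use dual contrastive learning (DualCL). As a rough illustration of the contrastive component only, below is a generic supervised contrastive loss in PyTorch; it is not the DualCL objective, which additionally performs label-aware data augmentation:

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Pull same-label embeddings together, push different-label ones apart.

    features: (batch, dim) sentence embeddings; labels: (batch,) class ids.
    """
    z = F.normalize(features, dim=1)
    sim = z @ z.T / temperature                    # pairwise similarities
    eye = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(eye, float("-inf"))      # drop self-similarity
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return loss[pos.any(1)].mean()

feats = torch.randn(8, 768)                        # e.g., ruRoBERTa embeddings
labels = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1])    # pos/neg opinion labels
print(supervised_contrastive_loss(feats, labels))
```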