Building a New Sentiment Analysis Dataset for Uzbek Language and Creating Baseline Models
Abstract
1. Introduction
2. Experiments & Results
Funding
Conflicts of Interest
Method | ManualTT | TransTT | TTMT |
---|---|---|---|
Support Vector Machine (linear kernel) | 0.8002 | 0.8588 | 0.7756 |
Logistic Regression (word n-grams) | 0.8547 | 0.8810 | 0.7720 |
Recurrent + Convolutional Neural Network | 0.8653 | 0.8864 | 0.7850 |
Recurrent Neural Network (fastText pre-trained word embeddings) | 0.8782 | 0.8832 | 0.7996 |
Logistic Regression (word and character n-grams) | 0.8846 | 0.8956 | 0.8145 |
Recurrent Neural Network (no pre-trained embeddings) | 0.8868 | 0.8832 | 0.8052 |
Logistic Regression (character n-grams) | 0.8868 | 0.8945 | 0.8021 |
Convolutional Neural Network (multichannel) | 0.8888 | 0.8832 | 0.8120 |
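The logistic-regression rows in the table above rely on sparse word and character n-gram features. The following is a minimal, dependency-free Python sketch of a binary logistic regression over combined word and character n-grams. It is not the authors' implementation. The n-gram ranges (words 1 to 2, characters 2 to 4), binary presence features, the plain SGD training loop, the learning rate, epoch count and L2 strength, and the helper names (`word_ngrams`, `char_ngrams`, `featurize`, `train_logreg`, `predict`, `accuracy`) are all hypothetical choices made for illustration. The toy Uzbek reviews in the usage section are invented and are not taken from the dataset.

```python
import math


def word_ngrams(text, n_max=2):
    """Word n-grams of length 1..n_max over lowercased whitespace tokens (assumed range)."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats.append("w:" + " ".join(tokens[i:i + n]))
    return feats


def char_ngrams(text, n_min=2, n_max=4):
    """Character n-grams of length n_min..n_max; the text is padded with one space on each side."""
    padded = " " + text.lower() + " "
    feats = []
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            feats.append("c:" + padded[i:i + n])
    return feats


def featurize(text):
    # Binary bag of word and character n-grams (presence only, not counts).
    # The "w:" / "c:" prefixes keep word and character features from colliding.
    return set(word_ngrams(text)) | set(char_ngrams(text))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(texts, labels, epochs=100, lr=0.1, l2=1e-4):
    """Binary logistic regression (labels 0/1) trained with plain SGD on sparse features."""
    weights = {}
    bias = 0.0
    data = [(featurize(t), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for feats, y in data:
            z = bias + sum(weights.get(f, 0.0) for f in feats)
            grad = sigmoid(z) - y  # derivative of the log-loss with respect to z
            bias -= lr * grad
            for f in feats:
                w = weights.get(f, 0.0)
                # L2 penalty applied only to the features active in this example.
                weights[f] = w - lr * (grad + l2 * w)
    return weights, bias


def predict(model, text):
    weights, bias = model
    z = bias + sum(weights.get(f, 0.0) for f in featurize(text))
    return 1 if sigmoid(z) >= 0.5 else 0


def accuracy(model, texts, labels):
    correct = sum(predict(model, t) == y for t, y in zip(texts, labels))
    return correct / len(labels)


if __name__ == "__main__":
    # Hypothetical toy reviews (not from the dataset): 1 = positive, 0 = negative.
    train_texts = ["juda yaxshi ilova", "zo'r dastur", "yomon ishlamaydi", "juda yomon ilova"]
    train_labels = [1, 1, 0, 0]
    model = train_logreg(train_texts, train_labels)
    print(accuracy(model, train_texts, train_labels))
```

Character n-grams let word forms that share a stem also share features, which is relevant for an agglutinative language such as Uzbek. In practice, a library implementation (for example, scikit-learn's vectorizers and `LogisticRegression`) would replace the hand-written feature extraction and SGD loop.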
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kuriyozov, E.; Matlatipov, S. Building a New Sentiment Analysis Dataset for Uzbek Language and Creating Baseline Models. Proceedings 2019, 21, 37. https://doi.org/10.3390/proceedings2019021037