BERT Models for Arabic Text Classification: A Systematic Review
Abstract
1. Introduction
2. Research Methodology
2.1. Definition of Research Questions
- What BERT models have been used for Arabic text classification, and how do they differ?
- How effective are they in classifying Arabic text?
- How effective are they compared to the original English BERT models?
2.2. Search Strategy
- Population: BERT, Arabic.
- Intervention: text classification, sentiment analysis.
- Comparison and Outcomes: these two dimensions were omitted, as the research questions do not warrant a restriction of the results to a particular outcome or comparison.
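The population and intervention terms above were combined into boolean search strings (the per-database strings are tabulated later in this review). As a minimal illustration only, the generic form of that query can be composed as follows; the snippet is a sketch of the string construction and does not call any database API.

```python
# Sketch: compose the generic boolean search string from the PICO terms above.
# Each database applies its own syntax (field tags, quoting), as the search
# string table later in the review shows.
population = ["BERT", "Arabic"]
intervention = ["text classification", "sentiment analysis"]

# Every population term must appear, together with at least one intervention term.
query = " AND ".join(f'"{term}"' for term in population)
query += " AND (" + " OR ".join(f'"{term}"' for term in intervention) + ")"

print(query)
# "BERT" AND "Arabic" AND ("text classification" OR "sentiment analysis")
```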
2.3. Selection of Studies
Inclusion criteria:
- It uses a BERT model for the Arabic text classification task.
- It evaluates the performance of the utilized BERT model.
- The dataset used to evaluate the model is well described.
- It is written in English or Arabic.
Exclusion criteria:
- The full text of the article is not available online.
- The article is in the form of a poster, tutorial, abstract, or presentation.
- It is not written in English or Arabic.
- It does not evaluate the performance of the utilized BERT model.
- The dataset used to evaluate the model is not described.
2.4. Quality Assessment
2.5. Data Extraction
2.6. Synthesis of Results
3. Results
3.1. Included Studies Overview
3.2. Quality of the Included Studies
3.3. BERT Models
3.4. Models Evaluation
3.5. Evaluation Datasets
4. Discussion
4.1. What BERT Models Were Used for Arabic Text Classification, and How Do They Differ?
1. Multilingual BERT
2. AraBERT
3. MARBERT
4. ArabicBERT
5. ARBERT
6. XLM-RoBERTa
7. QARiB
8. GigaBERT
9. Arabic ALBERT
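All nine models expose the same transformer-encoder interface, so the surveyed studies typically fine-tune them with very similar pipelines. The following is a minimal, hedged sketch of such a fine-tuning setup using the Hugging Face transformers library; the checkpoint names, toy tweets, label set, and hyperparameters are illustrative assumptions, not the configuration of any particular study.

```python
# Minimal fine-tuning sketch. Any of the checkpoints surveyed above can be
# swapped in, e.g. "bert-base-multilingual-cased" for Multilingual BERT or
# "aubmindlab/bert-base-arabertv02" for AraBERT (checkpoint names are
# illustrative and should be verified on the Hugging Face Hub).
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

checkpoint = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy labelled examples (1 = positive, 0 = negative); real studies use the
# datasets summarized in Section 3.5.
texts = ["خدمة ممتازة", "تجربة سيئة"]
labels = [1, 0]
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

class TweetDataset(torch.utils.data.Dataset):
    """Wraps the tokenized texts and labels for the Trainer API."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

train_dataset = TweetDataset(encodings, labels)
args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=train_dataset).train()
```

In this setup, moving between the models compared in the review usually amounts to changing the checkpoint string while keeping the rest of the pipeline unchanged.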
4.2. How Effective Are They for Classifying Arabic Text?
4.3. How Effective Are They Compared to the Original English BERT Models?
5. Implications for Future Research
6. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Vijayan, V.K.; Bindu, K.; Parameswaran, L. A comprehensive study of text classification algorithms. In Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, India, 13–16 September 2017; pp. 1109–1113. [Google Scholar]
- El-Din, D.M.; Hussein, M. A survey on sentiment analysis challenges. J. King Saud Univ.-Eng. Sci. 2018, 30, 330–338. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Rogers, A.; Kovaleva, O.; Rumshisky, A. A Primer in BERTology: What We Know About How BERT Works. Trans. Assoc. Comput. Linguist. 2020, 8, 842–866. [Google Scholar] [CrossRef]
- Zaib, M.; Sheng, Q.Z.; Emma Zhang, W. A short survey of pretrained language models for conversational AI-a new age in NLP. In Proceedings of the Australasian Computer Science Week Multiconference, Canberra, Australia, 1–5 February 2016; Association for Computing Machinery: New York, NY, USA, 2020; pp. 1–4. [Google Scholar]
- Alshalan, R.; Al-Khalifa, H. A deep learning approach for automatic hate speech detection in the saudi twittersphere. Appl. Sci. 2020, 10, 8614. [Google Scholar] [CrossRef]
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V.J. Roberta: A robustly optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
- Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T.J. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108. [Google Scholar]
- Almuqren, L. Twitter Analysis to Predict the Satisfaction of Saudi Telecommunication Companies’ Customers. Ph.D. Thesis, Durham University, Durham, UK, 2021. [Google Scholar]
- Pelicon, A.; Shekhar, R.; Škrlj, B.; Purver, M.; Pollak, S. Investigating cross-lingual training for offensive language detection. PeerJ Comput. Sci. 2021, 7, e559. [Google Scholar] [CrossRef]
- Antoun, W.; Baly, F.; Hajj, H. Arabert: Transformer-based model for arabic language understanding. arXiv 2020, arXiv:2003.00104. [Google Scholar]
- Abdul-Mageed, M.; Elmadany, A.; Nagoudi, E.M.B. ARBERT & MARBERT: Deep bidirectional transformers for Arabic. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; pp. 7088–7105. [Google Scholar]
- James, K.L.; Randall, N.P.; Haddaway, N.R. A methodology for systematic mapping in environmental sciences. Environ. Evid. 2016, 5, 1–13. [Google Scholar] [CrossRef] [Green Version]
- Moher, D.; Altman, D.G.; Liberati, A.; Tetzlaff, J. PRISMA statement. Epidemiology 2011, 22, 128. [Google Scholar] [CrossRef] [Green Version]
- Paez, A. Gray literature: An important resource in systematic reviews. J. Evid.-Based Med. 2017, 10, 233–240. [Google Scholar] [CrossRef] [PubMed]
- Kitchenham, B.; Charters, S. Guidelines for Performing Systematic Literature Reviews in Software Engineering; EBSE: Durham, UK, 2007. [Google Scholar]
- Zhou, Y.; Zhang, H.; Huang, X.; Yang, S.; Babar, M.A.; Tang, H. Quality assessment of systematic reviews in software engineering: A tertiary study. In Proceedings of the 19th International Conference on Evaluation and Assessment in Software Engineering, Nanjing, China, 27–29 April 2015; Association for Computing Machinery: New York, NY, USA, 2015; pp. 1–14. [Google Scholar]
- Bondas, T.; Hall, E.O. Challenges in approaching metasynthesis research. Qual. Health Res. 2007, 17, 113–121. [Google Scholar] [CrossRef] [PubMed]
- Morgan, J.A.; Olagunju, A.T.; Corrigan, F.; Baune, B.T. Does ceasing exercise induce depressive symptoms? A systematic review of experimental trials including immunological and neurogenic markers. J. Affect. Disord. 2018, 234, 180–192. [Google Scholar] [CrossRef] [PubMed]
- Alammary, A. Blended learning models for introductory programming courses: A systematic review. PLoS ONE 2019, 14, e0221765. [Google Scholar] [CrossRef] [PubMed]
- Bilal, S. A Linguistic System for Predicting Sentiment in Arabic Tweets. In Proceedings of the 2021 3rd International Conference on Natural Language Processing (ICNLP), Beijing, China, 26–28 March 2021; pp. 134–138. [Google Scholar]
- Al-Twairesh, N.; Al-Negheimish, H.J.I.A. Surface and deep features ensemble for sentiment analysis of arabic tweets. IEEE Access 2019, 7, 84122–84131. [Google Scholar] [CrossRef]
- Pàmies Massip, M. Multilingual Identification of Offensive Content in Social Media. Available online: https://www.diva-portal.org/smash/get/diva2:1451543/FULLTEXT01.pdf (accessed on 19 December 2021).
- Moudjari, L.; Akli-Astouati, K.; Benamara, F. An Algerian corpus and an annotation platform for opinion and emotion analysis. In Proceedings of the 12th Language Resources and Evaluation Conference, LREC 2020, Marseille, France, 11–16 May 2020; pp. 1202–1210. [Google Scholar]
- Khalifa, M.; Hassan, H.; Fahmy, A. Zero-Resource Multi-Dialectal Arabic Natural Language Understanding. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 1–15. [Google Scholar] [CrossRef]
- Alshehri, A.; Nagoudi, E.M.B.; Abdul-Mageed, M. Understanding and Detecting Dangerous Speech in Social Media; European Language Resource Association: Paris, France, 2020. [Google Scholar]
- Abdul-Mageed, M.; Zhang, C.; Elmadany, A.; Ungar, L. Toward micro-dialect identification in diaglossic and code-switched environments. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, Online, 8–12 November 2020; pp. 5855–5876. [Google Scholar]
- Ameur, M.S.H.; Aliane, H. AraCOVID19-MFH: Arabic COVID-19 Multi-label Fake News & Hate Speech Detection Dataset. Procedia Comput. Sci. 2021, 189, 232–241. [Google Scholar]
- Moudjari, L.; Karima, A.-A. An Experimental Study On Sentiment Classification Of Algerian Dialect Texts. Procedia Comput. Sci. 2020, 176, 1151–1159. [Google Scholar] [CrossRef]
- Alsafari, S.; Sadaoui, S.; Mouhoub, M. Deep learning ensembles for hate speech detection. In Proceedings of the 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), Baltimore, MD, USA, 9–11 November 2020; pp. 526–531. [Google Scholar]
- Abdelali, A.; Mubarak, H.; Samih, Y.; Hassan, S.; Darwish, K. QADI: Arabic dialect identification in the wild. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021; pp. 1–10. [Google Scholar]
- Alsafari, S.; Sadaoui, S.; Mouhoub, M. Hate and offensive speech detection on Arabic social media. Online Soc. Netw. Media 2020, 19, 100096. [Google Scholar] [CrossRef]
- Mubarak, H.; Hassan, S.; Abdelali, A. Adult content detection on arabic twitter: Analysis and experiments. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021; pp. 136–144. [Google Scholar]
- Farha, I.A.; Magdy, W. Benchmarking transformer-based language models for Arabic sentiment and sarcasm detection. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021; pp. 21–31. [Google Scholar]
- Uyangodage, L.; Ranasinghe, T.; Hettiarachchi, H. Transformers to fight the COVID-19 infodemic. Available online: https://arxiv.org/pdf/2104.12201.pdf (accessed on 4 December 2021).
- Obied, Z.; Solyman, A.; Ullah, A.; Fat’hAlalim, A.; Alsayed, A. BERT Multilingual and Capsule Network for Arabic Sentiment Analysis. In Proceedings of the 2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE), Khartoum, Sudan, 26–28 February 2021; pp. 1–6. [Google Scholar]
- Mubarak, H.; Rashed, A.; Darwish, K.; Samih, Y.; Abdelali, A. Arabic Offensive Language on Twitter: Analysis and Experiments. Available online: https://arxiv.org/pdf/2004.02192.pdf (accessed on 17 November 2021).
- Safaya, A.; Abdullatif, M.; Yuret, D. Kuisail at semeval-2020 task 12: Bert-cnn for offensive speech identification in social media. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, Barcelona, Spain, 12–13 December 2020; pp. 2054–2059. [Google Scholar]
- El-Alami, F.-z.; El Alaoui, S.O.; Nahnahi, N.E. Contextual semantic embeddings based on fine-tuned AraBERT model for Arabic text multi-class categorization. J. King Saud Univ. Comput. Inf. Sci. 2021. [Google Scholar] [CrossRef]
- Abdelali, A.; Hassan, S.; Mubarak, H.; Darwish, K.; Samih, Y. Pre-training bert on arabic tweets: Practical considerations. arXiv 2021, arXiv:2102.10684. [Google Scholar]
- Mansour, M.; Tohamy, M.; Ezzat, Z.; Torki, M. Arabic dialect identification using BERT fine-tuning. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, Barcelona, Spain, 12 December 2020; pp. 308–312. [Google Scholar]
- Balaji, N.N.A.; Bharathi, B. Semi-supervised fine-grained approach for Arabic dialect detection task. In Proceedings of the Fifth Arabic Natural Language Processing Workshop, Barcelona, Spain, 12 December 2020; pp. 257–261. [Google Scholar]
- Abuzayed, A.; Al-Khalifa, H. Sarcasm and sentiment detection in Arabic tweets using BERT-based models and data augmentation. In Proceedings of the sixth Arabic natural language processing workshop, Kyiv, Ukraine, 19 April 2021; pp. 312–317. [Google Scholar]
- Saeed, H.H.; Calders, T.; Kamiran, F. OSACT4 shared tasks: Ensembled stacked classification for offensive and hate speech in Arabic tweets. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, Marseille, France, 12 May 2020; pp. 71–75. [Google Scholar]
- Zhang, C.; Abdul-Mageed, M. No army, no navy: Bert semi-supervised learning of arabic dialects. In Proceedings of the Fourth Arabic Natural Language Processing Workshop, Florence, Italy, 1 August 2019; pp. 279–284. [Google Scholar]
- Naski, M.; Messaoudi, A.; Haddad, H.; BenHajhmida, M.; Fourati, C.; Mabrouk, A.B.E. iCompass at Shared Task on Sarcasm and Sentiment Detection in Arabic. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021; pp. 381–385. [Google Scholar]
- Hassan, S.; Samih, Y.; Mubarak, H.; Abdelali, A. ALT at SemEval-2020 task 12: Arabic and English offensive language identification in social media. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, Barcelona, Spain, 12–13 December 2020; pp. 1891–1897. [Google Scholar]
- Faraj, D.; Abdullah, M. Sarcasmdet at sarcasm detection task 2021 in arabic using arabert pretrained model. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021; pp. 345–350. [Google Scholar]
- Israeli, A.; Nahum, Y.; Fine, S.; Bar, K. The IDC System for Sentiment Classification and Sarcasm Detection in Arabic. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021; pp. 370–375. [Google Scholar]
- Aldjanabi, W.; Dahou, A.; Al-qaness, M.A.; Abd Elaziz, M.; Helmi, A.M.; Damaševičius, R. Arabic Offensive and Hate Speech Detection Using a Cross-Corpora Multi-Task Learning Model. Informatics 2021, 8, 69. [Google Scholar] [CrossRef]
- Elgabry, H.; Attia, S.; Abdel-Rahman, A.; Abdel-Ate, A.; Girgis, S. A contextual word embedding for Arabic sarcasm detection with random forests. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021; pp. 340–344. [Google Scholar]
- Alam, F.; Shaar, S.; Dalvi, F.; Sajjad, H.; Nikolov, A.; Mubarak, H.; Martino, G.D.S.; Abdelali, A.; Durrani, N.; Darwish, K. Fighting the COVID-19 infodemic: Modeling the perspective of journalists, fact-checkers, social media platforms, policy makers, and the society. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021. [Google Scholar]
- Al-Yahya, M.; Al-Khalifa, H.; Al-Baity, H.; AlSaeed, D.; Essam, A. Arabic Fake News Detection: Comparative Study of Neural Networks and Transformer-Based Approaches. Complexity 2021, 2021. [Google Scholar] [CrossRef]
- Mulki, H.; Ghanem, B.J. Let-mi: An Arabic Levantine Twitter dataset for misogynistic language. arXiv 2021, arXiv:2103.10195. [Google Scholar]
- Mubarak, H.; Abdelali, A.; Hassan, S.; Darwish, K. Spam detection on arabic twitter. In Proceedings of the International Conference on Social Informatics, Pisa, Italy, 6–9 October 2020; pp. 237–251. [Google Scholar]
- Mubarak, H.; Hassan, S. Arcorona: Analyzing arabic tweets in the early days of coronavirus (COVID-19) pandemic. In Proceedings of the 12th International Workshop on Health Text Mining and Information Analysis, Virtual Conference, Online, 19–20 April 2021. [Google Scholar]
- El-Alami, F.-z.; El Alaoui, S.O.; Nahnahi, N.E. A multilingual offensive language detection method based on transfer learning from transformer fine-tuning model. J. King Saud Univ. Comput. Inf. Sci. 2021. [Google Scholar] [CrossRef]
- Al-Twairesh, N. The Evolution of Language Models Applied to Emotion Analysis of Arabic Tweets. Information 2021, 12, 84. [Google Scholar] [CrossRef]
- Husain, F.; Uzuner, O. Leveraging offensive language for sarcasm and sentiment detection in Arabic. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021; pp. 364–369. [Google Scholar]
- Wadhawan, A. Arabert and farasa segmentation based approach for sarcasm and sentiment detection in arabic tweets. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021. [Google Scholar]
- Bashmal, L.; AlZeer, D. ArSarcasm Shared Task: An Ensemble BERT Model for SarcasmDetection in Arabic Tweets. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021; pp. 323–328. [Google Scholar]
- Gaanoun, K.; Benelallam, I. Sarcasm and Sentiment Detection in Arabic language A Hybrid Approach Combining Embeddings and Rule-based Features. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021; pp. 351–356. [Google Scholar]
- Alharbi, A.I.; Lee, M. Multi-task learning using a combination of contextualised and static word embeddings for arabic sarcasm detection and sentiment analysis. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021; pp. 318–322. [Google Scholar]
- Abdel-Salam, R. Wanlp 2021 shared-task: Towards irony and sentiment detection in arabic tweets using multi-headed-lstm-cnn-gru and marbert. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kyiv, Ukraine, 19 April 2021; pp. 306–311. [Google Scholar]
- Wu, S.; Dredze, M. Are all languages created equal in multilingual BERT? In Proceedings of the 5th Workshop on Representation Learning for NLP, Online, 9 July 2020; pp. 120–130. [Google Scholar]
- Abdaoui, A.; Pradel, C.; Sigel, G. Load What You Need: Smaller Versions of Multilingual BERT. In Proceedings of the SustaiNLP: Workshop on Simple and Efficient Natural Language Processing, Online, 10 November 2021; pp. 119–123. [Google Scholar]
- Conneau, A.; Khandelwal, K.; Goyal, N.; Chaudhary, V.; Wenzek, G.; Guzmán, F.; Grave, E.; Ott, M.; Zettlemoyer, L.; Stoyanov, V. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 8440–8451. [Google Scholar]
- Lan, W.; Chen, Y.; Xu, W.; Ritter, A. An Empirical Study of Pre-trained Transformers for Arabic Information Extraction. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 4727–4734. [Google Scholar]
- Safaya, A. Arabic-ALBERT. arXiv 2022, arXiv:2201.07434. [Google Scholar]
- Minaee, S.; Kalchbrenner, N.; Cambria, E.; Nikzad, N.; Chenaghlu, M.; Gao, J. Deep learning-based text classification: A comprehensive review. ACM Comput. Surv. 2021, 54, 1–40. [Google Scholar] [CrossRef]
- Li, Q.; Peng, H.; Li, J.; Xia, C.; Yang, R.; Sun, L.; Yu, P.S.; He, L. A Survey on Text Classification: From Traditional to Deep Learning. ACM Trans. Intell. Syst. Technol. 2021, 37. [Google Scholar] [CrossRef]
- Virtanen, A.; Kanerva, J.; Ilo, R.; Luoma, J.; Luotolahti, J.; Salakoski, T.; Ginter, F.; Pyysalo, S. Multilingual is not enough: BERT for Finnish. arXiv 2019, arXiv:1912.07076. [Google Scholar]
- Martin, L.; Muller, B.; Suárez, P.J.O.; Dupont, Y.; Romary, L.; De La Clergerie, É.V.; Seddah, D.; Sagot, B. CamemBERT: A Tasty French Language Model. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 7203–7219. [Google Scholar]
- Ranasinghe, T.; Zampieri, M. Multilingual offensive language identification for low-resource languages. Trans. Asian Low-Resour. Lang. Inf. Processing 2021, 21, 1–13. [Google Scholar] [CrossRef]
- Jain, M.; Mathew, M.; Jawahar, C. Unconstrained scene text and video text recognition for arabic script. In Proceedings of the 2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR), Nancy, France, 3–5 April 2017; pp. 26–30. [Google Scholar]
- Himdi, H.; Weir, G.; Assiri, F.; Al-Barhamtoshy, H. Arabic fake news detection based on textual analysis. Arab. J. Sci. Eng. 2022, 1–17. [Google Scholar] [CrossRef] [PubMed]
- Statista. Leading Countries Based on Number of Twitter Users as of January 2022. Available online: https://www.statista.com/statistics/242606/number-of-active-twitter-users-in-selected-countries/ (accessed on 12 January 2022).
- Moores, B.; Mago, V. A Survey on Automated Sarcasm Detection on Twitter. arXiv 2022, arXiv:2202.02516. [Google Scholar]
- Rao, Y.; Xie, H.; Li, J.; Jin, F.; Wang, F.L.; Li, Q. Social emotion classification of short text via topic-level maximum entropy model. Inf. Manag. 2016, 53, 978–986. [Google Scholar] [CrossRef]
- Lee, J.; Yoon, W.; Kim, S.; Kim, D.; Kim, S.; So, C.H.; Kang, J. BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2019, 36, 1234–1240. [Google Scholar] [CrossRef]
- Schwartz, R.; Dodge, J.; Smith, N.A.; Etzioni, O. Green AI. Commun. ACM 2020, 63, 54–63. [Google Scholar] [CrossRef]
- Sun, C.; Shrivastava, A.; Singh, S.; Gupta, A. Revisiting unreasonable effectiveness of data in deep learning era. In Proceedings of the IEEE international conference on computer vision, Venice, Italy, 22–29 October 2017; pp. 843–852. [Google Scholar]
- Al-Maimani, M.R.; Al Naamany, A.; Bakar, A.Z.A. Arabic information retrieval: Techniques, tools and challenges. In Proceedings of the 2011 IEEE GCC Conference and Exhibition (GCC), Dubai, United Arab Emirates, 19–22 February 2011; pp. 541–544. [Google Scholar]
- Wang, Y.; Sun, Y.; Ma, Z.; Gao, L.; Xu, Y. Named Entity Recognition in Chinese Medical Literature Using Pretraining Models. Sci. Program. 2020, 2020, 8812754. [Google Scholar] [CrossRef]
- Khemakhem, I.T.; Jamoussi, S.; Hamadou, A.B. Integrating morpho-syntactic features in English-Arabic statistical machine translation. In Proceedings of the Second Workshop on Hybrid Approaches to Translation, Sofia, Bulgaria, 8 August 2013; pp. 74–81. [Google Scholar]
- Akan, M.F.; Karim, M.R.; Chowdhury, A.M.K. An analysis of Arabic-English translation: Problems and prospects. Adv. Lang. Lit. Stud. 2019, 10, 58–65. [Google Scholar] [CrossRef]
Database | Search String |
---|---|
IEEE Xplore | (“BERT” AND “Arabic” AND (“text classification” OR “sentiment analysis”)) |
ScienceDirect | (‘BERT’ AND ‘Arabic’ AND (‘text classification’ OR ‘sentiment analysis’)) |
Springer Link | “BERT” AND “Arabic” AND (“text classification” OR “sentiment analysis”) |
Taylor & Francis Online | [All: “bert”] AND [All: “arabic”] AND [[All: “text classification”] OR [All: “sentiment analysis”]] |
ACM digital library | [All: “bert”] AND [All: “arabic”] AND [[All: “text classification”] OR [All: “sentiment analysis”]] |
ProQuest journals | “BERT” AND “Arabic” AND (“text classification” OR “sentiment analysis”) |
Google Scholar | “bert” AND “arabic” AND (“text classification” OR “sentiment analysis”) |
Dimension | Criterion |
---|---|
Reporting | Is there a clear definition of the research questions, aims, and objectives? |
Reporting | Is the research context adequately described? |
Rigor | Are the research design, method, and measures clearly described? |
Rigor | Is the research design appropriate to answer the research question? |
Credibility | Is there a clear description of the study findings? |
Credibility | Has adequate data been presented to support the study findings? |
Relevance | Does the study contribute to research or practice? |
Relevance | Is the conclusion supported by the results? |
Criterion | Fully Met | Partially Met | Not Met |
---|---|---|---|
Is there a clear definition of the research questions, aims, and objectives? | 19 | 23 | 6 |
Is the research context adequately described? | 32 | 13 | 3 |
Are the research design, method, and measures clearly described? | 27 | 16 | 5 |
Is the research design appropriate to answer the research question? | 33 | 10 | 5 |
Is there a clear description of the study findings? | 41 | 7 | 0 |
Has adequate data been presented to support the study findings? | 18 | 21 | 9 |
Does the study contribute to research or practice? | 17 | 29 | 2 |
Is the conclusion supported by the results? | 27 | 13 | 8 |
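Because every criterion in the table above was assessed for the same set of included studies (each row sums to 48), the counts can be converted directly into percentages. A small sketch of that arithmetic follows; the criterion labels are abbreviated paraphrases of the table rows.

```python
# Arithmetic check on the quality-assessment table above: each row covers the
# same 48 included studies, so the counts translate into percentages.
quality_counts = {
    "Clear research questions/aims":       (19, 23, 6),
    "Research context described":          (32, 13, 3),
    "Design/method/measures described":    (27, 16, 5),
    "Design appropriate for the question": (33, 10, 5),
    "Clear description of findings":       (41, 7, 0),
    "Adequate data supporting findings":   (18, 21, 9),
    "Contribution to research/practice":   (17, 29, 2),
    "Conclusion supported by results":     (27, 13, 8),
}

for criterion, (fully, partially, not_met) in quality_counts.items():
    total = fully + partially + not_met  # 48 for every criterion
    print(f"{criterion}: {fully / total:.0%} fully met, "
          f"{(fully + partially) / total:.0%} at least partially met")
```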
Model | Articles | Feature Extraction | Fine-Tuning |
---|---|---|---|
Multilingual BERT | 31 articles [21], [22], [10], [23], [24], [25], [26], [27], [28], [29], [6], [30], [31], [32], [33], [34], [35], [36], [37], [11], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47], [48] | 2 | 29 |
AraBERT | 28 articles [49], [50], [51], [9], [52], [28], [53], [54], [55], [30], [31], [33], [56], [34], [35], [36], [57], [37], [11], [58], [39], [40], [59], [43], [60], [46], [61], [48] | 3 | 26 |
MARBERT | 9 articles [50], [62], [27], [53], [34], [43], [46], [63], [64] | 0 | 9 |
ArabicBERT | 5 articles [34], [58], [38], [40], [43] | 1 | 5 |
ARBERT | 4 articles [53], [34], [34], [43] | 0 | 4 |
XLM-RoBERTa | 4 articles [52], [25], [34], [48] | 0 | 4 |
QARiB | 4 articles [53], [34], [43], [40] | 0 | 4 |
GigaBERT | 3 articles [49], [34], [43] | 0 | 3 |
Arabic ALBERT | 1 article [34] | 0 | 1 |
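The last two columns of the table distinguish studies that used a model only as a frozen feature extractor from those that fine-tuned it end to end. A minimal sketch of the feature-extraction setting is given below; the checkpoint name, toy tweets, and the choice of logistic regression as the downstream classifier are illustrative assumptions (the surveyed studies pair the frozen embeddings with a variety of classifiers).

```python
# Minimal feature-extraction sketch: the BERT encoder stays frozen and only a
# classical classifier is trained on its [CLS] embeddings, in contrast to the
# end-to-end fine-tuning counted in the last column of the table above.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

checkpoint = "bert-base-multilingual-cased"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
encoder = AutoModel.from_pretrained(checkpoint)
encoder.eval()  # frozen: no gradient updates to BERT itself

texts = ["خدمة ممتازة", "تجربة سيئة", "منتج رائع", "توصيل متأخر"]  # toy data
labels = [1, 0, 1, 0]

with torch.no_grad():
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    # [CLS] token embedding of the last hidden layer as a sentence vector
    features = encoder(**batch).last_hidden_state[:, 0, :].numpy()

clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(clf.predict(features))
```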
Model | Outperformed | Outperformed By | Performance Score |
---|---|---|---|
Multilingual BERT (Fine-tuned) | XLM-RoBERTa, cseBERT, SVM, Random Forest Classifier, NB, Word2Vec, CNN, Majority Baseline Class, LSTM, LR, Mazajak Embeddings, hULMonA, Capsule Network, fastText + SVM, Bi-LSTM, SVM + TF-IDF, Doc2vec + MLP, LSA + SVM, Doc2vec + SVM, TFIDF + SVM, Word2vec + SVM, Voting Classifier, AraVec, AraVec + CBOW, Gated Recurrent Unit (GRU), fastText | ArabicBERT, ArabicBERT + CNN, ARBERT, Arabic ALBERT, GigaBERT, QARiB, XLM-RoBERTa, MARBERT, AraBERT, Sentiment Embeddings, Generic Word Embeddings, Surface Features, CNN, LSTM + Word Embedding, LSTM, CNN + Word Embedding, BiLSTM, AraVec, Multi-Layer Perceptron (MLP), Flair, FastText, GRU, CNN + GRU, Majority Class, SVM, AraELECTRA, AraGPT2, Mazajak + SVM, AraVec + SVM, CNN-Text, Semi-Supervised FastText, TF-IDF, Word2Vec, SVM + ValenceList, SVM + N-gram + Mazajak, C-LSTM, SVM + N-gram, Multilingual BERT + SVM, ARBERT + SVM | Highest F1 Score: 0.973 Lowest F1 Score: 0.118 Average F1 Score: 0.678
Multilingual BERT (Feature extraction) | FastText-SkipGram+ CNN, FastText-SkipGram + BiLSTM, Unigram + NB SVM LR, Word-ngrams + NB SVM LR, FastText +NB SVM LR, Random Embedding+ CNN LSTM GRU, AraVec(CBOW)+ CNN LSTM GRU, AraVec + CNN LSTM GRU, AraVec-100-CBOW, AraVec-300-SG, BiLSTM, LSTM, Doc2vec_MLP, LSA_SVM, Doc2vec_SVM, TFIDF_SVM, Word2vec_SVM, Multilingual BERT (Fine-tuned) | AraBert + CNN, FastText, Word2Vec, AraBERT (Fine-tuned) | Highest F1 Score: 0.95 Lowest F1 Score: 0.589 Average F1 Score: 0.798 |
AraBERT (Fine-tuned) | Multilingual BERT, GigaBERT, XLM-RoBERTa, QARiB, MARBERT, ArabicBERT, ARBERT, Arabic ALBERT, TF-IDF, hULMonA, FastText, AraELECTRA, AraGPT2, BOW + TF-IDF, LSTM, Majority Class, SVM, Mazajak, SVM + Mazajak, BiLSTM, AraGPT, Capsule Network, GRU, CNN, CNN-GRU, AraULMFiT, Mazajak + SVM, AraVec + SVM, fastText + SVM, AraVec + skip-gram (SG), AraVec + CBOW, Doc2vec + MLP, LSA + SVM, Doc2vec + SVM, TFIDF + SVM, Word2vec + SVM, Sentence embedding + LR classifier, TF-IDF + LR classifier, BOW + LR classifier, Multilingual BERT + SVM, ARBERT + SVM | ArabicBert, MARBERT, QARIB, ARBERT, AraBERT + Gradient Boosting Trees (GBT), Multi-Task Learning Model + MarBERT, Frenda et al. (2018) model, SVM | Highest F1 Score: 0.99 Lowest F1 Score: 0.35 Average F1 Score: 0.769 |
AraBERT (Feature extraction) | Multilingual BERT +CNN, AraBERT, GigaBERT, FastText-SkipGram+ CNN, FastText-SkipGram + BiLSTM, TF-IDF, BiLSTM, LSTM, Doc2vec_MLP, LSA_SVM, Doc2vec_SVM, TFIDF_SVM, Word2vec_SVM, Multilingual BERT (Fine-tuned), Multilingual BERT (Feature extraction) | AraBERT (Fine-tuned) | Highest F1 Score: 0.97 Lowest F1 Score: 0.606 Average F1 Score: 0.791 |
MARBERT (Fine-tuned) | AraBERT, ArabicBERT, ARBERT, Arabic ALBERT, GigaBERT, QARiB, Multilingual BERT, XLM-RoBERTa, Bi-LSTM + Mazajak, Gaussian Naive Bayes, Gated Recurrent Unit (GRU), AraELECTRA, BiLSTM, AraGPT, MTL-LSTM, MTL-CNN, CNN-CE, CNN-AraVec, Multi-headed-LSTM-CNNGRU, CNN-LSTM+ Dialect Information, Multi-headed-LSTM-CNNGRU+ TF-IDF | AraBERT, QARiB, ARBERT, AraELECTRA, AraGPT2, Weighted ensemble, Multi-Task Learning (MTL)-CNN-LSTM | Highest F1 Score: 0.934 Lowest F1 Score: 0.57 Average F1 Score: 0.710 |
ArabicBERT (Fine-tuned) | AraBert, ARBERT, Arabic ALBERT, GigaBERT, XLM-RoBERTa, Multilingual BERT, QARiB, BiLSTM, AraGPT2, AraVec + skip-gram (SG), AraVec + CBOW, TF-IDF, CNN-Text, Bi-LSTM, SVM + TF-IDF | QARiB, AraBERT, MARBERT, Arabic BERT + CNN, ARBERT, AraELECTRA | Highest F1 Score: 0.884 Lowest F1 Score: 0.53 Average F1 Score: 0.721 |
ArabicBERT (Feature extraction) | ArabicBERT, Multilingual BERT, CNN-Text, Bi-LSTM, SVM + TF-IDF | - | Highest F1 Score: 0.897 Lowest F1 Score: 0.897 Average F1 Score: 0.897 |
ARBERT (Fine-tuned) | MarBERT, Arabic ALBERT, GigaBERT, Multilingual BERT, AraBERT, ArabicBERT, BiLSTM, AraGPT2 | AraBERT, QARiB, MARBERT, ArabicBERT, XLM-RoBERTa, AraGPT2, AraELECTRA | Highest F1 Score: 0.891 Lowest F1 Score: 0.57 Average F1 Score: 0.721 |
XLM-RoBERTa (Fine-tuned) | ARBERT, Arabic ALBERT, GigaBERT, Multilingual BERT, FastText, Majority Class, BiLSTM, CNN, AraGPT2 | AraBERT, Multilingual BERT, MARBERT, ArabicBERT, QARiB, AraELECTRA | Highest F1 Score: 0.922 Lowest F1 Score: 0.399 Average F1 Score: 0.684 |
QARiB (Fine-tuned) | ARBERT, AraELECTRA, AraBERT, AraGPT2, Multilingual BERT, XLM-RoBERTa, ArabicBERT, BiLSTM, Arabic ALBERT, GigaBERT, MARBERT | AraBERT, MARBERT, ARBERT, Arabic ALBERT, AraELECTRA, XLM-RoBERTa, ArabicBERT | Highest F1 Score: 0.87 Lowest F1 Score: 0.589 Average F1 Score: 0.750 |
GigaBERT (Fine-tuned) | Multilingual BERT, AraGPT2 | AraBERT, AraBERT+ Gradient Boosting Trees (GBT), MARBERT, ArabicBERT, ARBERT, Arabic ALBERT, QARiB, XLM-RoBERTa, BiLSTM, AraELECTRA | Highest F1 Score: 0.692 Lowest F1 Score: 0.51 Average F1 Score: 0.601 |
Arabic ALBERT (Fine-tuned) | GigaBERT, Multilingual BERT, AraGPT2, | AraBERT, MARBERT, ArabicBERT, ARBERT, QARiB, XLM-RoBERTa, BiLSTM, AraELECTRA | Highest F1 Score: 0.691 Lowest F1 Score: 0.555 Average F1 Score: 0.623 |
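The performance column of the table reports F1 scores. For reference, the sketch below computes a macro-averaged F1 from gold labels and model predictions with scikit-learn; the label arrays are toy placeholders, and individual studies may report micro-, macro-, or weighted-averaged variants.

```python
# Sketch of computing a macro-averaged F1 score from gold labels and
# predictions (toy arrays; variants differ across the surveyed studies).
from sklearn.metrics import f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print(f"Macro F1: {f1_score(y_true, y_pred, average='macro'):.3f}")
```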
Type of Data | Size of Dataset | Number of Articles | Models |
---|---|---|---|
Sarcastic and non-sarcastic tweets | 12,548, 10,547, 15,548 | 13 | AraBERT, Multilingual BERT, GigaBERT, XLM-RoBERTa, MARBERT, ArabicBERT, ARBERT, Arabic ALBERT, QARiB |
Offensive and hate speech posts from social media | 20,000, 6024, 5846, 7839, 10,000, 9000, 5011, 10,828, 27,800, 13,794, 10,000 | 11 | AraBERT, Multilingual BERT |
Tweets written in Modern Standard Arabic and dialectal Arabic | 22,000, 5971, 11,112, 2479, 2,400,000, 540,000, 21,000, 10,000, 288,086 | 8 | Multilingual BERT, MARBERT, AraBERT, ArabicBERT |
Tweets about customer satisfaction | 20,000 | 1 | AraBERT |
Accurate and inaccurate tweets about COVID-19 | 4966, 3032, 8000, 4072 | 4 | AraBERT, QARiB, ARBERT, MARBERT, Multilingual BERT |
Misogynistic and non-misogynistic tweets | 6603 | 1 | AraBERT |
Spam and ham tweets | 134,222 | 1 | AraBERT |
Hateful and normal tweets | 8964, 3480, 5340 | 3 | Multilingual BERT, AraBERT |
Adult (sexual) and normal tweets | 50,000 | 1 | Multilingual BERT, AraBERT |
Positive and negative tweets | 10,000, 3962, 10,000 | 3 | AraBERT, Multilingual BERT, ArabicBERT |
Hotel and book reviews | 156,700 | 1 | AraBERT, Multilingual BERT |
Articles classified by subject | 22,429 | 1 | AraBERT, Multilingual BERT |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).