Adversarially Robust Multitask Learning for Offensive and Hate Speech Detection in Arabic Text Using Transformer-Based Models and RNN Architectures
Abstract
1. Introduction
- RQ1: How can we develop an adversarially robust multitask model that combines adversarial and multitask learning to improve the detection of Arabic hate speech and offensive language on social media platforms and resist adversarial attacks?
- RQ2: How can we evaluate various configurations of Arabic pre-trained language models combined with different sequential layers across three training settings to identify the most suitable model for detecting Arabic offensive language and hate speech?
- RQ3: Can an augmented dataset effectively improve a model’s performance in detecting Arabic offensive language and hate speech compared to a non-augmented dataset?
- RQ4: Can multitask learning effectively improve a model’s performance in detecting Arabic offensive language and hate speech compared to single-task models?
- RQ5: How can we design an effective adversarial attack strategy to generate adversarial examples that defeat the detection model?
- RQ6: How can we apply a defensive strategy against adversarial attacks to improve the robustness of the detection model?
- We propose a novel adversarial multitask learning framework that combines multitask learning with adversarial learning to enhance the robustness and generalization of Arabic offensive language and hate speech detection models;
- We augment the training dataset with a substantial number of Arabic posts collected from the X social media platform to address the class imbalance problem and improve the model’s generalizability, while preserving the test set’s natural class distribution to reflect realistic conditions;
- We conduct a comprehensive comparison of learning models by evaluating multiple combinations of Arabic pre-trained language models with various recurrent architectures trained under diverse learning settings;
- We demonstrate the effectiveness of multitask learning compared to that of single-task models by showing improved performance and better generalization across both offensive and hate speech classification tasks;
- We propose novel adversarial attack scenarios specifically designed for Arabic text, which subtly modify inputs while preserving their meaning and readability and which effectively deceive standard detection models;
- We implement and evaluate targeted defensive strategies, including adversarial training and input transformation techniques, to maintain model performance under adversarial conditions;
- We evaluate the model’s performance under real-world imbalanced conditions by maintaining the original distribution of the test dataset, offering a more realistic assessment of its robustness in practical scenarios.
2. Related Work
2.1. Studies on Arabic Offensive/Hate Speech Detection
2.2. Multitask Learning in Offensive/Hate Speech Detection
2.3. Adversarial Attack Methods for Generating Adversarial Text
2.4. Gaps in the Literature
3. Methodology
3.1. Problem Definition
3.2. Dataset and Augmentation
- Offensive language detection: each post was first assessed as either “Clean” (i.e., free of any offensive, hateful, or profane content) or “Offensive” (i.e., containing unacceptable language such as insults, profanity, threats, swear words, or any form of untargeted profanity);
- Hate speech identification: posts labeled as “Offensive” were further categorized as follows:
- Hateful posts: posts that targeted individuals or groups based on protected characteristics such as race, religion, gender, ethnicity/nationality, ideology, social class, disability, or disease;
- Offensive but not hateful posts: posts that contained profanity or general offensive language but did not target individuals based on their identity or group characteristics.
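This two-level scheme maps directly onto the two binary targets used by the multitask model in Section 3.4. The sketch below only illustrates that mapping; the tag names are hypothetical and do not necessarily match the dataset’s actual label strings.

```python
# Minimal sketch (hypothetical tag names): converting the two-level annotation
# into the (offensive, hate speech) binary targets of the multitask model.
def to_multitask_labels(annotation: str) -> tuple[int, int]:
    """Return (offensive, hate_speech) for one post.

    annotation is one of "CLEAN", "OFF_NOT_HS", or "OFF_HS";
    a post can only be hateful if it is also offensive.
    """
    mapping = {"CLEAN": (0, 0), "OFF_NOT_HS": (1, 0), "OFF_HS": (1, 1)}
    if annotation not in mapping:
        raise ValueError(f"Unknown annotation: {annotation}")
    return mapping[annotation]
```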
3.3. Data Pre-Processing
- Normalization of Arabic characters (e.g., unifying Alef variants, replacing “ة” with “ه”);
- Removal of diacritics, URLs, emojis, numbers, and non-Arabic characters;
- User mention stripping and hashtag splitting (e.g., “#حريةـالرأي” → “حرية الرأي”);
- Removal of punctuation, character elongation (e.g., “راااائع”), and repeated letters;
- Tokenization using the corresponding tokenizer for each Arabic PLM (e.g., MARBERT tokenizer) with truncation and padding to a fixed maximum sequence length.
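A minimal sketch of this pipeline is shown below, assuming regex-based normalization rules and the Hugging Face transformers tokenizer API; the exact rules and the MARBERTv2 checkpoint name are illustrative rather than the paper’s precise configuration.

```python
import re
from transformers import AutoTokenizer  # assumes the Hugging Face transformers library

ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")  # harakat, tanween, shadda, sukun, dagger alef

def preprocess(text: str) -> str:
    """Illustrative normalization of an Arabic post, approximating the steps listed above."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)                     # remove URLs
    text = re.sub(r"@\w+", " ", text)                                      # strip user mentions
    text = re.sub(r"#(\S+)",                                               # split hashtags into words
                  lambda m: m.group(1).replace("_", " ").replace("ـ", " "), text)
    text = ARABIC_DIACRITICS.sub("", text)                                 # remove diacritics
    text = re.sub(r"[إأآا]", "ا", text)                                    # unify Alef variants
    text = text.replace("ة", "ه")                                          # normalize Ta Marbuta
    text = re.sub(r"[^\u0600-\u06FF\s]", " ", text)                        # keep only Arabic-block characters
    text = re.sub(r"(.)\1{2,}", r"\1", text)                               # collapse character elongation
    return re.sub(r"\s+", " ", text).strip()

# Tokenization with the PLM's own tokenizer (MARBERTv2 shown as one example)
tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/MARBERTv2")
encoded = tokenizer(preprocess("مثال #حريةـالرأي راااائع"), truncation=True,
                    padding="max_length", max_length=256, return_tensors="pt")
```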
3.4. Model Architecture
- The offensive language detection (OFF) branch applies a pooling layer to compress the sequential output, followed by fully connected layers with ReLU activation and dropout regularization to reduce overfitting. The final output is generated through a sigmoid activation layer that produces a binary classification indicating whether the input text is offensive;
- The hate speech detection (HS) branch extends the sequence with an additional RNN layer to further refine task-specific temporal patterns. This is followed by pooling and fully connected layers with ReLU activation and dropout. Finally, a sigmoid output layer provides binary prediction for hate speech presence.
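For concreteness, a minimal PyTorch sketch of this shared-encoder, two-branch design follows; the checkpoint name, the shared GRU, mean pooling, and the layer sizes are illustrative assumptions rather than the exact configuration (Section 3.8 lists the GRU unit sizes actually used).

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class MultitaskOffHsModel(nn.Module):
    """Sketch of a shared Arabic PLM encoder with OFF and HS branches."""

    def __init__(self, plm_name: str = "UBC-NLP/MARBERTv2", hidden: int = 256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(plm_name)           # shared Arabic PLM
        dim = self.encoder.config.hidden_size
        self.shared_rnn = nn.GRU(dim, hidden, batch_first=True, bidirectional=True)
        # OFF branch: pooling -> fully connected -> sigmoid
        self.off_head = nn.Sequential(
            nn.Linear(2 * hidden, 128), nn.ReLU(), nn.Dropout(0.2), nn.Linear(128, 1))
        # HS branch: extra RNN layer to refine task-specific patterns, then pooling -> FC -> sigmoid
        self.hs_rnn = nn.GRU(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.hs_head = nn.Sequential(
            nn.Linear(2 * hidden, 128), nn.ReLU(), nn.Dropout(0.2), nn.Linear(128, 1))

    def forward(self, input_ids, attention_mask):
        seq = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        shared, _ = self.shared_rnn(seq)
        off_prob = torch.sigmoid(self.off_head(shared.mean(dim=1)))   # mean pooling over tokens
        hs_seq, _ = self.hs_rnn(shared)
        hs_prob = torch.sigmoid(self.hs_head(hs_seq.mean(dim=1)))
        return off_prob, hs_prob
```

In the multitask setting, the two heads are trained jointly, e.g., by summing their weighted binary cross-entropy losses (Section 3.5); in the single-task settings, only the relevant branch is used.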
- Setting 1: single-task learning using the original dataset
- Setting 2: single-task learning using the augmented dataset
- Setting 3: multitask learning using the augmented dataset
3.5. Handling Imbalanced Datasets and Evaluation Metrics
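The hyperparameter table in Section 3.8 lists a weighted binary cross-entropy loss and a macro F1-score metric, and the dataset tables in Section 3.2 show the imbalance they address (roughly 80/20 for OFF and 95/5 for HS). A minimal sketch of both is given below, assuming the class weight is derived from inverse class frequency; this weighting scheme is an assumption and may differ from the paper’s exact choice.

```python
import torch
import torch.nn.functional as F
from sklearn.metrics import f1_score

# Original training split (Section 3.2): 5590 NOT-OFF vs. 1410 OFF posts.
OFF_POS_WEIGHT = torch.tensor(5590 / 1410)   # up-weight the minority (offensive) class

def weighted_bce(probs: torch.Tensor, targets: torch.Tensor,
                 pos_weight: torch.Tensor) -> torch.Tensor:
    """Weighted binary cross-entropy on sigmoid outputs; `targets` holds 0.0/1.0.
    Positive samples contribute `pos_weight` times more to the loss than negatives."""
    weights = torch.where(targets == 1.0, pos_weight, torch.ones_like(targets))
    return F.binary_cross_entropy(probs, targets, weight=weights)

def macro_f1(y_true, y_pred) -> float:
    """Macro-averaged F1-score, the evaluation metric listed in Section 3.8."""
    return f1_score(y_true, y_pred, average="macro")
```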
3.6. Adversarial Sample Generation
- Insert-Space: Insert spaces between characters. For example, وقح → و ق ح;
- Delete-Char: Randomly remove characters from a word. For example, تافهه → تاهه;
- Swap-Letters: Swap adjacent characters. For example, خائن → اخئن;
- Moreover, we applied a sentence-level adversarial attack:
- Back-Translation: translate the text from Arabic to English and back to Arabic.
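These character-level attacks reduce to simple string operations. The sketch below is illustrative only: the perturbation rate and the word-selection policy are assumptions, and back-translation is merely noted in a comment because it requires an external Arabic-English translation model.

```python
import random

def insert_space(word: str) -> str:
    """Insert-Space: separate the characters of a word, e.g. وقح -> و ق ح."""
    return " ".join(word)

def delete_char(word: str) -> str:
    """Delete-Char: randomly drop one character, e.g. تافهه -> تاهه."""
    if len(word) < 2:
        return word
    i = random.randrange(len(word))
    return word[:i] + word[i + 1:]

def swap_letters(word: str) -> str:
    """Swap-Letters: swap two adjacent characters, e.g. خائن -> اخئن."""
    if len(word) < 2:
        return word
    i = random.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def perturb_post(post: str, attack, rate: float = 0.3) -> str:
    """Apply one character-level attack to a random fraction of the words in a post."""
    words = post.split()
    for i in range(len(words)):
        if random.random() < rate:
            words[i] = attack(words[i])
    return " ".join(words)

# Back-Translation (Arabic -> English -> Arabic) would rely on an external
# machine translation model or API and is therefore omitted from this sketch.
```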
3.7. Defense and Robustness Enhancement Techniques
- Adversarial training
- Input Transformation techniques
- Letter concatenation: reunite sequential characters that have been separated by spaces;
- Leetspeak conversion: if a number appears within a word and is not separated by spaces, convert it to the Arabic letter it visually substitutes (see the sketch after this list).
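The input transformation defenses can be sketched as lightweight string normalizers applied before classification; the digit-to-letter table and the single-letter heuristic below are assumptions and may differ from the exact rules used in the paper. Adversarial training, by contrast, simply re-trains the model on a mixture of clean posts and posts perturbed with the attacks of Section 3.6.

```python
import re

# Hypothetical mapping from digits commonly used as Arabizi/Leetspeak substitutes
# to the Arabic letters they visually resemble; the paper's exact table may differ.
LEET_MAP = {"2": "ء", "3": "ع", "5": "خ", "6": "ط", "7": "ح", "8": "ق", "9": "ص"}

def concatenate_letters(text: str) -> str:
    """Letter concatenation: reunite runs of two or more isolated Arabic characters
    separated by spaces, e.g. 'و ق ح' -> 'وقح' (a heuristic; legitimate runs of
    single-letter words would also be merged)."""
    return re.sub(r"\b(?:[\u0621-\u064A] )+[\u0621-\u064A]\b",
                  lambda m: m.group(0).replace(" ", ""), text)

def convert_leetspeak(text: str) -> str:
    """Leetspeak conversion: replace digits embedded inside words with the
    corresponding Arabic letters; standalone numbers are left untouched."""
    def fix_word(word: str) -> str:
        if any(ch.isdigit() for ch in word) and not word.isdigit():
            return "".join(LEET_MAP.get(ch, ch) for ch in word)
        return word
    return " ".join(fix_word(w) for w in text.split())

def defend(text: str) -> str:
    """Apply both input transformations before passing the post to the classifier."""
    return convert_leetspeak(concatenate_letters(text))
```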
3.8. Hyperparameter Settings
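Using the values reported in the hyperparameter table (AdamW, learning rate 5 × 10⁻⁴, weight decay 1 × 10⁻⁶, batch size 128, sequence length 256, up to 30 epochs, early stopping with patience 3), the training procedure can be sketched as follows. The helper names (model, the data loaders, weighted_bce, the class weights, and evaluate_macro_f1) are hypothetical and carried over from the earlier sketches.

```python
import torch

# Optimizer configured with the values from the hyperparameter table.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=1e-6)

best_f1, patience, bad_epochs = 0.0, 3, 0
for epoch in range(30):
    model.train()
    for batch in train_loader:                       # batches of 128 posts, 256 tokens each
        optimizer.zero_grad()
        off_prob, hs_prob = model(batch["input_ids"], batch["attention_mask"])
        loss = (weighted_bce(off_prob.squeeze(-1), batch["off_label"].float(), OFF_POS_WEIGHT)
                + weighted_bce(hs_prob.squeeze(-1), batch["hs_label"].float(), HS_POS_WEIGHT))
        loss.backward()
        optimizer.step()

    dev_f1 = evaluate_macro_f1(model, dev_loader)    # hypothetical helper on the dev split
    if dev_f1 > best_f1:
        best_f1, bad_epochs = dev_f1, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                   # early stopping with patience = 3
            break
```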
4. Results and Discussion
4.1. Comparison of Model Configurations
- Setting 1: single-task models trained on the original dataset.
- Setting 2: single-task models trained with the augmented dataset.
- Setting 3: multitask models trained with the augmented dataset.
4.2. Final Evaluation on Clean vs. Adversarial Data
4.3. Effectiveness of Adversarial Training
5. Conclusions and Future Work
- We will develop more sophisticated adversarial attack and defense mechanisms at multiple linguistic levels. At the word level, we will explore context-aware word substitution attacks using masked language models and synonym replacement strategies, as well as defenses based on embedding consistency and lexical similarity. At the sentence level, we aim to implement paraphrase-based attacks, generating semantically equivalent but structurally varied sentences to challenge models’ generalization. These methods will allow for a more thorough evaluation of the model’s robustness under realistic and diverse adversarial scenarios;
- We intend to extend the framework to multi-label or multi-lingual classification tasks. We plan to adapt our multitask model to handle overlapping labels more effectively and evaluate it across multiple dialects or languages by incorporating multilingual pretrained language models such as XLM-R or mBERT. This will help generalize robustness strategies beyond Arabic;
- We will integrate character-level or sub-word-level neural components, such as CNNs or byte-pair encoding (BPE), before or within the encoder layers. These components will allow the model to better capture morphological variations and defend against character-level perturbations, which are particularly prevalent in noisy Arabic social media text. We plan to compare the performance of CNN-enhanced models against pure transformer-based approaches to assess their gains in robustness and generalization.
6. Limitations
- Dialect and domain generalization: the model, trained on Arabic posts from the X social media platform, may not generalize well to other domains (e.g., forums, news comments) or less-represented Arabic dialects, which limits its applicability across various text sources and societal contexts;
- Adversarial attack coverage: our evaluation focused on specific perturbation types (e.g., keyboard typos, diacritics, equivalent Leetspeak) but did not consider more advanced adversarial attacks such as word substitutions, which could degrade model performance;
- Limited linguistic context: despite using MARBERTv2 embeddings, the model may struggle with deep semantic nuances, sarcasm, or implicit hate speech, which require broader discourse or world knowledge for accurate detection.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Meaning
---|---
OFF | Offensive Language
HS | Hate Speech
MTL | Multitask Learning
MSA | Modern Standard Arabic
PLMs | Pre-trained Language Models
RNN | Recurrent Neural Network
GRU | Gated Recurrent Unit
BiGRU | Bidirectional Gated Recurrent Unit
LSTM | Long Short-Term Memory
BiLSTM | Bidirectional Long Short-Term Memory
CNN | Convolutional Neural Network
ARABERT | Arabic BERT Language Model
Qarib | An Arabic Pre-trained Language Model
NLP | Natural Language Processing
References
- Chen, Y.; Zhou, Y.; Zhu, S.; Xu, H. Detecting offensive language in social media to protect adolescent online safety. In Proceedings of the 2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing, Amsterdam, The Netherlands, 3–5 September 2012; pp. 71–80. [Google Scholar]
- Shende, S.B.; Deshpande, L. A computational framework for detecting offensive language with support vector machine in social communities. In Proceedings of the 2017 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Delhi, India, 3–5 July 2017; pp. 1–4. [Google Scholar]
- Aldjanabi, W.; Dahou, A.; Al-qaness, M.A.; Abd Elaziz, M.; Helmi, A.M.; Damaševičius, R. Arabic Offensive and Hate Speech Detection Using a Cross-Corpora Multi-Task Learning Model. In Informatics; Multidisciplinary Digital Publishing Institute: Basel, Switzerland, 2021; Volume 8, p. 69. [Google Scholar]
- Vogel, I.; Regev, R. FraunhoferSIT at GermEval 2019: Can Machines Distinguish Between Offensive Language and Hate Speech? Towards a Fine-Grained Classification. In Proceedings of the 15th Conference on Natural Language Processing (KONVENS 2019), KONVENS, Erlangen, Germany, 9–11 October 2019; pp. 315–319. Available online: https://www.researchgate.net/profile/Inna-Vogel-2/publication/336373536_FraunhoferSIT_at_GermEval_2019_Can_Machines_Distinguish_Between_Offensive_Language_and_Hate_Speech_Towards_a_Fine-Grained_Classification/links/5d9eb6c292851cce3c910f74/FraunhoferSIT-at-GermEval-2019-Can-Machines-Distinguish-Between-Offensive-Language-and-Hate-Speech-Towards-a-Fine-Grained-Classification.pdf. (accessed on 10 August 2025).
- Haddad, B.; Orabe, Z.; Al-Abood, A.; Ghneim, N. Arabic Offensive Language Detection with Attention-based Deep Neural Networks. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, Marseille, France, 11–16 May 2020; pp. 76–81. [Google Scholar]
- Alshalan, R.; Al-Khalifa, H. A Deep Learning Approach for Automatic Hate Speech Detection in the Saudi Twittersphere. Appl. Sci. 2020, 10, 8614. [Google Scholar] [CrossRef]
- Wiedemann, G.; Ruppert, E.; Jindal, R.; Biemann, C. Transfer learning from lda to bilstm-cnn for offensive language detection in twitter. arXiv 2018, arXiv:1811.02906. [Google Scholar]
- Zampieri, M.; Malmasi, S.; Nakov, P.; Rosenthal, S.; Farra, N.; Kumar, R. Predicting the Type and Target of Offensive Posts in Social Media. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1. [Google Scholar]
- Schmidt, A.; Wiegand, M. A survey on hate speech detection using natural language processing. In Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media; Association for Computational Linguistics: Valencia, Spain, 2017; pp. 1–10. [Google Scholar]
- Yuan, L.; Rizoiu, M.-A. Generalizing hate speech detection using multi-task learning: A case study of political public figures. Comput. Speech Lang. 2025, 89, 101690. [Google Scholar] [CrossRef]
- ElSherief, M.; Ziems, C.; Muchlinski, D.; Anupindi, V.; Seybolt, J.; De Choudhury, M.; Yang, D. Latent hatred: A benchmark for understanding implicit hate speech. arXiv 2021, arXiv:2109.05322. [Google Scholar] [CrossRef]
- Mao, C.; Gupta, A.; Nitin, V.; Ray, B.; Song, S.; Yang, J.; Vondrick, C. Multitask learning strengthens adversarial robustness. In Proceedings of the European Conference on Computer Vision, Montreal, QC, Canada, 11 October 2021; Springer: Berlin/Heidelberg, Germany, 2021; pp. 158–174. [Google Scholar]
- Alshemali, B.; Kalita, J. Improving the reliability of deep neural networks in NLP: A review. Knowl.-Based Syst. 2020, 191, 105210. [Google Scholar] [CrossRef]
- Huq, A.; Pervin, M. Adversarial attacks and defense on texts: A survey. arXiv 2020, arXiv:2005.14108. [Google Scholar] [CrossRef]
- Gröndahl, T.; Pajola, L.; Juuti, M.; Conti, M.; Asokan, N. All You Need is “Love”: Evading Hate Speech Detection. In Proceedings of the 11th ACM Workshop on Artificial Intelligence and Security, New York, NY, USA, 15–19 October 2018; pp. 2–12. [Google Scholar]
- Zhang, Y.; Yang, Q. A survey on multi-task learning. IEEE Trans. Knowl. Data Eng. 2021, 34, 5586–5609. [Google Scholar] [CrossRef]
- Crawshaw, M. Multi-task learning with deep neural networks: A survey. arXiv 2020, arXiv:2009.09796. [Google Scholar] [CrossRef]
- Duwairi, R.; Hayajneh, A.; Quwaider, M. A Deep Learning Framework for Automatic Detection of Hate Speech Embedded in Arabic Tweets. Arab. J. Sci. Eng. 2021, 46, 4001–4014. [Google Scholar] [CrossRef]
- Husain, F.; Uzuner, O. A survey of offensive language detection for the arabic language. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2021, 20, 1–44. [Google Scholar] [CrossRef]
- Al-Hassan, A.; Al-Dossari, H. Detection of hate speech in social networks: A survey on multilingual corpus. In Proceedings of the 6th International Conference on Computer Science and Information Technology, Zurich, Switzerland, 23–24 November 2019; Volume 10. [Google Scholar]
- Boulouard, Z.; Ouaissa, M.; Ouaissa, M.; Krichen, M.; Almutiq, M.; Gasmi, K. Detecting Hateful and Offensive Speech in Arabic Social Media Using Transfer Learning. Appl. Sci. 2022, 12, 12823. [Google Scholar] [CrossRef]
- Husain, F.; Uzuner, O. Transfer Learning Across Arabic Dialects for Offensive Language Detection. In Proceedings of the 2022 International Conference on Asian Language Processing (IALP), Singapore, 27–28 October 2022; pp. 196–205. [Google Scholar]
- Elzayady, H.; Mohamed, M.S.; Badran, K.M.; Salama, G.I. A hybrid approach based on personality traits for hate speech detection in Arabic social media. Int. J. Electr. Comput. Eng. 2023, 13, 1979. [Google Scholar]
- Mohamed, M.S.; Elzayady, H.; Badran, K.M.; Salama, G.I. An efficient approach for data-imbalanced hate speech detection in Arabic social media. J. Intell. Fuzzy Syst. 2023, 45, 6381–6390. [Google Scholar] [CrossRef]
- Al-Dabet, S.; ElMassry, A.; Alomar, B.; Alshamsi, A. Transformer-based arabic offensive speech detection. In Proceedings of the 2023 International Conference on Emerging Smart Computing and Informatics (ESCI), Pune, India, 1–3 March 2023; pp. 1–6. [Google Scholar]
- Mazari, A.C.; Kheddar, H. Deep learning-based analysis of Algerian dialect dataset targeted hate speech, offensive language and cyberbullying. Int. J. Comput. Digit. Syst. 2023, 13, 965–972. [Google Scholar] [CrossRef]
- Al-Ibrahim, R.M.; Ali, M.Z.; Najadat, H.M. Detection of hateful social media content for Arabic language. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2023, 22, 1–26. [Google Scholar] [CrossRef]
- AlSukhni, E.; AlAzzam, I.; Hanandeh, S. Offensive Language Detection of Arabic Tweets Using Deep Learning Algorithm. In Proceedings of the 2024 15th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 13–15 August 2024; pp. 1–6. [Google Scholar]
- Mazari, A.C.; Benterkia, A.; Takdenti, Z. Advancing offensive language detection in Arabic social media: A BERT-based ensemble learning approach. Soc. Netw. Anal. Min. 2024, 14, 186. [Google Scholar] [CrossRef]
- Mousa, A.; Shahin, I.; Nassif, A.B.; Elnagar, A. Detection of Arabic offensive language in social media using machine learning models. Intell. Syst. Appl. 2024, 22, 200376. [Google Scholar] [CrossRef]
- Khairy, M.; Mahmoud, T.M.; Omar, A.; Abd El-Hafeez, T. Comparative performance of ensemble machine learning for Arabic cyberbullying and offensive language detection. Lang. Resour. Eval. 2024, 58, 695–712. [Google Scholar] [CrossRef]
- Mnassri, K.; Rajapaksha, P.; Farahbakhsh, R.; Crespi, N. Hate speech and offensive language detection using an emotion-aware shared encoder. In Proceedings of the ICC 2023-IEEE International Conference on Communications, Rome, Italy, 28 May–1 June 2023; pp. 2852–2857. [Google Scholar]
- Farha, I.A.; Magdy, W. Multitask learning for arabic offensive language and hate-speech detection. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, Marseille, France, 11–16 May 2020; pp. 86–90. [Google Scholar]
- Mulki, H.; Ghanem, B. Let-mi: An Arabic levantine twitter dataset for Misogynistic language. arXiv 2021, arXiv:2103.10195. [Google Scholar] [CrossRef]
- Shapiro, A.; Khalafallah, A.; Torki, M. AlexU-AIC at Arabic Hate Speech 2022: Contrast to Classify. arXiv 2022, arXiv:2207.08557. [Google Scholar]
- Alrashidi, B.; Jamal, A.; Alkhathlan, A. Abusive content detection in arabic tweets using multi-task learning and transformer-based models. Appl. Sci. 2023, 13, 5825. [Google Scholar] [CrossRef]
- AlKhamissi, B.; Diab, M. Meta ai at arabic hate speech 2022: Multitask learning with self-correction for hate speech classification. arXiv 2022, arXiv:2205.07960. [Google Scholar]
- Djandji, M.; Baly, F.; Antoun, W.; Hajj, H. Multi-task learning using AraBert for offensive language detection. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, Marseille, France, 11–16 May 2020; pp. 97–101. [Google Scholar]
- Kapil, P.; Ekbal, A. A Unified Multi-Task Learning Architecture for Hate Detection Leveraging User-Based Information. arXiv 2024, arXiv:2411.06855. [Google Scholar]
- Dai, W.; Yu, T.; Liu, Z.; Fung, P. Kungfupanda at semeval-2020 task 12: Bert-based multi-task learning for offensive language detection. arXiv 2020, arXiv:2004.13432. [Google Scholar]
- Kapil, P.; Ekbal, A. Leveraging multi-domain, heterogeneous data using deep multitask learning for hate speech detection. arXiv 2021, arXiv:2103.12412. [Google Scholar] [CrossRef]
- Halat, S.; Plaza-Del-Arco, F.M.; Padó, S.; Klinger, R. Multi-Task Learning with Sentiment, Emotion, and Target Detection to Recognize Hate Speech and Offensive Language. arXiv 2022, arXiv:2109.10255. [Google Scholar] [CrossRef]
- Plaza-Del-Arco, F.M.; Molina-González, M.D.; Ureña-López, L.A.; Martín-Valdivia, M.T. A multi-task learning approach to hate speech detection leveraging sentiment analysis. IEEE Access 2021, 9, 112478–112489. [Google Scholar] [CrossRef]
- Zampieri, M.; Ranasinghe, T.; Sarkar, D.; Ororbia, A. Offensive language identification with multi-task learning. J. Intell. Inf. Syst. 2023, 60, 613–630. [Google Scholar] [CrossRef]
- Jia, R.; Liang, P. Adversarial examples for evaluating reading comprehension systems. arXiv 2017, arXiv:1707.07328. [Google Scholar] [CrossRef]
- Liang, B.; Li, H.; Su, M.; Bian, P.; Li, X.; Shi, W. Deep text classification can be fooled. arXiv 2017, arXiv:1704.08006. [Google Scholar]
- Ebrahimi, J.; Rao, A.; Lowd, D.; Dou, D. Hotflip: White-box adversarial examples for text classification. arXiv 2017, arXiv:1712.06751. [Google Scholar]
- Belinkov, Y.; Bisk, Y. Synthetic and natural noise both break neural machine translation. arXiv 2017, arXiv:1711.02173. [Google Scholar]
- Gao, J.; Lanchantin, J.; Soffa, M.L.; Qi, Y. Black-box generation of adversarial text sequences to evade deep learning classifiers. In Proceedings of the 2018 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 24 May 2018; pp. 50–56. [Google Scholar]
- Li, J.; Ji, S.; Du, T.; Li, B.; Wang, T. Textbugger: Generating adversarial text against real-world applications. arXiv 2018, arXiv:1812.05271. [Google Scholar]
- Ren, S.; Deng, Y.; He, K.; Che, W. Generating natural language adversarial examples through probability weighted word saliency. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1085–1097. [Google Scholar]
- Alshemali, B.; Kalita, J. Adversarial examples in arabic. In Proceedings of the 2019 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 5–7 December 2019; pp. 371–376. [Google Scholar]
- Jin, D.; Jin, Z.; Zhou, J.T.; Szolovits, P. Is bert really robust? a strong baseline for natural language attack on text classification and entailment. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 8018–8025. [Google Scholar]
- Li, L.; Zhu, Z.; Du, D.; Ren, S.; Zheng, Y.; Chang, G. Adversarial Convolutional Neural Network for Text Classification. In Proceedings of the 2020 4th International Conference on Electronic Information Technology and Computer Engineering, Xiamen, China, 18–20 October 2020; pp. 692–696. [Google Scholar]
- Garg, S.; Ramakrishnan, G. Bae: Bert-based adversarial examples for text classification. arXiv 2020, arXiv:2004.01970. [Google Scholar] [CrossRef]
- Mubarak, H.; Rashed, A.; Darwish, K.; Samih, Y.; Abdelali, A. Arabic offensive language on twitter: Analysis and experiments. arXiv 2020, arXiv:2004.02192. [Google Scholar]
- Mulki, H.; Haddad, H.; Ali, C.B.; Alshabani, H. L-hsab: A levantine twitter dataset for hate speech and abusive language. In Proceedings of the Third Workshop on Abusive Language Online, Florence, Italy, 1 August 2019; pp. 111–118. [Google Scholar]
- Haddad, H.; Mulki, H.; Oueslati, A. T-hsab: A tunisian hate speech and abusive dataset. In Proceedings of the International Conference on Arabic Language Processing, Nancy, France, 16–17 October 2019; Springer: Berlin/Heidelberg, Germany, 2019; pp. 251–263. [Google Scholar]
- Hassan, S.; Samih, Y.; Mubarak, H.; Abdelali, A.; Rashed, A.; Chowdhury, S.A. ALT submission for OSACT shared task on offensive language detection. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, Marseille, France, 11–16 May 2020; pp. 61–65. [Google Scholar]
Reference | Year | Data Source | Technique | Performance Metric
---|---|---|---|---
[21] | 2022 | Twitter, YouTube | Transfer learning models based on BERT, mBERT, and AraBERT | Acc = 98%, F1 = 98%, P = 98%, R = 98%
[22] | 2022 | Different social media | AraBERT | Acc = 0.87, F1 = 0.87, P = 0.86, R = 0.87 on the L-HSAB dataset
[23] | 2023 | - | AraBERT | mF1 = 82.3%
[24] | 2023 | - | Ensemble model of MARBERTv2 variations | mF1 = 91.6%
[25] | 2023 | - | CAMeLBERT | F1 = 83.6%, Acc = 87.15%
[26] | 2023 | Different social media | RF, NB, LR, SGD, linear SVC, CNN, LSTM, GRU, Bi-LSTM, and Bi-GRU | F1 = 75.8%, Acc = 73.6%
[27] | 2023 | - | Enhanced BiLSTM and a modified CNN | mF1 = 92%, Acc = 92.2%
[28] | 2024 | - | RNN+LSTM, RNN+BiLSTM, and SVM | Acc = 95.6%
[29] | 2024 | - | Ensemble of different BERT structures | F1 = 90.97%
[30] | 2024 | - | Combination of ArabicBERT with BiLSTM and RBF | Acc = 98.4%, F1 = 98.4%, P = 98.2%, R = 92.8%
[31] | 2024 | Different social media | KNN, LR, linear SVC, and a voting-based ensemble | Acc = 71.1%, 76.7%, and 98.5% across three datasets
Split | NOT-OFF | OFF | NOT-HS | HS
---|---|---|---|---
Train | 5590 | 1410 | 6639 | 361
Dev | 821 | 179 | 956 | 44
Test | 1598 | 402 | 1899 | 101
Total | 8009 | 1991 | 9494 | 506
% | 80% | 20% | 95% | 5%
Source | NOT-OFF | OFF | NOT-HS | HS
---|---|---|---|---
OSACT2020 Train | 5590 | 1410 | 6639 | 361
L-HSAB | - | 374 | - | 374
T-HSAB | - | 1068 | - | 1068
Collected from X | - | 4136 | - | 3787
Total | 5590 | 6988 | 6988 | 5590
% | 44% | 56% | 56% | 44%
Hyperparameter | Value
---|---
Optimizer | AdamW
Learning rate | 5 × 10⁻⁴
Weight decay | 1 × 10⁻⁶
Batch size | 128
Sequence length | 256 tokens
Epochs | 30
Dropout rates | 0.2, 0.5
GRU units | 512, 256, 128
Activation functions | ReLU, Sigmoid
Loss function | Weighted binary cross-entropy
Evaluation metric | Macro F1-score
Early stopping | Patience = 3
Model | HS P | HS R | HS mF1 | OFF P | OFF R | OFF mF1
---|---|---|---|---|---|---
MARBERTv2+BiGRU | 89 | 81 | 85 | 93 | 92 | 93
MARBERTv2+GRU | 86 | 84 | 85 | 93 | 92 | 93
MARBERTv2+LSTM | 89 | 78 | 83 | 94 | 91 | 92
MARBERTv2+BiLSTM | 91 | 77 | 82 | 93 | 92 | 92
CAMeLBERT-DA+BiGRU | 85 | 71 | 76 | 92 | 88 | 90
CAMeLBERT-DA+GRU | 85 | 72 | 77 | 92 | 87 | 90
CAMeLBERT-DA+LSTM | 80 | 69 | 73 | 91 | 88 | 90
CAMeLBERT-DA+BiLSTM | 86 | 64 | 70 | 91 | 88 | 90
AraBERTv2-Twitter+BiGRU | 85 | 83 | 84 | 93 | 92 | 92
AraBERTv2-Twitter+GRU | 85 | 80 | 83 | 93 | 91 | 92
AraBERTv2-Twitter+LSTM | 88 | 76 | 81 | 93 | 90 | 91
AraBERTv2-Twitter+BiLSTM | 85 | 81 | 83 | 92 | 92 | 92
Qarib+BiGRU | 86 | 81 | 83 | 94 | 91 | 92
Qarib+GRU | 90 | 81 | 85 | 93 | 91 | 92
Qarib+LSTM | 85 | 72 | 77 | 93 | 91 | 92
Qarib+BiLSTM | 86 | 79 | 82 | 93 | 91 | 92
Model | HS P | HS R | HS mF1 | OFF P | OFF R | OFF mF1
---|---|---|---|---|---|---
MARBERTv2+BiGRU | 87 | 85 | 86 | 93 | 91 | 92
MARBERTv2+GRU | 87 | 85 | 86 | 93 | 92 | 93
MARBERTv2+LSTM | 84 | 83 | 84 | 93 | 91 | 92
MARBERTv2+BiLSTM | 84 | 86 | 85 | 93 | 91 | 92
CAMeLBERT-DA+BiGRU | 85 | 77 | 80 | 93 | 88 | 90
CAMeLBERT-DA+GRU | 83 | 83 | 83 | 93 | 89 | 91
CAMeLBERT-DA+LSTM | 83 | 79 | 81 | 93 | 88 | 90
CAMeLBERT-DA+BiLSTM | 89 | 81 | 84 | 92 | 88 | 90
AraBERTv2-Twitter+BiGRU | 82 | 84 | 83 | 94 | 92 | 93
AraBERTv2-Twitter+GRU | 82 | 84 | 83 | 93 | 92 | 92
AraBERTv2-Twitter+LSTM | 82 | 83 | 83 | 92 | 92 | 92
AraBERTv2-Twitter+BiLSTM | 83 | 84 | 84 | 91 | 92 | 91
Qarib+BiGRU | 86 | 86 | 86 | 94 | 90 | 92
Qarib+GRU | 86 | 84 | 85 | 93 | 90 | 92
Qarib+LSTM | 82 | 82 | 82 | 93 | 90 | 91
Qarib+BiLSTM | 84 | 81 | 82 | 93 | 90 | 91
Model | HS P | HS R | HS mF1 | OFF P | OFF R | OFF mF1
---|---|---|---|---|---|---
MARBERTv2+BiGRU | 88 | 88 | 88 | 93 | 92 | 93
MARBERTv2+GRU | 87 | 85 | 86 | 92 | 92 | 92
MARBERTv2+LSTM | 86 | 83 | 84 | 92 | 92 | 92
MARBERTv2+BiLSTM | 88 | 83 | 85 | 91 | 92 | 92
CAMeLBERT-DA+BiGRU | 85 | 82 | 84 | 92 | 90 | 91
CAMeLBERT-DA+GRU | 84 | 79 | 81 | 93 | 89 | 91
CAMeLBERT-DA+LSTM | 80 | 81 | 81 | 91 | 88 | 90
CAMeLBERT-DA+BiLSTM | 85 | 79 | 81 | 92 | 88 | 90
AraBERTv2-Twitter+BiGRU | 82 | 84 | 83 | 94 | 92 | 93
AraBERTv2-Twitter+GRU | 83 | 83 | 83 | 93 | 92 | 92
AraBERTv2-Twitter+LSTM | 87 | 78 | 82 | 93 | 91 | 92
AraBERTv2-Twitter+BiLSTM | 82 | 84 | 83 | 93 | 92 | 92
Qarib+BiGRU | 83 | 87 | 85 | 94 | 92 | 93
Qarib+GRU | 86 | 84 | 85 | 93 | 91 | 92
Qarib+LSTM | 84 | 83 | 84 | 91 | 91 | 91
Qarib+BiLSTM | 85 | 82 | 83 | 93 | 91 | 92