Exploring GPT-4 Capabilities in Generating Paraphrased Sentences for the Arabic Language
Abstract
1. Introduction
2. Paraphrasing Theory and Models
2.1. Fundamental Concepts and Theoretical Foundations of Paraphrasing
2.2. AI-Based Methods for Paraphrasing
3. Methodology
3.1. Data Selection
3.2. Data Processing
3.3. Modeling
3.4. Evaluation
3.5. Semantic Similarity
3.6. Quality Ranking Framework for Paraphrasing
Algorithm 1: A quality ranking algorithm for each paraphrasing pair.
1: // Calculate the score of paraphrasing quality
2: // Input: two sentences, the original text and the paraphrased text
3: // Output: the final score of the paraphrasing quality level
4: Start
5: Ori_tokens ← word tokenization of the original sentence
6: Par_tokens ← word tokenization of the paraphrased sentence
7: Ori_vector ← AraBERT embedding of the original sentence
8: Par_vector ← AraBERT embedding of the paraphrased sentence
9: BLEU_1 ← calculate BLEU for Ori_tokens and Par_tokens
10: ROUGE_1_recall ← calculate ROUGE-recall for Ori_tokens and Par_tokens
11: ROUGE_1_precision ← calculate ROUGE-precision for Ori_tokens and Par_tokens
12: ROUGE_1_F1 ← calculate ROUGE-F1 for Ori_tokens and Par_tokens
13: LD_avg ← calculate the average LD score for Ori_tokens and Par_tokens
14: Jaccard_sim ← calculate Jaccard similarity for Ori_tokens and Par_tokens
15: Cosine_sim ← calculate cosine similarity between Ori_vector and Par_vector
16: Euclidean_sim ← calculate Euclidean similarity between Ori_vector and Par_vector
17: BLEU_threshold ← values in the range [0.4, 1) // 1 is excluded
18: ROUGE_recall_threshold ← values in the range [0.6, 1) // 1 is excluded
19: ROUGE_precision_threshold ← values in the range [0.5, 1) // 1 is excluded
20: ROUGE_F1_threshold ← values in the range [0.5, 1) // 1 is excluded
21: Jaccard_threshold ← values in the range [0.5, 1) // 1 is excluded
22: Cosine_threshold ← values in the range [0.8, 1) // 1 is excluded
23: Euclidean_threshold ← values in the range (0, 3.5] // 0 is excluded
24: Check the threshold for every metric:
25: If eight, seven, or six thresholds are achieved ← set the pair's quality level to Level-3
26: If five, four, or three thresholds are achieved ← set the pair's quality level to Level-2
27: If one or two thresholds are achieved ← set the pair's quality level to Level-1
28: Else ← set the pair's quality level to Level-0
29: Return quality level
30: Stop.
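The threshold-and-counting logic of Algorithm 1 can be sketched as follows. This is a minimal sketch that assumes the metric values have already been computed; the listing states thresholds for seven of the eight metrics (no LD threshold is given), so the sketch counts only those seven, and the function and key names are illustrative rather than the authors' implementation.

```python
def quality_level(scores: dict) -> int:
    """Count how many of the listed metric thresholds a paraphrase pair
    satisfies and map the count to a quality level (Level-0 .. Level-3)."""
    checks = [
        0.4 <= scores["bleu_1"] < 1.0,             # BLEU threshold
        0.6 <= scores["rouge_1_recall"] < 1.0,     # ROUGE-recall threshold
        0.5 <= scores["rouge_1_precision"] < 1.0,  # ROUGE-precision threshold
        0.5 <= scores["rouge_1_f1"] < 1.0,         # ROUGE-F1 threshold
        0.5 <= scores["jaccard"] < 1.0,            # Jaccard threshold
        0.8 <= scores["cosine"] < 1.0,             # cosine threshold
        0.0 < scores["euclidean"] <= 3.5,          # Euclidean threshold
    ]
    achieved = sum(checks)
    if achieved >= 6:   # "eight, seven, or six" in the listing
        return 3
    if achieved >= 3:   # five, four, or three
        return 2
    if achieved >= 1:   # one or two
        return 1
    return 0
```

A pair passing nearly all thresholds lands in Level-3, while a pair that only happens to fall within the Euclidean range, say, is still separated from pairs that satisfy nothing at all.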
4. Results and Discussion
4.1. Human Evaluation
4.2. Regular NLP Metrics
4.3. Corpus Semantic Similarity
Category | Cosine Similarity | Euclidean Distance | Jaccard Similarity |
---|---|---|---|
Culture | 0.89 | 3.21 | 0.33 |
Sport | 0.89 | 3.1 | 0.35 |
Technology | 0.89 | 3.16 | 0.36 |
Average | 0.89 | 3.16 | 0.35 |
4.4. Semantic Similarity for Paraphrase Pairing and Non-Pairing
4.5. Results of Quality Ranking Framework for Paraphrasing
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A
Text Category | Sentence A | Sentence B | How Far Does Sentence B Maintain the Meaning of Sentence A? | How Far Does Sentence B Look Natural and Fluent in Terms of Language and Sentence Structure? | Does Sentence B Follow Standard Grammatical Arabic Rules? | How Far Has Sentence B Been Restructured Compared to Sentence A? | Have Some of the Words in Sentence B Been Replaced With Different Forms of the Word in Sentence A? | Have Some Words in Sentence B Been Replaced With Synonyms in Sentence A? | Considering All the Previous Criteria, How Would You Assess the Overall Quality of Sentence B? |
ثقافة | وأضافت سمو الشيخة بدور أن التجربة لقيت إقبالاً كبيراً من طرف الجمهور الذي تلقف كتب الدار بحفاوة، وقد نتج عن ذلك أن حصدت الدار مجموعة من الجوائز ووقعت عقوداً لترجمة عدد من إصداراتها إلى اللغات الأخرى. | أكدت سمو الشيخة بدور أن التجربة شهدت إقبالاً كبيراً من الجمهور، مما أسفر عن حصد الدار لعدد من الجوائز وتوقيع عقود لترجمة إصداراتها إلى لغات أخرى. | 3: Excellent or applicable 2: Good 1: Poor 0: Very poor or not applicable | 3: Excellent or applicable 2: Good 1: Poor 0: Very poor or not applicable | 3: Excellent or applicable 2: Good 1: Poor 0: Very poor or not applicable | 3: Excellent or applicable 2: Good 1: Poor 0: Very poor or not applicable | 3: Excellent or applicable 2: Good 1: Poor 0: Very poor or not applicable | 3: Excellent or applicable 2: Good 1: Poor 0: Very poor or not applicable | 3: Excellent or applicable 2: Good 1: Poor 0: Very poor or not applicable |
تقنية | ويعمل البرنامجان الجديدان بخاصية المبادرة إلى توفير الحماية وبشكل مستمر على تأمين ومراقبة عمل الحاسوب لاكتشاف أي نوع من أنواع التهديدات الأمنية لمنعها من أي نشاط تخريبي. | يعمل البرنامجان الجديدان بشكل مستمر لتوفير الحماية وتأمين ومراقبة عمل الحاسوب لاكتشاف ومنع أي تهديدات أمنية. | 3: Excellent or applicable 2: Good 1: Poor 0: Very poor or not applicable | 3: Excellent or applicable 2: Good 1: Poor 0: Very poor or not applicable | 3: Excellent or applicable 2: Good 1: Poor 0: Very poor or not applicable | 3: Excellent or applicable 2: Good 1: Poor 0: Very poor or not applicable | 3: Excellent or applicable 2: Good 1: Poor 0: Very poor or not applicable | 3: Excellent or applicable 2: Good 1: Poor 0: Very poor or not applicable | 3: Excellent or applicable 2: Good 1: Poor 0: Very poor or not applicable |
References
- Thanaki, J. Python Natural Language Processing: Explore NLP with Machine Learning and Deep Learning Techniques for Natural Language Processing; Packt Publishing: Birmingham, UK, 2017. [Google Scholar]
- Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach, Global ed; Pearson: London, UK, 2021. [Google Scholar]
- Ayanouz, S.; Abdelhakim, B.A.; Benhmed, M. A smart chatbot architecture based NLP and machine learning for health care assistance. In Proceedings of the 3rd International Conference on Networking, Information Systems & Security, Marrakech, Morocco, 31 March–2 April 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 1–6. [Google Scholar]
- Burgueño, L.; Clarisó, R.; Gérard, S.; Li, S.; Cabot, J. An NLP-Based Architecture for the Autocompletion of Partial Domain Models. In Advanced Information System Engineering; Springer: Berlin/Heidelberg, Germany, 2021; pp. 91–106. [Google Scholar]
- Lende, S.P.; Raghuwanshi, M.M. Question answering system on education acts using NLP techniques. In Proceedings of the 2016 World Conference on Futuristic Trends in Research and Innovation for Social Welfare, Coimbatore, India, 29 February–1 March 2016. [Google Scholar] [CrossRef]
- Li, Z.; Jiang, X.; Shang, L.; Li, H. Paraphrase generation with deep reinforcement learning. arXiv 2017, arXiv:1711.00279. [Google Scholar]
- Zhou, J.; Bhat, S. Paraphrase generation: A survey of the state of the art. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 5075–5086. [Google Scholar]
- Singh, R.; Singh, S. Text Similarity Measures in News Articles by Vector Space Model Using NLP. J. Inst. Eng. Ser. B 2021, 102, 329–338. [Google Scholar] [CrossRef]
- Kotu, V.; Deshpande, B. Data Science Concepts and Practice, 2nd ed.; Morgan Kaufmann: Cambridge, UK, 2019. [Google Scholar]
- Refai, D.; Abo-Soud, S.; Abdel-Rahman, M. Data Augmentation using Transformers and Similarity Measures for Improving Arabic Text Classification. IEEE Access 2023, 11, 132516–132531. [Google Scholar] [CrossRef]
- McKeown, K. Focus constraints on language generation. In Proceedings of the Eighth International Joint Conference on Artificial Intelligence—Volume 1, Karlsruhe, Germany, 8–12 August 1983; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1983. [Google Scholar]
- Egonmwan, E.; Chali, Y. Transformer and seq2seq model for Paraphrase Generation. In Proceedings of the 3rd Workshop on Neural Generation and Translation; Birch, A., Finch, A., Hayashi, H., Konstas, I., Luong, T., Neubig, G., Oda, Y., Sudoh, K., Eds.; Association for Computational Linguistics: Hong Kong, China, 2019; pp. 249–255. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North, Minneapolis, MN, USA, 2–7 June 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 4171–4186. [Google Scholar] [CrossRef]
- Caldarini, G.; Jaf, S.; McGarry, K. A Literature Survey of Recent Advances in Chatbots. Information 2022, 13, 41. [Google Scholar] [CrossRef]
- Yang, Z.; Gan, Z.; Wang, J.; Hu, X.; Lu, Y.; Liu, Z.; Wang, L. An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA. Proc. AAAI Conf. Artif. Intell. 2022, 36, 3081–3089. [Google Scholar] [CrossRef]
- Sahib, T.M.; Alyasiri, O.M.; Younis, H.A.; Akhtom, D.; Hayder, I.M. A comparison between ChatGPT-3.5 and ChatGPT-4.0 as a tool for paraphrasing English Paragraphs. In Proceedings of the International Applied Social Sciences Congress, Valletta, Malta, 13–15 November 2023; pp. 471–480. [Google Scholar]
- Nagoudi, E.M.B.; Elmadany, A.; Abdul-Mageed, M. AraT5: Text-to-Text Transformers for Arabic Language Understanding and Generation. arXiv 2021, arXiv:2109.12068. [Google Scholar]
- Betti, M.J. Paraphrase in Linguistics. ResearchGate. Available online: https://www.researchgate.net/publication/357661190_Paraphrase_in_Linguistics/citation/download (accessed on 4 November 2024).
- Shen, L.; Liu, L.; Jiang, H.; Shi, S. On the Evaluation Metrics for Paraphrase Generation. arXiv 2022, arXiv:2202.08479. [Google Scholar]
- Rahayu, F.E.S.; Utomo, A.; Setyowati, R. Investigating Lexical Diversity of Children Narratives: A Case Study of L1 Speaking. Regist. J. 2020, 13, 371–388. [Google Scholar] [CrossRef]
- Lin, C.-Y. Rouge: A package for automatic evaluation of summaries. In Text Summarization Branches Out; Association for Computational Linguistics: Barcelona, Spain, 2004; pp. 74–81. [Google Scholar]
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.-J. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 6–12 July 2002; Association for Computational Linguistics: Philadelphia, PA, USA, 2002; pp. 311–318. [Google Scholar]
- Sun, S.; Sia, S.; Duh, K. CLIReval: Evaluating Machine Translation as a Cross-Lingual Information Retrieval Task. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, Online, 5–10 July 2020; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 134–141. [Google Scholar] [CrossRef]
- Koehn, P.; Monz, C. Manual and Automatic Evaluation of Machine Translation between European Languages. In Proceedings on the Workshop on Statistical Machine Translation; Koehn, P., Monz, C., Eds.; Association for Computational Linguistics: New York, NY, USA, 2006; pp. 102–121. Available online: https://aclanthology.org/W06-3114 (accessed on 12 September 2024).
- Mulimani, D.; Patil, P.; Chaklabbi, N. Image Captioning using CNN and Attention Based Transformer. In Data Science and Intelligent Computing Techniques; Soft Computing Research Society: New Delhi, India, 2023; pp. 157–166. [Google Scholar] [CrossRef]
- Zieve, M.; Gregor, A.; Stokbaek, F.J.; Lewis, H.; Mendoza, E.M.; Ahmadnia, B. Systematic TextRank Optimization in Extractive Summarization. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing; Mitkov, R., Angelova, G., Eds.; INCOMA Ltd.: Varna/Shoumen, Bulgaria, 2023; pp. 1274–1281. Available online: https://aclanthology.org/2023.ranlp-1.135 (accessed on 23 November 2024).
- Li, B.; Liu, T.; Wang, B.; Wang, L. Enhancing Deep Paraphrase Identification via Leveraging Word Alignment Information. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 7843–7847. [Google Scholar] [CrossRef]
- Chandrasekaran, D.; Mago, V. Evolution of Semantic Similarity—A Survey. Acm Comput. Surv. 2020, 54, 41. [Google Scholar] [CrossRef]
- Mohamed, M.; Oussalah, M. SRL-ESA-TextSum: A text summarization approach based on semantic role labeling and explicit semantic analysis. Inf. Process. Manag. 2019, 56, 1356–1372. [Google Scholar] [CrossRef]
- Zou, W.; Socher, R.; Cer, D.; Manning, C. Bilingual word embeddings for phrase-based machine translation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; Association for Computational Linguistics: Florence, Italy, 2013; pp. 1393–1398. [Google Scholar]
- Rahutomo, F.; Kitasuka, T.; Aritsugi, M. Semantic Cosine Similarity. In Proceedings of the 7th International Student Conference on Advanced Science and Technology ICAST 2012, Seoul, Republic of Korea, 29–30 October 2012; Volume 4, pp. 1–2. [Google Scholar]
- Timkey, W.; van Schijndel, M. All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality. arXiv 2021, arXiv:2109.04404. [Google Scholar] [CrossRef]
- Rieger, J.; Koppers, L.; Jentsch, C.; Rahnenführer, J. Improving Reliability of Latent Dirichlet Allocation by Assessing Its Stability Using Clustering Techniques on Replicated Runs. arXiv 2020, arXiv:2003.04980. [Google Scholar]
- Habbat, N.; Anoun, H.; Hassouni, L. AraBERTopic: A Neural Topic Modeling Approach for News Extraction from Arabic Facebook Pages using Pre-trained BERT Transformer Model. Int. J. Comput. Digit. Syst. 2023, 14, 1–8. [Google Scholar] [CrossRef] [PubMed]
- Antoun, W.; Baly, F.; Hajj, H. AraBERT: Transformer-based Model for Arabic Language Understanding. arXiv 2020, arXiv:2003.00104. [Google Scholar]
- Zheng, W.; Lu, S.; Cai, Z.; Wang, R.; Wang, L.; Yin, L. PAL-BERT: An Improved Question Answering Model. Comput. Model. Eng. Sci. 2024, 139, 2729–2745. [Google Scholar] [CrossRef]
- Bello, A.; Ng, S.-C.; Leung, M.-F. A BERT Framework to Sentiment Analysis of Tweets. Sensors 2023, 23, 506. [Google Scholar] [CrossRef]
- Eke, C.I.; Norman, A.A.; Shuib, L. Context-Based Feature Technique for Sarcasm Identification in Benchmark Datasets Using Deep Learning and BERT Model. IEEE Access 2021, 9, 48501–48518. [Google Scholar] [CrossRef]
- Wang, J.; Huang, J.X.; Tu, X.; Wang, J.; Huang, A.J.; Laskar, M.T.R.; Bhuiyan, A. Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges. ACM Comput. Surv. 2024, 56, 185. [Google Scholar] [CrossRef]
- Wu, X.; Xia, Y.; Zhu, J.; Wu, L.; Xie, S.; Qin, T. A study of BERT for context-aware neural machine translation. Mach. Learn. 2022, 111, 917–935. [Google Scholar] [CrossRef]
- Wahle, J.P.; Ruas, T.; Meuschke, N.; Gipp, B. Are Neural Language Models Good Plagiarists? A Benchmark for Neural Paraphrase Detection. In Proceedings of the 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), Champaign, IL, USA, 27–30 September 2021; pp. 226–229. [Google Scholar] [CrossRef]
- Chelba, C.; Mikolov, T.; Schuster, M.; Ge, Q.; Brants, T.; Koehn, P.; Robinson, T. One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling. arXiv 2014, arXiv:1312.3005. [Google Scholar]
- Zhu, Y.; Kiros, R.; Zemel, R.; Salakhutdinov, R.; Urtasun, R.; Torralba, A.; Fidler, S. Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 19–27. [Google Scholar]
- El Koshiry, A.M.; Eliwa, E.H.I.; El-Hafeez, T.A.; Omar, A. Arabic Toxic Tweet Classification: Leveraging the AraBERT Model. Big Data Cogn. Comput. 2023, 7, 170. [Google Scholar] [CrossRef]
- Al-Twairesh, N. The Evolution of Language Models Applied to Emotion Analysis of Arabic Tweets. Information 2021, 12, 84. [Google Scholar] [CrossRef]
- Al-Yahya, M.; Al-Khalifa, H.; Al-Baity, H.; AlSaeed, D.; Essam, A. Arabic Fake News Detection: Comparative Study of Neural Networks and Transformer-Based Approaches. Complexity 2021, 2021, 5516945. [Google Scholar] [CrossRef]
- Abo-Elghit, A.H.; Hamza, T.; Al-Zoghby, A. Embedding Extraction for Arabic Text Using the AraBERT Model. Comput. Mater. Contin. 2022, 72, 1967–1994. [Google Scholar] [CrossRef]
- Mohdeb, D.; Laifa, M.; Zerargui, F.; Benzaoui, O. Evaluating transfer learning approach for detecting Arabic anti-refugee/migrant speech on social media. Aslib J. Inf. Manag. 2022, 74, 1070–1088. [Google Scholar] [CrossRef]
- Salloum, W.; Habash, N. Dialectal to standard Arabic paraphrasing to improve Arabic-English statistical machine translation. In Proceedings of the First Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties; Association for Computational Linguistics: Edinburgh, Scotland, 2011; pp. 10–21. [Google Scholar]
- Jurafsky, D.; Martin, J.H. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition; Prentice Hall: Upper Saddle River, NJ, USA, 2020. [Google Scholar]
- Kaelbling, L.P.; Littman, M.L.; Moore, A.W. Reinforcement Learning: A Survey. J. Artif. Intell. Res. 1996, 4, 237–285. [Google Scholar] [CrossRef]
- Iyyer, M.; Wieting, J.; Gimpel, K.; Zettlemoyer, L. Adversarial example generation with syntactically controlled paraphrase networks. arXiv 2018, arXiv:1804.06059. [Google Scholar]
- Mahmoud, A.; Zrigui, M. Semantic similarity analysis for corpus development and paraphrase detection in Arabic. Int. Arab J. Inf. Technol. 2021, 18, 1–7. [Google Scholar] [CrossRef]
- Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving language understanding by generative pre-training. Available online: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf?utm_source=chatgpt.com (accessed on 1 January 2025).
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
- White, J.; Fu, Q.; Hays, S.; Sandborn, M.; Olea, C.; Gilbert, H.; Elnashar, A.; Spencer-Smith, J.; Schmidt, D.C. A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. arXiv 2023, arXiv:2302.11382. [Google Scholar]
- Ormazabal, A.; Artetxe, M.; Soroa, A.; Labaka, G.; Agirre, E. Principled Paraphrase Generation with Parallel Corpora. In Proceedings of the Annual Meeting of the Association for Computational Linguistics; Association for Computational Linguistics: Dublin, Ireland, 2022. [Google Scholar] [CrossRef]
- Gudkov, V.; Mitrofanova, O.; Filippskikh, E. Automatically ranked Russian paraphrase corpus for text generation. arXiv 2020, arXiv:2006.09719. [Google Scholar]
- Fu, Y.; Feng, Y.; Cunningham, J.P. Paraphrase generation with latent bag of words. In Proceedings of the Advances in Neural Information Processing Systems 32, Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
- Sancheti, A.; Srinivasan, B.V.; Rudinger, R. Entailment relation aware paraphrase generation. Proc. AAAI Conf. Artif. Intell. 2022, 36, 11258–11266. [Google Scholar]
- Ding, B.; Qin, C.; Liu, L.; Chia, Y.K.; Li, B.; Joty, S.; Bing, L. Is GPT-3 a Good Data Annotator? In Proceedings of the Annual Meeting of the Association for Computational Linguistics; Association for Computational Linguistics: Toronto, ON, Canada, 2023. [Google Scholar] [CrossRef]
- Surameery, N.M.S.; Shakor, M.Y. Use Chat GPT to Solve Programming Bugs. Int. J. Inf. Technol. Comput. Eng. 2023, 31, 17–22. [Google Scholar] [CrossRef]
- Goyal, T.; Li, J.J.; Durrett, G. News summarization and evaluation in the era of gpt-3. arXiv 2022, arXiv:2209.12356. [Google Scholar]
- Gutierrez, B.J.; McNeal, N.; Washington, C.; Chen, Y.; Li, L.; Sun, H.; Su, Y. Thinking about GPT-3 In-Context Learning for Biomedical IE? Think Again. In Findings of the Association for Computational Linguistics: EMNLP 2022; Association for Computational Linguistics: Abu Dhabi, United Arab Emirates, 2022. [Google Scholar] [CrossRef]
- Lan, W.; Qiu, S.; He, H.; Xu, W. A continuously growing dataset of sentential paraphrases. arXiv 2017, arXiv:1708.00391. [Google Scholar]
- Saad, M.; Ashour, W. OSAC: Open Source Arabic Corpora. In Proceedings of the 6th International Conference on Electrical and Computer Systems (EECS’10), Lefke, Cyprus, 25–26 November 2010. [Google Scholar]
- Bar, K.; Dershowitz, N. Deriving paraphrases for highly inflected languages from comparable documents. In Proceedings of the 24th International Conference on Computational Linguistics—Proceedings of COLING 2012: Technical Papers, Mumbai, India, 8–15 December 2012. [Google Scholar]
- Wang, Z.; Hamza, W.; Florian, R. Quora. Question Pairs Dataset. Kaggle 2018. Available online: https://www.kaggle.com/datasets/quora/question-pairs-dataset (accessed on 14 November 2024).
- Ganitkevitch, J.; Van Durme, B.; Callison-Burch, C. PPDB: The paraphrase database. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, GA, USA, 9–14 June 2013; Association for Computational Linguistics: Atlanta, GA, USA, 2013; pp. 758–764. [Google Scholar]
- Einea, O.; Elnagar, A.; Al Debsi, R. SANAD: Single-label Arabic News Articles Dataset for automatic text categorization. Data Brief 2019, 25, 104076. [Google Scholar] [CrossRef]
- Nabankema, H. Evaluation of Natural Language Processing Techniques for Information Retrieval. Eur. J. Inf. Knowl. Manag. 2024, 3, 38–49. [Google Scholar] [CrossRef]
- Yagi, S.; Elnagar, A.; Yaghi, E. Arabic punctuation dataset. Data Brief 2024, 53, 110118. [Google Scholar] [CrossRef]
- Bird, S.; Klein, E.; Loper, E. Natural Language Processing with Python; O’Reilly Media: Sebastopol, CA, USA, 2009; Available online: http://www.nltk.org/book/ (accessed on 5 February 2025).
- Wang, L.; Chen, X.; Deng, X.; Wen, H.; You, M.; Liu, W.; Li, Q.; Li, J. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. NPJ Digit. Med. 2024, 7, 41. [Google Scholar] [CrossRef] [PubMed]
- Google Research. Google Colaboratory. 2024. Available online: https://colab.research.google.com/ (accessed on 3 March 2025).
Paraphrasing Methods | Language | Original Sentence | Paraphrased Sentence
---|---|---|---
Synonyms substitution | Arabic | قرأ الطفل القصة بمتعة | قرأ الطفل القصة بسرور
 | English | The child reads the story with enjoyment | The child reads the story happily
Different forms of a word | Arabic | قرأ الطفل القصة بمتعة | تمتع الطفل بقراءة القصة
 | English | The child reads the story with enjoyment | The child enjoyed reading the story
Changing tense | Arabic | قرأ الطفل القصة بمتعة | قرأت القصة بمتعة
 | English | The child reads the story with enjoyment | The story was read with enjoyment
Altering the word order | Arabic | قرأ الطفل القصة بمتعة | بمتعة قرأ الطفل القصة
 | English | The child reads the story with joy. | With joy, the child reads the story.
Metric | Formula
---|---
BLEU-Precision | $\mathrm{BLEU}_1 = \dfrac{\text{matched unigrams}}{\text{total unigrams in the paraphrased sentence}}$
ROUGE-Recall | $\mathrm{ROUGE}_{1}\text{-}R = \dfrac{\text{overlapping unigrams}}{\text{total unigrams in the original sentence}}$
ROUGE-Precision | $\mathrm{ROUGE}_{1}\text{-}P = \dfrac{\text{overlapping unigrams}}{\text{total unigrams in the paraphrased sentence}}$
ROUGE-F1 Score | $\mathrm{ROUGE}_{1}\text{-}F1 = \dfrac{2 \cdot P \cdot R}{P + R}$
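The unigram-overlap metrics listed above (BLEU-1 precision and ROUGE-1 recall/precision/F1) can be sketched in a few lines of pure Python; `unigram_scores` is an illustrative helper name, and the BLEU-1 value here is the modified (clipped) unigram precision without the brevity penalty.

```python
from collections import Counter

def unigram_scores(ori_tokens, par_tokens):
    """Clipped unigram overlap between original and paraphrased token lists."""
    # Counter intersection clips repeated tokens to their minimum count.
    overlap = sum((Counter(ori_tokens) & Counter(par_tokens)).values())
    bleu_1 = overlap / len(par_tokens)      # modified unigram precision (no brevity penalty)
    recall = overlap / len(ori_tokens)      # ROUGE-1 recall
    precision = overlap / len(par_tokens)   # ROUGE-1 precision
    f1 = 0.0 if recall + precision == 0 else 2 * recall * precision / (recall + precision)
    return bleu_1, recall, precision, f1
```

For the toy pair "the child reads the story" vs. "the child reads the tale", four of the five candidate unigrams match, so all four scores come out at 0.8.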
Metric | Formula | Description
---|---|---
Cosine Similarity | $\cos(A, B) = \dfrac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$ | The cosine score is calculated from the angle between two vectors (A and B), focusing on their orientation rather than their magnitude.
Euclidean Distance | $d(A, B) = \sqrt{\sum_{i}(A_i - B_i)^2}$ | Euclidean distance computes the straight-line distance between two points (vectors) in the embedding space.
Jaccard Score | $J(A, B) = \dfrac{\lvert A \cap B \rvert}{\lvert A \cup B \rvert}$ | The Jaccard score divides the size of the intersection of two token sets by the size of their union.
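The three similarity measures above can be computed directly from embedding vectors (cosine, Euclidean) and token lists (Jaccard); the helper names below are illustrative.

```python
import math

def cosine_similarity(u, v):
    """Orientation-based similarity of two vectors, ignoring magnitude."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def euclidean_distance(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def jaccard_similarity(a_tokens, b_tokens):
    """Intersection over union of two token sets."""
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b)
```

Orthogonal vectors give a cosine of 0, identical ones a cosine of 1; Jaccard behaves the same way on token sets, which is why the paper's paired sentences score far higher than non-paired ones on all three measures.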
Paper | AI Approach | Language | Aim |
---|---|---|---|
McKeown [11] | Rule-based | English | He develops a rule-based paraphrase system by using pre-defined constraints. |
Salloum and Habash [50] | Rule-based | Dialectal Arabic text | They aim to improve the quality of Arabic–English statistical machine translation of the dialectal Arabic text. |
Nagoudi et al. [17] | Seq2seq | Arabic | They generate a novel benchmark for Arabic paraphrase generation.
Ormazabal et al. [60] | Seq2seq | English and French | They focus on the paraphrase generation task using round-trip machine translation.
Gudkov et al. [60] | Seq2seq | Russian | They produce the first Russian corpus for paraphrase generation.
Fu et al. [61] | Seq2seq | English | They develop a latent bag-of-words paraphrase generation system for English text.
Sancheti et al. [62] | Reinforcement learning | English | They initiate the new task of entailment-relation-aware paraphrase generation.
Li et al. [6] | Reinforcement learning | English | They enhance paraphrase generation with a novel deep reinforcement learning approach.
Mahmoud and Zrigui [53] | Deep generative | Arabic | They work on detecting paraphrases in Arabic text.
Li et al. [27] | Deep generative | English | They work on paraphrase detection.
Iyyer et al. [52] | Deep generative | English | They develop a model for paraphrase generation.
Dataset | Language | Size
---|---|---
ARGEN [17] | Arabic | 123.6 K paraphrase pairs
Merged OSAC and KSUCCA [54] | Arabic | 1 K sentence pairs
Arabic corpus BD [70] | Arabic | 100 correctly paired documents
QQP [71] | English | 150 K paraphrase pairs
TwitterURL [68] | English | 51 K sentence pairs
PPDB [72] | English | 220 M phrasal and lexical paraphrases
Language | Prompt | Example |
---|---|---|
Arabic | أعد صياغة التالي في جملة واحدة | أعد صياغة التالي في جملة واحدة: “علم البيانات هو مجموعة من التقنيات التي تستخدم لاستخراج قيمة من البيانات.” |
English | Rephrase the following in one sentence. | Rephrase the following in one sentence: “Data science is a collection of techniques used to extract value from data” [9]. |
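A minimal sketch of how such a prompt might be assembled programmatically before being sent to the model; the `build_paraphrase_prompt` helper and the request structure (model name "gpt-4", chat-style messages) are illustrative assumptions, not the authors' exact setup.

```python
def build_paraphrase_prompt(sentence: str, language: str = "ar") -> str:
    """Prepend the one-sentence paraphrasing instruction (Arabic or English)."""
    templates = {
        "ar": 'أعد صياغة التالي في جملة واحدة: "{s}"',
        "en": 'Rephrase the following in one sentence: "{s}"',
    }
    return templates[language].format(s=sentence)

# The resulting string would then be sent as the user message of a
# chat-completion request (field names shown for illustration only):
request = {
    "model": "gpt-4",
    "messages": [
        {"role": "user",
         "content": build_paraphrase_prompt(
             "علم البيانات هو مجموعة من التقنيات التي تستخدم لاستخراج قيمة من البيانات.")},
    ],
}
```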
Category | Original Sentences | Paraphrased Sentences
---|---|---
Culture | 48,486 | 48,486
Technology | 74,323 | 74,323
Sport | 67,360 | 67,360
Total | 190,169 | 190,169
Evaluator #1 | Evaluator #2 | Evaluator #3 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 * | 1 | 2 | 3 | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 | |
Culture | 0.71 | 1.43 | 2.14 | 95.71 | 1.43 | 7.86 | 5.00 | 85.71 | 2.14 | 1.43 | 10.71 | 85.71 |
Sport | 0.00 | 0.00 | 1.43 | 98.57 | 0.00 | 6.43 | 5.00 | 88.57 | 2.86 | 2.86 | 7.14 | 87.14 |
Technology | 4.29 | 5.00 | 2.86 | 87.86 | 8.57 | 5.00 | 10.00 | 76.43 | 6.43 | 5.71 | 3.57 | 84.29 |
Evaluator #1 | Evaluator #2 | Evaluator #3 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 * | 1 | 2 | 3 | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 | |
Culture | 0.71 | 0.00 | 12.14 | 87.14 | 0.00 | 0.00 | 1.43 | 98.57 | 2.14 | 4.29 | 23.57 | 70.00 |
Sport | 0.71 | 2.86 | 12.14 | 84.29 | 0.00 | 0.71 | 2.14 | 97.14 | 1.43 | 2.86 | 30.71 | 65.00 |
Technology | 1.43 | 6.43 | 9.29 | 82.86 | 2.86 | 1.43 | 7.86 | 87.86 | 6.43 | 6.43 | 22.14 | 65.00 |
Evaluator #1 | Evaluator #2 | Evaluator #3 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 * | 1 | 2 | 3 | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 | |
Culture | 0.71 | 0.00 | 7.86 | 91.43 | 0.00 | 0.00 | 2.14 | 97.86 | 2.86 | 2.86 | 7.86 | 86.43 |
Sport | 0.00 | 1.43 | 5.71 | 92.86 | 0.71 | 0.71 | 1.43 | 97.14 | 2.14 | 1.43 | 15.00 | 81.43 |
Technology | 2.14 | 2.14 | 10.00 | 85.71 | 2.86 | 1.43 | 8.57 | 87.14 | 11.43 | 3.57 | 10.00 | 75.00 |
Evaluator #1 | Evaluator #2 | Evaluator #3 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 * | 1 | 2 | 3 | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 | |
Culture | 67.14 | 16.43 | 12.86 | 3.57 | 0.00 | 0.71 | 1.43 | 97.86 | 57.86 | 10.00 | 21.43 | 10.71 |
Sport | 69.29 | 14.29 | 12.14 | 4.29 | 1.43 | 0.71 | 2.14 | 95.71 | 50.00 | 8.57 | 20.71 | 20.71 |
Technology | 71.43 | 16.43 | 9.29 | 2.86 | 0.71 | 2.86 | 5.71 | 90.71 | 43.57 | 12.86 | 18.57 | 25.00 |
Evaluator #1 | Evaluator #2 | Evaluator #3 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 * | 1 | 2 | 3 | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 | |
Culture | 53.57 | 21.43 | 19.29 | 5.71 | 0.71 | 6.43 | 6.43 | 86.43 | 75.71 | 2.86 | 15.71 | 5.71 |
Sport | 61.43 | 22.86 | 12.86 | 2.86 | 0.71 | 4.29 | 5.00 | 90.00 | 86.43 | 2.14 | 7.86 | 3.57 |
Technology | 67.14 | 18.57 | 12.86 | 1.43 | 2.14 | 7.86 | 15.00 | 75.00 | 87.14 | 0.71 | 7.14 | 5.00 |
Evaluator #1 | Evaluator #2 | Evaluator #3 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 * | 1 | 2 | 3 | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 | |
Culture | 27.86 | 25.71 | 28.57 | 17.86 | 0.71 | 5.71 | 7.14 | 86.43 | 46.43 | 0.00 | 13.57 | 40.00 |
Sport | 34.29 | 23.57 | 22.86 | 19.29 | 1.43 | 4.29 | 5.71 | 88.57 | 46.43 | 4.29 | 12.14 | 37.14 |
Technology | 49.29 | 17.86 | 22.86 | 10.00 | 1.43 | 7.86 | 15.71 | 75.00 | 64.29 | 1.43 | 7.86 | 26.43 |
Evaluator #1 | Evaluator #2 | Evaluator #3 | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 * | 1 | 2 | 3 | 0 | 1 | 2 | 3 | 0 | 1 | 2 | 3 | |
Culture | 0.71 | 1.43 | 12.86 | 85.00 | 1.43 | 2.86 | 6.43 | 89.29 | 4.29 | 4.29 | 26.43 | 65.00 |
Sport | 0.71 | 1.43 | 17.14 | 80.71 | 2.14 | 2.86 | 2.86 | 92.14 | 2.86 | 4.29 | 33.57 | 59.29 |
Technology | 0.71 | 7.14 | 11.43 | 80.71 | 1.43 | 3.57 | 12.86 | 82.14 | 7.86 | 7.14 | 22.14 | 62.86 |
Category | BLEU-1 | ROUGE-1-Precision | ROUGE-1-Recall | ROUGE-1-F1 | AVG LD | Diff LD |
---|---|---|---|---|---|---|
Culture | 0.49 | 0.53 | 0.63 | 0.57 | 0.94 | 0.05 |
Sport | 0.55 * | 0.58 | 0.62 | 0.59 | 0.96 | 0.03 |
Technology | 0.55 | 0.58 | 0.68 | 0.62 | 0.95 | 0.05 |
Average | 0.53 | 0.56 | 0.64 | 0.59 | 0.95 | 0.04 |
Paper | BLEU_1 | ROUGE-1-Recall |
---|---|---|
Li et al. (2017), Quora-1 dataset [6] | [36–43] | [58–64] |
Li et al. (2017), Quora-2 dataset [6] | [26–34] | [47–57] |
Li et al. (2017), Twitter dataset [6] | [30–45] | [30–42] |
Egonmwan and Chali (2019) [12] | [35–40] | 
Fu et al. (2019), Quora dataset [61] | [55–72] | [58–72] |
Fu et al. (2019), MSCOCO dataset [61] | [72–80] | [42–49] |
Nagoudi et al. (2022), paraphrasing dataset [17] | [17–19] | |
This research | [50–60] | [50–70] |
Case | Category | Cosine Similarity | Euclidean Distance | Jaccard Similarity |
---|---|---|---|---|
Its pair | Culture | 0.92 | 2.61 | 0.39 |
Sport | 0.92 | 2.68 | 0.38 | |
Technology | 0.94 | 2.33 | 0.45 | |
Average | 0.93 | 2.54 | 0.40 | |
Same category and not pair | Culture | 0.76 | 4.72 | 0.04 |
Sport | 0.73 | 5.01 | 0.04 | |
Technology | 0.72 | 5.19 | 0.05 | |
Average | 0.74 | 4.97 | 0.04 | |
Different categories and not pair | Culture | 0.68 | 5.51 | 0.03 |
Sport | 0.66 | 5.66 | 0.02 | |
Technology | 0.61 | 6.10 | 0.04 | |
Average | 0.65 | 5.76 | 0.03 |
Category | Level-0 | Level-1 | Level-2 | Level-3 | Total |
---|---|---|---|---|---|
Culture | 392 | 7011 | 15,564 | 25,519 | 48,486 |
Technology | 324 | 5508 | 17,407 | 51,084 | 74,323 |
Sport | 66 | 7831 | 15,656 | 43,807 | 67,360 |
Total | 782 | 20,350 | 48,627 | 120,410 | 190,169 |
Category | Level-0 | Level-1 | Level-2 | Level-3 | Total |
---|---|---|---|---|---|
Culture | 392 | 7011 | 15,564 | 25,519 | 48,486 |
Technology | 266 | 5508 | 17,407 | 51,084 | 74,265 |
Sport | 66 | 7831 | 15,656 | 43,807 | 67,360 |
Total | 724 | 20,350 | 48,627 | 120,410 | 190,111 |
Category | Ori_sentence | Par_sentence | Explanation |
---|---|---|---|
Culture | مُغلقةٌ حاويات النواياوراء ضبابٍ كتيمٍ تقيمُبقعر الظنونْ. | تقبعُ حاويات النوايا مُغلقةً في قعر الظنون، وراء ضبابٍ كتيم. | This case contains poetry text; GPT rewrites it as a sentence. |
Sport | وأشادت بعض الصحف المكسيكية بمنتخب بلادها، حيث تصدرت عناوين الانتصار والإشادة، صفحات بعض الجرائد هناك، مثل صحيفتي «ريفورما» و«يونيفيرسال»، فيما اكتفت صحيفة «إكسيلسيور» بتخصيص عمود صغير للتذكير بالنتيجة (3-1) | أشادت بعض الصحف المكسيكية بمنتخب بلادها بتصدُّر عناوين الانتصار والإشادة صفحتي «ريفورما» و«يونيفيرسال»، بينما اكتفت «إكسيلسيور» بتخصيص عمود صغير لتذكير القراء بالنتيجة (3-1) | The sport text shows the match result inside round brackets. This shows that the tokenization method defined specifically for the sport category works effectively in splitting sentences that end with a round bracket. |
Sport | وبات فيتوريا غيمارايش يبتعد بفارق 3 نقاط امام سبورتينغ لشبونة وصيف البطل اثر خسارة الاخير امام يونياو ليريا بهدف لليدسون (86) مقابل اربعة اهداف لبول سيزار (15 و19) ونغال (83) وكادو (90) | اتسع الفارق بين فيتوريا غيمارايش وسبورتينغ لشبونة وصيف البطل إلى 3 نقاط بعد خسارة الأخير أمام يونياو ليريا بنتيجة 1-4، حيث سجل لليدسون (86) وليرييا بول سيزار (15 و19)، ونغال (83)، وكادو (90). | GPT-4 can understand sport text that contains goal counts or player numbers. Additionally, in the Ori_sentence the number of goals is written out as words; GPT-4 analyzes the text and rewrites it as a hyphenated score such as 1-4. |
Technology | يُشار إلى أن حُزمة تريند مايكرو Data Loss Prevention for Endpoint تفوقت في اختبارات مماثلة أجرتها الدورية العالمية المتخصصة Network World؛ كما منحتها دورية SC Magazine لقبَ أفضل الحلول في مجال الحول دون فقدان البيانات. | تفوقت حُزمة تريند مايكرو Data Loss Prevention for Endpoint في اختبارات الدورية العالمية Network World، وحصلت على لقب أفضل حلول منع فقدان البيانات من دورية SC Magazine | The pair is ranked as Level-3 due to the high overlap in English computing words such as ‘Data Loss Prevention for Endpoint’. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Alsulami, H.R.; Almansour, A.A. Exploring GPT-4 Capabilities in Generating Paraphrased Sentences for the Arabic Language. Appl. Sci. 2025, 15, 4139. https://doi.org/10.3390/app15084139