Article

Social Media Sentiment Analysis for Sustainable Rural Event Planning: A Case Study of Agricultural Festivals in Al-Baha, Saudi Arabia

Musaad Alzahrani and Fahad AlGhamdi
Faculty of Computing and Information, Al-Baha University, Al-Baha P.O. Box 1988, Saudi Arabia
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(9), 3864; https://doi.org/10.3390/su17093864
Submission received: 23 March 2025 / Revised: 16 April 2025 / Accepted: 23 April 2025 / Published: 25 April 2025

Abstract

Agricultural festivals play a vital role in promoting sustainable farming, local economies, and cultural heritage. Understanding public sentiment toward these events can provide valuable insights to enhance event organization, marketing strategies, and economic sustainability. In this study, we collected and analyzed social media data from Twitter to evaluate public perceptions of Al-Baha’s agricultural festivals. Sentiment analysis was performed using both traditional machine learning and deep learning approaches. Specifically, six machine learning models including Multinomial Naïve Bayes (MNB), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), k-Nearest Neighbors (KNN), and XGBoost (XGB) were compared against AraBERT, a transformer-based deep learning model. Each model was evaluated based on accuracy, precision, recall, and F1-score. The results demonstrated that AraBERT achieved the highest performance across all metrics, with an accuracy of 85%, confirming its superiority in Arabic sentiment classification. Among traditional models, SVM and RF performed best, whereas MNB and KNN struggled with sentiment detection. These findings highlight the role of sentiment analysis in supporting sustainable agricultural and tourism initiatives. The insights gained from sentiment trends can help festival organizers, policymakers, and agricultural stakeholders make data-driven decisions to enhance sustainable event planning, optimize resource allocation, and improve marketing strategies in line with the Sustainable Development Goals (SDGs).

1. Introduction

Agricultural festivals play a crucial role in promoting local culture, fostering sustainable tourism, and driving rural economic development, particularly in regions with a rich agricultural heritage such as Al-Baha, Saudi Arabia. Located in the southwestern part of Saudi Arabia, Al-Baha is well known for its fertile lands and diverse agricultural production, including pomegranates, honey, dates, and cereals. The region organizes several annual agricultural festivals, such as the International Honey Festival and the National Pomegranate Festival, to celebrate its agricultural identity and stimulate economic growth. These festivals provide a platform to showcase sustainable agricultural practices, support local farmers, attract eco-tourism, and create new market opportunities. Despite the economic and cultural importance of these festivals, there is limited data-driven research on how they are perceived by the public. Understanding public sentiment is essential for evaluating the success of these events, improving their impact, and ensuring long-term sustainability. Sentiment analysis, also known as opinion mining, is a computational approach that analyzes people’s opinions, emotions, and attitudes toward a given entity [1]. It focuses on identifying and classifying the sentiment expressed in textual data, typically categorizing it as positive, neutral, or negative. In recent years, sentiment analysis has gained widespread acceptance not only among researchers but also across industries, governments, and organizations [2,3] due to its ability to extract meaningful insights from large-scale textual data. By leveraging sentiment analysis techniques, valuable insights can be obtained from social media discussions, particularly on platforms such as Twitter, where festival attendees actively share their experiences, expectations, and concerns. Analyzing these insights provides a data-driven approach to improving festival organization, enhancing sustainable marketing strategies, and promoting responsible tourism.
This study aims to analyze public sentiment surrounding Al-Baha’s agricultural festivals, focusing on their role in promoting local agricultural products, strengthening regional identity, and aligning with sustainability goals. To achieve this, the study employs Arabic Natural Language Processing (NLP) techniques to process and analyze social media data related to these festivals. Sentiments will be classified into positive, neutral, and negative categories using six machine learning models: Multinomial Naïve Bayes (MNB), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), k-Nearest Neighbors (KNN), and XGBoost (Extreme Gradient Boosting). The models will be evaluated individually to compare their effectiveness in sentiment classification. Additionally, AraBERT, a transformer-based deep learning model specialized in Arabic text processing, will be incorporated to assess its performance against traditional machine learning models. This comparative analysis will identify the most effective approach for sentiment classification in the context of Arabic-language agricultural festivals. By leveraging AI-driven sentiment analysis, this study provides practical recommendations for enhancing festival management, optimizing e-marketing campaigns, and improving overall stakeholder engagement in alignment with the Sustainable Development Goals (SDGs).
The findings of this study will benefit multiple stakeholders, including festival organizers, local farmers, policymakers, and tourism authorities. The insights gained can be utilized to improve future festival planning, implement data-driven marketing strategies, and reinforce Al-Baha’s reputation as a sustainable agricultural and cultural hub. Furthermore, this study contributes to the growing field of sentiment analysis, particularly in the context of Arabic text and agricultural tourism, offering a novel AI-based approach to evaluating public engagement and sustainable event strategies.
The remainder of this paper is structured as follows: Section 2 presents a review of related studies on sentiment analysis in the agricultural domain. Section 3 details the methodology employed in this study, including data collection, preprocessing, sentiment classification models, and evaluation metrics. Section 4 discusses the experimental results and provides a comparative analysis of the model performances. Finally, Section 5 concludes the paper and outlines potential directions for future research.

2. Related Work

Sentiment analysis has been increasingly employed across various domains, including healthcare [4,5,6,7,8], education [9,10,11,12], and agriculture [13,14,15,16,17,18,19]. Within the agricultural domain, sentiment analysis has served as a valuable tool for understanding consumer opinions, expert assessments, and public attitudes toward emerging technologies and practices.
Cao et al. [13] proposed an enhanced BERT-based sentiment analysis model tailored to the unique linguistic characteristics of agricultural product reviews. Their approach addressed non-standard expressions and sparse textual features, achieving an F1-score of 89.86%, thereby highlighting the potential of transformer-based models in agricultural e-commerce contexts. Similarly, Zikang et al. [17] compared several deep learning models (Text-CNN, Bi-LSTM, and BERT) for classifying sentiment in online agricultural product reviews, finding CNN to be particularly effective in processing short, opinion-rich texts. Liu et al. [18] extended this work by introducing the AgriMFLN model, which combines LSTM networks with multi-head attention to improve sentiment classification performance in complex agricultural narratives.
Other studies explored public sentiment toward agricultural innovations and technology adoption. Yadav et al. [14] examined YouTube comments on smart farming technologies, identifying a mix of optimism and concern through sentiment classification using the Pattern library. Kaushik et al. [15] applied multiple machine learning models along with explainability techniques (e.g., LIME) to understand public attitudes toward precision agriculture, reinforcing the importance of transparent AI in agricultural decision-making. Meanwhile, Nimirthi et al. [16] used a hybrid approach on agriculture-related Twitter data, showing that bigram-based models outperformed unigrams and that combining lexicon-based and machine learning techniques improved overall classification accuracy.
Rehman et al. [19] investigated expert opinions on factors affecting crop productivity, applying Naïve Bayes and k-Nearest Neighbors to classify sentiment based on textual feedback related to soil quality, climate conditions, and irrigation. Their study achieved up to 87% accuracy, demonstrating the viability of traditional machine learning methods for structured expert-generated content. These collective efforts show that sentiment analysis in agriculture not only enhances understanding of consumer and stakeholder perspectives but also informs strategic decisions in product marketing, policy formulation, and technology deployment.
In addition to agricultural applications, sentiment analysis has also found use in the tourism sector, particularly in evaluating public perceptions of destinations and services. Leelawat et al. [20] analyzed tweets about tourism in Thailand using SVM and Random Forest, identifying health-related and political concerns alongside cultural appreciation. Gupta et al. [21] focused on detecting deceptive hotel reviews using CNN and sentiment polarity features, contributing to trustworthy travel platforms. Cao et al. [22] applied a BERT model to Chinese tourist reviews of Melaka’s heritage sites, uncovering sentiment patterns linked to cultural and experiential factors. Finally, Alzahrani et al. [23] developed an AI-driven sentiment analysis framework using YouTube comments to assess tourist satisfaction in Al-Baha, Saudi Arabia, aligning tourism development with Vision 2030 goals.
In parallel with domain-specific developments, advances in Arabic natural language processing (NLP) have also played a crucial role in enhancing sentiment analysis capabilities. One of the most influential contributions is AraBERT, developed by Antoun et al. [24], which was pre-trained on a large Arabic corpus and designed to capture the linguistic intricacies of Arabic. Compared to earlier models that relied on static word embeddings (e.g., Word2Vec, fastText) combined with CNNs or LSTMs, AraBERT leverages contextual embeddings and has shown superior performance on various Arabic NLP tasks, including sentiment classification. Other models, such as CAMeLBERT [25] and QARiB [26], have also contributed to recent advancements, though AraBERT remains a widely adopted benchmark. Given its robustness and domain relevance, our study adopts AraBERT as a representative transformer model to compare against traditional machine learning approaches.
Despite the growing body of literature, no prior studies have applied sentiment analysis specifically to agricultural festivals, which represent a unique intersection of rural tourism, cultural heritage, and agricultural marketing. These festivals serve as vital platforms for promoting local produce, fostering community identity, and supporting sustainable rural economies. The present study addresses this gap by analyzing social media sentiment toward agricultural festivals in Al-Baha Province. It draws methodological inspiration from both agricultural and tourism-focused research but introduces a novel focus on event-centered sentiment, offering actionable insights for policymakers, marketers, and event organizers seeking to enhance rural development strategies.

3. Methodology

This study employs sentiment analysis techniques to evaluate public sentiment toward agricultural festivals in Al-Baha, Saudi Arabia. The methodology follows a structured approach consisting of multiple stages, including data collection, preprocessing, sentiment analysis, and model evaluation. Figure 1 provides an overview of the research framework, illustrating the key steps in the sentiment analysis pipeline. The process begins with data acquisition, followed by text preprocessing to refine the collected textual data. Sentiment classification is then performed using multiple machine learning and deep learning models, and finally, the models are evaluated using standard performance metrics. This structured methodology ensures a robust and systematic approach to analyzing public sentiment related to Al-Baha’s agricultural festivals.

3.1. Data Collection

The primary data for this research were collected from Twitter using the Nitter web interface, a free and privacy-focused alternative front-end for Twitter that allows users to browse and scrape content without JavaScript, tracking, or API restrictions [27]. Due to Twitter’s API limitations, Selenium-based web scraping techniques were employed to extract public tweets related to agricultural festivals in Al-Baha. Selenium is widely recognized as an effective tool for automating browser interactions and extracting web-based content for sentiment analysis studies [28]. A total of 2214 tweets and replies were collected using relevant Arabic-language hashtags related to agricultural festivals in the Al-Baha region. These included:
  • #National_Pomegranate_Festival
  • #Al-Baha_Pomegranate_Festival
  • #International_Honey_Festival
  • #Al-Baha_Agricultural_Festival
Note: The original hashtags were written in Arabic; only their English translations are shown here for clarity.
The dataset includes both tweets and their replies, ensuring a diverse range of public opinions and a richer sentiment analysis dataset. All data were collected from publicly available Arabic-language tweets. No personal identifiers (such as usernames, user IDs, or profile information) were stored or analyzed. During preprocessing, all tweets were anonymized, and only textual content relevant to the research objectives was retained. As the data were publicly accessible and analyzed in aggregate, no additional user consent was required.
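For illustration, the collection step described above can be sketched as follows. This is a minimal, hedged example rather than the exact script used in the study: the Nitter instance URL, the CSS selector, the output file name, and the placeholder hashtag are assumptions, and real instances differ in markup, pagination, and rate-limiting behavior.

```python
# Minimal sketch of hashtag-based tweet collection through a Nitter front-end with Selenium.
# The instance URL, CSS selector, and file name below are illustrative assumptions.
import csv
import time
import urllib.parse

from selenium import webdriver
from selenium.webdriver.common.by import By

NITTER_INSTANCE = "https://nitter.net"          # hypothetical instance
HASHTAG = "#National_Pomegranate_Festival"      # placeholder; the original hashtags were in Arabic

driver = webdriver.Firefox()                    # any Selenium-supported browser works
driver.get(f"{NITTER_INSTANCE}/search?f=tweets&q={urllib.parse.quote(HASHTAG)}")
time.sleep(3)                                   # allow the results page to render

# Collect the visible tweet/reply bodies; Nitter commonly renders them as ".tweet-content".
texts = [el.text for el in driver.find_elements(By.CSS_SELECTOR, ".tweet-content")]
driver.quit()

with open("albaha_festival_tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["text"])
    writer.writerows([t] for t in texts)
```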

3.2. Data Preprocessing

To ensure that our Twitter dataset is both high-quality and consistent for subsequent analysis and modeling, we performed a series of preprocessing steps. These operations not only standardize the text data but also help improve the effectiveness of sentiment classification and other downstream NLP tasks. Below is an overview of each step, along with illustrative examples.
  • Removing Diacritics: In Arabic text, diacritical marks are often used to guide pronunciation. We removed these marks to simplify text processing and reduce the dimensionality of features. This results in a more uniform text representation and avoids confusion between different variations of the same word.
  • Removing Kashida (Stretching): Kashida refers to extended dashes used for justification or stylistic purposes. Removing these ensures more consistent tokenization and accurate word recognition.
  • Converting Repeated Spaces to a Single Space: All multiple or repeated spaces were consolidated into a single space. This reduces noise and ensures clean token splitting during preprocessing.
  • Separating Punctuation Marks from Words: We inserted spaces between punctuation marks and adjacent words so that punctuation is treated as separate tokens. This improves tokenizer accuracy, especially in sentiment contexts (e.g., exclamation marks).
  • Removing Non-Arabic Posts: Any tweets written in non-Arabic languages were removed. Only Arabic posts were retained to ensure the dataset remains consistent with the sentiment classification task.
  • Removing Speech Effects (Elongated Letters): Letters repeated for emphasis (e.g., multiple instances of the same character) were reduced to a single instance. This step prevents the model from encountering overly long or rare tokens.
  • Removing Duplicate Posts: Duplicate tweets were identified and removed to ensure that each data point was unique. This prevents model bias and avoids inflating sentiment categories with repeated content.
  • Removing or Fixing Encoding Errors: Tweets with corrupted or unreadable characters were either corrected or discarded, ensuring data integrity and preventing errors during tokenization or model training.
  • Normalizing Common Orthographic Variants: Common variations in Arabic spelling were standardized. This included unifying different forms of certain characters and segmenting URLs and hashtags from words. These normalization steps reduce token fragmentation and improve classification consistency.
Overall, these preprocessing steps yield a cleaner, more consistent dataset. This standardization process is critical for sentiment analysis and other downstream Natural Language Processing (NLP) tasks, as it helps both reduce noise and better capture the true semantic patterns in Arabic text. The cleaned and structured data were stored in CSV format for further processing. After applying these preprocessing steps, the dataset was reduced to 1745 tweets and replies, ensuring a high-quality and representative sample for sentiment classification.
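The sketch below illustrates how several of these steps (language filtering, diacritic and kashida removal, de-elongation, punctuation separation, and whitespace collapsing) could be implemented with regular expressions. It is a simplified approximation under stated assumptions about the relevant Unicode ranges; the exact rules applied in this study may differ. Deduplication, encoding checks, and orthographic normalization would then be applied on top of this at the dataset level.

```python
import re
from typing import Optional

DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")       # Arabic tashkeel marks
KASHIDA = re.compile(r"\u0640")                          # tatweel (stretching) character
ELONGATION = re.compile(r"(.)\1{2,}")                    # three or more repeats of a character
PUNCT = re.compile(r"([!?.,;:\u060C\u061B\u061F])")      # Latin and Arabic punctuation
ARABIC = re.compile(r"[\u0621-\u064A]")                  # core Arabic letter block

def clean_tweet(text: str) -> Optional[str]:
    """Return a normalized tweet, or None for non-Arabic posts."""
    if not ARABIC.search(text):                # drop posts with no Arabic letters
        return None
    text = DIACRITICS.sub("", text)            # remove diacritics
    text = KASHIDA.sub("", text)               # remove kashida
    text = ELONGATION.sub(r"\1", text)         # reduce elongated letters to one occurrence
    text = PUNCT.sub(r" \1 ", text)            # separate punctuation from adjacent words
    text = re.sub(r"\s+", " ", text).strip()   # collapse repeated spaces
    return text
```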

3.3. Data Annotation

To develop and evaluate our sentiment analysis models, every tweet/post in the dataset was assigned one of three sentiment labels (Positive, Neutral, or Negative), following a Sentiment Annotation Guideline. The guideline directs annotators to label tweets/posts according to the overall sentiment expressed. Below are the essential points of the guideline, along with real examples drawn from our labeled dataset (originally in Arabic; shown here in English translation).
  • Positive
    • Tweets or posts that convey approval, satisfaction, or commendation. For example: “May Allah bless your efforts and increase your blessings.” (labeled as Positive). This tweet praises the organizers’ efforts and blessings, reflecting an overall positive sentiment.
    • Edge Cases:
      Sarcasm: If a tweet appears positive on the surface (e.g., “Wow, this is just perfect…”), but strong contextual cues confirm sarcasm, it should be labeled Negative.
      Mixed Emotions: In posts that contain both praise and complaint, the dominant sentiment takes precedence.
  • Neutral
    • Tweets or posts containing factual statements, neutral questions, or general information without strong affect. For example: “The festival closing ceremony will be held on Thursday at 8 p.m.” (labeled as Neutral). This example provides a factual detail about the event schedule, with no explicit sentiment.
    • Edge Cases:
      Ambiguous Opinions: If a post seems vague or noncommittal (e.g., “The festival was normal”), default to Neutral.
      Questions: Inquiries like “Has anyone tried this farm’s product?” remain Neutral unless there is a clear emotional undertone.
  • Negative
    • Tweets or posts expressing dissatisfaction, critique, or displeasure. For example: “Unfortunately, the organization is poor and the crowding is stifling.” (labeled as Negative). This tweet criticizes the event’s organization and highlights a negative experience.
    • Edge Cases:
      Constructive Criticism: “The festival needs a lot of improvements.” should still be labeled Negative if it implies significant discontent.
      Hyperbole: Exaggerated phrases like “I’m dying of boredom.” are also considered Negative given they express strong displeasure.

3.3.1. Annotator Instructions

Our two annotators were given specific directives to ensure consistency across all labeled samples. First, they were instructed to consider the complete context of each tweet so that they could accurately capture the sentiment within the broader discourse. In addition, the annotators were advised to regard instances of extreme punctuation (e.g., “!!!” or “???”) as a possible indicator of particularly strong emotion. Whenever the sentiment of a post remained unclear or appeared ambiguous, Neutral was to be selected as the default label. In tweets containing multiple emotions, the annotators were asked to identify which sentiment was dominant and assign the label accordingly. Lastly, unless the context strongly indicated irony or sarcasm, the annotators were expected to rely on the surface meaning of the text.

3.3.2. Quality Control

To preserve the integrity and consistency of the labeling process, we implemented rigorous quality control protocols. Firstly, Inter-Annotator Agreement (IAA) was assessed by duplicating at least 20% of the total tweets, distributing them anonymously among the annotators, and then calculating the overlap scores. If the overall IAA score fell below 90%, the overlapping tweets were re-annotated. Moreover, if the IAA for a particular sentiment label (Positive, Neutral, or Negative) fell below 80%, we revised the corresponding portion of the annotation guidelines to address any misunderstandings and then repeated the annotation.
In addition, a Reannotation of Low-Agreement Samples measure was enacted to handle discrepancies. Any tweets identified with unresolved disagreements were flagged and underwent a second round of annotation after further clarifying the guidelines. If differences still persisted, senior annotators convened to finalize the decision collaboratively. Through these methods, we ensured a robust and consistent labeling process.
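As a concrete illustration of the agreement check, simple overall and per-label overlap (percent agreement) between two annotators can be computed as below. This is a minimal sketch with toy label lists; in the study, the thresholds (90% overall, 80% per label) are applied to the duplicated 20% subset, and chance-corrected measures could be substituted without changing the workflow.

```python
from collections import defaultdict

def agreement_scores(labels_a, labels_b):
    """Overall and per-label percent agreement between two annotators."""
    assert len(labels_a) == len(labels_b) and labels_a
    hits, totals = defaultdict(int), defaultdict(int)
    for a, b in zip(labels_a, labels_b):
        totals[a] += 1
        if a == b:
            hits[a] += 1
    overall = sum(hits.values()) / len(labels_a)
    per_label = {label: hits[label] / totals[label] for label in totals}
    return overall, per_label

# Toy example: annotator 1 vs. annotator 2 on the duplicated subset.
overall, per_label = agreement_scores(
    ["Positive", "Neutral", "Neutral", "Negative"],
    ["Positive", "Neutral", "Positive", "Negative"],
)
# Re-annotation is triggered if overall < 0.90 or any per-label score < 0.80.
```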
Upon completion of the annotation, we arrived at the following distribution:
  • Total Tweets/Replies: 1745
  • Positive: 411
  • Neutral: 1223
  • Negative: 111
Hence, Neutral constituted roughly 70% of the data, Positive around 24%, and Negative nearly 6%. This distribution reflects the overall tone of the collected tweets, offering a sufficiently varied dataset for subsequent sentiment classification experiments. The detailed distribution of sentiment labels is illustrated in Figure 2, which visually highlights the class imbalance within the dataset.

3.4. Sentiment Analysis

Sentiment classification was performed using both traditional machine learning models and deep learning approaches.

3.4.1. Baseline Machine Learning Models

This study employs the following six machine learning models for sentiment classification. These models, widely used in natural language processing (NLP), serve as a baseline to evaluate the effectiveness of sentiment analysis on agricultural festival data.
  • Multinomial Naïve Bayes (MNB): A probabilistic classifier based on Bayes’ theorem, assuming feature independence given the class. It is widely used for text classification due to its efficiency and ability to handle high-dimensional sparse data [29].
  • Support Vector Machine (SVM): A supervised learning algorithm that finds the optimal hyperplane to separate classes in high-dimensional space. It is particularly effective for text classification as it maximizes the margin between different classes [30].
  • k-Nearest Neighbors (KNN): A non-parametric, instance-based learning algorithm that classifies data points based on the majority class of their nearest neighbors. While useful for text classification, it can be computationally expensive for large datasets [31].
  • Logistic Regression (LR): A statistical model that predicts categorical outcomes by estimating class probabilities using a linear decision boundary. While primarily used for binary classification, it can be extended to multi-class classification using techniques such as softmax regression (multinomial logistic regression), which we employed in this study [32].
  • Random Forest (RF): An ensemble-based model that constructs multiple decision trees and aggregates their predictions to improve accuracy and reduce overfitting. It is effective for handling high-dimensional and non-linear data [33].
  • XGBoost (Extreme Gradient Boosting): A high-performance gradient boosting algorithm optimized for speed and accuracy. It builds decision trees iteratively to minimize classification error and is widely used for structured data classification, including text classification [34].
Each model was trained on Term Frequency-Inverse Document Frequency (TF-IDF) vectorized text features, which represent words based on their frequency in the dataset while reducing the impact of common words [35]. The dataset was split into training and testing subsets while preserving the original class distribution to ensure a balanced evaluation of sentiment classification performance.
The hyperparameters used for each machine learning model are summarized in Table 1. Unless otherwise stated, default settings from the scikit-learn and xgboost libraries were used.
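A condensed sketch of this baseline pipeline, assuming the scikit-learn and xgboost settings listed in Table 1, is shown below. The input file name and the 80/20 split ratio are assumptions made for illustration; preserving the class distribution in the split corresponds to the stratify argument here.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
from xgboost import XGBClassifier

df = pd.read_csv("labeled_tweets.csv")               # hypothetical file with "text" and "label" columns
X = TfidfVectorizer().fit_transform(df["text"])      # TF-IDF features
y = LabelEncoder().fit_transform(df["label"])        # Negative/Neutral/Positive -> 0/1/2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42  # assumed 80/20 stratified split
)

models = {
    "MNB": MultinomialNB(),
    "SVM": SVC(kernel="linear"),
    "LR": LogisticRegression(multi_class="multinomial", solver="lbfgs", max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=100, random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "XGB": XGBClassifier(use_label_encoder=False, eval_metric="mlogloss"),
}
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
```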

3.4.2. AraBERT Model

To improve classification performance, we leverage the AraBERT model, a variant of the Bidirectional Encoder Representations from Transformers (BERT) tailored for the Arabic language [24]. This model is pre-trained on extensive Arabic text corpora using a masked language modeling objective. The architecture is based on the Transformer, featuring multi-head self-attention and deep bidirectional layers, which enable the model to capture long-range dependencies and rich contextual information from input texts.
The input text is initially processed by the BERT tokenizer, which segments the text into sub-word units (tokens) and maps them to unique numerical identifiers. Along with token IDs, the tokenizer generates positional encodings and attention masks that differentiate between actual tokens and padding. This representation, composed of input IDs and attention masks, serves as the input to the BERT encoder, thereby allowing the model to produce contextualized embeddings for each token in the sequence.
Following the generation of contextualized embeddings, the representation corresponding to the special [CLS] token is extracted as a summary of the entire sequence. This [CLS] embedding is fed into a linear classifier head—a fully connected layer—that maps the high-dimensional embedding to a vector of logits, where each logit corresponds to one of the sentiment classes (i.e., Positive, Negative, Neutral). The logits are then converted into a probability distribution using the softmax function. Fine-tuning is performed by minimizing the cross-entropy loss between these predicted probabilities and the ground truth labels, with gradients propagated back through both the classifier head and the BERT layers. Although some approaches freeze the base layers, our methodology allows for full fine-tuning, ensuring that the model adapts comprehensively to the nuances of the sentiment analysis task.
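The flow from raw text to class probabilities described above can be made concrete with a short sketch. The checkpoint name is an assumption (the specific AraBERT variant is not stated here), and the classification head is randomly initialized until fine-tuning, so the probabilities below are meaningful only after training.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "aubmindlab/bert-base-arabertv02"     # assumed AraBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

# Tokenization yields input_ids plus an attention mask separating real tokens from padding.
batch = tokenizer(["مثال على تغريدة عن المهرجان"],   # placeholder Arabic sentence
                  padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**batch).logits                 # [CLS] summary -> linear head -> 3 logits
probs = torch.softmax(logits, dim=-1)              # distribution over Negative/Neutral/Positive
```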
The model is fine-tuned on a labeled sentiment dataset that has been stratified to preserve class distributions, particularly given the overwhelming prevalence of the Neutral class. The training process employs the AdamW optimizer with a carefully chosen learning rate (e.g., 2 × 10⁻⁵), weight decay (e.g., 0.01), and a warm-up phase to stabilize early training dynamics. The entire training pipeline is orchestrated using a high-level framework that manages data batching, evaluation, and logging.
The training pipeline is designed to optimize the model using a stratified dataset that is divided into training and evaluation subsets to preserve the original class distribution—an essential consideration given the predominance of the Neutral class. The following hyperparameters were configured in the training process:
  • Number of Training Epochs: 5
  • Batch Size: 8 per device for both training and evaluation
  • Evaluation Strategy: Conducted at the end of each epoch (eval_strategy="epoch")
  • Learning Rate: Set to 2 × 10⁻⁵ to enable fine-grained updates
  • Weight Decay: 0.01 to help mitigate overfitting
  • Warmup Steps: 500 to gradually ramp up the learning rate during the initial training phase
  • Logging Steps: 10, ensuring frequent updates on training progress
  • Random Seed: 42, to ensure reproducibility
These hyperparameters were chosen to balance convergence speed and model stability, particularly in the presence of class imbalance.
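Assuming the Hugging Face transformers Trainer as the high-level framework referenced above, the configuration corresponding to these hyperparameters could be sketched as follows. The checkpoint name, the input file, and the tokenization length are assumptions; the real pipeline additionally stratifies the split by label as described earlier.

```python
import pandas as pd
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

checkpoint = "aubmindlab/bert-base-arabertv02"          # assumed AraBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

df = pd.read_csv("labeled_tweets.csv")                  # hypothetical file: "text", "label" (0/1/2)
dataset = Dataset.from_pandas(df)
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)
split = dataset.train_test_split(test_size=0.2, seed=42)  # the study uses a stratified split

args = TrainingArguments(
    output_dir="arabert-festival-sentiment",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    eval_strategy="epoch",
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_steps=500,
    logging_steps=10,
    seed=42,
)

trainer = Trainer(model=model, args=args,
                  train_dataset=split["train"], eval_dataset=split["test"])
trainer.train()
```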

3.5. Result Evaluation and Comparison

To assess the performance of the sentiment classification models, we used four evaluation metrics: Accuracy, Precision, Recall, and F1-score. These metrics provide a comprehensive assessment of classification performance, particularly in multi-class sentiment analysis (Positive, Neutral, Negative).

3.5.1. Accuracy

Accuracy measures the overall correctness of the model by calculating the proportion of correctly classified instances across all sentiment categories. While accuracy provides a general performance measure, it may not be reliable when the dataset is imbalanced. The accuracy formula is given as:
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
where:
  • TP (True Positives): Correctly classified instances of a given sentiment (Positive, Neutral, or Negative).
  • TN (True Negatives): Correctly classified instances of other sentiment categories.
  • FP (False Positives): Incorrectly classified instances where the model predicted a sentiment that does not match the actual label.
  • FN (False Negatives): Incorrectly classified instances where the model failed to identify the correct sentiment.

3.5.2. Precision

Precision measures how many of the instances classified as a particular sentiment were correctly identified. In the context of festival sentiment analysis, high precision ensures that when the model predicts a sentiment category (Positive, Neutral, or Negative), it is correctly classified as such, minimizing false classifications. Precision is calculated as follows:
\text{Precision} = \frac{TP}{TP + FP}

3.5.3. Recall

Recall measures how well the model correctly identifies instances of each sentiment category. A high recall score means the model successfully captures most of the actual occurrences of a given sentiment. The recall formula is defined as follows:
\text{Recall} = \frac{TP}{TP + FN}

3.5.4. F1-Score

The F1-score provides a balanced evaluation by combining precision and recall into a single metric. It is particularly useful when dealing with class imbalances, ensuring that neither precision nor recall dominates the evaluation. The F1-score is computed as follows:
\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
Since this study involves three sentiment categories (Positive, Neutral, Negative), precision, recall, and F1-score are calculated for each sentiment class separately and then averaged using a weighted average. This ensures that each class contributes proportionally to the final evaluation scores.
Each of these metrics was computed for all six machine learning models (MNB, SVM, LR, RF, KNN, XGBoost) as well as for the AraBERT transformer-based model. The results were compared to determine the most effective approach for sentiment classification of Al-Baha’s agricultural festival discussions.
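These weighted-average scores correspond directly to standard scikit-learn metrics. A minimal sketch, using toy prediction arrays as placeholders for any one model's test-set output, is:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy gold and predicted labels standing in for one model's test-split output.
y_true = ["Neutral", "Positive", "Negative", "Neutral", "Positive", "Neutral"]
y_pred = ["Neutral", "Positive", "Neutral",  "Neutral", "Positive", "Positive"]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0   # per-class scores weighted by support
)
print(f"Accuracy={accuracy:.2f} Precision={precision:.2f} Recall={recall:.2f} F1={f1:.2f}")
```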

4. Results and Discussion

4.1. Model Performance Overview

Figure 3 presents the accuracy scores of all models. AraBERT achieved the highest accuracy (0.85), demonstrating its superior ability to handle sentiment classification in Arabic text. Among traditional machine learning models, Support Vector Machine (SVM) (0.82) and Random Forest (RF) (0.81) outperformed the other classifiers, highlighting their robustness in capturing sentiment nuances. In contrast, k-Nearest Neighbors (KNN) (0.73) and Multinomial Naïve Bayes (MNB) (0.74) had the lowest accuracy, suggesting that distance-based (KNN) and probabilistic (MNB) models struggle with Arabic text that contains complex sentiment expressions.

4.2. Precision and Recall Analysis

Figure 4 and Figure 5 present the precision and recall scores, respectively. Precision measures how many of the predicted instances for a particular sentiment were actually correct. A high precision score indicates that the model made fewer false positive errors. AraBERT achieved the highest precision (0.84), meaning it was highly confident in its predictions with minimal false positives. RF (0.83) and SVM (0.82) followed closely, reinforcing their effectiveness in high-dimensional text classification. KNN and MNB had the lowest precision, suggesting they misclassified a significant portion of data, likely due to their oversimplified classification strategies.
Recall measures the model’s ability to correctly identify all instances of a given sentiment class. A high recall score indicates that the model successfully captures most relevant sentiment expressions. AraBERT had the highest recall (0.85), effectively capturing a broad range of sentiment nuances. SVM (0.82) and RF (0.81) also performed well, demonstrating their reliability in detecting sentiment accurately. KNN and MNB had the lowest recall, indicating that many actual sentiment instances were missed, which can be problematic in applications where comprehensive sentiment detection is essential.

4.3. F1-Score: Balancing Precision and Recall

The F1-score, presented in Figure 6, balances precision and recall to provide a more comprehensive evaluation of model performance. AraBERT exhibited the highest F1-score (0.85), confirming its ability to maintain both high precision and recall. SVM achieved an F1-score of 0.80, making it the best-performing traditional ML model, striking a strong balance between precision and recall. MNB and KNN had the lowest F1-scores, reinforcing their struggles in sentiment classification due to their inherent limitations in handling textual data complexity.

4.4. Summary and Insights of Model Results

The key findings from this analysis are as follows:
  • AraBERT consistently outperformed all other models across all evaluation metrics, making it the most effective choice for sentiment analysis in this study.
  • SVM and RF demonstrated strong performance among traditional ML models, making them viable alternatives for scenarios where computational efficiency is a concern.
  • MNB and KNN struggled with sentiment classification, likely due to their simplistic assumptions about text feature relationships, making them less suitable for complex sentiment tasks.
These findings suggest that deep learning models such as AraBERT are significantly better suited for Arabic sentiment analysis, while SVM and RF offer strong alternatives for researchers and practitioners who require computationally efficient solutions.

4.5. Practical Implications of Sentiment Trends

The sentiment analysis results offer meaningful insights that can guide practical decision-making for festival organizers, policymakers, and marketing professionals. Identifying patterns in public sentiment enables stakeholders to better understand community perceptions, anticipate challenges, and refine strategies to enhance the success and sustainability of future events.
For instance, positive sentiments often centered around the overall festive atmosphere and appreciation for local agricultural products, suggesting that these aspects should be emphasized in future promotional efforts. Neutral sentiments, on the other hand, may reflect a lack of strong emotional engagement, which can signal areas where the festival experience is underwhelming or unclear. Recognizing such patterns can help organizers refine communication strategies or enhance programming to generate stronger public interest.
Negative sentiments—though relatively limited in number—highlighted several recurring concerns. One notable issue pertained to the relocation of the Pomegranate Festival in 2019 from its traditional rural site near Bidah Valley to Raghadan Park near the city center. Many posts expressed disappointment, with some perceiving this shift as detracting from the festival’s authenticity and its connection to the local farming heritage.
Other negative sentiment patterns focused on the high prices of pomegranates sold during the event, which some users felt conflicted with the festival’s community-focused mission. Additionally, concerns were raised about the timing of certain festivals, particularly when scheduled outside peak tourist seasons or during financially constrained periods (e.g., immediately following holidays or before monthly salaries), thereby limiting public engagement and economic impact.
These insights underscore the value of sentiment analysis as a tool for uncovering nuanced public feedback that may not be captured through traditional evaluation methods. By leveraging these findings, decision-makers can make more informed choices regarding event logistics, pricing policies, scheduling, and venue selection to better align with public expectations and support sustainable rural development goals.

5. Conclusions and Future Work

This study applied sentiment analysis techniques to assess public perceptions of Al-Baha’s agricultural festivals using social media data. By employing both traditional machine learning models (Multinomial Naïve Bayes, Support Vector Machine, Logistic Regression, Random Forest, k-Nearest Neighbors, and XGBoost) and a transformer-based deep learning model (AraBERT), we classified sentiments into Positive, Neutral, and Negative categories. The results demonstrated that AraBERT significantly outperformed all traditional models across all evaluation metrics, achieving the highest accuracy (0.85), precision (0.84), recall (0.85), and F1-score (0.85). Among the traditional models, SVM and Random Forest performed the best, making them strong alternatives when computational efficiency is a priority. In contrast, Naïve Bayes and KNN exhibited the lowest performance, indicating their limitations in handling Arabic text with complex sentiment expressions.
These findings highlight the effectiveness of deep learning in sentiment analysis tasks while also demonstrating the viability of machine learning models in resource-constrained environments. Beyond model performance, the sentiment trends uncovered in this study offer actionable insights for festival organizers, policymakers, and agricultural stakeholders. For instance, the prevalence of positive sentiments around the festive atmosphere and local products suggests that these elements could be emphasized in future promotional strategies to boost engagement. Conversely, negative sentiments revealed recurring concerns related to high product prices, festival scheduling outside peak tourism periods, and venue changes that were perceived to reduce the event’s authenticity. These insights provide a data-driven foundation for improving event timing, pricing strategies, and location planning, which ultimately supports more inclusive and sustainable rural development.
Although this study provides important insights, several areas for future research can be explored to further enhance sentiment analysis in this domain. Future studies can extend sentiment analysis to identify specific aspects of the festivals (e.g., organization, entertainment, agricultural products) and analyze sentiment for each category separately. Additionally, incorporating data from multiple social media platforms, such as Facebook, Instagram, and YouTube, could provide a more comprehensive view of public sentiment. Furthermore, combining traditional machine learning with deep learning models in an ensemble approach may further improve classification accuracy and robustness. By exploring these directions, future research can enhance the effectiveness of sentiment analysis in understanding public perceptions and contribute to sustainable development by enabling informed decision-making and improving stakeholder engagement in the agricultural sector.

Author Contributions

Conceptualization, M.A.; methodology, M.A. and F.A.; software, M.A. and F.A.; validation, M.A.; formal analysis, M.A. and F.A.; investigation, M.A. and F.A.; resources, M.A. and F.A.; data curation, F.A.; writing—original draft preparation, M.A.; writing—review and editing, M.A. and F.A.; visualization, M.A.; supervision, M.A.; project administration, M.A.; funding acquisition, M.A. All authors reviewed the results and approved the final version of the manuscript.

Funding

The authors extend their appreciation to the Deputyship of Research and Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project number MOE-BU-1-2020.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available on request from the authors.

Acknowledgments

The authors acknowledge the Deputyship of Research and Innovation, Ministry of Education in Saudi Arabia.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Medhat, W.; Hassan, A.; Korashy, H. Sentiment analysis algorithms and applications: A survey. Ain Shams Eng. J. 2014, 5, 1093–1113.
  2. Wankhade, M.; Rao, A.C.S.; Kulkarni, C. A survey on sentiment analysis methods, applications, and challenges. Artif. Intell. Rev. 2022, 55, 5731–5780.
  3. Sánchez-Rada, J.F.; Iglesias, C.A. Social context in sentiment analysis: Formal definition, overview of current trends and framework for comparison. Inf. Fusion 2019, 52, 344–356.
  4. Nagaraj, P.; Deepalakshmi, P.; Muneeswaran, V.; Muthamil Sudar, K. Sentiment analysis on diabetes diagnosis health care using machine learning technique. In Congress on Intelligent Systems: Proceedings of CIS 2021; Springer: Berlin/Heidelberg, Germany, 2022; Volume 1, pp. 491–502.
  5. Ainley, E.; Witwicki, C.; Tallett, A.; Graham, C. Using Twitter Comments to Understand People’s Experiences of UK Health Care During the COVID-19 Pandemic: Thematic and Sentiment Analysis. J. Med. Int. Res. 2021, 23, e31101.
  6. Hilal, A.M.; Alfurhood, B.S.; Al-Wesabi, F.N.; Hamza, M.A.; Duhayyim, M.A.; Iskandar, H.G. Artificial Intelligence Based Sentiment Analysis for Health Crisis Management in Smart Cities. Comput. Mater. Contin. 2021, 71, 143–157.
  7. Lal, M.; Neduncheliyan, S. Enhanced V-Net approach for the emotion recognition and sentiment analysis in the healthcare data. Multimed. Tools Appl. 2024, 83, 72765–72787.
  8. Serrano-Guerrero, J.; Bani-Doumi, M.; Romero, F.P.; Olivas, J.A. A 2-tuple fuzzy linguistic model for recommending health care services grounded on aspect-based sentiment analysis. Expert Syst. Appl. 2024, 238, 122340.
  9. Toçoğlu, M.A.; Onan, A. Sentiment Analysis on Students’ Evaluation of Higher Educational Institutions. In Proceedings of the Intelligent and Fuzzy Techniques: Smart and Innovative Solutions, Istanbul, Turkey, 21–23 July 2020; Kahraman, C., Cevik Onar, S., Oztaysi, B., Sari, I.U., Cebi, S., Tolga, A.C., Eds.; Springer: Cham, Switzerland, 2021; pp. 1693–1700.
  10. Altrabsheh, N.; Gaber, M.M.; Cocea, M. SA-E: Sentiment analysis for education. In Intelligent Decision Technologies; IOS Press: Amsterdam, The Netherlands, 2013; pp. 353–362.
  11. Dake, D.K.; Gyimah, E. Using sentiment analysis to evaluate qualitative students’ responses. Educ. Inf. Technol. 2023, 28, 4629–4647.
  12. Hussain, T.; Yu, L.; Asim, M.; Ahmed, A.; Wani, M.A. Enhancing E-Learning Adaptability with Automated Learning Style Identification and Sentiment Analysis: A Hybrid Deep Learning Approach for Smart Education. Information 2024, 15, 277.
  13. Cao, Y.; Sun, Z.; Li, L.; Mo, W. A Study of Sentiment Analysis Algorithms for Agricultural Product Reviews Based on Improved BERT Model. Symmetry 2022, 14, 1604.
  14. Yadav, S.; Kaushik, A.; Sharma, M.; Sharma, S. Disruptive Technologies in Smart Farming: An Expanded View with Sentiment Analysis. AgriEngineering 2022, 4, 424–460.
  15. Kaushik, A.; Yadav, S.; Sharma, S.; McDaid, K. Harvesting Insights: Sentiment Analysis on Smart Farming YouTube Comments for User Engagement and Agricultural Innovation. IEEE Technol. Soc. Mag. 2024, 43, 91–100.
  16. Nimirthi, P.; Venkata Krishna, P.; Obaidat, M.S.; Saritha, V. A Framework for Sentiment Analysis Based Recommender System for Agriculture Using Deep Learning Approach. In Social Network Forensics, Cyber Security, and Machine Learning; Springer: Singapore, 2019; pp. 59–66.
  17. Zikang, H.; Yong, Y.; Guofeng, Y.; Xinyu, Z. Sentiment analysis of agricultural product ecommerce review data based on deep learning. In Proceedings of the 2020 International Conference on Internet of Things and Intelligent Applications (ITIA), Zhenjiang, China, 27–29 November 2020; pp. 1–7.
  18. Liu, R.; Wang, H.; Li, Y. AgriMFLN: Mixing Features LSTM Networks for Sentiment Analysis of Agricultural Product Reviews. Appl. Sci. 2023, 13, 6262.
  19. Rehman, M.; Razzaq, A.; Baig, I.A.; Jabeen, J.; Tahir, M.H.N.; Ahmed, U.I.; Altaf, A.; Abbas, T. Semantics Analysis of Agricultural Experts’ Opinions for Crop Productivity through Machine Learning. Appl. Artif. Intell. 2022, 36, 2012055.
  20. Leelawat, N.; Jariyapongpaiboon, S.; Promjun, A.; Boonyarak, S.; Saengtabtim, K.; Laosunthara, A.; Yudha, A.K.; Tang, J. Twitter data sentiment analysis of tourism in Thailand during the COVID-19 pandemic using machine learning. Heliyon 2022, 8, e10894.
  21. Gupta, D.; Bhargava, A.; Agarwal, D.; Alsharif, M.H.; Uthansakul, P.; Uthansakul, M.; Aly, A.A. Deep Learning-Based Truthful and Deceptive Hotel Reviews. Sustainability 2024, 16, 4514.
  22. Cao, Z.; Xu, H.; Teo, B.S.X. Sentiment of Chinese Tourists towards Malaysia Cultural Heritage Based on Online Travel Reviews. Sustainability 2023, 15, 3478.
  23. Alzahrani, A.; Alshehri, A.; Alamri, M.; Alqithami, S. AI-Driven Innovations in Tourism: Developing a Hybrid Framework for the Saudi Tourism Sector. AI 2025, 6, 7.
  24. Antoun, W.; Baly, F.; Hajj, H. AraBERT: Transformer-based Model for Arabic Language Understanding. In Proceedings of the 4th Workshop on Open-Source Arabic Corpora and Processing Tools, with a Shared Task on Offensive Language Detection, Marseille, France, 11–16 May 2020; Al-Khalifa, H., Magdy, W., Darwish, K., Elsayed, T., Mubarak, H., Eds.; European Language Resource Association: Marseille, France, 2020; pp. 9–15.
  25. Inoue, G.; Alhafni, B.; Baimukan, N.; Bouamor, H.; Habash, N. The Interplay of Variant, Size, and Task Type in Arabic Pre-trained Language Models. In Proceedings of the Sixth Arabic Natural Language Processing Workshop, Kiev, Ukraine, 19 April 2021; Habash, N., Bouamor, H., Hajj, H., Magdy, W., Zaghouani, W., Bougares, F., Tomeh, N., Abu Farha, I., Touileb, S., Eds.; Association for Computational Linguistics: Kyiv, Ukraine, 2021; pp. 92–104.
  26. Abdelali, A.; Hassan, S.; Mubarak, H.; Darwish, K.; Samih, Y. Pre-training BERT on Arabic tweets: Practical considerations. arXiv 2021, arXiv:2102.10684.
  27. Zedeus. Nitter: A Free and Open-Source Alternative Twitter Front-End. Available online: https://github.com/zedeus/nitter (accessed on 9 March 2025).
  28. Mitchell, R. Web Scraping with Python: Collecting Data from the Modern Web, 1st ed.; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2015.
  29. McCallum, A.; Nigam, K. A comparison of event models for naive Bayes text classification. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, Madison, WI, USA, 27 July 1998; Volume 752, pp. 41–48.
  30. Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
  31. Cover, T.; Hart, P. Nearest neighbor pattern classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
  32. Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4.
  33. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  34. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 13–17 August 2016; pp. 785–794.
  35. Ramos, J. Using tf-idf to determine word relevance in document queries. In Proceedings of the First Instructional Conference on Machine Learning, Citeseer, Los Angeles, CA, USA, 23–24 June 2003; Volume 242, pp. 29–48.
Figure 1. The methodology pipeline for sentiment analysis of agricultural festivals in Al-Baha.
Figure 2. Distribution of sentiment labels in the dataset.
Figure 3. Model accuracy comparison.
Figure 4. Model precision comparison.
Figure 5. Model recall comparison.
Figure 6. Model F1-Score comparison.
Table 1. Hyperparameter configuration for baseline machine learning models.

Model | Library Used | Key Hyperparameters
Multinomial Naïve Bayes | scikit-learn | Default settings
Support Vector Machine (SVM) | scikit-learn | kernel="linear"
Logistic Regression | scikit-learn | multi_class="multinomial", solver="lbfgs", max_iter=1000
Random Forest | scikit-learn | n_estimators=100, random_state=42
k-Nearest Neighbors (KNN) | scikit-learn | n_neighbors=5
XGBoost | xgboost | use_label_encoder=False, eval_metric="mlogloss"
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

