Weight Calculation

We filtered the tweets to keep only those containing at least one term from the high level (L1), except for the roadwork/damage event type, for which we keep tweets that contain at least one term from level L1 and at least one term from level L2. For the roadwork/damage event, L1 includes terms such as maintenance and development, while L2 includes terms such as road and street. Thus, at least one term from each level must be found to assign a label to a tweet. After that, the weight is calculated using the following equation:

$$W_E = \left(\text{size}(Lx_1) \times W_n\right) + \left(\text{size}(Lx_2) \times W_{n-1}\right) + \dots + \left(\text{size}(Lx_n) \times W_1\right) \tag{2}$$

where $W_E$ is the total weight for the event $E$, $Lx_i$ is the list of terms matched at level $i$, and $W_i$ is the weight assigned to that level. Since we have four levels, the highest weight is 4; hence, each term found at level $Lx_1$ has a weight of 4.
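The weighting in Equation (2) can be illustrated with a short Python sketch; the function and variable names below are ours, chosen for illustration, and do not reflect the actual Iktishaf+ implementation.

```python
def event_weight(matches_per_level):
    """Compute W_E for one tweet, per Equation (2).

    matches_per_level: list of matched-term lists, ordered from the most
    specific level Lx_1 down to Lx_n. With n levels, terms matched at
    Lx_1 get weight n, terms at Lx_2 get weight n-1, ..., Lx_n gets 1.
    """
    n = len(matches_per_level)
    return sum(len(terms) * (n - i) for i, terms in enumerate(matches_per_level))

# Example with 4 levels: 2 matches at L1, 1 at L2, 0 at L3, 3 at L4
# W_E = 2*4 + 1*3 + 0*2 + 3*1 = 14
print(event_weight([["accident", "crash"], ["road"], [], ["now", "here", "big"]]))
```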

### Sort and Filter Automatic Labeled Tweets

We used the weight to sort the labeled tweets and then filtered them: we specified a threshold to discard labeled tweets with low weight and kept those with high weight, because the latter are most likely related to the event. The same process is repeated for each event type. Throughout this process we take into account that a tweet can have multiple labels.
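The sort-and-filter step above can be sketched as follows; the tuple layout and threshold semantics are assumptions for illustration.

```python
def sort_and_filter(labeled_tweets, threshold):
    """Keep only tweets whose weight reaches the threshold, highest first.

    labeled_tweets: list of (tweet_text, weight) pairs for one event type.
    Low-weight tweets are discarded as unlikely to be about the event.
    """
    kept = [(text, w) for text, w in labeled_tweets if w >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```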

### Testing and Evaluation

The testing and evaluation of the proposed automatic labeling tool are performed in two stages. The first stage takes place at the beginning, in order to update and modify the list of terms in each level of the dictionary, such as moving terms from one level to another or adding new terms. After each initial labeling iteration, we extracted the top vocabularies to search for important new terms that are not yet included in the dictionary of that event. For instance, in the first iteration most of the weather events concerned rain, so the automatically generated dictionary contained mostly rain-related terms. This stage is therefore important for inserting missing terms, after which we manually update the dictionary to add their synonyms if they exist. The second stage is applied to randomly selected tweets to make sure that the tweets are labeled correctly. The main goal is to reduce the number of false positives (labeled as an event but not one) more than the number of false negatives (an event but labeled as not related), in order to reduce the chance of including non-event tweets in the training set of the event classifiers. Missing a few event tweets will not have a major effect on the size of the training set.

### 3.4.2. Automatic Labeling for Irrelevant Tweets

Before detecting events, we need to train a classifier to separate tweets into relevant and irrelevant to traffic. The training set contains positive tweets (related to traffic) and negative tweets (not related). Even though the filtering classifiers are trained before the event classifiers, we generate the training set for events before that of the filtering classifiers, because the output of the automatic event labeling process is used as the positive class for tweet filtering. For the negative class, we applied another automatic filtering approach, searching for tweets that do not contain any terms related to traffic and transportation. We searched the tweets collected by geo-filtering and excluded tweets posted by any account related to traffic.
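The negative-class selection described above can be sketched as follows; the tweet dictionary keys (`"user"`, `"text"`) and function name are hypothetical, not the tool's actual data model.

```python
def negative_candidates(tweets, traffic_terms, traffic_accounts):
    """Select negative (not traffic-related) samples for the filter classifier.

    Keeps tweets that contain no traffic/transportation term and were not
    posted by a known traffic-related account.
    """
    out = []
    for tweet in tweets:
        if tweet["user"] in traffic_accounts:
            continue  # exclude specialized traffic accounts entirely
        text = tweet["text"].lower()
        if any(term in text for term in traffic_terms):
            continue  # any traffic term disqualifies the tweet as negative
        out.append(tweet)
    return out
```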

### *3.5. Feature Extractor Component (FEC)*

We used the CountVectorizer and IDFModel algorithms provided in the Spark ML package to generate the feature vectors and rescale them. IDFModel applies TF-IDF (Term Frequency-Inverse Document Frequency), which reflects the importance of a token to a document (tweet) in a corpus. The TF-IDF is the product of TF and IDF, where TF(t, d) is the frequency of appearance of token t in document d, while the IDF is calculated using Equation (3). A detailed explanation was given in our earlier paper [6]:

$$\text{IDF}(t, D) = \log \frac{|D| + 1}{\text{DF}(t, D) + 1} \tag{3}$$
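Equation (3) can be checked with a minimal plain-Python sketch (for exposition only; the paper's pipeline uses Spark's IDFModel, not this code):

```python
import math

def idf(df_t, num_docs):
    """Equation (3): IDF(t, D) = log((|D| + 1) / (DF(t, D) + 1)).

    df_t: number of documents containing token t; num_docs: corpus size |D|.
    The +1 smoothing keeps the ratio finite for unseen tokens.
    """
    return math.log((num_docs + 1) / (df_t + 1))

def tfidf(tf_t_d, df_t, num_docs):
    """TF-IDF is the product of the term frequency and the IDF."""
    return tf_t_d * idf(df_t, num_docs)
```

A token appearing in every document gets IDF 0, so ubiquitous tokens contribute nothing to the feature vector.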

### *3.6. Tweets Filtering Component (TFC)*

### 3.6.1. Model Training

To filter tweets into traffic-related and not related, we built a classifier using supervised ML classification algorithms. After labeling the tweets using both the automatic and manual labeling approaches, we used the Spark ML library to build and train the models. We built three models using the SVM, naïve Bayes, and logistic regression algorithms. Our dataset is imbalanced because the number of samples in the negative class (not related to traffic) is much higher than in the positive class (traffic-related). This can lead to misleading evaluation results, especially for accuracy. To address this issue, we applied a random under-sampling approach to randomly remove some tweets from the negative class.
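Random under-sampling of the majority class can be sketched as follows; the function name and balanced 1:1 target ratio are illustrative assumptions.

```python
import random

def undersample(negatives, positives, seed=42):
    """Randomly drop negative samples so both classes are equal in size.

    Assumes len(negatives) >= len(positives); the seed makes the
    subsample reproducible across runs.
    """
    rng = random.Random(seed)
    kept_negatives = rng.sample(negatives, len(positives))
    return positives + kept_negatives
```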

### 3.6.2. Hyperparameter Tuning

After data processing and feature extraction, we need to tune the parameters to obtain the best-performing model. Grid search is one of the well-known ways to search for the best tuning-parameter values. To do that, we specify a set of candidate tuning-parameter values and then evaluate them. Cross-validation can help generate samples from the training set to evaluate each distinct parameter-value combination and see how they perform. After that, we can take the best tuning-parameter combination and use it with the entire training set to train the final model.

Spark ML supports model selection using tools such as CrossValidator to select a reasonable parameter setting from a grid of parameters. We used 5-fold cross-validation, so CrossValidator generates five (training, testing) dataset pairs. The average evaluation metric over the five models is then computed and the best parameters are found. In the future, we plan to improve our method and use 10-fold validation.
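The grid-search-with-cross-validation procedure can be sketched in plain Python as follows. This is a conceptual stand-in, not Spark's CrossValidator API; the `fit_eval` callback (an assumption of this sketch) represents "train a model on one split and score it".

```python
import itertools
import random

def grid_search_cv(train, param_grid, fit_eval, k=5, seed=42):
    """Evaluate every parameter combination with k-fold CV; return the best.

    fit_eval(train_part, test_part, params) -> evaluation metric
    (higher is better).
    """
    rng = random.Random(seed)
    idx = list(range(len(train)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint test folds

    keys = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = []
        for fold in folds:
            held_out = set(fold)
            test_part = [train[i] for i in fold]
            train_part = [train[i] for i in idx if i not in held_out]
            scores.append(fit_eval(train_part, test_part, params))
        avg = sum(scores) / k  # average metric over the k folds
        if avg > best_score:
            best_params, best_score = params, avg
    return best_params, best_score
```

The winning combination would then be used to retrain on the full training set, as described above.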

### 3.6.3. Classification Model Evaluation

We compare the performance using the common evaluation metrics: accuracy, recall, precision, and f-score. The model that achieves the highest results is selected for the final classification of tweets. Since we are using binary classification to classify tweets into relevant (class 1) and irrelevant (class 0), tuning the prediction threshold is very important. The default threshold is 0.5, and it can be any value in the range [0, 1]. If the estimated probability of class label 1 is greater than the specified threshold, the prediction is 1; otherwise, it is class 0. Thus, specifying a high threshold value will encourage the model to predict 0 more often, and vice versa. In our case, we need to minimize the chance of mistakenly predicting an irrelevant tweet (class 0) as relevant (class 1), so we set the threshold to 0.8.
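The threshold rule reduces to a one-line decision; this sketch uses our own function name and mirrors the behavior described above.

```python
def predict_label(prob_class1, threshold=0.8):
    """Predict 1 (relevant) only when P(class = 1) clears the threshold.

    A high threshold (0.8 here) makes the classifier conservative about
    declaring a tweet relevant, cutting down false positives.
    """
    return 1 if prob_class1 > threshold else 0
```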

### *3.7. Events Detection Component (EDC)*

In this work we focus on detecting the following event types: Fire, Weather, Social, Traffic Condition, Roadwork/Road Damage, and Accident. The tweets are labeled using the labeling method explained in Section 3.4. The classes are not mutually exclusive, as a tweet can be about multiple events at the same time. For instance, a tweet might describe an accident that occurred due to bad weather; hence, two labels, accident and weather, will be assigned to this tweet. To address this, we used binary classification and trained a model for each event type. For each model, we need positive and negative samples. Given an event type T, the tweets labeled as T are considered positive samples, while all the remaining tweets belonging to the other event types are considered negative samples. Moreover, a tweet that has more than one label, such as accident and weather, is included in the positive class of the training sets of both the accident and weather classifiers. However, as the negative class is very large compared to the positive class, because it includes all the tweets about the other event types, we have an imbalanced dataset problem. To address it, we followed the same approach explained in Section 3.6.1, applying random under-sampling.
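The one-classifier-per-event-type training-set construction can be sketched as follows; the data layout (text paired with a set of labels) is an assumption for illustration.

```python
def binary_training_sets(labeled_tweets, event_types):
    """Build one (positives, negatives) pair per event type.

    labeled_tweets: list of (text, set_of_labels) pairs. A tweet labeled
    both 'accident' and 'weather' becomes a positive sample for both
    classifiers; everything not labeled T is negative for T.
    """
    sets = {}
    for event in event_types:
        positives = [text for text, labels in labeled_tweets if event in labels]
        negatives = [text for text, labels in labeled_tweets if event not in labels]
        sets[event] = (positives, negatives)
    return sets
```

In practice the negative side would then be under-sampled as in Section 3.6.1.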

### *3.8. Spatio-Temporal Extractor Component (STEC)*

The location is the foremost matter of interest in the transportation analysis and event detection domains. Thus, we applied different techniques for location extraction from the Tweet object.

### 3.8.1. Text, Hashtag and Username

The main approach is to extract location details mentioned within the post. A location might be explicitly mentioned in the tweet's message, or it might be part of the hashtags or account names, especially if the tweets are posted by a specialized account that posts about events and traffic conditions in the cities. We created a list of city names in Saudi Arabia to search for city names in the tweet message. We pass the Arabic name list to the stemmer before using it, because we extract place names from the pre-processed text. In addition, we searched for city names in English, to extract them from accounts or hashtags, using a predefined list of city names in English. We also created a list of specialized accounts that post about traffic in Saudi Arabian cities but whose names do not include a city name; we use this list to find the city based on the username.
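The lookup order described above (stemmed text, then hashtags, then the specialized-account list) can be sketched as follows; all names and the dictionary shapes are our illustrative assumptions.

```python
def extract_city(tokens, hashtags, username,
                 ar_cities, en_cities, account_city):
    """Return the first city found in the tweet, or None.

    tokens: stemmed Arabic tokens of the pre-processed text.
    ar_cities: stemmed Arabic city name -> canonical city.
    en_cities: lowercase English city name -> canonical city.
    account_city: specialized traffic account username -> its city.
    """
    for token in tokens:  # city mentioned in the message itself
        if token in ar_cities:
            return ar_cities[token]
    for hashtag in hashtags:  # English city name inside a hashtag
        for name, city in en_cities.items():
            if name in hashtag.lower():
                return city
    # fall back to the specialized-account list keyed by username
    return account_city.get(username)
```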

### 3.8.2. Tweets Geo Attributes

One approach is to obtain coordinates from geotagged tweets by getting the latitude and longitude from the 'coordinates' or 'place' objects. The 'place' child object consists of several attributes, including 'place\_type', 'place\_name', and 'country\_code'. The place type is either city or point of interest (poi). However, only a small fraction of tweets are geotagged, because most users disable location services on their smartphones for privacy reasons.

### 3.8.3. User Profile

Location information is also extracted from user profiles, where users usually manually write the country and city name. We have to consider that this information might be written in Arabic or English, with different spellings; for instance, Makkah can be written as Makkah or Mecca. The text is tokenized and then passed to the stemmer before searching for the city name using the created dictionaries.

We cannot rely on the geo attributes alone, because geo coordinates might not be provided, especially by users who disable location services on their smartphones; the value will be 'null' in this case, as shown in the JSON example in Section 3.2. Similarly, we cannot rely on profile information alone, because users do not always fill in these fields with accurate information. In addition, they might travel to another city or country, in which case the profile information does not reflect their current location. Besides, both approaches do not necessarily represent the place of the event, because users might post about events that occur in other cities.

In this work, we considered the text as the main source of location information because it is more accurate than the other attributes, and because we need to find the location where the events occur, not where they were posted. If this information does not exist, we extract it from the coordinates or place attributes. The last option is to find the location from the profile, which is less accurate than the others, since users specify their profile information manually and do not usually update it whenever they travel to another city. For visualization, the geospatial coordinates of the detected locations are extracted to enable plotting them on the map.
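The source-priority fallback (text first, then geo attributes, then profile) can be summarized in a few lines; the field names are hypothetical placeholders, not the tool's schema.

```python
def resolve_location(tweet):
    """Pick the best available city, preferring text over geo over profile.

    tweet: dict with optional 'text_city', 'geo_city', 'profile_city'
    entries (hypothetical field names). Returns (city, source) or
    (None, None) when no source yielded a location.
    """
    for source in ("text_city", "geo_city", "profile_city"):
        city = tweet.get(source)
        if city:
            return city, source
    return None, None
```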

### *3.9. Reporting and Visualization Component (RVC)*

This component supports plotting the output of the spatio-temporal information extraction and event detection components to show the detected events and their location and time of occurrence. It also supports finding peak events based on configurable parameters, as well as visualizing the results. Algorithm 4 shows the peak events reporting algorithm. It enables searching for hourly, daily, and monthly peak events where the tweet intensity exceeds a specific threshold value.

Moreover, this component supports visualizing the results of the model evaluation (see Section 3.6.3) for both tweet filtering and event detection components to illustrate which algorithm achieved better results.

**Algorithm 4** Peak Events Reporting.


**for** d **in** duration **do** // Duration can be Hours, Days or Months
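The peak-reporting idea behind Algorithm 4 can be sketched as follows. This is our own minimal reconstruction of the behavior described in the text (bucket tweets by hour, day, or month and report buckets whose intensity exceeds the threshold), not the algorithm's exact listing.

```python
from collections import Counter
from datetime import datetime

def peak_events(timestamps, threshold, bucket="hour"):
    """Report time buckets whose tweet intensity reaches the threshold.

    timestamps: datetime objects of tweets for one detected event.
    bucket: "hour", "day" or "month" granularity, per Algorithm 4's
    configurable duration.
    """
    fmt = {"hour": "%Y-%m-%d %H:00", "day": "%Y-%m-%d", "month": "%Y-%m"}[bucket]
    counts = Counter(ts.strftime(fmt) for ts in timestamps)
    return sorted((b, c) for b, c in counts.items() if c >= threshold)
```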


### *3.10. External and Internal Validation Component (EIVC)*

To validate the Iktishaf+ tool and its ability to detect events and their spatial and temporal nature, we searched various sources on the web, including news media, and compared the information extracted by our tool with the information in those sources. However, news media do not report all existing events, and even when they do, they might not mention the time of occurrence. In such cases, we searched the tweets we have related to the event to find the validation information we need. We consider this process internal validation because it is based on the collected tweets. To find time information, we go back to the earliest tweet we have about the event; if the time is not mentioned explicitly in the tweet text, we take the posting time of the tweet as the starting time of the event. The process of searching the external sources was done manually, but we plan to automate it in the future.

### **4. Analysis and Results**

*4.1. Detected Events*

### 4.1.1. Validation of Detected Events

To validate the ability of Iktishaf+ and verify whether a detected event really happened on the detected date and at the detected location, we searched external validation sources (news media) or an internal source (Twitter) (see Section 3.10). Table 1 shows a comparison between information extracted by Iktishaf+ and information from external/internal sources. We cannot discuss all the detected events, due to the limited number of pages allowed and the large period covered in this work (September 2018–October 2019), so we selected samples of different event types that occurred at different times and locations. Column 1 shows the event types. Column 2 lists the location (city name) where the event occurred, as found by searching various external sources, as well as the location extracted by our tool. Column 3 gives the date when the event occurred. Column 4 gives the time of occurrence, to assess the ability of our tool to detect the time. We compared the starting time mentioned in the web sources with the peak time shown by Iktishaf+. As explained in Section 3.10, the time information may not be mentioned in the news reports; in this case, we searched the collected tweets, took the earliest tweet about the event, and extracted the time from the timestamp attached to it.


**Table 1.** Example of Validation.

Moreover, we drew charts to display the time extracted by Iktishaf+ for each event in the table. Further, the locations of each event are overlayed on top of the Saudi Arabia map.

Row 1 shows the "Fire" event on 1 October 2018. Figure 4 illustrates the locations. Note the largest circle in Riyadh city; this matches the information found in the newspaper, which reported a huge fire that broke out at a power plant in Riyadh [83]. It also mentioned that the Saudi Civil Defense received notification about the fire at 3 p.m. Figure 5 shows the time extracted by our tool. It can be seen that the intensity started rising at 3 p.m. and the highest peak was at 4 p.m.

**Figure 4.** Fire Event on 1 October 2018.

**Figure 5.** Intensity of Detected Fire Event in Riyadh (1 October 2018).

Row 2 validates another "Fire" event. As we found in the web source [84], it was a massive fire that ripped through the main station of the Haramain high-speed railway in Jeddah city on 29 September 2019. It started at 12:35 p.m., according to the Haramain high-speed railway's Twitter account, and burned for hours before being brought under control. Figures 6 and 7 show the location and time information extracted by Iktishaf+. Note the largest circle in Jeddah city as well as the peak time around noon, as shown in Figure 7.

**Figure 6.** Fire Event on 29 September 2019.

**Figure 7.** Intensity of Detected Fire Event in Jeddah (29 September 2019).

Furthermore, Row 3 validates the "Weather" event on 23 November 2018, due to the rains in the Makkah and Jeddah cities as reported in the newspaper [85]. Figure 8 plots the locations of weather events on that date; note that the largest circles are in Jeddah and Makkah. The news article we found was posted around 5:42 a.m. and mentioned that the rains started at dawn. This validates the time extracted by our tool: Figure 9 shows a peak at 4 a.m. Twitter users usually post about events like rain as soon as they happen, most likely earlier than newspapers.

Finally, Row 4 illustrates the "Accident" event on 8 October 2018. Note the largest circle in Riyadh shown in Figure 10. The time of occurrence was not available on the newspaper website, so in this case we went back to the tweets and searched for the earliest tweet that mentioned the same accident we wanted to validate. Then, we extracted the time from the timestamp (created\_at attribute) included in the tweet object and assumed it was the time of occurrence (see Section 3.10). This is the English translation of the earliest tweet we found about this event: "Congestion in every street and accident in the Alwashm bridge, stations crowded, crowded everywhere in Riyadh #Riyadh\_now". The time attached to this tweet is "Mon Oct 08 04:48:17 +0000 2018", and the first peak time detected by Iktishaf+, as shown in Figure 11, is at 5 a.m. It can therefore be seen from the results above that the information from external or internal validation sources matches the information detected by Iktishaf+, which demonstrates the ability of our tool to automatically detect events and their location and time without prior knowledge.

**Figure 8.** Weather Event on 23 November 2018.

**Figure 9.** Intensity of Detected Weather Event in Makkah (23 November 2018).

### 4.1.2. Spatial Analysis

Figure 12 depicts the percentage of extracted location information using the different approaches explained in Section 3.8. As shown in the pie chart, 44% of the information is extracted from the tweet text, while 16% is extracted using the information in the user's profile. However, 27% of tweets about events did not include any information about the location where the event occurred. Only 5% are extracted from geo attributes. This could be because few tweets are geotagged, as users usually turn off the location service on their smartphones. Also, we only look into the geo attributes if the location is not mentioned in the text, because we mainly focus on where the event occurs, not where it was posted.

**Figure 10.** Accident Event on 8 October 2018.

**Figure 11.** Intensity of Detected Accident Event in Riyadh (8 October 2018).

**Figure 12.** Number of Tweets Using Different Location Extraction Approaches.

After inferring city names from the tweets, we group them by province. Figure 13 gives the number of tweets for each event type in the large provinces of Saudi Arabia. It shows the aggregated number of tweets for the whole period (September 2018 to October 2019). It can be seen that the number of events detected in Riyadh is higher than in the other provinces. This could be because Riyadh is the capital and one of the largest cities. Besides, based on the latest report published by INRIX [86], Riyadh is the most congested city in Saudi Arabia, which may explain the results we obtained.

**Figure 13.** The Number of Detected Events in Different Provinces.

### 4.1.3. Spatio-Temporal Analysis

Figure 14 shows the hourly distribution of the aggregated number of tweets for the whole period. We plot only the provinces that show a high number of events, to avoid having too much data in the chart, since Saudi Arabia has 13 provinces. As shown in this figure, the number of tweets starts rising in the morning and becomes very high by the time people return from school and work, usually between 12 p.m. and 5 p.m. The number goes down after 8 p.m., which is expected because traffic flow and activities are usually higher during the day-time than at night-time, due to work and schools.

**Figure 14.** Hourly Distribution of Tweets Divided by Provinces (Aggregated).

### *4.2. Evaluation: Tweet Filtering Classifiers*

To evaluate the trained model of the Tweets Filtering Component (TFC), we used the common statistical metrics: accuracy, precision, recall, and f-score (see Section 3.6.3). Most of the tweets we have are irrelevant to traffic, so the dataset is imbalanced. To eliminate the effect of this on the evaluation results, we have two options: under-sampling the majority class, which represents the irrelevant tweets, or oversampling the minority class, which represents the traffic tweets. In our particular case, it is better to have a large number of samples for both classes; therefore, we decided to apply oversampling to the positive (traffic-related) class by simply duplicating the tweets in that class (see Section 3.6.3). Figure 15 shows that SVM achieved higher results than the other algorithms: 91% for both accuracy and f1-score, 90% for precision, and 89% for recall. The difference between the results achieved by SVM and LR is approximately 1%; however, we selected SVM since it performed better.

**Figure 15.** Numerical Evaluation (Tweets Filtering).

### *4.3. Evaluation: Event Classifiers*

We numerically evaluated the binary classifiers built in the Event Detection Component (EDC). For each event type, we trained three models using the NB, SVM, and LR algorithms and then selected the algorithm that achieved the highest results on most evaluation metrics (accuracy, precision, recall, and f1-score), as discussed in the previous section (also see Section 3.6.3).

Figure 16 illustrates the evaluation results for the four performance metrics (left to right: accuracy, precision, recall, and f-score) in four separate figures. The results show that SVM performs better than the other algorithms for the Weather, Roadwork, and Traffic Condition events, while NB performed best for the Fire event. For Accident events, SVM achieved higher results for accuracy, precision, and f1-score, while NB performed slightly better for recall, where SVM achieves 86% and NB 88%; since SVM performed better on most metrics, we selected SVM for the Accident event. For the Social event, NB achieved higher recall and precision while SVM performed better for accuracy and f1-score; however, we selected NB for Social events since the accuracy and f1-score achieved by SVM are only approximately 1% higher than those of NB. To summarize, SVM has been used for all event types except Fire and Social, where we used NB. Moreover, the highest results in all metrics were achieved by SVM for the Weather event: 98% for both accuracy and f1-score and 97% for recall and precision. We assume the reason is that we have a larger training set for the weather event, since it occurs more often and many users post about it, compared with other event types such as accident and roadwork/damage.

**Figure 16.** Numerical Evaluation (Events Classification).

### **5. Conclusions and Outlook**

Digital societies can be characterized by their increasing desire to express themselves and interact with others. This is being realized through digital platforms such as social media, which have increasingly become convenient and inexpensive sensors compared to physical sensors in many sectors of smart societies. One such major sector is road transportation, which is the backbone of modern economies and globally costs 1.25 million deaths and 50 million human injuries annually. The state of the art in big data-enabled social media analytics for transportation-related studies is limited.

In this paper, we introduced the Iktishaf+ tool, which uses big data and distributed machine learning to automatically detect road traffic events from Arabic tweets. Manual labeling is a time-consuming process that makes supervised classification hard to apply to big data. To address this problem, we proposed an automatic labeling approach to reduce the effort and time of generating a training set for supervised classification models. Traditional manual labeling of text is usually achieved by looking for specific terms to decide whether the text is relevant to the topic or not; hence, our tool was designed to follow the same procedure. We built a dictionary for each event type that contains lists of terms usually used when posting about the events. The dictionaries were generated automatically using the top vocabularies extracted from the manually labeled tweets. Then, we updated them manually to add synonyms and missing vocabulary. After that, we divided them into levels based on the degree of importance and relevance to the event type. Subsequently, the tool looks up the matched terms and labels the tweets based on them. Finally, the tool calculates a weight for each labeled tweet, and only tweets that are highly related to the event are included in the training set.

Furthermore, we developed a location extractor to find the location of the events, allowing spatio-temporal information extraction and visualization of the events. Moreover, using a stemmer is necessary in our work, not only to minimize the feature space for model training but also for term searching during automatic labeling and location extraction. The existing Arabic stemmers are not efficient in our case, as they might remove an important letter from a word and thus lose or change the meaning of important words. Therefore, we designed a light stemmer that enables affix stripping with fewer changes to word meaning.

We built and trained models to filter out tweets irrelevant to traffic events. We focused on six events that might affect road traffic: accident, fire, weather, roadwork/damage, road condition, and social events. Furthermore, we built classifiers to automatically classify tweets into the different events. We used three machine learning algorithms: SVM, NB, and logistic regression. Then, we selected the algorithm that achieves the best results in terms of accuracy, recall, precision, and f-score. Moreover, we applied external validation using online sources such as newspapers: we selected the highest peaks among the detected events and verified whether they occurred or not. The results show that our tool is able to automatically detect events and their spatial and temporal nature without prior knowledge.

The ability of the Iktishaf+ tool to use big data distributed computing technologies could save days, months, or years of computing time, proportional to the size of the data. Moreover, it enables the scalability and interworking of big data analytics software systems. The utilization possibilities of our tool are many, such as detection of transportation-related events for planning and operations, detection of causes of road congestion, understanding public concerns and their reactions to government policies and actions, and many more. An elaboration of these aspects of our work (the novelty, contributions, and utilization) was given in Section 2.4.

We have shown good evidence of the use of automatic labelling, machine learning, and other methods. However, more work is needed to improve the breadth and depth of the work with regard to what can be detected, the diversity of data and machine and deep learning methods, the accuracy of detection in space and time, and the real-time analysis of the tweets.

The real-time operation of the proposed system could depend on a number of factors. Firstly, the definition of the term "real-time" per se depends on the application and the requirements at hand. Some applications may require reactions within sub-second periods while others may tolerate a few minutes or more. Moreover, taking preventive actions also depends on the event and the action being taken. In this particular context, and considering the example of a car accident, the Iktishaf system once trained can detect an accident from tweets instantaneously provided the tweets are available in real-time for the software to process. This can be achieved, for instance, by running the software at the edge or fog layers. The reactive actions, in this case, can mean to inform the police and ambulance services, which can be done in real-time by the software automatically through an automatic emergency call to 911, by sending tweets or other messages to the concerned bodies, or by other emergency strategies available in the area. The messages related to the particular actions in this context can also be propagated using vehicular ad hoc networks (VANETs), dedicated short-range communications (DSRC), etc. A more interesting and lucrative work would be to detect certain events (such as car chases or certain patterns in the traffic that may lead to accidents or certain social events that may cause congestion) before these happen and take actions to prevent the events before they happen. These will require further research and adding additional functionalities to the Iktishaf tool. Our future work will look into these areas.

Digitally and data-driven methods while bringing many benefits to the research and practice have their risks and disadvantages as is the case for anything else. These include issues related to security, privacy, data ownership, lack of standards describing ethical requirements from digital methods and compliance to these standards, the safety of the stakeholders involved in data-driven and digital methods, vulnerabilities of digital platforms, and the digital divide. For a detailed discussion of these issues, see Section 11 in [87], and the references therein. As regards the specific privacy issues of Twitter data, the data we use is openly available. The information about the location of these tweets is also public. However, we have not disclosed any personal information through our analysis. The information we detected and published is of general nature and therefore does not infringe on individual privacy. However, generally speaking, it is possible to detect information from Twitter data that affects individuals' privacy. Our earlier works [57,88,89] have looked into privacy and we plan to extend this to investigate Twitter data privacy in the future.

Our focus in this work is on Saudi Arabia; hence, the tool currently works only with tweets in the Arabic language. The tool can be used in other Arabic-speaking countries, such as Egypt, Kuwait, Bahrain, and the UAE. The system methodology and design of the tool developed in this paper are generic, and therefore the tool can be extended to other countries globally. This will require adapting the tool to additional languages, such as English, Spanish, or Chinese, through additional modules in the pre-processing and clustering modules.

This line of our work deals with the use of Twitter data as a virtual sensor to detect transportation-related events. It is necessary to look also into other sources and methods of sensing in transportation systems, such as inductive loops, floating car data, automatic vehicle locators, virtual loop detectors, cooperative driving, etc. The real vision and potential of smart transportation systems will be realized when different sensing systems will be integrated within the transportation systems as well as with other urban systems. Our other strands of research have looked into different traffic sensing methods such as GPS [51], inductive loops [49], cooperative decision-making for autonomous vehicles [35], and urban travel data from travel cards and other sources [50]. Our future work will look into integrating these sensing methods along with other urban sensing systems, such as healthcare [7].

**Author Contributions:** Conceptualization, E.A. and R.M.; methodology, E.A. and R.M.; software, E.A.; validation, E.A. and R.M.; formal analysis, E.A. and R.M.; investigation, E.A., R.M., A.A., I.K. and T.Y.; resources, R.M., I.K., and A.A.; data curation, E.A.; writing—original draft preparation, E.A. and R.M.; writing—review and editing, R.M., A.A., I.K. and T.Y.; visualization, E.A.; supervision, R.M.; project administration, R.M., I.K. and A.A.; funding acquisition, R.M., A.A. and I.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant number RG-11-611-40. The authors, therefore, acknowledge with thanks the DSR for their technical and financial support.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data was obtained from Twitter. Restrictions apply to the availability of these data.

**Acknowledgments:** The experiments reported in this paper were performed on the Aziz supercomputer at King Abdulaziz University.

**Conflicts of Interest:** The authors declare no conflict of interest.
