**4. Results and Analysis**

We will now discuss the results of our proposed system. Section 4.1 describes the detected pandemic measures and concerns (topics) using LDA. Section 4.2 provides an analysis of the identified measures and concerns as regards their temporal nature (the date) as well as the validation process of the identified concerns using internal sources (Twitter) and external sources (online news media). Section 4.3 provides an analysis in terms of their spatio-temporal nature (the date and the cities). Section 4.4 provides an analysis of the model execution times using distributed computing. Finally, Section 4.5 discusses the relationship between the detected measures and concerns.

#### *4.1. COVID-19: Pandemic Measures, Public Concerns, and Macro-Concerns*

Table 1 lists the fifteen major pandemic measures and public concerns (hereafter, public concerns or simply concerns) discussed by the public on Twitter during the COVID-19 pandemic. They are grouped into six macro-concerns (Column 1): virus infection, daily matters, contain the virus, social sustainability, economic sustainability, and back to normal. Column 2 gives the rank of each concern, based on its percentage of tweets (listed in Column 3). The concerns are listed by macro-concern and, within each macro-concern, by rank. The fifth column shows the top ten keywords related to each concern. These keywords are the clusters extracted by our tool using the LDA approach described in Section 3. We then assigned a label (i.e., a concern) to each cluster based on our understanding of its keywords. To understand a cluster, we looked at the tweets associated with it with the highest probabilities (we refer to these as the top-ranked tweets). We illustrate this with examples below. The first row in the table lists the first public concern, which is **COVID-19 Cases**. Its keywords include health, announce, new, case, register, and infection. These keywords are usually used by individuals and various organizations (e.g., the Ministry of Health in Saudi Arabia) when disseminating information related to the daily number of cases, deaths, etc. The following is one such tweet by the Ministry of Health (the number of cases, deaths, etc. would vary in these tweets).

ΓϝΡϭϱϑ ωΕϩϝΡϝΝαΕϭΩϱϑϭϙΩϱΩΝϝϥϭέϭϙαϭέϱϑΏΓΩϱΩΝϩΏι·ϩϝΡϝϱΝαΕϥωϥϝωΕΓΡιϝ الله ϡϩϡΡέ ϩϑϭ

*The Ministry of Health announces the registration of (382) new cases of infection with the new Coronavirus (COVID 19) and records (35) cases of recovery and (5) cases of death, may God have mercy on them.*
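The ranking and top-ranked-tweet selection described for Table 1 can be illustrated with a minimal sketch. It assumes that each tweet has a per-topic probability vector from LDA and that a tweet is counted towards its most probable topic; the text does not state the exact counting rule, so this is an assumption, and the function names and toy data are hypothetical.

```python
from collections import Counter


def dominant_topic(distribution):
    """Return (topic_id, probability) of the most probable topic for one tweet."""
    topic_id = max(range(len(distribution)), key=lambda i: distribution[i])
    return topic_id, distribution[topic_id]


def rank_concerns(doc_topics):
    """Rank topics by the percentage of tweets for which they are dominant."""
    counts = Counter(dominant_topic(d)[0] for d in doc_topics)
    total = len(doc_topics)
    shares = {topic: 100.0 * n / total for topic, n in counts.items()}
    # Rank 1 is the topic with the largest share of tweets.
    ordered = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, topic, share) for rank, (topic, share) in enumerate(ordered, start=1)]


def top_ranked_tweets(tweets, doc_topics, topic_id, n=3):
    """Return the n tweets with the highest probability for topic_id."""
    scored = sorted(zip(tweets, doc_topics), key=lambda p: p[1][topic_id], reverse=True)
    return [tweet for tweet, _ in scored[:n]]


# Toy, made-up data: four tweets and three topics.
tweets = ["t0", "t1", "t2", "t3"]
doc_topics = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.8, 0.1],
    [0.5, 0.1, 0.4],
]
ranking = rank_concerns(doc_topics)
examples = top_ranked_tweets(tweets, doc_topics, topic_id=0, n=2)
```

Reading the tweets returned by `top_ranked_tweets` for a topic is the manual step that informs the label given to that cluster of keywords.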

The second row lists the concern **Supplications** and its keywords. Supplication is an important part of Muslim beliefs and daily life. Muslims supplicate when they face difficulty or hear good news (they may also supplicate without any good or bad news). To illustrate, Muslims believe that a difficulty is a test from God (Allah), and thus they are encouraged to increase their supplications. During the pandemic, people might pray asking Allah to protect them and others from the virus. Muslims increase their supplications greatly during Ramadhan (the lunar month of fasting that comes once a year). The month of Ramadhan this year (2020) fell between 24 April and 23 May. The keywords for this concern are clearly representative of the label "Supplications".

The third concern is **Quarantine**. This is one of the methods that have been followed by various countries to prevent the spread of the virus by isolating healthy people from potentially unhealthy people who could have been infected with the SARS-CoV-2 virus. The fourth concern is about the **Five Daily Prayers**. Muslims pray in congregations, next to each other without gaps, at mosques five times a day. The Saudi government suspended all the congregational prayers across all mosques in the Kingdom to prevent the spread of the virus. We found tweets from individuals and organizations similar to the following top-ranked tweet.


*Urgent starting from Sunday 8 Shawwal 1441 AH until the end of Saturday 28 Shawwal 1441 AH prayers are permitted to be performed in all mosques of the Kingdom, except for #Makkah*

This explains the presence of the keywords Sunday, Saturday, and Shawwal in the clustered keywords. Shawwal is the tenth month of the Islamic lunar calendar. The fifth identified concern is **Stay Home**. From the top keywords, we can see that people consider staying home a strong measure to stop the spread of COVID-19 and save lives. To increase awareness among people about the importance of their role in fighting the coronavirus outbreak, authorities used the slogan "We are all responsible", which is visible in the keywords of this concern. The sixth concern is **Loan**. The COVID-19 pandemic has severely affected people's financial situation globally, for reasons such as the loss of jobs. People are seeking loans or struggling to repay them, which makes this one of the major pandemic concerns. The seventh concern is **Cleaning Services**. During the pandemic, cleaning services were in high demand, for example for cleaning public areas visited by people carrying the virus. The following tweet is an example of this concern.


*#Riyadh\_municipality continues its tours to sterilize and clean the roads of Riyadh during the period of #curfew to provide a safe and healthy environment for the residents # WAS\_general*



The eighth concern is **Hospital Treatment**. From the top keywords, we can see that the need for blood donation became very high during the pandemic. This was an international concern because fewer people donated blood, possibly because they could not visit hospitals or clinics during the curfew or because they were worried about getting infected. In addition, according to the Food and Drug Administration (FDA) [16], the number of blood donations declined dramatically during the pandemic due to social distancing and the cancellation of blood drives. We found several tweets in our dataset similar to the following, with differing patient file numbers and hospital names. We removed the file number from the tweet to protect the patient's identity.

*Urgent owner of the file [file number removed] needs #Blood #Donation type: accepts all blood types King Faisal Hospital #Jeddah*

The ninth pandemic-related concern is about the **Prevention** of COVID-19. This is clear from the top keywords: reduce, spread, corona, virus, and others. The top tweets that we found for this concern show different prevention strategies applied by the government to instill a sense of responsibility and to increase awareness among people about the importance of their role in fighting the spread of the virus. One of the approaches is enforcing a curfew. The following tweet was posted on 22 March by @spagov, the official account of the Saudi Press Agency (SPA) for news of royal decrees, orders, the Council of Ministers, and official statements.

*The Custodian of the Two Holy Mosques issues a curfew order to limit the spread of the new #Corona\_virus starting at 7 p.m.*

As another example, the Twitter account of the Ministry of Health (@SaudiMOH) posted the following tweet on 22 March.


*For your safety, we recommend postponing non-urgent medical appointments and procedures. #Coronavirus\_prevention*

Another tweet with the same hashtag, #Coronavirus\_prevention, was posted by the official account of the Minister of Health Dr. Tawfiq Al-Rabiah (@tfrabiah) on 15 May before the end of the curfew and the return to normal. He encouraged people to wear masks before getting out of their houses.


*I advise everyone to use a cloth mask when going out of the house #Coronavirus\_prevention*

Moreover, we found another tweet posted by @SaudiMOH on 30 March about the government order to treat all COVID-19 patients for free.


*The Minister of Health announces the order of the Custodian of the Two Holy Mosques, may God preserve him for free treatment to all citizens and residents infected and violators of the residency system with the new #Coronavirus.*

The tenth pandemic-related concern regards **Prize Draw**. Note in Table 1 the top keywords, such as withdrawal, documented, video, gift, and retweet. It is common on social media to see users announce prizes that will be given to a randomly selected follower who retweets their tweet. This helps them increase their popularity by gaining followers, and thus it can be a means of earning. This can be done by individuals or companies. The following tweet is an example.


*Withdrawal tonight is documented in the video* ... *the gift is iPhone 11 retweet and follow*

The 11th public concern includes the keywords roads and traffic, and therefore we named it **Mobility**. The levels of daily mobility have changed significantly during the COVID-19 crisis throughout the world. All forms of transportation from road traffic flow to commercial flight activities have been reduced due to the fear of getting infected and the government lockdowns. The following tweet shows an example from Jeddah, the second largest Saudi city. This was posted on 19 March by the official account of the traffic department in Saudi Arabia, @eMoroor.


*Jeddah roads are witnessing a decrease in the level of traffic, which reflects the commitment to preventive and precautionary procedures [.] Thank you and we wish everyone safety.*

The 12th pandemic-related concern is **Salary**. The top keywords include salary, private, government, and sector. Many employees lost their jobs due to the government lockdown restrictions and the closure of shops. Small, medium, and large businesses were also severely affected, and many organizations cut their employees' salaries and/or laid off employees. The 13th concern is **Curfew**. The top keywords include prevent, wandering, and the names of some cities. The 14th public concern is **Offers**. Discount, code, and coupon are among the top keywords. To compensate for their losses from the closure of physical stores, various vendors have provided offers to attract online shopping customers.

Finally, the 15th concern is **Back to Normal**. This relates to the issues that need to be addressed in returning to normal life (as opposed to life during the pandemic). By the end of the curfew, the authorities in Saudi Arabia started a new awareness campaign under the slogan "نعود بحذر" (returning with caution). People discussed and responded to this campaign on social media. This concern ranks last; we believe this is because it includes fewer tweets than the other concerns. "Back to Normal" was a relatively recent public concern within the dataset: this stage started at the end of May, and our dataset contains tweets until 1 June.

Figure 5 visualizes the correlation matrix as a heatmap, drawn using the Seaborn library in Python. We computed the correlation matrix by calculating the correlation coefficients between the keywords of the detected concerns to show the relationship between the keywords (see Section 3.5 for details on its computation). There are a total of 15 concerns with 10 keywords each. We removed the duplicate keywords that exist in multiple concerns, sorted the remainder by frequency, and kept the top 50 keywords. Dark blue represents the strongest positive correlation between keywords, while dark red represents the strongest negative correlation. For example, note the dark blue color between wandering and prevent, which are used when mentioning **Curfew**. Note the dark blue squares between the keywords facing, stay, home, and strong, which imply a strong positive relationship between them. As mentioned earlier, these keywords refer to the **Stay Home** concern. There is also a strong positive correlation between custodian, holy, and Haramain, which are usually used when referring to the Custodian of the Two Holy Mosques, the King of Saudi Arabia. A strong positive correlation can also be noted between Makkah and Mukarramah, which together form the full name of Makkah city, and between Madinah and Munawwarah, which together form the full name of Almadinah city; these are the two holiest cities in Islam. Additionally, the light blue color between the Makkah and Madinah keywords shows that these two words have a mild positive relationship, which makes sense because the two cities appear together in many tweets. Note also the positive correlation between case, health, announce, register, corona, and infection. As mentioned earlier, these keywords are used when posting about **COVID-19 Cases**.

Note that the most distinctive horizontal or vertical line is the line for the corona keyword, indicating that it has a relatively distinctive relationship with most of the keywords even though the light colors indicate mild positive and negative correlations. The highest positive correlation appears to be between corona and virus, while the highest negative correlation is between corona and good. This makes sense, because good is a positive keyword. Finally, we note that there are not many dark red colors, implying that none of the keywords have strong negative correlations between them.

**Figure 5.** The correlation matrix of keywords.
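As an illustration of how a keyword correlation matrix of this kind could be produced, the sketch below computes Pearson correlations between binary keyword-presence vectors over tweets and draws them with Seaborn. The presence-vector representation is an assumption made here (the actual computation is described in Section 3.5, which is not reproduced in this section), and the function names and toy tweets are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    # A keyword present in all or none of the tweets has no variance.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def keyword_correlations(tweets, keywords):
    """Correlate keywords via their 0/1 presence in each tokenized tweet."""
    presence = {k: [1 if k in t.split() else 0 for t in tweets] for k in keywords}
    return [[pearson(presence[a], presence[b]) for b in keywords] for a in keywords]


def plot_heatmap(matrix, keywords):
    """Draw the matrix; needs matplotlib and seaborn (imported lazily)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    # "RdBu" maps -1 to dark red and +1 to dark blue, as in the description above.
    sns.heatmap(matrix, xticklabels=keywords, yticklabels=keywords,
                cmap="RdBu", vmin=-1, vmax=1)
    plt.show()


# Toy, made-up tweets using English keyword translations.
tweets = [
    "prevent wandering riyadh",
    "prevent wandering jeddah",
    "stay home",
    "stay home strong",
]
keywords = ["prevent", "wandering", "stay", "home"]
matrix = keyword_correlations(tweets, keywords)
```

In this toy data, prevent and wandering always co-occur and never appear with stay or home, so the matrix contains both extremes of the color scale.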

#### *4.2. Temporal Analysis*

In this section, we will investigate how the public concerns have evolved over time during the pandemic. Figure 6 depicts the changes in the intensity of the tweets over time for the fifteen identified public concerns. We elaborate the data on these trends in Figure 6 using the following six figures, one for each of the six public macro-concerns.

Figure 7 depicts the intensity of tweets related to the public macro-concern **Contain the Virus**. The public concerns in this macro-class include curfew, stay home, quarantine, prevention, and cleaning services. The curfew was ordered on 22 March and applied from the next day between 7 p.m. and 6 a.m. The highest peak (for **Curfew**) was on 2 April. From external validation [68], we found that on that day the cities of Makkah and Madinah were put under a 24 h curfew to prevent the spread of the virus and protect the health of residents. This 24 h curfew appears to explain the highest detected peak: these are the two holiest cities in Saudi Arabia and in the whole Islamic world, and thus their lockdown drew everyone's attention.

**Figure 6.** Daily Twitter activity of government measures and public concerns (all).

**Figure 7.** Daily Twitter activity for a macro-concern (**Contain the Virus**).
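The daily activity series and peak dates discussed in this section can be derived with a few lines of code. The sketch below assumes each tweet has already been reduced to a (date, concern) pair; the function names are hypothetical and the counts are made up for illustration only.

```python
from collections import Counter, defaultdict
from datetime import date


def daily_activity(records):
    """Count tweets per day for each concern.

    records: iterable of (date, concern) pairs, one per tweet.
    Returns {concern: Counter({date: number_of_tweets})}.
    """
    series = defaultdict(Counter)
    for day, concern in records:
        series[concern][day] += 1
    return series


def peak_day(counts):
    """Return (day, count) for the busiest day; the earliest day wins ties."""
    return max(sorted(counts.items()), key=lambda kv: kv[1])


# Made-up counts; only the structure matters here.
records = (
    [(date(2020, 4, 1), "Curfew")]
    + [(date(2020, 4, 2), "Curfew")] * 3
    + [(date(2020, 4, 3), "Curfew")]
    + [(date(2020, 3, 21), "Stay Home")] * 2
    + [(date(2020, 3, 22), "Stay Home")]
)
series = daily_activity(records)
curfew_peak = peak_day(series["Curfew"])
```

Each detected peak date can then be checked against external sources, as done in the validation steps described here.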

Figure 7 shows the Twitter activity for the **Stay at Home** public concern in red. The highest peak for this concern was on 21 March. We found that on that day the Government Communication Center of the Information Ministry launched the new visual identity initiative for the coronavirus awareness campaign under the slogan "كلنا مسؤول" (we are all responsible) to encourage people to stay at home [69]. We believe that people interacted with this initiative and posted about it on Twitter using the hashtag #كلنا\_مسؤول, which explains the large Twitter activity related to the **Stay at Home** concern on that date. The **Prevention** concern is represented in Figure 7 in light purple. The highest detected peak for this concern was on 22 March and the second-highest peak was on 30 March. We found that many orders were issued around the end of March to control the spread of the virus, including the curfew order announced on 22 March [70]. Further, as posted on the Ministry of Health website, on 30 March the King of Saudi Arabia ordered free treatment for all citizens, residents, and even those who violated the residency rules [71].

The purple line plot in Figure 7 represents the quarantine concern. There are several peaks between 22 March and 18 April. Posts about quarantine increased after the virus spread in the country and the number of cases rose. As mentioned earlier, the government enforced several actions by the end of March, including lockdown and curfew as well as the closing of mosques, schools, and shopping malls. The public concern cleaning services is represented in Figure 7 in green. Note in the graph that the number of tweets starts increasing after 22 March and reaches its highest point on 3 April. Generally speaking, individuals and organizations have become more careful about and concerned with cleanliness. As shown by the example tweet in Section 4.1, the Riyadh municipality has been sterilizing and cleaning the roads of Riyadh city to provide a safe and healthy environment. That tweet was posted on 26 March, within the period that shows a surge in the discussion about this concern.

Figure 8 depicts the intensity of tweets related to the public macro-concern **Virus Infection**, which includes one public concern, **COVID-19 Cases**. Note in the figure that between mid-March 2020 and the end of May (with some intermittent gaps), Twitter activity related to the virus infection concern (i.e., the spread of coronavirus and the increase in the number of cases) increased. Specifically, the two highest peaks are on 22 and 30 March. We found from the external validation process, which involves searching online news media (see Section 3.6), that the number of daily cases increased on 22 March from 48 to 119, while on 30 March the number of cases increased from 96 to 154. These are significant increases, considering that it was the beginning of the pandemic period in Saudi Arabia. They caught people's attention and increased their worries, leading to peaks in the Twitter activity on the subject.

**Figure 8.** Daily Twitter activity for a macro-concern (virus infection).

Figure 9 shows the intensity of the tweets for the public macro-concern **Back to Normal** that includes one public concern with the same name **Back to Normal**. The highest peak was on 29 May. We found that on that date the Minister of Health posted the following tweet on Twitter:


*We are cautiously beginning the first stages of #returning\_with\_Caution, so we depend on your commitment. We hope that you follow the precautions.*

This tweet was posted at the end of the nationwide coronavirus curfew. The Ministry of Health considered it the first stage of the return to normal and started a new awareness campaign under the slogan "نعود بحذر" (returning with caution). The interaction of people with this announcement, as well as the use of the hashtag "#نعود\_بحذر", explains the increase in the tweet intensity on that day.

Figure 10 plots the intensity of the tweets for the public macro-concern **Daily Matters**, which includes three public concerns: **Five Daily Prayers** (**Salah**), **Supplications**, and **Mobility**. Note in the figure that the intensity of the tweets about **Supplications** (camel color) increased with the spread of the virus and the rising number of cases. People in Saudi Arabia increased their supplications in response to the COVID-19 crisis, asking God to protect them and their families from the virus and to end the pandemic. The light blue color represents the **Salah** concern. Its highest peak is on 26 May. Looking in the news media, we found that on that day an official source in the Ministry of Interior announced that, from Sunday 8 Shawwal (31 May) until Saturday 28 Shawwal (20 June), prayers would be allowed in all mosques of the Kingdom (except the mosques in Makkah city) [72]. This explains the sharp increase in the tweet intensity on that day: people were very happy with this news, since praying at the mosque is critical for Muslims. The orange color represents the intensity of tweets about the **Mobility** concern. Its highest peak is on 24 March, two days after the curfew was ordered in Saudi Arabia. This Twitter activity was in response to how the roads appeared (empty) on the first day of the curfew. We verified this through online articles (see, e.g., [73]). Users of social media shared videos and photos showing the main streets empty due to the coronavirus curfew.

Figure 11 shows the intensity of the tweets for the public macro-concern **Social Sustainability**, which includes one public concern, **Hospital Treatment**. Twitter activity on this concern increased during the pandemic, particularly from the later part of March until mid-April. This was due to the difficulties in getting treatment at hospitals and other related matters. In particular, we found several articles in the local newspaper (Okaz) [74,75] encouraging people to donate blood because blood bank supplies had become low due to the COVID-19 situation. Additionally, we found in the collected dataset several tweets about the need for blood donation that shared patient file numbers at different hospitals in different cities. Furthermore, the Saudi Twitter hashtags account (@HashKSA) posted the following tweet on 12 April:

*Blood banks complain about the lack of donors after the Corona pandemic. The director of the blood bank in Specialist Hospital, Dr. Al-Humaidan, emphasizes the need and urges to donate blood and platelets, especially for the patients of #oncology and #organ\_transplants.*

Figure 12 depicts the Twitter activity related to the macro-concern economic sustainability, which includes the public concerns **Prize Draw**, **Salary**, **Loan**, and **Offers**. The blue color represents the **Prize Draw** concern. In this well-known Twitter activity, users post about a prize and then pick a winner at random from the users who retweeted the post. One reason for doing this is to gain followers and become famous, which is in turn a way to earn income. This activity benefits both the person who wins the prize and the one who announced it. The graph shows that, during the pandemic, the intensity of the tweets related to this concern was on the rise. We think that having more free time due to staying at home could be a reason for the increase in such activities on social media. In addition, the financial difficulties that many people faced due to the pandemic may have led them to look for other ways to earn income. The green color represents activity for the concern **Offers**. Note in the figure that the intensity of the tweets began to increase around the end of March, which coincides with curfew enforcement and shop closures. This, we believe, led business owners to increase sale offers on their products to keep customers shopping at their online stores. Our personal experience in Saudi Arabia in the last few months is that many businesses have gone online or have increased their online sales activities. Social media is a free and powerful marketing channel, and the trend of online shopping and sales offers can be witnessed here.

The public concern **Salary** is represented in Figure 12 by the magenta plot. We found that on 3 April King Salman of Saudi Arabia ordered the government to contribute 60% of the salaries of Saudi private-sector employees, with a financial incentive of 9 billion Riyals in total [21]. This explains the dramatic rise in the intensity of tweets on that day. The brown color represents the **Loan** concern; its highest peak was on 22 March. We found that on that day the Saudi Arabian Monetary Agency (SAMA) announced that Saudi local banks would postpone three months of mortgage installments for all public and private health workers, starting from April 2020 [20].

**Figure 11.** Daily Twitter activity for a public macro-concern (social sustainability).

**Figure 12.** Daily Twitter activity for a public macro-concern (economic sustainability).

#### *4.3. Spatio-Temporal Analysis*

We investigate in this section the spatio-temporal behavior of selected public concerns during the pandemic. We overlay the location of the specific detected concerns on top of the map of Saudi Arabia. We plot only the tweets that include location information. The size of the circle represents the intensity of the relevant tweets.

Figure 13 depicts the location of tweets about the public concern **Curfew** posted on 2 April 2020. For governance purposes, Saudi Arabia is divided into 13 provinces, whose names are listed on the left of the figure. We selected this date for the spatial behavior of the curfew concern because the temporal analysis presented earlier (see Section 4.2, Figure 7) revealed that on that day a 24 h curfew was enforced in the cities of Makkah and Madinah. Note in the figure that the largest circle is over Makkah, which validates the information we already have. We expected to find another large circle over Madinah city, but we did not. The official name of Madinah city in Arabic is "المدينة المنورة", transliterated as "Al-Madinah Al-Munawwarah". The Arabic word "المدينة" (Al-Madinah) can also mean "the city", referring implicitly or explicitly to a city mentioned in the same tweet or known from its context. In designing the location extractor, we chose not to treat the word "Al-Madinah" as a location when it appears without "Al-Munawwarah". We consider a tweet to be about Madinah city only if the city name is mentioned in full (Al-Madinah Al-Munawwarah). Tweets that use only the short form are therefore not mapped to Madinah, which may explain the missing circle. Note in Figure 13 that activity related to the curfew concern can also be seen in other cities around the kingdom, with some circles (e.g., Riyadh) larger than others. This is because prayers in the main mosques of Makkah (Mecca) and Medina are important for people all around the world.

**Figure 13.** Spatio-temporal behavior of public concern (curfew: 2 April 2020).
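The design choice for Madinah described above can be made concrete with a small sketch of a gazetteer-based location extractor. This is not the tool's actual extractor: the gazetteer entries, the function name, and the simple substring matching are assumptions for illustration.

```python
# Hypothetical mini-gazetteer: Arabic city name -> English label.
# Only the full name of Madinah is listed, so the bare word "المدينة"
# ("the city") never matches on its own.
GAZETTEER = {
    "المدينة المنورة": "Madinah",
    "مكة": "Makkah",
    "الرياض": "Riyadh",
    "جدة": "Jeddah",
}


def extract_cities(text):
    """Return the English labels of gazetteer cities mentioned in text.

    Substring matching is used so that names with attached prefixes
    (e.g. "ومكة", "and Makkah") are still found; a production extractor
    would need more careful tokenization to avoid false matches.
    """
    return [label for name, label in GAZETTEER.items() if name in text]


# "Curfew in the city": ambiguous short form, so no location is extracted.
ambiguous = extract_cities("حظر التجول في المدينة")
# "Curfew in Al-Madinah Al-Munawwarah and Makkah": both cities are extracted.
explicit = extract_cities("حظر التجول في المدينة المنورة ومكة")
```

The trade-off is precision over recall: tweets that mean Madinah but use only the short form are not mapped to it, consistent with the absence of a large circle over Madinah in Figure 13.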

Figures 14 and 15 illustrate the location of the tweets about the public concern **COVID-19 Cases** on 22 March and 30 March, respectively. These two dates are selected for the concern **COVID-19 Cases** because the temporal analysis we presented earlier (see Section 4.2, Figure 8) has revealed that the two top peak intensities for the concern happened on these two dates. A total of 119 cases were reported on 22 March, 72 of these in Makkah, 43 in Riyadh, 15 in Eastern Province (4 in Dammam, 4 in Qatif, 3 in Alhasa, 3 in Alkhobar, and one in Dhahran), and one in Alqassim [76]. This explains many circles in the eastern province in Figure 14. Each circle represents a city and the size reflects the tweets' intensity. Note the large light blue circle over Riyadh city and large green circles around Jeddah and Makkah (Jeddah is in Makkah province). We also know that people all around the country were interested in the situation, so they posted about the virus spread and the number of infected people. This explains the presence of circles in different cities around the kingdom.

**Figure 14.** Spatio-temporal behavior of public concern (COVID-19 cases: 22 March 2020).

Figure 15 depicts the spatial information for 30 March. A total of 154 cases were reported on the day with the following distribution: Makkah (40), Dammam (34), Riyadh (22), Madinah (22), Jeddah (9), Haffof (6), Alkhobar (6), Qatif (5), Taif (2), and one in each of the following cities: Yanbu, Buraydah, Alras, Khamis Mushait, Alduwadimi, Dhahran, Samta, Tabuk [77]. To help to understand the map, note that Dammam, Haffof, Alkhobar, Dhahran, and Qatif are in the Eastern Province, whereas the Makkah, Jeddah, and Taif cities are in Makkah Province. Comparing Figure 15 with Figure 14, note that there are some additional circles in Figure 15, implying that the discussion about the public concern had spread to other cities. Moreover, the discussions on the public concern increased in Makkah (dark green circles), perhaps mostly due to the concern becoming a bigger issue over time during March 2020.
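To relate the city-level counts for 30 March to the provinces shown on the map, the reported figures can be aggregated as in the sketch below. The city counts and the two province groupings come from the text above; cities whose province is not stated here are left under their own name rather than guessed, and the function name is hypothetical.

```python
from collections import Counter

# Cases reported on 30 March, as listed in the text [77].
CASES_30_MARCH = {
    "Makkah": 40, "Dammam": 34, "Riyadh": 22, "Madinah": 22, "Jeddah": 9,
    "Haffof": 6, "Alkhobar": 6, "Qatif": 5, "Taif": 2,
    "Yanbu": 1, "Buraydah": 1, "Alras": 1, "Khamis Mushait": 1,
    "Alduwadimi": 1, "Dhahran": 1, "Samta": 1, "Tabuk": 1,
}

# Only the city-to-province groupings stated in the text are included.
PROVINCE = {
    "Dammam": "Eastern Province", "Haffof": "Eastern Province",
    "Alkhobar": "Eastern Province", "Dhahran": "Eastern Province",
    "Qatif": "Eastern Province",
    "Makkah": "Makkah Province", "Jeddah": "Makkah Province",
    "Taif": "Makkah Province",
}


def cases_by_province(cases):
    """Sum city counts per province; unmapped cities keep their own name."""
    totals = Counter()
    for city, n in cases.items():
        totals[PROVINCE.get(city, city)] += n
    return totals


totals = cases_by_province(CASES_30_MARCH)
grand_total = sum(CASES_30_MARCH.values())
```

With these figures, the Eastern Province and Makkah Province account for 52 and 51 of the 154 reported cases, respectively.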

#### *4.4. Execution Time Analysis*

We explained earlier that our tool is designed as a distributed computing tool to address scalability in terms of big data and compute-intensive analytics applications. The tool was developed using the distributed computing platform Apache Spark and was executed on the Aziz supercomputer (see Section 3.1). LDA clustering is RAM-intensive. We have used multiple nodes with 256 GB RAM each.

Figure 16 plots the execution times of the LDA algorithm with five iterations against a varying number of cores (24, 48, 72, 96, 120, 144, and 168). The number of features, in this case, was not limited (compare with Figure 17). The results show that parallelizing the LDA algorithm on a higher number of cores reduces the execution time up to a point. The LDA algorithm took 163.9 h (nearly 7 days) on 24 cores. We were able to reduce this to a minimum of 23.6 h using 168 cores. Increasing the number of cores beyond 120 (to 144 and 168) did not help much and reduced the execution time of the LDA algorithm only a little. Such diminishing returns are normal in parallel or distributed computing and occur when the task size is small relative to the number of cores; they are caused by the overhead of parallelizing or distributing a task. Usually, once the parallelization reaches a saturation point where additional cores no longer decrease the execution time, the execution time may even begin to increase with the number of cores (see Figure 17).

**Figure 15.** Spatio-temporal behavior of public concern (COVID-19 cases: 30 March 2020).

Figure 17 plots the execution times against the number of cores (24, 48, and 72) for a varying number of LDA iterations (5, 10, 50, 100, 250, 500, 1000) using 10,000 features/keywords (we limited the number of features to reduce execution times). For the LDA algorithm with 1000 iterations, we were able to reduce the execution time by more than half, from 16.8 h on 24 cores to 7.4 h on 48 cores, benefitting from the increase in the number of cores. The LDA runs with fewer iterations (5, 10, 50, . . . , 500) also benefited from execution on a higher number of cores. However, a further increase in the number of cores (from 48 to 72) does not improve execution speed and instead increases the execution time. As explained earlier, this is normal behavior in parallel computing once the parallelization reaches the saturation point.
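The reported timings can be summarized as speedup and efficiency relative to the 24-core baseline. The sketch below uses only the four execution times quoted in the text; the function names are hypothetical, and efficiency is defined here as speedup divided by the ratio of core counts.

```python
def speedup(t_base, t_new):
    """How many times faster the new run is than the baseline run."""
    return t_base / t_new


def relative_efficiency(t_base, cores_base, t_new, cores_new):
    """Speedup divided by the factor by which the core count grew."""
    return speedup(t_base, t_new) / (cores_new / cores_base)


# Figure 16: five iterations, unlimited features; 24 vs. 168 cores.
s16 = speedup(163.9, 23.6)
e16 = relative_efficiency(163.9, 24, 23.6, 168)

# Figure 17: 1000 iterations, 10,000 features; 24 vs. 48 cores.
s17 = speedup(16.8, 7.4)
e17 = relative_efficiency(16.8, 24, 7.4, 48)

print(f"Figure 16: speedup {s16:.2f}x, relative efficiency {e16:.2f}")
print(f"Figure 17: speedup {s17:.2f}x, relative efficiency {e17:.2f}")
```

The same two functions can be applied to every point of the two figures to locate the core count at which efficiency starts to drop.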

Generally speaking, a higher number of iterations is expected to produce better clusters. Our experiences in this work suggest that the clusters (public concerns) obtained from 100 iterations were better than the other configurations in terms of the relationship between the keywords of a cluster, etc., enabling us to better label the clusters with appropriate public concern names. Based on the results, the best choice was to execute LDA with 100 iterations on 72 cores. The results reported in this paper are based on this configuration (LDA with 100 iterations and 10,000 features).

**Figure 16.** Execution time vs. number of cores for varying number of LDA iterations (no limit on the number of features).

**Figure 17.** Execution time vs. number of cores for various numbers of LDA iterations (a limited number of features: 10,000 keywords).

It may appear that the total savings one would obtain by using our tool on Apache Spark would be about 4 h (7.34 h − 3.31 h, for the LDA algorithm with 100 iterations). However, the process of LDA clustering such as presented in this paper may require running the LDA algorithm many times on large volumes of data with different numbers of iterations and features. In our case, we executed the LDA algorithm with various configurations between 30 and 40 times. Running the LDA algorithm this many times with 5 to 1000 iterations would easily require over a month of computing time. The ability of the tool to execute in parallel could save a month of computing time in this case and speed up the development process. For larger datasets, executing sequential codes may not even be possible, or distributed computing could save years of development time. How to select the number of cores for a given job so as to save experimental time and energy is itself a challenge and has been addressed in our other works [78,79].

### *4.5. Pandemic Measures, Public Concerns, and Their Interrelationship*

Table 1 lists the fifteen major pandemic measures and public concerns discussed by the public on Twitter during the COVID-19 pandemic. The pandemic measures are quarantine, stay home, prevention (COVID-19), cleaning services, curfew, loan, salary, and back to normal. The measures taken by the public and industry to address the economic difficulties caused by the COVID-19 pandemic are offers and prize draw. The public concerns are COVID-19 cases, supplications, Five Daily Prayers (Salah), mobility, and hospital treatment. Some measures could also be regarded as concerns. For example, quarantine, stay home, prevention (COVID-19), curfew, loan, salary, and back to normal are both measures and concerns.

The interrelationship or impact of public, industry, or government measures on public concerns is evidenced in the analysis presented in this section. For example, the events related to loans were being discussed by the public, but the highest peak was detected on 22 March, the day when the Saudi Arabian Monetary Agency (SAMA) announced the loan measure in the media (see [20]). Another example is the "No Mobility" event (empty roads) that was vigorously discussed on 24 March, two days after the curfew measure was announced. The impact of the quarantine and curfew measures was also seen in a reduction in blood donations and blood supplies, leading to increased Twitter activity (concern) on this topic requesting blood donations from late March to mid-April. This concern can also be seen as a measure by the hospital authorities to announce the blood shortage and request action from the public.
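Locating such a peak amounts to counting the tweets assigned to a concern per day and taking the day with the largest count. The following is a minimal sketch under stated assumptions: `peak_day` is a hypothetical helper, not the implementation used in our tool, and the dates in `loan_tweet_dates` are synthetic values constructed to mirror the 22 March loan peak, not counts taken from our dataset.

```python
from collections import Counter
from datetime import date


def peak_day(tweet_dates):
    """Return (day, count) for the day with the most tweets.
    Ties are broken in favour of the earliest day."""
    counts = Counter(tweet_dates)
    return min(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Synthetic posting dates for tweets assigned to the "loan" concern (illustrative only).
loan_tweet_dates = (
    [date(2020, 3, 20)] * 3
    + [date(2020, 3, 21)] * 5
    + [date(2020, 3, 22)] * 12
    + [date(2020, 3, 23)] * 6
)

day, count = peak_day(loan_tweet_dates)
print(day.isoformat(), count)
```

The detected day can then be compared with the announcement dates found in online news media, which is the kind of external validation described in Section 4.2.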

### **5. Conclusions**

The level of digital and physical connectedness of today's societies has never been seen before. We travel a lot to distant lands and frequently share gifts and viruses with each other. Unfortunately, the COVID-19 pandemic has exposed the vulnerabilities of this unprecedentedly connected world. The COVID-19 pandemic is rapidly growing across the world. Many countries have been affected and the number of cases has greatly increased. The World Health Organization (WHO) declared it a pandemic on 11 March 2020. Currently, medical specialists can only treat the symptoms of the disease, since there is no cure, and developing a new vaccine with low risks and a high success rate will take time. Therefore, it is a serious global health issue.

Social networking platforms such as Twitter stream hundreds of millions of posts daily. They can be treated as a useful medium for the dissemination of information about diseases. This provides us with a great opportunity to study and capture the dynamics of real-world events and understand the various public measures being undertaken by governments, as well as the changes in the daily activities of people during such outbreaks.

In this paper, we proposed a software tool that aims to detect government pandemic measures and public concerns during the COVID-19 pandemic. The methods used in the tool include an unsupervised Latent Dirichlet Allocation (LDA) topic modeling algorithm, natural language processing (NLP), correlation analysis, and other spatio-temporal information extraction and visualization methods. The tool is built using a range of technologies, including MongoDB, Parquet, Apache Spark, Spark SQL, and Spark ML. The tool, its architecture, five software components, and its algorithms are described in detail. Using the tool, we collected a dataset comprising 14 million tweets from the Kingdom of Saudi Arabia (KSA) for the period 1 February 2020 to 1 June 2020. We formulated and analyzed the findings of this paper from three relationship perspectives: information-structural, temporal, and spatio-temporal.

Concerning the information-structural or subject matter perspective, we have detected 15 government pandemic measures and public concerns and have grouped them into six macro-concerns. For the **pandemic measures** implemented by the Saudi government concerning the COVID-19 pandemic, we detected curfew and restrictions on mobility in the country, quarantine and fines, restrictions on praying in the mosques, campaigns to stay home, COVID-19 prevention, and cleaning services provided to curb the coronavirus spread. For **economic sustainability**, we detected that the government provided financial incentives including loans and private-sector salaries. Businesses increased offers to increase their sales. People moved to, or increased, their online economic activities, such as activities related to prize draws for income earnings. For health, well-being, and **social sustainability**, we detected that blood donation and treatment at hospitals have been a major cause of concern. People also actively talked about the new number of cases. The **daily livelihood** issues in Saudi Arabia include the five daily congregational prayers at the mosques that were suspended by the government. People also increased their supplications for the safety of people. A significant reduction in mobility was noted across the country that was related to **environmental sustainability**, health, and well-being due to the reduction in traffic congestion and air pollution.

As regards the temporal perspective, we were able to detect the timewise progression of events from the public discussions on COVID-19 cases in mid-March to the first curfew on 22 March, financial loan incentives on 22 March, the increased quarantine discussions during March–April, the discussions on the reduced mobility levels from 24 March onwards, the blood donation shortfall from late March onwards, the government's 9 billion SAR salary incentives on 3 April, lifting the ban on five daily prayers in mosques on 26 May, and finally the return to normal government measures on 29 May 2020.

For the **spatio-temporal** perspective, we extracted location information using different approaches including tweet text and hashtags, geo-coordinate attributes, and user profiles. We were able to detect important events in over 50 cities around the kingdom, with major activities related to COVID-19 cases, curfew, etc., in the Makkah, Riyadh, and Eastern provinces. We validated the detected government measures and public concerns and their spatial and temporal nature through external validation by searching online news media or internal validation by checking tweets.
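The three location sources named above can be combined as a fallback chain. The sketch below is illustrative rather than the implementation in our tool: the order in which the sources are tried, the tweet field names (`text`, `coordinates`, `user_location`), the 50 km matching radius, and the three-city gazetteer with approximate city-centre coordinates are all assumptions made for this example.

```python
import math
import re

# Tiny hypothetical gazetteer: approximate city-centre coordinates (lat, lon).
CITY_CENTRES = {
    "Riyadh": (24.7136, 46.6753),
    "Jeddah": (21.4858, 39.1925),
    "Makkah": (21.3891, 39.8579),
}
# English and Arabic spellings mapped to a canonical city name.
CITY_ALIASES = {
    "riyadh": "Riyadh", "الرياض": "Riyadh",
    "jeddah": "Jeddah", "جدة": "Jeddah",
    "makkah": "Makkah", "مكة": "Makkah",
}


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def city_from_text(text):
    """Match words (including hashtag words) against the alias table."""
    for token in re.findall(r"[^\W_]+", text.lower()):
        if token in CITY_ALIASES:
            return CITY_ALIASES[token]
    return None


def city_from_coordinates(coords, max_km=50.0):
    """Return the nearest gazetteer city if it lies within max_km."""
    name, centre = min(CITY_CENTRES.items(),
                       key=lambda kv: haversine_km(coords, kv[1]))
    return name if haversine_km(coords, centre) <= max_km else None


def locate(tweet):
    """Try tweet text/hashtags, then geo-coordinates, then the user profile."""
    city = city_from_text(tweet.get("text", ""))
    if city:
        return city, "text"
    coords = tweet.get("coordinates")
    if coords:
        city = city_from_coordinates(coords)
        if city:
            return city, "coordinates"
    city = city_from_text(tweet.get("user_location") or "")
    if city:
        return city, "profile"
    return None, None
```

For example, `locate({"text": "curfew in #Riyadh tonight"})` resolves the city from the hashtag, while a tweet with no city name in its text falls back to its coordinates or, failing that, to the profile location. Exact token matching is a simplification: Arabic city names with attached prefixes would need the normalization and stemming steps of a full pre-processing pipeline.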

The detected events in KSA are also aligned with **international concerns,** such as various lockdown measures [14], reduced mobility [15], reduction in blood donations [16], financial difficulties and related government incentives [17,18], and worries related to returning to normal times [19]. Saudi Arabia has followed different strategies to fight the outbreak, instill a sense of responsibility, and raise awareness among people about the importance of their role in the fight against coronavirus. The government undertook early actions to prevent the spread of the virus. KSA reported its first case of the COVID-19 on 2 March. One week later, they closed the schools. On 16 March, they suspended all international and national flights, closed shopping malls, and suspended all sports activities. On 18 March, the attendance of employees at their workplaces in government agencies and the private sector was suspended. Furthermore, the king ordered free treatment for all citizens and residents, even for the violators of the residency system. The KSA government also provided financial incentives in terms of private-sector salaries and the temporary postponement of loan payments.

The research reported in this paper is different from the existing works on social media analytics for COVID-19-related studies in several respects, as has been discussed in detail in Section 2. None of the existing works have reported a similar COVID-19 analysis of Twitter data in the Arabic language in terms of the modelling methods used and the depth of the analysis. The software developed for this work is part of the tool Iktishaf [6–9] that we have been developing for the last few years. The ability of the tool to execute in parallel could save a month of computing time for the specific dataset size and the problem addressed in this paper and speed up the development process. For larger datasets, executing sequential codes may not even be possible, or distributed computing could save years of development time.

The findings presented in this paper show the effectiveness of the Twitter media in detecting important events, government measures, public concerns, and other information in time, space, and information-structure with no earlier knowledge about them. The utilization possibilities of such tools are unlimited. For example, governments could learn about the various public concerns in pandemic and normal times and develop policies and measures to address these concerns. The public could raise their concerns and give feedback on government policies. The public could learn about various public and industry activities (such as economic activities detected by our tool) and get involved in these to address financial, social, and other difficulties. The standardization and adoption of such tools could lead to real-time surveillance and the detection of disease outbreaks (and other potentially dangerous phenomena) across the globe and allow governments to take timely actions to prevent the spread of diseases and other disasters. The international standardization of such tools could allow governments to learn about the impact of policies of various countries and develop best practices for national and international response.

While we have shown good evidence of the use of LDA, NLP, and other methods, more work is needed to improve the breadth and depth of the work with regard to what can be detected, the diversity of data and machine and deep learning methods, the accuracy of detection in space and time, and the real-time analysis of the tweets.

Our focus in this work is on Saudi Arabia. The tool hence currently works with tweets only in the Arabic language. The tool can be used in other Arabic-speaking countries, such as Egypt, Kuwait, Bahrain, and UAE. The system methodology and design of the tool developed in this paper are generic, and therefore the tool can be extended to other countries globally. This will require adapting the tool to additional languages, such as English, Spanish, or Chinese, by adding modules to the pre-processing and clustering components.

**Author Contributions:** Conceptualization, E.A. and R.M.; methodology, E.A. and R.M.; software, E.A.; validation, E.A. and R.M.; formal analysis, E.A. and R.M.; investigation, E.A. and R.M.; resources, R.M., I.K., and A.A.; data curation, E.A.; writing – original draft preparation, E.A. and R.M.; writing – review and editing, R.M., A.A., and I.K.; visualization, E.A.; supervision, R.M.; project administration, R.M., I.K., and A.A.; funding acquisition, R.M., A.A., and I.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This project was funded by the Deanship of Scientific Research (DSR) at King Abdulaziz University, Jeddah, under grant number RG-6-611-40. The authors, therefore, acknowledge with thanks the DSR for their technical and financial support.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data was obtained from Twitter. Restrictions apply to the availability of these data.

**Acknowledgments:** The experiments reported in this paper were performed on the Aziz supercomputer at King Abdulaziz University.

**Conflicts of Interest:** The authors declare no conflict of interest.
