3.1.2. Word Frequency

The word count (Figure 2) shows that the most frequently occurring term was '#covid19', with other terms such as 'people', 'get' and 'vaccine' also used frequently. There was no mention of specific groups such as 'children' or 'parents', only the collective term 'people'.

**Figure 2.** Top 50 frequently recurring words.

A word cloud (Figure 3a) displays the most frequently used words in descending order of size, with larger words indicating a higher frequency. To further examine the relationship between words and their frequency, the most prevalent words were analysed separately within the positive, negative and neutral groups.

In the positive category (Figure 3b), the most commonly recurring words were '#covid19' (29,661), 'people' (5313) and 'please' (4455). In the neutral category (Figure 3c), the most commonly used words were '#covid19' (14,399), 'people' (2469) and '#vaccine' (2322). In the negative category (Figure 3d), the most commonly used words were '#covid19' (31,725), 'people' (7925) and 'get' (4282). Noticeable words in this category include 'don't', 'get', 'vaccinated' and 'death', which could suggest that users are advising others not to receive the vaccine.
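The per-category word counts described above can be sketched with Python's standard `collections.Counter`, assuming the tweets have already been tokenised and grouped by sentiment (the tiny sample below is hypothetical, not the study's data):

```python
from collections import Counter

def top_words(tweets, n=25):
    """Count token frequencies across a list of tokenised tweets
    and return the n most common (token, count) pairs."""
    counts = Counter()
    for tokens in tweets:
        counts.update(tokens)
    return counts.most_common(n)

# Hypothetical pre-labelled, tokenised tweets from the positive group
positive = [["#covid19", "please", "people"], ["#covid19", "people"]]
print(top_words(positive, n=3))
```

Running the same function over each sentiment group separately yields the category-level rankings reported for Figure 3b–d.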


**Figure 3.** (**a**) Word cloud of the top fifty repeated words (https://wordart.com/, (accessed on 15 August 2021)); (**b**) word cloud of the top twenty-five most repeated words in the positive category; (**c**) word cloud of the top twenty-five most repeated words in the neutral category; (**d**) word cloud of the top twenty-five most repeated words in the negative category.

The frequency and percentage (Table 2) of the sentiment of tweets in each week were determined to establish whether there was a trend across time between the groups.


During week 1, positive tweets were the most frequent (14,305; 39.0%) compared to negative (13,900; 37.9%) and neutral (8398; 22.9%). By week 2 and week 3, negative tweets (19,691; 39.0% and 20,308; 40.0%, respectively) were most frequent compared to positive (19,394; 38.4% and 19,372; 38.1%) and neutral (11,352; 22.5% and 11,061; 21.7%) (Table 2, Figure 4).
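The weekly frequency-to-percentage conversion behind Table 2 can be sketched as follows, using the reported week 1 counts (small differences from the published percentages may arise from rounding conventions):

```python
def weekly_percentages(counts):
    """Convert raw sentiment counts for one week into percentages
    of that week's total tweet volume, rounded to one decimal place."""
    total = sum(counts.values())
    return {label: round(100 * c / total, 1) for label, c in counts.items()}

# Week 1 counts as reported in the text
week1 = {"positive": 14305, "negative": 13900, "neutral": 8398}
print(weekly_percentages(week1))
```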

**Figure 4.** Frequency of negative, positive and neutral tweets over a 3 week period. The frequency of all sentiment groups increased in week 2 compared to week 1. The frequency of negative tweets continued to increase into week 3, whereas positive and neutral tweets slightly decreased.

To determine whether there was a significant difference between the frequency of positive, negative and neutral scores, mean values were established for each week of data collection (Figure 5).

A two-sample *t*-test assuming equal variances was performed between the first and final week of each sentiment group to investigate differences over time. The positive mean during week 1 (0.508; SD = 0.511) was not significantly different from the positive mean in week 3 (0.498; *p* = 0.110); the test statistic (*t* = 1.597) fell within the 95% acceptance region. The negative mean during week 1 (−0.554; SD = 0.511) was not significantly different from the negative mean in week 3 (−0.553; *p* = 0.858); the test statistic (*t* = −0.177) fell within the 95% acceptance region. The neutral mean during week 1 (0.00019; SD = 0.511) was not significantly different from the neutral mean in week 3 (0.00017; *p* = 0.997); the test statistic (*t* = 0.003) fell within the 95% acceptance region.
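A minimal sketch of this week-to-week comparison using `scipy.stats.ttest_ind` (which defaults to the equal-variance two-sample test). The arrays below are synthetic values generated to match the reported week 1 and week 3 positive means and SD, not the study's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic per-tweet positive scores for week 1 and week 3
week1_pos = rng.normal(loc=0.508, scale=0.511, size=500)
week3_pos = rng.normal(loc=0.498, scale=0.511, size=500)

# Two-sample t-test assuming equal variances, as in the text
t_stat, p_val = stats.ttest_ind(week1_pos, week3_pos, equal_var=True)
print(t_stat, p_val)
```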

#### 3.1.3. Intensity of Sentiment

Week 1 (−0.345, 0.508, 0.00019) and week 3 (−0.358, 0.499, 0.00017) displayed similar mean scores for negative, positive and neutral tweets, respectively (Figure 5). During week 2, neutral tweets displayed more negativity than positivity (−1.322).

The means of tweets were subjected to a two-way ANOVA (Table 3). The difference between weeks was not statistically significant (*p* = 0.1951), indicating no significant change in mean values over time. The difference between the sentiment groups (i.e., negative vs. positive vs. neutral mean values) was statistically significant (*p* < 0.0001).


Negative tweets had a higher mean value (0.52706) than positive (0.48196) and neutral (0.50119) tweets (Table 4). To compare the means between the groups, Welch's *t*-test (a two-sample *t*-test) was performed using MATLAB, due to unequal variances and differing sample sizes. Firstly, the values were normalised by mapping them to the range 0–1, where 0 is the "least" and 1 is the "most"; i.e., negative tweets were mapped from [−1, −0.05] to [0, 1], where 0 is least negative (−0.05) and 1 is most negative (−1). This was achieved using an inverse interpolation function, (t − a)/(b − a), where t is the value, a is the lower bound and b is the upper bound.
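The inverse interpolation step above is simple to sketch in Python (the study used MATLAB; this is an equivalent reimplementation, shown for the negative bucket with a = −0.05 and b = −1):

```python
def normalise(t, a, b):
    """Inverse interpolation: map a value t from the interval [a, b]
    onto [0, 1], where a maps to 0 and b maps to 1."""
    return (t - a) / (b - a)

# Negative scores: a = -0.05 (least negative -> 0), b = -1.0 (most negative -> 1)
print(normalise(-1.0, -0.05, -1.0))   # -> 1.0 (most negative)
print(normalise(-0.05, -0.05, -1.0))  # -> 0.0 (least negative)
```

The same function handles the positive bucket by passing a = 0.05 and b = 1.0.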


**Table 4.** Descriptive statistics of collected data, post-normalisation.

<sup>1</sup> Sample size; <sup>2</sup> standard deviation.

Welch's *t*-test demonstrated that the positive vs. negative (*p* < 0.001), positive vs. neutral (*p* < 0.001) and negative vs. neutral (*p* < 0.001) comparisons all show statistically significant differences between the means. This suggests that sentiment across our dataset displays a greater intensity of negative sentiment than positive or neutral; i.e., the negative tweets are "more" negative than the positive tweets are positive.
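Welch's test corresponds to `scipy.stats.ttest_ind` with `equal_var=False`, which does not assume equal variances or equal sample sizes. The arrays below are synthetic, generated to mimic the reported group means and sizes (the spreads are assumed), not the study's data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic normalised intensities with unequal n and variance
negative = rng.normal(loc=0.527, scale=0.20, size=53899)
positive = rng.normal(loc=0.482, scale=0.25, size=53071)

# Welch's t-test: equal_var=False accounts for unequal variances/sample sizes
t_stat, p_val = stats.ttest_ind(negative, positive, equal_var=False)
print(p_val < 0.001)
```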

*3.2. Machine Learning vs. Lexicon Based: A Comparison of Negative, Positive and Neutral Tweets*

The Natural Language Toolkit (NLTK) (https://www.nltk.org/, (accessed on 21 July 2021)) [60] was used for the VADER sentiment analysis and scored 53,899 tweets as negative, 53,071 as positive and 30,811 as neutral, whereas Azure determined the frequency of the categories as 67,538, 45,282 and 24,961, respectively. Both approaches reveal similar trends whereby most tweets were negative, followed by positive, with neutral tweets least prevalent (Table 5).
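VADER produces a compound score in [−1, 1] per tweet; the conventional bucketing (which we assume here, matching the ±0.05 bounds used in the normalisation above) can be sketched as:

```python
def classify(compound, threshold=0.05):
    """Bucket a VADER compound score using the standard +/-0.05 cut-offs:
    >= +0.05 positive, <= -0.05 negative, otherwise neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

print(classify(0.64))   # -> positive
print(classify(-0.31))  # -> negative
print(classify(0.0))    # -> neutral
```

Applying this rule to every tweet's compound score yields the VADER category counts compared against Azure in Table 5.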

**Table 5.** Comparison between Python-based VADER and Microsoft Azure sentiment analysis approaches.


<sup>1</sup> Standard deviation.

The lexicon-based (VADER) and machine learning (Microsoft Azure) approaches to classifying sentiment were compared (Table 5, Figure 6). A total of 39.11% of tweets were scored as negative by VADER and 49.01% were scored as negative by Azure. The percentages of tweets scored as positive by VADER and Azure were 38.51% and 32.86%, respectively, while 22.36% (VADER) and 18.11% (Azure) were classified as neutral.

**Figure 6.** Total number of negative, positive and neutral tweets as determined by Microsoft Azure and VADER.
