#### *1.3. Social Media and Vaccine Hesitancy*

Web 2.0 has made discovering and sharing information online more convenient than ever, with the move from passive consumption to active generation of content leading to Health 2.0, where social media users share advice and experiences relating to health care [18]. However, although social media is readily utilised to promote public health, and increasing numbers of people use it to research vaccinations [17,19], health-care professionals remain a key source of vaccine information [20]. Media and celebrity opinion on social media is known to contribute to anti-vaccine beliefs [21], and the way in which research is interpreted by the media can have a profound effect on public perception [22,23]. Scientists regularly challenge inaccurate information on social media. One high-profile example occurred in September 2021, when Professor Chris Whitty, the Chief Medical Officer for England and Chief Medical Advisor for the UK Government, was asked at a televised press conference about a tweet by rapper Nicki Minaj, which claimed that her cousin's friend was rendered impotent after taking a Coronavirus vaccine which caused swelling in his testicles. Prof Whitty responded that such claims were "myths ... untrue ... designed to scare ... they should be ashamed", prompting a conversation which continued afterwards in the media, including on social media. Although progress has been made in combating false reporting of science [23], understanding the reasons behind vaccine hesitancy is still needed to show how these beliefs may be counteracted effectively. Analysis of tweets during a 2013 measles outbreak [24] found users informing each other about the importance of vaccination in light of the outbreak, illustrating a positive use of social media to educate others about the role of vaccines in preventing outbreaks of disease.

However, the echo-chamber effect described by Piedrahita-Valdés et al. (2021) explains how users with differing beliefs consume homogeneously polarised content regarding vaccines and form opposing groups who rarely communicate positively with one another [25]. Hence, debate regarding vaccines may have little positive outcome, as prior personal beliefs are only reinforced in this environment. Efforts by health professionals to promote vaccination through social media have not always received a positive response and, in extreme cases, health-care professionals have been threatened after posting videos online encouraging vaccination [26].

During the UK national lockdowns in 2020 and 2021, much of the conversation regarding COVID-19 took place on social media platforms including Twitter, which has approximately 300 million monthly users [27,28]. Social media has become a common platform for individuals to voice their concerns and share their thoughts with others during times of crisis [29]. However, whilst these platforms allow the rapid dissemination of information, there is no guarantee that the information is correct, reliable or accurate [30], and the majority of anti-vaccination communication and conversation takes place over the internet [31]. Google search interest for the term 'vaccine' has greatly increased since March 2020, peaking in March 2021 [32].

In a July 2020 UK survey, 16% of participants stated that they would be unlikely to accept a COVID-19 vaccine [33], and between September and October 2020, 12% of individuals were strongly hesitant and 17% were very unsure [34]. The likelihood of refusing the COVID-19 vaccine was also found to be higher among young adults who were indifferent about COVID-19 and lacked trust in scientists [33].

#### *1.4. Sentiment Analysis and Data Mining*

Natural language processing (NLP) research relies heavily on sentiment analysis and opinion mining. Sentiment analysis is the study of opinions, feelings and attitudes towards a product, organisation or event [35–37]. Opinion mining, or text mining, involves extracting knowledge and information from online text, usually focusing on a certain topic and categorising it as positive, negative or neutral [38,39].

Python is a versatile computer programming language which can manage large datasets, making it well suited to complex projects [40–42]. It can be used to retrieve tweets that contain chosen search terms and store them via a designated database engine, such as SQLite. The Valence Aware Dictionary and sEntiment Reasoner (VADER) is one of many tools found within the popular Natural Language Toolkit (NLTK), with in excess of 9000 lexicon features and the ability to analyse sentiments extracted from social media sources. It is built on a gold-standard sentiment lexicon constructed by combining quantitative and qualitative methods [43]. Sentiment lexicons contain lists of lexical features (words), each categorised by semantic orientation (i.e., positive or negative) [38,44]. The VADER lexicon is a collection of predefined words, each with an associated polarity score, which is used to analyse the positive and negative aspects of a text and determine its overall polarity. Typically, neutral sentiments have a polarity score of 0 because no sentiment can be identified in the text, whereas negative and positive sentiments are assigned polarity scores of less than and greater than 0, respectively [45]. According to Satter et al. (2021), it is one of the easiest approaches to sentiment classification [28], and its ability to process acronyms and slang words [46] makes it highly sensitive to sentiment expressions in social media contexts. Hutto and Gilbert (2014) determined that VADER performed better than eleven other highly regarded sentiment models; notably, the accuracy of VADER has been found to exceed that of individual human raters at correctly classifying the sentiment of tweets [47]. Most machine learning approaches to sentiment classification, for example Microsoft Azure's Text Analytics suite, require a labelled dataset in which the polarity of the text is predefined. Whilst Azure's graphical interface can be used by individuals with little to no formal computer programming experience, making it well suited to novices, VADER requires domain-specific knowledge of computing to use.
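The lexicon-based workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the six-word `TOY_LEXICON` and its scores are hypothetical stand-ins for the VADER lexicon, the function names and sample tweets are invented for illustration, and an in-memory SQLite database is used in place of a persistent file. The sketch sums word-level polarity scores, labels each text as negative, neutral or positive according to whether the score is below, equal to or above 0, and stores the results in SQLite.

```python
import re
import sqlite3

# Hypothetical toy lexicon: illustrative scores only, NOT taken from VADER.
TOY_LEXICON = {
    "safe": 1.5,
    "effective": 2.0,
    "grateful": 2.5,
    "scared": -2.0,
    "dangerous": -2.5,
    "myth": -1.0,
}


def polarity_score(text):
    """Sum the lexicon scores of the words found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((TOY_LEXICON.get(word, 0.0) for word in words), 0.0)


def classify(score):
    """Map a polarity score to a label: > 0 positive, < 0 negative, else neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def store_scored_tweets(conn, tweets):
    """Score each tweet and insert text, score and label into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, text TEXT, score REAL, label TEXT)"
    )
    rows = []
    for text in tweets:
        score = polarity_score(text)
        rows.append((text, score, classify(score)))
    conn.executemany(
        "INSERT INTO tweets (text, score, label) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def label_counts(conn):
    """Return the number of stored tweets per sentiment label."""
    cursor = conn.execute("SELECT label, COUNT(*) FROM tweets GROUP BY label")
    return dict(cursor.fetchall())


if __name__ == "__main__":
    # Invented sample texts, used only to exercise the sketch.
    sample_tweets = [
        "The vaccine is safe and effective",
        "I am scared, it sounds dangerous",
        "Vaccination centre opens Monday",
    ]
    connection = sqlite3.connect(":memory:")
    store_scored_tweets(connection, sample_tweets)
    print(label_counts(connection))
```

In practice, the toy scorer would be replaced by NLTK's VADER implementation, which also accounts for cues such as negation, capitalisation and punctuation that a simple sum of word scores ignores.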

#### *1.5. Sentiment Analysis of Vaccine Hesitance*

Vaccine hesitancy is a fluid and ever-changing phenomenon [47]. Previous studies have typically focused on vaccine hesitancy in general rather than on specific vaccines and have revealed different trends across time [25,48]. Rahim et al. (2020) analysed approximately 100,000 tweets about vaccinations between October 2019 and March 2020 and determined that the largest share (41%) were positive in sentiment, closely followed by neutral (39%), while 20% were negative [48]. COVID-19-specific vaccine hesitancy has also been investigated: in May 2020, vaccine hesitancy rates were low (20–25%) in American and Canadian adults [49], whereas rates of COVID-19 vaccine hesitancy were 41% in Italy [50] and 26% in France [51].
