1. Introduction
Since the COVID-19 pandemic spread around the world, the damage and changes to the global economy and trade, as well as to people’s lifestyles and health, have been immeasurable [1,2,3]. The COVID-19 pandemic has attracted widespread attention. People have been fighting COVID-19 since its discovery in December 2019 [4,5]. As the situation improves, COVID-19 pandemic prevention and control have become normalized and must be balanced to ensure normal life can proceed [6].
The virus that causes COVID-19 belongs to the coronavirus family and has a positive-sense single-stranded RNA genome. The unstable single-stranded structure means the virus is prone to variation [7]. Variants of concern (VOCs), named by the World Health Organization, are associated with increased transmissibility and harm, changes in clinical manifestations, and reduced effectiveness of public health management measures, existing diagnostic and therapeutic modalities, and vaccines [8]. Virus variation is inevitable, predictable, and highly harmful. Therefore, the emergence of virus variants will pose new challenges to COVID-19 pandemic prevention and control.
A crisis is an unexpected, sudden, and uncommon [9] event that threatens human life and property [10]. In the context of the normalization of COVID-19 prevention and control, the sudden emergence of virus variants has produced unexpected and disruptive crises. Crises are characterized by uncertainty, suddenness, and destructive power. The timing of crises is unpredictable, so uncertainties exist in all stages of a crisis life cycle [11].
Internet platforms provide the easiest access to information, and more than 70% of adults use them when their health is at risk [12]. Information affects the effectiveness of individual knowledge acquisition and self-management on the Internet [13]. As an information carrier on Internet platforms, videos are popular among contemporary Internet users and provide essential support for individuals to obtain information during a crisis [14]. During the COVID-19 pandemic, video-based information has become one of the most important information sources for the public [15,16,17], and studies on video information have also become meaningful.
Any form of interaction between individuals on Internet platforms during a crisis is meaningful [18], as interaction can reduce individual perceptions of uncertainty [19]. In the era of rapid Internet development, the public is accustomed to using Internet platforms to obtain information and interact with others [20,21]. Public access to crisis information through Internet platforms and the ability to share it with others can help individuals ease the effects of negative emotions caused by the crisis and determine their next steps [22]. When a crisis occurs, the public deepens their cognition of the crisis by collecting information, and individuals obtain information from videos by watching them. Information sharing spreads information widely and provides a way for individuals to communicate and interact with others. The information in videos is shared by forwarding videos [23]. During a crisis, the public watches and forwards videos to gain information and communicate with others to reduce their possibility of being harmed. Crises cause informational and emotional gaps [24]. Individual needs will change to reduce these gaps, and people will forward and watch videos that can meet their informational and emotional needs.
In the context of the COVID-19 pandemic situation caused by the emergence of virus variants, we aimed to explore the impact of content relevance and emotional consistency between videos and virus variant topics on the number of views and shares of videos. We used the support vector machine (SVM) classification algorithm to calculate the content relevance between COVID-19 videos and virus variant topics and then used sentiment analysis to calculate the emotional consistency between COVID-19 videos and virus variant topics. We included the calculated content relevance and emotional consistency scores as independent variables, and we conducted empirical analysis to test the study hypotheses.
4. Materials and Methods
During data preprocessing, we used the SVM algorithm to evaluate the content relevance between videos and virus variant topics, and we used sentiment analysis to evaluate the emotional consistency between videos and virus variant topics. For hypothesis testing, we used empirical analysis to test the impact of content relevance and emotional consistency between videos and virus variant topics on the number of video views and shares. The research workflow is shown in Figure 2 and consists mainly of the following steps:
Step 1: Audio to text: We used mature audio-to-text technology to convert video audio into text.
Step 2: Sentence segmentation: We segmented the text obtained from each video’s audio into several sentences.
Step 3: Data cleaning: We eliminated the invalid data produced when the audio was converted to text and the sentences were segmented.
Step 4: Stratified random sampling and labeling: We randomly selected 4 sentences from each video as the labeling set. We invited researchers to label the sentences as related to virus variant topics or not, which formed the labeled data set.
Step 5: SVM classification: We used the labeled data set as the training set to train the SVM classifier with the linear kernel function, and all valid sentences were classified. The output comprised two data sets: a set of sentences related to virus variant topics and a set of sentences unrelated to them. Then, according to the content relevance calculation formula, we calculated the content relevance between each video and virus variant topics.
Step 6: Emotion analysis: We used the mixed emotion dictionary to analyze the emotion score of the two data sets, and we obtained the emotion score of each sentence and the emotion consistency score of each video.
Step 7: Negative binomial regression: We used the extracted independent variables (content relevance and emotion consistency) and dependent variables to perform negative binomial regression and hypothesis testing.
Step 8: Robustness test: We used the replacement variable measurement method to test the robustness of the model.
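As an illustration of Step 7, negative binomial models are commonly used for overdispersed count outcomes such as views and shares. The sketch below is not the authors' estimation code; it only writes out the log-likelihood of one common parameterization (NB2, with variance mu + alpha * mu^2) under a log link. The function names, the coefficient layout, and the choice of NB2 are assumptions made for illustration.

```python
import math

def expected_count(beta, x):
    """Log-link mean: mu = exp(beta[0] + beta[1]*x[0] + beta[2]*x[1] + ...)."""
    return math.exp(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

def nb2_log_pmf(y, mu, alpha):
    """Log-probability of count y under NB2 with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha  # "size" parameter of the negative binomial
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu))
            + y * math.log(mu / (r + mu)))

def log_likelihood(beta, alpha, rows):
    """rows: iterable of (x, y); x = [content_relevance, emotional_consistency],
    y = observed count (views or shares). Hypothetical data layout."""
    return sum(nb2_log_pmf(y, expected_count(beta, x), alpha) for x, y in rows)
```

Maximizing `log_likelihood` over `beta` and `alpha` (with any numerical optimizer) would yield the regression estimates; a positive coefficient on a variable means the expected count rises with that variable.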
4.1. Data Sources
The research data in this paper come from a well-known Chinese video platform, which is characterized by attention to originality and strong social functions. Originality ensures that videos on the platform are produced and published by the video owner, so different people rarely publish precisely the same content. Strong sociability indicates that users not only use the platform to obtain information but also are willing to socialize on the platform, such as by forwarding information. The platform and data characteristics met our study needs.
We used Chinese words related to virus variants and virus variant names such as “omicron” as search keywords to obtain video data. Our data analysis showed that May to August 2021 was the platform’s hottest time for virus variant topics. The number of videos published was above 500 per month, higher than the average of 238 in other months. In addition, the monthly growth rate in May 2021 was 25.37%, decreasing in August to 34.81%. Therefore, we selected video data from May to August 2021, for a total of 4049 videos. The video data included the video screen and audio, video duration, views, shares, uploader certification type, activity video identification, leaderboard ranking, and other tags.
4.2. Data Preprocessing
The platform’s search mechanism finds keywords that appear in a video’s title, tags, description, or the publisher’s nickname, and then displays the search results. Therefore, our search results were not necessarily related to the virus variant crisis. Most videos were 2–16 min in length, and videos shorter than 2 min contained limited information. In a crisis, users’ negative emotions make their need to acquire or share information urgent, so videos longer than 16 min do not fit users’ habit of acquiring information in fragments. Therefore, we eliminated videos that were entirely irrelevant to the COVID-19 topic through manual reading, and we removed videos that were shorter than 2 min or longer than 16 min, leaving a total of 2501 videos.
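The screening rule described above can be sketched as a simple filter. This is an illustrative sketch only: the `Video` record, its field names, and the `relevant` flag (standing in for the manual relevance check) are hypothetical, and treating the 2 min and 16 min bounds as inclusive is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Video:
    video_id: str
    duration_s: int   # video length in seconds
    relevant: bool    # hypothetical result of the manual relevance check

MIN_S = 2 * 60    # lower bound: 2 min
MAX_S = 16 * 60   # upper bound: 16 min

def keep(video: Video) -> bool:
    """Keep videos judged relevant whose length is within 2-16 min (inclusive, assumed)."""
    return video.relevant and MIN_S <= video.duration_s <= MAX_S

def screen(videos):
    """Return only the videos that pass both the relevance and the duration rule."""
    return [v for v in videos if keep(v)]
```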
This study used mature speech-to-text technology to convert the data of the 2501 videos from audio to text. The conversion accuracy was more than 99%, which met the study requirements. Through text processing, we found that some videos did not use human voice dubbing, so the audio-to-text result was empty. We removed the videos for which the audio-to-text result was empty, the language was not Chinese, or the transcribed text was incoherent. We finally obtained 2037 videos that met the study requirements.
A video 2–16 min in length contains multiple sentences. We used Chinese sentence-ending words as the segmentation basis and divided the text of the 2037 videos into sentences. The video text contained the uploader’s opening and closing remarks, transitions between topics, and other content unrelated to the video content, which we eliminated. The final total was 39,703 sentences.
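The segmentation and cleaning step can be sketched as follows. The sketch assumes that the transcript carries Chinese sentence-final punctuation and splits on it; the source speaks of "Chinese ending words", which may instead refer to sentence-final particles, so the splitting rule here is an assumption. The boilerplate phrases are hypothetical examples of opening and closing remarks, not the authors' list.

```python
import re

# Assumed sentence terminators: Chinese and ASCII full stop-like marks.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]?")

# Hypothetical markers of opening/closing remarks to discard.
BOILERPLATE = ("欢迎来到", "谢谢观看")

def split_sentences(text):
    """Split transcript text into sentences, keeping the terminator with each one."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]

def clean_sentences(sentences, markers=BOILERPLATE):
    """Drop sentences that contain any boilerplate marker."""
    return [s for s in sentences if not any(m in s for m in markers)]

transcript = "大家好，欢迎来到我的频道。德尔塔变异株传播力更强！我们该怎么办？谢谢观看"
sentences = clean_sentences(split_sentences(transcript))
```

Here `sentences` retains only the two middle sentences, because the first and last contain the hypothetical boilerplate markers.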
4.3. Support Vector Machine (SVM) and Content Relevance
SVM is a machine learning algorithm based on statistics. It has substantial advantages in binary classification problems and is widely used in natural language processing, so it was appropriate for our study [50]. We selected 4 sentences from each video as the training data, and we invited two researchers to label each sentence according to whether its topic was related to virus variants. The labeling results were verified as reliable after inspection.
We used the “Jieba” library for word segmentation after eliminating useless sentences. We used a stop-word dictionary to remove meaningless words, imported a custom dictionary to increase the accuracy of the segmentation result, and merged synonyms by replacing words that have the same meaning with a single term. Due to the limited information expressed by a single character, we retained only phrases with a character length greater than 2 in word segmentation. Then, we used TF-IDF to convert the segmented data into word vectors. The SVM with a linear kernel learned to classify whether sentences were related to virus variant topics. The accuracy of the SVM classifier was 89.37%, and the F1 value was 88.95%, indicating that the training was effective.
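To make the TF-IDF and linear SVM step concrete, here is a minimal, dependency-free sketch. It is not the authors' implementation: it assumes sentences are already tokenized (for example by Jieba), uses a smoothed IDF with L2-normalized vectors, and trains a linear soft-margin SVM without a bias term by stochastic subgradient descent on the hinge loss. All function names, hyperparameters, and the toy English tokens are illustrative assumptions.

```python
import math
import random
from collections import Counter

def fit_idf(token_lists):
    """Smoothed inverse document frequency for every term in the training sentences."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def vectorize(tokens, idf):
    """Sparse, L2-normalized TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def score(w, x):
    return sum(w.get(t, 0.0) * v for t, v in x.items())

def train_linear_svm(xs, ys, lam=0.01, eta=0.1, epochs=100, seed=0):
    """Minimize lam/2*||w||^2 + hinge loss by SGD; labels must be +1 or -1."""
    rng = random.Random(seed)
    w = {}
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = xs[i], ys[i]
            violated = y * score(w, x) < 1.0   # margin check before the update
            for t in w:                        # L2 shrinkage step
                w[t] *= 1.0 - eta * lam
            if violated:                       # hinge-loss subgradient step
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + eta * y * v
    return w

def predict(w, x):
    """+1 = related to virus variant topics, -1 = unrelated."""
    return 1 if score(w, x) > 0 else -1

# Toy labeled sentences (hypothetical tokens): +1 related, -1 unrelated.
train_tokens = [["delta", "variant", "spread"], ["omicron", "variant", "mutation"],
                ["variant", "strain", "transmission"], ["welcome", "channel", "subscribe"],
                ["thanks", "watching", "subscribe"], ["weather", "today", "sunny"]]
train_labels = [1, 1, 1, -1, -1, -1]

idf = fit_idf(train_tokens)
weights = train_linear_svm([vectorize(t, idf) for t in train_tokens], train_labels)
```

After training, `predict(weights, vectorize(tokens, idf))` labels any new tokenized sentence; in practice a library implementation with a bias term and tuned regularization would be preferable to this sketch.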
We obtained the data used in training the classifier from 39,703 sentences after segmentation, and each video had data included in the classifier. Therefore, our classifier could be appropriately applied in our experiments. Next, we used the trained classifier to analyze all 39,703 data items to determine whether they were related to virus variant topics.
The trained classifier determined whether each sentence was related to virus variant topics, so that we could judge the relevance between the video and virus variant topics. The number of sentences related to virus variants indicated how many sentences related to virus variant topics were in a video. The total number of sentences indicated how many sentences the video contained overall.
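The content relevance formula itself is not reproduced in this section. One natural reading of the two quantities described above is the share of a video's sentences that the classifier marks as related; the sketch below implements that reading, which is an assumption rather than the authors' stated formula. The function name and label encoding are likewise hypothetical.

```python
def content_relevance(sentence_labels):
    """Assumed definition: related sentences / total sentences in one video.

    sentence_labels: classifier outputs per sentence, +1 = related, -1 = unrelated.
    Returns 0.0 for a video with no sentences.
    """
    total = len(sentence_labels)
    if total == 0:
        return 0.0
    related = sum(1 for label in sentence_labels if label == 1)
    return related / total
```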
4.4. Emotional Consistency
Sentiment analysis is a text analysis method used to explain emotion intensity [51]. We adopted a sentiment lexicon as the tool for sentiment analysis. A mixed sentiment lexicon is more complete than a single lexicon. A self-built sentiment lexicon can be used to modify the weight of some words and add new words so that the lexicon better suits the study context. We assigned an emotional score to each sentence after the video was segmented and used the sentence scores to calculate the emotional score of the video. Through the emotional score of the video, we could understand the emotional intensity of a video. The absolute value of the calculated video emotional score ranged from 0 to 6. The mean emotional score of sentences related to virus variant topics indicated the emotional intensity of the virus variant topic.
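A lexicon-based scoring of this kind can be sketched as follows. The lexicon entries and weights are hypothetical, the sentence score is assumed to be the sum of the weights of matched words, and the video score is assumed to be the mean over its sentences. The section does not give the emotional consistency formula, so the sketch only computes the gap between a video's score and the topic's score, on the assumption that a smaller gap means higher consistency.

```python
# Hypothetical mixed lexicon: word -> signed emotional weight.
LEXICON = {"fear": -2.0, "dangerous": -1.5, "hope": 1.5, "safe": 1.0}

def sentence_score(tokens, lexicon=LEXICON):
    """Sum of lexicon weights of the words in one tokenized sentence."""
    return sum(lexicon.get(t, 0.0) for t in tokens)

def mean_score(sentences, lexicon=LEXICON):
    """Mean sentence score; 0.0 when there are no sentences."""
    if not sentences:
        return 0.0
    return sum(sentence_score(s, lexicon) for s in sentences) / len(sentences)

def emotional_gap(video_sentences, topic_sentences, lexicon=LEXICON):
    """Absolute difference between video and topic scores (assumed measure:
    smaller gap = higher emotional consistency)."""
    return abs(mean_score(video_sentences, lexicon) - mean_score(topic_sentences, lexicon))
```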
6. Discussion
6.1. Main Conclusions
Combining URT and FET, this paper used emotion analysis, the SVM algorithm, and empirical analysis to explore the factors influencing the number of video views and shares when COVID-19 variation caused tension among people.
First, functional emotion theory explained the impact of emotional consistency on the number of views and shares in this study well. This paper used the combination of URT and FET to explain the complexity of individual information acquisition and sharing behaviors. FET could explain the impact of emotion on the number of video views and shares during the COVID-19 crisis. URT is a theory commonly used in the study of public health emergencies, but FET has rarely been used to explain individual information collection and sharing behaviors in crises. FET has often been used to understand the motivational effect of emotional perception on behavior. Previous studies focused on the influence of negative emotions on adopting protective behaviors. Nabi proposed discrete emotions as a psychological framework. In drunk driving scenarios, emotions stimulate relevant behavioral tendencies, which promote selective allocation of attention and information-seeking behaviors [53]. However, in different scenarios, the motivating emotion of a broader range of behaviors may be positive or negative, depending on the differences between the scenarios [47]. Therefore, from the perspective of the consistency between videos and virus variant topics, this paper considered how COVID-19 videos evoked the emotional setting associated with virus variant topics.
Second, this paper trained an SVM classifier to automatically divide the audio text into a set of sentences related to virus variant topics and a set of unrelated sentences. The trained SVM was effective, with an accuracy of 89.37% and an F1 of 88.95%. Previous researchers mostly used content analysis to classify video content [54], which required watching the videos and manually marking their content and therefore limited the number of videos used. This paper extracted essential information from videos by means of audio-to-text conversion and used the SVM algorithm for automatic text recognition. The content relevance score indicated that videos were generally of low relevance to the virus variant topics. A 2–16-min video can contain considerable amounts of information, discussing virus variations while conveying other information. A study of vaccine popularization information on YouTube also showed that a video can cover one or more topics among vaccine science and mechanism of action, vaccine trials, vaccine safety or side effects, and advocacy of preventive measures [55].
Third, the results of empirical analysis showed that the content relevance between videos and virus variant topics positively correlated with the number of video views and shares. Compared with historical pandemic video information, viewers are more likely to watch and forward videos related to COVID-19 [15]. Because historical pandemic information can only act as a reference, information about COVID-19 is more effective in reducing uncertainty by directly providing details. Information is not stable and consistent during a crisis [56]. The dynamic changes in information caused by changes in critical information and arguments make information consumption behavior full of uncertainty [57,58]. Only a few researchers have considered the characteristics of information-seeking and sharing behavior when the uncertainty of a crisis creates information demand. The infectivity, fatality rate, and incubation period of the novel coronavirus variants were all unknown. The environment in which an individual lives is full of uncertainties. Individuals are only interested in information that can address their current uncertainties. To reduce uncertainty, individuals have tended to seek information about virus variants. Individuals watch and forward videos that provide more relevant information, i.e., videos that are more relevant to the virus variants.
Fourth, our results showed that the higher the emotional consistency, the more a COVID-19 video is viewed and shared. Emotion, as the external manifestation of an individual’s feelings, is reflected in individual language and writing, which reflect the individual’s attitude toward events. In studies on emotion in crisis, researchers determined individuals’ emotional expression from individual language and writing to measure individual feedback on public crisis management [59]. However, the role of emotion as the internal force driving individual behavior in a crisis can be easily ignored. In the case of the tension caused by the COVID-19 variants, individuals’ information collection and sharing behaviors are related to emotions. Individuals tended to watch and forward videos on virus variant topics that conveyed emotion similar to theirs. When watching videos with high emotional consistency, individuals feel they are in an environment affected by virus variants, so they are likely to generate corresponding emotional perceptions, such as fear and unease about virus variants. Such emotional perception prompts individuals to watch and forward the video.
COVID-19 prevention and control have always had to face the challenges produced by new crises, such as the emergence of new virus variants. From our findings, we provide several recommendations for public health departments, news media, and individuals concerned with the COVID-19 crisis.
First, the SVM classifier could identify whether the text in COVID-19 videos was related to virus variant topics. Coronavirus variants have led to crises during the COVID-19 pandemic, which have posed constant challenges to the prevention and control of the pandemic. The classifier could monitor the relevance between videos and virus variant topics and evaluate the change in viewer attention to virus variant topics. It can also provide technical support for popularizing highly relevant videos when a new COVID-19 variant emerges.
Second, this paper revealed that when COVID-19 virus variants emerge, the number of video views and shares is related to the content relevance and the emotional consistency between the videos and virus variant topics. Specifically, the public is more willing to watch and forward videos that are highly related to virus variant topics and that have similar emotional expressions. Public health departments and news media use the Internet to provide information to the public and to support the public in using the Internet to collect and share information. This finding suggests that, when providing video information to people in areas where the COVID-19 pandemic is serious, public health departments and news media should consider both the relevance between videos and virus variant topics and whether the video’s emotion is close to the crisis emotion.
Finally, for individuals in crisis, our findings explain what kind of video can help them understand the crisis and is suitable for sharing with others. Because the danger and timing of crises are unpredictable, our findings can help individuals quickly find the information they need when a crisis occurs. Individuals can also reduce their uncertainty by watching and forwarding videos and feel sure about the next steps in a crisis.
6.2. Limitations of the Research
This study has several limitations. First, the cross-sectional data we used could not reflect the changes over time in the number of video views and shares. The situations caused by the pandemic are variable, resulting in sudden tension, followed by relaxation as waves caused by the variants pass. The given situation considerably and quickly changes individual information-seeking and -sharing. Therefore, the time when a video is uploaded during a crisis is critical. In future studies, panel data of the number of video views and shares on specified days can be considered. The classification of time can also be more detailed. For example, the video release time could be classified according to search volume on Internet search engines, which indicates when information demand is high. Second, due to the limitation of the study methods and data, we did not consider the entertainment experience when individuals watched the videos. In addition to meeting information and emotional needs, watching videos is a form of entertainment. Factors affecting individual viewing and immersion perception, such as the camera position of the video producer when recording the video, will also impact the number of video views and shares [60].