1. Introduction
Emotions attract close attention because they make a decisive contribution to attitudes toward events, objects, and people. Motivation and behavior are largely shaped by emotion at both the individual and collective levels, so understanding emotional phenomena gives a more complete picture of social processes and mass actions. Tracking changes in collective emotions allows us to predict people’s reactions to events [1] and to prevent negative scenarios [2]. Monitoring based on social network data compares favorably with classical sociological and statistical methods: it is unobtrusive, and the time lag between an emotional shift and its detection is minimal. This makes it possible to respond promptly to growing negative moods in society [3]. In addition, several areas of public interest, including health, finance, entertainment, advertising, and culture, could potentially benefit from measuring human emotions on social media [4]. Understanding collective sentiment becomes dramatically more important in crises, as natural and man-made disasters and the COVID-19 pandemic have shown [5,6].
In our work, mood is considered a complex, long-lived emotional phenomenon whose main component is emotional tension. Because of its duration, mood, unlike emotion itself, has long-term dynamics. There is some evidence that collective sentiment is subject to seasonal fluctuations that recur from year to year. If this is true, the seasonality of public sentiment should be taken into account in socio-political practice.
The study focuses on emotional tension as the simplest component of collective mood; the emotional component of mood is not considered in this paper. The study aims to answer the following questions:
What methods are applicable to detect patterns of variation in multiple assessments of a population’s psychological states when observed over time?
Does collective emotional tension actually exhibit seasonal variations that can be tracked through social media content analysis?
To study the seasonality of emotional tension, we used text comments from users of the social network VKontakte. In particular, we collected data from the largest communities dedicated to local news and events in a small town in the Nizhny Novgorod region of Russia.
Modern monitoring tools allow data to be collected and stored as time series, which contain the necessary information about the dynamics of the processes that generate them. General methods of time series analysis usually make it possible to answer many questions about the origin and seasonality of the processes being studied.
The distinctive features of time series obtained by monitoring social networks are discreteness, non-stationarity, and high sensitivity to changes in the data, which complicate their modeling and the use of traditional methods of analysis [7,8,9,10,11].
We used a combined approach, more typical of data science, that integrates different data analysis techniques into a single workflow. Rather than modeling the time series, we treat its values as the domain on which the functions used for analysis are defined, which avoids the problem of data sensitivity.
The workflow includes the following steps: statistical categorization of the data, exploratory data analysis (EDA), feature selection and identification of common patterns with respect to a new target variable, aggregation of the data to model seasonal changes, identification of typical data properties through clustering, analysis of cluster properties, and formulation and validation of seasonality criteria. As part of the workflow, we propose an approach to modeling seasonality within a class of time-aggregated models and formulate, for a specific dataset, the conditions under which these models fit the data, together with a seasonality criterion.
2. Related Works
Some researchers have noted that conceptual inconsistencies have hampered progress in mood research. Streamlining the terminology and differentiating emotional phenomena, which many NLP studies arbitrarily refer to as intentions, beliefs, feelings, emotions, mood, and sentiment, could significantly improve the efficiency of analysis [12]. In particular, ref. [13] emphasized the difference between background sentiment (or mood) and rapid shifts in sentiment (or emotion), and argued that sentiment shifts cannot be identified accurately because existing methods largely ignore the patterns by which background sentiment evolves.
The following is a summary of three issues related to mood research: the characteristics and structure of mood; the mood markers used in NLP; and mood duration and dynamics (including methods for recording mood swings).
Currently, the emotionality of online content is studied mainly in the form of transient emotions, while background mood is overlooked. Mood and emotion differ in many dimensions, such as clarity, duration, intensity, stability, causality, and control [14]. For example, emotion is more intense, but mood lasts longer; emotion is triggered by a specific event or incident, whereas mood does not necessarily need a contextual stimulus; and mood is strongly influenced by factors such as environment, physiology, or mental state [12]. In addition, emotion is primarily associated with positive valence, while mood is primarily associated with negative valence [14].
Studies of mass mood, conducted under names such as popular forms of sentiment [15], aggregate mood [16], collective mood [17], collective sentiment [18], and background sentiment [13], fall into two types: (1) those in which mood is analyzed in terms of positive and negative polarity and mood dynamics are treated as movement between the poles [18,19,20,21]; and (2) those in which mood is studied as a set of feelings involving more than one emotion [22] and mood dynamics are seen as changes in the corresponding emotions [23]. In such approaches, mood does not differ from emotion in duration and is treated as a rapidly changing attribute [16] with a cycle of change measured in minutes [24], hours [25], days, or, at best, weeks [26,27,28]. Psychologically, however, it is more accurate to say that mood remains stable for several weeks, which is why it can be called a “chronic” emotional state [29].
In terms of duration, mood can be regarded as a psychological phenomenon with an annual cycle of change. Indeed, mood and related behavior are believed to depend strongly on the time of year [30]. A strong argument for the seasonality of emotional fluctuations is Seasonal Affective Disorder (SAD), a recurrent type of major depression. Typically, SAD begins in the fall and continues through the winter months; less commonly, it causes depression in spring or early summer. Its symptoms are a sad mood and low energy [31].
In 2012, the prevalence of diagnosed SAD ranged from 1% to 10% of the global population and from 3% to 10% in temperate zones. Subsyndromal SAD, with blurred symptoms, was found in 6% to 20% of the temperate population [32]. In Russia as a whole, syndromic and subsyndromal forms of SAD together affected at least 9% of people. We do not know how this proportion has changed over the past decade, but any decrease is unlikely to have been significant. In addition, exacerbations of other mental illnesses occur in the fall and spring, producing an additional surge of depressive, asthenic, neurotic, and hypochondriacal symptoms in the patient population [30,33]. It is therefore difficult to estimate accurately how many social media users experience various endogenous seasonal mood swings and how much they contribute to the total volume of generated content.
Many of the studies reviewed in [34] showed empirically that seasonal mood swings are also common in the general population. For example, nearly 50% of non-depressed people reported experiencing some depressive symptoms in winter, and almost everyone appeared to be happiest in spring; however, feelings of worthlessness, suicidality, and aggression were also found to be significantly associated with the seasons [34,35,36,37].
As for user-generated content, an analysis of 509 million tweets written by 2.4 million people in 84 countries showed that shorter day length is associated with less positive sentiment in tweets [38], and another analysis of 800 million tweets in the UK revealed peak sadness in winter [39]. A Google Trends study of Russian users’ search queries for “depression”, “anxiety”, “panic attack”, etc., showed that seasonal variations in web searches mirror the spring–autumn peaks and summer–winter valleys of depressive and anxiety–depressive disorders [40]. In contrast, the study in [18] found no statistical support for seasonal sentiment changes in structurally stable Twitter communities. Its authors suggested that when sentiment in a community temporarily deviates strongly from its normal level, the deviation can usually be linked to a significant identifiable event affecting the community, sometimes an external news event; in other words, the detected spikes are driven by emotion rather than by mood. Thus, it is still unclear whether seasonality is intrinsic to the general population, as some researchers claim, or whether it can be detected only in fairly specific groups, as others believe.
For NLP, sentiment remains a very noisy signal because of the subtlety of human language [18]. Many effective tools based on machine learning or lexicons have been proposed to analyze sentiment in social media [41]. However, although mood word lists, idioms, emoticons, negation words, linguistic rules, and mood polarity classification algorithms [23] have been used to extract emotions from user content, negation, irony, metaphor, and context-dependent ways of expressing attitudes still distort the results.
In addition, the textual expression of mood is colored by a person’s idiosyncratic vocabulary and style, as well as by the social context, including social norms, history, and shared understanding [15]. As for machine learning, doubts have recently been raised about its direct suitability for many problems in the social sciences and humanities in general and in text analysis in particular [42,43].
In contrast to approaches that treat mood as a generalized bipolar emotion or as a combination of emotions, our study treats mood structurally, as a variable complex of emotions and emotional tension [44]. Emotional tension is a less prominent and less clearly defined component of mood than emotion, and it is experienced as a state ranging from apathy to agitation [45]. In a social context, increases in public emotional tension, manifested as mass hostility, social anxiety, panic, hysteria, and aggression, are associated with irrational collective behavior such as social protest [46]. For our purposes, we rely on the tradition of assessing emotional tension as a component of mood, a tradition embodied in the widely used Profile of Mood States (POMS) questionnaire [47]. Studies using the POMS questionnaire have shown that emotional tension is an attribute of both individual and group mood [48,49,50].
The advantage of assessing emotional tension, rather than mood per se, is that it can be extracted by simpler and more reliable means than sentiment dictionaries and other lexical tools. Specifically, a tense emotional state is revealed by the ratio between parts of speech (verbs, nouns, adjectives) and their forms in user-generated content [51]. Such markers depend less on the topic and form of communication and are much less consciously controlled, which increases the reliability of the results. We investigated emotional tension in social media using the Trager coefficient, i.e., the ratio of verbs to adjectives in a text.
The Trager coefficient was proposed to measure the level of a person’s emotional stability [52]. Its norm is close to 1 (more precisely, 1.34 ± 0.05); values above the norm indicate emotional arousal and other sthenic states, while low values indicate insecurity, dependence, and anxiety [53]. The Trager coefficient correlates with mental stress [54], suicidality [55], schizophrenia and clinical depression [56,57], expressed civil identity [58], insincerity in communication [59], etc., and it can also be used directly to assess emotional tension [52]. We believe that fluctuations in the Trager coefficient of user-generated content reflect the dynamics of emotional tension in the mood structure of users.
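As an illustration of the definition above, the following minimal Python sketch computes the Trager coefficient from a text that has already been part-of-speech tagged. It assumes Universal Dependencies tags (`VERB`, `ADJ`) produced by some external tagger, which is an assumption of the sketch rather than a description of the authors' pipeline; the function names `trager_coefficient` and `interpret` are hypothetical. The interpretation bands use the norm 1.34 ± 0.05 quoted above.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

# Norm quoted in the text: 1.34 +/- 0.05.
NORM_CENTER = 1.34
NORM_HALF_WIDTH = 0.05


def trager_coefficient(tagged_tokens: Iterable[Tuple[str, str]]) -> float:
    """Ratio of verbs to adjectives in a POS-tagged text.

    Assumes Universal Dependencies tags; only "VERB" and "ADJ" are counted.
    Returns NaN when the text contains no adjectives (ratio undefined).
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    verbs, adjectives = counts["VERB"], counts["ADJ"]
    if adjectives == 0:
        return float("nan")
    return verbs / adjectives


def interpret(k: float) -> str:
    """Place a coefficient relative to the quoted norm band."""
    if math.isnan(k):
        return "undefined"
    if k > NORM_CENTER + NORM_HALF_WIDTH:
        return "above norm"  # read in the text as arousal / sthenic states
    if k < NORM_CENTER - NORM_HALF_WIDTH:
        return "below norm"  # read in the text as insecurity / anxiety
    return "within norm"
```

In a daily monitoring setting, one would presumably pool the tagged tokens of all comments posted on a given day and compute one coefficient per day; that pooling rule is likewise an assumption.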
Studying mood as a long-lasting emotional state requires special methods that can capture the temporal patterns of ongoing processes. Variation can be a useful tool for this purpose; its significance is well known from the classical works of W. Shewhart, which laid the foundation for the widespread use of statistical methods for the continuous monitoring and diagnosis of ongoing processes [60,61] and for statistical process control (SPC) [62]. Because it is the variation in the Trager coefficient, described above and mentioned in [52], that indicates the emotional state, Shewhart control charts are appropriate in this case.
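One common Shewhart chart for a single daily value is the individuals and moving range (XmR) chart. The sketch below assumes that each day is summarized by one Trager coefficient value and that the limits are estimated from the whole series; it flags a day as unstable when its value leaves the X-chart limits or its day-to-day change exceeds the moving-range limit. The constants 2.66 and 3.267 are the standard XmR factors for moving ranges of two observations. This section does not state which chart type or limit estimation the authors used, so treat the sketch as an illustration only; `xmr_limits` and `flag_unstable_days` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List

E2 = 2.66   # 3 / d2, with d2 = 1.128 for moving ranges of size 2
D4 = 3.267  # upper-limit factor for moving ranges of size 2


@dataclass
class XmRLimits:
    center: float
    lower: float
    upper: float
    mr_upper: float  # upper limit of the moving-range chart (lower limit is 0)


def xmr_limits(values: List[float]) -> XmRLimits:
    """Control limits of an individuals (X) and moving range (mR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center, mr_bar = mean(values), mean(moving_ranges)
    return XmRLimits(center, center - E2 * mr_bar, center + E2 * mr_bar, D4 * mr_bar)


def flag_unstable_days(values: List[float]) -> List[bool]:
    """True where a day is outside the X limits or its jump from the
    previous day exceeds the mR limit (so a spike flags the day after, too)."""
    lim = xmr_limits(values)
    flags = []
    for i, x in enumerate(values):
        out_x = not (lim.lower <= x <= lim.upper)
        out_mr = i > 0 and abs(x - values[i - 1]) > lim.mr_upper
        flags.append(out_x or out_mr)
    return flags
```

Estimating the limits from the same series they are applied to corresponds to a retrospective (Phase I) analysis; in ongoing monitoring, limits would more typically be fixed from a baseline period.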
Using control charts based on variation values, one can divide the entire observation period into days that differ in emotional intensity and then attempt to identify seasonal patterns by examining the resulting categorical time series. This partition plays a key role in our scheme for detecting seasonality: after transforming the data into a categorical series, we obtain a random sequence whose patterns are more stable than those of the original data. The transition to a categorical time series makes it possible to use with confidence both well-known statistical methods for such objects [63,64,65] and newly developed ones. A general theory of such series is currently under active development [66,67], and researchers propose various practical techniques depending on the software used [68], some of which may be suitable for detecting seasonality.
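The categorization step can be illustrated as follows: daily stability flags (for example, from a control chart) are mapped to the hypothetical labels `stable` and `unstable`, and the resulting categorical series is summarized by first-order transition probabilities, one of the simplest statistical models for categorical time series. The Markov-chain summary is an illustrative choice made here and is not taken from the cited methods.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def to_categories(flags: Sequence[bool]) -> List[str]:
    """Map daily stability flags to category labels (hypothetical names)."""
    return ["unstable" if flag else "stable" for flag in flags]


def transition_probabilities(series: Sequence[str]) -> Dict[Tuple[str, str], float]:
    """Maximum-likelihood estimates of P(next category | current category)
    for a first-order Markov chain fitted to a categorical series."""
    pair_counts = Counter(zip(series, series[1:]))
    from_counts = Counter(series[:-1])
    return {(a, b): n / from_counts[a] for (a, b), n in pair_counts.items()}
```

Fitting such transition estimates separately to, say, each season and comparing them is one simple way to look for seasonal differences in a categorical series.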
Seasonality is often detected using different types of aggregation: by calendar period, or by selected observations within a sliding window [69]. In the second case, choosing the periods leads to an integer optimization problem (a review of methods for solving such problems is given in [70]).
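Both kinds of aggregation can be sketched briefly. The first function aggregates by calendar period, computing the share of unstable days in each meteorological season of each year; assigning December to the following year's winter is an assumption of the sketch. The second computes the same share within a sliding window of fixed length. Choosing that length, or the periods themselves, is where the optimization problem mentioned above would arise, and it is not addressed here. Function names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Sequence, Tuple

# Meteorological seasons (Northern Hemisphere).
SEASON_BY_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "fall", 10: "fall", 11: "fall"}


def seasonal_shares(days: Sequence[date],
                    unstable: Sequence[bool]) -> Dict[Tuple[int, str], float]:
    """Share of unstable days per (season year, season)."""
    if len(days) != len(unstable):
        raise ValueError("days and flags must have the same length")
    totals: Dict[Tuple[int, str], int] = defaultdict(int)
    hits: Dict[Tuple[int, str], int] = defaultdict(int)
    for day, flag in zip(days, unstable):
        # Assumption: December belongs to the winter of the following year.
        year = day.year + 1 if day.month == 12 else day.year
        key = (year, SEASON_BY_MONTH[day.month])
        totals[key] += 1
        hits[key] += int(flag)
    return {key: hits[key] / totals[key] for key in totals}


def sliding_shares(unstable: Sequence[bool], window: int) -> List[float]:
    """Share of unstable days in every run of `window` consecutive days."""
    if not 1 <= window <= len(unstable):
        raise ValueError("window must be between 1 and the series length")
    return [sum(unstable[i:i + window]) / window
            for i in range(len(unstable) - window + 1)]
```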
When no time series models are available for comparison, seasonality can be modeled using clustering. Practical clustering methods, in particular model-based clustering, are given in [69] and presented more fully in [70].
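To illustrate how clustering can stand in for an explicit time series model, the sketch below applies Lloyd's k-means algorithm, a simpler alternative to the model-based methods cited above, to one number per season (for example, the share of unstable days) and splits the seasons into groups. The feature, the value of k, and the deterministic initialization are assumptions made for illustration; this section does not specify which features or clustering method the authors actually used.

```python
from typing import List, Sequence, Tuple


def kmeans_1d(values: Sequence[float], k: int,
              max_iter: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's k-means for scalar features; returns (centers, labels)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")

    def nearest(v: float, centers: List[float]) -> int:
        # Index of the closest center (ties go to the lower index).
        return min(range(k), key=lambda j: abs(v - centers[j]))

    ordered = sorted(values)
    step = (len(ordered) - 1) / max(k - 1, 1)
    # Deterministic start: centers at evenly spaced order statistics.
    centers = [ordered[round(i * step)] for i in range(k)]
    for _ in range(max_iter):
        labels = [nearest(v, centers) for v in values]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep an empty cluster's center where it was.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    # Final assignment consistent with the returned centers.
    labels = [nearest(v, centers) for v in values]
    return centers, labels
```

With per-season shares of unstable days as input, a two-cluster split of this kind would separate calmer seasons from more volatile ones.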
In our study, within the framework of the workflow, we identified the main problems that arise when determining the presence of seasonal changes in collective emotional tension. Using the example of a specific dataset, we present a possible way of solving them in this particular case. Moreover, because of the key transition to a categorical time series, it is possible to apply the abovementioned methods that are aimed at solving similar problems in the general case. Thus, we have answered the first question posed in this study.
5. Discussion
From a psychological point of view, two of the results were the most significant.
First, the data confirmed that general seasonal dynamics of emotional tension can be traced by analyzing network communications. Indicators of the stability of emotional tension were statistically significantly higher in winter and summer than in spring and fall (see Figure 2 and Figure 10). We found that the indicator of collective emotional tension varied strongly from day to day more often in spring and fall than in winter and summer. The revealed dynamics agree well with the trends in emotional state described above, which were identified in psychiatric and psychological practice and in sociological surveys.
Second, spring and fall peaks in the dynamics of emotional tension were absent in 2021. In contrast to 2020 and 2022, which showed pronounced differences between the more stable winter and summer on the one hand and the more volatile spring and fall on the other, the degree of variation in emotional tension in 2021 remained roughly the same across all four seasons. The available data do not allow us to infer the cause of this equalization: it could be either persistent fatigue and apathy or, conversely, persistent excitement and overexcitement during the second year of the pandemic. Thus far, we can only point to an atypical seasonal pattern of emotional tension in 2021, taking the winter–spring and summer–fall differences as typical. Assuming that seasonal fluctuations of mood (and of emotional tension as its component) are endogenous, we can suppose that in the first year of the pandemic these dynamics were still intact and that in the third year they were partly restored.
A pandemic is a prolonged stressor that disrupts the normal life of the population and undoubtedly affects its mental health. At the outset of the pandemic, a strong impact on mental functioning was identified, and a further increase in psychopathological symptoms was predicted [78]. However, defense mechanisms (such as threat underestimation [79] or humor [80]) kept psychological states relatively unchanged for some time. Adaptation to prolonged stressors is a possible basis for the recovery of mass mental functioning [21]. Thus, the detected pattern of collective emotional tension can be explained by the action of defense mechanisms in 2020, a breakdown in the population’s adaptation to a long-term stressor in 2021, and gradual adaptation to extreme conditions in 2022.
The proposed scheme for determining seasonality may be of interest for various social practices. For example, accumulating data on the severity of fluctuations in emotional tension across regions would make it possible to identify regions at increased risk of chaotic mass behavior during periods of seasonal exacerbation. The scheme could also help predict deteriorations in the collective emotional state in order to optimize the work of social services that may face an increased flow of requests during unfavorable periods. Finally, the seasonality of emotional tension could be linked to the manifestation of mass somatic or mental disorders that affect the economic and social functioning of regions.
Limitations and Future Work
This study raised many questions. We studied collective sentiment averaged over a large number of social media users. We do not know whether only users with pronounced seasonal emotional shifts affected the overall emotional tension in the network while others did not affect it at all, or whether all users contributed to the overall emotional tension online to some degree. Clarifying this question requires a dedicated longitudinal study that identifies people with different emotional statuses. For a meaningful characterization of “black days”, it is necessary to distinguish days on which the instability of emotional tension is caused by a significant upward trend in the Trager coefficient (spikes of overexcitement) from those on which it is caused by a significant downward trend (spikes of apathy). The identified seasonal trends can be verified both with other methods for assessing emotional tension in online communication and by collecting more texts for analysis. A promising direction for further research is to determine the emotional component of mood in addition to assessing emotional tension.
It should be noted that this study was conducted using data from one local social media community, which could potentially introduce bias. Thus, observations in other communities could show a different picture. In addition, this paper only considers the Trager coefficient to assess emotional tension, whereas our method could potentially be applied to other normally distributed psycholinguistic parameters. These limitations should be addressed in future work.