1. Introduction
Online consumption of video content is already widespread and continues to grow at a significant rate. This trend creates a natural opportunity for advertisers to integrate video ads with the primary video content in the form of in-stream video ads [1]. However, this integration challenges advertisers to create ads that can effectively capture and maintain the attention of consumers. As a result, there is growing interest in identifying factors that impact video ad performance [2], particularly those that can help capture and maintain user attention [3].
The problem is that advertising content that fails to catch consumers’ attention at first glance is likely to be skipped immediately. Therefore, ad producers face the challenge of creating video ads in a way that decreases the likelihood of consumers skipping them. One factor that significantly contributes to this goal is consumer engagement. As consistently reported by different authors, consumers who are more engaged in video content are less likely to skip it [4,5]. Other studies in the field have also underlined the relationship between ad avoidance and cognitive categories closely related to engagement. For example, in [6], it was found that ad content avoidance may be related to high and low arousal levels evoked by the content. A similar observation was reported in [7], where it was found that boring and unengaging content increases the intention to skip ads.
Engagement in marketing content appears to be the primary factor influencing consumers’ tendency to avoid ads. Another crucial factor that impacts ad avoidance is the in-video ad length [7]. The relationship between the rate of ad avoidance and in-video ad length has been discussed in several studies, which have identified a correlation between increased skipping behavior and longer marketing content [7,8]. Other research has suggested that longer ads result in higher levels of disruption for goal-oriented searches [9,10].
Recent studies have shown that the acceptance level for extended videos has declined, and nowadays, the majority of consumers accept only extremely short video ads, with lengths of fifteen or even six seconds [7]. Therefore, the current trend is to produce short video marketing content to better target consumers’ attention spans. On the one hand, this trend is positive for ad providers since they can increase the rate of presenting content to consumers without increasing the total cost of a campaign. On the other hand, some studies have shown that longer ads may be more effective and enhance brand recognition [8]. Advertisements shortened to fifteen seconds can deliver similar results to thirty-second-long spots in terms of awareness and brand recall. However, thirty-second ads result in stronger persuasive and emotional effects [11].
Hence, one advantage of longer video ads is the ability to convey emotions. This is important for ad providers because emotions are known to be an essential factor in increasing consumer engagement, which directly translates to an increased effectiveness of video ads [12]. The impact of emotions incorporated into video content on consumer behavior has been analyzed from various perspectives, including behavioral engagement [12], content sharing [13], and consumer attitudes [14]. For example, recent studies have demonstrated that surprise and joyfulness correlate positively with the level of consumer attention and with the retention of target audiences [15].
In addition to emotional tone, several other content features have been identified as significant factors in maintaining user engagement, such as language, linguistic style, subjectivity, and video category [16]. In this paper, we present a study that analyzes another factor that may influence consumer engagement: the emotional style of the content. We assume that although emotions are crucial for sustaining consumer engagement in the presented content, emotionally uniform content has a lower potential to maintain consumer engagement for an extended period compared to content that elicits various emotions over time. Our assumption is based on the fact that emotionally uniform content may lead to attentional habituation, which is typical for continuous stimuli [17]. According to [18], the amount of attention paid to emotional content decreases as the time of presenting the content increases. Since the goal of affective habituation is to promote flexibility and prevent panic-like states, as suggested by [19], responses to positive content decrease faster than those to fearful content, which is the result of a stronger response to signals that inform users about dangerous situations [20]. From a neuroscience point of view, the habituation to emotional content is associated with decreased activity in the amygdala and prefrontal cortex, two brain regions that are responsible for processing and regulating emotions [21].
As mentioned above, attentional habituation occurs as a consequence of exposure to emotionally uniform content. In this paper, we postulate that manipulating the emotional content of a marketing video by introducing changes in the emotions elicited could impede the development of habituation and thus prolong consumer attention compared to videos with uniform emotion. In other words, we hypothesize that the use of mixed emotions could increase consumer engagement with video content by mitigating the habituation effect. If our hypothesis holds, the use of mixed emotions in video advertisements may prove to be a crucial factor in decisions to extend the length of video ads, which is associated with increased empathic response to ad content and, consequently, with an increased effectiveness of marketing campaigns [22]. Additionally, according to [10], while uniform emotional content may increase ad-skipping rates, complex emotional responses such as humor or nostalgia can enhance performance and sustain users’ interest.
Various studies have analyzed the impact of mixed emotions on ad performance. Concerning ad recall, mixed emotions were found to be more difficult to accurately recall than uniform emotions, and it was also observed that they might induce a quicker decay effect [23]. However, in terms of impact on user behavior, results are dependent on the area of application and content type. For instance, in the case of viral content, intentions to follow videos with mixed emotions were shown to be lower than those for videos with uniform emotions and positive tones. Users were simply less interested in emotionally unstable content [24]. Conversely, the authors of a recent study related to word-of-mouth marketing (WOM) reported that a mixed emotional appeal is more effective than pure happiness when combined with third-person narration [25]. Additionally, it was observed that mixed emotional content has the potential to deliver a more positive user experience than pure positive content. For example, humorous elements mixed with fear increase ads’ effectiveness through greater persuasive potential [26]. Earlier studies also showed that mixed emotions could be used to increase ad effectiveness through persuasive appeal [27]. For instance, negative scenes followed by positive content lead to conflicting psychological states. In the sector of pro-environmental luxury companies, mixing happiness and sadness increases intention to purchase compared to happiness only [28]. Lastly, mixed emotions incorporated into video content have a higher impact on consumers’ post-message behavior than uniform emotions [29].
The studies mentioned earlier have examined the impact of mixed emotions on video ad effectiveness using measures such as persuasion, recall, and intention to purchase. However, the objective of the present study was to investigate this relationship in terms of consumer engagement using the most direct measure possible: the direct analysis of brain activity recorded during the watching process. The study analyzed two types of marketing videos: those with a clear and uniform emotional message and those with more complex emotional content. It was assumed that due to habituation, uniform videos would maintain viewer engagement for a significantly shorter period than videos with mixed emotional content. Thus, the primary hypothesis of the study was that marketing videos with a mixed emotional content maintain subjects’ engagement for a longer period than videos with consistent emotional messages.
Methods used for determining the level of subjects’ engagement can be broadly classified into three categories: (i) self-reporting questionnaires, (ii) behavioral measures based on task performance, and (iii) measures based on the physiology of users [30]. The recognition of engagement levels based on self-reporting and behavior-based information tends to be delayed, sporadic, and intrusive [31]. Additionally, performance-based measures can be misleading, since multiple degrees of engagement might correspond to the same level of performance [30]. In contrast, physiological measures are available at any time and have significantly shorter delays, measured on a scale of seconds. They are also independent of the subjects’ opinions, claims, and imaginations, which allows for the objective evaluation of engagement levels. Although different physiological measurements, such as electrocardiography (ECG), galvanic skin responses (GSRs), electromyography (EMG), or eye tracking (ET) [32], have been studied, measurements reflecting the subject’s brain activity, such as electroencephalography (EEG) and near-infrared spectroscopy (NIRS) [30,31,33], provide the most direct information about a subject’s engagement level.
Engagement indexes based on features derived from electrical brain activity have been used to assess subjects’ engagement levels in many different types of tasks, both cognitive and physical. For example, in [34], different engagement metrics based on EEG activity were studied to assess the engagement level for a set of cognitive tasks (grid location, mental arithmetic, forward and backward digit span, and trail-making task). A similar study, though only focused on three tasks (mental arithmetic and forward and backward digit span), was reported in [30]. On the other hand, in [35], engagement level was used as a predictor that enabled the recognition of cognitive task demands. The five tasks applied in that study were baseline with eyes open, multiplication, letter composition, geometric figure rotation, and visual counting.
In addition to psychological experiments, EEG engagement metrics have been used in many real-life studies. For example, in [36], EEG engagement estimates were used to predict the success or failure of solving math problems. Similarly, in [37], the EEG engagement ratio, along with other indexes, was used to monitor the mental state of pilots in a flight simulator and an actual light aircraft. In [38], players’ engagement was evaluated while they played the Super Meat Boy platform game, and in [39], engagement level was monitored to improve stroke rehabilitation effectiveness.
The EEG correlates of engagement level are well documented in the neurophysiological literature [40]. It is generally believed that an increase in beta activity reflects a higher degree of alertness and greater engagement in the task, whereas an increase in alpha and/or theta activity reflects less alertness and lower task engagement [41]. Based on this assumption, various engagement indexes have been established and tested in the corresponding studies [41,42,43,44]. Among these indexes, the most commonly used is the engagement index (EI), which relates engagement directly to beta activity and inversely to alpha and theta activity [39,41,42,45,46]. This index was also applied in our study. Following the methodology outlined in [44], we calculated the index values using EEG signals recorded over the frontal and parietal cortices (specifically, at F7, F3, Fz, F4, F8, P3, Pz, and P4).
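For illustration, an index of this kind is commonly written as EI = β/(α + θ), where β, α, and θ denote the signal power in the respective EEG bands. The following Python sketch is not the implementation used in this study; it computes that ratio for a single channel with a naive discrete Fourier transform. The band limits (theta 4–8 Hz, alpha 8–13 Hz, beta 13–30 Hz), the sampling rate, the synthetic test signals, and all function names are assumptions made for the example.

```python
import cmath
import math

# Assumed band limits in Hz; the text above does not state them.
BANDS = {"theta": (4.0, 8.0), "alpha": (8.0, 13.0), "beta": (13.0, 30.0)}

def band_power(signal, fs, f_lo, f_hi):
    """Sum of squared DFT magnitudes for frequencies in [f_lo, f_hi) Hz."""
    n = len(signal)
    total = 0.0
    for k in range(n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq < f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(signal))
            total += abs(coeff) ** 2
    return total

def engagement_index(signal, fs):
    """EI = beta / (alpha + theta) for one channel."""
    theta = band_power(signal, fs, *BANDS["theta"])
    alpha = band_power(signal, fs, *BANDS["alpha"])
    beta = band_power(signal, fs, *BANDS["beta"])
    return beta / (alpha + theta)

def sine_mix(fs, seconds, components):
    """Synthetic signal: sum of sines given as (frequency_hz, amplitude)."""
    n = int(fs * seconds)
    return [sum(a * math.sin(2 * math.pi * f * t / fs) for f, a in components)
            for t in range(n)]

fs = 128  # hypothetical sampling rate in Hz
beta_dominant = sine_mix(fs, 2, [(20, 1.0), (10, 0.1)])
alpha_dominant = sine_mix(fs, 2, [(10, 1.0), (20, 0.1)])
ei_beta = engagement_index(beta_dominant, fs)
ei_alpha = engagement_index(alpha_dominant, fs)
print(ei_beta > ei_alpha)
```

In a multi-channel setting, such as the eight electrodes listed above, the band powers or the resulting index values would additionally have to be combined across channels; how this is done depends on the methodology followed and is not shown here.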
To verify our research hypothesis, we recruited 13 participants to watch a series of eight videos with varying emotional content. While five videos featured emotionally stable content (including two happy, two sad, and one neutral video), the remaining three videos were designed to elicit diverse emotional responses across different scenes. The main outcome of the experiment indicated a notable decrease in participants’ engagement over time when viewing emotionally stable videos, regardless of emotional direction. Conversely, despite their longer duration in comparison to uniform videos, those with mixed emotional content evoked a progressive increase in participants’ engagement levels over time.
This paper is organized as follows. Section 2 outlines the experimental setup and details the method used for calculating the engagement index. In Section 3, we present the results of the experiment. In Section 4, we provide a detailed discussion of the results in the context of previous research in the field. Finally, Section 5 concludes the paper.
3. Results
Before analyzing the EEG data collected during the experiment, we examined the emotional responses provided by the subjects to find out whether they aligned with our assumptions. Table 4 presents the total number of subjects reporting each type of emotion for each video. It is important to note that since subjects could choose more than one emotional response for a given video, the total number of emotions reported by all subjects for a video could exceed the number of subjects. The final column of Table 4 reveals that for videos classified as ‘mixed’, many more emotions were selected by subjects than for videos classified as ‘consistent’. This trend is particularly noticeable for videos V6 and V7, for which the total number of reported emotions was twice as high as for all ‘consistent’ videos (apart from V5). Furthermore, while the dominant emotions for ‘consistent’ videos were indeed consistent (either sad or happy/calm), this was not the case for ‘mixed’ videos. For instance, in videos V6 and V7, sadness (a low-arousal emotion associated with withdrawal motivation) was mixed with anger (a high-arousal emotion associated with approach motivation). In the case of V6, sadness and anger were accompanied by surprise, whereas in V7, they were accompanied by fear and disgust. An even greater emotional mix was observed for V1, where subjects reported both happiness and sadness.
Upon comparing the emotions reported by the subjects to the emotions assigned to each video based on the soundtrack analysis, we found that for most ‘consistent’ videos, the emotions reported by the subjects were consistent with our assumptions. However, for ‘mixed’ videos, our predictions were not as accurate. For example, in V1, we correctly predicted sadness and surprise, but subjects reported happiness instead of anger. In V6, we predicted anger, happiness, and calmness, but subjects reported anger, sadness, and happiness. Finally, in V7, we accurately predicted fear, sadness, and anger, but subjects also reported disgust.
We began the analysis of the EEG signals by examining whether there was a significant difference in the mean level of engagement among video events. As the distribution of data for some events was skewed towards the right, we transformed the data using a logarithmic function. Subsequently, Lilliefors normality tests were performed at a significance level of 0.001, indicating that the hypothesis of normality for each group (video) could not be rejected. ANOVA was then applied to test the significance of differences among conditions. Given that the data streams were of unequal length for different videos, a one-way unbalanced ANOVA design was used with the factor VIDEO (eight levels). The significance level was set to 0.01 for this and all subsequent tests reported in this paper.
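The two computational steps named above, the logarithmic transformation and the one-way ANOVA for groups of unequal size, can be sketched in Python as follows. This is an illustrative sketch rather than the analysis code of the study: the sample values and function names are hypothetical, the Lilliefors test is omitted, and only the F statistic and its degrees of freedom are computed (the p-value would come from the F distribution).

```python
import math

def log_transform(groups):
    """Apply the natural logarithm to every observation (values must be > 0)."""
    return [[math.log(x) for x in g] for g in groups]

def one_way_anova_f(groups):
    """Return (F, df_between, df_within); group sizes may differ."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares weights each group by its own size,
    # which is what makes the unbalanced design work.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical engagement samples for three videos of unequal length.
samples = [
    [0.52, 0.61, 0.48, 0.70],
    [0.41, 0.45, 0.39],
    [0.44, 0.40, 0.47, 0.43, 0.42],
]
f_stat, df_between, df_within = one_way_anova_f(log_transform(samples))
print(f_stat, df_between, df_within)
```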
The test returned F = 33.46 with a p-value that rounded to 0, indicating that not all group means were equal. To identify which pairs of means were significantly different, we conducted a set of t-tests with Bonferroni correction. Table 5 presents the outcomes of the post hoc tests performed for each pair of video events. The pairs of video events with a significant p-value (p-value < 0.01) are marked with an asterisk in the table. Additionally, Figure 7 illustrates the comparison of means calculated for each video. To ensure a reasonable interpretation of group means, the means shown in Figure 7 were calculated using the original, instead of log-transformed, data.
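As a brief illustration of the Bonferroni step: eight videos yield 28 distinct pairs, so each raw p-value is multiplied by 28 (capped at 1) before being compared with the 0.01 level, which is equivalent to testing each raw p-value against 0.01/28. The Python sketch below assumes this standard form of the correction; the source does not state whether p-values or thresholds were adjusted, and the p-values in the usage example are hypothetical.

```python
from itertools import combinations

def bonferroni(p_values, alpha=0.01):
    """Return (adjusted_p, is_significant) for each raw p-value."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]

videos = [f"V{i}" for i in range(1, 9)]
pairs = list(combinations(videos, 2))    # all unordered pairs of videos
per_test_threshold = 0.01 / len(pairs)   # equivalent per-comparison level

# Hypothetical raw p-values for three comparisons.
example = bonferroni([0.0001, 0.001, 0.5])
print(len(pairs), per_test_threshold, example)
```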
The pairwise comparisons showed that subject engagement was not consistent across all videos. The highest engagement level was recorded for videos V1 (mean = 0.534) and V3 (mean = 0.495). However, while the engagement level for V1 was significantly higher than for most other videos (V2, V5, V6, V7, and V8), for V3, the data dispersion was so large that a significant difference was observed only for two videos (V5 and V7). Additionally, significant differences were noted for video V6 (mean = 0.451), whose engagement level was significantly higher than that of two other videos, V5 and V7. Although the engagement level observed for video V4 was slightly higher (mean = 0.455) than that for video V6, no significant differences were found for video V4. The lowest engagement levels were observed for videos V5 (mean = 0.420), V7 (mean = 0.420), and V8 (mean = 0.429).
The results presented in Figure 7 did not allow for any conclusions about the differences between the videos with the mixed and consistent emotional loads since the two videos with the highest (V1 and V3) and lowest engagement levels (V5 and V7) belonged to opposite groups (V1 and V7: mixed; V3 and V5: consistent). Therefore, to provide direct insight into the differences between the videos with mixed and consistent emotional contents, we performed an additional one-way ANOVA test with a factor CONTENT of two levels: mixed and consistent. The classification of videos into one of the two groups was performed according to the labels presented in Table 3. Hence, group 1 (mixed content) contained three videos (V1, V6, and V7) and group 2 (consistent content) contained the remaining five videos (V2, V3, V4, V5, and V8).
The test returned F = 1.137 with a p-value = 0.286, indicating no statistically significant difference in engagement level between videos with mixed and consistent emotional contents. A comparison of means is presented in Figure 8.
The global analysis presented above compared engagement levels across videos or content conditions during the entire analyzed period. Although it provided some insight into the subjects’ engagement in the presented content, its utility is limited. For an ad provider, merely capturing a consumer’s attention briefly at the beginning of a video is insufficient; it is crucial to sustain their interest until the end. A high engagement level at the end of a video increases the likelihood that the message will be retained in the consumer’s mind for longer. An even more favorable scenario for an ad provider is when consumer engagement not only remains high at the end of a video but continues to grow throughout the ad presentation. Therefore, to obtain more meaningful results, engagement levels need to be analyzed over time.
To enable analysis over time, we constructed a global (i.e., averaged over subjects) regression line for each video event. We tested the significance of the regression line slope using a one-sample t-test. The trend lines generated for each video event are depicted in Figure 9. As shown in the figure, a statistically significant trend of consumer engagement was observed for six out of eight videos. The slope of the regression line was negative for four videos (V2, V3, V4, and V8) and positive for two videos (V6 and V7). To determine how many individual results supported these global findings, we constructed individual trend lines for each subject and each event for those six statistically significant global regression lines. Table 6 and Figure 10 summarize this analysis and provide the number of subjects whose trend lines had the same direction as the global regression lines from Figure 9. As observed in the table, not all subjects responded similarly to each video. Generally, a more consistent response was observed for videos with positive trends.
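The trend analysis described above can be illustrated with a short Python sketch. It fits a least-squares line to an engagement series sampled at equal intervals, computes the usual t statistic of the slope (slope divided by its standard error, with n − 2 degrees of freedom), and counts how many subject-level slopes share the sign of a global slope. The exact test procedure is not detailed in the text, so the t statistic shown here is an assumption about its form, and the data and function names are hypothetical.

```python
import math

def fit_line(y):
    """Least-squares line through (0, y[0]), (1, y[1]), ...; returns (slope, intercept)."""
    n = len(y)
    mean_x = (n - 1) / 2
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in enumerate(y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def slope_t_statistic(y):
    """t statistic for H0: slope = 0 (needs n > 2 and a non-perfect fit)."""
    n = len(y)
    slope, intercept = fit_line(y)
    mean_x = (n - 1) / 2
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sse = sum((yi - (intercept + slope * x)) ** 2 for x, yi in enumerate(y))
    standard_error = math.sqrt(sse / (n - 2) / sxx)
    return slope / standard_error

def count_agreeing(subject_series, global_slope):
    """Number of subjects whose individual slope has the sign of global_slope."""
    return sum(1 for y in subject_series
               if fit_line(y)[0] * global_slope > 0)

# Hypothetical data: one averaged series and three subject-level series.
t_value = slope_t_statistic([1.0, 3.0, 2.0, 4.0])
agreeing = count_agreeing([[1, 2, 3], [3, 2, 1], [0, 1, 3]], global_slope=1.0)
print(t_value, agreeing)
```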
Figure 9 illustrates a much more pronounced distinction between videos with mixed and consistent content compared to the global analysis presented in Figure 7. We can directly observe that for all videos with statistically significant regression line slopes, the trend is negative for consistent content and positive for mixed content. To reinforce this conclusion, we performed an analysis similar to that conducted for global comparison (Figure 7); i.e., we compared the trend averaged over all videos of consistent vs. mixed content. As each video had a different duration, we could not perform this task directly by averaging the trend lines. Instead, we used the coefficients of the regression lines presented in Figure 9 and recalculated the trend lines for the same number of samples for each video. To determine a fair number of samples, we computed the mean length of all videos. Next, we averaged the trend lines for videos with mixed (V1, V6, and V7) and consistent (V2, V3, V4, V5, and V8) contents and obtained the trend lines presented in Figure 11. As shown in the figure, the slopes of both trend lines were statistically significant. It is worth noting that the slope of the negative trend found for videos with consistent content was far more prominent than the positive trend found for mixed content. This is quite reasonable since, although it is challenging to sustain consumers’ engagement for an extended period, it is extremely easy to lose their attention at any moment.
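One possible reading of the averaging procedure described above is sketched below in Python: each regression line is re-evaluated from its coefficients over a common number of samples (the mean video length), and the resulting lines are averaged point by point within a group. This is an interpretation for illustration only; the coefficients, video lengths, and function names are hypothetical, and the text does not specify whether the lines were truncated, extrapolated, or rescaled in time.

```python
def common_length(lengths):
    """Mean number of samples across videos, rounded to an integer."""
    return round(sum(lengths) / len(lengths))

def evaluate_line(slope, intercept, n_samples):
    """Values of the line intercept + slope * x for x = 0 .. n_samples - 1."""
    return [intercept + slope * x for x in range(n_samples)]

def average_trend(lines, n_samples):
    """Point-wise mean of several (slope, intercept) lines over n_samples."""
    curves = [evaluate_line(s, b, n_samples) for s, b in lines]
    return [sum(values) / len(curves) for values in zip(*curves)]

# Hypothetical (slope, intercept) pairs and video lengths in samples.
consistent_lines = [(-0.004, 0.50), (-0.002, 0.46)]
n_common = common_length([10, 20, 33])
consistent_trend = average_trend(consistent_lines, n_common)
print(n_common, consistent_trend[0], consistent_trend[-1])
```

Because the point-wise mean of straight lines is itself a straight line, the averaged trend has a slope equal to the mean of the individual slopes and an intercept equal to the mean of the individual intercepts.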
4. Discussion
Strategies developed to keep the user involved in video content often tend to evoke a specific change in a consumer’s attitude or emotion. Most research in this area has been focused on individual emotions [57], behavioral engagement [12], and consumer attitudes [14]. However, recent studies have shown that this approach may not always be the most effective. For example, research on willingness to watch videos [10] found that content focused on evoking basic emotions did not increase user engagement and did not reduce ad-skipping rates. The results of our study support these findings. Regardless of whether the video had a strongly positive (happiness) or negative (sadness) emotional load, the initial short-term increase in user engagement was followed by a gradual decrease. This effect can be explained by the phenomenon of emotional habituation [18], where the response to continuous stimuli decreases [17]. In our research, the habituation to positive content occurred faster (after 4–5 s for V3 and V8) than to negative (sad) content (after 12–13 s for V2). This result is consistent with earlier studies showing reduced emotional habituation for signals that inform users about negative or dangerous situations [20]. For these types of signals, the habituation phenomenon is reduced but still present [19].
Our study also showed that for videos with mixed emotional loads, the subjects’ engagement fluctuated over time but maintained an upward trend. This trend persisted despite the fact that these videos were much longer than those containing a single emotional message. Similar patterns were reported in earlier studies. This upward engagement trend may be due to the fact that mixed emotional messages are considered less tedious and boring, and therefore do not evoke the intention to skip ads, which is often observed in response to boring content [7].
Content that induces mixed affective experiences has the potential to deliver better performance than pure positive content. This better performance may be a result of higher arousal levels related to higher engagement, as low arousal levels have been reported to lead to lower content performance [6]. Additionally, videos with a mixed emotional message have a greater impact on post-message behaviors than videos containing a single emotion [29].
As previously mentioned, the length of videos with mixed emotional messages used in our experiment did not negatively impact the subjects’ engagement. Our study showed that while all short videos with consistent emotional messages had a declining engagement trend, videos with mixed messages were longer but had a rising engagement trend. Two videos presented a statistically significant trend, while one video presented an increasing but not statistically significant trend. Although video V7 was between three and thirteen times longer than the others, it was able to maintain the participants’ attention for the entire duration of three and a half minutes. These results suggest that using content loaded with mixed emotions can allow for the extension of video length, resulting in higher ad persuasion [11]. This greater persuasion may be a result of the integration of positive and negative emotions [26].
The ability to extend the duration of video content and keep users engaged for longer is a potentially valuable opportunity. However, it also poses a challenge for in-video marketing due to observed consumer annoyance and the intention to skip longer ads [7], which are perceived as disruptive for goal-oriented searches [9,10]. Negative responses to longer videos have led marketers to shorten video content to a maximal length that users find acceptable [7]. Currently, the duration of marketing video content is typically as low as fifteen or even six seconds [7]. However, this strategy has a severe consequence: short videos have lower potential to elicit emotions or remain in memory. On the other hand, longer videos are more effective and enhance recognition [8]. Advertisements that are shortened to fifteen seconds can deliver similar results to thirty-second spots in terms of awareness and brand recall but have lower potential for persuasion and emotion [11]. Strategies based on mixed emotions can help increase video length, as engaged users are more likely to watch video content and not skip it as quickly as in the case of single-emotion videos [4,5].