Article

Dynamic Difficulty Adaptation Based on Stress Detection for a Virtual Reality Video Game: A Pilot Study

by Carmen Elisa Orozco-Mora 1,†, Rita Q. Fuentes-Aguilar 2,† and Gustavo Hernández-Melgarejo 2,*,†
1 School of Engineering and Sciences, Tecnológico de Monterrey, Av. Gral. Ramón Corona No 2514, Zapopan 45201, Mexico
2 Institute of Advanced Materials for Sustainable Manufacturing, Tecnológico de Monterrey, Av. Gral. Ramón Corona No 2514, Zapopan 45201, Mexico
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Electronics 2024, 13(12), 2324; https://doi.org/10.3390/electronics13122324
Submission received: 7 May 2024 / Revised: 7 June 2024 / Accepted: 10 June 2024 / Published: 14 June 2024

Abstract
Virtual reality (VR) continues to grow as more affordable technological devices become available. Video games are among its most profitable applications, while rehabilitation has the most significant social impact. Both applications require proper user evaluation to provide personalized experiences that avoid boredom or excessive stress. Despite many successful applications, there are still several opportunities to improve human–machine interaction, one of the most popular being the use of affect detection to create personalized experiences. In that sense, this study presents the implementation of two dynamic difficulty adaptation (DDA) strategies. The person’s affective state is estimated through a machine learning classification model, which then serves to adapt the difficulty of the video game online. The results show that it is possible to maintain the user at a given stress level, which is analogous to achieving the well-known flow state. No statistically significant differences in the workload induced on the users were found between the two implemented strategies. However, one of the strategies induced higher physical demand and frustration, which was corroborated by the recorded muscular activity. The results obtained contribute to the state of the art of DDA strategies in virtual reality driven by affective data.

1. Introduction

Virtual reality technology continues to gain widespread acceptance in different fields as its devices become more affordable. As an example, total revenues of the VR market increased by up to 500% between 2017 and 2022 (approximately from USD 2 billion to 10 billion), with the electronics and hardware segments being the most profitable [1]. Moreover, research output over the last ten years shows a clear upward trend in applications for training and education (≈17%), healthcare (≈28%), industry (≈10%), and human–machine interaction (≈35%), among others (≈10%) [2,3]. The share of human–machine interaction applications (more than a third) is worth noting, since these works focus on continuously exploring human requirements and experiences to improve VR technology.
An example of human–machine interaction worth mentioning is VR video games. VR gaming is so relevant that it represents around a quarter of worldwide VR revenues (approximately USD 2.5 billion) [4], and it is still growing. VR video games provide enhanced interactive capabilities to users by integrating high-resolution head-mounted displays (HMDs), haptic interfaces, 3D spatial audio sources, and nonconventional gamepads [5,6]. Adding these technological elements creates more vivid and plausible experiences, so studying and analyzing the corresponding human–machine interactions is a significant matter in the field. Nevertheless, it remains an open question how to provide better experiences for VR users while keeping the immersive elements of a given task unchanged. For example, during a VR video game session, changing the input elements (joystick, keyboard) to improve immersion is not suitable.
The above has been partially tackled in the domain of video games by so-called dynamic difficulty adaptation (DDA) strategies. DDA is an approach used to dynamically adapt, update, or modify the difficulty level of a video game according to the player’s performance or skill level [7,8,9]. The approach started to gain popularity and usefulness in the late 2000s [10,11], and the design of a wide variety of strategies is mainly based on the fulfillment of flow theory. Flow theory, proposed by Csikszentmihalyi [12], states that a user can achieve a so-called flow state, a mental state experienced when the user feels immersed and wholly involved in the task at hand without being bored or stressed. This concept is analogous to the state of presence in VR, since both seek to provide the user with such a high degree of engagement that they forget the surrounding real world. Thus, DDA refers to designing a difficulty adjustment that avoids both excessively challenging experiences and tasks that are too easy. Applications of DDA include the difficulty adjustment of popular video games such as Super Mario Bros or Tetris [7,8], multiplayer schemes in battle arenas or third-person shooters [13,14], educational math-skill platforms [15], and horror games [16].
Nonetheless, DDA has only partially addressed the issues of evaluating and personalizing video games and, in limited cases, VR games. Previous examples of DDA lack physiological/affective-based models; instead, they rely on adaptive algorithms that use game performance metrics and evaluate the experience only afterwards through questionnaires. Unfortunately, this falls short of the main goal of enhancing video games online according to natural responses such as those detected with affective computing methods. In addition, a subject that has not been widely explored is the testing and comparison of different variables that can be manipulated to control the difficulty of the video game. Current studies consider a single-variable approach, and no studies test different variables separately or in combination to analyze their potential to enhance the user’s experience and flow state.
In summary, for VR video games and DDA strategies driven by affective data, there are several opportunity areas to explore in order to implement personalized or online-enhanced experiences. In that sense, this pilot study aims to implement a pair of DDA algorithms that manage the difficulty level of a first-person shooter (FPS) video game online. The user’s heart rate (HR) and the activity of two muscles of the shooting arm are acquired during play and evaluated by a machine learning classifier to predict the user’s stress. The stress prediction data are then used within two algorithms to change the game’s difficulty. The first algorithm changes the difficulty according to the number of enemies deployed in the scene. In contrast, the second one modifies the health level of the enemies (increasing or decreasing it). Finally, the NASA-TLX (Task Load Index) was used to assess the players’ workload in each scenario and to evaluate which variable had more impact on the users.
This work’s main contribution is divided into two parts. First, a closed-loop scheme is implemented for the pair of DDA algorithms, including machine learning tools that estimate the users’ stress level online from physiological data. It is important to mention that the machine learning classifiers are trained with a database obtained in our previous study; thus, this closed-loop implementation significantly complements that research. Second, the effects of both algorithms on the users’ workload are compared to explore the impact of each strategy. Such a comparison has not been carried out before, and our results provide a first step in exploring this kind of approach for game difficulty adjustment.
The remainder of this study is organized as follows: Section 2 describes some works related to DDA and affect detection to visualize the current challenges and results associated with the topic of interest. Section 3 describes the materials and methods for the experimental tests carried out. Section 4 presents the obtained results for the proposed algorithms, while Section 5 presents a discussion on such results. The conclusions of the work are presented in Section 6, while Section 7 includes this work’s limitations and proposed future work.

2. Related Work

Currently, there is a limited set of studies exploring the intersection of DDA and affect detection based on physiological data (the main topic of this work). Some of these studies are discussed below; several provide good DDA results but still have room for improvement regarding their VR setups.
First, there are the studies of Reidy et al. [17] and Nacke et al. [18], where facial EMG signals were used for DDA purposes. Reidy et al. present a cognitive training paradigm using multi-room museum and supermarket environments. They hypothesized that, by performing tasks in highly immersive environments, the user would later perform better in real-world scenarios. For affect detection, a set of facial EMG signals was acquired, which allowed them to determine specific patterns associated with the valence and arousal of the users. They provide valuable information by integrating a head-mounted display with facial sensors and, as expected, show improved performance in the museum environment. Even so, their emotion classification algorithm took up to 45 s to update the affective state. On the other hand, Nacke et al. explore the effects of sounds and music in an FPS video game. They detected changes in users’ arousal through changes in skin conductance levels and patterns in the facial EMG. Their results show that the best conditions for players to reach the flow state are video games with sound effects accompanied by music. Other studies involving different signals are those developed by [13,19]. The latter involves a psychophysiological model of users in a VR police training scenario; it considers EEG and heart rate variability (HRV) to determine the responses associated with resting states and active shooting tasks. The users’ stress level was computed from such responses to modify the shooting target dynamics (static vs. mobile). Despite the stated limitations (small sample size, instrumentation, and model type), positive results were found, encouraging the extension of the experiments and the testing of different modeling strategies. The former study concerns the use of EEG-triggered signals to determine the short-term excitement of users in a third-person shooter game. It implements a scheme of four playable modes that depend on the excitement level of each user, which leads to more interesting sessions, as validated by questionnaires.
Further, Montoya et al. [20] implement a muscular fatigue estimation approach to monitor users’ isometric bicep contractions while playing a serious game for rehabilitation purposes. Their results show that most of the muscular contractions were performed at the recommended physical intensity, accompanied by an improvement in motivation during the rehabilitation process. These results are promising; however, the authors use a keyboard as the input element and no HMD, which limits visual immersion and could drastically change the results compared to a proper VR setup. Also, [16] elaborates a first-person horror game with seek-and-find mechanics. This study is remarkable since it considers three parameters for difficulty adjustment, all driven by the heart rate signal. The main idea of the DDA was to test an inverse feedback scheme that increases the difficulty when user performance is poor in order to evoke challenge and motivation. The results were positive and validated with questionnaires; however, the three parameters were varied simultaneously, so it is impossible to know which of them has the strongest impact. Finally, the study presented by Liu et al. [10] uses the strategy most similar to ours. Their work presents an anxiety detector that combines ECG, EMG, EDA, and photoplethysmography signals through a regression tree algorithm. Their DDA strategy uses both performance-based and affective-based data and successfully modifies the difficulty level of a Pong game while improving the players’ performance. A drawback of this work is the long sessions required to train the machine learning model. Moreover, it is unclear how the three difficulty levels were derived from the combination of several elements assessed on a nine-point Likert scale. An additional drawback is that it does not use VR.
To summarize the above studies, Table 1 presents their principal elements of interest. The last row includes the elements of the present study in order to contrast our particular experimental choices. Most of the current literature presents promising results with some drawbacks, so it is essential to continue extending the results in this area. Among the reviewed experimental setups, no study was found that compares different variables for changing the video game’s difficulty in order to evaluate their impact on users. In that sense, this work seeks to answer the following research questions:
  • RQ1. Is it possible to maintain users at a specific stress level by employing DDA algorithms that rely on online affective data?
  • RQ2. Will there be statistically significant differences between users’ workloads for the two DDA strategies?
The next section will present the experimental setup used to answer both these questions.

3. Methodology

This section presents the different materials and methods concerning the FPS video game, the stress classification algorithm, the test subjects and their physiological signals, the DDA algorithms, and the experimental setup for validating the strategies.

3.1. Virtual Reality Video Game

This study implements a modified version of the FPS video game developed by Orozco-Mora et al. [22]. The game was developed in Unity 3D (version 2019.3.8f1), integrating several assets and game mechanics designed by Kenney [23] and Quentin Valembois [24]. It consists of an FPS gallery experience where the player has to kill zombie enemies to stay alive throughout the game. To shoot the gun, the player must extend their arm, aim at the enemies, and press the back trigger of the right controller. Gun reloading occurs automatically, and there is no limit to the number of bullets available. Originally, the game had three difficulty levels, as it was used to induce different stress levels and label the acquired physiological signals. Moreover, in the original version, the variable modified to regulate the difficulty was the number of enemies, as it directly impacts the workload, which correlates with the player’s stress level [25]. The video game environment simulates a scary scene consisting of a night forest cemetery background, a couple of lanterns, zombies appearing from the left, center, and right of the first-person camera, and zombie sounds. For the current study, two variables for difficulty adaptation were considered: the variable spawning rate (number of enemies) and the variable damage rate (health of the enemies). Both are described in detail in Section 3.4. Figure 1 depicts two screenshots of the video game modalities. Visual and auditory stimuli are essential factors in emotion induction in virtual environments [26,27,28], which is why the immersive Oculus Quest 2 head-mounted display was used as the display element in this study.

3.2. Stress Classification

To carry out the affective detection, which in this case involves stress classification, the dataset obtained in our previous study was used [22]. From now on, this dataset is called the Offline Dataset for Stress Classification; it consists of 9200 instances with 25 features each. That study recorded 400 s of activity from the Extensor Digitorum Communis (EDC) and Flexor Carpi Radialis (FCR) muscles, as well as the electrocardiographic (ECG) and electrodermal activity (EDA) signals, of 23 volunteers. The data consisted of one hundred seconds for each of the three difficulty levels and one hundred seconds for a resting stage. The authors reported no statistical difference between level 1 and level 2, nor between level 2 and level 3; however, statistical differences and clear tendencies were noted between level 1 and level 3. In order to obtain proper classification results, the present study considers only the data from the resting stage, level 1, and level 3 of the Offline Dataset for Stress Classification. From this point forward, these three classes are labeled resting level (resting stage), stress 1 (original level 1), and stress 2 (original level 3).
The scikit-learn Python library [29] was used to fit and test four classification models, selected because they are the most used in emotion classification studies: a Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel, a K-Nearest Neighbors (KNN) classifier, a Random Forest Classifier (RFC), and a Multi-Layer Perceptron (MLP) classifier. A test was conducted to determine the combination of features that yielded the highest accuracy for the multi-class classification model. The EDA features were discarded in this study, focusing on EMG and ECG, after noticing that incorporating them led to a decrease in the performance metrics of the machine learning models. The main reason is that EDA, ECG, and EMG have different transient and steady-state responses, so the time-window evaluation of the classifiers may only work for some of them depending on the experimental setup. This is consistent with other results in the literature, such as those presented by [28]. Therefore, only the data from the EMG and ECG signals were considered for training and testing the models. The dataset contained ten features extracted from both signals. After testing the models with different feature combinations, the combination that resulted in the best accuracy score was found. The selected features are presented in Table 2, where $N$ refers to the size of the sample $\{x_1, x_2, x_3, \ldots, x_N\}$ and $x_n$ to the $n$-th value of the sample.
Five-fold cross-validation was used together with the area under the ROC curve (AUC), the average precision score, and the macro F1 score to evaluate the models. Each fold held out 20% of the dataset for the accuracy evaluation. The results are presented in Table 3. Afterward, the model with the best accuracy score (the SVM with an RBF kernel, at 90% accuracy) was embedded into a Raspberry Pi 3 Model B to make the predictions online while the subjects play the FPS game.
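For reference, a minimal scikit-learn sketch of this evaluation step is shown below. It assumes the Offline Dataset for Stress Classification has already been loaded into a feature matrix X and an integer label vector y; the feature-scaling step is an assumption, and the average precision score reported in Table 3 is omitted because scikit-learn’s built-in scorer does not directly cover the multi-class case.
```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def evaluate_stress_classifier(X, y):
    """Five-fold cross-validation of the SVM (RBF) stress classifier.

    X: (n_windows, n_features) array of the Table 2 features.
    y: integer labels for resting level, stress 1, and stress 2.
    """
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_validate(
        model, X, y, cv=cv,
        scoring={"accuracy": "accuracy",      # accuracy score
                 "auc_ovr": "roc_auc_ovr",    # multi-class AUC (one-vs-rest)
                 "f1_macro": "f1_macro"},     # macro F1 score
    )
    # Average each metric over the five folds.
    return {name: float(np.mean(vals))
            for name, vals in scores.items() if name.startswith("test_")}
```
The same routine can be repeated with a KNN, RFC, or MLP estimator in place of the SVC to reproduce the kind of comparison summarized in Table 3.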

3.3. Physiological Signal Acquisition and Processing

An instrumentation system was implemented to acquire the ECG and EMG channels from the same muscles used to construct the Offline Dataset for Stress Classification. Following the experimental setup presented by Orozco-Mora et al. [22], the EMG signals from two forearm muscles, the Extensor Digitorum Communis (EDC) and the Flexor Carpi Radialis (FCR), were selected because these cues have been successfully used for emotion recognition. The EDC works during the extension of the medial four fingers at the metacarpophalangeal and interphalangeal joints, while the FCR performs flexion and abduction at the wrist.
The signals were recorded using disposable adhesive Ag/AgCl surface electrodes and a pair of wearable BioNomadix wireless modules interfaced with an MP150 data acquisition and analysis system (BIOPAC Systems Inc., Goleta, CA, USA). BN-ECG2 and BN-EMG2 modules recorded the ECG and EMG signals, respectively. All signals were recorded at a sampling frequency of $f_s = 1$ kHz. The module receivers were snapped to the universal interface module UIM100C, from which the analog signals were extracted online and digitized using an analog-to-digital converter (ADC) module, to be processed in a Raspberry Pi 3 Model B. Once the signals were digitized, their natural offsets were removed. For the ECG signal, the HR was computed using version 1.2.7 of the HeartPy library in Python. For the EMG signals, a fourth-order digital Butterworth bandpass filter (20–450 Hz) was applied to remove motion artifacts and electrical noise components. Each signal was then normalized by dividing it by its maximum value, and the filtered signals were divided into windows of 1 s. Finally, the features presented in Table 2 were extracted for classification purposes.
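A minimal sketch of this processing chain is shown below, assuming the raw EMG and ECG channels are already available as NumPy arrays; the function names are illustrative, and window-handling details may differ from the actual implementation.
```python
import numpy as np
import heartpy as hp
from scipy.signal import butter, filtfilt

FS = 1000  # sampling frequency (Hz), as in the acquisition setup

def emg_window_features(raw_emg):
    """Band-pass filter, normalize, and window (1 s) an EMG channel,
    then compute the EMG features listed in Table 2 for each window."""
    x = np.asarray(raw_emg, dtype=float)
    b, a = butter(4, [20, 450], btype="bandpass", fs=FS)   # 4th-order Butterworth
    emg = filtfilt(b, a, x - np.mean(x))                   # remove offset, filter
    emg = emg / np.max(np.abs(emg))                        # normalize by maximum value
    n_windows = len(emg) // FS
    features = []
    for w in emg[: n_windows * FS].reshape(n_windows, FS):  # 1 s windows
        features.append({
            "RMS": np.sqrt(np.mean(w ** 2)),
            "MAV": np.mean(np.abs(w)),
            "VAR": np.sum(w ** 2) / (len(w) - 1),           # as defined in Table 2
            "STD": np.std(w),
            "MPT": np.max(w),
        })
    return features

def heart_rate_bpm(raw_ecg):
    """Estimate heart rate (bpm) from an ECG segment with HeartPy."""
    _, measures = hp.process(np.asarray(raw_ecg, dtype=float), sample_rate=FS)
    return measures["bpm"]
```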

3.4. Dynamic Difficulty Adaptation Strategies

The player’s stress level was used to adjust the video game’s difficulty online. To achieve this, an SVM classification model with an RBF kernel was utilized. The model was trained to classify the user’s physiological signals into three levels: resting level, stress 1, and stress 2. Later, two game strategies were proposed to change the difficulty. Both adaptation processes are described below.
  • Spawning rate adaptation. In the first modality, the game starts with an initial variable Value, which defines the initial difficulty, and the enemies spawn at a rate of 1/Value. If the player’s stress level is predicted as level 0, Value increases by 0.05; if the level is 1, Value remains the same; and if the level is 2, Value decreases by 0.05. Value is bounded between 0.5 (the easiest level) and 1.5 (the most difficult level). The whole process for spawning rate adaptation is described in Algorithm 1.
  • Variable damage adaptation. In the second modality, the game starts with a bullet damage of 10 points, while the total health of each enemy is set at 100 points. Every time a bullet hits an enemy, its health diminishes by the current damage value. If the player’s stress level is predicted as level 0, the damage decreases by 2; if the level is 1, the damage remains the same; and if the level is 2, the damage increases by 2. The damage is bounded between 10 (the most difficult level) and 50 (the easiest level). The process is described in Algorithm 2.
In both cases, the goal is to maintain the users at stress 1, where the player is not bored but not overwhelmed, according to the results obtained in [22]. Stress 1 is the reference for the players to reach while playing the video game subject to the DDA algorithms. Moreover, it is essential to point out that the algorithms are updated during each frame.
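Complementing Algorithms 1 and 2, the following minimal Python sketch reproduces both update rules exactly as described above; the function and variable names are illustrative and do not come from the actual implementation.
```python
# Both rules are evaluated with the latest predicted stress level
# (0 = resting level, 1 = stress 1, 2 = stress 2).

def update_spawn_value(value, stress_level):
    """Spawning rate adaptation: Value drives the spawn behavior (rate 1/Value);
    0.5 is the easiest setting and 1.5 the most difficult."""
    if stress_level == 0:          # player at resting level -> increase difficulty
        value += 0.05
    elif stress_level == 2:        # player at stress 2 -> decrease difficulty
        value -= 0.05
    return max(0.5, min(value, 1.5))

def update_damage(damage, stress_level):
    """Variable damage adaptation: damage dealt per bullet to a 100-point enemy;
    50 is the easiest setting and 10 the most difficult."""
    if stress_level == 0:          # player at resting level -> increase difficulty
        damage -= 2
    elif stress_level == 2:        # player at stress 2 -> decrease difficulty
        damage += 2
    return max(10, min(damage, 50))
```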
Algorithm 1: Algorithm for spawning rate adaptation

3.5. NASA-TLX Questionnaire

The NASA-TLX was selected as a subjective measurement of mental workload to correlate the subjects’ experiences with the difficulty adaptation methods and the responses obtained regarding physiological signals. The NASA-TLX questionnaire considers six factors: Mental demand (MD), Physical demand (PD), Temporal demand (TD), Performance (P), Effort (E), and Frustration level (F). A detailed definition of each factor can be found in [30]. Two steps must be completed to assess the NASA-TLX results between game modalities: comparing each factor (paired comparison) and assigning a rating to each factor (event scoring). In the paired comparison, the volunteer is asked to reflect on the performed tasks and look at each paired combination of the six factors. Then, they must decide which dimension is more related to their workload definition and underline it. For the second step, the participants were asked to rate each factor on a scale from 0 to 100 (with increments of 5) at the end of each game session.
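As a concrete illustration of this scoring procedure, the sketch below computes the overall workload for one participant and one game modality following Hart and Staveland [30]; the ratings and pairwise-comparison counts are invented placeholders, not data from this study.
```python
FACTORS = ["MD", "PD", "TD", "P", "E", "F"]  # the six NASA-TLX factors

def tlx_score(ratings, pairwise_wins):
    """Overall NASA-TLX workload for one participant and one task.

    ratings: factor -> rating on a 0..100 scale (increments of 5).
    pairwise_wins: factor -> number of times the factor was selected
    in the 15 paired comparisons (the weights must sum to 15).
    """
    assert sum(pairwise_wins.values()) == 15
    weighted = sum(ratings[f] * pairwise_wins[f] for f in FACTORS)
    return weighted / 15.0   # weighted ratings divided by the number of pairs

# Made-up example values for illustration only.
example_ratings = {"MD": 55, "PD": 40, "TD": 60, "P": 35, "E": 50, "F": 30}
example_wins    = {"MD": 4,  "PD": 2,  "TD": 3,  "P": 1,  "E": 3,  "F": 2}
print(tlx_score(example_ratings, example_wins))  # ~48.3
```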
Algorithm 2: Algorithm for variable damage adaptation

3.6. Experimental Procedure

A series of experimental tests was carried out to validate the proposed DDA strategies. Twenty volunteers (seven female, thirteen male) were invited to participate in the study. It was indicated that people with cardiovascular diseases, anxiety disorders, severe visual impairments, severe hearing loss, upper-limb prostheses, or previous experiences of cybersickness could not participate. The volunteers were asked to fill out and sign an informed consent form before participating and were advised to stop the experiment if they presented any cybersickness symptoms. The tests were carried out at Tecnológico de Monterrey Campus Guadalajara. As preparation, each participant was fitted with the electrodes and the BioNomadix modules for ECG and EMG measurements. The participant then took a seat on a chair and wore the Oculus Quest 2 headset. This setup is depicted with a male participant in Figure 2. The whole closed-loop process, including the measurement systems and the video game adaptation, is depicted in Figure 3 and can be stated as follows: the EMG and ECG signals are acquired through the BioNomadix modules, which communicate via Bluetooth with the Biopac MP150 processing system; the ECG and EMG analog signals are then acquired by the ADC module, which sends the digital data to the Raspberry Pi 3 Model B; this device carries out the stress estimation and updates the DDA variable (for each case), which is sent to Unity via serial communication; finally, the video game is updated with the new information and displayed to the user via the Oculus Quest 2. The computer running the game was equipped with an Intel Core i7 processor, 16 GB of RAM, and a GeForce GTX 1060 graphics card.
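To illustrate the Raspberry Pi side of this loop, the sketch below loads a serialized classifier and pushes each new prediction to Unity over a serial link. The port name, baud rate, model file name, and message format are assumptions for illustration only, and acquire_latest_window() is a hypothetical placeholder for the ADC and feature-extraction code.
```python
import time
import joblib
import serial  # pyserial

# Illustrative configuration (not taken from the paper).
svm_model = joblib.load("stress_svm_rbf.joblib")            # previously trained SVM (RBF)
unity_link = serial.Serial("/dev/ttyUSB0", baudrate=115200, timeout=1)

def classify_and_send(feature_vector):
    """Classify the latest 1 s feature window and send the stress level to Unity."""
    level = int(svm_model.predict([feature_vector])[0])      # 0, 1, or 2
    unity_link.write(f"{level}\n".encode("ascii"))           # one line per prediction
    return level

# Main loop (sketch):
# while True:
#     classify_and_send(acquire_latest_window())
#     time.sleep(1.0)
```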
The complete VR experiment took approximately 40 min per person, including the initial setup and the VR session, which was divided into five stages. After the preparation described above, each subject took part in a practice session to familiarize themselves with the 3D virtual environment and the shooting mechanics of the Oculus Quest controllers. At this point, none of the participants reported symptoms of cybersickness. Then, each player took part in a 300 s session with the first DDA strategy (spawning rate adaptation). The difficulty started at the highest level (1.5) and was then adapted according to the stress level of the subject. This session was followed by a resting stage of 5 min, during which the subjects were asked to answer the NASA-TLX [30] to evaluate their experience with the first DDA strategy. After the 5 min rest, the subjects engaged with the second DDA strategy (variable damage adaptation) for 300 s, which again started at the highest difficulty (10 points) and was continuously adapted. After this session, the volunteers had a second and final rest stage and were again asked to answer the NASA-TLX questionnaire to evaluate their experience with the play mode. Figure 4 provides an overview of the different stages of the experiment to summarize the sequence described above. As explained in previous sections, the acquisition, digitization, and processing of the signals, the stress estimation, and the DDA adjustment were carried out online in the Raspberry Pi 3.

4. Results

The classification models described in Section 3.2 exhibited different accuracy values (see Table 3). The SVM model obtained the best accuracy of 90%, with an AUC of 0.96. It is important to mention that this result was obtained with the Offline Dataset for Stress Classification. According to this result, the SVM was embedded in the Raspberry Pi 3. The timeline results obtained for the DDA strategies are presented in Figure 5. Only a small sample of volunteers is shown so that the obtained behaviors can be observed in detail; however, the statistics described later include the whole set of test subjects. Figure 5a depicts the predicted stress level of the test subjects during the spawning rate adaptation stage, while Figure 5b presents the predicted stress level during the variable damage adaptation stage. Each colored line represents the predicted stress level at that time. The overall trend in these behaviors is relevant to the study: the subjects’ steady-state behavior (reached at different times) settles at level 1, as desired.
The suggested methodology described in [30] was followed to compile and evaluate the NASA-TLX results. Figure 6 visually represents the rankings for each considered factor: the height of each bar represents the mean of the obtained ranks, while its width represents the mean weight assigned to that factor. The weights are calculated by counting the number of times each factor was selected in the paired comparisons for each task. To calculate the overall workload score for each volunteer, each rating is multiplied by the weight of its factor, and the sum of the weighted ratings is divided by 15 (the number of paired comparisons). This procedure was followed for each participant and each task. Moreover, Figure 7 shows the distribution of each NASA-TLX subscale for both modalities; these distributions reveal a rise in the mean of all subscales for the second modality. The distribution of the total workload scores is shown in Figure 8 for both DDA strategies. In each boxplot, the line inside the box marks the median, the box covers the middle 50% of the data, and the whiskers indicate the range of most of the data; dots outside the whiskers are outliers. The notches on the sides of each box approximate a 95% confidence interval for the median, so non-overlapping notches suggest that the medians of the groups differ. The data obtained for both game modalities were subjected to a Shapiro–Wilk test to check for normality. The p-values obtained were p = 0.34 for the first mode and p = 0.88 for the second; since p > 0.05 in both cases, the normality assumption cannot be rejected. A paired t-test was then performed to check for statistically significant differences between the perceived workloads of the two game modalities. The resulting p-value was p = 0.133 > 0.05, so no significant differences were found between the variables.
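The sketch below reproduces this statistical check with SciPy; the inputs are the per-volunteer overall workload scores for each modality, and the Wilcoxon fallback is included only for completeness (it was not needed in this study, since normality was not rejected).
```python
import numpy as np
from scipy.stats import shapiro, ttest_rel, wilcoxon

def compare_workloads(tlx_spawning, tlx_damage, alpha=0.05):
    """Shapiro-Wilk normality check followed by a paired t-test on the overall
    NASA-TLX workload scores of the two game modalities (one score per volunteer)."""
    a = np.asarray(tlx_spawning, dtype=float)
    b = np.asarray(tlx_damage, dtype=float)
    _, p_a = shapiro(a)
    _, p_b = shapiro(b)
    if p_a > alpha and p_b > alpha:            # normality not rejected
        _, p_value = ttest_rel(a, b)           # paired t-test
        return "paired t-test", p_value
    _, p_value = wilcoxon(a, b)                # non-parametric fallback
    return "Wilcoxon signed-rank", p_value
```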
A closer examination of some features from the acquired signals was needed to visualize the trends and differences between the game modalities. The boxplot graphs in Figure 9 show the HR distribution of four volunteers (A–D) during each minute of the DDA stages. The HR was acquired every ten seconds, so each box shows the distribution for that minute. The graph on the right for each volunteer represents the first game modality and the one on the left represents the second modality. The box graphs in Figure 10 show the RMS distribution from the Extensor Digitorum Communis muscle of each of the four volunteers (A–D) by the minute. The RMS was acquired every second, so each box shows the distribution for that minute. The graph on the right for each volunteer represents the first game modality (spawning rate adaptation), and the left one shows the second modality (variable damage adaptation). When comparing these graphs with their corresponding curve in Figure 5, the correlation between physiological changes and predicted stress levels can be distinguished.

5. Discussion

According to the results obtained for the classification models, the SVM provides the highest accuracy for classifying the three stress levels (≈90%). This is a slight improvement over the stress classification previously reported in [22]. The main reason for the improved stress detection is the exclusion of the EDA features used in the previous study; due to their long steady-state responses, such features are less suited to detecting stress in short time windows.
Several variations between the predicted stress levels were found in the timeline results depicted in Figure 5. Such predictions depend on the difficulty level applied to each user. It is important to remark that the objective of the DDA strategies was to maintain the user at stress 1. Despite the variations between the levels of all users, the steady-state tendency under the DDA strategies was to reach stress 1. However, since the time series of estimated stress levels is a discrete (and limited) set of values, no further time-domain analysis can be performed. A statistical analysis could also not be carried out, since each user has a different settling time and their changes do not coincide at each time instant. This is an important drawback of the proposed stress detection strategy and a general issue for online affect detection systems. Nevertheless, the overall tendencies provide valuable information, since the DDA strategies successfully deploy a personalized experience for each user. For practical and illustrative reasons, only four volunteers were chosen for the graphs, which show that each person had a unique response to the games. The graphs highlight the tendency to achieve and maintain a moderate predicted stress level. These results allow us to answer RQ1, since the proposed algorithms keep the user at a specific stress level. However, in order to obtain better results, it is essential to consider longer time windows to assess the error between the desired state and the obtained state for each user. Also, a dynamic reference should be considered so that the DDA algorithms can be tested not only for stress 1 but also for higher or lower target levels. This could also be extended to other affective states, for example, targeting relaxed states or frustration management, since the approach is not limited to FPS games.
On the other hand, the overall perceived workload evaluation (see Figure 8) does not show statistically significant differences between the two DDA strategies. This means that both DDA strategies impose a similar workload on the players during the test, providing an answer to RQ2. Nonetheless, some important outcomes can be observed in Figure 6. First, the overall scores for the variable damage adaptation were higher than those for the spawning rate adaptation. Although no significant differences were found, the users perceived more physical and mental demands, as well as more frustration, when the health of the enemy was modified. The effort required was also higher, yet the users perceived that they performed better in the task. This could imply that a more demanding task results in more satisfaction with one’s performance during the game, similar to the conclusions presented by Moschovitis et al. [16]. However, monitoring the perceived frustration is very important, since flow theory seeks to avoid states of frustration and boredom.
In order to provide a broader analysis of the workload and its relationship with the detected stress, we analyzed the behavior of some physiological signal characteristics. Figure 9 and Figure 10 present the HR and RMS distributions of the ECG and EMG signals for the four selected volunteers. Similar HR behaviors can be observed in both experiments for each person: in the first minutes, a higher HR is reached, which later decreases to a steady-state level. However, the RMS distributions show a clearly higher muscular activity trend for the variable damage adaptation. This is a significant result, since video games used for rehabilitation or training purposes must avoid not only frustration but also muscular fatigue [21]. This outcome shows that, even when a similar workload was perceived, the increased physical demand reported (Figure 6) is consistent with the muscular activity detected, which is very important for properly selecting the variable used for difficulty adaptation in video games.

6. Conclusions

This pilot study tested two DDA strategies for a first-person shooter game aimed at maintaining a group of test subjects at a particular stress level. The DDA strategies were based on estimating the users’ stress from their physiological signals. A database of features extracted from EMG and ECG signals was used to train a set of machine learning classifiers, with the best results obtained by an SVM scheme. The classifier was embedded in a Raspberry Pi 3 to detect the stress level of a user online while playing the first-person shooter game. The two DDA strategies employed were spawning rate adaptation and variable damage adaptation. Both strategies show similar results for the workload induced on the user. However, variable damage adaptation has a higher impact on the users’ frustration and physical demands, which was verified by analyzing the RMS distribution of their muscular signals. Even though all the players show similar tendencies during the game dynamics, it is essential to remark that each of them played a personalized version of the video game due to the variability in their physiological signals, their response times to stressful tasks, and the strategies each player followed to make it through the game.

7. Limitations and Future Work

One of the principal limitations of this study is the relatively small participant pool, which means more data are required to obtain statistically robust conclusions about the differences between the game modes. Although slight differences in perceived workload were found, larger test groups (and longer game sessions) are necessary to determine whether a statistically significant difference between game modalities exists, since larger sample sizes yield more reliable estimates, in line with the central limit theorem. Moreover, it is important to note that a bias may have been introduced by the fixed order in which the volunteers played the modalities. This limitation was not detected at the beginning of the study, since similar studies do not follow comparable procedures, leaving that element out of our scope. Future studies should analyze whether inverting the order of the strategies impacts the results. At this moment, the authors consider that such bias (if it exists) was minimized by the practice session, in which participants familiarized themselves with the controls and dynamics of the game, and by the five-minute rest period included between gaming stages. Addressing these limitations in future research is crucial to ensure the robustness and reliability of our findings.
After analyzing the obtained results, the authors identify as future work the extension of the video game sessions to analyze the steady-state response of the signals and of the stress estimations. Moreover, an extensive comparison study is necessary to assess the effectiveness of the presented algorithms against other DDA strategies (including non-physiological algorithms). To the best of the authors’ knowledge, a similar setup is presented only in [10]; however, that work omits information needed to reproduce its results. Also, other metrics to evaluate presence should be considered in future studies; for example, the error of presence presented in [31] is a promising approach to evaluate the state of the users in a continuous or discrete form. Finally, an interesting subject to explore will be using both (or more) difficulty variables simultaneously through a weight allocation strategy that determines which variable contributes more to the flow state of the users.

Author Contributions

Conceptualization, C.E.O.-M. and G.H.-M.; Funding acquisition, R.Q.F.-A. and G.H.-M.; Investigation, C.E.O.-M. and G.H.-M.; Methodology, C.E.O.-M. and G.H.-M.; Resources, R.Q.F.-A.; Software, C.E.O.-M.; Supervision, R.Q.F.-A. and G.H.-M.; Validation, C.E.O.-M. and G.H.-M.; Writing—original draft, C.E.O.-M. and G.H.-M.; Writing—review and editing, R.Q.F.-A. All authors have read and agreed to the published version of the manuscript.

Funding

The development of this study was supported by the Intel RISE initiative with grant number 78663.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Ethics Committee of Tecnológico de Monterrey Institute (EHE-2023-06 approved on 1 October 2023).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

We would like to thank the participants who took part in this study, CONAHCYT for the financial support for the living expenses of Carmen Elisa, and the Advanced Cyberphysical Systems Laboratory from the School of Engineering and Sciences for the workspace.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Statista. AR & VR—Worldwide. 2023. Available online: https://www.statista.com/outlook/amo/ar-vr/worldwide#revenue (accessed on 12 June 2024).
  2. Dincelli, E.; Yayla, A. Immersive virtual reality in the age of the Metaverse: A hybrid-narrative review based on the technology affordance perspective. J. Strateg. Inf. Syst. 2022, 31, 101717. [Google Scholar] [CrossRef]
  3. Muñoz-Saavedra, L.; Miró-Amarante, L.; Domínguez-Morales, M. Augmented and Virtual Reality Evolution and Future Tendency. Appl. Sci. 2020, 10, 322. [Google Scholar] [CrossRef]
  4. Statista. Virtual Reality (VR) Gaming Revenue Worldwide from 2019 to 2024. 2024. Available online: https://www.statista.com/statistics/1360511/global-virtual-reality-gaming-revenue/ (accessed on 12 June 2024).
  5. Pallavicini, F.; Pepe, A.; Minissi, M.E. Gaming in Virtual Reality: What Changes in Terms of Usability, Emotional Response and Sense of Presence Compared to Non-Immersive Video Games? Simul. Gaming 2019, 50, 136–159. [Google Scholar] [CrossRef]
  6. Jang, Y.; Park, E. An adoption model for virtual reality games: The roles of presence and enjoyment. Telemat. Inform. 2019, 42, 101239. [Google Scholar] [CrossRef]
  7. Lora-Ariza, D.S.; Sánchez-Ruiz, A.A.; González-Calero, P.A.; Camps-Ortueta, I. Measuring Control to Dynamically Induce Flow in Tetris. IEEE Trans. Games 2022, 14, 579–588. [Google Scholar] [CrossRef]
  8. Shi, P.; Chen, K. Learning Constructive Primitives for Real-Time Dynamic Difficulty Adjustment in Super Mario Bros. IEEE Trans. Games 2018, 10, 155–169. [Google Scholar] [CrossRef]
  9. Paraschos, P.D.; Koulouriotis, D.E. Game Difficulty Adaptation and Experience Personalization: A Literature Review. Int. J. Hum.-Comput. Interact. 2023, 39, 1–22. [Google Scholar] [CrossRef]
  10. Changchun, L.; Pramila, A.; Nilanjan, S.; Shuo, C. Dynamic Difficulty Adjustment in Computer Games Through Real-Time Anxiety-Based Affective Feedback. Int. J. Hum.-Comput. Interact. 2009, 25, 506–529. [Google Scholar] [CrossRef]
  11. Chanel, G.; Rebetez, C.; Bétrancourt, M.; Pun, T. Emotion Assessment From Physiological Signals for Adaptation of Game Difficulty. IEEE Trans. Syst. Man Cybern. A Syst. Hum. 2011, 41, 1052–1063. [Google Scholar] [CrossRef]
  12. Csikszentmihalyi, M. Flow: The Psychology of Optimal Experience by Mihaly Csikszentmihalyi; CreateSpace Independent Publishing Platform: Scotts Valley, CA, USA, 2018. [Google Scholar]
  13. Stein, A.; Yotam, Y.; Puzis, R.; Shani, G.; Taieb-Maimon, M. EEG-triggered dynamic difficulty adjustment for multiplayer games. Entertain. Comput. 2018, 25, 14–25. [Google Scholar] [CrossRef]
  14. Silva, M.P.; do Nascimento Silva, V.; Chaimowicz, L. Dynamic difficulty adjustment on MOBA games. Entertain. Comput. 2017, 18, 103–123. [Google Scholar] [CrossRef]
  15. Lara-Álvarez, C.; Mitre-Hernandez, H.; Flores, J.J.; Pérez-Espinosa, H. Induction of Emotional States in Educational Video Games Through a Fuzzy Control System. IEEE Trans. Affect. Comput. 2021, 12, 66–77. [Google Scholar] [CrossRef]
  16. Moschovitis, P.; Denisova, A. Keep Calm and Aim for the Head: Biofeedback-Controlled Dynamic Difficulty Adjustment in a Horror Game. IEEE Trans. Games 2022, 15, 368–377. [Google Scholar] [CrossRef]
  17. Reidy, L.; Chan, D.; Nduka, C.; Gunes, H. Facial Electromyography-Based Adaptive Virtual Reality Gaming for Cognitive Training. In Proceedings of the 2020 International Conference on Multimodal Interaction, ICMI ’20, Virtual Event, 25–29 October 2020; pp. 174–183. [Google Scholar] [CrossRef]
  18. Nacke, L.E.; Grimshaw, M.N.; Lindley, C.A. More than a feeling: Measurement of sonic user experience and psychophysiology in a first-person shooter game. Interact. Comput. 2010, 22, 336–343. [Google Scholar] [CrossRef]
  19. Muñoz, J.E.; Quintero, L.; Stephens, C.L.; Pope, A.T. A Psychophysiological Model of Firearms Training in Police Officers: A Virtual Reality Experiment for Biocybernetic Adaptation. Front. Psychol. 2020, 11, 516170. [Google Scholar] [CrossRef] [PubMed]
  20. Montoya, M.F.; Muñoz, J.; Henao, O.A. Fatigue-aware videogame using biocybernetic adaptation: A pilot study for upper-limb rehabilitation with sEMG. Virtual Real. 2023, 27, 277–290. [Google Scholar] [CrossRef]
  21. Montoya, M.F.; Muñoz, J.E.; Henao, O.A. Enhancing Virtual Rehabilitation in Upper Limbs with Biocybernetic Adaptation: The Effects of Virtual Reality on Perceived Muscle Fatigue, Game Performance and User Experience. IEEE Trans. Neural Syst. Rehabil. Eng. 2020, 28, 740–747. [Google Scholar] [CrossRef] [PubMed]
  22. Orozco-Mora, C.E.; Oceguera-Cuevas, D.; Fuentes-Aguilar, R.Q.; Hernández-Melgarejo, G. Stress Level Estimation Based on Physiological Signals for Virtual Reality Applications. IEEE Access 2022, 10, 68755–68767. [Google Scholar] [CrossRef]
  23. Kenney.nl. Animated Characters and Graveyard Kit. 2020. Available online: https://kenney.nl/assets/animated-characters-1 and https://kenney.nl/assets/graveyard-kit (accessed on 20 October 2020).
  24. Valembois, Q. Making a VR Game in ONE HOUR, Youtube. 2020. Available online: https://www.youtube.com/watch?v=ICyrJVddNxU (accessed on 20 October 2020).
  25. Allison, B.; Polich, J. Workload Assessment of Computer Gaming Using a Single-Stimulus Event-Related Potential Paradigm. Biol. Psychol. 2008, 77, 277–283. [Google Scholar] [CrossRef] [PubMed]
  26. Jonghwa, K.; Elisabeth, A. Emotion Recognition Based on Physiological Changes in Music Listening. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 2067–2083. [Google Scholar] [CrossRef]
  27. Parnandi, A.R.; Gutierrez-Osuna, R. A Comparative Study of Game Mechanics and Control Laws for an Adaptive Physiological Game. J. Multimodal User Interfaces 2014, 9, 31–42. [Google Scholar] [CrossRef]
  28. Felnhofer, A.; Kothgassner, O.D.; Schmidt, M.; Heinzle, A.K.; Beutl, L.; Hlavacs, H.; Kryspin-Exner, I. Is Virtual Reality Emotionally Arousing? Investigating Five Emotion Inducing Virtual Park Scenarios. Int. J. Hum.-Comput. Stud. 2015, 82, 48–56. [Google Scholar] [CrossRef]
  29. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar] [CrossRef]
  30. Hart, S.G.; Staveland, L.E. Development of NASA-TLX (Task Load Index): Results of Empirical and Theoretical Research. Adv. Psychol. 1988, 52, 139–183. [Google Scholar] [CrossRef]
  31. Hernández-Melgarejo, G.; Luviano-Juárez, A.; Fuentes-Aguilar, R.Q. A Framework to Model and Control the State of Presence in Virtual Reality Systems. IEEE Trans. Affect. Comput. 2022, 13, 1854–1867. [Google Scholar] [CrossRef]
Figure 1. Example screen for each game modality. (a) Variable spawning rate scenario showing several enemies, (b) variable damage rate showing a single enemy with its health indicator.
Figure 2. Experimental setup with a volunteer showing the measurement and display systems.
Figure 3. Closed-loop process including the acquisition and processing of physiological signals as well as the feedback actions for the video game.
Figure 4. Experimental timeline with different stages of DDA and rest.
Figure 5. Predicted levels throughout 300 s of the games for volunteers A–D: (a) spawning rate adaptation, (b) variable damage adaptation.
Figure 6. Visual representation of the rankings (height) and weights (width) assigned to each factor for the (a) spawning rate adaptation and (b) variable damage adaptation. Factors: Mental demand (MD), Physical demand (PD), Temporal demand (TD), Performance (P), Effort (E), and Frustration level (F).
Figure 7. Distribution of the ratings for each scale obtained in the NASA-TLX for both game modalities.
Figure 8. Perceived workload for both game modalities.
Figure 9. Volunteer heart rate distribution by the minute for both game modalities.
Figure 10. Volunteer RMS distribution by the minute for both game modalities.
Table 1. Studies that combine DDA along with affect detection.
Study | Affective Data | DDA Strategy | Number of Variables | Application
[10] | 3 levels of anxiety using ECG, EMG, EDA, PPG | Performance and affective based | 3 levels that depend on several elements | Pong Game
[13] | Short-term excitement from EEG | Threshold based to evoke excitement | 4 modes | Third-person shooter
[16] | Arousal via HR | Linear increments/decrements to evoke motivation | 3 parameters | Horror seek-and-find game
[17] | Arousal and valence using facial EMG | 10 difficulty levels to generate cognitive load | 1 (each application) | Multi-room museum and supermarket
[18] | Arousal using skin conductance and facial EMG | 4 combinations of sounds | 2 boolean, music and sound effects | FPS video game
[19] | Stress using EEG and HRV | 3 levels to adapt static vs. mobile targets | 1 | VR police training
[21] | Muscular fatigue (EMG) | Continuous control of exercise intensity | 1 | Force defense rehabilitation video game
This Study | Stress using HR and EMG | Modify number of spawning enemies/modify the amount of damage | 1 for each game modality | FPS video game
Table 2. Selected features of the physiological signals and their corresponding formulas.
Label | Feature Name | Equation
RMS | Root Mean Square (EMG) | $\sqrt{\frac{1}{N}\sum_{n=1}^{N} x_n^2}$
MAV | Mean Absolute Value (EMG) | $\frac{1}{N}\sum_{n=1}^{N} |x_n|$
VAR | Variance (EMG) | $\frac{1}{N-1}\sum_{n=1}^{N} x_n^2$
STD | Standard Deviation (EMG) | $\sqrt{\frac{1}{N}\sum_{n=1}^{N} (x_n - \mu)^2}$
MPT | Maximum Peak in a Timespan (EMG) | $\max(x_1, x_2, \ldots, x_N)$
HR | Heart Rate (ECG) | beats per minute
Table 3. Obtained metrics for each tested model.
Model | Accuracy Score | AUC | Average Precision | Macro F1 Score
SVM (RBF) | 90% | 0.96 | 0.91 | 0.94
kNN | 88% | 0.90 | 0.96 | 0.92
RFC | 89% | 0.94 | 0.96 | 0.92
MLP | 88% | 0.92 | 0.96 | 0.90
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
