Article

Towards Emotionally Intelligent Virtual Environments: Classifying Emotions through a Biosignal-Based Approach

by Ebubekir Enes Arslan 1,†, Mehmet Feyzi Akşahin 2,†, Murat Yilmaz 3,† and Hüseyin Emre Ilgın 4,*,†

1 Mayo Graduate School of Biomedical Sciences, Rochester, MN 55905, USA
2 Department of Electrical and Electronics Engineering, Faculty of Engineering, Gazi University, Ankara 06570, Turkey
3 Department of Computer Engineering, Faculty of Engineering, Gazi University, Ankara 06570, Turkey
4 School of Architecture, Faculty of Built Environment, Tampere University, P.O. Box 600, FI-33014 Tampere, Finland
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2024, 14(19), 8769; https://doi.org/10.3390/app14198769
Submission received: 4 August 2024 / Revised: 25 September 2024 / Accepted: 26 September 2024 / Published: 28 September 2024

Abstract

This paper introduces a novel method for emotion classification within virtual reality (VR) environments, which integrates biosignal processing with advanced machine learning techniques. It focuses on the processing and analysis of electrocardiography (ECG) and galvanic skin response (GSR) signals, which are established indicators of emotional states. To develop a predictive model for emotion classification, we extracted key features, including heart rate variability (HRV), morphological characteristics, and Hjorth parameters. We refined the dataset using a feature selection process based on statistical techniques to optimize it for machine learning applications. The model achieved an accuracy of 97.78% in classifying emotional states, demonstrating that by accurately identifying and responding to user emotions in real time, VR systems can become more immersive, personalized, and emotionally resonant. The potential applications of this method are extensive, spanning various fields. In education, emotion recognition would enable adaptive learning environments that respond to students' current emotional states, thereby fostering improved engagement and learning outcomes. In psychotherapy, virtual systems equipped with emotion recognition could deliver more personalized and effective therapy by dynamically adjusting therapeutic content. Similarly, in the entertainment domain, this approach could be extended to tailor experiences to users' emotional preferences. These applications highlight the revolutionary potential of emotion recognition technology in improving the human-centric nature of digital experiences.

1. Introduction

Could the future of virtual reality (VR) involve experiences that understand and respond to our emotions in real time? The promising field of virtual reality encompasses diverse applications in entertainment, therapy, and training [1]. As VR technology matures, the potential to recognize the emotional state of a user and customize the virtual environment accordingly is immense. Such advancements could significantly enhance immersion, personalize experiences, and introduce innovative forms of human–computer interaction [2].
Although the field of emotion classification has explored various modalities, its seamless integration into VR systems is still evolving [3]. Current methods often depend on facial expressions or user input, which may not be consistently accurate or may interrupt the VR experience [4,5,6]. The use of physiological signals, which more directly reflect emotional states, promises a more natural and continuous recognition of emotions within VR settings [7].
Our research presents a method that processes and analyzes electrocardiography (ECG) and galvanic skin response (GSR) signals to classify emotions such as excitement, happiness, and anxiety in VR, leveraging the physiological reactions these emotions invoke. The established efficacy of biosignals, with ECG and GSR being the primary indicators of arousal and valence, underpins our approach [8]. We extracted essential features, including heart rate variability (HRV), morphological aspects, and Hjorth parameters. We applied Chi-square-based feature selection in MATLAB and carefully partitioned our dataset to maintain the integrity of the machine learning process. The application of an ensemble learning algorithm provided robust emotion classification, demonstrating the high accuracy of our method.
The overarching aim of this research was to forge a method for emotion classification within VR environments using physiological signals. This method strove not only for high accuracy but also for an enhanced user experience through instantaneous emotion detection. Our approach’s precision in classifying emotions in VR hints at the transformative potential of biosignal processing coupled with machine learning to redefine VR experiences.
The remainder of this paper is structured as follows. Section 2 provides background information on existing approaches to emotion classification in VR. Section 3 details our proposed methodology, including the data collection, feature extraction, and classification techniques. Section 4 presents our experimental results and evaluation metrics. Finally, Section 5 discusses the implications of our work and outlines directions for future research.

2. Background

Virtual reality (VR), as a transformative technology, has redefined human–computer interactions by creating immersive three-dimensional experiences that transcend traditional interfaces [9]. Despite its advancements, the exploration of VR’s potential for customization and adaptability, especially in the realm of emotion recognition and response, remains relatively nascent. The ability to accurately classify and respond to user emotions could initiate a new era of VR applications, one that is more intuitive, interactive, and tailored to individual experiences [10]. However, the complex nature of human emotions poses significant challenges in achieving precise emotion recognition within VR environments.
The integration of biosignal processing techniques with advanced machine learning algorithms offers a promising approach to navigate the complex landscape of human emotions in VR [11]. Furthermore, the development of comprehensive datasets, such as DER-VREEG, which take advantage of low-cost wearable EEG headsets in VR settings, has facilitated the advancement of emotion recognition algorithms [12]. The exploration of embodied interactions within VR systems also provides valuable insights into user experiences and emotional responses, underscoring the importance of a nuanced understanding of human–VR interactions [13]. Moreover, the analysis of VR as a communication process highlights the potential of VR to mediate complex emotional and cognitive experiences [14]. Finally, the examination of variation in VR gameplay and its impact on affective responses illuminates the complex relationship between VR experiences and emotional states [15].
Traditionally, emotion recognition methods have predominantly utilized facial expression analysis and speech patterns to discern emotional states [16,17,18]. Although effective to a certain extent, these techniques may not fully translate to the virtual reality (VR) environment due to user comfort considerations and the inherent limitations of VR headsets. This limitation calls for the exploration of more subtle and less invasive techniques for emotion recognition within VR settings. Biosignal processing, which involves the interpretation of physiological signals such as electrocardiography (ECG) and galvanic skin response (GSR), presents a viable alternative [19]. These biosignals are closely related to human emotional responses and, as such, offer a promising avenue for emotion classification that bypasses the constraints of traditional methods.
In the domain of emotion recognition through physiological signals and machine learning, seminal works have established a solid foundation for our investigation. Bota et al. [20] offered a comprehensive review, defining current challenges and prospective avenues in the field, particularly highlighting the significant role of machine learning techniques and physiological indicators in emotion recognition. Hsu et al. [7] developed an algorithm based on electrocardiography (ECG) for the detection of emotions, employing a series of experiments involving varying durations of listening to music to evoke different emotional responses. The ECG signals recorded during these sessions were subjected to feature selection and reduction using machine learning approaches, resulting in classification accuracies of 82.78% for valence, 72.91% for arousal, and 61.52% for a four-class emotion model. In addition, Uyanık et al. [21] explored the classification of emotional valence using electroencephalogram (EEG) signals within a virtual reality (VR) setting. This study extracted differential entropy features from various EEG bands and evaluated the accuracy of several classifiers, including support vector machines (SVM), k-nearest neighbor (kNN), naive Bayesian, decision trees, and logistic regression, with the SVM classifier achieving an accuracy of 76.22%. Zhang et al. [17] focused on emotion recognition using galvanic skin response (GSR) signals, proposing a novel approach utilizing quantum neural networks. Given the diverse levels of information encapsulated in the features of the GSR signal corresponding to different emotions, the quantum neural network, optimized with particle swarm optimization, facilitated a more adaptable classification scheme, achieving an average accuracy of 84.9% in five emotion categories with data from 35 participants.
In the evolving field of emotion recognition through physiological signals, several studies have pioneered the use of singular biosignals for sophisticated classification tasks. Chen et al. [22] investigated the efficacy of artificial neural networks by harnessing heart rate variability (HRV) data to classify emotions into five distinct categories: pleasure, happiness, fear, anger, and neutral. HRV data, obtained from wristband monitors during gameplay, facilitated the training of neural networks, with a notable configuration that achieved an average accuracy of 82.84%. In a quest to discern three emotional states, Dominguez-Jimenez et al. [23] used photoplethysmography (PPG) and galvanic skin response (GSR) signals collected via an instrumented glove, employing classifiers such as SVM and LDA. Interestingly, the combination of GSR signals and the SVM classifier achieved a perfect accuracy rate in identifying amusement, sadness, and neutral states.
Further exploring the domain of wearable technologies in healthcare, Ayata et al. [24] introduced an algorithm that assesses emotional states through the dimensions of arousal and valence using various machine learning techniques. Their research demonstrated the superior accuracy of a fused model incorporating a respiratory belt, PPG, and fingertip temperature signals, underscoring the potential of multimodal physiological data to enhance emotion recognition accuracy. The highest accuracies recorded were 73.08% for arousal and 72.18% for valence.
Bălan et al. [25] investigated the utility of machine learning classifiers in a VR setting aimed at acrophobia treatment, utilizing EEG, GSR, and PPG signals to determine fear levels to tailor VR therapies. Although a 2-choice fear scale yielded an accuracy rate of 89.5%, the performance was markedly reduced with a more granular 4-choice scale. Complementing these empirical studies, Marín-Morales et al. [26] provided a comprehensive review on emotion recognition within immersive VR, charting the field’s progression and setting the stage for future inquiries. These seminal contributions illustrate the importance of emotion recognition techniques and provide invaluable insights, laying the groundwork for our investigation.
Within the dynamic field of affective computing, the endeavor to devise reliable emotion recognition systems is accelerating, driven by technological advances in sensor capabilities and machine learning methodologies. Saganowski et al. [27] delineated the shift of emotion recognition efforts from the confines of laboratory environments to real-world applications, underlining the critical role that readily available sensor technologies and advanced signal processing methods play. This shift has been further propelled by innovations in learning algorithms, as expounded by Pal et al. [28], who highlighted the seamless integration of these advancements into digital platforms, thus expanding the horizons of emotion recognition applications.
Given the complex nature of human emotions, a multimodal approach is paramount to enhance the accuracy and dependability of emotion recognition systems. Wang et al. [29] offered an extensive analysis of the field of affective computing, emphasizing the amalgamation of behavioral cues with physiological data to discern subtler emotional nuances. Their review not only illuminated the prevailing methodologies, but also paved the way for forthcoming investigative endeavors within this sphere. Echoing the multimodal paradigm, Dissanayake et al. [30] examined the realm of wearable technologies for emotion detection, presenting SigRep, a pioneering framework that adopts contrastive representation learning to exploit the discreet and persistent data collection capabilities of wearable devices, thus fortifying the efficacy of emotion recognition frameworks.
The quest for generalizability in emotion recognition systems across varied demographic and physiological spectra presents a formidable challenge within the realm of affective computing. Addressing this concern, Ali et al. [31] introduced a globally generalized framework for emotion recognition that demonstrated high accuracy rates, irrespective of physiological signal discrepancies or demographic variations among subjects. This approach, which transcends subject-specific constraints, indicates a new era in the applicability of universal emotion recognition solutions. Concurrently, Su et al. [32] unveiled an ontology-based model that exhibited commendable precision in discerning emotional states through EEG signals, pinpointing the importance of certain signal attributes—specifically, the power ratio between the beta and theta waves—as pivotal for precise emotion categorization [19]. Together, these pioneering investigations represent the forefront of research in emotion detection, illuminating the profound capacity of physiological signals to foster more intuitive and empathetic human–computer interactions.
In conclusion, the collective body of work reviewed herein underscores the dynamic and multifaceted nature of emotion recognition research within the realm of affective computing. From pioneering methodologies that leverage the latest in sensor and machine learning innovations to nuanced approaches that seek to transcend demographic and physiological variabilities, each study significantly contributes to our understanding and capabilities in this field. As we move forward, these insights not only pave the way for more sophisticated and universally applicable emotion recognition systems, but also hold the promise of fostering more intuitive and empathetic interactions between humans and technology. The convergence of these advances promises a future where digital systems can more accurately interpret and respond to the complex tapestry of human emotions, thus enhancing the user experience across a multitude of applications. Within this evolving landscape, the proposed work stands out for its novel approach, aiming to fill a gap that has not been addressed before and thus holds the potential to make a significant contribution to the field.

3. Methods

Our study introduces an innovative methodology for recognizing emotions within virtual reality (VR) environments, capitalizing on the integrated strengths of biosignal processing and advanced machine learning techniques. Central to our approach is an intricate analysis of electrocardiography (ECG) and galvanic skin response (GSR) signals, from which we extract key features indicative of various emotional states. This process involves a comprehensive application of signal processing techniques, culminating in sophisticated feature extraction processes. Using the power of ensemble machine learning algorithms and employing rigorous feature selection strategies, such as the Chi-square method [4], we have developed a predictive model that showcases remarkable precision and efficiency in emotion classification within VR contexts. This research sets forth a groundbreaking framework aimed at enhancing the capabilities of VR technologies, making strides toward creating VR experiences that are not only immersive but also emotionally responsive and adaptive. The methodology and its implementation are detailed in Figure 1, providing a clear road map for the development of emotionally intelligent VR systems.

3.1. Dataset Summary

This study capitalizes on the innovative “VR Eyes: Emotions Dataset” (VREED), specifically curated to elicit emotional responses through immersive 360° virtual environments (360-VEs) presented via a virtual reality (VR) headset. The dataset encompasses both self-reported emotional assessments and physiological data, including electrocardiography (ECG) and galvanic skin response (GSR) readings, collected from 34 healthy individuals. Participants experienced a series of twelve unique 360-VEs, with durations ranging from one to three minutes each, designed to immerse and evoke distinct emotional states [33].

3.1.1. Subject Properties

The dataset utilized in this study is composed of 34 healthy individuals, providing a varied but methodically curated sample. Participants provided demographic details including gender, sexual orientation, age, and ethnicity, in addition to responses to a health screening questionnaire.
Although the VR Eyes: Emotions Dataset does not predefine specific emotional categories, we identified five distinct emotions for this study: excitement, happiness, anxiety, calmness, and sadness. These categories were selected based on their relevance to the VR scenarios used and their distinct physiological signatures in terms of heart rate variability (HRV) and galvanic skin response (GSR).
The selection of the five emotional categories (i.e., excitement, happiness, anxiety, calmness, and sadness) was informed by a combination of empirical data and prior studies on emotion recognition using physiological signals. Several studies have highlighted the distinct physiological patterns associated with these emotions, particularly in terms of heart rate variability (HRV) and galvanic skin response (GSR) [34,35,36,37]. For example, high-arousal emotions such as anxiety and excitement, although similar, show different HRV profiles [35,37], while low-arousal emotions such as calmness and sadness are associated with unique GSR responses [34]. The VR scenarios used in this study were also designed to evoke emotional responses according to these categories, ensuring that the emotional states selected for classification were relevant to the stimuli provided [36]. This combination of physiological evidence and the relevance of the VR scenario guided the selection process, ensuring that the categories chosen were both empirically grounded and contextually appropriate.
To maintain the integrity of the sample, exclusion criteria were rigorously enforced, disqualifying individuals with histories of seizure episodes, cardiac anomalies, vestibular disturbances, recurrent migraines, or psychological conditions. In addition, those susceptible to motion sickness or with compromised visual or auditory abilities were also omitted from the study. This selection process was designed to cultivate a participant pool reflecting broad demographic diversity, while adhering to stringent health and safety standards.

3.1.2. Signal Properties

The dataset encompasses physiological signals, specifically electrocardiography (ECG) and galvanic skin response (GSR) obtained with precise configurations to ensure data integrity. ECG recordings were performed using a Lead II setup, in which electrodes were strategically placed on the participant’s right arm (Vin−) at the wrist and on the left calf (Vin+), facilitating comprehensive monitoring of cardiac activity. For GSR measurements, disposable adhesive electrodes were affixed to the index and middle fingers of the participant’s right hand, capturing the subtle changes in skin conductance associated with emotional responses. The acquisition of these signals was meticulously performed, with ECG data sampled at a high resolution of 2000 Hz to capture the intricate details of heart activity, while GSR data were recorded at a frequency of 15 Hz, appropriate for tracking slower fluctuations in skin conductance [33].

3.1.3. Selection and Evaluation Process

The curatorial process for the 360° virtual environments (360-VEs) was rigorous, involving detailed focus group discussions and a pilot study to confirm that each environment reliably induced specific emotional reactions. Participants experienced the 360-VEs in a randomized order and subsequently evaluated their emotional experiences through both the Self-Assessment Manikin (SAM) and the Visual Analog Scale (VAS), in addition to reporting their immersion levels in each 360-VE.
To validate the effectiveness of the selected 360-VEs, a preliminary trial was carried out with a group of 12 volunteers. These individuals assessed the emotional impact of the VEs employing VAS, articulating their feelings across a spectrum of emotions, including anger, calmness, sadness, happiness, and fear. This rich dataset underpinned our investigation into emotion recognition, leveraging biosignal processing and advanced machine learning techniques to discern emotional states.

3.2. Feature Extraction

We extracted 60 initial features from the biosignals, including heart rate variability (HRV), morphological characteristics, and Hjorth parameters. These features were chosen for their established relevance in emotion classification studies. After applying a feature selection process, 10 features were retained based on their statistical relevance and contribution to classification accuracy.

3.2.1. Electrocardiography (ECG)

Electrocardiography (ECG) represents a non-intrusive technique to monitor the heart’s electrical dynamics, proving indispensable for deciphering emotional states. This method involves the application of electrodes on the skin to capture the heart’s electrical signals during its contraction and relaxation phases, thus offering a detailed view of cardiac electrical activity. Such detailed cardiac measurements are crucial for understanding the physiological underpinnings related to various emotional states.
The utility of ECG in categorizing emotional states is attributed to the integral connection of the heart with the autonomic nervous system, which orchestrates emotional reactions. Emotional stimuli cause specific physiological changes, such as variations in heart rate, heart rate variability (HRV), and other ECG-derived metrics, all of which are instrumental in identifying different emotions.
Moreover, the analysis extended to the morphological aspects of ECG signals, particularly the QRS complex width. This parameter, representing the time span from the QRS complex’s start to its conclusion, reflects the duration of ventricular depolarization and is susceptible to alterations under different emotional conditions. Figure 2 illustrates ECG traces corresponding to various scenarios, with the R peaks formally annotated, providing visual information on the cardiac responses elicited by various emotional stimuli.
The Teager energy operator was used to detect R peaks [38,39].
\psi_c[x(t)] = \dot{x}(t) \cdot \dot{x}(t) - \frac{1}{2}\left[\dot{x}(t) \cdot \ddot{x}(t) + x(t) \cdot \ddot{x}(t)\right]
where x(t) is our ECG signal.
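For illustration, the following Python sketch applies the discrete-time counterpart of this operator, ψ[x[n]] = x[n]² − x[n−1]·x[n+1], followed by simple thresholded peak picking. The paper's processing was carried out in MATLAB, so the function names, threshold factor, and refractory period below are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np
from scipy.signal import find_peaks

def teager_energy(x):
    """Discrete-time Teager energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi

def detect_r_peaks(ecg, fs=2000, min_rr_s=0.3):
    """Locate R peaks as prominent maxima of the Teager energy of the ECG.

    fs: sampling rate in Hz (the VREED ECG was sampled at 2000 Hz).
    min_rr_s: assumed minimum RR interval (refractory period) between peaks.
    """
    psi = teager_energy(ecg)
    # Adaptive threshold as a multiple of the mean energy (illustrative choice).
    threshold = 4.0 * np.mean(psi)
    peaks, _ = find_peaks(psi, height=threshold, distance=int(min_rr_s * fs))
    return peaks

# Example with a synthetic spiky signal standing in for an ECG trace.
t = np.arange(0, 10, 1 / 2000)
ecg = 0.1 * np.sin(2 * np.pi * 1.0 * t)
ecg[::1600] += 1.0                      # artificial "R peaks" every 0.8 s
print(detect_r_peaks(ecg)[:5])
```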

3.2.2. Heart Rate Variability (HRV)

Heart rate variability (HRV) signifies the physiological variations in the intervals between successive heartbeats and can be discerned from electrocardiogram (ECG) data. It embodies the ability of the cardiovascular system to adapt to an array of stimuli, both within and outside the body. The connection between HRV and the autonomic nervous system has led to its recognition as a significant metric for evaluating emotional states.
The autonomic nervous system consists of the sympathetic and parasympathetic nervous systems. The former triggers the body’s “fight-or-flight” response during stress, while the latter modulates rest and digestion. Together, they maintain a delicate equilibrium, modulating cardiovascular functions, such as heart rate, to meet emotional and environmental demands.
HRV analysis offers a window into this sympathetic–parasympathetic nexus, yielding vital clues about a person’s emotional condition. In Figure 3, we illustrate the HRV patterns corresponding to five distinct emotional experiences, demonstrating the potential of HRV as a tool for emotional assessment.
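As an illustration of how such HRV descriptors can be derived from detected R peaks, the Python sketch below computes three common time-domain measures (mean RR interval, SDNN, and RMSSD). The study's exact HRV feature set is not enumerated here, so these serve only as representative examples.

```python
import numpy as np

def hrv_time_domain(r_peak_indices, fs=2000):
    """Common time-domain HRV descriptors from R-peak sample indices.

    Returns mean RR, SDNN (standard deviation of RR intervals), and RMSSD
    (root mean square of successive RR differences), all in milliseconds.
    """
    rr_ms = np.diff(r_peak_indices) / fs * 1000.0   # RR intervals in ms
    return {
        "mean_rr": float(np.mean(rr_ms)),
        "sdnn": float(np.std(rr_ms, ddof=1)),
        "rmssd": float(np.sqrt(np.mean(np.diff(rr_ms) ** 2))),
    }

# Example: peaks roughly every 0.8 s with a little jitter.
peaks = np.cumsum(np.random.default_rng(0).integers(1500, 1700, size=60))
print(hrv_time_domain(peaks))
```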

3.2.3. Discrete Wavelet Transform (DWT)

For a thorough and precise analysis of ECG signals, our study employed the discrete wavelet transform (DWT), a mathematical tool designed for multi-level signal analysis. This technique allows for the simultaneous investigation of a signal’s behavior in both time and frequency domains. Through the DWT, the ECG signal undergoes a detailed decomposition, resulting in a range of wavelet coefficients that represent different frequency bands over specific time periods.
The application of the DWT meticulously segments the ECG signal into a collection of wavelet coefficients, each of which captures unique frequency details at corresponding time intervals. These coefficients give us a deeper understanding of the time–frequency dynamics of the ECG signal, enabling the detection of specific patterns and variations indicative of various physiological events. Each set of coefficients is linked to a discrete wavelet scale and a particular time frame, offering invaluable insights into the oscillatory components of the ECG during that interval.
The DWT operates by breaking down the signal into multiple frequency bands through wavelet filters, isolating lower frequency components that provide an approximation of the original signal, and higher frequency components that detail finer aspects. The process is inherently dyadic, meaning that with each successive decomposition level, the signal’s frequency range is halved, providing a hierarchical analysis. In our research, we performed a 12-level DWT on the ECG signal, sampled at 2000 Hz, using the Daubechies 4 wavelet. The DWT of a signal x(t) is defined by the following equation.
\mathrm{DWT}_x(a, b) = \frac{1}{\sqrt{a}} \int x(t)\, \psi\left(\frac{t - b}{a}\right) dt
where ψ(t) is the wavelet function, a denotes the scaling factor, and b represents the translation factor [40].
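A minimal sketch of this decomposition using the PyWavelets library (a stand-in for the MATLAB Wavelet Toolbox; the authors' implementation was in MATLAB) is shown below. The 12-level db4 setting mirrors the description above, while the guard against overly short segments is an implementation choice added here.

```python
import numpy as np
import pywt  # PyWavelets

def dwt_subbands(ecg, wavelet="db4", levels=12):
    """Multi-level DWT of an ECG segment.

    Returns the deepest-level approximation coefficients followed by the
    detail coefficients of every level (deepest first), mirroring the
    12-level Daubechies-4 decomposition described in the text.
    """
    max_level = pywt.dwt_max_level(len(ecg), pywt.Wavelet(wavelet).dec_len)
    levels = min(levels, max_level)          # guard against short segments
    coeffs = pywt.wavedec(ecg, wavelet, level=levels)
    return coeffs                            # [cA_L, cD_L, ..., cD_1]

# Example: one minute of a 2000 Hz placeholder signal.
x = np.random.default_rng(1).standard_normal(120_000)
coeffs = dwt_subbands(x)
print([len(c) for c in coeffs])
```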

3.2.4. Galvanic Skin Response (GSR)

Galvanic skin response (GSR), also known as electrodermal activity (EDA), is a measurement of the electrical conductance of the skin, which fluctuates with physiological and psychological stimuli. These changes are predominantly due to the activity of sweat glands that are innervated by the sympathetic nervous system. When arousal levels increase, so does sweat production, leading to higher skin conductance. GSR sensors monitor these electrical changes, and the acquired data are then analyzed to deduce features indicative of physiological arousal and the corresponding emotional reactions. The application of GSR extends across disciplines such as psychology, neuroscience, and human–computer interaction, contributing to the study of emotions and stress. The interplay between GSR readings, sweat gland activity, and the workings of the autonomic nervous system provides researchers with valuable insights into the nuances of human behavior and affective experiences.
The GSR readouts shown in Figure 4 are associated with five different emotional states.

3.2.5. Morphological Features

Extracting morphological features from ECG and GSR signals facilitates a detailed evaluation of their structural characteristics. The key morphological features derived from these signals include their width, area, skewness, kurtosis, and slope.
Width: In the context of signal analysis, ’width’ refers to the time span of a distinct segment of the signal. This metric is essential for understanding the timing and persistence of physiological processes. Analysis of the width of ECG and GSR signals enables the identification of temporal patterns and events within the physiological data, thus aiding in the differentiation of typical and atypical physiological responses. To calculate the width, the onset and conclusion of the relevant signal fragment are pinpointed, with the width being the interval between these two temporal markers.
\mathrm{Width} = t_{\mathrm{end}} - t_{\mathrm{start}}
where t_end is the ending time and t_start is the starting time of the signal component of interest [41].
Area: The area feature is determined by computing the integral of the absolute value of the signal over a specified time interval. This metric offers a quantification of the total activity or magnitude of the signal within that interval. Analyzing the area under the curve for ECG and GSR signals allows researchers to gauge the cumulative magnitude of physiological responses. Variations in the area metric can reflect changes in the intensity or overall magnitude of physiological activities.
\mathrm{Area} = \int \left| x(t) \right| \, dt
where x(t) represents the signal, and the integral is evaluated over the specified time interval of interest [42].
Skewness: Skewness is a statistical metric that evaluates the asymmetry in the distribution of signal amplitude values. This measure sheds light on the signal’s distribution, indicating whether it leans more towards positive or negative values. Analyzing the skewness of ECG and GSR signals allows for the identification of the dominant trend in the amplitude distributions, which is instrumental in delineating specific physiological response patterns. The skewness is determined by employing the following formula:
\mathrm{Skewness} = E\left[\left(\frac{x(t) - \mu}{\sigma}\right)^{3}\right]
where E denotes the expected value, x(t) represents the signal, μ is the mean of the signal, and σ is the standard deviation of the signal. The term (x(t) − μ)/σ is cubed to normalize the skewness measure, ensuring it is dimensionless and provides a consistent scale for comparison [43].
Kurtosis: In signal analysis, kurtosis is a statistical metric that quantifies the distribution’s “peakedness” or its deviation from being flat-topped. This measure is instrumental in detecting outliers or extreme values within a dataset. When applied to ECG and GSR signals, kurtosis analysis sheds light on the form and distribution tendencies of the signal amplitudes. Any departure from typical kurtosis values could signal unusual physiological reactions or anomalies within the signal. The computation of kurtosis is facilitated through the following formula:
\mathrm{Kurtosis} = E\left[\left(\frac{x(t) - \mu}{\sigma}\right)^{4}\right] - 3
In this expression, x(t) denotes the signal at time t, μ is the mean of the signal, σ represents the standard deviation of the signal, and E[·] signifies the expected value. The subtraction of 3 adjusts the kurtosis value to zero for a normal distribution [44].
The term (x(t) − μ)/σ is raised to the fourth power to normalize the kurtosis measure. The subtraction of 3 at the end of the formula yields the excess kurtosis, which is used to compare the kurtosis of the given distribution with that of a normal distribution.
Slope: The slope characteristic in signal analysis denotes the maximal rate at which the signal alters over a specific time span. This feature sheds light on the abruptness or speed of changes within the signal. Investigating the slope within ECG and GSR signals enables the detection of abrupt variations or tendencies in physiological reactions. Notably sharp slopes might signal substantial transitions in physiological conditions, reflecting swift fluctuations in arousal or emotional states. The slope, or the signal’s rate of change across a designated period, can be effectively estimated through differentiation.
\mathrm{Slope} = \frac{dx(t)}{dt}
where dx(t)/dt represents the derivative of the signal with respect to time, approximating the signal’s rate of change over a designated time interval.
Energy: The energy of a signal is determined through the application of the Fourier transform [45].
H(e^{j\omega}) = \sum_{n} x[n]\, e^{-j\omega n}
\mathrm{Energy} = \left| H(e^{j\omega}) \right|^{2}
Here, H(e^{jω}) represents the frequency response obtained from the discrete-time Fourier transform of the signal, with x[n] denoting the signal samples; its squared magnitude characterizes how the signal’s energy is distributed across frequencies.
Figure 5 illustrates the energy profiles of ECG signals corresponding to five different emotional states.
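To make these definitions concrete, the following Python sketch computes the morphological descriptors listed above (width, area, skewness, excess kurtosis, slope, and spectral energy) for a single signal segment. The choice of segment boundaries, the use of the maximum absolute derivative as the slope, and the synthetic example data are assumptions for illustration; the study's actual feature extraction was implemented in MATLAB.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def morphological_features(x, fs):
    """Morphological descriptors of a signal segment x sampled at fs Hz.

    Width, area, skewness, excess kurtosis, maximum slope, and spectral
    energy, following the definitions given in the text; the segment
    boundaries are taken as the first and last sample here.
    """
    t = np.arange(len(x)) / fs
    spectrum = np.fft.rfft(x)
    return {
        "width": t[-1] - t[0],                        # t_end - t_start
        "area": float(np.trapz(np.abs(x), t)),        # integral of |x(t)|
        "skewness": float(skew(x)),
        "kurtosis": float(kurtosis(x)),               # excess kurtosis (minus 3)
        "slope": float(np.max(np.abs(np.diff(x)) * fs)),   # max |dx/dt|
        "energy": float(np.sum(np.abs(spectrum) ** 2)),    # sum of |H|^2
    }

# Example on a short synthetic GSR-like segment sampled at 15 Hz.
seg = np.exp(-np.linspace(0, 3, 45)) + 0.01 * np.random.default_rng(2).standard_normal(45)
print(morphological_features(seg, fs=15))
```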

3.2.6. Hjorth Parameters

The Hjorth parameters constitute a trio of mathematical descriptors conceived by Bengt Hjorth in the 1970s, designed for the quantitative analysis of time-series data characteristics.
These parameters, along with other features such as heart rate variability and GSR responses, were instrumental in differentiating the five defined emotional states: excitement, happiness, anxiety, calmness, and sadness. The selection of these emotions was driven by their distinct physiological patterns, which were consistently observed across the VR experiences. In addition, they have received widespread application across various domains, particularly in the realm of physiological signal analysis, encompassing electrocardiogram (ECG) and galvanic skin response (GSR) signals.
The principal Hjorth parameters—activity, mobility, and complexity—each unravel unique facets of signal dynamics, offering a multifaceted perspective on the underlying physiological processes.
Activity, one of the Hjorth parameters, quantifies the signal’s power and is determined by calculating the signal’s variance. Analyzing the activity of the ECG and GSR signals allows researchers to evaluate the overall power or intensity of the physiological responses these signals represent. Variations in activity can shed light on the strength or magnitude of the physiological processes in question. The activity, being synonymous with variance, is mathematically defined as
\mathrm{Activity} = \sigma^{2} = E\left[(x(t) - \mu)^{2}\right]
where E denotes the expected value, x(t) represents the signal at time t, and μ is the mean of the signal [46].
Mobility is a Hjorth parameter indicative of a signal’s dynamics, calculated as the square root of the ratio of the variance of the signal’s first derivative to the variance of the signal itself. This parameter essentially captures the mean frequency or spectral breadth of the signal. Analyzing the mobility of ECG and GSR signals provides researchers with an understanding of dynamic shifts and fluctuations within physiological responses, aiding in the detection of frequency variations or spectral characteristics. The formula for mobility is given by
\mathrm{Mobility} = \sqrt{\frac{\mathrm{Var}\left(\frac{dx(t)}{dt}\right)}{\mathrm{Var}(x(t))}}
where Var(dx(t)/dt) is the variance of the first derivative of the signal and Var(x(t)) is the variance of the signal itself [47].
Complexity, another Hjorth parameter, provides a measure of the frequency changes within a signal. Calculated as the mobility of the signal’s first derivative divided by the mobility of the signal itself, the complexity reflects the relative frequency changes or the rate of spectral modulation. By analyzing the complexity of physiological signals, such as ECG and GSR, researchers can uncover patterns and variations in frequency content. This sheds light on dynamic physiological processes and potential regulatory mechanisms. Put simply, complexity quantifies the changes in signal frequency, and its calculation is as follows:
\mathrm{Complexity} = \frac{\mathrm{Mobility}\left(\frac{dx(t)}{dt}\right)}{\mathrm{Mobility}(x(t))}
where Mobility(dx(t)/dt) is the mobility of the signal’s first derivative, and Mobility(x(t)) is the mobility of the signal itself [48]. These measures provide valuable information on the dynamics of physiological responses captured by the ECG and GSR signals through quantitative assessments of signal properties.
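The three Hjorth parameters can be computed directly from finite differences of the sampled signal, as in the illustrative Python sketch below; the derivative approximation and the example signal are assumptions for illustration, not the authors' code.

```python
import numpy as np

def hjorth_parameters(x, fs=1.0):
    """Hjorth activity, mobility, and complexity of a 1-D signal.

    Activity   = Var(x)
    Mobility   = sqrt(Var(dx/dt) / Var(x))
    Complexity = Mobility(dx/dt) / Mobility(x)
    """
    x = np.asarray(x, dtype=float)
    dx = np.diff(x) * fs           # first derivative via finite differences
    ddx = np.diff(dx) * fs         # second derivative
    var_x, var_dx, var_ddx = np.var(x), np.var(dx), np.var(ddx)
    activity = var_x
    mobility = np.sqrt(var_dx / var_x)
    complexity = np.sqrt(var_ddx / var_dx) / mobility
    return activity, mobility, complexity

# Example: noisy slow sine, sampled at 15 Hz as for the GSR channel.
fs = 15
t = np.arange(0, 60, 1 / fs)
sig = np.sin(2 * np.pi * 0.1 * t) + 0.05 * np.random.default_rng(3).standard_normal(t.size)
print(hjorth_parameters(sig, fs=fs))
```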

3.3. Feature Selection

Feature selection is a fundamental process in machine learning that directly influences model performance. By pinpointing the most informative features within a dataset, it streamlines dimensionality, mitigates the risk of overfitting, and can significantly enhance a model’s predictive power. Thus, careful consideration of feature selection was paramount in our study.
Our feature selection process employed the chi-squared (χ²) statistical test. As a robust non-parametric method, χ² assesses the independence between each feature and the target variable, aiding in the identification of the most relevant features.
The chi-square feature selection method serves as an algorithmic approach to identify and retain pertinent features within a dataset, while discarding those considered irrelevant [49]. This method improves the performance of machine learning models by focusing on the most informative attributes.
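As an illustrative sketch of chi-square feature ranking: the study used MATLAB, so scikit-learn's SelectKBest with the chi2 score is used here only as a stand-in, and the min-max scaling step is an implementation assumption (scikit-learn's chi2 requires non-negative inputs).

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

def select_top_features(X, y, k=10):
    """Rank features with the chi-squared test and keep the top k.

    Features are min-max scaled first so that all inputs are non-negative;
    this is a choice made for the sketch, not necessarily the authors'
    MATLAB procedure.
    """
    X_scaled = MinMaxScaler().fit_transform(X)
    selector = SelectKBest(chi2, k=k).fit(X_scaled, y)
    return selector.get_support(indices=True), selector.scores_

# Example with random placeholder data: 200 segments, 60 candidate features.
rng = np.random.default_rng(4)
X = rng.standard_normal((200, 60))
y = rng.integers(0, 5, size=200)          # five emotion labels
idx, scores = select_top_features(X, y, k=10)
print("selected feature indices:", idx)
```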
To complement the statistical analysis, we employed scatter plots. These visualizations depict potential relationships between pairs of variables. By plotting each characteristic against the target variable (see Figure 6), we gained insights into patterns and correlations, corroborating the χ² results. This graphical approach facilitated the elimination of features that did not exhibit a discernible relationship to the target variable.
Our rigorous feature selection process allowed us to refine the dataset by isolating the most informative features. This distillation laid a strong foundation for the subsequent machine learning phase.

3.4. Machine Learning

To ensure the reliability and generalizability of our model, we employed a 5-fold cross-validation strategy during the training process. This technique involved randomly partitioning the dataset into five equal-sized subsamples. In each iteration, four subsamples were used for training, while the remaining subsample served as the validation set. This process was repeated five times, ensuring that each subsample was used for validation exactly once. We then averaged the performance of the model in all five trials, providing a robust performance estimate [50].
Five-fold cross-validation mitigated the risk of overfitting, a scenario where a model becomes overly attuned to the training data, hindering its ability to generalize to unseen data. By evaluating the model across different data subsets, we gained a more reliable picture of its true generalization potential.
We selected an ensemble of boosted trees as our machine learning model. Boosting is an ensemble technique that constructs a robust classifier by combining multiple weak classifiers, in our case, decision trees. Boosted trees operate sequentially: each tree is fitted to the data, while considering errors made by previous trees. By assigning higher weights to incorrectly classified instances, subsequent trees prioritize the correct classification. This iterative process continues for a defined number of rounds. The final model is a weighted combination of all decision trees, with weights reflecting the predictive power of each tree [51].
Ensemble-boosted tree models offer exceptional performance and effectively handle the challenges associated with complex, high-dimensional datasets. Employing this algorithm ensured our model could learn from our data’s multivariate and multi-class characteristics, resulting in superior emotion classification accuracy.
We fine-tuned our model by employing a learning rate of 0.1 and setting the number of learners to 30. To further mitigate overfitting, we limited the maximum number of splits per tree to 20.
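The configuration described above can be approximated in Python with scikit-learn as shown below. This is a rough stand-in for the MATLAB boosted-tree ensemble used in the study: scikit-learn limits tree size via max_leaf_nodes rather than a maximum number of splits, and the data here are random placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Approximate stand-in for the boosted-tree ensemble described above:
# 30 learners, learning rate 0.1; a tree with at most 20 splits has up to
# 21 leaves, so max_leaf_nodes=21 is used as a rough analogue.
clf = GradientBoostingClassifier(
    n_estimators=30,
    learning_rate=0.1,
    max_leaf_nodes=21,
    random_state=0,
)

rng = np.random.default_rng(5)
X = rng.standard_normal((200, 10))        # 10 selected features (placeholder data)
y = rng.integers(0, 5, size=200)          # five emotion classes

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
print("fold accuracies:", np.round(scores, 3), "mean:", scores.mean().round(3))
```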

4. Results

To rigorously assess our model’s performance, we employed four essential metrics: accuracy, precision, recall, and the F1 score.
Accuracy: This fundamental classification metric measures the proportion of correct predictions (true positives and true negatives) out of the total dataset. Our model achieved a remarkable accuracy of 97.1% during validation (Figure 1), a trend that persisted in testing with an accuracy of 97.4%. This signified exceptional efficacy in correctly classifying emotional states.
Precision: Precision quantifies a model’s exactness. It is the ratio of true positives (TP) to the sum of true positives and false positives (TP + FP). Our model’s high precision scores (Table 1 and Table 2) demonstrated a low false-positive rate, indicating that its predictions of specific emotional states were highly reliable.
Recall: Also termed sensitivity, recall measures a model’s ability to identify all relevant instances (true positives). It is calculated as TP/(TP + FN). Our model’s high recall values (Table 1 and Table 2) confirmed its capacity to detect most instances of each emotional state, missing few true positives.
F1 Score: The F1 score harmonizes precision and recall, making it especially valuable for potentially imbalanced datasets. Our model’s strong F1 scores (Table 1 and Table 2) reflected its ability to maintain both precise predictions and comprehensive identification of relevant instances.
Confusion matrices, constructed using values such as the true positives (TP) and false positives (FP), were calculated to provide a detailed understanding of the model’s accuracy across different emotional states.
The confusion matrix for the model is presented in Figure 7. The matrix shows that the model performed well in distinguishing between most emotion classes, with a few misclassifications observed between closely related emotional states such as anxiety and excitement.
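For completeness, the sketch below shows how these metrics and the confusion matrix can be computed with scikit-learn. The label names follow the five emotion categories defined in this study, while the example true/predicted arrays are hypothetical.

```python
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             confusion_matrix)

labels = ["excitement", "happiness", "anxiety", "calmness", "sadness"]

# Hypothetical ground-truth and predicted labels for a handful of segments.
y_true = ["anxiety", "calmness", "excitement", "sadness", "happiness", "anxiety"]
y_pred = ["excitement", "calmness", "excitement", "sadness", "happiness", "anxiety"]

acc = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=labels, average="macro", zero_division=0
)
cm = confusion_matrix(y_true, y_pred, labels=labels)

print(f"accuracy={acc:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
print(cm)   # rows: true class, columns: predicted class
```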

4.1. Model Performance

Our emotion classification model achieved an overall accuracy of 97.78%, demonstrating high predictive capability across most emotional categories. However, a closer examination of the confusion matrix revealed misclassifications between emotions that share overlapping physiological features, such as anxiety and excitement (both high arousal emotions) and calmness and sadness (both low arousal emotions).

4.2. Error Analysis

Despite the high overall accuracy, certain classification errors emerged, particularly between the following emotional categories:
Anxiety vs. Excitement: These two high-arousal emotions exhibited similar patterns in both heart rate variability (HRV) and galvanic skin response (GSR), leading to confusion in the model’s predictions. The overlap in their physiological markers made it difficult for the model to reliably distinguish between these two emotional states.
Calmness vs. Sadness: Both emotions were characterized by lower levels of arousal, which resulted in similar physiological responses. Misclassification between these categories indicates that the current feature set does not capture subtle differences in their physiological signatures.
Impact of Data Imbalance: A deeper inspection of the dataset revealed that some emotional categories, such as calmness and excitement, were more frequently represented in the data, while others, such as sadness, were underrepresented. This imbalance may have led the model to focus more on classes with a higher frequency of instances, resulting in higher error rates for underrepresented emotions.

4.3. Potential Reasons for Misclassifications

Physiological Signal Overlap: The primary challenge in classifying emotions arises from the physiological similarity between certain emotional states. For example, high-arousal emotions like anxiety and excitement typically increase both HRV and GSR. Without additional discriminative features, such physiological overlaps lead to misclassifications.
Limited Feature Diversity: The features currently extracted from HRV and morphological ECG aspects, while effective, may not fully capture the complexity of emotional experiences in dynamic VR environments. For emotions with similar arousal levels, the differences in physiological responses may be too subtle for the current feature set to detect.

5. Discussion

The multimodal design of the dataset, integrating self-reported responses with physiological signals, enabled a nuanced analysis of emotional states within VR environments. By assembling a diverse cohort, we achieved a balance between experimental control and representativeness, enhancing the generalizability of our findings.
The incorporation of biosignal analysis methodologies, particularly electrocardiography (ECG) and galvanic skin response (GSR), was crucial in understanding the physiological underpinnings of emotional states. The application of the discrete wavelet transform (DWT) on ECG signals revealed critical time–frequency attributes, while the GSR readings provided insights into aspects of physiological arousal and emotional reactions.
The deployment of advanced ensemble machine learning techniques, particularly boosted trees, enabled us to achieve notable precision in discerning emotions within VR contexts. The robust performance metrics of the model, including accuracy, precision, recall, and the F1 score, underscored its efficacy in accurate emotional categorization, while minimizing erroneous classifications. The adoption of 5-fold cross-validation further reinforced the model’s dependability and extrapolative power.
These findings have significant implications for the evolution of emotionally responsive VR technologies. The capacity for real-time emotional adjustment has the potential to dramatically enrich immersive quality and user engagement in VR experiences, with potential utility in entertainment, healthcare, education, and therapeutic domains. Future investigations could explore the refinement of the synergy between biosignal analysis and machine learning to enhance the emotional acuity of VR systems. The comparison of our findings with existing literature, outlined in Table 3, highlights the superior accuracy of our model, marking a significant advancement in the field.
The classification errors observed in our model point to a few critical challenges in emotion recognition using physiological signals. Emotions that exhibit similar physiological markers, such as those with similar arousal levels, are more likely to be misclassified. This highlights the need for more discriminative features or additional biosignals to improve classification accuracy.
The incorporation of EEG data, for instance, could offer a substantial improvement in the model’s ability to distinguish between closely related emotional states. Similarly, the introduction of advanced signal processing techniques, such as frequency-domain analysis and entropy measures, could yield further improvements by capturing more subtle distinctions in physiological signals.
Data augmentation techniques, aimed at addressing the imbalance in emotional categories, could also bolster performance, particularly for underrepresented emotions. Furthermore, deep learning models, such as LSTM networks, offer a promising avenue for improving real-time emotion classification in VR by effectively handling the temporal dynamics of physiological signals.

Potential Implications

This study makes several key contributions to the field of emotion classification within virtual reality (VR) systems, with significant practical implications across various domains. Using biosignals such as electrocardiography (ECG) and galvanic skin response (GSR), we developed a machine-learning-based model capable of accurately classifying emotions in real time. This technology has the potential to enhance the emotional responsiveness of VR environments, making them more adaptive and personalized for individual users.
Emotion recognition has the potential to make virtual landscapes more responsive, adaptive, and emotionally engaging. Therefore, our findings have significant potential applications in a wide range of virtual environments, particularly in the educational, therapeutic, and entertainment domains.
Applications in Education: In the field of education, our model has the potential to assist educators in detecting and responding to students’ emotions in real time, thereby positively influencing the learning experience. Emotion recognition could be integrated into virtual learning environments to monitor students’ emotional states, providing real-time feedback to both lecturers and students. For instance, signs of frustration or confusion during a lecture could be identified, allowing the system to offer additional guidance or adjust the pace of instruction accordingly. Conversely, when a student exhibits signs of engagement and excitement, the system could introduce more challenging content to maintain momentum and encourage further interest. This adaptive learning process could enhance student motivation, reduce disengagement, and ultimately improve learning outcomes.
Applications in Therapy: From a psychotherapy perspective, the application of emotion recognition in virtual reality (VR) environments presents unique opportunities for personalized treatment. Virtual environments can be designed to simulate stressful or triggering scenarios (e.g., exposure therapy for anxiety disorders) while continuously monitoring the patient’s emotional responses. Therapists could utilize real-time data, such as heart rate variability (HRV) and galvanic skin response (GSR), to assess the patient’s progress and adjust therapeutic interventions accordingly. For example, when heightened anxiety is detected, the virtual environment could automatically shift to a calming scene, or the therapist could intervene to help the patient manage their emotional state. This real-time monitoring allows for dynamic adjustments, enhancing the effectiveness of therapeutic interventions by tailoring the treatment to the patient’s emotional condition.
Entertainment and Gaming: The gaming industry could leverage emotion recognition to create more immersive and personalized gaming experiences. By detecting players’ emotions in real time, developers could dynamically adjust in-game events, challenges, and storylines based on the player’s level of engagement. For instance, if a player exhibits signs of boredom, the system could increase the game’s difficulty or introduce new, stimulating elements to re-engage the player. This real-time adaptation would enhance immersion, personalize gameplay, improve user satisfaction, and potentially extend playtime.
Virtual Training and Simulation: Emotion recognition could play a pivotal role in virtual training and simulation environments, such as those used in military, medical, or emergency response training. By monitoring trainees’ emotional responses to high-stress scenarios, instructors could gain valuable and impactful insights into how individuals manage pressure and stress. Such a system could dynamically adapt training scenarios in real-time and gradually increase intensity to help trainees develop resilience and learn to manage stress in demanding environments. This adaptive training approach could result in more effective preparation for real-world challenges, underscoring the importance and impact of the proposal.
Healthcare and Well-Being: Beyond psychotherapy, emotion recognition systems could be applied to general healthcare and well-being monitoring. In virtual fitness environments, these systems could detect signs of physical or emotional fatigue during a workout session. A virtual coach could then adjust the intensity or modify the exercise, promoting a balanced and mindful workout. Similarly, emotion recognition could be integrated into wellness applications, helping users manage stress and anxiety through real-time feedback in virtual relaxation or meditation environments.

6. Conclusions

This investigation introduced an innovative approach to emotion classification within virtual reality (VR) settings, combining biosignal processing with advanced machine learning techniques.
Five emotions—excitement, happiness, anxiety, calmness, and sadness—were specifically defined for this study based on their physiological signatures and relevance to the VR experiences utilized. Our tailored approach allowed for precise emotion classification, demonstrating the potential of biosignal analysis to improve emotionally intelligent VR environments.
The crucial point of this endeavor was to increase the VR frameworks’ ability to discern and responsively adapt to users’ emotional states in a real-time context. The empirical results underscored the efficacy of harmonizing biosignal analytics with machine learning to create emotionally intuitive VR applications.
Leveraging the “VR Eyes: Emotions Dataset” (VREED), a bespoke multimodal affective dataset crafted to elicit emotions via immersive 360° Virtual Environments (360-VEs), this study captured a rich tapestry of self-reported and physiological data, including ECG and GSR metrics, from 34 healthy subjects. These participants navigated through 12 unique 360-VEs, providing a diverse array of emotional responses for analysis.
In particular, the research methodology used an ensemble machine learning paradigm coupled with advanced feature selection techniques, notably the χ² method, to construct a predictive model tailored for emotion classification. This model distinguished itself with a remarkable accuracy rate of 97.5% in test scenarios, attesting to its ability to delineate precise emotions within VR contexts.
Despite its successes, this study is not without limitations. The reliance on a controlled dataset, while invaluable for foundational research, necessitates further validation in more varied and unstructured real-world settings to fully ascertain the generalizability and applicability of the developed model. Furthermore, the ethical considerations and privacy implications of biosignal-based emotional analysis warrant careful consideration as this technology progresses toward widespread adoption.
Ultimately, the implications of this research for the evolution of VR technology are profound. The integration of biosignal processing with machine learning paves the way for VR experiences that are not only immersive, but also dynamically attuned and emotionally resonant. This pioneering stride in emotion classification can propel a new era of VR systems capable of perceptive and real-time emotional interactivity.
The study demonstrates that by accurately identifying and responding to user emotions in real time, VR systems can become more immersive, personalized, and emotionally resonant. The potential applications of this technology are far-reaching. In education, emotion recognition could be used to create adaptive learning environments that respond to the emotional states of students, consequently enhancing engagement and learning outcomes. In psychotherapy, VR systems equipped with emotion recognition could provide more effective and personalized treatment options by dynamically adjusting therapeutic content based on the user’s emotional responses. Similarly, in the entertainment industry, this technology could be used to tailor experiences to the emotional preferences of users, offering more engaging and emotionally satisfying content. These applications underscore the transformative potential of emotion recognition technology in making digital experiences more intuitive and human-centric.
As we venture into the future, the horizon of possibilities for enhancing and broadening these methodologies is expansive. The exploration of supplementary biosignals, such as EEG, holds promise in deepening the emotional nuance and precision of classification frameworks. Moreover, to optimize user experience, VR designers must prioritize the development of adaptive systems that incorporate real-time user feedback. Importantly, future research should investigate explainability techniques such as LIME (local interpretable model-agnostic explanations) or SHAP (Shapley additive explanations) to illuminate the model’s decision-making process. This would not only foster trust, but also deepen our understanding of the complex links between physiological signals and emotions. Ultimately, such an approach would ensure that VR environments are not only emotionally intelligent, but also finely personalized for individual users. Finally, we believe that this research’s value extends beyond its current discoveries, setting a solid foundation for future explorations aimed at realizing the full potential of emotionally intelligent virtual environments.
This research has also raised questions that merit further investigation. Incorporating additional biosignals, such as electroencephalography (EEG) and electromyography (EMG), could provide more comprehensive data for emotion classification, particularly for distinguishing between closely related emotional states. It would also be worthwhile to explore more advanced feature extraction techniques and machine learning models, including deep learning approaches such as recurrent neural networks (RNNs) or long short-term memory (LSTM) networks, to improve the temporal analysis of physiological signals. Furthermore, future studies should integrate the model into more diverse and dynamic virtual environments, testing its applicability across different user populations and real-world scenarios. This would help refine the system's accuracy and generalizability, broadening its potential applications in areas such as education, therapy, and entertainment.
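By way of illustration, the following sketch outlines one possible LSTM architecture for classifying fixed-length windows of synchronized ECG and GSR samples into five emotional states. The window length, layer sizes, training settings, and random placeholder data are assumptions for demonstration only, not a model evaluated in this study.

```python
# A minimal sketch (an assumption, not the paper's model) of an LSTM that maps
# fixed-length windows of two-channel physiological data (e.g., ECG + GSR)
# to five emotion classes. All shapes and hyperparameters are illustrative.
import numpy as np
import tensorflow as tf

N_WINDOWS, TIMESTEPS, CHANNELS, N_CLASSES = 200, 256, 2, 5  # assumed shapes

# Placeholder tensors standing in for windowed, synchronized recordings.
X = np.random.randn(N_WINDOWS, TIMESTEPS, CHANNELS).astype("float32")
y = np.random.randint(0, N_CLASSES, size=N_WINDOWS)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, return_sequences=True,
                         input_shape=(TIMESTEPS, CHANNELS)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, validation_split=0.2, epochs=3, batch_size=32, verbose=0)
print(model.evaluate(X, y, verbose=0))
```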

Author Contributions

Conceptualization, M.F.A. and M.Y.; formal analysis, E.E.A.; investigation, M.F.A.; data curation, M.Y.; methodology, M.F.A. and M.Y.; project administration, H.E.I.; resources, H.E.I.; software, H.E.I.; supervision, M.Y.; validation, H.E.I.; visualization, M.F.A.; writing—original draft, E.E.A.; writing—review and editing, M.Y. and H.E.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Gazi University Scientific Research Projects Unit (GAZİ, BAP (2023), Project Number: 8179).

Institutional Review Board Statement

Not applicable. This study analyzed a previously collected, publicly available dataset and did not involve new experiments with humans or animals.

Informed Consent Statement

Not applicable. This study did not involve new data collection from human participants.

Data Availability Statement

The dataset used in this study is publicly accessible at https://www.kaggle.com/datasets/lumaatabbaa/vr-eyes-emotions-dataset-vreed (accessed on 1 January 2020). The relevant code for this research is available upon request.

Conflicts of Interest

The authors, Ebubekir Enes Arslan (E.E.A.), Mehmet Feyzi Akşahin (M.F.A.), Murat Yilmaz (M.Y.), and Hüseyin Emre Ilgın (H.E.I.), declare that they have no competing interests.

Abbreviations

The following abbreviations are used in this manuscript:
VREED    VR Eyes: Emotions Dataset
SKT    Skin temperature
BVP    Blood volume pulse
WPS    Wrist pulse signal
EEG    Electroencephalogram
ECG    Electrocardiogram
EMG    Electromyogram
GSR    Galvanic skin response
VR    Virtual reality
CMA    Circumplex Model of Affect
CNN    Convolutional Neural Network
SVM    Support Vector Machine
KNN    K-Nearest Neighbors
RF    Random Forest Classifier
DT    Decision Tree Classifier
LSTM    Long Short-Term Memory
GBM    Gradient Boosting Machines
LightGBM    Light Gradient Boosting
ET    Extra Trees Classifier
QDA    Quadratic Discriminant Analysis
LDA    Linear Discriminant Analysis
XGBoost    Extreme Gradient Boosting
GBC    Gradient Boosting Classifier
LR    Logistic Regression
NB    Naive Bayes
SEED    SJTU Emotion EEG Dataset
DEAP    Dataset for Emotion Analysis using Physiological Signals
ML    Machine Learning
VAS    Visual Analog Scale
VE    Virtual Environment
SMOTE    Synthetic Minority Over-sampling Technique

References

  1. Riva, G. Applications of virtual reality technology in clinical medicine. Stud. Health Technol. Inform. 2003, 94, 265–295. [Google Scholar]
  2. Brave, S.; Nass, C. The role of emotions in human-computer interaction. Interact. Stud. 2003, 4, 53–82. [Google Scholar]
  3. Mavridou, I. Affective State Recognition in Virtual Reality from Electromyography and Photoplethysmography Using Head-Mounted Wearable Sensors. Ph.D. Thesis, Bournemouth University, Poole, UK, 2021. [Google Scholar]
  4. Bekele, E.; Bian, D.; Peterman, J.; Park, S.; Sarkar, N. Design of a virtual reality system for affect analysis in facial expressions (VR-SAAFE); application to schizophrenia. IEEE Trans. Neural Syst. Rehabil. Eng. 2016, 25, 739–749. [Google Scholar] [CrossRef]
  5. Cha, H.S.; Choi, S.J.; Im, C.H. Real-time recognition of facial expressions using facial electromyograms recorded around the eyes for social virtual reality applications. IEEE Access 2020, 8, 62065–62075. [Google Scholar] [CrossRef]
  6. Ghosh, S.; Winston, L.; Panchal, N.; Kimura-Thollander, P.; Hotnog, J.; Cheong, D.; Reyes, G.; Abowd, G.D. Notifivr: Exploring interruptions and notifications in virtual reality. IEEE Trans. Vis. Comput. Graph. 2018, 24, 1447–1456. [Google Scholar] [CrossRef] [PubMed]
  7. Hsu, Y.; Wang, J.; Chiang, W.; Hung, C. Automatic ECG-Based Emotion Recognition in Music Listening. IEEE Trans. Affect. Comput. 2020, 11, 85–99. [Google Scholar] [CrossRef]
  8. Giannakakis, G.; Grigoriadis, D.; Giannakaki, K.; Simantiraki, O.; Roniotis, A.; Tsiknakis, M. Review on psychological stress detection using biosignals. IEEE Trans. Affect. Comput. 2019, 13, 440–460. [Google Scholar] [CrossRef]
  9. Bellalouna, F. Industrial case studies for digital transformation of engineering processes using the virtual reality technology. Procedia CIRP 2020, 90, 636–641. [Google Scholar] [CrossRef]
  10. Somarathna, R.; Bednarz, T.; Mohammadi, G. Virtual reality for emotion elicitation—A review. IEEE Trans. Affect. Comput. 2023, 14, 2626–2645. [Google Scholar] [CrossRef]
  11. Koliv, H.; Abuhashish, F.; Zraqou, J.; Alkhodour, W.; Sunar, M.S. Emotion Interaction with Virtual Reality Using Hybrid Emotion Classification Technique toward Brain Signals. Int. J. Comput. Sci. Inf. Technol. 2015, 6, 1451–1456. [Google Scholar]
  12. Suhaimi, N.S.; Mountstephens, J.; Teo, J. A Dataset for Emotion Recognition Using Virtual Reality and EEG (DER-VREEG): Emotional State Classification Using Low-Cost Wearable VR-EEG Headsets. Big Data Cogn. Comput. 2022, 6, 16. [Google Scholar] [CrossRef]
  13. Robert, F.; Winckler, M.; Wu, H.Y.; Sassatelli, L. Analysing and Understanding Embodied Interactions in Virtual Reality Systems. In Proceedings of the 2023 ACM International Conference on Interactive Media Experiences MMSys ’22, Nantes, France, 12–15 June 2023; pp. 362–366. [Google Scholar]
  14. Barricelli, B.R.; Gadia, D.; Rizzi, A.; Marini, D. Semiotics of virtual reality as a communication process. Behav. Inf. Technol. 2016, 35, 879–896. [Google Scholar] [CrossRef]
  15. Bayro, A.; Buneo, C.; Jeong, H.R. Emotion Recognition in Virtual Reality: Investigating the Effect of Gameplay Variations on Affective Responses. Proc. Hum. Factors Ergon. Soc. Annu. Meet. 2023, 67, 1516–1517. [Google Scholar] [CrossRef]
  16. Alva, M.; Nachamai, M.; Paulose, J. A comprehensive survey on features and methods for speech emotion detection. Int. J. Speech Technol. 2015, 18, 555–567. [Google Scholar]
  17. Zhang, Z.; Cui, L.; Liu, X.; Zhu, T. Emotion Detection Using Kinect 3D Facial Points. In Proceedings of the 2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), Omaha, NE, USA, 13–16 October 2016; pp. 4252–4257. [Google Scholar]
  18. Yang, G.; Ortoneda, J.; Saniie, J. Emotion Recognition Using Deep Neural Network with Vectorized Facial Features. In Proceedings of the 2018 IEEE International Conference on Electro/Information Technology (EIT), Rochester, MI, USA, 3–5 May 2018. [Google Scholar]
  19. Tiwari, S.; Agarwal, S. A shrewd artificial neural network-based hybrid model for pervasive stress detection of students using galvanic skin response and electrocardiogram signals. Big Data 2021, 9, 427–442. [Google Scholar] [CrossRef]
  20. Bota, P.J.; Wang, C.; Fred, A.; da Silva, H.P. A Review, Current Challenges, and Future Possibilities on Emotion Recognition Using Machine Learning and Physiological Signals. IEEE Access 2019, 7, 109951–109972. [Google Scholar] [CrossRef]
  21. Uyanık, H.; Ozcelik, S.; Duranay, Z.; Sengur, A.; Acharya, U. Use of Differential Entropy for Automated Emotion Recognition in a Virtual Reality Environment with EEG Signals. Diagnostics 2022, 12, 2508. [Google Scholar] [CrossRef]
  22. Chen, Y.; Hsiao, C.; Zheng, W.; Lee, R.; Lin, R. Artificial neural networks-based classification of emotions using wristband heart rate monitor data. Medicine 2019, 98, e16863. [Google Scholar] [CrossRef]
  23. Dominguez-Jimenez, J.; Campo-Landines, K.; Martínez Santos, J.C.; Delahoz, E.; Ortiz, S.H.C. A machine learning model for emotion recognition from physiological signals. Biomed. Signal Process. Control 2020, 55, 101641. [Google Scholar] [CrossRef]
  24. Ayata, D.; Yaslan, Y.; Kamasak, M. Emotion Recognition from Multimodal Physiological Signals for Emotion Aware Healthcare Systems. IEEE Access 2020, 8, 155876–155888. [Google Scholar] [CrossRef]
  25. Bălan, O.; Moise, G.; Moldoveanu, A.; Leordeanu, M.; Moldoveanu, F. An Investigation of Various Machine and Deep Learning Techniques Applied in Automatic Fear Level Detection and Acrophobia Virtual Therapy. Sensors 2020, 20, 496. [Google Scholar] [CrossRef]
  26. Marín-Morales, J.; Llinares, C.; Guixeres, J.; Alcañiz, M. Emotion Recognition in Immersive Virtual Reality: From Statistics to Affective Computing. Sensors 2020, 20, 5163. [Google Scholar] [CrossRef]
  27. Saganowski, S. Bringing Emotion Recognition Out of the Lab into Real Life: Recent Advances in Sensors and Machine Learning. Electronics 2022, 11, 496. [Google Scholar] [CrossRef]
  28. Pal, S.; Mukhopadhyay, S.; Suryadevara, N.K. Development and Progress in Sensors and Technologies for Human Emotion Recognition. Sensors 2021, 21, 5554. [Google Scholar] [CrossRef]
  29. Wang, Y.; Song, W.; Tao, W.; Liotta, A.; Yang, D.; Li, X.; Gao, S.; Sun, Y.; Ge, W.; Zhang, W.; et al. A Systematic Review on Affective Computing: Emotion Models, Databases, and Recent Advances. Inf. Fusion 2022, 81, 120–133. [Google Scholar] [CrossRef]
  30. Dissanayake, V.; Seneviratne, S.; Rana, R.; Wen, E.; Kaluarachchi, T.; Nanayakkara, S. SigRep: Toward Robust Wearable Emotion Recognition with Contrastive Representation Learning. IEEE Access 2022, 10, 46076–46088. [Google Scholar] [CrossRef]
  31. Ali, M.; Al Machot, F.; Haj Mosa, A.; Jdeed, M.; Al Machot, E.; Kyamakya, K. A Globally Generalized Emotion Recognition System Involving Different Physiological Signals. Sensors 2018, 18, 1905. [Google Scholar] [CrossRef]
  32. Su, Y.; Hu, B.; Xu, L.X.; Zhang, X.; Chen, J. EEG-data-oriented knowledge modeling and emotion recognition. Chin. Sci. Bull. 2015, 60, 1480–1488. [Google Scholar]
  33. Tabbaa, L.; Searle, R.; Ang, C.; Bafti, S.; Hossain, M.; Intarasirisawat, J.; Glancy, M. VREED: Virtual Reality Emotion Recognition Dataset using Eye Tracking and Physiological Measures. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2021, 5, 32. [Google Scholar] [CrossRef]
  34. Wen, W.; Liu, G.; Cheng, N.; Wei, J.; Shangguan, P.; Huang, W. Emotion Recognition Based on Multi-Variant Correlation of Physiological Signals. IEEE Trans. Affect. Comput. 2014, 5, 126–140. [Google Scholar] [CrossRef]
  35. Valderas, M.; Bolea, J.; Laguna, P.; Vallverdú, M.; Bailón, R. Human emotion recognition using heart rate variability analysis with spectral bands based on respiration. In Proceedings of the 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Milan, Italy, 25–29 August 2015; pp. 6134–6137. [Google Scholar]
  36. Marín-Morales, J.; Higuera-Trujillo, J.L.; Guixeres, J.; Llinares, C.; Alcañiz, M.; Valenza, G. Heart rate variability analysis for the assessment of immersive emotional arousal using virtual reality: Comparing real and virtual scenarios. PLoS ONE 2021, 16, e0254098. [Google Scholar] [CrossRef]
  37. Nardelli, M.; Valenza, G.; Greco, A.; Lanatà, A.; Scilingo, E. Recognizing Emotions Induced by Affective Sounds through Heart Rate Variability. IEEE Trans. Affect. Comput. 2015, 6, 385–394. [Google Scholar] [CrossRef]
  38. Akşahin, M.; Erdamar, A.; Fırat, H.; Ardıç, S.; Eroğul, O. Obstructive sleep apnea classification with artificial neural network based on two synchronic hrv series. Biomed. Eng. Appl. Basis Commun. 2015, 27, 1550011. [Google Scholar] [CrossRef]
  39. Hamila, R.; Astola, J.; Cheikh, F.; Gabbouj, M.; Renfors, M. Teager Energy and The Ambiguity Function. IEEE Trans. Signal Process. 1999, 47, 260–262. [Google Scholar] [CrossRef]
  40. Mallat, S. A Wavelet Tour of Signal Processing; Academic Press: Cambridge, MA, USA, 1999. [Google Scholar]
  41. Perotti, L.; Vrinceanu, D.; Bessis, D. Recovery of the Starting Times of Delayed Signals. IEEE Signal Process. Lett. 2017, 25, 1455–1459. [Google Scholar] [CrossRef]
  42. Tanaka, S.; Maeda, Y. Time integrals of input signal and output signal in linear measurement systems. Thermochim. Acta 1996, 273, 269–276. [Google Scholar] [CrossRef]
  43. Liu, Z.; Qiao, H. Investigation on the skewness for independent component analysis. Sci. China Inf. Sci. 2011, 54, 849–860. [Google Scholar] [CrossRef]
  44. Groeneveld, R.A.; Meeden, G. Measuring skewness and kurtosis. J. R. Stat. Soc. Ser. D Stat. 1984, 33, 391–399. [Google Scholar] [CrossRef]
  45. Welch, P. The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Trans. Audio Electroacoust. 1967, 15, 70–73. [Google Scholar] [CrossRef]
  46. Vourkas, M.; Micheloyannis, S.; Papadourakis, G. Use of ANN and Hjorth parameters in mental-task discrimination. In Proceedings of the IET Conference Proceedings, IET, Bristol, UK, 4–6 September 2000; pp. 327–332. [Google Scholar]
  47. Grover, C.; Turk, N. Rolling element bearing fault diagnosis using empirical mode decomposition and Hjorth parameters. Procedia Comput. Sci. 2020, 167, 1484–1494. [Google Scholar] [CrossRef]
  48. Oh, S.H.; Lee, Y.R.; Kim, H.N. A novel EEG feature extraction method using Hjorth parameter. Int. J. Electron. Electr. Eng. 2014, 2, 106–110. [Google Scholar] [CrossRef]
  49. Peker, N.; Kubat, C. Application of Chi-square discretization algorithms to ensemble classification methods. Expert Syst. Appl. 2021, 185, 115540. [Google Scholar] [CrossRef]
  50. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
  51. Zikeba, M.; Tomczak, S.K.; Tomczak, J.M. Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction. Expert Syst. Appl. 2016, 58, 93–101. [Google Scholar]
  52. Jung, D.; Choi, J.; Kim, J.; Cho, S.; Han, S. EEG-Based Identification of Emotional Neural State Evoked by Virtual Environment Interaction. Int. J. Environ. Res. Public Health 2022, 19, 2158. [Google Scholar] [CrossRef] [PubMed]
  53. Zheng, L.J.; Mountstephens, J.; Teo, J. Eye Fixation Versus Pupil Diameter as Eye-Tracking Features for Virtual Reality Emotion Classification. In Proceedings of the 2021 IEEE International Conference on Computing (ICOCO), Kuala Lumpur, Malaysia, 17–19 November 2021; pp. 315–319. [Google Scholar]
  54. Bulagang, A.F.; Mountstephens, J.; Teo, J. Multiclass Emotion Prediction Using Heart Rate and Virtual Reality Stimuli. J. Big Data 2021, 8, 1–12. [Google Scholar] [CrossRef]
  55. Lim, J.Z.; Mountstephens, J.; Teo, J. Exploring Pupil Position as an Eye-Tracking Feature for Four-Class Emotion Classification in VR. J. Phys. Conf. Ser. 2021, 2129, 012069. [Google Scholar] [CrossRef]
  56. Bulagang, A.F.; Mountstephens, J.; Teo, J. Electrodermography and Heart Rate Sensing for Multiclass Emotion Prediction in Virtual Reality: A Preliminary Investigation. In Proceedings of the 2021 IEEE Symposium on Industrial Electronics & Applications (ISIEA), Penang, Malaysia, 10–12 November 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–5. [Google Scholar]
  57. Bulagang, A.F.; Mountstephens, J.; Teo, J. A Novel Approach for Emotion Classification in Virtual Reality Using Heart Rate (HR) and Inter-Beat Interval (IBI). In Proceedings of the 2021 IEEE International Conference on Computing (ICOCO), Kuala Lumpur, Malaysia, 17–19 November 2021; pp. 247–252. [Google Scholar]
  58. Zheng, L.J.; Mountstephens, J.; Teo, J. Multiclass Emotion Classification Using Pupil Size in VR: Tuning Support Vector Machines to Improve Performance. J. Phys. Conf. Ser. 2020, 1529, 052062. [Google Scholar] [CrossRef]
  59. Antoniou, P.E.; Arfaras, G.; Pandria, N.; Athanasiou, A.; Ntakakis, G.; Babatsikos, E.; Nigdelis, V.; Bamidis, P. Biosensor Real-Time Affective Analytics in Virtual and Mixed Reality Medical Education Serious Games: Cohort Study. JMIR Serious Games 2020, 8, e17823. [Google Scholar] [CrossRef]
  60. Gupta, K.; Lazarevic, J.; Pai, Y.S.; Billinghurst, M. AffectivelyVR: Towards VR Personalized Emotion Recognition. In Proceedings of the 26th ACM Symposium on Virtual Reality Software and Technology (VRST ’20), Virtual Event, Canada, 1–4 November 2020; ACM: New York, NY, USA, 2020; pp. 1–3. [Google Scholar]
  61. Liang, J.; Chen, S.; Jin, Q. Semi-Supervised Multimodal Emotion Recognition with Improved Wasserstein GANs. In Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Lanzhou, China, 18–21 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 695–703. [Google Scholar]
  62. Nam, J.; Chung, H.; Lee, H.; Choi, S.; Kim, S. A New Terrain in HCI: Emotion Recognition Interface Using Biometric Data for an Immersive VR Experience. arXiv 2019, arXiv:1912.01177. [Google Scholar]
  63. Murphy, D.; Higgins, C. Secondary Inputs for Measuring User Engagement in Immersive VR Education Environments. arXiv 2019, arXiv:1910.01586. Available online: https://arxiv.org/abs/1910.01586 (accessed on 1 October 2023).
  64. Hinkle, L.B.; Roudposhti, K.K.; Metsis, V. Physiological Measurement for Emotion Recognition in Virtual Reality. In Proceedings of the 2019 2nd International Conference on Data Intelligence and Security (ICDIS), South Padre Island, TX, USA, 8–10 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 136–143. [Google Scholar]
  65. Garcia-Agundez, A.; Reuter, C.; Becker, H.; Konrad, R.; Caserman, P.; Miede, A.; Göbel, S. Development of a Classifier to Determine Factors Causing Cybersickness in Virtual Reality Environments. Games Health J. 2019, 8, 439–444. [Google Scholar] [CrossRef] [PubMed]
  66. Suhaimi, N.S.; Yuan, C.T.B.; Teo, J.; Mountstephens, J. Modeling the Affective Space of 360 Virtual Reality Videos Based on Arousal and Valence for Wearable EEG-Based VR Emotion Classification. In Proceedings of the 2018 IEEE 14th International Colloquium on Signal Processing & Its Applications (CSPA), Batu Ferringhi, Malaysia, 9–10 March 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 167–172. [Google Scholar]
  67. Cho, D.; Ham, J.; Oh, J.; Park, J.; Kim, S.; Lee, N.K.; Lee, B. Detection of Stress Levels from Biosignals Measured in Virtual Reality Environments Using a Kernel-Based Extreme Learning Machine. Sensors 2017, 17, 2435. [Google Scholar] [CrossRef]
  68. Diemer, J.; Alpers, G.W.; Peperkorn, H.M.; Shiban, Y.; Mühlberger, A. The Impact of Perception and Presence on Emotional Reactions: A Review of Research in Virtual Reality. Front. Psychol. 2015, 6, 26. [Google Scholar] [CrossRef]
Figure 1. The research process.
Figure 2. ECG graph samples showing distinct patterns corresponding to the five defined emotional states: excitement, happiness, anxiety, calmness, and sadness.
Figure 3. Sample HRV series.
Figure 4. Sample GSR series illustrating the physiological responses associated with the five emotions defined in this study.
Figure 5. The energy of ECG signals.
Figure 6. Feature distributions.
Figure 7. Confusion matrices for the model: (a,b) testing; (c,d) validation.
Table 1. Validation performance criteria.
Emotional State | Precision | Recall | F1 Score | Accuracy
0 | 95.65% | 91.67% | 93.62% | 91.70%
1 | 100.00% | 100.00% | 100.00% | 100.00%
2 | 100.00% | 100.00% | 100.00% | 100.00%
3 | 92.00% | 95.83% | 93.88% | 95.80%
4 | 100.00% | 100.00% | 100.00% | 100.00%
Overall | 97.53% | 97.50% | 97.50% | 97.50%
Table 2. Test performance criteria.
Emotional State | Precision | Recall | F1 Score | Accuracy
0 | 94.44% | 94.44% | 94.44% | 94.40%
1 | 100.00% | 100.00% | 100.00% | 100.00%
2 | 100.00% | 100.00% | 100.00% | 100.00%
3 | 94.44% | 94.44% | 94.44% | 94.40%
4 | 100.00% | 100.00% | 100.00% | 100.00%
Overall | 97.78% | 97.78% | 97.78% | 97.78%
Table 3. Comparison with related work.
Method | Description | Accuracy | Reference
SVM (class weight kernel) | Used for emotion classification with EEG signals in VR | 85.01% | Suhaimi et al. [12]
EEG signals with XGBoost classifiers | Best performance among tested classifiers | - | Jung et al. [52]
Eye fixation vs. pupil diameter | Eye-tracking features for emotion classification | 75% (eye fixation) | Zheng et al. [53]
KNN and SVM with heart rate | Emotion prediction in VR | 82% | Bulagang et al. [54]
Pupil position with SVM | Emotion classification using pupil position | 59.19% | Lim et al. [55]
Heart rate and electrodermography with SVM | High accuracy for multi-class emotion classification | - | Bulagang et al. [56]
SVM with heart rate and IBI | Classify emotions in four quadrants | - | Bulagang et al. [57]
SVM with eye-tracking data | Emotion classification using eye-tracking | 57.65% | Zheng et al. [58]
Wearable biosensors | Detect emotions using heart rate, EDA, and EEG | - | Antoniou et al. [59]
AffectivelyVR with KNN and GSR sensors | Personalized emotion recognition | 96.5% | Gupta et al. [60]
Semi-supervised GANs | Improved emotion classification method | - | Liang et al. [61]
Brainwave sensors and eye-tracking | Predict user's attraction on visual stimuli | - | Nam et al. [62]
Biosignals for affective feedback | Indicators of emotional state in VR | - | Murphy & Higgins [63]
General purpose features from biosignals | Compared to traditional domain-specific features | - | Hinkle et al. [64]
Biosignals and game parameters for cybersickness | Determine occurrence of cybersickness | 82% (binary), 56% (ternary) | Garcia-Agundez et al. [65]
Database for emotional analysis with EEG | Using YouTube 360 videos for emotion classification | - | Suhaimi et al. [66]
K-ELM for stress levels | Classification of stress levels in VR | >95% | Cho et al. [67]
Interoceptive attribution model | Framework for emotion research in VR | - | Diemer et al. [68]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
