1. Introduction
Human–Computer Interaction (HCI) research focuses on making interaction with computers more productive and engaging. One method for improving the interaction between humans and computers is to provide emotional intelligence to computing systems. Such systems can adapt their behaviour to the emotional state of the user. Examples of such systems include entertainment systems, healthcare systems, adaptive learning systems and computer games.
Previous studies have investigated different methods to provide emotional intelligence to computers. Among those methods, facial image based emotion recognition is the most widely used, since it can recognize a wide range of emotion types [
1,
2]. Another approach is speech signal analysis, which determines the human emotion by analysing the patterns in the speech signal [
3,
4]. In addition, direct physiological examination of the person is one of the most sophisticated approaches to human emotion recognition [
5,
6]. These methods use different types of bio-signals to continuously monitor and detect human emotions. Because bio-signals cannot easily be masked, bio-signal based methods provide more accurate results than facial emotion recognition and speech analysis [
7,
8].
Several attempts have been made to use different types of bio-signals for emotion recognition. Some of the widely used bio-signals include the electrocardiogram (ECG), galvanic skin response (GSR), electromyogram and respiration. Some studies have combined multiple bio-signal devices to recognize emotions, whereas other studies have used a single biosensor such as ECG for capturing data for emotion recognition [
8,
9,
10]. Despite the increased recognition accuracy, using multiple sensors might lead to user dissatisfaction. ECG is the most widely used biosensor because ECG signals are less noisy than those of other biosensors and contain emotion-related information [
9,
11].
Previous studies that investigated the use of ECG signals have applied different types of feature extraction methods, including heart rate variability (HRV) analysis, empirical mode decomposition (EMD), within-beat (WIB) analysis and wavelet transforms [
8,
10]. Although these feature extraction methods are sophisticated, they suffer from low prediction accuracy, and too little attention has been given to combining them to achieve higher emotion recognition accuracy.
This study presents an ensemble learning approach for recognising human emotions using ECG signals. First, it describes the experimental protocol followed to obtain emotion-related ECG data. It then describes three broadly used ECG signal feature extraction methods, namely: HRV; EMD; and WIB analysis, together with a novel method proposed in this study based on the frequency spectrum of the ECG wave. Finally, it elaborates on the machine learning process followed to create a classification model by combining the mentioned feature extraction methods. Briefly, the machine learning model takes a 20 s window of an ECG signal and classifies it into one of four distinct emotion classes, namely: anger; sadness; joy; and pleasure. The machine learning process first evaluates the capability of different ensemble learners for emotion classification. As an additional step, a feature selection process is then employed to improve the classification accuracy of each ensemble learner. This step is based on previous related studies on ensemble learners [
12,
13,
14,
15,
16] and feature selection strategies for ensemble learners [
17,
18,
19,
20]. Finally, this research presents the selected features from the feature selection process and compares the obtained results with different ECG based emotion recognition models in the literature.
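To make the windowing step of the pipeline concrete, the sketch below segments a continuous recording into non-overlapping 20 s windows and builds a per-window feature matrix. The sampling rate and the placeholder feature extractor are illustrative assumptions, not the protocol or feature set used in this study.

```python
import numpy as np

FS = 256           # assumed sampling rate (Hz); not specified in this section
WINDOW_S = 20      # window length used in the study

def segment_windows(ecg, fs=FS, window_s=WINDOW_S):
    """Split a continuous ECG recording into non-overlapping 20 s windows."""
    step = fs * window_s
    n_windows = len(ecg) // step
    return [ecg[i * step:(i + 1) * step] for i in range(n_windows)]

def extract_features(window):
    """Placeholder for the combined HRV/EMD/WIB/DFT feature vector."""
    spectrum = np.abs(np.fft.rfft(window))
    return np.array([window.mean(), window.std(), spectrum[:5].mean()])

# synthetic 60 s recording -> three 20 s windows -> 3 x 3 feature matrix
ecg = np.sin(np.linspace(0.0, 60 * 2 * np.pi, 60 * FS))
X = np.vstack([extract_features(w) for w in segment_windows(ecg)])
print(X.shape)  # (3, 3)
```

Each row of `X` would then be fed to the ensemble classifier to obtain one emotion label per 20 s window.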
2. Related Work
In recent years, there has been an increasing amount of literature on human–computer interaction methods that provide emotional intelligence to computers. Emotional intelligence is widely used to develop emotionally-aware healthcare monitoring systems, computer games, entertainment systems and safe driving systems. In computer games, emotional intelligence can be used to evaluate the player’s affective state for dynamic game content generation [
5,
21,
22]. Similarly, in vehicle safety systems, emotion recognition models are used to monitor the affective state of the driver while operating [
23,
24]. Furthermore, in health care systems, emotional intelligence is employed to monitor the emotional state of patients [
6,
25,
26].
Rattanyu and Mizukawa [
9] discuss speech analysis, facial feature analysis and bio-signal processing as the primary methods for emotion recognition. Firstly, speech based emotion recognition methods determine the emotion by analysing a given speech signal; their main drawback is that the user must speak continuously for the system to determine the emotional state. Secondly, facial image based recognition is another widely used method for emotion recognition [
1,
2]. Although it provides accurate predictions, the main problem with this method is that some people tend to mask their emotional states (social masking), which undermines the predictions [
7,
8]. Finally, bio-signal processing methods use different types of bio-signals to predict emotions. A bio-signal based method is an adequate solution compared to the other approaches: because bio-signals cannot easily be masked, they improve prediction accuracy compared to facial image based recognition methods [
8]. In addition, since bio-signals are available continuously, unlike speech based systems [
3,
4], the system can identify the emotional state continuously.
Numerous studies have attempted to use different types of bio-signals for detecting emotions [
9,
10,
27]. Kim and André [
8] developed an emotion recognition model incorporating four different biosensors: electrocardiogram, skin conductivity, electromyogram and respiration. In their investigation, they achieved 70% accuracy for person-independent emotion classification. The developed model was able to classify a given set of bio-signal patterns into four emotion classes: anger, sadness, pleasure and joy. In another major study, Rattanyu and Mizukawa [
9] developed an emotion recognition model using ECG signals with a classification accuracy of 61%. In a study that set out to develop a neural network based emotion recognition model, Yoo et al. [
28] developed a model incorporating ECG signals and skin conductivity. In their study, they built a classification model with 80.1% accuracy. In another major investigation, Ayata et al. [
29] developed an emotion recognition model to classify arousal and valence using galvanic skin response. The mentioned model incorporates features from empirical mode decomposition and statistical analysis methods. Among multi-sensor based emotion recognition models, the model developed by Nazos et al. [
30] has the ability to recognize sadness, anger, surprise, fear, frustration and amusement with up to 83% accuracy. Furthermore, the recent investigation by Gouizi et al. [
31] has an accuracy of 83% for recognizing six emotions by using six different biosensors. More information on biosensor based emotion recognition can be found in the recent state-of-the-art reviews by Jerritta et al. [
32] and Egger et al. [
33].
Murugappan et al. [
34] discuss the challenges and limitations of multi-sensor based emotion recognition. One major challenge is the increased computational complexity due to multiple sensor data streams and algorithmic requirements. Another is the restriction of subjects’ freedom of movement due to multiple sensor probes, wires, etc. Building on the concept of simplicity, they were able to develop an emotion recognition model with a classification accuracy of 66.48% using only ECG signals. Considering all of these factors, the selected method should provide high emotion recognition accuracy with a minimum number of sensors.
A number of studies have examined the use of ECG signals for emotion recognition [
7,
10,
27,
34,
35]. An ECG based method is an adequate solution for four important reasons. Firstly, the ECG signal results from activity in the heart, which is innervated by the autonomic nervous system that governs the behaviour of each emotion [
11]. Secondly, ECG sensors can be used as a wearable device [
36]. Thirdly, it is convenient to use because ECG signals can be captured from different parts of the body [
37]. Finally, the ECG signal has a high amplitude compared to the signals of other biosensors [
9].
To date, various methods have been developed to extract features from ECG signals. One commonly used method is heart rate variability analysis [
8,
28,
38]. HRV analysis is a broadly used method in biomedical engineering applications [
39]. The method developed by Ferdinando et al. [
38] using HRV analysis had an accuracy of around 59% for identifying the arousal and valence state of a person. Similarly, another study, which developed an emotion recognition model incorporating ECG signals and skin resistance, achieved 80% accuracy for recognizing the four quadrants of the discrete emotional model [
40].
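To illustrate the HRV approach, the sketch below computes the conventional time-domain HRV features (mean R–R interval, SDNN, RMSSD, pNN50) from a series of R–R intervals. The interval values are synthetic, and the exact feature sets of the cited studies may differ.

```python
import numpy as np

def hrv_time_features(rr_ms):
    """Conventional time-domain HRV features from successive R-R intervals
    (in milliseconds)."""
    rr = np.asarray(rr_ms, dtype=float)
    diff = np.diff(rr)
    return {
        "mean_rr": rr.mean(),                         # average beat interval
        "sdnn": rr.std(ddof=1),                       # overall variability
        "rmssd": np.sqrt(np.mean(diff ** 2)),         # short-term variability
        "pnn50": 100.0 * np.mean(np.abs(diff) > 50),  # % successive diffs > 50 ms
    }

feats = hrv_time_features([812, 790, 845, 860, 798, 805])  # synthetic intervals
print(feats["pnn50"])  # 40.0: two of the five successive differences exceed 50 ms
```

Frequency-domain HRV features (e.g., low- and high-frequency band powers) would additionally require resampling the R–R series and estimating its spectrum.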
Another widely used method is empirical mode decomposition. EMD is a well-structured approach to analysing non-stationary and nonlinear data [
41]. Furthermore, according to investigations done by Manjula and Sarma [
42], EMD performs better than wavelets when extracting spectral power based features. Foteini et al. [
43] point out that each of the first six intrinsic mode functions generated by the EMD method relates to a specific activity in the heart. Building on that, a number of studies have used empirical mode decomposition for analysing ECG signals [
7,
27]. Jerritta et al. [
7] investigated the use of the Hilbert–Huang transform (HHT) for EMD based feature extraction and produced a classification model with 54% accuracy for identifying six emotions in the discrete emotion model [
40].
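The core of EMD is the sifting operation: the mean of the upper and lower extrema envelopes is subtracted from the signal. The sketch below performs a single, simplified sifting iteration on a synthetic two-tone signal; full EMD iterates this until the result satisfies the IMF criteria and then repeats on the residue. The signal and tolerances are illustrative only.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x):
    """One simplified sifting iteration of EMD: subtract the mean of the
    upper and lower extrema envelopes from the signal."""
    t = np.arange(len(x))
    maxima = argrelextrema(x, np.greater)[0]
    minima = argrelextrema(x, np.less)[0]
    upper = CubicSpline(maxima, x[maxima])(t)   # upper envelope
    lower = CubicSpline(minima, x[minima])(t)   # lower envelope
    return x - (upper + lower) / 2.0

# fast 20 Hz component riding on a slow 2 Hz oscillation
t = np.linspace(0.0, 1.0, 500)
x = np.sin(2 * np.pi * 20 * t) + 0.5 * np.sin(2 * np.pi * 2 * t)
h = sift_once(x)  # first candidate IMF, close to the 20 Hz component
```

After one iteration, `h` already tracks the fastest oscillation; in a complete decomposition the residue would yield the 2 Hz component as a later IMF.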
Within-beat analysis is another method for ECG based emotion recognition, offering higher emotion recognition accuracy than the EMD and HRV methods. This method was introduced by Rattanyu and Mizukawa [
9] for recognising six emotions in the emotional spectrum. Their model was able to identify an emotion with up to 61% accuracy using ECG signals.
Some of the studies have used discrete Fourier transform (DFT) to extract frequency domain features from the ECG signal. Jerritta et al. [
10] discuss the advantages of using frequency domain features compared to EMD based features. They claim that, unlike EMD features, which capture the local properties of the ECG wave, the DFT method provides information about the frequency content of the signal. In their study, they achieved 54% accuracy for recognizing neutral, happiness, sadness, fear, surprise and disgust emotions from ECG signals utilizing DFT based features of ten intrinsic mode functions derived from the EMD method.
Collectively, most studies have used different analysis methods to extract features from ECG signals. In HRV analysis, the HRV time series is generated solely from the R–R interval variations of the ECG wave; the features extracted from it nevertheless represent both the time domain and the frequency domain of the HRV wave. Similarly, the EMD technique decomposes the signal into a set of oscillating components, and the features extracted from them correspond to a set of fragmented features that correlate with the ECG wave. In contrast, the within-beat method analyses the raw ECG wave in the time domain. In addition, compared to the frequency domain features extracted by the EMD and HRV methods, the DFT method provides an overview of the frequency domain of the raw ECG wave. Each of these approaches has its own advantages, and together the features they generate cover a broad range of domains and spaces of the ECG wave. However, most of the ECG based feature extraction methods in the literature achieve emotion recognition accuracy of only around 55% for different types of classification requirements.
Together, these studies highlight the need for an accurate emotion recognition model with a minimum number of biosensors. The studies presented thus far provide evidence that ECG is the best choice for capturing bio-signals because ECG signals contain emotion-related information. In addition, considering the accuracies reported for the classification models, there is a need for higher classification accuracy. The methods presented in the literature extract a wide range of features from the ECG wave and are sophisticated methods for examining time-varying data. However, up to now, no one has investigated the feasibility of combining well-known ECG based feature extraction methods to select an optimal set of features that gives higher emotion classification accuracy.
Considering the studies mentioned in the literature, it is apparent that the majority have used traditional single-learner algorithms as the prediction model, most commonly support vector machines, K-nearest neighbour, Fisher analysis and artificial neural networks. Even though these are well-established techniques, most of them lack the ability to recognize emotions with high classification accuracy. Recently, ensemble learning methods have been used to improve classification accuracy for various problems in different domains, yielding significant accuracy improvements [
15,
16,
19]. Furthermore, research in the domain of biomedical signal analysis has also used these ensemble techniques to improve model performance [
12,
17,
18].
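As a minimal illustration of the ensemble idea, the sketch below combines the label predictions of several base learners by majority vote. The learner outputs are fabricated for the example and do not come from the models discussed here; practical ensembles also use weighted voting, bagging, boosting or stacking.

```python
import numpy as np

def majority_vote(learner_preds):
    """Combine the label predictions of several base learners (rows) over
    the same samples (columns) by simple majority vote; ties fall to the
    alphabetically first label."""
    preds = np.asarray(learner_preds)
    combined = []
    for column in preds.T:
        labels, counts = np.unique(column, return_counts=True)
        combined.append(labels[np.argmax(counts)])
    return np.array(combined)

# fabricated outputs of three base learners on four 20 s windows
learner_preds = [
    ["anger",   "joy", "sadness", "joy"],
    ["anger",   "joy", "joy",     "pleasure"],
    ["sadness", "joy", "sadness", "joy"],
]
ensemble = majority_vote(learner_preds)  # anger, joy, sadness, joy
```

Even when each base learner errs on some samples, the combined vote recovers the consensus label, which is the intuition behind the accuracy gains reported above.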
Although an extensive amount of research has been conducted on defining the primal human emotions, studies developing prediction models have selected different emotions as their targets [
44]. This investigation is based on the 2D emotional model proposed by J. A. Russell [
45], where the emotions are placed in a 2D arousal and valence space. To broaden the emotion selection, this study takes the primal emotion of each emotional quadrant as a selected emotion, which improves the diversity of the predictions made in the study. Furthermore, similar studies have targeted the same set of emotions, and their reported classification accuracies will be beneficial for benchmarking the developed model. Therefore, the analysis of this study focuses on recognizing four primal emotions, namely: anger; sadness; pleasure; and joy. Additionally, this study presents the classification results of two models developed by incorporating two additional emotions in the emotional spectrum. A complete overview of the emotions and their organization in the arousal–valence space is given in
Section 3.1.3.
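The quadrant-based selection can be sketched as a simple mapping from a point in the 2D space to one of the four primal emotions. The placement of the four labels follows a common reading of Russell's circumplex and is an assumption for illustration, not the authors' code.

```python
def quadrant_emotion(valence, arousal):
    """Map a point in the 2D valence-arousal space to the primal emotion
    of its quadrant (label placement assumed: high-arousal/positive ->
    joy, low-arousal/positive -> pleasure, high-arousal/negative ->
    anger, low-arousal/negative -> sadness)."""
    if valence >= 0:
        return "joy" if arousal >= 0 else "pleasure"
    return "anger" if arousal >= 0 else "sadness"

print(quadrant_emotion(0.7, 0.6))  # joy: positive valence, high arousal
```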
The main objective of this paper is to evaluate the capability of ensemble learners for biosensor based human emotion recognition, which demands high prediction accuracy. This research combines four ECG based feature extraction methods, namely: HRV; EMD; WIB; and DFT based. The first two are the most widely used methods in the literature, and this study uses the within-beat method because of its high emotion recognition accuracy. Additionally, this study introduces a novel method that extracts a set of frequency-domain features from ten frequency bands of the ECG wave employing the discrete Fourier transform (termed TFB features). As an additional step in the ensemble learning procedure, the machine learning analysis of this paper selects a set of optimal features by combining the mentioned feature extraction methods for recognising anger, sadness, joy and pleasure.
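One possible reading of the TFB idea is sketched below: the power spectrum of a 20 s ECG window is split into ten equal-width frequency bands and the mean spectral power per band is returned. The band edges, upper frequency limit and sampling rate here are illustrative assumptions; the actual band definitions are those given later in this paper.

```python
import numpy as np

def ten_band_powers(ecg, fs=256, n_bands=10, f_max=40.0):
    """Mean spectral power in n_bands equal-width frequency bands up to
    f_max Hz (band edges assumed for illustration)."""
    spec = np.abs(np.fft.rfft(ecg)) ** 2              # power spectrum
    freqs = np.fft.rfftfreq(len(ecg), d=1.0 / fs)     # bin frequencies (Hz)
    edges = np.linspace(0.0, f_max, n_bands + 1)
    return np.array([spec[(freqs >= lo) & (freqs < hi)].mean()
                     for lo, hi in zip(edges[:-1], edges[1:])])

fs = 256
t = np.arange(0, 20, 1.0 / fs)        # one 20 s window
ecg = np.sin(2 * np.pi * 10 * t)      # 10 Hz test tone
p = ten_band_powers(ecg, fs=fs)
print(np.argmax(p))                   # 2: band index 2 covers 8-12 Hz
```

The resulting ten-element vector would be concatenated with the HRV, EMD and WIB features before the feature selection step.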