Article

Physiological Signal-Based Real-Time Emotion Recognition Based on Exploiting Mutual Information with Physiologically Common Features

Ean-Gyu Han, Tae-Koo Kang and Myo-Taeg Lim
1 School of Electrical Engineering, Korea University, Seoul 02841, Republic of Korea
2 Department of Human Intelligence and Robot Engineering, Sangmyung University, Cheonan 31066, Republic of Korea
* Authors to whom correspondence should be addressed.
Electronics 2023, 12(13), 2933; https://doi.org/10.3390/electronics12132933
Submission received: 23 May 2023 / Revised: 27 June 2023 / Accepted: 30 June 2023 / Published: 3 July 2023

Abstract

This paper proposes a real-time emotion recognition system that utilizes photoplethysmography (PPG) and electromyography (EMG) physiological signals. The proposed approach employs a complex-valued neural network to extract common features from the physiological signals, enabling successful emotion recognition without interference. The system comprises three stages: single-pulse extraction, a physiological coherence feature module, and a physiological common feature module. The experimental results demonstrate that the proposed method surpasses alternative approaches in terms of accuracy and the recognition interval. By extracting common features of the PPG and EMG signals, this approach achieves effective emotion recognition without mutual interference. The findings provide a significant advancement in real-time emotion analysis and offer a clear and concise framework for understanding individuals’ emotional states using physiological signals.

1. Introduction

Ergonomics, which supports design based on scientific and engineering analyses of human physical, cognitive, social, and emotional characteristics, is becoming increasingly important. It encompasses human engineering, biomechanics, cognitive engineering, human–computer interaction (HCI), emotional engineering, and user experience (UX), and sophisticated technologies continue to be developed for measurement, experimentation, analysis, design, and evaluation. In particular, HCI has become an important field that has attracted extensive research, leading to significant advances and expansion in a variety of areas, including recognizing and using emotions in computers.
Emotion recognition plays an important role in HCI, facilitating interactions between humans and intelligent devices (such as computers, smartphones, and IoT devices). Emotions can be expressed in many ways, including facial expressions, voice, gestures, text, and physiological signals, and each modality has its own advantages [1]. Facial expressions appear in facial images, which can be acquired easily in various ways. Voice signals are widely used in many fields and therefore provide abundant reference information. Gestures can express emotions clearly, and text can be collected easily through crawling or scraping. However, voice and text characteristics differ between countries, and gesture datasets are difficult to obtain because collecting them requires complex processes, such as recognizing a person's physical appearance. Furthermore, facial expressions, voice, gestures, and text can all be intentionally controlled, so their reliability as indicators of a person's actual emotions is low. In contrast, physiological signals are governed by the central nervous system, and the emotions they reflect cannot be deliberately controlled; accordingly, their reliability for emotion recognition is high [2,3,4]. Therefore, psychological studies focusing on the relationship between physiological signals (including electroencephalography (EEG), photoplethysmography (PPG), and electromyography (EMG) signals) and emotions have been conducted and applied in various fields. The reason for this research interest is that physiological reactions reflect dynamic changes in the central nervous system, which are difficult to hide compared to emotions expressed through words or facial expressions.
Among the physiological signals, EEG signals are the most commonly used for emotion recognition [5,6,7] because they are directly related to the central nervous system and contain exceptional emotional features. Significant recent research using EEG signals has focused on extracting EEG features using deep-learning-based methods. Wen et al. [8] proposed a deep convolutional neural network (CNN) and an autoencoder to extract relevant emotion-specific features from EEG signals. Alhagry et al. [9] proposed a long short-term memory (LSTM) approach to classify emotions using EEG signals, and Xing et al. [10] proposed a framework for emotion recognition using multi-channel EEG signals. Transitions in emotional states are usually accompanied by changes in the power spectrum of the EEG. Previous studies have also reported that spectral differences in the alpha band over the anterior brain region can generally capture different emotional states. Moreover, spectral changes in different brain regions are also associated with emotional responses; examples include theta- and gamma-band power changes at the right parietal lobe, theta-band power changes at the frontal midline, and asymmetry of the beta-band power at the parietal region.
Soleymani et al. [11] proposed a multimodal dataset termed MAHNOB-HCI for emotion recognition and implicit tagging research. Based on this dataset, they obtained the EEG spectral power of the electrodes and the valence scores and calculated the correlation between them. They revealed that higher frequency components over the frontal, parietal, and occipital lobes had a higher correlation with self-assessment-based valence responses, and they improved the classification performance for continuous emotion recognition by fusing power spectral density (PSD) and facial features. Koelstra et al. [12] presented a multimodal dataset for the analysis of human affective states termed DEAP and extracted the spectral power features of five frequency bands from 32 participants. In another study, Zheng et al. [13] presented a dataset termed SEED for analyzing stable patterns across sessions. Lin et al. [14] evaluated emotion-specific features based on the power spectral changes of EEG signals and assessed the relationship between EEG dynamics and music-induced emotional states, revealing that emotion-specific features from the frontal and parietal lobes could provide discriminative information related to emotion processing. Finally, Chanel et al. [15] employed a naive Bayes classifier to categorize three arousal-assessment-based emotion classes from specific frequency bands at particular electrode locations.
Despite the previously mentioned advantages, there are some limitations when using EEG signals. First, clinical events such as simple partial seizures or, more rarely, complex partial seizures (particularly with frontal onset) can accompany EEG changes, and the associated body movements generate excessive artifacts that can make EEG signals difficult or impossible to interpret; knowledge of the clinical seizures that can accompany EEG changes is therefore required. Moreover, EEG signals have a high dimensionality and require diverse and difficult processing, which complicates subsequent analyses. Finally, complicated algorithms are needed to analyze brainwave signals, and multiple EEG electrodes must be attached to subjects to collect reliable data. For these reasons, it is very difficult to gather practical EEG data applicable to real life, even if good classification can be achieved with follow-up analyses. To avoid these limitations, we selected PPG and EMG signals rather than EEG for emotion recognition, as they contain extensive emotion-specific information and can be incorporated into wearable devices practically [16,17,18]. They are easily measurable and somewhat less complex to analyze than EEG signals. Therefore, in this study, we focused on emotion recognition using a deep learning model based on PPG and EMG signals.
Psychologists and engineers have attempted to analyze these data to explain and categorize emotions. Although there are strong relationships between physiological signals and human emotional states, traditional manual feature extraction suffers from fundamental limitations in describing emotion-related characteristics from physiological signals:
1. Hand-crafted feature performance largely depends on the signal type and level of experience. Hence, poor domain knowledge can result in inappropriate features that are unable to capture some signal characteristics.
2. There is no guarantee that any given feature selection algorithm will extract the optimal feature set.
3. Most manual features are statistical and cannot incorporate signal details, which results in information loss.
In contrast, deep learning can automatically derive features from raw signals, enabling automatic feature selection and bypassing its computational cost, and it has been applied in many industrial fields [19,20]. Deep learning methods have also recently been applied to processing physiological signals (such as EEG or skin resistance), achieving results comparable with conventional methods. Martinez et al. [21] were the first to propose CNNs for establishing physiological models of emotion, prompting many subsequent deep emotion recognition studies. Despite these advantages, however, features carrying conflicting information can disturb the process of recognizing emotions.
Following the above-mentioned research and limitations, the research problem is described as follows. First, there are many obstacles to using EEG data for real-time emotion recognition. Second, traditional manual feature extraction does not guarantee an optimal feature set, leading to information loss. Finally, a feature carrying conflicting information can interfere with emotion recognition [22]. Therefore, in this work, we select PPG and EMG signals and propose a deep learning model that prevents feature interference by extracting the common features of both signals.
The remainder of this paper is structured as follows. Section 2 describes the overall structure of the proposed system, including the method of splitting the PPG and EMG signals into single pulses. Section 3 presents the datasets, experimental settings, and experimental results, and compares the performance of the proposed emotion recognition model with that of other studies. Finally, Section 4 summarizes the paper and presents the conclusions.

2. Proposed Real-Time Emotion Recognition System

2.1. Overview of the Proposed Real-Time Emotion Recognition System

An overview of the proposed real-time emotion recognition system developed in this study is presented in Figure 1. To extract a person's emotional features from PPG and EMG signals, a convolutional autoencoder (CAE) and a CNN-based architecture are combined. Emotion recognition is possible with these features alone, but they contain conflicting information that can confuse recognition. Therefore, to mitigate this confusion, shared emotional features are extracted with a complex-valued convolutional neural network (CVCNN) whose inputs are the results of a short-time Fourier transform (STFT). The CVCNN acquires efficient features from the complex-valued STFT output; these features are then concatenated with the others and used to recognize emotions.
As shown in Figure 1, the proposed system mainly comprises two modules. The physiological coherence feature module extracts features that exhibit the correlation between the PPG and EMG signals using a convolutional autoencoder and a two-stream CNN. The physiological common feature module extracts features that share both frequency information and overall details over time using the STFT and a CVCNN; this module contributes to successful emotion recognition by preventing the feature interference that may occur in the physiological coherence feature module.

2.2. Single-Pulse Extraction Using Peak-to-Peak Segmentation for PPG and EMG Signals

There is a variety of physiological signal analysis techniques, including time domain, frequency domain, and geometric analyses. The most commonly used is time domain analysis, which typically relies on the average cycle rate and on the difference between the longest and shortest signal values. However, preprocessing based on the average cycle rate is inefficient here because the aim is to capture changing trends immediately, and the difference between the longest and shortest signals is irrelevant because the data fundamentally differ between participants. Therefore, the captured signal was split into short periods based on the peak value to extract the maximum amount of information from the raw signal while minimizing losses. These short signal periods are often directly associated with underlying physiological properties, and introducing even small variations in them could distort those properties. As a result, to preserve the integrity of the signals, we chose not to apply any signal augmentation techniques.
Figure 2 shows that the PPG high peaks and EMG low peaks are clearly distinguishable from the characteristic waveforms. However, full-length signals are difficult to correlate with specific emotions, since emotional expressions weaken or deteriorate with increasing measurement time. Therefore, we segmented the signals into short segments that reflect the emotion trend and eliminated any segments that deviated from this trend. Signals with regular periodicity were divided into single-pulse sections. Comparing the PPG and EMG data, we set the single-pulse length to 86 sample points, although the exact segmentation around this length differed depending on the particular signal characteristics. The segmentation criteria are:
$$\text{PPG single-pulse} = [\,x^*_{H_p} - c_L,\; x^*_{H_p} + c_R\,], \qquad \text{EMG single-pulse} = [\,x^*_{L_p} - c_L,\; x^*_{L_p} + c_R\,] \quad (1)$$
where $x^*$ denotes the partial (single-pulse) signal extracted from the entire signal; $H_p$ and $L_p$ are the high- and low-peak locations, respectively; and $c_L$ and $c_R$ are the left and right constants, respectively, which define the segmentation window relative to the peak points. Figure 3 displays typical extracted signals.
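To make the peak-to-peak segmentation concrete, the following Python sketch splits a signal into single pulses around its detected peaks. The function name, the use of SciPy's find_peaks, and the example window constants are illustrative assumptions; the paper does not publish its implementation.

```python
import numpy as np
from scipy.signal import find_peaks

def extract_single_pulses(signal, c_left, c_right, height=None, low_peaks=False):
    """Split a 1D physiological signal into single pulses around its peaks.

    PPG pulses are cut around high peaks; EMG pulses around low peaks
    (detected by negating the signal). c_left/c_right are the numbers of
    samples kept to the left/right of each peak, as in criterion (1).
    """
    x = -signal if low_peaks else signal
    peaks, _ = find_peaks(x, height=height)
    pulses = []
    for p in peaks:
        start, end = p - c_left, p + c_right
        if start >= 0 and end <= len(signal):      # discard truncated pulses at the edges
            pulses.append(signal[start:end])
    return np.stack(pulses) if pulses else np.empty((0, c_left + c_right))

# Illustrative usage: an 86-sample PPG pulse with 43 samples on each side of the peak
# (the split of the 86 samples around the peak is an assumption).
# ppg_pulses = extract_single_pulses(ppg, c_left=43, c_right=43, height=0.15)
# emg_pulses = extract_single_pulses(emg, c_left=43, c_right=43, low_peaks=True)
```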
Using the entire signal does not always help in recognizing emotions. Rather, the signals typically contain artifact noise, which distorts the signal waveform and complicates the fitting task. Using the entire signal is also rather complicated because each emotion may start in an arbitrary time frame [23,24,25]. To recognize emotions efficiently, it was necessary to determine the appropriate input length for the deep learning model after properly segmenting the signal. By exploring the input signal length experimentally, we found that the appropriate length is between 10 and 15 pulses. Furthermore, normalization is essential when processing data that vary from person to person, such as biosignals.
The maximum or minimum value of the signal (the amplitude) is different for each person. Therefore, to find appropriate peak values, appropriate threshold values must be determined. For this purpose, a quartile analysis was applied to all peak values.
A quartile analysis is a statistical method used to divide a set of data into four equal parts (quartiles). The data are sorted in ascending order, and three cut points are selected that divide the data into four groups containing equal numbers of observations. These cut points are known as quartiles and are denoted Q1, Q2 (the median), and Q3. A quartile analysis is useful for understanding the distribution of a dataset, particularly when the data contain outliers or are not normally distributed, and it provides information on the spread, skewness, and central tendency of the data.
Using this method, the threshold values that retained the maximum information without losses were 0.15 for PPG and 1.2 for EMG. Figure 4 shows the resulting single pulses of PPG and EMG when various thresholds (including the appropriate values) are applied.
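A minimal sketch of a quartile-based threshold rule in the spirit of the analysis above. The specific cut-off formula (Q1 − k·IQR over all candidate peak heights) is an illustrative assumption; the paper reports only the final values (0.15 for PPG and 1.2 for EMG).

```python
import numpy as np
from scipy.signal import find_peaks

def quartile_threshold(signal, low_peaks=False, k=1.5):
    """Derive a peak-height threshold from the quartiles of all candidate peaks.

    Peaks whose height falls below Q1 - k*IQR are treated as noise; the returned
    value can be passed as `height` to find_peaks. The Q1 - k*IQR rule is an
    illustrative choice, not the authors' exact formula.
    """
    x = -signal if low_peaks else signal
    peaks, props = find_peaks(x, height=-np.inf)   # keep all peaks, record their heights
    heights = props["peak_heights"]
    q1, q3 = np.percentile(heights, [25, 75])
    return q1 - k * (q3 - q1)
```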

2.3. Physiological Coherence Feature Module

2.3.1. Convolutional Autoencoder for 1D Signals

The CAE extends the basic structure of the simple autoencoder by replacing the fully connected layers with convolutional layers [26,27,28]. As in the simple autoencoder, the input layer has the same size as the output layer, but the encoder network is composed of convolutional layers and the decoder network of transposed convolutional layers.
As illustrated in Figure 5, an autoencoder consists of two parts: an encoder and a decoder. The encoder converts the input $x$ into a hidden representation $y$ (the feature code) using a deterministic mapping function, typically an affine mapping followed by a nonlinearity, where $W$ is the weight between the input $x$ and the hidden representation $y$ and $b$ is the bias; the decoder reconstructs $z$ from $y$ in the same manner with the weight $W'$ and bias $b'$:
$$y = f(Wx + b) \quad (2)$$
$$z = f(W'y + b') \quad (3)$$
The CAE combines the local convolution connection with the autoencoder, which is a simple step that adds a convolution operation to the inputs. Correspondingly, a CAE consists of a convolutional encoder and a convolutional decoder: the encoder performs the convolutional conversion from the input to the feature maps, while the decoder performs the convolutional conversion from the feature maps to the output. In a CAE, the extracted features and the reconstructed output are calculated through the CNN. Thus, (2) and (3) can be rewritten as follows:
$$y = \mathrm{ReLU}(w \ast x + b) \quad (4)$$
$$z = \mathrm{ReLU}(w' \ast y + b') \quad (5)$$
where $w$ denotes the convolutional kernel between the input and the code $y$, $w'$ denotes the convolutional kernel between the code $y$ and the output, and $b$ and $b'$ are the corresponding biases. Moreover, the parameters of the encoding and decoding operations can be computed using unsupervised greedy training. The proposed architecture of the CAE for 1D signals is shown in Figure 6.
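The following Keras sketch illustrates a 1D convolutional autoencoder of the kind shown in Figure 6, using the TensorFlow/Keras versions cited in Section 3.2. The layer widths, kernel sizes, and latent dimension are illustrative assumptions rather than the authors' exact configuration.

```python
import math
from tensorflow.keras import layers, models

def build_cae(input_len=86, latent_dim=32):
    """1D convolutional autoencoder: Conv1D encoder, dense latent code,
    Conv1DTranspose decoder. Layer sizes here are illustrative assumptions."""
    enc_len = math.ceil(math.ceil(input_len / 2) / 2)   # length after two stride-2 convolutions

    inp = layers.Input(shape=(input_len, 1))
    x = layers.Conv1D(16, 5, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv1D(32, 5, strides=2, padding="same", activation="relu")(x)
    code = layers.Dense(latent_dim, activation="relu", name="latent")(layers.Flatten()(x))

    y = layers.Dense(enc_len * 32, activation="relu")(code)
    y = layers.Reshape((enc_len, 32))(y)
    y = layers.Conv1DTranspose(16, 5, strides=2, padding="same", activation="relu")(y)
    y = layers.Conv1DTranspose(1, 5, strides=2, padding="same")(y)
    out = layers.Cropping1D((0, enc_len * 4 - input_len))(y)   # trim back to the input length

    autoencoder = models.Model(inp, out)
    encoder = models.Model(inp, code)
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder

# After training on single pulses with the reconstruction (MSE) loss, `encoder`
# yields the latent vectors used by the physiological coherence feature module.
```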

2.3.2. Feature Extraction of Physiological Coherence Features for PPG and EMG Signals

In the previous section, latent vectors of the PPG and EMG signals were extracted through the CAE. Data compression was achieved through dimensionality reduction, which is the main role of the autoencoder, allowing the essential information of the signals to be captured. To extract the physiological coherence features from these latent vectors, a feature extraction module was constructed, as depicted in Figure 7. In the physiological coherence feature module, starting from the latent vectors of the PPG and EMG signals, emotion-related features were extracted using the following process. The features extracted in this way are complementary between PPG and EMG and contain information about the overall details over time.
First, effective emotion-related features were obtained by passing each latent vector of the PPG and EMG signals through a 1D convolutional layer. Complementary features of the PPG and EMG signals were then extracted by concatenating the two feature streams and passing them through a further 1D convolutional layer. In each stream, the first 1D convolutional layer was followed by batch normalization and max pooling to alleviate internal covariate shift and to pass strong features with emotional characteristics to the next layer. However, max pooling may discard the delicate representations that capture sophisticated emotional information; therefore, only batch normalization was applied after the second convolutional layer. When fusing features through concatenation, this could be performed after arranging the features in a row through flattening, as shown in Figure 8a. However, to preserve the overall details over time (i.e., the temporal information), we did not employ flattening; instead, temporal-wise concatenation was performed so that the features are fused time step by time step, as shown in Figure 8b.
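A hedged Keras sketch of the two-stream structure described above: each latent vector passes through Conv1D–BatchNorm–MaxPool and then Conv1D–BatchNorm (no pooling), and the two streams are fused per time step by concatenating along the channel axis (Figure 8b) rather than flattening. Filter counts and kernel sizes are assumptions.

```python
from tensorflow.keras import layers, models

def stream(latent_len, name):
    """One branch: Conv1D -> BN -> MaxPool, then Conv1D -> BN (no pooling)."""
    inp = layers.Input(shape=(latent_len, 1), name=f"{name}_latent")
    x = layers.Conv1D(32, 3, padding="same", activation="relu")(inp)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(64, 3, padding="same", activation="relu")(x)
    x = layers.BatchNormalization()(x)          # no pooling: keep delicate detail
    return inp, x

def build_coherence_module(latent_len=32):
    ppg_in, ppg_feat = stream(latent_len, "ppg")
    emg_in, emg_feat = stream(latent_len, "emg")
    # Temporal-wise fusion: concatenate along the channel axis so each time step
    # keeps its own PPG and EMG features, instead of flattening first.
    fused = layers.Concatenate(axis=-1)([ppg_feat, emg_feat])
    fused = layers.Conv1D(64, 3, padding="same", activation="relu")(fused)
    return models.Model([ppg_in, emg_in], fused)
```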

2.4. Physiological Common Feature Module

There are two main methods of extracting features from physiological signals for emotion recognition. One is statistical feature extraction, which extracts features based on statistical properties of the signal. The other is deep-learning-based feature extraction, which extracts features through a deep learning model.
Statistical features (also known as hand-crafted features) are less reliable because humans decide which features are necessary: researchers select the statistical features they judge to be related to the target task, although it is unclear whether those features are actually relevant. Therefore, deep-learning-based feature extraction is currently used extensively. Although it can automatically extract large amounts of important information from the signal and thereby obtain features related to emotion recognition, it has the disadvantage of not providing information about the frequency band.
To compensate for the disadvantages of each method, the STFT results were fed to a deep learning model to extract features that include frequency-band information in addition to information about the overall details over time.

2.4.1. Signal Conversion from Time Domain to Time-Frequency Domain Using the Short-Time Fourier Transform

The STFT is a Fourier-related transform used to determine the sinusoidal frequency and phase content of local sections of a signal as it changes over time [29]. Although the fast Fourier transform (FFT) can clearly indicate the frequencies present in a signal, it cannot easily show how those frequencies change over time. In contrast, the STFT can track frequency content over time because it performs a Fourier transform on successive time sections of the signal.
The process of the STFT is depicted in Figure 9. Here, the hop length is the distance the analysis window jumps from the current section to the next, and the overlap length is the overlap between the current window and the next. The resulting STFT values are complex numbers containing both magnitude and phase information; therefore, the use of complex numbers is inevitable when both the magnitude and phase of the STFT are exploited, as shown in Figure 10.
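As a concrete illustration, the sketch below computes the complex-valued STFT of a segment with SciPy; the window length and overlap are illustrative assumptions (the sampling rate is 128 Hz for DEAP or 100 Hz for EDPE).

```python
from scipy.signal import stft

def to_time_frequency(segment, fs=128, nperseg=64, noverlap=48):
    """Short-time Fourier transform of a 1D segment.

    Returns a complex-valued matrix Z (frequency bins x time frames);
    the hop length is nperseg - noverlap. Window/hop sizes are assumptions.
    """
    f, t, Z = stft(segment, fs=fs, nperseg=nperseg, noverlap=noverlap)
    return Z   # magnitude: np.abs(Z), phase: np.angle(Z)

# The complex matrix is then split into real and imaginary channels
# and fed to the complex-valued convolutional layers described below.
```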

2.4.2. Complex-Valued Convolutional Neural Network (CVCNN)

In general neural networks, neurons have weights, inputs, and outputs in the real domain; such networks are called real-valued neural networks (RVNNs), and each neuron constituting an RVNN is called a real-valued neuron (RVN). The complex-valued STFT output mentioned in Section 2.4.1 cannot be processed with RVNs. Therefore, to handle complex values in deep learning, complex-valued neurons (CVNs) and complex-valued neural networks (CVNNs) are necessary.
The CVN has the same structure as an RVN, as depicted in Figure 11. However, the weights, inputs, and outputs of CVNs all exist in the complex domain. Therefore, they can be applied to various fields that use a complex system [30,31].
A real-valued convolution operation takes a matrix and a kernel (a smaller matrix) and outputs a matrix. The matrix elements are computed using a sliding window with the same dimensions as the kernel and each element is the sum of the point-wise multiplication of the kernel and matrix patch at the corresponding window.
Herein, we use the dot product to represent the sum of the point-wise multiplication between two matrices:
$$X \cdot A = \sum_{i}\sum_{j} X_{ij} A_{ij} \quad (6)$$
In the complex generalization, both the kernel and input patch are complex values. The only difference stems from the nature of multiplying complex numbers. When convolving a complex matrix with the kernel W = A + iB, the output corresponding to the input patch Z = X + iY is given by
$$Z \cdot W = (X \cdot A - Y \cdot B) + i\,(X \cdot B + Y \cdot A) \quad (7)$$
To implement the same functionality with a real-valued convolution, the input and output should be equivalent. Each complex matrix is represented by two real matrices stacked together in a three-dimensional array. Denoting this array [X, Y], it is equivalent to X + iY. X and Y are the array’s channels (Figure 12).
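The complex convolution rule in (7) can be realized with two real-valued convolutions over the stacked [X, Y] channels. The sketch below is a minimal Keras layer written for illustration; it is not the authors' exact CVCNN implementation.

```python
import tensorflow as tf
from tensorflow.keras import layers

class ComplexConv2D(layers.Layer):
    """Complex convolution built from two real-valued convolutions.

    Input: (..., H, W, 2*C) with the real channels first and the imaginary
    channels last. Implements Z*W = (X*A - Y*B) + i(X*B + Y*A) and returns
    the same stacked real/imaginary layout.
    """
    def __init__(self, filters, kernel_size, **kwargs):
        super().__init__(**kwargs)
        self.conv_a = layers.Conv2D(filters, kernel_size, padding="same")  # real kernel A
        self.conv_b = layers.Conv2D(filters, kernel_size, padding="same")  # imaginary kernel B

    def call(self, inputs):
        x, y = tf.split(inputs, 2, axis=-1)        # real part X, imaginary part Y
        real = self.conv_a(x) - self.conv_b(y)     # X*A - Y*B
        imag = self.conv_b(x) + self.conv_a(y)     # X*B + Y*A
        return tf.concat([real, imag], axis=-1)

# Usage: stack the real and imaginary parts of the STFT as channels, e.g.
# z = np.stack([Z.real, Z.imag], axis=-1).astype("float32")
# out = ComplexConv2D(16, 3)(z[None, ...])
```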

2.4.3. Feature Extraction of Physiological Common Features for PPG and EMG Signals

As mentioned at the beginning of this section, each feature extraction method has shortcomings. Therefore, to address these shortcomings, features were extracted while preserving both the general details over time and the frequency-band information of the signals by applying the STFT and the CVCNN, as explained previously. Figure 13 shows the structure of the proposed physiological common feature module.
As shown in Figure 13, the common features of the two biosignals (PPG and EMG) were extracted in this study, because extracting the features of PPG and EMG separately is inherently inefficient: selecting individual features provides too much input data for single-task learning and creates the possibility that each feature adversely affects the others and interferes with the task to be performed. In addition, the STFT output is composed of complex numbers and includes information on both magnitude and phase, so the use of complex numbers is unavoidable if both are to be exploited. Accordingly, a CVCNN was used to extract the features.
For these reasons, we extract the common features of the PPG and EMG signals through the CVCNN instead of extracting features for each signal separately.
Therefore, the total structure of our proposed real-time emotion recognition system can be represented as shown in Figure 14.
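To show how the two modules could be tied together into the overall system of Figure 14, the sketch below fuses the outputs of a coherence module (such as the two-stream model sketched in Section 2.3.2) and a common-feature module built from complex-valued convolutions, and adds a classification head. The flattening before fusion, the dense head, and the class count are assumptions.

```python
from tensorflow.keras import layers, models

def build_emotion_recognizer(coherence_module, common_module, n_classes=2):
    """Fuse coherence and common features and classify arousal/valence levels.

    `coherence_module` maps [ppg_latent, emg_latent] to a time x channel feature map;
    `common_module` maps the stacked STFT channels to a complex-valued feature map.
    Both are assumed to be single-output Keras models; the head is illustrative.
    """
    coh = layers.Flatten()(coherence_module.output)
    com = layers.Flatten()(common_module.output)
    fused = layers.Concatenate()([coh, com])
    x = layers.Dense(128, activation="relu")(fused)
    out = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(coherence_module.inputs + common_module.inputs, out)
```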

3. Experimental Results

3.1. Datasets

It is important to decide which dataset to use, since the type and characteristics of a dataset have a significant influence on the results. In particular, emotion recognition from physiological signals requires datasets containing the physiological signals themselves (not image-based datasets). We required a dataset containing PPG and EMG signals; thus, among the available datasets, we chose the DEAP dataset [12]. In addition, we created our own dataset, EDPE, for more granular emotions (as used in a previous study [32]).
Emotions can be affected by many factors, and each emotion has fuzzy boundaries. Therefore, it is ambiguous to quantify emotions or define them using objective criteria. Various models that define emotion have been developed, although most emotion recognition studies use Russell’s circumplex theory [33], which assumes emotions are distributed in a two-dimensional circular space with arousal and valence dimensions. Generally, arousal is considered as the vertical axis and valence the horizontal, with the origin (circle center) representing neutral valence and medium arousal level.
As shown in Figure 15, emotional states can be represented at any valence and arousal level. For example, “Excited” has high arousal and high valence, whereas “Depressed” has low arousal and low valence. Emotions can manifest in various ways, and current emotion recognition systems are generally based on facial expressions, voice, gestures, and text.

3.1.1. Database for Emotion Analysis Using Physiological Signals (DEAP)

The DEAP dataset contains 32-channel EEG and peripheral physiological signals (including PPG and EMG). These signals were measured from 32 participants (16 male and 16 female) who watched 40 music videos and self-assessed them on five criteria (including arousal and valence). The participants' ages ranged from 19 to 37 years (average 26.9 years), and self-evaluation was performed on a continuous scale from 1 to 9, except for familiarity (a discrete scale from 1 to 5). The participants first put on the recording device, which was started three seconds before each video so that the signals could also be measured in a calm state. They then watched the videos and completed the self-assessment after each video finished, and this procedure was repeated to collect the signals. The signals were recorded at 512 Hz and downsampled to 128 Hz. The dataset is summarized in Table 1.

3.1.2. Emotion Dataset Using PPG and EMG Signals (EDPE)

The EDPE dataset contains data from 40 participants (30 males and 10 females) who watched 32 videos that evoke specific emotions and then self-evaluated their arousal and valence. Each video lasted 3–5 min, and the total duration of the experiment was 2.5–3.0 h. The participants' ages ranged from 20 to 28 years, and the self-assessment used a four-step discrete evaluation of −2, −1, +1, +2 for both arousal and valence. Through these four-step self-assessments, emotions are classified into the 16 areas shown in Figure 16, rather than four areas, which makes it more efficient to recognize emotions at the level of adjectives. The overall experimental process was as follows. First, the participants attached the sensors and waited in a normal state for 10 min without signal measurement. They then watched videos corresponding to the four quadrants of Russell's model while the signals were measured, and after each video finished they completed a self-assessment. The measured signals are PPG and EMG, sampled at 100 Hz, as summarized in Table 2.
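Since the four-step ratings partition the valence–arousal plane into a 4 × 4 grid, a rating pair can be mapped to one of the 16 areas with a few lines of code. The cell-indexing convention below is an illustrative assumption, not the paper's exact numbering of Figure 16.

```python
def emotion_cell(valence, arousal):
    """Map a (-2, -1, +1, +2) valence/arousal rating pair to one of 16 cells.

    Cells are indexed 0-15 row-major over a 4x4 valence-arousal grid; the
    ordering of cells (and their adjective labels in Figure 16) is an
    illustrative convention, not the paper's exact numbering.
    """
    levels = [-2, -1, 1, 2]
    if valence not in levels or arousal not in levels:
        raise ValueError("ratings must be one of -2, -1, +1, +2")
    return levels.index(arousal) * 4 + levels.index(valence)

# Example: emotion_cell(+2, +2) -> 15 (the high-valence, high-arousal corner)
```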

3.2. Experimental Setup

The experiment was conducted by setting the single-pulse lengths to 86 and 140 data points for the DEAP and EDPE datasets, respectively. Each sample was preprocessed in MATLAB (R2020b), and training was conducted using TensorFlow (2.6.0) and Keras (2.6.0). In cognitive engineering, HCI, emotion engineering, medicine, and brain–computer interface applications, perceiving an individual's emotional state is very important; hence, the experiment in this study was not conducted in a subject-independent manner. Instead, 80% of the samples were randomly selected for training and 20% for testing, for both the DEAP and EDPE datasets.
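The random (not subject-independent) 80/20 split can be reproduced with a standard utility; the array names, shapes, and random seed below are placeholders for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the real pulse-sequence samples and labels
# (names and shapes are illustrative, not from the paper).
X = np.random.rand(1000, 10 * 86)        # 1000 samples of 10-pulse sequences
y = np.random.randint(0, 2, size=1000)   # binary high/low labels for arousal or valence

# Random 80/20 split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42)
```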
In addition, the accuracy for each pulse number was measured to determine how many pulses from each dataset were suitable for recognizing emotions. Based on the appropriate number of pulses confirmed in this experiment, the performance of the proposed algorithm was compared with that of other algorithms. Finally, by learning each of the 16 emotions classified by arousal and valence, an experiment was conducted to determine how well the emotion classes could be recognized.

3.3. Classification Results on the DEAP Dataset

Figure 17 shows the average accuracies of valence and arousal according to the number of pulses in the DEAP dataset. When each sample contained only a single pulse, the accuracy was very low (47%). As the number of pulses increased from 1 to 10, the accuracy increased rapidly, whereas beyond 10 pulses it remained nearly the same. Although the rapid increase stopped at 10 pulses, the ideal number of pulses for the DEAP dataset was set to 15, because this produced the optimum performance (75%).
Table 3 compares the emotion recognition results on the DEAP dataset. The proposed method achieved accuracies of 75.76% and 74.32% for arousal and valence, respectively. Except for the studies in [10,34,35], the proposed method shows the best performance and is also superior in terms of the recognition interval. Compared to the proposed method, [34] performs worse in valence but better in arousal, while [10] performs better in valence but worse in arousal. In the case of [35], both the valence and arousal accuracies are higher than those of the proposed method. However, a direct comparison is difficult because these three studies use a longer recognition interval than the proposed method.
Therefore, as shown in Table 4, the performance was compared again with the recognition interval matched to the same 15 s as the proposed method. Because [10] used an LSTM built for relatively long-term signals, its performance decreased significantly compared with [34,35]. When the proposed method and the studies [10,34,35] are compared under the same conditions, the proposed method shows the best performance.

3.4. Classification Results on the EDPE Dataset

Figure 18 shows the average accuracies of valence and arousal according to the number of pulses in the EDPE dataset. When each sample contained only a single pulse, the accuracy was very low (46%). As the number of pulses increased from 1 to 10, the accuracy increased rapidly, whereas beyond that number it decreased. Therefore, for the EDPE dataset the performance was optimal (85%) with 10 pulses, and the ideal number of pulses was accordingly set to 10.
Figure 19 shows the confusion matrices of the arousal and valence results when the number of pulses in the EDPE dataset was set to 10. Most counts lie on the main diagonal, where the predictions match the ground truth, indicating that the learning was successful. The off-diagonal entries with relatively high counts are the <Very High–High> and <Very Low–Low> pairs; that is, Very High was occasionally confused with High (and vice versa), and Very Low with Low (and vice versa). Even with these cases of confusion, the overwhelming majority of correct predictions demonstrates the excellent classification performance of the proposed method.
Experiments were also conducted with various deep learning models based on a CNN and LSTM (commonly used in deep learning models) using the same EDPE dataset. Although CNNs are one of the most-used deep neural networks for analyzing visual images, they have frequently been employed in recent emotion recognition research by analyzing patterns of adjacent physiological signals. Therefore, we compared the performance of CNNs and models that combined a stacked autoencoder and a CNN or LSTM. Finally, the performance of the bimodal stacked sparse autoencoder [32] was compared. Table 5 summarizes the experimental results of emotion recognition.
As shown in Table 5, the performance was low when recognizing emotions using LSTM. This result indicated that the data were not just time dependent but also more complex. Therefore, this suggested that improved results could be obtained by analyzing data patterns using a fully connected layer and a convolutional layer. As a result, our proposed model outperformed the other deep learning models.
Recognizing the highs and lows of arousal and valence is very different from recognizing an emotion itself. Being able to recognize arousal well does not necessarily mean that valence can also be recognized well, and vice versa. In other words, recognizing emotions, in which the arousal and valence criteria are applied simultaneously as shown in Figure 16, is a more complicated and difficult problem than recognizing high and low levels of arousal or valence. Therefore, to recognize emotions, we reconstructed the EDPE dataset with data and labels for each of the 16 emotions in Figure 16 and again divided the samples into 80% for training and 20% for testing.
Table 6 presents the recognition results for the 16 emotions, which displayed an average recognition accuracy of 82.52%. Although this result was slightly lower than the recognition accuracy for arousal and valence, it was sufficiently accurate to be applied successfully in real-life scenarios, considering the difficulty of recognizing 16 emotions compared to each recognition task for arousal and valence.

4. Conclusions

This paper proposed a novel approach for real-time emotion recognition using physiological signals (PPG and EMG) through the extraction of physiologically common features via a CVCNN. The results indicated that the proposed approach achieved an accuracy of 81.78%, which is competitive with existing methods. Furthermore, we confirmed that the recognition interval was significantly shorter than in other studies, rendering the proposed method suitable for real-time emotion recognition.
The findings of this study suggest that the proposed approach has the potential to be applied in various fields, such as healthcare, human–computer interactions, and affective computing. Moreover, this study provides insights into the relationship between physiological signals and emotions, which can further advance our understanding of the human affective system.
While the proposed approach shows promise in real-time emotion recognition using physiological signals, there are some limitations. Firstly, the concept of cross-subject analysis, which involves analyzing data from multiple subjects, is not incorporated in this study. This limits the generalizability of the findings to a broader population. Next, the experiments were conducted in a controlled laboratory setting, which may not fully capture the range of emotions experienced in real-life situations. Therefore, there is a need for future research to address these limitations.
In light of these limitations, future research should consider conducting experiments in in-the-wild environments to better understand the applicability of the proposed approach in real-world scenarios. This would provide a more comprehensive understanding of how emotions manifest in different contexts. In addition, by exploiting the properties of the matrix produced by the STFT, novel approaches such as spectrogram-based models [38] or graph transformer models [39,40] could be derived. Furthermore, it is important to expand the scope of the investigation beyond short-term emotion recognition; long-term emotion recognition should be explored to gain insights into how emotions evolve and fluctuate over extended periods of time.
Moreover, future research could focus on defining and recognizing personality traits based on changes in emotions. By studying the relationship between emotions and personality, we can gain a deeper understanding of the human affective system. This would not only contribute to the field of affective computing, but also have practical implications in various domains, such as healthcare and human–computer interactions.
In summary, by addressing the limitations related to cross-subject analysis and conducting experiments in real-life settings, future research can enhance the applicability and generalizability of the proposed approach. Additionally, exploring long-term emotion recognition and its connection to personality traits would provide valuable insights into the complex nature of human emotions.

Author Contributions

Conceptualization, E.-G.H., T.-K.K. and M.-T.L.; data curation, E.-G.H.; formal analysis, E.-G.H., T.-K.K. and M.-T.L.; methodology, E.-G.H., T.-K.K. and M.-T.L.; software, E.-G.H. and T.-K.K.; validation, M.-T.L. and T.-K.K.; writing—original draft, E.-G.H. and T.-K.K.; writing—review and editing, M.-T.L. and T.-K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) (grant no. NRF-2022R1F1A1073543).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ali, M.; Mosa, A.H.; Al Machot, F.; Kyamakya, K. Emotion recognition involving physiological and speech signals: A comprehensive review. In Recent Advances in Nonlinear Dynamics and Synchronization; Springer: Berlin/Heidelberg, Germany, 2018; pp. 287–302.
2. Sim, H.; Lee, W.H.; Kim, J.Y. A Study on Emotion Classification utilizing Bio-Signal (PPG, GSR, RESP). Adv. Sci. Technol. Lett. 2015, 87, 73–77.
3. Chen, J.; Hu, B.; Moore, P.; Zhang, X.; Ma, X. Electroencephalogram-based emotion assessment system using ontology and data mining techniques. Appl. Soft Comput. 2015, 30, 663–674.
4. Shu, L.; Xie, J.; Yang, M.; Li, Z.; Li, Z.; Liao, D.; Xu, X.; Yang, X. A review of emotion recognition using physiological signals. Sensors 2018, 18, 2074.
5. Houssein, E.H.; Hammad, A.; Ali, A.A. Human emotion recognition from EEG-based brain–computer interface using machine learning: A comprehensive review. Neural Comput. Appl. 2022, 34, 12527–12557.
6. Al-Qazzaz, N.K.; Alyasseri, Z.A.A.; Abdulkareem, K.H.; Ali, N.S.; Al-Mhiqani, M.N.; Guger, C. EEG feature fusion for motor imagery: A new robust framework towards stroke patients rehabilitation. Comput. Biol. Med. 2021, 137, 104799.
7. Sung, W.T.; Chen, J.H.; Chang, K.W. Study on a real-time BEAM system for diagnosis assistance based on a system on chips design. Sensors 2013, 13, 6552–6577.
8. Wen, T.; Zhang, Z. Deep convolution neural network and autoencoders-based unsupervised feature learning of EEG signals. IEEE Access 2018, 6, 25399–25410.
9. Alhagry, S.; Fahmy, A.A.; El-Khoribi, R.A. Emotion recognition based on EEG using LSTM recurrent neural network. Int. J. Adv. Comput. Sci. Appl. 2017, 8, 355–358.
10. Xing, X.; Li, Z.; Xu, T.; Shu, L.; Hu, B.; Xu, X. SAE + LSTM: A new framework for emotion recognition from multi-channel EEG. Front. Neurorobot. 2019, 13, 37.
11. Soleymani, M.; Lichtenauer, J.; Pun, T.; Pantic, M. A multimodal database for affect recognition and implicit tagging. IEEE Trans. Affect. Comput. 2011, 3, 42–55.
12. Koelstra, S.; Muhl, C.; Soleymani, M.; Lee, J.S.; Yazdani, A.; Ebrahimi, T.; Pun, T.; Nijholt, A.; Patras, I. DEAP: A database for emotion analysis using physiological signals. IEEE Trans. Affect. Comput. 2011, 3, 18–31.
13. Zheng, W.L.; Zhu, J.Y.; Lu, B.L. Identifying stable patterns over time for emotion recognition from EEG. IEEE Trans. Affect. Comput. 2017, 10, 417–429.
14. Lin, Y.P.; Wang, C.H.; Jung, T.P.; Wu, T.L.; Jeng, S.K.; Duann, J.R.; Chen, J.H. EEG-based emotion recognition in music listening. IEEE Trans. Biomed. Eng. 2010, 57, 1798–1806.
15. Chanel, G.; Kronegg, J.; Grandjean, D.; Pun, T. Emotion assessment: Arousal evaluation using EEG's and peripheral physiological signals. In Proceedings of the Multimedia Content Representation, Classification and Security: International Workshop, MRCS 2006, Istanbul, Turkey, 11–13 September 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 530–537.
16. Udovičić, G.; Ðerek, J.; Russo, M.; Sikora, M. Wearable emotion recognition system based on GSR and PPG signals. In Proceedings of the 2nd International Workshop on Multimedia for Personal Health and Health Care, Mountain View, CA, USA, 23 October 2017; pp. 53–59.
17. Li, C.; Xu, C.; Feng, Z. Analysis of physiological for emotion recognition with the IRS model. Neurocomputing 2016, 178, 103–111.
18. Lee, Y.K.; Kwon, O.W.; Shin, H.S.; Jo, J.; Lee, Y. Noise reduction of PPG signals using a particle filter for robust emotion recognition. In Proceedings of the 2011 IEEE International Conference on Consumer Electronics—Berlin (ICCE—Berlin), Berlin, Germany, 3–6 September 2011; pp. 202–205.
19. Noroznia, H.; Gandomkar, M.; Nikoukar, J.; Aranizadeh, A.; Mirmozaffari, M. A Novel Pipeline Age Evaluation: Considering Overall Condition Index and Neural Network Based on Measured Data. Mach. Learn. Knowl. Extr. 2023, 5, 252–268.
20. Mirmozaffari, M.; Yazdani, M.; Boskabadi, A.; Ahady Dolatsara, H.; Kabirifar, K.; Amiri Golilarz, N. A novel machine learning approach combined with optimization models for eco-efficiency evaluation. Appl. Sci. 2020, 10, 5210.
21. Martinez, H.P.; Bengio, Y.; Yannakakis, G.N. Learning deep physiological models of affect. IEEE Comput. Intell. Mag. 2013, 8, 20–33.
22. Ozbulak, U.; Gasparyan, M.; Rao, S.; De Neve, W.; Van Messem, A. Exact Feature Collisions in Neural Networks. arXiv 2022, arXiv:2205.15763.
23. Wu, C.K.; Chung, P.C.; Wang, C.J. Representative segment-based emotion analysis and classification with automatic respiration signal segmentation. IEEE Trans. Affect. Comput. 2012, 3, 482–495.
24. Picard, R.W.; Vyzas, E.; Healey, J. Toward machine emotional intelligence: Analysis of affective physiological state. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 1175–1191.
25. Zeng, Z.; Pantic, M.; Roisman, G.I.; Huang, T.S. A survey of affect recognition methods: Audio, visual and spontaneous expressions. In Proceedings of the 9th International Conference on Multimodal Interfaces, Aichi, Japan, 12–15 April 2007; pp. 126–133.
26. Masci, J.; Meier, U.; Cireşan, D.; Schmidhuber, J. Stacked convolutional auto-encoders for hierarchical feature extraction. In Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2011: 21st International Conference on Artificial Neural Networks, Espoo, Finland, 14–17 June 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 52–59.
27. Wang, Y.; Xie, Z.; Xu, K.; Dou, Y.; Lei, Y. An efficient and effective convolutional auto-encoder extreme learning machine network for 3d feature learning. Neurocomputing 2016, 174, 988–998.
28. Huang, H.; Hu, X.; Zhao, Y.; Makkie, M.; Dong, Q.; Zhao, S.; Guo, L.; Liu, T. Modeling task fMRI data via deep convolutional autoencoder. IEEE Trans. Med. Imaging 2017, 37, 1551–1561.
29. Sejdic, E.; Djurovic, I.; Jiang, J. Time–frequency feature representation using energy concentration: An overview of recent advances. Digit. Signal Process. 2009, 19, 153–183.
30. Amin, M.F.; Murase, K. Single-layered complex-valued neural network for real-valued classification problems. Neurocomputing 2009, 72, 945–955.
31. Zimmermann, H.G.; Minin, A.; Kusherbaeva, V. Comparison of the complex valued and real valued neural networks trained with gradient descent and random search algorithms. In Proceedings of ESANN 2011, Bruges, Belgium, 27–29 April 2011.
32. Lee, Y.K.; Pae, D.S.; Hong, D.K.; Lim, M.T.; Kang, T.K. Emotion Recognition with Short-Period Physiological Signals Using Bimodal Sparse Autoencoders. Intell. Autom. Soft Comput. 2022, 32, 657–673.
33. Russell, J.A. A circumplex model of affect. J. Personal. Soc. Psychol. 1980, 39, 1161.
34. Zhang, Q.; Chen, X.; Zhan, Q.; Yang, T.; Xia, S. Respiration-based emotion recognition with deep learning. Comput. Ind. 2017, 92, 84–90.
35. Topic, A.; Russo, M. Emotion recognition based on EEG feature maps through deep learning network. Eng. Sci. Technol. Int. J. 2021, 24, 1442–1454.
36. Xu, H.; Plataniotis, K.N. EEG-based affect states classification using deep belief networks. In Proceedings of the IEEE 2016 Digital Media Industry & Academic Forum (DMIAF), Santorini, Greece, 4–6 July 2016; pp. 148–153.
37. Mert, A.; Akan, A. Emotion recognition from EEG signals by using multivariate empirical mode decomposition. Pattern Anal. Appl. 2018, 21, 81–89.
38. Pusarla, N.; Singh, A.; Tripathi, S. Learning DenseNet features from EEG based spectrograms for subject independent emotion recognition. Biomed. Signal Process. Control 2022, 74, 103485.
39. Yun, S.; Jeong, M.; Kim, R.; Kang, J.; Kim, H.J. Graph transformer networks. Adv. Neural Inf. Process. Syst. 2019, 32. Available online: https://proceedings.neurips.cc/paper_files/paper/2019/file/9d63484abb477c97640154d40595a3bb-Paper.pdf (accessed on 22 May 2023).
40. Dwivedi, V.P.; Bresson, X. A generalization of transformer networks to graphs. arXiv 2020, arXiv:2012.09699.
Figure 1. Overview of the proposed system.
Figure 2. Examples of PPG and EMG signals.
Figure 3. Results of single-pulse segmentation for PPG and EMG signals.
Figure 4. Effect on single-pulse signals with different thresholds.
Figure 5. General structure of an autoencoder in which the encoder and decoder are neural networks.
Figure 6. Architecture of a convolutional autoencoder for 1D signals.
Figure 7. Architecture of the physiological coherence feature module.
Figure 8. Effects of flattening on feature fusion.
Figure 9. Process of the STFT.
Figure 10. STFT process for the physiological common feature module.
Figure 11. Structure of an RVN and a CVN.
Figure 12. Process of complex-valued convolution.
Figure 13. Structure of the physiological common feature module.
Figure 14. Total structure of the proposed emotion recognition system.
Figure 15. Russell's circumplex model [33].
Figure 16. Proposed emotion plane (valence–arousal plane).
Figure 17. Classification accuracy of the DEAP dataset according to pulse length.
Figure 18. Classification accuracy of the EDPE dataset according to pulse length.
Figure 19. Confusion matrix of the classification result on the EDPE dataset.
Table 1. DEAP dataset summary.

DEAP Dataset Experiment
Participants: 32 (male: 16, female: 16)
Videos: 40 music videos
Age: between 19 and 37
Rating categories: Arousal, Valence, Dominance, Liking, Familiarity
Rating values: Familiarity: discrete scale of 1–5; others: continuous scale of 1–9
Recorded signals: 32-channel EEG; peripheral physiological signals; face video (only for 22 participants)
Sampling rate: 512 Hz (or downsampled to 128 Hz)
Table 2. EDPE dataset summary.

EDPE Dataset Experiment
Participants: 40 (male: 30, female: 10)
Videos: 32 videos
Age: between 20 and 28
Rating categories: Arousal, Valence
Rating values: discrete scale of −2, −1, +1, +2
Proposed emotion states: 16 emotions depicted in Figure 16
Recorded signals: PPG, EMG
Sampling rate: 100 Hz
Table 3. Comparison with other studies using the DEAP dataset.

Method | Recognition Interval | Signals | Accuracy (Arousal) | Accuracy (Valence)
Naïve Bayes with Statistical Features (Koelstra, 2011) [12] | 63 s | GSR, RSP, SKT, PPG, EMG, EOG | 57% | 62.7%
CNN (Martinez, 2013) [21] | 30 s | BVP, SC | 69.1% | 63.3%
DBN (Xu, 2016) [36] | 60 s | EEG | 69.8% | 66.9%
Deep Sparse AE (Zhang, 2017) [34] | 20 s | RSP | 80.78% | 73.06%
MEMD (Mert, 2018) [37] | 60 s | EEG | 75% | 72.87%
SAE-LSTM (Xing, 2019) [10] | 60 s | EEG | 74.38% | 81.1%
HOLO-FM (Topic, 2021) [35] | 60 s | EEG | 77.72% | 76.61%
Proposed Method | 15 s | PPG, EMG | 75.76% | 74.32%
Table 4. Re-comparison with the top-3 studies in Table 3 (with the recognition interval set to 15 s).

Method | Recognition Interval | Signals | Accuracy (Arousal) | Accuracy (Valence)
Deep Sparse AE (Zhang, 2017) [34] | 15 s | RSP | 69.8% | 70.67%
SAE-LSTM (Xing, 2019) [10] | 15 s | EEG | 54.46% | 50.98%
HOLO-FM (Topic, 2021) [35] | 15 s | EEG | 70.54% | 72.32%
Proposed Method | 15 s | PPG, EMG | 75.76% | 74.32%
Table 5. Comparison with various deep learning models.

Model | Dataset | Recognition Interval | Accuracy (Arousal) | Accuracy (Valence)
CNN | EDPE dataset | 10 s | 70.24% | 74.34%
Stacked Auto-encoder + CNN | EDPE dataset | 10 s | 71.47% | 72.01%
Stacked Auto-encoder + LSTM | EDPE dataset | 10 s | 61.03% | 59.25%
Bimodal-Stacked Auto-encoder [32] | EDPE dataset | 10 s | 75.86% | 80.18%
Proposed Model | EDPE dataset | 10 s | 84.84% | 86.50%
Table 6. Results of emotion recognition for sixteen emotions by the proposed model.

Quadrant I (HVHA): Astonished 85.37% | Convinced 87.09% | Excited 81.34% | Delighted 80.20%
Quadrant II (LVHA): Distress 78.35% | Disgust 80.97% | Annoyed 75.26% | Impatient 82.97%
Quadrant III (LVLA): Sad 79.61% | Anxious 82.89% | Worried 77.49% | Bored 90.04%
Quadrant IV (HVLA): Confident 85.08% | Serious 83.24% | Pleased 86.87% | Calm 83.40%

HVHA: High Valence High Arousal, LVLA: Low Valence Low Arousal, LVHA: Low Valence High Arousal, HVLA: High Valence Low Arousal.
