## **1. Introduction**

Emotion recognition plays an important role in various areas of life, especially in the fields of Active and Assisted Living (AAL) [1] and Driver Assistance Systems (DAS) [2]. Recognizing emotions automatically is one of the technical enablers of AAL, as it is considered a significant help for monitoring and observing the mental state of elderly or disabled persons.

Furthermore, according to the most recent related publications, the classification performance of emotion recognition approaches has been improving significantly, and the opportunities for automatic emotion recognition systems continue to grow.

Emotions can be recognized in various ways. The most well-known models for emotion recognition are the "discrete emotion model" proposed by Ekman [3] and the "dimensional emotion model" proposed by Lang [4]. The discrete emotion model categorizes emotions into six basic emotional states: surprise, anger, disgust, happiness, sadness and fear [3]. These emotions are universal, biologically experienced by all humans, and widely accepted as such in the research community. In contrast to the discrete emotion model, the dimensional model assumes that emotions are a combination of several psychological dimensions. The most well-known dimensional model is the "valence-arousal model". Valence represents a form of pleasure level and ranges from negative to positive, whereas arousal indicates the physiological and/or psychological level of being awake and ranges from low to high [5].

Overall, researchers in the field have used two major approaches to recognize emotions. The first consists of feature engineering-based approaches [6] and the second involves Deep Learning (DL) [7]. In the feature engineering approach, human emotion recognition involves several steps ranging from collecting raw sensor data up to the final conclusion about the current emotional state. The steps involved are the following [8], illustrated in the sketch below: (1) preprocessing of the raw data from sensor streams to handle incompleteness, eliminate noise and redundancy, and perform data aggregation and normalization; (2) feature extraction, i.e., extracting the main characteristics of the raw signals (e.g., temporal and spatial information); (3) dimensionality reduction to decrease the number of features, increase their quality, and reduce the computational effort needed for the classification task; and (4) classification based on machine-learning and reasoning techniques to recognize the effective emotion class.
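For concreteness, the following is a minimal sketch of such a four-step pipeline applied to windowed EDA signals. The filter size, the statistical features, and the PCA/SVM choices are illustrative assumptions, not the specific methods of the works cited above.

```python
# Minimal sketch of the four-step feature-engineering pipeline (assumptions:
# windowed raw EDA signals, simple statistical features, PCA + SVM).
import numpy as np
from scipy.signal import medfilt
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def preprocess(window: np.ndarray) -> np.ndarray:
    """Step 1: denoise with a median filter, then normalize the window."""
    filtered = medfilt(window, kernel_size=5)
    return (filtered - filtered.mean()) / (filtered.std() + 1e-8)

def extract_features(window: np.ndarray) -> np.ndarray:
    """Step 2: simple temporal statistics of one EDA window."""
    diff = np.diff(window)
    return np.array([window.mean(), window.std(), window.min(),
                     window.max(), diff.mean(), diff.std()])

# Placeholder data: n_windows x window_length raw EDA windows and labels.
X_raw = np.random.randn(100, 256)
y = np.random.randint(0, 2, size=100)

X = np.array([extract_features(preprocess(w)) for w in X_raw])

# Steps 3 and 4: dimensionality reduction (PCA) followed by classification (SVM).
clf = make_pipeline(StandardScaler(), PCA(n_components=3), SVC(kernel="rbf"))
clf.fit(X, y)
```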

On the other hand, DL does not necessarily require the feature engineering/extraction step, since DL models extract features internally and/or implicitly during the training phase [9]. Therefore, they have shown promising results when combining different physiological signals for human emotion recognition [10,11].

Additionally, DL has shown promising results in other research fields and applications, e.g., the identification of gas mixtures [12], the classification of tea specimens [13], and cardiac arrhythmia detection [14,15].

Generally, subject-independent emotion recognition is challenging because (a) physiological expressions of emotion depend on age, gender, culture, and other social factors [16]; (b) they also depend on the environment in which a subject lives; (c) subject independence means that the system is trained on one group of subjects and tested on a different group; and (d) lab-setting independence means that the classifier is trained once, locally, using the sensors of a given lab setting and is afterwards tested on datasets collected under different lab settings. The motivation for developing such a generalized model is that collecting training data for each individual subject is unrealistic and far from practical.

Based on the previous facts, a concept to improve the performance of subject-dependent and subject-independent human emotion recognition systems is required. In this paper, we use solely EDA (electrodermal activity) biosignals with a deep-learning model based on convolutional neural networks (CNNs) that extracts the required features internally and performs well when applied to new subjects. Although researchers have previously used CNNs to classify human emotions from EDA, none of the proposed architectures performs better than the model proposed in this paper.
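To illustrate how a CNN can learn features directly from raw EDA windows without a separate feature engineering step, the following is a minimal Keras sketch; the layer sizes and window length are illustrative assumptions and do not reproduce the exact architecture proposed in this paper (described in Section 4).

```python
# Minimal sketch of a 1D CNN over raw EDA windows (all sizes are assumptions).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_eda_cnn(window_length: int = 256, n_classes: int = 2) -> tf.keras.Model:
    model = models.Sequential([
        layers.Input(shape=(window_length, 1)),        # one EDA channel
        layers.Conv1D(16, kernel_size=7, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(32, kernel_size=5, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.GlobalAveragePooling1D(),               # features learned internally
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```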

The contribution of this paper significantly increases the performance of human emotion recognition approaches using only EDA sensors compared to state-of-the-art approaches involving the same EDA signals. Furthermore, the obtained results underscore the novel and interesting prospect that other (mostly "highly intrusive") physiological sensors might be replaced by "only slightly intrusive" EDA-based sensors in this research field.

The structure of the paper is as follows: Section 2 presents an overview of the state-of-the-art approaches. Section 3 introduces the datasets. Section 4 portrays the overall architecture of the proposed classification model. Sections 5 and 6 present the overall results and the related discussions, respectively. The paper ends with a conclusion in Section 7.

## **2. Related Works**

Regarding human emotion recognition based on EDA sensors, which can be embedded in smart wearable devices, few works have been published so far. For instance, in [17], the authors proposed a system to recognize the driver's emotional state after transforming the EDA signals using a short-time Fourier transform. They considered three classes: neutral-stress, neutral-anger, and stress-anger.

Furthermore, in [18], the authors applied a convex optimization-based electrodermal activity (cvxEDA) framework and clustering algorithms to automatically classify the arousal and valence levels induced by affective sound stimuli.

In the literature, it has been shown that the nature of the stimulus plays an important role in increasing the EDA response, which makes the emotion recognition process less complex [19]. Furthermore, other works showed promising results when EDA responses are modulated by emotional music [20,21]. Consequently, these results encouraged researchers to work on classifying arousal and valence levels induced by auditory stimuli.

In [22], the authors used the AVEC 2016 dataset [23,24] and proposed a deep-learning model that consists of a CNN followed by a recurrent neural network and then fully connected layers. They showed that an end-to-end deep-learning approach operating directly on raw signals can replace feature engineering for emotion recognition purposes.
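The following is a minimal sketch of such an end-to-end CNN-plus-RNN architecture in the spirit of [22]; all dimensions and layer choices are illustrative assumptions, not the exact model of that work.

```python
# Minimal sketch of an end-to-end CNN + RNN + fully connected model over a raw
# physiological signal (all sizes are assumptions for illustration).
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(1024, 1)),                    # raw signal window
    layers.Conv1D(32, kernel_size=9, strides=2, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.LSTM(64),                                  # recurrent layer over CNN features
    layers.Dense(32, activation="relu"),              # fully connected layers
    layers.Dense(2, activation="softmax"),            # e.g., high/low arousal
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```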

Moreover, the use of different physiological signals has been previously investigated [25,26]. However, mounting several types of sensors on the human body is neither preferred nor well accepted. In [26], the authors fused different types of sensors, namely ECG (electrocardiogram), EDA, and ST (skin temperature), through a hybrid neural model that combines cellular neural networks and echo state networks to recognize four classes of valence and arousal: high valence high arousal, high valence low arousal, low valence high arousal, and low valence low arousal. In [25], the authors combined facial electromyograms, electrocardiogram, respiration, and EDA data collected during racing conditions. The emotional classes identified are high stress, low stress, disappointment, and euphoria. Support vector machines (SVMs) and an adaptive neuro-fuzzy inference system (ANFIS) were used for the classification.

In [27], the researchers reported results using only EDA to recognize four different states (joy, anger, sadness, and pleasure) induced by music, using 193 features together with a genetic algorithm and the k-nearest neighbor method.

Table 1 shows a summary of the state-of-the-art for human emotion recognition using physiological signals. More details regarding state-of-the-art experiments and obtained results can be found in Section 6.

The limitations of the state-of-the-art can be summarized in three major points. First, there is a lack of generalized models to recognize human emotions based on EDA signals (i.e., published works do not comprehensively consider the lab-setting independence property of emotion classifiers for EDA signals). Second, subject-independent human emotion recognition remains insufficiently addressed (i.e., published works do not comprehensively address the subject-independence property of emotion classifiers for EDA signals). Third, most published related works focus mostly on classifying only two (active/passive) emotional states.

In this work, we focus on the second and the third limitation. Classifying human emotions with respect to different lab settings is a research question that may require adjusting the raw data at the feature engineering level, which is not the focus of this work, since the CNN, as a deep-learning model, extracts the desired features internally.


**Table 1.** Summary of the state-of-the-art works for human emotion recognition using physiological signals.

SVM: Support Vector Machine, K-NN: K-Nearest Neighbor, CNN: Convolutional Neural Network, RNN: Recurrent Neural Network, ESN-CNN: Echo State Network - Cellular Neural Network.

## **3. Datasets**

This study uses public benchmark datasets (MAHNOB and DEAP) of physiological signals to test our proposal for a robust emotion recognition system. However, for both datasets, only the EDA-related data will be used in the experiments of this paper.
