**5. Results**

To gain a deeper understanding of the performance of the proposed CNN model, the MAHNOB and DEAP datasets were used to test the overall classification performance.

Moreover, the data distribution should be taken into account when choosing suitable classifiers for comparison. To this end, a Fisher mapping [43] was used to compute the three major discriminant scores of the investigated samples. From Figures 3 and 4, it can be concluded that the classes overlap heavily and that a class imbalance problem exists.
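This Fisher-mapping step can be sketched as follows. The snippet is a minimal illustration using scikit-learn's `LinearDiscriminantAnalysis` on synthetic stand-in data; the actual EDA features are not reproduced here, and with four classes at most three discriminant components exist, matching the three scores mentioned above.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Synthetic stand-in for EDA feature vectors belonging to 4 emotion classes.
X = rng.normal(size=(200, 10))
y = rng.integers(0, 4, size=200)

# Fisher (linear discriminant) mapping onto the three leading
# discriminant axes; with 4 classes, at most 3 components exist.
lda = LinearDiscriminantAnalysis(n_components=3)
scores = lda.fit_transform(X, y)
print(scores.shape)  # (200, 3)
```

The resulting three-dimensional scores can then be plotted to visualize class overlap, as in Figures 3 and 4.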

In this assessment, 10 subjects were selected from each of the MAHNOB and DEAP datasets. The data for each subject comprise four classes (see Sections 3.1 and 3.2). The average training time per subject was approximately 21 min.

Each considered EDA signal has a length of 2574 samples and is converted into a matrix of size 39 × 66. All results are reported for ten-fold cross-validation.
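The signal-to-matrix conversion can be sketched as follows, using a placeholder signal in place of real EDA data (the values are illustrative only):

```python
import numpy as np

# Placeholder stand-in for one EDA signal of length 2574.
signal = np.arange(2574, dtype=np.float32)

# Reshape into the 39 x 66 matrix fed to the CNN (39 * 66 = 2574).
matrix = signal.reshape(39, 66)
print(matrix.shape)  # (39, 66)
```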

Tables 4 and 5 present the average precision, recall, and f-measure on the DEAP and MAHNOB datasets, respectively, when training and testing are performed on the same subject. For each metric, the per-subject values were summed and divided by the total number of subjects. The main goal of this experiment is to assess the overall performance of subject-dependent EDA-based emotion classification.
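A minimal sketch of this subject-dependent evaluation protocol follows. It uses synthetic stand-in data and a random forest as an example classifier; the actual CNN, EDA features, and 10-subject setup are not reproduced here (the subject count is reduced for brevity):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
# Synthetic stand-in data: 3 subjects (reduced from 10 for brevity),
# 120 samples each, 4 balanced emotion classes, 8 placeholder features.
subjects = [(rng.normal(size=(120, 8)), np.repeat(np.arange(4), 30))
            for _ in range(3)]

per_subject = []
for X, y in subjects:
    # Ten-fold cross-validation within each subject (subject-dependent).
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    fold_scores = []
    for train_idx, test_idx in cv.split(X, y):
        clf = RandomForestClassifier(n_estimators=20, random_state=0)
        clf.fit(X[train_idx], y[train_idx])
        p, r, f, _ = precision_recall_fscore_support(
            y[test_idx], clf.predict(X[test_idx]),
            average="weighted", zero_division=0)
        fold_scores.append((p, r, f))
    per_subject.append(np.mean(fold_scores, axis=0))

# Average each metric over subjects, as reported in Tables 4 and 5.
avg_precision, avg_recall, avg_f1 = np.mean(per_subject, axis=0)
```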

Tables 6 and 7 present the precision, recall, and f-measure on the MAHNOB and DEAP datasets, respectively, when training and testing are performed on different subjects. The main goal of this experiment is to assess the overall performance of subject-independent EDA-based emotion classification.

In all tables, the proposed CNN model achieves the highest performance, followed by K-NN and random forest as the next-best classifiers. That K-NN and random forest perform comparatively well indicates that the dataset is not easily separable and that the nonlinearity is high, as can be observed in Figure 4. Accordingly, the decision boundaries generated by the other classifiers (see Tables 4–7) assign some points in the feature space to inappropriate regions, which K-NN and random forest avoid more successfully.

The performance metrics and the implementation are written in Python using NumPy (http://www.numpy.org/), Scikit-learn (https://scikit-learn.org/), and Keras (https://keras.io/). All performance metrics are calculated per class and then weighted to account for the class imbalance: the evaluation metrics for each label are averaged with weights given by the support, i.e., the number of true instances of each label.
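The support-weighted averaging described above can be illustrated with scikit-learn; the label sequences below are hypothetical:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical true and predicted labels for 4 imbalanced classes.
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3]
y_pred = [0, 0, 1, 1, 1, 2, 2, 3, 2, 3]

# Per-label metrics and supports (number of true instances per label).
p, r, f, support = precision_recall_fscore_support(y_true, y_pred,
                                                   zero_division=0)

# Support-weighted average, as described in the text.
weighted_f = np.average(f, weights=support)

# scikit-learn computes the same value directly with average="weighted".
_, _, f_w, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0)
assert np.isclose(weighted_f, f_w)
```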

Tables 8 and 9 show the confusion matrices for both MAHNOB and DEAP, averaged over training and testing on the same subjects and on different subjects, respectively.
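A confusion matrix of the kind reported in Tables 8 and 9 can be computed as follows; the label sequences below are purely illustrative and do not reproduce the reported results:

```python
from sklearn.metrics import confusion_matrix

# Illustrative true and predicted labels for the four
# valence/arousal classes (HVHA, HVLA, LVLA, LVHA).
labels = ["HVHA", "HVLA", "LVLA", "LVHA"]
y_true = ["HVHA", "HVHA", "HVLA", "LVLA", "LVLA", "LVHA", "LVHA", "HVLA"]
y_pred = ["HVHA", "HVLA", "HVLA", "LVLA", "LVHA", "LVHA", "HVHA", "HVLA"]

cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)  # rows: true class, columns: predicted class
```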

**Table 4.** Performance metrics for DEAP (the average performance results for training and testing on same subject).


**Table 5.** Performance metrics for MAHNOB (the average performance results for training and testing on same subject).


**Table 6.** Performance metrics for MAHNOB (the average performance results for training and testing on different subjects).


**Table 7.** Performance metrics for DEAP (the average performance results for training and testing on different subjects).


**Table 8.** Confusion matrix for both MAHNOB and DEAP (the average performance results for training and testing on same subjects).


C1: High Valence/High Arousal (HVHA), C2: High Valence/Low Arousal (HVLA), C3: Low Valence/Low Arousal (LVLA) and C4: Low Valence/High Arousal (LVHA).

**Table 9.** Confusion matrix for both MAHNOB and DEAP (the average performance results for training and testing on different subjects).


C1: High Valence/High Arousal (HVHA), C2: High Valence/Low Arousal (HVLA), C3: Low Valence/Low Arousal (LVLA) and C4: Low Valence/High Arousal (LVHA).
