Mental Pressure Recognition Method Based on CNN Model and EEG Signal under Cross Session

Zhou, Song; Gao, Tianhan; Xu, Jun

doi:10.3390/sym15061173

by Song Zhou ¹, Tianhan Gao ¹,* and Jun Xu ²

¹ Software College, Northeastern University, Shenyang 110169, China

² Science and Technology on Special System Simulation Laboratory, Beijing Simulation Center, Beijing 100854, China

* Author to whom correspondence should be addressed.

Symmetry 2023, 15(6), 1173; https://doi.org/10.3390/sym15061173

Submission received: 12 March 2023 / Revised: 14 April 2023 / Accepted: 8 May 2023 / Published: 30 May 2023

Abstract

There is an important application value in assessing an operator’s mental pressure (MP) level in human–computer cooperative tasks through continuous asymmetric electroencephalogram (EEG) signals, which can help predict hidden risks. Due to the different distributions of EEG features in different periods, it is particularly challenging to accurately identify brain states by training and testing asymmetric EEG signals with static pattern classifiers. Due to the limitations of non-stationary neurophysiological data capture technology, cross-session MP recognition schemes can only be used as an auxiliary means in practical applications. Deep learning methods can achieve stable feature extraction at a high level. Based on this advantage, this paper proposes a triplet loss (TL)-based CNN model that can automatically update the weights of shallow hidden neurons in cross-session MP classification tasks. Firstly, the generalization ability of the CNN model under both intra-session and cross-session conditions is evaluated. Moreover, the proposed model is compared with the existing MP classifier under different feature selection and noise destruction modes. According to the results, our TL-based CNN model has high performance in processing cross-session EEG features.

Keywords: mental pressure recognition; electroencephalogram (EEG); triplet loss; deep learning; brain–computer interface

1. Introduction

Generally, MP evaluation methods are designed considering three levels: subjective score, secondary task performance, and neurophysiological signals [1]. Subjective scoring requires users to assess a task in its corresponding stage with a questionnaire. The general Subjective Workload Assessment Technique (SWAT) and Task Load Index (TLX) have been proven to be efficient in evaluating the MP status in many practical applications [2]. A cross-entropy (CE) loss function is often used in multiclass classification problems to measure the discrepancy between the classifier output and the target label. CE loss measures the difference between two probability distributions over the same random variable; in machine learning, this is the difference between the real probability distribution and the predicted probability distribution. In human–machine collaborative work environments with many safety hazards, such as high-altitude hazardous operations and special flights, the psychological state of operators is an important link in ensuring safety during work. Unlike machines, operators have limited working memory, which does not always guarantee a safe working state. In this case, negligence during work processes due to a high-MP state is the key factor leading to catastrophic accidents. In order to establish a connection between internal psychological stress and external actions in the field of safety production, the MP level has been proven to be related to task performance, vigilance, situation awareness and the capability to handle emergency events induced by automation failures. However, because subjective scoring requires ratings to be collected manually, the sampling rate of the MP indicators obtained is limited; this is particularly true in specific environments, e.g., during surgery, where it is difficult to evaluate the MP of doctors.

In order to address the above issues, neurophysiological signal assessment methods are widely used in brain state recognition applications due to their ability to objectively infer inherent cognitive states and highly repetitive independent tasks. Among them, the more mature assessment methods include electroencephalograms (EEG), event-related potentials (ERP), electrocardiograms (ECG), electrooculograms (EOG), and functional near-infrared spectroscopy (fNIR) [3]. Of this variety of neurophysiological signal markers, the asymmetric EEG is highly utilized as it is easy to obtain via portable devices and possesses high temporal resolution. Studies have shown that changes in the level of MP can be determined on the basis of the EEG power spectral density (PSD) gathered from multiple cortical regions and frequency bands. When task demands increase under high-MP states, the EEG signals from the parietal and occipital lobes display changes, such that α (8–13 Hz) power is reduced, while θ (4–7 Hz) power is increased. In addition, the increases in high-frequency β and γ power from the occipital cortex have corresponding influences on different task commands in the operation scenario [4]. The application of multi-band channels for EEG measurements is also attractive for the derivation of comprehensive information related to MP changes [5,6,7]. There exists an urgent need for methods that can automatically analyze the large amount of neurophysiological data produced by complex EEG acquisition equipment with high temporal and spatial resolutions. Machine learning pattern recognition methods can effectively solve this problem.

In this paper, we propose a TL-based CNN model to address the cross-session MP classification task and alleviate the above issues. Our proposed model addresses the high variability of the EEG signal across sessions, uses the TL algorithm to ensure the time-invariant characteristics of the EEG signal, and achieves the desired effect of identifying the human’s MP state. We thus design a method that utilizes TL functions to achieve the compression of EEG data such that the distance between subjects is minimized. TL has been applied to different EEG classification tasks, such as motor imagery classification and emotion recognition in BCI [8,9,10]. However, as far as the present situation is concerned, the superiority of TL has not been evaluated in EEG biometric systems. In this paper, compared with traditional CE loss, BCI spatial filtering and adversarial training in EEG biometrics, we achieve advanced results by combining the TL function and a CNN model [10,11]. To identify useful EEG features, we propose a deep neural network-based approach for MP feature extraction. The benefits of EEG include its noninvasive nature, as well as the fact that its PSD characteristics can accurately reflect changes in the potential of cortical neurons. However, the expression of deeper neuronal activity associated with altered MP may be disturbed as the signal travels from its origin to the scalp. Our approach thus aims to construct hierarchical deep structures to reconstruct the source of hidden features, with the aim of fully exploiting the EEG properties of deep neurons.

The rest of the paper is organized as follows. Section 2 describes the experimental paradigm for the MP task simulation and EEG acquisition. Section 3 presents the proposed framework along with the architecture of the CNN models and the triplet loss function. The setup for the experimentation and the analysis of the control parameters of the cross-session MP classifier are provided in Section 4. Finally, Section 5 provides the conclusions of the current work.

2. Preliminaries

In general, multi-layer perceptrons (MLPs) have been extensively used in the recognition of MP in machine operators [12]. Linear discriminant analysis (LDA) has been adopted for binary MP evaluation [13]. Support vector machine (SVM) regression and recursive elimination have been used to discover stable EEG frequency characteristics in different n-back task settings [14]. A least squares support vector machine (LS-SVM) has been applied to the case where the dimension of the neurophysiological features is larger than the number of training data points [15]. Research shows that when the training set and test set of a machine learning model are collected from the same subject and the same task, the classification accuracy of MP may be higher than 90%. The use of EEG power characteristics and machine-learning-based methods for MP recognition is expected to reveal the potential patterns corresponding to the specific MP state hidden in the EEG data distribution [16]. The latest related research on MP recognition includes using an improved hierarchical convolutional neural network (HCNN) for EEG emotion classification, extracting differential entropy features at specific time intervals of each channel. In order to maintain the position information of the EEG signal, this method converts one-dimensional EEG time domain information into two-dimensional differential entropy frequency domain features for subsequent HCNN training [17]. On the other hand, combining the Discrete Wavelet Transform (DWT) with the K-Nearest Neighbor (KNN) algorithm is also an effective method, in which the EEG signal is decomposed into three frequency bands for MP state recognition. Although these traditional methods of manually extracting emotional features combined with machine learning algorithms have developed well, most of them require a large amount of prior knowledge to search for the features of EEG signals and construct feature engineering.

EEG signals are prone to noise interference, and the differences between subjects make manual feature selection based on EEG signals time-consuming and labor-intensive [18]. The classification accuracy of a static MP classifier may also be affected by the different EEG characteristics of different subjects. The major motivation for using a deep learning network to recognize useful EEG feature abstractions is that human cortical functions operate in a deep hierarchical structure. Because EEG is recorded in a noninvasive manner, PSD features can accurately represent the voltage fluctuations of neurons in the superficial cortex. However, when the signal propagates from the source to the scalp, the expression of deep neuron activity related to MP changes may be disturbed. In order to make the best use of the EEG characteristics of deep neurons, a feasible method is to establish a hierarchical deep structure to reconstruct the source of hidden features [19,20]. In this case, the shallow layers of the deep learning model can be regarded as a filter that captures the best feature combination of the external scalp EEG information.

While many different DL models have been proposed in the past, most leverage standard CE loss and conventional regularization techniques to learn temporal persistence and subject specificity. Few studies utilize other training methods, such as adversarial learning in generative adversarial networks (GANs) and contrastive losses [21,22]. These methods have drawbacks: for example, adversarial learning requires an additional adversarial network, and it is very difficult to find a balance between the discriminative and adversarial networks. Contrastive losses, on the other hand, require a lot of computation to evaluate the pairings of the training data. However, the DL method can increase the repeatability of EP/ERP in different sessions to achieve better performance within 0.5 to 5 s. Although the performance of DL algorithms has been greatly improved, a lot of training is still necessary to obtain a higher recognition rate. In addition, traditional methods such as weight regularization and dropout are not effective for training a DL model on a single EEG data set. EEG data augmentation methods are thus important for learning session-invariant and subject-unique embeddings that minimize the triplet loss, as will be described in the following sections.

3. The Proposed Method

3.1. EEG Signal Pre-Processing

The main purpose of the pre-processing stage is to denoise the continuous asymmetric EEG data and divide them into continuous synchronous frame segments for neural network processing. The denoising operation mainly filters out common EEG artifacts, such as baseline drift, power-line interference and EEG spikes. Since the Butterworth filter has a maximally flat response with no ripple in the passband or the stopband, it is selected for all filtering operations in this paper. The sampling frequency of the original EEG signal is 12 kHz. Before the down-sampling operation, the EEG signal is filtered by an 8th-order Butterworth IIR filter with a cut-off frequency of 800 Hz to prevent aliasing, and then the continuous EEG signal is down-sampled to a sampling frequency of 2000 Hz (the down-sampling factor is 6). When removing EEG baseline drift, compared with the traditional high-pass filter, the Savitzky–Golay (S-G) filter can effectively detect transient or short-term slow drift without discarding the low-frequency EEG band. An S-G filter with a third-order polynomial and a window size of 1001 samples (0.5 s) is applied to each channel to estimate the baseline, and the estimated baseline is then removed from the recording. To remove power-line interference, a second-order notch IIR filter is used to remove the 60 Hz interference and its 120 Hz harmonic. Finally, spikes caused by electrode displacement are clipped at a threshold estimated from the deviation of the continuous EEG signal: amplitudes greater than 5σ are clipped, where σ is the standard deviation of the EEG signal. Another important step of EEG signal pre-processing is EEG signal framing and outlier removal. That is, the filtered continuous EEG signal is divided into non-overlapping frames with a length of 1 s to obtain a minimum frequency resolution of 1 Hz and ensure the recognition accuracy of the EEG signal in the cross-session model. Finally, high-variance frames are removed.
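The last three pre-processing steps (spike clipping, framing and high-variance frame removal) can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' MATLAB implementation: the function names are hypothetical, clipping is assumed to be symmetric around the signal mean, and the `max_ratio` rule for deciding which frames count as "high variance" is an assumption, since the text does not state the criterion.

```python
import statistics

def clip_spikes(signal, n_sigma=5.0):
    """Clip samples that deviate from the mean by more than n_sigma standard deviations."""
    mean = statistics.fmean(signal)
    sigma = statistics.pstdev(signal)
    lo, hi = mean - n_sigma * sigma, mean + n_sigma * sigma
    return [min(max(x, lo), hi) for x in signal]

def frame_signal(signal, fs, frame_seconds=1.0):
    """Split a 1-D signal into non-overlapping frames; a trailing partial frame is dropped."""
    size = int(fs * frame_seconds)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def drop_high_variance(frames, max_ratio=3.0):
    """Keep frames whose variance is at most max_ratio times the median frame variance.

    max_ratio is an assumed criterion, not a value taken from the paper.
    """
    variances = [statistics.pvariance(f) for f in frames]
    limit = max_ratio * statistics.median(variances)
    return [f for f, v in zip(frames, variances) if v <= limit]

# Toy single-channel example: a +/-1 oscillation with one large spike, "sampled" at 20 Hz.
toy = [1.0, -1.0] * 50
toy[10] = 100.0
clipped = clip_spikes(toy)
frames = frame_signal(clipped, fs=20)
kept = drop_high_variance(frames)
```

In a real pipeline the same functions would be applied per channel after filtering, with `fs` set to the down-sampled rate.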

3.2. TL-Based CNN Model

3.2.1. Input Layer

The model consists of seven basic layers, in which the input layer is the synchronous frame extracted from continuous EEG data. The data set used for training is defined as $(X_{n}^{c}, s)$, where $X_{n}^{c} \in \mathbb{R}^{P \times T^{c} \times 1}$ is the nth training EEG frame synchronized to mental component c, and s represents the subject. The input of the encoder is represented as a 3D tensor, in which P and $T^{c}$ are the number of EEG channels (P = 7) and the number of time samples synchronized to component c, respectively. The last dimension of 1 represents the number of input channels of the first convolution layer of the model.

3.2.2. Two Dimensional Convolution

In the convolution layer, a layer of network is added to share the weight of the filter according to the input dimension to train the correlation characteristics of the learning samples. Based on the correlation between adjacent neurons, the convolution filter weight is optimized to extract the potential EEG signal characteristics, which can effectively reduce the number of training parameters and improve the training efficiency. The input–output relationship of the two-dimensional convolution layer is as follows:

a^{l} (n, h, w, m) = \sum_{c = 0}^{C - 1} \sum_{i = 0}^{H - 1} \sum_{j = 0}^{W - 1} a^{l - 1} (n, h + i, w + j, c) \times W^{m} (i, j, c) + b^{m}

(1)

where $a^{l}$ is the activation of layer l, and $W^{m}$ and $b^{m}$ are the m-th convolution filter (size H × W × C) and its offset (bias). C is the depth of the input (i.e., the number of input channels). In the two-dimensional convolution used for EEG signal analysis, a set of spatiotemporal filters is applied to capture the functional connection characteristics of adjacent EEG electrodes.
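To make the index convention of Equation (1) concrete, the following sketch evaluates it directly with nested loops for a single example n. It assumes no padding and a stride of 1, which the text does not specify, and the function name is hypothetical; a framework layer would be used in practice.

```python
def conv2d_valid(a_prev, filters, biases):
    """Direct evaluation of Equation (1) for one example (no padding, stride 1).

    a_prev:  input activations indexed as a_prev[h][w][c]
    filters: filters[m][i][j][c], each filter of size H x W x C
    biases:  biases[m], one offset per filter
    returns: output activations indexed as out[h][w][m]
    """
    in_h, in_w, depth = len(a_prev), len(a_prev[0]), len(a_prev[0][0])
    k_h, k_w = len(filters[0]), len(filters[0][0])
    out = []
    for h in range(in_h - k_h + 1):
        row = []
        for w in range(in_w - k_w + 1):
            cell = []
            for m, kernel in enumerate(filters):
                total = biases[m]
                for c in range(depth):          # c = 0 .. C-1
                    for i in range(k_h):        # i = 0 .. H-1
                        for j in range(k_w):    # j = 0 .. W-1
                            total += a_prev[h + i][w + j][c] * kernel[i][j][c]
                cell.append(total)
            row.append(cell)
        out.append(row)
    return out

# Two "electrodes" by three time samples, one input channel, one 1 x 2 temporal filter.
x = [[[1.0], [2.0], [3.0]],
     [[4.0], [5.0], [6.0]]]
f = [[[[1.0], [1.0]]]]
b = [0.5]
y = conv2d_valid(x, f, b)
```

Here the 1 × 2 filter slides along the time axis only, so each electrode row is filtered independently, which mirrors the temporal convolutions described later in the paper.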

3.2.3. Activation Layer

Because convolution is a linear mapping, a training model that depends only on convolution layers will produce a set of simple outputs. In order to increase the complexity of the training model, a nonlinear activation layer is introduced after each convolution layer. The nonlinear activation layer is realized by an activation function; common activation functions are the rectified linear unit (ReLU), the tanh function and the sigmoid function. Experiments show that using ReLU in the proposed model effectively mitigates the vanishing gradient problem and converges better than the other nonlinear activation functions. ReLU is an element-wise tensor operation, as shown in Equation (2).

a^{l} = \max (a^{l - 1}, 0)

(2)

3.2.4. Batch Normalization (BN) Layer

Since the CNN model needs to process long EEG sequences within a short computation time, and in order to prevent the model from over-fitting, a BN layer is introduced. The BN layer continuously normalizes the output of the preceding layer to zero mean and unit standard deviation, so that the activations do not diverge significantly while the mean and standard deviation remain stable. Moreover, this allows the CNN model to converge quickly even with a large learning rate. In the CNN model, a BN layer is added after each convolution layer to speed up convergence. The BN operation is shown in Equation (3).

a^{l} (n, h, w, c) = γ_{c} \frac{a^{l - 1} (n, h, w, c) - μ_{c}}{\sqrt{σ_{c}^{2} + k}} + β_{c}

(3)

where $μ_{c}$ and $σ_{c}$ are the mean and standard deviation of input channel c, $γ_{c}$ and $β_{c}$ are rescaling parameters that are learned during training, and k is a small constant to prevent division by zero.
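As a small worked example of Equation (3), the sketch below normalizes all activations belonging to one channel c. It assumes the statistics are computed from the values passed in (as is done on a mini-batch during training); the function name and the default value of k are illustrative choices, not taken from the paper.

```python
import math

def batch_norm_channel(values, gamma=1.0, beta=0.0, k=1e-5):
    """Apply Equation (3) to the activations of a single channel, given as a flat list."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    scale = gamma / math.sqrt(var + k)
    return [scale * (v - mu) + beta for v in values]

normalized = batch_norm_channel([1.0, 2.0, 3.0, 4.0])
shifted = batch_norm_channel([1.0, 2.0, 3.0, 4.0], gamma=2.0, beta=1.0)
```

With the default γ = 1 and β = 0 the output has approximately zero mean and unit variance; non-default values rescale and shift that normalized output.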

3.2.5. Dropout Layer

During model training, the neurons in the hidden layer tend to co-adapt with other neurons, resulting in redundant features. Dropout, one of the commonly used techniques to prevent over-fitting, randomly breaks these dependencies by setting a fraction of the neurons, controlled by the dropout rate, to zero, which promotes relatively independent features across neurons. In this model, the dropout rate after each convolution layer is set to 0.1 (that is, 10% of neurons are randomly set to zero). The experiment shows that adding dropout improves the performance by about 4–5%.
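The effect of a dropout rate of 0.1 can be illustrated with a few lines of Python. The sketch uses the "inverted dropout" convention (surviving activations are rescaled by 1/(1 − rate) during training), which is an assumption here; the paper only states the rate. The function name is hypothetical.

```python
import random

def dropout(activations, rate=0.1, rng=None):
    """Inverted dropout: zero each activation with probability `rate`, rescale the rest."""
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]

# Seeded generator so that the example is reproducible.
dropped = dropout([1.0] * 1000, rate=0.1, rng=random.Random(0))
zero_fraction = dropped.count(0.0) / len(dropped)
```

At inference time dropout is disabled, which corresponds to calling the function with `rate=0.0`.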

3.2.6. Depth-Wise 2D Convolution Layer

As shown in Figure 1, depth-wise convolution (DC) is similar to 2D convolution: it applies 2D filters to the input layer while ensuring that each input channel (depth dimension) is processed independently. The depth-wise two-dimensional convolution operation first separates the input channels, then applies a set of 2D filters to each channel (the number of filters is controlled by the depth multiplier parameter), and finally concatenates the filter outputs along the depth dimension.

Figure 2 illustrates the difference between depth-wise convolution and standard two-dimensional convolution. Using DC significantly reduces the number of trainable parameters and thus improves the generalization ability to unseen examples. Especially when processing long EEG sequences, DC is an ideal alternative to a dense layer as the feature aggregation layer. The complete model is shown in Table 1.
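The parameter saving can be checked with a short calculation. The sketch below counts trainable parameters (weights plus one bias per output channel) for a standard 2D convolution and for a depth-wise 2D convolution. The layer sizes in the example are hypothetical and chosen only for illustration; they are not read from Table 1.

```python
def conv2d_params(k_h, k_w, in_channels, n_filters):
    """Standard 2D convolution: every filter spans all input channels."""
    return k_h * k_w * in_channels * n_filters + n_filters

def depthwise_conv2d_params(k_h, k_w, in_channels, depth_multiplier=1):
    """Depth-wise 2D convolution: each input channel gets its own depth_multiplier filters."""
    out_channels = in_channels * depth_multiplier
    return k_h * k_w * out_channels + out_channels

# Hypothetical layer: 1 x 3 kernel, 64 input channels, 64 output channels.
standard = conv2d_params(1, 3, 64, 64)
depthwise = depthwise_conv2d_params(1, 3, 64, depth_multiplier=1)
```

For this hypothetical layer the standard convolution needs 12,352 parameters, while the depth-wise version needs 256, because no weights connect different input channels.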

The CNN model is composed of four main blocks. The first three blocks represent a set of spatiotemporal filters. The last block is the feature aggregation block, which controls the size of the feature embedding. In the last layer, L2 normalization is applied to the embedding feature, so that the embedding is constrained to a unit hypersphere in the embedding space. Embedding normalization can speed up the convergence of the triplet loss and improve the validation performance. The total number of trainable parameters of the first three convolution blocks (i.e., independent of the input size) is 425,730, and the number of trainable parameters of the last layer, which depends on the input size, is $128 \times P \times T^{c} // 64 \times 2 + 3 \times 256$. For example, for an EEG input with 7 channels and 1250 time samples (5 s at 250 Hz), the total number of trainable parameters of the designed CNN model will be 523,740.

3.3. Triplet Loss

Compared with the traditional classification task using CE, TL can capture more significant data features from input data to improve the performance of feature extraction and classification. Even when training on a relatively small data set, TL may have a stronger generalization ability for unseen data. For the problem of low convergence of the EEG signal, TL can also ensure its stronger generalization ability. For the above reasons, TL is selected as the objective function of the cross-session deep learning model. Computing TL requires the training samples to be relabeled into three roles, namely Anchor (A), Positive (P) and Negative (N). The P role is assigned to training samples of the same class (subject) as the anchor. The N role is assigned to training samples of a different class (subject). TL computes the embedding distance $d_{P}$ between A and P and the embedding distance $d_{N}$ between A and N. The triplet loss function is given by Equation (4).

L = \sum_{n = 1}^{B} \max [d_{P}^{(n)} - d_{N}^{(n)} + α, 0] = \sum_{n = 1}^{B} \max [{∥e_{A}^{(n)} - e_{P}^{(n)}∥}_{2}^{2} - {∥e_{A}^{(n)} - e_{N}^{(n)}∥}_{2}^{2} + α, 0]

(4)

where $d_{P}$ and $d_{N}$ are the squared Euclidean distances within a triplet, and $e_{A}^{(n)}$, $e_{P}^{(n)}$ and $e_{N}^{(n)}$ represent the embeddings of the A, P and N members of the triplet with index n in a batch of size B. The hyperparameter α is a positive value representing the margin between $d_{P}$ and $d_{N}$. Minimizing the loss function L to zero means that the inter-subject distance ($d_{N}$) of the embedded features is greater than the within-subject distance ($d_{P}$) by at least the margin α. The margin α is set to 0.5 in all tests performed.
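The triplet loss can be written out in a few lines of plain Python. This is a sketch under the usual convention that each term is hinged at zero, i.e. max(d_P − d_N + α, 0); the function names are hypothetical, and `l2_normalize` mirrors the embedding normalization described for the last layer of the model.

```python
import math

def l2_normalize(v):
    """Project an embedding onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_distance(u, v):
    """Squared Euclidean distance between two embeddings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_loss(anchors, positives, negatives, alpha=0.5):
    """Sum of hinged triplet terms over a batch of (A, P, N) embeddings."""
    total = 0.0
    for e_a, e_p, e_n in zip(anchors, positives, negatives):
        d_p = squared_distance(e_a, e_p)
        d_n = squared_distance(e_a, e_n)
        total += max(d_p - d_n + alpha, 0.0)
    return total

# Triplet 1 is already separated by more than the margin; triplet 2 is not.
A = [[1.0, 0.0], [1.0, 0.0]]
P = [[1.0, 0.0], [1.0, 0.0]]
N = [[0.0, 1.0], [1.0, 0.0]]
loss = triplet_loss(A, P, N, alpha=0.5)
```

The first triplet contributes nothing because its negative is already farther away than the positive by more than α, while the second contributes exactly α because its positive and negative coincide with the anchor.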

4. Experimental Analysis

4.1. Model Training and Triplet Acquisition

For different cross-session EEG models, the parameters of the training model are randomly initialized with the Glorot initialization scheme. The parameters of the model are updated by mini-batch gradient descent with the Adam optimizer. The initial learning rate is set to $10^{-3}$, and the total number of training cycles and the batch size are set to 64 and 128, respectively. The model is trained to minimize the triplet loss function given in Equation (4). When computing the triplet loss, hard triplets are selected by online mining, which effectively reduces the amount of computation and the memory required to evaluate the pairwise distances of the training examples within a batch. The input to the proposed CNN model is a 32 × 128 EEG matrix. The temporal convolution in each spatiotemporal core block is carried out three times using a 1 × 3 convolution kernel, while the spatial convolution is carried out once using a 3 × 1 convolution kernel. The number of feature maps grows exponentially across the blocks: 16, 32, and 64, respectively. The number of neurons in the fully connected layer is set to 50, and the final number of classification units is 2. In addition, the batch size in this article is set to 128, the learning rate is 0.1, the learning decay rate is 0.99, and the number of epochs is 100.
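Online hard-triplet mining can be sketched as follows. The text does not say which mining rule is used, so this example assumes the common "batch-hard" strategy: for every anchor in the batch, take the farthest sample with the same label as the positive and the nearest sample with a different label as the negative. The function name is hypothetical.

```python
def batch_hard_triplets(embeddings, labels):
    """Return (anchor, positive, negative) index triplets chosen by batch-hard mining."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    triplets = []
    for a, (e_a, y_a) in enumerate(zip(embeddings, labels)):
        pos = [i for i, y in enumerate(labels) if y == y_a and i != a]
        neg = [i for i, y in enumerate(labels) if y != y_a]
        if not pos or not neg:
            continue  # this anchor cannot form a valid triplet within the batch
        hardest_pos = max(pos, key=lambda i: sq_dist(e_a, embeddings[i]))
        hardest_neg = min(neg, key=lambda i: sq_dist(e_a, embeddings[i]))
        triplets.append((a, hardest_pos, hardest_neg))
    return triplets

# Toy batch: three embeddings with label 0 and one with label 1.
emb = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [0.5, 0.0]]
lab = [0, 0, 0, 1]
mined = batch_hard_triplets(emb, lab)
```

Because only one triplet per anchor is kept, the loss is evaluated on B triplets per batch instead of on every possible combination.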

During the training process, the 63 s EEG signal of each trial is divided into 63 one-second segments in the time domain, and the corresponding data labels are extended accordingly. The total number of EEG epochs for each subject over 40 trials is 2520, and each epoch ultimately consists of 128 data points and 32 channels. The MP level of each trial ranges from 1 to 9, and the median value 5 is used as the threshold to divide arousal and valence into two categories. A value greater than 5 indicates a high-pressure state; the greater the number, the greater the pressure. Finally, we obtain 1 × 2520 dimensional label data corresponding to the EEG signal. At the same time, 1/4 of the test data are taken as the validation set to verify the cross-session correct recognition rate (CRR) of the DL model in each training cycle, and the remaining 3/4 are used as training data. Here, the k-NN algorithm (k = 1) is adopted to estimate the CRR of the model so that the model with the best validation CRR can be saved. The best validation CRR is monitored with a patience window of 15 epochs. If the validation CRR does not improve within this window, the training process is terminated and the model parameters with the best validation CRR are saved. Finally, in order to reduce the impact of random parameter initialization, each test is trained five times, and the performance indices are then averaged.
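The validation procedure (1-NN estimate of the CRR on the embeddings, plus early stopping with a patience window) can be sketched as below. Both function names are hypothetical, and the stopping rule is one plausible reading of the description: training stops once the given number of epochs has passed without a new best validation CRR.

```python
def one_nn_crr(train_emb, train_labels, val_emb, val_labels):
    """Correct recognition rate of a 1-nearest-neighbour classifier on the embeddings."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    correct = 0
    for e, y in zip(val_emb, val_labels):
        nearest = min(range(len(train_emb)), key=lambda i: sq_dist(e, train_emb[i]))
        correct += int(train_labels[nearest] == y)
    return correct / len(val_emb)

def best_epoch_with_patience(crr_per_epoch, patience=15):
    """Return (epoch, CRR) of the best model seen before early stopping triggers."""
    best, best_epoch = float("-inf"), -1
    for epoch, crr in enumerate(crr_per_epoch):
        if crr > best:
            best, best_epoch = crr, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch, best

crr = one_nn_crr([[0.0, 0.0], [1.0, 1.0]], [0, 1],
                 [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1]], [0, 1, 1])
stop = best_epoch_with_patience([0.5, 0.6, 0.55, 0.58, 0.59], patience=2)
```

In the toy call, two of the three validation embeddings are matched to a training embedding with the correct label, and the best epoch is the one with a CRR of 0.6.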

4.2. Task Setting

The task performance indicators of different MP states are shown in Figure 3. The pre-processing stage was implemented in MATLAB. The DL model was implemented in Python using the Keras platform for machine learning. All the computations required for training and testing our model were performed using a Google Colab Pro account. In order to verify the experimental effect, the average percentage of time that the four subsystems remain within the target range is evaluated, which is expressed as the system error range (SIE), and the SIE data are analyzed by one-way repeated-measures analysis of variance. The SIE under low MP is notably higher than that under high MP (p < 0.001 for all sessions). The Wilcoxon signed-rank test is used to compare the mean and median SIE of all subjects between stage 1 and stage 2. The results show that the difference between the two groups of statistics is not significant, which indicates that there is no difference in learning effect between different sessions.

4.3. Significant EEG Characteristics

To verify the relationship between the MP state and the EEG characteristics of different frequency bands, and to discover the EEG characteristics most significantly associated with MP changes, the linear correlation coefficient r between the EEG power in 55 different frequency bands and the time course of the target MP class is calculated repeatedly for three different situations: Session 1 (Case 1), Session 2 (Case 2) and Double Session (Case 3), as shown in Figure 4.

Correlation is measured by a Pearson product moment correlation coefficient,

r = \frac{\sum_{k = 1}^{N} (x (k) - \bar{x}) (y (k) - \bar{y})}{\sqrt{\sum_{k = 1}^{N} {(x (k) - \bar{x})}^{2}} \sqrt{\sum_{k = 1}^{N} {(y (k) - \bar{y})}^{2}}}

(5)

where x(k) and y(k) represent the EEG feature value and the target MP class at time step k. y(k) = 0 and y(k) = 1 represent the low-MP and high-MP states, so the change of y(k) maps the MP level. In Figure 4, most EEG features are negatively associated with y(k), while the EEG features of theta, beta and gamma are negatively related to y(k) and the bands show a positive relationship to y(k). Additionally, the higher the absolute value of r, the stronger the recognition ability of the corresponding EEG feature. That is, the value of r can serve as the basis for feature selection; the maximum value of |r| corresponds to the most significant EEG feature in Figure 5.
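Equation (5) translates directly into code. The sketch below computes r between one EEG feature and the binary MP label; the function name and the toy data are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - x_bar) ** 2 for a in x))
    sy = math.sqrt(sum((b - y_bar) ** 2 for b in y))
    return cov / (sx * sy)

# Toy feature that rises with the MP label (0 = low MP, 1 = high MP).
feature = [1.0, 2.0, 3.0, 4.0]
mp_label = [0, 0, 1, 1]
r_up = pearson_r(feature, mp_label)
r_down = pearson_r(feature[::-1], mp_label)
```

Ranking the features by |r| then gives a simple feature-selection order, in line with the use of r described above.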

4.4. Intra Session and Cross Session MP Classification

In order to evaluate the classification performance of the MP classifier, the following indicators are introduced. The classification rate of the first class (low MP) is defined as the sensitivity $P_{sen} = N_{lp} / (N_{lp} + N_{hf})$, where $N_{lp}$ represents the number of low-MP EEG data points correctly estimated by the classifier, and $N_{hf}$ represents the number of low-MP data points misclassified as high MP. The classification rate of the second class (high MP) is defined as the specificity $P_{spe} = N_{ln} / (N_{ln} + N_{hp})$, where $N_{ln}$ represents the number of high-MP EEG data points correctly estimated by the classifier, and $N_{hp}$ represents the number of high-MP data points misclassified as low MP. The precision of the low-MP class is defined as $P_{pre} = N_{lp} / (N_{lp} + N_{hp})$, and the precision of the high-MP class is defined as $P_{npv} = N_{ln} / (N_{ln} + N_{hf})$. The overall classification accuracy is determined as $P_{acc} = (N_{ln} + N_{lp}) / (N_{ln} + N_{hf} + N_{lp} + N_{hp})$. Figure 6 compares the classification results of 10 training and test procedures. In the context of the training, intra-session test and cross-session test, the CNN-based classification algorithm is adopted to calculate the classification performance indices $P_{sen}$, $P_{spe}$, $P_{pre}$, $P_{npv}$ and $P_{acc}$. The intra-session test result (case 2) is obtained by using 3/5 of the data of case 1 for training and the remaining 2/5 for comparative testing. The cross-session test result (case 3) is obtained by using the data of session 1 for training and validation and the data of session 2 for testing. According to case 1 in Section 3.2, the most significant EEG characteristics are derived.
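The five indices can be computed from four confusion counts. The sketch below treats low MP as the positive class and assumes the following reading of the counts: `n_lp` is low MP classified correctly, `n_hf` is low MP misclassified as high MP, `n_ln` is high MP classified correctly, and `n_hp` is high MP misclassified as low MP. The function name and the example counts are hypothetical.

```python
def mp_metrics(n_lp, n_hf, n_ln, n_hp):
    """Sensitivity, specificity, precision, NPV and accuracy from four confusion counts."""
    return {
        "sen": n_lp / (n_lp + n_hf),   # low MP recovered among all low-MP samples
        "spe": n_ln / (n_ln + n_hp),   # high MP recovered among all high-MP samples
        "pre": n_lp / (n_lp + n_hp),   # precision of the low-MP class
        "npv": n_ln / (n_ln + n_hf),   # precision of the high-MP class
        "acc": (n_lp + n_ln) / (n_lp + n_hf + n_ln + n_hp),
    }

# Illustrative counts only: 50 low-MP and 50 high-MP samples.
scores = mp_metrics(n_lp=40, n_hf=10, n_ln=45, n_hp=5)
```

With these illustrative counts, the sensitivity is 0.8, the specificity is 0.9 and the accuracy is 0.85.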

The validation set SVA is only used for determining the number of nodes in the adaptive classifier in case 3, whereas the number of nodes in case 1 and case 2 is simply selected to be the same as that in case 3 for comparison. For the average $P_{sen}$, $P_{spe}$, $P_{pre}$, $P_{npv}$ and $P_{acc}$ of each participant, according to the Wilcoxon signed-rank test, the change between cases 2 and 3 is not significant. The results show that although the average performance indicators in the two cases are not significantly different, the average performance indicators in case 2 are better than those in case 3 under C, D and G. This means that intra-session MP classification is an easier task for adaptive classifiers.

The CNN-based recognition algorithm is run 10 times in the intra-session and cross-session situations. The optimal classification confusion matrix represented by the classification rate of each case is summarized in Table 2.

The total classification rate $P_{acc}$ in the intra-session situation is 0.9496. The values of $P_{sen}$, $P_{spe}$ and $P_{acc}$ decrease to 0.8578, 0.8389, and 0.8146 under the cross-session situation. The matrix at the end of the table is the result of the correct or incorrect assessments of the EEG samples for all participants. The results show that the accuracy of the intra-session situation is much higher than that of the cross-session situation. According to the Wilcoxon signed-rank test, the values of $P_{spe}$ and $P_{acc}$ of all participants are significantly higher in the former case than in the latter case ((z = −2.67, S = 0.02) and (z = −2.23, p = 0.02)), and $P_{sen}$ is not significantly different between the two cases (z = −1.79, p = 0.07).

4.5. Classification Results

This paper verifies the performance of the CNN model designed for cross-session MP classification. As shown in Figure 7, the average classification performance indices of the adaptive model over 10 runs are compared with seven classification algorithms commonly used in EEG stress recognition. For all classification algorithms, the training set and validation set are the non-overlapping three-fifths and two-fifths of the data of session 1, and the test samples are all the data of session 2.

ANN denotes a three-layer artificial neural network based on the MLP. For each participant, the training and testing of the neural network are repeated 10 times and the results are averaged for comparison and statistical analysis; the number of hidden neurons and the input feature dimension are set to 55. NB denotes a naive Bayesian classifier without any preset parameters. KNN denotes the K-nearest neighbor classifier with K = 30.

$SVM_{lin}$ and $SVM_{rbf}$ denote standard support vector machines with a linear kernel and a nonlinear radial basis function (RBF) kernel, respectively. For the linear support vector machine, the regularization parameter is selected using the training set, the validation set and 15 candidate values from the set {$2^{-7}, 2^{-6}, \dots, 2^{7}$}. For the RBF-SVM, a grid search over the regularization parameter and the kernel width is carried out in the set {$2^{-7}, 2^{-6}, \dots, 2^{7}$}. BSVM denotes the bounded support vector machine, a variant of the standard SVM. As with the standard support vector machine, the linear-kernel and RBF-kernel versions use the same respective model selection criteria and are denoted $BSVM_{lin}$ and $BSVM_{rbf}$.

It can be seen from Figure 7 that, among all classifiers, for $P_{sen}$, $P_{npv}$ and $P_{acc}$, the median of the CNN model is the highest, while the median of the neural network classifier is the highest. Table 3 lists the Wilcoxon signed rank test results comparing the five performance indicators of the CNN model with those of the other seven MP classifiers. They show that $P_{sen}$, $P_{spe}$ and $P_{acc}$ are significantly improved, while the values of $P_{pre}$ and $P_{npv}$ are equivalent to those of the other classifiers.
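The candidate sets used for SVM model selection can be enumerated with a few lines of code. The sketch below is illustrative only: `candidate_values` and `select_by_validation` are hypothetical helpers, and the scoring function passed to the selection step stands in for whatever validation criterion is actually used.

```python
from itertools import product


def candidate_values(lo=-7, hi=7):
    """Return the candidate set {2^lo, ..., 2^hi} (15 values by default)."""
    return [2.0 ** k for k in range(lo, hi + 1)]


def select_by_validation(score, grid):
    """Pick the candidate with the highest validation score.

    `score` is a placeholder callable mapping a candidate to its
    validation-set performance.
    """
    return max(grid, key=score)


# Linear kernel: one regularization parameter, 15 candidates.
linear_grid = candidate_values()

# RBF kernel: grid over (regularization parameter, kernel width), 15 x 15 pairs.
rbf_grid = list(product(candidate_values(), candidate_values()))

if __name__ == "__main__":
    print(len(linear_grid), len(rbf_grid))
    # Dummy score that prefers values close to 1.0, for illustration only.
    print(select_by_validation(lambda c: -abs(c - 1.0), linear_grid))
```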

The classification performance indices in the single-channel case are shown in Figure 8. For all subjects, the five power features of the P4, O1, F3, and O2 channels are each used separately for MP classification. The parameter settings of each classification algorithm are the same as those defined in Section 4.5. Across the classification algorithms, the medians of $P_{sen}$, $P_{npv}$ and $P_{acc}$ are the highest for the TL-based CNN algorithm. The detailed Wilcoxon signed rank test results for the TL-based CNN model and the other classifiers are shown in Table 3. Compared with $BSVM_{rbf}$, $P_{sen}$ and $P_{npv}$ are significantly improved. Nevertheless, for the other classification performance indicators, the improvement of the adaptive CNN is not significant.

The classification performance in the single-frequency-band case is shown in Figure 8. The specific bands of participants A–G are σ, γ and α, and 11 EEG features are used for each. Compared with $BSVM_{lin}$, KNN, $BSVM_{rbf}$, $SVM_{rbf}$, $SVM_{lin}$, and ANN, $P_{sen}$ and $P_{npv}$ are significantly improved. Compared with NB, $P_{pre}$ is also significantly improved. In terms of the overall classification accuracy $P_{acc}$, the adaptive CNN is better than $BSVM_{lin}$, NB, $SVM_{lin}$ and ANN, and is equivalent to $BSVM_{rbf}$, $SVM_{rbf}$ and KNN.

Various DL models have been proposed previously; however, most of these works used the standard CE loss to learn subject-unique features. A few papers adopted other training approaches, such as adversarial training with a Generative Adversarial Network (GAN) and contrastive loss, to learn invariant representations of EEG. The generator/encoder in the GAN model was trained to learn session-invariant representations by hiding the session information from a discriminator, but this approach achieved poor CRR values, ranging between 66.6% and 71.6% over only 10 subjects, due to the short EEG epochs (0.5 s) used for testing [23,24]. Compared with existing research, our scheme inherits the signal processing advantages of deep learning models for EEG, focuses on the challenge of highly non-stationary EEG signals, and uses the TL algorithm to preserve the invariant characteristics of the EEG signal, achieving good results in identifying the psychological stress state. A limitation of the model is that the automatic separation of EMG signals, which is needed to ensure the integrity of the EEG signals, is not yet addressed; this is the focus of our next research.

4.6. Noise Robustness Analysis

The noise robustness of the eight MP classifiers in the cross-session setting is evaluated by artificially adding different numbers of Gaussian time processes (i.e., white Gaussian noise) to the EEG features. The purpose is to test whether a classifier remains effective when the constraints imposed by the experimental paradigm are not available. Classifier inputs and parameter settings are the same as defined in Section 4.5. The detailed classification results under the different noise conditions are shown in Figure 9. The number of noise signals is increased from 1 to 15. Each noise signal has zero mean, and its standard deviation increases with its index: from the 1st to the 15th noise signal, the standard deviations are 0.1, 0.2, …, 1.4, 1.5, respectively. Therefore, in the worst case, 15 noise signals with increasing contamination levels are superimposed on the EEG features. The average classification performance is then calculated and compared across all participants.
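The noise protocol described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: it assumes that each noise signal is drawn independently per feature value and added element-wise, and the helper names `noise_stds` and `add_noise` are hypothetical.

```python
import random


def noise_stds(n_signals):
    """Standard deviations 0.1, 0.2, ... for the first n_signals noise signals."""
    return [round(0.1 * k, 1) for k in range(1, n_signals + 1)]


def add_noise(features, n_signals, rng):
    """Superimpose n_signals zero-mean Gaussian noise signals on a feature vector.

    Assumption: every noise signal is sampled independently for each feature
    and added element-wise; the k-th signal has standard deviation 0.1 * k.
    """
    noisy = list(features)
    for std in noise_stds(n_signals):
        noisy = [x + rng.gauss(0.0, std) for x in noisy]
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)            # fixed seed for reproducibility
    clean = [0.5, 1.2, -0.3, 0.8]     # stand-in EEG feature vector
    for n in (1, 5, 15):              # increasing contamination levels
        print(n, add_noise(clean, n, rng))
```

With 15 signals, the last one alone has a standard deviation of 1.5, which corresponds to the worst case described in the text.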

The results show that the classification performance of all classifiers decreases as the number of noise signals increases. From Figure 9e, it can be observed that the TL-based CNN classification algorithm achieves higher overall classification accuracy ($P_{acc}$) than the other seven classification methods under all noise conditions. Thanks to its specially designed hierarchical structure, the TL-based CNN classification algorithm has better noise robustness. For the single-class performance indices, the TL-based CNN classification algorithm is superior to the other classifiers in $P_{sen}$, $P_{pre}$ and $P_{npv}$ under all noise conditions. As the number of noise signals increases, the classification rate of the high MP class (i.e., $P_{spe}$) decreases significantly, see Figure 9b. When the EEG features are seriously contaminated, this degradation weakens the corresponding overall classification accuracy.

5. Conclusions

This paper designs a CNN deep learning model for recognizing the brain stress state in cross-session situations with EEG biometrics, and classifies cross-session MP from EEG signals. The model encodes EEG data into a feature space and uses the triplet loss (TL) function to maximize the distance between embeddings of different subjects and minimize the distance between embeddings of the same subject. The weights of the first hidden layer, which is connected to the input layer of the CNN model, are iteratively updated to track the continuous changes in EEG power features. Under different feature selection and noise contamination paradigms, a further performance comparison between the CNN classification model and classical MP classifiers shows that, when comprehensive cortical information is used as the network input, the proposed method is superior to shallow and static classifiers. The analysis and testing of the EEG features of hidden neurons at different depths show that the hierarchical structure of the TL-based CNN model can describe the distribution of EEG data clearly at a higher level. In future studies, we will explore deep classifiers that model more of the internal noise of EEG in order to better capture the non-stationary distribution of neurophysiological data and thus use EEG more accurately for predicting human state perception.
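For reference, the triplet loss idea summarized above can be written in a few lines. The sketch uses the commonly used hinge form with Euclidean distance and a margin of 0.2; both the distance and the margin value are assumptions made here for illustration, and the exact formulation used in this paper is the one given in Section 3.3.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss (margin value is an illustrative assumption).

    The loss is zero once the anchor is closer to the positive sample than
    to the negative sample by at least `margin`.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    a = [0.0, 0.0]
    print(triplet_loss(a, [0.1, 0.0], [3.0, 4.0]))  # well separated triplet
    print(triplet_loss(a, [3.0, 4.0], [0.1, 0.0]))  # violated triplet
```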

Author Contributions

Conceptualization, S.Z. and T.G.; methodology, S.Z.; software, S.Z.; validation, S.Z. and T.G.; formal analysis, S.Z. and J.X.; data curation, S.Z.; writing—original draft preparation, S.Z.; writing—review and editing, S.Z., T.G. and J.X. All authors have read and agreed to the published version of the manuscript.

Funding

Supported by the National Natural Science Foundation of China under Grant Number 52130403.

Institutional Review Board Statement

Ethical review and approval were waived for this study because it only uses machine learning technology to analyze EEG data and does not involve human ethics; ethical approval is therefore not necessary.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing not applicable.

Conflicts of Interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

References

Borghini, G.; Astolfi, L.; Vecchiato, G. Measuring neurophysiological signals in aircraft pilots and car drivers for the assessment of mental workload, fatigue and drowsiness. Neurosci. Biobehav. Rev. 2014, 44, 58–75.
Chen, H.; Yin, Z. SleepZzNet: Sleep Stage Classification Using Single-Channel EEG Based on CNN and Transformer. Int. J. Psychophysiol. 2021, 168, 85–102.
Giraudet, L.; Imbert, J.P.; Berenger, M. The neuroergonomic evaluation of human machine interface design in air traffic control using behavioral and EEG/ERP measures. Behav. Brain Res. 2015, 294, 246–253.
Fallahi, M.; Motamedzade, M.; Heidarimoghadam, R. Effects of mental workload on physiological and subjective responses during traffic density monitoring: A field study. Appl. Ergon. 2016, 52, 95–103.
Huang, W.; Sun, F. Building feature space of extreme learning machine with sparse denoising stacked-autoencoder. Neurocomputing 2016, 174, 60–71.
Rangpong, P.; Sawangjai, P. EEGWaveNet: Multiscale CNN-Based Spatiotemporal Feature Extraction for EEG Seizure Detection. IEEE Trans. Ind. Inform. 2022, 8, 18–33.
Das, R.; Maiorana, E.; Campisi, P. EEG biometrics using visual stimuli: A longitudinal study. IEEE Signal Process. Lett. 2016, 23, 341–345.
Chen, J.; Yu, Z.; Gu, Z. Deep temporal-spatial feature learning for motor imagery-based brain–computer interfaces. IEEE Trans. Neural Syst. Rehabil. Eng. 2020, 28, 2356–2366.
Alwasiti, H.; Yusoff, M.Z.; Raza, K. Motor imagery classification for brain computer interface using deep metric learning. IEEE Access 2020, 8, 109949–109963.
Ke, Y.; Qi, H.; Zhang, L. Towards an effective cross-task mental workload recognition model using electroencephalography based on feature selection and support vector machine regression. Int. J. Psychophysiol. 2015, 98, 157–166.
Yin, Z.; Zhang, J. Operator functional state classification using least-square support vector machine based recursive feature elimination technique. Comput. Methods Programs Biomed. 2014, 113, 101–115.
Maiorana, E. Deep learning for EEG-based biometric recognition. Neurocomputing 2020, 410, 374–386.
Thakare, A.; Bhende, M.; Deb, N. Classification of Bioinformatics EEG Data Signals to Identify Depressed Brain State Using CNN Model. Biomed Res. Int. 2022, 10, 24–46.
Sun, W.; Su, Y.; Wu, X. A novel end-to-end 1D-ResCNN model to remove artifact from EEG signals. Neurocomputing 2020, 404, 108–121.
Hüpen, P.; Kumar, H.; Shymanskaya, A. Impulsivity Classification Using EEG Power and Explainable Machine Learning. Int. J. Neural Syst. 2023, 3, 19–27.
Özdenizci, O.; Wang, Y.; Koike-Akino, T. Learning invariant representations from EEG via adversarial inference. IEEE Access 2020, 8, 27074–27085.
Zhang, L.; Yang, X.; Liu, H. Efficient Residual Shrinkage CNN Denoiser Design for Intelligent Signal Processing: Modulation Recognition, Detection, and Decoding. IEEE J. Sel. Areas Commun. 2022, 40, 14–35.
Tuncer, T.; Dogan, S.; Baygin, M. Tetromino pattern based accurate EEG emotion classification model. Artif. Intell. Med. 2022, 123, 42–63.
Seha, S.N.A.; Hatzinakos, D. Human recognition using transient auditory evoked potentials: A preliminary study. IET Biom. 2018, 7, 242–250.
Bhullar, A.; Nadeem, K.; Ali, A. Simultaneous multi-crop land suitability prediction from remote sensing data using semi-supervised learning. Sci. Rep. 2023, 13, 21–34.
Khatwani, M.; Rashid, H.A.; Paneliya, H.A. Flexible Multichannel EEG Artifact Identification Processor using Depthwise-Separable Convolutional Neural Networks. ACM J. Emerg. Technol. Comput. Syst. 2021, 17, 1–21.
Seha, S.N.A.; Hatzinakos, D. EEG-based human recognition using steady-state AEPs and subject-unique spatial filters. IEEE Trans. Inf. Forensics Secur. 2020, 15, 3901–3910.
Upadhyaya, V.; Salim, M. Effect of sensing matrices on quality index parameters for block sparse bayesian learning-based EEG compressive sensing. Int. J. Wavelets 2023, 21, 31–53.
Moctezuma, L.; Abe, T.; Molinas, M. Two-dimensional CNN-based distinction of human emotions from EEG channels selected by multi-objective evolutionary algorithm. Sci. Rep. 2022, 12, 3523.

Figure 1. TL based CNN model architecture diagram.

Figure 2. Illustration of the depth-wise and standard 2D convolution operations. (a) depth-wise 2D convolution: (1) input volume with shape 6 × 6 × 3, (2) channel separation, (3) channel filters with shape 3 × 3 × 2 (depth multiplier = 2), (4) output of the convolution operation for each input channel, (5) channel concatenation to get the final output with shape 6 × 6 × 6. (b) standard 2D convolution: (1) input same as (a), (2) four different convolution filters with shape 3 × 3 × 3, (3) convolution output with shape 6 × 6 × 4.

Figure 3. SIE box diagram of MP binary level.

Figure 4. Linear analysis results of EEG characteristics of each frequency band and time process of target MP class.

Figure 5. The most significant EEG characteristics of subjects in a cross-session situation.

Figure 6. Error bar graph of the CNN model for training, intra-session testing and cross-session testing. (a–e) represent the performance of $P_{sen}$, $P_{spe}$, $P_{pre}$, $P_{npv}$ and $P_{acc}$, respectively.

Figure 7. Boxplot of cross-session MP classification performance indices for different classifiers. (a–e) represent the performance of $P_{sen}$, $P_{spe}$, $P_{pre}$, $P_{npv}$ and $P_{acc}$, respectively.

Figure 8. Box diagram of cross session MP classification performance index related to the selected channel.

Figure 9. Relationship between the average classification performance index and the number of noise signals of different MP classifiers.

Table 1. Architecture of the proposed CNN model.

Layer	Parameters	Output Size	Options
input	-	($P \times T^{C} \times 1$)	-
Conv2D	(1 × 5 × 25 × 32), bias: 32	($P \times T^{C}$//2 × 32)	activation: ReLU, strides = (1,2)
BN	(2 × 32)	($P \times T^{C}$//2 × 32)	axis = 3
Average pooling	-	($P \times T^{C}$//4 × 32)	pool size = (1,2), strides = (1,2)
Dropout	-	($P \times T^{C}$//4 × 32)	rate = 0.1
Conv2D	(32 × 5 × 15 × 64), bias: 64	($P \times T^{C}$//8 × 64)	activation: ReLU, strides = (1,2)
BN	(2 × 64)	($P \times T^{C}$//8 × 64)	axis = 3
Average pooling	-	($P \times T^{C}$//16 × 64)	pool size = (1,2), strides = (1,2)
Dropout	-	($P \times T^{C}$//16 × 64)	rate = 0.1
Conv2D	(64 × 5 × 5 × 128), bias: 128	($P \times T^{C}$//32 × 128)	activation: ReLU, strides = (1,2)
BN	(2 × 128)	($P \times T^{C}$//32 × 128)	axis = 3
Average pooling	-	($P \times T^{C}$//64 × 128)	pool size = (1,2), strides = (1,2)
Dropout	-	($P \times T^{C}$//64 × 128)	rate = 0.1
DepthwiseConv2D	($P \times T^{C}$//64 × 2)	256	depth multiplier = 2
BN	(2 × 256)	256	axis = 1
L2-norm	-	256	-

Table 2. Optimal classification confusion matrix in $P_{spe}$ in intra-session and cross-session situations.

Test Object	Estimated Grade	Intra Session		Cross Session
		Target Category		Target Category
		Low	High	Low	High
Object1	Low	0.98	0.04	0.9811	0.13
	High	0.02	0.95	0.0189	0.87
		$P_{acc}$ = 0.9683		$P_{acc}$ = 0.9256	
Object2	Low	1		0.8733	0.1545
	High	0	0.977	0.1267	0.8434
		$P_{acc}$ = 0.9950		$P_{acc}$ = 0.8578	
Object3	Low	0.9167	0.0789	0.87	0.2223
	High	0.0833	0.967	0.13	0.7123
		$P_{acc}$ = 0.9217		$P_{acc}$ = 0.8206	
Object4	Low	0.91	0.43	0.8578	0.1321
	High	0.08	0.78	0.1422	0.8321
		$P_{acc}$ = 0.8		$P_{acc}$ = 0.8706	
Object5	Low	0.96	0.0167	0.8411	0.06
	High	0.04	0.9866	0.1589	0.98
		$P_{acc}$ = 0.9		$P_{acc}$ = 0.8956	
Object6	Low	10.8	0.8	0.9989	0.1789
	High	0.3	0.7	0.0011	0.8045
		$P_{acc}$ = 0.9		$P_{acc}$ = 0.9011	
Object7	Low	0.95	0.0036	0.8389	0.1367
	High	0.05	0.9989	0.1611	0.8689
		$P_{acc}$ = 0.97		$P_{acc}$ = 0.8522	
average value	Low	0.96	0.056	0.8123	0.14789
	High	0.03	0.943	0.1123	0.8468
		$P_{acc}$ = 0.9496		$P_{acc}$ = 0.8123	

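Table 3. Wilcoxon signed rank test results comparing the five performance indicators of the CNN model with those of the other seven MP classifiers.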

	$P_{sen}$	$P_{spe}$	$P_{pre}$	$P_{npv}$	$P_{acc}$
CNN vs. ANN	z = −2.56, p = 0.04	z = −1, p = 1.00	z = −0.15, p = 0.86	z = −2.38, p = 0.04	z = −2.23, p = 0.03
CNN vs. NB	z = −2.32, p = 0.04	z = −1.56, p = 0.19	z = −2.32, p = 0.08	z = −2.32, p = 0.04	z = −2.56, p = 0.03
CNN vs. KNN	z = −2.32, p = 0.04	z = −1.23, p = 0.367	z = −0.16, p = 1.38	z = −2.34, p = 0.04	z = −2.56, p = 0.03
CNN vs. $SVM_{lin}$	z = −2.32, p = 0.04	z = −0.234, p = 0.45	z = −0.12, p = 1.32	z = −2.56, p = 0.04	z = −2.32, p = 0.04
CNN vs. $SVM_{rbf}$	z = −2.56, p = 0.03	z = −0.878, p = 0.89	z = −1.34, p = 0.36	z = −2.67, p = 0.03	z = −2.32, p = 0.04
CNN vs. $BSVM_{lin}$	z = −2.32, p = 0.04	z = −1.167, p = 0.23	z = −0.15, p = 1.34	z = −2.89, p = 0.03	z = −2.56, p = 0.03
CNN vs. $BSVM_{rbf}$	z = −2.56, p = 0.03	z = −0.84, p = 0.78	z = −1.67, p = 0.37	z = −2.89, p = 0.03	z = −2.32, p = 0.04

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
