Article

Transfer EEG Emotion Recognition by Combining Semi-Supervised Regression with Bipartite Graph Label Propagation

1 School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China
2 Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province, Hangzhou 310018, China
* Author to whom correspondence should be addressed.
Systems 2022, 10(4), 111; https://doi.org/10.3390/systems10040111
Submission received: 29 June 2022 / Revised: 25 July 2022 / Accepted: 26 July 2022 / Published: 29 July 2022
(This article belongs to the Section Systems Practice in Social Science)

Abstract

Individual differences often appear in electroencephalography (EEG) data collected from different subjects because EEG is weak, nonstationary, and has a low signal-to-noise ratio. This causes many machine learning methods to generalize poorly, because the independent and identically distributed assumption is no longer valid in cross-subject EEG data. To this end, transfer learning has been introduced to alleviate the data distribution difference between subjects. However, most of the existing methods have focused only on domain adaptation and failed to achieve effective collaboration with label estimation. In this paper, an EEG feature transfer method combined with semi-supervised regression and bipartite graph label propagation (TSRBG) is proposed to realize the unified joint optimization of EEG feature distribution alignment and semi-supervised joint label estimation. Through cross-subject emotion recognition experiments on the SEED-IV data set, the results show that (1) TSRBG has significantly better recognition performance in comparison with state-of-the-art models; (2) the EEG feature distribution differences between subjects are significantly minimized in the learned shared subspace, indicating the effectiveness of domain adaptation; (3) the key EEG frequency bands and channels for cross-subject EEG emotion recognition are identified by investigating the learned subspace, which provides more insights into the study of EEG emotion activation patterns.

1. Introduction

In 1964, Michael Beldoch first introduced the idea of Emotional Intelligence (EI) in [1], which examined three modes of communication (i.e., vocal, musical, and graphic) to identify nonverbal emotional expressions. In 1990, Salovey and Mayer formally put forward the concept of EI and considered emotional intelligence an important component of artificial intelligence in addition to logical intelligence [2]. The key to EI is that machines can recognize the emotional state of humans automatically and accurately. Endowing machines with EI is indispensable to natural human–machine interaction, which makes machines more humanized in communication [3,4]. In addition, endowing machines with EI has great impacts in many fields such as artificial intelligence emotional nursing, human health, and patient monitoring [5]. Emotion is a state that integrates people’s feelings, thoughts, and behaviors. It includes not only people’s psychological response to the external environment or self-stimulation, but also the physiological response accompanying this psychological response [6]. Compared with widely used data modalities such as image, video, speech, and text [7,8,9], EEG has unique advantages such as high temporal resolution. In addition, EEG is difficult to camouflage in emotion recognition since it is directly generated from the neural activities of the central nervous system [10]. Therefore, EEG is widely used in the field of objective emotion recognition [11] and in other brain–computer interface paradigms [12].
Nowadays, with the continuous development of computer technology, bioscience, neuroscience, and other disciplines, EEG-based emotion recognition has more and more potential applications in diverse fields such as healthcare, education, entertainment, and neuromarketing [13,14,15]. Meanwhile, researchers have been paying continuous attention to it, and many machine learning or deep learning models for EEG-based emotion recognition have been proposed. Murugappan et al. mixed the EEG samples of subjects across four emotional states, i.e., happiness, fear, disgust, and surprise, and divided them by the fuzzy c-means clustering method. After that, the samples with similar characteristics were identified by looking for the inherent characteristics of the categories themselves [16]. Thejaswini et al. extracted EEG time-frequency features and then performed emotion recognition by a channel fusion method together with a channel-wise supervised SVM classifier [17]. The experimental results of [18] verified the possibility of exploring robust EEG features in cross-subject emotion recognition. Ali et al. proposed to decompose EEG signals via multivariate empirical mode decomposition (MEMD), and then employed deep learning methods to classify different emotional states [19]. Deep learning models, which are widely used in subject-independent EEG emotion recognition, generally obtain better performance; however, their results are usually difficult to interpret due to the black-box training mode [20,21]. In [22], deep networks were used to simultaneously minimize the recognition error on source data and force the latent representation similarity (LRS) of source and target data to be similar. To reduce the risk of negative transfer, a transferable attention neural network was proposed to learn the emotional discriminative information by highlighting the transferable brain-region data and samples through local and global attention mechanisms [23]. According to the emotional brain’s asymmetries between the left and right hemispheres, EEG data of both hemispheres are separately mapped into discriminative feature spaces [24,25]. Zheng et al. first introduced the deep belief network into EEG-based emotion recognition to classify three states, i.e., positive, neutral, and negative. Although they studied the key frequency bands and channels of EEG emotion recognition [26], the underlying mechanism was still not intuitive enough. In subsequent studies, it was found that the Gamma frequency band is the most important one in emotion recognition [27]. Recent advances in EEG-based emotion recognition can be found in [5,28,29,30].
Though EEG can objectively and accurately describe the emotional state of subjects, it is typically weak and nonstationary. Therefore, EEG data collected from different subjects under the same emotional state might have considerable discrepancies due to distinctions in individual physiology and psychology [5], leading to the poor performance of traditional machine learning methods in cross-subject EEG emotion recognition. To solve this problem, the concept of Transfer Learning is introduced to reduce the differences between cross-subject EEG data and to improve the universality of affective brain–computer interface systems [31,32]. Its basic idea is to use the knowledge of an auxiliary domain to facilitate the emotion recognition task of the target domain. The feature transformation-based transfer learning method is the most widely used one among the existing models; it aims to project the features of the source and target domain data into a subspace where the between-domain data distribution difference is minimized. Zheng et al. proposed early on to build personalized EEG-based emotion recognition models using transfer learning; in [33], knowledge transfer by both feature transformation and model parameter sharing was tested for cross-subject emotion recognition. Zhou et al. proposed a novel transfer learning framework with prototypical representation-based pairwise learning to characterize EEG data with prototypical representations. The learned prototypical representations exhibit high feature concentration within a single emotion category and high feature separability across different emotion categories. They finally formulated the EEG-based emotion recognition task as pairwise learning [34]. Bahador et al. proposed to extract spectral features from the collected 10-channel EEG data through a pre-trained network to quantify the direct influence among channels. The spectral-phase information of EEG data was encoded into a bi-dimensional map, which was further used to perform knowledge transfer by characterizing the propagation patterns from one channel to the others [35].
Although transfer learning has been widely used in EEG-based emotion recognition to align the EEG data from different subjects [36], most existing studies simply place the emphasis on domain-invariant feature learning and recognition accuracy, treating label estimation as a separate downstream step. Therefore, it is necessary to jointly optimize the label estimation process in combination with domain-invariant feature learning. In [22], neural networks were used to simultaneously minimize the recognition error on source data and force the latent representations of source and target data to be similar. Ding et al. constructed an undirected graph to characterize the source and target sample connections, based on which the transfer feature distribution alignment process is optimized together with the graph-based semi-supervised label propagation task [37]. However, this graph was constructed from the original-space data and is not dynamically updated during model optimization; therefore, it cannot well describe the sample connections between the two domains. In addition to the recognition accuracy, most existing studies only visualized the aligned distributions of source and target EEG data and did not sufficiently investigate the properties of the learned shared subspace in emotion expression [22,38,39].
In view of the above shortcomings, this paper proposes an EEG transfer emotion recognition method combining semi-supervised regression with bipartite-graph label propagation. Compared with the existing studies, the present work makes the following contributions.
  • The semi-supervised label propagation method based on sample-feature bipartite graph and semi-supervised regression method are combined to form a unified framework for joint common subspace optimization and emotion recognition. We first achieve better data feature distribution alignment through EEG feature transfer, based on which we then construct a better sample-feature bipartite graph and sample-label mapping matrix to promote the estimation of EEG emotional state in the target domain;
  • The EEG emotional state in the target domain is estimated by a bi-model fusion strategy. First, a sample-feature bipartite graph is constructed based on the premise that similar samples have similar feature distributions. This graph is used to characterize the sample-feature connections between the source and the target domain for label propagation, as shown by the ‘Bi-graph label propagation’ part of Figure 1. Furthermore, a semi-supervised regression is used to learn a mapping matrix to describe the intra-domain connections between samples and labels, which aims to estimate the EEG emotional state of the target domain. By fusing both models, the EEG emotional state of the target domain is estimated based on the premise that samples from the same emotional state should share similar feature distributions;
  • We explore the EEG emotion activation patterns from the learned common subspace shared by source and target domains, which is based on the rationale that the subspace should retain the common features of the source and the target domain and inhibit the non-common features. We measure the importance of each EEG feature dimension by the normalized $\ell_2$-norm of the corresponding row of the projection matrix. Based on the coupling correspondence between EEG features and the frequency bands and channels, the importance of frequency bands and brain regions in EEG emotion recognition is quantified.
Notations. In this paper, the EEG frequency bands are represented by Delta, Theta, Alpha, Beta, and Gamma. Greek letters such as $\alpha$, $\lambda$ represent the model parameters. Matrices and vectors are denoted by boldface uppercase and lowercase letters, respectively. The $\ell_{2,1}$-norm of a matrix $\mathbf{A}\in\mathbb{R}^{r\times c}$ is defined as $\|\mathbf{A}\|_{2,1}=\sum_{i=1}^{r}\sqrt{\sum_{j=1}^{c}a_{ij}^2}=\sum_{i=1}^{r}\|\mathbf{a}^i\|_2$, where $\mathbf{a}^i$ is the i-th row of $\mathbf{A}$.
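For concreteness, a minimal NumPy sketch of this norm (our own illustration, not code from the original work) is given below.

```python
import numpy as np

def l21_norm(A):
    """l2,1-norm of A: the sum of the l2-norms of its rows."""
    return np.sqrt((A ** 2).sum(axis=1)).sum()

A = np.array([[3.0, 4.0],
              [0.0, 5.0]])
print(l21_norm(A))  # 5.0 + 5.0 = 10.0
```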

2. Methodology

In this section, we first introduce the model formulation of TSRBG and then its optimization algorithm.

2.1. Problem Definition

Suppose that the labeled EEG samples from one subject, $\{\mathbf{X}_s,\mathbf{Y}_s\}=\{(\mathbf{x}_{si},\mathbf{y}_{si})\}_{i=1}^{n_s}$, define the source domain $\mathcal{D}_s$, and the unlabeled EEG samples from another subject, $\{\mathbf{X}_t\}=\{\mathbf{x}_{tj}\}_{j=1}^{n_t}$, form the target domain $\mathcal{D}_t$, where $\mathbf{X}_s\in\mathbb{R}^{d\times n_s}$, $\mathbf{X}_t\in\mathbb{R}^{d\times n_t}$, and $\mathbf{Y}_s\in\mathbb{R}^{n_s\times c}$. $\mathbf{x}_{si}\in\mathbb{R}^{d}$ and $\mathbf{x}_{tj}\in\mathbb{R}^{d}$ are, respectively, the i-th and j-th samples in the source and target domains. $\mathbf{y}_{si}|_{i=1}^{n_s}\in\mathbb{R}^{1\times c}$ is the one-hot encoded label vector of the i-th source sample, d is the feature dimension, c is the number of emotional states, $n_s$ and $n_t$ are the numbers of samples in the source and target domains, respectively, and $n=n_s+n_t$ is the total number of samples in both domains. The feature space and label space of both domains are the same, i.e., $\mathcal{X}_s=\mathcal{X}_t$ and $\mathcal{Y}_s=\mathcal{Y}_t$; however, their marginal and conditional distributions are different due to the individual differences of EEG, i.e., $P_s(\mathbf{X}_s)\neq P_t(\mathbf{X}_t)$ and $P_s(\mathbf{Y}_s|\mathbf{X}_s)\neq P_t(\mathbf{Y}_t|\mathbf{X}_t)$.
As shown in Figure 1, we propose a joint method for EEG emotion recognition. The model consists of two parts: domain adaptation and semi-supervised joint label estimation. Below, we introduce them in detail.

2.2. Domain Alignment

Suppose that the distribution differences of source and target EEG data can be minimized in their subspace representations. We measure the marginal and conditional distribution differences between the source and target domain subspace data through the Maximum Mean Discrepancy (MMD) criterion [40]. In detail, we project the source and target domain data into respective subspaces by two matrices; that is, we define $\mathbf{P}_s\in\mathbb{R}^{d\times p}$ as the projection matrix of the source domain and $\mathbf{P}_t\in\mathbb{R}^{d\times p}$ as that of the target domain, where p ($p\ll d$) is the subspace dimensionality. Then, the projected data of the two domains can be represented as $\mathbf{P}_s^T\mathbf{X}_s$ and $\mathbf{P}_t^T\mathbf{X}_t$, respectively. Marginal distribution alignment can be achieved by minimizing the distance between the sample means of the two domains, that is,
$$M_{dist}(\mathbf{P}_s,\mathbf{P}_t)=\left\|\frac{1}{n_s}\sum_{i=1}^{n_s}\mathbf{P}_s^T\mathbf{x}_{si}-\frac{1}{n_t}\sum_{j=1}^{n_t}\mathbf{P}_t^T\mathbf{x}_{tj}\right\|_2^2=\left\|\mathbf{P}_s^T\mathbf{X}_s\frac{\mathbf{1}_{n_s}}{n_s}-\mathbf{P}_t^T\mathbf{X}_t\frac{\mathbf{1}_{n_t}}{n_t}\right\|_2^2. \quad (1)$$
Similarly, conditional distribution alignment aims to minimize the distance between the sample means belonging to the same class of the two domains, that is,
$$C_{dist}(\mathbf{P}_{s/t},\mathbf{F}_t)=\sum_{k=1}^{c}\left\|\frac{1}{n_s^k}\sum_{i=1}^{n_s^k}\mathbf{P}_s^T\mathbf{x}_{si}-\frac{1}{n_t^k}\sum_{j=1}^{n_t^k}f_t(k,j)\,\mathbf{P}_t^T\mathbf{x}_{tj}\right\|_2^2=\left\|\mathbf{P}_s^T\mathbf{X}_s\mathbf{Y}_s\mathbf{N}_s-\mathbf{P}_t^T\mathbf{X}_t\mathbf{F}_t\mathbf{N}_t\right\|_2^2, \quad (2)$$
where $n_s^k$ and $n_t^k$ denote the numbers of samples belonging to the k-th ($k=1,\ldots,c$) emotional state in the source and target domains, respectively. $\mathbf{1}_{n_s}\in\mathbb{R}^{n_s}$ and $\mathbf{1}_{n_t}\in\mathbb{R}^{n_t}$ are all-one column vectors. $f_t(k,j)>0$ (with $\sum_{k=1}^{c}f_t(k,j)=1$) denotes the probability that the j-th target domain sample belongs to the k-th emotional state category. $\mathbf{N}_s$ ($\mathbf{N}_t$) is the diagonal matrix whose k-th diagonal element is $1/n_s^k$ ($1/n_t^k$). However, the label information of target domain data is not available. Here, we utilize the probability class adaptive formula [37] to estimate the target domain labels, which we collect in $\mathbf{F}_t\in\mathbb{R}^{n_t\times c}$.
For simplicity, we combine $M_{dist}$ and $C_{dist}$ with the same weight. Thus, the joint distribution alignment is formulated as
$$Dist=M_{dist}+C_{dist}. \quad (3)$$
For clarity, we rewrite (3) in matrix form as
$$Dist=\min_{\mathbf{P}_{s/t},\mathbf{F}_t}\left\|\mathbf{P}_s^T\mathbf{X}_s\bar{\mathbf{Y}}_s\bar{\mathbf{N}}_s-\mathbf{P}_t^T\mathbf{X}_t\bar{\mathbf{F}}_t\bar{\mathbf{N}}_t\right\|_F^2 \quad \mathrm{s.t.}\ \mathbf{P}_{s/t}^T\mathbf{X}_{s/t}\mathbf{H}_{s/t}\mathbf{X}_{s/t}^T\mathbf{P}_{s/t}=\mathbf{I}_p, \quad (4)$$
where $\mathbf{H}_{s/t}=\mathbf{I}_{n_{s/t}}-\frac{1}{n_{s/t}}\mathbf{1}_{n_{s/t}}\mathbf{1}_{n_{s/t}}^T$ is the centralization matrix, $\mathbf{I}_{n_{s/t}}\in\mathbb{R}^{n_{s/t}\times n_{s/t}}$ is the identity matrix, $\bar{\mathbf{Y}}_s=[\mathbf{1}_{n_s},\mathbf{Y}_s]\in\mathbb{R}^{n_s\times(c+1)}$ and $\bar{\mathbf{F}}_t=[\mathbf{1}_{n_t},\mathbf{F}_t]\in\mathbb{R}^{n_t\times(c+1)}$ are the extended label matrices, and $\bar{\mathbf{N}}_{s/t}=\mathrm{diag}(1/n_{s/t},\mathbf{N}_{s/t})\in\mathbb{R}^{(c+1)\times(c+1)}$. Additionally, to avoid too much divergence between the source and target domains in the projecting process, we minimize the distance between the two projection matrices by
$$\min_{\mathbf{P}_s,\mathbf{P}_t}\|\mathbf{P}_s-\mathbf{P}_t\|_{2,1}. \quad (5)$$
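To make the alignment terms concrete, the sketch below (our own illustration with assumed array shapes, not the authors' implementation) evaluates the marginal MMD term of Equation (1) and the projection divergence of Equation (5).

```python
import numpy as np

def marginal_mmd(Ps, Pt, Xs, Xt):
    """Squared distance between the projected domain means, Eq. (1).
    Xs: (d, ns), Xt: (d, nt), Ps/Pt: (d, p)."""
    mean_s = Ps.T @ Xs.mean(axis=1)    # projected source-domain mean, shape (p,)
    mean_t = Pt.T @ Xt.mean(axis=1)    # projected target-domain mean, shape (p,)
    return float(np.sum((mean_s - mean_t) ** 2))

def projection_divergence(Ps, Pt):
    """l2,1-norm of Ps - Pt, Eq. (5), which keeps the two projections close."""
    return float(np.sqrt(((Ps - Pt) ** 2).sum(axis=1)).sum())
```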

2.3. Label Estimation

We reduce the divergence between the source and the target domain by Equation (4) and simultaneously expect that better target labels can be calculated. In order to describe the target domain label estimation process from two aspects, we use a bi-model fusion method to estimate the target domain labels. On the one hand, a semi-supervised label propagation method based on the sample-feature bipartite graph is used for emotional state estimation; the graph is constructed by characterizing the connections among EEG features and samples. On the other hand, a semi-supervised regression method is used to estimate the EEG emotional state in the target domain. The two models are adaptively balanced to achieve more accurate target domain label estimation.

2.3.1. Bipartite Label Propagation

The semi-supervised label propagation method based on a sample-feature bipartite graph is used to estimate the labels of target domain samples, which has the following formulation
$$\min_{\mathbf{G},\mathbf{F}_t}\|\mathbf{S}-\mathbf{A}\|_F^2+\lambda\,\mathrm{Tr}(\mathbf{Y}^T\mathbf{L}\mathbf{Y}), \quad (6)$$
where $\mathbf{A}=[\mathbf{0}_n,\mathbf{B};\mathbf{B}^T,\mathbf{0}_p]\in\mathbb{R}^{(n+p)\times(n+p)}$ is the bipartite graph similarity matrix, $\mathbf{0}_n\in\mathbb{R}^{n\times n}$ and $\mathbf{0}_p\in\mathbb{R}^{p\times p}$ are all-zero matrices, and $\mathbf{B}\in\mathbb{R}^{n\times p}$ is the sample-feature similarity matrix determined by both source and target data in their subspace representations. Based on matrix $\mathbf{B}$, we expect to learn a better bipartite graph similarity matrix $\mathbf{G}\in\mathbb{R}^{n\times p}$, and then we can form the corresponding matrix $\mathbf{S}=[\mathbf{0}_n,\mathbf{G};\mathbf{G}^T,\mathbf{0}_p]\in\mathbb{R}^{(n+p)\times(n+p)}$ with respect to $\mathbf{A}$. $\lambda$ is a regularization parameter, $\mathbf{Y}=[\mathbf{Y}_s;\mathbf{F}_t;\mathbf{F}_d]\in\mathbb{R}^{(n+p)\times c}$ is the label matrix including the sample label matrix $\mathbf{F}=[\mathbf{Y}_s;\mathbf{F}_t]\in\mathbb{R}^{n\times c}$ and the feature label matrix $\mathbf{F}_d\in\mathbb{R}^{p\times c}$ for the subspace features, $\mathbf{L}=\mathbf{D}-\mathbf{S}\in\mathbb{R}^{(n+p)\times(n+p)}$ is the graph Laplacian matrix, and $\mathbf{D}=[\mathbf{D}_1,\mathbf{0}_{n\times p};\mathbf{0}_{p\times n},\mathbf{D}_2]\in\mathbb{R}^{(n+p)\times(n+p)}$ is a diagonal matrix whose diagonal elements are $d_{ii}|_{i=1}^{n+p}=\sum_{j=1}^{n+p}s_{ij}$, where $\mathbf{0}_{n\times p}\in\mathbb{R}^{n\times p}$ and $\mathbf{0}_{p\times n}\in\mathbb{R}^{p\times n}$ are all-zero matrices and $s_{ij}$ is the element in row i and column j of matrix $\mathbf{S}$. $\mathrm{Tr}(\cdot)$ is the trace of a matrix.
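The following sketch (array shapes and the choice of B are assumptions for illustration, not the authors' code) assembles the bipartite adjacency matrix A, the degree matrix D, and the graph Laplacian L used in Equation (6).

```python
import numpy as np

def bipartite_laplacian(B):
    """Given a sample-feature similarity matrix B of shape (n, p), build the
    (n+p) x (n+p) bipartite adjacency A = [0, B; B^T, 0], its degree matrix D,
    and the graph Laplacian L = D - A."""
    n, p = B.shape
    A = np.block([[np.zeros((n, n)), B],
                  [B.T, np.zeros((p, p))]])
    D = np.diag(A.sum(axis=1))
    return A, D, D - A
```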

2.3.2. Semi-Supervised Regression

For the semi-supervised regression method in target domain label estimation, we have the formulation
$$\min_{\mathbf{W},\mathbf{F}_t,\mathbf{b}}\left\|\mathbf{X}_{new}^T\mathbf{W}-\mathbf{F}+\mathbf{1}\mathbf{b}^T\right\|_F^2+\gamma\|\mathbf{W}\|_{2,1}^2, \quad (7)$$
where $\mathbf{W}\in\mathbb{R}^{p\times c}$ is the sample-label mapping matrix, $\gamma$ is a regularization parameter, $\mathbf{X}_{new}\in\mathbb{R}^{p\times n}$ is the subspace data representation, and $\mathbf{b}\in\mathbb{R}^{1\times c}$ is the offset variable. $\|\cdot\|_{2,1}^2$ represents the squared $\ell_{2,1}$-norm.

2.3.3. Fused Label Estimation Model

Based on the above analysis in Section 2.3.1 and Section 2.3.2, we combine the two models in (6) and (7) and obtain the fused objective function for target domain label estimation as
$$\min_{\mathbf{W},\mathbf{b},\mathbf{G},\mathbf{F}_t}\alpha\left(\left\|\mathbf{X}_{new}^T\mathbf{W}-\mathbf{F}+\mathbf{1}\mathbf{b}^T\right\|_F^2+\gamma\|\mathbf{W}\|_{2,1}^2\right)+\beta\left(\|\mathbf{S}-\mathbf{A}\|_F^2+\lambda\,\mathrm{Tr}(\mathbf{Y}^T\mathbf{L}\mathbf{Y})\right) \quad \mathrm{s.t.}\ \mathbf{G}\geq 0,\ \mathbf{G}\mathbf{1}_p=\mathbf{1}_n,\ \mathbf{F}_t\geq 0,\ \mathbf{F}_t\mathbf{1}_c=\mathbf{1}_{n_t}, \quad (8)$$
where $\alpha$ and $\beta$ are regularization parameters, and $\mathbf{1}_p$, $\mathbf{1}_n$, $\mathbf{1}_c$, $\mathbf{1}_{n_t}$ are all-one column vectors of dimensions $\mathbb{R}^{p\times 1}$, $\mathbb{R}^{n\times 1}$, $\mathbb{R}^{c\times 1}$, and $\mathbb{R}^{n_t\times 1}$, respectively.

2.4. Overall Objective Function

As stated previously, we jointly optimize domain adaptation and semi-supervised joint label estimation. On the one hand, domain adaptation effectively reduces the differences in EEG data feature distribution among subjects and provides well-aligned data for joint label estimation; on the other hand, a better target domain label can promote the alignment of conditional distributions of source and target domains. Therefore, we combine them in a unified framework and finally obtain the objective function of TSRBG as
$$\begin{aligned}\min\ &\left\|\mathbf{P}_s^T\mathbf{X}_s\bar{\mathbf{Y}}_s\bar{\mathbf{N}}_s-\mathbf{P}_t^T\mathbf{X}_t\bar{\mathbf{F}}_t\bar{\mathbf{N}}_t\right\|_F^2+\alpha\left\|\mathbf{X}^T\mathbf{P}\mathbf{W}-\mathbf{F}+\mathbf{1}\mathbf{b}^T\right\|_F^2+\gamma\left(\|\mathbf{W}\|_{2,1}^2+\|\mathbf{P}_s-\mathbf{P}_t\|_{2,1}\right)+\beta\|\mathbf{G}-\mathbf{B}\|_F^2+\lambda\,\mathrm{Tr}(\mathbf{Y}^T\mathbf{L}_s\mathbf{Y})\\ \mathrm{s.t.}\ &\mathbf{P}_{s/t}^T\mathbf{X}_{s/t}\mathbf{H}_{s/t}\mathbf{X}_{s/t}^T\mathbf{P}_{s/t}=\mathbf{I}_p,\ \mathbf{G}\geq 0,\ \mathbf{G}\mathbf{1}=\mathbf{1},\ \mathbf{F}\geq 0,\ \mathbf{F}\mathbf{1}=\mathbf{1},\end{aligned} \quad (9)$$
where α , β , γ , λ are the regularization parameters.

2.5. Optimization

There are seven variables in Equation (9): the mapping matrix $\mathbf{W}$, the offset vector $\mathbf{b}$, the source domain projection matrix $\mathbf{P}_s$, the target domain projection matrix $\mathbf{P}_t$, the sample-feature similarity matrix $\mathbf{G}$, the feature label matrix $\mathbf{F}_d$, and the target domain label matrix $\mathbf{F}_t$. We propose to update one variable while fixing the others. The detailed updating rule for each variable is derived below.
  • Update $\mathbf{W}$. The objective function in terms of variable $\mathbf{W}$ is
$$\min_{\mathbf{W}}\alpha\left\|\mathbf{X}^T\mathbf{P}\mathbf{W}-\mathbf{F}+\mathbf{1}\mathbf{b}^T\right\|_F^2+\gamma\|\mathbf{W}\|_{2,1}^2. \quad (10)$$
There are four variables, $\mathbf{P}$, $\mathbf{W}$, $\mathbf{b}$, $\mathbf{F}_t$, involved in Equation (10). We need to initialize these variables apart from $\mathbf{W}$. For the target domain label matrix $\mathbf{F}_t$, we utilize the probability class adaptive formula [37] to estimate the target domain labels, and the initial value of each element is $\frac{1}{c}$, where c is the number of emotional state categories. For the subspace projection matrices $\mathbf{P}=[\mathbf{P}_s,\mathbf{P}_t]$, we initialize them by Principal Component Analysis (PCA) [41] on the original EEG data.
Taking the derivative of Equation (10) w.r.t. $\mathbf{b}$ and setting it to zero, we have
$$\mathbf{b}=\frac{1}{n}\left(\mathbf{F}^T\mathbf{1}-\mathbf{W}^T\mathbf{P}^T\mathbf{X}\mathbf{1}\right). \quad (11)$$
By substituting Equation (11) into (10), we obtain
$$\min_{\mathbf{W}}\left\|\mathbf{H}\mathbf{X}^T\mathbf{P}\mathbf{W}-\mathbf{H}\mathbf{F}\right\|_F^2+\frac{\gamma}{\alpha}\|\mathbf{W}\|_{2,1}^2, \quad (12)$$
where $\mathbf{H}=\mathbf{I}_n-\frac{1}{n}\mathbf{1}_n\mathbf{1}_n^T\in\mathbb{R}^{n\times n}$ is the centralization matrix, $\mathbf{I}_n\in\mathbb{R}^{n\times n}$ is the identity matrix, and $\mathbf{1}_n\in\mathbb{R}^{n}$ is an all-one column vector.
Constructing the Lagrange function of Equation (12) with respect to $\mathbf{W}$, we have
$$\mathcal{L}(\mathbf{W})=\left\|\mathbf{H}\mathbf{X}^T\mathbf{P}\mathbf{W}-\mathbf{H}\mathbf{F}\right\|_F^2+\frac{\gamma}{\alpha}\mathrm{Tr}(\mathbf{W}^T\mathbf{Q}\mathbf{W}), \quad (13)$$
where $\mathbf{Q}\in\mathbb{R}^{p\times p}$ is a diagonal matrix whose i-th diagonal element is
$$q_{ii}=\frac{\sum_{j=1}^{p}\sqrt{\|\mathbf{w}^j\|_2^2+\epsilon}}{\sqrt{\|\mathbf{w}^i\|_2^2+\epsilon}}, \quad (14)$$
$\epsilon$ is a fixed small constant, $\mathbf{w}^i\in\mathbb{R}^{1\times c}$ is the i-th row vector of $\mathbf{W}$, and $\|\cdot\|_2^2$ represents the squared $\ell_2$-norm.
Taking the derivative of Equation (13) w.r.t. $\mathbf{W}$ and setting it to zero, we obtain
$$\mathbf{W}=\left(\mathbf{P}^T\mathbf{X}\mathbf{H}\mathbf{X}^T\mathbf{P}+\frac{2\gamma}{\alpha}\mathbf{Q}\right)^{-1}\left(\mathbf{P}^T\mathbf{X}\mathbf{H}\mathbf{F}\right). \quad (15)$$
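A minimal sketch of this closed-form update and of the reweighting matrix $\mathbf{Q}$ in Equation (14) is given below (array shapes and helper names are our own assumptions, not the authors' implementation).

```python
import numpy as np

def update_W(P, X, F, H, Q, alpha, gamma):
    """Closed-form update of the mapping matrix W, Eq. (15).
    X: stacked data (D, n), P: stacked projection (D, p), F: labels (n, c),
    H: centralization matrix (n, n), Q: diagonal reweighting matrix (p, p)."""
    XP = X.T @ P                                        # (n, p)
    A = XP.T @ H @ XP + (2.0 * gamma / alpha) * Q       # (p, p)
    return np.linalg.solve(A, XP.T @ H @ F)             # (p, c)

def update_Q(W, eps=1e-8):
    """Diagonal reweighting matrix for the squared l2,1-norm, Eq. (14)."""
    row_norms = np.sqrt((W ** 2).sum(axis=1) + eps)
    return np.diag(row_norms.sum() / row_norms)
```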
  • Update $\mathbf{P}$. The objective function in terms of variable $\mathbf{P}$ is
$$\min\left\|\mathbf{P}_s^T\mathbf{X}_s\bar{\mathbf{Y}}_s\bar{\mathbf{N}}_s-\mathbf{P}_t^T\mathbf{X}_t\bar{\mathbf{F}}_t\bar{\mathbf{N}}_t\right\|_F^2+\alpha\left\|\mathbf{X}^T\mathbf{P}\mathbf{W}-\mathbf{F}+\mathbf{1}\mathbf{b}^T\right\|_F^2+\gamma\|\mathbf{P}_s-\mathbf{P}_t\|_{2,1}. \quad (16)$$
First, we need to convert the $\ell_{2,1}$-norm into trace form. Similar to matrix $\mathbf{Q}$, we define $\mathbf{M}=[\mathbf{M}_0,-\mathbf{M}_0;-\mathbf{M}_0,\mathbf{M}_0]\in\mathbb{R}^{2d\times 2d}$, where $\mathbf{M}_0\in\mathbb{R}^{d\times d}$ is a diagonal matrix with its i-th diagonal element
$$m_{ii}=\frac{1}{\left\|(\mathbf{P}_s-\mathbf{P}_t)^i\right\|_2}. \quad (17)$$
Here $(\mathbf{P}_s-\mathbf{P}_t)^i$ is the i-th row vector of $(\mathbf{P}_s-\mathbf{P}_t)$ and $\|\cdot\|_2$ represents the $\ell_2$-norm. By defining
$$\mathbf{T}=\begin{bmatrix}\mathbf{X}_s\bar{\mathbf{Y}}_s\bar{\mathbf{N}}_s\bar{\mathbf{N}}_s^T\bar{\mathbf{Y}}_s^T\mathbf{X}_s^T & -\mathbf{X}_s\bar{\mathbf{Y}}_s\bar{\mathbf{N}}_s\bar{\mathbf{N}}_t^T\bar{\mathbf{F}}_t^T\mathbf{X}_t^T\\ -\mathbf{X}_t\bar{\mathbf{F}}_t\bar{\mathbf{N}}_t\bar{\mathbf{N}}_s^T\bar{\mathbf{Y}}_s^T\mathbf{X}_s^T & \mathbf{X}_t\bar{\mathbf{F}}_t\bar{\mathbf{N}}_t\bar{\mathbf{N}}_t^T\bar{\mathbf{F}}_t^T\mathbf{X}_t^T\end{bmatrix}\in\mathbb{R}^{2d\times 2d}, \quad (18)$$
we construct the Lagrangian function in terms of variable $\mathbf{P}$ as
$$\mathcal{L}(\mathbf{P})=\mathrm{Tr}(\mathbf{P}^T\mathbf{T}\mathbf{P})+\alpha\,\mathrm{Tr}(\mathbf{P}^T\mathbf{X}\mathbf{H}\mathbf{X}^T\mathbf{P}\mathbf{W}\mathbf{W}^T)-\alpha\,\mathrm{Tr}(\mathbf{P}^T\mathbf{X}\mathbf{H}\mathbf{F}\mathbf{W}^T)+\gamma\,\mathrm{Tr}(\mathbf{P}^T\mathbf{M}\mathbf{P}). \quad (19)$$
Taking the derivative of Equation (19) w.r.t. $\mathbf{P}$ and setting it to zero, we have
$$(\mathbf{X}\mathbf{H}\mathbf{X}^T)^{-1}(\mathbf{T}+\gamma\mathbf{M})\mathbf{P}+\mathbf{P}(\alpha\mathbf{W}\mathbf{W}^T)=(\mathbf{X}\mathbf{H}\mathbf{X}^T)^{-1}(\mathbf{X}\mathbf{H}\mathbf{F}\mathbf{W}^T). \quad (20)$$
Equation (20) has the form of a Sylvester equation [42] and can be solved with a standard Sylvester solver, yielding the source domain projection matrix $\mathbf{P}_s$ and the target domain projection matrix $\mathbf{P}_t$.
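As an illustration, Equation (20) can be passed directly to SciPy's Sylvester solver; the sketch below is our own reading of the equation, with a pseudo-inverse used for numerical safety.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def update_P(X, H, T, M, W, F, alpha, gamma):
    """Solve Eq. (20) as the Sylvester equation A P + P B = C for the stacked
    projection matrix P. X: stacked data (D, n); T, M: (D, D); W: (p, c); F: (n, c)."""
    XHX_inv = np.linalg.pinv(X @ H @ X.T)     # (X H X^T)^{-1}, pinv for stability
    A = XHX_inv @ (T + gamma * M)             # (D, D)
    B = alpha * (W @ W.T)                     # (p, p)
    C = XHX_inv @ (X @ H @ F @ W.T)           # (D, p)
    return solve_sylvester(A, B, C)           # P: (D, p)
```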
  • Update $\mathbf{G}$. The corresponding objective function is
$$\min\ \beta\|\mathbf{S}-\mathbf{A}\|_F^2+\lambda\,\mathrm{Tr}(\mathbf{Y}^T\mathbf{L}_s\mathbf{Y}) \quad \mathrm{s.t.}\ \mathbf{G}\geq 0,\ \mathbf{G}\mathbf{1}=\mathbf{1}. \quad (21)$$
We propose to solve $\mathbf{G}$ in a row-wise manner. Accordingly, we convert Equation (21) to
$$\beta\sum_{i=1}^{n}\sum_{j=1}^{p}(g_{ij}-b_{ij})^2+\lambda\sum_{i=1}^{n}\sum_{j=1}^{p}\left\|\mathbf{f}^i-\mathbf{f}_d^j\right\|_2^2\,g_{ij}, \quad (22)$$
where $g_{ij}$ and $b_{ij}$ are the $(i,j)$-elements of matrices $\mathbf{G}$ and $\mathbf{B}$, respectively, $\mathbf{f}^i$ is the i-th row vector of the label matrix $\mathbf{F}$, and $\mathbf{f}_d^j$ is the j-th row vector of matrix $\mathbf{F}_d$.
By defining $v_{ij}=\left\|\mathbf{f}^i-\mathbf{f}_d^j\right\|_2^2$ and completing the square with respect to $\mathbf{g}^i$, Equation (21) is equivalent to
$$\min_{\mathbf{g}^i}\left\|\mathbf{g}^i-\left(\mathbf{b}^i-\frac{\lambda}{2\beta}\mathbf{v}^i\right)\right\|_2^2 \quad \mathrm{s.t.}\ \mathbf{g}^i\geq 0,\ \mathbf{g}^i\mathbf{1}=1, \quad (23)$$
which is the Euclidean projection of $\mathbf{b}^i-\frac{\lambda}{2\beta}\mathbf{v}^i$ onto a simplex [43].
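Each row $\mathbf{g}^i$ can therefore be obtained with the standard sort-based simplex projection; the sketch below is a generic implementation of that projection, not the authors' code.

```python
import numpy as np

def project_to_simplex(u):
    """Euclidean projection of u onto the probability simplex {g >= 0, sum(g) = 1}."""
    srt = np.sort(u)[::-1]
    css = np.cumsum(srt) - 1.0
    idx = np.arange(1, len(u) + 1)
    rho = np.nonzero(srt - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(u - theta, 0.0)

def update_g_row(b_i, v_i, lam, beta):
    """Row-wise update of G, Eq. (23): project b^i - lambda/(2*beta) * v^i onto the simplex."""
    return project_to_simplex(b_i - lam / (2.0 * beta) * v_i)
```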
  • Update $\mathbf{F}_d$. The objective function in terms of variable $\mathbf{F}_d$ is
$$\min_{\mathbf{F}_d}\lambda\,\mathrm{Tr}(\mathbf{Y}^T\mathbf{L}_s\mathbf{Y}), \quad (24)$$
which can be decomposed into
$$\min_{\mathbf{F}_d}\lambda\,\mathrm{Tr}\left(\mathbf{F}_d^T\mathbf{D}_2\mathbf{F}_d-2\mathbf{F}_d^T\mathbf{G}^T\mathbf{F}\right). \quad (25)$$
Then, the Lagrangian function of Equation (25) is
$$\mathcal{L}(\mathbf{F}_d)=\lambda\,\mathrm{Tr}\left(\mathbf{F}_d^T\mathbf{D}_2\mathbf{F}_d-2\mathbf{F}_d^T\mathbf{G}^T\mathbf{F}\right). \quad (26)$$
Taking the derivative of Equation (26) w.r.t. $\mathbf{F}_d$ and setting it to zero, we have
$$\mathbf{F}_d=(\mathbf{D}_2)^{-1}\mathbf{G}^T\mathbf{F}. \quad (27)$$
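Because $\mathbf{D}_2$ is diagonal (its entries are the degrees of the feature nodes, i.e., the column sums of $\mathbf{G}$), this update reduces to a row-wise scaling, as in the short sketch below (our own illustration).

```python
import numpy as np

def update_feature_labels(G, F, eps=1e-12):
    """Closed-form update of the feature label matrix F_d, Eq. (27):
    F_d = D_2^{-1} G^T F, with D_2 the diagonal degree matrix of the feature nodes."""
    d2 = G.sum(axis=0) + eps          # degrees of the p feature nodes (column sums of G)
    return (G.T @ F) / d2[:, None]    # divide each row of G^T F by the matching degree
```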
  • Update $\mathbf{F}_t$. The objective function in terms of variable $\mathbf{F}_t$ is
$$\min\left\|\mathbf{P}_s^T\mathbf{X}_s\bar{\mathbf{Y}}_s\bar{\mathbf{N}}_s-\mathbf{P}_t^T\mathbf{X}_t\bar{\mathbf{F}}_t\bar{\mathbf{N}}_t\right\|_F^2+\alpha\left\|\mathbf{X}^T\mathbf{P}\mathbf{W}-\mathbf{F}+\mathbf{1}\mathbf{b}^T\right\|_F^2+\lambda\,\mathrm{Tr}(\mathbf{Y}^T\mathbf{L}_s\mathbf{Y}) \quad \mathrm{s.t.}\ \mathbf{F}\geq 0,\ \mathbf{F}\mathbf{1}=\mathbf{1}. \quad (28)$$
By some linear algebra transformations, the first term of Equation (28) can be reformulated as
$$\mathrm{Tr}\left(\mathbf{P}_t^T\mathbf{X}_t\mathbf{F}_t\mathbf{N}_t\mathbf{N}_t\mathbf{F}_t^T\mathbf{X}_t^T\mathbf{P}_t\right)-2\,\mathrm{Tr}\left(\mathbf{P}_s^T\mathbf{X}_s\mathbf{Y}_s\mathbf{N}_s\mathbf{N}_t\mathbf{F}_t^T\mathbf{X}_t^T\mathbf{P}_t\right). \quad (29)$$
Similarly, the last two terms of Equation (28) can be written as
$$\alpha\left(\mathrm{Tr}(\mathbf{F}_t^T\mathbf{H}_t\mathbf{F}_t)-2\,\mathrm{Tr}(\mathbf{F}_t^T\mathbf{H}_t\mathbf{X}_t^T\mathbf{P}_t\mathbf{W})\right)+\lambda\left(\mathrm{Tr}(\mathbf{F}_t^T\mathbf{D}_t\mathbf{F}_t)-2\,\mathrm{Tr}(\mathbf{F}_t^T\mathbf{G}_t^T\mathbf{F}_d)\right), \quad (30)$$
where $\mathbf{H}_t=\mathbf{I}_{n_t}-\frac{1}{n_t}\mathbf{1}_{n_t}\mathbf{1}_{n_t}^T$.
By constructing the Lagrangian function based on Equations (28)–(30), we have
$$\begin{aligned}\mathcal{L}(\mathbf{F}_t)=\ &\mathrm{Tr}\left(\mathbf{P}_t^T\mathbf{X}_t\mathbf{F}_t\mathbf{N}_t\mathbf{N}_t\mathbf{F}_t^T\mathbf{X}_t^T\mathbf{P}_t\right)-2\,\mathrm{Tr}\left(\mathbf{P}_s^T\mathbf{X}_s\mathbf{Y}_s\mathbf{N}_s\mathbf{N}_t\mathbf{F}_t^T\mathbf{X}_t^T\mathbf{P}_t\right)\\&+\alpha\left(\mathrm{Tr}(\mathbf{F}_t^T\mathbf{H}_t\mathbf{F}_t)-2\,\mathrm{Tr}(\mathbf{F}_t^T\mathbf{H}_t\mathbf{X}_t^T\mathbf{P}_t\mathbf{W})\right)+\lambda\left(\mathrm{Tr}(\mathbf{F}_t^T\mathbf{D}_t\mathbf{F}_t)-2\,\mathrm{Tr}(\mathbf{F}_t^T\mathbf{G}_t^T\mathbf{F}_d)\right)\\&+\mathrm{Tr}(\Phi\mathbf{F}_t)+\eta\left\|\mathbf{1}_{n_t}-\mathbf{F}_t\mathbf{1}_c\right\|_2^2.\end{aligned} \quad (31)$$
Taking the derivative of Equation (31) w.r.t. $\mathbf{F}_t$ and setting it to zero, we have
$$\mathbf{X}_t^T\mathbf{P}_t\mathbf{P}_t^T\mathbf{X}_t\mathbf{F}_t\mathbf{N}_t\mathbf{N}_t-\mathbf{X}_t^T\mathbf{P}_t\mathbf{P}_s^T\mathbf{X}_s\mathbf{Y}_s\mathbf{N}_s\mathbf{N}_t+(\alpha\mathbf{H}_t+\lambda\mathbf{D}_t)\mathbf{F}_t-\alpha\left(\mathbf{H}_t\mathbf{X}_t^T\mathbf{P}_t\mathbf{W}\right)-\lambda\mathbf{G}_t^T\mathbf{F}_d+\Phi-\eta\left(\mathbf{1}_{n_t}-\mathbf{F}_t\mathbf{1}_c\right)\mathbf{1}_c^T=\mathbf{0}. \quad (32)$$
To simplify the notations, we define
$$\begin{aligned}\mathbf{Z}_t&=\mathbf{X}_t^T\mathbf{P}_t\mathbf{P}_t^T\mathbf{X}_t\mathbf{F}_t\mathbf{N}_t\mathbf{N}_t+\alpha\mathbf{H}_t\mathbf{F}_t=\mathbf{Z}_t^{+}-\mathbf{Z}_t^{-},\\ \mathbf{Z}_s&=\mathbf{X}_t^T\mathbf{P}_t\mathbf{P}_s^T\mathbf{X}_s\mathbf{Y}_s\mathbf{N}_s\mathbf{N}_t+\alpha\left(\mathbf{H}_t\mathbf{X}_t^T\mathbf{P}_t\mathbf{W}\right)=\mathbf{Z}_s^{+}-\mathbf{Z}_s^{-},\end{aligned} \quad (33)$$
where $\mathbf{Z}_t^{+}$ and $\mathbf{Z}_s^{+}$ are obtained by replacing all negative elements of $\mathbf{Z}_t$ and $\mathbf{Z}_s$ with zero; similarly, $\mathbf{Z}_t^{-}$ and $\mathbf{Z}_s^{-}$ are obtained by replacing all positive elements with zero and taking the absolute values of the remaining negative elements.
Based on the Karush–Kuhn–Tucker (KKT) condition $\Phi\odot\mathbf{F}_t=\mathbf{0}$ (where $\odot$ is the Hadamard product), we have
$$\mathbf{F}_t=\frac{\mathbf{Z}_t^{-}+\mathbf{Z}_s^{+}+\lambda\mathbf{G}_t^T\mathbf{F}_d+\eta\mathbf{1}_{n_t\times c}}{\mathbf{Z}_t^{+}+\mathbf{Z}_s^{-}+\lambda\mathbf{D}_t\mathbf{F}_t+\eta\mathbf{F}_t\mathbf{1}_{c\times c}}\odot\mathbf{F}_t. \quad (34)$$
We summarize the optimization procedure of our proposed model TSRBG in Algorithm 1.
Algorithm 1 The procedure for the TSRBG framework
Input: Data and labels of the source domain $\{\mathbf{X}_s,\mathbf{Y}_s\}$; data of the target domain $\mathbf{X}_t$; subspace dimension p; parameters $\alpha$, $\lambda$, $\gamma$, and $\beta$.
Output: Sample-label mapping matrix $\mathbf{W}$; source domain projection matrix $\mathbf{P}_s$; target domain projection matrix $\mathbf{P}_t$; sample-feature similarity matrix $\mathbf{G}$; feature label matrix $\mathbf{F}_d$; target domain label matrix $\mathbf{F}_t$.
1: Initialize $\mathbf{P}_s$, $\mathbf{P}_t$ with PCA; target domain label matrix $\mathbf{F}_t=\frac{1}{c}\mathbf{1}_{n_t\times c}$; feature label matrix $\mathbf{F}_d=\frac{1}{c}\mathbf{1}_{p\times c}$;
2: while not converged do
3:   Compute $\mathbf{W}$ by Equation (15) and then update $\mathbf{Q}$;
4:   Compute the subspace projection matrix $\mathbf{P}$ by solving the Sylvester equation (20), split it to obtain the source and target domain projection matrices, and then compute $\mathbf{M}$;
5:   Update the sample-feature similarity matrix $\mathbf{G}$ by optimizing Equation (23) and then update $\mathbf{S}$ and the Laplacian matrix $\mathbf{L}=\mathbf{D}-\mathbf{S}$;
6:   Compute the feature label matrix $\mathbf{F}_d$ by Equation (27);
7:   Compute the target domain label matrix $\mathbf{F}_t$ by Equation (34);
8: end while

2.6. Computational Complexity

We assume that each elementary operation between matrix elements costs $O(1)$. The computational complexity of TSRBG consists of the following parts. We need $O(pn^2)$ to calculate $\mathbf{W}$ and $O(pc)$ to update $\mathbf{Q}$. When updating $\mathbf{P}$, the calculation of the Sylvester equation needs $O(d^3p^3+d^2p^2)$, and then $O(dp)$ is used to update $\mathbf{M}$. For $i\in[1,\ldots,n]$, the update of $\mathbf{g}^i$ costs $O(p)$, so the complexity of updating $\mathbf{G}$ is $O(np)$. For the label indicator matrices, $\mathbf{F}_d$ costs $O(p^2c+pnc)$ and $\mathbf{F}_t$ costs $O(n_t^2c+n_tc^2+n_tc+n_tpc)$. As a result, the overall computational complexity of TSRBG is $O(T(pn^2+d^3p^3+n_t^2c))$, where T is the number of iterations.

3. Experiments

3.1. Dataset

SEED-IV [44] is a video-evoked emotional EEG dataset provided by the Center for Brain-like Computing and Machine Intelligence, Shanghai Jiao Tong University. In SEED-IV, 72 movie clips with obvious emotional tendencies were used to evoke four emotional states (happiness, sadness, fear, and neutrality) in 15 subjects, and each subject participated in three sessions. In each session, each subject was asked to watch 24 movie clips; that is, every six movie clips correspond to one emotional state. EEG data were recorded by the ESI NeuroScan System with a 62-channel cap at a sampling frequency of 1000 Hz. To reduce the computational burden, the data were then down-sampled to 200 Hz. After band-pass filtering the EEG data to 1–50 Hz, the Differential Entropy (DE) feature was extracted from five EEG frequency bands: Delta (1–3 Hz), Theta (4–7 Hz), Alpha (8–13 Hz), Beta (14–30 Hz), and Gamma (31–50 Hz). The DE feature is defined as
$$h(X)=-\int_{X}p(x)\ln(p(x))\,dx, \quad (35)$$
where X is a random variable and $p(x)$ is the corresponding probability density function. Assuming that the collected EEG signals obey the Gaussian distribution $N(\mu,\sigma^2)$, the DE feature can be calculated by
$$h(X)=\int p(x)\left(\frac{1}{2}\ln(2\pi\sigma^2)+\frac{(x-\mu)^2}{2\sigma^2}\right)dx=\frac{1}{2}\ln(2\pi\sigma^2)+\frac{\mathrm{Var}(X)}{2\sigma^2}=\frac{1}{2}\ln(2\pi e\sigma^2). \quad (36)$$
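For illustration, DE features can be computed from band-filtered signals as in the sketch below; the Butterworth filtering and filter order are our own choices, since SEED-IV already ships precomputed DE features.

```python
import numpy as np
from scipy.signal import butter, filtfilt

BANDS = {"Delta": (1, 3), "Theta": (4, 7), "Alpha": (8, 13),
         "Beta": (14, 30), "Gamma": (31, 50)}

def differential_entropy(x):
    """DE of a band-filtered signal under the Gaussian assumption: 0.5*ln(2*pi*e*sigma^2)."""
    return 0.5 * np.log(2.0 * np.pi * np.e * np.var(x))

def de_features(eeg, fs=200):
    """DE per channel and band for one EEG segment; eeg: (n_channels, n_samples)."""
    feats = np.empty((eeg.shape[0], len(BANDS)))
    for k, (lo, hi) in enumerate(BANDS.values()):
        b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
        filtered = filtfilt(b, a, eeg, axis=1)
        feats[:, k] = [differential_entropy(ch) for ch in filtered]
    return feats
```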
The data format provided by SEED-IV is $62\times n\times 5$, where n is the number of EEG samples in each session. To be specific, there are 851, 832, and 822 samples in the three sessions, respectively. We reshape the DE features into $310\times n$ by concatenating the 62 channel values of the 5 frequency bands into one vector and then normalize them into [−1, 1] by row.
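The reshaping and row-wise scaling can be sketched as follows; the band-major stacking order is an assumption consistent with the feature-to-band correspondence used later in Equation (41), and the min-max scaling convention is our own.

```python
import numpy as np

def reshape_and_scale(de, eps=1e-12):
    """Reshape DE features from (62, n, 5) to (310, n) and scale each row to [-1, 1]."""
    C, n, B = de.shape                               # 62 channels, n samples, 5 bands
    X = de.transpose(2, 0, 1).reshape(C * B, n)      # band-major stacking -> (310, n)
    lo = X.min(axis=1, keepdims=True)
    hi = X.max(axis=1, keepdims=True)
    return 2.0 * (X - lo) / (hi - lo + eps) - 1.0
```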

3.2. Experimental Settings

We set up a cross-subject EEG emotion recognition task based on SEED-IV. For each session, the samples and labels of the first subject form the labeled source domain, and the samples of each of the other subjects form the unlabeled target domain. Therefore, for each session, we have 14 cross-subject tasks.
To evaluate the performance of TSRBG, we compare it with several methods including four non-deep transfer learning methods (Joint Distribution Adaptation (JDA) [45], Graph Adaptation Knowledge Transfer (GAKT) [37], Maximum Independent Domain Adaptation (MIDA) [24], Feature Selection Transfer Subspace Learning (FSTSL) [46]), one semi-supervised classification method (Structured Optimal Bipartite Graph learning (SOBG) [47]), and two deep learning methods (DGCNN [48] and LRS [22]). DGCNN is a deep learning method which uses the graph structure to depict the relationship of EEG channels. LRS is a deep transfer method to minimize the discrepancies of latent representations of source and target EEG data.
In the experiments, the parameters of each method are tuned as follows. For JDA, the linear kernel was used, the subspace dimension h was tuned from $\{10,20,\ldots,100\}$, and the parameter $\lambda$ was searched from $\{10^{-3},10^{-2},\ldots,10^{3}\}$. For GAKT, the subspace dimension p was tuned from $\{10,20,\ldots,100\}$ and the parameters $\lambda$ and $\alpha$ were searched from $\{10^{-3},10^{-2},\ldots,10^{3}\}$. For MIDA, the linear kernel was used, and the regularization parameter $\mu$ and kernel parameter $\gamma$ were searched from $\{10^{-3},10^{-2},\ldots,10^{3}\}$. For FSTSL, the parameters $\alpha$, $\beta$, $\gamma$ were tuned from $\{10^{-3},10^{-2},\ldots,10^{3}\}$. For SOBG, the parameters $\lambda$, $\eta$ were tuned from $\{10^{-3},10^{-2},\ldots,10^{3}\}$. In TSRBG, we tuned the parameters $\alpha$, $\beta$, $\gamma$, $\lambda$ from $\{10^{-3},10^{-2},\ldots,10^{3}\}$, and the subspace dimensionality was searched from $\{10,20,\ldots,100\}$.

3.3. Recognition Results and Analysis

The recognition accuracies of the above eight models on the cross-subject EEG emotional state recognition tasks of the three sessions are shown in Table 1, Table 2 and Table 3, respectively. In these tables, ‘sub2’ indicates that the samples from the first subject were used as the labeled source domain data while the samples from the second subject were used as the unlabeled target domain data, and so on; ‘Avg.’ represents the average accuracy over all 14 cross-subject cases in the session. We mark in bold the highest recognition accuracy of each emotion recognition case (each row of the tables).
According to the results shown in Table 1, Table 2 and Table 3, we draw the following observations.
  • TSRBG achieves better EEG emotional state recognition accuracy than the other compared models in most cases. The highest recognition accuracy, 88.58%, is obtained for the 15th subject of session 2. The average recognition accuracies of the three sessions, 72.83%, 76.49%, and 77.50%, respectively, are all higher than those of the other seven models. On the whole, this verifies that the proposed TSRBG model is effective.
  • By comparing the average recognition accuracies of the eight models in the three sessions, it can be found that jointly optimizing semi-supervised EEG emotional state estimation and EEG feature transfer alignment in a tightly coupled way yields better recognition accuracy. Taking GAKT and TSRBG as a controlled comparison, we find that the accuracy of TSRBG is significantly better than that of GAKT, and the main difference between them is the semi-supervised EEG emotion state estimation process. GAKT constructs an undirected graph based on the unaligned original data, and this graph is not updated as the data distribution alignment proceeds. In the double-projection feature alignment subspace, it fails to well describe the sample associations between the two domains. As a result, it cannot accurately estimate the EEG emotional states in the target domain, which affects the alignment of the conditional distributions. In contrast, TSRBG estimates the EEG emotional states of the target domain by a bi-model fusion method. One model constructs a sample-feature bipartite graph to characterize inter-domain associations for label propagation; the initialized graph is dynamically updated based on the data subspace representations. The other model is the semi-supervised regression, which effectively builds the connection between the subspace data representations and the label indicator matrix.
In order to describe the recognition performance advantages of our proposed model in more detail, we use the Friedman test [49] to judge whether the eight models have the same performance on the cross-subject EEG emotion state recognition tasks. The null hypothesis is that “the performance of all models is the same”. We rank the compared models in each group of cross-subject emotion state recognition experiments (in our experiment, the higher the recognition accuracy, the higher the ranking) and calculate the average ranking $r_i$ of each model. Assuming that there are K models and N data sets, we calculate the variable $\tau_{\chi^2}$ as
$$\tau_{\chi^2}=\frac{12N}{K(K+1)}\left(\sum_{i=1}^{K}r_i^2-\frac{K(K+1)^2}{4}\right), \quad (37)$$
which follows the $\chi^2$ distribution with $K-1$ degrees of freedom. In our work, there are 8 compared models and 42 groups of cross-subject EEG emotion state recognition tasks; that is, $K=8$ and $N=42$.
Then, we can calculate the variable $\tau_F$ as
$$\tau_F=\frac{(N-1)\,\tau_{\chi^2}}{N(K-1)-\tau_{\chi^2}}, \quad (38)$$
which obeys the F distribution with $K-1$ and $(K-1)(N-1)$ degrees of freedom.
According to the recognition results of the different models in Table 1, Table 2 and Table 3, we calculate their average rankings as [3.79, 3.36, 4.81, 4.5, 6.19, 5.14, 6.79, 1.29]. Based on (37) and (38), we obtain $\tau_F=35.682$. If the significance level $\alpha$ is 0.05, the critical value of the Friedman test is 2.0416, which can be obtained through the MATLAB expression ‘icdf('F', 1−$\alpha$, K−1, (K−1)(N−1))’ [49]. Since 35.682 is far greater than 2.0416, the hypothesis that “the performance of all models is the same” is rejected. It is necessary to further distinguish the algorithms through the Nemenyi post-hoc test. The results are shown in Figure 2. The models are sorted by their average ranking $r_i$, and a model with a higher ranking is closer to the top of the figure. The length of the vertical line corresponding to each model is called the critical distance (CD), whose value 1.620 is calculated by
$$CD=q_\alpha\sqrt{\frac{K(K+1)}{6N}}, \quad (39)$$
where the critical value $q_\alpha$ is 3.031 when $\alpha=0.05$. We can judge whether there are significant differences between models by whether their corresponding vertical lines in Figure 2 overlap. For example, the rank value of TSRBG is 1.29 while it is 3.36 for GAKT; the gap between them is 2.07, which is greater than the CD value 1.620, so there is no overlap between their corresponding vertical lines. Therefore, TSRBG is significantly better than GAKT on the cross-subject EEG emotion recognition tasks. Similar analyses can be performed on the other models.
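The reported statistics can be reproduced directly from these average ranks; the short verification sketch below is our own addition.

```python
import numpy as np

# Average ranks of the eight models over the N = 42 cross-subject tasks (from the text)
ranks = np.array([3.79, 3.36, 4.81, 4.5, 6.19, 5.14, 6.79, 1.29])
K, N = len(ranks), 42

tau_chi2 = 12.0 * N / (K * (K + 1)) * (np.sum(ranks ** 2) - K * (K + 1) ** 2 / 4.0)
tau_F = (N - 1) * tau_chi2 / (N * (K - 1) - tau_chi2)
print(round(tau_F, 2))           # ~35.68, matching the reported value

q_alpha = 3.031                  # Nemenyi critical value for K = 8, alpha = 0.05
CD = q_alpha * np.sqrt(K * (K + 1) / (6.0 * N))
print(round(CD, 3))              # ~1.620
```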
Further, the average recognition results of these models are reorganized into confusion matrices to analyze the recognition performance of each model on each emotional state. The results are shown in Figure 3. We find that TSRBG has a high average recognition accuracy of 82.48% on the neutrality state, which is the highest recognition accuracy among the four emotional states. The neutral EEG samples were wrongly classified as sadness, fear, and happiness in 6.90%, 6.56%, and 4.06% of cases, respectively. Compared with the other models, the recognition accuracies of the sadness and neutrality states were significantly improved by TSRBG. For example, the recognition rate of the sad EEG emotional state was improved by at least 16.85% compared with the other models. Moreover, the recognition accuracy of the fear emotion category was improved slightly, by 3.45%.

3.4. Subspace Analysis and Mining

In this work, the process of EEG feature transfer seeks dual subspaces that are expected to reduce the distribution differences between the source and the target domain data as much as possible. For each domain, the subspace data representation is obtained by projecting the original data with a projection matrix. In order to intuitively reflect the alignment effect of the two domains' data in the subspace, we use the t-SNE method [50] to visualize two groups of experimental data before and after alignment. As shown in Figure 4, we see that the data distributions of the source and target domains in the subspace have been effectively aligned.
The subspace feature dimension is p. In order to find a subspace dimension suitable for data distribution alignment, we show how the model recognition accuracy changes with the subspace dimension in Figure 5. It is observed that TSRBG is generally insensitive to the subspace dimension. When the subspace dimension is adjusted within the interval [30, 60], TSRBG generally has satisfactory recognition accuracy.
From the perspective of transfer learning, the subspace should preserve the common information and exclude the non-common information between subjects; that is, in the learned subspace, the common components between the source and the target domain should be preserved while the subject-dependent components should be excluded. The subject-independent common components are considered the intrinsic components of emotion that do not change between subjects. The subject-dependent non-common components are considered the unique external information of different subjects. From the perspective of EEG features, the subject-independent common EEG features should have larger weights and contribute more to cross-subject emotion recognition. By contrast, the subject-dependent non-common EEG features should have smaller weights and contribute less to cross-subject emotion recognition. If we can quantify the importance of the different EEG feature dimensions, then, according to the corresponding relationship between EEG feature dimensions and frequency bands (channels) [51], the common EEG activation patterns in cross-subject emotion recognition can be explored.
We assume that $\theta_s^i$ and $\theta_t^i$ are the importance measurement factors of the i-th feature dimension of the source and target domains, respectively. Based on the $\ell_{2,1}$-norm feature selection theory [52], $\theta_s^i$ and $\theta_t^i$ can be obtained by calculating the normalized $\ell_2$-norm of the i-th row vector of the subspace projection matrix of the source and target domain, respectively. That is,
$$\theta_{(s/t)}^i=\frac{\left\|\mathbf{p}_{(s/t)}^i\right\|_2}{\sum_{j=1}^{d}\left\|\mathbf{p}_{(s/t)}^j\right\|_2}, \quad (40)$$
where $\mathbf{p}_{(s/t)}^i$ is the i-th row vector of the subspace projection matrix. Then, we can quantitatively calculate the importance of the a-th frequency band and the l-th channel through
$$\omega(a)=\theta^{(a-1)\times 62+1}+\theta^{(a-1)\times 62+2}+\cdots+\theta^{a\times 62},\qquad \psi(l)=\theta^{l}+\theta^{l+62}+\theta^{l+124}+\theta^{l+186}+\theta^{l+248}, \quad (41)$$
where a = 1, 2, 3, 4, 5 denotes the Delta, Theta, Alpha, Beta, and Gamma frequency bands, respectively, and l = 1, ⋯, 62 denotes the 62 channels, which are FP1, FPZ, ⋯, CB2.
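A sketch of this importance aggregation (our own helper functions, assuming the band-major feature ordering described by Equation (41)) is given below.

```python
import numpy as np

BAND_NAMES = ["Delta", "Theta", "Alpha", "Beta", "Gamma"]

def feature_importance(P):
    """Normalized l2-norm of each row of a projection matrix P (Eq. (40)); P: (310, p)."""
    row_norms = np.sqrt((P ** 2).sum(axis=1))
    return row_norms / row_norms.sum()

def band_channel_importance(P, n_channels=62, n_bands=5):
    """Aggregate the feature importance into band importance (omega) and
    channel importance (psi), Eq. (41), assuming band-major feature ordering."""
    theta = feature_importance(P).reshape(n_bands, n_channels)
    omega = theta.sum(axis=1)     # importance of each of the 5 frequency bands
    psi = theta.sum(axis=0)       # importance of each of the 62 channels
    return dict(zip(BAND_NAMES, omega)), psi
```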
In SEED-IV, the DE features are extracted from five frequency bands and 62 channels. Therefore, the corresponding relationship between the feature importance measurement and different frequency bands (channels) can be established, as shown in Figure 6.
As shown in Figure 7, we quantify the importance of different EEG frequency bands in cross-subject emotion recognition according to the above analysis. Figure 7a presents the results obtained by analyzing the source projection matrix $\mathbf{P}_s$ in the three sessions, together with their average. Figure 7b displays the results obtained by analyzing the target projection matrix $\mathbf{P}_t$ in the three sessions, together with their average. Figure 7c presents the average results of the source and target domains in the three sessions, and the average of both across all sessions. From a data-driven pattern recognition perspective, we believe that the Gamma frequency band is the most important one in cross-subject EEG emotion recognition.
Furthermore, we calculated the importance of different EEG channels, as shown in Figure 8. In Figure 8a, we show the importance of each brain region in the form of brain topographic maps. We observe that the left side of the prefrontal lobe has high weights in all results, and we believe that this brain region has higher importance in cross-subject EEG emotion recognition. The top 10 important channels of each session and the overall average are quantitatively analyzed in Figure 8b. We believe that FP1, PO6, PO5, O1, P4, and P8 are more important for cross-subject EEG emotion recognition. Considering that the model performs well on the sadness and neutral EEG emotional states, the above brain region and channels might be more closely related to these two emotional states.

4. Conclusions

In this paper, we proposed a new model termed TSRBG for cross-subject emotion recognition from EEG, whose main merits are summarized as follows. (1) The feature domain adaptation and the target domain label estimation were effectively realized in a unified framework. Better-aligned source and target data can effectively improve the target domain label estimation performance; in turn, more accurately estimated target domain label information can better facilitate the conditional distribution modeling, leading to better domain adaptation performance. (2) The intra- and inter-domain connections were investigated based on the subspace-aligned data, which formulated a bi-model fusion strategy for target domain label estimation, leading to significantly better recognition accuracy. (3) The learned subspace of TSRBG provided us with a quantitative way to explore the key EEG frequency bands and channels in emotional expression. The experimental results on the SEED-IV data set demonstrated that: (1) the joint learning mode in TSRBG effectively improved the cross-subject EEG emotion state recognition performance; (2) the Gamma frequency band and the prefrontal brain region were identified as more important components in emotion expression.

Author Contributions

Conceptualization, Y.P.; Data curation, W.L.; Investigation, Y.P.; Methodology, W.L. and Y.P.; Software, W.L. and Y.P.; Validation, Y.P.; Writing—original draft preparation, W.L. and Y.P.; Writing—review and editing, W.L. and Y.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Zhejiang Provincial Natural Science Foundation of China (LY21F030005), National Natural Science Foundation of China (61971173, U20B2074), Fundamental Research Funds for the Provincial Universities of Zhejiang (GK209907299001-008), China Postdoctoral Science Foundation (2017M620470), CAAC Key Laboratory of Flight Techniques and Flight Safety (FZ2021KF16), and Guangxi Key Laboratory of Optoelectronic Information Processing, Guilin University of Electronic Technology (GD21202).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board (or Ethics Committee) of Shanghai Jiao Tong University (protocol code 2017060).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Not applicable.

Acknowledgments

The authors also would like to thank the anonymous reviewers for their comments on this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Beldoch, M. Sensitivity to expression of emotional meaning in three modes of communication. In The Communication of Emotional Meaning; McGraw-Hill: New York, NY, USA, 1964; pp. 31–42. [Google Scholar]
  2. Salovey, P.; Mayer, J.D. Emotional intelligence. Imagin. Cogn. Personal. 1990, 9, 185–211. [Google Scholar] [CrossRef]
  3. Chen, L.; Wu, M.; Pedrycz, W.; Hirota, K. Emotion Recognition and Understanding for Emotional Human-Robot Interaction Systems; Springer: Cham, Switzerland, 2020; pp. 1–247. [Google Scholar]
  4. Papero, D.; Frost, R.; Havstad, L.; Noone, R. Natural systems thinking and the human family. Systems 2018, 6, 19. [Google Scholar] [CrossRef] [Green Version]
  5. Li, W.; Huan, W.; Hou, B.; Tian, Y.; Zhang, Z.; Song, A. Can emotion be transferred?—A review on transfer learning for EEG-Based Emotion Recognition. IEEE Trans. Cogn. Dev. Syst. 2021. [Google Scholar] [CrossRef]
  6. Nie, Z.; Wang, X.; Duan, R.; Lu, B. A survey of emotion recognition based on EEG. Chin. J. Biomed. Eng. 2012, 31, 12. [Google Scholar]
  7. Ko, B.C. A brief review of facial emotion recognition based on visual information. Sensors 2018, 18, 401. [Google Scholar] [CrossRef] [PubMed]
  8. Akçay, M.B.; Oğuz, K. Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Commun. 2020, 116, 56–76. [Google Scholar] [CrossRef]
  9. Alswaidan, N.; Menai, M.E.B. A survey of state-of-the-art approaches for emotion recognition in text. Knowl. Inf. Syst. 2020, 62, 2937–2987. [Google Scholar] [CrossRef]
  10. Khare, S.K.; Bajaj, V.; Sinha, G.R. Adaptive tunable Q wavelet transform-based emotion identification. IEEE Trans. Instrum. Meas. 2020, 69, 9609–9617. [Google Scholar] [CrossRef]
  11. Becker, H.; Fleureau, J.; Guillotel, P.; Wendling, F.; Merlet, I.; Albera, L. Emotion recognition based on high-resolution EEG recordings and reconstructed brain sources. IEEE Trans. Affect. Comput. 2020, 11, 244–257. [Google Scholar] [CrossRef]
  12. Wang, H.; Pei, Z.; Xu, L.; Xu, T.; Bezerianos, A.; Sun, Y.; Li, J. Performance enhancement of P300 detection by multiscale-CNN. IEEE Trans. Instrum. Meas. 2021, 70, 1–12. [Google Scholar] [CrossRef]
  13. Hondrou, C.; Caridakis, G. Affective, natural interaction using EEG: Sensors, application and future directions. In Lecture Notes in Computer Science, Proceedings of the Artificial Intelligence: Theories and Applications—7th Hellenic Conference on AI (SETN 2012), Lamia, Greece, 28–31 May 2012; Maglogiannis, I., Plagianakos, V.P., Vlahavas, I.P., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7297, pp. 331–338. [Google Scholar] [CrossRef] [Green Version]
  14. Marei, A.; Yoon, S.A.; Yoo, J.U.; Richman, T.; Noushad, N.; Miller, K.; Shim, J. Designing feedback systems: Examining a feedback approach to facilitation in an online asynchronous professional development course for high school science teachers. Systems 2021, 9, 10. [Google Scholar] [CrossRef]
  15. Mammone, N.; De Salvo, S.; Bonanno, L.; Ieracitano, C.; Marino, S.; Marra, A.; Bramanti, A.; Morabito, F.C. Brain network analysis of compressive sensed high-density EEG signals in AD and MCI subjects. IEEE Trans. Ind. Inform. 2018, 15, 527–536. [Google Scholar] [CrossRef]
  16. Murugappan, M.; Rizon, M.; Nagarajan, R.; Yaacob, S.; Hazry, D.; Zunaidi, I. Time-frequency analysis of EEG signals for human emotion detection. In Proceedings of the 4th Kuala Lumpur International Conference on Biomedical Engineering 2008, Kuala Lumpur, Malaysia, 25–28 June 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 262–265. [Google Scholar]
  17. Thejaswini, S.; Ravi Kumar, K.M.; Aditya Nataraj, J.L. Analysis of EEG based emotion detection of DEAP and SEED-IV databases using SVM. SSRN Electron. J. 2019, 8, 576–581. [Google Scholar]
  18. Li, X.; Song, D.; Zhang, P.; Zhang, Y.; Hou, Y.; Hu, B. Exploring EEG features in cross-subject emotion recognition. Front. Neurosci. 2018, 12, 162. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  19. Ali Olamat, P.O.; Atasever, S. Deep learning methods for multi-channel EEG-based emotion recognition. Int. J. Neural Syst. 2022, 32, 2250021. [Google Scholar] [CrossRef]
  20. Lew, W.C.L.; Wang, D.; Shylouskaya, K.; Zhang, Z.; Lim, J.H.; Ang, K.K.; Tan, A.H. EEG-based emotion recognition using spatial-temporal representation via Bi-GRU. In Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada, 20–24 July 2020; pp. 116–119. [Google Scholar]
  21. Gong, S.; Xing, K.; Cichocki, A.; Li, J. Deep learning in EEG: Advance of the last ten-year critical period. IEEE Trans. Cogn. Dev. Syst. 2022, 14, 348–365. [Google Scholar] [CrossRef]
  22. Li, J.; Qiu, S.; Du, C.; Wang, Y.; He, H. Domain adaptation for EEG emotion recognition based on latent representation similarity. IEEE Trans. Cogn. Dev. Syst. 2020, 12, 344–353. [Google Scholar] [CrossRef]
  23. Gong, B.; Shi, Y.; Sha, F.; Grauman, K. Geodesic flow kernel for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2066–2073. [Google Scholar]
  24. Yan, K.; Kou, L.; Zhang, D. Learning domain-invariant subspace using domain features and independence maximization. IEEE Trans. Cybern. 2018, 48, 288–299. [Google Scholar] [CrossRef] [Green Version]
  25. Li, Y.; Fu, B.; Li, F.; Shi, G.; Zheng, W. A novel transferability attention neural network model for EEG emotion recognition. Neurocomputing 2021, 447, 92–101. [Google Scholar] [CrossRef]
  26. Zheng, W.L.; Lu, B.L. Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Trans. Auton. Ment. Dev. 2015, 7, 162–175. [Google Scholar] [CrossRef]
  27. Zheng, W.L.; Zhu, J.Y.; Lu, B.L. Identifying stable patterns over time for emotion recognition from EEG. IEEE Trans. Affect. Comput. 2017, 10, 417–429. [Google Scholar] [CrossRef] [Green Version]
  28. Quan, X.; Zeng, Z.; Jiang, J.; Zhang, Y.; Lu, B.; Wu, D. Physiological signals based affective computing: A systematic review. Acta Autom. Sin. 2021, 47, 1769–1784. (In Chinese) [Google Scholar]
  29. Suhaimi, N.S.; Mountstephens, J.; Teo, J. EEG-based emotion recognition: A state-of-the-art review of current trends and opportunities. Comput. Intell. Neurosci. 2020, 2020, 1–19. [Google Scholar] [CrossRef] [PubMed]
  30. Lu, B.; Zhang, Y.; Zheng, W. A survey of affective brain-computer interface. Chin. J. Intell. Sci. Technol. 2021, 3, 36–48. (In Chinese) [Google Scholar]
  31. Niu, S.; Liu, Y.; Wang, J.; Song, H. A decade survey of transfer learning (2010–2020). IEEE Trans. Artif. Intell. 2020, 1, 151–166. [Google Scholar] [CrossRef]
  32. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76. [Google Scholar] [CrossRef]
  33. Zheng, W.L.; Lu, B.L. Personalizing EEG-based affective models with transfer learning. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016; pp. 2732–2738. [Google Scholar]
  34. Zhou, R.; Zhang, Z.; Yang, X.; Fu, H.; Zhang, L.; Li, L.; Huang, G.; Dong, Y.; Li, F.; Liang, Z. A novel transfer learning framework with prototypical representation based pairwise learning for cross-subject cross-session EEG-based emotion recognition. arXiv 2022, arXiv:2202.06509. [Google Scholar]
  35. Bahador, N.; Kortelainen, J. Deep learning-based classification of multichannel bio-signals using directedness transfer learning. Biomed. Signal Process. Control 2022, 72, 103300. [Google Scholar] [CrossRef]
  36. Jayaram, V.; Alamgir, M.; Altun, Y.; Scholkopf, B.; Grosse-Wentrup, M. Transfer learning in brain-computer interfaces. IEEE Comput. Intell. Mag. 2016, 11, 20–31. [Google Scholar] [CrossRef] [Green Version]
  37. Ding, Z.; Li, S.; Shao, M.; Fu, Y. Graph adaptive knowledge transfer for unsupervised domain adaptation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 37–52. [Google Scholar]
  38. Lan, Z.; Sourina, O.; Wang, L.; Scherer, R.; Müller-Putz, G.R. Domain adaptation techniques for EEG-based emotion recognition: A comparative study on two public datasets. IEEE Trans. Cogn. Dev. Syst. 2019, 11, 85–94. [Google Scholar] [CrossRef]
  39. Cui, J.; Jin, X.; Hu, H.; Zhu, L.; Ozawa, K.; Pan, G.; Kong, W. Dynamic Distribution Alignment with Dual-Subspace Mapping For Cross-Subject Driver Mental State Detection. IEEE Trans. Cogn. Dev. Syst. 2021. [Google Scholar] [CrossRef]
  40. Gretton, A.; Sriperumbudur, B.; Sejdinovic, D.; Strathmann, H.; Balakrishnan, S.; Pontil, M.; Fukumizu, K. Optimal kernel choice for large-scale two-sample tests. In Curran Associates, Incorporated, Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS 2012); Pereira, F., Burges, C., Bottou, L., Weinberger, K., Eds.; Curran Associates, Incorporated: Lake Tahoe, NV, USA, 2012; Volume 25, pp. 1205–1213. [Google Scholar]
  41. Abdi, H.; Williams, L.J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459. [Google Scholar] [CrossRef]
  42. Bartels, R.H.; Stewart, G.W. Solution of the matrix equation AX + XB = C [F4]. Commun. ACM 1972, 15, 820–826. [Google Scholar] [CrossRef]
  43. Peng, Y.; Zhu, X.; Nie, F.; Kong, W.; Ge, Y. Fuzzy graph clustering. Inf. Sci. 2021, 571, 38–49. [Google Scholar] [CrossRef]
  44. Zheng, W.L.; Liu, W.; Lu, Y.; Lu, B.L.; Cichocki, A. Emotionmeter: A multimodal framework for recognizing human emotions. IEEE Trans. Cybern. 2018, 49, 1110–1122. [Google Scholar] [CrossRef]
  45. Long, M.; Wang, J.; Ding, G.; Sun, J.; Yu, P.S. Transfer feature learning with joint distribution adaptation. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 2200–2207. [Google Scholar]
  46. Song, P.; Zheng, W. Feature selection based transfer subspace learning for speech emotion recognition. IEEE Trans. Affect. Comput. 2018, 11, 373–382. [Google Scholar] [CrossRef]
  47. Nie, F.; Wang, X.; Deng, C.; Huang, H. Learning a structured optimal bipartite graph for co-clustering. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4132–4141. [Google Scholar]
  48. Song, T.; Zheng, W.; Song, P.; Cui, Z. EEG Emotion Recognition Using Dynamical Graph Convolutional Neural Networks. IEEE Trans. Affect. Comput. 2020, 11, 532–541. [Google Scholar] [CrossRef] [Green Version]
  49. Zhou, Z. Machine Learning Beijing; Tsinghua University Press: Beijing, China, 2016; pp. 42–44. [Google Scholar]
  50. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  51. Peng, Y.; Qin, F.; Kong, W.; Ge, Y.; Nie, F.; Cichocki, A. GFIL: A unified framework for the importance analysis of features, frequency bands and channels in EEG-based emotion recognition. IEEE Trans. Cogn. Dev. Syst. 2021. [Google Scholar] [CrossRef]
  52. Nie, F.; Huang, H.; Cai, X.; Ding, C. Efficient and robust feature selection via joint 2,1-norms minimization. In Proceedings of the 23rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–9 December 2010; Volume 2, pp. 1813–1821. [Google Scholar]
Figure 1. The overall framework of TSRBG.
Figure 2. Nemenyi test on the emotion recognition results of the compared models in our experiments. The critical distance value is 1.620.
Figure 3. The recognition results organized by confusion matrices.
Figure 4. Source and target data distributions. (a) Original space of session2 subject8; (b) subspace of session2 subject8; (c) original space of session3 subject12; (d) subspace of session3 subject12.
Figure 5. Recognition performance of TSRBG in terms of different subspace dimensions.
Figure 6. The framework of emotion activation mode analysis.
Figure 7. Quantitative importance measures of EEG frequency bands in emotion expression. (a) Source domain; (b) Target domain; (c) Average.
Figure 8. Critical brain regions correlated to emotion expression and the top 10 EEG channels. (a) Critical brain regions; (b) Top 10 EEG channels (%).
Table 1. Cross-subject emotion recognition results in session 1 (%).
Subject | JDA | GAKT | MIDA | FSTSL | SOBG | DGCNN | LRS | TSRBG
sub2 | 57.81 | 73.09 | 67.69 | 66.51 | 35.02 | 54.64 | 49.12 | 73.68
sub3 | 64.75 | 62.63 | 58.40 | 59.93 | 63.69 | 57.46 | 39.25 | 67.33
sub4 | 68.27 | 58.99 | 44.77 | 60.52 | 50.53 | 58.99 | 42.89 | 71.33
sub5 | 48.53 | 39.72 | 46.53 | 56.99 | 48.53 | 49.12 | 32.67 | 73.44
sub6 | 51.59 | 53.11 | 47.83 | 46.53 | 49.24 | 40.42 | 21.39 | 67.57
sub7 | 70.15 | 58.87 | 54.99 | 54.76 | 44.54 | 48.18 | 42.66 | 75.32
sub8 | 65.45 | 62.51 | 66.39 | 42.30 | 43.95 | 51.12 | 47.59 | 80.96
sub9 | 64.86 | 63.69 | 53.35 | 61.69 | 45.95 | 62.98 | 43.24 | 74.74
sub10 | 65.69 | 51.12 | 63.81 | 55.11 | 47.47 | 42.66 | 46.77 | 78.73
sub11 | 51.94 | 62.16 | 59.34 | 47.83 | 47.24 | 51.00 | 42.42 | 73.80
sub12 | 54.29 | 59.34 | 59.11 | 48.06 | 50.18 | 55.93 | 63.34 | 71.21
sub13 | 62.98 | 64.28 | 50.65 | 54.05 | 52.64 | 52.29 | 33.49 | 68.51
sub14 | 55.58 | 65.45 | 43.95 | 49.82 | 49.59 | 53.23 | 40.89 | 68.86
sub15 | 69.10 | 52.41 | 46.65 | 57.58 | 33.73 | 53.82 | 33.73 | 74.15
Avg. | 60.79 | 59.10 | 54.53 | 54.41 | 47.31 | 52.27 | 41.39 | 72.83
Table 2. Cross-subject emotion recognition results in session 2 (%).
Subject | JDA | GAKT | MIDA | FSTSL | SOBG | DGCNN | LRS | TSRBG
sub2 | 90.75 | 68.03 | 66.83 | 74.88 | 50.12 | 65.87 | 78.13 | 78.49
sub3 | 69.59 | 61.54 | 69.23 | 68.99 | 78.73 | 68.99 | 80.41 | 81.25
sub4 | 60.49 | 79.57 | 63.82 | 51.56 | 55.05 | 59.38 | 31.85 | 74.52
sub5 | 58.89 | 63.22 | 71.03 | 67.55 | 48.32 | 56.13 | 55.05 | 74.04
sub6 | 61.78 | 56.49 | 41.47 | 54.09 | 36.66 | 52.28 | 36.18 | 75.84
sub7 | 64.54 | 68.87 | 69.59 | 77.28 | 42.91 | 64.54 | 52.04 | 78.13
sub8 | 78.49 | 68.63 | 66.35 | 54.81 | 68.39 | 49.76 | 50.12 | 77.16
sub9 | 59.13 | 54.33 | 60.46 | 41.83 | 61.42 | 54.81 | 37.02 | 76.92
sub10 | 41.11 | 82.33 | 62.14 | 50.00 | 67.19 | 60.34 | 59.38 | 76.56
sub11 | 63.58 | 72.00 | 51.58 | 60.82 | 32.81 | 53.00 | 42.91 | 74.28
sub12 | 56.49 | 44.59 | 41.11 | 68.87 | 49.88 | 47.72 | 27.76 | 69.23
sub13 | 62.98 | 64.90 | 53.37 | 60.34 | 32.81 | 49.16 | 58.41 | 71.75
sub14 | 46.51 | 50.48 | 49.04 | 44.71 | 48.32 | 61.66 | 52.28 | 74.16
sub15 | 77.76 | 88.82 | 55.53 | 84.01 | 61.18 | 60.46 | 57.57 | 88.58
Avg. | 63.72 | 65.99 | 58.68 | 61.41 | 51.27 | 58.77 | 51.34 | 76.49
Table 3. Cross-subject emotion recognition results in session 3 (%).
Subject | JDA | GAKT | MIDA | FSTSL | SOBG | DGCNN | LRS | TSRBG
sub2 | 54.62 | 60.10 | 87.96 | 88.93 | 45.99 | 64.60 | 55.96 | 79.56
sub3 | 64.11 | 65.57 | 76.76 | 70.07 | 42.09 | 49.51 | 49.27 | 72.26
sub4 | 57.66 | 69.34 | 43.92 | 63.26 | 57.06 | 56.08 | 43.19 | 81.14
sub5 | 63.75 | 67.64 | 74.33 | 61.19 | 39.54 | 46.35 | 39.05 | 79.68
sub6 | 57.66 | 62.65 | 57.42 | 54.99 | 40.88 | 72.14 | 41.85 | 84.91
sub7 | 66.99 | 79.93 | 47.49 | 72.63 | 55.60 | 59.49 | 18.86 | 77.86
sub8 | 62.41 | 59.85 | 76.64 | 64.72 | 47.93 | 69.10 | 58.76 | 73.24
sub9 | 75.18 | 50.24 | 50.97 | 47.20 | 51.82 | 50.61 | 37.35 | 73.11
sub10 | 51.09 | 69.34 | 41.73 | 58.64 | 32.60 | 50.24 | 45.13 | 76.64
sub11 | 57.06 | 81.75 | 54.14 | 56.08 | 40.63 | 61.92 | 58.76 | 75.79
sub12 | 45.50 | 57.54 | 56.69 | 61.31 | 53.77 | 59.37 | 54.62 | 70.32
sub13 | 55.72 | 61.44 | 46.59 | 46.72 | 42.34 | 50.00 | 42.70 | 76.28
sub14 | 56.45 | 77.25 | 57.06 | 77.62 | 50.85 | 53.41 | 27.01 | 79.56
sub15 | 70.32 | 85.28 | 52.19 | 62.04 | 56.45 | 54.01 | 23.60 | 84.67
Avg. | 59.89 | 67.71 | 58.85 | 63.24 | 46.97 | 56.92 | 42.58 | 77.50
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
