Article

Regularized RKHS-Based Subspace Learning for Motor Imagery Classification

1 School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou 510006, China
2 Public Experimental Teaching Center, Sun Yat-sen University, Guangzhou 510006, China
* Authors to whom correspondence should be addressed.
Entropy 2022, 24(2), 195; https://doi.org/10.3390/e24020195
Submission received: 25 November 2021 / Revised: 4 January 2022 / Accepted: 17 January 2022 / Published: 27 January 2022

Abstract

Brain–computer interface (BCI) technology allows people with disabilities to communicate with the physical environment. One of the most promising signal modalities is the non-invasive electroencephalogram (EEG). However, due to the non-stationary nature of EEG, a subject's signal may change over time, which poses a challenge for models that must work across sessions. Recently, domain adaptation learning (DAL) has shown superior performance in various classification tasks. In this paper, we propose a regularized reproducing kernel Hilbert space (RKHS) subspace learning algorithm, with K-nearest neighbors (KNN) as the classifier, for motor imagery signal classification. First, we reformulate the framework of RKHS subspace learning with a rigorous mathematical derivation. Second, since the commonly used maximum mean discrepancy (MMD) criterion measures the difference between distributions through their means only and ignores the local structure of the distributions, we propose, for the first time, a source domain linear discriminant analysis (SLDA) regularization term, which reduces the scatter within each class and increases the scatter between classes to optimize the distribution of the source domain data. Finally, considering the sensitivity of BCI data, the RKHS subspace is constructed sparsely. We first test the proposed algorithm on four standard datasets; the experimental results show that adding SLDA improves the average accuracy of the other baseline algorithms by 2–9%. In the motor imagery classification experiments, the average accuracy of our algorithm is 3% higher than that of the other algorithms, demonstrating the adaptability and effectiveness of the proposed algorithm.

1. Introduction

Non-invasive BCIs enable people to communicate with electronic devices by analyzing the electrical or magnetic signals generated by the brain's nervous system. Due to its non-invasiveness, low cost, portability and high temporal resolution among brain activity monitoring modalities, electroencephalography (EEG) has been used in many non-invasive BCI studies [1]. Depending on the strategy used to control the device, BCI systems can be classified as endogenous or exogenous [2]. Exogenous BCI systems are based on evoked activities that require external stimuli, such as visual evoked potentials. In contrast, endogenous BCIs are based on spontaneous activities, such as motor imagery (MI), in which the subject needs to focus on a specific mental task [3].
Motor imagery is the mental rehearsal of the movement of a body part in the absence of actual movement. Different MI tasks lead to oscillatory activity observed in different areas of the sensorimotor cortex of the brain [4]. Various MI-based BCI applications have been used for rehabilitation, such as wheelchair and prosthetic control for disabled patients [5,6,7,8], and for recreation in healthy individuals [9,10].
EEG signals are prone to be affected by individual mental states, such as mood and attention. In BCI MI experiments, subjects are asked to repeat motor imagery tasks of the left hand, right hand, feet and tongue on two different days. The required concentration is a very tiring mental task for the subjects [2]. BCI systems use a fixed time interval for all subjects, which is considered one of the drawbacks of this paradigm [11]. The MI response depends on the subject, and there is no way to define exactly when the effect of motor imagery appears after the cue [12]; it can appear immediately after the cue or after some delay. Therefore, the time interval of imagery onset may vary across subjects. Human attention is complex and unstable, and once a subject's attention drifts during an experiment, the results can be severely biased. It has been shown that the EEG signal of MI-based BCI tasks exhibits high variability across sessions [13]. Therefore, when the data include measurements from different time periods, there is no guarantee that the spatial distribution of the EEG data is consistent across days, even when the same task is performed. Such multi-domain data pose a major challenge for machine learning methods.
Domain adaptation learning (DAL) is a branch of transfer learning for solving cross-domain learning problems, in which the training data and test data come from different domains. In general, domain adaptation involves a well-labeled source domain and an unlabeled target domain with different probability distributions. The goal is to solve the task in the target domain by transferring the labeled information from the source domain [14]. In a BCI experiment, the data of the first day can be considered the source domain and the data of the second day the target domain, and the knowledge obtained from the source domain can be transferred to the target domain; this is a typical domain adaptation problem. Domain adaptation methods typically minimize the distribution differences in a feature space, such as joint distribution adaptation (JDA) [15], joint geometric and statistical alignment (JGSA) [16], and manifold embedded distribution alignment (MEDA) [17]. Domain adaptation methods have been applied in many fields, such as image classification, object recognition [18], text classification [14] and video event detection [19]. Previous studies [20,21] have shown the effectiveness of domain adaptation methods in mitigating the differences in data distribution across subjects or sessions in BCI. Therefore, domain adaptation learning (DAL) is well suited to signal recognition in BCI systems.
The source and target domains can be transformed into another feature space by a transformation (e.g., a kernel function), and machine learning methods can then be applied in that space. Kernel functions are a suitable class of feature mapping functions that implicitly map the data to a high-dimensional RKHS while explicitly providing the inner product of the data in that space. RKHS subspace learning is a common framework for transfer learning, which learns a suitable subspace of the RKHS according to a specific machine learning task.
The most commonly used nonparametric estimator of the distance between the statistical distributions of the source and target domain data is the maximum mean discrepancy (MMD), proposed by Gretton et al. [22] and Smola et al. [23]. Based on MMD, Pan et al. [24] proposed transfer component analysis (TCA), which maps data from the source and target domains to a high-dimensional RKHS. However, MMD uses only the first-order moments (means) of the two random variables to measure the distance between the two probability distributions, which does not accurately describe the local differences between them. Therefore, it is common practice to add regularization terms to compensate for this shortcoming of MMD. For example, semi-supervised transfer component analysis (SSTCA) [24] adds a manifold regularization term to TCA, which can reduce the distance between the data distributions of the domains while maximizing label dependence in a latent space. Jiang et al. [25] proposed to integrate global and local metrics for domain adaptation learning (IGLDA). Based on TCA, IGLDA considers both the local and the global information to make the source and target domain data as close as possible while preserving the geometric properties of the source domain data. Li et al. [26] proposed a domain adaptation framework that maps data from the two domains to the RKHS with feature selection and a regularization term that maximizes the variance of the target domain data in the subspace. Our experiments show that their algorithm improves the classification accuracy to some extent but ignores the optimization of the source domain data and its labels. In domain adaptation, the labeled source domain data are an important information source, and how to use the label information is a central question for domain adaptation algorithms. Lei et al. [27] applied dictionary learning to the source domain, whereas we borrow the idea of LDA.
In this paper, we develop a new approach based on RKHS subspace learning and apply it to motor imagery recognition. It learns the coefficients of an RKHS subspace such that the differences in data distribution across domains are reduced when the data are projected onto that subspace. Machine learning approaches, such as classification and regression models, can then be used in this subspace. Additionally, to make full use of the source domain information, we propose a source linear discriminant analysis (SLDA) regularization term. Specifically, considering the sensitivity of BCI data, we sparsely construct the RKHS subspace using the L2,1 norm. The primary contributions of this paper are summarized as follows:
(1)
We reformulate the RKHS subspace learning framework (RKHS-DA), and propose the SLDA regularization term to remedy the deficiency of MMD in domain adaptation.
(2)
To address the complexity and instability of EEG signals, we select features in the low-dimensional subspace into which the data are projected by constraining the coefficient matrix with the L2,1 norm.
(3)
Experimental results show that the average accuracy of our algorithm is 3% higher than other algorithms.
The remainder of this paper is organized as follows. Section 2 presents a general description of our approach. Section 3 describes the proposed framework and the SLDA regularization terms in detail. We validate our SLDA regularization and RKHS subspace learning framework, and the experimental results are presented in Section 4.

2. Preliminaries

2.1. Notations

In this paper, we use a combination of letters and subscripts to denote data. A sample is denoted as a vector; e.g., the $i$-th sample in a set is denoted as $x_i$. We use the subscripts $s$ and $t$ to indicate the source domain and the target domain, respectively. For a matrix $M$, its trace is denoted by $\mathrm{tr}(M)$. For clarity, the frequently used notations and corresponding descriptions are listed in Table 1.

2.2. Reproducing Kernel Hilbert Spaces (RKHS)

2.2.1. Hilbert Spaces

Definition (inner product space [28]): let $H$ be a linear space over the real field $\mathbb{R}$ and let $\langle \cdot, \cdot \rangle : H \times H \to \mathbb{R}$ satisfy the following properties:
(1)
Positive definiteness: for all $x \in H$, $\langle x, x \rangle \ge 0$, and $\langle x, x \rangle = 0 \Leftrightarrow x = 0$;
(2)
Symmetry: for all $x, y \in H$, $\langle x, y \rangle = \langle y, x \rangle$;
(3)
Bilinearity: for all $x, y, z \in H$ and $\alpha, \beta \in \mathbb{R}$,
$\langle \alpha x + \beta y, z \rangle = \alpha \langle x, z \rangle + \beta \langle y, z \rangle$
Then $\langle \cdot, \cdot \rangle$ is called an inner product on $H$, and $(H, \langle \cdot, \cdot \rangle)$ is an inner product space.
Let $x$ be an element of the inner product space $(H, \langle \cdot, \cdot \rangle)$. In an inner product space, the norm is defined from the inner product:
$\|x\| = \sqrt{\langle x, x \rangle}$
By the positive definiteness of the inner product,
$\|x\| = \sqrt{\langle x, x \rangle} = 0 \;\Leftrightarrow\; x = 0$
and therefore
$\|x - y\| = 0 \;\Leftrightarrow\; x = y$
If every Cauchy sequence in this inner product space converges, the space is called a Hilbert space.

2.2.2. Definition of Reproducing Kernel Hilbert Space (RKHS)

Let $H = \{ f \mid f : \Omega \to \mathbb{R}, \int_{\Omega} |f(x)|^2 \, dx < +\infty \}$ be the space of square-integrable functions. It is clear that $H$ is a linear space. We define $\langle \cdot, \cdot \rangle : H \times H \to \mathbb{R}$ by, for any $f, g \in H$,
$\langle f, g \rangle = \int_{\Omega} f(x) g(x) \, dx$
It can be shown that $\langle \cdot, \cdot \rangle$ is an inner product and $(H, \langle \cdot, \cdot \rangle)$ is a Hilbert space. Further, if there exists $k : \Omega \times \Omega \to \mathbb{R}$ satisfying
(1)
for any $x \in \Omega$, $k_x = k(\cdot, x) \in H$;
(2)
for any $x \in \Omega$ and $f \in H$,
$f(x) = \langle f, k_x \rangle = \langle f(\cdot), k(\cdot, x) \rangle$
then $H$ is called a reproducing kernel Hilbert space (RKHS), and $k$ is the reproducing kernel of $H$. Using the reproducing kernel $k$, we can define the mapping $\varphi : \Omega \to H$: for any $x \in \Omega$,
$\varphi(x) = k(\cdot, x) = k_x \in H$
From Equation (6), it can be proved that
$\langle \varphi(x), \varphi(y) \rangle = \langle k_x, k(\cdot, y) \rangle = k_x(y) = k(y, x) = k(x, y)$
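As a concrete illustration of this property, the short NumPy sketch below (our addition; the authors' experiments were implemented in MATLAB) evaluates a Gaussian (RBF) kernel, whose value $k(x, y)$ equals the RKHS inner product $\langle \varphi(x), \varphi(y) \rangle$ without ever forming $\varphi$ explicitly. The kernel width sigma is an illustrative parameter.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian (RBF) reproducing kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).

    Each entry of the returned matrix equals the RKHS inner product
    <phi(x_i), phi(y_j)> without ever forming phi explicitly.
    """
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-sq_dists / (2.0 * sigma**2))

# Example: a 5-sample toy data set in R^3.
X = np.random.default_rng(0).normal(size=(5, 3))
K = rbf_kernel(X, X)                            # kernel (Gram) matrix
print(np.allclose(K, K.T))                      # symmetric
print(np.all(np.linalg.eigvalsh(K) > -1e-10))   # positive semi-definite
```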

2.3. Hilbert Subspace Projection Theorem

Definition of projection: let $(H, \langle \cdot, \cdot \rangle)$ be an inner product space and $A$ a subspace of $H$. For $x_0 \in H$, if $x_0$ can be decomposed as $x_0 = x_0' + x_0''$, where $x_0' \in A$ and $\langle x_0'', a \rangle = 0$ for every $a \in A$, then $x_0'$ is called the projection of $x_0$ onto the subspace $A$.
Projection theorem: let $(H, \langle \cdot, \cdot \rangle)$ be an inner product space, $A$ a finite-dimensional subspace of $H$ and $\{e_1, \ldots, e_d\}$ an orthonormal basis of $A$. For any $x_0 \in H$, the projection $x_0'$ of $x_0$ onto $A$ is
$x_0' = \sum_{i=1}^{d} \langle x_0, e_i \rangle e_i \in A$
Remark 1.
$A$ is a finite-dimensional subspace of $H$; therefore, $A$ is complete, i.e., $A$ is a Hilbert subspace, so the projection of any point of $H$ onto $A$ exists.
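The projection theorem can be checked numerically. The sketch below (an illustration we add, not part of the paper) projects a vector of $\mathbb{R}^5$ onto a two-dimensional subspace spanned by an orthonormal basis and verifies that the residual is orthogonal to that subspace.

```python
import numpy as np

# Projection of x0 onto span{e1, ..., ed} with an orthonormal basis:
# x0' = sum_i <x0, e_i> e_i  (here H = R^5, A = a 2-D subspace).
rng = np.random.default_rng(1)
E, _ = np.linalg.qr(rng.normal(size=(5, 2)))   # columns: orthonormal basis of A
x0 = rng.normal(size=5)
x0_proj = E @ (E.T @ x0)                       # coordinates <x0, e_i>, then recombine
residual = x0 - x0_proj
print(np.allclose(E.T @ residual, 0))          # residual is orthogonal to A
```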

2.4. Domain Adaptation Learning and MMD

There are two datasets in the data space $\Omega$: the labeled source domain data $X_s = \{x_1^s, \ldots, x_{N_s}^s\} \subset \Omega$ and the unlabeled target domain data $X_t = \{x_1^t, \ldots, x_{N_t}^t\} \subset \Omega$, whose distributions in the data space differ. We need to classify $X_t$ based on $X_s$; this is the domain adaptation learning problem. In our work, we resort to MMD [22], a nonparametric metric of the distance between distributions, and transform the source domain data $X_s$ and target domain data $X_t$ into the RKHS $H$ generated by the reproducing kernel $k$, i.e.,
$\phi(X_s) = \{\phi(x_1^s), \ldots, \phi(x_{N_s}^s)\} \subset H, \quad \phi(X_t) = \{\phi(x_1^t), \ldots, \phi(x_{N_t}^t)\} \subset H$
In this way, the distributions of $\phi(X_s)$ and $\phi(X_t)$ in the RKHS $H$ can be made as similar as possible, and this similarity is measured by MMD:
$\mathrm{MMD}^2(X_s, X_t) = \left\| \frac{1}{N_s}\sum_{i=1}^{N_s}\phi(x_i^s) - \frac{1}{N_t}\sum_{i=1}^{N_t}\phi(x_i^t) \right\|^2$
where $\phi(\cdot)$ is the mapping defined by the reproducing kernel $k$.
In practice, it is not easy to learn an optimal RKHS $H$ based on MMD. Most MMD-based methods instead learn a linear subspace $\mathrm{span}\,\Theta$ of the RKHS $H$, so the MMD distance can be expressed as
$\mathrm{MMD}^2(X_s, X_t) = \left\| \frac{1}{N_s}\sum_{i=1}^{N_s}\phi_{\mathrm{span}\,\Theta}(x_i^s) - \frac{1}{N_t}\sum_{i=1}^{N_t}\phi_{\mathrm{span}\,\Theta}(x_i^t) \right\|^2$
where $\phi_{\mathrm{span}\,\Theta}(X_s)$ and $\phi_{\mathrm{span}\,\Theta}(X_t)$ denote the projections of $\phi(X_s)$ and $\phi(X_t)$ onto the subspace, respectively.
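For reference, the empirical squared MMD can be computed directly from kernel evaluations. The following sketch is our illustration (the linear kernel and all names are ours) of the biased estimator obtained by expanding the squared norm above.

```python
import numpy as np

def mmd2(Ks, Kt, Kst):
    """Squared MMD between two samples, computed from kernel blocks.

    Ks = k(Xs, Xs), Kt = k(Xt, Xt), Kst = k(Xs, Xt); this is the biased
    empirical estimate || mean phi(Xs) - mean phi(Xt) ||^2 in the RKHS.
    """
    ns, nt = Ks.shape[0], Kt.shape[0]
    return Ks.sum() / ns**2 + Kt.sum() / nt**2 - 2.0 * Kst.sum() / (ns * nt)

rng = np.random.default_rng(0)
Xs = rng.normal(loc=0.0, size=(50, 4))
Xt = rng.normal(loc=1.0, size=(60, 4))           # shifted distribution
lin = lambda A, B: A @ B.T                       # linear kernel
print(mmd2(lin(Xs, Xs), lin(Xt, Xt), lin(Xs, Xt)))   # clearly > 0 for shifted data
```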

3. Domain Adaptation Based on Source LDA Regularized RKHS Subspace Learning and Its Application in BCI

3.1. Reformulation of the RKHS Subspace Learning Framework

3.1.1. Construction of RKHS

The reproducing kernel of the RKHS should be used to construct the transformation from the original data space to the RKHS, rather than defining a transformation first and then using that transformation and the inner product of the RKHS to define a so-called "kernel function", which is not actually the reproducing kernel of the RKHS. However, many studies have used the latter construction, ignoring the connection between the original data space and the RKHS. Therefore, we reformulate the mathematical framework of RKHS in this section.
Let $(H, \langle \cdot, \cdot \rangle)$ be the RKHS on the data space $\Omega$, and use the reproducing kernel $k$ of $H$ to define the transformation $\varphi : \Omega \to H$ from the data space $\Omega$ to $H$: for any $x \in \Omega$, we define $\varphi(x) = k(\cdot, x) \in H$, so that for any $x, y \in \Omega$ we have $\langle \varphi(x), \varphi(y) \rangle = k(x, y)$.
Now, given a set of data in the data space $\Omega$,
$X = \{x_1, \ldots, x_N\} \subset \Omega$
the feature map $\varphi(\cdot)$ is used to transform $X$ into $H$:
$\varphi(X) = \{\varphi(x_1), \ldots, \varphi(x_N)\} \subset H$
The kernel matrix $K$ is
$K = \begin{bmatrix} k(x_1, x_1) & \cdots & k(x_1, x_N) \\ \vdots & \ddots & \vdots \\ k(x_N, x_1) & \cdots & k(x_N, x_N) \end{bmatrix} = [K_1^{Col}, \ldots, K_N^{Col}] \in \mathbb{R}^{N \times N}$
where $k(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle$ and $K_i^{Col}$ is the $i$-th column vector of $K$, $i = 1, \ldots, N$.

3.1.2. The Construction and Restraint of the RKHS Subspace

$\varphi(X)$ is used to construct a basis of a subspace of $H$:
$\theta_i = \sum_{j=1}^{N} w_{ji}\, \varphi(x_j), \quad i = 1, \ldots, d$
We define
$W = \begin{bmatrix} w_{11} & \cdots & w_{1d} \\ \vdots & & \vdots \\ w_{N1} & \cdots & w_{Nd} \end{bmatrix} = [W_1^{Col}, \ldots, W_d^{Col}] \in \mathbb{R}^{N \times d}$
where $W_i^{Col}$ is the $i$-th column vector of $W$, $i = 1, \ldots, d$. We use $\Theta = \{\theta_1, \ldots, \theta_d\}$ to span a subspace of $H$:
$\mathrm{span}\,\Theta = \left\{ \sum_{i=1}^{d} \alpha_i \theta_i \;\middle|\; \alpha_i \in \mathbb{R}, \; i = 1, \ldots, d \right\}$
For $\Theta$ to constitute an orthonormal basis of the subspace $\mathrm{span}\,\Theta$, it must satisfy the orthonormality condition
$\begin{bmatrix} \langle \theta_1, \theta_1 \rangle & \cdots & \langle \theta_1, \theta_d \rangle \\ \vdots & \ddots & \vdots \\ \langle \theta_d, \theta_1 \rangle & \cdots & \langle \theta_d, \theta_d \rangle \end{bmatrix} = \begin{bmatrix} W_1^{Col\,T} K W_1^{Col} & \cdots & W_1^{Col\,T} K W_d^{Col} \\ \vdots & \ddots & \vdots \\ W_d^{Col\,T} K W_1^{Col} & \cdots & W_d^{Col\,T} K W_d^{Col} \end{bmatrix} = W^T K W = I_d$
The subspace $\mathrm{span}\,\Theta$ is a $d$-dimensional subspace determined by the transformed data and the combination coefficient matrix $W$, which satisfies the above constraint.

3.1.3. Representation of Data in the RKHS Subspace

Since the RKHS is an infinite-dimensional space, machine learning algorithms cannot be applied in it directly, so the data in the RKHS must be projected into a subspace of the RKHS. According to the projection theorem, if $\{\theta_1, \ldots, \theta_d\}$ is an orthonormal basis of the subspace $\mathrm{span}\,\Theta$, then the coordinates of the projection of $\varphi(x_i)$ onto $\mathrm{span}\,\Theta$ are
$y_i = \begin{bmatrix} \langle \varphi(x_i), \theta_1 \rangle \\ \vdots \\ \langle \varphi(x_i), \theta_d \rangle \end{bmatrix} = \begin{bmatrix} W_1^{Col\,T} K_i^{Col} \\ \vdots \\ W_d^{Col\,T} K_i^{Col} \end{bmatrix} = W^T K_i^{Col}$
where $i = 1, \ldots, N$. By constructing the subspace of the RKHS, we implement the transformation of the data from the original data space $\Omega$ to the Euclidean space $\mathbb{R}^d$:
$X = \{x_1, \ldots, x_N\} \subset \Omega \;\longrightarrow\; Y = \{y_1, \ldots, y_N\} \subset \mathbb{R}^d$
The working space is the Euclidean space $\mathbb{R}^d$, in which $W$ is determined according to the specific machine learning task. The orthonormal basis of the subspace is constructed as a linear combination of the transformed samples, and the constraints on the combination coefficients follow from the requirement of orthonormality rather than from ad hoc assumptions. The original data are transformed into the RKHS and projected onto the subspace; the coordinates of this projection with respect to the orthonormal basis are the final representation of the original data.
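The sketch below (ours, not the authors' implementation) illustrates the mapping $y_i = W^T K_i^{Col}$ for all samples at once, together with a numerical check of the constraint $W^T K W = I_d$. The way $W$ is orthonormalized here (via a Cholesky factor) is only for the sanity check; it is not how $W$ is learned in the paper.

```python
import numpy as np

def subspace_coordinates(K, W):
    """Coordinates of all N samples in the d-dimensional RKHS subspace.

    K : (N, N) kernel matrix, W : (N, d) combination coefficients with
    W^T K W = I_d.  Row i of the result is y_i = W^T K[:, i].
    """
    return (W.T @ K).T            # shape (N, d); equals K @ W since K is symmetric

# Sanity check with a random (not yet optimised) W made K-orthonormal.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
K = X @ X.T + 1e-6 * np.eye(30)                  # linear kernel, jittered for stability
W0 = rng.normal(size=(30, 3))
R = np.linalg.cholesky(W0.T @ K @ W0)            # W0^T K W0 = R R^T
W = W0 @ np.linalg.inv(R).T                      # now W^T K W = I_3
print(np.allclose(W.T @ K @ W, np.eye(3)))       # constraint holds
Y = subspace_coordinates(K, W)                   # (30, 3) embedded data
```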

3.2. Domain Adaptation Based on SLDA Regularized RKHS Subspace Learning

3.2.1. Domain Adaptation Based on RKHS Subspace Learning and MMD

Given a set of source domain data and a set of target domain data in the original data space Ω :
$X_s = \{x_1^s, \ldots, x_{N_s}^s\} \subset \Omega, \quad X_t = \{x_1^t, \ldots, x_{N_t}^t\} \subset \Omega$
The source domain data $X_s$ are labeled while the target domain data $X_t$ are unlabeled. We define
$X = X_s \cup X_t = \{x_1, \ldots, x_N\} = \{x_1^s, \ldots, x_{N_s}^s, x_1^t, \ldots, x_{N_t}^t\} \subset \Omega, \quad N = N_s + N_t$
Using the RKHS subspace learning framework proposed in Section 3.1, we have
$Y_s = \{y_1^s, \ldots, y_{N_s}^s\} \subset \mathbb{R}^d, \quad Y_t = \{y_1^t, \ldots, y_{N_t}^t\} \subset \mathbb{R}^d$
where $y_i^s = W^T K_i^{Col}$, $i = 1, \ldots, N_s$, and $y_i^t = W^T K_{(N_s+i)}^{Col}$, $i = 1, \ldots, N_t$. In these expressions, the matrix $W$ is unknown and represents the subspace of the RKHS; the desired distribution of $Y_s$ and $Y_t$ in the Euclidean space $\mathbb{R}^d$ can be achieved by learning $W$. To measure the difference between the two distributions, the MMD between $X_s$ and $X_t$ can be calculated as
$\mathrm{MMD}^2(X_s, X_t) = \left\| \frac{1}{N_s}\sum_{i=1}^{N_s} y_i^s - \frac{1}{N_t}\sum_{j=1}^{N_t} y_j^t \right\|^2$
Since $y_i^s = W^T K_i^{Col}$ and $y_j^t = W^T K_{(N_s+j)}^{Col}$, the MMD distance in (25) can be rewritten as
$\mathrm{MMD}^2(X_s, X_t) = \mathrm{tr}(W^T L W)$
with
$L = \left( \frac{1}{N_s}\sum_{i=1}^{N_s} K_i^{Col} - \frac{1}{N_t}\sum_{j=1}^{N_t} K_{(N_s+j)}^{Col} \right) \left( \frac{1}{N_s}\sum_{i=1}^{N_s} K_i^{Col} - \frac{1}{N_t}\sum_{j=1}^{N_t} K_{(N_s+j)}^{Col} \right)^T$
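In matrix form, $L$ is a rank-one matrix built from the difference between the mean source column and the mean target column of $K$. A minimal sketch (our illustration) follows.

```python
import numpy as np

def mmd_matrix(K, n_s, n_t):
    """Matrix L such that MMD^2(Xs, Xt) in the subspace equals tr(W^T L W).

    K is the (N, N) kernel matrix of the combined data [Xs, Xt], N = n_s + n_t.
    With Y = (W.T @ K).T, tr(W.T @ L @ W) equals ||mean(Ys) - mean(Yt)||^2.
    """
    diff = K[:, :n_s].mean(axis=1) - K[:, n_s:].mean(axis=1)   # mean column difference
    diff = diff[:, None]                                       # (N, 1)
    return diff @ diff.T                                       # rank-one (N, N) matrix
```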

3.2.2. Domain Adaption Based on Source LDA Regularized RKHS Subspace Learning (SLDARKHS-DA)

MMD is an approximate criterion rather than an exact one, since it compares the distributions only through their means. Therefore, it is common practice to add regularization terms to compensate for this deficiency of MMD. In transfer learning, KNN is a commonly used classifier. To improve the classification efficiency of KNN, we consider reducing the within-class scatter of the data while increasing the between-class scatter. Since the target domain is usually unlabeled, the SLDA proposed in this section is applied only to the source domain data.
During the distribution matching process, it is helpful to keep samples of the same class close to each other while keeping samples of different classes far apart. For this purpose, the transformed source domain data are divided into $C$ classes, the $c$-th class containing $N_c$ samples:
$\{y_1, \ldots, y_{N_s}\} = \{y_{11}^s, \ldots, y_{1N_1}^s, \ldots, y_{C1}^s, \ldots, y_{CN_C}^s\}, \quad N_s = \sum_{c=1}^{C} N_c$
where $y_{ci}^s$, $c = 1, \ldots, C$, $i = 1, \ldots, N_c$, denotes the $i$-th sample of class $c$.
The center of the $c$-th class can be computed as
$\bar{y}_c = \frac{1}{N_c}\sum_{i=1}^{N_c} y_{ci}^s = \frac{1}{N_c}\sum_{i=1}^{N_c} W^T K_{c(i)}^{Col} = W^T \left( \frac{1}{N_c}\sum_{i=1}^{N_c} K_{c(i)}^{Col} \right) = W^T \bar{K}_c^{Col}$
Moreover, the center of all the source samples can also be computed as
$\bar{y} = \frac{1}{N_s}\sum_{c=1}^{C}\sum_{i=1}^{N_c} y_{ci}^s = \frac{1}{N_s}\sum_{c=1}^{C}\sum_{i=1}^{N_c} W^T K_{c(i)}^{Col} = W^T \left( \frac{1}{N_s}\sum_{c=1}^{C}\sum_{i=1}^{N_c} K_{c(i)}^{Col} \right) = W^T \bar{K}^{Col}$
where $\bar{K}_c^{Col} = \frac{1}{N_c}\sum_{i=1}^{N_c} K_{c(i)}^{Col}$ and $\bar{K}^{Col} = \frac{1}{N_s}\sum_{c=1}^{C}\sum_{i=1}^{N_c} K_{c(i)}^{Col}$.
(1)
To increase the distance between different classes of source domain data, the between-class scatter can be defined and rewritten as
$S_b = \sum_{c=1}^{C} \frac{N_c}{N_s} \| \bar{y}_c - \bar{y} \|^2 = \mathrm{tr}(W^T \Psi W)$
where $\Psi = \sum_{c=1}^{C} \frac{N_c}{N_s} (\bar{K}_c^{Col} - \bar{K}^{Col})(\bar{K}_c^{Col} - \bar{K}^{Col})^T$.
(2)
To improve the discrimination of same-class data in the subspace, the within-class scatter can be expressed as
$S_w = \frac{1}{N_s}\sum_{c=1}^{C}\sum_{i=1}^{N_c} \| y_{ci} - \bar{y}_c \|^2 = \mathrm{tr}(W^T \Phi W)$
where $\Phi = \frac{1}{N_s}\sum_{c=1}^{C}\sum_{i=1}^{N_c} (K_{c(i)}^{Col} - \bar{K}_c^{Col})(K_{c(i)}^{Col} - \bar{K}_c^{Col})^T$. Reducing $S_w$ decreases the distance between samples of the same class in the source domain, so that same-class data become more concentrated.
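A sketch of how $\Psi$ and $\Phi$ can be assembled from the kernel matrix and the source labels is given below; this is our illustration and the function and variable names are assumptions, not part of the paper.

```python
import numpy as np

def slda_scatter_matrices(K, labels_s):
    """Between-class (Psi) and within-class (Phi) scatter matrices of SLDA.

    K        : (N, N) kernel matrix of the combined data, source samples first.
    labels_s : (N_s,) class labels of the source samples.
    Returns Psi, Phi such that S_b = tr(W^T Psi W) and S_w = tr(W^T Phi W).
    """
    N_s = len(labels_s)
    K_s = K[:, :N_s]                              # columns belonging to source samples
    k_bar = K_s.mean(axis=1, keepdims=True)       # overall source mean column
    Psi = np.zeros((K.shape[0], K.shape[0]))
    Phi = np.zeros_like(Psi)
    for c in np.unique(labels_s):
        idx = np.where(labels_s == c)[0]
        K_c = K_s[:, idx]
        k_c = K_c.mean(axis=1, keepdims=True)     # class-c mean column
        Psi += (len(idx) / N_s) * (k_c - k_bar) @ (k_c - k_bar).T
        D = K_c - k_c                             # deviations of class-c columns
        Phi += (D @ D.T) / N_s
    return Psi, Phi
```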

3.2.3. Solution

Since the classification of the target domain data in the subspace relies on the source domain data (KNN identifies the target samples using the labeled source samples as references), the distribution of the unlabeled target domain data cannot be optimized directly in the subspace. We therefore add the SLDA regularization term to the source domain, and the overall objective function of our proposed SLDARKHS-DA can be formulated as
$\min_{W} \; \mathrm{tr}(W^T L W) + \lambda\, \mathrm{tr}(W^T (\Phi - \Psi) W) + \mu\, \mathrm{tr}(W^T W) = \min_{W} \; \mathrm{tr}(W^T N W) \quad \text{s.t.} \quad W^T K W = I_d$
where $N = L + \lambda(\Phi - \Psi) + \mu I$. This model can be solved using the properties of the generalized Rayleigh quotient. Since $K$ is symmetric positive definite, it can be factorized as
$K = U \Sigma^{1/2} \Sigma^{1/2} U^T$
where $U U^T = I$, $\Sigma = \mathrm{diag}(\sigma_1, \ldots, \sigma_N)$ and $\sigma_i$, $i = 1, \ldots, N$, are the eigenvalues of $K$. With $V = \Sigma^{1/2} U^T W$, the constraint can be rewritten as $W^T K W = W^T U \Sigma^{1/2} \Sigma^{1/2} U^T W = V^T V = I_d$. We reformulate (32) in terms of $V$:
$\mathrm{tr}(W^T N W) = \mathrm{tr}(V^T M V)$
where $M = \Sigma^{-1/2} U^T (L + \lambda(\Phi - \Psi) + \mu I) U \Sigma^{-1/2}$. The problem is then solved via the generalized Rayleigh quotient: the columns of $V$ are the eigenvectors corresponding to the $d$ smallest eigenvalues of $M$, and $W$ is recovered as $W = U \Sigma^{-1/2} V$.
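Equivalently, the constrained trace minimization can be solved as the generalized eigenvalue problem $N w = \eta K w$. The sketch below (our illustration, not the authors' MATLAB code) uses SciPy's eigh, whose generalized eigenvectors are returned $K$-orthonormal, which directly enforces $W^T K W = I_d$; the small jitter added to $K$ is a numerical safeguard that we introduce.

```python
import numpy as np
from scipy.linalg import eigh

def solve_sldarkhs_da(K, L, Phi, Psi, lam=1.0, mu=1e-3, d=10):
    """Minimise tr(W^T (L + lam*(Phi - Psi) + mu*I) W) s.t. W^T K W = I_d.

    Solved as the generalized eigenvalue problem N w = eta K w, keeping the
    eigenvectors of the d smallest eigenvalues (equivalent to the
    whitening-based derivation in the text).
    """
    N_mat = L + lam * (Phi - Psi) + mu * np.eye(K.shape[0])
    eta, W = eigh(N_mat, K + 1e-8 * np.eye(K.shape[0]))   # eigenvalues ascending
    return W[:, :d]                                        # satisfies W^T K W ≈ I_d
```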

3.2.4. Computational Complexity

The computational complexity of the SLDARKHS-DA algorithm (Algorithm 1) consists of three main components: (1) solving the eigen-decomposition problem, (2) computing $\Phi$ and $\Psi$, and (3) computing $K$ and $L$. Complexity is expressed in big-O notation, with $n$ the number of samples and $d$ the dimension of the subspace. The generalized eigen-decomposition costs O(dn²), computing $\Phi$ and $\Psi$ costs O(dn²), and computing $K$ and $L$ costs O(n²). Therefore, the total complexity of the algorithm is O((4d + 1)n²).
Algorithm 1: SLDARKHS-DA
Input: source domain data set X s   and target domain data set   X t , label information of X s ; parameters   λ , μ and subspace dimension d .
Output: projection matrix W and the label information of X t .
  • Combine the source domain data set and the target domain data set: $X = [X_s, X_t]$;
  • Compute $K$, $L$, $\Phi$, $\Psi$ and $M$;
  • Eigendecompose the matrix $M$ and select the eigenvectors corresponding to the $d$ smallest eigenvalues to construct the projection matrix $W$;
  • Project both $X_s$ and $X_t$ to obtain the data in the subspace, $y_i^s = W^T K_i^{Col}$ and $y_i^t = W^T K_{(N_s+i)}^{Col}$. Classify $y_i^t$ in the subspace by KNN, with $y_i^s$ as the reference.
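Putting the pieces together, Algorithm 1 can be sketched end to end as follows. This is a minimal illustration assembling the earlier sketches (it assumes the helper functions rbf_kernel, mmd_matrix, slda_scatter_matrices and solve_sldarkhs_da defined above); it is not the authors' MATLAB implementation, and all names are ours.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def sldarkhs_da_predict(Xs, ys, Xt, lam=1.0, mu=1e-3, d=10, sigma=1.0, k=1):
    """SLDARKHS-DA pipeline sketch: learn W, embed both domains, classify with KNN."""
    X = np.vstack([Xs, Xt])
    K = rbf_kernel(X, X, sigma)                    # kernel matrix of combined data
    L = mmd_matrix(K, len(Xs), len(Xt))            # MMD term
    Psi, Phi = slda_scatter_matrices(K, ys)        # SLDA scatter matrices
    W = solve_sldarkhs_da(K, L, Phi, Psi, lam, mu, d)
    Y = (W.T @ K).T                                # all samples in the subspace
    Ys, Yt = Y[:len(Xs)], Y[len(Xs):]
    return KNeighborsClassifier(k).fit(Ys, ys).predict(Yt)
```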

3.3. Application of SLDARKHS-DA to EEG Motor Imagery Recognition

3.3.1. Description of BCI IV 2a Data

Nowadays, many BCI data recognition tasks are handled by domain adaptation methods. Previous studies [20,21] have shown the effectiveness of domain adaptation approaches in reducing the differences in data distribution between subjects or sessions.
The BCI competition datasets are commonly used as benchmarks in the BCI domain. The 2a dataset contains recordings from nine subjects [29]. Subjects were asked to imagine moving four parts of their body: the left hand, the right hand, the feet and the tongue. Addressing such multi-class problems is an important challenge for BCI systems.

3.3.2. Domain Adaptation Subspace Learning Based on Sparse Regularized RKHS

Considering the complexity of BCI data, when the transformed data are projected into the subspace, dimensionality reduction is performed and irrelevant data features should be discarded. We select the most favorable features to improve recognition, construct the subspace with a row-sparse projection matrix and minimize the geometric offset of the data.
We use the $L_{2,1}$ norm to constrain the matrix $W$ so that its rows are sparse. The $L_{2,1}$ norm of a matrix $A \in \mathbb{R}^{C \times D}$ is defined as
$\|A\|_{2,1} = \sum_{i=1}^{C} \sqrt{\sum_{j=1}^{D} A_{ij}^2}$
Minimizing the $L_{2,1}$ norm makes the $L_2$ norm of each row as small as possible, driving entire rows toward zero and thereby achieving row sparsity.
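For concreteness, the $L_{2,1}$ norm of a matrix is simply the sum of the Euclidean norms of its rows, as in this small example (our addition):

```python
import numpy as np

def l21_norm(A):
    """L2,1 norm: sum of the Euclidean norms of the rows of A."""
    return np.linalg.norm(A, axis=1).sum()

A = np.array([[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]])
print(l21_norm(A))     # 5 + 0 + 1 = 6.0; zero rows contribute nothing
```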

3.3.3. Solution

By adding the $L_{2,1}$ norm of the matrix $W$ as a sparse regularization term, the overall objective function of the proposed SLDARKHS-DA with sparse regularization is formulated as follows (see Algorithm 2):
$\min_{W} \; \mathrm{tr}\big(W^T (L + \gamma(\Phi - \Psi) + \mu I) W\big) + \lambda \|W\|_{2,1} \quad \text{s.t.} \quad W^T K W = I_d$
where $\mathrm{tr}(W^T L W)$ represents the MMD distance between the source domain samples and the target domain samples in the subspace. The term $\mathrm{tr}(W^T (\Phi - \Psi) W)$ increases the between-class scatter and reduces the within-class scatter of the data in the subspace. $\|W\|_{2,1}$ is the $L_{2,1}$ norm of $W$, which sparsifies the rows of $W$ that form the basis of the subspace. The regularization term $\mathrm{tr}(W^T W)$ avoids over-fitting of the model. The constraint $W^T K W = I_d$ serves two purposes: (1) it makes the basis of the subspace orthonormal, and (2) it avoids trivial solutions and ensures that $W$ is not $0$. $\lambda$, $\mu$ and $\gamma$ denote the coefficients of the regularization terms.
To solve the optimization problem, we introduce a Lagrange multiplier $\Lambda$, and the Lagrange function of the model is
$\mathcal{L}(W, \Lambda) = \mathrm{tr}\big(W^T (L + \gamma(\Phi - \Psi) + \mu I) W\big) + \lambda \|W\|_{2,1} - \mathrm{tr}\big((W^T K W - I_d)\Lambda\big)$
Then, by taking the derivative of (37) with respect to $W$ and setting it to zero, we obtain
$\big((L + \gamma(\Phi - \Psi) + \mu I) + \lambda G\big) W = K W \Lambda$
Note that $\|W\|_{2,1}$ is not smooth, so we compute its subgradient through $G$, a diagonal matrix whose $i$-th diagonal element equals
$G_{ii} = \begin{cases} 0, & \text{if } W^i = \mathbf{0} \\ \dfrac{1}{2\|W^i\|}, & \text{otherwise} \end{cases}$
where $W^i$ denotes the $i$-th row of $W$. Thus $W$ can be obtained by solving the generalized eigenvalue problem (38) and taking the eigenvectors corresponding to the $d$ smallest eigenvalues.
Algorithm 2: SLDARKHS-DA (Sparse)
Input: source domain data set X s   and target domain data set   X t , label information of X s ; parameters γ ,   λ , μ and subspace dimension d .
Output: projection matrix W and the label information of X t .
  • Combine the source domain data set and the target domain data set: $X = [X_s, X_t]$;
  • Compute the matrices $K$, $L$, $\Phi$, $\Psi$ and initialize $G = I$;
  • Repeat
    - Optimize $W$ by solving the eigen-decomposition problem in (38);
    - Update $G$ by (39);
   until convergence or the maximum number of iterations is reached;
  • Project both $X_s$ and $X_t$ to obtain the data in the subspace, $y_i^s = W^T K_i^{Col}$ and $y_i^t = W^T K_{(N_s+i)}^{Col}$. Classify $y_i^t$ in the subspace by KNN, with $y_i^s$ as the reference.
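Algorithm 2 alternates between an eigen-decomposition and the subgradient update of $G$. A compact sketch of this loop, under the same assumptions as the earlier solver sketch (our illustration, with the convergence test and jitter being our own choices), is:

```python
import numpy as np
from scipy.linalg import eigh

def solve_sparse_sldarkhs_da(K, L, Phi, Psi, gamma=1.0, lam=1e-2, mu=1e-3,
                             d=10, n_iter=20, tol=1e-6):
    """Iteratively reweighted solver for the L2,1-regularised objective.

    Alternates between (a) the generalized eigen-problem
    (N + lam*G) w = eta K w and (b) the sub-gradient update of the diagonal
    matrix G, as in Algorithm 2.
    """
    n = K.shape[0]
    N_mat = L + gamma * (Phi - Psi) + mu * np.eye(n)
    G = np.eye(n)
    W_old = None
    for _ in range(n_iter):
        _, W = eigh(N_mat + lam * G, K + 1e-8 * np.eye(n))
        W = W[:, :d]                                   # d smallest eigenvalues
        row_norms = np.linalg.norm(W, axis=1)
        g = np.where(row_norms > 1e-10, 1.0 / (2.0 * row_norms), 0.0)
        G = np.diag(g)                                 # sub-gradient of ||W||_{2,1}
        if W_old is not None and np.linalg.norm(W - W_old) < tol:
            break
        W_old = W
    return W
```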

4. Experiments

To verify the suitability of the SLDA regularization term, we first conducted experiments on four commonly used standard benchmark datasets (faces, objects, handwritten digits and text). We added the SLDA regularization term to each comparison algorithm; for example, we added it to TCA and compared the result with the original TCA. Second, we tested the performance of our SLDARKHS-DA on the BCI Competition IV 2a dataset and compared it with several classical baselines published in recent years. All methods were programmed in MATLAB 2019 and executed on a PC (CPU: Intel i9, 3.50 GHz; memory: 16 GB). The source programs of the baseline methods used for comparison can be downloaded from GitHub: https://github.com/viggin/domain-adaptation-toolbox (TCA, downloaded in April 2021), https://github.com/minjiang/iglda (IGLDA, downloaded in April 2021) and https://github.com/lijin118/tit (TIT, downloaded in May 2021).

4.1. Baseline and Parameter Settings

We compared the proposed method with typical subspace learning methods in domain adaptation: TCA [24], IGLDA [25] and TIT [26]. The details of each baseline method are summarized below:
(1) TCA [24] is a typical example of RKHS subspace learning in domain adaptation. TCA converts the original data into an RKHS, finds a subspace of this space to reduce the dimensionality of the data, and uses MMD to measure the distance between the two domains. Its objective function is
$\min_{W} \; \mathrm{tr}(W^T K L K W) + \mu\, \mathrm{tr}(W^T W) \quad \text{s.t.} \quad W^T K H K W = I_m$
where $\mathrm{tr}(W^T W)$ prevents over-fitting and the constraint $W^T K H K W = I_m$ preserves the data characteristics of the source and target domain data.
(2) Jiang et al. proposed the integration of global and local metrics for domain adaptation learning (IGLDA) [25], which adds a regularization term to TCA. IGLDA introduces data label information to keep the source and target domain data as close as possible while preserving the geometric properties of the source domain data. Its objective function is
$\min_{W} \; \mathrm{tr}(W^T K L K W) + \alpha\, \mathrm{tr}(W^T K L_w K W) + \beta\, \mathrm{tr}(W^T W) \quad \text{s.t.} \quad W^T K H K W = I_d$
where $L_w$ represents the within-class divergence matrix.
(3) Li et al. proposed another approach in 2019, transfer independently together (TIT) [26], which combines the manifold regularization term used in SSTCA, a regularization term for feature selection, and a regularization term on the variance of the target domain. In the experiments, sample selection is performed iteratively, and the final objective function is
$\min_{W} \; \mathrm{tr}(W^T K L K W) + \alpha\, \mathrm{tr}(W^T K \xi K W) - \beta\, \mathrm{tr}(W^T K C K W) + \gamma \|W\|_{2,1} \quad \text{s.t.} \quad W^T K H K W = I_d$
where $-\mathrm{tr}(W^T K C K W)$ maximizes the variance of the target domain data in the subspace and $\|W\|_{2,1}$ selects features from the data.
In our experiments, we use the K-nearest neighbor (KNN) classifier to evaluate the performance of the proposed method. We obtained the parameter settings with the best classification accuracy through a grid search and applied the same parameter selection process to the baseline methods. Each hyper-parameter was chosen by searching in the range [10⁻¹⁵, 10²]. For simplicity and clarity, we chose an acceptable common set of parameters, as shown in Table 2. In the experiments on the first four datasets, we used the linear kernel as the kernel function, while in the experiments on the BCI dataset, we used the radial basis function (RBF) kernel.

4.2. Face Recognition

In this section, we evaluate the effectiveness of the proposed algorithm on a face recognition task. The AR dataset [30] is widely used in face recognition experiments. We select a subset of the AR dataset with a total of 2600 face images of 100 people (50 men and 50 women), with 26 images per subject. The AR face images were captured in two sessions, two weeks apart. Each session collected 13 pictures per subject in different modes, with different light brightness, light angle, facial expression and occlusion (sunglasses or scarf). In this experiment, each face image was normalized to a gray-level image, and the training and test sets directly used the vectorized gray values of the images as input. According to the shooting time and state, the 26 face images of each subject correspond to 26 patterns, numbered 1a to 1m and 2a to 2m. Figure 1 shows a sample of the AR dataset with 26 face images of the same subject: 1a–1m belong to one group, while 2a–2m belong to another group taken under the same conditions two weeks later. We use the notation 1.a and 2.a to represent the collections of natural-expression face images. In this section, 1.a and 2.a are combined as the source domain data set $X_S$.
From the other 24 patterns (excluding 1.a and 2.a), the first 18 patterns were selected, i.e., 1.b to 1.j and 2.b to 2.j; the data of these patterns were taken as 18 target domain data sets, defining 18 classification tasks.
In the first experiment, we studied how our proposed SLDARKHS-DA affected the distribution of the source and target domains. We took 1 . f and 2 . f as target domain X T , respectively, and calculated the distance between the geometric center of the source domain and target domain and the variance of the source domain data in the original space and subspace, to prove the effectiveness of domain adaptation. As shown in Table 3, in the experiment with 1 . f as the target domain, after the data of the source domain and target domain are transformed from the original space to the subspace, not only do the geometric centers of the data of the two almost coincide, but also the variance of the data is greater. As the distance between the classes in the source domain becomes larger, the classification efficiency of the KNN algorithm improves. Similarly, the geometric distribution of data with 2 . f as the target domain, also shows a similar change.
In the second experiment, we regard the data in the source domain as labeled and the data in the target domain as unlabeled. As before, we combined 1.a and 2.a as the source domain and set the target domains to 1.b to 1.j and 2.b to 2.j. A total of 30% of the images from each target domain were randomly selected as training data, and the transformation function was obtained by the domain adaptation methods from $X_T$ and $X_S$. We set KNN as the default classifier, and the subspace dimensionality was fixed at 90 for the classification experiments. The experimental results are shown in Table A1 in Appendix A.
We note that directly classifying the $X_T$ data with KNN performs worse than classifying the mapped data $\Gamma(X_T)$ obtained by RKHS-DA with KNN. Moreover, the classification accuracy of the SLDARKHS-DA algorithm, which combines the SLDA regularization term with RKHS-DA, improved by 4% on average.
Furthermore, we combined the proposed SLDA regularization term with each baseline algorithm to form a new domain adaptation algorithm and compared it with the original baseline. As shown in Table A3, in terms of average classification accuracy, SLDA improves TCA by 3.1%, IGLDA by 1.2% and TIT by 2.8%. This is consistent with our view of RKHS subspace learning: the baseline algorithms perform domain adaptation in a way similar to our RKHS-DA, so SLDA also improves their performance.
In the third experiment, to investigate how the dimensionality of the subspace of the feature map affects the final performance of our algorithm, we combined 1 . a and 2 . a as the source domain and took 1 . f as the target domain for the classification experiment. We mapped the data into different dimensionalities in subspace from 10-dimensional to 100-dimensional, the step size was set to 10 and other parameters were set to the same values as in the second experiment. The experimental results are shown in Figure 2. We observed that the larger the subspace dimension, the higher the classification accuracy. However, the curve of classification accuracy tends to flatten out as the subspace dimension keeps increasing. Compared with the original baseline algorithm, the baseline algorithm combining the SLDA regularization term achieved a higher accuracy in different subspace dimensions, which means that the SLDA regularization term proposed in this paper is robust and stable.

4.3. Object Recognition

Caltech-256 (C, collected by the California Institute of Technology), Amazon (A, images downloaded from amazon.com in October 2020), Webcam (W, low-resolution images captured by a web camera) and DSLR (D, high-resolution images captured by a digital SLR camera) together form the four-dataset domain adaptation benchmark (4DA), one of the most popular benchmarks in domain adaptation. The four domains share 10 common categories, so the 4DA dataset has 10 categories. Each category in each domain has 8 to 151 samples, for a total of 2533 images. Figure 3 shows some samples selected from 4DA.
For all datasets, we followed [31] to preprocess the data using a similar feature extraction and experimentation protocol. By randomly selecting two different domains as the source and target domains, a total of 4 × 3 = 12 cross-domain object recognition tasks were constructed. In each task, we randomly selected a certain number of samples from each category as the source domain data for the training set.
When D was the source domain, we drew 8 samples from each category; when A, C and W were the source domains, we drew 20 samples from each category. Then, the source domain samples were used as the training set data and the target domain samples were used as the test set.
The results of the first experiment are shown in Appendix A, Table A2. Compared with the original space, the distance between the geometric centers of the source and target domains in the subspace is greatly reduced, while the variance of the source and target domain data is greatly increased.
The results of the classification experiments are shown in Appendix A, Table A3. The classification accuracy of the SLDARKHS-DA algorithm, with the addition of the SLDA regularization term, is about 2% higher than that of the KNN and RKHS-DA algorithms.

4.4. Handwritten Numeral Classification

In this section, the USPS+MNIST dataset is used for handwritten digit classification experiments. The USPS dataset consists of 7291 training images and 2007 test images of size 16 × 16. The MNIST dataset has a training set of 60,000 examples and a test set of 10,000 examples of size 28 × 28.
Both the MNIST and USPS datasets contain grayscale images of the ten handwritten Arabic numerals. The images were rescaled to a size of 16 × 16, which fixes the digits at the center of the image and makes all images the same size. Figure 4 shows examples from the MNIST and USPS datasets.
The experiment in this section was conducted on a subset of MNIST+USPS data set, which consisted of two parts: the first part was 2000 images randomly selected from the MNIST data set, and the second part was 1800 images randomly selected from the USPS data set.
Similarly, all the images in the subset were uniformly resized to 16 × 16 pixels, and the gray values of the pixels were used as the feature vector representing each image. Thus, the samples of MNIST and USPS lie in the same 256-dimensional feature space. To speed up the experiments, we constructed a dataset MNIST vs. USPS: we randomly selected 50 images per digit from MNIST, for a total of 500 images, to form the source data, and used all the images in USPS to form the target data.
As on the other datasets, in the first experiment we fixed the dimension of the subspace at 150; after our algorithm's transformation, D(S, T) was reduced from 2.96 to 1.15, Var(S) changed from 3.6 to 3.2 and Var(T) changed from 4 to 6. Although Var(S) becomes smaller, which is not what we expected, the ratio Var(S)/D(S, T) becomes larger, which still verifies the effectiveness of our algorithm.
In the second experiment, we trained the KNN classifier to repeat the classification experiment 100 times, and used a linear kernel function. The subspace dimensions were set to 30 to 150 and the step size was 20. Figure 5 shows the experimental results. The SLDARKHS-DA algorithm with the SLDA regularization term improves the classification accuracy of RKHS-DA algorithm by about 3%, which is much higher than the classification accuracy of KNN directly (51.18%). Similar results were found for other baseline methods: the accuracy of the baseline algorithm with the SLDA regularization term was higher than that of the original baseline algorithm. In addition, the variation of subspace dimensions had little effect on the classification accuracy of each algorithm.

4.5. Text Categorization

The Reuters-21578 dataset (Dai et al., 2007) provides three cross-domain document categorization tasks: Orgs vs. People, Orgs vs. Places and People vs. Places. The notation "Orgs vs. Places" indicates that the Orgs subcategory is used as the source domain and the Places subcategory as the target domain. There are 1237 source documents and 1208 target documents for Orgs vs. People, 1016 source documents and 1043 target documents for Orgs vs. Places, and 1077 source documents and 1077 target documents for People vs. Places. We randomly selected 50% of the source domain data as the training set and used all the target domain data as the testing set.
In the first experiment, we set the subspace dimensions from 10 to 50 with a step size of 10, and calculated the variance and the distance between the source domain and the geometric center of the target domain. The experimental results are shown in Appendix A, Table A4.
In the second experiment, we used the KNN classifier to verify the effect of the methods.
The experimental results are shown in Appendix A, Table A5. In almost all the dimensions and all the experiments, the recognition rate of RKHS-DA was improved to some extent by the SLDA regularization term. In addition, SLDA also improved the classification accuracy of the other baseline methods used for comparison.

4.6. Motor Imagery Classification

As described in Section 3, we used the 2a dataset from BCI Competition IV, which contains nine subjects [32]. The subjects sat in an armchair in front of a computer screen. As shown in Figure 6, at the beginning of a trial (t = 0 s), a fixation cross appeared on the black screen and a short acoustic warning tone was presented. After two seconds (t = 2 s), a cue appeared and stayed on the screen for 1.25 s, prompting the subject to perform the desired motor imagery task (left hand, right hand, feet or tongue). No feedback was provided. The subjects were asked to carry out the motor imagery task until the fixation cross disappeared from the screen at t = 6 s.
For each subject, two sessions of data were recorded on two different days, with 288 trials per session and 72 trials per class. We extracted the data from 1.5 to 6.5 s of each trial. The recorded EEG signals were sampled at 250 Hz and band-pass filtered with a fifth-order Butterworth filter in the 8–30 Hz frequency band.
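The preprocessing described above can be sketched as follows. This is an illustration we add; the zero-phase filtfilt choice and the function name are our assumptions, as the paper only specifies a fifth-order Butterworth 8–30 Hz filter, a 250 Hz sampling rate and the 1.5–6.5 s window.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 250.0                       # sampling rate (Hz), as in BCI Competition IV 2a

def preprocess_trial(eeg, t_start=1.5, t_stop=6.5, band=(8.0, 30.0), order=5):
    """Band-pass filter one trial and crop the 1.5-6.5 s window.

    eeg : (n_channels, n_samples) raw signal of a single trial, time-locked to
    the trial onset.
    """
    b, a = butter(order, [band[0] / (FS / 2), band[1] / (FS / 2)], btype="bandpass")
    filtered = filtfilt(b, a, eeg, axis=-1)      # zero-phase filtering (our choice)
    lo, hi = int(t_start * FS), int(t_stop * FS)
    return filtered[:, lo:hi]
```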
We took A01T to A09T as the source domains and A01E to A09E as the corresponding target domains, for a total of nine experiments. In this experiment, the PCA algorithm was added as a baseline method for dimensionality reduction of the original spatial data, and KNN was used as the default classifier for all algorithms. We set the parameter γ of the sparse regularization term to 10⁻²; the other parameter settings are given in Section 4.1.
Since our ultimate goal was to compare the performance of our method with the other baseline methods on the BCI 2a dataset, in the first experiment, we fixed the dimension of our subspace to 25 and performed the classification on different subjects from A01 to A09 for comparison. Figure 7 shows that our method outperforms the baseline algorithm in all experiments, except for the result recorded in A04.
In the second experiment, we compared our method with the baseline method in terms of the dimensionality reduction. We used A01T as the source domain and A01E as the target domain, and the dimensionality of the subspace varies from 10 to 110. As shown in Figure 8, our SLDARKHS-DA (Sparse) outperforms the other baseline methods.
In the third experiment, we investigated the impact of our algorithm on the source domain data distribution. For visualization purposes, we applied t-SNE to both the original data and the transformed data. Figure 9a shows a two-dimensional representation of the original data vectors, i.e., each point in the figure represents one trial, and Figure 9b shows the representation of the transformed data vectors obtained by our SLDARKHS-DA (Sparse). In Figure 9a,b, the points are colored according to the mental task. We observe that the source domain data are chaotic in the original space, while our algorithm separates the four classes of data, which benefits the accuracy of the KNN classifier.
In the fourth experiment, Table A6 in Appendix A shows the classification accuracy of the original baseline algorithms and of the baseline algorithms after adding the SLDA regularization term. First, from the perspective of the domain adaptation framework, RKHS-DA and the other domain adaptation baselines are better than PCA+KNN, and SLDARKHS-DA (Sparse) is better than the other algorithms. In addition, the SLDA regularization term yields consistent improvements over the other baseline algorithms used for comparison.
In the fifth experiment, we conducted experiments on subject A01, with A01T as the source domain and A01E as the target domain. We set the range of the subspace dimension from 10 to 110. The average classification results are shown in Figure 10. Based on these results, we observe that the classification performance of the algorithm with regularization SLDA is better than that of the original baseline algorithm in all the dimensions.

5. Conclusions

In this paper, we reformulated the RKHS subspace learning framework based on the theory of RKHS, in which the RKHS consists of functions defined on the original data space rather than a Hilbert space that is independent of the original data space. We proposed an SLDA regularization term based on discriminant analysis of the source domain data; this regularization term increases the between-class distance and decreases the within-class distance. Based on SLDA and the RKHS subspace learning framework, we proposed a domain adaptation algorithm. For the BCI application, we selected the most relevant data to form the basis of the subspace by adding a sparsity constraint, i.e., the $L_{2,1}$ norm. Extensive experiments validated the effectiveness of our algorithm.
In the future, we plan to continue this work along several avenues. First, SLDARKHS-DA uses parametric kernels for the MMD, and we plan to develop an efficient algorithm for kernel choice in SLDARKHS-DA. Second, to address the sensitivity of MI data, we will exploit the frequency-domain features of the MI signals. Moreover, we plan to extend SLDARKHS-DA to other BCI experiments with cross-subject settings.

Author Contributions

Conceptualization, S.L. and Z.M.; methodology, L.J. and Z.M.; software, L.J.; validation, L.J., W.L. and C.C.; formal analysis, W.L.; investigation, L.J.; resources, L.J.; data curation, L.J.; writing—original draft preparation, L.J.; writing—review and editing, W.L. and C.C.; visualization, C.C.; supervision, S.L. and Z.M.; project administration, S.L.; funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China grant number 61773022 and Science and Technology Program of Guangzhou grant number 68000-42050001.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Classification accuracy (in %) of face recognition in different tasks (1(a) and 2(a) for source domain).

| Task | KNN | RKHS-DA | SLDA RKHS-DA | TCA | TCA + SLDA | IGLDA | IGLDA + SLDA | TIT | TIT + SLDA |
|---|---|---|---|---|---|---|---|---|---|
| 1.b | 98.00 | 97.00 | 97.00 | 97.00 | 97.00 | 99.00 | 97.00 | 97.00 | 99.00 |
| 1.c | 89.00 | 89.00 | 95.00 | 89.00 | 94.00 | 90.00 | 94.00 | 88.00 | 94.00 |
| 1.d | 57.00 | 63.00 | 64.00 | 63.00 | 65.00 | 64.00 | 65.00 | 60.00 | 66.00 |
| 1.e | 15.00 | 88.00 | 91.00 | 88.00 | 91.00 | 90.00 | 92.00 | 85.00 | 90.00 |
| 1.f | 11.00 | 74.00 | 84.00 | 74.00 | 88.00 | 81.00 | 88.00 | 75.00 | 83.00 |
| 1.g | 3.00 | 63.00 | 67.00 | 63.00 | 67.00 | 64.00 | 67.00 | 59.00 | 69.00 |
| 1.h | 48.00 | 63.00 | 64.00 | 63.00 | 65.00 | 68.00 | 65.00 | 61.00 | 61.00 |
| 1.i | 33.00 | 57.00 | 49.00 | 57.00 | 54.00 | 58.00 | 54.00 | 52.00 | 49.00 |
| 2.b | 96.00 | 97.00 | 98.00 | 97.00 | 98.00 | 98.00 | 98.00 | 95.00 | 98.00 |
| 2.c | 92.00 | 95.00 | 99.00 | 95.00 | 99.00 | 95.00 | 99.00 | 93.00 | 99.00 |
| 2.d | 65.00 | 69.00 | 64.00 | 69.00 | 68.00 | 70.00 | 68.00 | 67.00 | 63.00 |
| 2.e | 21.00 | 88.00 | 93.00 | 88.00 | 95.00 | 90.00 | 95.00 | 88.00 | 93.00 |
| 2.f | 17.00 | 78.00 | 83.00 | 78.00 | 86.00 | 79.00 | 86.00 | 74.00 | 82.00 |
| 2.g | 2.00 | 48.00 | 72.00 | 48.00 | 73.00 | 55.00 | 73.00 | 43.00 | 72.00 |
| 2.h | 38.00 | 68.00 | 56.00 | 68.00 | 61.00 | 67.00 | 61.00 | 63.00 | 56.00 |
| 2.i | 41.00 | 59.00 | 54.00 | 59.00 | 58.00 | 61.00 | 58.00 | 57.00 | 49.00 |
| Average | 45.38 | 74.75 | 76.88 | 74.75 | 78.69 | 76.81 | 78.75 | 72.31 | 76.44 |
Table A2. Data distribution in the original space and subspace for object recognition.

| Task | D(S, T) (Original) | Var(S) (Original) | Var(T) (Original) | D(S, T) (Subspace) | Var(S) (Subspace) | Var(T) (Subspace) |
|---|---|---|---|---|---|---|
| A→W | 1.665 | 27.19 | 27.97 | 0.0068 | 1030.64 | 1046.91 |
| A→C | 1.704 | 27.68 | 27.58 | 0.0052 | 1564.31 | 1567.77 |
| A→D | 1.678 | 27.78 | 28.03 | 0.0097 | 1017.99 | 979.06 |
| C→W | 1.776 | 27.40 | 27.97 | 0.0081 | 1054.88 | 1043.30 |
| C→A | 1.826 | 28.11 | 27.75 | 0.0080 | 1389.24 | 1411.04 |
| C→D | 1.851 | 27.03 | 28.03 | 0.0088 | 975.87 | 975.18 |
| D→W | 2.225 | 28.14 | 27.97 | 0.0085 | 1032.31 | 1000.64 |
| D→A | 2.298 | 27.62 | 27.75 | 0.0053 | 1242.08 | 1427.02 |
| D→C | 2.315 | 27.97 | 27.58 | 0.0049 | 1359.13 | 1502.02 |
| W→D | 1.257 | 27.62 | 28.03 | 0.0165 | 932.61 | 934.83 |
| W→A | 1.236 | 27.73 | 27.75 | 0.0129 | 1215.04 | 1399.26 |
| W→C | 1.283 | 28.04 | 27.58 | 0.0112 | 1272.64 | 1456.59 |
Table A3. Classification accuracy (in %) of object recognition in different tasks.

| Task | KNN | RKHS-DA | SLDA RKHS-DA | TCA | TCA + SLDA | IGLDA | IGLDA + SLDA | TIT | TIT + SLDA |
|---|---|---|---|---|---|---|---|---|---|
| A→W | 29.83 | 33.14 | 35.45 | 34.83 | 35.92 | 35.28 | 35.70 | 37.13 | 37.32 |
| A→C | 26.00 | 33.09 | 34.91 | 34.83 | 35.41 | 34.99 | 35.01 | 33.44 | 33.90 |
| A→D | 25.48 | 28.83 | 30.25 | 35.61 | 36.74 | 30.76 | 31.23 | 35.43 | 35.79 |
| C→W | 25.76 | 31.82 | 32.54 | 32.90 | 33.10 | 32.82 | 33.66 | 37.07 | 37.82 |
| C→A | 23.70 | 35.02 | 36.68 | 36.61 | 37.09 | 36.62 | 37.26 | 33.84 | 33.77 |
| C→D | 25.48 | 32.96 | 34.69 | 35.57 | 36.37 | 35.09 | 35.40 | 39.97 | 40.78 |
| D→W | 63.39 | 59.44 | 60.79 | 61.93 | 62.13 | 60.58 | 60.84 | 70.42 | 71.40 |
| D→A | 28.49 | 31.89 | 33.92 | 34.03 | 34.23 | 33.96 | 34.05 | 27.90 | 28.39 |
| D→C | 26.27 | 31.04 | 33.15 | 32.91 | 33.46 | 32.57 | 33.06 | 26.26 | 26.61 |
| W→D | 59.24 | 66.93 | 68.59 | 68.97 | 69.82 | 68.90 | 69.06 | 77.34 | 78.35 |
| W→A | 22.96 | 33.41 | 35.64 | 35.71 | 35.80 | 35.48 | 35.86 | 31.55 | 32.14 |
| W→C | 19.86 | 28.50 | 30.65 | 30.86 | 31.39 | 30.61 | 30.88 | 27.99 | 28.56 |
| Average | 31.37 | 37.17 | 38.94 | 39.56 | 40.12 | 38.97 | 39.33 | 39.86 | 40.40 |
Table A4. Data distribution in the original space and subspace for text categorization.

| Task | SD | D(S, T) (Original) | Var(S) (Original) | Var(T) (Original) | D(S, T) (Subspace) | Var(S) (Subspace) | Var(T) (Subspace) |
|---|---|---|---|---|---|---|---|
| People vs. Places | 10 | 1.96 | 60.62 | 64.14 | 0.005 | 844.13 | 1283.17 |
| | 20 | 1.98 | 59.95 | 64.14 | 0.005 | 1002.98 | 1763.11 |
| | 30 | 1.96 | 60.44 | 64.14 | 0.010 | 1346.93 | 2049.31 |
| | 40 | 1.93 | 61.05 | 64.14 | 0.008 | 1406.10 | 2240.79 |
| | 50 | 1.92 | 61.56 | 64.14 | 0.002 | 2530.73 | 2463.35 |
| Orgs vs. People | 10 | 1.90 | 61.65 | 65.70 | 0.004 | 688.33 | 1190.34 |
| | 20 | 1.85 | 62.72 | 65.70 | 0.005 | 1023.54 | 947.86 |
| | 30 | 1.89 | 63.19 | 65.70 | 0.002 | 2046.41 | 2210.36 |
| | 40 | 1.89 | 62.08 | 65.70 | 0.002 | 1299.08 | 2131.33 |
| | 50 | 1.89 | 63.01 | 65.70 | 0.008 | 1447.91 | 1459.49 |
| Orgs vs. Places | 10 | 1.99 | 61.44 | 63.41 | 0.005 | 690.08 | 1161.74 |
| | 20 | 1.99 | 61.18 | 63.41 | 0.007 | 976.39 | 1591.35 |
| | 30 | 1.98 | 61.14 | 63.41 | 0.002 | 1878.04 | 1939.63 |
| | 40 | 2.00 | 61.37 | 63.41 | 0.008 | 1390.50 | 2116.67 |
| | 50 | 1.97 | 61.78 | 63.41 | 0.007 | 1408.70 | 2276.13 |
Table A5. Classification accuracy (in %) of text categorization in different tasks.

| Task | SD | KNN | RKHS-DA | SLDA RKHS-DA | TCA | TCA + SLDA | IGLDA | IGLDA + SLDA | TIT | TIT + SLDA |
|---|---|---|---|---|---|---|---|---|---|---|
| People vs. Places | 10 | 45.18 | 61.42 | 61.72 | 61.30 | 61.70 | 62.09 | 62.72 | 51.88 | 57.96 |
| | 20 | | 61.33 | 60.81 | 60.92 | 61.57 | 61.38 | 61.81 | 53.62 | 57.47 |
| | 30 | | 59.20 | 59.78 | 59.34 | 60.35 | 59.69 | 60.78 | 54.55 | 57.17 |
| | 40 | | 57.99 | 58.15 | 58.48 | 58.84 | 58.57 | 59.15 | 54.88 | 56.25 |
| | 50 | | 57.41 | 57.85 | 57.44 | 58.12 | 57.96 | 58.85 | 54.15 | 57.22 |
| Orgs vs. People | 10 | 45.32 | 76.14 | 76.97 | 76.80 | 77.19 | 77.14 | 76.97 | 58.22 | 68.09 |
| | 20 | | 77.99 | 78.91 | 79.16 | 79.77 | 79.17 | 78.91 | 62.85 | 72.51 |
| | 30 | | 78.07 | 79.03 | 78.45 | 79.88 | 79.07 | 79.03 | 63.38 | 74.80 |
| | 40 | | 78.57 | 79.41 | 79.18 | 80.21 | 79.26 | 79.41 | 65.56 | 75.82 |
| | 50 | | 78.31 | 79.07 | 79.03 | 79.85 | 79.13 | 79.07 | 66.33 | 75.69 |
| Orgs vs. Places | 10 | 54.39 | 70.27 | 69.88 | 69.77 | 69.79 | 70.60 | 69.88 | 57.03 | 61.94 |
| | 20 | | 72.37 | 72.45 | 72.44 | 73.45 | 71.97 | 72.45 | 60.91 | 63.61 |
| | 30 | | 71.84 | 71.81 | 71.71 | 72.38 | 71.76 | 71.81 | 62.69 | 65.48 |
| | 40 | | 71.18 | 71.24 | 71.37 | 71.72 | 71.38 | 71.24 | 63.95 | 65.80 |
| | 50 | | 70.97 | 71.55 | 70.89 | 71.58 | 71.30 | 71.55 | 64.66 | 65.88 |
| Average | | 48.30 | 69.54 | 69.91 | 60.75 | 70.43 | 70.03 | 70.24 | 59.64 | 65.05 |
Table A6. Classification accuracy (in %) of BCI—motor imagery on different subjects.

| Subject | KNN | RKHS-DA | SLDA RKHS-DA (Sparse) | TCA | TCA + SLDA | IGLDA | IGLDA + SLDA | TIT | TIT + SLDA |
|---|---|---|---|---|---|---|---|---|---|
| A01 | 38.89 | 64.24 | 67.98 | 63.82 | 68.42 | 64.69 | 63.65 | 62.26 | 63.53 |
| A02 | 35.07 | 31.94 | 43.19 | 33.26 | 44.71 | 34.69 | 33.85 | 34.03 | 46.48 |
| A03 | 29.51 | 61.81 | 65.72 | 62.53 | 67.42 | 63.33 | 62.99 | 65.94 | 67.03 |
| A04 | 26.74 | 33.68 | 30.97 | 34.34 | 32.24 | 34.31 | 29.86 | 29.20 | 30.89 |
| A05 | 24.65 | 24.31 | 26.90 | 24.34 | 26.31 | 23.78 | 23.89 | 24.27 | 26.24 |
| A06 | 24.31 | 31.94 | 33.78 | 31.91 | 34.57 | 31.74 | 32.29 | 31.98 | 36.58 |
| A07 | 49.65 | 50.69 | 56.76 | 50.17 | 57.80 | 50.56 | 49.17 | 46.88 | 57.35 |
| A08 | 26.04 | 57.99 | 66.03 | 59.13 | 66.13 | 59.83 | 59.58 | 55.52 | 55.96 |
| A09 | 25.00 | 64.58 | 67.56 | 67.08 | 68.42 | 67.12 | 66.53 | 68.75 | 71.03 |
| Average | 27.99 | 46.80 | 50.99 | 47.40 | 51.78 | 47.78 | 46.87 | 46.54 | 50.56 |

References

  1. Nicolas-Alonso, L.F.; Gomez-Gil, J. Brain Computer Interfaces, a Review. Sensors 2012, 12, 1211–1279.
  2. Nijholt, A.; Bos, D.P.-O.; Reuderink, B. Turning Shortcomings into Challenges: Brain–Computer Interfaces for Games. Entertain. Comput. 2009, 1, 85–94.
  3. Blankertz, B.; Dornhege, G.; Krauledat, M.; Müller, K.-R.; Curio, G. The Non-Invasive Berlin Brain–Computer Interface: Fast Acquisition of Effective Performance in Untrained Subjects. NeuroImage 2007, 37, 539–550.
  4. Al-Ani, T.; Trad, D.; Somerset, V.S. Signal Processing and Classification Approaches for Brain-Computer Interface. Intell. Biosens. 2010, 25–66.
  5. Leeb, R.; Friedman, D.; Müller-Putz, G.R.; Scherer, R.; Slater, M.; Pfurtscheller, G. Self-Paced (Asynchronous) BCI Control of a Wheelchair in Virtual Environments: A Case Study with a Tetraplegic. Comput. Intell. Neurosci. 2007, 2007, e79642.
  6. Galán, F.; Nuttin, M.; Lew, E.; Ferrez, P.W.; Vanacker, G.; Philips, J.; Millán, J.D. A Brain-Actuated Wheelchair: Asynchronous and Non-Invasive Brain–Computer Interfaces for Continuous Control of Robots. Clin. Neurophysiol. 2008, 119, 2159–2169.
  7. Krepki, R.; Blankertz, B.; Curio, G.; Müller, K.-R. The Berlin Brain-Computer Interface (BBCI)–Towards a New Communication Channel for Online Control in Gaming Applications. Multimed. Tools Appl. 2007, 33, 73–90.
  8. Hochberg, L.R.; Bacher, D.; Jarosiewicz, B.; Masse, N.Y.; Simeral, J.D.; Vogel, J.; Haddadin, S.; Liu, J.; Cash, S.S.; van der Smagt, P.; et al. Reach and Grasp by People with Tetraplegia Using a Neurally Controlled Robotic Arm. Nature 2012, 485, 372–375.
  9. Pan, J.; Xie, Q.; He, Y.; Wang, F.; Di, H.; Laureys, S.; Yu, R.; Li, Y. Detecting Awareness in Patients with Disorders of Consciousness Using a Hybrid Brain–Computer Interface. J. Neural Eng. 2014, 11, 056007.
  10. Pfurtscheller, G.; Da Silva, F.L. Event-Related EEG/MEG Synchronization and Desynchronization: Basic Principles. Clin. Neurophysiol. 1999, 110, 1842–1857.
  11. Soleymanpour, R.; Arvaneh, M. Entropy-Based EEG Time Interval Selection for Improving Motor Imagery Classification. In Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Budapest, Hungary, 9–12 October 2016; pp. 004034–004039.
  12. Feng, J.; Yin, E.; Jin, J.; Saab, R.; Daly, I.; Wang, X.; Hu, D.; Cichocki, A. Towards Correlation-Based Time Window Selection Method for Motor Imagery BCIs. Neural Netw. 2018, 102, 87–95.
  13. Thomas, K.P.; Guan, C.; Tong, L.C.; Vinod, A.P. Discriminative FilterBank Selection and EEG Information Fusion for Brain Computer Interface. In Proceedings of the 2009 IEEE International Symposium on Circuits and Systems, Taipei, Taiwan, 24–27 May 2009; pp. 1469–1472.
  14. Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359.
  15. Long, M.; Wang, J.; Ding, G.; Sun, J.; Yu, P.S. Transfer Feature Learning with Joint Distribution Adaptation. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 2200–2207.
  16. Zhang, J.; Li, W.; Ogunbona, P. Joint Geometrical and Statistical Alignment for Visual Domain Adaptation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1859–1867.
  17. Wang, J.; Feng, W.; Chen, Y.; Yu, H.; Huang, M.; Yu, P.S. Visual Domain Adaptation with Manifold Embedded Distribution Alignment. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Korea, 22–26 October 2018; pp. 402–410.
  18. Patel, V.M.; Gopalan, R.; Li, R.; Chellappa, R. Visual Domain Adaptation: A Survey of Recent Advances. IEEE Signal Process. Mag. 2015, 32, 53–69.
  19. Ganin, Y.; Ustinova, E.; Ajakan, H.; Germain, P.; Larochelle, H.; Laviolette, F.; Marchand, M.; Lempitsky, V. Domain-Adversarial Training of Neural Networks. In Domain Adaptation in Computer Vision Applications; Csurka, G., Ed.; Advances in Computer Vision and Pattern Recognition; Springer International Publishing: Cham, Switzerland, 2017; pp. 189–209. ISBN 978-3-319-58346-4.
  20. Sugiyama, M.; Krauledat, M.; Müller, K.-R. Covariate Shift Adaptation by Importance Weighted Cross Validation. J. Mach. Learn. Res. 2007, 8, 985–1005.
  21. Chai, X.; Wang, Q.; Zhao, Y.; Liu, X.; Bai, O.; Li, Y. Unsupervised Domain Adaptation Techniques Based on Auto-Encoder for Non-Stationary EEG-Based Emotion Recognition. Comput. Biol. Med. 2016, 79, 205–214.
  22. Gretton, A.; Borgwardt, K.; Rasch, M.J.; Schölkopf, B.; Smola, A.J. A Kernel Method for the Two-Sample Problem. arXiv 2008, arXiv:0805.2368.
  23. Smola, A.; Gretton, A.; Song, L.; Schölkopf, B. A Hilbert Space Embedding for Distributions. In Proceedings of the Algorithmic Learning Theory; Hutter, M., Servedio, R.A., Takimoto, E., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; pp. 13–31.
  24. Pan, S.J.; Tsang, I.W.; Kwok, J.T.; Yang, Q. Domain Adaptation via Transfer Component Analysis. IEEE Trans. Neural Netw. 2011, 22, 199–210.
  25. Jiang, M.; Huang, W.; Huang, Z.; Yen, G.G. Integration of Global and Local Metrics for Domain Adaptation Learning via Dimensionality Reduction. IEEE Trans. Cybern. 2017, 47, 38–51.
  26. Li, J.; Lu, K.; Huang, Z.; Zhu, L.; Shen, H.T. Transfer Independently Together: A Generalized Framework for Domain Adaptation. IEEE Trans. Cybern. 2019, 49, 2144–2155.
  27. Lei, W.; Ma, Z.; Liu, S.; Lin, Y. EEG Mental Recognition Based on RKHS Learning and Source Dictionary Regularized RKHS Subspace Learning. IEEE Access 2021, 9, 150545–150559.
  28. Larsen, R.J.; Marx, M.L. Introduction to Mathematical Statistics and Its Applications: Pearson New International Edition; Pearson Higher Ed: London, UK, 2013; ISBN 1-292-03672-9.
  29. Separability of Four-Class Motor Imagery Data Using Independent Components Analysis. Available online: https://iopscience.iop.org/article/10.1088/1741-2560/3/3/003/meta (accessed on 15 September 2021).
  30. Martinez, A.; Benavente, R. The AR Face Database: CVC Technical Report #24; June 1998. Available online: https://www2.ece.ohio-state.edu/~aleix/ARdatabase.html (accessed on 24 November 2021).
  31. Gong, B.; Shi, Y.; Sha, F.; Grauman, K. Geodesic Flow Kernel for Unsupervised Domain Adaptation. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2066–2073.
  32. Brunner, C.; Leeb, R.; Müller-Putz, G.; Schlögl, A.; Pfurtscheller, G. BCI Competition 2008—Graz Data Set A. Inst. Knowl. Discov. Lab. Brain-Comput. Interfaces Graz Univ. Technol. 2008, 16, 1–6.
Figure 1. Sample images of the AR face dataset. (1a–1m) and (2a–2m) represent two groups of 13 face pictures, shown from left to right.
Figure 2. Comparison of the baseline and the SLDA-regularized baseline with a subspace dimensionality of 10–100 on the face recognition task. (a) RKHS-DA and RKHS-DA+SLDA. (b) TCA and TCA+SLDA. (c) IGLDA and IGLDA+SLDA. (d) TIT and TIT+SLDA.
Figure 3. Sample pictures from the four datasets: (a) Caltech-256, (b) Amazon, (c) webcam and (d) DSLR.
Figure 4. Sample pictures from MNIST and USPS. (a) MNIST. (b) USPS.
Figure 5. Comparison of the baseline and the SLDA-regularized baseline with a subspace dimensionality of 30–150 on handwritten numeral classification. (a) RKHS-DA and RKHS-DA+SLDA. (b) TCA and TCA+SLDA. (c) IGLDA and IGLDA+SLDA. (d) TIT and TIT+SLDA.
Figure 6. Timing scheme of one trial.
Figure 7. Comparison of SLDARKHS-DA(Sparse) and the various baseline methods on different test subjects of the BCI 2a dataset. The KNN classification accuracies (in %) are shown.
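Figure 7 reports KNN accuracies on the projected features of each subject. The sketch below illustrates how such a score could be computed with scikit-learn; the projected feature arrays, label vectors and the helper `knn_accuracy` are hypothetical placeholders rather than the authors' implementation, and the neighbor count follows NoN = 5 from Table 2.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

def knn_accuracy(Ys, labels_s, Yt, labels_t, n_neighbors=5):
    """Train KNN on projected source features Ys and score it on
    projected target features Yt (NoN = 5 for BCI-2a in Table 2)."""
    clf = KNeighborsClassifier(n_neighbors=n_neighbors)
    clf.fit(Ys, labels_s)
    return accuracy_score(labels_t, clf.predict(Yt))

# Toy example: random features standing in for the learned subspace,
# 288 trials and 4 motor imagery classes per domain.
rng = np.random.default_rng(0)
Ys, Yt = rng.normal(size=(288, 10)), rng.normal(size=(288, 10))
ls, lt = rng.integers(0, 4, 288), rng.integers(0, 4, 288)
print(f"accuracy: {100 * knn_accuracy(Ys, ls, Yt, lt):.2f}%")
```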
Figure 8. Comparison of SLDARKHS-DA(Sparse) and the various baseline methods on the BCI 2a dataset for different subspace dimensionalities.
Figure 9. Visualization of the feature distribution in the source domain of subject A09 by t-SNE. (a) The original feature distribution. (b) The transformed feature distribution obtained by SLDARKHS-DA(Sparse).
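Figure 9 visualizes the source-domain features of subject A09 with t-SNE before and after projection. A minimal sketch of how such a two-panel plot could be produced with scikit-learn and Matplotlib follows; `X_source`, `labels` and the linear map `W` are random placeholders standing in for the subject's features, class labels and the learned projection, not the paper's data or code.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

def tsne_scatter(features, labels, title, ax):
    """Embed the features in 2-D with t-SNE and colour points by class."""
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
    ax.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=10)
    ax.set_title(title)

# Placeholder data: 288 trials, 288-dimensional features, 4 MI classes (cf. Table 2).
rng = np.random.default_rng(0)
X_source = rng.normal(size=(288, 288))
labels = rng.integers(0, 4, 288)
W = rng.normal(size=(288, 10))          # stand-in for the learned projection

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
tsne_scatter(X_source, labels, "(a) original features", axes[0])
tsne_scatter(X_source @ W, labels, "(b) projected features", axes[1])
plt.show()
```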
Figure 10. Comparison of the baseline and the SLDA-regularized baseline in face recognition with a subspace dimensionality of 10–100. (a) RKHS-DA and RKHS-DA+SLDA. (b) TCA and TCA+SLDA. (c) IGLDA and IGLDA+SLDA. (d) TIT and TIT+SLDA.
Table 1. Notation and description.
Notation | Description
Xs, Ys | Original/subspace source data
Xt, Yt | Original/subspace target data
L | MMD matrix
λ, μ, γ, β | Penalty parameters
K | Kernel matrix
W | Projection matrix
I | Identity matrix
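To make the notation concrete, the sketch below instantiates the quantities of Table 1 as NumPy arrays under assumed conventions (rows as samples, an RBF kernel, and the standard TCA-style construction of the MMD matrix L); the paper's exact definitions and the learned projection W are given in the main text, so the values here are placeholders only.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
n_s, n_t, d, k = 100, 120, 288, 10        # sample counts, input dim, subspace dim

X_s = rng.normal(size=(n_s, d))           # original source data
X_t = rng.normal(size=(n_t, d))           # original target data
X = np.vstack([X_s, X_t])                 # stacked source and target samples

K = rbf_kernel(X, X)                      # kernel matrix, (n_s+n_t) x (n_s+n_t)
I = np.eye(n_s + n_t)                     # identity matrix

# MMD matrix L: L_ij = 1/n_s^2 (both source), 1/n_t^2 (both target),
# and -1/(n_s*n_t) across domains.
e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
L = np.outer(e, e)

W = rng.normal(size=(n_s + n_t, k))       # projection matrix (placeholder, not learned here)
Y = K @ W                                 # subspace data: Y_s = Y[:n_s], Y_t = Y[n_s:]
```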
Table 2. Parameter settings.
DS | OD | SD | NoN | μ | λ1 | λ2 | λ3 | λ4
AR | 2580 | 10–100 | 1 | 1 | 10^2 | 10^−2 | 10^−7 | 10^−12
4DA | 800 | 80 | 5 | 1 | 10^2 | 10^−2 | 10^−7 | 10^−12
MNIST and USPS | 256 | 30–150 | 5 | 1 | 10^2 | 10^−2 | 10^−2 | 10^−12
Reuters-21578 | 4593 ± 200 | 10–50 | 5 | 1 | 10^2 | 10^−15 | 10^−7 | 10^−9
BCI-2a | 288 | 10–110 | 5 | 1 | 10^−2 | 10^−3 | 10^−3 | 10^−2
DS = dataset, OD = original dimensionality, SD = subspace dimensionality, NoN = number of neighbors in KNN; λ1 for SLDARKHS-DA, λ2 for SLDATCA, λ3 for SLDAIGLDA and λ4 for SLDATIT.
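For reproducibility, the settings of Table 2 can be collected in a single configuration object. The dictionary below is an illustrative sketch covering three of the rows, not code from the paper; the key names simply mirror the table's abbreviations.

```python
# Hyper-parameter settings transcribed from Table 2 (subset of rows).
# OD/SD = original/subspace dimensionality, NoN = number of KNN neighbours,
# lambda1..lambda4 as described in the table footnote.
PARAMS = {
    "AR": dict(OD=2580, SD=(10, 100), NoN=1, mu=1,
               lambda1=1e2, lambda2=1e-2, lambda3=1e-7, lambda4=1e-12),
    "MNIST and USPS": dict(OD=256, SD=(30, 150), NoN=5, mu=1,
                           lambda1=1e2, lambda2=1e-2, lambda3=1e-2, lambda4=1e-12),
    "BCI-2a": dict(OD=288, SD=(10, 110), NoN=5, mu=1,
                   lambda1=1e-2, lambda2=1e-3, lambda3=1e-3, lambda4=1e-2),
}

print(PARAMS["BCI-2a"]["NoN"])   # -> 5 neighbours for the BCI experiments
```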
Table 3. Data distribution in the original space and subspace for face recognition.
Task | Original space: D (S, T) | Var (S) | Var (T) | Subspace: D (S, T) | Var (S) | Var (T)
1.f | 3182 | 1867 | 2065 | 9 × 10^−3 | 5393 | 5561
2.f | 3060 | 1867 | 2036 | 1 × 10^−12 | 5448 | 5624
D (S, T) = distance between the source domain and target domain, and Var = variance.
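The paper's exact definitions of D(S, T) and Var are given in the main text; as a hedged illustration only, the sketch below treats D(S, T) as the squared distance between the domain means (an empirical MMD with a linear kernel) and Var as the total variance within a domain. Both definitions are assumptions made for this example.

```python
import numpy as np

def domain_distance(S, T):
    """Squared Euclidean distance between the domain means
    (an empirical MMD with a linear kernel; assumed definition)."""
    return float(np.sum((S.mean(axis=0) - T.mean(axis=0)) ** 2))

def total_variance(X):
    """Sum of per-feature variances within a domain (assumed definition of Var)."""
    return float(np.sum(X.var(axis=0)))

# Toy source and target domains with a deliberate mean shift.
rng = np.random.default_rng(0)
S = rng.normal(0.0, 1.0, size=(80, 50))
T = rng.normal(0.5, 1.2, size=(90, 50))
print(domain_distance(S, T), total_variance(S), total_variance(T))
```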