2.1. Problem Description
When an SSVEP experiment is taking place, the subject is seated in front of a screen where visual stimuli flicker at different frequencies. During the experiment, raw EEG data are collected in order to calibrate the overall system. The segmentation of the raw EEG data (using event triggers) results in a set of trials for each visual stimulus (or class). Using these EEG trials, the experimenter can calibrate the BCI system (for example, by training the classifier). Let us assume that the SSVEP dataset is a collection of multi-channel EEG trials $\{X_{s,m}\}_{m=1}^{M}$, $s = 1, \dots, S$, for each participant, where $M$ is the number of trials of an SSVEP target, $s$ is the index of the SSVEP target, and $S$ is the number of SSVEP targets (or classes). Each $X_{s,m}$ is a matrix of size $N_c \times N_t$, where $N_c$ is the number of channels and $N_t$ the number of samples. Additionally, we assume that the multi-channel EEG signals are centered since, in practice, the EEG trials are bandpass filtered or detrended.
2.2. Canonical Correlation Analysis (CCA)
Spatial filtering attempts to maximize the SNR between the raw EEG data and their spatially filtered version. In typical cases, such as bipolar combination or Laplacian filtering, the spatial filters are determined manually. However, this approach does not take into account any prior knowledge about SSVEPs or any subject-specific information. One of the first approaches that takes into consideration the structure of SSVEPs was based on Canonical Correlation Analysis (CCA) [20]. CCA is a multivariate statistical method that attempts to discover underlying correlations between two sets of data [20,38]. These two sets of data are assumed to be different views (or representations) of the same original (hidden) data. More specifically, CCA finds a linear projection for each set such that the two sets are maximally correlated in the hidden (dimensionality-reduced) space.
In the SSVEP problem, these two views are the test EEG trial $X \in \mathbb{R}^{N_c \times N_t}$ and the reference templates $Y_s$ for the $s$-th stimulus, where

$$Y_s = \begin{bmatrix} \sin(2\pi f_s t) \\ \cos(2\pi f_s t) \\ \vdots \\ \sin(2\pi N_h f_s t) \\ \cos(2\pi N_h f_s t) \end{bmatrix} \in \mathbb{R}^{2N_h \times N_t}, \quad t = \frac{1}{F_s}, \frac{2}{F_s}, \dots, \frac{N_t}{F_s},$$

$f_s$ is the frequency of the $s$-th stimulus, $N_h$ is the number of harmonics, and $F_s$ is the sampling frequency. Typically, CCA methods maximize the linear correlation between the projections $\mathbf{x} = X^\top \mathbf{w}_x$ and $\mathbf{y}_s = Y_s^\top \mathbf{w}_y$, where $\mathbf{w}_x \in \mathbb{R}^{N_c}$ and $\mathbf{w}_y \in \mathbb{R}^{2N_h}$. At the end, we solve the following optimization problem:

$$\rho_s = \max_{\mathbf{w}_x, \mathbf{w}_y} \frac{\mathbf{w}_x^\top X Y_s^\top \mathbf{w}_y}{\sqrt{(\mathbf{w}_x^\top X X^\top \mathbf{w}_x)\,(\mathbf{w}_y^\top Y_s Y_s^\top \mathbf{w}_y)}}.$$

Since $\rho_s$ is invariant to the scaling of $\mathbf{w}_x$ and $\mathbf{w}_y$, the above optimization problem can also be formulated as the following generalized eigenvalue problem:

$$X Y_s^\top (Y_s Y_s^\top)^{-1} Y_s X^\top \mathbf{w}_x = \lambda\, X X^\top \mathbf{w}_x,$$

where $\lambda = \rho_s^2$ is the eigenvalue corresponding to the eigenvector $\mathbf{w}_x$. In order to find the stimulus of the test EEG trial, $X$, that the subject desires to select, we compute the features $\rho_s$ for all available stimuli, and then the target stimulus, $c$, is identified by finding the index of the maximum feature among the $S$ features: $c = \arg\max_{s} \rho_s$, $s = 1, \dots, S$. It must be observed here that there is no need for training (or calibration) since the templates $Y_s$ are artificially generated.
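To make the CCA detection pipeline concrete, the following sketch solves the generalized eigenvalue problem above with NumPy/SciPy and scores every stimulus frequency. It is a minimal illustration, not the authors' implementation; the function names, the default number of harmonics, and the small regularization term added for numerical stability are our own choices.

```python
import numpy as np
from scipy.linalg import eigh

def reference_templates(f_s, n_samples, fs, n_harmonics=3):
    """Sin/cos reference matrix Y_s (2*N_h x N_t) for stimulus frequency f_s."""
    t = np.arange(1, n_samples + 1) / fs
    rows = []
    for h in range(1, n_harmonics + 1):
        rows.append(np.sin(2 * np.pi * h * f_s * t))
        rows.append(np.cos(2 * np.pi * h * f_s * t))
    return np.vstack(rows)

def cca_score(X, Y, reg=1e-9):
    """Largest squared canonical correlation between trial X (N_c x N_t)
    and templates Y (2*N_h x N_t), via the generalized eigenvalue problem."""
    Cxx = X @ X.T + reg * np.eye(X.shape[0])
    Cyy = Y @ Y.T + reg * np.eye(Y.shape[0])
    Cxy = X @ Y.T
    # X Y^T (Y Y^T)^{-1} Y X^T w = lambda X X^T w, with lambda = rho^2
    M = Cxy @ np.linalg.solve(Cyy, Cxy.T)
    return eigh(M, Cxx, eigvals_only=True)[-1]

def detect_stimulus(X, freqs, fs, n_harmonics=3):
    """Index of the stimulus whose templates best match the test trial X."""
    scores = [cca_score(X, reference_templates(f, X.shape[1], fs, n_harmonics))
              for f in freqs]
    return int(np.argmax(scores))
```

Since $\lambda = \rho_s^2$ and squaring is monotone for $\rho_s \geq 0$, ranking the stimuli by the eigenvalue is equivalent to ranking them by the canonical correlation itself.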
2.4. Adaptive Task-Related Component Analysis (adTRCA)
In our work, we propose a new generalized eigenvalue problem for SSVEP detection, which is described by the following equation:

$$\bar{X}_s C \bar{X}_s^\top \mathbf{v}_s = \lambda\, \bar{X}_s D \bar{X}_s^\top \mathbf{v}_s, \tag{5}$$

where $\bar{X}_s = [X_{s,1}, X_{s,2}, \dots, X_{s,M}] \in \mathbb{R}^{N_c \times M N_t}$ is the concatenation of all trials of the $s$-th stimulus, and $C$ and $D$ are "filtering" matrices that act on the time dimension of the trials. The matrices $C$ and $D$ can be defined using various approaches, and their goal is to remove noise in the time domain.
In our study, we make some critical assumptions about the generation model of SSVEP responses, which affect the data analysis procedure. More specifically, SSVEP responses contain strong sinusoidal components [20]; hence, the SSVEP signal in each channel is modeled as a linear combination of the sinusoids described by the following matrix:

$$\Phi = \begin{bmatrix} \sin(2\pi f_s t) & \cos(2\pi f_s t) & \cdots & \sin(2\pi N_h f_s t) & \cos(2\pi N_h f_s t) \end{bmatrix} \in \mathbb{R}^{N_t \times 2N_h},$$

where each column is sampled at $t = \frac{1}{F_s}, \frac{2}{F_s}, \dots, \frac{N_t}{F_s}$. Additionally, SSVEP responses belonging to the same visual stimulus share common components. From the above, we can observe that the generation of SSVEP responses can be modeled as multiple regression tasks that share common information.
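For concreteness, one way to assemble this sinusoidal design matrix under the assumptions above ($N_h$ harmonics, sampling frequency $F_s$) is sketched below; the function name is ours.

```python
import numpy as np

def design_matrix(f_s, n_samples, fs, n_harmonics=3):
    """Sinusoidal design matrix Phi (N_t x 2*N_h):
    one sin/cos column pair per harmonic of the stimulus frequency f_s."""
    t = np.arange(1, n_samples + 1) / fs
    cols = []
    for h in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * h * f_s * t))
        cols.append(np.cos(2 * np.pi * h * f_s * t))
    return np.column_stack(cols)
```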
EEG trials from the $s$-th stimulus are collected in the matrix $\tilde{X}_s = [X_{s,1}^\top, X_{s,2}^\top, \dots, X_{s,M}^\top] \in \mathbb{R}^{N_t \times M N_c}$, where each column of $\tilde{X}_s$ contains the data from one channel of one trial; in other words, each column of $\tilde{X}_s$ contains the data of one task. Hence, we have $K = M \cdot N_c$ tasks (the $i$-th task corresponding to the $i$-th column of $\tilde{X}_s$, denoted $\mathbf{y}_i$). Each learning task can be described by the following linear regression model:

$$\mathbf{y}_i = \Phi \mathbf{w}_i + \mathbf{e}_i, \quad i = 1, \dots, K, \tag{6}$$

where $\mathbf{w}_i$ is a vector of weights (or parameters), and $\mathbf{e}_i$ is a vector of noise drawn from a zero-mean Gaussian random variable with unknown precision (inverse variance) $\beta$. We can observe that each of these mappings yields a corresponding regression task, and performing multiple such learning tasks simultaneously is referred to as multi-task learning [39], which aims at sharing information effectively among multiple related tasks. In a more abstract view of our problem, we can see that each learning task is a linear regression problem, and sinusoidal components from one regression task affect the fitting procedure of another regression task.
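The rearrangement of trials into one regression task per channel per trial takes a couple of lines; a sketch, assuming the trials are stored as a list of $N_c \times N_t$ arrays:

```python
import numpy as np

def task_matrix(trials):
    """Stack M trials (each N_c x N_t) into the task matrix (N_t x M*N_c).
    Column i holds one channel of one trial, i.e., one regression task y_i."""
    return np.hstack([Xm.T for Xm in trials])
```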
The likelihood function for the parameters $\mathbf{w}_i$ and $\beta$ is given by:

$$p(\mathbf{y}_i \mid \mathbf{w}_i, \beta) = \left(\frac{\beta}{2\pi}\right)^{N_t/2} \exp\left\{ -\frac{\beta}{2}\,\|\mathbf{y}_i - \Phi \mathbf{w}_i\|^2 \right\}.$$

The parameters of a regression task, $\mathbf{w}_i$, are assumed to be drawn from a product of zero-mean Gaussian distributions that are shared by all tasks. Letting $w_{ij}$ be the $j$-th parameter of the $i$-th task, we have:

$$p(\mathbf{w}_i \mid \boldsymbol{\alpha}) = \prod_{j=1}^{2N_h} \mathcal{N}(w_{ij} \mid 0, \alpha_j^{-1}),$$

where the hyperparameters $\boldsymbol{\alpha} = [\alpha_1, \dots, \alpha_{2N_h}]^\top$ are shared among the $K$ regression tasks; hence, data from all regression tasks contribute to learning these hyperparameters. To promote sparsity over the parameters, we place Gamma priors over the hyperparameters $\alpha_j$ [39,40]. In addition, the same type of prior is placed over the noise precision $\beta$:

$$p(\boldsymbol{\alpha}) = \prod_{j=1}^{2N_h} \mathrm{Gamma}(\alpha_j \mid a, b), \qquad p(\beta) = \mathrm{Gamma}(\beta \mid a_\beta, b_\beta).$$
In addition, we can observe here that the noise properties are shared among the different tasks (i.e., the noise vectors in Equation (6) are drawn from the same Gaussian distribution). Finally, it must be noted that we have a hierarchical model, and such models are naturally handled within the Bayesian framework.
Given the hyperparameters $\boldsymbol{\alpha}$ and the noise precision $\beta$, we can apply Bayes' theorem to find the posterior distribution over $\mathbf{w}_i$, which is a Gaussian distribution:

$$p(\mathbf{w}_i \mid \mathbf{y}_i, \boldsymbol{\alpha}, \beta) = \mathcal{N}(\mathbf{w}_i \mid \boldsymbol{\mu}_i, \Sigma),$$

where

$$\boldsymbol{\mu}_i = \beta\, \Sigma\, \Phi^\top \mathbf{y}_i \tag{12}$$

and

$$\Sigma = \left(\beta\, \Phi^\top \Phi + A\right)^{-1}, \quad A = \mathrm{diag}(\alpha_1, \dots, \alpha_{2N_h}). \tag{13}$$
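In code, Equations (12) and (13) amount to one shared covariance and a matrix of per-task means; a sketch under the notation above (the covariance is identical for all tasks because $\Phi$, $\boldsymbol{\alpha}$, and $\beta$ are shared):

```python
import numpy as np

def posterior_stats(Y, Phi, alpha, beta):
    """Posterior statistics of Eqs. (12)-(13).
    Y: N_t x K task matrix, Phi: N_t x 2*N_h design matrix,
    alpha: (2*N_h,) prior precisions, beta: scalar noise precision."""
    A = np.diag(alpha)
    Sigma = np.linalg.inv(beta * Phi.T @ Phi + A)  # shared across all K tasks
    Mu = beta * Sigma @ Phi.T @ Y                  # column i is mu_i
    return Mu, Sigma
```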
In order to find the hyperparameters $\boldsymbol{\alpha}$ and promote sparsity in the parameters, the type-II Maximum Likelihood procedure is adopted [40,41], where the objective is to maximize the marginal likelihood (or its logarithm). In addition, a similar procedure is followed for the noise precision. The marginal likelihood $p(\mathbf{y}_i \mid \boldsymbol{\alpha}, \beta)$ is given by:

$$p(\mathbf{y}_i \mid \boldsymbol{\alpha}, \beta) = \int p(\mathbf{y}_i \mid \mathbf{w}_i, \beta)\, p(\mathbf{w}_i \mid \boldsymbol{\alpha})\, d\mathbf{w}_i = \mathcal{N}(\mathbf{y}_i \mid \mathbf{0}, C_y),$$

where $C_y = \beta^{-1} I + \Phi A^{-1} \Phi^\top$. Differentiating $\sum_{i=1}^{K} \log p(\mathbf{y}_i \mid \boldsymbol{\alpha}, \beta)$ with respect to $\alpha_j$ and $\beta$ and setting the results to zero [39,40,41] (after some algebraic manipulations), we obtain:

$$\alpha_j = \frac{K}{\sum_{i=1}^{K} \left( \mu_{ij}^2 + \Sigma_{jj} \right)}, \tag{15}$$

$$\beta = \frac{K N_t}{\sum_{i=1}^{K} \left( \|\mathbf{y}_i - \Phi \boldsymbol{\mu}_i\|^2 + \mathrm{tr}\!\left(\Phi \Sigma \Phi^\top\right) \right)}, \tag{16}$$

where $\mu_{ij}$ is the $j$-th element of $\boldsymbol{\mu}_i$ and $\Sigma_{jj}$ is the $j$-th diagonal element of the covariance matrix $\Sigma$.
The above analysis suggests an iterative algorithm that alternates between Equations (12), (13), (15), and (16) until a convergence criterion is satisfied. In addition, the same algorithm can be derived by adopting the EM framework and treating the parameters $\mathbf{w}_i$ as hidden variables [40]. Finally, based on the above Bayesian formulation, we can derive a fast version of the above algorithm. The fast version provides an elegant treatment of the feature vectors by adaptively constructing the matrix $\Phi$ through three basic operators: addition, deletion, and re-estimation. More information on this subject can be found in [39,40].
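The full iteration can be sketched as follows; this is the plain (non-fast) variant, reusing posterior_stats from above, with an initialization and stopping rule of our own choosing:

```python
import numpy as np

def fit_multitask(Y, Phi, n_iter=200, tol=1e-6):
    """Alternate Eqs. (12)-(13) with the updates (15)-(16) until alpha stabilizes.
    Returns posterior means Mu, shared covariance Sigma, and noise precision beta."""
    n_t, K = Y.shape
    alpha = np.ones(Phi.shape[1])
    beta = 1.0 / np.var(Y)
    for _ in range(n_iter):
        Mu, Sigma = posterior_stats(Y, Phi, alpha, beta)
        # Eq. (15): each alpha_j pools evidence from all K tasks
        alpha_new = K / (np.sum(Mu**2, axis=1) + K * np.diag(Sigma))
        # Eq. (16): shared noise precision
        resid = np.sum((Y - Phi @ Mu) ** 2)
        beta = K * n_t / (resid + K * np.trace(Phi @ Sigma @ Phi.T))
        converged = np.max(np.abs(alpha_new - alpha)) < tol * np.max(alpha)
        alpha = alpha_new
        if converged:
            break
    Mu, Sigma = posterior_stats(Y, Phi, alpha, beta)
    return Mu, Sigma, beta
```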
Now, the SSVEP components in each task can be represented as:

$$\hat{\mathbf{y}}_i = \Phi \boldsymbol{\mu}_i = \beta\, \Phi \Sigma \Phi^\top \mathbf{y}_i = P\, \mathbf{y}_i,$$

where $P = \beta\, \Phi \Sigma \Phi^\top$ is a filtering matrix acting on the time dimension. Rearranging the filtered EEG signals, $\hat{\mathbf{y}}_i$, each filtered EEG trial is represented as $\hat{X}_{s,m} = X_{s,m} P^\top$. Using the filtered trials, we find the spatial filters $\mathbf{v}_s$ by solving the following generalized eigenvalue problem:

$$\left( \sum_{m=1}^{M} \sum_{m'=1}^{M} \hat{X}_{s,m} \hat{X}_{s,m'}^\top \right) \mathbf{v}_s = \lambda \left( \sum_{m=1}^{M} \hat{X}_{s,m} \hat{X}_{s,m}^\top \right) \mathbf{v}_s, \tag{17}$$

where $\hat{X}_{s,m} = X_{s,m} P^\top$, and $\bar{X}_s = [X_{s,1}, \dots, X_{s,M}]$ is a concatenated matrix containing all trials of the $s$-th stimulus. The above generalized eigenvalue problem can be connected to that of Equation (5). After some algebraic manipulations, Equation (17) can be written as:

$$\bar{X}_s C \bar{X}_s^\top \mathbf{v}_s = \lambda\, \bar{X}_s D \bar{X}_s^\top \mathbf{v}_s,$$

where

$$C = \left(\mathbf{1}_M \mathbf{1}_M^\top\right) \otimes \left(P^\top P\right) \quad \text{and} \quad D = I_M \otimes \left(P^\top P\right).$$
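Putting the pieces together for one stimulus, the training step can be sketched as below, reusing task_matrix and fit_multitask from the earlier sketches; the regularizer added to the right-hand matrix is ours:

```python
import numpy as np
from scipy.linalg import eigh

def adtrca_filter(trials, Phi):
    """Train the adTRCA spatial filter for one stimulus: fit the shared
    Bayesian model, filter every trial in time with P = beta*Phi@Sigma@Phi.T,
    then solve the generalized eigenvalue problem of Eq. (17)."""
    Y = task_matrix(trials)                     # N_t x (M*N_c)
    Mu, Sigma, beta = fit_multitask(Y, Phi)
    P = beta * Phi @ Sigma @ Phi.T              # N_t x N_t temporal filter
    filtered = [Xm @ P.T for Xm in trials]      # filter acts on the time axis
    Xsum = sum(filtered)                        # sum over trials
    S = Xsum @ Xsum.T                           # equals the double sum in Eq. (17)
    Q = sum(Xa @ Xa.T for Xa in filtered)       # same-trial terms
    reg = 1e-9 * np.trace(Q) * np.eye(Q.shape[0])
    vals, vecs = eigh(S, Q + reg)
    return vecs[:, -1]                          # top generalized eigenvector
```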
We can observe an interesting connection between the proposed method and TRCA. When $P = I_{N_t}$, where $I_{N_t}$ is the identity matrix, the proposed approach degrades to the TRCA method; that is, the TRCA method is a limiting case of the proposed method. In addition, we can observe that the matrices $C$ and $D$ act on the time dimension of the EEG trials; hence, the time samples are weighted unequally, according to their temporal structure, rather than being treated uniformly. Furthermore, the filters, represented by the matrix $C$, are adapted to the statistical properties of the EEG trials. A more detailed comparison of adTRCA with TRCA and CORCA is presented in Table 1. Finally, after finding the spatial filters, to find the target of the test trial, $X$, we apply the following discriminant function:

$$c = \arg\max_{s} \rho_s, \quad \rho_s = \mathrm{corr}\!\left( \mathbf{v}_s^\top X,\; \mathbf{v}_s^\top \bar{X}_s^{(\mathrm{avg})} \right), \quad s = 1, \dots, S,$$

where $\bar{X}_s^{(\mathrm{avg})} = \frac{1}{M} \sum_{m=1}^{M} X_{s,m}$ is the trial-averaged template of the $s$-th stimulus and $\mathrm{corr}(\cdot,\cdot)$ denotes the Pearson correlation coefficient.
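A sketch of the corresponding test-time rule, assuming the spatial filters $\mathbf{v}_s$ and the trial-averaged templates have been precomputed per stimulus:

```python
import numpy as np

def classify(X, spatial_filters, templates):
    """Pick the stimulus whose spatially filtered template correlates best
    with the spatially filtered test trial X (N_c x N_t).
    spatial_filters[s]: adTRCA filter v_s; templates[s]: mean of the s-th
    stimulus' training trials."""
    scores = []
    for v, T in zip(spatial_filters, templates):
        a, b = v @ X, v @ T                      # project onto the filter
        scores.append(np.corrcoef(a, b)[0, 1])   # Pearson correlation feature
    return int(np.argmax(scores))
```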