3.1. Data Transformation
Existing TDA-based approaches for analyzing EEG signals extract topological features from the signals either in the original time domain or in the frequency domain via the Fourier transform. Many methods assume that scalp EEG signals are stationary, but temporal drift and experimental manipulations may introduce non-stationarities; in other words, the frequency and power of the signals change over time. The Fourier transform is limited in revealing the frequency information of such signals because it estimates a constant power for each frequency over the entire time span. To better capture the shifts in frequency and power over time, we leveraged the Hilbert–Huang Transform (HHT), proposed in [18], to reveal the dynamic time–frequency representation of the EEG signals.
Different from the Fourier or wavelet transform, HHT is a data-driven approach. It first decomposes the signals into sub-signals, named Intrinsic Mode Functions (IMFs), via Empirical Mode Decomposition (EMD), and then reveals the time–frequency information of each IMF via the Hilbert Transform (HT). Specifically, we denote the signal from one channel during one trial as $x(t)$. The EMD decomposes the signal into a collection of IMFs, denoted as $c_k(t)$, $k = 1, \dots, K$, via a sifting process [18]. The instantaneous frequency $f_k(t)$ and instantaneous amplitude $a_k(t)$ of each IMF are then revealed by the HT.
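The Hilbert step of HHT can be illustrated with a short Python sketch. It is not the implementation used in this study: the EMD sifting step is omitted (it would normally come from a dedicated package), the analytic signal is built directly with NumPy's FFT, and the function names, the 250 Hz sampling rate, and the 10 Hz test tone are assumptions chosen only for illustration.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real sequence, built by zeroing negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)

def instantaneous_amplitude_frequency(x, fs):
    """Instantaneous amplitude (per sample) and frequency in Hz (per sample step)."""
    z = analytic_signal(x)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    frequency = np.diff(phase) * fs / (2.0 * np.pi)
    return amplitude, frequency

# Hypothetical stand-in for a single IMF: a 10 Hz tone of amplitude 3.
fs = 250.0
t = np.arange(500) / fs
x = 3.0 * np.cos(2.0 * np.pi * 10.0 * t)
amp, freq = instantaneous_amplitude_frequency(x, fs)
```

For a real IMF, the amplitude and frequency arrays vary over time; squaring the amplitude gives the instantaneous power.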
For scalp EEG signals, the dynamic frequency and power revealed by HHT are essential information because they are used to reveal the subtle changes in the underlying dynamic process in the brain. Moreover, due to the sifting process in EMD, the IMFs are in descending frequency bands, with the first IMF carrying the highest frequency sub-signal. Considering the frequency bands of brain oscillations, in this study, we chose the first four IMFs that cover delta, theta, alpha, and beta waves.
We denote the EEG signals during each trial as $X \in \mathbb{R}^{N \times T}$, with each row $x_i$, $i = 1, \dots, N$, being the length-$T$ signal from one of the $N$ channels. HHT was applied to each $x_i$, and the resulting instantaneous frequency and amplitude of the first four IMFs are denoted as $f_{i,k}$ and $a_{i,k}$, respectively, where $k = 1, 2, 3, 4$.
Figure 2 shows an example of the time-varying frequency and power (i.e., the square of the amplitude) of the first four IMFs under two different tasks. The color represents the instantaneous amplitude of the corresponding instantaneous frequency. By organizing the dynamic frequency and power information from each channel in matrix form, we obtained one frequency matrix, denoted as $F_k$, and one power matrix, denoted as $P_k$, for each IMF, resulting in a total of eight matrices. All matrices are $N \times T$, with row $i$ holding the instantaneous frequency $f_{i,k}$ or the instantaneous power $a_{i,k}^2$ of channel $i$. In the following, TDA features are extracted from the eight matrices that represent the dynamic time–frequency information of the signals.
3.2. Persistent Homology and Vietoris–Rips Filtration
An abstract simplicial complex K over a finite set S is a collection of nonempty subsets of S such that $\{s\} \in K$ for every $s \in S$ and, for any two nonempty subsets $\sigma$, $\tau$ of S, $\tau \subseteq \sigma \in K$ implies that $\tau \in K$. Every abstract simplicial complex has a geometric realization, which is a set of points, line segments, triangles, and general n-simplices. An n-simplex (n-dimensional simplex) is the convex hull of $n + 1$ affinely independent points, making it naturally a geometric object. The dimension of a simplicial complex is the maximum of the dimensions of its simplices. For instance, any graph with vertex set G and edge set E is a one-dimensional simplicial complex, with G being the set of points and E being the set of 1-simplices. Consequently, graphs capture the pairwise relations between the points, and simplicial complexes capture the higher-order relationships among the points.
For any metric space X and scale $r \ge 0$, the Vietoris–Rips complex VR$(X; r)$ is the complex whose simplices are the nonempty finite subsets of X of diameter at most $r$. In TDA, the metric space X is the data sampled from some unknown topological space Y, and one would like to use the state space X to uncover the topological features of the unknown space Y. The idea behind TDA/PH is to investigate the ‘shape’ of the complex VR$(X; r)$ using homology as the scale r varies from small to large and to trust the persisting topological features to be representative of the topological properties of Y. This idea is supported by a fundamental result, the nerve theorem [19], which states that the nerve complex of a nice open cover of a topological space Y is homotopy equivalent to Y. This theorem shows that the topological features of a space Y are encoded in finite abstract combinatorial structures built on the space Y.
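The definition of the Vietoris–Rips complex can be made concrete with a brute-force Python sketch. It assumes Euclidean points and the convention that a simplex enters the complex when its diameter is at most the scale; the function name and the three example points are illustrative only, and the enumeration is practical only for very small point clouds.

```python
from itertools import combinations
from math import dist

def vietoris_rips(points, r, max_dim=2):
    """Simplices (as tuples of point indices) whose diameter is at most r."""
    n = len(points)

    def diameter(simplex):
        # Largest pairwise distance; a single vertex has diameter 0.
        return max(
            (dist(points[i], points[j]) for i, j in combinations(simplex, 2)),
            default=0.0,
        )

    return [
        s
        for k in range(1, max_dim + 2)      # k vertices give a (k - 1)-simplex
        for s in combinations(range(n), k)
        if diameter(s) <= r
    ]

# Three points with pairwise distances 3, 4 and 5.
triangle = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
small = vietoris_rips(triangle, r=3.5)   # three vertices and the shortest edge
large = vietoris_rips(triangle, r=5.5)   # all vertices, all edges, the triangle
```

Increasing the scale only ever adds simplices, which is what makes the family of complexes a filtration.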
In algebraic topology, (co)homologies are used to capture the topological properties of simplicial complexes, such as the numbers of components, loops, and voids. In the following, we provide a brief overview of the computation process for homologies with coefficients in $\mathbb{Z}_2$, which is the field with two elements, 0 and 1, where $1 + 1 = 0$.
Let K be a finite simplicial complex. We fix an integer $p \ge 0$. We define $C_p(K)$ to be the vector space over the field $\mathbb{Z}_2$ whose basis is the collection of p-simplices of K. Furthermore, we define a boundary map $\partial_p$ from $C_p(K)$ to $C_{p-1}(K)$ to be the linear extension of the intuitive boundary map on the p-simplices. The kernel of $\partial_p$, denoted by $Z_p(K)$, is the collection of p-cycles, and the image of $\partial_{p+1}$, denoted by $B_p(K)$, is the collection of p-boundaries, i.e., the ‘filled’ p-cycles. Then, the $p$th simplicial homology group with coefficients in $\mathbb{Z}_2$ is defined to be
$$H_p(K) = Z_p(K) / B_p(K).$$
Therefore, the dimension of the vector space $H_p(K)$ gives the count of the ‘unfilled’ p-dimensional ‘holes’ in K, which is also called the pth Betti number. Topologically, a one-dimensional ‘hole’ is a circle and a two-dimensional ‘hole’ is a void. The dimension of the 0th homology group gives the number of connected components in the simplicial complex K.
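As a small worked illustration of this computation, the Python sketch below obtains Betti numbers over the two-element field from the ranks of the boundary maps, using the identity that the pth Betti number equals the number of p-simplices minus the ranks of the pth and (p+1)th boundary maps. The function names and the bitmask encoding of matrix rows are assumptions of this sketch, not part of the original method.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over the two-element field of a matrix whose rows are int bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot                      # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def betti_numbers(simplices, max_dim):
    """Betti numbers 0..max_dim of the complex generated by the given simplices."""
    complex_ = set()
    for s in simplices:                           # close the input under faces
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            complex_.update(combinations(s, k))
    by_dim = {p: sorted(s for s in complex_ if len(s) == p + 1)
              for p in range(max_dim + 2)}

    def boundary_rank(p):
        if p == 0 or not by_dim[p]:
            return 0
        index = {s: i for i, s in enumerate(by_dim[p - 1])}
        rows = []
        for s in by_dim[p]:
            mask = 0
            for face in combinations(s, p):       # the (p - 1)-dimensional faces
                mask |= 1 << index[face]
            rows.append(mask)
        return rank_gf2(rows)

    return [len(by_dim[p]) - boundary_rank(p) - boundary_rank(p + 1)
            for p in range(max_dim + 1)]

hollow = betti_numbers([(0, 1), (1, 2), (0, 2)], max_dim=1)   # triangle boundary
filled = betti_numbers([(0, 1, 2)], max_dim=1)                # solid triangle
```

The hollow triangle has one component and one loop, while filling in the 2-simplex removes the loop.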
To extract a filtration of simplicial complexes from a state space X, the Vietoris–Rips filtration was employed. It starts by expanding each point in the state space to a disk of radius zero. The radii of the disks grow uniformly, and the procedure ends when they reach a predetermined value. The predetermined value in our calculation is the one at which the resulting simplicial complex loses all topological structure, i.e., becomes homotopy equivalent to a singleton. For each radius $r_i$ with $0 \le i \le m$, we obtain a Vietoris–Rips complex VR$(X; r_i)$, denoted by $K_i$. This yields a filtration of nested simplicial complexes,
$$K_0 \subseteq K_1 \subseteq \cdots \subseteq K_m,$$
with $0 = r_0 < r_1 < \cdots < r_m$.
The persistent homology of this filtration of simplicial complexes $\{K_i\}_{i=0}^{m}$ is the collection of homology groups $H_p(K_i)$ connected by the mappings $\varphi_i$, where each $\varphi_i$ is the linear transformation from $H_p(K_i)$ to $H_p(K_{i+1})$ induced by the inclusion map. For each homology dimension $p$, we obtain a finitely generated persistence module $M_p$ over the field $\mathbb{Z}_2$:
$$M_p:\; H_p(K_0) \xrightarrow{\varphi_0} H_p(K_1) \xrightarrow{\varphi_1} \cdots \xrightarrow{\varphi_{m-1}} H_p(K_m).$$
By the structure theorem [20], such a persistence module can be decomposed into a direct sum of interval modules $I_{[b,d)}$ with $b < d$ in the following form:
$$0 \to \cdots \to 0 \to \mathbb{Z}_2 \xrightarrow{\mathrm{id}} \cdots \xrightarrow{\mathrm{id}} \mathbb{Z}_2 \to 0 \to \cdots \to 0,$$
where 0 is the trivial group and id represents the identity map. Each of the interval modules represents a topological structure that persists over the interval $[b, d)$. We denote the collection of such intervals as $\mathcal{B}_p$. So, we obtain the following decomposition of the persistence module:
$$M_p \cong \bigoplus_{[b,d) \in \mathcal{B}_p} I_{[b,d)}.$$
In this decomposition, topological invariants (components, circles, voids, etc.) persist in these Vietoris–Rips complexes on the corresponding intervals. Hence, each $p$-dimensional interval module corresponds to one of these topological invariants. These intervals are called $p$-dimensional Betti intervals, in the form $[b, d)$, which defines the scales at which a $p$-dimensional hole appears in the simplicial complex VR$(X; b)$ and dies in the simplicial complex VR$(X; d)$. These topological features are not observable through an analytic approach with a fixed scale. We then denote the persistent barcode to be the collection of all these intervals. Then, the Betti curve of dimension $p$ is defined to be
$$\beta_p(r) = \sum_{[b,d) \in \mathcal{B}_p} w(b, d)\, \mathbf{1}_{[b,d)}(r),$$
where $\mathcal{B}_p$ is the collection of $p$-dimensional Betti intervals, $w$ is the pre-chosen weight function, and $\mathbf{1}_{[b,d)}$ is the characteristic function, i.e., its value is 1 if $r \in [b, d)$, otherwise 0. If the weight function $w$ is the constant function 1, then the value of the Betti curve at $r$ is the number of Betti intervals containing the scale $r$. It is not hard to see that if the data are sampled from a line segment and the weight function is chosen to be the constant 1 function, then the area of the Betti curve at dimension 0 is exactly half of the length of the line segment. Hence, the areas of the Betti curves are closely related to the topological/geometric properties of the underlying space. In our proposed approach, the constant function 1 is chosen to be the weight function for the Betti curves.
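A Betti curve and its area can be evaluated directly from a barcode, as in the following Python sketch. It assumes finite half-open intervals (birth included, death excluded) and a weight function that takes the birth and death of an interval; the function names and the toy barcode are illustrative.

```python
def betti_curve(intervals, r, weight=lambda b, d: 1.0):
    """Weighted number of intervals [b, d) that contain the scale r."""
    return sum(weight(b, d) for b, d in intervals if b <= r < d)

def betti_curve_area(intervals, weight=lambda b, d: 1.0):
    """Area under the Betti curve: the weighted sum of interval lengths."""
    return sum(weight(b, d) * (d - b) for b, d in intervals)

# Toy barcode with three finite intervals.
barcode = [(0.0, 1.0), (0.0, 2.0), (0.5, 1.5)]
values = [betti_curve(barcode, r) for r in (0.25, 0.75, 1.25, 1.75, 2.0)]
area = betti_curve_area(barcode)
```

With the constant weight 1, the area reduces to the sum of the life spans of the intervals, here 1 + 2 + 1 = 4.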
To illustrate the process, we discuss two simple examples, (A) and (B), in Figure 3. Example (A) contains three data points with only the zero-dimensional persistent homology being nontrivial (as shown in the figure, only red barcodes, $H_0$, appear in the persistent barcode of (A)); Example (B) contains five data points whose filtration has nontrivial zero- and one-dimensional persistent homology (both $H_0$ and $H_1$ appear in the persistent barcode as shown in the figure). A graphical representation of the persistent barcode of each dataset is also included, together with the associated filtration of Vietoris–Rips complexes.
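For readers who wish to reproduce barcodes like these on tiny point clouds, the Python sketch below computes Vietoris–Rips persistence with the standard boundary-matrix reduction over the two-element field. It is a brute-force illustration under the assumption that a simplex enters the filtration at its diameter; it is not the Giotto-tda implementation used in this study, and the unit-square example is not one of the datasets in Figure 3.

```python
from itertools import combinations
from math import dist, inf

def rips_persistence(points, max_dim=1):
    """Barcode {p: [(birth, death), ...]} of the Vietoris-Rips filtration."""
    n = len(points)
    simplices = []
    for k in range(1, max_dim + 3):               # up to dimension max_dim + 1
        for s in combinations(range(n), k):
            diam = max((dist(points[i], points[j])
                        for i, j in combinations(s, 2)), default=0.0)
            simplices.append((diam, k - 1, s))
    simplices.sort()                              # faces precede their cofaces
    index = {s: i for i, (_, _, s) in enumerate(simplices)}

    reduced = {}                                  # lowest row index -> reduced column
    unpaired = set()                              # simplices that create a class
    bars = {p: [] for p in range(max_dim + 1)}
    for j, (death, dim, s) in enumerate(simplices):
        column = {index[f] for f in combinations(s, dim)} if dim > 0 else set()
        while column and max(column) in reduced:
            column ^= reduced[max(column)]        # column addition modulo 2
        if not column:
            unpaired.add(j)
            continue
        i = max(column)                           # simplex i is killed by simplex j
        reduced[i] = column
        unpaired.discard(i)
        birth = simplices[i][0]
        if death > birth:                         # drop zero-length intervals
            bars[dim - 1].append((birth, death))
    for j in sorted(unpaired):
        birth, dim, _ = simplices[j]
        if dim <= max_dim:
            bars[dim].append((birth, inf))        # classes that never die
    return bars

# Four corners of a unit square: one loop, born at 1 and filled at sqrt(2).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
bars = rips_persistence(square)
```

In this example three components merge at scale 1, one component persists forever, and a single one-dimensional interval runs from the side length to the diagonal length.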
Persistent homology possesses a crucial stability property [21]: the persistence barcodes from Vietoris–Rips filtrations remain stable when the data are contaminated. Specifically, the distance between the persistence barcodes (bottleneck distance) obtained by applying persistent homology to two datasets in a given dimension p is controlled by the distance between these two datasets (Gromov–Hausdorff metric). The computation of persistent homology is carried out using a Python 3.12.1 package named Giotto-tda [22], which allows users to perform Vietoris–Rips filtration analysis along with a time-delay embedding of time series.
3.3. TDA Feature Extraction
The existing TDA-based frameworks extract topological features from the signal or transformed signal recorded in each channel separately, without considering the spatial information across channels. To overcome this limitation, we consider the transformed data across different locations as an approximation of a curve in a Banach space and further extract its local topological structures as follows.
Let $M$ denote a compact manifold and $C(M)$ denote the collection of all real-valued continuous functions on $M$, which is a Banach space when it is equipped with the supremum norm. We consider the true brain signals during a time period as a function from a time interval to the Banach space $C(M)$, with $M$ representing the scalp, a compact two-dimensional manifold. The inherent brain signals exhibit continuity both temporally and spatially, meaning that signals are present throughout the scalp at any given moment. Hence, the true brain signals can be considered as a curve in the Banach space $C(M)$. The scalp EEG signals are recorded via electrode channels placed at certain locations on the scalp with a specific sampling frequency. Therefore, the recorded brain signals (after pre-processing) can be considered as an approximation of the true brain signals, effectively capturing both temporal and spatial characteristics. The electrode channels act as an approximation of the entire scalp, and the signals are recorded discretely at specific time points determined by the sampling rate. Thus, the recorded scalp EEG signals can be considered as an approximation of the curve. The same idea can be applied to the instantaneous frequency and instantaneous amplitude (power) matrices $F_k$ and $P_k$, which are the transformed signals obtained in Section 3.1. For illustration purposes, we use $F_k$ as an example to discuss the following TDA feature extraction.
We recall that $F_k$ is a multivariate time series:
$$F_k = \big(f_{i,k}(t)\big)_{1 \le i \le N,\; 1 \le t \le T} \in \mathbb{R}^{N \times T},$$
where T is the length of the time series and N is the number of channels. To obtain local topological properties, we first divide $F_k$ into shorter time segments of length $\Delta$. We denote the number of time intervals as L, i.e., $L\Delta$ is a number equal to or slightly less than T. The data during each time segment, $F_k^{\ell}$ with $1 \le \ell \le L$, are considered as a finite metric space with $\Delta$ points in the Euclidean space $\mathbb{R}^N$.
Now, we consider $F_k^{\ell}$ with a fixed $\ell$ such that $1 \le \ell \le L$. We build the Vietoris–Rips filtration VR$(F_k^{\ell}; r)$ at scales $r \ge 0$, and then compute the homology groups of the filtration objects VR$(F_k^{\ell}; r)$ at scales $r \ge 0$ with the homology dimension p being 0 and 1 using the package Giotto-tda [22]. These groups generate the persistent barcodes of the filtration objects, consisting of intervals in the form $[b, d)$, which represent the birth and death of some topological structure, either a connected component (zero-dimensional homology) or a loop (one-dimensional homology). To extract the TDA features, we calculate the areas $A_p^{\ell}$ of the $p$-dimensional Betti curves of the persistent homology of $F_k^{\ell}$ with $p = 0, 1$. The weight function used in the Betti curves is the constant function 1. Hence, the areas $A_p^{\ell}$ can also be calculated as the sum of all the life spans of each topological structure in the filtration objects, i.e.,
$$A_p^{\ell} = \sum_{[b,d) \in \mathcal{B}_p^{\ell}} (d - b),$$
where $\mathcal{B}_p^{\ell}$ is the collection of intervals in a persistent barcode of the segment $F_k^{\ell}$, which correspond to the existence of $p$-dimensional topological structures during the filtration. A summary of this TDA feature extraction process is given below in Algorithm 1.
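Algorithm 1 TDA feature extraction.
Input: $F_k \in \mathbb{R}^{N \times T}$ and time interval length $\Delta$.
1. Time Segment: Divide $F_k$ into a collection of segments $F_k^{\ell}$, $1 \le \ell \le L$, with the length of the time intervals being $\Delta$, satisfying $L\Delta \le T$.
2. Vietoris–Rips Filtration: Build the Vietoris–Rips complexes VR$(F_k^{\ell}; r)$ as the collection of subsets of $F_k^{\ell}$ whose diameter is less than or equal to some given $r \ge 0$.
3. Persistent Homology: Compute the homology groups of the filtration objects VR$(F_k^{\ell}; r)$ with $r \ge 0$ and homology dimensions being 0 and 1; then, obtain the persistent barcodes consisting of Betti intervals in the form $[b, d)$.
4. TDA Features: Calculate the areas of the $p$-dimensional Betti curves of each persistent homology on the dataset $F_k^{\ell}$ with $p = 0, 1$ as
$$A_p^{\ell} = \sum_{[b,d) \in \mathcal{B}_p^{\ell}} (d - b),$$
where $\mathcal{B}_p^{\ell}$ is the collection of $p$th dimensional Betti intervals in the persistent barcodes of $F_k^{\ell}$ for $1 \le \ell \le L$.
Return: $A_p^{\ell}$, $p = 0, 1$ and $1 \le \ell \le L$.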
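The segmentation and feature steps of Algorithm 1 can be sketched in plain Python as follows. The sketch is deliberately restricted to homology dimension 0, where the sum of the finite life spans equals the total edge length of a minimum spanning tree of the point cloud; dimension 1 would require a persistent homology library such as Giotto-tda. It also assumes that each time point of a segment is treated as one point whose coordinates are the channel values. The function names and the toy matrix are hypothetical.

```python
from math import dist

def h0_betti_area(points):
    """Sum of the finite 0-dimensional life spans (constant weight 1)."""
    n = len(points)
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]         # path halving
            i = parent[i]
        return i

    area = 0.0
    for length, i, j in edges:                    # Kruskal-style merging
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            area += length                        # a component born at 0 dies here
    return area

def segment_features(matrix, delta):
    """One dimension-0 feature per time segment of a channels-by-time matrix."""
    n_times = len(matrix[0])
    n_segments = n_times // delta                 # leftover samples are dropped
    features = []
    for seg in range(n_segments):
        columns = range(seg * delta, (seg + 1) * delta)
        cloud = [tuple(row[t] for row in matrix) for t in columns]
        features.append(h0_betti_area(cloud))
    return features

# Toy input: 2 channels, 6 time points, segments of length 3.
toy = [[0.0, 1.0, 2.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 3.0, 4.0]]
features = segment_features(toy, delta=3)
```

The first segment contributes 1 + 1 = 2 and the second contributes 1 + 3 = 4, matching the sum of life spans of the merging components.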
With the time interval length being $\Delta$, the total number of TDA features $A_p^{\ell}$ extracted from $F_k$ is $2L$. For each of the eight matrices, $F_k$ and $P_k$ with $k = 1, 2, 3, 4$ obtained in Section 3.1, we extract the TDA features with the same process, resulting in a total of $16L$ TDA features for each trial. To better capture the intrinsic local topological properties of the transformed signals, we use three different $\Delta$, 8 ms, 100 ms, and 200 ms, and denote the corresponding numbers of shorter time segments as $L_1$, $L_2$, and $L_3$. Therefore, the total number of TDA features extracted for each trial is $16(L_1 + L_2 + L_3)$.
Besides the proposed method, there are other widely used TDA approaches in the literature, which we also compute and compare against. We briefly introduce them in the following.
In the existing literature, there are two different ways to create the state space for EEG data: spatial embedding and time-delay embedding. For spatial embedding, the signal from each channel is considered as a point in the state space. Since there are only three channels in Data 1, TDA features from the state space obtained by spatial embedding do not provide any valid information about Data 1. Based on Takens' embedding theorem (see [12,13]), time-delay embedding is a very useful way to reconstruct the state space for the single signal from one channel. One can reconstruct the state space of each signal independently with time-delay embedding in the following way. For a time series $x(t)$, an embedding with time delay $\tau$ and embedding dimension d is a mapping $\phi$ from $\mathbb{R}$ to $\mathbb{R}^d$ such that, for each t,
$$\phi(t) = \big(x(t),\, x(t + \tau),\, \dots,\, x(t + (d - 1)\tau)\big).$$
The key to successfully reconstructing the state space is the choice of the parameters: the time delay, the embedding dimension, and the time interval length. Selecting parameters arbitrarily can distort the reconstruction of the state space, obscuring the underlying structure and highlighting noise. The false nearest neighbor (FNN) test, first proposed in [23], has been widely applied in determining the proper embedding dimension of a nonlinear system [24]. The authors in [5] use FNN to determine the best parameters for the TDA approach to reconstruct the state space using channel-wise time-delay embedding for the original EEG data and find that time delay
and embedding dimensions
and 5 with time interval lengths of 100 or 250 have the best performance. Hence, there are four different sets of parameters that perform best. We adopt all of them to extract the corresponding TDA features.
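The channel-wise time-delay embedding described above can be sketched in a few lines of Python. The function name is an assumption, and the delay and dimension in the usage example are arbitrary illustrative values, not the parameters selected in [5].

```python
def time_delay_embedding(x, tau, d):
    """Map a scalar series to points (x[t], x[t + tau], ..., x[t + (d - 1) * tau])."""
    n_points = len(x) - (d - 1) * tau             # indices that stay inside the series
    if n_points <= 0:
        raise ValueError("series too short for this delay and dimension")
    return [tuple(x[t + k * tau] for k in range(d)) for t in range(n_points)]

# Illustrative values only: delay 2 and dimension 3 on a series of length 10.
series = list(range(10))
embedded = time_delay_embedding(series, tau=2, d=3)
```

A series of length T yields T − (d − 1)τ points in d-dimensional space, which can then be cut into time windows and passed to the Vietoris–Rips filtration.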
To illustrate the channel-wise time-delay embedding TDA approach, we use the time delay
, embedding dimension
, and time interval length of 100 as an example. Let
denote the original EEG signal from one channel during one trial. Then, we apply time-delay embedding
with
and
to reconstruct the state space
in
. We notice that, in this case, the reconstructed state space
is a
matrix, i.e.,
Then, we divide the reconstructed state space
into segments
of equal time window 100 with
and
. For each
ℓ with
, we build the Vietoris–Rips filtration, compute their homologies, and obtain the persistent barcodes of dimensions 0 and 1. We group the Betti intervals into
for
and
which contains all the $p$-dimensional Betti intervals in the persistent barcodes of the state space
.
We adopted two different methods (A) and (B) in the literature to extract TDA features from each Betti interval collection of the persistent barcodes, or, equivalently, the Betti curve with the pre-chosen weight function being the constant function 1, with and :
Method (A) extracts TDA features in the same way as Algorithm 1. We calculate the areas of the Betti curves
mentioned above, and equivalently, the sum of the life spans of each p-dimensional topological structure in the Vietoris–Rips filtration of
for
and
. The areas of the 1-dimensional Betti curve for each state space are used in [
7] to detect delirium through BSEEG with different embedding dimensions, time delay, and time windows. Here, we adopted the best parameters according to [
5] in the time-delay embedding process.
Method (B) extracts the p-dimensional Betti number of the Vietoris–Rips complex VR
at a pre-chosen set of scales for homology dimensions
. The scales are chosen as a fraction of a scale R such that the complex VR
is contractible. Following the procedure in [
25], the scale R is determined as follows: we choose an arbitrary point
in
and take R to be the maximum of
, which is also called the radius of the spherical volume of the corresponding state space. Then, we extract the TDA features of
as the values of the Betti curve
on the complex VR
for
at the scales
, i.e., the one-dimensional Betti number of each Vietoris–Rips complex VR
at the corresponding scales [
5].
We suppose that there are many collections of Betti intervals with dimensions 0 and 1. Then, Method (A) extracts number of TDA features, while Method (B) offers as many TDA features. These two methods were applied on the EEG data as well as the transformed data, and compared with our proposed approach.
3.4. Classification
A classifier was constructed for each subject separately, predicting the label (i.e., MI in the BCI or tasks in the lab experiment) of the test trial. For each subject during each trial, TDA features were extracted using the proposed TDA approach. The number of features varied depending on the method used for TDA feature extraction. The classifiers we considered include both linear and nonlinear models: logistic regression with LASSO (Logistic), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM) with linear and RBF kernels, K-Nearest Neighbors (KNN), and Random Forest (RF). For the machine learning models (SVM, KNN, and RF), recursive feature elimination (RFE) was employed to select the most relevant features for the classifiers. For the logistic regression, no extra feature selection was performed because LASSO performs both variable selection and regularization.
The comparison was carried out in two phases: different classifiers were first compared using the same TDA features, and then the best classifier was used to compare various TDA approaches, including the proposed approach and other state-of-the-art TDA approaches in the literature. The performance of the classifiers was assessed by accuracy and Cohen’s kappa values. Specifically, accuracy measures the proportion of correct predictions among the total number of predictions. Kappa is a statistic that measures the agreement between the predicted and the observed classes while accounting for the possibility of the agreement occurring by chance. It is computed as $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the proportion of agreement between the predicted and observed classifications and $p_e$ is the hypothetical probability of agreement by chance. Kappa ranges from −1 to 1, with 1 indicating perfect agreement between predictions and observations. In short, for both accuracy and kappa, a higher value indicates better classification performance.
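Both metrics can be computed from the predicted and observed labels alone, as in the Python sketch below. The chance agreement is estimated from the marginal label frequencies, which is the usual definition of Cohen’s kappa; the function names and the four-trial example are illustrative.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Proportion of predictions that match the observed labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement from the marginal frequencies of each class.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)              # undefined when p_e equals 1

# Four trials, one misclassified.
observed = [0, 0, 1, 1]
predicted = [0, 0, 1, 0]
acc = accuracy(observed, predicted)
kappa = cohen_kappa(observed, predicted)
```

Here the observed agreement is 0.75 and the chance agreement is 0.5, so kappa is (0.75 − 0.5)/(1 − 0.5) = 0.5.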
When assessing the performance of the classifiers, repeated five-fold cross-validation (CV) was used for all datasets. For the BCI data (Data 1 and 2), an additional assessment was conducted on the test data provided by the competition organizers. The classifiers submitted by all teams were evaluated using these test data, with the performance ranked on a leaderboard. Hence, for the BCI data, repeated CV was applied to the training data, and the classifiers re-trained using all training trials were also tested on the test trials. The performance on the test data was compared with that of the top-performing teams on the leaderboard as well as with the results reported in the literature.
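The index bookkeeping behind repeated five-fold CV can be sketched with the Python standard library. The number of repetitions, the random seed, and the absence of class stratification are assumptions of this sketch, since the text does not specify them; the function name is hypothetical.

```python
import random

def repeated_kfold(n_samples, n_splits=5, n_repeats=3, seed=0):
    """Yield (train, test) index lists for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)                        # a fresh partition per repetition
        folds = [order[i::n_splits] for i in range(n_splits)]
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
            yield train, test

# 20 hypothetical trials: 3 repetitions of 5 folds give 15 train/test splits.
splits = list(repeated_kfold(n_samples=20))
```

Each trial appears in exactly one test fold per repetition, and the reported accuracy and kappa would be averaged over all splits.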