MBPPE: A Modular Batch Processing Platform for Electroencephalography

Qiu, Jinggong; Chen, Ming; Feng, Guofu

doi:10.3390/app14020770


by Jinggong Qiu, Ming Chen * and Guofu Feng

College of Information, Shanghai Ocean University, No. 999 Huchenghuan Road, Shanghai 201306, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(2), 770; https://doi.org/10.3390/app14020770

Submission received: 29 December 2023 / Revised: 9 January 2024 / Accepted: 13 January 2024 / Published: 16 January 2024


Abstract

To ensure the accuracy and reliability of subsequent analysis, research on electroencephalogram (EEG) signals typically requires preliminary processing of large datasets to eliminate noise and artifacts. Traditional batch processing methods require substantial hardware resources while lacking flexible automated workflows and user-friendly interactions. To address these challenges, we have implemented a modular batch processing platform for EEG (MBPPE) that offers both local execution and private deployment options to meet the demands of efficient signal processing from individuals to laboratories. We modularize the processing methods and organize them into pluggable multi-task batch processes, providing asynchronous processing solutions. In addition, we extend user functions by introducing plugins and promoting collaborative interaction through data sharing, access control, and comment communication. Simultaneously, interactive features are integrated into the visualization design, enabling users to process and analyze data more intuitively and naturally. Currently, the platform integrates several commonly used data preprocessing and analysis techniques, providing a novel solution for batch processing of EEG signals.

Keywords: EEG; signal processing; modularization; batch processing; collaboration

1. Introduction

In biomedical research on the brain’s functional mechanisms, electrical activity plays a crucial role in understanding the brain’s intricate processes [1,2]. Notably, the brain–computer interface (BCI) builds a new communication and control channel from EEG signals generated by neural activity, one that does not depend on traditional peripheral nerve or muscle pathways [3]. Because EEG signals are objective and not subject to an individual’s subjective control, they support high-quality, reliable information transmission. Moreover, owing to their high temporal resolution, these signals are an effective means of measuring neuronal activity.

Due to their inherent temporal variability, non-linearity, and non-stationarity, EEG signals manifest as complex physiological electrical signals. During data acquisition, EEG signals are vulnerable to interference from physiological and non-physiological artifacts, which complicates the signal processing [4]. Non-physiological artifacts, including power line interference and environmental noise, can be mitigated by optimizing experimental conditions and employing filtering techniques. In contrast, physiological artifacts, stemming from bodily activities, are unavoidable during data collection. Their waveform similarities and overlap with EEG data pose significant challenges for removal, making them the primary concern of many EEG artifact elimination algorithms. The typical analysis workflow for EEG signals includes data preprocessing and feature extraction [5]. As a result, researchers focused on algorithm-based EEG signal processing must possess expertise in neuroscience and programming.

Research in fields related to EEG signal processing is intensifying due to escalating demand. In response, development toolkits have been created for various programming languages, such as the Python-based MNE [6]. These toolkits provide systematic data processing and analysis functions, encapsulating complex processing steps behind interfaces. Although these interfaces simplify programming, crafting processing workflows for diverse scenarios remains a challenging task.

To simplify EEG signal processing and analysis, standardized pipelines such as the Harvard automated processing pipeline for electroencephalography (HAPPE) [7] and a dual EEG pipeline for developmental hyperscanning studies (DEEP) [8] have been explored and implemented. These pipelines provide efficient artifact removal for EEG data in specific scenarios to meet subsequent research needs. However, because EEG signals vary widely, no standardized process can cover every scenario. Although EEGLAB [9] can create custom pipelines using plugins, this essentially amounts to executing multiple processing tasks in a script, which limits task monitoring and management, particularly in complex multi-task scenarios. Moreover, these offline processing schemes are constrained by the performance of the user’s computer, so hardware can become the bottleneck when numerous EEG processing tasks must be handled.

On the other hand, WeBrain, a web-based brainformatics platform of the computational ecosystem for EEG big data analysis [10], is a high-performance computing platform inspired by big data and cloud computing technologies. It is designed for large-scale EEG data cloud storage, processing, and analysis, offering an online solution to computational performance issues. The platform divides users into three roles: ordinary users, administrators, and developers. Administrators handle the allocation of computing and storage resources, while developers supply a variety of signal processing solutions. This cloud-dependent strategy effectively reduces the hardware pressure of large-scale data processing. However, the complexity of the platform’s architecture and personnel roles makes private deployment challenging. As a result, users must upload their data to public servers, which poses a risk to data privacy. Moreover, the platform can only execute processing methods one at a time and cannot chain them into a continuous pipeline task. Because the processing methods are provided by developers, individual users have limited ability to extend them.

To address the limitations identified in existing EEG processing platforms, we have restructured the system based on a modular design principle. The system functions are organized into two main modules: an interaction module and a data processing module. This design allows the system to adapt to both offline use and private deployment requirements. In constructing batch processing, each processing method is seen as an independent unit, enabling users to combine them flexibly as per the actual needs, forming task sequences that meet specific scenarios. To enhance scalability, users can upload custom processing methods as plugins for pipeline construction options. Given the multi-user processing requirements of laboratories, we introduced a panel for multi-task monitoring and management. To enhance the efficiency of multi-user collaboration, we incorporated features such as data sharing, permission management, and comments. Furthermore, we improved the visualization mechanism; unlike the traditional static views, MBPPE offers interactive functions. These functions enable users to make fine adjustments to the rendering results, offering a more intuitive and efficient interaction experience.

2. Materials and Methods

2.1. System Architecture Overview

The system architecture adopts a two-tier structure, consisting of a User Layer and a Server Layer, as depicted in Figure 1. The User Layer, developed using Vue.js, focuses primarily on user interaction. In contrast, the Server Layer, built and run with Python, specializes in data processing tasks. The two layers exchange data via the hypertext transfer protocol (HTTP).

In the framework, the Interface Driver converts user operations on the various modules into HTTP requests. The Router Adapter receives these requests and verifies access rights. Once a request passes access control, the Router Adapter dispatches it to the corresponding processing function according to the routing rules. After processing, the Server Layer packages the results into an HTTP response and returns it to the User Layer. Within the system modules, both the Process and Pipeline components use the processing methods of the Process Tools, but they serve different usage scenarios. The Pipeline module establishes an automated workflow by starting asynchronous sub-threads on the server side to complete all preset tasks, which makes it well suited to continuous and batch processing. In contrast, the Process module is tailored to single processing tasks and operates synchronously, waiting for the result of the current task before proceeding. This synchronous operation gives the Process module an advantage for tasks that need immediate feedback.
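The request flow described above (access check first, then dispatch by routing rule) can be sketched in a few lines. This is only an illustration under stated assumptions: the route paths, the handler names `run_process` and `submit_pipeline`, the key set, and the response dictionary are all hypothetical and are not taken from MBPPE.

```python
from typing import Callable, Dict

# Hypothetical handlers standing in for the Process and Pipeline modules.
def run_process(payload: dict) -> dict:
    # Synchronous: the result is returned in the same response.
    return {"mode": "sync", "result": payload.get("data")}

def submit_pipeline(payload: dict) -> dict:
    # Asynchronous: only a task identifier is returned immediately.
    return {"mode": "async", "task_id": 1}

# Assumed routing rules and access keys, for illustration only.
ROUTES: Dict[str, Callable[[dict], dict]] = {
    "/process": run_process,
    "/pipeline": submit_pipeline,
}
VALID_KEYS = {"secret-key"}

def route_request(path: str, key: str, payload: dict) -> dict:
    """Verify access rights, then dispatch the request by routing rule."""
    if key not in VALID_KEYS:
        return {"status": 403, "body": "access denied"}
    handler = ROUTES.get(path)
    if handler is None:
        return {"status": 404, "body": "unknown route"}
    return {"status": 200, "body": handler(payload)}
```

Checking access before looking up the handler means that an unauthorized caller learns nothing about which routes exist.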

MBPPE supports both online and offline operational modes, as shown in Figure 2. This design seeks to combine the efficiency of cloud computing with the convenience of integrated platforms.

The offline operational mode resembles traditional integrated platforms, allowing users to process EEG data directly on their local computers. The hypertext markup language (HTML) pages built by Vue.js are encapsulated within the Electron framework, allowing for cross-platform application support. Upon program startup, the Server Layer initiates locally as a subprocess. Data exchange relies on the local loopback network, managed by the machine’s loopback adapter. Consequently, in most scenarios, the CPU performance primarily dictates data transmission speed, ensuring that transmission performance is not a limiting factor. Local hardware capabilities are crucial for both data processing and user interaction performance in offline mode. Therefore, a certain level of hardware performance is required to process intricate EEG data smoothly.

Conversely, the online operation mode takes inspiration from cloud computing. In this model, high-performance computers manage data processing tasks, leaving the local system dedicated solely to user interaction and data visualization. This mode supports the concurrent processing of tasks by multiple users, requiring only that users establish a connection between their local machine and the remote server. As we utilized HTML for the development of the user layer, servers that have been deployed can be accessed directly through a web browser. In contrast to other distributed systems, MBPPE grants users full control over system resources, eliminating the necessity for role segmentation. This suggests that the online mode may offer appropriate private deployment solutions for EEG data to laboratories and individual users. Compared to publicly available EEG processing platforms, private deployment can provide enhanced data security and minimize the risk of data leakage.

2.2. Data Formatting

Given the diverse data types and structures produced by different acquisition devices, we apply a formatting step to the input data to standardize its structure for subsequent EEG processing and analysis. MBPPE supports direct reading of standard-format mat files and NumPy [11] array files. When a data file is parsed, the time series of each channel is extracted independently and stacked, generating a two-dimensional matrix (channels, samples). In this structure, each row represents an EEG channel, and each column corresponds to a sample point. For instance, a $62 \times 200$ matrix indicates data comprising 62 channels and 200 sample points. During data acquisition, the recording duration may vary across channels, potentially resulting in EEG time series of unequal lengths that cannot form a matrix. To resolve this, we provide two adjustment methods for standardizing the time series length: one trims the data to the shortest sampling duration; the other takes the longest sampling duration as the reference and pads the ends of shorter sequences. For padding, MBPPE offers zero padding, specified-value padding, and mean-value padding. Figure 3 depicts the data formatting process using zero padding. Although this adjustment may alter the original data, it improves the efficiency of subsequent matrix operations.
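As a rough illustration of the trimming and padding options described above, the following NumPy sketch stacks unequal-length channel series into a (channels, samples) matrix. The function name `format_channels` and its parameters are assumptions made for this example and do not reflect MBPPE's actual interface.

```python
import numpy as np

def format_channels(series, mode="pad", fill="zero", value=0.0):
    """Stack per-channel 1-D series into a (channels, samples) matrix.

    mode="trim" cuts every channel to the shortest length;
    mode="pad" extends shorter channels at the end to the longest length.
    fill selects the padding value: "zero", "mean" (per channel) or "value".
    """
    series = [np.asarray(s, dtype=float) for s in series]
    if mode == "trim":
        n = min(len(s) for s in series)
        return np.stack([s[:n] for s in series])

    n = max(len(s) for s in series)
    out = np.empty((len(series), n))
    for row, s in zip(out, series):
        if fill == "zero":
            pad = 0.0
        elif fill == "mean":
            pad = s.mean()
        else:
            pad = value
        row[: len(s)] = s   # original samples
        row[len(s):] = pad  # padding at the end of the shorter sequence
    return out
```

For example, `format_channels([[1, 2, 3], [4, 6]], fill="mean")` pads the second channel with its own mean, 5.0.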

Moreover, all the data are temporarily stored in memory, enabling users to save the data locally in a standardized mat or NumPy array format according to their requirements. This strategy minimizes unnecessary input/output (I/O) overhead, increases system efficiency, and reduces storage requirements.

2.3. Preprocessing

During EEG signal acquisition, various interferences, including environmental noise and physiological artifacts, can potentially affect the collected results. While standardizing the experimental environment can mitigate noise impact, physiological artifacts stemming from normal activities remain unavoidable [12]. Consequently, we have included several commonly used preprocessing methods to minimize the potential influence of these disturbances on the analysis results.

2.3.1. Filter

In the preprocessing of EEG signals, filters serve as an efficient tool for artifact removal, particularly when artifacts and EEG signals reside in distinct frequency bands [13]. MBPPE provides two types of filters: finite impulse response (FIR) filters and infinite impulse response (IIR) filters. The design process for the FIR filter involves the window function method, using the Hanning window by default, while the IIR filter uses a Butterworth filter. Both filters offer users the flexibility to customize parameters like filter frequency and order. They also provide three basic filtering methods: band-pass, low-pass, and high-pass.

In terms of implementation specifics, the FIR filter coefficients are calculated via the firwin method in the SciPy [14] library. Conversely, the Butterworth filter coefficients are generated through the SciPy library’s butter method. Subsequently, the filtfilt method is utilized for bidirectional data filtering based on these filter coefficients. This involves applying a linear digital filter twice, once forward and once backward.
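The following sketch shows how the SciPy calls named above fit together for a band-pass filter. It is a minimal example, not MBPPE's code: the function names, the default tap count, and the default order are assumptions, and the window is passed as "hann", SciPy's name for the Hanning window.

```python
import numpy as np
from scipy.signal import butter, filtfilt, firwin

def fir_bandpass(data, fs, low, high, numtaps=101):
    """Zero-phase FIR band-pass over the last axis of (channels, samples) data."""
    # pass_zero=False with two cutoffs yields a band-pass design.
    taps = firwin(numtaps, [low, high], window="hann", pass_zero=False, fs=fs)
    # filtfilt applies the filter forward and then backward.
    return filtfilt(taps, [1.0], data, axis=-1)

def iir_bandpass(data, fs, low, high, order=4):
    """Zero-phase Butterworth band-pass over the last axis."""
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, data, axis=-1)
```

Note that `filtfilt` pads the signal internally, so the input must be longer than three times the filter length (about 303 samples for the 101-tap FIR design used here).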

FIR filters are inherently stable and do not introduce phase distortion, which effectively preserves the temporal information of events. IIR filters, in contrast, require less computation but do not offer the same level of stability.

2.3.2. Independent Component Analysis

In biomedical engineering, independent component analysis (ICA) is considered a critical technique for blind source separation [15]. Under typical conditions, EEG data collected in research represent a blend of various signals. These signals encompass not only the brain’s activity but also ocular artifacts, cardiac artifacts, noise due to poor electrode contacts, and external electrical interference. Given that artifacts, such as ocular and cardiac components in EEG signals, are produced by independent sources, ICA serves as a powerful tool to model EEG signals and distinguish different components originating from a single signal source. Using ICA to separate the original data enables the removal of artifacts present in the independent components, yielding clean EEG signals [16]. It is noteworthy that this method, when reconstructing EEG signals, not only preserves the temporal resolution but also enhances its spatial characteristics by identifying as many as several dozens of independent EEG signal sources active in different time segments, as well as their scalp projections [17].

MBPPE incorporates the fast independent component analysis (FastICA) module from the sklearn library for ICA. Compared to traditional ICA algorithms, FastICA employs parallel and distributed processing strategies. It analyzes and decomposes data based on maximizing non-Gaussianity, thereby reducing computational complexity and utilizing memory more efficiently [18]. In this implementation, FastICA adopts whitening, which streamlines computational demand and enhances the interpretability of the independent components. Employing the logcosh function as the default loss function confers high robustness, rendering it adept at processing complex data.
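A minimal sketch of ICA-based artifact removal with scikit-learn's FastICA is given below. It assumes that the indices of the artifact components are already known (in practice they are chosen by inspection or by a separate criterion); the function name `remove_components` and its parameters are illustrative only. Whitening is left at the library default, which is enabled.

```python
import numpy as np
from sklearn.decomposition import FastICA

def remove_components(data, reject, n_components=None, seed=0):
    """Decompose (channels, samples) data, zero selected components, rebuild.

    reject: indices of the independent components regarded as artifacts.
    """
    ica = FastICA(n_components=n_components, fun="logcosh",
                  random_state=seed, max_iter=1000)
    # scikit-learn expects (samples, features), so transpose the EEG matrix.
    sources = ica.fit_transform(data.T)
    idx = np.asarray(list(reject), dtype=int)
    sources[:, idx] = 0.0
    # Project the remaining components back to channel space.
    return ica.inverse_transform(sources).T
```

With an empty `reject` list and as many components as channels, the reconstruction returns the input up to numerical precision, which is a convenient sanity check before removing anything.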

2.3.3. Resampling

To ensure high temporal resolution of EEG signals, researchers typically opt for a sampling rate of 1 kHz or higher during the data acquisition phase. However, in subsequent analysis, such a high sampling rate imposes significant demands on storage and memory. For studies less sensitive to temporal resolution, downsampling strategies can effectively conserve computational resources. Moreover, resampling techniques are extensively employed to address inconsistencies in sampling rates among different datasets, thereby preventing potential impacts on the predictive performance of machine learning algorithms [19].

In this study, we implemented polyphase-filter resampling using the resample_poly method from the SciPy library. During execution, the EEG data are upsampled by a factor $up$, a zero-phase low-pass FIR filter is applied to limit additional phase distortion and preserve the key features of the original signal, and the data are then downsampled by a factor $down$. The final sampling rate is $\frac{up}{down}$ times the original sampling rate. The resampling algorithm follows the description provided by Vaidyanathan [20].

By default, we take the user-input sampling rate as the upsample factor and the original sampling rate as the downsample factor. The calculation of the final sampling rate is shown in Equation (1), where $o$ represents the original sampling rate, $up$ represents the upsample factor, $down$ represents the downsample factor, and $n$ represents the target sampling rate.

$$\begin{aligned} \mathrm{resample\_poly}(o, up, down) &= \frac{up}{down} \cdot o \\ &= \frac{n}{o} \cdot o \\ &= n \end{aligned} \tag{1}$$

This method enhances practical applications by permitting users to input the target sampling rate directly to accomplish resampling. Compared to resampling using the fast Fourier transform (FFT), this approach circumvents time–frequency transformation, offering a boost in computational efficiency.
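The default choice of factors can be illustrated with a short wrapper around `scipy.signal.resample_poly`. The wrapper name `resample_to` is an assumption for this example, and integer sampling rates are assumed because the up and down factors must be integers.

```python
import numpy as np
from scipy.signal import resample_poly

def resample_to(data, original_rate, target_rate):
    """Resample (channels, samples) data to target_rate.

    up = target rate and down = original rate, so the output rate is
    (up / down) * original_rate = target_rate, as in Equation (1).
    """
    return resample_poly(data, up=int(target_rate), down=int(original_rate), axis=-1)
```

SciPy reduces `up` and `down` by their greatest common divisor internally, so passing the raw rates (for example 250 and 1000) costs no more than passing the reduced factors 1 and 4.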

2.3.4. Re-Referencing

During EEG data acquisition, the system records the potential difference between each electrode position and a reference point. However, determining the optimal reference point still lacks a consensus [21]. In MBPPE, to establish a consistent reference baseline and reduce bias from individual channels, we enable users to use the average reference (AR) method. This method references all EEG channels to a common average potential [22]. Additionally, to accommodate referencing strategies like the Linked-Mastoids/Linked-Ears method, our platform allows users to select specific channels as the reference.
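Both referencing options reduce to subtracting a reference signal from every channel. The sketch below assumes the (channels, samples) layout used throughout; the function names are illustrative and not part of MBPPE.

```python
import numpy as np

def average_reference(data):
    """Subtract the mean over all channels from each channel (AR)."""
    return data - data.mean(axis=0, keepdims=True)

def channel_reference(data, ref_channels):
    """Subtract the mean of the selected reference channels.

    For a linked-mastoids style reference, ref_channels would hold the
    row indices of the two mastoid electrodes.
    """
    ref = data[list(ref_channels)].mean(axis=0, keepdims=True)
    return data - ref
```

After average referencing, the channel values at every sample point sum to zero, which can serve as a quick check that the step was applied.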

2.4. Feature Extraction

Feature extraction is a pivotal step in data analysis, aiming to distill core information from the raw data to reduce its dimensionality. This process can enhance the model’s performance and interpretability while reducing computational complexity and information processing costs [23].

2.4.1. Power Spectral Density

Power spectral density (PSD) is a spectral analysis method extensively employed in EEG signal processing that depicts the energy distribution across the frequency components of a signal [24]. This analysis allows for the extraction of salient neural activity features from EEG signals, enabling researchers to identify prominent frequency components. This information is instrumental for investigating the relationships between EEG signals and neural activity, thereby improving the identification and classification capabilities of BCI systems.

In MBPPE, the Welch method is employed for PSD computation, as it demonstrates superior performance in PSD estimation [24]. The original signal is divided into N overlapping segments of a specified length, which determines the frequency resolution. To mitigate the impact of abrupt changes at the segment boundaries on the spectral analysis, we apply a window function to each segment, which effectively smooths the signal. By averaging the spectra of these overlapping windows, the Welch method reduces noise and random fluctuations, thus improving the stability of the PSD estimate. Additionally, we apply a logarithmic transformation to the Welch method’s output, which improves data visualization by compressing the signal’s dynamic range.
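A compact version of this procedure can be written with `scipy.signal.welch`. The segment length (one second), the 50% overlap, the Hann window, and the decibel form of the logarithm are assumptions chosen for illustration; the text above does not specify them.

```python
import numpy as np
from scipy.signal import welch

def log_psd(data, fs, nperseg=None):
    """Welch PSD of (channels, samples) data, returned on a log (dB) scale."""
    nperseg = int(fs) if nperseg is None else nperseg  # assumed: 1 s segments
    freqs, psd = welch(data, fs=fs, window="hann", nperseg=nperseg,
                       noverlap=nperseg // 2, axis=-1)
    # A tiny offset avoids log(0) for bins with zero power.
    return freqs, 10.0 * np.log10(psd + np.finfo(float).tiny)
```

With one-second segments the frequency resolution is 1 Hz; longer segments sharpen the resolution but leave fewer segments to average, so the estimate becomes noisier.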

2.4.2. Differential Entropy

Differential entropy (DE) is grounded in the concept of information-theoretic entropy and aims to measure the amount of information within a random variable. Duan et al. [25] demonstrated that in emotion classification tasks based on EEG, DE exhibited higher accuracy compared to traditional energy spectrum features. The DE algorithm adopted by MBPPE is based on related studies by Shi et al. [26]. They found that when EEG signals were band-pass filtered in the range of 2 Hz to 44 Hz with a step size of 2 Hz, and then subjected to the Kolmogorov–Smirnov test, the probability of each sub-band signal conforming to a Gaussian distribution exceeded 90%. In a fixed-length sequence of EEG data, the differential entropy estimate for a specific frequency band i corresponds to the logarithmic energy spectrum of that band, as outlined in Equation (2).

$$\begin{aligned} h_{i}(X) &= \frac{1}{2} \log\left(2 \pi e \sigma_{i}^{2}\right) \\ &= \frac{1}{2} \log\left(N \sigma_{i}^{2}\right) + \frac{1}{2} \log\left(\frac{2 \pi e}{N}\right) \\ &= \frac{1}{2} \log\left(P_{i}\right) + \frac{1}{2} \log\left(\frac{2 \pi e}{N}\right) \end{aligned} \tag{2}$$

Herein, $h_{i}$ and $\sigma_{i}^{2}$ denote the differential entropy and variance of the EEG signal within frequency band $i$, respectively. $P_{i}$ can be viewed as the energy spectrum, equivalent to the product of $\sigma_{i}^{2}$ and a constant $N$ ($N$ is the length of the fixed time window).

The design of MBPPE reflects this relationship, implementing differential entropy calculations via the logarithmic energy spectrum approach. First, we cut a signal segment every second according to the sampling rate using a Hanning window. Then, we perform an FFT for each signal segment, converting the EEG data into a frequency-domain representation. After that, we focus on five frequency ranges: delta (1–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), beta (13–31 Hz), and gamma (31–50 Hz). For these frequency ranges, we use Parseval’s theorem to calculate the energy values of each segment separately, and the differential entropy is the log representation of these energies.

Notably, for input data of size $n \times m$, the output adopts a three-dimensional structure of $n \times (m \div window) \times 5$. Here, $n$ denotes the number of channels, $m$ signifies the length of the time series data, $window$ represents the fixed length into which the EEG data is segmented (by default, equal to the data sampling rate), and 5 corresponds to the five primary frequency bands: delta, theta, alpha, beta, and gamma.
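The steps above (one-second segments, Hanning window, FFT, band energies via Parseval's theorem, logarithm) can be sketched as follows. This is an illustrative reading of Equation (2), not MBPPE's code: the function name, the handling of band edges (lower edge inclusive, upper edge exclusive), and the factor of 2 for the one-sided spectrum are assumptions. The sketch also assumes an integer sampling rate of at least 100 Hz so that every band contains frequency bins.

```python
import numpy as np

# Band edges in Hz as listed in the text: delta, theta, alpha, beta, gamma.
BANDS = [(1, 4), (4, 8), (8, 13), (13, 31), (31, 50)]

def differential_entropy(data, fs, window=None):
    """DE features of shape (channels, samples // window, 5)."""
    window = int(fs) if window is None else int(window)
    n_channels, n_samples = data.shape
    n_segments = n_samples // window
    # Cut non-overlapping segments and apply a Hanning window to each.
    segments = data[:, : n_segments * window].reshape(n_channels, n_segments, window)
    segments = segments * np.hanning(window)
    power = np.abs(np.fft.rfft(segments, axis=-1)) ** 2
    freqs = np.fft.rfftfreq(window, d=1.0 / fs)

    de = np.empty((n_channels, n_segments, len(BANDS)))
    for k, (lo, hi) in enumerate(BANDS):
        mask = (freqs >= lo) & (freqs < hi)
        # Parseval: band energy P_i from the one-sided spectrum (factor 2).
        energy = 2.0 * power[..., mask].sum(axis=-1) / window
        # Equation (2): 0.5 * log(P_i) + 0.5 * log(2 * pi * e / N).
        de[..., k] = 0.5 * np.log(energy) + 0.5 * np.log(2 * np.pi * np.e / window)
    return de
```

For a 4-channel recording of 1000 samples at 200 Hz, the output has shape (4, 5, 5): four channels, five one-second segments, and five bands.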

2.4.3. Wavelet Analysis

In EEG signal analysis, frequency decomposition and time–frequency transformation play pivotal roles. While the conventional Fourier transform provides a frequency-domain representation, its fixed time–frequency resolution limits its analysis capabilities for non-stationary signals [27,28]. In contrast, wavelets offer greater adaptability for time–frequency analysis [29]. MBPPE employs the PyWavelets library in Python for wavelet analysis. We map EEG signals from the time domain to the frequency domain using the wavelet packet transform and perform time–frequency analysis using the continuous wavelet transform.

During frequency feature extraction, we use the Daubechies 4 wavelet as the basis function for wavelet packet decomposition on each channel, from which the bandwidth of each frequency sub-band is calculated. Subsequently, we iterate through the sub-bands and assign each one to the user-defined frequency range it falls in.
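The wavelet packet step can be sketched with PyWavelets as shown below. The decomposition level, the rule of assigning a sub-band by its centre frequency, and both function names are assumptions for illustration; the example handles a single channel and would be applied per channel in practice.

```python
import numpy as np
import pywt

def wavelet_packet_band_energy(signal, fs, level=4, wavelet="db4"):
    """Decompose one channel and return sub-band edges (Hz) and energies."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    # order="freq" returns the leaf nodes sorted by ascending frequency.
    nodes = wp.get_level(level, order="freq")
    band_width = (fs / 2.0) / len(nodes)
    edges = [(i * band_width, (i + 1) * band_width) for i in range(len(nodes))]
    energies = np.array([np.sum(np.square(node.data)) for node in nodes])
    return edges, energies

def group_bands(edges, energies, bands):
    """Sum sub-band energies whose centre lies in each user-defined range."""
    centres = np.array([(lo + hi) / 2.0 for lo, hi in edges])
    return {name: float(energies[(centres >= lo) & (centres < hi)].sum())
            for name, (lo, hi) in bands.items()}
```

At level 4 and a sampling rate of 128 Hz, the 0–64 Hz range splits into 16 sub-bands of 4 Hz each, so a call such as `group_bands(edges, energies, {"alpha": (8, 13)})` collects the 8–12 Hz sub-band.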

The continuous wavelet transform offers a continuous representation of time and frequency, making it especially suitable for the analysis of non-stationary or transient signals. During the transformation phase, we harmonize the time–frequency resolution by modulating the wavelet scale. Subsequently, we extract frequency and frequency energy from the transformed data, establishing a mapping relationship among time, frequency, and energy to enhance visualization. To optimize the visualization impact, we assess the frequency energy within a specific timeframe while extracting time–frequency information. The peak energy in each period provides a reference for stratified coloring in the subsequent visualization phase.

2.5. Plugins

To augment the platform’s extensibility, we offer a user interface for uploading and managing custom methods. These methods, existing as plugins, are essentially Python-based modules that allow for customization in terms of data reading, preprocessing, feature extraction, and visualization. The integration of these plugins enables the platform to accommodate data formats from various devices, catering to diverse user data processing requirements. Upon a plugin upload, the platform dynamically registers it as a system module, facilitating its direct invocation like built-in methods during subsequent processing. This approach not only provides users with more options for custom processing tasks but also optimally utilizes the platform’s multitasking and collaborative features.

Plugins are free from syntax restrictions; their input consists of the data and the platform parameters. This design aims to empower users to customize processing methods according to their needs. As plugins are Python-based, they are highly extensible and can directly use comprehensive third-party libraries, such as MNE. Moreover, through the MATLAB engine for Python, MBPPE can run certain EEGLAB plugins.

The introduction of plugin functionality may lead to issues of compatibility and stability. In terms of compatibility, a plugin is a processing unit that accepts inputs provided by the platform and generates outputs. Thus, when a user’s plugin is used in conjunction with the built-in methods of MBPPE, the plugin must accept the output format of the preceding method as its input, and its results must meet the input format requirements of the subsequent method. In most cases, the input and output of a plugin should be two-dimensional matrices (channels, samples). If the user builds task sequences from custom plugins only, it is crucial to ensure that the plugins run stably during data processing. In practice, plugins may encounter errors during processing, especially in multi-user scenarios. When a plugin raises an error, MBPPE captures it, marks the corresponding processing task as failed, and provides the error information for subsequent troubleshooting. Such errors do not affect other tasks being processed, nor do they lead to a system-wide crash.
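One way to realize dynamic registration and error capture is sketched below with Python's standard `importlib`. The entry-point name `process`, the result dictionary, and both helper names are hypothetical; the text does not state which entry point MBPPE expects from a plugin.

```python
import importlib.util

import numpy as np

def load_plugin(path, name):
    """Import a plugin file as a module at run time.

    The module is assumed (for this sketch) to expose process(data, **params).
    """
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

def run_plugin(module, data, **params):
    """Run a plugin and capture any error instead of letting it propagate."""
    try:
        result = np.asarray(module.process(data, **params))
        # Enforce the (channels, samples) contract between pipeline stages.
        if result.ndim != 2:
            raise ValueError("plugin output must be a (channels, samples) matrix")
        return {"status": "finished", "data": result, "error": None}
    except Exception as exc:
        return {"status": "failed", "data": None, "error": repr(exc)}
```

Validating the output shape inside `run_plugin` surfaces format mismatches at the plugin boundary rather than in whichever method happens to run next.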

2.6. Pipeline

We designed the pipeline as a strategy for automated batch processing tasks that treats the signal processing methods as independent task units, illustrated in Figure 4. During the construction of task sequences, users initially conduct data filtration, which encompasses the selection of files for processing and the exclusion of low-quality channels. Subsequently, when determining the processing strategy, users can form task units by flexibly combining processing methods and customizing processing parameters, thereby adapting to specific processing scenarios. All tasks will be executed asynchronously, allowing users to create multiple task sequences simultaneously. Each task will be allocated an independent thread during its execution to ensure that tasks do not interfere with each other, thereby achieving parallel computation. Furthermore, the platform also provides a monitoring dashboard for users to effectively manage and monitor the status of tasks. Although initially designed to enable server processing for large data volumes, this pipeline also functions effectively in local offline modes.

In the automated processing procedure, each stage utilizes the output of the previous stage as its input. Considering that users typically express more concern for the final processing results than the intermediate data, the pipeline does not preserve this process data. Each task stage’s results get overwritten by subsequent outputs, a mechanism that efficiently conserves storage space and read-write time.
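The task-sequence behaviour described in this section (one thread per task, each stage consuming and overwriting the previous output, status available for monitoring) might look like the sketch below. The class name `PipelineTask` and its attributes are illustrative assumptions rather than MBPPE's implementation.

```python
import threading

import numpy as np

class PipelineTask:
    """Runs a sequence of processing units in its own thread."""

    def __init__(self, data, stages):
        self.data = data
        self.stages = stages      # list of (callable, keyword-arguments) units
        self.status = "pending"
        self.progress = 0         # number of completed stages
        self.error = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.status = "running"
        self._thread.start()

    def wait(self, timeout=None):
        self._thread.join(timeout)

    def _run(self):
        try:
            for func, kwargs in self.stages:
                # Each stage overwrites the previous result; no intermediate
                # data is kept.
                self.data = func(self.data, **kwargs)
                self.progress += 1
            self.status = "finished"
        except Exception as exc:
            # A failing task is marked as failed without affecting others.
            self.status = "failed"
            self.error = repr(exc)
```

A monitoring panel could poll `status` and `progress` on each task object while the threads run, which is why these fields are updated stage by stage instead of only at the end.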

2.7. Collaboration

To facilitate a multi-user environment, MBPPE introduces a collaborative working function, allowing authorized users to share data on the server after privatized deployment. To enhance the monitoring and management efficiency of the multi-task pipeline, we designed a monitoring panel that can display task information and execution progress in real time, and provide annotation functionality for recording and communication.

In terms of access control, owners can control permissions by configuring access keys with different levels: read-write permissions, read-only permissions, and prohibitive access permissions. Users with read-write permissions can use the platform’s functions without restriction, including viewing, downloading, and modifying data; users with read-only permissions can retrieve data but cannot perform operations that modify it, such as submitting processing tasks; users with prohibitive access permissions are not permitted to access any resources.

For more nuanced access control, the owner has the option to uniformly set permissions for all functions or to set them individually based on the route. For instance, if the owner desires that visitors only retrieve analysis results without accessing the original data, this could be accomplished by setting prohibitive access permissions for the original data and read-only permissions for the analysis results.
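The combination of a uniform default with per-route overrides can be modelled in a few lines. In this sketch the class names, the route strings, and the longest-prefix matching rule are assumptions made for illustration; the text only states that permissions can be set uniformly or per route.

```python
from enum import IntEnum

class Permission(IntEnum):
    PROHIBITED = 0
    READ_ONLY = 1
    READ_WRITE = 2

class AccessKey:
    """An access key with a default level and optional per-route overrides."""

    def __init__(self, default, overrides=None):
        self.default = default
        self.overrides = dict(overrides or {})

    def permission_for(self, route):
        # The longest matching route prefix wins; otherwise use the default.
        matches = [prefix for prefix in self.overrides if route.startswith(prefix)]
        if not matches:
            return self.default
        return self.overrides[max(matches, key=len)]

    def allows(self, route, write=False):
        needed = Permission.READ_WRITE if write else Permission.READ_ONLY
        return self.permission_for(route) >= needed
```

The visitor scenario from the text then becomes `AccessKey(Permission.PROHIBITED, {"/results": Permission.READ_ONLY})`: analysis results can be read, while the original data and all write operations are refused.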

These collaborative features establish an effective mode of cooperation. For instance, in remote working environments, researchers can use this mechanism to share and access data without being in the same physical location. In large research projects, such collaborative practices make data sharing and task progression more intuitive, thus advancing cross-disciplinary and cross-laboratory collaboration. At the same time, the access control function safeguards against unauthorized access and out-of-scope operations in multi-person collaborations.

2.8. Interactive Visualization

To address the limited visualization analysis scheme in existing toolkits and integrated EEG processing platforms, we implemented a dynamic rendering approach using a canvas to enhance user interaction. This approach, in contrast to traditional static image methods, enables users to modify the rendering results interactively by manipulating the canvas.

Before rendering, the User Layer must parse and format the received data. Specifically, this operation involves transforming the matrix data obtained from the server into processable multi-dimensional arrays. Subsequently, the data is mapped to suitable variable types based on the descriptive information within the matrix, thus preventing numerical overflow.
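As a rough analogue of this parsing step, the sketch below decodes a flat byte payload into one typed array per channel, choosing the element type from a descriptive label. The User Layer itself runs in the browser, so this Python version is only illustrative; the `dtype` labels, the `shape` argument, and the assumption of native byte order are hypothetical and not taken from MBPPE.

```python
from array import array

# Hypothetical mapping from a dtype label in the descriptive information
# to an `array` type code of matching width.
TYPE_CODES = {"int16": "h", "float32": "f", "float64": "d"}


def decode_matrix(payload: bytes, dtype: str, shape: tuple[int, int]) -> list[array]:
    """Turn a flat byte payload into one typed array per channel (row)."""
    rows, cols = shape
    flat = array(TYPE_CODES[dtype])
    flat.frombytes(payload)  # native byte order is assumed
    if len(flat) != rows * cols:
        raise ValueError("payload size does not match the declared shape")
    # Slice the flat buffer row by row; each slice keeps the declared type.
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]
```

Selecting the element width from the description, rather than guessing it, is what keeps values from being squeezed into a type that is too narrow for them.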

The platform offers a variety of graphical rendering methods, including time-domain plots, spectral plots, frequency plots, and time–frequency plots. Notably, for time-domain and frequency plots, we create independent coordinate systems for each channel. This strategy prevents significant fluctuations in one channel from affecting the rendering of other channels. In traditional linear graph rendering processes, global scaling settings often pose problems: a small scale may lead to frame loss due to dramatic fluctuations in some channel signals, impeding effective data analysis; a large scale might cause subtle fluctuations to appear excessively smoothed, complicating the tracking of detailed changes. To address this issue, we introduced a local scaling approach. Scaling operations use the current canvas window as a reference, and users can dynamically adjust the scale through mouse gestures. In response, the rendered image updates in real time based on user interactions until the visualization meets the user’s preference.
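One possible reading of per-channel coordinate systems with window-based scaling is sketched below. It is an assumption-laden illustration rather than the platform's rendering code: the function names, the `band_height` parameter, and the choice to rescale each channel to the minimum and maximum of the visible window are all hypothetical.

```python
def scale_channel(window: list[float], band_height: float) -> list[float]:
    """Map the samples visible in the current window onto [0, band_height]."""
    lo, hi = min(window), max(window)
    span = hi - lo
    if span == 0:  # flat signal: draw it through the middle of the band
        return [band_height / 2] * len(window)
    return [(v - lo) / span * band_height for v in window]


def layout_channels(channels: list[list[float]], start: int, stop: int,
                    band_height: float = 100.0) -> list[list[float]]:
    """Give every channel its own vertical band, stacked one after another."""
    rows = []
    for index, samples in enumerate(channels):
        # Only the samples inside the current window drive the scale.
        scaled = scale_channel(samples[start:stop], band_height)
        offset = index * band_height
        rows.append([offset + y for y in scaled])
    return rows
```

Because each channel is scaled on its own, a large excursion in one channel leaves the small fluctuations of its neighbours fully visible, and narrowing `start`/`stop` rescales the detail that remains in view.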

For time–frequency plots, layered coloring is essential to differentiate between various energy levels. We also compute the maximum energy value in the current window, establishing it as a baseline. Using this reference point, we set multiple gradients for coloring, where darker shades indicate higher energy and lighter shades represent lower energy. Additionally, users can directly filter data by dragging the color scale.
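The layered coloring and range filtering described above can be approximated as follows. This is a minimal sketch under stated assumptions: the number of levels, the use of `None` to blank out filtered cells, and both function names are illustrative choices, not details from the paper.

```python
from typing import Optional


def energy_levels(window: list[list[float]], n_levels: int = 4) -> list[list[int]]:
    """Quantize energies into levels 0..n_levels-1 relative to the window maximum.

    A higher level index stands for a darker shade (higher energy).
    """
    peak = max(max(row) for row in window)
    if peak <= 0:
        return [[0 for _ in row] for row in window]
    return [
        [min(int(max(v, 0.0) / peak * n_levels), n_levels - 1) for v in row]
        for row in window
    ]


def filter_range(window: list[list[float]], low: float,
                 high: float) -> list[list[Optional[float]]]:
    """Blank out every value outside [low, high], as a dragged color scale might."""
    return [[v if low <= v <= high else None for v in row] for row in window]
```

Using the maximum of the current window as the baseline means the gradient adapts when the user pans or zooms, at the cost of colors not being comparable across windows.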

Moreover, the canvas-based rendering technique allows for the display of values at any sample point, thereby facilitating quantitative analysis. While we prioritize offering an interactive visualization experience, our platform also supports the export of rendered images as static pictures to accommodate diverse use cases.

2.9. Security and Privacy

In the offline mode, the operation process of MBPPE is similar to other integrated processing platforms, and the responsibility for data security maintenance relies on the users themselves. However, in online mode, especially in a multi-user environment, access control should be enabled to prevent any unauthorized users from malicious access or overstepping operations. Moreover, we also recommend adopting the hypertext transfer protocol over a secure socket layer (HTTPS) to ensure the security of data transmission between users and servers, thus preventing data breaches. It is worth noting that as an open-source platform, MBPPE does not have any privacy collection policies and will not collect any data uploaded by users. The ethical issues arising in the process of EEG data processing should be assessed and decided by the data owners themselves.

2.10. Sample Data

In this study, we selected the SEED dataset [25,30] for baseline testing. SEED is a publicly available emotion classification dataset. This dataset uses video clips as induction means to elicit emotional responses from subjects, examining the relationship between EEG signals and emotional responses. During the data collection process, the SEED team employed the ESI NeuroScan system to collect 62-channel EEG data from 15 subjects across three independent sessions, yielding 45 data samples in total. SEED essentially provides two datasets: a preprocessed dataset and a feature-extracted dataset. In the preprocessed dataset, the original data undergoes downsampling, adjusting the frequency to 200 Hz, and is filtered through a band-pass filter of 0–70 Hz. The feature-extracted dataset, on the other hand, is based on the preprocessed dataset, with features extracted using the differential entropy method. Both datasets are stored in the standard MATLAB output mat file format.

3. Results

MBPPE is open-source software; its source code and user manual are freely available on its GitHub page, https://github.com/wavescycle/MBPPE (accessed on 14 January 2024).

3.1. Performance Benchmarking

To optimize the system’s computational efficiency, we adjusted the input data structure to accommodate matrix operations. We selected an EEG data file from the SEED dataset for performance benchmarking; this file is 315 MB and encompasses 62 channels. We established the experimental environment on a computer powered by an AMD Ryzen 7 5800H CPU and supported by 16 GB of memory.

Importantly, during the data reading phase, we employed a truncation method to transform the data into matrix format, resulting in a data size reduction compared to the original file. During the testing process, each processing method was executed multiple times utilizing the original data. We measured the duration of each execution with the compiler, and Table 1 showcases the average execution times. To ensure consistent results and negate any cache-related interference due to repeated executions, the environment was reset after testing each method.
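The averaging procedure can be sketched with a small timing helper. This is an illustrative assumption about how such measurements might be taken, not the authors' benchmarking code; `average_runtime` and its `repeats` parameter are hypothetical, and the sketch does not reproduce the environment reset between methods.

```python
import time
from statistics import mean
from typing import Callable


def average_runtime(method: Callable[[], object], repeats: int = 5) -> float:
    """Run `method` several times and return the mean wall-clock time in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        method()
        durations.append(time.perf_counter() - start)
    return mean(durations)
```

A call such as `average_runtime(lambda: sum(range(1000)), repeats=3)` returns a single averaged figure of the kind reported in Table 1.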

3.2. Validation of Pipeline

To validate the scenario of parallel execution for multiple tasks within the pipeline, we constructed multiple distinct task sequences in offline mode, as depicted in Figure 5. Figure 5a provides an overview of the pipeline, enabling users to create new task sequences or monitor tasks’ real-time execution status. Each task, initially identified by an automatically generated universally unique identifier (UUID), is subsequently associated with the currently processed file name and the method involved. To visually convey task status, we use green for a successfully executed task, black for an ongoing task, and gray for a pending task. Failed tasks are highlighted in red. Figure 5b provides a detailed view of the selected task, including the file list, current execution progress, user-selected channels, and method-specific parameter configurations. Additionally, this interface permits users to manage files and tasks, as well as annotate and communicate about experiments.
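The task bookkeeping described for Figure 5a can be modelled as below. The sketch only mirrors what the text states (a generated UUID, the associated file name and method, and the four status colors); the class and attribute names, and the example file name, are hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """Task states mapped to the display colors given in the text."""
    PENDING = "gray"
    RUNNING = "black"
    SUCCESS = "green"
    FAILED = "red"


@dataclass
class Task:
    file_name: str
    method: str
    status: Status = Status.PENDING
    # Each task starts out identified by an automatically generated UUID.
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    @property
    def label(self) -> str:
        """Human-readable label built from the file and method being processed."""
        return f"{self.file_name} / {self.method}"

    @property
    def color(self) -> str:
        return self.status.value
```

For example, `Task("example.mat", "Filter")` starts out gray and turns red once its status is set to `Status.FAILED`.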

Each sequence processes multiple EEG data files. Specifically, we constructed a practical application case for emotion classification to demonstrate the adaptability of MBPPE in real EEG processing scenarios. Initially, we employed resampling to decrease the sampling rate to 100 Hz, aiming to minimize data volume in the succeeding processing. Subsequently, we preprocessed the data using a band-pass filter (1–40 Hz) to discard potential power line interference and cardiac artifacts [31]. Ultimately, we derived differential entropy features from the preprocessed data as the foundation for emotion classification. The flexibility of MBPPE is embodied in the task sequence creation process, where processing methods can be freely combined. For instance, when differential entropy features prove inapplicable in certain instances, frequency-domain analysis may be selected. Simultaneously, we crafted a task sequence incorporating plugins to assess the pipeline’s compatibility. Furthermore, we constructed two dedicated sequences for re-referencing and time–frequency analysis, respectively, to determine the pipeline’s operational capability and whether it could bypass stages when specific task units are absent during the preprocessing or analysis phase.
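The free combination of methods, and the skipping of a stage that holds no tasks, can be illustrated with a small sketch. Only the two stages named in this paragraph are modelled, and the two methods are placeholders rather than real signal processing; all names here are hypothetical and do not come from the MBPPE code base.

```python
from typing import Callable

Method = Callable[[list[float]], list[float]]

# Stage order is fixed; only the two stages named in the text are modelled.
STAGE_ORDER = ("preprocessing", "analysis")


def run_sequence(data: list[float], stages: dict[str, list[Method]]) -> list[float]:
    """Apply each stage in order; a stage with no tasks is simply skipped."""
    for name in STAGE_ORDER:
        for method in stages.get(name, []):  # tasks run in the order they were added
            data = method(data)
    return data


# Placeholder methods standing in for pluggable processing units.
def remove_mean(x: list[float]) -> list[float]:
    m = sum(x) / len(x)
    return [v - m for v in x]


def squared(x: list[float]) -> list[float]:
    return [v * v for v in x]
```

A sequence such as `{"analysis": [squared]}` runs without any preprocessing task, which corresponds to the dedicated sequences used to check that absent stages are bypassed.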

To attest to the reliability of the pipeline, we applied the wavelet packet transform to the EEG data, transitioning from the time domain to the frequency domain. Figure 6a displays the frequency response chart for a specific channel after this transformation. In another pipeline, the EEG data first undergoes a low-pass filter with a cutoff frequency of 50 Hz, followed by a continuous wavelet transform for time–frequency analysis. The corresponding results are presented in Figure 6b. Due to the application of a 50 Hz low-pass filter, the truncation of frequency energy at 50 Hz is observable.

Meanwhile, following the feature extraction method of the SEED dataset, we used MBPPE to extract differential entropy from the preprocessed dataset. Subsequently, we used support vector classification as the baseline classifier and performed ten-fold cross-validation on both the SEED feature-extracted data and the MBPPE-extracted data. The results are shown in Figure 7. The average accuracy on the SEED feature-extracted data was 82%, while that on the MBPPE-extracted data reached 83%. This slight increase may be attributed to randomness and to minor discrepancies in the implementation details of SEED’s differential entropy extraction (such as programming language and library version); nevertheless, the comparable accuracy indicates that MBPPE is reliable in practical applications.
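For reference, the differential entropy feature is commonly computed with the closed form for a Gaussian-distributed signal, $h = \frac{1}{2}\ln(2\pi e \sigma^2)$, applied to a band-limited segment. The sketch below assumes that form; whether MBPPE or SEED uses exactly this estimator (for example, with respect to band splitting or windowing) is not stated here, and the function name is hypothetical.

```python
import math
from statistics import pvariance


def differential_entropy(segment: list[float]) -> float:
    """Differential entropy under a Gaussian assumption: 0.5 * ln(2*pi*e*variance)."""
    variance = pvariance(segment)  # population variance of the segment
    if variance <= 0:
        raise ValueError("segment must not be constant")
    return 0.5 * math.log(2 * math.pi * math.e * variance)
```

Because the feature depends only on the logarithm of the variance, doubling the signal amplitude shifts it by exactly $\ln 2$, which is one reason small implementation differences tend to produce small changes in downstream accuracy.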

3.3. Presentation of Interactive Visualization

Within our supported interactive visualization graphic system, graphics are primarily categorized into linear and non-linear types. Linear graphs encompass EEG traces and frequency characteristic diagrams, characterized by their local zooming capabilities. They permit users to adjust the zoom scale through mouse gestures and directly drag the timeline to view values for a specific time segment. In contrast, non-linear graphs, such as power spectral density diagrams and time–frequency diagrams, offer users a more flexible custom filtering function. However, whether linear or non-linear, all share the feature of numeric hints, presenting data information more intuitively.

To demonstrate the interactivity of the visualization functions, specific EEG data channels were selected for rendering. The outcomes of this process are presented in Figure 8, which illustrates local zooming. In the canvas, we adjusted the rendering scale using mouse gestures. In the same-sized window, Figure 8a represents 8 s of data, while Figure 8b shows only 4 s. After scaling adjustments, Figure 8a provides a more comprehensive presentation of data, whereas Figure 8b focuses on explaining particular trends in alteration. Additionally, we enabled the tooltip feature in Figure 8b to present the precise value of the current sampling point. Users can also modify the data time segment within the current display window by dragging the time axis.

Figure 9 contrasts the time–frequency diagram of a single channel before and after filtering, aiming to demonstrate the convenience of numeric filtering within non-linear graphs. Figure 9a is an original time–frequency plot rendered continuously, while Figure 9b is obtained by adjusting the energy value range through a color map, filtering out values greater than 1. Unlike the discrete color block legend utilized in Figure 6, we chose a continuous color spectrum in Figure 9 to more clearly represent data variations, aiding users in precisely selecting numeric ranges.

4. Discussion

In this study, we introduce a modular EEG signal batch processing platform. The design goal of this platform is to cater to the diverse needs of EEG signal processing and analysis, to lower the user threshold, and to improve the user interaction experience based on this foundation.

To reduce the complexity of user operations, we provide an interactive interface for data processing and preset default parameters for most processing schemes. In addition, we provide detailed documentation in the open-source community, including usage methods and parameter descriptions, which will help beginners to learn and apply better. For professional users who need to perform more complex processing, MBPPE provides advanced options, allowing users to modify the preset parameters. Furthermore, professional users can also refer to the documentation to write custom plugins and integrate them into the system processing workflow to meet the specific processing scenario requirements.

MBPPE reduces system complexity by dividing the system’s functions into modules, thereby easing the burden on users during maintenance. Furthermore, we introduced a private deployment strategy, allowing users to migrate computationally intensive signal processing tasks to high-performance servers, mitigating the limitations imposed by local hardware. The privatization deployment strategy also provides multi-user collaboration functionality. It offers a more efficient solution for processing large data sets through data sharing, permission management, task monitoring, and collaborative communication. To enhance usability post-deployment, we developed the user layer using HTML. This approach facilitates data processing directly through a browser, negating the need for extra client software. Coupled with a canvas, it delivers a visually rich interactive experience, streamlining image operations and minimizing manual parameter input.

In addition, our modular design for processing methods empowers users to design custom automated batch processing workflows tailored to their needs. The Pipeline module provides users with task maintenance and monitoring capabilities and allows them to create a sequence of tasks that can be executed asynchronously, achieving continuous data processing. The plugin allows users to integrate their custom methods into the system, thereby fully utilizing the advantages of MBPPE while meeting the requirements of personalized processing scenarios.

While our current system has demonstrated certain efficacy, there is still room for improvement. We employed a matrix transformation strategy in the data reading stage to enhance data processing speed. Yet, this transformation alters the original data structure to a certain extent, which may not be suitable for all application scenarios. On the other hand, MBPPE persists the user uploaded data in memory and hands over data management to the user, but it does not limit or warn about memory usage, which may trigger a memory overflow if users continuously upload massive data. Moreover, although we provide plugins to increase the diversity of processing methods, our built-in methods are still insufficient compared to existing development toolkits, particularly in the context of intricate processing tasks. In terms of access control, while we have implemented user access control from the routing level, fine-grained access control, such as access control for specific datasets, still needs to be strengthened. In addition, we utilize temporary storage to resolve configuration and caching issues during platform operation to reduce system complexity and maintenance costs. However, when facing large-scale task demands, adopting a database might be a more optimal choice.

Future enhancements to the platform aim to improve its effectiveness and convenience. We plan to integrate additional processing methods and bolster file reading capabilities to augment the platform’s processing power. Memory monitoring functions are also in the pipeline, which will alert users to prevent memory overflow issues. We also aim to refine permission control to oversee not only modules and functions but also specific datasets. For the management of large datasets, a database will be considered a high-end option to meet the specific needs of users who require long-term data preservation, retrieval, and management, providing them with a more robust and efficient storage solution. These improvements seek to address potential issues in MBPPE. We will also actively engage with feedback from the GitHub open-source community, using it to inform future platform development. This includes optimizing the user interface based on user needs, rectifying functional issues, and introducing new features that users expect. The operational process and user experience of actual users will also be taken into account, aiming to enhance the platform’s efficiency and fluency, and thus optimizing the overall user experience.

5. Conclusions

In summary, MBPPE enhances the flexibility of standardized processes with its plugin function. It effectively meets the processing needs of multi-user scenarios and complex tasks through multi-task management and collaborative functions. Additionally, it integrates cloud computing and batch processing solutions, overcoming hardware resource limitations when handling large-scale data. Therefore, MBPPE provides an effective solution for individual users and laboratories processing large-scale data.

Author Contributions

Conceptualization, M.C. and G.F.; methodology, J.Q.; software, J.Q.; validation, J.Q., M.C. and G.F.; formal analysis, J.Q.; investigation, J.Q.; resources, M.C.; writing—original draft preparation, J.Q.; writing—review and editing, M.C. and G.F.; visualization, J.Q.; supervision, M.C. and G.F.; project administration, M.C. and G.F.; funding acquisition, M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by Guangdong Province Key Field R&D Plan Project grant number No. 2021B0202070001.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://bcmi.sjtu.edu.cn/home/seed/index.html (accessed on 14 January 2024).

Acknowledgments

We appreciate the suggestions made by Zhirong Wang in terms of system improvements.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Lopes da Silva, F. EEG and MEG: Relevance to Neuroscience. Neuron 2013, 80, 1112–1128.
2. Mulert, C.; Lemieux, L. EEG-fMRI: Physiological Basis, Technique, and Applications; Springer Nature: Cham, Switzerland, 2023.
3. Nicolas-Alonso, L.F.; Gomez-Gil, J. Brain Computer Interfaces, a Review. Sensors 2012, 12, 1211–1279.
4. Hartmann, M.M.; Schindler, K.; Gebbink, T.A.; Gritsch, G.; Kluge, T. PureEEG: Automatic EEG Artifact Removal for Epilepsy Monitoring. Neurophysiol. Clin. Neurophysiol. 2014, 44, 479–490.
5. Doma, V.; Pirouz, M. A Comparative Analysis of Machine Learning Methods for Emotion Recognition Using EEG and Peripheral Physiological Signals. J. Big Data 2020, 7, 1–21.
6. Gramfort, A.; Luessi, M.; Larson, E.; Engemann, D.A.; Strohmeier, D.; Brodbeck, C.; Parkkonen, L.; Hämäläinen, M.S. MNE Software for Processing MEG and EEG Data. NeuroImage 2014, 86, 446–460.
7. Gabard-Durnam, L.J.; Mendez Leal, A.S.; Wilkinson, C.L.; Levin, A.R. The Harvard Automated Processing Pipeline for Electroencephalography (HAPPE): Standardized Processing Software for Developmental and High-Artifact Data. Front. Neurosci. 2018, 12, 97.
8. Kayhan, E.; Matthes, D.; Marriott Haresign, I.; Bánki, A.; Michel, C.; Langeloh, M.; Wass, S.; Hoehl, S. DEEP: A Dual EEG Pipeline for Developmental Hyperscanning Studies. Dev. Cogn. Neurosci. 2022, 54, 101104.
9. Delorme, A.; Makeig, S. EEGLAB: An Open Source Toolbox for Analysis of Single-Trial EEG Dynamics Including Independent Component Analysis. J. Neurosci. Methods 2004, 134, 9–21.
10. Dong, L.; Li, J.; Zou, Q.; Zhang, Y.; Zhao, L.; Wen, X.; Gong, J.; Li, F.; Liu, T.; Evans, A.C.; et al. WeBrain: A Web-Based Brainformatics Platform of Computational Ecosystem for EEG Big Data Analysis. NeuroImage 2021, 245, 118713.
11. Harris, C.R.; Millman, K.J.; van der Walt, S.J.; Gommers, R.; Virtanen, P.; Cournapeau, D.; Wieser, E.; Taylor, J.; Berg, S.; Smith, N.J.; et al. Array Programming with NumPy. Nature 2020, 585, 357–362.
12. Urigüen, J.A.; Garcia-Zapirain, B. EEG Artifact Removal—State-of-the-Art and Guidelines. J. Neural Eng. 2015, 12, 031001.
13. Motamedi-Fakhr, S.; Moshrefi-Torbati, M.; Hill, M.; Hill, C.M.; White, P.R. Signal Processing Techniques Applied to Human Sleep EEG Signals—A Review. Biomed. Signal Process. Control 2014, 10, 21–33.
14. Virtanen, P.; Gommers, R.; Oliphant, T.E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; et al. SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python. Nat. Methods 2020, 17, 261–272.
15. Chaddad, A.; Wu, Y.; Kateb, R.; Bouridane, A. Electroencephalography Signal Processing: A Comprehensive Review and Analysis of Methods and Techniques. Sensors 2023, 23, 6434.
16. Gonçales, L.J.; Farias, K.; Kupssinskü, L.; Segalotto, M. The Effects of Applying Filters on EEG Signals for Classifying Developers’ Code Comprehension. J. Appl. Res. Technol. 2021, 19, 584–602.
17. Onton, J.; Westerfield, M.; Townsend, J.; Makeig, S. Imaging Human EEG Dynamics Using Independent Component Analysis. Neurosci. Biobehav. Rev. 2006, 30, 808–822.
18. Hyvärinen, A.; Oja, E. Independent Component Analysis: Algorithms and Applications. Neural Netw. 2000, 13, 411–430.
19. Varotto, G.; Susi, G.; Tassi, L.; Gozzo, F.; Franceschetti, S.; Panzica, F. Comparison of Resampling Techniques for Imbalanced Datasets in Machine Learning: Application to Epileptogenic Zone Localization From Interictal Intracranial EEG Recordings in Patients with Focal Epilepsy. Front. Neuroinformat. 2021, 15, 715421.
20. Vaidyanathan, P.P. Multirate Systems and Filter Banks; Prentice Hall: Hoboken, NJ, USA, 1993.
21. Chella, F.; Pizzella, V.; Zappasodi, F.; Marzetti, L. Impact of the Reference Choice on Scalp EEG Connectivity Estimation. J. Neural Eng. 2016, 13, 036016.
22. Yao, D.; Qin, Y.; Hu, S.; Dong, L.; Bringas Vega, M.L.; Valdés Sosa, P.A. Which Reference Should We Use for EEG and ERP Practice? Brain Topogr. 2019, 32, 530–549.
23. Al-Fahoum, A.S.; Al-Fraihat, A.A. Methods of EEG Signal Features Extraction Using Linear Analysis in Frequency and Time-Frequency Domains. Int. Sch. Res. Not. 2014, 2014, 730218.
24. Ong, Z.Y.; Saidatul, A.; Ibrahim, Z. Power Spectral Density Analysis for Human EEG-Based Biometric Identification. In Proceedings of the 2018 International Conference on Computational Approach in Smart Systems Design and Applications (ICASSDA), Kuching, Malaysia, 15–17 August 2018; pp. 1–6.
25. Duan, R.N.; Zhu, J.Y.; Lu, B.L. Differential Entropy Feature for EEG-Based Emotion Classification. In Proceedings of the 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER), San Diego, CA, USA, 6–8 November 2013; pp. 81–84.
26. Shi, L.C.; Jiao, Y.-Y.; Lu, B.-L. Differential Entropy Feature for EEG-Based Vigilance Estimation. In Proceedings of the 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Osaka, Japan, 3–7 July 2013; pp. 6627–6630.
27. Subasi, A.; Ismail Gursoy, M. EEG Signal Classification Using PCA, ICA, LDA and Support Vector Machines. Expert Syst. Appl. 2010, 37, 8659–8666.
28. Bajaj, N. Wavelets for EEG Analysis. In Wavelet Theory; IntechOpen: London, UK, 2020.
29. Kumar, P.S.; Arumuganathan, R.; Sivakumar, K.; Vimal, C. Removal of Ocular Artifacts in the EEG through Wavelet Transform without Using an EOG Reference Channel. Int. J. Open Probl. Compt. Math 2008, 1, 188–200.
30. Zheng, W.L.; Lu, B.L. Investigating Critical Frequency Bands and Channels for EEG-Based Emotion Recognition with Deep Neural Networks. IEEE Trans. Auton. Ment. Dev. 2015, 7, 162–175.
31. Jiang, X.; Bian, G.B.; Tian, Z. Removal of Artifacts from EEG Signals: A Review. Sensors 2019, 19, 987.

Figure 1. System architecture.

Figure 2. Operational modes: (a) Offline mode; (b) Online mode.

Figure 3. Zero-padding data formatting.

Figure 4. Flowchart of the pipeline, which comprises four stages, each of which can hold multiple tasks. Tasks linked by solid arrows are obligatory, while those linked by dotted arrows are optional. During execution, the stages proceed from left to right. If a stage contains multiple tasks, they are executed in the order in which they were added; if a stage contains no tasks, it is skipped.

Figure 5. Pipeline overview: (a) Comprehensive overview of pipeline tasks; (b) Detailed examination of the selected task.

Figure 6. (a) Visualization of frequency analysis; (b) Visualization of time–frequency analysis.

Figure 7. Result of ten-fold cross-validation.

Figure 8. Local zooming: (a) EEG with default scale; (b) EEG with adjusted scale.

Figure 9. Value filtering: (a) Original time–frequency visualization; (b) Time–frequency visualization excluding values greater than 1.

Table 1. Execution time of methods.

Stage	Method	Time (s)	Details
Preprocessing	Filter	<1	Band-pass (1–40 Hz)
Preprocessing	Independent Component Analysis	≈90	FastICA
Preprocessing	Resampling	<1	Downsampling (100 Hz)
Preprocessing	Rereferencing	<1	Average Reference
Feature Extraction	Power Spectral Density	<3	-
Feature Extraction	Differential Entropy	<1	-
Feature Extraction	Frequency	<5	-
Feature Extraction	Time–Frequency	≈100	-

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
