Article

Deep Learning Model of Sleep EEG Signal by Using Bidirectional Recurrent Neural Network Encoding and Decoding

Ziyang Fu, Chen Huang, Li Zhang, Shihui Wang and Yan Zhang

1 School of Computer Science and Information Engineering, Hubei University, Wuhan 430062, China
2 Hubei Software Engineering Technology Research Center, Wuhan 430062, China
3 Hubei Engineering Research Center for Smart Government and Artificial Intelligence Application, Wuhan 430062, China
4 Department of Neurology, Hubei Provincial Hospital of Integrated Chinese & Western Medicine, Wuhan 430015, China
* Authors to whom correspondence should be addressed.
Electronics 2022, 11(17), 2644; https://doi.org/10.3390/electronics11172644
Submission received: 22 July 2022 / Revised: 12 August 2022 / Accepted: 16 August 2022 / Published: 24 August 2022
(This article belongs to the Special Issue Advancements in Cross-Disciplinary AI: Theory and Application)

Abstract

Electroencephalogram (EEG) is a signal commonly used for detecting brain activity and diagnosing sleep disorders. Manual sleep stage scoring is a time-consuming task, and extracting information from the EEG signal is difficult because of the non-linear dependencies of time series. To solve the aforementioned problems, in this study, a deep learning model of sleep EEG signal was developed using bidirectional recurrent neural network (BiRNN) encoding and decoding. First, the input signal was denoised using the wavelet threshold method. Next, feature extraction in the time and frequency domains was realized using a convolutional neural network to expand the scope of feature extraction and preserve the original EEG feature information to the maximum extent possible. Finally, the time-series information was mined using the encoding–decoding module of the BiRNN, and the automatic discrimination of the sleep staging of the EEG signal was realized using the SoftMax function. The model was cross-validated using Fpz-Cz single-channel EEG signals from the Sleep-EDF dataset for 19 nights, and the results demonstrated that the proposed model can achieve a high recognition rate and stability.

1. Introduction

Sleep is among the most important physiological activities in humans, and sleep staging is the main means of studying the sleep process. Polysomnography (PSG) is the most commonly used signal acquisition method for sleep staging; PSG data (including electrooculogram (EOG) and electromyogram (EMG) signals) are collected in a sleep laboratory by placing electrodes and sensors on the patient's body [1]. Both single-channel and multichannel electroencephalogram (EEG) signals are used as inputs to sleep staging models. In this study, single-channel EEG signals were used as the input for the sleep staging task.
In the sleep staging task, sleep stages are manually marked in the EEG signal according to the sleep scoring criteria of the American Academy of Sleep Medicine or the Rechtschaffen and Kales (R&K) criteria. Manual labeling of sleep stages is tedious, time-consuming, and subjective. Therefore, deep learning (DL) techniques have been employed for sleep staging to obtain reliable results.
In recent years, DL networks have exhibited good performance in fields such as computer vision, reinforcement learning, and natural language processing, and they have been applied to sleep staging. For example, Sors et al. [2] used a deep convolutional neural network (CNN) for supervised five-class sleep stage prediction, with end-to-end training to guarantee good generalization performance. Zhang et al. [3] proposed a long short-term memory (LSTM)-based end-to-end automatic sleep staging model that greatly reduces data preprocessing and gives the model more freedom to fit the data. Zhao et al. [4] proposed a DL algorithm based on a 1D CNN-LSTM that considers the memory nature of the data (i.e., the influence of past information on future contextual information). However, the following issues remain unresolved in these studies: (1) how to link the EEG information within the neural network bidirectionally with past and future contextual information, and (2) how to capture the non-linear dependencies present in the time series.
To address the aforementioned issues, in this study, we developed a model that can adapt to the signal characteristics of EEG sleep staging and capture the non-linear dependencies in the time series, thereby reducing the overreliance on feature extraction. In addition, we used bidirectional RNN (BiRNN) to enable EEG information to be linked to both past and future contextual information. Furthermore, due to the time-series nature of EEG, we constructed an encoding–decoding module by using BiRNN units to capture the non-linear dependencies present in the time series.
The proposed model has a typical sequence-to-sequence structure, which enables mapping different lengths of time series to each other and capturing the non-linear dependencies present in the time series. In addition, the BiRNN units in the model enable bidirectional linkage of EEG information with the past and future contextual information, in line with the signal characteristics of EEG sleep staging.

2. Methods

2.1. Model Structure

The structure of the BiRNN-based DL model for sleep EEG signals developed in this study is shown in Figure 1. The model consists of three main modules: the data preprocessing module, the CNN feature extraction module, and the encoding–decoding module.
The data preprocessing module is used for denoising, data splitting, and normalization of EEG sleep staging signals. The CNN module is used for time-domain and frequency-domain feature extraction of the time series. The encoding–decoding module is used for learning the features extracted by the CNN module and for automatic feature staging.

2.2. Model Construction

First, the EEG sleep staging signal was preprocessed. Next, the main features of the EEG signal were extracted using the CNN’s time-domain and frequency-domain feature extraction algorithm. Finally, the extracted features were learned using an encoding–decoding module based on BiRNN units, and a SoftMax layer was added at the end of the encoding–decoding module (Figure 2) to predict the EEG sleep staging results.
(1) Data preprocessing: First, the EEG data were cleaned. Next, the input signal was denoised using a wavelet-based threshold denoising method. Finally, the data were split to obtain the sequence $X_T$.
(2) CNN network: A CNN was constructed to extract features in the time domain and the frequency domain. For the sequence $X_T$ obtained above, feature extraction was performed using the model shown in Figure 3 to obtain the feature sequence $S = \{w_1, w_2, \ldots, w_n\}$.
(3) Construction of a BiRNN unit: The BiRNN cell structure (Figure 4) was constructed to enable the bidirectional transfer of EEG contextual information.
(4) Construction of the encoder: The implicit state sequence corresponding to the above feature sequence is assumed to be $H = \{h_1, h_2, \ldots, h_n\}$. If the function $f$ denotes the transformation of the BiRNN hidden layer, that is, $h_t = f(x_t, h_{t-1})$, and the custom function $q$ transforms the hidden states at each moment into the background variable $c$, then we obtain
$$c = q(h_1, h_2, \ldots, h_{T_x})$$
(5) Construction of the decoder: At time $t$, the background information is $s_t = f(s_{t-1}, y_{t-1}, c)$. If the function $g$ represents the transformation of the hidden layer of the decoder and $p_\theta(y \mid x)$ is the conditional probability of the decoder output, then we obtain
$$p(y \mid x) = \max \frac{1}{N} \sum_{n=1}^{N} \lg\left[ p_\theta(y_n \mid x_n) \right]$$
(6) Model training: The EEG data were divided into training and test sets in a 7:3 ratio. The training set was used for labeled learning of the encoding–decoding module, the test set was used for unlabeled prediction, and the model results (i.e., F1 score and accuracy) were evaluated using K-fold cross-validation.

3. Data Preprocessing

Signal features can be classified as time-domain features and frequency-domain features. Because the EEG signal is non-smooth, non-linear, and time-varying, extracting features from the time domain or the frequency domain alone is not sufficiently comprehensive. Therefore, the EEG signal is first transformed to achieve denoising, and the time-domain and frequency-domain features are then extracted simultaneously. In this study, wavelet threshold denoising was used to denoise the input signal, after which the EEG signal was split for subsequent feature extraction.

3.1. Wavelet Threshold Denoising

In wavelet-transform-based denoising, the noisy signal is decomposed into wavelet coefficients, and coefficients that closely approximate those of the clean signal are estimated from the data. The decomposition separates the noise from the signal, and the wavelet coefficients containing valid information are retained. The denoised signal is then obtained through the inverse wavelet transform. The flowchart for wavelet denoising is shown in Figure 5.
The noise model based on the wavelet transform for a two-dimensional (2D) signal can be expressed as
$$Y_{ij} = X_{ij} \cdot \delta + N$$
where $X$ is the original signal, $Y$ is the transmitted noisy signal, $\delta$ is the multiplicative noise, $N$ is the additive noise, and $i$ and $j$ are the coordinates of the 2D space in which the signal is located.
In wavelet thresholding, the selection of the threshold value is vital. In this study, the SureShrink threshold [5] was used, which is obtained through Stein's unbiased risk estimation. The thresholding formula is
$$T = \arg\min_{T} \left[ \sigma^2 N + \sum_{j=0}^{N-1} \min(x_j^2, T^2) \right]$$
where $x_j$ denotes the $j$-th wavelet coefficient, $N$ is the number of coefficients, and $\sigma^2$ is the noise variance.
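To make the denoising step concrete, the following is a minimal sketch using the PyWavelets library. The db4 mother wavelet, the four-level decomposition, soft thresholding, and the simple universal threshold used in place of the full SureShrink minimization are all illustrative assumptions; the text above specifies only wavelet threshold denoising with a SURE-based threshold.

```python
# Minimal wavelet threshold denoising sketch (assumptions: db4 wavelet,
# 4 decomposition levels, soft thresholding, universal threshold as a
# stand-in for the SureShrink minimization above).
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db4", level=4):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Noise level estimated from the finest detail band (MAD rule).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thresh = sigma * np.sqrt(2 * np.log(len(signal)))
    denoised = [coeffs[0]] + [
        pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]
    ]
    return pywt.waverec(denoised, wavelet)[: len(signal)]
```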

3.2. EEG Signal Processing

To reduce the computational cost, the EEG signal data most relevant to sleep were employed as the training and test sets in this study. To ensure improved performance in the subsequent feature extraction module, the following measures were adopted (a short code sketch follows the list):
(1) For the Fpz-Cz and Pz-Oz channels, the EEG signal was denoised using the wavelet threshold denoising method.
(2) The EEG signal obtained after wavelet threshold denoising was split into 30 s segments, with a manually labeled sleep stage corresponding to each segment.
(3) The dataset of 30 s segments was normalized to a time series with mean 0 and variance 1.
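Steps (2) and (3) amount to reshaping each denoised channel into fixed-length epochs and standardizing them. A minimal sketch is given below; per-epoch (rather than whole-recording) z-scoring is an assumption, since the text does not state the normalization scope.

```python
# Split a denoised EEG channel into 30 s epochs at 100 Hz and z-score
# each epoch to mean 0 and variance 1 (per-epoch scope is assumed).
import numpy as np

def make_epochs(eeg, fs=100, epoch_sec=30):
    n = fs * epoch_sec                                 # 3000 samples per epoch
    epochs = eeg[: len(eeg) // n * n].reshape(-1, n)
    mean = epochs.mean(axis=1, keepdims=True)
    std = epochs.std(axis=1, keepdims=True) + 1e-8     # guard against flat epochs
    return (epochs - mean) / std
```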

4. Extraction of Time-Domain and Frequency-Domain Features

Feature extraction methods employed for signal analysis can be classified as time-domain and frequency-domain methods. In DL, multiple convolutional layers with kernels of different sizes are used at the front end of the network to extract local or global features of the input signal and obtain an accurate description of the input data [6]. We constructed a pair of filters by using a CNN to simultaneously extract time-domain and frequency-domain feature information.

4.1. CNNs

CNNs extract task-relevant features under minimal prior assumptions and are consistent with displacement (shift) invariance. A CNN consists of convolutional layers, batch regularization layers, activation layers, pooling layers, and fully connected layers. In this study, a one-dimensional (1D) discrete convolution [7] operation was employed:
$$s(n) = (f * g)[n] = \sum_{m=0}^{N-1} f(m)\, g(n-m)$$
where $f$ is the original 1D signal, $g$ is the convolution kernel, $N$ is the length of the original signal $f$, and $s(n)$ is the final result.
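As a check on the notation, the convolution above can be implemented directly and compared against NumPy's built-in routine:

```python
# Direct implementation of s(n) = sum_m f(m) g(n - m), verified
# against np.convolve (full mode, output length N + M - 1).
import numpy as np

def conv1d(f, g):
    N, M = len(f), len(g)
    s = np.zeros(N + M - 1)
    for n in range(len(s)):
        for m in range(N):
            if 0 <= n - m < M:
                s[n] += f[m] * g[n - m]
    return s

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.5, 0.5])
assert np.allclose(conv1d(f, g), np.convolve(f, g))
```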

4.2. Time-Domain and Frequency-Domain Feature Extraction

CNN filters can learn layer by layer from the original 1D signal, continuously transforming low-level features to extract different signal features at each level [8]. The CNN used in this study consists of a small filter for extracting temporal information and a large filter for extracting frequency information, which achieves a good trade-off between time-domain and frequency-domain feature extraction [9]. Each filter is constructed by connecting four CNN layers. The EEG signals are fed into the input layers of the small and large filters, and each of the four CNN layers is followed by a batch normalization (BN) layer, to speed up training, and a MaxPool layer, to extract the most significant features. After the fourth stage, a fusion layer merges the features of the large and small filters, and a final Dropout layer completes the feature extraction and outputs the fused features for the subsequent feature-learning stages of the network. The small and large filters differ only in their CNN parameters, which enables a good balance between time-domain and frequency-domain feature extraction.
The schematic of the time-domain and frequency-domain feature extraction is illustrated in Figure 3.
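A sketch of this two-branch extractor in Keras is shown below. The kernel sizes, filter counts, and dropout rate are illustrative assumptions; the text fixes only the overall structure (four Conv1D stages per branch, each followed by batch normalization and max pooling, then fusion and dropout).

```python
# Two-branch CNN feature extractor: a small-kernel branch for temporal
# detail and a large-kernel branch for frequency content, fused at the
# end. Kernel sizes and filter counts are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

def conv_branch(x, kernel_size, filters=64):
    for _ in range(4):
        x = layers.Conv1D(filters, kernel_size, padding="same")(x)
        x = layers.BatchNormalization()(x)    # speeds up training
        x = layers.ReLU()(x)
        x = layers.MaxPool1D(pool_size=2)(x)  # keeps the salient features
    return x

inputs = tf.keras.Input(shape=(3000, 1))      # one 30 s epoch at 100 Hz
small = conv_branch(inputs, kernel_size=8)    # time-domain detail
large = conv_branch(inputs, kernel_size=64)   # frequency-domain content
fused = layers.Concatenate()([small, large])
features = layers.Dropout(0.5)(fused)
extractor = tf.keras.Model(inputs, features)
```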

5. The Encoding–Decoding Module Using BiRNN Units

The bidirectional coding of BiRNN units enables acquiring the past and future contextual information in a time series. The encoding–decoding module is not sensitive to the dimensionality and scale of the time series; thus, the encoding–decoding module based on BiRNN units can mine deep EEG signal features.

5.1. BiRNN Cell Design

In addition to the input, output, and hidden layers, conventional RNNs have a background layer, which is connected to the hidden layer, retains the output value of the previous hidden state, and feeds this value as input to the next network layer [10]. The distinctive feature of the background layer is that it relies on the past contextual information of the time series. However, to characterize EEG signals, the future contextual information of the time series must also be used. To meet this requirement, a bidirectional coding strategy for RNNs, called the BiRNN, has been developed [11,12,13].
The BiRNN unit used in this study is depicted in Figure 4. The hidden layers consist of a forward hidden layer and a reverse hidden layer, both of which are connected to the same output layer. The network takes two independent and opposite paths: the forward hidden layer $\vec{S}_0, \ldots, \vec{S}_i$ retains the output value of the previous moment, and the reverse hidden layer $\overleftarrow{S}_0, \ldots, \overleftarrow{S}_i$ retains the output value of the next moment. The output can be expressed as
$$y_i = \vec{W}_i \vec{A}_i + \overleftarrow{W}_i \overleftarrow{A}_i + b_i$$
where $\vec{A}_i$ and $\overleftarrow{A}_i$ are the forward and reverse hidden-layer activations, respectively.
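In Keras, this bidirectional combination can be sketched with the Bidirectional wrapper; merge_mode="sum" adds the forward and reverse contributions, matching the additive form of the output equation. The GRU cell is an illustrative choice, since the text describes a generic BiRNN unit.

```python
# BiRNN sketch: forward and reverse RNN passes whose outputs are
# summed, mirroring the additive output equation above.
import tensorflow as tf
from tensorflow.keras import layers

birnn = layers.Bidirectional(
    layers.GRU(128, return_sequences=True), merge_mode="sum"
)
x = tf.random.normal([20, 187, 128])  # (batch, time steps, features)
y = birnn(x)                          # shape (20, 187, 128)
```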

5.2. Encoder Design

The encoder uses a neural network unit to transform an input sequence $X_T$ of indeterminate length into a background variable $c$ of constant length that contains the information of the entire input sequence $X_T$. The decoder yields a conditional probability for $Y_T$ based on the background variable $c$ and then uses a neural network unit with a SoftMax layer to transform the output sequence $Y_T$ into the desired result [14,15].
The input EEG signal sequence to the encoder is assumed to be $S = \{w_1, w_2, \ldots, w_n\}$, whose corresponding implicit state sequence is $H = \{h_1, h_2, \ldots, h_n\}$. If the function $f$ denotes the transformation of the hidden layer of the BiRNN, then
$$h_t = f(x_t, h_{t-1})$$
Next, the encoder transforms the hidden states at each moment into the background variable $c$ by using the custom function $q$:
$$c = q(h_1, h_2, \ldots, h_{T_x})$$

5.3. Decoder Design

The background variable $c$ output by the encoder contains information about the entire sequence $H$; thus, at time step $t$, the decoder hidden state is
$$s_t = f(s_{t-1}, y_{t-1}, c)$$
If the function $g$ represents the transformation of the hidden layer of the decoder and $p_\theta(y \mid x)$ represents the conditional probability of the decoder output, then
$$p(y_t \mid \{y_{t-1}, y_{t-2}, \ldots, y_1\}, c) = g(s_t, y_{t-1}, c)$$
$$p(\mathbf{y}) = \prod_{t=1}^{T} p(y_t \mid \{y_{t-1}, y_{t-2}, \ldots, y_1\}, c)$$
$$p(y \mid x) = \max \frac{1}{N} \sum_{n=1}^{N} \lg\left[ p_\theta(y_n \mid x_n) \right]$$
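The encoder and decoder can be sketched together as follows. For clarity, the decoder here reads the encoder outputs directly and is initialized from the background variable c, rather than feeding each predicted output back in as in the equations above; the GRU cells, the layer sizes, and the use of the final hidden states as c are assumptions.

```python
# Encoding-decoding sketch: a bidirectional GRU encoder whose final
# states form the background variable c, and a GRU decoder initialized
# from c, with a SoftMax output layer over the five sleep stages.
import tensorflow as tf
from tensorflow.keras import layers

n_classes, units = 5, 128

enc_in = tf.keras.Input(shape=(None, 128))        # feature sequence S
enc_out, fwd_h, bwd_h = layers.Bidirectional(
    layers.GRU(units, return_sequences=True, return_state=True)
)(enc_in)
c = layers.Concatenate()([fwd_h, bwd_h])          # background variable c

dec = layers.GRU(2 * units, return_sequences=True)(enc_out, initial_state=c)
probs = layers.Dense(n_classes, activation="softmax")(dec)
seq2seq = tf.keras.Model(enc_in, probs)
```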

5.4. Loss Functions

A model for sleep disorder detection must consider the problem of class imbalance. To mitigate its effect on the model, we employed a loss function [16] that weights the error of every misclassified sample equally, so that a sample's class has no effect on its error contribution. The proposed loss functions, namely the mean false error ($MFE$) and the mean squared false error ($MSFE$), for a multiclass classification task are defined as follows:
$$l(c_i) = \frac{1}{|c_i|} \sum_{j=1}^{|c_i|} (y_j - \hat{y}_j)^2$$
$$l_{MFE} = \sum_{i=1}^{N} l(c_i)$$
$$l_{MSFE} = \sum_{i=1}^{N} l(c_i)^2$$
where $c_i$ is the class label (e.g., W or N1), $|c_i|$ is the number of samples in class $c_i$, $N$ is the number of available classes (here, the sleep stages), and $l(c_i)$ is the error computed for class $c_i$. In the most commonly used loss function, the mean squared error ($MSE$), the loss is the squared difference between prediction and target averaged over all samples; this allows majority classes to contribute more than minority classes in an unbalanced dataset. In contrast, the $MFE$ and the $MSFE$ weight the errors of all classes equally.
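For one-hot labels and SoftMax outputs, the MFE and MSFE losses can be sketched in TensorFlow as shown below; treating the per-sample error as the summed squared difference across class probabilities is an interpretation of the formula above.

```python
# MFE/MSFE sketch: each class contributes the mean squared error of its
# own samples, so minority classes are not drowned out by majority ones.
import tensorflow as tf

def class_errors(y_true, y_pred):
    per_sample = tf.reduce_sum(tf.square(y_true - y_pred), axis=-1)  # (batch,)
    mask = tf.transpose(y_true)                                      # (classes, batch)
    counts = tf.maximum(tf.reduce_sum(mask, axis=1), 1.0)            # |c_i|
    return tf.reduce_sum(mask * per_sample, axis=1) / counts         # l(c_i)

def mfe_loss(y_true, y_pred):
    return tf.reduce_sum(class_errors(y_true, y_pred))

def msfe_loss(y_true, y_pred):
    return tf.reduce_sum(tf.square(class_errors(y_true, y_pred)))
```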

6. Experiment and Results

6.1. Experimental Dataset

The Sleep-EDF (European Data Format) dataset was used in this study. The dataset contains 197 full nights of polysomnographic sleep recordings with EEG, EOG, EMG, respiration, body temperature, and event markers. The data sampling rate is 100 Hz, and the physiological data are segmented into 30 s intervals with sleep staging labels provided manually by professionals. The data are classified by sleep specialists into eight stages according to the Rechtschaffen and Kales (R&K) criteria: W, N1, N2, N3, N4, REM, MOVEMENT, and UNKNOWN. By combining stages N3 and N4 into a single stage, N3, and the stages REM, MOVEMENT, and UNKNOWN into a single stage, REM, sleep staging can be treated as a five-category problem: W (wakefulness), N1 (non-rapid eye movement sleep stage 1), N2 (non-rapid eye movement sleep stage 2), N3 (non-rapid eye movement sleep stage 3), and REM (rapid eye movement sleep). These five stages correspond to characteristic EEG features and their transition processes; common EEG features are listed in Table 1.
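The stage merging described above reduces to a fixed label mapping; a minimal sketch follows, with the integer class indices chosen arbitrarily for illustration.

```python
# R&K's eight labels mapped onto the five classes used in this study.
STAGE_MAP = {
    "W": 0, "N1": 1, "N2": 2,
    "N3": 3, "N4": 3,                        # N3 and N4 merged into N3
    "REM": 4, "MOVEMENT": 4, "UNKNOWN": 4,   # merged into REM, as stated
}
assert [STAGE_MAP[s] for s in ("W", "N4", "MOVEMENT")] == [0, 3, 4]
```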
Each data record contains a variety of physiological signals, including EOG and the EEG signals of the Fpz-Cz and Pz-Oz channels. In this study, the EEG signals of the Fpz-Cz and Pz-Oz channels were used. The dataset contains 42,308 experimental data records; there are 17,799 records for the N2 stage but only 2804 for the N1 stage, exhibiting a clear imbalance in the distribution of classes. The categories of the dataset and the corresponding numbers of records are presented in Table 2.

6.2. Experimental Parameters Setting

The experiments were conducted on a Windows 10 system with an NVIDIA GeForce GTX1060Ti graphics card, an AMD Ryzen 5 4600H processor, 16 GB of RAM, and the TensorFlow DL framework. The model training required approximately 50 h in total.
The MFE and MSFE loss functions defined in Section 5.4 were used; ReLU was used as the activation function; the batch size was 20; the data sampling rate was 100 Hz; the categories were W, N1, N2, N3, and REM; the learning rate was 0.0001; and the total number of training epochs was 120. Furthermore, 15-fold cross-validation was used for evaluation, with accuracy and F1 score as the evaluation metrics.
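The training loop implied by these settings can be sketched as follows. The Adam optimizer, the stand-in `build_model` architecture, and the random placeholder data are assumptions; the text states only the learning rate, batch size, epoch count, loss, and 15-fold evaluation.

```python
# Training-configuration sketch: learning rate 1e-4, batch size 20,
# 120 epochs, 15-fold cross-validation. build_model, X, and y are
# placeholders standing in for the CNN + BiRNN encoder-decoder and
# the preprocessed Sleep-EDF epochs.
import numpy as np
import tensorflow as tf
from sklearn.model_selection import KFold

def build_model():                           # placeholder architecture
    return tf.keras.Sequential([
        tf.keras.Input(shape=(3000, 1)),
        tf.keras.layers.Conv1D(8, 16, strides=8, activation="relu"),
        tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32)),
        tf.keras.layers.Dense(5, activation="softmax"),
    ])

X = np.random.randn(60, 3000, 1).astype("float32")             # placeholder data
y = tf.keras.utils.to_categorical(np.random.randint(0, 5, 60), 5)

for train_idx, test_idx in KFold(n_splits=15, shuffle=True).split(X):
    model = build_model()
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss=msfe_loss,            # from the Section 5.4 sketch
                  metrics=["accuracy"])
    model.fit(X[train_idx], y[train_idx], batch_size=20, epochs=120, verbose=0)
    model.evaluate(X[test_idx], y[test_idx], verbose=0)
```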

6.3. Analysis of Experimental Results

1. Model classification results
The EEG sleep staging time series of a subject who slept throughout the night is illustrated in Figure 6, with the horizontal axis indicating the time and the vertical axis indicating the different sleep stages. The real values in red denote the manual discrimination results, and the predicted values in blue denote the discrimination results of the sleep staging model proposed in this paper, exhibiting an overlap of more than 85%. However, some confusion was noted between the REM, N2, and N1 states because of manual discrimination errors, less training data for N1, and greater similarity between the states due to continuous sleep.
Figure 7 shows the power spectral density plots of the W, N1, N2, N3, and REM sleep stages of the Fpz-Cz channel and the Pz-Oz channel intercepting the original data, that is, the power per unit frequency band, where the horizontal axis indicates time and the vertical axis indicates frequency. In the Fpz-Cz channel, the frequencies of the N1 and N2 stages were smoother and less distinct; however, the N1 stage exhibited a wider frequency band. In the Pz-Oz channel, the N3 stage was less distinct. Thus, the power spectral density plot can demonstrate the differences between stages.
2. Confusion matrix
The main diagonal of each confusion matrix contains the numbers of correctly classified cases for each sleep stage. The results of the 15-fold cross-validation are presented in Table 3 and Table 4. The proposed sleep staging model achieved satisfactory classification performance, with 70–85% classification accuracy for most categories. The recall varied greatly between sleep stages, and the specificity (Spe) was approximately 95%, reflecting the model's ability to identify negative cases. Furthermore, the F1 score was approximately 80%, reflecting the stability of the sleep staging model's classification results. The lower classification performance and wide variation in recall for the N1 sleep stage may be due to sample imbalance, with the number of N1 samples accounting for only 15.75% of the total number of samples.
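All four reported statistics can be recovered from each confusion matrix; in Tables 3 and 4, the Acc column matches per-class precision (e.g., 20,146/23,095 = 87.23% for W on the Fpz-Cz channel). A small helper, assuming rows are true classes and columns are predictions:

```python
# Per-class precision (Acc), recall, F1, and specificity (Spe) from a
# confusion matrix whose rows are true classes and columns predictions.
import numpy as np

def per_class_metrics(cm):
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp             # predicted as class i, but wrong
    fn = cm.sum(axis=1) - tp             # class i missed
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)
    return precision, recall, f1, specificity
```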
3. Comparison with other methods
The proposed model was compared with other models by using the same dataset and EEG signal channels. Table 3 and Table 4 present the classification performance of the proposed model for each class, and Table 5 compares the proposed model with other models in the literature [17,18,19,20,21]. The classification accuracy and the F1 score of sleep staging improved to a certain extent when BiRNN units were used, demonstrating the effectiveness of the proposed model; the accuracy of the BiRNN-based DL model for sleep EEG signals proposed in this paper improved by 0–5%.
In Table 5, IITNet [17] is an intra- and inter-epoch temporal context network that extracts features of sub-epochs with a residual neural network and captures temporal context from the feature sequence via a bidirectional LSTM; however, its ResNet-50 backbone has a very large number of parameters. DeepSleepNet [18] performs automatic sleep stage scoring on raw single-channel EEG; it extracts time-invariant features with CNNs and captures temporal context with a bidirectional LSTM, but considering only time-invariant features is not comprehensive for EEG. TinySleepNet [19] is an improved version of DeepSleepNet that uses data augmentation and a simpler network framework to reduce the number of parameters and save computational resources. Sun et al. [20] proposed an attention-based convolutional neural network for automatic sleep stage scoring, which uses convolutional pooling for feature extraction without considering correlations. CCRRSleepNet [21] is a deep learning model based on hybrid relational inductive biases that adopts a variety of matching non-relational inductive biases to optimize the model; however, it misses the non-linear dependencies present in the time series. Compared with these methods, the proposed model achieves better performance on several metrics: it performs best in the W and N1 stages, achieves the second-highest overall accuracy on the Fpz-Cz channel, and achieves the highest overall accuracy on the Pz-Oz channel. Performance in the N2, N3, and REM stages is balanced and does not differ greatly between methods, and the proposed model generally performs well on both channels.

7. Conclusions

In this study, we developed a DL model for sleep EEG signals based on BiRNNs. The model uses a CNN as a feature extractor to extract time-domain and frequency-domain features from the original single-channel EEG signal and uses BiRNN units to further learn the contextual information of the input features. Cross-validation and comparison with other models revealed that the proposed model is superior in terms of accuracy and comprehensive discriminability for EEG sleep staging and can provide an effective auxiliary tool for sleep monitoring and the detection and diagnosis of sleep disorders.

Author Contributions

Conceptualization, Z.F.; methodology, Z.F.; software, Z.F.; validation, Z.F., C.H., L.Z., S.W. and Y.Z.; formal analysis, Z.F.; resources, C.H., L.Z., S.W. and Y.Z.; data curation, Z.F.; writing—original draft preparation, Z.F.; writing—review and editing, Z.F., C.H. and L.Z.; visualization, Z.F.; supervision, C.H. and L.Z.; project administration, C.H. and L.Z.; funding acquisition, C.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Key project of Shenzhen Science and technology plan project, grant number 2020N061; National Natural Science Foundation of China, grant number 61977021; Hubei Province Technology Innovation Special Project (Major Project), grant number 2020AEA008; National Training Program for Undergraduate, grant number 202110512020.

Institutional Review Board Statement

Ethical review and approval were waived for this study because it involved only data collection with external devices; no ethical issues were involved, and the relevant data have been de-identified.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available because they are shared internally among the authors' institutions.

Acknowledgments

The authors are very grateful to the Brain–Computer Interface Laboratory of Hubei University for providing the experimental equipment and devices, and to Wu Wei and Shenzhen Skyworth-RGB Electronics Co., Ltd. for data, technical, and financial support.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Abou Jaoude, M.; Sun, H.; Pellerin, K.R.; Pavlova, M.; Sarkis, R.A.; Cash, S.S.; Westover, M.B.; Lam, A.D. Expert-level automated sleep staging of long-term scalp electroencephalography recordings using deep learning. Sleep 2020, 43, zsaa112.
2. Sors, A.; Bonnet, S.; Mirek, S.; Vercueil, L.; Payen, J.-F. A convolutional neural network for sleep stage scoring from raw single-channel EEG. Biomed. Signal Process. Control 2018, 42, 107–114.
3. Zhang, J.; Tang, Z.; Gao, J.; Lin, L.; Liu, Z.; Wu, H.; Liu, F.; Yao, R. Automatic Detection of Obstructive Sleep Apnea Events Using a Deep CNN-LSTM Model. Comput. Intell. Neurosci. 2021, 2021, 5594733.
4. Zhao, D.; Jiang, R.; Feng, M.; Yang, J.; Wang, Y.; Hou, X.; Wang, X. A deep learning algorithm based on 1D CNN-LSTM for automatic sleep staging. Technol. Health Care 2022, 30, 323–336.
5. Chang, S.G.; Yu, B.; Vetterli, M. Adaptive wavelet thresholding for image denoising and compression. IEEE Trans. Image Process. 2000, 9, 1532–1546.
6. Singhal, A.; Singh, P.; Fatimah, B.; Pachori, R.B. An efficient removal of power-line interference and baseline wander from ECG signals by employing Fourier decomposition technique. Biomed. Signal Process. Control 2020, 57, 101741.
7. Cao, J.; He, Z.; Wang, J.; Yu, P.; Gjorgjevikj, D. An Antinoise Fault Diagnosis Method Based on Multiscale 1DCNN. Shock. Vib. 2020, 2020, 8819313.
8. Kattenborn, T.; Leitloff, J.; Schiefer, F.; Hinz, S. Review on Convolutional Neural Networks (CNN) in vegetation remote sensing. ISPRS J. Photogramm. Remote Sens. 2021, 173, 24–49.
9. Donoghue, T.; Schaworonkow, N.; Voytek, B. Methodological considerations for studying neural oscillations. Eur. J. Neurosci. 2022, 55, 3502–3527.
10. Sherstinsky, A. Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network. Phys. D Nonlinear Phenom. 2020, 404, 132306.
11. Bodapati, S.; Bandarupally, H.; Shaw, R.N.; Ghosh, A. Comparison and Analysis of RNN-LSTMs and CNNs for Social Reviews Classification. In Advances in Applications of Data-Driven Computing; Bansal, J.C., Fung, L.C.C., Simic, M., Ghosh, A., Eds.; Springer: Singapore, 2021; pp. 49–59.
12. Mousavi, S.; Afghah, F.; Acharya, U.R. SleepEEGNet: Automated sleep stage scoring with sequence to sequence deep learning approach. PLoS ONE 2019, 14, e0216456.
13. Yu, W.; Kim, I.Y.; Mechefske, C. Analysis of different RNN autoencoder variants for time series classification and machine prognostics. Mech. Syst. Signal Process. 2021, 149, 107322.
14. Sunil Kumar, K.; Shivashankar, D.; Keshavamurthy, K. Bio-signals Compression Using Auto Encoder. J. Electr. Comput. Eng. Q 2021, 2, 424–433.
15. Ji, Y.; Zhang, H.; Zhang, Z.; Liu, M. CNN-based encoder-decoder networks for salient object detection: A comprehensive review and recent advances. Inf. Sci. 2021, 546, 835–857.
16. Hwang, S.; Hong, K.; Son, G.; Byun, H. Learning CNN features from DE features for EEG-based emotion recognition. Pattern Anal. Appl. 2019, 23, 1323–1335.
17. Back, S.; Lee, S.; Seo, H.; Park, D.; Kim, T.; Lee, K. Intra- and inter-epoch temporal context network (IITNet) for automatic sleep stage scoring. arXiv 2019, arXiv:1902.06562.
18. Supratak, A.; Dong, H.; Wu, C.; Guo, Y. DeepSleepNet: A Model for Automatic Sleep Stage Scoring Based on Raw Single-Channel EEG. IEEE Trans. Neural Syst. Rehabil. Eng. 2017, 25, 1998–2008.
19. Supratak, A.; Guo, Y. TinySleepNet: An efficient deep learning model for sleep stage scoring based on raw single-channel EEG. In Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada, 20–24 July 2020; pp. 641–644.
20. Sun, Y.; Wang, B.; Jin, J.; Wang, X. Deep convolutional network method for automatic sleep stage classification based on neurophysiological signals. In Proceedings of the 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI), Beijing, China, 13–15 October 2018; pp. 1–5.
21. Neng, W.; Lu, J.; Xu, L. CCRRSleepNet: A Hybrid Relational Inductive Biases Network for Automatic Sleep Stage Classification on Raw Single-Channel EEG. Brain Sci. 2021, 11, 456.
Figure 1. Structure of the proposed model.
Figure 2. Flowchart of the proposed model.
Figure 3. Time-domain and frequency-domain feature extraction by using a CNN.
Figure 4. Cell structure of the bidirectional RNN used in this study.
Figure 5. Flowchart of the wavelet denoising technique.
Figure 6. Comparison of the predicted and actual values.
Figure 7. Power spectral density plots of the EEG Fpz-Cz and Pz-Oz channels for the W, N1, N2, N3, and REM stages.
Table 1. Common EEG features and their appearance stages and waveforms.

Name | Frequency | Emergence Stage | Waveform
α | 8–13 Hz | N1, REM | (waveform image)
β | 14–30 Hz | N2 | (waveform image)
δ | 0.5–4 Hz | N2, N3 | (waveform image)
θ | 4–8 Hz | N1 | (waveform image)
Table 2. The categories and quantities of the Sleep-EDF dataset.

Dataset | W | N1 | N2 | N3 | REM | Total
Sleep-EDF | 7927 | 2804 | 17,799 | 5702 | 7717 | 41,950
Table 3. Confusion matrix and per-class classification performance of the BiRNN-based sleep EEG deep learning model on the Fpz-Cz channel.

True Value | W | N1 | N2 | N3 | REM | Acc (%) | Recall (%) | F1-Score (%) | Spe (%)
W | 20,146 | 1021 | 242 | 60 | 236 | 87.23 | 92.81 | 89.93 | 94.24
N1 | 1503 | 4007 | 785 | 57 | 701 | 49.99 | 56.81 | 53.18 | 94.14
N2 | 630 | 2017 | 26,058 | 1167 | 536 | 89.01 | 85.69 | 87.32 | 92.94
N3 | 56 | 27 | 611 | 7377 | 6 | 84.64 | 91.33 | 87.86 | 97.85
REM | 760 | 944 | 1580 | 54 | 10,819 | 87.97 | 76.42 | 81.79 | 97.50
Table 4. Confusion matrix and per-class classification performance of the BiRNN-based sleep EEG deep learning model on the Pz-Oz channel.

True Value | W | N1 | N2 | N3 | REM | Acc (%) | Recall (%) | F1-Score (%) | Spe (%)
W | 12,988 | 560 | 100 | 7 | 343 | 86.96 | 92.78 | 89.80 | 90.95
N1 | 1403 | 1803 | 859 | 58 | 211 | 48.40 | 41.60 | 44.73 | 94.12
N2 | 162 | 581 | 12,587 | 61 | 137 | 82.12 | 92.78 | 87.12 | 88.61
N3 | 10 | 20 | 794 | 1679 | 1 | 92.92 | 67.05 | 77.80 | 99.58
REM | 372 | 762 | 878 | 2 | 4007 | 85.27 | 66.55 | 74.75 | 97.63
Table 5. Comparison of the BiRNN-based sleep EEG deep learning model with other classification methods.

Methods | Dataset | EEG Channel | Accuracy (%) | F1-W | F1-N1 | F1-N2 | F1-N3 | F1-REM
IITNet [17] | Sleep-EDF | Fpz-Cz | 84.0 | 87.9 | 44.7 | 88.0 | 85.7 | 82.1
DeepSleepNet [18] | Sleep-EDF | Fpz-Cz | 82.0 | 84.7 | 46.6 | 85.9 | 84.8 | 82.4
TinySleepNet [19] | Sleep-EDF | Fpz-Cz | 83.6 | 86.8 | 49.9 | 87.4 | 86.4 | 80.6
CCRRSleepNet [21] | Sleep-EDF | Fpz-Cz | 84.29 | 89.01 | 51.73 | 87.25 | 88.20 | 82.86
Model in this paper | Sleep-EDF | Fpz-Cz | 84.04 | 89.93 | 53.18 | 87.32 | 87.86 | 81.79
DeepSleepNet [18] | Sleep-EDF | Pz-Oz | 79.8 | 88.1 | 37.0 | 82.7 | 77.3 | 80.3
Sun et al. [20] | Sleep-EDF | Pz-Oz | 81.0 | 85.6 | 24.9 | 88.9 | 79.2 | 86.3
CCRRSleepNet [21] | Sleep-EDF | Pz-Oz | 80.31 | 86.01 | 41.54 | 84.87 | 80.97 | 79.56
Model in this paper | Sleep-EDF | Pz-Oz | 81.64 | 89.80 | 44.73 | 87.12 | 77.80 | 74.75
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
