*2.3. Method*

Figure 1 shows the schematic outline of the proposed IMBEFs-based sleep stage classification algorithm, comprising preprocessing, wavelet packet decomposition, locality energy calculation, state space model estimation, feature extraction, classifier training and performance evaluation.

### 2.3.1. EEG Data Preprocessing

First, all single-channel data are extracted with the Matlab and EEGLAB [22] tools from the three databases described previously. According to prior work [5–11], a 0–35 Hz low-pass filter can be used to reject most of the artifacts. Once filtered, each dataset is exported as a one-dimensional vector without time information and saved as a txt file, which can be denoted as Formula (1).

$$\mathbf{X} = [\mathbf{x}\_1, \mathbf{x}\_2, \dots, \mathbf{x}\_k, \dots, \mathbf{x}\_M], k \in [1, M], \mathbf{x}\_k \in \mathbb{R} \tag{1}$$

where **X** is the vector containing the sampled EEG values $\mathbf{x}\_k$ and *M* is the length of the vector.

Furthermore, we use a window of length *j* to divide the full data **X** across time without overlap. That is, **X** is converted into $[\mathbf{X}\_1, \mathbf{X}\_2, \dots, \mathbf{X}\_i, \dots, \mathbf{X}\_L]^T$, as described in (2).

$$\begin{bmatrix} \mathbf{X}\_1 \\ \mathbf{X}\_2 \\ \vdots \\ \mathbf{X}\_i \\ \vdots \\ \mathbf{X}\_L \end{bmatrix} = \begin{bmatrix} \mathbf{x}\_1 & \mathbf{x}\_2 & \cdots & \mathbf{x}\_j \\ \mathbf{x}\_{j+1} & \mathbf{x}\_{j+2} & \cdots & \mathbf{x}\_{2j} \\ \vdots & \vdots & & \vdots \\ \mathbf{x}\_{(i-1)j+1} & \mathbf{x}\_{(i-1)j+2} & \cdots & \mathbf{x}\_{i \times j} \\ \vdots & \vdots & & \vdots \\ \mathbf{x}\_{(L-1)j+1} & \mathbf{x}\_{(L-1)j+2} & \cdots & \mathbf{x}\_{L \times j} \end{bmatrix} \tag{2}$$
 
$$L = \left\lfloor \frac{M}{j} \right\rfloor, i \in [1, L]$$

where *j* = *Te* × *Fs*, *Te* is the length of each epoch in seconds and *Fs* is the sampling frequency of the database. For the S-EDF database, *Te* = 30 and *Fs* = 100, so *j* is 3000; for the ISRUC-S3 database, *Te* = 20 and *Fs* = 200, so *j* is 4000.
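The segmentation in (2) can be sketched in plain Python. This is a minimal illustration, not the authors' code; the signal is assumed to be a flat list of filtered samples and the function name is our own:

```python
def segment_epochs(x, te, fs):
    """Split a 1-D EEG recording x into non-overlapping epochs (Eq. 2).

    te: epoch length in seconds, fs: sampling frequency in Hz.
    Returns L = floor(M / j) epochs of length j = te * fs each;
    trailing samples that do not fill a whole epoch are discarded.
    """
    j = te * fs
    L = len(x) // j
    return [x[i * j:(i + 1) * j] for i in range(L)]

# S-EDF: 30 s epochs at 100 Hz -> j = 3000; ISRUC-S3: 20 s at 200 Hz -> j = 4000
```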

### 2.3.2. Wavelet Package Decomposition

WPD is a powerful tool for analyzing non-stationary EEG signals. In essence, WPD is a wavelet transform in which the discrete-time signal is passed through more filters than in the discrete wavelet transform, providing a multi-level time–frequency decomposition of the signal [23]. Compared with the discrete wavelet transform, WPD offers finer frequency resolution. In the discrete wavelet transform, a signal is split into approximation and detail coefficients [24]; the approximation coefficients are then themselves split into second-level approximation and detail coefficients, and the process is repeated. A wavelet packet function $\omega\_{l,d}^{m}(q)$ is defined as (3):

$$
\omega\_{l,d}^{m}(q) = 2^{l/2} \omega^{m} (2^{l}q - d) \tag{3}
$$

where *l* and *d* are the scaling (frequency localization) parameter and the translation (time localization) parameter, respectively; *m* = 0, 1, 2, . . . is the oscillation parameter.

Wavelet packet (WP) coefficients of the EEG epoch $\mathbf{X}\_i$ are obtained as the inner product of the signal with every WP function, denoted by $p\_l^{i,m}(d)$, $d = \dots, -1, 0, 1, \dots$ and given below:

$$p\_{l}^{i,m}(d) = \sum\_{q} \mathbf{x}^{i}(q)\omega\_{l,d}^{m}(q) \tag{4}$$

where $p\_l^{i,m}(d)$ denotes the *m*-th set of WPD coefficients at the *l*-th scale and *d* is the translation parameter. All frequency components and their occurrence times are reflected in $p\_l^{i,m}(d)$ through changes in *m*, *l* and *d*. Each coefficient set measures a specific sub-band frequency content, controlled by the scaling parameter *l* and the oscillation parameter *m*. The essential operation of WPD is filtering through the low-pass filter *h*(*d*) and the high-pass filter *g*(*d*). At the *l*-th level of decomposition there are $2^l$ sets of sub-band coefficients $\mathbf{C}\_{l,m}^i$, each of length $j/2^l$. The wavelet packet coefficients of epoch $\mathbf{X}\_i$ are given as

$$\mathbf{C}\_{l,m}^{i} = \{ p\_l^{i,m}(d) | d = 1, 2, \ldots, j/2^l \} \tag{5}$$

It can be seen from (5) that each node of the WP tree is indexed by a pair of integers (*l*, *m*), where *l* is the level of decomposition and *m* is the order of the node position within that level. The level $l\_{LE}$ and wavelet basis $\omega\_{LE}$ of the WPD on epoch $\mathbf{X}\_i$ for the LE calculation will be determined in Section 3, as will the wavelet basis $\omega\_{DSSM}$ for the DSSM.
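The recursive filter-and-downsample operation behind (3)–(5) can be illustrated in pure Python for the 2-tap Haar (db1) basis. This is a sketch under the assumption of orthonormal Haar filters; a production implementation would use a wavelet library, and the function names here are our own:

```python
import math

# Orthonormal Haar (db1) analysis filters
H = (1 / math.sqrt(2), 1 / math.sqrt(2))   # low-pass h(d)
G = (1 / math.sqrt(2), -1 / math.sqrt(2))  # high-pass g(d)

def filter_downsample(x, f):
    # Convolve with the 2-tap filter and keep every second sample
    return [f[0] * x[2 * t] + f[1] * x[2 * t + 1] for t in range(len(x) // 2)]

def wpd(x, level):
    """Full wavelet packet decomposition: at level l there are 2^l
    coefficient sets, each of length len(x) / 2^l (Eq. 5)."""
    nodes = [list(x)]
    for _ in range(level):
        nodes = [half for node in nodes
                 for half in (filter_downsample(node, H),
                              filter_downsample(node, G))]
    return nodes
```

Because the Haar filters are orthonormal, the total energy of the coefficients at any level equals the energy of the input epoch, which is what makes the node energies of the next subsection comparable across sub-bands. Note that the nodes come out in natural (Paley) order rather than frequency order.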

### 2.3.3. Locality Energy Calculation

The wavelet packet energy $E\_{l\_{LE},m}^i$ at the *m*-th node on level $l\_{LE}$ of epoch $\mathbf{X}\_i$ can be defined as follows [25].

$$E\_{l\_{LE},m}^i = \sum\_{d} |p\_{l\_{LE}}^{i,m}(d)|^2 = |\mathbf{C}\_{l\_{LE},m}^i|^2, \quad m = 1, 2, \dots, 2^{l\_{LE}} \tag{6}$$

Then, the locality energy features (LEFs) of each epoch can be defined as $\{E\_{l\_{LE},m}^i | m = 1, 2, \dots, 2^{l\_{LE}}\}$.
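Given the $2^{l\_{LE}}$ coefficient sets of one epoch, (6) reduces to a sum of squared coefficients per node. A minimal sketch; the coefficient values below are made up for illustration:

```python
def locality_energy(nodes):
    """LEFs of one epoch: the energy of each WPD node at level l_LE (Eq. 6)."""
    return [sum(c * c for c in node) for node in nodes]

# Example with 2^2 = 4 dummy nodes at l_LE = 2
nodes = [[1.0, 2.0], [0.5, -0.5], [3.0, 0.0], [0.0, -2.0]]
lefs = locality_energy(nodes)  # [5.0, 0.5, 9.0, 4.0]
```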

### 2.3.4. Dual State Space Models Estimation

As described before, after the wavelet packet decomposition, the low-level (first-level) coefficients are used to estimate the dual state space models, which can be denoted by the difference Equation (7).

$$\begin{cases} \mathbf{u}\_{k+1} = \mathbf{A}\mathbf{u}\_k + \mathbf{K}e\_k\\ \quad y\_k = \mathbf{B}\mathbf{u}\_k + e\_k \end{cases} \tag{7}$$

Here $y\_k \in \mathbf{C}\_{1,m}^i$ is the coefficient at instant $k \in [1, 2, \dots, j/2]$. The vector $\mathbf{u}\_k \in \mathbb{R}^{n \times 1}$ is the state vector of the process at discrete time instant *k* and contains the numerical values of *n* states. The matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ is the dynamical system matrix, $\mathbf{K} \in \mathbb{R}^{n \times 1}$ is the steady-state Kalman gain and $\mathbf{B} \in \mathbb{R}^{1 \times n}$ is the output matrix, which describes how the internal state is transferred to the outside world in the observations $y\_k$. The term $e\_k \in \mathbb{R}$ denotes zero-mean white noise.
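To make the innovation form (7) concrete, the following sketch simulates a model of order *n* = 2 with hand-picked (not estimated) matrices; the function name and parameter values are our own illustration:

```python
import random

def simulate_ssm(A, B, K, n_steps, seed=42):
    """Simulate u_{k+1} = A u_k + K e_k, y_k = B u_k + e_k (Eq. 7)."""
    rng = random.Random(seed)
    n = len(A)
    u = [0.0] * n
    ys = []
    for _ in range(n_steps):
        e = rng.gauss(0.0, 1.0)                     # zero-mean white noise e_k
        y = sum(B[i] * u[i] for i in range(n)) + e  # observation equation
        u = [sum(A[i][j] * u[j] for j in range(n)) + K[i] * e
             for i in range(n)]                      # state update
        ys.append(y)
    return ys

# A stable second-order example: eigenvalues of A lie inside the unit circle
A = [[0.6, 0.2], [-0.1, 0.5]]
B = [1.0, 0.5]
K = [0.3, 0.1]
y = simulate_ssm(A, B, K, 1000)
```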

With a traditional subspace algorithm such as N4SID, the matrices $\hat{\mathbf{A}}$, $\hat{\mathbf{B}}$, $\hat{\mathbf{K}}$ of the state space model of a dynamic system can be estimated [26]. In this paper, the order $n\_{DSSM}$ of the dual state space models will be determined by the experiments in Section 3. The parameter matrices of the state space model estimated from the first-level wavelet coefficients $\mathbf{C}\_{1,m}^i$ can be expressed as

$$\begin{aligned} \hat{\mathbf{A}}\_{1,m}^{i} &= \begin{bmatrix} a\_{1,1}^{i,m} & \cdots & a\_{1,n\_{DSSM}}^{i,m} \\ \vdots & & \vdots \\ a\_{n\_{DSSM},1}^{i,m} & \cdots & a\_{n\_{DSSM},n\_{DSSM}}^{i,m} \end{bmatrix} \\ \hat{\mathbf{B}}\_{1,m}^{i} &= \begin{bmatrix} b\_{1}^{i,m} & b\_{2}^{i,m} & \dots & b\_{n\_{DSSM}}^{i,m} \end{bmatrix} \\ \hat{\mathbf{K}}\_{1,m}^{i} &= \begin{bmatrix} k\_{1}^{i,m} & k\_{2}^{i,m} & \dots & k\_{n\_{DSSM}}^{i,m} \end{bmatrix}^{T} \\ i &\in [1, L], m \in \{1, 2\} \end{aligned} \tag{8}$$

Then the DSSM $\mathcal{S}\_i$ of the epoch $\mathbf{X}\_i$ can be defined as:

$$\mathcal{S}\_{i} = \begin{bmatrix} s\_1^i \\ s\_2^i \end{bmatrix} = \begin{bmatrix} a\_{1,1}^{i,1} & \cdots & a\_{n\_{DSSM},n\_{DSSM}}^{i,1} & b\_1^{i,1} & \cdots & b\_{n\_{DSSM}}^{i,1} & k\_1^{i,1} & \cdots & k\_{n\_{DSSM}}^{i,1} \\ a\_{1,1}^{i,2} & \cdots & a\_{n\_{DSSM},n\_{DSSM}}^{i,2} & b\_1^{i,2} & \cdots & b\_{n\_{DSSM}}^{i,2} & k\_1^{i,2} & \cdots & k\_{n\_{DSSM}}^{i,2} \end{bmatrix} \tag{9}$$

So the parameters extracted from the DSSM, called the DSSM features (DSSMFs), are defined as the stacked rows $s\_1^i$ and $s\_2^i$ of $\mathcal{S}\_i$ in (9).
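Flattening one estimated model $(\hat{\mathbf{A}}, \hat{\mathbf{B}}, \hat{\mathbf{K}})$ into one feature row $s\_m^i$ as in (9) is a simple concatenation. A sketch with an order-2 toy model (values are illustrative, not estimated):

```python
def flatten_model(A, B, K):
    """Concatenate A (row-major), B and K into one DSSM feature row (Eq. 9)."""
    return [a for row in A for a in row] + list(B) + list(K)

# Order n = 2 toy model: the row has n^2 + 2n = 8 entries
s = flatten_model([[0.6, 0.2], [-0.1, 0.5]], [1.0, 0.5], [0.3, 0.1])
```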

### 2.3.5. IMBEFs Construction

According to the previously calculated LEFs $E\_{l\_{LE},m}^i$ and the parameters $\mathcal{S}\_i$ of the DSSM, the IMBEFs of epoch $\mathbf{X}\_i$ are given by

$$F\_{DSSM}^{i} = \begin{bmatrix} E\_{l\_{LE},1}^{i} & \dots & E\_{l\_{LE},2^{l\_{LE}}}^{i} & s\_{1}^{i} & s\_{2}^{i} \end{bmatrix} \tag{10}$$

The feature dimension can be calculated by Equation (11).

$$Dim\_{DSSM} = 2^{l\_{LE}} + 2(n\_{DSSM}^2 + 2 \times n\_{DSSM}) \tag{11}$$

Here, the general form of the features extracted from the LE and from multiple state space models (MSSM), estimated from the $l\_{MSSM}$-th level WPD coefficients, can be depicted as Equation (12).

$$F\_{MSSM}^i = \begin{bmatrix} E\_{l\_{LE},1}^i & \dots & E\_{l\_{LE},2^{l\_{LE}}}^i & s\_1^i & \dots & s\_{2^{l\_{MSSM}}}^i \end{bmatrix} \tag{12}$$

The dimension of $F\_{MSSM}^i$ can be calculated by

$$Dim\_{MSSM} = 2^{l\_{LE}} + 2^{l\_{MSSM}}(n\_{MSSM}^2 + 2 \times n\_{MSSM}) \tag{13}$$

where $n\_{MSSM}$ is the order of the MSSM, which usually ranges from 5 to 10. Assuming $n\_{MSSM} = 5$, Equation (13) gives $Dim\_{MSSM} = 2^{l\_{LE}} + 2^{l\_{MSSM}} \times 35$. Then, if $l\_{MSSM} > 2$, $Dim\_{MSSM}$ becomes too large. So $l\_{MSSM}$ is set to 1 in this paper, which means there are two state space models.
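The dimension bookkeeping in (11) and (13) is easy to check numerically. The parameter values below are illustrative only, since $l\_{LE}$ and $n\_{DSSM}$ are fixed later in Section 3:

```python
def dim_mssm(l_le, l_mssm, n):
    # Eq. (13): 2^l_le LE features plus 2^l_mssm models of n^2 + 2n parameters each
    return 2 ** l_le + 2 ** l_mssm * (n * n + 2 * n)

def dim_dssm(l_le, n):
    # Eq. (11): the DSSM case is simply l_mssm = 1
    return dim_mssm(l_le, 1, n)

# With n = 5 each model contributes 35 parameters; the model term doubles
# with every extra level of l_MSSM
assert dim_mssm(4, 1, 5) == 16 + 2 * 35   # l_MSSM = 1: 86 features
assert dim_mssm(4, 3, 5) == 16 + 8 * 35   # l_MSSM = 3: 296, already large
```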

### **3. Experiments and Results**

In this section, there are four experimental parts. The first is the experiment for selecting a suitable classifier among several candidates. The second is the determination of the most suitable wavelet basis $\omega\_{DSSM}$ and model order $n\_{DSSM}$ for the DSSM estimation. The third is the selection of the wavelet basis $\omega\_{LE}$ and the level $l\_{LE}$ for the LE calculation. Finally, the test experiment is conducted on the S-EDF and ISRUC-S3 databases with the $\omega\_{DSSM}$, $\omega\_{LE}$, $n\_{DSSM}$ and $l\_{LE}$ determined in the previous experiments.

In the process of selecting these parameters, the DRMS database was adopted for testing under both the R&K and AASM standards. There are several conventional validation strategies, including k-fold cross-validation, leave-one-subject-out cross-validation (LOOCV) and cross-dataset validation. In this paper, several commonly used databases are adopted to verify the performance of the algorithm, of which the S-EDF and DRMS databases contain many subjects. However, some subjects in these databases have unevenly distributed samples, i.e., incomplete sleep stages. Consequently, 10-fold cross-validation is the more suitable verification method for this research. In 10-fold cross-validation, the original sample is randomly partitioned into 10 equal-size subsamples. Of the 10 subsamples, a single subsample is retained as the validation data for testing the model and the remaining nine subsamples are used as training data. The process is then repeated 10 times, with each of the 10 subsamples used exactly once as the validation data, and the 10 results are averaged to produce a single estimate. The advantage of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once. The accuracy (ACC) and Cohen's kappa coefficient (Kappa) are computed to evaluate the overall classification performance.
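The 10-fold partitioning described above can be sketched at the index level as follows (the classifier training itself is out of scope here, and the function name is our own):

```python
import random

def ten_fold_splits(n_samples, seed=0):
    """Yield (train_idx, val_idx) pairs for 10-fold cross-validation: each
    sample is used for validation exactly once and for training nine times."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::10] for f in range(10)]  # 10 (near-)equal-size subsamples
    for f in range(10):
        val = folds[f]
        train = [i for g in range(10) if g != f for i in folds[g]]
        yield train, val
```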

$$\text{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{14}$$

$$Kappa = \frac{ACC - p\_{\varepsilon}}{1 - p\_{\varepsilon}} \tag{15}$$

where TP, TN, FP and FN represent the numbers of true positive, true negative, false positive and false negative examples, respectively, and $p\_e$ is the hypothetical probability of agreement by chance.
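A minimal sketch of (14) and (15) for a binary confusion matrix; the counts below are made up for illustration:

```python
def accuracy(tp, tn, fp, fn):
    # Eq. (14), as a fraction rather than a percentage
    return (tp + tn) / (tp + tn + fp + fn)

def cohen_kappa(acc, p_e):
    # Eq. (15): p_e is the hypothetical probability of chance agreement
    return (acc - p_e) / (1 - p_e)

acc = accuracy(tp=40, tn=40, fp=10, fn=10)  # 0.8
kappa = cohen_kappa(acc, p_e=0.5)           # ~0.6 for a balanced two-class task
```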

### *3.1. Classifier Comparison and Selection*

In this section, an algorithm is designed to search for the best classifier for the method proposed in this paper. The detailed steps are shown in Algorithm 1 below. In this algorithm, according to the distribution and characteristics of the samples, the candidate classifiers are Linear Discriminant, Quadratic Discriminant, Quadratic SVM, Fine KNN, Bagged Trees and RUSBoosted Trees. The candidate wavelet bases are db1, db2, db3, db4, db5, db6, db8, db16, db32, sym2, sym8, sym16, coif1, coif3 and dmey. The order of the DSSM ranges from 5 to 10. Here, only the DSSMFs are used for training and validation.

Table 4 shows the results of Algorithm 1. As can be seen from Table 4, Bagged Trees is the optimal classifier for the two- to six-class classifications, and the corresponding order of the DSSM is 6. In addition, in the two-class classification the optimal wavelet basis is sym2, whereas in the others it is db1. Furthermore, the comparison of different classifiers in the two-class classification under the condition $n\_{DSSM} = 6$ is listed in Table 5. It can be seen from Table 5 that the accuracy of sym2 is 95.79%, slightly higher than the 95.71% of db1 and the 95.72% of db2. Therefore, considering the results in Tables 4 and 5, Bagged Trees will be used as the classifier in the subsequent experiments.

**Table 4.** The outputs of Algorithm 1.


**Table 5.** Comparison of different classifiers in two class classification with different wavelet. The *nDSSM* = 6. Highest values are highlighted in boldface.


**Algorithm 1:** Search the Optimal Classifier.

```
Input: WBS: the candidate wavelet basis array, including db1, db2, db3, db4, db5, db6, db8,
         db16, db32, sym2, sym8, sym16, coif1, coif3 and dmey;
       CCA: the candidate classifier array, including Linear Discriminant, Quadratic
         Discriminant, Quadratic SVM, Fine KNN, Bagged Trees and RUSBoosted Trees;
       SSMOV: the state space model order array, ranging from 5 to 10;
Output: the optimal classifier array OCA, the corresponding order array COV and the wavelet
         basis array CWBA
1  Initialize the samples matrix SM, validation accuracy VA = 0, the matrix VAM to store the
   VA, the current classifier CC, the current wavelet basis CWB, the current model order CMO
   and the temp value TMP = 0;
2  for i ∈ [1, 5] do            /* the five tasks: two- to six-class classification */
3     for j ∈ [1, 6] do
4        CMO = SSMOV(j);
5        for k ∈ [1, 15] do
6           CWB = WBS(k);
7           Select and construct the SM according to the CMO and CWB;
8           for m ∈ [1, 6] do
9              CC = CCA(m);
10             Put the SM into CC for training and verification with the 10-fold
               cross-validation method, and get the VA;
11             VAM(m + (j − 1) × 6, k) = VA;
12          end
13       end
14    end
15    TMP = 0;
16    for n ∈ [1, 36] do
17       for p ∈ [1, 15] do
18          if VAM(n, p) > TMP then
19             TMP = VAM(n, p);
20             OCA(i) = CCA(mod(n − 1, 6) + 1);
21             COV(i) = SSMOV(ceil(n/6));
                  /* ceil(X) rounds X to the nearest integer greater than or equal to X */
22             CWBA(i) = WBS(p);
23          end
24       end
25    end
26 end
27 Output the OCA, COV and CWBA;
```
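Algorithm 1 is essentially an exhaustive grid search over (classifier, wavelet, order) combinations. Stripped of the MATLAB bookkeeping it can be sketched as below, where `evaluate` stands in for the 10-fold cross-validation accuracy of one combination and is a placeholder of our own, not the authors' code:

```python
def search_optimal(classifiers, wavelets, orders, evaluate):
    """Return the (classifier, order, wavelet) triple with the best
    cross-validated accuracy, as in Algorithm 1."""
    best, best_acc = None, -1.0
    for order in orders:
        for wb in wavelets:
            for clf in classifiers:
                acc = evaluate(clf, wb, order)
                if acc > best_acc:
                    best_acc, best = acc, (clf, order, wb)
    return best, best_acc

# Toy stand-in score that peaks at ('BaggedTrees', 6, 'db1')
def evaluate(clf, wb, order):
    score = 0.9 if clf == 'BaggedTrees' else 0.7
    score += 0.05 if wb == 'db1' else 0.0
    score += 0.02 if order == 6 else 0.0
    return score

best, acc = search_optimal(['LDA', 'BaggedTrees'], ['db1', 'sym2'],
                           range(5, 11), evaluate)
```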
