Article

Permutation Entropy-Based Interpretability of Convolutional Neural Network Models for Interictal EEG Discrimination of Subjects with Epileptic Seizures vs. Psychogenic Non-Epileptic Seizures

by Michele Lo Giudice 1,2, Giuseppe Varone 3, Cosimo Ieracitano 4, Nadia Mammone 4, Giovanbattista Gaspare Tripodi 5, Edoardo Ferlazzo 1,5, Sara Gasparini 1,5, Umberto Aguglia 1,5 and Francesco Carlo Morabito 4,*
1 Department of Medical and Surgical Sciences, University of Catanzaro, 88100 Catanzaro, Italy
2 DIIES Department, University “Mediterranea” of Reggio Calabria, 89100 Reggio Calabria, Italy
3 Department of Neuroscience & Imaging, University G. d’Annunzio Chieti e Pescara, 66100 Chieti, Italy
4 DICEAM Department, University “Mediterranea” of Reggio Calabria, 89100 Reggio Calabria, Italy
5 Regional Epilepsy Center, Great Metropolitan Hospital “Bianchi-Melacrino-Morelli” of Reggio Calabria, 89124 Reggio Calabria, Italy
* Author to whom correspondence should be addressed.
Entropy 2022, 24(1), 102; https://doi.org/10.3390/e24010102
Submission received: 6 December 2021 / Revised: 1 January 2022 / Accepted: 4 January 2022 / Published: 9 January 2022

Abstract:
The differential diagnosis of epileptic seizures (ES) and psychogenic non-epileptic seizures (PNES) may be difficult, due to the lack of distinctive clinical features. The interictal electroencephalographic (EEG) signal may also be normal in patients with ES. Innovative diagnostic tools that exploit non-linear EEG analysis and deep learning (DL) could provide important support to physicians for clinical diagnosis. In this work, 18 patients with new-onset ES (12 males, 6 females) and 18 patients with video-recorded PNES (2 males, 16 females) with normal interictal EEG at visual inspection were enrolled. None of them was taking psychotropic drugs. A convolutional neural network (CNN) scheme using DL classification was designed to classify the two categories of subjects (ES vs. PNES). The proposed architecture performs an EEG time-frequency transformation and a classification step with a CNN. The CNN was able to classify the EEG recordings of subjects with ES vs. subjects with PNES with 94.4% accuracy. CNN provided high performance in the assigned binary classification when compared to standard learning algorithms (multi-layer perceptron, support vector machine, linear discriminant analysis and quadratic discriminant analysis). In order to interpret how the CNN achieved this performance, information theoretical analysis was carried out. Specifically, the permutation entropy (PE) of the feature maps was evaluated and compared in the two classes. The achieved results, although preliminary, encourage the use of these innovative techniques to support neurologists in early diagnoses.

1. Introduction

Epilepsy is a chronic neurological disorder that affects the nervous system. Over 65 million cases have been reported worldwide [1]. It is caused by altered neuronal activity in the brain and is characterized by recurrent and unpredictable episodes called epileptic seizures (ES). It is ordinarily diagnosed by a neurologist along with long interviews and clinical exams. Epilepsy may be due to a brain injury, or have a genetic, immune, brain structure or metabolic cause, but unfortunately for many patients, the cause remains unknown [2].
Electroencephalography (EEG) is a measure of the electrical activity produced by the human brain, recorded on the scalp. It is the most useful diagnostic tool for epilepsy [3], as it provides detailed information on the state of the brain with excellent temporal resolution. Therefore, EEG may be useful for discriminating between ES and psychogenic non-epileptic seizures (PNES), which, in contrast, are episodes of movements, sensations or behaviors that resemble epileptic seizures but do not have an epileptic origin and do not show any ictal epileptiform activity [4,5]. PNES typically begin in young adulthood, and these patients are frequently misdiagnosed with, and treated for, epilepsy. Correct diagnosis is important because of the potential iatrogenic hazards, such as side effects of anti-epileptic drugs and failure to recognize pseudo-status-epilepticus, with a potential outcome of intensive care unit treatment and intubation [6,7,8]. The differential diagnosis can be very complex, partly because interictal EEGs of subjects with ES may appear normal or non-specific on visual analysis, as usually occurs for PNES patients. On the contrary, the association of some features with a specific class of subjects could be a source of clear class determination for an artificial intelligence (AI) system. In fact, the literature [9,10,11] offers examples of EEG data processing with machine/deep learning (DL) with excellent results.
Furthermore, the use of these complex classification algorithms in critical areas has led to a growing interest in the interpretability of the results, i.e., in understanding how the networks behaved.
Permutation entropy (PE) is a tool for the analysis of time series that encodes important information about temporal dynamics in a simple, robust way and at low computational cost. PE can be used to understand complex and chaotic systems and to provide interpretability of the behavior of time series in classification with (deep) AI methods.
In the present study we analyzed resting state EEG recordings from subjects with ES and subjects with PNES by means of a DL pipeline including a wavelet transformation and a classification with a convolutional neural network (CNN). Experimental results proved that the proposed CNN outperformed all other standard classifiers, achieving an average accuracy rate of up to 94.4% in the classification of subjects with ES and PNES, respectively. To analyze the interpretability of the neural network, an information theoretical approach based on PE was proposed. To the best of our knowledge, this is the first application of PE as a measure of interpretability of deep neural networks.
The list of original contributions of the present study can be outlined as follows:
  • Development of a data-driven DL pipeline based on CNN and wavelet decomposition for interictal EEG discrimination of ES vs. PNES subjects;
  • Development of an information-theoretic approach based on PE to perform the interpretability analysis of DL models;
  • Development of a system with potential for clinical deployment in the real-world for early or difficult diagnosis.
The remainder of this paper is organized as follows: Section 2 presents the related literature review; Section 3 describes the experimental data, introduces the chosen methodology and itemizes the proposed CNN architecture; Section 4 illustrates the achieved experimental results of CNN and a comparison with standard classifiers; Section 5 and Section 6 address the discussion and conclusions, respectively.

2. Related Works

Several EEG-based classification algorithms have been employed to aid the diagnosis of different neurological conditions using state-of-the-art ML algorithms (i.e., LDA, SVM, ANNs) [9,10,11,12,13,14]. In particular, ML approaches for EEG analysis aimed at the early diagnosis of ES have attracted a lot of interest from the scientific community in recent years [12,15,16]. Rasheed et al. [12] provided an overview of the application of ML methods to predicting ES, and [17] exposed the major methodological issues of ES prediction. Varone et al. [15] compared different machine learning algorithms using the power spectral density of the EEG traces of healthy subjects and subjects with PNES, obtaining a considerable accuracy. Clarke et al. [16] report a deep learning algorithm for computer-assisted EEG review with a mean sensitivity above 95% and a corresponding mean false positive rate of 1 detection per minute. Ahmadi et al. [18] classified ES and PNES using the imperialist competitive algorithm on EEG data including ictal recordings, finding that spectral entropy and Rényi entropy were the most important EEG features, with good classification results using a support vector machine (SVM).
Furthermore, PE has been widely employed to directly analyze the temporal information contained in time series and the abnormalities of brain activity in patients with different neurological conditions [19,20]. Yan et al. [21] proposed a PE network-based algorithm able to estimate the complexity of the EEG signals of control subjects and epileptic patients; they found lower PE values in the EEG signals of epileptic patients than in controls. Li et al. [22] applied PE to the predictability analysis of absence seizures, successfully detecting the pre-seizure state in 169 out of 314 seizures from 28 rats, with an average anticipation time of 4.9 s. Morabito et al. [23] proposed multivariate multi-scale PE to distinguish the brain states of Alzheimer’s disease patients and mild cognitive impairment subjects from those of healthy controls; their proposal can be used as a complementary synthetic biomarker of the various effects of these diseases on the EEG. Mammone et al. [24] introduced permutation Rényi entropy (PEr) to discriminate interictal from ictal states in absence-seizure EEG. They demonstrated that PEr outperformed PE in their classification task, but still considered PE a helpful tool for disclosing other abnormalities of cerebral electrical activity not revealed by conventional EEG recordings [20]. PE has also been exploited successfully in other areas [25,26,27].

3. Materials and Methods

Interictal EEGs were collected from two groups of subjects: patients with new-onset, clinically diagnosed ES, and patients with video-EEG-diagnosed PNES. In particular, ES were diagnosed on a clinical basis only, while PNES were diagnosed with the support of video-EEG showing a typical episode (which occurred either spontaneously or following suggestion maneuvers) in the absence of epileptiform EEG activity. Inclusion criteria were: normal interictal EEG at visual inspection, and willingness to participate and give informed consent. The exclusion criterion was chronic use of psychotropic drugs at the time of recording. The EEGs analyzed in this work were collected at the Regional Epilepsy Centre, Great Metropolitan Hospital of Reggio Calabria, University of Catanzaro, Italy. Only EEG recordings free from artifacts were considered.

3.1. EEG Data Acquisition

EEGs were acquired by means of a Micromed Brain Quick system (Micromed SpA, Mogliano Veneto, Italy) with a sampling rate of 512 Hz, a high-pass filter at 0.5 Hz, a low-pass filter at 70 Hz, plus a 50 Hz notch filter with a slope of 12 dB/Oct. The EEG signals were acquired using a montage with the following channel layout: Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T3, T4, T5, T6, Fz, Cz, Pz, with the reference in G2 (located between electrodes Fz and Cz). The electrode–skin impedance values were kept below 5 kΩ. The EEG data were recorded in a resting condition for 20 min.
All the experiments were conducted in a silent and softly lit room, with the subject seated in a comfortable chair. The subject received information and instructions about the diagnostic setup [28]. After acquisition, EEGs were down-sampled to 256 Hz, segmented into 20 min long records, band-pass filtered between 0.5 Hz and 32 Hz, and stored in the American Standard Code for Information Interchange (ASCII) format for further processing. The EEG recordings were later visually reviewed by experts in order to remove the parts affected by artifacts.

3.2. EEG Data Processing

The flowchart of the proposed method is represented in Figure 1. It includes the following stages: (A) acquisition of the 19-channel EEG recording, EEG artifact reduction (based on the labels reported by clinicians during the visual inspection phase) and filtering (Butterworth, 3rd order); (B) partitioning of the EEG signals of a subject into ε non-overlapping epochs of 2 s (the window length was chosen empirically after several experimental tests using an iterative approach); (C) wavelet decomposition into details d1, d2, d3, d4, d5 and approximation a5, i.e., 6 sub-bands in total (chosen through experimental tests), on each channel of the εth EEG epoch under analysis (ε = 1, 2, …, 214).
The size of the εth epoch was 19 × 512 × 6. Each subject contributed 214 epochs, yielding a 4D matrix of 214 × 19 × 512 × 6 per subject. The overall dataset was therefore composed of 7704 × 19 × 512 × 6 ((#epochs × #subjects) × #channels × #samples × #sub-bands); (D) the dataset was used as the input of a CNN characterized by 2 convolutional layers, 2 max pooling layers and 2 fully connected layers, followed by a sigmoid layer, which performed the 2-way (ES vs. PNES) classification. Leave-one-out cross-validation (LOOCV) [29] was applied. Considering all the epochs of a subject’s EEG, the EEG (and consequently the subject) was assigned to the class with the highest percentage of epochs classified by the network as belonging to that class (either ES or PNES).
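For illustration, the epoch partitioning of step (B) can be sketched in plain NumPy; the function and variable names are our own, not the authors':

```python
import numpy as np

FS = 256      # sampling rate after down-sampling (Hz)
EPOCH_S = 2   # epoch length (s)
N_CH = 19     # EEG channels

def segment_epochs(eeg):
    """Split a (channels, samples) recording into non-overlapping
    2 s epochs of shape (n_epochs, channels, 512)."""
    win = FS * EPOCH_S
    n_epochs = eeg.shape[1] // win
    trimmed = eeg[:, :n_epochs * win]
    # (channels, n_epochs, win) -> (n_epochs, channels, win)
    return trimmed.reshape(N_CH, n_epochs, win).transpose(1, 0, 2)

# Example: 214 epochs of 2 s per subject
eeg = np.random.randn(N_CH, 214 * 512)
epochs = segment_epochs(eeg)
print(epochs.shape)  # (214, 19, 512)
```

Each epoch is then passed, channel by channel, to the wavelet decomposition of step (C).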

3.2.1. Wavelet Transform

Wavelet transform is a time–frequency domain method that allows multi-resolution analysis. It is based on a short wave of limited duration and energy which dilates and shifts along the signal, thereby computing the wavelet coefficients [30,31]. At first, the mother wavelet function (a reference wavelet) is shifted continuously along the time scale to obtain a set of coefficients in time. Subsequently, the wavelet is dilated to a different width and then normalized in order to estimate the corresponding group of coefficients [32]. There are three types of wavelet transformations: Discrete Wavelet Transform (DWT), Continuous Wavelet Transform (CWT) and Wavelet Packet Decomposition (WPD). Of these, the most widely used for EEG analysis in the context of epilepsy is DWT [33] because it captures transient features and accurately localizes them both in time and frequency content [34]. The DWT wavelet function can be written as:
Ψ_{j,k}(t) = 2^{j/2} Ψ(2^{j} t − k)
where k is the shift parameter and j is the resolution level; the greater the value of j, the smaller the frequency. The discrete wavelet decomposition coefficient W_{jk} can be written as:
W_{jk} = 2^{j/2} Σ_n x(n) Ψ(2^{j} n − k)
From the discrete wavelet decomposition coefficients W_{jk}, the original signal x(t) can be reconstructed:
x(t) = (1/C) Σ_{j,k} W_{jk} Ψ_{j,k}(t)
The wavelet decomposition of a signal can be carried out through filtering in a cascade of complementary low-pass and high-pass filters, as stated by the Mallat algorithm [35,36]. The detail coefficients c_n and the approximation coefficients a_n are computed by quadrature mirror filters, and the signal is reconstructed following the filter scheme known in the literature. A graphical representation of a DWT signal is shown in Figure 2.
The wavelet analysis (decomposition) of the EEG signal was performed through the wavedec [37] function implemented in MATLAB, which returns the wavelet decomposition of the 1-D signal x(t) of every ith channel (i = 1, 2, …, 19) of every εth epoch (ε = 1, 2, …, 214) of every mth patient (m = 1, 2, …, 36) at levels n = 1, 2, …, 5 of detail (dn) and level 5 of approximation (a5), using the Daubechies 4 (DB4) wavelet. Other mother wavelets can be used, as appropriate for experimental biosignals [38]. The output decomposition structure consists of the wavelet decomposition vector c and the bookkeeping vector l, which contains the number of coefficients per level. Subsequently, using the wrcoef [39] function, also provided in MATLAB, the sub-band signals were reconstructed from the wavelet decomposition structure [c,l] using the DB4 wavelet (Figure 3).
Therefore, the signal processing pipeline can be summarized in two blocks: wavelet decomposition (details d1, d2, d3, d4, d5 and approximation a5, for a total of 6 sub-bands) and reconstruction from the coefficient vectors. Figure 3 shows the reconstructed signal in the respective sub-bands.
The original signal x(t) can be reconstructed directly by the approximation signal plus the detail signals as specified in the following equation:
x(t) = a5 + d5 + d4 + d3 + d2 + d1
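Outside MATLAB, the same wavedec/wrcoef pipeline can be sketched with the PyWavelets package (our own sketch, not the authors' code), including a check of the reconstruction identity above:

```python
import numpy as np
import pywt

def wavelet_subbands(x, wavelet="db4", level=5):
    """Decompose a 1-D signal and reconstruct each sub-band
    (a5, d5, d4, d3, d2, d1) at full signal length, mirroring
    MATLAB's wavedec/wrcoef pair."""
    coeffs = pywt.wavedec(x, wavelet, level=level)  # [a5, d5, ..., d1]
    bands = []
    for i in range(len(coeffs)):
        # zero out every level except the i-th, then reconstruct
        mask = [np.zeros_like(c) for c in coeffs]
        mask[i] = coeffs[i]
        bands.append(pywt.waverec(mask, wavelet)[: len(x)])
    return np.stack(bands)  # shape: (6, len(x))

x = np.random.randn(512)       # one 2 s epoch of one channel
bands = wavelet_subbands(x)
# The sub-bands sum back to the original: x = a5 + d5 + d4 + d3 + d2 + d1
print(np.allclose(bands.sum(axis=0), x))  # True
```

Because the wavelet transform is linear, the per-level reconstructions necessarily sum to the original signal, which is exactly the identity stated in the equation above.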

3.2.2. Convolutional Neural Network

In recent decades, deep learning and deep neural networks have been regarded as powerful tools, as they are able to handle huge amounts of data. One of the most popular deep neural networks is the CNN. It has already surpassed the performance of classical pattern recognition methods in several fields and is expected to be used increasingly. The most important improvement of the CNN with respect to previous techniques is the reduction of the number of parameters compared with a classical artificial neural network (ANN). This allowed the analysis of complex and large data (such as biosignals and images) that was previously impossible to deal with. A CNN has multiple layers, including convolutional, non-linearity, pooling and fully connected layers [40]. The CNN includes two big building blocks: a feature extractor and an ANN block. The first one characterizes the CNN: it automatically extracts the features from the raw signal and consists of convolution, activation and pooling layers. The ANN is a fully connected multi-layer neural network widely applied in many neural-network-based classifiers (e.g., the MLP). The task of this block is to perform the classification exploiting the previously learned features.
In detail, the convolution layer performs the convolution operation formulated as:
Y_j = X_i ∗ K_j + B_j
The output Y_j is the feature map of each filter K_j convolved (∗) with a local region of X_i (called the receptive field), with the bias B_j added. The size of Y_j depends on the padding parameter and on the size and stride of the filters; the number of feature maps depends on the number of filters. Each filter moves along the input with a specific step size (sharing the same weights), estimating C feature maps (with C = number of filters). The convolution layer is followed by an activation layer, a nonlinear transfer function that can be a sigmoid, a hyperbolic tangent or a rectified linear unit; the last has been considered better in terms of generalization and learning time for CNNs by recent studies [41,42]. In the pooling layer, the extracted feature maps are downsampled through a max or average pooling operation: the filter scans the input feature map and calculates the maximum or the average of each sub-region being analyzed, returning a map of reduced size. The latter building block (the ANN) consists of one or more fully connected layers, whose output performs the discrimination task.

3.2.3. Proposed Architecture

The proposed CNN architecture includes 2 convolutional layers (+ReLU activation function), 2 max pooling layers, 2 fully connected layers and a sigmoid layer which performs the classification task (two-way: ES vs. PNES). The sizing of the network (number of levels, number and size of filters, etc.) was chosen empirically after several experimental tests using an iterative approach aimed at improving network performance and automatically extracting features. It is designed to accept a fixed-size matrix of c × s × f (where c = 19, the number of EEG channels; s = 512, the number of signal samples, 256 Hz × 2 s; f = 6, the number of wavelet decomposition sub-bands). The first convolutional layer (Conv1) has 16 learnable filters, each sized 1 × 6. Every filter convolves with each temporal input representation. It uses "SAME" padding and a stride of 1 × 2, so the layer generates outputs with the same first spatial dimension and half the second dimension of its inputs. Conv1 is followed first by the ReLU and then by the max pooling layer (MaxPool1), which reduces the feature map size from 19 × 256 × 16 to 19 × 128 × 16 using 1 × 2 filters with a stride of 1 × 2. The second convolutional layer (Conv2) has 32 learnable filters sized 1 × 3 with a stride of 1 × 2. Conv2 is likewise followed first by the ReLU and then by the max pooling layer (MaxPool2), which reduces the feature map size from 19 × 64 × 32 to 19 × 32 × 32 using 1 × 2 filters with a stride of 1 × 2. These levels extract the most relevant features automatically. The details on the number of parameters used are shown in Table 1.
The extracted features are flattened (Flatten layer) and used as the input of a Dense1 layer with 32 hidden neurons. This is followed by a Dropout layer (rate 0.3, in order to improve generalization and avoid overfitting), which separates it from Dense2 with 16 hidden neurons. The network ends with a sigmoid layer (Dense3) to estimate the class predictions in binary classification: if the network’s output is less than 0.5, the epoch is assigned to the PNES class (labeled 0); if the output is greater than 0.5, the epoch is assigned to the ES class (labeled 1). The proposed CNN was implemented in Python, using Keras [43] with the TensorFlow backend, and trained using the default parameters of the adaptive moment estimation (ADAM) optimizer [44] for 10 iterations with a batch size of 107, until the cross-entropy function converged.
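Under the layer sizes reported above, the network can be sketched in Keras as follows; the dense-layer activations and any settings not stated in the text are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(channels=19, samples=512, subbands=6):
    """Sketch of the described architecture; comments give the
    feature-map sizes stated in the text."""
    model = models.Sequential([
        layers.Input(shape=(channels, samples, subbands)),
        # Conv1: 16 filters of 1x6, SAME padding, stride 1x2 -> 19x256x16
        layers.Conv2D(16, (1, 6), strides=(1, 2), padding="same",
                      activation="relu"),
        layers.MaxPooling2D((1, 2), strides=(1, 2)),   # -> 19x128x16
        # Conv2: 32 filters of 1x3, SAME padding, stride 1x2 -> 19x64x32
        layers.Conv2D(32, (1, 3), strides=(1, 2), padding="same",
                      activation="relu"),
        layers.MaxPooling2D((1, 2), strides=(1, 2)),   # -> 19x32x32
        layers.Flatten(),
        layers.Dense(32, activation="relu"),           # Dense1
        layers.Dropout(0.3),
        layers.Dense(16, activation="relu"),           # Dense2
        layers.Dense(1, activation="sigmoid"),         # Dense3: ES=1, PNES=0
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_cnn()
print(model.output_shape)  # (None, 1)
```

Training would then call `model.fit(...)` with a batch size of 107 under the LOOCV scheme described in Section 3.3.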

3.3. Performance Metrics of Classification

A balanced dataset of 7704 EEG epochs (3852 related to the 18 ES subjects and 3852 related to the 18 PNES subjects) was used to test the proposed DL pipeline of EEG classification, which was composed of two main building blocks: the wavelet decomposition, which extracted 6 sub-bands from the raw EEG data, and the convolutional neural network, which performed the classification task. The aim was to carry out an overall patient-based classification on the basis of how each subject’s epochs were labeled by the trained network: given a subject, if the number of epochs labeled by the network as a specific class is larger than 50%, then the subject is assigned to that class. In order to better estimate the true prediction error, tune parameter estimation and prevent overfitting, a k-fold cross-validation resampling method was used. We chose a special case of k-fold cross-validation, called leave-one-out cross-validation (LOOCV), in which each subject serves, in turn, as the hold-out case for the test set. The available learning set is partitioned into 36 (the number of subjects) disjoint subsets of equal size [45,46]. Specifically, the CNN was iteratively trained on the whole dataset while leaving out the epochs of a single subject at a time. Therefore, 36 models were trained, and the left-out instances (i.e., the epochs of subject Sbj) represented the test set of the jth network. The classification performance was evaluated using the following standard metrics:
ACCURACY = (TP + TN) / (TP + TN + FP + FN)
PRECISION = TP / (TP + FP)
RECALL = TP / (TP + FN)
F-measure = (2 × PRECISION × RECALL) / (PRECISION + RECALL)
Cohen’s kappa = 2 × (TP × TN − FN × FP) / [(TP + FP)(FP + TN) + (TP + FN)(FN + TN)]
where TP, TN, FP and FN represent the true positives, true negatives, false positives and false negatives, respectively. Specifically, TP and TN are the numbers of PNES and ES subjects classified correctly, FP is the number of ES subjects incorrectly classified as PNES and FN is the number of PNES subjects misclassified as ES.
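As a sanity check, these five metrics can be computed directly from the confusion counts; the counts below are hypothetical and only illustrate the formulas (a 17-of-18-per-class outcome happens to reproduce a 94.4% accuracy):

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the five metrics defined above from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    kappa = (2 * (tp * tn - fn * fp) /
             ((tp + fp) * (fp + tn) + (tp + fn) * (fn + tn)))
    return accuracy, precision, recall, f_measure, kappa

# Hypothetical example: 17/18 PNES (TP) and 17/18 ES (TN) correct
acc, prec, rec, f1, kappa = classification_metrics(tp=17, tn=17, fp=1, fn=1)
print(round(acc, 3))  # 0.944
```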

3.4. Comparison with Standard Classifiers

The proposed CNN was also compared with standard classifiers to evaluate the specific efficiency of the DL approach. In detail, the multi-layer perceptron (MLP), support vector machine with a Gaussian radial basis function kernel (SVMrbf), linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) were trained on handcrafted features extracted from the wavelet sub-bands. Specifically, for each epoch of the mth patient (m = 1, 2, …, 36), six discriminative characteristics were calculated from each of the six EEG signal sub-bands: Min, Max, Energy, Mean, Std and Skewness. The last three features (Mean, Std and Skewness) were chosen in agreement with Gasparini et al. [47]. Additional information was obtained by including Min, Max and Energy, according to Hamad et al. [33].
The resulting handcrafted feature vectors included six features for each electrode of each of the six sub-bands, i.e., a feature vector of size 684 (6 × 6 × 19) for each εth epoch. The overall dataset, composed of 7704 × 684 (#epochs × #feature vector), was used as input for the standard classifiers to identify EEG patterns of ES and PNES. Six classifiers were trained and tested: MLP1 with 1 layer of 300 neurons; MLP2 with two layers of 300 and 50 neurons, respectively; MLP3 with three layers of 300, 100 and 50 neurons, respectively. All MLP classifiers ended with a softmax output layer to carry out the binary classification (ES vs. PNES), and each was trained using the default parameters of the adaptive moment estimation (ADAM) optimizer [44] for 10 iterations with a batch size of 214, until the cross-entropy function converged. The SVMrbf was trained using the Gaussian radial basis function kernel with a regularization parameter (gamma) of 0.001. The LDA was trained using the default “singular value decomposition (SVD)” solver without shrinkage. The QDA, at last, was trained using the default regularization hyperparameter, “reg_param”, of 0.0. The topology of the classifiers was chosen empirically after several experimental tests.
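A minimal sketch of this feature extraction (our own illustration on synthetic data; the function name and the use of SciPy's skewness are assumptions, as the paper's implementation is not shown):

```python
import numpy as np
from scipy.stats import skew

def handcrafted_features(epoch):
    """epoch: (19, 512, 6) array -> flat vector of 6*6*19 = 684 features
    (Min, Max, Energy, Mean, Std, Skewness per channel per sub-band)."""
    feats = []
    for band in range(epoch.shape[2]):
        x = epoch[:, :, band]                      # (19, 512)
        feats.append(np.stack([
            x.min(axis=1), x.max(axis=1),
            (x ** 2).sum(axis=1),                  # energy
            x.mean(axis=1), x.std(axis=1),
            skew(x, axis=1),
        ], axis=1))                                # (19, 6)
    return np.concatenate(feats).ravel()

epoch = np.random.randn(19, 512, 6)
print(handcrafted_features(epoch).shape)  # (684,)

# These vectors would then feed the standard classifiers, e.g.:
# from sklearn.svm import SVC
# clf = SVC(kernel="rbf", gamma=0.001)
```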

3.5. Permutation Entropy Based Interpretability of Proposed Architecture

The evaluation of the classification efficiency of deep neural networks is generally done by analyzing the inputs and the outputs. To perform an interpretability analysis of the behavior of the proposed CNN, we extracted the feature maps of the intermediate layers (Conv1, Conv2) and compared them with the input. The goal was to inspect the separability of the latent features in the intermediate transformations. From an information theoretical perspective, as the input vector passes through the successive stages of the CNN, it improves its discrimination capability by increasing the mutual information (MI) between the currently available vector/image and the corresponding label. This progressive modification of the MI corresponds to a reduction in uncertainty, measurable by suitable entropic parameters. Permutation entropy (PE) [48] was used here for the quantitative comparison. PE is a natural complexity measure that can be calculated for arbitrary real-world time series. PE computation is extremely fast and robust to noise and outliers. The PE of a signal x is defined as:
PE(n) = − Σ_π p(π) log2 p(π)
where the sum runs over all n! permutations π of order n [49]. Referring to π as the ordinal patterns of the time series, the relative frequency p(π) of each π is obtained by counting the number of times π occurs in the time series, divided by the total number of sequences. This is the information contained in comparing n consecutive values of the time series. In particular, PE was computed using AntroPy [49], a Python 3 package that provides several time-efficient algorithms for computing the complexity of time series.
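For illustration, the definition above can be implemented in plain NumPy; this mirrors what AntroPy's perm_entropy computes, with the pattern order and delay chosen here as assumptions (the paper does not report them):

```python
import numpy as np

def permutation_entropy(x, order=3, delay=1):
    """PE(n) = -sum_pi p(pi) log2 p(pi) over the ordinal patterns
    of `order` consecutive values of x."""
    x = np.asarray(x)
    n_patterns = len(x) - (order - 1) * delay
    # ordinal (rank) pattern of each window of `order` values
    patterns = np.array([
        tuple(np.argsort(x[i:i + order * delay:delay]))
        for i in range(n_patterns)
    ])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# A strictly increasing series has a single ordinal pattern, so PE = 0;
# a noisy series approaches the maximum log2(order!) = log2(6) bits.
pe_flat = permutation_entropy(np.arange(100))
pe_noise = permutation_entropy(np.random.randn(5000))
```

Low PE thus indicates a more regular (more predictable) signal, which is why it is used below to compare the latent features of the two classes.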

Statistical Test

The separability of the input data and of the latent features in the Conv1 and Conv2 layers in terms of PE was statistically quantified using the Wilcoxon rank-sum test. It tests the null hypothesis that two sets of measurements are drawn from the same distribution; the alternative hypothesis is that values in one sample are stochastically larger than values in the other [50]. The Wilcoxon rank-sum test was preferred to the two-sample t-test because it is much less sensitive to outliers [51]. Furthermore, to provide statistical support for the comparison of the proposed CNN architecture with the standard classifiers’ results, the post hoc Friedman–Nemenyi test was performed. In particular, the Friedman test is a non-parametric test used to determine whether there is a statistically significant difference between the accuracies of classifiers when the same subjects appear in each classifier [52,53]. If the Friedman test p-value was statistically significant, the post hoc Nemenyi test was executed to determine exactly which results differed [54,55].
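The two tests can be sketched with SciPy as follows; the PE values and accuracies below are synthetic placeholders, not the study's data:

```python
import numpy as np
from scipy.stats import ranksums, friedmanchisquare

rng = np.random.default_rng(0)

# Wilcoxon rank-sum on hypothetical PE values of the latent features
pe_es = rng.normal(0.80, 0.02, size=50)    # ES class
pe_pnes = rng.normal(0.75, 0.02, size=50)  # PNES class
stat_rs, p_rs = ranksums(pe_es, pe_pnes)
print(p_rs < 0.05)  # True: the two distributions differ

# Friedman test across classifier accuracies (one column per classifier,
# one row per cross-validation fold); synthetic values here
acc = rng.random((36, 3))
stat_fr, p_fr = friedmanchisquare(acc[:, 0], acc[:, 1], acc[:, 2])
# If p_fr were significant, a post hoc Nemenyi test (e.g., via the
# scikit-posthocs package) would locate the pairwise differences.
```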

4. Results

4.1. Performance of the Proposed Architecture and Comparisons with Standard Classifiers

The experimental data available consisted of 18 patients with ES (mean age 47.9 years, 12 males, 6 females) and 18 patients with PNES (mean age 27.3 years, 2 males, 16 females). The experimental data were balanced, and there is no evidence to suggest that age-related changes in EEG could have affected the results [56,57].
The proposed CNN achieved very good values in each test scenario, reporting accuracy rates up to 94.4% for the patient-based classification. Encouraging values of recall, precision, F-measure and Cohen’s kappa were observed. Table 2 outlines the values of the patient-based test results of the CNN in comparison with the MLP1, MLP2, MLP3, SVMrbf, LDA and QDA classifiers.
The good network performance was confirmed by the analysis of the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. As shown in Figure 4, the patient-based classification had an AUC of 0.99, indicating the excellent discrimination properties of the CNN.
Different training options were evaluated for the CNN, MLP1, MLP2 and MLP3, varying the learning rate α, the decay rate of the first moment β1 and the decay rate of the second moment β2. The best results were obtained with learning rate α = 10^−2, first moment decay rate β1 = 0.9 and second moment decay rate β2 = 0.999, in line with the practical recommendations reported in [44,58]. The SVM was evaluated using different kernels: polynomial, Gaussian, radial basis function (RBF), Laplace RBF and sigmoid. The best results were obtained using the RBF kernel with a regularization parameter (gamma) of 0.001. Furthermore, for the LDA, different solvers were evaluated: least squares (lsqr), eigenvalue decomposition (eigen) and the default singular value decomposition (svd) solver, which was chosen because it obtained the best accuracy. The best results for the QDA training were obtained using the default regularization hyperparameter, “reg_param”, of 0.0.
Average training and test times were recorded to evaluate the computational cost and the latency introduced by the classifiers, in order to evaluate possible use in the clinical diagnostic routine. The total processing time of the proposed CNN for patient-based classification was 1624 s (27 min). Of note, the total processing time included all the 36 iterations (same for all classifiers). Each iteration had average durations of 45 s and 2 s for training and testing, respectively. Therefore, despite the long training times, the time of inference introduced by the network is only 2 s.
The total processing time of the MLP1 for patient-based classification was 854 s (14 min). Each iteration had average durations of 19 s and 4 s for training and testing, respectively.
The total processing time of the MLP2 for patient-based classification was 661 s (11 min). Each iteration had average durations of 15 s and 3 s for training and testing, respectively.
The total processing time of the MLP3 for patient-based classification was 443 s (7 min). Each iteration had average durations of 5 s and 2 s for training and testing, respectively.
The total processing time of the SVMrbf for patient-based classification was 792 s (13 min). Each iteration had average durations of 12 s and 10 s for training and testing, respectively.
The total processing time of the LDA for patient-based classification was 47 s. Each iteration had average durations of 1.3 s and 0.004 s for training and testing, respectively.
The total processing time of the QDA for patient-based classification was 52 s. Each iteration had average durations of 1.4 s and 0.02 s for training and testing, respectively.
Of note, all measurements were obtained on an Intel(R) Xeon(R) CPU with an NVIDIA Tesla K40c GPU (12 GB of RAM).
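The per-iteration timings reported above can be collected with a simple wrapper around each train/test cycle of the leave-one-out loop. This is an illustrative sketch: `train_fn` and `test_fn` are placeholders standing in for the fitting and inference calls of any of the classifiers above.

```python
import time

def timed_iterations(train_fn, test_fn, n_iterations):
    """Record average training and test time over the LOOCV iterations."""
    train_times, test_times = [], []
    for _ in range(n_iterations):
        t0 = time.perf_counter()
        train_fn()                      # fit on the 35 training patients
        train_times.append(time.perf_counter() - t0)
        t0 = time.perf_counter()
        test_fn()                       # infer on the held-out patient
        test_times.append(time.perf_counter() - t0)
    total = sum(train_times) + sum(test_times)
    return (sum(train_times) / n_iterations,
            sum(test_times) / n_iterations,
            total)
```

The returned averages correspond to the per-iteration durations quoted above, and `total` to the overall processing time across all 36 iterations.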

4.2. Interpretability of the Proposed Deep Learning Model and Statistical Testing

Figure 5 shows the difference between the two classes in terms of PE in the successive levels of the proposed scheme described above.
We obtained the p-value corresponding to the Wilcoxon rank-sum test statistic. The results show no statistically significant differences in the PE of the input, whereas the differences became statistically significant deeper in the network: the separability between the two classes increased with depth. The average PE values of the latent features of each channel were then analyzed. Figure 6 shows the differences between the two classes. The input did not show a difference in PE values between the two classes; instead, a difference in PE among the channels was noted, in particular in frontal and parietal regions.
The features extracted from the EEG in the central and parietal areas of subjects with PNES have lower PE than those of subjects with ES in Conv1. More spatially distributed PE differences were found in the feature maps of Conv2.
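The PE comparison described above can be reproduced with a direct Bandt–Pompe implementation [48] and SciPy's rank-sum test [50]. This is a minimal sketch: the embedding order and delay are illustrative defaults, not necessarily the values used in the study, and the random signals merely stand in for the real feature-map activations of the two groups.

```python
import math
import numpy as np
from scipy.stats import ranksums

def perm_entropy(x, order=3, delay=1, normalize=True):
    """Bandt-Pompe permutation entropy of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (order - 1) * delay
    # ordinal pattern (rank order) of each embedded vector
    patterns = np.array([np.argsort(x[i:i + order * delay:delay])
                         for i in range(n)])
    _, counts = np.unique(patterns, axis=0, return_counts=True)
    p = counts / counts.sum()
    pe = -np.sum(p * np.log2(p))
    # normalized PE lies in [0, 1]: 0 for a monotone ramp, ~1 for noise
    return pe / math.log2(math.factorial(order)) if normalize else pe

# stand-ins for the per-epoch PE of the feature maps of the two groups
rng = np.random.default_rng(42)
pe_es = [perm_entropy(rng.normal(size=512)) for _ in range(20)]
pe_pnes = [perm_entropy(rng.normal(size=512)) for _ in range(20)]

# Wilcoxon rank-sum test between the two PE distributions
stat, p_value = ranksums(pe_es, pe_pnes)
```

In the study, this test is applied per layer (input, Conv1, Conv2), and a p-value below 0.05 marks the layers where the two classes become separable.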
To perform multiple comparisons between our proposed architecture and the standard classifiers considered in this study, we performed the Friedman test to detect statistical differences among the accuracy results, followed by the Bonferroni and Nemenyi post hoc tests to identify which algorithms differ significantly across the comparisons performed. Friedman's test yielded a p-value of 0.0002, below the 0.05 significance threshold; that is, there is sufficient evidence that the choice of classifier leads to statistically significant differences in accuracy. Subsequently, the Nemenyi post hoc test, which returns a p-value for each pairwise comparison of classifiers, was performed to determine exactly which differences are significant. At p < 0.05, only the proposed CNN obtained statistically significantly different accuracy from all the other classifiers. In detail: CNN vs. MLP1, p = 0.001; CNN vs. MLP2, p = 0.004; CNN vs. MLP3, p = 0.001; CNN vs. SVM-RBF, p = 0.04; CNN vs. LDA, p = 0.004; CNN vs. QDA, p = 0.02.
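This comparison protocol can be sketched with the tools cited in the references: SciPy provides the Friedman test [53], and the scikit-posthocs package provides the Nemenyi pairwise p-values [55] (shown only in a comment below, since it may not be installed). The accuracy matrix here is hypothetical, for illustration only.

```python
import numpy as np
from scipy.stats import friedmanchisquare

# rows: repeated measurements (e.g., CV folds); columns: the seven
# classifiers (CNN, MLP1, MLP2, MLP3, SVM-RBF, LDA, QDA).
# Hypothetical accuracy values for illustration only.
acc = np.array([[0.95, 0.40, 0.48, 0.45, 0.58, 0.55, 0.53],
                [0.92, 0.38, 0.46, 0.44, 0.60, 0.56, 0.54],
                [0.94, 0.42, 0.47, 0.43, 0.57, 0.55, 0.52],
                [0.96, 0.39, 0.49, 0.46, 0.59, 0.54, 0.56]])

# Friedman test across the seven classifiers (one sample per column)
stat, p = friedmanchisquare(*acc.T)
significant = p < 0.05

# If significant, the pairwise Nemenyi p-values can then be obtained with:
#   import scikit_posthocs as sp
#   sp.posthoc_nemenyi_friedman(acc)
```

With a consistently dominant column, as for the CNN above, the Friedman p-value falls well below 0.05 and the post hoc test isolates the pairs responsible.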

5. Discussion

In this paper, we presented a DL tool capable of discriminating with good reliability between two classes of subjects, those with ES and those with PNES. This discrimination is realized by a DL algorithm (CNN) analyzing latent features extracted from time-frequency sub-bands of noninvasive interictal scalp EEG recordings. The CNN was compared with the most widely used standard classifiers and showed better performance, proving able to exploit the strong non-linearity of the incoming data. Note that the proposed CNN performed the classification using the raw data and was the only model to obtain acceptable results, whereas the standard classifiers used the handcrafted features described in Section 3.4 and none of them achieved acceptable accuracy. MLP1, MLP2 and MLP3 obtained comparable results despite variations in the number of parameters and the depth of the network. Likewise, SVM-RBF, LDA and QDA showed no substantial improvement across the many experimental tests. This can be attributed to the loss of relevant information when the raw data are reduced to handcrafted features: manual feature extraction, in addition to being time consuming, cannot capture the latent information needed to classify the two classes. Therefore, although DL is more complex to design and more computationally intensive, as outlined in Section 4.1, it bypasses the time-consuming and expensive handcrafted feature extraction process and learns representations of the data with multiple levels of abstraction [59]. DL thus overcomes the limits of standard learning algorithms, allowing more detailed analyses thanks to the extraction of multidimensional latent features. However, DL often suffers from a lack of interpretability.
Keeping track of all the non-linear transformations and the large number of free parameters within a network is indeed very hard. The progress of neural networks has not been accompanied by a complete understanding of their highly non-linear transformations, so their adoption in critical sectors such as healthcare, where interpretation of the actions taken is required, has not increased greatly. A deeper understanding, and the appropriate level of trust it brings, will lead to increasing adoption of this technology in critical applications, such as the present challenging PNES vs. ES classification. The explainability of DL is indeed a powerful tool for detecting flaws in models and biases in the data; for verifying predictions; for improving models; and, finally, for gaining new insights into the problem at hand. It will become of fundamental importance for applications in the medical domain, where wrong decisions of a system can be very harmful. For these reasons, the interpretability of DL has attracted much interest from the data science community [60,61,62].
In the literature there are several promising results achieved by AI algorithms applied to neurological conditions [9,10,63,64]. Only two works have presented classification studies differentiating ES and PNES [18,65], both based on ML. The first study [18] classified 20 epilepsy and 20 PNES patients using the imperialist competitive algorithm for feature extraction, achieving an accuracy higher than 90%. However, the EEGs used included ictal periods, which likely accentuated the differences between the two groups. In a later study by the same group [65], the authors analyzed the interictal EEGs of five subjects with ES and five subjects with PNES, using feature extraction for automatic classification and functional brain network analysis. The accuracy was around 80% when the classification was based on the microstate features extracted from the beta band. This relatively low accuracy may have resulted from the small sample size and the complexity of the classification task, which supports the view that interictal EEG classification is still an open challenge.
With the aim of studying the behavior of the neural network, we analyzed the feature maps, i.e., the results of the latent transformations of the input. The improved data separability deeper in the network testifies to good learning of distinctive features: the convolutional block was able to identify them without the need for known biomarkers, through a purely data-driven approach. The study of PE throughout the depth of the proposed deep neural network allowed us to interpret the behavior of the model, without, however, explaining why the model behaves the way it does. PE differences were also found between the central and parietal areas, in agreement with other studies [66,67,68] claiming that these areas play an important role in liability for seizures in ES subjects.

6. Conclusions

In conclusion, our proposed pipeline showed very high accuracy for the fully automatic supervised classification of the two classes of analyzed subjects. Reviewing EEG data by means of DL algorithms would save time and resources, allowing a larger number of people to obtain reference-standard monitoring and providing standardized support for physicians in therapeutic settings. Nevertheless, the DL approach is not yet widely used in the clinic, despite the increasing number of published DL algorithms; our interpretability analysis through PE could increase confidence in these algorithms and foster regulatory approval for discriminating ES from PNES. To the best of our knowledge, this study is the first application of PE as a measure of the interpretability of DL networks. The pipeline is fast to apply and may be used in day-to-day hospital care during routine analyses, making early and effective diagnosis potentially available on a large scale. This should encourage the use of, and trust in, DL methods to support clinicians in future diagnostic applications.

Author Contributions

Conceptualization, M.L.G., C.I., N.M. and U.A.; Data curation, M.L.G., G.V., G.G.T., E.F., S.G., U.A. and F.C.M.; Formal analysis, M.L.G., C.I., N.M., S.G. and F.C.M.; Investigation, M.L.G., C.I., N.M. and U.A.; Methodology, M.L.G., G.V., C.I., N.M., E.F., S.G., U.A. and F.C.M.; Project administration, F.C.M.; Software, M.L.G., C.I. and N.M.; Supervision, U.A. and F.C.M.; Validation, M.L.G.; Visualization, M.L.G., C.I. and N.M.; Writing—original draft, M.L.G., C.I., N.M., S.G., U.A. and F.C.M.; Writing—review & editing, M.L.G., G.V., C.I., N.M., G.G.T., E.F., S.G., U.A. and F.C.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Patricia, O.S. About Epilepsy: The Basics. 2021. Available online: https://www.epilepsy.com/learn/about-epilepsy-basic (accessed on 13 May 2021).
  2. Gasparini, S.; Beghi, E.; Ferlazzo, E.; Beghi, M.; Belcastro, V.; Biermann, K.P.; Bottini, G.; Capovilla, G.; Cervellione, R.A.; Cianci, V.; et al. Management of psychogenic non-epileptic seizures: A multidisciplinary approach. Eur. J. Neurol. 2019, 26, 205-e15. [Google Scholar] [CrossRef]
  3. Noachtar, S.; Rémi, J. The role of EEG in epilepsy: A critical review. Epilepsy Behav. 2009, 15, 22–33. [Google Scholar] [CrossRef]
  4. Alsaadi, T.M.; Marquez, A.V. Psychogenic nonepileptic seizures. Am. Fam. Physician 2005, 72, 849–856. [Google Scholar]
  5. LaFrance, W.C., Jr.; Devinsky, O. Treatment of nonepileptic seizures. Epilepsy Behav. 2002, 3, 19–23. [Google Scholar] [CrossRef]
  6. Bodde, N.; Brooks, J.; Baker, G.; Boon, P.; Hendriksen, J.; Mulder, O.; Aldenkamp, A. Psychogenic non-epileptic seizures—Definition, etiology, treatment and prognostic issues: A critical review. Seizure 2009, 18, 543–553. [Google Scholar] [CrossRef] [Green Version]
  7. Moore, P.M.; Baker, G.A. Non-epileptic attack disorder: A psychological perspective. Seizure 1997, 6, 429–434. [Google Scholar] [CrossRef] [Green Version]
  8. Reuber, M.; Elger, C.E. Psychogenic nonepileptic seizures: Review and update. Epilepsy Behav. 2003, 4, 205–216. [Google Scholar] [CrossRef]
  9. Ieracitano, C.; Mammone, N.; Bramanti, A.; Hussain, A.; Morabito, F.C. A Convolutional Neural Network approach for classification of dementia stages based on 2D-spectral representation of EEG recordings. Neurocomputing 2019, 323, 96–107. [Google Scholar] [CrossRef]
  10. Mahmud, M.; Kaiser, M.S.; Hussain, A.; Vassanelli, S. Applications of Deep Learning and Reinforcement Learning to Biological Data. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 2063–2079. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  11. Fabietti, M.; Mahmud, M.; Lotfi, A.; Averna, A.; Guggenmos, D.; Nudo, R.; Chiappalone, M. Adaptation of Convolutional Neural Networks for Multi-Channel Artifact Detection in Chronically Recorded Local Field Potentials. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, Australia, 1–4 December 2020; pp. 1607–1613. [Google Scholar] [CrossRef]
  12. Rasheed, K.; Qayyum, A.; Qadir, J.; Sivathamboo, S.; Kwan, P.; Kuhlmann, L.; O’Brien, T.; Razi, A. Machine Learning for Predicting Epileptic Seizures Using EEG Signals: A Review. IEEE Rev. Biomed. Eng. 2021, 14, 139–155. [Google Scholar] [CrossRef]
  13. Roy, Y.; Banville, H.; Albuquerque, I.; Gramfort, A.; Falk, T.H.; Faubert, J. Deep learning-based electroencephalography analysis: A systematic review. J. Neural Eng. 2019, 16, 051001. [Google Scholar] [CrossRef]
  14. Li, G.; Lee, C.H.; Jung, J.J.; Youn, Y.C.; Camacho, D. Deep learning for EEG data analytics: A survey. Concurr. Comput. Pract. Exp. 2020, 32, e5199. [Google Scholar] [CrossRef]
  15. Varone, G.; Gasparini, S.; Ferlazzo, E.; Ascoli, M.; Tripodi, G.G.; Zucco, C.; Calabrese, B.; Cannataro, M.; Aguglia, U. A Comprehensive Machine-Learning-Based Software Pipeline to Classify EEG Signals: A Case Study on PNES vs. Control Subjects. Sensors 2020, 20, 1235. [Google Scholar] [CrossRef] [Green Version]
  16. Clarke, S.; Karoly, P.J.; Nurse, E.; Seneviratne, U.; Taylor, J.; Knight-Sadler, R.; Kerr, R.; Moore, B.; Hennessy, P.; Mendis, D.; et al. Computer-assisted EEG diagnostic review for idiopathic generalized epilepsy. Epilepsy Behav. 2021, 121, 106556. [Google Scholar] [CrossRef]
  17. Mormann, F.; Andrzejak, R.G.; Elger, C.E.; Lehnertz, K. Seizure prediction: The long and winding road. Brain 2007, 130, 314–333. [Google Scholar] [CrossRef] [Green Version]
  18. Ahmadi, N.; Carrette, E.; Aldenkamp, A.P.; Pechenizkiy, M. Finding Predictive EEG Complexity Features for Classification of Epileptic and Psychogenic Nonepileptic Seizures Using Imperialist Competitive Algorithm. In Proceedings of the 2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS), Karlstad, Sweden, 18–21 June 2018; pp. 164–169. [Google Scholar] [CrossRef] [Green Version]
  19. Zanin, M.; Zunino, L.; Rosso, O.A.; Papo, D. Permutation entropy and its main biomedical and econophysics applications: A review. Entropy 2012, 14, 1553–1577. [Google Scholar] [CrossRef]
  20. Ferlazzo, E.; Mammone, N.; Cianci, V.; Gasparini, S.; Gambardella, A.; Labate, A.; Latella, M.A.; Sofia, V.; Elia, M.; Morabito, F.C.; et al. Permutation entropy of scalp EEG: A tool to investigate epilepsies: Suggestions from absence epilepsies. Clin. Neurophysiol. 2014, 125, 13–20. [Google Scholar] [CrossRef] [PubMed]
  21. Yan, B.; He, S.; Sun, K. Design of a Network Permutation Entropy and Its Applications for Chaotic Time Series and EEG Signals. Entropy 2019, 21, 849. [Google Scholar] [CrossRef] [Green Version]
  22. Li, X.; Ouyang, G.; Richards, D.A. Predictability analysis of absence seizures with permutation entropy. Epilepsy Res. 2007, 77, 70–74. [Google Scholar] [CrossRef]
  23. Morabito, F.C.; Labate, D.; La Foresta, F.; Bramanti, A.; Morabito, G.; Palamara, I. Multivariate Multi-Scale Permutation Entropy for Complexity Analysis of Alzheimer’s Disease EEG. Entropy 2012, 14, 1186–1202. [Google Scholar] [CrossRef] [Green Version]
  24. Mammone, N.; Duun-Henriksen, J.; Kjaer, T.W.; Morabito, F.C. Differentiating interictal and ictal states in childhood absence epilepsy through permutation Rényi entropy. Entropy 2015, 17, 4627–4643. [Google Scholar] [CrossRef] [Green Version]
  25. Zheng, J.; Dong, Z.; Pan, H.; Ni, Q.; Liu, T.; Zhang, J. Composite multi-scale weighted permutation entropy and extreme learning machine based intelligent fault diagnosis for rolling bearing. Measurement 2019, 143, 69–80. [Google Scholar] [CrossRef]
  26. Kuai, M.; Cheng, G.; Pang, Y.; Li, Y. Research of planetary gear fault diagnosis based on permutation entropy of CEEMDAN and ANFIS. Sensors 2018, 18, 782. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  27. Tian, Z.; Li, S.; Wang, Y. A prediction approach using ensemble empirical mode decomposition-permutation entropy and regularized extreme learning machine for short-term wind speed. Wind Energy 2020, 23, 177–206. [Google Scholar] [CrossRef]
  28. Babiloni, C.; Barry, R.J.; Başar, E.; Blinowska, K.J.; Cichocki, A.; Drinkenburg, W.H.; Klimesch, W.; Knight, R.T.; da Silva, F.L.; Nunez, P.; et al. International Federation of Clinical Neurophysiology (IFCN)–EEG research workgroup: Recommendations on frequency and topographic analysis of resting state EEG rhythms. Part 1: Applications in clinical research studies. Clin. Neurophysiol. 2020, 131, 285–307. [Google Scholar] [CrossRef]
  29. Sammut, C.; Webb, G.I. (Eds.) Leave-One-Out Cross-Validation. In Encyclopedia of Machine Learning; Springer: Boston, MA, USA, 2010; pp. 600–601. [Google Scholar] [CrossRef]
  30. Vetterli, M.; Herley, C. Wavelets and filter banks: Theory and design. IEEE Trans. Signal Process. 1992, 40, 2207–2232. [Google Scholar] [CrossRef] [Green Version]
  31. Dai, Y. The time–frequency analysis approach of electric noise based on the wavelet transform. Solid-State Electron. 2000, 44, 2147–2153. [Google Scholar] [CrossRef]
  32. Acharya, U.R.; Vinitha Sree, S.; Swapna, G.; Martis, R.J.; Suri, J.S. Automated EEG analysis of epilepsy: A review. Knowl.-Based Syst. 2013, 45, 147–165. [Google Scholar] [CrossRef]
  33. Jahankhani, P.; Kodogiannis, V.; Revett, K. EEG signal classification using wavelet feature extraction and neural networks. In Proceedings of the IEEE John Vincent Atanasoff 2006 International Symposium on Modern Computing (JVA’06), Sofia, Bulgaria, 3–6 October 2006; pp. 120–124. [Google Scholar]
  34. Sadati, N.; Mohseni, H.R.; Maghsoudi, A. Epileptic seizure detection using neural fuzzy networks. In Proceedings of the 2006 IEEE International Conference on Fuzzy Systems, Vancouver, BC, Canada, 16–21 July 2006; pp. 596–600. [Google Scholar]
  35. Daubechies, I. The Wavelet Transform, Time-Frequency Localization and Signal Analysis; Princeton University Press: Princeton, NJ, USA, 2009. [Google Scholar]
  36. Guo, H.; Burrus, C.S. Convolution using the undecimated discrete wavelet transform. In Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, Atlanta, GA, USA, 9 May 1996; Volume 3, pp. 1291–1294. [Google Scholar]
  37. MathWorks. Wavedec—1-D Wavelet Decomposition—MATLAB. Available online: https://it.mathworks.com/help/wavelet/ref/wavedec.html (accessed on 29 October 2021).
  38. Greco, A.; Costantino, D.; Morabito, F.; Versaci, M. A Morlet wavelet classification technique for ICA filtered SEMG experimental data. In Proceedings of the International Joint Conference on Neural Networks, Portland, OR, USA, 20–24 July 2003; Volume 1, pp. 166–171. [Google Scholar]
  39. MathWorks. Wrcoef—Reconstruct Single Branch from 1-D Wavelet Coefficients—MATLAB. Available online: https://it.mathworks.com/help/wavelet/ref/wrcoef.html (accessed on 27 October 2021).
  40. Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a convolutional neural network. In Proceedings of the 2017 International Conference on Engineering and Technology (ICET), Antalya, Turkey, 21–23 August 2017; pp. 1–6. [Google Scholar]
  41. Zeiler, M.; Ranzato, M.; Monga, R.; Mao, M.; Yang, K.; Le, Q.; Nguyen, P.; Senior, A.; Vanhoucke, V.; Dean, J.; et al. On rectified linear units for speech processing. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, 26–31 May 2013; pp. 3517–3521. [Google Scholar] [CrossRef] [Green Version]
  42. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
  43. Chollet, F. Keras. 2015. Available online: https://keras.io (accessed on 3 September 2021).
  44. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  45. Vehtari, A.; Gelman, A.; Gabry, J. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. 2017, 27, 1413–1432. [Google Scholar] [CrossRef] [Green Version]
  46. Berrar, D. Cross-Validation. In Reference Module in Life Sciences; Elsevier: Amsterdam, The Netherlands, 2019. [Google Scholar]
  47. Gasparini, S.; Campolo, M.; Ieracitano, C.; Mammone, N.; Ferlazzo, E.; Sueri, C.; Tripodi, G.G.; Aguglia, U.; Morabito, F.C. Information theoretic-based interpretation of a deep neural network approach in diagnosing psychogenic non-epileptic seizures. Entropy 2018, 20, 43. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  48. Bandt, C.; Pompe, B. Permutation entropy: A natural complexity measure for time series. Phys. Rev. Lett. 2002, 88, 174102. [Google Scholar] [CrossRef] [PubMed]
  49. Antropy. The Permutation Entropy. Available online: https://raphaelvallat.com/antropy/build/html/generated/antropy.perm_entropy.html#antropy.perm_entropy (accessed on 5 November 2021).
  50. Scipy. The Wilcoxon Rank-Sum Test. Available online: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ranksums.html (accessed on 5 November 2021).
  51. Wild, C.; Seber, G. The Wilcoxon rank-sum test. In Chance Encounters: A First Course in Data Analysis and Inference; Wiley: New York, NY, USA, 2011; p. 611. [Google Scholar]
  52. Friedman, M. A comparison of alternative tests of significance for the problem of m rankings. Ann. Math. Stat. 1940, 11, 86–92. [Google Scholar] [CrossRef]
  53. Scipy. Friedman Test for Repeated Measurements. Available online: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.friedmanchisquare.html (accessed on 6 November 2021).
  54. Nemenyi, P.B. Distribution-Free Multiple Comparisons; Princeton University: Princeton, NJ, USA, 1963. [Google Scholar]
  55. Scikit. Nemenyi Post Hoc Test. Available online: https://scikit-posthocs.readthedocs.io/en/latest/generated/scikit_posthocs.posthoc_nemenyi_friedman/ (accessed on 6 November 2021).
  56. Kutlubaev, M.A.; Xu, Y.; Hackett, M.L.; Stone, J. Dual diagnosis of epilepsy and psychogenic nonepileptic seizures: Systematic review and meta-analysis of frequency, correlates, and outcomes. Epilepsy Behav. 2018, 89, 70–78. [Google Scholar] [CrossRef] [Green Version]
  57. Glosser, G.; Roberts, D.; Glosser, D.S. Nonepileptic seizures after resective epilepsy surgery. Epilepsia 1999, 40, 1750–1754. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  58. Bengio, Y. Practical recommendations for gradient-based training of deep architectures. In Neural Networks: Tricks of the Trade; Springer: Berlin/Heidelberg, Germany, 2012; pp. 437–478. [Google Scholar]
  59. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  60. Došilović, F.K.; Brčić, M.; Hlupić, N. Explainable artificial intelligence: A survey. In Proceedings of the 2018 41st International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 21–25 May 2018; pp. 0210–0215. [Google Scholar] [CrossRef]
  61. Gunning, D.; Stefik, M.; Choi, J.; Miller, T.; Stumpf, S.; Yang, G.Z. XAI—Explainable artificial intelligence. Sci. Robot. 2019, 4. [Google Scholar] [CrossRef] [Green Version]
  62. Ieracitano, C.; Mammone, N.; Hussain, A.; Morabito, F.C. A novel explainable machine learning approach for EEG-based brain–computer interface systems. Neural Comput. Appl. 2021, 1–14. [Google Scholar] [CrossRef]
  63. Fürbass, F.; Kural, M.A.; Gritsch, G.; Hartmann, M.; Kluge, T.; Beniczky, S. An artificial intelligence-based EEG algorithm for detection of epileptiform EEG discharges: Validation against the diagnostic gold standard. Clin. Neurophysiol. 2020, 131, 1174–1179. [Google Scholar] [CrossRef]
  64. Raghavendra, U.; Acharya, U.R.; Adeli, H. Artificial intelligence techniques for automated diagnosis of neurological disorders. Eur. Neurol. 2019, 82, 41–64. [Google Scholar] [CrossRef]
  65. Ahmadi, N.; Pei, Y.; Carrette, E.; Aldenkamp, A.P.; Pechenizkiy, M. EEG-based classification of epilepsy and PNES: EEG microstate and functional brain network features. Brain Inform. 2020, 7, 1–22. [Google Scholar] [CrossRef]
  66. Meppelink, A.M.; Pareés, I.; Beudel, M.; Little, S.; Yogarajah, M.; Sisodiya, S.; Edwards, M.J. Spectral power changes prior to psychogenic non-epileptic seizures: A pilot study. J. Neurol. Neurosurg. Psychiatry 2017, 88, 190–192. [Google Scholar] [CrossRef] [PubMed]
  67. Arıkan, K.; Öksüz, Ö.; Metin, B.; Günver, G.; Laçin Çetin, H.; Esmeray, T.; Tarhan, N. Quantitative EEG findings in patients with psychogenic nonepileptic seizures. Clin. EEG Neurosci. 2021, 52, 175–180. [Google Scholar] [CrossRef] [PubMed]
  68. Richardson, M.P. Large scale brain models of epilepsy: Dynamics meets connectomics. J. Neurol. Neurosurg. Psychiatry 2012, 83, 1238–1248. [Google Scholar] [CrossRef] [Green Version]
Figure 1. The EEG was recorded and stored on a computer. The EEG was then cleaned of artifacts, filtered and split into ε non-overlapping epochs of 2 s each. For every EEG epoch ε (ε = 1, 2, …, 214), the wavelet decomposition of each channel was estimated and, subsequently, the sub-band reconstruction was performed. The resulting database was the input of a convolutional neural network that includes 2 convolutional layers (+ReLU activation), 2 max pooling layers, 2 fully connected layers separated by a dropout layer, and a sigmoid layer that performs the classification task (two ways: ES vs. PNES).
Figure 2. Wavelet transform decomposition tree of the EEG signal, based on the Mallat algorithm.
Figure 3. Vectors of detail and approximation coefficients based on the DB4 wavelet decomposition structure [c,l] of a sample EEG signal.
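The [c,l] structure in Figure 3 mirrors MATLAB's wavedec output [37]; in Python, the equivalent multilevel decomposition is provided by the PyWavelets package. Since that package may not be available, the sketch below instead illustrates the Mallat tree of Figure 2 with one recursive filter-and-downsample step, using the Haar filter pair for brevity (the study uses db4, whose filters are longer but follow the same scheme).

```python
import numpy as np

def haar_step(x):
    """One level of the Mallat algorithm: filter, then downsample by 2."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass branch (cA)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass branch (cD)
    return approx, detail

def wavedec_haar(x, level):
    """Multilevel decomposition: recurse on the approximation branch only."""
    details = []
    for _ in range(level):
        x, d = haar_step(x)
        details.append(d)
    # ordering [cA_n, cD_n, ..., cD_1], analogous to the [c,l] structure
    return [x] + details[::-1]
```

Because the Haar pair is orthonormal, the decomposition preserves signal energy, and each detail vector corresponds to one sub-band of the tree in Figure 2.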
Figure 4. ROC curves of the CNN, MLP2, SVM-RBF, LDA and QDA classifiers for ES vs. PNES classification. Features automatically extracted from the sub-bands were used as input for the CNN, and handcrafted features manually extracted from the sub-bands were used as inputs for MLP2, SVM-RBF, LDA and QDA. MLP1 and MLP3 show trends similar to MLP2, which, however, had higher accuracy; for better visual clarity, they were not included in the graph.
Figure 5. The box plots describe the permutation entropy distributions for the classes of subjects with PNES and ES, referred to the input layer (1st and 2nd box plots), the Conv1 layer (3rd and 4th) and the Conv2 layer (5th and 6th). It is worth noting that the PE of the input did not allow significant discrimination between the two classes. Instead, statistically significant differences (p-value < 0.05) were found between the PE of PNES and ES subjects in Conv1 and Conv2.
Figure 6. The topo-plots describe the permutation entropy distributions over the channels for the classes of subjects with PNES and ES, referred to the input layer (left), the Conv1 layer (middle) and the Conv2 layer (right). For the input, there are no significant differences between the two classes. Instead, differences in frontal and parietal regions were found between the PE of PNES and ES subjects in Conv1, and differences in all channels in Conv2.
Table 1. Total number of learnable parameters for the proposed CNN architecture, which includes 2 convolutional layers (+ReLU), 2 max pooling layers, 2 fully connected layers, one dropout layer and a sigmoid layer that performs the classification task.
Layer Name    Output Shape      Parameters
Input         19 × 512 × 6      —
Conv1         19 × 256 × 16     592
MaxPool1      19 × 128 × 16     —
Conv2         19 × 64 × 32      1,568
MaxPool2      19 × 32 × 32      —
Flatten       19,456            —
Dense1        32                622,624
Dropout       32                —
Dense2        16                528
Dense3        1                 17
Total                           625,329
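The parameter counts in Table 1 are mutually consistent and can be re-derived from the layer shapes. The sketch below does this arithmetic under the assumption, inferred from the counts themselves rather than stated in the source, of a 1 × 6 kernel with 16 filters in Conv1 and a 1 × 3 kernel with 32 filters in Conv2.

```python
# Conv parameters: filters * (kernel_h * kernel_w * in_channels + 1 bias)
conv1 = 16 * (1 * 6 * 6 + 1)     # 592  (6 input sub-band channels)
conv2 = 32 * (1 * 3 * 16 + 1)    # 1568

# Dense parameters: out * in + out biases; Flatten = 19 * 32 * 32 = 19456
dense1 = 32 * 19456 + 32         # 622,624
dense2 = 16 * 32 + 16            # 528
dense3 = 1 * 16 + 1              # 17

total = conv1 + conv2 + dense1 + dense2 + dense3
print(total)                     # 625329, matching Table 1
```

Pooling, flatten and dropout layers contribute no learnable parameters, which is why their rows in Table 1 are empty.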
Table 2. ES and PNES subject classification performances (accuracy, precision, recall, F-measure, Cohen’s kappa) evaluated with patient-based classification for the proposed CNN and the standard classifiers.
ES vs. PNES
Classifier    Accuracy    Precision    Recall    F-Measure    Cohen's Kappa
CNN           94.4%       89.9%        100%      94.7%        88.8%
MLP1          38.9%       41.7%        55.6%     47.6%        −22.2%
MLP2          47.2%       47.4%        50.0%     48.6%        −5.6%
MLP3          44.4%       44.4%        44.4%     44.4%        −11.1%
SVM           58.3%       57.1%        66.7%     61.5%        16.7%
LDA           55.6%       54.5%        66.7%     60.0%        11.1%
QDA           55.6%       54.5%        66.7%     60.0%        11.1%
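The CNN row of Table 2 is consistent, up to rounding, with a confusion matrix of 18 true positives, 0 false negatives, 16 true negatives and 2 false positives over the 36 patients. That matrix is inferred here for illustration, not reported in the source; the metrics then follow from the standard definitions.

```python
# Inferred confusion matrix for the CNN (ES taken as the positive class)
tp, fn, tn, fp = 18, 0, 16, 2
n = tp + fn + tn + fp                       # 36 patients

accuracy = (tp + tn) / n                    # 0.944
precision = tp / (tp + fp)                  # 0.900 (reported as 89.9%)
recall = tp / (tp + fn)                     # 1.000
f_measure = 2 * precision * recall / (precision + recall)   # 0.947

# Cohen's kappa: observed agreement corrected for chance agreement
p_o = accuracy
p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
kappa = (p_o - p_e) / (1 - p_e)             # 0.889
```

The same formulas applied to the other rows explain the negative kappa values of the MLPs: their agreement with the true labels falls below chance level.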
