Article

Weighted Hybrid Feature Reduction Embedded with Ensemble Learning for Speech Data of Parkinson’s Disease

1 Faculty of Information Technology, College of Computer Science, Beijing University of Technology, Beijing 100124, China
2 College of Mechanical Engineering and Applied Electronics Technologies, Beijing University of Technology, Beijing 100124, China
3 Swedish College of Engineering and Technology, Rahim Yar Khan 64200, Pakistan
4 Department of Electrical Engineering, Foundation University Islamabad, Islamabad 44000, Pakistan
5 Department of Electrical Engineering, College of Engineering, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
6 Computer Sciences Program, Turabah University College, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(24), 3172; https://doi.org/10.3390/math9243172
Submission received: 7 October 2021 / Revised: 6 December 2021 / Accepted: 7 December 2021 / Published: 9 December 2021

Abstract
Parkinson’s disease (PD) is a progressive, long-term neurodegenerative disorder of the central nervous system. Studies report that about 90% of PD subjects have voice impairments, which are among the vital characteristics of PD patients and have been widely used for diagnostic purposes. However, the curse of dimensionality, high aliasing, redundancy, and small sample size in PD speech data pose great challenges for classifying PD subjects. Feature reduction can efficiently mitigate these issues. However, existing feature reduction algorithms ignore high aliasing, noise, and the stability of algorithms, and thus fail to deliver substantial classification accuracy. To mitigate these problems, this study proposes a weighted hybrid feature reduction embedded with ensemble learning technique, which comprises (1) a hybrid feature reduction technique that increases inter-class variance, reduces intra-class variance, preserves the neighborhood structure of data, and removes correlated features that cause high aliasing and noise in classification; (2) a weighted-boosting method to train the model precisely; and (3) a bagging strategy that enhances the stability of the algorithm. The experiments were performed on three different datasets, including two widely used datasets and a dataset provided by Southwest Hospital (Army Military Medical University), Chongqing, China. The experimental results indicate that, compared with existing feature reduction methods, the proposed algorithm always shows the highest accuracy, precision, recall, and G-mean for speech data of PD. Moreover, the proposed algorithm not only shows excellent classification performance but also handles imbalanced data precisely, achieving the highest AUC in most of the cases. In addition, compared with state-of-the-art algorithms, the proposed method shows an improvement of up to 4.53%. In the future, this algorithm can be used for early and differential diagnoses, which are rated as challenging tasks.

1. Introduction

The use of machine learning techniques to control diseases is becoming popular nowadays [1,2,3]. Parkinson’s disease damages the nerve cells that are responsible for body movement [4]. As a symptom of Parkinson’s disease, speech plays an informative role in its pathogenesis, and the convenience of voice acquisition makes remote monitoring of Parkinson’s disease possible. However, speech datasets often have noise and high-aliasing characteristics, which complicates their processing. How to extract efficient representational features from Parkinson’s speech datasets has therefore received much attention from researchers. Dimensionality reduction is a technique wherein some of the features are removed from the original high-dimensional data space in such a way that the new lower-dimensional space can still effectively represent the original data. Dimensionality reduction procedures are mainly divided into feature selection and feature transformation [5].
In feature selection, a low-dimensional subset of the features is searched from the original high-dimensional data space such that the selected subset of features can efficiently describe the original data. Feature selection methods are divided into three groups: wrapper, filter, and embedded [6]. Some typical feature selection methods are Relief, SVM-RFE (support vector machines recursive feature elimination), mRMR (minimum redundancy maximum relevancy), p-value, and LASSO (least absolute shrinkage and selection operator). Feature selection methods are widely used for dimensionality reduction of PD datasets. Rovini E et al. [7] used the p-value to select a subset of features. Sakar C and Kursun O [8] developed a hybrid feature selection method by combining a feature selection algorithm with an SVM-based classifier, achieving an accuracy of 92%. Peker M et al. [9] chose minimum redundancy maximum relevance for feature selection and fed the selected attributes to a CVANN (complex-valued artificial neural network). Benba A et al. [10] used a threshold for selecting the subset of features defined by the MDVP (multi-dimensional voice program); the selected features were then fed to KNN (k-nearest neighbors) and SVM classifiers to distinguish between pathological and normal voices. Shirvan R et al. [11] combined genetic algorithms and KNN for feature selection. The key benefit of feature selection is that it retains only those features that are useful for classification. However, feature selection methods are unable to generate new high-quality dimensions. In addition, feature selection-based algorithms lose some information by eliminating features from the dataset [12].
Feature transformation maps high-dimensional data to a low-dimensional space without eliminating features and retains the samples’ information as much as possible [13]. Some typical representatives of feature extraction methods are LDA (linear discriminant analysis), PCA (principal component analysis), LPP (locality preserving projection), LPDP (locality preserving discriminant projection), and LDPP (local discriminant preservation projection). Chen H et al. [14] proposed a method combining a fuzzy KNN approach with PCA for PD classification, achieving an accuracy of 96%. Hariharan M et al. [15] developed an algorithm using LDA and PCA for the recognition of PD subjects. Although PCA and LDA show good performance, they are not reliable for real-world problems, most of which are non-linear with complex tendencies. Both PCA and LDA are linear feature extraction methods that assume the data lie in a linear subspace of the high-dimensional space; as a consequence, they are unable to classify non-linear datasets appropriately [16]. To remove the non-linearity problem, kernel forms of PCA and LDA were developed, where KPCA (kernelized principal component analysis) and KLDA (kernelized linear discriminant analysis) are typical non-linear feature extraction methods. However, during the process of data mapping, it is quite difficult to find an appropriate kernel function. Manifold learning is another form of feature extraction that is more adaptable and does not have the limitation of choosing an appropriate kernel function [17]. LPP is a particular example of manifold learning. LPP optimally preserves the neighborhood structure of the data; its objective function minimizes the distance between data points that have a neighborhood relationship in the data space [18]. LPP also has some deficiencies: it is not only quite sensitive to the number of neighborhood samples but also suffers from the small-sample-size problem. To deal with these shortcomings, some improved versions of LPP were developed, but they still have issues. Most LPP-based approaches pay more attention to the variance between classes without considering the large variance within classes. There is also a problem of instability while mapping the high-dimensional data: it has been observed that partitioning a dataset with a small sample size introduces great randomness, due to the difference in data distribution between the test and training sets [19].
In recent studies, researchers have tried to build hybrid systems that combine the benefits of feature selection and feature extraction for the automatic assessment of Parkinson’s disease. Uzer M et al. [20] combined the effects of principal component analysis (PCA) and sequential forward selection (SFS) with an artificial neural network classifier to develop a hybrid system. A. Ul Haq et al. [21] developed a hybrid system using ant-colony optimization algorithms and the Relief filtering method. They used a support vector machine (SVM) as the classifier with the K-fold method for cross-validation.
From the above studies, it can be seen that the methods of PD speech feature reduction fall into three categories: feature selection, feature extraction, and feature selection combined with feature extraction (FSFE) or feature extraction combined with feature selection (FEFS). Feature selection and feature extraction have distinct strengths: the former focuses on the relevance (interpretability) of features to PD, while the latter focuses on retaining PD data information. The FSFE or FEFS method combines the advantages of FS and FE and, compared with using a feature selection or feature extraction algorithm alone, can often achieve better classification results.
Besides high-quality features, improving the remote diagnosis of PD also requires the design of a good classification strategy. Ensemble learning is an effective strategy for improving classification performance and has been widely used in various fields of bioinformatics, including the diagnosis of PD. Kadam V et al. [22] selected attributes using a genetic algorithm with 10-fold CV SVM and built a bagging ensemble of polynomial-kernel SVM classifiers using bootstrap aggregating. Abuhasel K et al. [23] developed a hybrid algorithm that uses NEWFM as a base classifier and integrates it with standard adaptive boosting to improve the diagnostic accuracy for PD. Li Y et al. [24] proposed an algorithm comprising decision-tree-based instance selection and ensemble learning: the CART algorithm was used for selecting the optimal speech samples, ensemble learning combined with random forest (RF), ELM (extreme learning machine), and SVM was used for training on the optimized training samples, and the trained method was finally applied to the test samples. Lauraitis A et al. [25] investigated speech impairments in CNDS (central nervous system disorders) patients. The dataset used for the experiments was collected by a neural impairment test suite mobile app. Three domains of feature extraction methods, including auditory spectrograms, the cepstrum domain, and WST (wavelet time scattering, analytic Gabor), were used in that study. For classification, BiLSTM (a bidirectional recurrent neural network (RNN) with long short-term memory) and a support vector machine (SVM) with a polynomial kernel were used. They achieved an accuracy of 96.3% with WST-SVM and 94.50% with BiLSTM. Guimarães M et al. [26] developed an algorithm to detect Huntington’s disease from voice recordings of patients reading Lithuanian poems. They estimated twelve new signal feature extractors with openSMILE (open-source media interpretation by large feature-space extraction) and integrated them with KNN (k-nearest neighbors), SVM, MLP (multilayer perceptron), LDA, and QDA (quadratic discriminant analysis) models. Zhang H et al. [27] optimized the samples using the MENN (multi-edit nearest-neighbor) algorithm and applied a decorrelated neural network (DENN) ensemble to train those samples; lastly, the trained model was applied to the test samples. However, another combination, hybrid feature learning with hybrid ensemble learning, is often overlooked by scholars. Therefore, if the ensemble learning model is designed as a combination of weighted-boosting and bagging models and used together with hybrid feature learning, it can significantly improve detection performance.
To the best of our knowledge, there is no public report on hybrid feature learning embedded with an ensemble-learning approach for remote monitoring of PD. To develop a convenient and precise remote monitoring system for the identification of PD subjects, this paper designs a weighted hybrid feature reduction model embedded with ensemble learning. The proposed algorithm first builds a hybrid feature learning method by integrating feature extraction and feature selection techniques; it then uses a weighted-boosting technique to train the base classifier effectively; the above operations are repeated to obtain a series of base classifiers; finally, bagging is used to combine the outputs of all base classifiers into the final classifier. The primary contributions and innovations of this paper are as follows:
We propose a hybrid system that effectively integrates the benefits of feature extraction and feature selection. The proposed algorithm precisely reduces the intra-class variance, increases the inter-class variance, preserves the neighborhood structure of the data, and concurrently eliminates noisy features, which helps to reduce the high-aliasing characteristics of PD speech data.
Furthermore, to improve the performance of PD remote diagnosis, the proposed method builds and applies the projection matrix through weighted-boosting and bagging strategies, which not only train the model precisely but also enhance its stability.

2. Material and Methods

2.1. Data

In this paper, three representative datasets, the Parkinson Speech Dataset with Multiple Types of Sound Recordings (PSDMTSR), a self-collected dataset (named SelfData), and PARKINSON, were used to validate the effectiveness of the proposed method.
The PSDMTSR dataset contains 20 healthy people and 20 PD patients. Among the healthy people, half of them were male and the average age of healthy people was 62.55 years. Among the PD patients, 6 of them were female and the rest were male patients, and the average age of PD patients was 64.86 years. More information can be found in [28].
The PARKINSON dataset contains 8 healthy people and 23 PD patients. Among the healthy people, 5 were female and 3 were male with an average age of 58 and 64 years respectively. Among the PD patients, 7 were female and 16 were male patients with an average age of 68.71 and 67.38 years respectively. More information can be found in [29].
SelfData was collected by Army Medical University, Chongqing, China. The dataset contains 21 (9 female, 12 male) healthy people (patients after medication) and 10 PD patients (5 female, 5 male) without medication. For each participant, 13 data samples were recorded, and each sample consisted of 26 features, yielding a design matrix of 1170 × 26. More information can be found in [30].
A brief overview of these datasets is given in Table 1.
For SelfData, patients indicate the patients before treatment, and healthy people represent the patients after treatment.

2.2. The Proposed Method

The objective function of the proposed method minimizes the trace of the local within-class scatter matrix and maximizes the trace of the between-class scatter matrix while preserving the locality of the samples. Moreover, the proposed algorithm ranks the features and selects the top-ranked ones, performs weighted-boosting to characterize the wrongly categorized samples, and finally applies ensemble mapping to construct the final output using weight coefficients. To tune the parameters for the best performance, hold-out cross-validation is used to validate the method.
In this study, the data matrix is represented as $X = [x_1, x_2, x_3, \ldots, x_N]^T = [X_1, X_2, X_3, \ldots, X_C]^T \in \mathbb{R}^{N \times F}$, where $N = \sum_{i=1}^{C} N_i$ denotes the number of data samples and $N_i$ is the number of data samples in class $X_i$, $F$ represents the number of dimensions of dataset $X$, $C$ denotes the number of classes in dataset $X$, and $y = [y_1, y_2, y_3, \ldots, y_N]^T \in \mathbb{R}^N$ represents the labels of the data samples. $P = (p_1, p_2, p_3, \ldots, p_f) \in \mathbb{R}^{F \times f}$ is the projection matrix used for mapping the high-dimensional data from $\mathbb{R}^{N \times F}$ to a low-dimensional space $\mathbb{R}^{N \times f}$, where $F > f$.
To make data samples with the same class label as close as possible, this algorithm reduces the intra-class variance matrix as follows:
$$\min_P \sum_{c=1}^{C} \sum_{x^{(c)} \in X_w^c} \left\| P^T x^{(c)} - P^T \overline{x_w^{(c)}} \right\|^2 = \min_P \sum_{c=1}^{C} \sum_{x^{(c)} \in X_w^c} P^T \left( x^{(c)} - \overline{x_w^{(c)}} \right) \left( x^{(c)} - \overline{x_w^{(c)}} \right)^T P = \min_P P^T S_W P \quad (1)$$
where $S_W = \sum_{c=1}^{C} \sum_{x^{(c)} \in X_w^c} \left( x^{(c)} - \overline{x_w^{(c)}} \right) \left( x^{(c)} - \overline{x_w^{(c)}} \right)^T$ is the intra-class variance matrix, $\overline{x_w^{(c)}} = \frac{1}{N_c} \sum_{i=1}^{N_c} x_i^{(c)}$ is the center of the $c$-th class, and $x_i^{(c)}$ is the $i$-th sample of class $c$.
Similarly, to push the centers of different classes as far apart as possible, the algorithm maximizes the inter-class variance:
$$\max_P \sum_{c=1}^{C} \left\| P^T \overline{x_w^{(c)}} - P^T \overline{X} \right\|^2 = \max_P \sum_{c=1}^{C} P^T \left( \overline{x_w^{(c)}} - \overline{X} \right) \left( \overline{x_w^{(c)}} - \overline{X} \right)^T P = \max_P P^T S_B P \quad (2)$$
where $S_B = \sum_{c=1}^{C} \left( \overline{x_w^{(c)}} - \overline{X} \right) \left( \overline{x_w^{(c)}} - \overline{X} \right)^T$ is the inter-class variance matrix, $\overline{X} = \frac{1}{N} \sum_{i=1}^{N} x_i$ is the center of $X$, and $x_i$ is the $i$-th sample in $X$.
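As a concrete illustration, the sketch below computes $S_W$ and $S_B$ exactly as defined above. This is a minimal NumPy rendering with samples stored as rows; the function name and array layout are our own, not the paper's.

```python
import numpy as np

def scatter_matrices(X, y):
    """Intra-class (S_W) and inter-class (S_B) scatter matrices.
    X is an (N, F) data matrix with samples as rows; y holds the class labels."""
    overall_mean = X.mean(axis=0)                 # center of the whole dataset
    F = X.shape[1]
    S_W = np.zeros((F, F))
    S_B = np.zeros((F, F))
    for c in np.unique(y):
        Xc = X[y == c]                            # samples of class c
        class_mean = Xc.mean(axis=0)              # center of class c
        D = Xc - class_mean
        S_W += D.T @ D                            # within-class outer products
        d = (class_mean - overall_mean)[:, None]
        S_B += d @ d.T                            # between-class outer product
    return S_W, S_B
```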
Preserving the neighborhood relationship means that the sample structure is preserved after mapping, which can be described as
$$\sum_{c=1}^{C} \sum_{i=1}^{N_c} \sum_{j=1}^{N} A_{ij}^{c} \left\| P^T x_i^{(c)} - P^T x_j \right\|^2 = P^T X_{\mathrm{train}} \left( D - A \right) X_{\mathrm{train}}^T P = P^T X_{\mathrm{train}} Z X_{\mathrm{train}}^T P, \qquad x_i^{(c)}, x_j \in X_{\mathrm{train}} \quad (3)$$
where $Z = D - A$ is a Laplacian matrix, $D_{ii}^c = \sum_j A_{ij}^c$ is a diagonal matrix, and $A_{ij}^c$ is an affinity matrix that can be calculated in two different ways:
$$\text{Heat kernel:}\quad A_{ij}^c = \begin{cases} e^{-\frac{\left\| x_i - x_j \right\|^2}{t}}, & \text{if } x_i^c \in N_k(x_j) \text{ or } x_j \in N_k(x_i^c) \\ 0, & \text{otherwise} \end{cases} \quad (4)$$
$$\text{Simple-minded:}\quad A_{ij}^c = \begin{cases} 1, & \text{if } x_i^c \in N_k(x_j) \text{ or } x_j \in N_k(x_i^c) \\ 0, & \text{otherwise} \end{cases} \quad (5)$$
where $N_k(x)$ is the set of the $k$ samples nearest to sample $x$, and $t$ is the heat kernel parameter.
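The affinity matrix and the Laplacian $Z = D - A$ can be assembled as in the sketch below; it follows the heat-kernel definition in Equation (4) (setting the weight to 1 yields the simple-minded variant of Equation (5)), with the neighborhood test symmetrized over both directions. The brute-force distance computation is our own simplification.

```python
import numpy as np

def affinity_and_laplacian(X, k=5, t=1.0, heat_kernel=True):
    """Heat-kernel (or simple-minded) affinity matrix A and Laplacian Z = D - A."""
    N = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise ||x_i - x_j||^2
    nn = np.argsort(sq, axis=1)[:, 1:k + 1]              # k nearest neighbors, self excluded
    A = np.zeros((N, N))
    for i in range(N):
        for j in nn[i]:                                  # neighbor in either direction
            w = np.exp(-sq[i, j] / t) if heat_kernel else 1.0
            A[i, j] = A[j, i] = w
    D = np.diag(A.sum(axis=1))                           # degree matrix D_ii = sum_j A_ij
    return A, D - A                                      # Z = D - A
```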
Therefore, the objective function of the proposed algorithm, which combines the three terms above, can be expressed as follows:
$$\min_P \operatorname{Tr} \left[ \frac{P^T S_W P}{P^T \left( \mu S_B - \gamma \, X_{\mathrm{train}} Z X_{\mathrm{train}}^T \right) P} \right] \quad (6)$$
This equation is equivalent to the Lagrange function
$$L(P, \lambda) = P^T \left( S_W - \lambda \left[ \mu S_B - \gamma \left( X_{\mathrm{train}} Z X_{\mathrm{train}}^T \right) \right] \right) P \quad (7)$$
Taking the derivative with respect to P
$$\frac{\partial L(P, \lambda)}{\partial P} = 0 \;\Rightarrow\; 2 S_W P - 2 \lambda \left[ \mu S_B - \gamma \left( X_{\mathrm{train}} Z X_{\mathrm{train}}^T \right) \right] P = 0 \;\Rightarrow\; S_W P = \lambda \left[ \mu S_B - \gamma \left( X_{\mathrm{train}} Z X_{\mathrm{train}}^T \right) \right] P \quad (8)$$
so the columns of $P$ are obtained by solving the generalized eigenvalue problem $\left[ \mu S_B - \gamma \left( X_{\mathrm{train}} Z X_{\mathrm{train}}^T \right) \right]^{-1} S_W P = \lambda P$.
where the penalty factors µ and γ trade off the manifold preservation term against the local discrimination term and can be tuned via a simple substitution strategy.
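A minimal sketch of this step follows: it forms the constraint matrix $\mu S_B - \gamma X_{\mathrm{train}}^T Z X_{\mathrm{train}}$ (with samples as rows) and solves the generalized eigenproblem of Equation (8) for the $f$ smallest eigenvalues. The small ridge term added to keep the matrix invertible on small-sample-size data is our own assumption, not part of the paper.

```python
import numpy as np

def projection_matrix(S_W, S_B, Z, X_train, mu=1.0, gamma=1.0, f=10):
    """Solve S_W p = lambda * (mu*S_B - gamma*X^T Z X) p and return P (F x f)."""
    M = mu * S_B - gamma * (X_train.T @ Z @ X_train)
    M += 1e-6 * np.eye(M.shape[0])          # ridge: keep M invertible (our assumption)
    vals, vecs = np.linalg.eig(np.linalg.solve(M, S_W))
    order = np.argsort(vals.real)           # smallest lambda: least within-class scatter
    return np.real(vecs[:, order[:f]])      # columns of P are the chosen eigenvectors
```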
After mapping the data with $P$, the algorithm filters noisy and redundant features. The proposed algorithm ranks each feature according to its weight, which approximates the probability that the feature can separate two classes among local neighbors. The objective function for feature selection can be described as follows:
$$W(f_i) = P(\text{different value of } f_i \mid \text{different class}) - P(\text{different value of } f_i \mid \text{same class}) \quad (9)$$
$$W(f_i) = \sum_{c \neq \mathrm{class}(x)} \left[ \frac{P(c)}{1 - P[\mathrm{class}(x)]} \sum_{j=1}^{k} \frac{\mathrm{diff}\left[ f_i, x, M_j(x) \right]}{mk} \right] - \sum_{j=1}^{k} \frac{\mathrm{diff}\left[ f_i, x, H_j(x) \right]}{mk} \quad (10)$$
where $\mathrm{diff}[f_i, x, M_j(x)]$ represents the difference between data sample $x$ and $M_j(x)$ on feature $f_i$, $H_j(x)$ is the $j$-th nearest neighbor of $x$ from the same class (a hit), and $M_j(x)$ is the $j$-th nearest neighbor from an opposite class (a miss). Initially, the weight of each feature is set to 0. Then a sample $x$ is randomly taken from the training set $X_{\mathrm{train}}$, its $k$ nearest hits are found among samples of the same class and its $k$ nearest misses among samples of each different class, and the weight of each feature is updated as in Equation (11). Finally, the top $f$ ($f < F$) features of $X_{\mathrm{train}}$ with the highest weights $W(f_i)$, $i = 1, \ldots, F$, are selected for training the model.
$$W(f_i) \leftarrow W(f_i) + \sum_{c \neq \mathrm{class}(x)} \left[ \frac{P(c)}{1 - P[\mathrm{class}(x)]} \sum_{j=1}^{k} \frac{\mathrm{diff}\left[ f_i, x, M_j(x) \right]}{mk} \right] - \sum_{j=1}^{k} \frac{\mathrm{diff}\left[ f_i, x, H_j(x) \right]}{mk} \quad (11)$$
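The update rule can be sketched as a standard Relief-style loop consistent with Equations (9)-(11); the number of sampled instances $m$ and the neighborhood size $k$ are assumed parameters here.

```python
import numpy as np

def relief_weights(X, y, k=5, m=50, seed=0):
    """Relief-style feature weights: near misses raise W(f_i), near hits lower it."""
    rng = np.random.default_rng(seed)
    N, F = X.shape
    W = np.zeros(F)                                  # initialize W(f_i) = 0
    priors = {c: np.mean(y == c) for c in np.unique(y)}
    for _ in range(m):                               # m randomly sampled instances
        i = rng.integers(N)
        x, cx = X[i], y[i]
        for c in np.unique(y):
            mask = (y == c)
            mask[i] = False                          # never pick the sample itself
            idx = np.where(mask)[0]
            near = idx[np.argsort(((X[idx] - x) ** 2).sum(1))[:k]]
            diff = np.abs(X[near] - x).sum(0) / (m * k)
            if c == cx:
                W -= diff                            # k nearest hits (same class)
            else:
                W += priors[c] / (1 - priors[cx]) * diff  # misses, prior-weighted
    return W                                         # keep the top f features by W
```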
In order to train the model effectively, the proposed algorithm contains a weighted-boosting technique that is cross-validated on the validation dataset $X_V$. This technique gives higher weights to misclassified samples and lower weights to correctly classified samples, keeping the sum of all weights equal to 1.
The objective function of AdaBoost is as follows
$$\mathrm{sign} \left[ \sum_{s=1}^{S} \rho_s h_s(x) \right] \quad (12)$$
where $S$ is the number of samples and $\rho_s = \frac{1}{2} \ln \left( \frac{1 - \epsilon_s}{\epsilon_s} \right)$ is the weight of the classifier $h_s(x)$, with $\epsilon_s$ its weighted error. Initially, equal weights $\omega(x_s, y_s) = \frac{1}{S}$, $s = 1, \ldots, S$, $y_s \in [-1, 1]$, are assigned to each training example. The learning algorithm uses the training dataset to generate a base learner $h_s(x_s): x_s \rightarrow y_s$; $h_s(x_s)$ is then validated on the validation dataset $X_V$, and the weights of correctly classified examples are decreased while those of wrongly classified examples are amplified, in such a way that the sum of all example weights remains equal to 1.
The weights for each example can be updated as follows
$$\omega_s(x_s, y_s) = \frac{\omega_{s-1}(x_s, y_s) \, e^{-\rho_s y_s h_s(x_s)}}{R_s} \quad (13)$$
where $R_s$ is a normalization factor that ensures the sum of all weights is equal to 1.
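One boosting round can be written compactly as below; the clipping of $\epsilon_s$ away from 0 and 1 is our own numerical safeguard. The final boosted prediction is then $\mathrm{sign}[\sum_s \rho_s h_s(x)]$ as in Equation (12).

```python
import numpy as np

def boosting_round(weights, y_true, y_pred):
    """One weighted-boosting update for +/-1 labels, as in Equations (12) and (13):
    misclassified samples gain weight and the weights are renormalized to sum to 1."""
    eps = weights[y_true != y_pred].sum()         # weighted error of this round
    eps = np.clip(eps, 1e-10, 1 - 1e-10)          # numerical safeguard (our addition)
    rho = 0.5 * np.log((1 - eps) / eps)           # classifier weight rho_s
    new_w = weights * np.exp(-rho * y_true * y_pred)
    return rho, new_w / new_w.sum()               # division plays the role of R_s
```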
Finally, the proposed algorithm is run on $n$ different randomly sampled subsets $X_{\mathrm{train}}^n$ of the training data $X_R$, so $n$ different classifiers ($\mathrm{classifier}_1, \ldots, \mathrm{classifier}_n$) are obtained. Therefore, the final classifier can be obtained as follows:
$$\mathrm{Classifier}_{\mathrm{Final}} = \beta_1 \mathrm{classifier}_1 + \beta_2 \mathrm{classifier}_2 + \cdots + \beta_n \mathrm{classifier}_n = \sum_{i=1}^{n} \beta_i \, \mathrm{classifier}_i \quad (14)$$
where $\beta$ is a weight coefficient obtained by Bayesian fusion, in which the sample class is determined by the maximum posterior probability. Specifically, $n$ classifiers are obtained from the $n$ training datasets, and $n$ prediction results are obtained on the validation dataset. The final probability matrix is obtained by joining the predictions of the $n$ classifiers. The weight coefficient $\beta$ can be calculated as
$$\beta_i = \frac{N_i}{N_{\mathrm{valid}}}, \quad i = 1, 2, \ldots, n \quad (15)$$
where $N_i$ is the number of times the $i$-th classifier is nominated as the prediction result, and $N_{\mathrm{valid}}$ is the number of samples in the validation dataset.
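The paper does not spell out how a classifier is "nominated", so the sketch below adopts one plausible reading: for each validation sample, the classifier with the maximum posterior probability for its predicted class is nominated, and $N_i$ counts the nominations of classifier $i$. Both function names are our own.

```python
import numpy as np

def fusion_weights(proba):
    """beta_i = N_i / N_valid. `proba` is an (n, N_valid) array holding each
    classifier's posterior probability for its own predicted class."""
    nominated = proba.argmax(axis=0)                    # winning classifier per sample
    N_i = np.bincount(nominated, minlength=proba.shape[0])
    return N_i / proba.shape[1]                         # beta, summing to 1

def final_classifier(betas, base_outputs):
    """Classifier_Final = sum_i beta_i * classifier_i, thresholded for +/-1 labels."""
    return np.sign(betas @ base_outputs)                # base_outputs: (n, N_test)
```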
Based on the above description, the complete process of the proposed algorithm is as follows. The dataset $X$ is divided into three parts: training $X_R$, test $X_T$, and validation $X_V$. The training data $X_R$ is further randomly subdivided into $n$ parts ($n$ is the number of networks; for each part, 80% of the whole training data is chosen randomly). Each of the $X_t^n$ training datasets is processed separately. After that, the proposed algorithm is used for feature reduction on each $X_t^n$ training dataset, and $X_V$ is used to validate each classifier. Lastly, the final output is obtained by integrating all independent networks. The flowchart and pseudo-code for the proposed method can be seen below.
Figure 1 shows the flowchart of the proposed algorithm, and Figure 2 shows the weighted-boosting process. The details of the pseudo-code of the proposed algorithm are given in Algorithm 1 below.
Algorithm 1. The pseudo-code of the proposed algorithm
Input: Data set X, labels Y
Output: Final prediction labels Classifier_Final
Begin
    1: Randomly divide dataset X into training data XR, test data XT, and validation data XV.
    2: Randomly select training subsets (r_s · X_R) into X_t^1, X_t^2, X_t^3, …, X_t^n, where n is the number of subspaces and r_s = 0.8
    3: for i = 1 to n (where n is the number of stacks or networks)
        Function = Hybrid-Feature-Reduction(X_t^n)
            Feature extraction
               Calculate S_W (intra-class variance matrix)
               Calculate S_B (inter-class variance matrix)
               Calculate A (affinity matrix)
               Calculate Z (Laplacian matrix)
               Solve for P_n (the projection matrix)
         Feature selection
             Initialize weight W(f_i) = 0 for each feature
             Calculate diff[f_i, x, H_j(x)] (difference between sample x and a neighbor sample of the same class)
             Calculate diff[f_i, x, M_j(x)] (difference between sample x and a neighbor sample of a different class)
             Calculate W(f_i)
             Select the top f features, where f < F
           Weighted-boosting
               Assign equal weights ω(x_s, y_s) = 1/S, s = 1, …, S, y_s ∈ [−1, 1], to each training sample
                 Train model (Model_n)
                 Validate (Model_n)
                 Find misclassified samples
                 Calculate ρ_s = (1/2) ln((1 − ε_s)/ε_s) (classifier weight)
                 Update weights ω_s(x_s, y_s)
                 Apply weighted boosting: sign[Σ_{s=1}^{S} ρ_s h_s(x_s)]
          end for
    4: Ensemble mapping
         Calculate β_i = N_i / N_valid, (i = 1, 2, …, n) (weight coefficient)
         Calculate Classifier_Final = Σ_{i=1}^{n} β_i · classifier_i (predicted output)
End

2.3. Classifiers

The experiments were performed using three types of classifiers: RF (random forest), SVM, and ELM (extreme learning machine). The random forest classifier uses multiple trees for training and prediction and has been used for the classification of PD speech data [31,32]. SVM finds the hyperplane in the sample space that maximizes the classification margin between different classes by using support vectors, and it has been used in various fields including PD speech data classification [33,34]. ELM is a feed-forward network consisting of a single hidden layer, which has also been used as a classifier in PD speech classification [35,36].
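RF and SVM are available in standard libraries, but ELM is less common; a minimal ELM consistent with the description above (random hidden layer, least-squares output weights) can be sketched as follows. The tanh activation and the ±1 label convention are our own assumptions; 5000 hidden neurons are used in the experiments (Section 2.5).

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine for binary +/-1 classification."""
    def __init__(self, n_hidden=5000, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        F = X.shape[1]
        self.W = self.rng.standard_normal((F, self.n_hidden))  # random input weights
        self.b = self.rng.standard_normal(self.n_hidden)       # random biases
        H = np.tanh(X @ self.W + self.b)                        # hidden-layer outputs
        self.beta = np.linalg.pinv(H) @ y                       # least-squares solution
        return self

    def predict(self, X):
        return np.sign(np.tanh(X @ self.W + self.b) @ self.beta)
```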

2.4. Evaluation Criteria

In this study, five model evaluation criteria are used: precision (Pre), recall (Rec), accuracy (Acc), G-mean, and area under the ROC curve (AUC). These criteria are built from a confusion matrix that records the correctly classified and misclassified samples of each class. G-mean is used to evaluate the results on imbalanced data samples, while AUC, the numerical value of the area under the ROC curve, evaluates the overall performance of the classifiers. In this study, the classification of PD is a two-class problem, and the structure of the confusion matrix can be seen in Table 2.
The evaluation criteria can be constructed as follows
$$\mathrm{Accuracy\ (Acc\,\%)} = \frac{TP + TN}{TP + FP + FN + TN} \times 100 \quad (16)$$
$$\mathrm{Precision\ (Pre)} = \frac{TP}{TP + FP} \quad (17)$$
$$\mathrm{Recall\ (Rec)} = \frac{TP}{TP + FN} \quad (18)$$
$$\mathrm{G\text{-}mean} = \sqrt{\mathrm{Rec} \times \mathrm{Spe}} = \sqrt{\frac{TP}{TP + FN} \times \frac{TN}{FP + TN}} \quad (19)$$
where $\mathrm{Spe}$ is specificity.
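For completeness, the sketch below computes all four criteria from a pair of ±1 label vectors, following Equations (16)-(19).

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Acc, Pre, Rec, and G-mean from the confusion matrix of a two-class problem."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == -1) & (y_pred == -1))
    fp = np.sum((y_true == -1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == -1))
    acc = 100.0 * (tp + tn) / (tp + fp + fn + tn)
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)                      # sensitivity
    spe = tn / (fp + tn)                      # specificity
    return acc, pre, rec, np.sqrt(rec * spe)  # G-mean balances both classes
```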

2.5. Experimental Environment

Windows 7 (64-bit) with 8 GB of memory was used for the experiments, and the algorithm was executed in MATLAB 2018a. Multiple experiments were carried out to evaluate the performance of the proposed PD diagnosis system. All experiments were performed under the same experimental conditions; each experiment was repeated 10 times and the average was recorded to mitigate the effect of randomness. To achieve the optimal performance of the model, the parameters were tuned according to Table 3. Moreover, 300 trees were used for the random forest (RF), and 5000 hidden neurons were used for the ELM.
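The tuning loop itself is not specified beyond the grids of Table 3; a plain grid search such as the sketch below would realize it, where train_fn and score_fn are hypothetical stand-ins for the full training pipeline and its validation accuracy.

```python
import numpy as np
from itertools import product

def tune(train_fn, score_fn, X_tr, y_tr, X_v, y_v):
    """Grid search over the penalty factors of Table 3 (10^-4 ... 10^4) and the
    reduced dimension f, keeping the setting with the best validation score."""
    grid = 10.0 ** np.arange(-4, 5)
    best, best_score = None, -np.inf
    for gamma, lam, mu, f in product(grid, grid, grid, range(5, 26, 5)):
        model = train_fn(X_tr, y_tr, gamma=gamma, lam=lam, mu=mu, f=f)
        score = score_fn(model, X_v, y_v)
        if score > best_score:
            best, best_score = (gamma, lam, mu, f), score
    return best
```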

3. Results

In this section, the results of the proposed algorithm are analyzed and compared with feature selection, feature transformation, and some state-of-the-art algorithms that have been extensively used for PD diagnosis. For validation purposes, hold-out cross-validation is used, in which the dataset is randomly and equally divided into training (1/3), validation (1/3), and test (1/3) sets. Since each subject contributes multiple samples to the dataset, this validation process is designed to avoid data overlapping between the sets.
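A grouped split of this kind can be sketched as follows; that subjects (rather than individual samples) are assigned to the three sets is our reading of the overlap-avoidance remark above.

```python
import numpy as np

def holdout_split(subject_ids, seed=0):
    """Split subjects into equal train/validation/test thirds so that no
    subject's recordings appear in more than one set."""
    rng = np.random.default_rng(seed)
    subjects = rng.permutation(np.unique(subject_ids))
    parts = np.array_split(subjects, 3)                   # three equal groups of subjects
    train, valid, test = (np.where(np.isin(subject_ids, p))[0] for p in parts)
    return train, valid, test
```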

3.1. Comparison with Feature Extraction Methods

In this section, the results of the proposed method and various feature extraction methods are compared. These feature extraction methods involve LDA [4,15,37], PCA [31,32,38,39], KPCA [37], LPP [37,40] (simple-minded and heat kernel), and LDPP (simple-minded and heat kernel). The classification results can be seen in Table 4. The results in Table 4 show that the proposed algorithm achieved better performance on multiple evaluation criteria irrespective of dataset and classifier. It is worth noticing that for the PSDMTSR dataset, the proposed method always performed well compared with the feature extraction methods. This indicates that the proposed method not only achieves higher classification accuracy but can also deal with imbalanced data samples effectively. As with PSDMTSR, the proposed algorithm also performed better on the PARKINSONS dataset in terms of accuracy, precision, recall, and G-mean, which again shows the superiority of the proposed method. For the results on SelfData, although the feature extraction methods help to improve performance compared with N_DR (results without dimension reduction), the improvement of the proposed method is the most noticeable. Compared with N_DR, the classification accuracy was improved for ELM by 18.66% (from 0.4867 to 0.6733) and for SVM (linear) by 8.67% (from 0.56 to 0.6467). Moreover, it is worth noticing that the simple-minded affinity matrix offered lower classification accuracy than the heat kernel. This is because, while building the affinity matrix, the heat kernel weights the neighbor samples according to their distance from the sample under observation (greater weights are given to the closest neighbors), whereas the simple-minded variant does not.

3.2. Comparison with Feature Selection Methods

In this section, the proposed algorithm is compared with feature selection methods, including mRMR [9,41,42,43], p-value, ReliefF [41,44,45], SBS [46], SFS [15,37], and SVM-RFE. The results can be seen in Table 5.
The results in Table 5 illustrate that, irrespective of classifier and dataset, the proposed algorithm performed well compared with the feature selection algorithms on all evaluation indicators. For the PARKINSONS and PSDMTSR datasets, the proposed algorithm achieved better results in terms of accuracy, precision, recall, and G-mean, which shows its effectiveness. For SelfData, the advantage of the proposed algorithm is obvious for all classifiers in terms of accuracy and precision. Moreover, for recall and G-mean, the proposed algorithm with the RF and ELM classifiers outperformed the feature selection methods. The improvement of the proposed algorithm with ELM is the most obvious: 18.66% compared with N_DR and even 15% compared with the best feature selection methods (mRMR, ReliefF).

3.3. Significance Analysis

For further verification, a significance analysis between the proposed algorithm and the feature reduction algorithms was performed, and the results are recorded in Table 6. The results in Table 6 show that, regardless of the classifier used, the proposed algorithm achieves significant improvement compared with most of the existing dimensionality reduction methods.

3.4. Performance on AUC and Results Visualization

For a more comprehensive performance evaluation of the proposed algorithm, the performance of each algorithm on AUC was recorded by using the PSDMTSR dataset. The results can be seen in Table 7.
It can be seen in Table 7 that, irrespective of the classifier, the proposed method always has the highest AUC compared with the existing dimensionality reduction methods used for speech data of PD, which shows that the proposed method can better distinguish PD subjects from speech data.

3.5. Comparison with State-of-the-Art Algorithms

In this section, the proposed algorithm was compared with some of the state-of-the-art speech feature reduction algorithms. These methods include:
  • Relief-RF and Relief-SVM [41]: In this study, four feature reduction algorithms (LASSO, mRMR, Relief, and LLBFS (local learning-based feature selection)) were used. The selected features were mapped to a binary classification response using RF and SVM classifiers, and the best performance was achieved with the Relief feature selection method and an SVM-linear classifier. The feature subsets were selected using a cross-validation (CV) approach (on the training set only). The CV process was repeated ten times, and the features that appeared most frequently were selected.
  • mRMR [43]: The primary objective of this study was to compare the efficiency of feature reduction algorithms. The authors used mRMR for feature selection with seven different classifiers (multilayer perceptron, SVMs with RBF and linear kernels, logistic regression, naïve Bayes, k-nearest neighbors, and RF). Moreover, the results were combined using stacking strategies.
  • LDA-NN-GA [4]: The authors divided the dataset into test and training sets using the LOSO (leave-one-subject-out) technique. Afterward, the LDA feature extraction method was used for feature reduction of the training dataset; the reduced training dataset was then fed to a GA-optimized BP neural network to train the model; finally, performance was evaluated on the test dataset.
  • ReliefF-FC-SVM(RBF) [37]: This technique ranks the features using Fisher criterion (FC) based ReliefF algorithm. After that, top K features were selected for training and testing the model using the SVM-RBF classifier.
  • SFFS-RF [42]: In this study, the authors used the sequential floating forward selection (SFFS) method for feature selection and RF as a classifier.
  • KPCA-SVM(RBF) [37]: For this study, a feature extraction method KPCA was used as a feature reduction method and an SVM with RBF kernel was used as a classifier.
It can be seen in Table 8 that, irrespective of dataset and classifier, the proposed algorithm performed better than the state-of-the-art algorithms. For the SelfData and PSDMTSR datasets, the proposed algorithm performed better in all cases. For the PARKINSONS dataset, the proposed algorithm achieved higher accuracy in most cases. It was noticed that the state-of-the-art algorithms did not achieve the same results as described in the corresponding papers. One probable reason is the difference in experimental conditions. For instance, since the method of splitting the dataset into training, test, and validation sets was different, the number of samples used for training the model in this study was significantly lower than in the corresponding papers. Moreover, Table 9 reports the significance analysis between the proposed and state-of-the-art methods.

3.6. Influence of Parameter on Accuracy

In this section, the influence of penalty-factor and dimensionality on classification accuracy is analyzed. The PARKINSONS dataset was used for this experiment with the ELM classifier. The result can be seen in Figure 3.
It can be seen from Figure 3 that the classification accuracy first increases and then decreases as the dimensionality grows. On the other hand, at any fixed dimension, the classification accuracy decreases as the penalty factor γ increases.

4. Discussion

Speech information processing has been widely used for the diagnosis of Parkinson’s disease due to the convenience of collecting speech data and the rich information contained in it. Feature reduction methods can help to improve classification accuracy. However, since speech samples are affected by the emotional fluctuations of the speakers, they often contain high noise and aliasing characteristics, which are usually overlooked by researchers. To alleviate these problems, we proposed a weighted hybrid feature reduction embedded with an ensemble learning technique that fully considers the above-mentioned characteristics.
The experimental results indicate that compared with existing feature reduction algorithms, the proposed method offers the highest accuracy, precision, and recall which shows the effectiveness of the proposed method. Moreover, the proposed algorithm also offers the highest G-mean which shows that it can handle imbalanced data precisely. The experiments on SelfData which evaluate the treatment of PD patients show that the classification results of the proposed algorithm on the accuracy, precision, and recall are better compared with existing feature reduction algorithms. This implies that the proposed algorithm can effectively differentiate between PD patients with or without treatment.

5. Conclusions

In this study, we developed a weighted hybrid feature reduction algorithm embedded with ensemble learning and demonstrated its applicability for the detection of patients suffering from Parkinson’s disease. Our findings show that, compared with existing feature reduction methods, the proposed algorithm attains the highest accuracy (improved 19% compared with LDPP on SelfData), precision (improved 23% compared with LDPP on SelfData using ELM), recall (improved 11% compared with LDPP on SelfData using ELM), and G-mean (improved 17% compared with LDPP on SelfData using ELM). Moreover, we believe that the proposed weighted hybrid feature reduction embedded with ensemble learning technique is truly convenient for both patients and health organizations, whereas conventional medical assessment requires clinical experts and their dedicated time. We consider these developments a key technical milestone toward a computer-supported solution that enhances the convenience of voice-based medical care for a larger PD population.
Overall, this study still leaves substantial space for future work. The introduction of ensemble learning is a source of higher time consumption; for that reason, in future studies we will work to improve the processing time of the proposed method. Moreover, the comparatively limited data used in this research advise caution in generalizing the present outcomes, which require further verification on new samples before this system can be used in clinical practice.

Author Contributions

Methodology, W.U.R.; software, W.K.; investigation, Z.H.; project administration, N.U.; funding acquisition, F.R.A. All authors have read and agreed to the published version of the manuscript.

Funding

The APC is paid by Taif University Researchers Supporting Project Number (TURSP-2020/331), Taif University, Taif, Saudi Arabia.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

The authors wish to acknowledge the support from Taif University Researchers Supporting Project Number (TURSP-2020/331), Taif University, Taif, Saudi Arabia.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Babayev, R. Improving the Performance of Type-2 Diabetes Prediction Models with Automated Feature-Engineering Methods: A Design Science Research Study; Colorado Technical University: Colorado Springs, CO, USA, 2021.
  2. De la Fuente-Mella, H.; Rubilar, R.; Chahuán-Jiménez, K.; Leiva, V. Modeling COVID-19 cases statistically and evaluating their effect on the economy of countries. Mathematics 2021, 9, 1558.
  3. Velasco, H.; Laniado, H.; Toro, M.; Catano-López, A.; Leiva, V.; Lio, Y. Modeling the Risk of Infectious Diseases Transmitted by Aedes aegypti Using Survival and Aging Statistical Analysis with a Case Study in Colombia. Mathematics 2021, 9, 1488.
  4. Ali, L.; Zhu, C.; Zhang, Z.; Liu, Y. Automated detection of Parkinson’s disease based on multiple types of sustained phonations using linear discriminant analysis and genetically optimized neural network. IEEE J. Transl. Eng. Health Med. 2019, 7, 1–10.
  5. Trier, Ø.D.; Jain, A.K.; Taxt, T. Feature extraction methods for character recognition: A survey. Pattern Recognit. 1996, 29, 641–662.
  6. Chandrashekar, G.; Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 2014, 40, 16–28.
  7. Rovini, E.; Maremmani, C.; Moschetti, A.; Esposito, D.; Cavallo, F. Comparative motor pre-clinical assessment in Parkinson’s disease using supervised machine learning approaches. Ann. Biomed. Eng. 2018, 46, 2057–2068.
  8. Sakar, C.O.; Kursun, O. Telediagnosis of Parkinson’s disease using measurements of dysphonia. J. Med. Syst. 2010, 34, 591–599.
  9. Peker, M.; Sen, B.; Delen, D. Computer-aided diagnosis of Parkinson’s disease using complex-valued neural networks and mRMR feature selection algorithm. J. Healthc. Eng. 2015, 6, 281–302.
  10. Benba, A.; Jilbab, A.; Hammouch, A. Hybridization of best acoustic cues for detecting persons with Parkinson’s disease. In Proceedings of the 2014 Second World Conference on Complex Systems (WCCS), Agadir, Morocco, 10–12 November 2014; pp. 622–625.
  11. Shirvan, R.A.; Tahami, E. Voice analysis for detecting Parkinson’s disease using genetic algorithm and KNN classification method. In Proceedings of the 2011 18th Iranian Conference of Biomedical Engineering (ICBME), Tehran, Iran, 14–16 December 2011; pp. 278–283.
  12. Khalid, S.; Khalil, T.; Nasreen, S. A survey of feature selection and feature extraction techniques in machine learning. In Proceedings of the 2014 Science and Information Conference, London, UK, 27–29 August 2014; pp. 372–378.
  13. Wang, X.; Paliwal, K.K. Feature extraction and dimensionality reduction algorithms and their applications in vowel recognition. Pattern Recognit. 2003, 36, 2429–2439.
  14. Chen, H.-L.; Huang, C.-C.; Yu, X.-G.; Xu, X.; Sun, X.; Wang, G.; Wang, S.-J. An efficient diagnosis system for detection of Parkinson’s disease using fuzzy k-nearest neighbor approach. Expert Syst. Appl. 2013, 40, 263–271.
  15. Hariharan, M.; Polat, K.; Sindhu, R. A new hybrid intelligent system for accurate detection of Parkinson’s disease. Comput. Methods Programs Biomed. 2014, 113, 904–913.
  16. Tsanas, A.; Little, M.A.; McSharry, P.E.; Ramig, L.O. Nonlinear speech analysis algorithms mapped to a standard metric achieve clinically useful quantification of average Parkinson’s disease symptom severity. J. R. Soc. Interface 2011, 8, 842–855.
  17. Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326.
  18. Zhang, L.; Wang, X.; Huang, G.-B.; Liu, T.; Tan, X. Taste recognition in E-tongue using local discriminant preservation projection. IEEE Trans. Cybern. 2018, 49, 947–960.
  19. Yu, G.; Peng, H.; Wei, J.; Ma, Q. Enhanced locality preserving projections using robust path based similarity. Neurocomputing 2011, 74, 598–605.
  20. Uzer, M.S.; Inan, O.; Yılmaz, N. A hybrid breast cancer detection system via neural network and feature selection based on SBS, SFS and PCA. Neural Comput. Appl. 2013, 23, 719–728.
  21. Ul Haq, A.; Li, J.; Memon, M.H.; Ali, Z.; Abbas, S.Z.; Nazir, S. Recognition of the Parkinson’s disease using a hybrid feature selection approach. J. Intell. Fuzzy Syst. 2020, 39, 1319–1339.
  22. Kadam, V.J.; Kurdukar, A.A.; Jadhav, S.M. An Expert Diagnosis System for Parkinson’s Disease Using Bagging-Based Ensemble of Polynomial Kernel SVMs with Improved GA-SVM Features Selection. In Proceedings of the International Conference on Computational Science and Applications, Cagliari, Italy, 1–4 July 2020; pp. 227–234.
  23. Abuhasel, K.A.; Iliyasu, A.M.; Fatichah, C. A combined AdaBoost and NEWFM technique for medical data classification. In Information Science and Applications; Springer: Berlin/Heidelberg, Germany, 2015; pp. 801–809.
  24. Li, Y.; Yang, L.; Wang, P.; Zhang, C.; Xiao, J.; Zhang, Y.; Qiu, M. Classification of Parkinson’s disease by decision tree based instance selection and ensemble learning algorithms. J. Med. Imaging Health Inform. 2017, 7, 444–452.
  25. Lauraitis, A.; Maskeliūnas, R.; Damaševičius, R.; Krilavičius, T. Detection of speech impairments using cepstrum, auditory spectrogram and wavelet time scattering domain features. IEEE Access 2020, 8, 96162–96172.
  26. Guimarães, M.T.; Medeiros, A.G.; Almeida, J.S.; y Martin, M.F.; Damaševičius, R.; Maskeliūnas, R.; Mattos, C.L.C.; Rebouças Filho, P.P. An Optimized Approach to Huntington’s Disease Detecting via Audio Signals Processing with Dimensionality Reduction. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8.
  27. Zhang, H.-H.; Yang, L.; Liu, Y.; Wang, P.; Yin, J.; Li, Y.; Qiu, M.; Zhu, X.; Yan, F. Classification of Parkinson’s disease utilizing multi-edit nearest-neighbor and ensemble learning algorithms with speech samples. Biomed. Eng. Online 2016, 15, 1–22.
  28. Sakar, B.E.; Isenkul, M.E.; Sakar, C.O.; Sertbas, A.; Gurgen, F.; Delil, S.; Apaydin, H.; Kursun, O. Collection and analysis of a Parkinson speech dataset with multiple types of sound recordings. IEEE J. Biomed. Health Inform. 2013, 17, 828–834.
  29. Little, M.; McSharry, P.; Hunter, E.; Spielman, J.; Ramig, L. Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. Nat. Preced. 2008.
  30. Boersma, P.; Van Heuven, V. Speak and unSpeak with PRAAT. Glot Int. 2001, 5, 341–347.
  31. Rusz, J.; Tykalová, T.; Krupička, R.; Zárubová, K.; Novotný, M.; Jech, R.; Szabó, Z.; Růžička, E. Comparative analysis of speech impairment and upper limb motor dysfunction in Parkinson’s disease. J. Neural Transm. 2017, 124.
  32. Zhan, A.; Little, M.A.; Harris, D.A.; Abiola, S.O.; Dorsey, E.; Saria, S.; Terzis, A. High frequency remote monitoring of Parkinson’s disease via smartphone: Platform overview and medication response detection. arXiv 2016, arXiv:1601.00960.
  33. Khan, T.; Westin, J.; Dougherty, M. Classification of speech intelligibility in Parkinson’s disease. Biocybern. Biomed. Eng. 2014, 34, 35–45.
  34. Benba, A.; Jilbab, A.; Hammouch, A. Analysis of multiple types of voice recordings in cepstral domain using MFCC for discriminating between patients with Parkinson’s disease and healthy people. Int. J. Speech Technol. 2016, 19, 449–456.
  35. Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme learning machine: A new learning scheme of feedforward neural networks. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541), Budapest, Hungary, 25–29 July 2004; pp. 985–990.
  36. Liu, Y.; Tan, X.; Li, Y.; Wang, P. Weighted Local Discriminant Preservation Projection Ensemble Algorithm With Embedded Micro-Noise. IEEE Access 2019, 7, 143814–143828.
  37. Yang, S.; Zheng, F.; Luo, X.; Cai, S.; Wu, Y.; Liu, K.; Wu, M.; Chen, J.; Krishnan, S. Effective dysphonia detection using feature dimension reduction and kernel density estimation for patients with Parkinson’s disease. PLoS ONE 2014, 9, e88825.
  38. El Moudden, I.; Ouzir, M.; ElBernoussi, S. Automatic speech analysis in patients with Parkinson’s disease using feature dimension reduction. In Proceedings of the 3rd International Conference on Mechatronics and Robotics Engineering, Paris, France, 8–12 February 2017; pp. 167–171.
  39. El Moudden, I.; Ouzir, M.; ElBernoussi, S. Feature selection and extraction for class prediction in dysphonia measures analysis: A case study on Parkinson’s disease speech rehabilitation. Technol. Health Care 2017, 25, 693–708.
  40. Lei, H.; Zhao, Y.; Wen, Y.; Luo, Q.; Cai, Y.; Liu, G.; Lei, B. Sparse feature learning for multi-class Parkinson’s disease classification. Technol. Health Care 2018, 26, 193–203.
  41. Tsanas, A.; Little, M.A.; McSharry, P.E.; Spielman, J.; Ramig, L.O. Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease. IEEE Trans. Biomed. Eng. 2012, 59, 1264–1271.
  42. Galaz, Z.; Mekyska, J.; Mzourek, Z.; Smekal, Z.; Rektorova, I.; Eliasova, I.; Kostalova, M.; Mrackova, M.; Berankova, D. Prosodic analysis of neutral, stress-modified and rhymed speech in patients with Parkinson’s disease. Comput. Methods Programs Biomed. 2016, 127, 301–317.
  43. Sakar, C.O.; Serbes, G.; Gunduz, A.; Tunc, H.C.; Nizam, H.; Sakar, B.E.; Tutuncu, M.; Aydin, T.; Isenkul, M.E.; Apaydin, H. A comparative analysis of speech signal processing algorithms for Parkinson’s disease classification and the use of the tunable Q-factor wavelet transform. Appl. Soft Comput. 2019, 74, 255–263.
  44. Cigdem, O.; Demirel, H. Performance analysis of different classification algorithms using different feature selection methods on Parkinson’s disease detection. J. Neurosci. Methods 2018, 309, 81–90.
  45. Tuncer, T.; Dogan, S.; Acharya, U.R. Automated detection of Parkinson’s disease using minimum average maximum tree and singular value decomposition method with vowels. Biocybern. Biomed. Eng. 2020, 40, 211–220.
  46. Kursun, O.; Gumus, E.; Sertbas, A.; Favorov, O.V. Selection of vocal features for Parkinson’s Disease diagnosis. Int. J. Data Min. Bioinform. 2012, 6, 144–161.
Figure 1. Flowchart of the proposed algorithm.
Figure 2. Weighted-boosting process.
Figure 3. The influence of parameter γ and dimension on accuracy.
Table 1. Information about datasets.

Dataset | Patients | Healthy People | Instances | Features | Classes | Reference
PARKINSON | 23 | 8 | 195 | 23 | 2 | [29]
PSDMTSR | 20 | 20 | 1040 | 26 | 2 | [28]
SelfData | 36 | 54 | 1170 | 26 | 2 |
Table 2. Confusion matrix for a two-class problem.

Label | Prediction: Positive | Prediction: Negative
Real: positive | True Positive (TP) | False Negative (FN)
Real: negative | False Positive (FP) | True Negative (TN)
Table 3. The parameters for the proposed algorithm.

Symbol | Meaning | Parameter Settings
r_s | Random sampling ratio | 0.8
n | Number of subspaces | 5
γ | Penalty factor for P_n^T (X_train^n Z X_train^nT) P_n | 10^-4, 10^-3, …, 10^4
λ | Penalty factor for P_n^T [µ S_B − γ (X_train^n Z X_train^nT)] P_n | 10^-4, 10^-3, …, 10^4
µ | Penalty factor for P_n^T S_B P_n | 10^-4, 10^-3, …, 10^4
f | Features after feature reduction | 5, 10, 15, 20, …
k | Nearest neighbor samples in affinity matrix A | 5
t | Kernel parameter of affinity matrix A | 10^-4, 10^-3, …, 10^4
β | Fusion weight coefficient | Calculated by Bayesian fusion
Table 4. Comparison with feature extraction methods. Each cell lists the values for SVM (linear)/SVM (RBF)/RF/ELM.

PARKINSONS
Method | Acc | Pre | Rec | G-Mean
N_DR | 0.6833/0.7583/0.7583/0.7417 | 0.746/0.7463/0.7529/0.7647 | 0.8125/0.975/0.9625/0.9 | 0.5876/0.5629/0.5804/0.6185
PCA | 0.75/0.75/0.75/0.7167 | 0.7515/0.7297/0.7398/0.7386 | 0.9375/1/0.975/0.9 | 0.5929/0.5/0.5408/0.5612
KPCA | 0.7167/0.675/0.7583/0.6917 | 0.7061/0.6727/0.7453/0.7431 | 0.9875/1/0.975/0.825 | 0.4157/0.1581/0.5629/0.5921
LDA | 0.7083/0.7167/0.7417/0.6667 | 0.748/0.7093/0.7629/0.7061 | 0.8625/0.9875/0.9125/0.8625 | 0.5874/0.4157/0.6042/0.487
LPP(S) | 0.7667/0.75/0.725/0.75 | 0.7591/0.7297/0.7145/0.7954 | 0.975/1/0.9875/0.8625 | 0.5842/0.5/0.4444/0.6729
LPP(H) | 0.7417/0.7833/0.7417/0.7083 | 0.7533/0.7723/0.7337/0.7639 | 0.925/0.9625/0.975/0.8125 | 0.589/0.6396/0.5178/0.6374
LDPP(S) | 0.7583/0.75/0.75/0.6917 | 0.7686/0.7309/0.7393/0.7371 | 0.925/1/0.975/0.8375 | 0.627/0.5/0.5408/0.5788
LDPP(H) | 0.7667/0.7667/0.7917/0.725 | 0.7657/0.756/0.7715/0.7571 | 0.95/0.975/0.9875/0.875 | 0.6164/0.5842/0.6285/0.6098
Proposed (S) | 0.8167/0.7583/0.8167/0.85 | 0.7899/0.7382/0.7921/0.8396 | 1/1/0.9875/0.975 | 0.6708/0.5244/0.6849/0.7649
Proposed (H) | 0.85/0.8083/0.85/0.8917 | 0.8356/0.781/0.8194/0.9069 | 0.975/1/1/0.95 | 0.7649/0.6519/0.7416/0.8581

PSDMTSR
Method | Acc | Pre | Rec | G-Mean
N_DR | 0.6125/0.575/0.5625/0.6188 | 0.6152/0.5721/0.5545/0.6458 | 0.65/0.6375/0.6/0.625 | 0.6114/0.5716/0.5612/0.6187
PCA | 0.575/0.55/0.5813/0.575 | 0.5924/0.5561/0.5816/0.5914 | 0.6625/0.625/0.6625/0.5625 | 0.5683/0.5449/0.5755/0.5749
KPCA | 0.5188/0.55/0.5688/0.5063 | 0.6633/0.6883/0.5655/0.496 | 0.35/0.4625/0.7/0.45 | 0.4905/0.543/0.5534/0.5031
LDA | 0.6/0.5563/0.5938/0.55 | 0.6037/0.5515/0.592/0.5565 | 0.65/0.6125/0.6375/0.6 | 0.5979/0.5534/0.5921/0.5477
LPP(S) | 0.5625/0.5625/0.5813/0.6 | 0.5828/0.5602/0.5891/0.6344 | 0.625/0.65/0.675/0.575 | 0.559/0.5557/0.5736/0.5995
LPP(H) | 0.5438/0.5688/0.6063/0.5563 | 0.5486/0.5765/0.6068/0.5839 | 0.625/0.5875/0.625/0.575 | 0.5376/0.5684/0.606/0.5559
LDPP(S) | 0.5688/0.5813/0.5/0.6 | 0.5618/0.5959/0.5082/0.6234 | 0.7125/0.65/0.5875/0.6125 | 0.5503/0.5772/0.4923/0.5999
LDPP(H) | 0.5438/0.5/0.5438/0.5688 | 0.5943/0.5201/0.54/0.5819 | 0.5625/0.525/0.6/0.5375 | 0.5434/0.4994/0.5408/0.5679
Proposed (S) | 0.7375/0.6625/0.6875/0.7563 | 0.7534/0.6628/0.6854/0.764 | 0.775/0.75/0.7375/0.7875 | 0.7365/0.6567/0.6857/0.7556
Proposed (H) | 0.7125/0.6625/0.7063/0.7625 | 0.7219/0.6425/0.6773/0.8171 | 0.7375/0.7875/0.8375/0.7375 | 0.7121/0.6506/0.6939/0.7621

SelfData
Method | Acc | Pre | Rec | G-Mean
N_DR | 0.56/0.5433/0.5367/0.4867 | 0.3962/0.3305/0.3825/0.3687 | 0.3583/0.2417/0.2583/0.35 | 0.4988/0.4242/0.4319/0.4497
PCA | 0.5733/0.5467/0.54/0.5267 | 0.3628/0.4588/0.3669/0.4339 | 0.2083/0.2667/0.25/0.4417 | 0.4125/0.4422/0.4282/0.5076
KPCA | 0.5867/0.5967/0.5367/0.58 | 0.3917/0.4417/0.3791/0.4619 | 0.0667/0.075/0.3083/0.1667 | 0.2494/0.2661/0.4609/0.3776
LDA | 0.5867/0.5667/0.5133/0.49 | 0.3288/0.1742/0.3455/0.3256 | 0.1583/0.0833/0.2833/0.3333 | 0.3716/0.2722/0.4346/0.4451
LPP(S) | 0.56/0.5067/0.5633/0.5033 | 0.3468/0.3575/0.4558/0.4105 | 0.1833/0.225/0.2917/0.5083 | 0.3856/0.3953/0.466/0.5041
LPP(H) | 0.5833/0.59/0.5333/0.4533 | 0.2171/0.1485/0.3181/0.3044 | 0.0833/0.075/0.15/0.3083 | 0.2764/0.2646/0.344/0.4118
LDPP(S) | 0.5967/0.5867/0.55/0.4867 | 0.3329/0.2432/0.2744/0.3615 | 0.1917/0.1333/0.2/0.3583 | 0.4076/0.3443/0.3958/0.4528
LDPP(H) | 0.5667/0.5967/0.5233/0.4767 | 0.3078/0.3219/0.3032/0.3564 | 0.1417/0.2083/0.1833/0.4083 | 0.347/0.4222/0.3708/0.4618
Proposed (S) | 0.6467/0.6267/0.6733/0.6733 | 0.5808/0.3405/0.6807/0.6816 | 0.1833/0.1/0.375/0.525 | 0.4186/0.3127/0.5719/0.6367
Proposed (H) | 0.6133/0.64/0.6567/0.6733 | 0.4511/0.5592/0.6585/0.5945 | 0.1833/0.1667/0.3417/0.5167 | 0.4062/0.3991/0.5442/0.6339

Note: H indicates the heat kernel and S the simple-minded method used to calculate the affinity matrix (Equations (4) and (5), respectively).
Table 5. Comparison with feature selection methods.
Table 5. Comparison with feature selection methods.
Dataset | Method | Acc | Pre | Rec | G-Mean
(Within each metric column, the four values correspond to SVM (linear) / SVM (RBF) / RF / ELM.)
PARKINSONS | N_DR | 0.6833 / 0.7583 / 0.7583 / 0.7417 | 0.746 / 0.7463 / 0.7529 / 0.7647 | 0.8125 / 0.975 / 0.9625 / 0.9 | 0.5876 / 0.5629 / 0.5804 / 0.6185
PARKINSONS | mRMR | 0.7167 / 0.6667 / 0.725 / 0.7417 | 0.7227 / 0.673 / 0.7198 / 0.761 | 0.95 / 0.975 / 0.975 / 0.9125 | 0.4873 / 0.2208 / 0.4684 / 0.6042
PARKINSONS | ReliefF | 0.75 / 0.675 / 0.725 / 0.6917 | 0.7543 / 0.6758 / 0.728 / 0.7146 | 0.9375 / 0.9875 / 0.95 / 0.9125 | 0.5929 / 0.2222 / 0.5111 / 0.4776
PARKINSONS | Pvalue | 0.7667 / 0.675 / 0.75 / 0.7167 | 0.7691 / 0.6758 / 0.7529 / 0.7505 | 0.95 / 0.9875 / 0.9625 / 0.8875 | 0.6164 / 0.2222 / 0.5593 / 0.5769
PARKINSONS | SBS | 0.7583 / 0.675 / 0.725 / 0.7583 | 0.7691 / 0.6727 / 0.7279 / 0.7546 | 0.925 / 1 / 0.95 / 0.9625 | 0.627 / 0.1581 / 0.5111 / 0.5804
PARKINSONS | SFS | 0.7917 / 0.675 / 0.7333 / 0.7333 | 0.7873 / 0.6727 / 0.7358 / 0.7602 | 0.95 / 1 / 0.95 / 0.9 | 0.6718 / 0.1581 / 0.5339 / 0.6
PARKINSONS | SVM_RFE | 0.75 / 0.675 / 0.7333 / 0.6917 | 0.7396 / 0.6727 / 0.7265 / 0.7291 | 0.9875 / 1 / 0.975 / 0.85 | 0.5211 / 0.1581 / 0.4937 / 0.5646
PARKINSONS | Proposed (S) | 0.8167 / 0.7583 / 0.8167 / 0.85 | 0.7899 / 0.7382 / 0.7921 / 0.8396 | 1 / 1 / 0.9875 / 0.975 | 0.6708 / 0.5244 / 0.6849 / 0.7649
PARKINSONS | Proposed (H) | 0.85 / 0.8083 / 0.85 / 0.8917 | 0.8356 / 0.781 / 0.8194 / 0.9069 | 0.975 / 1 / 1 / 0.95 | 0.7649 / 0.6519 / 0.7416 / 0.8581
PSDMTSR | N_DR | 0.6125 / 0.575 / 0.5625 / 0.6188 | 0.6152 / 0.5721 / 0.5545 / 0.6458 | 0.65 / 0.6375 / 0.6 / 0.625 | 0.6114 / 0.5716 / 0.5612 / 0.6187
PSDMTSR | mRMR | 0.5688 / 0.5375 / 0.6188 / 0.5813 | 0.597 / 0.45 / 0.6032 / 0.5986 | 0.4 / 0.175 / 0.675 / 0.5875 | 0.5431 / 0.3969 / 0.6162 / 0.5812
PSDMTSR | ReliefF | 0.5938 / 0.4875 / 0.5313 / 0.6188 | 0.5314 / 0.2551 / 0.5305 / 0.633 | 0.525 / 0.3875 / 0.6125 / 0.6375 | 0.5898 / 0.4771 / 0.525 / 0.6185
PSDMTSR | Pvalue | 0.5688 / 0.475 / 0.5625 / 0.6313 | 0.6023 / 0.3851 / 0.5681 / 0.6309 | 0.6125 / 0.25 / 0.625 / 0.6875 | 0.5671 / 0.4183 / 0.559 / 0.6287
PSDMTSR | SBS | 0.5688 / 0.4813 / 0.5313 / 0.6125 | 0.5828 / 0.3078 / 0.5236 / 0.6198 | 0.5 / 0.2875 / 0.5875 / 0.625 | 0.5646 / 0.4405 / 0.5283 / 0.6124
PSDMTSR | SFS | 0.55 / 0.4938 / 0.5563 / 0.575 | 0.5754 / 0.4937 / 0.5686 / 0.5737 | 0.4625 / 0.2625 / 0.575 / 0.625 | 0.543 / 0.4362 / 0.5559 / 0.5728
PSDMTSR | SVM_RFE | 0.5375 / 0.5313 / 0.5375 / 0.5313 | 0.4834 / 0.5 / 0.5383 / 0.5395 | 0.45 / 0.4125 / 0.5875 / 0.5875 | 0.5303 / 0.5178 / 0.5352 / 0.5283
PSDMTSR | Proposed (S) | 0.7375 / 0.6625 / 0.6875 / 0.7563 | 0.7534 / 0.6628 / 0.6854 / 0.764 | 0.775 / 0.75 / 0.7375 / 0.7875 | 0.7365 / 0.6567 / 0.6857 / 0.7556
PSDMTSR | Proposed (H) | 0.7125 / 0.6625 / 0.7063 / 0.7625 | 0.7219 / 0.6425 / 0.6773 / 0.8171 | 0.7375 / 0.7875 / 0.8375 / 0.7375 | 0.7121 / 0.6506 / 0.6939 / 0.7621
SelfData | N_DR | 0.56 / 0.5433 / 0.5367 / 0.4867 | 0.3962 / 0.3305 / 0.3825 / 0.3687 | 0.3583 / 0.2417 / 0.2583 / 0.35 | 0.4988 / 0.4242 / 0.4319 / 0.4497
SelfData | mRMR | 0.5433 / 0.6 / 0.5267 / 0.5233 | 0.1642 / 0 / 0.3175 / 0.2822 | 0.1 / 0 / 0.1583 / 0.225 | 0.2896 / 0 / 0.3497 / 0.4031
SelfData | ReliefF | 0.5833 / 0.5933 / 0.5 / 0.5233 | 0.2722 / 0.025 / 0.3014 / 0.3798 | 0.1917 / 0.0083 / 0.2 / 0.2917 | 0.4023 / 0.0905 / 0.3742 / 0.4446
SelfData | Pvalue | 0.58 / 0.5933 / 0.5433 / 0.5 | 0.2889 / 0.075 / 0.3511 / 0.3865 | 0.25 / 0.0167 / 0.2167 / 0.35 | 0.4472 / 0.1277 / 0.4061 / 0.4583
SelfData | SBS | 0.5133 / 0.6 / 0.5067 / 0.5 | 0.2933 / 0 / 0.2931 / 0.3938 | 0.2917 / 0 / 0.25 / 0.4333 | 0.4391 / 0 / 0.4116 / 0.4857
SelfData | SFS | 0.5733 / 0.5967 / 0.5267 / 0.5033 | 0.2792 / 0.0333 / 0.3825 / 0.3697 | 0.1333 / 0.0083 / 0.2333 / 0.3417 | 0.3399 / 0.0908 / 0.4105 / 0.4569
SelfData | SVM_RFE | 0.5667 / 0.6 / 0.5433 / 0.52 | 0.1883 / 0 / 0.265 / 0.3789 | 0.175 / 0 / 0.1333 / 0.35 | 0.3806 / 0 / 0.33 / 0.4708
SelfData | Proposed (S) | 0.6467 / 0.6267 / 0.6733 / 0.6733 | 0.5808 / 0.3405 / 0.6807 / 0.6816 | 0.1833 / 0.1 / 0.375 / 0.525 | 0.4186 / 0.3127 / 0.5719 / 0.6367
SelfData | Proposed (H) | 0.6133 / 0.64 / 0.6567 / 0.6733 | 0.4511 / 0.5592 / 0.6585 / 0.594 | 0.1833 / 0.1667 / 0.3417 / 0.5167 | 0.4062 / 0.3991 / 0.5442 / 0.6339
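Several of the feature-selection baselines in Table 5 (SBS, SFS, SVM_RFE) are available in standard libraries. As one hedged example, an SVM-RFE baseline can be set up with scikit-learn's RFE as sketched below; the linear kernel, regularization constant, number of retained features, and synthetic data are illustrative assumptions, not the paper's actual configuration.

    # Hedged sketch of an SVM-RFE baseline using scikit-learn's RFE.
    # All hyperparameters and the synthetic dataset are illustrative only.
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=120, n_features=40,
                               n_informative=8, random_state=0)

    # RFE needs an estimator exposing coef_, hence the linear kernel.
    selector = RFE(estimator=SVC(kernel="linear", C=1.0),
                   n_features_to_select=20, step=1)
    model = make_pipeline(StandardScaler(), selector,
                          SVC(kernel="linear", C=1.0))
    model.fit(X, y)
    print(model.score(X, y))  # training accuracy of the toy pipeline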
Table 6. p-values of significance analysis between proposed and other feature-reduction methods (α = 0.05).
Dataset | Classifier | Variant | N_DR | mRMR | ReliefF | p-Value | SBS | SFS | SVM_RFE | PCA | KPCA | LDA | LPP (S) | LPP (H) | LDPP (S) | LDPP (H)
PARKINSONS | SVM (linear) | Proposed (S) | 0.045 | 0.013 | 0.07 | 0.168 | 0.242 | 0.52 | 0.087 | 0.196 | 0.009 | 0.051 | 0.279 | 0.041 | 0.111 | 0.111
PARKINSONS | SVM (linear) | Proposed (H) | 0.008 | 0.002 | 0.013 | 0.074 | 0.012 | 0.01 | 0.018 | 0.003 | <0.001 | 0.002 | 0.004 | 0.009 | 0.012 | 0.001
PARKINSONS | SVM (RBF) | Proposed (S) | 1 | <0.001 | 0.023 | 0.015 | 0.004 | 0.004 | 0.004 | 0.758 | 0.004 | 0.052 | 0.726 | 0.394 | 0.343 | 0.798
PARKINSONS | SVM (RBF) | Proposed (H) | 0.111 | <0.001 | 0.002 | <0.001 | <0.001 | <0.001 | <0.001 | 0.025 | <0.001 | 0.012 | 0.01 | 0.193 | 0.066 | 0.052
PARKINSONS | RF | Proposed (S) | 0.045 | 0.017 | 0.032 | 0.053 | 0.003 | 0.015 | 0.008 | 0.022 | 0.045 | 0.041 | 0.024 | 0.019 | 0.011 | 0.279
PARKINSONS | RF | Proposed (H) | 0.007 | 0.003 | 0.003 | 0.018 | 0.003 | 0.007 | <0.001 | 0.003 | 0.003 | 0.006 | 0.005 | 0.002 | <0.001 | 0.01
PARKINSONS | ELM | Proposed (S) | 0.004 | 0.006 | <0.001 | 0.008 | 0.032 | 0.004 | <0.001 | <0.001 | 0.006 | <0.001 | 0.03 | 0.028 | 0.007 | <0.001
PARKINSONS | ELM | Proposed (H) | <0.001 | 0.004 | <0.001 | <0.001 | 0.003 | 0.007 | 0.003 | <0.001 | 0.002 | 0.001 | 0.004 | 0.013 | 0.001 | <0.001
PSDMTSR | SVM (linear) | Proposed (S) | 0.005 | 0.003 | 0.014 | 0.004 | 0.002 | 0 | 0.002 | 0 | 0 | 0.004 | 0.004 | 0.002 | 0 | <0.001
PSDMTSR | SVM (linear) | Proposed (H) | 0.008 | 0.004 | 0.035 | 0 | 0.005 | <0.001 | 0.01 | 0.002 | 0 | 0.016 | 0.007 | 0.001 | 0.002 | <0.001
PSDMTSR | SVM (RBF) | Proposed (S) | 0.007 | 0.003 | <0.001 | <0.001 | <0.001 | <0.001 | 0.007 | 0.005 | 0.041 | <0.001 | 0.003 | 0.015 | 0.006 | 0.028
PSDMTSR | SVM (RBF) | Proposed (H) | 0.034 | 0.005 | 0.003 | 0.003 | <0.001 | <0.001 | 0.007 | 0.019 | 0.027 | 0.019 | 0.013 | 0.03 | 0.045 | 0.032
PSDMTSR | RF | Proposed (S) | 0.008 | 0.017 | 0.002 | <0.001 | <0.001 | <0.001 | <0.001 | 0.028 | 0.002 | 0.009 | 0.009 | 0.022 | <0.001 | <0.001
PSDMTSR | RF | Proposed (H) | 0.002 | 0.013 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | 0.004 | <0.001 | 0.003 | <0.001 | 0.022 | <0.001 | <0.001
PSDMTSR | ELM | Proposed (S) | 0.007 | <0.001 | 0.007 | 0.032 | 0.036 | <0.001 | <0.001 | 0.006 | <0.001 | <0.001 | 0.008 | <0.001 | <0.001 | 0.002
PSDMTSR | ELM | Proposed (H) | 0.019 | <0.001 | 0.006 | 0.014 | 0.007 | <0.001 | <0.001 | 0.009 | <0.001 | <0.001 | 0.013 | <0.001 | 0.006 | <0.001
SelfData | SVM (linear) | Proposed (S) | 0.015 | 0.006 | 0.032 | 0.025 | 0.004 | 0.038 | 0.012 | 0.007 | 0.032 | 0.038 | 0.011 | 0.004 | 0.009 | 0.009
SelfData | SVM (linear) | Proposed (H) | 0.065 | 0.031 | 0.204 | 0.195 | 0.013 | 0.14 | 0.072 | 0.058 | 0.247 | 0.21 | 0.061 | 0.108 | 0.138 | 0.061
SelfData | SVM (RBF) | Proposed (S) | 0.016 | 0.087 | 0.042 | 0.063 | 0.087 | 0.108 | 0.087 | 0.006 | 0.147 | 0.027 | 0.004 | 0.066 | 0.044 | 0.147
SelfData | SVM (RBF) | Proposed (H) | 0.009 | 0.013 | 0.01 | 0.016 | 0.013 | 0.013 | 0.013 | 0.003 | 0.028 | 0.004 | <0.001 | 0.003 | 0.045 | 0.057
SelfData | RF | Proposed (S) | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001
SelfData | RF | Proposed (H) | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | 0.002 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001
SelfData | ELM | Proposed (S) | <0.001 | <0.001 | <0.001 | 0.003 | <0.001 | <0.001 | <0.001 | <0.001 | 0.002 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001
SelfData | ELM | Proposed (H) | <0.001 | 0.002 | <0.001 | 0.003 | <0.001 | <0.001 | <0.001 | <0.001 | 0.002 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001
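The p-values in Table 6 compare each proposed variant against each baseline per classifier. One common way to obtain such values is a paired test over per-run accuracies of the two methods being compared; since CURRENT excerpts of the paper do not restate the exact test used, the snippet below is only an assumed setup based on a paired t-test, with made-up accuracy vectors standing in for the real per-run results.

    # Assumed setup for per-classifier significance testing (paired t-test
    # over per-run accuracies). The paper's actual test may differ; the
    # accuracy vectors here are toy values, not experimental results.
    from scipy import stats

    acc_proposed = [0.85, 0.83, 0.86, 0.84, 0.88, 0.85, 0.87, 0.84, 0.86, 0.85]
    acc_baseline = [0.75, 0.74, 0.76, 0.73, 0.77, 0.75, 0.76, 0.74, 0.75, 0.76]
    t_stat, p_value = stats.ttest_rel(acc_proposed, acc_baseline)
    print(f"p = {p_value:.3f}")  # reject H0 at alpha = 0.05 if p < 0.05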
Table 7. Performance of feature reduction methods on AUC.
Category | Method | SVM (linear) | SVM (RBF) | RF | ELM
— | N_DR | 0.5542 | 0.6109 | 0.5026 | 0.5835
Feature extraction | PCA | 0.5207 | 0.5953 | 0.5243 | 0.5543
Feature extraction | KPCA | 0.549 | 0.5969 | 0.5163 | 0.4655
Feature extraction | LDA | 0.5429 | 0.6141 | 0.5408 | 0.5175
Feature extraction | LPP (S) | 0.5447 | 0.6281 | 0.5343 | 0.5841
Feature extraction | LPP (H) | 0.5498 | 0.5859 | 0.5131 | 0.5114
Feature extraction | LDPP (S) | 0.5557 | 0.6234 | 0.4722 | 0.55
Feature extraction | LDPP (H) | 0.5372 | 0.5422 | 0.5259 | 0.5137
Feature selection | mRMR | 0.5421 | 0.6234 | 0.606 | 0.4998
Feature selection | ReliefF | 0.5879 | 0.55 | 0.4775 | 0.5269
Feature selection | Pvalue | 0.5474 | 0.5281 | 0.4719 | 0.5832
Feature selection | SBS | 0.5534 | 0.5203 | 0.4739 | 0.5502
Feature selection | SFS | 0.5793 | 0.5094 | 0.5218 | 0.5359
Feature selection | SVM_RFE | 0.5422 | 0.6047 | 0.4864 | 0.4662
Proposed | Simple-minded (S) | 0.6655 | 0.6609 | 0.5401 | 0.6087
Proposed | Heat kernel (H) | 0.6082 | 0.6859 | 0.5603 | 0.6455
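The AUC values in Table 7 can be reproduced with standard tooling once a classifier exposes continuous scores. Below is a self-contained sketch assuming scikit-learn, with synthetic data standing in for the PD speech features.

    # Sketch of the AUC metric used in Table 7, assuming scikit-learn.
    # Synthetic data replaces the PD speech features for illustration.
    from sklearn.datasets import make_classification
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=0)
    clf = SVC(kernel="rbf").fit(X_tr, y_tr)
    # AUC needs continuous scores, hence decision_function rather than predict.
    auc = roc_auc_score(y_te, clf.decision_function(X_te))
    print(f"AUC = {auc:.4f}")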
Table 8. Accuracy comparison with the state-of-the-art feature reduction methods.
Method | Classifier | PSDMTSR | PARKINSONS | SelfData
ReliefF | SVM (linear) | 0.5938 | 0.75 | 0.5833
ReliefF | RF | 0.5313 | 0.725 | 0.5
mRMR | SVM (linear) | 0.5688 | 0.7167 | 0.5433
mRMR | SVM (RBF) | 0.5375 | 0.6667 | 0.6
mRMR | RF | 0.6188 | 0.725 | 0.5267
SFFS-RF | — | 0.6063 | 0.8083 | 0.6
ReliefF-FC-SVM (RBF) | — | 0.6138 | 0.8167 | 0.6267
LDA-NN-GA | — | 0.6138 | 0.8083 | 0.63
KPCA-SVM (RBF) | — | 0.55 | 0.675 | 0.5967
Proposed (S) | SVM (linear) | 0.7375 | 0.8167 | 0.6467
Proposed (S) | SVM (RBF) | 0.6625 | 0.7583 | 0.6267
Proposed (S) | RF | 0.6875 | 0.8167 | 0.6733
Proposed (S) | ELM | 0.7563 | 0.85 | 0.6733
Proposed (H) | SVM (linear) | 0.7125 | 0.85 | 0.6133
Proposed (H) | SVM (RBF) | 0.6625 | 0.8083 | 0.64
Proposed (H) | RF | 0.7063 | 0.85 | 0.6567
Proposed (H) | ELM | 0.7625 | 0.8917 | 0.6733
Table 9. p-values of significance analysis between proposed and state-of-the-art feature-reduction methods (α = 0.05).
Dataset | Classifier | Variant | ReliefF-SVM (linear) | ReliefF-SVM (RBF) | mRMR-SVM (linear) | mRMR-SVM (RBF) | mRMR-RF | LDA-NN-GA | ReliefF-FC-SVM (RBF) | SFFS-RF | KPCA-SVM (RBF)
PARKINSONS | SVM (linear) | Proposed (S) | 0.07 | 0.057 | 0.013 | <0.001 | 0.032 | 0.84 | 0.07 | 0.057 | 0.013
PARKINSONS | SVM (linear) | Proposed (H) | 0.013 | 0.002 | 0.002 | <0.001 | 0.005 | 0.096 | 0.013 | 0.002 | 0.002
PARKINSONS | SVM (RBF) | Proposed (S) | 0.78 | 0.309 | 0.052 | <0.001 | 0.223 | 0.111 | 0.78 | 0.309 | 0.052
PARKINSONS | SVM (RBF) | Proposed (H) | 0.173 | 0.023 | 0.007 | <0.001 | 0.015 | 1 | 0.173 | 0.023 | 0.007
PARKINSONS | RF | Proposed (S) | 0.07 | 0.032 | 0.005 | <0.001 | 0.017 | 0.678 | 0.07 | 0.032 | 0.005
PARKINSONS | RF | Proposed (H) | 0.013 | 0.003 | <0.001 | <0.001 | 0.003 | 0.052 | 0.013 | 0.003 | <0.001
PARKINSONS | ELM | Proposed (S) | 0.009 | 0.005 | 0.003 | <0.001 | 0.003 | 0.138 | 0.009 | 0.005 | 0.003
PARKINSONS | ELM | Proposed (H) | 0.003 | 0.004 | <0.001 | <0.001 | 0.001 | 0.023 | 0.003 | 0.004 | <0.001
PSDMTSR | SVM (linear) | Proposed (S) | 0.014 | <0.001 | 0.003 | <0.001 | 0.004 | 0.003 | 0.024 | 0.027 | 0.003
PSDMTSR | SVM (linear) | Proposed (H) | 0.035 | <0.001 | 0.004 | 0.002 | 0.034 | 0.027 | 0.052 | 0.045 | 0.004
PSDMTSR | SVM (RBF) | Proposed (S) | 0.137 | 0.011 | 0.067 | 0.003 | 0.111 | 0.09 | 0.309 | 0.235 | 0.041
PSDMTSR | SVM (RBF) | Proposed (H) | 0.259 | 0.027 | 0.034 | 0.005 | 0.226 | 0.177 | 0.177 | 0.31 | 0.027
PSDMTSR | RF | Proposed (S) | 0.022 | 0.002 | 0.025 | <0.001 | 0.017 | 0.01 | 0.075 | 0.045 | 0.016
PSDMTSR | RF | Proposed (H) | 0.014 | <0.001 | 0.006 | <0.001 | 0.013 | 0.007 | 0.044 | 0.029 | 0.005
PSDMTSR | ELM | Proposed (S) | 0.001 | <0.001 | <0.001 | <0.001 | 0.003 | 0.001 | 0.001 | 0.001 | <0.001
PSDMTSR | ELM | Proposed (H) | 0.004 | <0.001 | <0.001 | <0.001 | 0.007 | 0.005 | 0.005 | 0.003 | <0.001
SelfData | SVM (linear) | Proposed (S) | 0.032 | <0.001 | 0.006 | 0.055 | 0.003 | 0.504 | 0.425 | 0.396 | 0.086
SelfData | SVM (linear) | Proposed (H) | 0.204 | <0.001 | 0.031 | 0.509 | 0.01 | 0.475 | 0.565 | 0.8 | 0.44
SelfData | SVM (RBF) | Proposed (S) | 0.083 | <0.001 | 0.005 | 0.087 | 0.011 | 0.343 | 1 | 0.485 | 0.147
SelfData | SVM (RBF) | Proposed (H) | 0.035 | <0.001 | 0.004 | 0.013 | 0.002 | 0.496 | 0.373 | 0.358 | 0.028
SelfData | RF | Proposed (S) | 0.002 | <0.001 | <0.001 | <0.001 | <0.001 | 0.006 | 0.003 | 0.087 | 0.002
SelfData | RF | Proposed (H) | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | 0.137 | 0.095 | 0.23 | <0.001
SelfData | ELM | Proposed (S) | 0.008 | <0.001 | 0.003 | <0.001 | <0.001 | 0.083 | 0.066 | 0.099 | 0.002
SelfData | ELM | Proposed (H) | 0.006 | <0.001 | <0.001 | 0.003 | <0.001 | 0.096 | 0.077 | 0.173 | 0.003
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
