Article

Reduced-Kernel Weighted Extreme Learning Machine Using Universum Data in Feature Space (RKWELM-UFS) to Handle Binary Class Imbalanced Dataset Classification

Department of Computer Science and Engineering, Maulana Azad National Institute of Technology, Bhopal 462003, India
*
Author to whom correspondence should be addressed.
Symmetry 2022, 14(2), 379; https://doi.org/10.3390/sym14020379
Submission received: 20 January 2022 / Revised: 5 February 2022 / Accepted: 10 February 2022 / Published: 14 February 2022

Abstract

Class imbalance is a phenomenon of asymmetry that degrades the performance of traditional classification algorithms such as the Support Vector Machine (SVM) and Extreme Learning Machine (ELM). Various modifications of SVM and ELM have been proposed to handle the class imbalance problem, each focusing on a different aspect of the problem. The Universum Support Vector Machine (USVM) incorporates prior information into the classification model by adding Universum data to the training data, and several other modifications of SVM that use Universum data in the classification model generation have followed. In contrast, the existing ELM-based classification models intended to handle class imbalance do not consider prior information about the data distribution during training. An ELM-based classification model creates two symmetric planes, one for each class. A Universum-based ELM classification model tries to create a third plane between the two symmetric planes using Universum data. This paper proposes a novel hybrid framework called the Reduced-Kernel Weighted Extreme Learning Machine Using Universum Data in Feature Space (RKWELM-UFS) to handle binary class-imbalanced classification problems. The proposed RKWELM-UFS combines the Universum learning method with the Reduced-Kernelized Weighted Extreme Learning Machine (RKWELM) for the first time, in order to inherit the advantages of both techniques. To generate efficient Universum samples in the feature space, this work uses the kernel trick. The performance of the proposed method is evaluated on 44 benchmark binary class-imbalanced datasets and compared with 10 state-of-the-art classifiers using AUC and G-mean. The statistical t-test and Wilcoxon signed-rank test are used to quantify the performance enhancement of the proposed RKWELM-UFS over the other evaluated classifiers.

1. Introduction

The performance of a classification algorithm is affected by various data complexity measures such as class imbalance, class overlapping, length of the decision boundary, small disjuncts of classes, etc. In the classification domain, most real-world problems are class imbalanced. Examples of such problems are cancer detection [1,2], fault detection [3], intrusion detection systems [4], software test optimization [5], speech quality assessment [6], pressure prediction [7], etc. A problem in which the number of samples in one class outnumbers the number of samples in another class is considered a class-imbalanced (asymmetric) problem. The class with the greater number of instances is the majority class and the class with fewer instances is the minority class. In real-world problems, the minority class instances usually have more importance than the majority class instances.
Traditional classifiers such as the support vector machine (SVM), Naive Bayes, decision tree, and extreme learning machine (ELM) are biased towards the correct classification of majority class data. Various approaches have been proposed to handle such class-imbalanced classification problems, which can be classified as data sampling, algorithmic and hybrid methods [8].
In classification, the idea of using additional data along with the original training data has been widely used to better train the model. The virtual example method, the oversampling method, the noise injection method, and the Universum data creation method are some examples that use additional data. The oversampling method generates additional data in the minority class to balance the data distribution across the classes. In the virtual example and noise injection methods, labeled synthetic data are created that may not come from the same distribution as the original data. Universum data creation methods allow the classifier to encode prior knowledge by representing meaningful concepts in the same domain as the problem at hand, as stated in [9]. In Universum learning-based classification models, the Universum data are added to the training data to enhance performance. Universum data are data that do not belong to any of the target classes. The two main factors that affect the usefulness of Universum data are the number of Universum samples created and the method used to create them. Different methods have been used for the creation of Universum data; among these, the two most widely used are drawing examples from other classes and random averaging [9].
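As a concrete illustration of the random-averaging scheme mentioned above, the following minimal Python sketch (our own illustration; the function name and signature are not from the cited works) builds Universum points by averaging randomly paired samples from the two target classes:

```python
import numpy as np

def universum_by_random_averaging(X_pos, X_neg, n_universum, seed=None):
    # Pair randomly chosen positive and negative samples and average them;
    # the averaged points belong to neither target class (Universum data).
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X_pos), size=n_universum)
    j = rng.integers(0, len(X_neg), size=n_universum)
    return (X_pos[i] + X_neg[j]) / 2.0
```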
Several methods have been proposed that use Universum data in the training of SVM based classifiers to handle the class imbalance problem, such as the Universum Support Vector Machine (USVM) [9], Twin support vector machine with Universum data (TUSVM) [10], and Cost-Sensitive Universum-SVM (CS-USVM) [11]. A Universum support vector machine-based model for EEG signal classification has been proposed in [12]. A nonparallel support vector machine for a classification problem with Universum learning has been proposed in [13]. An improved non-parallel Universum support vector machine and its safe sample screening rules are proposed in [14]. Tencer et al. [15] used Universum data with other classifiers such as fuzzy models to demonstrate its usefulness in combination with fuzzy models. Recently, a Multiple Universum Empirical Kernel Learning (MUEKL) [16] classifier has been proposed to handle class imbalance by combining the Universum learning with Multiple Empirical Kernel Learning (MEKL).
Extreme Learning Machine (ELM) [17] is a single hidden-layer feed-forward neural network designed for regression and classification with fast speed and better generalization performance, but it cannot handle the classification of class-imbalanced problems effectively. Various ELM based models have been proposed to handle the classification of class imbalance problems, such as Weighted Extreme Learning Machine (WELM) [18], Class-Specific Cost Regulation Extreme Learning Machine (CCR-ELM) [19], Class-Specific Kernelized Extreme Learning Machine (CSKELM) [20], Reduced-Kernelized Weighted Extreme Learning Machine (RKWELM) [21], UnderBagging-based Kernelized Weighted Extreme Learning Machine (UBKWELM) [22], and UnderBagging-based Reduced-Kernelized Weighted Extreme Learning Machine (UBRKWELM) [21]. The proposed work is motivated by the idea that none of the existing ELM-based models for classification encode prior knowledge in the training model using Universum data.
This work proposes a novel hybrid classification model called the Reduced-Kernel Weighted Extreme Learning Machine using Universum data in Feature Space (RKWELM-UFS), which incorporates Universum data into the RKWELM model. The contributions of the proposed approach are listed below.
  • This work is the first attempt to utilize Universum data in a Reduced-Kernelized Weighted Extreme Learning Machine (RKWELM)-based classification model to handle the class imbalance problem.
  • The Weighted Kernelized Synthetic Minority Oversampling Technique (WKSMOTE) [23] is an oversampling-based classification method in which the synthetic samples are created in the feature space of the Support Vector Machine (SVM). Inspired by WKSMOTE, the proposed work creates the Universum samples in the feature space.
  • The proposed method uses the kernel trick to create the Universum samples in the feature space between randomly selected instances of the majority and minority classes.
  • In a classification problem, the samples located near the decision boundary contribute more to better training. The creation of Universum samples in feature space ensures that the Universum samples lie near the decision boundary.
The rest of the paper is structured as follows. The related work section discusses Universum learning, class imbalance learning, the ELM classifier, and its variants in detail. The proposed work section provides a detailed explanation of the proposed RKWELM-UFS classifier. The experimental setup and result analysis section provides the specifications of the datasets used in the experiments, the parameter settings of the proposed algorithm, the evaluation metrics used for performance evaluation, and the experimental results in the form of various tables and figures. The last section provides the concluding remarks and future research directions.

2. Related Work

The following section provides the literature related to Universum learning, class imbalance learning, and some of the existing ELM-based models to handle class imbalance learning.

2.1. Universum Learning

The idea of using Universum data is close to the idea of using the prior knowledge in Bayesian classifiers [9]. However, there is a conceptual difference between the two approaches, i.e., the prior knowledge is knowledge about decision rules used in Bayesian inference, while the Universum is knowledge about the admissible collection of examples. Similarly to the Bayesian prior probability, the Universum data encode prior information.
It has been observed by various researchers [9,15,24] that the effect of Universum learning depends on the quality of the Universum samples created. A safe sample screening rule for Universum support vector machines is proposed in [25]; it identifies non-contributing data and safely eliminates them before the training process while still obtaining the same solution as the original problem. An improved version of the non-parallel Universum support vector machine and its safe sample screening rule are proposed in [14]. It is suggested in [24] that not all Universum samples are helpful for effective classification, so the authors proposed a method for selecting informative Universum samples for semi-supervised learning. An empirical study on the Universum support vector machine (USVM), which describes some practical conditions for evaluating the effectiveness of random averaging for the creation of Universum data, is performed in [26].

2.2. Class Imbalance Learning

The classification performance of traditional classifiers degrades when there is an imbalance in the ratio of the majority and minority class data. Different approaches have been used in classification to deal with the problem of class imbalance. Table 1 provides the categorization of the proposed methods and other methods used in this work for comparison. Table 1 also provides the strategy and basic ideas used in the respective methods. The broad categories of these approaches are discussed in the following subsections.

2.2.1. Data Level Approach

The data-level methods are based on balancing the ratio of data to convert an imbalanced classification problem into a balanced classification problem. These methods can be seen as data pre-processing methods because they try to handle the class imbalance present in the data before the classification model is generated. The data-level approaches can be broadly categorized as under-sampling, oversampling, and hybrid sampling methods.
The under-sampling methods remove some of the data (i.e., the majority samples) to decrease the imbalance ratio of a training dataset. These methods may suffer from data loss, as some of the important samples may be removed. The efficiency of an under-sampling method lies in its ability to select the right samples which can be removed from the dataset. The under-sampling methods reduce the time complexity of a given class-imbalanced classification problem. A combined weighted multi-objective optimizer for instance reduction in a two-class imbalanced data problem is proposed in [27]. Clustering-Based Under-Sampling (CBUS) [28] uses clustering of majority class data for the under-sampling. Fast Clustering-Based Under-Sampling (FCBUS) [29] is a modified version of CBUS which clusters the minority class data for under-sampling to reduce the time complexity of CBUS.
The oversampling method adds additional data (in the minority class) to decrease the imbalance ratio of the training dataset. The additional samples are obtained by creating synthetic minority class samples or replicating the existing minority class samples. These methods can lead to over-fitting in model generation. The oversampling methods increase the time complexity of a given class imbalance classification problem. The synthetic minority oversampling technique (SMOTE) [30] is a popular oversampling method, widely used to handle class imbalance, in which synthetic minority samples are created. Several variants of SMOTE have been proposed to further enhance the performance of class-imbalanced dataset classification, such as Borderline SMOTE, Borderline SMOTE1, Borderline SMOTE2, Safe-Level-SMOTE, MSMOTE [31], and CSMOTE [32]. The hybrid sampling methods such as SCUT [16] try to reduce the class imbalance by using both oversampling and under-sampling.

2.2.2. Algorithmic Approach

There are some approaches in which the classification algorithm itself is able to handle class imbalance, such as cost-sensitive and one-class learning approaches. The cost-sensitive methods assign a different cost to the misclassification of different classes. In an imbalanced problem, the misclassification cost of minority class samples is generally higher than that of majority class samples. The efficiency of any cost-sensitive method lies in the selection of the misclassification costs for the different classes. Multiple Random Empirical Kernel Learning (MREKL) [33] is a cost-sensitive classification model that emphasizes the importance of samples located in overlapping regions of the positive and negative classes and ignores the effects of noisy samples to achieve better performance on class imbalance problems. Weighted Extreme Learning Machine (WELM) [18] is a weighted version of the Extreme Learning Machine (ELM) [17] that minimizes the weighted error by incorporating a weight matrix in the optimization problem of ELM. Class-Specific Extreme Learning Machine (CSELM) [34] is a variant of WELM which replaces the weight matrix with two constant weight values, one for each class. Class-Specific Kernel Extreme Learning Machine (CSKELM) [20] is a modification of CSELM which uses the Gaussian kernel function to map the input data to the feature space. Class-Specific Cost-Regulation Extreme Learning Machine (CCR-KELM) [19] is a variant of KELM which uses different regularization parameters for the classes.
The one-class learning approach is also called single-class learning. In these methods, the classifier learns only one class as the target class; generally, the minority class is considered the target class. In [35], a novel method called Multi-Kernel Support Vector Data Description (MKL-SVDD) is proposed, which introduces Multi-Kernel Learning (MKL) into the traditional Support Vector Data Description (SVDD) and exploits boundary information to form a one-class learner.

2.2.3. Hybrid Approach

In a hybrid approach, multiple classification approaches are combined to handle a class imbalance problem. Some hybrid techniques combine ensemble techniques with data sampling methods such as over-sampling or under-sampling. RUSBoost [36] is a hybrid technique that combines random under-sampling with boosting to create an ensemble of classifiers. UBKELM [22] and UBRKELM [21] are two hybrid classification models that combine underbagging with KELM and RKELM, respectively. BPSO-AdaBoost-KNN [37] is a method that implements BPSO as the feature selection algorithm and then designs an AdaBoost-KNN classifier to convert the traditional weak classifier into a strong classifier. UBoost (Boosting with the Universum) [38] is a technique that combines Universum sample creation with a boosting framework. The Adaptive-Boosting (AdaBoost) algorithm [39] uses multiple iterations to learn multiple classifiers in a serial manner and generate a single strong learner.
Some hybrid techniques combine cost-sensitive approaches with ensemble techniques such as Ensemble of Weighted Extreme Learning Machine (EWELM) [40] and Boosting Weighted Extreme Learning machine (BWELM) [41]. In EWELM, the weight of each component classifier in the ensemble is optimized by using a differential evolution algorithm. BWELM is a modified AdaBoost framework that combines multiple Weighted ELM-based classifiers in a boosting manner. The main idea of BWELM is to find the better weights in each base classifier.

2.3. Extreme Learning Machine (ELM) and Its Variants to Handle Class Imbalance Learning

ELM [17,42] is a generalized single hidden-layer feed-forward neural network, which provides good generalization performance and disposes of the iterative, time-consuming training process. It uses the Moore–Penrose pseudoinverse to compute the weights between the hidden and the output layer, which makes it fast. Consider a classification dataset with $N$ training samples $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$, where $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^T \in \mathbb{R}^n$ is the input feature vector and $\mathbf{t}_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^T \in \mathbb{R}^m$ is the output label vector. Here, the vector/matrix transpose is denoted by the superscript $T$. The input weights between the input layer and the hidden neurons, denoted by $\mathbf{a}$, are randomly generated at training time and are not changed further. The hidden-neuron bias vector is denoted by $\mathbf{b} = [b_1, b_2, \ldots, b_j, \ldots, b_L]^T \in \mathbb{R}^L$, where $b_j$ is the bias of the $j$th hidden neuron. In ELM, for a given training/testing sample $\mathbf{x}_i$, the hidden layer output $h(\mathbf{x}_i)$ is calculated as follows:
$$h(\mathbf{x}_i) = G(\mathbf{a} \cdot \mathbf{x}_i + \mathbf{b})$$
Here, $G(\cdot)$ is the activation function of the hidden neurons. In ELM, for a binary classification problem, the decision function $f(\mathbf{x}_i)$ for a sample $\mathbf{x}_i$ is given as:
$$f(\mathbf{x}_i) = \mathrm{sign}\left(h(\mathbf{x}_i)\,\beta\right)$$
where β is the output weight matrix. The hidden layer output matrix H can be written as follows:
$$H = \begin{bmatrix} h_1(\mathbf{x}_1) & h_2(\mathbf{x}_1) & \cdots & h_L(\mathbf{x}_1) \\ h_1(\mathbf{x}_2) & h_2(\mathbf{x}_2) & \cdots & h_L(\mathbf{x}_2) \\ \vdots & \vdots & \ddots & \vdots \\ h_1(\mathbf{x}_N) & h_2(\mathbf{x}_N) & \cdots & h_L(\mathbf{x}_N) \end{bmatrix}_{N \times L}$$
ELM minimizes the training error and the norm of the output weights as:
$$\text{Minimize: } \|H\beta - T\|^2 \ \text{and} \ \|\beta\|$$
In the original implementation of ELM [17], the minimal norm least-square method instead of the standard optimization method was used to find β.
$$\beta = H^{+} T$$
where $H^{+}$ is the Moore–Penrose generalized inverse of the matrix $H$. In [17,42], the orthogonal projection method is used to calculate $H^{+}$, which can be applied in two cases.
When $H^T H$ is nonsingular, then
$$H^{+} = (H^T H)^{-1} H^T$$
When $H^T H$ is singular, then
$$H^{+} = H^T (H H^T)^{-1}$$
In ELM, the constrained optimization problem for classification with multiple output nodes was formulated as follows:
$$\text{Minimize: } \frac{1}{2}\|\beta\|^2 + C\,\frac{1}{2}\sum_{i=1}^{N} \|\xi_i\|^2$$
$$\text{Subject to: } h(\mathbf{x}_i)\,\beta = \mathbf{t}_i^T - \xi_i^T, \quad i = 1, \ldots, N$$
The output layer weights $\beta$ can be obtained using two solutions.
Case 1. Where the Number of Training Samples is Not Huge:
$$\beta = H^T \left(\frac{I}{C} + H H^T\right)^{-1} T$$
Case 2. Where the Number of Training Samples is Huge:
$$\beta = \left(\frac{I}{C} + H^T H\right)^{-1} H^T T$$
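To make the closed-form training concrete, a minimal Python sketch of a regularized ELM is given below, using the Case 2 solution above. The sigmoid activation and the uniform initialization of the random weights are our assumptions for illustration, not prescriptions of [17,42]:

```python
import numpy as np

def train_elm(X, T, L=100, C=1.0, seed=None):
    # Random input weights A and biases b stay fixed; only beta is learned.
    rng = np.random.default_rng(seed)
    A = rng.uniform(-1.0, 1.0, size=(X.shape[1], L))
    b = rng.uniform(-1.0, 1.0, size=L)
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))        # sigmoid hidden layer output
    # Case 2 solution: beta = (I/C + H^T H)^(-1) H^T T
    beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
    return A, b, beta

def predict_elm(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return np.sign(H @ beta)
```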

2.3.1. Weighted Extreme Learning Machine (WELM)

Conventional ELM does not provide good generalization performance when dealing with class-imbalance learning problems. The Weighted Extreme Learning Machine (WELM) [18] is a cost-sensitive version of ELM which was proposed to handle the class-imbalanced learning problem effectively. In cost-sensitive learning methods, different costs are assigned to the misclassification of samples from different classes. In WELM, two generalized weighting schemes were proposed. These weighting schemes assign weights to the training samples according to their class distribution. In WELM [18], the following optimization problem is formulated:
$$\text{Minimize: } \frac{1}{2}\|\beta\|^2 + \frac{1}{2} C\, W \sum_{i=1}^{N} \|\xi_i\|^2$$
$$\text{Subject to: } h(\mathbf{x}_i)\,\beta = \mathbf{t}_i^T - \xi_i^T, \quad i = 1, \ldots, N$$
Here, $C$ is the regularization parameter and $W = \mathrm{diag}(W_{ii})$ is an $N \times N$ diagonal matrix whose diagonal elements are the weights assigned to the training samples. The two weighting schemes proposed in WELM are:
Weighting scheme W1:
$$W_{ii} = \frac{1}{q_k}$$
Here, $k = t_i$ and $q_k$ is the total number of samples belonging to the $k$th class.
Weighting scheme W2:
$$W_{ii} = \begin{cases} \dfrac{0.618}{q_k} & \text{if } q_k > q_{avg} \\[6pt] \dfrac{1}{q_k} & \text{if } q_k \le q_{avg} \end{cases}$$
Here, $q_{avg}$ represents the average number of samples over all classes, and the weight $W_{ii}$ is assigned to the $i$th sample. In both weighting schemes, samples belonging to the minority class are assigned weights equal to $1/q_k$. The second weighting scheme assigns a smaller weight to the majority class samples than the first weighting scheme. The two variants of WELM are the sigmoid node-based WELM and the Gaussian kernel-based WELM, which are described as follows.
  • Sigmoid node-based Weighted Extreme Learning Machine
The sigmoid node-based WELM uses random input weights and the sigmoid activation function $G(\cdot)$ to compute the hidden layer output matrix $H$ given in Equation (3). The solution of the optimization problem of WELM, as given in [18], is reproduced below:
$$\beta = \begin{cases} H^T \left(\dfrac{I}{C} + W H H^T\right)^{-1} W\,T & \text{if } N < L \\[8pt] \left(\dfrac{I}{C} + H^T W H\right)^{-1} H^T W\,T & \text{if } N > L \end{cases}$$
The two solutions are given for two cases. The first solution is given for the case when the number of training samples is smaller than the number of selected hidden layer neurons. The second solution is given for the case where the number of selected hidden layer neurons is smaller than the number of training samples.
  • Gaussian kernel-based Weighted Extreme Learning Machine (KWELM)
In KELM [42], the kernel matrix of the hidden layer is represented as follows:
$$\Omega = \begin{bmatrix} K(\mathbf{x}_1, \mathbf{x}_1) & K(\mathbf{x}_1, \mathbf{x}_2) & \cdots & K(\mathbf{x}_1, \mathbf{x}_N) \\ K(\mathbf{x}_2, \mathbf{x}_1) & K(\mathbf{x}_2, \mathbf{x}_2) & \cdots & K(\mathbf{x}_2, \mathbf{x}_N) \\ \vdots & \vdots & \ddots & \vdots \\ K(\mathbf{x}_N, \mathbf{x}_1) & K(\mathbf{x}_N, \mathbf{x}_2) & \cdots & K(\mathbf{x}_N, \mathbf{x}_N) \end{bmatrix}_{N \times N}$$
The Gaussian kernel-based WELM maps the input data to the feature space as follows:
$$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\|\mathbf{x}_i - \mathbf{x}_j\|^2}{\sigma^2}\right)$$
Here, $\sigma$ represents the kernel width parameter, $\mathbf{x}_i$ represents the $i$th sample, and $\mathbf{x}_j$ represents the $j$th centroid, with $i, j \in \{1, 2, \ldots, N\}$. $K(\mathbf{x}_i, \mathbf{x}_j)$ is computed from the distance of the $j$th centroid $\mathbf{x}_j$ to the $i$th input sample $\mathbf{x}_i$. The number of Gaussian kernel functions, i.e., the centroids used in [42], was equal to the number of training samples. On applying Mercer's condition, the kernel matrix of KELM [42] can be represented as given below:
$$\Omega_{KELM} = H H^T : \quad \Omega_{KELM}(i, j) = h(\mathbf{x}_i) \cdot h(\mathbf{x}_j) = K(\mathbf{x}_i, \mathbf{x}_j)$$
The output of KWELM, as determined in [18], is represented as follows:
$$f(\mathbf{x}) = \mathrm{sign}\left(\begin{bmatrix} K(\mathbf{x}, \mathbf{x}_1) \\ \vdots \\ K(\mathbf{x}, \mathbf{x}_N) \end{bmatrix}^T \left(\frac{I}{C} + W\,\Omega_{KELM}\right)^{-1} W\,T\right)$$
Compared to the Sigmoid node-based WELM, KWELM has better classification performance, as stated in [18].
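The weighting schemes of WELM, the Gaussian kernel matrix, and the KWELM decision function above can be put together in a short Python sketch (a minimal illustration assuming {+1, −1} class labels; all function names are ours, not from [18,42]):

```python
import numpy as np

def welm_weight_matrix(t, scheme="W1"):
    # W1 assigns 1/q_k to every sample of class k; W2 assigns 0.618/q_k to
    # classes larger than the average class size and 1/q_k otherwise.
    classes, counts = np.unique(t, return_counts=True)
    q = dict(zip(classes, counts))
    q_avg = counts.mean()
    w = np.array([0.618 / q[label] if scheme == "W2" and q[label] > q_avg
                  else 1.0 / q[label] for label in t])
    return np.diag(w)

def gaussian_kernel(A, B, sigma):
    # K(a, b) = exp(-||a - b||^2 / sigma^2) for every pair of rows of A and B.
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-d2 / sigma ** 2)

def kwelm_predict(X_train, t, X_test, C, sigma):
    # f(x) = sign([K(x, x_1) ... K(x, x_N)] (I/C + W Omega)^(-1) W T)
    W = welm_weight_matrix(t)
    Omega = gaussian_kernel(X_train, X_train, sigma)
    alpha = np.linalg.solve(np.eye(len(X_train)) / C + W @ Omega, W @ t)
    return np.sign(gaussian_kernel(X_test, X_train, sigma) @ alpha)
```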

2.3.2. Reduced Kernel Weighted Extreme Learning Machine (RKWELM)

Reduced-Kernel Extreme Learning Machine (RKELM) [43] is a fast and accurate kernel-based supervised algorithm for classification. Unlike the Support Vector Machine (SVM) or Least-Squares SVM (LS-SVM), which identify the support vectors or weight vectors iteratively, RKELM randomly selects a subset of the available data samples as centroids or mapping samples. The weighted version of RKELM, i.e., the Reduced-Kernel Weighted Extreme Learning Machine (RKWELM), is proposed in [21] for class imbalance learning. In RKWELM, a reduced number of kernels is selected, and these act as the centroids. The number of Gaussian kernel functions used in RKWELM is denoted as $\tilde{N}$, where $\tilde{N} \le N$. The kernel matrix of the hidden layer can be reproduced as given by the following equation.
$$\Omega_{RKELM} = \begin{bmatrix} K(\mathbf{x}_1, \mathbf{x}_1) & K(\mathbf{x}_1, \mathbf{x}_2) & \cdots & K(\mathbf{x}_1, \mathbf{x}_{\tilde{N}}) \\ K(\mathbf{x}_2, \mathbf{x}_1) & K(\mathbf{x}_2, \mathbf{x}_2) & \cdots & K(\mathbf{x}_2, \mathbf{x}_{\tilde{N}}) \\ \vdots & \vdots & \ddots & \vdots \\ K(\mathbf{x}_N, \mathbf{x}_1) & K(\mathbf{x}_N, \mathbf{x}_2) & \cdots & K(\mathbf{x}_N, \mathbf{x}_{\tilde{N}}) \end{bmatrix}_{N \times \tilde{N}}$$
Here, $\mathbf{x}_i$ represents the $i$th sample and $\mathbf{x}_j$ represents the $j$th centroid, with $i \in \{1, 2, \ldots, N\}$ and $j \in \{1, 2, \ldots, \tilde{N}\}$. In the case when $N = \tilde{N}$, the output of RKWELM coincides with that of KWELM. The final output of RKWELM, as given in [43], is computed as:
$$f(\mathbf{x}) = \mathrm{sign}\left(\begin{bmatrix} K(\mathbf{x}, \mathbf{x}_1) \\ \vdots \\ K(\mathbf{x}, \mathbf{x}_{\tilde{N}}) \end{bmatrix}^T \left(\frac{I}{C} + \Omega_{RKELM}^T\, W\, \Omega_{RKELM}\right)^{-1} \Omega_{RKELM}^T\, W\, T\right)$$
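A compact sketch of RKWELM training with randomly chosen centroids is given below; it reuses the gaussian_kernel and welm_weight_matrix helpers sketched above. The uniform random choice of centroids follows the description in [43], while the function names are ours:

```python
import numpy as np

def train_rkwelm(X, t, n_centroids, C, sigma, seed=None):
    # Randomly pick n_centroids training samples as kernel centroids, then
    # solve beta = (I/C + Omega^T W Omega)^(-1) Omega^T W t.
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=n_centroids, replace=False)]
    Omega = gaussian_kernel(X, centroids, sigma)          # N x n_centroids
    W = welm_weight_matrix(t)
    A = np.eye(n_centroids) / C + Omega.T @ W @ Omega
    beta = np.linalg.solve(A, Omega.T @ W @ t)
    return centroids, beta

def predict_rkwelm(X_test, centroids, beta, sigma):
    return np.sign(gaussian_kernel(X_test, centroids, sigma) @ beta)
```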

2.3.3. UnderBagging-Based Kernel Extreme Learning Machine (UBKELM)

UnderBagging-Based Kernel Extreme Learning Machine (UBKELM) [22] is an ensemble of KELM classifiers. UBKELM creates K balanced training subsets by random under-sampling of the majority class: each subset contains all the minority samples and M randomly selected majority samples, where M is the number of minority samples in the training dataset and K is the ceiling of the imbalance ratio of the training dataset. There are two variants of UBKELM, i.e., UnderBagging-Based Kernel Extreme Learning Machine-Max Voting (UBKELM-MV) and UnderBagging-Based Kernel Extreme Learning Machine-Soft Voting (UBKELM-SV), in which the ultimate outcome of the ensemble is computed by majority voting and soft voting, respectively.

2.3.4. UnderBagging-Based Reduced-Kernelized Weighted Extreme Learning Machine

UnderBagging-based Reduced-Kernelized Weighted Extreme Learning Machine (UBRKELM) [21] is an ensemble of the Reduced-Kernelized Weighted Extreme Learning Machine (RKWELM). UBRKELM creates several balanced training subsets and learns multiple classification models on these subsets using RKWELM as the classification algorithm. K balanced subsets are created, each containing all the minority samples and M randomly selected majority samples, where M is the number of minority samples in the training dataset and K is the ceiling of the imbalance ratio of the training dataset. In UBRKELM, a reduced number of kernel functions is used as centroids to learn each RKWELM model. Two variants of UBRKELM are proposed, UBRKELM-MV and UBRKELM-SV, in which the final outcome of the ensemble is computed by majority voting and soft voting, respectively.

3. Proposed Method

This work proposes a novel Reduced-Kernel Weighted Extreme Learning Machine using Universum data in Feature Space (RKWELM-UFS) to handle the class imbalance classification problem. In the proposed work, the Universum data along with the original training data is provided to the classifier for training purposes, to improve its learning capability. The proposed method creates Universum samples in the feature space because the mapping of input data from the input space to the feature space is not conformal.
The following subsections describe the process of creation of the Universum samples in the input space, the process of creation of the Universum samples in the feature space, the proposed RKWELM-UFS classifier, and the computational complexity of the proposed RKWELM-UFS classification model. Algorithm 1 provides the pseudo-code of the proposed RKWELM-UFS.

3.1. Generation of Universum Samples in the Input Space

To generate a Universum sample x u between a majority sample x m and a minority sample x n , the following equation can be used:
$$\mathbf{x}_u = \mathbf{x}_m + \delta\,(\mathbf{x}_n - \mathbf{x}_m)$$
where $\delta$ is a random number drawn from the uniform distribution $U[0, 1]$.
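A one-line Python sketch of this input-space construction (names are illustrative):

```python
import numpy as np

def universum_input_space(x_m, x_n, seed=None):
    # x_u = x_m + delta * (x_n - x_m), with delta drawn from U[0, 1].
    delta = np.random.default_rng(seed).uniform(0.0, 1.0)
    return x_m + delta * (x_n - x_m)
```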

3.2. Generation of Universum Samples in the Feature Space

To generate a Universum sample in the feature space between a majority sample $\mathbf{x}_m$ and a minority sample $\mathbf{x}_n$, the following equation can be utilized:
$$\phi(\mathbf{x}_{mn}) = \phi(\mathbf{x}_m) + \delta_{mn}\left(\phi(\mathbf{x}_n) - \phi(\mathbf{x}_m)\right)$$
where $\phi(\cdot)$ is the feature transformation function, which is generally unknown, and $\delta_{mn}$ is a random number in $[0, 1]$. The proposed work uses $\delta_{mn} = 0.5$. Similarly to SVM, LS-SVM, and PSVM, the transformation function $\phi(\cdot)$ need not be known to users; instead, its kernel function $K(\mathbf{x}_m, \mathbf{x}_n)$ can be deployed. If the feature mapping $\phi(\cdot)$ is unknown to users, one can apply Mercer's conditions on ELM to define a kernel matrix for KELM [17] as follows:
$$\Omega_{KELM} = H H^T : \quad \Omega_{KELM}(m, n) = K(\mathbf{x}_m, \mathbf{x}_n) = \phi(\mathbf{x}_m)^T \phi(\mathbf{x}_n)$$
In the proposed work, we have to calculate the kernel function $K(\mathbf{x}_i, \mathbf{x}_j^{mn})$, where $\mathbf{x}_i$ represents an original target training sample and $\mathbf{x}_j^{mn}$ is a Universum sample. According to [23], without computing $\phi(\mathbf{x}_i)$ and $\phi(\mathbf{x}_j^{mn})$, we can obtain the corresponding kernel $K(\mathbf{x}_i, \mathbf{x}_j^{mn})$ using the following equation:
$$\begin{aligned} K(\mathbf{x}_j^{mn}, \mathbf{x}_i) &= \phi(\mathbf{x}_i)^T \phi(\mathbf{x}_j^{mn}) \\ &= \phi(\mathbf{x}_i)^T \left(\phi(\mathbf{x}_m) + \delta_{mn}\left(\phi(\mathbf{x}_n) - \phi(\mathbf{x}_m)\right)\right) \\ &= \phi(\mathbf{x}_i)^T \phi(\mathbf{x}_m) + \delta_{mn}\,\phi(\mathbf{x}_i)^T \phi(\mathbf{x}_n) - \delta_{mn}\,\phi(\mathbf{x}_i)^T \phi(\mathbf{x}_m) \\ &= K(\mathbf{x}_i, \mathbf{x}_m) + \delta_{mn}\,K(\mathbf{x}_i, \mathbf{x}_n) - \delta_{mn}\,K(\mathbf{x}_i, \mathbf{x}_m) \\ &= (1 - \delta_{mn})\,K(\mathbf{x}_i, \mathbf{x}_m) + \delta_{mn}\,K(\mathbf{x}_i, \mathbf{x}_n) \end{aligned}$$
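Consequently, the kernel values between a feature-space Universum sample and all training samples can be obtained from two ordinary kernel evaluations; a short sketch assuming the gaussian_kernel helper defined earlier:

```python
def universum_kernel_row(X_train, x_m, x_n, sigma, delta=0.5):
    # K(u, x_i) = (1 - delta) * K(x_i, x_m) + delta * K(x_i, x_n),
    # computed without ever forming the (unknown) feature map phi explicitly.
    # x_m and x_n are 1-D numpy arrays (one majority and one minority sample).
    k_m = gaussian_kernel(X_train, x_m[None, :], sigma).ravel()
    k_n = gaussian_kernel(X_train, x_n[None, :], sigma).ravel()
    return (1.0 - delta) * k_m + delta * k_n
```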

3.3. Proposed Reduced-Kernel Weighted Extreme Learning Machine Using Universum Samples in Feature Space (RKWELM-UFS)

Training of an ELM [42] based classifier requires the computation of the output layer weight matrix β. The proposed RKWELM-UFS uses the same equation as RKWELM [21] to obtain the output layer weight matrix β which is reproduced below:
$$\beta = \left(\frac{I}{C} + \Omega_{RKELM\text{-}UFS}^T\, W\, \Omega_{RKELM\text{-}UFS}\right)^{-1} \Omega_{RKELM\text{-}UFS}^T\, W\, T$$
where $W$ is the diagonal weight matrix, which assigns different weights to the majority class, the minority class, and the Universum instances using Equation (10); $T$ is the target vector, in which the class label of the Universum samples is set to 0 (given that the class labels of the majority and minority classes are +1 and −1, respectively); and $\Omega_{RKELM\text{-}UFS}$ is the kernel matrix of the proposed RKWELM-UFS.
In the proposed work, the Universum instances are added to the training process along with the original training instances. The reason $\beta$ can be computed in the same manner as in RKWELM is that the proposed RKWELM-UFS computes the kernel matrix $\Omega_{RKELM\text{-}UFS}$ using only the original training instances (excluding the Universum instances) as centroids. The matrix $\Omega_{RKELM\text{-}UFS}$ is obtained by augmenting the two matrices $\Omega_{KELM}$ and $\Omega_{UFS}$. The following subsections describe the computation of $\Omega_{KELM}$, $\Omega_{UFS}$, and $\Omega_{RKELM\text{-}UFS}$.

3.3.1. Computation of $\Omega_{KELM}$

The proposed work computes the kernel matrix for the $N$ original training instances, termed $\Omega_{KELM}$, in the same manner as in KELM [42]; it is represented as:
$$\Omega_{KELM} = \begin{bmatrix} K(\mathbf{x}_1, \mathbf{x}_1) & K(\mathbf{x}_1, \mathbf{x}_2) & \cdots & K(\mathbf{x}_1, \mathbf{x}_N) \\ K(\mathbf{x}_2, \mathbf{x}_1) & K(\mathbf{x}_2, \mathbf{x}_2) & \cdots & K(\mathbf{x}_2, \mathbf{x}_N) \\ \vdots & \vdots & \ddots & \vdots \\ K(\mathbf{x}_N, \mathbf{x}_1) & K(\mathbf{x}_N, \mathbf{x}_2) & \cdots & K(\mathbf{x}_N, \mathbf{x}_N) \end{bmatrix}_{N \times N}$$

3.3.2. Computation of $\Omega_{UFS}$

Equation (20) can be used to create a Universum sample $\phi(\mathbf{x}_{mn})$ between two original training samples $\phi(\mathbf{x}_m)$ and $\phi(\mathbf{x}_n)$ in the feature space. As discussed above, the transformation function $\phi(\cdot)$ is unknown to the user, so $\phi(\mathbf{x}_{mn})$ cannot be computed directly. For convenience, we will refer to the Universum sample $\phi(\mathbf{x}_{mn})$ as $\phi(\mathbf{u}_i)$. In the proposed work, without computing $\phi(\mathbf{u}_i)$, we can directly compute the corresponding kernel $K(\mathbf{u}_i, \mathbf{x}_j)$ using Equation (22). In the proposed algorithm, only the original training samples are used as centroids, so the matrix $\Omega_{UFS}$ for $p$ Universum samples and $N$ original training samples can be represented as:
$$\Omega_{UFS} = \begin{bmatrix} K(\mathbf{u}_1, \mathbf{x}_1) & K(\mathbf{u}_1, \mathbf{x}_2) & \cdots & K(\mathbf{u}_1, \mathbf{x}_N) \\ K(\mathbf{u}_2, \mathbf{x}_1) & K(\mathbf{u}_2, \mathbf{x}_2) & \cdots & K(\mathbf{u}_2, \mathbf{x}_N) \\ \vdots & \vdots & \ddots & \vdots \\ K(\mathbf{u}_p, \mathbf{x}_1) & K(\mathbf{u}_p, \mathbf{x}_2) & \cdots & K(\mathbf{u}_p, \mathbf{x}_N) \end{bmatrix}_{p \times N}$$

3.3.3. Computation of $\Omega_{RKELM\text{-}UFS}$

The addition of Universum samples to the training process requires that the original kernel matrix $\Omega_{KELM}$ be augmented to include the matrix $\Omega_{UFS}$. The final hidden layer output kernel matrix of the proposed RKWELM-UFS, denoted $\Omega_{RKELM\text{-}UFS}$, is obtained by augmenting the two matrices $\Omega_{KELM}$ and $\Omega_{UFS}$:
$$\Omega_{RKELM\text{-}UFS} = \begin{bmatrix} K(\mathbf{x}_1, \mathbf{x}_1) & K(\mathbf{x}_1, \mathbf{x}_2) & \cdots & K(\mathbf{x}_1, \mathbf{x}_N) \\ \vdots & \vdots & \ddots & \vdots \\ K(\mathbf{x}_N, \mathbf{x}_1) & K(\mathbf{x}_N, \mathbf{x}_2) & \cdots & K(\mathbf{x}_N, \mathbf{x}_N) \\ K(\mathbf{u}_1, \mathbf{x}_1) & K(\mathbf{u}_1, \mathbf{x}_2) & \cdots & K(\mathbf{u}_1, \mathbf{x}_N) \\ \vdots & \vdots & \ddots & \vdots \\ K(\mathbf{u}_p, \mathbf{x}_1) & K(\mathbf{u}_p, \mathbf{x}_2) & \cdots & K(\mathbf{u}_p, \mathbf{x}_N) \end{bmatrix}_{(N + p) \times N}$$
The output of RKWELM-UFS can be obtained using Equation (18) used in RKWELM, which is reproduced below:
$$f(\mathbf{x}) = \mathrm{sign}\left(\begin{bmatrix} K(\mathbf{x}_t, \mathbf{x}_1) \\ \vdots \\ K(\mathbf{x}_t, \mathbf{x}_i) \\ \vdots \\ K(\mathbf{x}_t, \mathbf{x}_N) \end{bmatrix}^T \left(\frac{I}{C} + \Omega_{RKELM\text{-}UFS}^T\, W\, \Omega_{RKELM\text{-}UFS}\right)^{-1} \Omega_{RKELM\text{-}UFS}^T\, W\, T\right)$$
Here, $\mathbf{x}_t$ represents the test instance and $\mathbf{x}_i$ represents the $i$th training instance, for $i = 1, 2, \ldots, N$.
Algorithm 1 Pseudocode of the proposed RKWELM-UFS
INPUT: Training dataset $X = \{(\mathbf{x}_i, t_i)\}_{i=1}^{N}$
Number of Universum samples to be generated: $p$
OUTPUT: Output weight matrix $\beta$
1: Calculate the kernel matrix $\Omega_{KELM} \in \mathbb{R}^{N \times N}$ shown in Equation (24) for the $N$ original training instances using Equation (21).
2: Calculate the kernel matrix $\Omega_{UFS} \in \mathbb{R}^{p \times N}$ shown in Equation (25) for the $N$ training instances and $p$ Universum instances as follows:
       for j = 1 to p
         Randomly select one majority instance $\mathbf{x}_m$
         Randomly select one minority instance $\mathbf{x}_n$
         for i = 1 to N
           Calculate $K(\mathbf{x}_j^{mn}, \mathbf{x}_i)$ using Equation (22)
         end
       end
3: Augment the matrix $\Omega_{KELM}$ with the matrix $\Omega_{UFS}$ to obtain the reduced kernel matrix using Universum samples, $\Omega_{RKELM\text{-}UFS}$, shown in Equation (26).
4: Obtain the output weight matrix $\beta$ using Equation (23).
5: Determine the class label of an instance $\mathbf{x}$ using Equation (27).
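Assuming the helper functions sketched in the earlier sections (gaussian_kernel, welm_weight_matrix, and universum_kernel_row) are available, Algorithm 1 can be rendered as the following Python sketch. The +1/−1 label convention and the treatment of the Universum label 0 as a third class in the weighting scheme are our reading of the text, not verified implementation details:

```python
import numpy as np

def train_rkwelm_ufs(X, t, p, C, sigma, delta=0.5, seed=None):
    # X: N x n training data; t: labels (+1 majority, -1 minority);
    # p: number of Universum samples to generate in the feature space.
    rng = np.random.default_rng(seed)
    N = len(X)
    majority, minority = X[t == +1], X[t == -1]
    # Step 1: kernel matrix of the N original training samples (the centroids).
    omega_kelm = gaussian_kernel(X, X, sigma)
    # Step 2: kernel rows of the p feature-space Universum samples.
    omega_ufs = np.empty((p, N))
    for j in range(p):
        x_m = majority[rng.integers(len(majority))]
        x_n = minority[rng.integers(len(minority))]
        omega_ufs[j] = universum_kernel_row(X, x_m, x_n, sigma, delta)
    # Step 3: augmented ((N + p) x N) kernel matrix.
    omega = np.vstack([omega_kelm, omega_ufs])
    # Universum samples get target label 0; weights follow the WELM scheme.
    T = np.concatenate([t, np.zeros(p)])
    W = welm_weight_matrix(T)
    # Step 4: beta = (I/C + Omega^T W Omega)^(-1) Omega^T W T
    beta = np.linalg.solve(np.eye(N) / C + omega.T @ W @ omega,
                           omega.T @ W @ T)
    return beta

def predict_rkwelm_ufs(X_train, X_test, beta, sigma):
    # Step 5: f(x) = sign(K(x, x_1..x_N) beta)
    return np.sign(gaussian_kernel(X_test, X_train, sigma) @ beta)
```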

3.4. Computational Complexity

Training an ELM-based classification algorithm requires obtaining the output layer weight matrix $\beta$. For the proposed RKWELM-UFS, $\beta$ is obtained using Equation (23), which is reproduced below:
$$\beta = \left(\frac{I}{C} + \Omega_{RKELM\text{-}UFS}^T\, W\, \Omega_{RKELM\text{-}UFS}\right)^{-1} \Omega_{RKELM\text{-}UFS}^T\, W\, T$$
Here, $\Omega_{RKELM\text{-}UFS}$ is a matrix of size $(N + p) \times N$, where $N$ is the number of training instances and $p$ is the number of Universum samples. The weight matrix $W$ is of size $(N + p) \times (N + p)$, and the target matrix $T$ is of size $(N + p) \times c$, where $c$ is the number of target class labels; here, $c = 2$ because binary classification problems are considered. To compute $\Omega_{RKELM\text{-}UFS}$, we first need to compute $\Omega_{KELM}$ and $\Omega_{UFS}$. In the following steps, the computational complexity of computing $\beta$ is identified step by step:
  • The computational complexity of calculating $\Omega_{KELM}$, i.e., the kernel matrix shown in Equation (24), is $O(nN^2)$, where $n$ is the number of features of the training data in the input space.
  • The computational complexity of calculating the matrix $\Omega_{UFS}$ shown in Equation (25) is $O(p)$.
  • The computational complexity of computing the output weights $\beta$ can be broken down as follows:
    3.1 The matrix multiplication $\Omega_{RKELM\text{-}UFS}^T\, W\, \Omega_{RKELM\text{-}UFS}$ has computational complexity $O(2N(N + p)^2)$.
    3.2 The computational complexity of computing the inverse of the $N \times N$ matrix obtained in Step 3.1 is $O(N^3)$.
    3.3 The computational complexity of the matrix multiplication $\Omega_{RKELM\text{-}UFS}^T\, W\, T$ is $O(N^2(N + p) + Nc(N + p))$.
    3.4 The computational complexity of multiplying the two matrices obtained in Step 3.1 and Step 3.3 is $O(N^2 c)$.
The final computational complexity of calculating $\beta$ is $O(2N(N + p)^2 + N^3 + N^2(N + p) + nN^2 + N^2 c + Nc(N + p) + p)$. This can be simplified to $O(N^3)$ because $c = 2$, $n$ is smaller than $N$, and the maximum value of $p$ is $N$.

4. Experimental Setup and Result Analysis

This section describes the experiments performed to evaluate the proposed work, including the specifications of the datasets used for experimentation, the parameter settings of the proposed algorithm, the evaluation metrics used for performance comparison, and the results obtained through the experiments and the performance comparison with the state-of-the-art classifiers.

4.1. Dataset Specifications

The proposed work uses 44 binary class-imbalanced datasets for the experiments. These datasets were downloaded from the KEEL dataset repository [44,45] in 5-fold cross-validation format. Table 2 provides the specifications of these datasets. In Table 2, # Attributes denotes the number of features, # Instances denotes the number of instances, and IR denotes the class imbalance ratio of the presented datasets. The class imbalance ratio (IR) for a binary class dataset can be defined as follows:
$$IR = \frac{\text{number of instances in the majority class}}{\text{number of instances in the minority class}}$$
The datasets used for the experiments are normalized to the range [−1, 1] using min-max normalization, as given by the following equation:
$$x' = \frac{x - \min_n}{\max_n - \min_n} \times 2 - 1$$
Here, $x$ denotes the original value of the $n$th feature, $\min_n$ denotes the minimum value of the $n$th feature, and $\max_n$ denotes its maximum value.
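A short sketch of the normalization and of the imbalance-ratio computation described above (function names are ours):

```python
import numpy as np

def minmax_to_pm1(X):
    # x' = (x - min_n) / (max_n - min_n) * 2 - 1, applied feature-wise.
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn) * 2.0 - 1.0

def imbalance_ratio(t):
    # IR = (# majority-class instances) / (# minority-class instances).
    counts = np.unique(t, return_counts=True)[1]
    return counts.max() / counts.min()
```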

4.2. Evaluation Metrics

The confusion matrix, also called the error matrix, can be employed to evaluate the performance of a classification model, as it allows the performance of an algorithm to be visualized. In a confusion matrix, TP denotes True Positive, TN denotes True Negative, FP denotes False Positive, and FN denotes False Negative.
Accuracy is not a suitable measure for evaluating the performance of a classifier when dealing with a class-imbalanced problem. The other performance metrics used for performance evaluation in such problems are the G-mean and AUC (area under the ROC curve). The AUC measures the entire two-dimensional area under the ROC curve. The ROC, or receiver operating characteristic curve, is a graph that shows the performance of the model by plotting the true positive rate against the false positive rate.
$$G\text{-}mean = \sqrt{TP_{rate} \times TN_{rate}}$$
$$AUC = \frac{TP_{rate} + TN_{rate}}{2}$$
Here,
$$TP_{rate} = \frac{TP}{TP + FN} \quad \text{and} \quad TN_{rate} = \frac{TN}{TN + FP}$$
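Both metrics follow directly from the confusion-matrix counts; a short sketch:

```python
def gmean_and_auc(tp, fn, tn, fp):
    # G-mean = sqrt(TP_rate * TN_rate); AUC = (TP_rate + TN_rate) / 2.
    tp_rate = tp / (tp + fn)
    tn_rate = tn / (tn + fp)
    return (tp_rate * tn_rate) ** 0.5, (tp_rate + tn_rate) / 2.0
```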

4.3. Parameter Settings

The proposed RKWELM-UFS creates Universum samples between randomly selected pairs of majority and minority samples. Because of this randomness, this work reports the mean (denoted as tstR or TestResult) and standard deviation (denoted as std) of the test G-mean and test AUC obtained over 10 trials. The proposed RKWELM-UFS has two parameters, namely the regularization parameter $C$ and the kernel width parameter $\sigma$ (denoted as KP). The optimal values of these parameters are obtained using grid search, by varying them over the ranges $\{2^{-18}, 2^{-16}, \ldots, 2^{48}, 2^{50}\}$ and $\{2^{-18}, 2^{-16}, \ldots, 2^{18}, 2^{20}\}$, respectively.
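A hedged sketch of this grid search is given below; the evaluate callback is a hypothetical placeholder for the 5-fold cross-validation used in the experiments:

```python
import itertools

def grid_search(evaluate):
    # evaluate(C, sigma) must return a validation score (e.g., the mean
    # cross-validated G-mean); here it is a hypothetical placeholder.
    C_grid = [2.0 ** k for k in range(-18, 51, 2)]      # 2^-18, 2^-16, ..., 2^50
    sigma_grid = [2.0 ** k for k in range(-18, 21, 2)]  # 2^-18, 2^-16, ..., 2^20
    return max(itertools.product(C_grid, sigma_grid),
               key=lambda cs: evaluate(*cs))
```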

4.4. Experimental Results and Performance Comparison

The proposed RKWELM-UFS is compared with three sets of algorithms used to handle class imbalance learning. The first set contains existing approaches that use Universum samples in the classification model generation to handle class-imbalanced problems, such as MUEKL [16] and USVM [9]. The second set consists of single classifiers such as KELM [46], WKELM [18], CCR-KELM [19], and WKSMOTE [23], which are used to handle class-imbalanced problems. The third set contains popular ensemble classifiers such as RUSBoost [36], BWELM [41], UBRKELM-MV [21], UBRKELM-SV [21], UBKELM-MV [22], and UBKELM-SV [22].
The statistical t-test and Wilcoxon signed-rank test are used to evaluate the performance of the proposed RKWELM-UFS and other methods in consideration. In the t-test result, the value of H (null hypothesis) is 1 if the test rejects the null hypothesis at the 5% significance level, and 0 otherwise.
In the Wilcoxon signed-rank test result, the value of H (null hypothesis) is 1 if the test rejects, at the 5% significance level, the null hypothesis that there is no difference between the medians. In the statistical tests, the p-value indicates the level of significance of the difference between the compared algorithms; the lower the p-value, the more significant the difference. This work uses AUC and G-mean as the measures for performance evaluation. The AUC results of the classifiers MUEKL and USVM shown in Table 3 are taken from the MUEKL paper [16].

4.4.1. Performance Analysis in Terms of AUC

Table 3, Table 4 and Table 5 provide the performance of the proposed RKWELM-UFS and the other classification models in terms of AUC. The reported test AUC of the proposed RKWELM-UFS given in Table 3, Table 4 and Table 5 is the average test AUC obtained over 10 trials, using 5-fold cross-validation in each trial. Table 3 provides the performance of the proposed RKWELM-UFS and the existing Universum-based classifiers MUEKL and USVM on 35 datasets in terms of average AUC, where the proposed RKWELM-UFS outperforms the other classifiers on 32 datasets. Table 4 provides the performance of the proposed RKWELM-UFS and the existing single classifiers KELM, WKELM, CCR-KELM, and WKSMOTE on 21 datasets in terms of average AUC, where the proposed RKWELM-UFS outperforms the other classifiers on 14 datasets. Table 5 provides the performance of the proposed RKWELM-UFS and the existing ensembles of classifiers RUSBoost, BWELM, UBRKELM-MV, UBRKELM-SV, UBKELM-MV, and UBKELM-SV on 21 datasets in terms of average AUC, where the proposed RKWELM-UFS outperforms the other classifiers on 10 datasets.
Figure 1a–c shows the boxplot diagrams for the AUC results of the classifiers on the datasets shown in Table 3, Table 4 and Table 5, respectively. The boxplots create a visual representation of the data to help compare performance. It can be seen in Figure 1a,b that the proposed RKWELM-UFS has the highest median value and the smallest interquartile range, which shows that RKWELM-UFS performs better than MUEKL, USVM, KELM, WKELM, CCR-KELM, and WKSMOTE. It can be seen in Figure 1c that RKWELM-UFS performs better than RUSBoost. Table 6 provides the t-test results and Table 7 provides the Wilcoxon signed-rank test results on the AUC values of the algorithms listed in Table 3, Table 4 and Table 5. The results provided in Table 6 and Table 7 suggest that the proposed RKWELM-UFS performs significantly better than MUEKL, USVM, KELM, WKELM, CCR-KELM, RUSBoost, and BWELM, and that its performance is approximately similar to that of WKSMOTE, UBRKELM-MV, UBRKELM-SV, UBKELM-MV, and UBKELM-SV in terms of AUC.

4.4.2. Performance Analysis in Terms of G-mean

Table 8 and Table 9 provide the performance of the proposed RKWELM-UFS and the other classification models in terms of the G-mean. The reported test G-mean of the proposed RKWELM-UFS given in Table 8 and Table 9 is the average test G-mean obtained over 10 trials, using 5-fold cross-validation in each trial. Table 8 provides the performance of the proposed RKWELM-UFS and the existing single classifiers KELM, WKELM, CCR-KELM, and WKSMOTE on 21 datasets in terms of average G-mean, where the proposed RKWELM-UFS outperforms the other classifiers on 16 datasets. Table 9 provides the performance of the proposed RKWELM-UFS and the existing ensembles of classifiers RUSBoost, BWELM, UBRKELM-MV, UBRKELM-SV, UBKELM-MV, and UBKELM-SV on 21 datasets in terms of average G-mean, where the proposed RKWELM-UFS outperforms the other classifiers on seven datasets.
Figure 2a,b shows the boxplot diagrams for the G-mean results of the classifiers on the datasets shown in Table 8 and Table 9, respectively. It can be seen in Figure 2a that the proposed RKWELM-UFS has the highest median value and the smallest interquartile range, which shows that RKWELM-UFS performs better than KELM, WKELM, CCR-KELM, and WKSMOTE in terms of the G-mean. It can be seen in Figure 2b that RKWELM-UFS performs better than RUSBoost and BWELM in terms of the G-mean. Table 10 provides the t-test results and Table 11 provides the Wilcoxon signed-rank test results on the G-mean values of the algorithms listed in Table 8 and Table 9. The results provided in Table 10 and Table 11 suggest that the proposed RKWELM-UFS performs significantly better than KELM, CCR-KELM, WKSMOTE, and RUSBoost, and performs approximately similarly to WKELM, BWELM, UBRKELM-MV, UBRKELM-SV, UBKELM-MV, and UBKELM-SV in terms of the G-mean.

5. Conclusions and Future Work

The use of additional data for training along with the original training data has been employed in many approaches. Universum data are used to add prior knowledge about the distribution of the data to the classification model. Various ELM-based classification models have been suggested to handle the class imbalance problem, but none of these models use such prior knowledge. The proposed RKWELM-UFS is the first attempt to employ Universum data to enhance the performance of the RKWELM classifier. This work generates the Universum samples in the feature space using the kernel trick; the reason for creating the Universum instances in the feature space is that the mapping of input data to the feature space is not conformal. The proposed work is evaluated on 44 benchmark datasets with imbalance ratios between 0.45 and 43.80 and numbers of instances between 129 and 2308. The proposed method is compared with 10 state-of-the-art methods used for class-imbalanced dataset classification. G-mean and AUC are used as metrics to evaluate the performance of the proposed method. The paper also incorporates statistical tests to verify the significance of the performance difference between the proposed and compared methods.
In Universum data-based learning, it has been observed that the efficiency of such classifiers depends on the quality and volume of Universum data created. The methodology of choosing or creating the appropriate Universum samples should be the subject of further research. In the proposed work, the Universum samples are created between randomly selected pairs of majority and minority class samples. In the future, some strategic concepts can be used to select the majority and minority samples instead of random selection. In the future, Universum data can be incorporated in other ELM-based classification models to enhance their learning capability on class imbalance problems. The future work also includes the development of a multi-class variant of the proposed RKWELM-UFS.

Author Contributions

Conceptualization, R.C.; methodology, R.C. and S.S.; software, R.C.; validation, R.C. and S.S.; formal analysis, R.C.; investigation, R.C.; resources and data curation, R.C.; writing—original draft preparation, R.C.; writing—review and editing, S.S.; supervision, S.S. All authors have read and agreed to the published version of the manuscript.

Funding

The work received no funding from any organization, institute, or person.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Schaefer, G.; Nakashima, T. Strategies for addressing class imbalance in ensemble classification of thermography breast cancer features. In Proceedings of the 2015 IEEE Congress on Evolutionary Computation (CEC), Sendai, Japan, 25–28 May 2015; pp. 2362–2367.
  2. Sadewo, W.; Rustam, Z.; Hamidah, H.; Chusmarsyah, A.R. Pancreatic Cancer Early Detection Using Twin Support Vector Machine Based on Kernel. Symmetry 2020, 12, 667.
  3. Hao, W.; Liu, F. Imbalanced Data Fault Diagnosis Based on an Evolutionary Online Sequential Extreme Learning Machine. Symmetry 2020, 12, 1204.
  4. Mulyanto, M.; Faisal, M.; Prakosa, S.W.; Leu, J.-S. Effectiveness of Focal Loss for Minority Classification in Network Intrusion Detection Systems. Symmetry 2020, 13, 4.
  5. Tahvili, S.; Hatvani, L.; Ramentol, E.; Pimentel, R.; Afzal, W.; Herrera, F. A novel methodology to classify test cases using natural language processing and imbalanced learning. Eng. Appl. Artif. Intell. 2020, 95, 103878.
  6. Furundzic, D.; Stankovic, S.; Jovicic, S.; Punisic, S.; Subotic, M. Distance based resampling of imbalanced classes: With an application example of speech quality assessment. Eng. Appl. Artif. Intell. 2017, 64, 440–461.
  7. Mariani, V.C.; Och, S.H.; dos Santos Coelho, L.; Domingues, E. Pressure prediction of a spark ignition single cylinder engine using optimized extreme learning machine models. Appl. Energy 2019, 249, 204–221.
  8. He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
  9. Weston, J.; Collobert, R.; Sinz, F.; Bottou, L.; Vapnik, V. Inference with the universum. In Proceedings of the 23rd International Conference on Machine Learning, New York, NY, USA, 25 June 2006; pp. 1009–1016.
  10. Qi, Z.; Tian, Y.; Shi, Y.J. Twin support vector machine with Universum data. Neural Netw. 2012, 36, 112–119.
  11. Dhar, S.; Cherkassky, V. Cost-sensitive Universum-svm. In Proceedings of the 2012 11th International Conference on Machine Learning and Applications, Boca Raton, FL, USA, 12–15 December 2012; pp. 220–225.
  12. Richhariya, B.; Tanveer, M.J. EEG signal classification using Universum support vector machine. Expert Syst. Appl. 2018, 106, 169–182.
  13. Qi, Z.; Tian, Y.; Shi, Y. A nonparallel support vector machine for a classification problem with Universum learning. J. Comput. Appl. Math. 2014, 263, 288–298.
  14. Zhao, J.; Xu, Y.; Fujita, H.J. An improved non-parallel Universum support vector machine and its safe sample screening rule. Knowl. Based Syst. 2019, 170, 79–88.
  15. Tencer, L.; Reznáková, M.; Cheriet, M.J. Ufuzzy: Fuzzy models with Universum. Appl. Soft Comput. 2017, 59, 1–18.
  16. Wang, Z.; Hong, S.; Yao, L.; Li, D.; Du, W.; Zhang, J. Multiple universum empirical kernel learning. Eng. Appl. Artif. Intell. 2020, 89, 103461.
  17. Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
  18. Zong, W.; Huang, G.-B.; Chen, Y.J. Weighted extreme learning machine for imbalance learning. Neurocomputing 2013, 101, 229–242.
  19. Xiao, W.; Zhang, J.; Li, Y.; Zhang, S.; Yang, W. Class-specific cost regulation extreme learning machine for imbalanced classification. Neurocomputing 2017, 261, 70–82.
  20. Raghuwanshi, B.S.; Shukla, S. Class-specific kernelized extreme learning machine for binary class imbalance learning. Appl. Soft Comput. 2018, 73, 1026–1038.
  21. Raghuwanshi, B.S.; Shukla, S. Underbagging based reduced kernelized weighted extreme learning machine for class imbalance learning. Eng. Appl. Artif. Intell. 2018, 74, 252–270.
  22. Raghuwanshi, B.S.; Shukla, S. Class imbalance learning using UnderBagging based kernelized extreme learning machine. Neurocomputing 2019, 329, 172–187.
  23. Mathew, J.; Pang, C.K.; Luo, M.; Leong, W.H. Classification of imbalanced data by oversampling in kernel space of support vector machines. IEEE Trans. Neural Netw. Learn. Syst. 2017, 29, 4065–4076.
  24. Chen, S.; Zhang, C. Selecting informative Universum sample for semi-supervised learning. In Proceedings of the Twenty-First International Joint Conference on Artificial Intelligence, Pasadena, CA, USA, 11–17 July 2009.
  25. Zhao, J.; Xu, Y.J. A safe sample screening rule for Universum support vector machines. Knowl. Based Syst. 2017, 138, 46–57.
  26. Cherkassky, V.; Dai, W. Empirical study of the Universum SVM learning for high-dimensional data. In Proceedings of the International Conference on Artificial Neural Networks, Limassol, Cyprus, 14–17 September 2009; pp. 932–941.
  27. Hamidzadeh, J.; Kashefi, N.; Moradi, M. Combined weighted multi-objective optimizer for instance reduction in two-class imbalanced data problem. Eng. Appl. Artif. Intell. 2020, 90, 103500.
  28. Lin, W.-C.; Tsai, C.-F.; Hu, Y.-H.; Jhang, J.-S. Clustering-based undersampling in class-imbalanced data. Inf. Sci. 2017, 409, 17–26.
  29. Ofek, N.; Rokach, L.; Stern, R.; Shabtai, A. Fast-CBUS: A fast clustering-based undersampling method for addressing the class imbalance problem. Neurocomputing 2017, 243, 88–102.
  30. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
  31. Zhu, T.; Lin, Y.; Liu, Y. Synthetic minority oversampling technique for multiclass imbalance problems. Pattern Recognit. 2017, 72, 327–340.
  32. Agrawal, A.; Viktor, H.L.; Paquet, E. SCUT: Multi-class imbalanced data classification using SMOTE and cluster-based undersampling. In Proceedings of the 2015 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K), Lisbon, Portugal, 12–14 November 2015; pp. 226–234.
  33. Wang, Z.; Chen, L.; Fan, Q.; Li, D.; Gao, D. Multiple Random Empirical Kernel Learning with Margin Reinforcement for imbalance problems. Eng. Appl. Artif. Intell. 2020, 90, 103535.
  34. Raghuwanshi, B.S.; Shukla, S. Class-specific extreme learning machine for handling binary class imbalance problem. Neural Netw. 2018, 105, 206–217.
  35. Guo, W.; Wang, Z.; Hong, S.; Li, D.; Yang, H.; Du, W. Multi-kernel Support Vector Data Description with boundary information. Eng. Appl. Artif. Intell. 2021, 102, 104254.
  36. Seiffert, C.; Khoshgoftaar, T.M.; Van Hulse, J.; Napolitano, A. RUSBoost: A hybrid approach to alleviating class imbalance. IEEE Trans. Syst. Man Cybern.-Part A Syst. Hum. 2009, 40, 185–197.
  37. Haixiang, G.; Yijing, L.; Yanan, L.; Xiao, L.; Jinling, L. BPSO-Adaboost-KNN ensemble learning algorithm for multi-class imbalanced data classification. Eng. Appl. Artif. Intell. 2016, 49, 176–193.
  38. Shen, C.; Wang, P.; Shen, F.; Wang, H. UBoost: Boosting with the Universum. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 825–832.
  39. Freund, Y.; Schapire, R.; Abe, N.J. A short introduction to boosting. J. Jpn. Soc. Artif. Intell. 1999, 14, 1612.
  40. Zhang, Y.; Liu, B.; Cai, J.; Zhang, S. Ensemble weighted extreme learning machine for imbalanced data classification based on differential evolution. Neural Comput. Appl. 2017, 28, 259–267.
  41. Li, K.; Kong, X.; Lu, Z.; Wenyin, L.; Yin, J. Boosting weighted ELM for imbalanced learning. Neurocomputing 2014, 128, 15–21.
  42. Huang, G.B.; Zhou, H.M.; Ding, X.J.; Zhang, R. Extreme Learning Machine for Regression and Multiclass Classification. IEEE Trans. Syst. Man Cybern. Part B 2012, 42, 513–529.
  43. Deng, W.Y.; Ong, Y.S.; Zheng, Q.H. A Fast Reduced Kernel Extreme Learning Machine. Neural Netw. 2016, 76, 29–38.
  44. Alcala-Fdez, J.; Fernandez, A.; Luengo, J.; Derrac, J.; Garcia, S.; Sanchez, L.; Herrera, F. KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework. J. Mult.-Valued Log. Soft Comput. 2011, 17, 255–287.
  45. Alcala-Fdez, J.; Sanchez, L.; Garcia, S.; del Jesus, M.J.; Ventura, S.; Garrell, J.M.; Otero, J.; Romero, C.; Bacardit, J.; Rivas, V.M.; et al. KEEL: A software tool to assess evolutionary algorithms for data mining problems. Soft Comput. 2009, 13, 307–318.
  46. Zeng, Y.J.; Xu, X.; Shen, D.Y.; Fang, Y.Q.; Xiao, Z.P. Traffic Sign Recognition Using Kernel Extreme Learning Machines with Deep Perceptual Features. IEEE Trans. Intell. Transp. Syst. 2017, 18, 1647–1653.
Figure 1. Boxplot diagrams; each box visually represents the performance in terms of average AUC of algorithms labeled on X axis. (a) Boxplot for results of Table 3. (b) Boxplot for results given in Table 4. (c) Boxplot for results given in Table 5.
Figure 2. Boxplot diagrams; each box represents the performance, in terms of average G-mean, of the algorithms labeled on the x-axis. (a) Boxplot for the G-mean results given in Table 8. (b) Boxplot for the G-mean results given in Table 9.
Table 1. Categorization and comparison of the proposed method and the other methods used in the comparison for handling the classification of class-imbalanced problems.
Category | Strategy | Method | Basic Idea of the Method
Algorithmic | Cost-sensitive | WELM | Minimizes the weighted least-squares error to handle the class imbalance.
 | | CCR-KELM | Assigns a class-specific regularization parameter to handle class imbalance.
 | | RKELM | Uses a reduced number of centroids in the kernel function to handle class imbalance.
Data-level | Under-sampling | Random under-sampling | Uses random under-sampling to balance the imbalanced training data.
 | Oversampling | SMOTE | Creates artificial minority-class samples to balance the imbalanced training data.
 | | CSMOTE | Generates artificial samples whose dimension is equal to 5 times the number of minority samples.
 | Universum | USVM | Creates Universum data to shift the separating hyperplane of the SVM classifier.
 | | MUEKL | Combines Multiple Empirical Kernel Learning with Universum learning.
Hybrid | Data-level combined with ensemble | RUSBoost | Combines random under-sampling (RUS) with boosting.
 | | UBKELM | Uses random under-sampling with a KELM-based ensemble.
 | | UBRKELM | Uses random under-sampling with an RKELM-based ensemble.
 | Cost-sensitive combined with ensemble | BWELM | Combines boosting with WELM.
 | Data-level combined with cost-sensitive | RKWELM-UFS (the proposed method) | Creates Universum samples in the feature space and uses RKWELM as the classification algorithm.
Table 2. Specifications of the 44 benchmark datasets from the KEEL dataset repository.
Dataset Name | # Attributes | IR (%) | # Instances | Dataset Name | # Attributes | IR (%) | # Instances
abalone9-18 | 8 | 16.70 | 731 | glass6 | 9 | 6.43 | 214
ecoli-01 | 7 | 0.54 | 220 | haberman | 3 | 2.81 | 306
ecoli-013726 | 7 | 43.80 | 281 | iris0 | 4 | 2.00 | 150
ecoli-01235 | 7 | 9.26 | 244 | new-thyroid1 | 5 | 5.14 | 215
ecoli-01465 | 6 | 13.00 | 280 | new-thyroid2 | 5 | 5.14 | 215
ecoli-01472356 | 7 | 10.65 | 336 | page-blocks134 | 10 | 16.14 | 472
ecoli-014756 | 6 | 12.25 | 332 | pima | 8 | 1.87 | 768
ecoli-015 | 6 | 11.00 | 240 | segment0 | 19 | 6.02 | 2308
ecoli-02345 | 7 | 9.06 | 202 | shuttle-c0c4 | 9 | 13.78 | 1829
ecoli-026735 | 7 | 9.53 | 224 | Shuttle-c2c4 | 9 | 24.75 | 129
ecoli-0345 | 7 | 9.00 | 200 | Vehicle0 | 18 | 3.25 | 846
ecoli-03465 | 7 | 9.25 | 205 | Vehicle1 | 18 | 2.91 | 846
ecoli-034756 | 7 | 9.25 | 257 | Vehicle2 | 18 | 2.89 | 846
ecoli-0465 | 6 | 9.13 | 203 | Vehicle3 | 18 | 3.00 | 846
ecoli-06735 | 7 | 9.41 | 222 | vowel0 | 13 | 9.97 | 988
ecoli-0675 | 6 | 10.00 | 220 | wisconsin | 9 | 1.86 | 683
ecoli1 | 7 | 3.39 | 336 | yeast05679vs4 | 8 | 10.00 | 528
Ecoli2 | 7 | 5.54 | 336 | yeast1289vs7 | 8 | 31.00 | 947
glass016vs2 | 9 | 10.00 | 192 | yeast1458vs7 | 8 | 28.00 | 693
glass0123vs456 | 9 | 4.00 | 214 | yeast1vs7 | 7 | 15.00 | 459
glass1 | 9 | 1.85 | 214 | yeast1vs8 | 8 | 24.00 | 482
glass4 | 9 | 16.00 | 214 | yeast3 | 8 | 8.13 | 1484
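As a reading aid for the IR (%) column above, and assuming IR denotes the usual majority-to-minority instance ratio used for KEEL imbalanced datasets (an assumption on our part, since the table does not define it), the new-thyroid1 entry can be checked from its standard class split of 180 majority and 35 minority instances:

$$ \mathrm{IR} = \frac{N_{\mathrm{majority}}}{N_{\mathrm{minority}}} = \frac{180}{35} \approx 5.14, $$

which matches the value reported for its 215 instances.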
Table 3. Performance comparison of the proposed RKWELM-UFS with other existing Universum-based classifiers in terms of average AUC (std., KP, and C denote the standard deviation, kernel width parameter, and regularization parameter, respectively).
Dataset | MUEKL Test Result % ± std. | USVM Test Result % ± std. | (KP, C) | RKWELM-UFS Test Result % ± std.
abalone9-18 | 75.06 ± 12.53 | 69.92 ± 12.75 | (2^6, 2^26) | 94.97 ± 0.52
ecoli-01 | 98.67 ± 1.83 | 97.29 ± 2.5 | (2^2, 2^8) | 98.67 ± 0.00
ecoli-01235 | 90.68 ± 17.67 | 85.27 ± 14.45 | (2^−2, 2^−18) | 92.68 ± 0.00
ecoli013726 | 85.00 ± 22.36 | 92.88 ± 1.63 | (2^4, 2^−2) | 95.99 ± 0.00
ecoli-01465 | 90.00 ± 13.69 | 89.62 ± 11.34 | (2^10, 2^50) | 94.34 ± 1.56
ecoli01472356 | 88.00 ± 7.28 | 86.73 ± 9.4 | (2^2, 2^6) | 93.95 ± 0.11
ecoli-014756 | 91.51 ± 4.76 | 88.13 ± 4.05 | (2^12, 2^42) | 94.93 ± 0.04
ecoli-015 | 91.59 ± 10.77 | 88.41 ± 9.62 | (2^8, 2^34) | 95.91 ± 0.04
ecoli-02345 | 93.89 ± 7.89 | 88.93 ± 10.36 | (2^8, 2^34) | 94.53 ± 0.08
ecoli-026735 | 86.51 ± 11.9 | 78.82 ± 12.03 | (2^2, 2^4) | 90.22 ± 0.09
ecoli-0345 | 92.22 ± 11.73 | 91.11 ± 11.65 | (2^8, 2^40) | 91.59 ± 3.29
ecoli-03465 | 91.96 ± 6.76 | 88.24 ± 7.94 | (2^10, 2^44) | 97.15 ± 0.07
ecoli-034756 | 94.49 ± 5.20 | 88.40 ± 11.89 | (2^10, 2^40) | 95.55 ± 0.00
ecoli-0465 | 92.23 ± 10.97 | 89.19 ± 11.15 | (2^6, 2^34) | 94.70 ± 0.06
ecoli-06735 | 89.50 ± 16.97 | 86.00 ± 16.62 | (2^−2, 2^0) | 92.68 ± 0.06
ecoli-0675 | 91.75 ± 7.05 | 87.50 ± 7.55 | (2^6, 2^40) | 91.59 ± 0.21
ecoli1 | 90.48 ± 6.29 | 87.16 ± 5.03 | (2^2, 2^11) | 93.62 ± 0.29
ecoli2 | 94.31 ± 4.47 | 88.78 ± 5.23 | (2^0, 2^8) | 95.35 ± 0.04
glass1 | 79.66 ± 7.41 | 67.64 ± 4.64 | (2^−4, 2^4) | 81.67 ± 0.25
glass6 | 93.06 ± 7.08 | 90.63 ± 6.33 | (2^6, 2^10) | 93.41 ± 0.21
haberman | 64.27 ± 4.35 | 62.84 ± 4.56 | (2^8, 2^42) | 68.17 ± 0.73
new-thyroid1 | 100.00 ± 0.00 | 96.03 ± 3.7 | (2^−4, 2^24) | 100.00 ± 0.00
new-thyroid2 | 100.00 ± 0.00 | 94.37 ± 4.49 | (2^8, 2^42) | 99.98 ± 0.04
page-blocks134 | 84.21 ± 19.45 | 71.49 ± 16.64 | (2^0, 2^12) | 100.00 ± 0.00
pima | 73.03 ± 3.11 | 70.16 ± 5.63 | (2^0, 2^4) | 79.62 ± 0.05
Segment0 | 99.22 ± 0.90 | 89.02 ± 3.74 | (2^−4, 2^2) | 99.93 ± 0.00
shuttle-c0c4 | 100.00 ± 0.00 | 99.77 ± 0.27 | (2^2, 2^−8) | 100.00 ± 0.00
shuttle-c2c4 | 100.00 ± 0.00 | 100.00 ± 0.00 | (2^4, 2^18) | 100.00 ± 0.00
vehicle0 | 99.18 ± 0.66 | 81.28 ± 6.51 | (2^4, 2^34) | 99.88 ± 0.13
vehicle1 | 77.43 ± 4.15 | 62.37 ± 5.14 | (2^6, 2^32) | 90.26 ± 0.45
vehicle2 | 99.15 ± 0.68 | 83.59 ± 1.52 | (2^2, 2^28) | 99.73 ± 0.00
vehicle3 | 76.47 ± 4.81 | 65.08 ± 3.32 | (2^8, 2^40) | 89.88 ± 0.25
vowel0 | 100.00 ± 0.00 | 93.61 ± 3.63 | (2^−10, 2^−18) | 100.00 ± 0.00
wisconsin | 97.99 ± 0.61 | 97.09 ± 1.77 | (2^12, 2^42) | 99.08 ± 0.09
yeast3 | 87.72 ± 2.18 | 89.60 ± 2.12 | (2^10, 2^38) | 95.09 ± 0.12
Average | 90.26 ± 6.73 | 85.34 ± 6.83 | | 94.15 ± 0.25
Table 4. Performance comparison of the proposed RKWELM-UFS with existing single classifiers in terms of average AUC (std., KP, and C denote the standard deviation, kernel width parameter, and regularization parameter, respectively).
Dataset | KELM Test Result % | WKELM Test Result % | CCR-KELM Test Result % | WKSMOTE Test Result % | (KP, C) | RKWELM-UFS Test Result % ± std.
abalone9vs18 | 83.81 | 95.24 | 83.81 | 90.91 | (2^6, 2^26) | 94.97 ± 0.52
ecoli01vs5 | 89.50 | 92.39 | 89.50 | 96.22 | (2^8, 2^34) | 95.91 ± 0.04
glass0123vs456 | 93.84 | 97.03 | 93.84 | 98.86 | (2^−2, 2^4) | 97.66 ± 0.54
glass016vs2 | 81.36 | 84.11 | 81.36 | 83.52 | (2^16, 2^40) | 88.25 ± 0.16
glass4 | 88.08 | 93.37 | 88.00 | 94.86 | (2^8, 2^36) | 93.34 ± 0.07
haberman | 63.91 | 67.81 | 63.91 | 67.34 | (2^8, 2^42) | 68.17 ± 0.73
iris0 | 100.00 | 100.00 | 100.00 | 100.00 | (2^−10, 2^−18) | 100.00 ± 0.00
newthyroid1 | 99.60 | 100.00 | 99.60 | 99.71 | (2^−4, 2^24) | 100.00 ± 0.00
newthyroid2 | 99.60 | 100.00 | 99.60 | 99.92 | (2^8, 2^42) | 99.98 ± 0.04
pageblock13vs4 | 98.00 | 100.00 | 98.00 | 99.96 | (2^0, 2^12) | 100.00 ± 0.00
pima | 74.14 | 78.30 | 74.14 | 79.18 | (2^0, 2^4) | 79.62 ± 0.05
segment0 | 97.89 | 98.07 | 99.80 | 99.91 | (2^−4, 2^2) | 99.93 ± 0.00
shuttleC0vsC4 | 100.00 | 100.00 | 100.00 | 100.00 | (2^2, 2^−8) | 100.00 ± 0.00
shuttleC2vsC4 | 100.00 | 100.00 | 100.00 | 100.00 | (2^4, 2^18) | 100.00 ± 0.00
vowel0 | 100.00 | 100.00 | 100.00 | 57.38 | (2^−10, 2^−18) | 100.00 ± 0.00
wisconsin | 98.15 | 98.75 | 98.15 | 98.8 | (2^12, 2^42) | 99.08 ± 0.09
yeast05679vs4 | 74.54 | 85.72 | 74.54 | 78.8 | (2^−2, 2^6) | 85.63 ± 0.37
yeast1289vs7 | 65.45 | 78.73 | 65.45 | 77.51 | (2^6, 2^28) | 79.96 ± 0.66
yeast1458vs7 | 61.76 | 70.63 | 61.76 | 74.91 | (2^6, 2^18) | 71.49 ± 0.41
yeast1vs7 | 72.15 | 81.00 | 72.15 | 82.89 | (2^4, 2^16) | 83.67 ± 0.10
yeast2vs8 | 79.39 | 83.55 | 79.39 | 85.55 | (2^−2, 2^48) | 85.56 ± 0.00
Average | 86.72 | 90.70 | 86.81 | 88.87 | | 91.58 ± 0.18
Table 5. Performance Comparison of the proposed RKWELM-UFS with existing ensemble of classifiers in terms of average AUC (std, KP, and C denote the standard deviation, Kernel width parameter, and regularization parameter, respectively).
Dataset | RUSBoost TstR ± std. | BWELM TstR | UBRKELM-MV TstR ± std. | UBRKELM-SV TstR ± std. | UBKELM-MV TstR ± std. | UBKELM-SV TstR ± std. | (KP, C) | RKWELM-UFS TstR ± std.
abalone9vs18 | 93.67 ± 0.87 | 94.13 | 96.74 ± 1.21 | 96.84 ± 0.68 | 96.79 ± 0.55 | 96.55 ± 0.50 | (2^6, 2^26) | 94.97 ± 0.52
ecoli01vs5 | 93.94 ± 2.37 | 93.94 | 97.53 ± 0.30 | 97.00 ± 0.09 | 94.90 ± 2.34 | 94.13 ± 2.25 | (2^8, 2^34) | 95.91 ± 0.04
glass0123vs456 | 97.48 ± 0.72 | 96.57 | 97.61 ± 0.97 | 97.40 ± 0.34 | 97.36 ± 0.80 | 97.35 ± 0.00 | (2^−2, 2^4) | 97.66 ± 0.54
glass016vs2 | 59.25 ± 4.34 | 85.43 | 87.10 ± 1.39 | 87.00 ± 1.40 | 86.07 ± 0.46 | 86.74 ± 1.58 | (2^16, 2^40) | 88.25 ± 0.16
glass4 | 96.18 ± 2.78 | 96.37 | 97.51 ± 2.49 | 97.51 ± 1.91 | 97.38 ± 0.19 | 97.83 ± 0.00 | (2^8, 2^36) | 93.34 ± 0.07
haberman | 70.38 ± 4.29 | 68.22 | 68.50 ± 0.15 | 68.36 ± 0.22 | 69.25 ± 0.97 | 69.32 ± 1.46 | (2^8, 2^42) | 68.17 ± 0.73
iris0 | 54.85 ± 5.12 | 100.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.0 ± 0.0 | (2^−10, 2^−18) | 100.0 ± 0.0
newthyroid1 | 99.70 ± 0.51 | 100.00 | 100.00 ± 0.00 | 100.00 ± 1.11 | 100.00 ± 0.00 | 100.00 ± 0.00 | (2^−4, 2^24) | 100.00 ± 0.00
newthyroid2 | 99.60 ± 0.21 | 100.00 | 100.00 ± 0.00 | 99.98 ± 0.52 | 100.00 ± 0.00 | 100.00 ± 0.00 | (2^8, 2^42) | 99.98 ± 0.04
pageblock13vs4 | 99.86 ± 0.2 | 98.00 | 99.97 ± 1.90 | 99.91 ± 0.13 | 100.00 ± 0.00 | 99.68 ± 0.28 | (2^0, 2^12) | 100.00 ± 0.00
pima | 79.91 ± 0.93 | 79.10 | 80.03 ± 1.14 | 80.78 ± 0.48 | 79.87 ± 0.42 | 80.55 ± 0.48 | (2^0, 2^4) | 79.62 ± 0.05
segment0 | 100.0 ± 0.0 | 99.89 | 99.95 ± 0.00 | 99.91 ± 0.13 | 99.84 ± 0.00 | 99.64 ± 5.80 | (2^−4, 2^2) | 99.93 ± 0.00
shuttleC0vsC4 | 80.00 ± 6.67 | 99.20 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | (2^2, 2^−8) | 100.00 ± 0.00
shuttleC2vsC4 | 81.91 ± 7.10 | 99.20 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | (2^4, 2^18) | 100.00 ± 0.00
vowel0 | 100.00 ± 0.00 | 100.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.0 ± 0.0 | (2^−10, 2^−18) | 100.0 ± 0.00
wisconsin | 98.37 ± 0.31 | 98.15 | 99.05 ± 0.00 | 98.98 ± 0.14 | 98.72 ± 0.15 | 99.08 ± 0.14 | (2^12, 2^42) | 99.08 ± 0.09
yeast05679vs4 | 87.97 ± 1.02 | 84.08 | 86.09 ± 0.30 | 86.25 ± 1.28 | 85.86 ± 0.84 | 88.75 ± 1.07 | (2^−2, 2^6) | 85.63 ± 0.37
yeast1289vs7 | 74.91 ± 1.81 | 78.44 | 80.59 ± 0.84 | 80.60 ± 0.53 | 80.52 ± 0.08 | 80.59 ± 0.07 | (2^6, 2^28) | 79.96 ± 0.66
yeast1458vs7 | 65.94 ± 2.14 | 70.72 | 72.77 ± 0.00 | 72.77 ± 1.08 | 72.98 ± 2.11 | 73.08 ± 1.37 | (2^6, 2^18) | 71.49 ± 0.41
yeast1vs7 | 86.53 ± 2.05 | 82.89 | 82.90 ± 1.17 | 82.41 ± 0.76 | 84.06 ± 1.29 | 86.09 ± 0.58 | (2^4, 2^16) | 83.67 ± 0.10
yeast2vs8 | 79.92 ± 2.45 | 84.42 | 84.32 ± 2.14 | 84.62 ± 1.29 | 84.08 ± 0.12 | 84.35 ± 0.00 | (2^−2, 2^48) | 85.56 ± 0.00
Average | 85.73 ± 2.19 | 90.89 | 91.94 ± 0.67 | 91.92 ± 0.58 | 91.79 ± 0.49 | 92.08 ± 0.74 | | 91.58 ± 0.18
Table 6. T-test results for performance comparison in terms of AUC between the methods given in Table 3, Table 4 and Table 5.
Methods Compared | Stats | p | H (0.05)
MUEKL vs. RKWELM-UFS | [−5.610820138186867; −2.152174147527417] | 6.32 × 10^−5 | 1
USVM vs. RKWELM-UFS | [−11.440221923933274; −6.167915218923865] | 8.34 × 10^−8 | 1
KELM vs. RKWELM-UFS | [−6.93534639982032; −2.78405360017968] | 8.98 × 10^−5 | 1
WKELM vs. RKWELM-UFS | [−1.46246304953657; −0.301698855225338] | 4.81 × 10^−3 | 1
CCR-KELM vs. RKWELM-UFS | [−6.88366104582675; −2.66145323988753] | 1.33 × 10^−4 | 1
WKSMOTE vs. RKWELM-UFS | [−6.99623994510679; 1.56922089748774] | 2.01 × 10^−1 | 0
RUSBoost vs. RKWELM-UFS | [−11.4335320049264; −0.266820376026017] | 0.040906 | 1
BWELM vs. RKWELM-UFS | [−1.21328125091469; −0.165166368132924] | 0.012527 | 1
UBRKELM-MV vs. RKWELM-UFS | [−0.169420432040612; 0.87763947965966] | 0.17363 | 0
UBRKELM-SV vs. RKWELM-UFS | [−0.196630485917308; 0.872468581155405] | 0.20219 | 0
UBKELM-MV vs. RKWELM-UFS | [−0.352563193428057; 0.776972717237582] | 0.44236 | 0
UBKELM-SV vs. RKWELM-UFS | [−0.184407389564474; 1.18500738956447] | 0.14312 | 0
Table 7. Wilcoxon Signed-rank test results for performance comparison in terms of AUC between the methods given in Table 3, Table 4 and Table 5.
Methods Compared | Zval | Signed rank | p-Value | H (0.05)
MUEKL vs. RKWELM-UFS−4.6247846312.003.75 × 10−61
USVM vs. RKWELM-UFS−5.0862132490.003.65 × 10−71
KELM vs. RKWELM-UFS−3.62136517302.93 × 10−41
WKELM vs. RKWELM-UFS0102.62 × 10−31
CCR-KELM vs. RKWELM-UFS−3.62136517302.93 × 10−41
WKSMOTE vs. RKWELM-UFS−1.763789403457.78 × 10−20
RUSBoost vs. RKWELM-UFS−2.015964161510.0438037241
BWELM vs. RKWELM-UFS−2.765775456220.0056787621
UBRKELM-MV vs. RKWELM-UFS1.189301687910.2343209720
UBRKELM-SV vs. RKWELM-UFS0.930757842860.3519788420
UBKELM-MV vs. RKWELM-UFS0730.4887084960
UBKELM-SV vs. RKWELM-UFS1.241010456920.2146018860
Table 8. Performance Comparison of the proposed RKWELM-UFS with existing single classifiers in terms of average G-mean (std., KP, and C denote the standard deviation, Kernel width parameter, and regularization parameter, respectively).
Dataset | KELM Test Result % | WKELM Test Result % | CCR-KELM Test Result % | WKSMOTE Test Result % | (KP, C) | RKWELM-UFS Test Result % ± std.
abalone9vs18 | 72.71 | 89.76 | 76.56 | 91.94 | (2^6, 2^26) | 92.23 ± 0.57
ecoli01vs5 | 88.36 | 91.34 | 88.36 | 88.00 | (2^10, 2^42) | 93.01 ± 0.11
glass0123vs456 | 93.26 | 95.41 | 93.26 | 94.19 | (2^−2, 2^4) | 96.06 ± 0.55
glass016vs2 | 63.20 | 83.59 | 81.36 | 79.00 | (2^16, 2^40) | 84.46 ± 0.50
glass4 | 85.93 | 91.17 | 87.22 | 89.00 | (2^8, 2^36) | 91.49 ± 0.14
haberman | 57.23 | 66.26 | 59.71 | 65.21 | (2^4, 2^12) | 66.02 ± 0.57
iris0 | 100.00 | 100.00 | 100.00 | 100.00 | (2^−10, 2^−18) | 100.00 ± 0.00
newthyroid1 | 99.16 | 99.72 | 99.16 | 88.69 | (2^−2, 2^12) | 99.44 ± 0.00
newthyroid2 | 99.44 | 99.72 | 99.44 | 90.72 | (2^−6, 2^−18) | 99.44 ± 0.00
pageblock13vs4 | 97.89 | 98.07 | 97.84 | 97.38 | (2^0, 2^16) | 100.00 ± 0.00
pima | 71.16 | 75.58 | 73.61 | 74.00 | (2^0, 2^4) | 75.60 ± 0.19
segment0 | 97.89 | 98.07 | 99.57 | 100.00 | (2^−8, 2^−18) | 99.54 ± 0.00
shuttleC0vsC4 | 100.00 | 100.00 | 100.00 | 100.00 | (2^2, 2^−8) | 100.00 ± 0.00
shuttleC2vsC4 | 94.14 | 100.00 | 100.00 | 100.00 | (2^4, 2^18) | 100.00 ± 0.00
vowel0 | 100.00 | 100.00 | 100.00 | 100.00 | (2^−10, 2^−18) | 100.00 ± 0.00
wisconsin | 97.22 | 97.70 | 97.18 | 96.33 | (2^12, 2^42) | 97.89 ± 0.07
yeast05679vs4 | 68.68 | 82.21 | 82.24 | 81.00 | (2^−2, 2^6) | 81.03 ± 0.47
yeast1289vs7 | 60.97 | 71.41 | 59.28 | 69.83 | (2^−2, 2^0) | 73.35 ± 0.05
yeast1458vs7 | 59.89 | 69.32 | 66.24 | 67.00 | (2^−4, 2^6) | 67.54 ± 0.09
yeast1vs7 | 64.48 | 77.72 | 68.32 | 76.00 | (2^2, 2^2) | 77.77 ± 0.15
yeast2vs8 | 77.24 | 77.89 | 78.91 | 80.00 | (2^0, 2^26) | 81.36 ± 1.42
Average | 83.28 | 88.81 | 86.11 | 87.06 | | 89.74 ± 0.17
Table 9. Performance Comparison of the proposed RKWELM-UFS with existing ensemble of classifiers in terms of average G-mean (std., KP, and C denote the standard deviation, Kernel width parameter, and regularization parameter, respectively).
Dataset | RUSBoost TstR ± std. | BWELM TstR | UBRKELM-MV TstR ± std. | UBRKELM-SV TstR ± std. | UBKELM-MV TstR ± std. | UBKELM-SV TstR ± std. | (KP, C) | RKWELM-UFS TstR ± std.
abalone9vs18 | 86.40 ± 1.33 | 90.12 | 92.28 ± 0.00 | 92.30 ± 0.00 | 91.53 ± 0.96 | 91.07 ± 3.45 | (2^6, 2^26) | 92.23 ± 0.57
ecoli01vs5 | 88.92 ± 1.55 | 89.36 | 93.53 ± 0.00 | 93.09 ± 0.09 | 93.63 ± 0.43 | 94.02 ± 1.07 | (2^10, 2^42) | 93.01 ± 0.11
glass0123vs456 | 93.74 ± 0.84 | 94.21 | 95.67 ± 0.68 | 95.91 ± 0.51 | 95.24 ± 0.90 | 95.45 ± 0.25 | (2^−2, 2^4) | 96.06 ± 0.55
glass016vs2 | 52.46 ± 3.04 | 84.21 | 84.26 ± 0.50 | 84.42 ± 0.35 | 84.48 ± 0.43 | 83.89 ± 1.29 | (2^16, 2^40) | 84.46 ± 0.50
glass4 | 87.31 ± 2.82 | 90.30 | 91.69 ± 2.10 | 91.57 ± 2.86 | 92.91 ± 2.82 | 92.86 ± 3.30 | (2^8, 2^36) | 91.49 ± 0.14
haberman | 53.36 ± 7.21 | 65.14 | 66.34 ± 0.13 | 70.20 ± 4.23 | 66.70 ± 0.88 | 66.49 ± 1.50 | (2^4, 2^12) | 66.02 ± 0.57
iris0 | 19.85 ± 10.38 | 100.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | (2^−10, 2^−18) | 100.0 ± 0.0
newthyroid1 | 98.05 ± 0.95 | 100.00 | 99.55 ± 0.20 | 99.61 ± 0.14 | 99.29 ± 0.49 | 99.47 ± 0.13 | (2^−2, 2^12) | 99.44 ± 0.00
newthyroid2 | 96.94 ± 0.91 | 99.72 | 99.44 ± 0.13 | 99.44 ± 0.13 | 99.13 ± 0.00 | 99.30 ± 0.08 | (2^−6, 2^−18) | 99.44 ± 0.00
pageblock13vs4 | 97.96 ± 1.21 | 99.89 | 99.41 ± 0.12 | 99.91 ± 0.13 | 100.00 ± 0.00 | 100.00 ± 0.00 | (2^0, 2^16) | 100.00 ± 0.00
pima | 70.34 ± 1.45 | 75.48 | 76.11 ± 0.21 | 76.22 ± 0.22 | 75.76 ± 0.31 | 75.84 ± 0.34 | (2^0, 2^4) | 75.60 ± 0.19
segment0 | 99.99 ± 0.00 | 99.89 | 99.80 ± 0.13 | 99.68 ± 0.28 | 99.63 ± 1.10 | 99.64 ± 5.80 | (2^−8, 2^−18) | 99.54 ± 0.00
shuttleC0vsC4 | 60.00 ± 13.33 | 100.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | (2^2, 2^−8) | 100.00 ± 0.00
shuttleC2vsC4 | 68.50 ± 15.89 | 100.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | (2^4, 2^18) | 100.00 ± 0.00
vowel0 | 100.00 ± 0 | 100.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | (2^−10, 2^−18) | 100.00 ± 0.00
wisconsin | 95.46 ± 0.77 | 97.18 | 97.81 ± 0.00 | 97.81 ± 0.00 | 97.72 ± 0.20 | 97.79 ± 0.18 | (2^12, 2^42) | 97.89 ± 0.07
yeast05679vs4 | 77.55 ± 1.63 | 80.96 | 81.82 ± 1.98 | 82.62 ± 0.09 | 82.24 ± 0.33 | 83.45 ± 2.29 | (2^−2, 2^6) | 81.03 ± 0.47
yeast1289vs7 | 67.83 ± 2.96 | 72.67 | 75.54 ± 0.84 | 75.27 ± 0.12 | 74.28 ± 1.05 | 74.73 ± 1.72 | (2^−2, 2^0) | 73.35 ± 0.05
yeast1458vs7 | 59.59 ± 3.43 | 69.87 | 69.54 ± 1.48 | 69.54 ± 1.74 | 71.24 ± 1.69 | 70.15 ± 1.31 | (2^−4, 2^6) | 67.54 ± 0.09
yeast1vs7 | 73.49 ± 1.79 | 77.72 | 78.90 ± 7.44 | 78.41 ± 0.76 | 77.73 ± 0.00 | 77.90 ± 1.54 | (2^2, 2^2) | 77.77 ± 0.15
yeast2vs8 | 72.16 ± 1.83 | 78.35 | 80.77 ± 2.42 | 80.10 ± 0.22 | 80.05 ± 2.44 | 81.69 ± 1.51 | (2^0, 2^26) | 81.36 ± 1.42
Average | 77.39 ± 3.59 | 89.34 | 90.08 ± 0.80 | 90.30 ± 0.58 | 90.08 ± 0.58 | 90.10 ± 1.21 | | 89.74 ± 0.17
Table 10. T-test results for performance comparison in terms of G-mean between the methods given in Table 8 and Table 9.
Methods Compared | Stats | p | H (0.05)
KELM vs. RKWELM-UFS | [−8.97586793133104; −3.15547492581182] | 3.12 × 10^−4 | 1
WKELM vs. RKWELM-UFS | [−1.10027216529072; 0.0251197843383338] | 6.01 × 10^−2 | 0
CCR-KELM vs. RKWELM-UFS | [−5.34275272694909; −1.13049489209853] | 4.44 × 10^−3 | 1
WKSMOTE vs. RKWELM-UFS | [−3.63784233634963; −0.927786235078946] | 2.18 × 10^−3 | 1
RUSBoost vs. RKWELM-UFS | [−20.961933935114; −3.45036130298126] | 0.0086978 | 1
BWELM vs. RKWELM-UFS | [−1.12033850136527; 0.0575670727938417] | 7.45 × 10^−2 | 0
UBRKELM-MV vs. RKWELM-UFS | [−0.0326448040855254; 0.626063851704572] | 0.074868 | 0
UBRKELM-SV vs. RKWELM-UFS | [−0.0440133837744108; 0.984099098060123] | 0.070939 | 0
UBKELM-MV vs. RKWELM-UFS | [−0.207557995239237; 0.715262757143997] | 0.26467 | 0
UBKELM-SV vs. RKWELM-UFS | [−0.0647504586179029; 0.780074268141712] | 0.092621 | 0
Table 11. Wilcoxon signed-rank test results for performance comparison in terms of G-mean between the methods given in Table 8 and Table 9.
Methods Compared | Zval | Signed Rank | p-Value | H (0.05)
KELM vs. RKWELM-UFS−3.72355540601.96 × 10−41
WKELM vs. RKWELM-UFS−1.822772421386.83 × 10−20
CCR-KELM vs. RKWELM-UFS−3.28999842571.00 × 10−31
WKSMOTE vs. RKWELM-UFS−3.47935085235.03 × 10−41
RUSBoost vs. RKWELM-UFS−3.88259764311.03 × 10−41
BWELM vs. RKWELM-UFS−1.917193327365.52 × 10−20
UBRKELM-MV vs. RKWELM-UFS1.5858265791100.1127786550
UBRKELM-SV vs. RKWELM-UFS1.9171933271170.0552133760
UBKELM-MV vs. RKWELM-UFS0.723922766820.4691131520
UBKELM-SV vs. RKWELM-UFS1.52566366497.50.1270936480
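As an illustration only (not the authors' code), the sketch below shows how p and H (0.05) entries of the kind reported in Tables 6, 7, 10 and 11 can be obtained from paired per-dataset scores. The use of Python/SciPy and the array names are our assumptions; the few AUC values are copied from the MUEKL and RKWELM-UFS columns of Table 3 purely for illustration.

```python
# Illustrative sketch: paired significance tests at alpha = 0.05, analogous to the
# p and H (0.05) columns of Tables 6, 7, 10 and 11. SciPy is an assumed tool here.
from scipy import stats

alpha = 0.05
# A few paired average-AUC values (%) taken from Table 3, for illustration only.
muekl_auc      = [75.06, 90.68, 85.00, 90.00, 88.00]   # MUEKL
rkwelm_ufs_auc = [94.97, 92.68, 95.99, 94.34, 93.95]   # RKWELM-UFS on the same datasets

# Paired (dependent-samples) t-test over the per-dataset scores.
t_stat, p_t = stats.ttest_rel(muekl_auc, rkwelm_ufs_auc)
h_t = int(p_t < alpha)   # H = 1 -> the performance difference is significant at 0.05

# Wilcoxon signed-rank test over the same pairs.
w_stat, p_w = stats.wilcoxon(muekl_auc, rkwelm_ufs_auc)
h_w = int(p_w < alpha)

print(f"t-test:   p = {p_t:.4g}, H = {h_t}")
print(f"Wilcoxon: p = {p_w:.4g}, H = {h_w}")
```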