Ensemble Learning with Highly Variable Class-Based Performance
Abstract
1. Introduction
2. Materials and Methods
2.1. Extreme Learning Machines
2.2. ELM Ensembles
2.3. Ensemble Model Parameters
2.4. Simple Voting Ensemble
2.5. Weighted Majority Voting Ensemble (WMVE)
2.6. Class-Specific Soft Voting (CSSV) Ensemble
2.7. Novel Class-Based Weighted Ensemble System
2.8. Benchmarking Approach
2.9. Datasets
3. Results
4. Discussion
5. Limitations
6. Patents
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Freund, Y.; Schapire, R.E. Experiments with a new boosting algorithm. In Proceedings of the 13th International Conference on Machine Learning ICML, Bari, Italy, 3–6 July 1996; Volume 96. [Google Scholar]
- Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
- Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
- Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; Volume 1, pp. 278–282. [Google Scholar]
- Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367–378. [Google Scholar] [CrossRef]
- Yang, Q.; Wu, X. 10 challenging problems in data mining research. Int. J. Inf. Technol. Decis. Mak. 2006, 5, 597–604. [Google Scholar] [CrossRef]
- Opitz, D.W.; Shavlik, J.W. Actively searching for an effective neural network ensemble. Connect. Sci. 1996, 8, 337–354. [Google Scholar] [CrossRef]
- Deng, H.; Runger, G.; Tuv, E.; Vladimir, M. A time series forest for classification and feature extraction. Inf. Sci. 2013, 239, 142–153. [Google Scholar] [CrossRef]
- Bi, Y. The impact of diversity on the accuracy of evidential classifier ensembles. Int. J. Approx. Reason. 2012, 53, 584–607. [Google Scholar] [CrossRef]
- Chen, Z.; Duan, J.; Kang, L.; Qiu, G. Class-imbalanced deep learning via a class-balanced ensemble. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 5626–5640. [Google Scholar] [CrossRef]
- Parvin, H.; Minaei, B.; Alizadeh, H.; Beigi, A. A novel classifier method based on class weighting in huge dataset. In 8th International Symposium on Neural Networks, ISNN 2011, Guilin, China, 29 May–1 June 2011; Proceedings, Part II; Springer: Berlin/Heidelberg, Germany, 2011; Volume 8, pp. 144–150. [Google Scholar]
- Ruta, D.; Gabrys, B. Classifier selection for majority voting. Inf. Fusion 2005, 6, 63–81. [Google Scholar] [CrossRef]
- Rojarath, A.; Songpan, W.; Pong-inwong, C. Improved ensemble learning for classification techniques based on majority voting. In Proceedings of the 2016 7th IEEE International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 26–28 August 2016; pp. 107–110. [Google Scholar]
- Kuncheva, L.I.; Rodriguez, J.J. A weighted voting framework for classifiers ensembles. Knowl. Inf. Syst. 2014, 38, 259–275. [Google Scholar] [CrossRef]
- Dogan, A.; Birant, D. A weighted majority voting ensemble approach for classification. In Proceedings of the 4th International Conference on Computer Science and Engineering (UBMK), Samsun, Türkiye, 11–15 September 2019; pp. 1–6. [Google Scholar]
- Cao, J.J.; Kwong, R.; Wang, R.; Li, K.; Li, X. Class-specific soft voting based multiple extreme learning machines ensemble. Neurocomputing 2015, 149, 275–284. [Google Scholar] [CrossRef]
- Warner, B.; Ratner, E.; Lendasse, A. Edammo’s extreme AutoML technology—Benchmarks and analysis. In International Conference on Extreme Learning Machine; Springer: Berlin/Heidelberg, Germany, 2021; pp. 152–163. [Google Scholar]
- Khan, K.; Ratner, E.; Ludwig, R.; Lendasse, A. Feature bagging and extreme learning machines: Machine learning with severe memory constraints. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–7. [Google Scholar]
- Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
- Huang, G.B.; Zhou, H.; Ding, X.; Zhang, R. Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2012, 42, 513–529. [Google Scholar] [CrossRef] [PubMed]
- Lan, Y.; Soh, Y.C.; Huang, G.B. Ensemble of online sequential extreme learning machine. Neurocomputing 2009, 72, 3391–3395. [Google Scholar] [CrossRef]
- Liu, N.; Wang, H. Ensemble based extreme learning machine. IEEE Signal Process. Lett. 2010, 17, 754–757. [Google Scholar]
- Wang, H.; He, Q.; Shang, T.; Zhuang, F.; Shi, Z. Extreme learning machine ensemble classifier for large-scale data. In Proceedings of the ELM 2014, Singapore, 8–10 December 2014; Springer: Berlin/Heidelberg, Germany, 2014; Volume 1. [Google Scholar]
- Cheng, S.; Yan, J.W.; Zhao, D.F.; Wang, H.M. Short-term load forecasting method based on ensemble improved extreme learning machine. J. Xi’an Jiaotong Univ. 2009, 43, 106–110. [Google Scholar]
- Huang, S.; Wang, B.; Qiu, J.; Yao, J.; Wang, G.; Yu, G. Parallel ensemble of online sequential extreme learning machine based on MapReduce. Neurocomputing 2016, 174, 352–367. [Google Scholar] [CrossRef]
- Dean, J.; Ghemawat, S. MapReduce: Simplified data processing on large clusters. Commun. ACM 2008, 51, 107–113. [Google Scholar] [CrossRef]
- Wang, G.; Li, P. Dynamic Adaboost ensemble extreme learning machine. In Proceedings of the 2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), Chengdu, China, 20–22 August 2010; Volume 3, pp. V3–54. [Google Scholar]
- Cao, J.; Hao, J.; Lai, X.; Vong, C.M.; Luo, M. Ensemble extreme learning machine and sparse representation classification. J. Frankl. Inst. 2016, 353, 4526–4541. [Google Scholar] [CrossRef]
- Huang, K.; Aviyente, S. Sparse representation for signal classification. In Advances in Neural Information Processing Systems 19, Proceedings of the 2006 Conference, Vancouver, BC, Canada, 4–7 December 2006; MIT Press: Cambridge, MA, USA, 2006; Volume 19, p. 19. [Google Scholar]
- Cao, J.; Lin, Z.; Huang, G.B.; Liu, N. Voting based extreme learning machine. Inf. Sci. 2012, 185, 66–77. [Google Scholar] [CrossRef]
- Martinez-Munoz, G.; Hernandez-Lobato, D.; Suarez, A. An analysis of ensemble pruning techniques based on ordered aggregation. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 31, 245–259. [Google Scholar] [CrossRef] [PubMed]
- Zhang, L.; Zhou, W.D. Sparse ensembles using weighted combination methods based on linear programming. Pattern Recognit. 2011, 44, 97–106. [Google Scholar] [CrossRef]
- Grove, A.J.; Schuurmans, D. Boosting in the limit: Maximizing the margin of learned ensembles. In Proceedings of the AAAI/IAAI 98, Madison, WI, USA, 26–30 July 1998; pp. 692–699. [Google Scholar]
- Chen, H.; Tino, P.; Yao, X. A probabilistic ensemble pruning algorithm. In Proceedings of the Sixth IEEE International Conference on Data Mining Workshops (ICDMW’06), Hong Kong, China, 18–22 December 2006; pp. 878–882. [Google Scholar]
- Fletcher, S.; Islam, M.Z. Comparing sets of patterns with the Jaccard Index. Australas. J. Inf. Syst. 2018, 22. [Google Scholar] [CrossRef]
- Siegler, R. Balance Scale. In UCI Machine Learning Repository; UC Irvine: Irvine, CA, USA, 1994. [Google Scholar]
- Lim, T.S. Contraceptive Method Choice. In UCI Machine Learning Repository; UC Irvine: Irvine, CA, USA, 1997. [Google Scholar]
- Alcock, R. Synthetic Control Chart Time Series. In UCI Machine Learning Repository; UC Irvine: Irvine, CA, USA, 1999. [Google Scholar]
- Bohanec, M. Car evaluation. In UCI Machine Learning Repository; UC Irvine: Irvine, CA, USA, 1997. [Google Scholar]
- Reyes-Ortiz, J.; Anguita, D.; Ghio, A.; Oneto, L.; Parra, X. Human Activity Recognition Using Smartphones. In UCI Machine Learning Repository; UC Irvine: Irvine, CA, USA, 2012. [Google Scholar]
- Realinho, V.; Martins, M.; Machado, J.; Baptista, L. Predict Students’ Dropout and Academic Success. In UCI Machine Learning Repository; UC Irvine: Irvine, CA, USA, 2021. [Google Scholar]
- Ciarelli, P.; Oliveira, E. CNAE-9. In UCI Machine Learning Repository; UC Irvine: Irvine, CA, USA, 2012. [Google Scholar]
- Fisher, R.A. Iris. In UCI Machine Learning Repository; UC Irvine: Irvine, CA, USA, 1988. [Google Scholar]
Dataset | Min # of Neurons | Mid # of Neurons | Max # of Neurons |
---|---|---|---|
Balance scale | 10 | 29 | 84 |
Synthetic control | 10 | 42 | 180 |
Contraceptive method choice | 10 | 38 | 147 |
Car evaluation | 10 | 42 | 173 |
Activity recognition | 10 | 129 | 1683 |
Student success | 10 | 67 | 442 |
CNAE9 | 10 | 74 | 540 |
Iris | 10 | 12 | 15 |
Dry bean | 10 | 117 | 1361 |
Yeast | 10 | 39 | 148 |
Dataset | # Instances | # Features | # Classes | Normalized Class St.dev. |
---|---|---|---|---|
Balance scale | 839 | 23 | 3 | 0.6623 |
Synthetic control | 600 | 60 | 6 | 0.00 |
Contraceptive method choice | 1473 | 9 | 3 | 0.3035
Car evaluation | 1728 | 6 | 4 | 1.2495 |
Activity recognition | 10,299 | 561 | 6 | 0.1362 |
Student success | 4424 | 36 | 3 | 0.4808 |
CNAE9 | 1080 | 856 | 9 | 0.00 |
Iris | 150 | 4 | 3 | 0.000 |
Dry bean | 13,610 | 15 | 7 | 0.4951 |
Yeast | 1483 | 9 | 10 | 1.1711 |
Dataset | CSSV | Class (Ours) | WMVE | Single Vote |
---|---|---|---|---|
Balance | | | |
Accuracy | 0.9073 | 0.9186 | 0.9169 | 0.9169 |
F1 score | 0.7475 | 0.7499 | 0.7319 | 0.7319 |
Precision | 0.8521 | 0.8584 | 0.8305 | 0.8305 |
Recall | 0.7289 | 0.7322 | 0.7122 | 0.7122 |
Avg. Jaccard | 0.8345 | 0.8507 | 0.8478 | 0.8478 |
Synthetic | | | |
Accuracy | 0.9233 | 0.9633 | 0.9523 | 0.9456 |
F1 score | 0.9225 | 0.9629 | 0.9523 | 0.9456 |
Precision | 0.9219 | 0.9657 | 0.9564 | 0.9423 |
Recall | 0.9106 | 0.9564 | 0.9533 | 0.9416 |
Avg. Jaccard | 0.8606 | 0.9297 | 0.9115 | 0.8997 |
CMC | | | |
Accuracy | 0.5438 | 0.5275 | 0.5275 | 0.5268 |
F1 score | 0.5257 | 0.5161 | 0.5161 | 0.5154 |
Precision | 0.5216 | 0.5214 | 0.5214 | 0.5208 |
Recall | 0.5300 | 0.5294 | 0.5294 | 0.5282 |
Avg. Jaccard | 0.3624 | 0.3593 | 0.3592 | 0.3585 |
Car | | | |
Accuracy | 0.9456 | 0.9138 | 0.9103 | 0.9051 |
F1 score | 0.8777 | 0.8417 | 0.8357 | 0.8269 |
Precision | 0.9114 | 0.9096 | 0.9064 | 0.8991 |
Recall | 0.8315 | 0.8244 | 0.8260 | 0.8112 |
Avg. Jaccard | 0.8973 | 0.8417 | 0.8357 | 0.8269 |
Activity recognition | | | |
Accuracy | 0.9494 | 0.9501 | 0.9501 | 0.9423 |
F1 score | 0.8697 | 0.8735 | 0.8735 | 0.8613 |
Precision | 0.9178 | 0.9213 | 0.9213 | 0.9115 |
Recall | 0.9101 | 0.9134 | 0.9134 | 0.9090 |
Avg. Jaccard | 0.9037 | 0.9050 | 0.9050 | 0.8909 |
Student success | | | |
Accuracy | 0.7511 | 0.7468 | 0.7462 | 0.7455 |
F1 score | 0.6736 | 0.6723 | 0.6708 | 0.6695 |
Precision | 0.6895 | 0.6888 | 0.6877 | 0.6787 |
Recall | 0.6704 | 0.6652 | 0.6638 | 0.6589 |
Avg. Jaccard | 0.6018 | 0.5965 | 0.5956 | 0.5948 |
CNAE-9 | | | |
Accuracy | 0.8463 | 0.9194 | 0.9167 | 0.9093 |
F1 score | 0.8484 | 0.9204 | 0.9171 | 0.9104 |
Precision | 0.8874 | 0.9312 | 0.9273 | 0.9185 |
Recall | 0.8715 | 0.9194 | 0.9167 | 0.9058 |
Avg. Jaccard | 0.7350 | 0.8516 | 0.8468 | 0.8348 |
Iris | | | |
Accuracy | 0.9400 | 0.9467 | 0.9467 | 0.9467 |
F1 score | 0.9385 | 0.9458 | 0.9458 | 0.9458 |
Precision | 0.9485 | 0.9549 | 0.9549 | 0.9549 |
Recall | 0.9359 | 0.9467 | 0.9467 | 0.9467 |
Avg. Jaccard | 0.8904 | 0.9029 | 0.9029 | 0.9029 |
Dry bean | | | |
Accuracy | 0.9255 | 0.9258 | 0.9254 | 0.9249 |
F1 score | 0.9392 | 0.9391 | 0.9389 | 0.9382
Precision | 0.9355 | 0.9356 | 0.9353 | 0.9349 |
Recall | 0.9370 | 0.9371 | 0.9368 | 0.9329 |
Avg. Jaccard | 0.8837 | 0.8840 | 0.8836 | 0.8789 |
Yeast | | | |
Accuracy | 0.5400 | 0.5381 | 0.5280 | 0.5219 |
F1 score | 0.5015 | 0.4936 | 0.4943 | 0.4889
Precision | 0.5360 | 0.5322 | 0.5358 | 0.5298 |
Recall | 0.4936 | 0.4936 | 0.4924 | 0.4901 |
Avg. Jaccard | 0.3766 | 0.3735 | 0.3765 | 0.3701 |
Dataset | CSSV | Class (Ours) | WMVE | Single Vote |
---|---|---|---|---|
Balance | 4 | 1 | 2 | 3 |
Synthetic | 4 | 1 | 2 | 3 |
CMC | 1 | 2 | 2 | 4 |
Car | 1 | 2 | 3 | 4 |
Activity recognition | 3 | 1 | 1 | 4 |
Student | 1 | 2 | 3 | 4 |
CNAE9 | 4 | 1 | 2 | 3 |
Iris | 4 | 1 | 1 | 1 |
Dry bean | 2 | 1 | 3 | 4 |
Yeast | 1 | 2 | 3 | 4 |
Average rank | 2.5 | 1.4 | 2.2 | 3.4 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).