Article

Predicting the Performance of Ensemble Classification Using Conditional Joint Probability

Iqbal Murtza, Jin-Young Kim and Muhammad Adnan
1 Education and Research Center for IoT Convergence Intelligent City Safety Platform, Chonnam National University, Gwangju 61186, Republic of Korea
2 Department of Creative Technologies, Faculty of Computing & AI, Air University, Islamabad 44230, Pakistan
3 Department of Intelligent Electronics and Computer Engineering, Chonnam National University, Gwangju 61186, Republic of Korea
4 Department of Technology and Safety, UiT the Arctic University of Norway, 9019 Tromsø, Norway
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(16), 2586; https://doi.org/10.3390/math12162586
Submission received: 29 July 2024 / Revised: 14 August 2024 / Accepted: 16 August 2024 / Published: 21 August 2024
(This article belongs to the Special Issue Optimization Algorithms in Data Science: Methods and Theory)

Abstract

In many machine learning applications, a single classifier cannot achieve satisfactory performance. In such cases, an ensemble classifier is constructed from several weak base learners. Unfortunately, the construction of an ensemble classifier is typically empirical: an ensemble is built, evaluated, and discarded if its performance is unsatisfactory. In this paper, the challenging analytical problem of estimating the performance of ensemble classification from the prediction performance of the base learners is considered. The proposed formulation estimates the performance of an ensemble classifier without physically constructing it, and it is derived from the perspective of probability theory by manipulating the decision probabilities of the base learners. For this purpose, the output of a base learner (true positive, true negative, false positive, or false negative) is treated as a random variable, and the effects of logical disjunction-based and majority voting-based decision combination strategies are analyzed through conditional joint probability. Publicly available standard datasets are employed to evaluate the performance of the ensemble classifiers forecast by the proposed methodology. The results show the effectiveness of the derived formulations in estimating the performance of ensemble classification. In addition, the theoretical and experimental results show that logical disjunction-based decision combination outperforms majority voting on imbalanced datasets and in cost-sensitive scenarios.

1. Introduction

In many classification scenarios and datasets, achieving satisfactory detection performance is a critical problem [1]. In such scenarios, machine learning experts naturally turn to ensemble classification, combining multiple base classifiers (learners) to reach sufficiently accurate decisions. Among ensemble techniques, majority voting has gained significant attention from the research community because of its effectiveness, simplicity, and democratic style of combining the decisions of a population [2]. Unfortunately, the construction of ensemble classifiers has been empirical: an ensemble is first constructed, and if its performance is not satisfactory, it is discarded. In contrast, the analytical approach is deductive: a mathematical model is formulated first, and the ensemble classifier is then constructed accordingly. Compared with the undirected, trial-and-error, luck-driven effort of empirical construction, the analytical approach is purposeful, goal-oriented, and systematic. Additionally, once an analytical model is built, it can be reused to construct future models, which is not possible with an empirical approach [3].
Many applications, such as video surveillance, driving assistance, pedestrian detection, and disease diagnostics, pose additional challenges because of their cost-sensitive nature. For instance, in the surveillance of human-prohibited areas, a false negative (a missed human detection) carries a far higher cost than a false positive. Similarly, in cancer diagnosis, missing a tumor can result in severe harm or even the death of the patient, whereas falsely detecting a tumor in a healthy person costs only the additional tests that eventually identify the person as healthy. Likewise, in automated driving and pedestrian detection systems, missing a human may result in serious injury or death. Consequently, such tasks require a cost-sensitive detection system to meet the stated objectives [4,5].
In imbalanced machine learning datasets, the numbers of positive and negative samples differ significantly, which biases classification decisions towards the majority class [6,7]. Unfortunately, many datasets from cost-sensitive applications are significantly imbalanced. Moreover, the number of positive samples in these datasets is far smaller than the number of negative samples, which leads to higher false negative rates, the very error that carries the higher penalty in cost-sensitive applications [8].
Some cost-sensitive classification techniques are available in the literature. Unfortunately, these contributions mainly focus on either the classification technique [9,10] or data sampling [6,11]. For example, Zadrozny et al. [12] associated weights with training examples to achieve cost-sensitive learning. Likewise, Krawczyk et al. [13] used cost-sensitive analysis for breast thermography, whereas Thai-Nghe et al. [6] addressed the classifier's tendency to favor the majority class in imbalanced datasets by comparing several data resampling techniques and optimizing the cost ratio. Similarly, Singh et al. [14] employed transfer learning for imbalanced breast cancer classification, but their methodology lacks a dedicated component for handling class imbalance. The same holds for the work of Sleeman IV et al. [15], who employed Spark for multiclass imbalanced classification. In addition to these conventional learning techniques, a few deep learning-based techniques are also present in the literature. For example, Almarshdi et al. [16] proposed a hybrid deep learning solution for imbalanced classification, but, unfortunately, innovation in how the imbalanced data are handled is missing.
In contrast to approaches based on a single classifier [6,12,13], Liangyuan et al. [17] employed a cost-sensitive ensemble learning method based on majority voting. Fan et al. [18] proposed a pruning mechanism for base classifiers to minimize the computational cost of cost-sensitive ensemble learning. However, these techniques do not consider the role of imbalanced datasets in biasing the classifier [17,18,19].
In this line of research, some machine learning researchers have focused on the role of imbalanced datasets in the design of ensemble learning models [20]. For example, Zhang et al. [8] proposed an ensemble method for class-imbalanced datasets that splits the majority class into several subsets and trains a different base learner on the minority class samples together with each majority subset. Similarly, Yuan et al. [21] oversampled the dataset, used standard AdaBoost [22], and then applied a genetic algorithm (GA) to optimize the weights of the base classifiers. Ali et al. [23] proposed a GentleBoost ensemble for breast cancer classification by oversampling the minority samples; their work considers the probability of occurrence of each training sample to incorporate cost effects. Hou et al. [24] employed dynamic classifier selection [25] to propose a computationally intensive dynamic ensemble classifier, META-DESKNN-MI, which uses SMOTE to fix the class imbalance in the training set. Although Xu and Chetia [26] proposed an efficient implementation of dynamic ensemble classification, these remain empirical ensembles, and ensemble selection and class imbalance are treated separately.
Unfortunately, these approaches are empirical and therefore require constructing an ensemble classifier, discarding it if its performance is unsatisfactory, and trying another ensemble strategy. The literature lacks analytical analysis carried out prior to the construction of an ensemble classifier. In this research, the problem of designing an analytical model to estimate and predict the performance of ensemble classification schemes such as majority voting and logical disjunction is considered. The formulations are derived using the concept of conditional joint probability: the output label of a base learner is treated as a random variable with different probabilities for true negative (TN), false negative (FN), true positive (TP), and false positive (FP). This is the central contribution of this research. Although the formulations and derivations are generic in nature, cost-sensitive and imbalanced datasets are used to evaluate the performance of ensemble classification forecast by the derived formulations. In the experiments, both the analytical model and the experimental observations show that, for imbalanced datasets in cost-sensitive scenarios, logical disjunction outperforms the contemporary majority voting ensemble, thus providing a simple alternative in such scenarios.

2. The Formulation to Predict the Performance of Ensemble Classification

In classification, a training dataset is used to learn the feature space. After training is completed, a test sample is fed into the classifier, which predicts its output label. The output belongs to one of four categories: true positive (TP), true negative (TN), false positive (FP), or false negative (FN). Note that this output is random in nature: feeding a number of test samples generates a random sequence from the set {TP, TN, FP, FN}, and this set therefore acts as the sample space of the random experiment [27,28]. Using this concept, the methodology used to derive the probability of a true positive for the majority voting and logical disjunction ensembles is presented in Figure 1.

2.1. Probability Perspective of Classifier Outputs

This methodology is presented for binary classification problems, wherein the output decision belongs to one of four categories: TP, TN, FN, and FP. Thus, the set {TP, TN, FP, FN} is the sample space. Considering the output as a random variable, the probabilities (relative frequencies) of these events are in fact the classification performance measures (TPR, TNR, FPR, FNR) [29], as given in Equation (1):
\[
p(TP) = TPR = \frac{N_{TP}}{N}, \qquad
p(TN) = TNR = \frac{N_{TN}}{N}, \qquad
p(FP) = FPR = \frac{N_{FP}}{N}, \qquad
p(FN) = FNR = \frac{N_{FN}}{N}
\tag{1}
\]
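As a minimal sketch (not part of the original paper), the snippet below reads the four outcome probabilities of Equation (1) off a confusion matrix. The counts used in the example are hypothetical, chosen only to be consistent with the Bayes row of Table 5 (286 samples).

```python
# Minimal sketch of Equation (1): outcome probabilities as relative frequencies over all N samples.
def outcome_probabilities(n_tp, n_fn, n_fp, n_tn):
    """Return (p(TP), p(FN), p(FP), p(TN)) computed over the whole test set."""
    n = n_tp + n_fn + n_fp + n_tn
    return n_tp / n, n_fn / n, n_fp / n, n_tn / n

# Hypothetical counts on a 286-sample test set (consistent with the Bayes row of Table 5).
p_tp, p_fn, p_fp, p_tn = outcome_probabilities(39, 46, 33, 168)
print(round(p_tp, 4), round(p_fn, 4), round(p_fp, 4), round(p_tn, 4))  # 0.1364 0.1608 0.1154 0.5874
assert abs(p_tp + p_fn + p_fp + p_tn - 1.0) < 1e-12  # the four events partition the sample space
```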

2.2. Ensemble Classifiers

In machine learning, the majority voting ensemble classification technique has gained the attention of the research community because of its effectiveness and simplicity. The enhanced accuracy of ensemble classification is explained by Condorcet’s jury theorem [30], which states (for binary classification):
  • “If individual base classifiers have probabilities greater than 0.5 of classifying correctly, then as the number of base classifiers increases, the probability of correct classification by majority voting increases and approaches 1.
  • If individual base classifiers have probabilities less than 0.5 of classifying correctly, then as the number of base classifiers increases, the probability of correct classification by majority voting decreases and approaches 0.”
In addition to majority voting, this research formulates an analytical model for logical disjunction-based decision aggregation. Although the derivation is generic in nature, three base learners are considered for majority voting (MV) and two for logical disjunction (LD) for the sake of simplicity.
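As a side illustration (not from the paper), the sketch below evaluates the jury-theorem probability for n independent base classifiers, each correct with probability p. Note that the independence assumption is exactly what the derivations in the following subsections avoid.

```python
from math import comb

def majority_correct_prob(p: float, n: int) -> float:
    """P(majority vote of n independent classifiers is correct), each correct with probability p."""
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority (n odd)
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

print(majority_correct_prob(0.6, 3))   # ~0.648: base learners better than chance, voting helps
print(majority_correct_prob(0.6, 25))  # ~0.846: grows towards 1 as n increases
print(majority_correct_prob(0.4, 25))  # ~0.154: base learners worse than chance, voting hurts
```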

2.3. Mutual Dependency

Note that, since the output of a classifier from the sample space {TP, TN, FP, FN} is considered a random variable, there is mutual dependence among the base learners. For example, if one base learner’s prediction is a true positive, then another base learner’s prediction on the same sample is either a true positive or a false negative: a true positive implies that the sample is positive, so the other base learners can produce neither a true negative nor a false positive. The output predictions of the base learners are therefore not mutually independent. This mutual dependency has to be considered while formulating the conditional probability distributions for both ensemble classifications.
Consider x, y, and z as the random variables associated with the outputs of the base classifiers α, β, and γ, respectively. Thereby, if z = TP, then y|z (y given z) is either TP or FN. Similarly, if z = TP and y is either TP or FN, then x|y,z is also either TP or FN. These mutual dependencies are summarized in Table 1.
Consider X_i, Y_i, and Z_i, with i ∈ {TP, TN, FP, FN}, as the numbers of observations of each outcome for the base classifiers α, β, and γ, respectively, as summarized in Table 2.
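A tiny sketch (mine, not from the paper) makes the dependency of Table 1 explicit: once one learner’s outcome fixes the true class of the sample, the outcomes available to the other learners are restricted to that class’s two cells.

```python
# Possible outcomes of another base learner on the same sample, given one observed outcome.
COMPATIBLE = {
    'TP': {'TP', 'FN'},  # the sample is positive
    'FN': {'TP', 'FN'},
    'FP': {'FP', 'TN'},  # the sample is negative
    'TN': {'FP', 'TN'},
}
assert 'TN' not in COMPATIBLE['TP']  # a TP by one learner rules out a TN by another
```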

2.4. Formulation

In majority voting of three base learners α, β, and γ, the ensemble decision is a true positive if at least two base learner decisions are true positives; that is, either all three base learner outputs are true positives or exactly two of the three are. Considering p_α(x), p_β(y), and p_γ(z) as the probability mass functions of the individual classifier outputs x, y, z ∈ {TP, FN, FP, TN}, the probability p_MV(TP) that majority voting yields a true positive is derived using conditional joint probability. Here, p_αβγ(TP_α, TP_β, TP_γ) denotes the joint probability of the events TP_α (base learner α gives a TP), TP_β (β gives a TP), and TP_γ (γ gives a TP). Throughout these derivations, the chain rule for the joint probability of three events, P(x, y, z) = P(x | y, z) P(y | z) P(z), is used. Note that if the outputs of the base learners β and γ are true positives, then the sample is certainly positive, and base learner α has only two possible outputs: a true positive or a false negative. Thus, p_α(TP_α | TP_β, TP_γ) = X_TP / (X_TP + X_FN), and, in a similar fashion, p_β(TP_β | TP_γ) = Y_TP / (Y_TP + Y_FN) and p_γ(TP_γ) = Z_TP / Z. Using these expressions, p_αβγ(TP_α, TP_β, TP_γ) is computed as in Equation (2).
\[
p_{\alpha\beta\gamma}(TP_\alpha, TP_\beta, TP_\gamma)
= p_\alpha(TP_\alpha \mid TP_\beta, TP_\gamma)\, p_\beta(TP_\beta \mid TP_\gamma)\, p_\gamma(TP_\gamma)
= \frac{X_{TP}}{X_{TP}+X_{FN}} \cdot \frac{Y_{TP}}{Y_{TP}+Y_{FN}} \cdot \frac{Z_{TP}}{Z}
\tag{2}
\]
Similarly, by computing p_αβγ(~TP_α, TP_β, TP_γ), p_αβγ(TP_α, ~TP_β, TP_γ), and p_αβγ(TP_α, TP_β, ~TP_γ), the probability p_MV(TP) that majority voting yields a true positive is computed as in Equation (3). Figure 1a illustrates this derivation.
\[
\begin{aligned}
p_{MV}(TP) ={}& p_{\alpha\beta\gamma}(TP_\alpha, TP_\beta, TP_\gamma)
 + p_{\alpha\beta\gamma}(\sim TP_\alpha, TP_\beta, TP_\gamma)
 + p_{\alpha\beta\gamma}(TP_\alpha, \sim TP_\beta, TP_\gamma)
 + p_{\alpha\beta\gamma}(TP_\alpha, TP_\beta, \sim TP_\gamma) \\
={}& p_\alpha(TP_\alpha \mid TP_\beta, TP_\gamma)\, p_\beta(TP_\beta \mid TP_\gamma)\, p_\gamma(TP_\gamma)
 + p_\alpha(\sim TP_\alpha \mid TP_\beta, TP_\gamma)\, p_\beta(TP_\beta \mid TP_\gamma)\, p_\gamma(TP_\gamma) \\
&+ p_\alpha(TP_\alpha \mid \sim TP_\beta, TP_\gamma)\, p_\beta(\sim TP_\beta \mid TP_\gamma)\, p_\gamma(TP_\gamma)
 + p_\alpha(TP_\alpha \mid TP_\beta, \sim TP_\gamma)\, p_\beta(TP_\beta \mid \sim TP_\gamma)\, p_\gamma(\sim TP_\gamma) \\
={}& \frac{X_{TP}}{X_{TP}+X_{FN}} \frac{Y_{TP}}{Y_{TP}+Y_{FN}} \frac{Z_{TP}}{Z}
 + \frac{X_{FN}}{X_{TP}+X_{FN}} \frac{Y_{TP}}{Y_{TP}+Y_{FN}} \frac{Z_{TP}}{Z}
 + \frac{X_{TP}}{X_{TP}+X_{FN}} \frac{Y_{FN}}{Y_{TP}+Y_{FN}} \frac{Z_{TP}}{Z} \\
&+ \frac{X_{TP}}{X_{TP}+X_{FN}} \frac{Y_{TP}}{Y_{TP}+Y_{FN}} \frac{Z_{FN}}{Z_{FN}+Z_{FP}+Z_{TN}} \left(1-\frac{Z_{TP}}{Z}\right)
\end{aligned}
\tag{3}
\]
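A minimal sketch of Equation (3) is given below (my own code, not the authors’). The dictionaries hold the confusion-matrix counts of Table 2; the example counts are back-calculated from the base-learner rows of Table 5 (Breast Cancer Dataset, 286 samples), so they are an inference rather than values quoted from the paper.

```python
# Equation (3): predicted probability that three-learner majority voting yields a true positive.
def p_mv_tp(x, y, z):
    """x, y, z: confusion-matrix counts of learners alpha, beta, gamma as dicts with keys TP/FN/FP/TN."""
    Z = sum(z.values())
    px_tp = x['TP'] / (x['TP'] + x['FN'])   # p_alpha(TP_a | TP_b, TP_c): sample known to be positive
    py_tp = y['TP'] / (y['TP'] + y['FN'])   # p_beta(TP_b | TP_c)
    pz_tp = z['TP'] / Z                     # p_gamma(TP_c)
    return (px_tp * py_tp * pz_tp                                   # (TP, TP, TP)
            + (1 - px_tp) * py_tp * pz_tp                           # (~TP, TP, TP)
            + px_tp * (1 - py_tp) * pz_tp                           # (TP, ~TP, TP)
            + px_tp * py_tp                                         # (TP, TP, ~TP)
              * (z['FN'] / (z['FN'] + z['FP'] + z['TN'])) * (1 - pz_tp))

# Counts back-calculated from the Bayes, Decision Stump, and Naive Bayes rows of Table 5 (an assumption).
bayes = {'TP': 39, 'FN': 46, 'FP': 33, 'TN': 168}
stump = {'TP': 45, 'FN': 40, 'FP': 40, 'TN': 161}
nb    = {'TP': 37, 'FN': 48, 'FP': 30, 'TN': 171}
print(round(p_mv_tp(bayes, stump, nb), 4))  # 0.1372, the "Majority Voting (Predicted)" p(TP) in Table 5
```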
In logical disjunction of two base learners α and β, the ensemble decision is a true positive if at least one base learner decision is a true positive; that is, either both base learner outputs are true positives or exactly one of them is. Thus, the probability p_LD(TP) that logical disjunction yields a true positive is computed as in Equation (4).
\[
\begin{aligned}
p_{LD}(TP) ={}& p_{\alpha\beta}(TP_\alpha, TP_\beta) + p_{\alpha\beta}(\sim TP_\alpha, TP_\beta) + p_{\alpha\beta}(TP_\alpha, \sim TP_\beta) \\
={}& p_\alpha(TP_\alpha \mid TP_\beta)\, p_\beta(TP_\beta)
 + p_\alpha(\sim TP_\alpha \mid TP_\beta)\, p_\beta(TP_\beta)
 + p_\alpha(TP_\alpha \mid \sim TP_\beta)\, p_\beta(\sim TP_\beta) \\
={}& \frac{X_{TP}}{X_{TP}+X_{FN}} \frac{Y_{TP}}{Y}
 + \frac{X_{FN}}{X_{TP}+X_{FN}} \frac{Y_{TP}}{Y}
 + \frac{X_{TP}}{X_{TP}+X_{FN}} \frac{Y_{FN}}{Y_{FN}+Y_{FP}+Y_{TN}} \left(1-\frac{Y_{TP}}{Y}\right)
\end{aligned}
\tag{4}
\]
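Equation (4) can be coded in the same way; the sketch below (again mine, with the same back-calculated counts) reproduces the logical-disjunction p(TP) predicted for the Breast Cancer Dataset.

```python
# Equation (4): predicted probability that the disjunction of two learners yields a true positive.
def p_ld_tp(x, y):
    """x, y: confusion-matrix counts of learners alpha and beta as dicts with keys TP/FN/FP/TN."""
    Y = sum(y.values())
    px_tp = x['TP'] / (x['TP'] + x['FN'])   # p_alpha(TP_a | TP_b): sample known to be positive
    return (px_tp * y['TP'] / Y                    # (TP, TP)
            + (1 - px_tp) * y['TP'] / Y            # (~TP, TP)
            + px_tp * (y['FN'] / (y['FN'] + y['FP'] + y['TN'])) * (1 - y['TP'] / Y))  # (TP, ~TP)

bayes = {'TP': 39, 'FN': 46, 'FP': 33, 'TN': 168}  # back-calculated from Table 5 (an assumption)
stump = {'TP': 45, 'FN': 40, 'FP': 40, 'TN': 161}
print(round(p_ld_tp(bayes, stump), 4))  # 0.2215, the "Logical Disjunction (Predicted)" p(TP) in Table 5
```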
In majority voting of three base learners α, β, and γ, the ensemble decision is a false negative if at least two base learner decisions are false negatives; that is, either all three base learner outputs are false negatives or exactly two of them are. Thus, the probability p_MV(FN) that majority voting yields a false negative is computed as in Equation (5).
\[
\begin{aligned}
p_{MV}(FN) ={}& p_{\alpha\beta\gamma}(FN_\alpha, FN_\beta, FN_\gamma)
 + p_{\alpha\beta\gamma}(\sim FN_\alpha, FN_\beta, FN_\gamma)
 + p_{\alpha\beta\gamma}(FN_\alpha, \sim FN_\beta, FN_\gamma)
 + p_{\alpha\beta\gamma}(FN_\alpha, FN_\beta, \sim FN_\gamma) \\
={}& p_\alpha(FN_\alpha \mid FN_\beta, FN_\gamma)\, p_\beta(FN_\beta \mid FN_\gamma)\, p_\gamma(FN_\gamma)
 + p_\alpha(\sim FN_\alpha \mid FN_\beta, FN_\gamma)\, p_\beta(FN_\beta \mid FN_\gamma)\, p_\gamma(FN_\gamma) \\
&+ p_\alpha(FN_\alpha \mid \sim FN_\beta, FN_\gamma)\, p_\beta(\sim FN_\beta \mid FN_\gamma)\, p_\gamma(FN_\gamma)
 + p_\alpha(FN_\alpha \mid FN_\beta, \sim FN_\gamma)\, p_\beta(FN_\beta \mid \sim FN_\gamma)\, p_\gamma(\sim FN_\gamma) \\
={}& \frac{X_{FN}}{X_{TP}+X_{FN}} \frac{Y_{FN}}{Y_{TP}+Y_{FN}} \frac{Z_{FN}}{Z}
 + \frac{X_{TP}}{X_{TP}+X_{FN}} \frac{Y_{FN}}{Y_{TP}+Y_{FN}} \frac{Z_{FN}}{Z}
 + \frac{X_{FN}}{X_{TP}+X_{FN}} \frac{Y_{TP}}{Y_{TP}+Y_{FN}} \frac{Z_{FN}}{Z} \\
&+ \frac{X_{FN}}{X_{TP}+X_{FN}} \frac{Y_{FN}}{Y_{TP}+Y_{FN}} \frac{Z_{TP}}{Z_{TP}+Z_{FP}+Z_{TN}} \left(1-\frac{Z_{FN}}{Z}\right)
\end{aligned}
\tag{5}
\]
In logical disjunction of two base learners α and β, the ensemble decision is a false negative if both base learner decisions are false negatives. Thus, the probability p_LD(FN) that logical disjunction yields a false negative is computed as in Equation (6).
\[
p_{LD}(FN) = p_{\alpha\beta}(FN_\alpha, FN_\beta) = p_\alpha(FN_\alpha \mid FN_\beta)\, p_\beta(FN_\beta)
= \frac{X_{FN}}{X_{TP}+X_{FN}} \frac{Y_{FN}}{Y}
\tag{6}
\]
In majority voting of three base learners α, β, and γ, the ensemble decision is a false positive if at least two base learner decisions are false positives; that is, either all three base learner outputs are false positives or exactly two of the three are. Thus, the probability p_MV(FP) that majority voting yields a false positive is computed as in Equation (7).
\[
\begin{aligned}
p_{MV}(FP) ={}& p_{\alpha\beta\gamma}(FP_\alpha, FP_\beta, FP_\gamma)
 + p_{\alpha\beta\gamma}(\sim FP_\alpha, FP_\beta, FP_\gamma)
 + p_{\alpha\beta\gamma}(FP_\alpha, \sim FP_\beta, FP_\gamma)
 + p_{\alpha\beta\gamma}(FP_\alpha, FP_\beta, \sim FP_\gamma) \\
={}& p_\alpha(FP_\alpha \mid FP_\beta, FP_\gamma)\, p_\beta(FP_\beta \mid FP_\gamma)\, p_\gamma(FP_\gamma)
 + p_\alpha(\sim FP_\alpha \mid FP_\beta, FP_\gamma)\, p_\beta(FP_\beta \mid FP_\gamma)\, p_\gamma(FP_\gamma) \\
&+ p_\alpha(FP_\alpha \mid \sim FP_\beta, FP_\gamma)\, p_\beta(\sim FP_\beta \mid FP_\gamma)\, p_\gamma(FP_\gamma)
 + p_\alpha(FP_\alpha \mid FP_\beta, \sim FP_\gamma)\, p_\beta(FP_\beta \mid \sim FP_\gamma)\, p_\gamma(\sim FP_\gamma) \\
={}& \frac{X_{FP}}{X_{FP}+X_{TN}} \frac{Y_{FP}}{Y_{FP}+Y_{TN}} \frac{Z_{FP}}{Z}
 + \frac{X_{TN}}{X_{FP}+X_{TN}} \frac{Y_{FP}}{Y_{FP}+Y_{TN}} \frac{Z_{FP}}{Z}
 + \frac{X_{FP}}{X_{FP}+X_{TN}} \frac{Y_{TN}}{Y_{FP}+Y_{TN}} \frac{Z_{FP}}{Z} \\
&+ \frac{X_{FP}}{X_{FP}+X_{TN}} \frac{Y_{FP}}{Y_{FP}+Y_{TN}} \frac{Z_{TN}}{Z_{TP}+Z_{FN}+Z_{TN}} \left(1-\frac{Z_{FP}}{Z}\right)
\end{aligned}
\tag{7}
\]
In logical disjunction of two base learners α and β, the ensemble decision is a false positive if any base learner decision is a false positive; that is, either both base learner outputs are false positives or exactly one of them is. Thus, the probability p_LD(FP) that logical disjunction yields a false positive is computed as in Equation (8).
\[
\begin{aligned}
p_{LD}(FP) ={}& p_{\alpha\beta}(FP_\alpha, FP_\beta) + p_{\alpha\beta}(\sim FP_\alpha, FP_\beta) + p_{\alpha\beta}(FP_\alpha, \sim FP_\beta) \\
={}& p_\alpha(FP_\alpha \mid FP_\beta)\, p_\beta(FP_\beta)
 + p_\alpha(\sim FP_\alpha \mid FP_\beta)\, p_\beta(FP_\beta)
 + p_\alpha(FP_\alpha \mid \sim FP_\beta)\, p_\beta(\sim FP_\beta) \\
={}& \frac{X_{FP}}{X_{FP}+X_{TN}} \frac{Y_{FP}}{Y}
 + \frac{X_{TN}}{X_{FP}+X_{TN}} \frac{Y_{FP}}{Y}
 + \frac{X_{FP}}{X_{FP}+X_{TN}} \frac{Y_{TN}}{Y_{TP}+Y_{FN}+Y_{TN}} \left(1-\frac{Y_{FP}}{Y}\right)
\end{aligned}
\tag{8}
\]
In majority voting of three base learners α, β, and γ, the ensemble decision is a true negative if at least two base learner decisions are true negatives; that is, either all three base learner outputs are true negatives or exactly two of them are. Thus, the probability p_MV(TN) that majority voting yields a true negative is computed as in Equation (9).
\[
\begin{aligned}
p_{MV}(TN) ={}& p_{\alpha\beta\gamma}(TN_\alpha, TN_\beta, TN_\gamma)
 + p_{\alpha\beta\gamma}(\sim TN_\alpha, TN_\beta, TN_\gamma)
 + p_{\alpha\beta\gamma}(TN_\alpha, \sim TN_\beta, TN_\gamma)
 + p_{\alpha\beta\gamma}(TN_\alpha, TN_\beta, \sim TN_\gamma) \\
={}& p_\alpha(TN_\alpha \mid TN_\beta, TN_\gamma)\, p_\beta(TN_\beta \mid TN_\gamma)\, p_\gamma(TN_\gamma)
 + p_\alpha(\sim TN_\alpha \mid TN_\beta, TN_\gamma)\, p_\beta(TN_\beta \mid TN_\gamma)\, p_\gamma(TN_\gamma) \\
&+ p_\alpha(TN_\alpha \mid \sim TN_\beta, TN_\gamma)\, p_\beta(\sim TN_\beta \mid TN_\gamma)\, p_\gamma(TN_\gamma)
 + p_\alpha(TN_\alpha \mid TN_\beta, \sim TN_\gamma)\, p_\beta(TN_\beta \mid \sim TN_\gamma)\, p_\gamma(\sim TN_\gamma) \\
={}& \frac{X_{TN}}{X_{FP}+X_{TN}} \frac{Y_{TN}}{Y_{FP}+Y_{TN}} \frac{Z_{TN}}{Z}
 + \frac{X_{FP}}{X_{FP}+X_{TN}} \frac{Y_{TN}}{Y_{FP}+Y_{TN}} \frac{Z_{TN}}{Z}
 + \frac{X_{TN}}{X_{FP}+X_{TN}} \frac{Y_{FP}}{Y_{FP}+Y_{TN}} \frac{Z_{TN}}{Z} \\
&+ \frac{X_{TN}}{X_{FP}+X_{TN}} \frac{Y_{TN}}{Y_{FP}+Y_{TN}} \frac{Z_{FP}}{Z_{TP}+Z_{FN}+Z_{FP}} \left(1-\frac{Z_{TN}}{Z}\right)
\end{aligned}
\tag{9}
\]
In logical disjunction of two base learners α and β, the ensemble decision is a true negative if both base learner decisions are true negatives. Thus, the probability p_LD(TN) that logical disjunction yields a true negative is computed as in Equation (10).
\[
p_{LD}(TN) = p_{\alpha\beta}(TN_\alpha, TN_\beta) = p_\alpha(TN_\alpha \mid TN_\beta)\, p_\beta(TN_\beta)
= \frac{X_{TN}}{X_{FP}+X_{TN}} \frac{Y_{TN}}{Y}
\tag{10}
\]
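To see Equations (3)–(10) working together, the following self-contained sketch (my consolidation, not the authors’ code) exposes one generic predictor per ensemble rule and reproduces both “Predicted” rows of Table 5 for the Breast Cancer Dataset. The confusion-matrix counts are back-calculated from the base-learner probabilities in Table 5 and are therefore an assumption.

```python
PAIR = {'TP': 'FN', 'FN': 'TP', 'FP': 'TN', 'TN': 'FP'}  # outcomes that share the same true class

def p_ld(e, x, y):
    """Eqs. (4), (6), (8), (10): probability that the disjunction of learners x, y yields outcome e."""
    Y = sum(y.values())
    cond = x[e] / (x[e] + x[PAIR[e]])            # p_alpha(e | the true class implied by e)
    if e in ('TP', 'FP'):                        # a single vote suffices for TP/FP under disjunction
        return y[e] / Y + cond * y[PAIR[e]] / Y
    return cond * y[e] / Y                       # FN/TN require both learners to agree

def p_mv(e, x, y, z):
    """Eqs. (3), (5), (7), (9): probability that three-learner majority voting yields outcome e."""
    Z = sum(z.values())
    px = x[e] / (x[e] + x[PAIR[e]])
    py = y[e] / (y[e] + y[PAIR[e]])
    pz = z[e] / Z
    return (px * py * pz + (1 - px) * py * pz + px * (1 - py) * pz
            + px * py * (z[PAIR[e]] / (Z - z[e])) * (1 - pz))

# Back-calculated counts (an assumption) for the three Breast Cancer base learners of Table 5.
bayes = {'TP': 39, 'FN': 46, 'FP': 33, 'TN': 168}
stump = {'TP': 45, 'FN': 40, 'FP': 40, 'TN': 161}
nb    = {'TP': 37, 'FN': 48, 'FP': 30, 'TN': 171}

for e in ('TP', 'FN', 'FP', 'TN'):
    print(f"{e}: LD={p_ld(e, bayes, stump):.4f}  MV={p_mv(e, bayes, stump, nb):.4f}")
# TP: LD=0.2215  MV=0.1372
# FN: LD=0.0757  MV=0.1600
# FP: LD=0.2323  MV=0.0542
# TN: LD=0.4705  MV=0.6486   (cf. the "Predicted" rows of Table 5)
```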

3. Experiments and Results

To evaluate the proposed analytical formulations for predicting the performance of ensemble classification, datasets from the UCI Machine Learning Repository have been considered. To establish another interesting property of the proposed formulations, imbalanced datasets have been chosen. In addition to the significant difference between the number of samples in each class, these datasets are also cost-sensitive: the cost of falsely predicting a positive (minority class) sample differs from that of a negative (majority class) sample, as shown in Table 3. Because the negative samples are in the majority, the base learners tend to predict the negative class more often than the positive class, and thus p(FN) > p(FP). In addition, the cost of a false negative is greater than that of a false positive, c_FN > c_FP, where negative means healthy and positive means diseased. These datasets therefore create an intensified scenario of cost-sensitive imbalanced classification, c_FN · p(FN) > c_FP · p(FP). From this perspective, the proposed analytical formulations are evaluated on four different datasets, as described in the following subsections.
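For reference, a short sketch (mine, not from the paper) quantifies the degree of imbalance implied by the sample counts of Table 3.

```python
# Imbalance of the four UCI datasets in Table 3 (negative:positive ratio and positive prevalence).
datasets = {
    "Breast Cancer":       (85, 201),
    "Wilt":                (74, 4265),
    "Haberman's Survival": (81, 225),
    "Thoracic Surgery":    (70, 400),
}
for name, (pos, neg) in datasets.items():
    print(f"{name}: {neg / pos:.1f} negatives per positive, "
          f"{pos / (pos + neg):.1%} positive prevalence")
# Wilt is the most extreme case: ~57.6 negatives per positive (~1.7% prevalence).
```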

3.1. Breast Cancer Dataset

This dataset was generated by the Institute of Oncology, University Medical Centre Ljubljana, Yugoslavia. This binary dataset is described by 9 medical attributes and includes 85 positive (recurrence-events) and 201 negative (no-recurrence-events) instances of cancer patients [31]. The observed confusion matrices of the individual and ensemble classifiers are shown in Table 4, where the positive class means a person has breast cancer and the negative class means a person is normal. Using these confusion matrices, the observed probabilities are compared in Table 5 with the predicted probabilities computed from the proposed formulations.

3.2. Wilt Dataset

This dataset was generated from a remote sensing study on detecting diseased trees using QuickBird satellite imagery. It is a highly imbalanced dataset containing 74 positive (diseased trees) and 4265 negative (normal trees) instances [32]. The observed confusion matrices of the individual and ensemble classifiers are shown in Table 6, where the positive class means a tree is diseased and the negative class means the tree is normal. Using these confusion matrices, the observed probabilities are compared in Table 7 with the predicted probabilities computed from the proposed formulations.

3.3. Haberman’s Survival Dataset

This dataset concerns the survival of patients of Billings Hospital, Chicago, who underwent surgery for breast cancer. It is described by three features and includes 81 positive (the patient died within 5 years of the surgery) and 225 negative (the patient survived 5 years or longer after the surgery) instances [33]. The observed confusion matrices of the individual and ensemble classifiers are shown in Table 8, where the positive class means the patient did not survive 5 years after the surgery and the negative class means the patient survived. Using these confusion matrices, the observed probabilities are compared in Table 9 with the predicted probabilities computed from the proposed formulations.

3.4. Thoracic Surgery Dataset

This dataset concerns the survival of patients of the Wroclaw Thoracic Surgery Centre who underwent major lung resections for primary lung cancer. It contains 70 positive (the patient died within 1 year of the surgery) and 400 negative (the patient survived 1 year or longer after the surgery) instances. The observed confusion matrices of the individual and ensemble classifiers are shown in Table 10, where the positive class means the patient did not survive 1 year after the surgery and the negative class means the patient survived. Using these confusion matrices, the observed probabilities are compared in Table 11 with the predicted probabilities computed from the proposed formulations.

3.5. Discussion & Analysis

The experimental results in Table 5, Table 7, Table 9 and Table 11 are presented as graphs in Figure 2 to facilitate comparison. From these tables and the figure, note that the predicted performances (p(TP), p(FN), p(FP), and p(TN)) of the ensemble classifications match the observed performances. These observations are quite encouraging and validate the effectiveness of the proposed formulations for analytical analysis prior to the actual development of an ensemble classifier. The proposed analytical analysis is thus quite helpful in deciding which base learners to choose and how many base learners to use. A wise and early decision in this regard saves time, contrary to the empirical approach in which a model is constructed and then repeatedly discarded if not satisfactory. This is the central benefit of the proposed formulations.
Regarding the nature of the logical disjunction- and majority voting-based ensemble classifications in Equations (3)–(10), note that logical disjunction labels a sample positive if any base learner classifies it as positive, contrary to majority voting, which requires the majority of base learner decisions to label it positive. Logical disjunction therefore decreases the false negative rate at the cost of an increased false positive rate, as in Table 5, Table 6, Table 7, Table 8, Table 9, Table 10 and Table 11 and Figure 2. Given the much higher cost of false negatives compared to false positives in disease diagnosis, this tradeoff is quite useful. These datasets exhibit the scenario c_FN · p(FN) > c_FP · p(FP), and thereby logical disjunction is beneficial. In the contrary scenario, c_FN · p(FN) < c_FP · p(FP), logical conjunction would be beneficial instead.
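To make the cost argument concrete, the following sketch (my illustration; the cost values are assumed, the probabilities are the observed Breast Cancer values from Table 5) compares the expected misclassification cost per sample, c_FN · p(FN) + c_FP · p(FP), of the two ensembles.

```python
def expected_cost(p_fn, p_fp, c_fn, c_fp):
    """Expected misclassification cost per sample under the given error costs."""
    return c_fn * p_fn + c_fp * p_fp

mv = (0.1573, 0.1119)   # observed (p(FN), p(FP)) of majority voting, Table 5
ld = (0.1084, 0.1993)   # observed (p(FN), p(FP)) of logical disjunction, Table 5

# Cost-sensitive setting with an assumed cost ratio c_FN = 10, c_FP = 1: disjunction is cheaper.
print(expected_cost(*mv, 10, 1), expected_cost(*ld, 10, 1))   # ~1.685 vs ~1.283
# Symmetric costs c_FN = c_FP = 1: majority voting is cheaper again.
print(expected_cost(*mv, 1, 1), expected_cost(*ld, 1, 1))     # ~0.269 vs ~0.308
```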

4. Conclusions

This research starts from considering the true positive, false negative, false positive, and true negative rates as probabilities of base learner outcomes. Using this information, the concept of conditional joint probability has been applied to derive an analytical model that predicts the performance of ensemble classification techniques such as majority voting and logical disjunction. The derivation shows that the performance of such ensemble classification can be predicted before its actual construction using the individual performances of the base learners, and the experimental observations justify the predicted performance. This analytical approach supports purposeful effort in constructing an appropriate ensemble classifier, contrary to the empirical, trial-based approach. Additionally, the analysis and comparison of the predicted and observed performances show that, for highly imbalanced datasets, logical disjunction is a more appropriate choice than conventional majority voting for ensemble classification, and for cost-sensitive classification on highly imbalanced datasets it is even more appropriate. This study also shows that the unwanted classification effects of highly imbalanced datasets can be mitigated using logical disjunction-based ensemble classification, in contrast to conventional under-sampling and over-sampling solutions.

Author Contributions

Conceptualization: I.M.; Methodology: I.M.; Formal Analysis: I.M. and M.A.; Investigation: I.M. and M.A.; Resources: J.-Y.K.; Writing—Original Draft Preparation: I.M.; Writing—Review & Editing: I.M., M.A. and J.-Y.K.; Visualization: I.M.; Supervision: J.-Y.K.; Project Administration: J.-Y.K.; Funding Acquisition: J.-Y.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the BK21 FOUR Program (Fostering Outstanding Universities for Research, 5199991714138) funded by the Ministry of Education (MOE, Korea) and National Research Foundation of Korea (NRF).

Data Availability Statement

The authors declare that the datasets used in this research are publicly available.

Conflicts of Interest

The authors declare that they have no conflicts of interest that could have influenced the work reported in this paper.

References

  1. Leevy, J.L.; Khoshgoftaar, T.M.; Bauder, R.A.; Seliya, N. A survey on addressing high-class imbalance in big data. J. Big Data 2018, 5, 42. [Google Scholar] [CrossRef]
  2. He, H.; Garcia, E.A. Learning from Imbalanced Data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284. [Google Scholar] [CrossRef]
  3. Flynn, B.B.; Sakakibara, S.; Schroeder, R.G.; Bates, K.A.; Flynn, E.J. Empirical research methods in operations management. J. Oper. Manag. 1990, 9, 250–284. [Google Scholar] [CrossRef]
  4. Elkan, C. The foundations of cost-sensitive learning. In International Joint Conference on Artificial Intelligence; Lawrence Erlbaum Associates Ltd.: Mahwah, NJ, USA, 2001; Volume 17, pp. 973–978. [Google Scholar]
  5. Roy, D.; Roy, A.; Roy, U. Learning from Imbalanced Data in Healthcare: State-of-the-Art and Research Challenges. In Computational Intelligence in Healthcare Informatics; Acharjya, D.P., Ma, K., Eds.; Springer Nature: Singapore, 2024; pp. 19–32. [Google Scholar]
  6. Thai-Nghe, N.; Gantner, Z.; Schmidt-Thieme, L. Cost-sensitive learning methods for imbalanced data. In Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18–23 July 2010; pp. 1–8. [Google Scholar]
  7. El Hlouli, F.Z.; Riffi, J.; Mahraz, M.A.; Yahyaouy, A.; El Fazazy, K.; Tairi, H. Credit Card Fraud Detection: Addressing Imbalanced Datasets with a Multi-phase Approach. SN Comput. Sci. 2024, 5, 173. [Google Scholar] [CrossRef]
  8. Zhang, Y.; Wang, D. A Cost-Sensitive Ensemble Method for Class-Imbalanced Datasets. Abstr. Appl. Anal. 2013, 2013, 196256. [Google Scholar]
  9. Cervantes, J.; Li, X.; Yu, W. Imbalanced data classification via support vector machines and genetic algorithms. Connect. Sci. 2014, 26, 335–348. [Google Scholar] [CrossRef]
  10. Wang, S.; Minku, L.L.; Chawla, N.; Yao, X. Learning from data streams and class imbalance. Connect. Sci. 2019, 31, 103–104. [Google Scholar] [CrossRef]
  11. Johnson, J.M.; Khoshgoftaar, T.M. Survey on deep learning with class imbalance. J. Big Data 2019, 6, 27. [Google Scholar] [CrossRef]
  12. Zadrozny, B.; Langford, J.; Abe, N. Cost-sensitive learning by cost-proportionate example weighting. In Proceedings of the Third IEEE International Conference on Data Mining, (ICDM) 2003, Melbourne, FL, USA, 19–22 November 2003; pp. 435–442. [Google Scholar]
  13. Krawczyk, B.; Schaefer, G.; Wozniak, M. Breast thermogram analysis using a cost-sensitive multiple classifier system. In Proceedings of the IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), Hong Kong, China, 5–7 January 2012; pp. 507–510. [Google Scholar] [CrossRef]
  14. Singh, R.; Ahmed, T.; Kumar, A.; Singh, A.K.; Pandey, A.K.; Singh, S.K. Imbalanced Breast Cancer Classification Using Transfer Learning. IEEE/ACM Trans. Comput. Biol. Bioinform. 2021, 18, 83–93. [Google Scholar] [CrossRef] [PubMed]
  15. Sleeman IV, W.C.; Krawczyk, B. Multi-class imbalanced big data classification on Spark. Knowl.-Based Syst. 2021, 212, 106598. [Google Scholar] [CrossRef]
  16. Almarshdi, R.; Nassef, L.; Fadel, E.; Alowidi, N. Hybrid Deep Learning Based Attack Detection for Imbalanced Data Classification. Intell. Autom. Soft Comput. 2023, 35, 297–320. [Google Scholar]
  17. Liangyuan, L.; Mei, C.; Hanhu, W.; Wei, C.; Zhiyong, G. A Cost Sensitive Ensemble Method for Medical Prediction. In Proceedings of the First International Workshop on Database Technology and Applications, Hong Kong, China, 25–26 April 2009; pp. 221–224. [Google Scholar] [CrossRef]
  18. Wei, F.; Fang, C.; Haixun, W.; Philip, S.Y. Pruning and dynamic scheduling of cost-sensitive ensembles. In Proceedings of the Eighteenth National Conference on Artificial Intelligence, Edmonton, AB, Canada, 28 July–1 August 2002. [Google Scholar]
  19. Chakraborty, T.; Chakraborty, A.K.; Murthy, C.A. A nonparametric ensemble binary classifier and its statistical properties. Stat. Probab. Lett. 2019, 149, 16–23. [Google Scholar] [CrossRef]
  20. Depto, D.S.; Rizvee, M.M.; Rahman, A.; Zunair, H.; Rahman, M.S.; Mahdy, M.R.C. Quantifying imbalanced classification methods for leukemia detection. Comput. Biol. Med. 2023, 152, 106372. [Google Scholar] [CrossRef]
  21. Bo, Y.; Xiaoli, M. Sampling + reweighting: Boosting the performance of AdaBoost on imbalanced datasets. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), Brisbane, Australia, 10–15 June 2012; pp. 1–6. [Google Scholar]
  22. Bartlett, P.; Traskin, M. Adaboost is consistent. In Advances in Neural Information Processing Systems; NeurIPS: New Orleans, LA, USA, 2006; Volume 19. [Google Scholar]
  23. Ali, S.; Majid, A.; Javed, S.G.; Sattar, M. Can-CSC-GBE: Developing Cost-sensitive Classifier with Gentleboost Ensemble for breast cancer classification using protein amino acids and imbalanced data. Comput. Biol. Med. 2016, 73, 38–46. [Google Scholar] [PubMed]
  24. Hou, W.-H.; Wang, X.-K.; Zhang, H.-Y.; Wang, J.-Q.; Li, L. A novel dynamic ensemble selection classifier for an imbalanced data set: An application for credit risk assessment. Knowl.-Based Syst. 2020, 208, 106462. [Google Scholar] [CrossRef]
  25. Cruz, R.M.O.; Sabourin, R.; Cavalcanti, G.D.C. Dynamic classifier selection: Recent advances and perspectives. Inf. Fusion 2018, 41, 195–216. [Google Scholar] [CrossRef]
  26. Xu, H.; Chetia, C. An Efficient Selective Ensemble Learning with Rejection Approach for Classification. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 2816–2825. [Google Scholar] [CrossRef]
  27. Soong, T.T. Fundamentals of Probability and Statistics for Engineers; Chapter 2: Basic Probability Concepts, Sec. 2.2 Sample Space and Probability Measure; John Wiley & Sons: Hoboken, NJ, USA, 2004; pp. 12–13. [Google Scholar]
  28. Papoulis, A.; Pillai, S.U. Probability, Random Variables, and Stochastic Processes, 4th ed.; McGraw-Hill Europe: New York, NY, USA, 2002. [Google Scholar]
  29. Theodoridis, S.; Koutroumbas, K. Chapter 10—Supervised Learning: The Epilogue, Sections 10.2 Error-Counting Approach and 10.3 Exploiting The Finite Size of The Data Set. In Pattern Recognition, 4th ed.; Theodoridis, S., Koutroumbas, K., Eds.; Academic Press: New York, NY, USA, 2009; pp. 568–573. [Google Scholar]
  30. Lior, R. Ensemble-based classifiers. Artif. Intell. Rev. 2010, 33, 1–39. [Google Scholar] [CrossRef]
  31. Zwitter, M.; Soklic, M. Breast Cancer Data Set; UCI Machine Learning Repository: Irvine, CA, USA, 1988. [Google Scholar]
  32. Johnson, B. Wilt Data Set; UCI Machine Learning Repository: Irvine, CA, USA, 2014. [Google Scholar]
  33. Lim, T.-S. Haberman’s Survival Data Set; UCI Machine Learning Repository: Irvine, CA, USA, 1999. [Google Scholar]
Figure 1. Analytical methodologies employed to derive the formulations of the true positive probabilities for (a) majority voting based upon three base learners α, β, and γ. The first layer represents the three base learners. The second layer represents the four possibilities in which majority voting gives a true positive, i.e., all three base learner decisions are true positives (TP_α, TP_β, TP_γ), or exactly two of the three are, (~TP_α, TP_β, TP_γ), (TP_α, ~TP_β, TP_γ), or (TP_α, TP_β, ~TP_γ). The third layer represents the probabilities of these possibilities, the fourth layer applies the joint probability formula P(x, y, z) = P(x | y, z) P(y | z) P(z), and the final layer computes the probabilities from the confusion matrices. (b) Logical disjunction based upon two base learners α and β, where the output decision is a true positive if any base learner output is a true positive, because in logical disjunction the output is positive if any base learner predicts a positive sample. The description of the other layers is similar to majority voting.
Figure 2. Graphical comparison of the predicted and the observed performance of majority voting and logical disjunction-based ensemble classification techniques for the (a) Breast Cancer Dataset, (b) Wilt Dataset, (c) Haberman’s Survival Dataset, and (d) Thoracic Surgery Dataset.
Table 1. Mutual dependencies of the base learner outputs. The first row represents that if prediction z of the first classifier is TP, then y given z (y|z) can either be TP or FN. Similarly, x|y,z can also be either TP or FN in this case.
Sr. #   z    y|z         x|y,z
1       TP   {TP, FN}    {TP, FN}
2       FN   {TP, FN}    {TP, FN}
3       FP   {FP, TN}    {FP, TN}
4       TN   {FP, TN}    {FP, TN}
Table 2. Symbolic representation of the number of true positives, false negatives, false positives, and true negatives for the base learners.
Classifier   True Positive   False Negative   False Positive   True Negative   Total
α            X_TP            X_FN             X_FP             X_TN            X = X_TP + X_FN + X_FP + X_TN
β            Y_TP            Y_FN             Y_FP             Y_TN            Y = Y_TP + Y_FN + Y_FP + Y_TN
γ            Z_TP            Z_FN             Z_FP             Z_TN            Z = Z_TP + Z_FN + Z_FP + Z_TN
Table 3. Distribution (number of samples) of the considered UCI datasets.
Sr. #   Dataset               +ve Samples   −ve Samples   Total
1       Breast Cancer         85            201           286
2       Wilt                  74            4265          4339
3       Haberman’s Survival   81            225           306
4       Thoracic Surgery      70            400           470
Table 4. Observed confusion matrices of the base learners and of the logical disjunction- and majority voting-based ensemble classifiers for the Breast Cancer Dataset, where the positive class means a person has breast cancer and the negative class means a person is normal.
Table 5. Performance (probabilities) comparison of the base learners with the predicted and observed performances of logical disjunction and majority voting for the Breast Cancer Dataset.
Technique                         p(TP)    p(FN)    p(FP)    p(TN)    Sum
Bayes                             0.1364   0.1608   0.1154   0.5874   1
Decision Stump                    0.1573   0.1399   0.1399   0.5629   1
Naïve Bayes                       0.1294   0.1678   0.1049   0.5979   1
Logical Disjunction (Predicted)   0.2215   0.0757   0.2323   0.4705   1
Logical Disjunction (Observed)    0.1888   0.1084   0.1993   0.5035   1
Majority Voting (Predicted)       0.1372   0.1600   0.0542   0.6486   1
Majority Voting (Observed)        0.1399   0.1573   0.1119   0.5909   1
Table 6. Observed confusion matrices of the base learners and of the logical disjunction- and majority voting-based ensemble classifiers for the Wilt Dataset, where the positive class means a tree is diseased and the negative class means the tree is normal.
Table 7. Performance (probabilities) comparison of the base learners with the predicted and observed performances of logical disjunction and majority voting for the Wilt Dataset.
Technique                         p(TP)    p(FN)    p(FP)    p(TN)    Sum
LMT                               0.2400   0.1340   0.0160   0.6100   1
Random Committee                  0.2260   0.1480   0.0240   0.6020   1
Randomizable                      0.2340   0.1400   0.0380   0.5880   1
Logical Disjunction (Predicted)   0.3210   0.0530   0.0394   0.5866   1
Logical Disjunction (Observed)    0.2660   0.1080   0.0340   0.5920   1
Majority Voting (Predicted)       0.2551   0.1189   0.0030   0.6230   1
Majority Voting (Observed)        0.2320   0.1420   0.0200   0.6060   1
Table 8. Observed confusion matrices of the base learners and of the logical disjunction- and majority voting-based ensemble classifiers for Haberman’s Survival Dataset, where the positive class means the patient did not survive 5 years after the surgery and the negative class means the patient survived.
Table 9. Performance (probabilities) comparison of the base learners with the predicted and observed performances of logical disjunction and majority voting for Haberman’s Survival Dataset.
Technique                         p(TP)    p(FN)    p(FP)    p(TN)    Sum
JRip                              0.0948   0.1699   0.0817   0.6536   1
Logit Boost                       0.1144   0.1503   0.1111   0.6242   1
Naïve Bayes Multinomial           0.1177   0.1471   0.1177   0.6177   1
Logical Disjunction (Predicted)   0.1812   0.0835   0.2110   0.5243   1
Logical Disjunction (Observed)    0.1144   0.1503   0.1111   0.6242   1
Majority Voting (Predicted)       0.0975   0.1672   0.0392   0.6960   1
Majority Voting (Observed)        0.1111   0.1536   0.0882   0.6471   1
Table 10. Observed confusion matrices of the base learners and of the logical disjunction- and majority voting-based ensemble classifiers for the Thoracic Surgery Dataset, where the positive class means the patient did not survive 1 year after the surgery and the negative class means the patient survived.
Table 11. Performance (probabilities) comparison of the base learners with the predicted and observed performances of logical disjunction and majority voting for the Thoracic Surgery Dataset.
Technique                         p(TP)    p(FN)    p(FP)    p(TN)    Sum
Multilayer Perceptron             0.0319   0.1170   0.0915   0.7596   1
Naïve Bayes                       0.0234   0.1255   0.0894   0.7617   1
IBK                               0.0213   0.1277   0.1000   0.7511   1
Logical Disjunction (Predicted)   0.0503   0.0986   0.1713   0.6798   1
Logical Disjunction (Observed)    0.0404   0.1085   0.1553   0.6957   1
Majority Voting (Predicted)       0.0115   0.1375   0.0286   0.8225   1
Majority Voting (Observed)        0.0128   0.1362   0.0617   0.7894   1