An Adaptive Active Learning Method for Multiclass Imbalanced Data Streams with Concept Drift

Han, Meng; Li, Chunpeng; Meng, Fanxing; He, Feifei; Zhang, Ruihua

doi:10.3390/app14167176

Open AccessArticle

An Adaptive Active Learning Method for Multiclass Imbalanced Data Streams with Concept Drift

by

Meng Han

^*,

Chunpeng Li

,

Fanxing Meng

,

Feifei He

and

Ruihua Zhang

School of Computer Science and Engineering, North Minzu University, Yinchuan 750021, China

^*

Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(16), 7176; https://doi.org/10.3390/app14167176

Submission received: 8 July 2024 / Revised: 9 August 2024 / Accepted: 13 August 2024 / Published: 15 August 2024

Download

Browse Figures

Versions Notes

Abstract

:

Learning from multiclass imbalanced data streams with concept drift and variable class imbalance ratios under a limited label budget presents new challenges in the field of data mining. To address these challenges, this paper proposes an adaptive active learning method for multiclass imbalanced data streams with concept drift (AdaAL-MID). Firstly, a dynamic label budget strategy under concept drift scenarios is introduced, which allocates label budgets reasonably at different stages of the data stream to effectively handle concept drift. Secondly, an uncertainty-based label request strategy using a dual-margin dynamic threshold matrix is designed to enhance learning opportunities for minority class instances and those that are challenging to classify, and combined with a random strategy, it can estimate the current class imbalance distribution by accessing only a limited number of instance labels. Finally, an instance-adaptive sampling strategy is proposed, which comprehensively considers the imbalance ratio and classification difficulty of instances, and combined with a weighted ensemble strategy, improves the classification performance of the ensemble classifier in imbalanced data streams. Extensive experiments and analyses demonstrate that AdaAL-MID can handle various complex concept drifts and adapt to changes in class imbalance ratios, and it outperforms several state-of-the-art active learning algorithms.

Keywords:

data stream classification; multiclass imbalance; concept drift; ensemble learning; active learning

1. Introduction

With the continuous development of the information industry, a large number of data streams have emerged in real-world applications, containing valuable information yet to be mined, such as network intrusion detection [1], fault detection [2], and air-quality prediction [3]. Extracting valuable information from the data streams generated by these applications is an interesting yet challenging and arduous task. Among them, data stream classification has become a prominent research direction in the field of machine learning. A common challenge in data stream classification is concept drift [4], which refers to the inconsistency in data distribution over time, severely affecting the stability of data stream classification. Another challenging issue is multiclass imbalance, where the class distribution in a data stream is skewed [5]. Meanwhile, the class proportions in a data stream may change over time. This change may combine with concept drift, making learning from data streams even more challenging. Figure 1 shows two types of data stream changes. Figure 1a represents the original data stream, Figure 1b indicates that the class distribution in the data stream remains unchanged after concept drift, and Figure 1c shows that the class proportions are reversed after combining concept drift with changes in class distribution.

The combined challenge of concept drift and multiclass imbalance in data stream classification presents new requirements for machine learning algorithms. In this scenario, algorithms must not only handle the imbalanced state of data stream distribution but also promptly identify the occurrence of concept drift, particularly within minority classes. So far, some studies have addressed the issue of classifying imbalanced data streams with concept drift, but most have focused on binary imbalanced classification. Handling the concept drift problem in multiclass imbalance is more challenging because the definitions of majority and minority classes are not fixed in different scenarios, and the roles of different classes may change after concept drift. For example, in Figure 1a, the second largest class is the minority class relative to the largest class, but it is the majority class relative to the smallest class. After concept drift, the previous minority class becomes the majority class, as shown in Figure 1c. Ensemble-based methods have gained extensive attention from researchers in recent years [6]. By combining multiple weak classifiers into a stronger ensemble classifier, they often achieve better performance than single classifiers. The classic leveraging bagging (LB) [7] algorithm uses adaptive window (ADWIN) [8] to detect concept drift in data streams and update the ensemble accordingly. Although LB does not have an explicit mechanism for handling imbalance, its fixed-rate bagging sampling has shown adaptability in handling imbalanced data streams in existing studies. Multiclass oversampling online bagging (MOOB) [9] combines oversampling techniques with online ensembles, determining the sampling rates based on class weights to directly handle multiclass imbalanced data streams. However, it does not effectively address concept drift. The robust online self-adjusting ensemble (ROSE) [10] also adopts the idea of online bagging based on class weights and combines the use of background ensembles to address the combined issue of concept drift and changing class imbalance ratios. In the oversampling strategies of the above algorithms, the sampling rate is determined only by fixed values or class proportions, ignoring the classifier’s own performance and, thus, cannot effectively focus on difficult-to-classify instances in the data stream.

Additionally, the majority of existing algorithms presuppose unrestricted access to the labels of all instances within the data stream throughout the learning process (i.e., in a supervised environment), but this is unlikely in real-world applications due to the high cost of obtaining all instance labels. Therefore, classifying data streams under a limited label budget has become a new problem and challenge. To address this issue, researchers have begun exploring active-learning-based methods [11]. Active learning can effectively address the limited label problem by requesting the most informative instance labels for model learning. The online active learning ensemble framework (OALEnsemble) [12] is based on a hybrid labeling strategy and constructs an ensemble classifier framework consisting of a stable classifier and multiple dynamically updated classifiers, effectively handling gradual and sudden drifts in data streams, but it does not address the imbalance problem. The recently proposed comprehensive active learning method for multiclass imbalanced data streams with concept drift (CALMID) [13] can handle both mixed concept drift and multiclass imbalance in data streams, selecting the most representative instances through a hybrid labeling query strategy based on an asymmetric marginal threshold matrix, providing more opportunities for minority classes. However, existing active learning methods only operate under a fixed label budget and do not effectively allocate the label budget, and higher label budgets and looser selection thresholds should be used at concept drifts.

Learning from multiclass imbalanced data streams with concept drift is particularly challenging when operating under a limited label budget. To overcome the limitations of existing methods, this paper introduces an adaptive active learning algorithm tailored for multiclass imbalanced data streams experiencing concept drift. The primary contributions are as follows:

(1): A dynamic label budget strategy was designed for concept drift scenarios, using ADWIN to detect drifts and adaptively adjust the budget, allowing for the learning of more new instances at concept drifts and accelerating postdrift model recovery.
(2): An uncertainty strategy leveraging a dual-margin dynamic threshold matrix was proposed, which, when combined with a random strategy, offers increased learning opportunities for minority class instances and crucial instances.
(3): An adaptive sampling strategy for multiclass imbalanced data streams was proposed, and classifiers were updated online based on weighted ensemble ideas. This strategy tends to perform higher levels of oversampling for minority classes and difficult-to-classify instances.
(4): Based on the above strategies, we propose an adaptive active learning algorithm. Various experiments were conducted to validate the algorithm’s effectiveness and feasibility, including parameter sensitivity experiments, algorithm comparison experiments, nonparametric statistical analysis, and ablation studies. These experiments comprehensively compared existing supervised and active learning algorithms, providing detailed analysis.

2. Related Work

2.1. Supervised Learning Methods for Multiclass Imbalanced Data Streams

Supervised learning methods typically assume that instance labels in data streams are immediately available. The current supervised learning methods addressing the joint challenges of concept drift and multiclass imbalance in data streams can be broadly divided into block-based batch processing methods and instance-by-instance online learning methods [14].

Block-based algorithms partition instances in the data stream into chunks. When a chunk is filled with instances, the model learns and updates chunk by chunk to adapt to potential concept changes. These methods typically apply resampling techniques to address imbalanced distributions within data chunks, enhancing the algorithm’s ability to handle imbalanced data streams. The meta-cognitive online sequential extreme learning machine (MOS-ELM) [15] algorithm was the first to focus on the joint problem of concept drift and multiclass imbalance. It combines data sampling and cost-sensitive weighting ideas and introduces an adaptive window method to handle concept drift in data streams. However, it cannot determine the appropriate chunk size and cannot respond to concept drift within individual chunks. The selection-based resampling ensemble (SRE) [16] introduced a selective resampling mechanism by reusing minority class instances from past chunks to balance the class distribution in the current chunk. It then avoids absorbing drifted data by assessing the similarity between minority class instances from past chunks and those in the current chunk. The weighted ensemble with one-class classification and over-sampling and instance selection (WECOI) [17] combines k-nearest neighbors with minority class instances in the current chunk to generate new synthetic minority class instances to balance the class distribution, and it uses instance selection techniques to replace old instances with new ones for updating data chunks. The above algorithms all use a fixed chunk size, which may not be suitable for cases in which the data stream changes frequently and the instance distribution is unstable. Therefore, the adaptive chunk-based dynamic weighted majority (ACDWM) [18] proposes a method for adaptively selecting chunk sizes through statistical hypothesis testing and dynamically weighting classifiers in an ensemble based on their performances on the current data chunk, addressing the limitations of a fixed chunk size.

In online learning methods, one instance is processed at a time, which can also address the difficulty in determining the block size in block-based methods and is more sensitive to concept changes. In recent years, an increasing number of researchers have focused on online learning from data streams, with most studies adopting the concept of online bagging (OB) [19], utilizing Poisson distribution

k ~ P o i s s o n (λ)

for online resampling. For example, LB and MOOB manipulate the parameter λ to determine the sampling rate of instances, thereby obtaining approximately balanced training data and training balanced classifiers. The adaptive random forests with resampling (ARFre) [20] extends and inherits the concept drift detection mechanism of ARF [21]. Compared with the original ARF, ARFre changes the calculation method of λ, reducing the chance of majority class instances being learned by ARF trees, while minority classes have a greater chance of learning, thus improving the ability to handle imbalanced data streams. Streaming random patches (SRP) [22] combines random subspaces and online bagging to achieve high precision by training models on random subsets of features and instances. By training a background model and replacing the main model when concept drift is detected to mitigate the impact of concept changes on the main model, but the mechanism of the background model may consume more computing resources. The improved online ensemble (IOE) [23] employs a recall rate for online resampling of instances, where oversampling is conducted if the recall rate is lower than the average level; otherwise, undersampling is performed, thereby allowing IOE to pay more attention to minority class instances in the data stream. However, the undersampling-based approach may result in partial information loss. The cost-sensitive adaptive random forest (CSARF) [24] adopts two cost-sensitive strategies (local and global), employing the Matthew’s correlation coefficient (MCC) [25] instead of accuracy to assign weights to each internal tree, and oversamples minority classes during the learning process. A comparative analysis with ARF and ARFre suggests the effectiveness of the cost-based strategy. The dynamic queue algorithm (DynaQ) [26] utilizes a batch-based sampling approach, creating queues for each class to store recent instances, and evaluates the effects of using four different rate parameters for online resampling of minority classes. In summary, most existing online learning algorithms determine the resampling rate based on class proportions to provide more learning opportunities for minority classes. However, significant information may also exist in instances of other classes, such as those situated at decision boundaries. Thus, determining the sampling level solely based on class proportions is not comprehensive. Moreover, instance labels in real data streams are not immediately available, and in situations with limited label budgets, oversampling strategies should also be adjusted.

2.2. Active Learning Methods for Data Streams

Active learning methods in data streams can selectively label a small number of instances with the highest information or importance in real time to construct predictive models, effectively addressing limited label budgets. Among these methods, the most crucial aspect is the label request strategy, which determines which instances in the data stream should be labeled. A good label request strategy can achieve high model performance with minimal label budgets. Label request strategies are typically categorized into uncertainty-based, random, and hybrid strategies [27]. Uncertainty-based strategies select instances with the highest model prediction uncertainty (i.e., the most informative instances) using certain uncertainty measures, such as entropy, confidence, or margin [28]. Random strategies, on the other hand, randomly select instances from the data stream without any particular selection criteria and are often used to characterize the class distribution status of the current data stream. Hybrid strategies combine uncertainty-based and random strategies, and most existing active learning algorithms are based on hybrid label request strategies.

Minority-driven online active learning (MD-OAL) [29] introduces an abstention mechanism beneficial to minority class instances, coupled with a dynamic threshold adjustment uncertainty strategy, providing an effective approach for handling both concept drift and class imbalance simultaneously. But it is only limited to addressing imbalanced problems in binary classification and cannot handle multiclass imbalances. Reinforcement online active learning ensemble for drifting imbalanced data streams (ROALE-DI) [30] similarly adopts an ensemble classifier framework consisting of a stable classifier and a dynamic classifier group, where the dynamic classifier group adjusts weights using a reinforcement mechanism and proposes a margin-based uncertainty strategy. The combination of these mechanisms improves the classification performances of minority classes and addresses the concept drift issue. However, in dynamically changing data streams, it cannot handle potentially changing class imbalance ratios. For the problem of variable class imbalance, CALMID [13] provides a solution by introducing a new instance training weight formula that comprehensively considers the class imbalance ratio and prediction difficulty, effectively handling changing imbalances. Simultaneously, a variable threshold uncertainty strategy was designed, which addresses the relative roles of majority and minority classes by utilizing an asymmetric marginal threshold matrix. Recently, a network traffic classification framework (MicFoal) [31] based on online active learning for multiclass imbalance and concept drift was proposed, designing an uncertainty label request strategy based on a variable minimum confidence threshold vector to determine whether to request labels based on the features of each instance. However, it is primarily designed for specific network traffic classification tasks, its adaptability to other types of data streams may be limited, and it cannot handle concept drift.

3. The Proposed AdaAL-MID Algorithm

3.1. Motivation

Because of the continuous developments in the field of data stream mining, many algorithms have been proposed in recent years for learning from data streams. However, the issues of multiclass imbalance and concept drift within data streams have not been widely studied, especially when these two phenomena occur simultaneously. Additionally, most algorithms addressing such issues assume that labels for all instances in the data stream are immediately available (i.e., supervised learning algorithms), overlooking the problem of labels potentially not arriving in a timely manner. To address the additional challenge of limited labels, researchers have turned their attention to active learning [28,29,30] and learning from data streams under limited label budgets. Existing active learning solutions involve selecting a finite number of instances for labeling within a fixed budget, which can be considered as uniform allocation of the budget across the data stream. However, in scenarios with concept drift, classifiers need to swiftly adjust during drift periods to adapt to new concepts. However, under a fixed budget, it is not possible to ensure that instances of new concepts are learned in a timely manner. Therefore, this paper proposes a dynamic label budget adjustment strategy for scenarios with concept drift, aiming to learn from data streams containing concept drift, thereby reducing the impact of concept drift on classifiers and accelerating recovery.

The key to active learning is to find the most important instances for labeling and to explore the instance space as much as possible. In the case of a limited label budget, developing a well-designed label request strategy is pivotal for enhancing the performance of active learning. Most active learning methods use a mixed strategy, where the core of the mixed strategy is the uncertainty strategy. The uncertainty strategy aims to select instances that are most uncertain or most representative in predictions, typically those near the decision boundary. Existing methods mostly determine whether to request labels by comparing the uncertainty measures of instance predictions with a threshold. However, most of them consider only one measure, namely, the difference between the probability of the most predicted class and the second most predicted class, which is not comprehensive for identifying representative instances. Additionally, they use original thresholds in comparisons, thus failing to emphasize the importance of new instances at concept drift points. Therefore, this paper proposes a hybrid label request strategy, including an uncertainty strategy based on a dual-margin dynamic threshold matrix and a random strategy. The uncertainty strategy combines two predictive uncertainty measures, complementing each other to better identify instances with high information content in the data stream. This increases the uncertainty threshold under a dynamic labeling budget, favoring a higher selection probability for instances during concept drift and for minority class instances.

Learning from multiclass imbalanced data streams is also a major challenge, as minority classes are difficult to classify correctly because of their insufficient representation. The most common approach to address such issues is to combine resampling with online ensembles, but previous methods did not consider label budgets during oversampling. Theoretically, with lower budgets, fewer instance labels are accessible in the data stream; hence, the oversampling factor should be higher. Additionally, existing methods consider only class proportions when performing oversampling, meaning higher levels of oversampling are applied only to minority class instances. They overlook the classifier’s own classification performance, even though the classifier may have poor performance on majority class instances. Therefore, this paper proposes an instance-adaptive sampling strategy based on label budgets, which comprehensively considers class proportions and classification performance to determine the sampling rate of instances. Moreover, since each base classifier in the ensemble exhibits a different classification performance for different classes in the data stream, this strategy also ensures that the ensemble has higher diversity.

3.2. Dynamic Label Budget Strategy in Concept Drift Scenarios

When concept drift will occur in a data stream is uncertain, but a concept drift detector can effectively help identify it. It monitors the performance of base classifiers to infer changes in the data distribution and issues an alert when the performance of a base classifier is significantly affected, indicating potential concept drift. There are various choices for drift detectors, such as the drift detection method (DDM) [32] and adaptive windowing (ADWIN) [8]. In this paper, ADWIN was used as the drift detector, with two different thresholds set for ADWIN: one serves as a drift alarm to warn of impending concept drift, and the other acts as a drift confirmer to verify that concept drift has occurred in the data stream. Additionally, ADWIN is used as a trigger for ensemble updates. When ADWIN detects a change, a new classifier is trained to replace the worst-performing classifier in the ensemble. The initialization of the new classifier is performed on a sliding window that holds the most recent instances.

This paper proposes a dynamic label budget strategy based on three stages (stable, warning, and confirmation). During the stable stage of the data stream, the label budget is consistently maintained at the default setting B. Subsequently, paired ADWINs running on each base classifier detect concept drift in the data stream. If at least one classifier reports a warning or confirms a drift, corresponding budget adjustments are made. The adjustment of the current label budget of the data stream is shown in Equation (1). When a warning is detected, the budget temporarily increases to 1.5B for size_w instances following the warning flag t_warning. If no drift is detected afterward, the budget reverts to the default setting. When a drift is confirmed, the budget increases further to 2B and remains at this level for the subsequent size_d instances to learn more instances and adapt to the new concept. Finally, the budget reverts to the default setting.

b u d g e t = \{\begin{cases} 2 B, if detect drift & t_{d r i f t} \leq t \leq t_{d r i f t} + s i z e_{d} \\ 1.5 B, if detect warning & t_{w a r n i n g} \leq t \leq t_{w a r n i n g} + s i z e_{w} \\ B, otherwise \end{cases}

(1)

This strategy allows the algorithm to temporarily increase the number of labeled instances during drift and then return to the original level, indirectly reducing the number of labeled instances during the stable phase. By flexibly allocating the labeling budget in the context of concept drift, the impact of concept drift on the classifier is reduced, and the classifier quickly adapts to new concepts. In this paper, the default settings are size_w = 50 and size_d = 100.

3.3. Uncertainty Strategy Based on Dual-Margin Dynamic Threshold Matrix

This section introduces the proposed hybrid label request strategy, including uncertainty strategy and random strategy. The uncertainty strategy is driven by a dual-margin dynamic threshold matrix to select instances with high information content from the data stream. The random strategy marks instances from the data stream without selection, ensuring that this subset of instances reflects the class distribution of the data stream. It is noteworthy that most multiclass imbalanced active learning algorithms use a label sliding window to estimate class distribution [13], with the imbalance ratio for each class calculated as shown in Equation (2). Here, labelnum_y represents the count of instances labeled as y within the sliding window, sizelab denotes the size of the window, numNull denotes the number of instances selected by the uncertainty strategy (used as placeholders), and C is the number of classes. Although this method can approximate the class distribution in the data stream, its accuracy heavily depends on the size of the label sliding window. A window that is too small may lack sufficient instances to represent the distribution, while a window that is too large may include outdated concepts, both of which can affect the accurate estimation of the current class distribution.

i m b_{y} = l a b e l n u m_{y} / ((s i z e l a b - n u m N u l l) / C)

(2)

To more accurately estimate the class imbalance state of the data stream, this paper employs a time decay model to simulate class proportion changes. This model persists throughout the entire learning process of the data stream and is updated only by labeled instances selected by the random strategy, ensuring it reflects the true distribution of the data stream. As shown in Equation (3), the initial distribution of each class cd[i] is set to 1/C,

i = {1, \dots, C}

. Here,

α \in (0, 1)

represents the time decay factor, used to control the rate at which old instances are forgotten. For each new instance selected by the random strategy, all class distributions are updated based on the requested label y, increasing the corresponding cd[y], while simultaneously decaying the other classes. This strategy can effectively estimate the overall distribution of the data stream and reduce the impact of outdated concepts. Based on this, a new imbalance ratio calculation formula is introduced, as shown in Equation (4), and used in subsequent steps.

c d [i] = \{\begin{cases} α * c d [i] + (1 - α), if y = i \\ α * c d [i], else \end{cases}

(3)

i m b_{y} = c d [y] / ((\sum_{i \in C} c d [i]) / C)

(4)

The proposed label request strategy begins with the random strategy. If the conditions for the random strategy are met, the true labels of instances are requested; otherwise, the uncertainty strategy is employed. In the uncertainty strategy, two measures of predictive uncertainty, namely, minimum marginal uncertainty (margin_min) and maximum marginal uncertainty (margin_max), are combined. margin_min is calculated as the difference between the probability of the most likely class and the probability of the second most likely class, while margin_max is calculated as the difference between the probability of the most likely class and the probability of the least likely class. These measures serve to indicate the certainty or uncertainty of the classifier’s prediction for an instance, with a significantly larger probability for the most likely class compared to the second most likely or least likely classes indicating certainty and vice versa. Therefore, active learning selects instances with smaller margin_min or margin_max for requesting and learning. Based on these two measures, dual-margin dynamic threshold matrices

M_{1}

and

M_{2}

are proposed, both of which are matrices with dimensions of

C \times C

. Initially, each element of

M_{1}

is set to

θ_{1}

, and each element of

M_{2}

is set to

θ_{2}

. During the label request process, the marginal values margin_min and margin_max of instances are compared with the corresponding thresholds in

M_{1}

and

M_{2}

, respectively, to select instances with greater information content. Furthermore, to emphasize the learning rate of instances during concept drift, the comparison process considers not only the original elements of

M_{1}

and

M_{2}

but also the label budget. The dynamic thresholds used for comparison in

M_{1}

and

M_{2}

are calculated as shown in Equation (5). During the stable phase of the data stream (i.e., budget = B), the original thresholds are used. When a drift warning or confirmation occurs in the data stream, the thresholds are raised to increase the learning opportunities for new concept instances.

d m_{1} = M [y_{1}] [y_{2}] + b u d g e t - B, d m_{2} = M [y_{1}] [y_{\min}] + b u d g e t - B

(5)

Algorithm 1 shows the pseudocode of the proposed uncertainty strategy. For a new incoming instance, x, the ensemble, E, is first used for prediction (line 1), where y_c1 represents the class with the highest probability in the ensemble prediction, y_c2 is the second highest probability class, and y_min is the least probable class. Then, the minimum margin, margin_min(x), and maximum margin, margin_max(x), of x are calculated, and the dynamic thresholds dm₁ and dm₂, used for comparison, are obtained through Equation (5) (lines 2–4). If margin_min(x) is less than dm₁ or margin_max(x) is less than dm₂, this indicates that the ensemble cannot predict the class of x with certainty. Therefore, the true label of x is requested, and the corresponding elements of the dual-margin threshold matrix are adjusted based on the prediction result of x. The adjustment process is as follows (lines 8–19).

Algorithm 1 Uncertainty Strategy

Input: ensemble classifier E, processed instance x, current class distribution cd[C], minimum margin threshold matrix M₁, and maximum margin threshold matrix M₂.

Output:

l a b e l i n g \in {true, false}

// request label?

Process:

1. Make predictions on x by E.

2. Calculate

m a r g i n_{\min} (x) = p^{E} (y_{c 1} | x) - p^{E} (y_{c 2} | x)

.

3. Calculate

m a r g i n_{\max} (x) = p^{E} (y_{c 1} | x) - p^{E} (y_{\min} | x)

.

4. Calculate dynamic thresholds dm₁ and dm₂ using Equation (5).

5. if (margin_min(x)≤ dm₁ || margin_max(x)≤ dm₂) then //meeting the uncertainty strategy

6. labeling = true.

7. Get real label y of x and calculate imb_y using Equation (4).

8. if (y==y_c1) then

9. M₁[y_c1][y_c2] = M₁[y_c1][y_c2] × (1 − β × margin_min(x)).

10. M₂[y_c1][y_min] = M₂[y_c1][y_min] × (1 − β × margin_max(x)).

11. if (imb_y > 1) then

12. M₁[y_c1][y_c2] = M₁[y_c1][ y_c2]×(1 – β × margin_min(x)).

13. M₂[y_c1][y_min] = M₂[y_c1][ y_min]×(1 – β × margin_max(x)).

14. else if (y==y_c2 && imb_y > 1) then

15. M₁[y_c1][y_c2] = M₁[y_c1][ y_c2] × (1 – β × margin_min(x)).

16. M₂[y_c1][y_min] = M₂[y_c1][ y_min] × (1 – β × margin_max(x)).

17. else

18. M₁[y_c1][y_c2] = M₁[y_c1][ y_c2]×(1 + β×margin_min(x)).

19. M₂[y_c1][y_min] = M₂[y_c1][ y_min]×(1 + β×margin_max(x)).

20. end if

(1): If y = y_c1, indicating that the ensemble prediction is correct, the classification difficulty of x is not high at this time. Therefore, the corresponding elements of the dual-margin threshold matrices M₁ and M₂ should be reduced to decrease the frequency of labeling requests for such instances, thereby leaving learning opportunities for other challenging instances. Additionally, if x belongs to the majority class (i.e., imb_y > 1), the threshold should be further decreased. This approach aims to reduce the learning opportunities for majority class instances, thereby enhancing the importance of minority classes.
(2): Otherwise, when y = y_c2 and imb_y > 1, the threshold is only lowered once. This means that the ensemble’s prediction is close to accurate, and lowering the threshold is also to further reduce the learning opportunities for majority class instances. It should be noted that although the adjustment conditions for M₁ and M₂ are consistent, the magnitudes of the adjustment are different depending on the corresponding margin_min(x) or margin_max(x), where β is a tuning factor between 0 and 1.
(3): Finally, if y ≠ y_c1 and y ≠ y_c2, this indicates that the ensemble’s prediction result is very poor, and this class instance is difficult to classify by the current ensemble. Therefore, the corresponding threshold should be increased to ensure that this class instance has more learning opportunities.

In summary, the core of the active learning algorithm proposed in this paper is the uncertainty strategy leveraging the dual-margin dynamic threshold matrix, which is used to select instances with the maximum information and minority class instances for labeling from the data stream, and is used for ensemble learning. At the same time, combined with the random strategy, instances selected by the random strategy can well characterize the current class distribution of the data stream, which complements and improves the uncertainty strategy.

3.4. Instance-Adaptive Sampling and Weighted Ensemble Strategy

The performance of ensemble classifiers is closely related to the distribution of instances and the diversity among base classifiers, especially in multiclass imbalanced data streams with limited label budgets. To address the issue of insufficient labeled instances, it is necessary to oversample labeled instances and increase the learning rate of minority class instances. Additionally, as the classification performance of base classifiers varies within the ensemble, different levels of oversampling should be applied to the same instance, ensuring a higher diversity within the ensemble. The algorithm proposed in this paper learns from multiclass imbalanced data streams under limited label budgets, where the sampling frequency of instances is determined based on class proportions and classification performance. This paper defines classification performance as the real-time recall of base classifiers, which effectively reflects the performance of classifiers in imbalanced data streams. The formula for calculating the recall is shown in Equation (6).

Recall = \frac{TP}{TP + FN}

(6)

Firstly, given a new instance (x, y), the adaptive sampling weights of each base classifier in the ensemble for this instance are as shown in Equation (7), where budget represents the current label budget, y denotes the class of the current instance, cd is the previously calculated class distribution, E is the current ensemble, and j represents the index of the base classifier. By using the instance weight formula inversely proportional to the label budget, the problem of insufficient labeled instances can be effectively addressed. Under different cases, the lower the recall rate, Recall_E[j], of the classifier, the smaller the class weight, cd[y], of the instance and the higher the corresponding instance sampling weight, SW_j. Therefore, during the training process, difficult classification instances and minority class instances can receive sufficient attention. During the sampling process, the algorithm follows the idea of online bagging and implements online resampling using the Poisson distribution, where k~Poisson (SW_j) yields the oversampling frequency of each base classifier for the current instance.

S W_{j} = \frac{1}{b u d g e t} (1 + \frac{M a x (R e c a l l_{E})}{R e c a l l_{E} [j]} * \frac{M a x (c d)}{c d [y]})

(7)

In the weighted ensemble stage, because of the varying performance of each base classifier in the ensemble, it is necessary to weight the predictions of the base classifiers. This paper uses the combined results of the base classifier recall rate and kappa coefficient for classifier weighting. The kappa coefficient is also a commonly used performance metric in imbalanced data streams, which evaluates classifier performance based on the consistency of predicted results and data statistical distributions. The weight calculation of base classifier j in the weighted prediction stage is shown in Equation (8). It is noteworthy that the recall rate adopts the single-class recall rate on the current training instance, so the better the classification performance of the base classifier on the class to which the instance belongs, the higher its weight. By comprehensively considering these two metrics, weights can be flexibly allocated among the base classifiers in the ensemble to achieve good classification results.

C W_{j} = \sqrt{K a p p a [j] * R e c a l l_{E}^{y} [j]}

(8)

3.5. Overall Framework and Process of the AdaAL-MID

This section presents the framework and workflow of the proposed adaptive active learning algorithm. The algorithm framework, as shown in Figure 2, consists of an ensemble classifier, an ADWIN detector for drift detection, and an instance sliding window for ensemble updating. The working steps of the AdaAL-MID are as follows:

(1): For each new incoming instance, the ensemble classifier first predicts it and outputs the prediction result.
(2): Enter the mixed label request strategy; if the label request strategy is met and the label budget is sufficient, then request the true label. Otherwise, if the label request strategy is not met or the label budget is insufficient, proceed to the next instance.
(3): Combine the true label with the instance to form a new training instance, and store the new training instance into the corresponding sliding window according to the label. Update the ensemble classifier online through the adaptive sampling strategy and the weighted ensemble strategy.
(4): If the drift detector ADWIN detects a change, create a new base classifier, and sort the instances in the instance sliding window by time to form an instance sequence for initializing the new classifier. Finally, replace the worst classifier in the ensemble.
(5): Adjust the label budget to the corresponding value based on the detected drift flag.

Algorithm 2 presents the pseudocode for the learning process of the AdaAL-MID algorithm. In the initial learning phase of the data stream, the ensemble classifier is initialized by directly requesting the true labels of instances, with the initial phase consisting of sizeW instances (a sliding window size). After initialization, the decision to request label for subsequent instances is determined by the mixed-label request strategy. First, a random value

δ \in [0, 1]

is generated through the random strategy. If this value is less than the random selection rate R, the random strategy is satisfied, and the true label y of the instance is requested. The current class distribution cd is updated based on its class (lines 5–8). If the random strategy does not satisfy the condition, the uncertainty selection strategy activates, and algorithm 1 determines whether to request a label (lines 9–10). If the label request strategy is satisfied and there is a surplus in the current label budget (i.e., (l/p) < b), the true label of the instance is requested, the number of marked instances l is increased by one, and the training phase is entered (lines 11–13). First, according to the class to which the instance belongs, y, the instance is stored in the corresponding class’s sliding window Win[y] (line 14). Next, the corresponding adaptive sampling strategies and weighted ensemble strategies are executed for each base classifier in the ensemble, considering the class proportions and classifier performance comprehensively to determine the instance sampling weight, SW_j, and obtaining the online resampling count through a Poisson distribution

k ~ p o i s s o n (S W_{j})

. Then, the weight, CW_j, of the base classifier is obtained through a weighted ensemble strategy combining the recall rate and kappa coefficient, and the ensemble classifier is updated online based on this weight (lines 15–20). Finally, ADWIN is used to detect potential concept drifts in the data stream. If a drift flag is successfully triggered, a new classifier, C_new, is retrained on the most recent instances in the sliding window, replacing the worst classifier in the ensemble. The label budget is dynamically adjusted according to the strategy proposed in Section 3.2 under this drift flag. This process is repeated until learning is complete (lines 21–25).

Algorithm 2 AdaAL-MID

Input: data stream, S; number of classifiers, N; size of instance sliding window, sizeW; initial value of M₁, θ₁; initial value of M₂, θ₂; labeling budget, b.

Symbols: ensemble classifiers, E; number of processed instances, p; number of labeled instances, l; minimum margin threshold matrix, M₁[C][C]; maximum margin threshold matrix, M₂[C][C]; random selection ratio, R; instance sliding window, Win[C][sizeW/C]; and current class distribution, cd[C].

Output: final ensemble classifier, E, and prediction result.

Process:

1. While (S.hasNext()) do

2. x = S.nextInstance()

3. p = p + 1. // processed instances plus one.

4. labeling = false.

5.

generate a random number δ \in [0, 1]

// random strategy

6. if (p < sizeW or δ < R) then
7. labeling = true.

8. update the current class distribution cd based on Equation (3)

9. else if (UncertaintyStrategy(E, x, cd, M₁, M₂)) // uncertainty strategy

10. labeling = true.
11. if (labeling and (l/p) < b) // surplus in the label budget

12. y = x.classvalue(). // request the real label y of instance x

13. l = l + 1. //number of instances labeled plus one

14.

W i n [y] \leftarrow W i n [y] \cup (x, y)

// store instances (x, y) in the corresponding instance sliding window Win[y] based on label y

15.

for j \in E

do // adaptive sampling strategy

16. calculate the sampling weight SW_j on instance (x, y) using Equation (7).
17.

calculate the retraining times k ~ p o i s s o n (S W_{j})

18. calculate the classifier weight CW_j on instance (x, y) using Equation (8).

19.

for n \in \{1, \dots, k\}

do

20. update classifier j on instance (x, y) with the weights of the classifier CW_j.

21. if ADWIN detect changes

22. create a new classifier C_new and initialize it on Win[C][sizeW/C].

23. replace the worst classifier in ensemble E with C_new

24. change the current label budget b using Equation (1). // dynamically adjust label budget based on drift level

25. end while

3.6. Time and Space Complexity

This section analyzes the time and space complexity of the AdaAL-MID as follows:

(1): Time Complexity: Assume data stream S has n instances, the label budget is b, the ensemble has k base classifiers, and the number of concept drifts is num. The time required to decide whether to request the label for each instance using the hybrid label strategy is T_r, the time required for each base classifier to learn each instance is T_l, the time to create a base classifier is T_c, and the size of the instance sliding window is sizeW. Therefore, the time complexity of the AdaAL-MID is $O (n \times (T_{r} + b \times k \times T_{l}) + k \times T_{c} + n u m \times (T_{c} + T_{l} \times s i z e W))$ .
(2): Space Complexity: The AdaAL-MID algorithm utilizes an ensemble of classifiers with a fixed number of base classifiers. Let the space required to store a base classifier trained on L instances be denoted as $O (S (L) \times k_{t o t a l})$ . Meanwhile, the auxiliary structure used—a sliding window—has a fixed size, so the space consumed remains constant, resulting in a space complexity of O(1). Additionally, compared to the state-of-the-art active learning algorithm CALMID, which requires another auxiliary structure (a label sliding window with an array size of 500), AdaAL-MID only requires a one-dimensional array corresponding to the number of classes in the data stream to assess class imbalance. Therefore, the space required by AdaAL-MID is primarily influenced by the training process and the number of base classifiers.

4. Experimental Evaluation and Analysis

This chapter details the experimental section. The proposed AdaAL-MID algorithm, along with all the comparison algorithms, was implemented and tested within the Massive Online Analysis (MOA) platform [33]. The experiments were run on a machine with an i7-12700H CPU, 2.70 GHz processor, and 16 GB of memory. Section 4.1 introduces the basic experimental setup and the evaluation metrics used. Section 4.2 presents the parameter experiments. Section 4.3 reports the results of the AdaAL-MID compared with nine other representative ensembles (LB [7], MOOB [9], ARFre [20], SRP [22], CSARF [24], ROSE [10], ROALE-DI [30], CALMID [13], and MicFoal [31]) across multiple evaluation metrics, and provides a comprehensive analysis. The first six algorithms are supervised learning algorithms, while the others are active learning algorithms. Section 4.4 provides a statistical analysis of all algorithms. Section 4.5 presents the ablation study of the AdaAL-MID algorithm.

4.1. Experimental Setup

The experiment utilized synthetic data streams and real-world data streams. Table 1 presents detailed information about the data streams, including the number of instances, number of attributes, number of classes, specific class distribution, and types of concept drift. Among them, synthetic data streams were generated using MOA’s RandomRBF and RandomTree stream generators to simulate different types of data streams. Subsequently, the synthetic data streams were used to test the classification performances of the AdaAL-MID under different conditions. In the synthetic data streams, two different types of data streams were set up. Lines 1–9 in Table 1 show the data streams with fixed class proportions. Lines 10–18 show data streams with dynamically changing class proportions. The changes in class proportions are combined with drift points, meaning that after a concept drift, the class distribution changes or reverses. Data streams Var_Stream_1 and Var_Stream_2 use virtual drift to trigger class proportion reversals.

The real data streams included Covtype, Shuttle, hypothyroid, connect-4, diabetes, and CIC_ISCX. These real-world data streams exhibit multiclass imbalance, but the number and types of concept drift are unknown, as shown in lines 19–24. The real data streams are described as follows:

Covtype (https://archive.ics.uci.edu/datasets (accessed on 5 March 2024)): This dataset contains forest resource data from Roosevelt County, Colorado, USA, and is designed to predict vegetation types in the forest. Specifically, the goal is to predict the vegetation type of each land plot based on various geographic and environmental features.
Shuttle (http://www.keel.es (accessed on 5 March 2024)): this dataset originally comes from NASA’s spacecraft landing control system simulation data and is used to distinguish among different flight states.
Hypothyroid ⁽https://archive.ics.uci.edu/datasets (accessed on 5 March 2024)): widely used in medical diagnostics, particularly for the classification task of hypothyroidism.
Connect-4 (http://www.keel.es (accessed on 5 March 2024)): This dataset is based on the classic board game “Connect Four” and is intended to predict the final outcome (win, loss, or draw) for the current player under a given game state.
Diabetes (https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset (accessed on 17 March 2024)): This dataset comes from the Behavioral Risk Factor Surveillance System (BRFSS) in the United States, which is a health-related telephone survey of patients with diabetes. The task is to predict the physical condition of the patients.
CIC_ISCX (https://www.unb.ca/cic/datasets/index.html (accessed on 17 March 2024): This dataset is a network intrusion detection dataset obtained from the Kaggle platform, created by the Canadian Institute for Cybersecurity (CIC). It is used to identify whether each network traffic instance belongs to a certain type of attack or is normal traffic.

There are various commonly used evaluation metrics in the classification of multiclass imbalanced data streams. This paper adopts recall, accuracy, G-mean, F1-score, and kappa as evaluation metrics. In binary classification, recall focuses on the prediction performance for minority class instances, while G-mean is the square root of the product of true positive rate (TPR) and true negative rate (TNR). Extended to the multiclass domain, recall represents the average of the recall rates for each class, while G-mean represents the nth root of the product of the recall rates for each class. Accuracy is one of the most commonly used evaluation metrics, representing the prediction performance for all instances. However, it may be less accurate in imbalanced data, so this paper comprehensively considered accuracy along with other metrics to fully evaluate the classifier performance. F1-score is the harmonic mean of precision and recall, attempting to balance these two metrics. Kappa evaluates classifier performance by calculating the consistency between predicted results and statistical distribution of the data, ranging from [−1, 1]. A value of 0 indicates performance equivalent to a random classifier, while a value of 1 indicates the classification is completely correct.

4.2. Parameter Sensitivity Experiments

In this section, parameter experiments were conducted, keeping other parameters constant, for the following four variables: number of base classifiers, N; size of sliding window, sizeW; initial value of elements in the minimum marginal threshold matrix, θ₁; and the initial value of elements in the maximum marginal threshold matrix, θ₂. The range of N was [5, 20], the range of sizeW was [250, 1000], the range of θ₁ was [0.4, 0.6], and the range of θ₂ was [0.7, 0.9]. Experiments were conducted on two data streams with mixed-type concept drift, as follows: Fixed_Mix_SIG and Var_Mix_SIG. The former has a fixed class proportion, while the latter experiences changes or flips in class proportion as the data stream progresses.

As shown in Figure 3a,e, the performance of the algorithm gradually improved with the increase in the number of base classifiers, with the greatest improvement occurring when the number increased from 5 to 10. However, after N = 10, further increases in the number of base classifiers did not significantly enhance the algorithm’s performance, particularly with the Var_Mix_SIG data stream, where increasing the number of base classifiers hardly improved the performance. This indicates that the ensemble already possesses sufficient diversity, and further increasing the number of base classifiers has little effect on performance improvement. Simultaneously, the default setting for the other ensemble classification algorithms was 10, so the default value for parameter N was set to 10. As shown in Figure 3b,f, the best effect was achieved when the window size was sizeW = 500. An appropriate window size ensures that each class contains a sufficient number of instances and avoids the influence of outdated concepts. Figure 3c,g shows the impact of the initial value of the minimum marginal threshold matrix element, θ₁, on the algorithm’s classification performance. When θ₁ was 0.5, all of the algorithm’s metrics were optimal. When θ₁ was less than 0.5, the identification of difficult instances became stricter, leading to insufficient learning of representative instances and resulting in decreased classification performance. When θ₁ was greater than 0.5, the likelihood of querying labels for nonrepresentative instances increased, which encroached on the learning opportunities for difficult instances, thus affecting the algorithm’s performance. Similarly, as shown in Figure 3d,h, the optimal value for the initial value of the maximum marginal threshold matrix element θ₂ was 0.75.

In summary, the default values for the parameter combination chosen for the proposed algorithm were N = 10, sizeW = 500, θ₁ = 0.5, and θ₂ = 0.75, and these settings were followed in subsequent experiments.

4.3. Algorithm Comparison Experiments

In this section, AdaAL-MID is compared with other data stream classification algorithms, including the following nine comparative algorithms: LB, MOOB, ARFre, SRP, CSARF, ROSE, ROALE-DI, CALMID, and MicFoal. Among the supervised learning algorithms, LB and SRP are general ensembles with concept drift handling mechanisms, MOOB is a specific ensemble for imbalanced data streams, and ARFre, CSARF, and ROSE are advanced algorithms addressing the joint challenge of class imbalance and concept drift. ROALE-DI, CALMID, and MicFoal are the latest active learning algorithms addressing joint problems, with MicFoal-E being an ensemble variant of MicFoal for comparison with other ensemble classification algorithms. In this section, comparisons and analyses were conducted on 18 synthetic data streams with fixed or variable class ratios and 6 real data streams, followed by an overall performance evaluation.

To ensure the fairness of the experiments, the number of base classifiers for all ensemble algorithms was set to 10, the label budgets for the AdaAL-MID and the other three active learning algorithms were set to 0.2, the random selection ratios were set to 0.1, and the window sizes for the algorithms using sliding windows were set to 500, with other parameters following the default settings in their original papers. The algorithms were evaluated using the following five metrics: recall, accuracy, G-mean, F1-score, and kappa. Table 2, Table 3, Table 4, Table 5 and Table 6 present the detailed results and rankings of each algorithm for each metric, with the best values highlighted in bold (“/” indicates no valid value was obtained). The average ranks (Avg. Rank) of each algorithm for each type of data stream are provided at the end of the corresponding data stream type, and the overall average rank (Total Rank) for all data streams is provided at the end.

4.3.1. Comparison and Analysis of Data Streams with Fixed Class Proportions

This section presents the experimental results of the AdaAL-MID and comparison algorithms on the data streams with fixed class proportions. As shown in Table 2, Table 3, Table 4, Table 5 and Table 6, the AdaAL-MID achieves the best G-mean and kappa values across nine data streams. Specifically, for the Fixed_Mix_SIG data stream with mixed concept drift, AdaAL-MID’s G-mean and kappa values were 1.57% and 0.54% higher, respectively, than those of the second-ranked CALMID algorithm. In terms of recall, accuracy, and F1-score, while AdaAL-MID did not achieve the best results for some data streams, its performance lagged only slightly behind some of the advanced supervised learning algorithms and remained the best among active learning algorithms. Specifically, for the Fixed_Mix_IGGI data stream, AdaAL-MID’s accuracy was only 0.03% lower than that of the top-ranked ARFre. Additionally, AdaAL-MID’s average rankings on the five metrics were 1.22, 1.11, 1.00, 1.56, and 1.00, outperforming other comparison algorithms.

To more intuitively show the performance changes in the algorithm during the learning process for data streams with fixed class proportions, especially its ability to handle concept drift, this section uses recall change curves to show the real-time performance of each algorithm, with the recall calculated every 1000 instances. Figure 4 shows the recall curves of all algorithms on nine data streams. As shown in the figure, AdaAL-MID’s overall performance is superior to other comparison algorithms. On Fixed_Stream_1 and Fixed_Stream_2 data streams without concept drift, all algorithms exhibit relatively good performance, with AdaAL-MID consistently maintaining a high level. In the presence of concept drift in the data streams, all algorithms showed a significant performance decline. As shown in Figure 4c–f, during sudden or gradual drift, the AdaAL-MID was less affected at the drift point compared with other algorithms and could recover and maintain high performance more quickly. The most affected algorithm was MOOB, which took a long time to recover due to its lack of a concept drift handling mechanism. For mixed drift data streams with incremental drift, as shown in Figure 4g–i, the AdaAL-MID’s performance in the initial stages of incremental drift was slightly lower than that of the supervised learning algorithms CSARF and ROSE but better than other active learning algorithms. After learning a certain number of instances, AdaAL-MID can maintain high performance and become comparable to the supervised algorithms.

It can be seen that the AdaAL-MID handles imbalances in data streams well and adapts to the occurrence of concept drift.

4.3.2. Comparison and Analysis for Data Streams with Variable Class Proportions

This section presents the experiment’s results for data streams with dynamically changing class proportions. As shown in Table 2, Table 3, Table 4, Table 5 and Table 6, the AdaAL-MID also ranked first overall, except on the Var_Mix_ISSI data stream with mixed concept drift, for which ROSE achieved the best performance, with the AdaAL-MID achieving the best performance on other data streams. Moreover, the performance of the AdaAL-MID was superior to other active learning algorithms. On the Var_Mix_IGGI data stream, the AdaAL-MID improved the recall, G-mean, and F1-score by 2.31%, 2.70%, and 3.51%, respectively, compared to the second-ranked CALMID. Overall, the AdaAL-MID can adapt well to various types of concept drift and handle changes in class proportions.

Similarly, Figure 5 shows the recall curves of all comparison algorithms. As shown in Figure 5a,b, the AdaAL-MID was less affected by virtual drift compared to other algorithms and could quickly recover to a high level of performance. For data streams with real concept drift, as shown in Figure 5c–f, the AdaAL-MID performed better on gradual drift data streams Var_Grad_1 and Var_Grad_2 than on abrupt drift. Not only does it suffer less performance loss at drift points, but it also maintains the highest recall level during the stable phase of the data stream. Although AdaAL-MID’s performance was slightly lower than CALMID and MOOB for some stages of the abrupt drift data streams Var_Sudd_1 and Var_Sudd_2, overall, the AdaAL-MID still outperformed other algorithms on data streams with mixed concept drift types, as shown in Figure 5g–i. Similar to its performance on data streams with fixed class proportions, the AdaAL-MID’s performance was relatively poor when incremental drift occurred, which may be an area for improvement in the future. On the Var_Mix_SIG data stream, which contains three types of drift, the AdaAL-MID demonstrated excellent performance. When abrupt drift occurred at instance 150k, it quickly recovered its performance, and its recall rate ranked first among all algorithms after recovery.

Based on the above analysis, the AdaAL-MID also demonstrated excellent performance in scenarios in which class imbalances may change or even reverse.

4.3.3. Comparison and Analysis on Real Data Streams

This section presents the experimental results on the real data streams. As shown in Table 2, Table 3, Table 4, Table 5 and Table 6, the AdaAL-MID ranked first in recall and G-mean on the diabetes data stream, first in accuracy on the hypothyroid and connect-4 data streams, and first in F1-score on the shuttle data stream. It is worth noting that in multiclass classification evaluation metrics, G-mean, as the geometric mean of class recall rates, may be 0, for example, LB, SRP, ROALE-DI, and MicFoal-E for the shuttle data stream. This is because in real multiclass data streams, instances of the smallest class may be extremely limited, resulting in a class recall rate of 0. Although the AdaAL-MID’s performance was relatively inferior to supervised learning algorithms on some data streams, the applicability of each supervised algorithm varied. It can be observed that no single algorithm can maintain a first rank on all data streams. The AdaAL-MID’s average rankings for the five indicators of real data streams were 2.50, 3.00, 2.83, 2.67, and 3.33. With the exception of being lower than the SRP’s 2.00 on accuracy and lower than the ROSE’s 2.50 and SRP’s 3.00 on kappa, it ranked first for the other three indicators. In addition, the AdaAL-MID’s performance on each real data stream was superior to the current state-of-the-art active learning algorithms.

Figure 6 displays the real-time recall change curves of all algorithms on the five real data streams. On the Covtype, hypothyroid, connect-4, and diabetes data streams, the AdaAL-MID outperformed the other active learning algorithms significantly, except for the initial stage of the Shuttle data stream, where it performed relatively poorly compared to CALMID. Except for the hypothyroid data stream, the AdaAL-MID exhibited a competitive performance against the supervised algorithms and outperformed the current state-of-the-art ensemble algorithm ROSE on four data streams. Additionally, while some algorithms may perform well on certain data streams, their performances may deteriorate on others. For example, although SRP’s recall rate on the Covtype data stream was significantly better than the other algorithms, its performance on the Shuttle and hypothyroid data streams was basically at the bottom. It can be seen that the AdaAL-MID exhibits strong adaptability to different types of real data streams.

4.3.4. Analysis of Algorithm Time Efficiency

This section summarizes and analyzes the results of all comparison algorithms on 24 data streams, as shown in Table 7. The last row represents the average runtime across the 24 data streams. Supervised learning algorithms generally consume more time, since they require access to the labels of all instances in the data stream for training. Among them, MOOB, which uses only a single time-step-based oversampling strategy to handle imbalanced data streams, consumes relatively less time. However, because of the lack of a mechanism to handle concept drift, its performance in practical scenarios is not ideal. Following MOOB are ROSE, ARFre, and LB. LB employs a fixed-rate (λ = 6) oversampling strategy, which increases the time consumption while failing to effectively address minority class instances in imbalanced data streams. ARFre and ROSE use dynamic resampling strategies that adjust the sampling rate based on changes in class proportions. However, unlike ROSE, which performs at different levels of oversampling for each instance, ARFre tends to reduce the probability of majority class instances appearing in the ARF trees, thus having a certain advantage in time efficiency. At the bottom are CSARF and SRP.

The active learning algorithms ROALE-DI, CALMID, MicFoal-E, and AdaAL-MID can complete data stream learning with a limited number of instances under a label budget. In this study, the label budget was uniformly set to 0.2, meaning only 2 out of every 10 instances were selected for model training. The fastest in execution speed are ROALE-DI and MicFoal-E. However, ROALE-DI only considers the importance of minority class instances during the initialization of base classifiers, resulting in poor adaptability to changes in class imbalance in the data stream, thus leading to suboptimal performance. Similarly, MicFoal-E, lacking a mechanism to handle concept drift, performs poorly in continuously changing data streams. The proposed AdaAL-MID algorithm showed slightly better time efficiency than the recent comprehensive active learning algorithm CALMID and outperformed it on the vast majority of data streams. Moreover, compared to state-of-the-art supervised learning algorithms, such as SRP, ROSE, and CSARF, the AdaAL-MID achieved higher performance while consuming less time.

4.3.5. Evaluation and Analysis of Overall Algorithm Performance

To compare the overall performance of the algorithms more clearly, this section evaluates and analyzes the performance of all comparative algorithms on 24 data streams based on five evaluation metrics. Wind rose diagrams are a special type of polar coordinate stacked chart. In this section, they are first used to display the performance of all comparative algorithms on three different types of data streams, synthesizing five metrics. Secondly, radar charts are commonly used to compare multiple indicators. In this section, radar charts are used to separately display the comprehensive performance of algorithms on 24 data streams based on all indicators. Finally, this section further analyzes the performance of the algorithms by comparing AdaAL-MID with the reference algorithms in terms of the number of wins, draws, and losses on all datasets.

Table 8 summarizes the average rankings and meta-rankings of each algorithm based on five indicators, where the meta-ranking integrates the results of all metrics. It can be observed that except for the average rankings of the AdaAL-MID on the accuracy and kappa indicators of real data streams, which are slightly lower than those of the supervised algorithms ROSE and SRP, the AdaAL-MID’s performance was superior to other algorithms, and it ranks first in the meta-ranking. As shown in Figure 7, a wind rose plot compares the rankings of algorithms across three types of data streams. The algorithm with the higher average ranking is represented with a larger weight (i.e., a larger sector area). The sectors are stacked in a clockwise order based on the overall performance across five metrics, from best to worst. From the graph, it can be seen that the AdaAL-MID consistently ranks at the outermost position on the three types of data streams, followed closely by ROSE. CALMID, the active learning algorithm that performs well on synthetic data streams, performed poorly on real data streams. The comprehensive comparison results demonstrate that the AdaAL-MID exhibited outstanding adaptability across different types of data streams and can balance multiple evaluation metrics.

Figure 8 presents the performance radar charts of all algorithms on 24 data streams. Each subfigure in the chart is a regular pentagon radar chart that displays the performance of different algorithms on five metrics for each data stream. The center point represents the lowest value, while the edges represent the highest values. Therefore, the algorithm closest to the outermost ring demonstrates the best performance. For simplicity, A stands for accuracy, K for kappa, Gm for G-mean, R for recall, and F1 for F1-score. As shown in Figure 8, in synthetic data streams, the values of the AdaAL-MID on the five metrics are almost always at the outer edge of the radar chart. Particularly in the Var_Grad_1 data stream, AdaAL-MID’s performance on each metric is significantly better than the other comparison algorithms. In real data streams, the AdaAL-MID achieved the best performance on the diabetes data stream, and its performance in other real data streams was comparable to that of supervised learning algorithms and significantly better than the other three active learning algorithms.

Figure 9 presents the one-to-one comparison results of the AdaAL-MID against other benchmark algorithms across all data streams. The threshold for the difference in each metric was set at 0.2%. If AdaAL-MID outperformed the benchmark algorithm by a margin greater than this threshold, it was marked as a win (green); if the margin was below the threshold, it was marked as an advantageous tie (light green). If the benchmark algorithm outperformed the AdaAL-MID and the margin was below the threshold, it was marked as a disadvantageous tie (yellow); if it exceeded the threshold, it was marked as a loss (red). As can be seen from the figure, compared to the supervised learning algorithms, the AdaAL-MID had more losses on the five metrics only against ROSE, while it had some losses against other supervised algorithms on certain metrics, and it significantly outperformed LB and MOOB. Additionally, compared to the other three active learning algorithms, the AdaAL-MID leads by an absolute margin in every metric.

4.4. Statistical Analysis

In this section, we conducted nonparametric statistical hypothesis tests on the average rankings of all algorithms across 24 experimental data streams for recall, accuracy, G-mean, F1-score, and kappa metrics. Table 9 presents the average rankings of all algorithms across the five metrics. Bonferroni–Dunn test [34] was employed in this study, and a critical difference (CD) of 2.4236 was calculated at α = 0.05 (95% confidence level). The calculation of CD is shown in Equation (9), where k represents the number of algorithms compared, and N represents the number of datasets. If the difference in rankings between two algorithms exceeds the CD value, this indicates a significant difference in performances between the two algorithms; otherwise, there is no significant difference.

The Bonferroni–Dunn test results for all comparison algorithms across the five metrics are shown in Figure 10. Among them, AdaAL-MID ranked first in all metrics, followed closely by ROSE and CALMID. From the graph, it can be seen that the AdaAL-MID not only does not show significant differences from CALMID in the accuracy metric, but it significantly outperforms all other active learning algorithms in the other four metrics. Additionally, compared to the state-of-the-art supervised ensemble algorithm ROSE, the AdaAL-MID also demonstrated competitive results. On the other hand, the AdaAL-MID significantly outperforms all supervised learning algorithms except for ROSE across all five metrics.

CD = q_{α} \sqrt{\frac{k (k + 1)}{6 N}}

(9)

4.5. Ablation Study

In the final experiment, the objective was to conduct ablation experiments to help analyze whether the various components of the AdaAL-MID could contribute to improving its ability to handle concept drift and class imbalance. The AdaAL-MID consists of four main features, and this section studied them individually by disabling each feature to form four variants. First, the first algorithm compared was the complete AdaAL-MID (V0). Secondly, the dynamic label budget strategy was removed, and a fixed label budget was used, forming V1 (Fixed budget). Then, in the uncertainty strategy based on the dual-margin dynamic threshold matrix, only the minimum margin matrix was used to obtain V2 (Only margin-min), and only the maximum margin matrix was used to obtain V3 (Only margin-max). Finally, the adaptive sampling strategy was removed, and a static sampling weight of SW_α = 6 (the setting used in the classic algorithm LB) was used to form V4 (Static SW). The experiments were conducted on synthetic data streams Fixed_Mix_SIG and Var_Mix_SIG with mixed concept drift and on the real data stream connect-4. Table 10 and Figure 11 respectively present the detailed results of the ablation experiments and the real-time change curves of recall.

As shown in Table 10, the AdaAL-MID ranks first in all five metrics compared to its four variants. Among them, the use of static sampling weights has the greatest impact on the algorithm’s performance. Static sampling maintains a consistent sampling frequency for all instances in the data stream, which cannot effectively handle data distribution imbalances. The performance of the algorithm is significantly improved after the adaptive sampling strategy is added. As shown in Figure 11a,b, in the data streams with mixed concept drift, the complete AdaAL-MID algorithm performed better than its variants in terms of performance loss at the concept drift and can also recover more quickly from new instances. In particular, the variant using a fixed label budget experiences the greatest performance decline at an instance count of 150k. Furthermore, the uncertainty strategy using only the minimum or maximum margin matrix performs weaker than the complete AdaAL-MID at the concept drift, and the fluctuation amplitude during the stable phase of the data stream is also larger. This indicates that the uncertainty strategy based on the dual-margin dynamic threshold matrix used by the AdaAL-MID is effective. These two matrices complement each other, thereby better identifying the most representative instances in the data stream and labeling them. As shown in Figure 11c, on the real data stream connect-4, removing each component of the AdaAL-MID will significantly affect the algorithm’s performance.

The experimental results on synthetic and real data streams demonstrate the effectiveness of each component of the AdaAL-MID.

5. Conclusions

This study mainly introduces an adaptive active learning method for multiclass imbalanced data streams with concept drift (AdaAL-MID) to solve the joint challenges of concept drift and class imbalance faced when the label budget is limited. First, by introducing a dynamic label budget strategy, the learning rate of instances during drift is increased to achieve fast recovery after drift. Secondly, by designing an uncertainty label request strategy based on a dual-marginal dynamic threshold matrix to select the most informative instances in the data stream, save the label budget and focus on difficult instances and minority class instances. Finally, through an adaptive sampling strategy that combines class proportions and classification performance, greater attention is given to minority class instances, and a weighted ensemble strategy is used to effectively improve the prediction accuracy of the ensemble. Algorithm comparison experimental results show that the AdaAL-MID is better than advanced active learning algorithms in handling concept drift and imbalance ratio changes. In addition, ablation studies also demonstrated the effectiveness of each strategy in the AdaAL-MID.

Currently, the performance of the AdaAL-MID under specific types of concept drift has been studied. In future work, we plan to explore more complex forms of concept drift, such as recurring drift. Additionally, a core aspect of active learning is the effective selection of the most valuable samples for labeling. In large-scale applications, the cost of manually labeling data can be very high. As the scale of data streams continues to grow, efficiently handling large-scale data streams becomes an urgent task. Future research could explore strategies like parallel computing and distributed processing to ensure efficient and scalable data mining techniques in the context of large-scale data streams.

Author Contributions

M.H.: Writing—review and editing, Validation, Supervision, and Conceptualization. C.L.: Writing—original draft, Validation, Methodology, Conceptualization, and Writing—review and editing. F.M.: Writing—review and editing, Data curation, and Investigation. F.H.: Writing—review and editing, Visualization, and Software. R.Z.: Writing—review and editing, Software, and Resources. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Nature Science Foundation of China (No. 62062004), the Ningxia Natural Science Foundation Project (No. 2022AAC03279) and the Central Universities Foundation of North Minzu University (No. 2021KJCX10).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

References

Kaddoura, S.; Arid, A.E.; Moukhtar, M. Evaluation of Supervised Machine Learning Algorithms for Multi-class Intrusion Detection Systems. In Proceedings of the Future Technologies Conference (FTC) 2021, Virtual, 28–29 October 2021; Springer: Berlin/Heidelberg, Germany, 2022; Volume 3, pp. 1–16. [Google Scholar]
Gomes, H.M.; Read, J.; Bifet, A.; Barddal, J.P.; Gama, J. Machine learning for streaming data: State of the art, challenges, and opportunities. ACM SIGKDD Explor. Newsl. 2019, 21, 6–22. [Google Scholar] [CrossRef]
Liu, W.; Zhang, H.; Liu, Q. An air quality grade forecasting approach based on ensemble learning. In Proceedings of the 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), Dublin, Ireland, 17–19 October 2019; IEEE: New York, NY, USA, 2019; pp. 87–91. [Google Scholar]
Lu, J.; Liu, A.; Dong, F.; Gu, F.; Gama, J.; Zhang, G. Learning under concept drift: A review. IEEE Trans. Knowl. Data Eng. 2018, 31, 2346–2363. [Google Scholar] [CrossRef]
Lipska, A.; Stefanowski, J. The Influence of Multiple Classes on Learning from Imbalanced Data Streams. In Proceedings of the Fourth International Workshop on Learning with Imbalanced Domains: Theory and Applications, Grenoble, France, 23 September 2022; PMLR: New York, NY, USA, 2022; pp. 187–198. [Google Scholar]
Gomes, H.M.; Barddal, J.P.; Enembreck, F.; Bifet, A. A survey on ensemble learning for data stream classification. ACM Comput. Surv. 2017, 50, 23. [Google Scholar] [CrossRef]
Bifet, A.; Holmes, G.; Pfahringer, B. Leveraging bagging for evolving data streams. In Proceedings of the 2010 European Conference on Machine Learning and Knowledge Discovery in Databases, Barcelona, Spain, 20–24 September 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 135–150. [Google Scholar]
Bifet, A.; Gavalda, R. Learning from time-changing data with adaptive windowing. In Proceedings of the 2007 SIAM International Conference on Data Mining, Minneapolis, MN, USA, 26–28 April 2007; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2007; pp. 443–448. [Google Scholar]
Wang, S.; Minku, L.L.; Yao, X. Dealing with Multiple Classes in Online Class Imbalance Learning. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016; AAAI Press: Palo Alto, CA, USA, 2016; pp. 2118–2124. [Google Scholar]
Cano, A.; Krawczyk, B. ROSE: Robust online self-adjusting ensemble for continual learning on imbalanced drifting data streams. Mach. Learn. 2022, 111, 2561–2599. [Google Scholar] [CrossRef]
Cacciarelli, D.; Kulahci, M. Active learning for data streams: A survey. Mach. Learn. 2024, 113, 185–239. [Google Scholar] [CrossRef]
Shan, J.; Zhang, H.; Liu, W.; Liu, Q. Online active learning ensemble framework for drifted data streams. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 486–498. [Google Scholar] [CrossRef]
Liu, W.; Zhang, H.; Ding, Z.; Liu, Q.; Zhu, C. A comprehensive active learning method for multiclass imbalanced data streams with concept drift. Knowl. Based Syst. 2021, 215, 106778. [Google Scholar] [CrossRef]
Hoi, S.C.H.; Sahoo, D.; Lu, J.; Zhao, P. Online learning: A comprehensive survey. Neurocomputing 2021, 459, 249–289. [Google Scholar] [CrossRef]
Mirza, B.; Lin, Z. Meta-cognitive online sequential extreme learning machine for imbalanced and concept-drifting data classification. Neural Netw. 2016, 80, 79–94. [Google Scholar] [CrossRef]
Ren, S.; Zhu, W.; Liao, B.; Li, Z.; Wang, P.; Li, K.; Chen, M.; Li, Z. Selection-based resampling ensemble algorithm for nonstationary imbalanced stream data learning. Knowl. Based Syst. 2019, 163, 705–722. [Google Scholar] [CrossRef]
Czarnowski, I. Weighted Ensemble with one-class Classification and Over-sampling and Instance selection (WECOI): An approach for learning from imbalanced data streams. J. Comput. Sci. 2022, 61, 101614. [Google Scholar] [CrossRef]
Lu, Y.; Cheung, Y.M.; Tang, Y.Y. Adaptive chunk-based dynamic weighted majority for imbalanced data streams with concept drift. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 2764–2778. [Google Scholar] [CrossRef] [PubMed]
Oza, N.C.; Russell, S.J. Online bagging and boosting. In Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, Key West, FL, USA, 4–7 January 2001; PMLR R3: New York, NY, USA, 2001; pp. 229–236. [Google Scholar]
Ferreira, L.E.B.; Gomes, H.M.; Bifet, A.; Oliveira, L.S. Adaptive random forests with resampling for imbalanced data streams. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; IEEE: New York, NY, USA, 2019; pp. 1–6. [Google Scholar]
Gomes, H.M.; Bifet, A.; Read, J.; Barddal, J.P.; Enembreck, F.; Pfharinger, B.; Holmes, G.; Abdessalem, T. Adaptive random forests for evolving data stream classification. Mach. Learn. 2017, 106, 1469–1495. [Google Scholar] [CrossRef]
Gomes, H.M.; Read, J.; Bifet, A.; Durrant, R.J. Learning from evolving data streams through ensembles of random patches. Knowl. Inf. Syst. 2021, 63, 1597–1625. [Google Scholar] [CrossRef]
Vafaie, P.; Viktor, H.; Michalowski, W. Multi-class imbalanced semi-supervised learning from streams through online ensembles. In Proceedings of the 2020 International Conference on Data Mining Workshops (ICDMW), Sorrento, Italy, 17–20 November 2020; IEEE: New York, NY, USA, 2020; pp. 867–874. [Google Scholar]
Loezer, L.; Enembreck, F.; Barddal, J.P.; de Souza Britto, A., Jr. Cost-sensitive learning for imbalanced data streams. In Proceedings of the 35th Annual ACM Symposium on Applied Computing, Brno, Czech Republic, 30 March–3 April 2020; pp. 498–504. [Google Scholar]
Zhu, Q. On the performance of Matthews correlation coefficient (MCC) for imbalanced dataset. Pattern Recognit. Lett. 2020, 136, 71–80. [Google Scholar] [CrossRef]
Sadeghi, F.; Viktor, H.L.; Vafaie, P. DynaQ: Online learning from imbalanced multi-class streams through dynamic sampling. Appl. Intell. 2023, 53, 24908–24930. [Google Scholar] [CrossRef]
Žliobaitė, I.; Bifet, A.; Pfahringer, B.; Holmes, G. Active learning with drifting streaming data. IEEE Trans. Neural Netw. Learn. Syst. 2013, 25, 27–39. [Google Scholar] [CrossRef] [PubMed]
Nguyen, V.L.; Shaker, M.H.; Hüllermeier, E. How to measure uncertainty in uncertainty sampling for active learning. Mach. Learn. 2022, 111, 89–122. [Google Scholar] [CrossRef]
Zhang, H.; Liu, W.; Shan, J.; Liu, Q. Online active learning paired ensemble for concept drift and class imbalance. IEEE Access 2018, 6, 73815–73828. [Google Scholar] [CrossRef]
Zhang, H.; Liu, W.; Liu, Q. Reinforcement online active learning ensemble for drifting imbalanced data streams. IEEE Trans. Knowl. Data Eng. 2020, 34, 3971–3983. [Google Scholar] [CrossRef]
Liu, W.; Zhu, C.; Ding, Z.; Zhang, H.; Liu, Q. Multiclass imbalanced and concept drift network traffic classification framework based on online active learning. Eng. Appl. Artif. Intell. 2023, 117, 105607. [Google Scholar] [CrossRef]
Gama, J.; Medas, P.; Castillo, G.; Rodrigues, P. Learning with drift detection. In Proceedings of the 17th Advances in Artificial Intelligence, Cairns, Australia, 4–6 December 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 286–295. [Google Scholar]
Bifet, A.; Holmes, G.; Pfahringer, B.; Kranen, P.; Kremer, H.; Jansen, T.; Seidl, T. Moa: Massive online analysis, a framework for stream classification and clustering. In Proceedings of the First Workshop on Applications of Pattern Analysis, Windsor, UK, 1–3 September 2010; PMLR: New York, NY, USA, 2010; pp. 44–50. [Google Scholar]
Deng, H.; Runger, G.; Tuv, E.; Vladimir, M. A time series forest for classification and feature extraction. Inf. Sci. 2013, 239, 142–153. [Google Scholar] [CrossRef]

Figure 1. Multiclass imbalanced data streams with concept drift. (The gray shapes and lines represent instances and decision boundaries at time t, while the red ones indicate the state at time t + 1 after concept drift. Different shapes represent instances of different classes).

Figure 2. The framework of the AdaAL-MID.

Figure 3. Effect of parameter variation on the AdaAl-MID’s performance: (a) changes in N on Fixed_MIX_SIG; (b) changes in sizeW on Fixed_MIX_SIG; (c) changes in θ₁ on Fixed_MIX_SIG; (d) changes in θ₂ on Fixed_MIX_SIG; (e) changes in N on Var_MIX_SIG; (f) changes in sizeW on Var_MIX_SIG; (g) changes in θ₁ on Var_MIX_SIG; (h) changes in θ₂ on Var_MIX_SIG.

Figure 4. The recall curves of the comparison algorithms on fixed class proportion data streams.

Figure 5. The recall curves of the comparison algorithms for the variable class proportion data streams.

Figure 6. Recall curves of the comparison algorithms on the real data streams.

Figure 7. Wind rose diagrams on three data streams: (a) data stream with fixed class proportions; (b) data stream with fixed class proportions; (c) real data stream.

Figure 8. Radar charts for 24 data streams.

Figure 9. The number of times AdaAL-MID and the comparison algorithms achieved wins (green), advantageous ties (light green), disadvantageous ties (yellow), and losses (red) on 24 data streams.

Figure 10. BD test results on five metrics: (a) recall; (b) accuracy; (c) G-mean; (d) F1-score; (e) kappa.

Figure 11. Recall curves of the AdaAL-MID ablation study: (a) Fixed_Mix_SIG; (b) Var_Mix_SIG; (c) connect-4.

Table 1. Data stream characteristics.

Type	Streams	Instances	Attributes	Classes	Class Proportions	Drift Type (Number)
Fixed class proportion	Fixed_Stream_1	300 k	20	5	5/4/3/2/1	stable—no drift (0)
	Fixed_Stream_2	300k	20	5	10/7/4/3/1	stable—no drift (0)
	Fixed_Sudd_1	300 k	20	5	10/7/4/3/1	sudden (1)
	Fixed_Sudd_2	300 k	20	5	19/5/5/4/1	sudden (2)
	Fixed_Grad_1	300 k	20	5	10/7/5/2/1	gradual (1)
	Fixed_Grad_2	300 k	20	5	28/10/6/5/1	gradual (2)
	Fixed_Mix_ISSI	300 k	20	5	19/5/5/4/1	sudden (2), increment (2)
	Fixed_Mix_IGGI	300 k	20	5	10/7/5/2/1	gradual (2), increment (2)
	Fixed_Mix_SIG	400 k	20	5	28/10/6/5/1	sudden (1), increment (1), gradual (1)
Variable class proportion	Var_Stream_1	300 k	20	5	5/4/3/2/1→1/2/3/4/5	sudden—virtual drift (1)
	Var_Stream_2	300 k	20	5	10/7/4/3/1→1/3/4/7/10	sudden—virtual drift (1)
	Var_Sudd_1	300 k	20	5	10/6/4/4/1→1/2/5/7/10	sudden (1)
	Var_Sudd_2	300 k	20	5	19/5/5/4/1→16/4/3/3/1→1/4/5/5/19	sudden (2)
	Var_Grad_1	300 k	20	5	10/6/4/4/1→1/2/5/7/10	gradual (1)
	Var_Grad_2	300 k	20	5	10/6/4/4/1→28/10/6/5/1→1/2/5/7/10	gradual (2)
	Var_Mix_ISSI	300 k	20	5	19/5/5/4/1→1/2/5/7/10→28/10/6/5/1	sudden (2), increment (2)
	Var_Mix_IGGI	300 k	20	5	10/7/5/2/1→1/2/5/7/10→28/10/6/5/1	gradual (2), increment (2)
	Var_Mix_SIG	400 k	20	5	28/10/6/5/1→1/4/5/5/19	sudden (1), increment (1), gradual (1)
Real data stream	Covtype	581 k	54	7	-	-
	Shuttle	58 k	9	7
	hypothyroid	1000 k	29	4	-	-
	connect-4	6 k	42	3	-	-
	diabetes	254 k	21	5
	CIC_ISCX	693 k	68	6	-	-

Table 2. Recall results (%) of the comparison algorithms.

Data Stream	Supervised Learning Algorithms						Active Learning Algorithms
Data Stream	LB	MOOB	ARFre	SRP	CSARF	ROSE	ROALE-DI	CALMID	MicFoal-E	AdaAL-MID
Fixed_Stream_1	97.36 (2)	95.80 (5)	94.52 (6)	91.67 (8)	92.39 (7)	97.34 (3)	88.75 (10)	97.04 (4)	91.60 (9)	97.79 (1)
Fixed_Stream_2	96.84 (3)	95.86 (5)	94.56 (6)	91.59 (9)	92.55 (8)	96.70 (4)	90.37 (10)	97.01 (2)	92.60 (7)	97.62 (1)
Fixed_Sudd_1	92.06 (4)	83.03 (9)	92.02 (5)	89.12 (7)	91.48 (6)	94.15 (3)	72.94 (10)	94.53 (2)	89.02 (8)	95.41 (1)
Fixed_Sudd_2	91.05 (6)	84.81 (8)	93.58 (4)	89.47 (7)	92.98 (5)	95.07 (2)	78.27 (10)	94.81 (3)	80.01 (9)	95.48 (1)
Fixed_Grad_1	87.64 (6)	80.95 (9)	91.84 (4)	83.89 (8)	91.58 (5)	93.38 (3)	76.35 (10)	94.46 (2)	85.89 (7)	95.24 (1)
Fixed_Grad_2	90.43 (6)	82.56 (8)	92.08 (4)	80.00 (9)	92.06 (5)	94.28 (2)	72.31 (10)	92.93 (3)	87.16 (7)	95.06 (1)
Fixed_Mix_ISSI	81.93 (6)	73.59 (8)	88.54 (4)	78.76 (7)	90.52 (1)	90.30 (2)	67.07 (10)	87.89 (5)	69.52 (9)	89.70 (3)
Fixed_Mix_IGGI	84.81 (6)	74.51 (8)	88.68 (4)	74.23 (9)	89.40 (3)	89.85 (2)	63.60 (10)	87.37 (5)	77.58 (7)	90.21 (1)
Fixed_Mix_SIG	89.81 (4)	79.07 (9)	88.38 (6)	84.07 (7)	88.82 (5)	90.12 (3)	72.16 (10)	91.30 (2)	81.73 (8)	91.91 (1)
Avg. Rank (Fixed)	4.78	7.67	4.78	7.89	5.00	2.67	10.00	3.11	7.89	1.22
Var_Stream_1	96.15 (3)	95.47 (5)	93.92 (6)	89.78 (9)	90.08 (8)	95.68 (4)	87.78 (10)	96.85 (2)	91.96 (7)	97.20 (1)
Var_Stream_2	95.96 (3)	94.62 (5)	92.47 (6)	89.72 (9)	91.17 (7)	95.33 (4)	84.67 (10)	96.48 (2)	90.34 (8)	96.87 (1)
Var_Sudd_1	92.73 (4)	85.13 (8)	87.19 (6)	91.66 (5)	86.41 (7)	94.93 (1)	80.88 (10)	94.56 (3)	84.58 (9)	94.88 (2)
Var_Sudd_2	91.34 (4)	86.55 (7)	88.41 (6)	85.00 (8)	89.00 (5)	91.95 (2)	79.78 (10)	91.68 (3)	83.99 (9)	92.62 (1)
Var_Grad_1	87.96 (8)	85.10 (9)	90.51 (5)	91.57 (4)	89.55 (6)	92.09 (3)	81.61 (10)	92.55 (2)	89.17 (7)	94.74 (1)
Var_Grad_2	87.20 (7)	86.31 (8)	89.54 (4)	85.44 (9)	89.51 (5)	91.69 (3)	81.55 (10)	93.72 (2)	88.48 (6)	94.44 (1)
Var_Mix_ISSI	83.24 (7)	78.17 (8)	83.81 (5)	83.57 (6)	87.79 (2)	88.26 (1)	68.90 (10)	84.98 (4)	75.91 (9)	87.34 (3)
Var_Mix_IGGI	81.28 (7)	77.78 (9)	83.31 (6)	84.74 (5)	85.81 (3)	86.65 (2)	71.91 (10)	85.04 (4)	79.33 (8)	87.35 (1)
Var_Mix_SIG	89.71 (4)	86.06 (7)	86.15 (6)	85.68 (8)	89.88 (3)	89.06 (5)	76.55 (10)	89.98 (2)	83.33 (9)	91.85 (1)
Avg. Rank (Var)	5.22	7.33	5.56	7.00	5.11	2.78	10.00	2.67	8.00	1.33
Covtype	83.63 (5)	75.58 (9)	85.25 (3)	87.94 (1)	82.47 (6)	84.37 (4)	68.44 (10)	81.32 (7)	78.51 (8)	85.33 (2)
Shuttle	47.73 (10)	64.39 (1)	59.52 (4)	52.57 (7)	63.23 (2)	57.59 (6)	50.24 (9)	58.98 (5)	51.47 (8)	60.64 (3)
hypothyroid	29.79 (7)	39.58 (2)	25.76 (9)	26.12 (8)	48.63 (1)	32.16 (5)	25.00 (10)	31.58 (6)	32.21 (4)	33.17 (3)
connect-4	56.29 (4)	55.40 (5)	58.03 (2)	55.23 (6)	54.24 (7)	60.32 (1)	46.62 (10)	52.73 (8)	48.89 (9)	57.66 (3)
diabetes	41.14 (7)	45.14 (3)	42.83 (4)	41.18 (6)	41.88 (5)	45.37 (2)	40.69 (8)	40.18 (9)	36.71 (10)	46.54 (1)
CIC_ISCX	87.27 (4)	84.55 (6)	84.34 (7)	88.18 (1)	88.08 (2)	86.83 (5)	73.38 (10)	80.82 (9)	81.79 (8)	87.66 (3)
Avg. Rank (Real)	6.17	4.33	4.83	4.83	3.83	3.83	9.50	7.33	7.83	2.50
Total Rank	5.29	6.71	5.08	6.79	4.75	3.00	9.88	4.00	7.92	1.58